
##  Viability of opening a Dry Cleaning business in Brooklyn, NY

## 1. Data Description

To focus on a specific type of business in a specific neighborhood, I will use the JSON data set from Week 3 of this unit. This data contains information for all 5 boroughs and 306 neighborhoods of New York. One of the first steps is to restrict the borough to just "Brooklyn".

Using the Foursquare API, I will obtain the venues near each neighborhood and extract each venue's name, latitude, longitude, and category type.
I will then convert the JSON response to a pandas dataframe (using the libraries loaded below) and filter the dataframe for venues that have "Cleaning" in their name.
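
The name filter described above could look roughly like the sketch below. The sample rows and the helper name `filter_by_keyword` are hypothetical and only illustrate the idea; the real table comes from the Foursquare responses and uses the `Venue` column built later in this notebook.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real Foursquare results.
venues = pd.DataFrame({
    "Venue": ["Kings Dry Cleaning", "Joe's Pizza", "Sparkle Cleaning Co.", "Corner Cafe"],
    "Venue Category": ["Dry Cleaner", "Pizza Place", "Laundry Service", "Coffee Shop"],
})


def filter_by_keyword(df, keyword, column="Venue"):
    """Return the rows whose `column` contains `keyword`, ignoring case."""
    # regex=False treats the keyword as plain text; na=False drops missing names.
    mask = df[column].str.contains(keyword, case=False, na=False, regex=False)
    return df[mask].reset_index(drop=True)


cleaners = filter_by_keyword(venues, "Cleaning")
print(cleaners)
```

Matching on the venue name alone may miss dry cleaners whose names do not contain the word, so checking the `Venue Category` column as well is worth considering.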


Obtaining data from source: https://geo.nyu.edu/catalog/nyu_2451_34572
and downloading it to local directory

Loading all the necessary dependencies


In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

In [2]:
# Loading the data
with open('C:/Users/KIMTEOT/Documents/Python Notebooks/nyu-geojson.json') as json_data:
    newyork_data = json.load(json_data)

In [5]:
#Extracting all the needed *features*
nbh_data = newyork_data['features']

#look at first item on list
nbh_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

Creating a pandas dataframe and filling it with the essential information (Borough, Neighborhood, Latitude, Longitude)

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# loop through the data and collect one row per neighborhood
rows = []
for data in nbh_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)
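
As an alternative to the explicit loop, the same four columns can probably be obtained by flattening the features with `pd.json_normalize` (available in pandas 1.0 and later). This is only a sketch: the helper name `features_to_frame` is hypothetical, and the sample feature is the Wakefield record shown earlier, trimmed to the fields used here.

```python
import pandas as pd

# The Wakefield feature from the output above, trimmed to the fields used here.
sample_features = [
    {
        "geometry": {"type": "Point",
                     "coordinates": [-73.84720052054902, 40.89470517661]},
        "properties": {"name": "Wakefield", "borough": "Bronx"},
    },
]


def features_to_frame(features):
    """Flatten GeoJSON point features into Borough/Neighborhood/Latitude/Longitude."""
    flat = pd.json_normalize(features)  # nested keys become dotted column names
    coords = flat["geometry.coordinates"]  # each cell is a [longitude, latitude] list
    return pd.DataFrame({
        "Borough": flat["properties.borough"],
        "Neighborhood": flat["properties.name"],
        "Latitude": coords.apply(lambda c: c[1]),
        "Longitude": coords.apply(lambda c: c[0]),
    })


print(features_to_frame(sample_features))
```

In the notebook this would be called as `features_to_frame(nbh_data)`, assuming every feature has point geometry like the sample.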

Examine the neighborhoods data frame

In [7]:
neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


The first step is to filter down to the borough of *Brooklyn*, since this is where we want to start our business.

In [8]:
bklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
bklyn_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 |


Using the Foursquare credentials and API version to extract venues and categories for exploration

In [9]:
CLIENT_ID = 'OGM5JL50C1VVUDLNGABP3EJPJE1M3WDZBXGCE5P3S4Z20ZYH' # your Foursquare ID
CLIENT_SECRET = '1MAUTXEWQ0LZCAU5BX1NP0GCWPFSUSEZ35SQF1TJBZ1IFO43' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: OGM5JL50C1VVUDLNGABP3EJPJE1M3WDZBXGCE5P3S4Z20ZYH
CLIENT_SECRET:1MAUTXEWQ0LZCAU5BX1NP0GCWPFSUSEZ35SQF1TJBZ1IFO43
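
Hard-coding and printing credentials exposes them to anyone who can read the notebook. One common alternative is to read them from environment variables, sketched below. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions chosen for illustration, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping of environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("Credentials loaded:", bool(client_id and client_secret))
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read from the actual environment, and the secret would not need to be printed at all.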


Building the GET request URL from each neighborhood's latitude and longitude and our credentials

In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    # limit=100 is an assumed default: LIMIT is not defined anywhere in this section
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
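
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which raises an error if a response has no groups or a venue has no category. The sketch below shows one way the parsing step could be made more tolerant. The helper name `rows_from_explore_response` and the sample payload are hypothetical; the payload only mimics the nesting that the function above relies on.

```python
def rows_from_explore_response(name, lat, lng, payload):
    """Turn one explore-style payload into a list of venue rows.

    Returns an empty list when the payload has no groups, and uses None
    as the category when a venue lists no categories.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hypothetical payload with the same nesting as the responses parsed above.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cleaners",
               "location": {"lat": 40.6260, "lng": -74.0300},
               "categories": [{"name": "Dry Cleaner"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 40.6255, "lng": -74.0310},
               "categories": []}},
]}]}}

rows = rows_from_explore_response("Bay Ridge", 40.625801, -74.030621, sample_payload)
print(rows)
```

Keeping the parsing separate from the network call also makes it possible to test the row-building logic without spending API quota.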

In [16]:
# Creating a new dataframe bklyn_venues
bklyn_venues = getNearbyVenues(names=bklyn_data['Neighborhood'],
                                   latitudes=bklyn_data['Latitude'],
                                   longitudes=bklyn_data['Longitude']
                                  )

Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


In [15]:
# Getting counts by Neighborhood for next step.
bklyn_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bath Beach | 45 | 45 | 45 | 45 | 45 | 45 |
| Bay Ridge | 84 | 84 | 84 | 84 | 84 | 84 |
| Bedford Stuyvesant | 26 | 26 | 26 | 26 | 26 | 26 |
| Bensonhurst | 36 | 36 | 36 | 36 | 36 | 36 |
| Bergen Beach | 6 | 6 | 6 | 6 | 6 | 6 |
| Boerum Hill | 87 | 87 | 87 | 87 | 87 | 87 |
| Borough Park | 22 | 22 | 22 | 22 | 22 | 22 |
| Brighton Beach | 45 | 45 | 45 | 45 | 45 | 45 |
| Broadway Junction | 17 | 17 | 17 | 17 | 17 | 17 |
| Brooklyn Heights | 100 | 100 | 100 | 100 | 100 | 100 |
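
Because `count()` repeats the same number in every column, a single count per neighborhood may be easier to read. The sketch below uses `groupby(...).size()` on a few hypothetical rows; in the notebook the same call would be applied to `bklyn_venues`.

```python
import pandas as pd

# Hypothetical rows standing in for bklyn_venues.
sample_venues = pd.DataFrame({
    "Neighborhood": ["Bay Ridge", "Bay Ridge", "Bay Ridge", "Bath Beach", "Bath Beach", "Dumbo"],
    "Venue": ["A", "B", "C", "D", "E", "F"],
})

# size() yields one row count per group, sorted here from most to fewest venues.
venue_counts = (
    sample_venues.groupby("Neighborhood")
    .size()
    .rename("Venue Count")
    .sort_values(ascending=False)
)
print(venue_counts)
```

Unlike `count()`, `size()` counts rows regardless of missing values, so the two can differ if any column contains NaN.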
