# Locating an 'underground' art gallery in New York City
### By J. Martin

The principal goal of this exercise is to inform the placement of an art gallery in New York City. Art galleries are common, but in conversation with several artists who live in NYC, I have learned that the gallery community is principally concerned with *value*, which is to say how much a painting might sell for in the future. This has two effects. First, it raises an incredible barrier to entry, since an artist who has already sold is more likely to sell (and to re-sell in the future). Second, it tends to discourage people who simply want art in their homes from seeking it out, as the gallery world is focused on selling high-end art (which most families cannot afford).

The goal, then, is to place a new art gallery focused on selling unknown artists' work at lower prices. It would set an upper limit on the price of each piece and would not sell work by artists who are already successful in other areas of the art business.

## Where to place such an art gallery

Two criteria are important. First, the location should be close to other entertainment, exercise, and food venues, because incidental walk-ins are encouraged, i.e. people who are open to art but not specifically searching for it. Second, it should be some distance away from other art galleries, since those attract a crowd that tends to look for other art.
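The second criterion needs some way to measure how far a candidate location is from existing galleries. Below is a minimal sketch, assuming that straight-line (great-circle) distance is an acceptable proxy for walking distance. The function name `haversine_km` and the Earth radius constant are illustrative choices, not part of the analysis in this notebook. The two points are the Wakefield and Co-op City neighborhood centers from the neighborhood data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Wakefield and Co-op City neighborhood centers (latitude, longitude)
wakefield = (40.894705, -73.847201)
coop_city = (40.874294, -73.829939)
distance = haversine_km(*wakefield, *coop_city)
print(round(distance, 2), "km")
```

The same function could be applied between a neighborhood center and each gallery venue to find the nearest gallery.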

## The data to be used
The analysis will use the New York neighborhood data and a truncated body of the Foursquare venue data, i.e. excluding certain types of establishments (like churches, zoos, and parks) which aren't associated with the shopping or "night-life" type of entertainment we are focused on (even though "night life" is a poor descriptor: I've been to NYC on a weekend, and they drink from sun-up to sundown). A complete listing will be described in the actual code notebook.
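Excluding venue types can be sketched as a simple filter on the category column. This is only an illustration: the exclusion set `EXCLUDED_CATEGORIES` is hypothetical, and the four rows are made up to keep the example self-contained. Only the column name `Venue Category` matches the venue table built later in this notebook.

```python
import pandas as pd

# Hypothetical exclusion list; the real one would be longer
EXCLUDED_CATEGORIES = {"Church", "Zoo Exhibit", "Park"}

# made-up rows in the same layout as the venue table
venues = pd.DataFrame({
    "Neighborhood": ["Wakefield", "Wakefield", "Fieldston", "Fieldston"],
    "Venue Category": ["Dessert Shop", "Church", "Park", "Bar"],
})

# keep only the rows whose category is not on the exclusion list
kept = venues[~venues["Venue Category"].isin(EXCLUDED_CATEGORIES)].reset_index(drop=True)
print(kept)
```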

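Using these data, a simple K-means clustering will first be run to establish the type and amount of entertainment in each neighborhood. The plan calls for K=10, to allow greater division among the groups; the clustering cell later in this notebook currently sets `kclusters = 5`.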

Using these clusters, the 3 "best" groups will be identified, i.e. those with a higher concentration of bars and restaurants. K-means clustering will then be repeated among these to account for the presence of other galleries. The "loser" of these will then be the best neighborhood in which to open the gallery: it has a sufficiently high proportion of other attracting businesses while being distant from those neighborhoods which would attract customers who might turn their noses up at "cheap" and "undiscovered" artists' work.
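The two-stage selection described above can be sketched on a small table of cluster profiles. The numbers below are made up purely for illustration, and the sketch replaces the second round of K-means with a plain ranking on gallery share, which is a simplification of the plan rather than the method itself.

```python
import pandas as pd

# Made-up cluster profiles: mean share of each venue group per cluster
profiles = pd.DataFrame(
    {
        "Restaurantsum": [0.05, 0.30, 0.20, 0.35, 0.10],
        "Barsum": [0.01, 0.10, 0.05, 0.08, 0.02],
        "Art Gallery": [0.000, 0.020, 0.004, 0.010, 0.001],
    },
    index=pd.Index([0, 1, 2, 3, 4], name="Cluster Labels"),
)

# Stage 1: keep the three clusters with the highest bar + restaurant share
nightlife = profiles["Restaurantsum"] + profiles["Barsum"]
top3 = nightlife.nlargest(3).index

# Stage 2: among those, the "loser" is the cluster with the lowest gallery share
loser = profiles.loc[top3, "Art Gallery"].idxmin()
print(list(top3), loser)
```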

## Part 1 - Data Intake

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [4]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [5]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [6]:
neighborhoods_data = newyork_data['features']

In [7]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

neighborhoods = pd.DataFrame(columns=column_names)

In [8]:
# collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [9]:
neighborhoods.Borough.unique()

array(['Bronx', 'Manhattan', 'Brooklyn', 'Queens', 'Staten Island'],
      dtype=object)

In [10]:
# encode each borough as an integer (used below to pick a marker color)
borough_codes = {'Staten Island': 1, 'Brooklyn': 2, 'Manhattan': 3, 'Queens': 4, 'Bronx': 5}
neighborhood1 = neighborhoods.copy()  # copy so the original borough names are kept
neighborhood1['Borough'] = neighborhood1['Borough'].map(borough_codes)
neighborhood1.Borough.unique()

array([5, 3, 2, 4, 1])

### The map of New York with Neighborhoods superimposed - colored by Borough

In [11]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Brooklyn, NY are 40.6501038, -73.9495823.


In [12]:
# set color scheme for the clusters
x = np.arange(5)
ys = [i + x + (i*x)**2 for i in range(5)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


In [13]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.5)


# add markers to the map
markers_colors = []
for lat, lon, poi, borough in zip(neighborhood1['Latitude'], neighborhood1['Longitude'], neighborhood1['Neighborhood'], neighborhood1['Borough']):
    label = folium.Popup(str(poi) + ' Borough ' + str(borough), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[borough-1],
        fill=True,
        fill_color=rainbow[borough-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Gathering the Venue Data

In [14]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (real credentials should not be published)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 200

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
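
The list comprehension in `getNearbyVenues` assumes that every response contains a group and that every venue has at least one category and a location. A minimal sketch of a more defensive parser follows. The helper name `parse_explore_items` is hypothetical, and the sample payload is hand-written in the same shape the function above indexes into (`response` → `groups` → `items` → `venue`); it is not a recorded API response.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore-style response into rows, skipping incomplete venues."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows

# Hand-written payload for illustration only
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Lollipops Gelato",
               "location": {"lat": 40.894123, "lng": -73.845892},
               "categories": [{"name": "Dessert Shop"}]}},
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 40.89, "lng": -73.84},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample, "Wakefield", 40.894705, -73.847201)
print(rows)
```

An empty or error payload yields an empty list instead of raising a `KeyError` or `IndexError`, so one bad neighborhood would not stop the whole loop.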


In [16]:
venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [112]:
print(venues.shape)
venues.head()

(10290, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
2,Wakefield,40.894705,-73.847201,Cooler Runnings Jamaican Restaurant Inc,40.898276,-73.850381,Caribbean Restaurant
3,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
4,Wakefield,40.894705,-73.847201,SUBWAY,40.890656,-73.849192,Sandwich Place


In [113]:
#venues.groupby('Neighborhood').count()

### Combine categories into simpler groups (e.g. "restaurants", "bars")



In [114]:
# one hot encoding
venue_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

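# add the neighborhood name to the dataframe
# note: Foursquare has a venue category that is itself called 'Neighborhood',
# so this assignment overwrites that one-hot column in place instead of adding a new last column
venue_onehot['Neighborhood'] = venues['Neighborhood'] 

# move the last column to the front; because of the overwrite above this moves
# 'Zoo Exhibit' (not 'Neighborhood'), and the column indices used below rely on this order
fixed_columns = [venue_onehot.columns[-1]] + list(venue_onehot.columns[:-1])
venue_onehot = venue_onehot[fixed_columns]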

venue_onehot.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,Airport Tram,American Restaurant,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Bath House,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Basketball Court,College Bookstore,College Cafeteria,College Gym,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel 
Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lawyer,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Locksmith,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic 
Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Outlet Store,Paella Restaurant,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roller Rink,Romanian Restaurant,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Ski Area,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,State / Provincial Park,Steakhouse,Storage Facility,Street Art,Strip Club,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tennis Court,Tennis Stadium,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toll Plaza,Tourist Information Center,Toy / Game Store,Track,Trail,Train,Train Station,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape 
Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waste Facility,Watch Shop,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
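
In the output above, the `Neighborhood` column sits in the middle of the table rather than at the front, because one of the venue categories is itself named "Neighborhood". A minimal sketch of one way to avoid that clash, on made-up rows: the column name `Neighborhood Name` is a hypothetical choice, and adopting it in this notebook would shift column positions, so the index lists used further down would have to be rebuilt.

```python
import pandas as pd

# made-up venues, one of which has the category 'Neighborhood'
venues = pd.DataFrame({
    "Neighborhood": ["Wakefield", "Wakefield", "Fieldston"],
    "Venue Category": ["Bar", "Neighborhood", "Zoo Exhibit"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
print(list(onehot.columns))  # category columns, including one named 'Neighborhood'

# insert the neighborhood name at position 0 under a name that cannot clash
# with a venue category ('Neighborhood Name' is a hypothetical choice)
onehot.insert(0, "Neighborhood Name", venues["Neighborhood"])
print(list(onehot.columns))
```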


In [115]:
columnnames = venue_onehot.columns.values.tolist()
#columnnames

In [116]:
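# the first two groups can be built by substring matching; the other three had to be listed by column index.
# note: the substring "Bar" also matches 'Salon / Barbershop', so that category ends up in the bar list.

restaurant = [s for s in columnnames if "Restaurant" in s]
bar = [s for s in columnnames if "Bar" in s]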


In [171]:
entertainment = []
load=[10, 16, 49, 81, 96, 159, 166, 201, 202, 212, 245, 250, 254, 256, 265, 270, 274, 277, 280, 285, 304, 315, 324, 325, 332, 354, 373, 393, 394, 395]
for index in load:
    entertainment.append(columnnames[index])
print(entertainment)

['Arcade', 'Arts & Entertainment', 'Bowling Alley', 'Circus', 'Comic Shop', 'Gaming Cafe', 'General Entertainment', 'Indie Movie Theater', 'Indie Theater', 'Jazz Club', 'Mini Golf', 'Monument / Landmark', 'Movie Theater', 'Multiplex', 'Nightclub', 'Opera House', 'Other Nightlife', 'Outdoors & Recreation', 'Paintball Field', 'Performing Arts Venue', 'Pool', 'Recreation Center', 'Rock Club', 'Roller Rink', 'Salsa Club', 'Social Club', 'Strip Club', 'Theater', 'Theme Park', 'Theme Park Ride / Attraction']


In [181]:
otherfood = []
load=[24,26, 27, 37,38,39, 44, 52, 53, 56, 57, 59, 60, 66, 67, 68, 72, 77, 80, 86, 90, 100, 103, 106, 111, 114, 116, 144, 145, 148, 149, 150, 151, 152, 155, 156, 157, 163, 
                       171, 173, 185, 194, 199,230, 267, 272, 294, 298, 302, 309, 330, 333, 350, 351, 355, 362, 370, 374, 380, 426, 427]
for index in load:
    otherfood.append(columnnames[index])
print(otherfood)

['BBQ Joint', 'Bagel Shop', 'Bakery', 'Beer Bar', 'Beer Garden', 'Beer Store', 'Bistro', 'Breakfast Spot', 'Brewery', 'Bubble Tea Shop', 'Buffet', 'Burger Joint', 'Burrito Place', 'Cafeteria', 'Café', 'Cajun / Creole Restaurant', 'Candy Store', 'Cheese Shop', 'Church', 'Coffee Shop', 'College Cafeteria', 'Convenience Store', 'Creperie', 'Cupcake Shop', 'Deli / Bodega', 'Dessert Shop', 'Diner', 'Fish & Chips Shop', 'Fish Market', 'Food', 'Food & Drink Shop', 'Food Court', 'Food Stand', 'Food Truck', 'Fried Chicken Joint', 'Frozen Yogurt Shop', 'Fruit & Vegetable Store', 'Gastropub', 'Gourmet Shop', 'Grocery Store', 'Herbs & Spices Store', 'Hot Dog Joint', 'Ice Cream Shop', 'Liquor Store', 'Noodle House', 'Organic Grocery', 'Pie Shop', 'Pizza Place', 'Poke Place', 'Pub', 'Salad Place', 'Sandwich Place', 'Smoothie Shop', 'Snack Place', 'Soup Place', 'Speakeasy', 'Steakhouse', 'Supermarket', 'Taco Place', 'Wine Shop', 'Wings Joint']


In [185]:
exercise = []
load=[18,30,32, 50, 82, 88, 91, 107, 109, 174, 175, 176, 177, 304, 323, 353, 366, 389, 418, 423, 428 ]
for index in load:
    exercise.append(columnnames[index])
print(exercise)

['Athletics & Sports', 'Baseball Field', 'Basketball Court', 'Boxing Gym', 'Climbing Gym', 'College Basketball Court', 'College Gym', 'Cycle Studio', 'Dance Studio', 'Gym', 'Gym / Fitness Center', 'Gym Pool', 'Gymnastics Gym', 'Pool', 'Rock Climbing Spot', 'Soccer Field', 'Sports Club', 'Tennis Court', 'Volleyball Court', 'Weight Loss Center', "Women's Store"]


In [184]:
#used to find the indices for creating the above lists, for later summing
columnnames.index('Polish Restaurant')

303
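
Listing categories by position is fragile, since any change in the column order silently changes the groups. A minimal sketch of two checks that may help: matching on whole words instead of substrings, and looking for categories assigned to more than one group (for example, 'Pool' appears in both the entertainment and exercise lists above). The helper `has_word` and the small `groups` dictionary are illustrative only and do not reproduce the full lists.

```python
import re

# A few category names taken from the column listing above
columnnames = ["Bar", "Beer Bar", "Hookah Bar", "Salon / Barbershop",
               "Pool", "Thai Restaurant", "Yoga Studio"]

def has_word(name, word):
    """True if `word` appears in `name` as a whole word, not just as a substring."""
    return word in re.findall(r"[A-Za-z]+", name)

substring_bar = [s for s in columnnames if "Bar" in s]
whole_word_bar = [s for s in columnnames if has_word(s, "Bar")]

# groups listed by name rather than by position (contents here are illustrative)
groups = {
    "bar": whole_word_bar,
    "entertainment": ["Pool"],
    "exercise": ["Pool", "Yoga Studio"],
}

# categories assigned to more than one group would be counted twice when summing
seen, overlaps = {}, {}
for group, names in groups.items():
    for n in names:
        if n in seen:
            overlaps.setdefault(n, [seen[n]]).append(group)
        else:
            seen[n] = group

print(substring_bar)
print(whole_word_bar)
print(overlaps)
```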

### Now that everything is created, we create a new summed field.

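Instead of representing each venue type individually, each row now records which broad group the venue falls into. After that, we proceed as before by grouping by neighborhood and taking the mean, which gives the share of each group among a neighborhood's venues.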
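
A small worked example of this step, on a made-up one-hot table with two neighborhoods and four categories. The data are invented; only the column names `Restaurantsum` and `Barsum` follow the notebook.

```python
import pandas as pd

# toy one-hot table: one row per venue, one column per category
onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Thai Restaurant": [1, 0, 0, 0, 0],
    "Italian Restaurant": [0, 1, 0, 0, 1],
    "Wine Bar": [0, 0, 1, 0, 0],
    "Dive Bar": [0, 0, 0, 1, 0],
})
restaurant = ["Thai Restaurant", "Italian Restaurant"]
bar = ["Wine Bar", "Dive Bar"]

summed = pd.DataFrame({
    "Neighborhood": onehot["Neighborhood"],
    "Restaurantsum": onehot[restaurant].sum(axis=1),  # 1 if the venue is any kind of restaurant
    "Barsum": onehot[bar].sum(axis=1),                # 1 if the venue is any kind of bar
})

# the mean of a 0/1 column is the share of venues in that group
shares = summed.groupby("Neighborhood").mean()
print(shares)
```

Neighborhood A has three venues, two of them restaurants, so its restaurant share is 2/3; neighborhood B has one restaurant and one bar, so both shares are 1/2.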

In [197]:
venue_onehot1=venue_onehot
venue_onehot1['Restaurantsum'] = venue_onehot[restaurant].sum(axis=1)
venue_onehot1['Barsum'] = venue_onehot[bar].sum(axis=1)
venue_onehot1['Entertainment'] = venue_onehot[entertainment].sum(axis=1)
venue_onehot1['OtherFood'] = venue_onehot[otherfood].sum(axis=1)
venue_onehot1['Exercise'] = venue_onehot[exercise].sum(axis=1)

venue_onehot1=venue_onehot1[['Restaurantsum', 'Barsum', 'Entertainment', 'OtherFood', 'Exercise', 'Neighborhood', 'Art Gallery']]

venue_onehot1.head(15)

Unnamed: 0,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Neighborhood,Art Gallery
0,0,0,0,1,0,Wakefield,0
1,0,0,0,0,0,Wakefield,0
2,1,0,0,0,0,Wakefield,0
3,0,0,0,1,0,Wakefield,0
4,0,0,0,1,0,Wakefield,0
5,0,0,0,0,0,Wakefield,0
6,0,0,0,1,0,Wakefield,0
7,0,0,0,1,0,Wakefield,0
8,0,0,0,0,0,Wakefield,0
9,0,0,0,1,0,Co-op City,0


In [198]:
venue_grouped = venue_onehot1.groupby('Neighborhood').mean().reset_index()
venue_grouped.head()

Unnamed: 0,Neighborhood,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Art Gallery
0,Allerton,0.125,0.0,0.0,0.4375,0.0,0.0
1,Annadale,0.222222,0.111111,0.0,0.444444,0.111111,0.0
2,Arden Heights,0.0,0.0,0.0,0.6,0.0,0.0
3,Arlington,0.5,0.0,0.0,0.166667,0.0,0.0
4,Arrochar,0.263158,0.0,0.052632,0.421053,0.052632,0.0


## Actually Clustering

In [207]:
# set number of clusters
kclusters = 5

# cluster on the venue-group shares only; the positional axis argument to drop() was removed in pandas 2.0
venue_grouped_clustering = venue_grouped.drop(columns=['Neighborhood', 'Art Gallery'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(venue_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 2, 3, 3, 1, 3, 1, 3, 3], dtype=int32)
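
The number of clusters is a judgment call. One common sanity check, sketched here under the assumption that the within-cluster sum of squares (`inertia_`) is a reasonable yardstick, is to fit K-means for a range of K and look for the point where the improvement flattens out. The data below are synthetic stand-ins, not the neighborhood profiles, and `n_init=10` is set explicitly only to keep the behavior the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# synthetic stand-in for the neighborhood profiles: three loose blobs in 5 dimensions
centers = np.array([[0.1, 0.0, 0.0, 0.5, 0.0],
                    [0.4, 0.1, 0.0, 0.3, 0.0],
                    [0.0, 0.0, 0.2, 0.1, 0.3]])
X = np.vstack([c + rng.normal(scale=0.03, size=(40, 5)) for c in centers])

inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 4))
```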

In [208]:
#put cluster labels and neighborhood back in
venue_grouped_clustering.insert(0, 'Cluster Labels', kmeans.labels_)
venue_grouped_clustering=venue_grouped_clustering.join(venue_grouped['Neighborhood'])
venue_grouped_clustering=venue_grouped_clustering.join(venue_grouped['Art Gallery'])
venue_grouped_clustering.head(15)

Unnamed: 0,Cluster Labels,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Neighborhood,Art Gallery
0,2,0.125,0.0,0.0,0.4375,0.0,Allerton,0.0
1,1,0.222222,0.111111,0.0,0.444444,0.111111,Annadale,0.0
2,2,0.0,0.0,0.0,0.6,0.0,Arden Heights,0.0
3,3,0.5,0.0,0.0,0.166667,0.0,Arlington,0.0
4,3,0.263158,0.0,0.052632,0.421053,0.052632,Arrochar,0.0
5,1,0.058824,0.0,0.0,0.235294,0.0,Arverne,0.0
6,3,0.34,0.17,0.0,0.4,0.05,Astoria,0.0
7,1,0.1,0.0,0.1,0.4,0.0,Astoria Heights,0.0
8,3,0.263158,0.105263,0.0,0.157895,0.052632,Auburndale,0.0
9,3,0.413043,0.021739,0.0,0.26087,0.021739,Bath Beach,0.0


## Examine the Clusters to see common values among them

In [209]:
venue_grouped_clustering.loc[venue_grouped_clustering['Cluster Labels'] == 0] 

Unnamed: 0,Cluster Labels,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Neighborhood,Art Gallery
15,0,0.0,0.0,0.0,0.0,0.333333,Bayswater,0.0
24,0,0.0,0.0,0.0,0.0,0.333333,Bergen Beach,0.0
26,0,0.0,0.0,0.4,0.0,0.0,Bloomfield,0.0
29,0,0.0,0.0,0.25,0.0,0.0,Breezy Point,0.0
40,0,0.0,0.0,0.333333,0.166667,0.666667,Butler Manor,0.0
55,0,0.1,0.0,0.1,0.1,0.1,Clason Point,0.0
91,0,0.0,0.0,0.0,0.25,0.25,Emerson Hill,0.0
94,0,0.0,0.0,0.0,0.0,0.0,Fieldston,0.0
123,0,0.0,0.0,0.0,0.0,0.333333,Grymes Hill,0.0
141,0,0.0,0.0,0.0,0.0,0.0,Jamaica Estates,0.0


In [210]:
venue_grouped_clustering.loc[venue_grouped_clustering['Cluster Labels'] == 1] 

Unnamed: 0,Cluster Labels,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Neighborhood,Art Gallery
1,1,0.222222,0.111111,0.0,0.444444,0.111111,Annadale,0.0
5,1,0.058824,0.0,0.0,0.235294,0.0,Arverne,0.0
7,1,0.1,0.0,0.1,0.4,0.0,Astoria Heights,0.0
10,1,0.09,0.02,0.03,0.37,0.06,Battery Park City,0.0
12,1,0.14,0.02,0.02,0.16,0.08,Bay Terrace,0.0
13,1,0.2,0.0,0.05,0.3,0.05,Baychester,0.0
18,1,0.117647,0.0,0.0,0.294118,0.176471,Beechhurst,0.0
20,1,0.166667,0.0,0.0,0.333333,0.0,Belle Harbor,0.0
21,1,0.25,0.0,0.0,0.4,0.0,Bellerose,0.0
25,1,0.047619,0.047619,0.047619,0.190476,0.0,Blissville,0.047619


In [211]:
venue_grouped_clustering.loc[venue_grouped_clustering['Cluster Labels'] == 2] 

Unnamed: 0,Cluster Labels,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Neighborhood,Art Gallery
0,2,0.125,0.0,0.0,0.4375,0.0,Allerton,0.0
2,2,0.0,0.0,0.0,0.6,0.0,Arden Heights,0.0
16,2,0.15,0.025,0.0,0.5,0.025,Bedford Park,0.0
17,2,0.107143,0.178571,0.0,0.535714,0.0,Bedford Stuyvesant,0.0
32,2,0.0,0.25,0.25,0.5,0.0,Broad Channel,0.0
33,2,0.125,0.0,0.0,0.4375,0.0,Broadway Junction,0.0
36,2,0.0,0.0,0.0,0.5,0.0,Brookville,0.0
45,2,0.142857,0.0,0.0,0.428571,0.0,Castle Hill,0.0
46,2,0.125,0.0,0.0625,0.625,0.0,Castleton Corners,0.0
54,2,0.133333,0.0,0.0,0.533333,0.066667,Claremont Village,0.0


In [212]:
venue_grouped_clustering.loc[venue_grouped_clustering['Cluster Labels'] == 3] 

Unnamed: 0,Cluster Labels,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Neighborhood,Art Gallery
3,3,0.5,0.0,0.0,0.166667,0.0,Arlington,0.0
4,3,0.263158,0.0,0.052632,0.421053,0.052632,Arrochar,0.0
6,3,0.34,0.17,0.0,0.4,0.05,Astoria,0.0
8,3,0.263158,0.105263,0.0,0.157895,0.052632,Auburndale,0.0
9,3,0.413043,0.021739,0.0,0.26087,0.021739,Bath Beach,0.0
11,3,0.337079,0.089888,0.0,0.224719,0.011236,Bay Ridge,0.011236
14,3,0.367647,0.132353,0.0,0.264706,0.029412,Bayside,0.0
19,3,0.333333,0.0,0.0,0.25,0.0,Bellaire,0.0
22,3,0.316327,0.030612,0.0,0.469388,0.0,Belmont,0.0
23,3,0.4,0.0,0.0,0.266667,0.0,Bensonhurst,0.0


In [213]:
venue_grouped_clustering.loc[venue_grouped_clustering['Cluster Labels'] == 4]

Unnamed: 0,Cluster Labels,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Neighborhood,Art Gallery
198,4,0.0,1.0,0.0,0.0,0.0,Oakwood,0.0


In [214]:
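# summarize the numeric columns only ('Neighborhood' is text and cannot be averaged)
cluster_groups = venue_grouped_clustering.drop(columns='Neighborhood').groupby('Cluster Labels')
means = cluster_groups.mean()
std = cluster_groups.std()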
print(means)
print(std)

                Restaurantsum    Barsum  Entertainment  OtherFood  Exercise  \
Cluster Labels                                                                
0                    0.019845  0.015664       0.079653   0.087242  0.178781   
1                    0.164224  0.033066       0.018124   0.307518  0.042844   
2                    0.119928  0.020606       0.016267   0.549634  0.012929   
3                    0.348261  0.049218       0.013442   0.307059  0.032913   
4                    0.000000  1.000000       0.000000   0.000000  0.000000   

                Art Gallery  
Cluster Labels               
0                  0.000000  
1                  0.006797  
2                  0.000000  
3                  0.002671  
4                  0.000000  
                Restaurantsum    Barsum  Entertainment  OtherFood  Exercise  \
Cluster Labels                                                                
0                    0.046353  0.058707       0.130942   0.113111  0.191997   
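
Cluster means are easier to interpret next to cluster sizes, since cluster 4 above consists of a single neighborhood (Oakwood). A minimal sketch on a made-up table; the `tiny` flag and its threshold are arbitrary illustrations.

```python
import pandas as pd

# toy stand-in for venue_grouped_clustering
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 1, 2],
    "Barsum": [0.0, 0.1, 0.2, 0.3, 0.1, 1.0],
    "Art Gallery": [0.0, 0.0, 0.01, 0.0, 0.02, 0.0],
})

summary = toy.groupby("Cluster Labels").agg(
    n_neighborhoods=("Barsum", "size"),
    mean_bar=("Barsum", "mean"),
    mean_gallery=("Art Gallery", "mean"),
)

# flag clusters too small for their mean to say much (threshold is arbitrary)
summary["tiny"] = summary["n_neighborhoods"] < 2
print(summary)
```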

In [222]:
neighborhood1.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,5,Wakefield,40.894705,-73.847201
1,5,Co-op City,40.874294,-73.829939
2,5,Eastchester,40.887556,-73.827806
3,5,Fieldston,40.895437,-73.905643
4,5,Riverdale,40.890834,-73.912585


In [223]:
venue_groups=pd.merge(venue_grouped_clustering, neighborhood1, on= "Neighborhood")
venue_groups.head()

Unnamed: 0,Cluster Labels,Restaurantsum,Barsum,Entertainment,OtherFood,Exercise,Neighborhood,Art Gallery,Borough,Latitude,Longitude
0,2,0.125,0.0,0.0,0.4375,0.0,Allerton,0.0,5,40.865788,-73.859319
1,1,0.222222,0.111111,0.0,0.444444,0.111111,Annadale,0.0,1,40.538114,-74.178549
2,2,0.0,0.0,0.0,0.6,0.0,Arden Heights,0.0,1,40.549286,-74.185887
3,3,0.5,0.0,0.0,0.166667,0.0,Arlington,0.0,1,40.635325,-74.165104
4,3,0.263158,0.0,0.052632,0.421053,0.052632,Arrochar,0.0,1,40.596313,-74.067124


In [224]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(venue_groups['Latitude'], venue_groups['Longitude'], venue_groups['Neighborhood'], venue_groups['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters