## Introduction

I am currently a senior undergraduate applying to graduate schools. Two of the programs most accessible to me are at schools in Durham and Ithaca. Since I may live in one of these two places for several years, I would like to evaluate what life would be like there in terms of cost of living, transportation, crime, and entertainment. A similar approach could be used to compare two or more places for other purposes.

## Data Description

Neighborhood segmentation data will be collected for both cities, then cleaned and regrouped into a common structure. The data comes from the Zillow - US Neighborhoods dataset, accessed through the opendatasoft webpage. As in previous assignments, I will pull venue data for the neighborhoods in these two places using Foursquare. After further analysis, such as k-means clustering of regions on the metrics of interest, I can choose the regions in each place that best match my preferences.

## Methodology
See the code and comments below.

In [2]:
import requests # library to handle requests
import lxml.html as lh # library to parse the relevant fields
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import json

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# function for transforming a json file into a pandas dataframe
from pandas import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Libraries imported.')

Libraries imported.


In [3]:
# import neighborhoods information in Durham and Ithaca from json file
with open(r'Durham/Durham-neighborhoods.json') as json_data:
    durham_data = json.load(json_data)
    
with open(r'Ithaca/Ithaca-neighborhoods.json') as json_data:
    ithaca_data = json.load(json_data)

In [4]:
# extract neighborhood name, latitude and longitude and collect them into pandas data frames
column_names = ['Neighborhood', 'Latitude', 'Longitude']

def extract_neighborhoods(records):
    # build all rows first; DataFrame.append was removed in pandas 2.0
    rows = []
    for data in records:
        neighborhood_name = data['fields']['name']
        try:
            neighborhood_latlon = data['fields']['geo_point_2d']
        except KeyError:
            # skip records that carry no centroid
            continue
        rows.append({'Neighborhood': neighborhood_name,
                     'Latitude': neighborhood_latlon[0],
                     'Longitude': neighborhood_latlon[1]})
    return pd.DataFrame(rows, columns=column_names)

durham_neighborhoods = extract_neighborhoods(durham_data)
ithaca_neighborhoods = extract_neighborhoods(ithaca_data)
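
The same extraction can also be sketched with `pd.json_normalize`. The records below are toy data that only mimic the structure the loop relies on (a `fields` dict with `name` and, when present, `geo_point_2d` as `[lat, lon]`); the "No Centroid" entry is invented to show how a record without coordinates is dropped.

```python
import pandas as pd

# toy records shaped like the neighborhood JSON used above (assumed structure)
records = [
    {"fields": {"name": "Duke East Campus", "geo_point_2d": [36.005433, -78.915657]}},
    {"fields": {"name": "No Centroid"}},  # invented record without coordinates
    {"fields": {"name": "Knollwood", "geo_point_2d": [35.954212, -78.960946]}},
]

# flatten nested keys into columns such as 'fields.name'
flat = pd.json_normalize(records)

# drop records that have no centroid
flat = flat.dropna(subset=["fields.geo_point_2d"])

# split the [lat, lon] pairs into two numeric columns
coords = pd.DataFrame(
    flat["fields.geo_point_2d"].tolist(),
    columns=["Latitude", "Longitude"],
    index=flat.index,
)
neighborhoods = pd.concat(
    [flat["fields.name"].rename("Neighborhood"), coords], axis=1
).reset_index(drop=True)

print(neighborhoods)
```

This avoids growing a data frame row by row, at the cost of depending on the flattened column names that `json_normalize` generates.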

Let's take a look at the data frames for Durham and Ithaca.

In [5]:
durham_neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Duke East Campus,36.005433,-78.915657
1,Knollwood,35.954212,-78.960946
2,Milan Woods,36.009101,-78.854065
3,Valley Run,35.961186,-78.950374
4,Old Five Points,36.002196,-78.894549
...,...,...,...
113,Merrick Moore,36.004874,-78.854176
114,West End,35.994598,-78.923427
115,Burch Avenue,35.998685,-78.918051
116,Stephen's Woods,36.086406,-78.898318


In [6]:
ithaca_neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Cayuga Heights,42.467976,-76.487485
1,East Ithaca,42.426217,-76.462672
2,Forest Home,42.452821,-76.470074
3,South Hill,42.411264,-76.488267
4,Northwest Ithaca,42.470589,-76.541453
5,Northeast Ithaca,42.47032,-76.462285


From the data, we can see that Durham covers a much larger area than Ithaca and has far more neighborhoods (117 rows versus 6). The difference is also obvious from their distributions on the map.

In [7]:
address = 'Durham, NC'

geolocator = Nominatim(user_agent="ny_explorer")
#location = geolocator.geocode(address)
#durham_latitude = location.latitude
#durham_longitude = location.longitude
durham_latitude = 35.994034
durham_longitude = -78.898621
print('The geographical coordinates of Durham are {}, {}.'.format(durham_latitude, durham_longitude))

The geographical coordinates of Durham are 35.994034, -78.898621.


In [8]:
# create map of Durham using latitude and longitude values
map_durham = folium.Map(location=[durham_latitude, durham_longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(durham_neighborhoods['Latitude'], durham_neighborhoods['Longitude'],durham_neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_durham)  
    
map_durham

In [9]:
address = 'Ithaca, NY'

geolocator = Nominatim(user_agent="ny_explorer")
#location = geolocator.geocode(address)
#ithaca_latitude = location.latitude
#ithaca_longitude = location.longitude
ithaca_latitude = 42.443962
ithaca_longitude = -76.501884
print('The geographical coordinates of Ithaca are {}, {}.'.format(ithaca_latitude, ithaca_longitude))

The geographical coordinates of Ithaca are 42.443962, -76.501884.


In [10]:
# create map of Ithaca using latitude and longitude values
map_ithaca = folium.Map(location=[ithaca_latitude, ithaca_longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(ithaca_neighborhoods['Latitude'], ithaca_neighborhoods['Longitude'],ithaca_neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_ithaca)  
    
map_ithaca

To scale down, we find the largest distance between a neighborhood and the work location in Ithaca, then keep only the Durham neighborhoods that lie within this distance of the Durham work location. Let us first define the work locations in Durham and Ithaca and a function to calculate the distance between two points given their coordinates.

In [11]:
# define Foursquare Credentials and version
CLIENT_ID = 'DC1UAGXZVU1OQZKHUOKJFCYCV4ULXED2KGBMFFR1OSORPWUW' # your Foursquare ID
CLIENT_SECRET = 'L5UEEWX3O5U1LVYFQGOM5MJ3SNX40WSNRBHGZ5UBU4GBPUEB' # your Foursquare Secret
VERSION = '20200127'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DC1UAGXZVU1OQZKHUOKJFCYCV4ULXED2KGBMFFR1OSORPWUW
CLIENT_SECRET:L5UEEWX3O5U1LVYFQGOM5MJ3SNX40WSNRBHGZ5UBU4GBPUEB


In [12]:
# work locations in Durham and Ithaca
durham_work_coordinates = [36.003625, -78.939653]
ithaca_work_coordinates = [42.449951, -76.481211]

In [13]:
# define function to calculate distance
def haversine(lat1, lon1, lat2, lon2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
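
As a quick sanity check of the formula, the sketch below restates the same haversine computation in self-contained form and evaluates it on cases with known answers: identical points, and one degree of latitude along a meridian, which should equal 6371 * pi / 180 km for the radius used here. The array example reuses two Ithaca centroids and the Ithaca work location from this notebook.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a)) * 6371

# identical points are zero kilometers apart
same_point = haversine(36.0, -78.9, 36.0, -78.9)

# one degree of latitude along a meridian equals the radius times one degree in radians
one_degree = haversine(0.0, 0.0, 1.0, 0.0)

# array inputs broadcast against a scalar reference point
lats = np.array([42.467976, 42.426217])
lons = np.array([-76.487485, -76.462672])
distances = haversine(lats, lons, 42.449951, -76.481211)

print(same_point, one_degree, distances)
```

Because the function is built from NumPy ufuncs, the same call works for scalars and for whole columns of coordinates, which is how it is used in the cells that follow.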

In [14]:
largest_distance = np.max(haversine(ithaca_neighborhoods['Latitude'].values, ithaca_neighborhoods['Longitude'].values, ithaca_work_coordinates[0], ithaca_work_coordinates[1]))
print('The largest distance is: ' + str(largest_distance) + ' km')

The largest distance is: 5.448715526749547 km


In [15]:
selected = haversine(durham_neighborhoods['Latitude'].values, durham_neighborhoods['Longitude'].values, durham_work_coordinates[0], durham_work_coordinates[1]) < largest_distance
durham_neighborhoods = durham_neighborhoods[selected].reset_index(drop=True)
durham_neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Duke East Campus,36.005433,-78.915657
1,Valley Run,35.961186,-78.950374
2,Old Five Points,36.002196,-78.894549
3,Omah Street,36.042563,-78.932375
4,Southside / St. Teresa,35.984349,-78.905823
5,Duke Park,36.011588,-78.890147
6,Duke Homestead,36.034517,-78.920988
7,Tuscaloosa-Lakewood,35.978813,-78.93041
8,Warehouse District,36.001422,-78.905249
9,Downtown,35.993745,-78.903139


In [16]:
# create map of Durham centered on the work location
map_durham = folium.Map(location=[durham_work_coordinates[0], durham_work_coordinates[1]], zoom_start=12)

# add markers to map
for lat, lng, label in zip(durham_neighborhoods['Latitude'], durham_neighborhoods['Longitude'],durham_neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_durham) 
    
label = folium.Popup('work location', parse_html=True)
folium.CircleMarker(
    [durham_work_coordinates[0], durham_work_coordinates[1]],
    radius=5,
    popup=label,
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.7,
    parse_html=False).add_to(map_durham)  
    
map_durham

In [17]:
# create map of Ithaca centered on the work location
map_ithaca = folium.Map(location=[ithaca_work_coordinates[0], ithaca_work_coordinates[1]], zoom_start=12)

# add markers to map
for lat, lng, label in zip(ithaca_neighborhoods['Latitude'], ithaca_neighborhoods['Longitude'], ithaca_neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_ithaca) 
    
label = folium.Popup('work location', parse_html=True)
folium.CircleMarker(
    [ithaca_work_coordinates[0], ithaca_work_coordinates[1]],
    radius=5,
    popup=label,
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.7,
    parse_html=False).add_to(map_ithaca)  
    
map_ithaca

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=800, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
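
The function above indexes straight into the response, so an error payload or a venue without categories would raise an exception. A minimal sketch of a more defensive parsing step follows. `parse_explore_payload` is a hypothetical helper, not part of the notebook or of any Foursquare library, and the payloads are toy data shaped like the fields the function above reads.

```python
def parse_explore_payload(payload, name, lat, lng):
    """Turn one explore-style payload into rows; return [] when nothing usable is present."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # a venue may come back without any category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# toy payload: one complete venue and one invented venue without categories
good_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Gimme! Coffee",
               "location": {"lat": 42.46861, "lng": -76.479428},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 42.47, "lng": -76.48},
               "categories": []}},
]}]}}
# toy payload imitating a failed request with no 'response' body
error_payload = {"meta": {"code": 400}}

rows = parse_explore_payload(good_payload, "Cayuga Heights", 42.467976, -76.487485)
no_rows = parse_explore_payload(error_payload, "South Hill", 42.411264, -76.488267)
print(rows)
print(no_rows)
```

Keeping the parsing separate from the HTTP call also makes it possible to check the logic without spending API quota.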

In [19]:
durham_venues = getNearbyVenues(names=durham_neighborhoods['Neighborhood'],
                                   latitudes=durham_neighborhoods['Latitude'],
                                   longitudes=durham_neighborhoods['Longitude']
                                  )

ithaca_venues = getNearbyVenues(names=ithaca_neighborhoods['Neighborhood'],
                                   latitudes=ithaca_neighborhoods['Latitude'],
                                   longitudes=ithaca_neighborhoods['Longitude']
                                  )

Duke East Campus
Valley Run
Old Five Points
Omah Street
Southside / St. Teresa
Duke Park
Duke Homestead
Tuscaloosa-Lakewood
Warehouse District
Downtown
Old North Durham
Cleveland-Holloway
Morehead Hill
Trinity Heights
American Village
Stadium Heights
Old West Durham
Westwood Estates
Lyon Park
Trinity Commons
Preston Woods
Northgate Park
Lakewood Park
Central Park
Sheridan Drive
Edgemont
Duke Forest
Bennet Place
Scarsdale Village
Watts Hospital-Hillandale
Long Meadow
Walltown
Crest Street
Croasdaile
West Hills
Rockwood
Golden Belt
North Carolina Central University
Dixon Road Area
Franklin Village
Forest Hills
Colony Park
Duke West Campus
Lochn'ora
Carillon Forest
Trinity Park
Duke Manor Apartments
Waterford
West End
Burch Avenue
Albright
Cayuga Heights
East Ithaca
Forest Home
South Hill
Northwest Ithaca
Northeast Ithaca


In [20]:
# check the size of the resulting dataframe
print(durham_venues.shape)
durham_venues.head()

(1218, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Duke East Campus,36.005433,-78.915657,Brodie Gym,36.007296,-78.917006,College Gym
1,Duke East Campus,36.005433,-78.915657,Whole Foods Market,36.0071,-78.920572,Grocery Store
2,Duke East Campus,36.005433,-78.915657,Baldwin Auditorium,36.009001,-78.914656,College Theater
3,Duke East Campus,36.005433,-78.915657,Duke Wall,36.007274,-78.919429,Track
4,Duke East Campus,36.005433,-78.915657,Mad Hatter Bakeshop & Café,36.006341,-78.920099,Bakery


In [21]:
print(ithaca_venues.shape)
ithaca_venues.head()

(42, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Cayuga Heights,42.467976,-76.487485,Gimme! Coffee,42.46861,-76.479428,Café
1,Cayuga Heights,42.467976,-76.487485,Ned's Pizza,42.469587,-76.478573,Pizza Place
2,Cayuga Heights,42.467976,-76.487485,The Heights,42.469073,-76.479728,American Restaurant
3,Cayuga Heights,42.467976,-76.487485,Burton S. Markowitz,42.468526,-76.481237,Optical Shop
4,Cayuga Heights,42.467976,-76.487485,Dr Markowitz Optometrist,42.468391,-76.481208,Optical Shop


In [22]:
durham_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Albright,9,9,9,9,9,9
American Village,4,4,4,4,4,4
Bennet Place,10,10,10,10,10,10
Burch Avenue,29,29,29,29,29,29
Carillon Forest,2,2,2,2,2,2
Central Park,100,100,100,100,100,100
Cleveland-Holloway,46,46,46,46,46,46
Colony Park,2,2,2,2,2,2
Crest Street,30,30,30,30,30,30
Croasdaile,1,1,1,1,1,1


In [23]:
ithaca_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Cayuga Heights,12,12,12,12,12,12
East Ithaca,7,7,7,7,7,7
Forest Home,13,13,13,13,13,13
Northeast Ithaca,4,4,4,4,4,4
Northwest Ithaca,6,6,6,6,6,6
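
South Hill is one of the six Ithaca neighborhoods but does not appear in the table above, presumably because no venues were returned for it. Neighborhoods without venues vanish from a `groupby`, so a sketch like the following can make the zero counts explicit. The venue rows are toy data (the park name is invented); only the neighborhood names come from the notebook.

```python
import pandas as pd

# all six Ithaca neighborhoods listed earlier in the notebook
all_names = ["Cayuga Heights", "East Ithaca", "Forest Home",
             "South Hill", "Northwest Ithaca", "Northeast Ithaca"]

# toy venue table; real counts would come from ithaca_venues
toy_venues = pd.DataFrame({
    "Neighborhood": ["Cayuga Heights", "Cayuga Heights", "East Ithaca"],
    "Venue": ["Gimme! Coffee", "Ned's Pizza", "Some Park"],
})

# count venues per neighborhood, then add the neighborhoods that returned none
counts = (
    toy_venues.groupby("Neighborhood")
    .size()
    .reindex(all_names, fill_value=0)
)
print(counts)
```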


In [24]:
# one hot encoding
durham_onehot = pd.get_dummies(durham_venues[['Venue Category']], prefix="", prefix_sep="")

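# add neighborhood column back to dataframe
# caution: if a venue category is itself named 'Neighborhood', this assignment overwrites
# that one-hot column instead of appending a new one, and the reordering below then
# moves a different column to the front
durham_onehot['Neighborhood'] = durham_venues['Neighborhood'] 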

# move neighborhood column to the first column
fixed_columns = [durham_onehot.columns[-1]] + list(durham_onehot.columns[:-1])
durham_onehot = durham_onehot[fixed_columns]

durham_onehot.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Train Station,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
# one hot encoding
ithaca_onehot = pd.get_dummies(ithaca_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ithaca_onehot['Neighborhood'] = ithaca_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ithaca_onehot.columns[-1]] + list(ithaca_onehot.columns[:-1])
ithaca_onehot = ithaca_onehot[fixed_columns]

ithaca_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Bakery,Botanical Garden,Bowling Alley,Café,College Gym,College Lab,Convenience Store,Cosmetics Shop,...,Nightlife Spot,Optical Shop,Park,Pizza Place,Playground,Scenic Lookout,Science Museum,Shopping Mall,Stadium,Tourist Information Center
0,Cayuga Heights,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Cayuga Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2,Cayuga Heights,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Cayuga Heights,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
4,Cayuga Heights,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


In [33]:
durham_grouped = durham_onehot.groupby('Neighborhood').mean().reset_index()
durham_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Train Station,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Albright,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,American Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bennet Place,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Burch Avenue,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,...,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Carillon Forest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Central Park,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
6,Cleveland-Holloway,0.0,0.0,0.0,0.021739,0.0,0.0,0.065217,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739
7,Colony Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Crest Street,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Croasdaile,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [34]:
ithaca_grouped = ithaca_onehot.groupby('Neighborhood').mean().reset_index()
ithaca_grouped

Unnamed: 0,Neighborhood,American Restaurant,Bakery,Botanical Garden,Bowling Alley,Café,College Gym,College Lab,Convenience Store,Cosmetics Shop,...,Nightlife Spot,Optical Shop,Park,Pizza Place,Playground,Scenic Lookout,Science Museum,Shopping Mall,Stadium,Tourist Information Center
0,Cayuga Heights,0.083333,0.083333,0.0,0.0,0.166667,0.0,0.0,0.0,0.083333,...,0.0,0.166667,0.0,0.083333,0.0,0.0,0.0,0.166667,0.0,0.0
1,East Ithaca,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.285714,0.0,0.285714,0.0,0.0,0.0,0.0,0.0
2,Forest Home,0.0,0.0,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923
3,Northeast Ithaca,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.0,0.25,0.0,0.25,0.0,0.0,0.25,0.0,0.0
4,Northwest Ithaca,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0
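
The per-neighborhood frequencies above can also be obtained in one step with `pd.crosstab`, which is equivalent to one-hot encoding followed by a grouped mean. The sketch below uses toy data; the category literally named "Neighborhood" is included on purpose to show that this route keeps such a category as its own column, because the neighborhood labels live in the index rather than in a column.

```python
import pandas as pd

# toy venue table; one category is literally called "Neighborhood"
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Park", "Neighborhood"],
})

# share of each category within each neighborhood (every row sums to 1)
freq = pd.crosstab(
    toy_venues["Neighborhood"],
    toy_venues["Venue Category"],
    normalize="index",
)
print(freq)
```

Note that calling `reset_index()` on this result would fail when a category shares the index name, so the sketch keeps the neighborhoods as the index.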


In [35]:
num_top_venues = 5

for hood in durham_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = durham_grouped[durham_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albright----
            venue  freq
0  Coffee Roaster  0.11
1            Park  0.11
2          Garden  0.11
3     Gas Station  0.11
4      Food Truck  0.11


----American Village----
          venue  freq
0  Concert Hall  0.25
1       Dog Run  0.25
2           Bar  0.25
3          Park  0.25
4   Zoo Exhibit  0.00


----Bennet Place----
                  venue  freq
0        Gymnastics Gym   0.1
1           Supermarket   0.1
2               Theater   0.1
3  Gym / Fitness Center   0.1
4    Salon / Barbershop   0.1


----Burch Avenue----
          venue  freq
0           Bar  0.07
1   Coffee Shop  0.07
2   Pizza Place  0.07
3         Hotel  0.07
4  Dessert Shop  0.03


----Carillon Forest----
              venue  freq
0  Recording Studio   0.5
1              Farm   0.5
2       Zoo Exhibit   0.0
3      Optical Shop   0.0
4         Multiplex   0.0


----Central Park----
                venue  freq
0                 Bar  0.07
1               Hotel  0.04
2        Cocktail Bar  0.03
3    

In [36]:
num_top_venues = 5

for hood in ithaca_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = ithaca_grouped[ithaca_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Cayuga Heights----
                 venue  freq
0        Shopping Mall  0.17
1                 Café  0.17
2         Optical Shop  0.17
3  American Restaurant  0.08
4          Pizza Place  0.08


----East Ithaca----
            venue  freq
0      Playground  0.29
1            Park  0.29
2  Cosmetics Shop  0.14
3         Dog Run  0.14
4            Lake  0.14


----Forest Home----
               venue  freq
0        Golf Course  0.08
1  Convenience Store  0.08
2            Stadium  0.08
3               Lake  0.08
4     Ice Cream Shop  0.08


----Northeast Ithaca----
                 venue  freq
0        Shopping Mall  0.25
1           Playground  0.25
2                 Park  0.25
3       Nightlife Spot  0.25
4  American Restaurant  0.00


----Northwest Ithaca----
            venue  freq
0            Café  0.17
1  Science Museum  0.17
2  Scenic Lookout  0.17
3            Farm  0.17
4          Museum  0.17
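
For neighborhoods with few venues, the top-5 lists above are padded with categories whose frequency is 0.00 (for example "Zoo Exhibit" under American Village). A minimal sketch of a variant that reports only categories actually present follows; `top_nonzero_venues` is a hypothetical helper, and the frequencies are copied from the American Village output.

```python
import pandas as pd

def top_nonzero_venues(freq_row, num_top_venues):
    """Return up to num_top_venues category names with frequency above zero, highest first."""
    nonzero = freq_row[freq_row > 0]
    return nonzero.sort_values(ascending=False).index.tolist()[:num_top_venues]

# frequencies as printed for American Village
row = pd.Series({
    "Concert Hall": 0.25,
    "Dog Run": 0.25,
    "Bar": 0.25,
    "Park": 0.25,
    "Zoo Exhibit": 0.0,
})

top = top_nonzero_venues(row, 5)
print(top)
```

The returned list can be shorter than `num_top_venues`, which makes it visible when a ranking rests on only a handful of venues.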




In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = durham_grouped['Neighborhood']

for ind in np.arange(durham_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(durham_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albright,Food Truck,Coffee Roaster,Seafood Restaurant,Garden,Gas Station,Brewery,Grocery Store,Spa,Park,Discount Store
1,American Village,Concert Hall,Park,Dog Run,Bar,Yoga Studio,Fabric Shop,Food Truck,Food Court,Food,Flower Shop
2,Bennet Place,Theater,Chinese Restaurant,Salon / Barbershop,Supermarket,Discount Store,Gym / Fitness Center,Gymnastics Gym,Historic Site,American Restaurant,Restaurant
3,Burch Avenue,Bar,Pizza Place,Hotel,Coffee Shop,Plaza,Café,Brewery,Market,Southern / Soul Food Restaurant,Cocktail Bar
4,Carillon Forest,Recording Studio,Farm,Yoga Studio,Ethiopian Restaurant,Food Court,Food,Flower Shop,Fish Market,Fast Food Restaurant,Farmers Market


In [40]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = ithaca_grouped['Neighborhood']

for ind in np.arange(ithaca_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ithaca_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Cayuga Heights,Optical Shop,Shopping Mall,Café,Pizza Place,Gym / Fitness Center,Cosmetics Shop,American Restaurant,Bakery,Flower Shop,Bowling Alley
1,East Ithaca,Playground,Park,Lake,Cosmetics Shop,Dog Run,Tourist Information Center,Food Court,Bakery,Botanical Garden,Bowling Alley
2,Forest Home,Tourist Information Center,Convenience Store,Garden,Stadium,Golf Course,Ice Cream Shop,Lake,Food Court,College Lab,College Gym
3,Northeast Ithaca,Shopping Mall,Playground,Park,Nightlife Spot,Tourist Information Center,College Lab,Farm,Dog Run,Cosmetics Shop,Convenience Store
4,Northwest Ithaca,Gift Shop,Science Museum,Scenic Lookout,Café,Museum,Farm,Food Court,Bakery,Botanical Garden,Bowling Alley


In [41]:
durham_grouped['distance'] = 

Unnamed: 0,Neighborhood,Zoo Exhibit,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Train Station,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Albright,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,American Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bennet Place,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Burch Avenue,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,...,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Carillon Forest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Central Park,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
6,Cleveland-Holloway,0.0,0.0,0.0,0.021739,0.0,0.0,0.065217,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739
7,Colony Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Crest Street,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Croasdaile,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
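
The cell above stops at an unfinished assignment to `durham_grouped['distance']`, and the notebook does not show what was intended. One possible reading, offered purely as an assumption, is the distance from each neighborhood to the work location. Since the grouped table has no coordinates, the sketch below first merges them back in. The frames are toy data built from values shown earlier (the `Bar` frequencies are invented), and `haversine_km` restates the notebook's formula.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * np.arcsin(np.sqrt(a)) * 6371

# coordinates as listed in the filtered Durham table
toy_neighborhoods = pd.DataFrame({
    "Neighborhood": ["Duke East Campus", "Downtown"],
    "Latitude": [36.005433, 35.993745],
    "Longitude": [-78.915657, -78.903139],
})
# toy stand-in for durham_grouped (frequencies invented)
toy_grouped = pd.DataFrame({
    "Neighborhood": ["Downtown", "Duke East Campus"],
    "Bar": [0.07, 0.0],
})
work_lat, work_lon = 36.003625, -78.939653  # Durham work location from the notebook

# bring the coordinates back, then compute the distance per row
with_coords = toy_grouped.merge(toy_neighborhoods, on="Neighborhood", how="left")
with_coords["distance"] = haversine_km(
    with_coords["Latitude"], with_coords["Longitude"], work_lat, work_lon
)
print(with_coords[["Neighborhood", "distance"]])
```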


## Results

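Durham offers a much larger range of entertainment within the same distance of the work location: the Foursquare query returned 1,218 venues for the selected Durham neighborhoods, compared with 42 for Ithaca. Durham also appears convenient for transportation, although transportation was not analyzed directly in this notebook.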

## Discussion

The dataset for Ithaca is much smaller than the one for Durham, so the comparison should be revisited when a larger dataset is available. In addition, factors such as rental prices, cost of living, and environment are not considered in this notebook.

## Conclusion

Considering only the aspects of life examined for these two places, living in Durham may be the better choice.