# Neighborhood Locations and Nearby Venue Analysis of Nashville, TN  

**James Newsome**

**10/12/2019**

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Data-Collection" data-toc-modified-id="Data-Collection-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Data Collection</a></span></li><li><span><a href="#Data-Processing" data-toc-modified-id="Data-Processing-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Data Processing</a></span><ul class="toc-item"><li><span><a href="#Import-Libraries-and-Initial-Data" data-toc-modified-id="Import-Libraries-and-Initial-Data-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Import Libraries and Initial Data</a></span></li><li><span><a href="#Collect-and-Clean-Location-Data" data-toc-modified-id="Collect-and-Clean-Location-Data-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Collect and Clean Location Data</a></span><ul class="toc-item"><li><span><a href="#Neighborhoods-Data" data-toc-modified-id="Neighborhoods-Data-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Neighborhoods Data</a></span></li><li><span><a href="#Bank-of-America-Data" data-toc-modified-id="Bank-of-America-Data-3.2.2"><span class="toc-item-num">3.2.2&nbsp;&nbsp;</span>Bank of America Data</a></span></li><li><span><a href="#Dog-Park-Data" data-toc-modified-id="Dog-Park-Data-3.2.3"><span class="toc-item-num">3.2.3&nbsp;&nbsp;</span>Dog Park Data</a></span></li><li><span><a href="#Schools-Data" data-toc-modified-id="Schools-Data-3.2.4"><span class="toc-item-num">3.2.4&nbsp;&nbsp;</span>Schools Data</a></span></li><li><span><a href="#Combine-and-Filter-Findings" data-toc-modified-id="Combine-and-Filter-Findings-3.2.5"><span class="toc-item-num">3.2.5&nbsp;&nbsp;</span>Combine and Filter Findings</a></span></li></ul></li><li><span><a href="#Collect-and-Clean-Venue-Data" data-toc-modified-id="Collect-and-Clean-Venue-Data-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Collect and Clean 
Venue Data</a></span></li></ul></li><li><span><a href="#Results" data-toc-modified-id="Results-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Results</a></span></li></ul></div>

## Introduction

This report is centered around a Bank of America executive who needs to move from New York City to Nashville, TN. He currently resides in the Riverdale neighborhood of The Bronx. The neighborhoods of Nashville are analyzed to find those that meet the following four criteria:  
  - It must be within 5 miles of a Bank of America branch, as he can set up his office at any branch.  
  - It must lie within 3 miles of a charter high school, as he prefers charter schools. The short distance will ensure that his two kids can get to and from school even if he can’t get there in time to pick them up.   
  - There must be a dog park within ½ mile so that he can easily take his dog on trips to the park.  
  - The neighborhood venues should be as similar to what is available in Riverdale as possible.


## Data Collection

The locations of Bank of America branches and dog parks, as well as local venue data, were obtained from Foursquare. This data needed very little cleaning, mainly filtering out Bank of America locations that do not contain an office, such as ATM-only locations.

The type and location of schools were obtained from https://data.nashville.gov/Education/Metro-Nashville-Public-School-Locations/4qyp-5xc3 and downloaded as 'NashSchools.csv'. This source was used because Foursquare does not differentiate charter schools from regular public schools. The data needed to be cleaned to eliminate schools that are not charter high schools.

The geographic locations of the neighborhoods in Nashville were obtained from https://data.nashville.gov/Metro-Government/Neighborhood-Association-Boundaries-GIS-/qytv-2cu8 and the downloaded CSV is named 'NashvilleNeighborhoods.csv'. This data does not contain a single coordinate for each neighborhood, but rather multiple points that define its perimeter. These points needed to be cleaned and then averaged to find a single, roughly centered point to use as the location of each neighborhood.

The geographic coordinates of Nashville were obtained from https://www.latlong.net/place/nashville-tn-usa-2899.html.

## Data Processing

### Import Libraries and Initial Data

In [1]:
import pandas as pd
import numpy as np
import json
import requests
from pandas.io.json import json_normalize # transform JSON data into a pandas dataframe
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline 
import folium # map rendering library
import math
from sklearn.cluster import KMeans # for clustering

Foursquare credentials must be entered in the code below in order to complete the data processing.

In [2]:
fsid = 'FOURSQUAREID' # Input Foursquare ID
fssecret = 'FOURSQUARESECRET' # Input Foursquare Secret
version = '20191012' # Foursquare API version
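Hard-coding credentials in a notebook makes it easy to share them by accident. One possible alternative, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_ID` and `FOURSQUARE_SECRET` and the helper `load_credentials` are hypothetical and are not part of the original notebook.

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    FOURSQUARE_ID and FOURSQUARE_SECRET are hypothetical names chosen for this sketch.
    """
    try:
        return env['FOURSQUARE_ID'], env['FOURSQUARE_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None

# Usage (assumes both variables are set in the shell that launched Jupyter):
# fsid, fssecret = load_credentials()
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.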

Geographic coordinates of Nashville  
Obtained from https://www.latlong.net/place/nashville-tn-usa-2899.html.

In [3]:
lat = 36.174465
long = -86.767960

Nashville neighborhoods dataset obtained from:  
https://data.nashville.gov/Metro-Government/Neighborhood-Association-Boundaries-GIS-/qytv-2cu8

In [4]:
nash = pd.read_csv('DataFiles/NashvilleNeighborhoods.csv')
nash.head(3)

Unnamed: 0,the_geom,NAME
0,MULTIPOLYGON (((-86.79511056795417 36.17575964...,Historic Buena Vista
1,MULTIPOLYGON (((-86.87459668651866 36.15757702...,Charlotte Park
2,MULTIPOLYGON (((-86.87613708067906 36.13554098...,Hillwood


Nashville schools dataset obtained from:  
https://data.nashville.gov/Education/Metro-Nashville-Public-School-Locations/4qyp-5xc3

In [5]:
school = pd.read_csv('DataFiles/NashSchools.csv')
school.head(3)

Unnamed: 0,School ID,School Name,Street Address,City,State,ZIP Code,Phone Number,School State ID,School Level,Lowest Grade,Highest Grade,Latitude,Longitude,School Website,Cluster,Principal Name,Mapped Location
0,310,J.E. Moss Elementary,4701 Bowfield Drive,Antioch,TN,37013,(615)333-5200,370,Elementary School,Grade P3,Grade 4,36.067353,-86.669747,http://schools.mnps.org/je-moss-elementary-school,Antioch,Mr. Anthony Febles,"(36.06735338, -86.66974658)"
1,687,Smithson Craighead Academy,730 Neely's Bend Road,Nashville,TN,37115,(615)228-9886,8001,Charter,Grade K,Grade 4,36.251025,-86.69662,http://www.projectreflect.org/smithson-craighe...,,Mr. Ahmed White,"(36.25102507, -86.69661996)"
2,532,McGavock High,3150 McGavock Pike,Nashville,TN,37214,(615)885-8850,470,High School,Grade 9,Grade 12,36.185443,-86.677849,http://schools.mnps.org/mcgavock-high-school,McGavock,Mr. Robbin Wall,"(36.18544319, -86.67784935)"


### Collect and Clean Location Data

#### Neighborhoods Data

Nashville contains 288 neighborhoods. The data for each neighborhood contains a list of latitude and longitude coordinates that mark the outline of the neighborhood. To define a single point, a loop was run to determine the mean latitude and longitude for each neighborhood, and this point was used as the neighborhood's location. This could result in a point that lies outside the actual neighborhood, but because neighborhoods are relatively small, this was not considered a significant concern. Once each neighborhood was located, a map was created with Folium to display each neighborhood.

In [6]:
numberofneighborhoods = nash.shape[0]
print('There are', numberofneighborhoods, 'neighborhoods in Nashville')

There are 288 neighborhoods in Nashville


The geographic coordinates for each neighborhood ('the_geom') are stored as a string that contains some unneeded characters followed by groups of coordinates designating the vertices of the neighborhood polygon. Each coordinate pair is separated from the next by a comma. The first step is to remove the superfluous characters.

In [7]:
for i in range(numberofneighborhoods):
    nash.iat[i, 0] = nash.iloc[i][0].lstrip('MULTIPOLYGON').replace('(',"").replace(')',"").split(', ')

Obtain the mean of the 'the_geom' column by separating each coordinate pair into latitude and longitude. At this point each neighborhood holds a list of 'longitude latitude' strings. This loop may take some time to complete, especially on slower computers.

In [8]:
nash['Longitude'] = ""  # Create an empty Longitude column
nash['Latitude'] = ""  # Create an empty Latitude column

# Loop to convert list of string coordinates into single float type lat and long coordinates
for neighborhood in range(numberofneighborhoods):
    temparray = np.array(nash.iloc[neighborhood][0])
    numrows = temparray.shape[0]  # Finds the number of lat and longs for this neighborhood
    array1 = [[0 for col in range(2)] for row in range(numrows)]
    for i in range(numrows):
        array1[i] = temparray[i].split()
    # Build the dataframe once per neighborhood, after every vertex has been parsed
    tempdf = pd.DataFrame(array1, columns=['Long', 'Lat']).astype('float')
    thislat = tempdf['Lat'].mean()
    thislong = tempdf['Long'].mean()
    nash.at[neighborhood, 'Latitude'] = thislat
    nash.at[neighborhood, 'Longitude'] = thislong
# Drop unneeded column ('the_geom'), and rename column 'NAME'
nash = nash.drop('the_geom', axis=1)
nash.rename(columns={'NAME': 'Neighborhood'}, inplace=True)
nash.head(3)

Unnamed: 0,Neighborhood,Longitude,Latitude
0,Historic Buena Vista,-86.7973,36.1763
1,Charlotte Park,-86.8751,36.1504
2,Hillwood,-86.8762,36.1272
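The averaging step above can also be sketched without an explicit vertex loop. The following is a minimal illustration, not the notebook's method: the helper `mean_vertex` and the regular expression are assumptions, and the sample string is a made-up square rather than a real neighborhood. It assumes, as in the dataset shown earlier, that each vertex is written as longitude followed by latitude.

```python
import re
import numpy as np

# Matches one "longitude latitude" pair, e.g. "-86.7951 36.1757"
PAIR = re.compile(r'(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)')

def mean_vertex(wkt):
    """Return (latitude, longitude) averaged over every vertex in a WKT string."""
    pairs = np.array(PAIR.findall(wkt), dtype=float)  # shape (n_vertices, 2): lon, lat
    lon, lat = pairs.mean(axis=0)
    return lat, lon

# Made-up square, not a real Nashville neighborhood
sample = 'MULTIPOLYGON (((-86.8 36.1, -86.6 36.1, -86.6 36.3, -86.8 36.3)))'
lat_c, lon_c = mean_vertex(sample)
print(lat_c, lon_c)
```

Like the loop above, this is a plain mean of the vertices rather than a true polygon centroid, so densely sampled stretches of a boundary pull the point toward themselves.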


Map the neighborhoods in Nashville

In [9]:
# Map of Nashville Neighborhoods using latitude and longitude of Nashville to center map
nashmap = folium.Map(location=[lat, long], zoom_start=11)

# add markers to map
for nashlat, nashlong, nashneigh in zip(nash['Latitude'], nash['Longitude'], nash['Neighborhood']):
    label = '{}'.format(nashneigh)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([nashlat, nashlong], radius=2, popup=label, color='green', fill=True, fill_color='green',
        fill_opacity=1, parse_html=True).add_to(nashmap)  
    
nashmap

#### Bank of America Data

The Bank of America location data can be obtained from Foursquare. However, locations that are only ATMs cannot be considered, as the executive needs office space. Due to the size of the JSON response, it is not displayed below the code.

In [10]:
# search for Bank of America locations
boaquery = 'Bank of America'
boaradius = 16100  # Approximately 10 mile search radius
limit = 100

# Define the corresponding URL
boaurl = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'\
.format(fsid, fssecret, lat, long, version, boaquery, boaradius, limit)

# Send the GET Request
boaresults = requests.get(boaurl).json()
# boaresults  # uncomment to see the raw results of the request
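The query string above contains spaces ("Bank of America") and a comma-separated coordinate pair. A minimal sketch of one way to make the encoding explicit is to build the query string with the standard library's `urlencode`. The helper `build_search_url` is hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'  # same endpoint as above

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the search URL with every parameter percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials; substitute fsid and fssecret in the notebook
url = build_search_url('ID', 'SECRET', 36.174465, -86.767960, '20191012',
                       'Bank of America', 16100, 100)
print(url)
```

The same helper would serve the dog park search by changing only the `query` and `radius` arguments.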

The Bank of America locations are found under 'response:venues' in the returned JSON.

In [11]:
boainput = boaresults['response']['venues']
boa = json_normalize(boainput)
print('There were', boa.shape[0], 'locations pulled from this search.')
boa.head(3)

There were 20 locations pulled from this search.


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet
0,4b96ad25f964a5204fdd34e3,Bank of America ATM,"[{'id': '52f2ab2ebcbc57f1066b8b56', 'name': 'A...",v-1585162985,False,800 Main St,36.175173,-86.75821,"[{'label': 'display', 'lat': 36.17517268834245...",879,37206.0,US,Nashville,TN,United States,"[800 Main St, Nashville, TN 37206, United States]",
1,4c3f162980bc20a1ceddab58,Bank of America ATM,"[{'id': '52f2ab2ebcbc57f1066b8b56', 'name': 'A...",v-1585162985,False,6602 Charlotte Pike,36.139093,-86.881173,"[{'label': 'display', 'lat': 36.13909322309947...",10910,37209.0,US,Nashville,TN,United States,"[6602 Charlotte Pike (at Hillwood Blvd), Nashv...",at Hillwood Blvd
2,4e73ce7aae60c3285064506e,Bank Of America ATM,"[{'id': '52f2ab2ebcbc57f1066b8b56', 'name': 'A...",v-1585162985,False,2nd Ave N,36.163039,-86.776059,"[{'label': 'display', 'lat': 36.16303947625448...",1465,,US,Nashville,TN,United States,"[2nd Ave N, Nashville, TN, United States]",


Only 3 rows are displayed above, so it is not visible there, but not every location that was pulled is a Bank of America. Clean the dataset further by keeping and renaming only the necessary columns, then display all of the rows.

In [12]:
boa = boa[['location.formattedAddress', 'location.lat', 'location.lng', 'name' ]]
boa.columns = ['Location', 'Latitude', 'Longitude', 'Name']
boa

Unnamed: 0,Location,Latitude,Longitude,Name
0,"[800 Main St, Nashville, TN 37206, United States]",36.175173,-86.75821,Bank of America ATM
1,"[6602 Charlotte Pike (at Hillwood Blvd), Nashv...",36.139093,-86.881173,Bank of America ATM
2,"[2nd Ave N, Nashville, TN, United States]",36.163039,-86.776059,Bank Of America ATM
3,"[222 2nd Ave S, Nashville, TN 37201, United St...",36.159591,-86.773196,Bank of America Private Client Advisor Madelei...
4,"[2720 Lebanon Pike (Donelson), Nashville, TN 3...",36.168959,-86.666487,Bank of America
5,"[645 Thompson Ln, Nashville, TN 37204, United ...",36.110936,-86.755696,Bank of America
6,"[4661 Nolensville Pike, Nashville, TN 37211, U...",36.06822,-86.719783,Bank of America
7,"[111 N 1st St (exit 48,I-24 west), Nashville, ...",36.171148,-86.771773,TravelCenters of America
8,"[1033 Demonbreun St, Nashville, TN 37203, Unit...",36.163542,-86.779131,The Bank of Nashville
9,"[150 4th Ave N, Nashville, TN 37219, United St...",36.162863,-86.77794,Smile Direct Club


Remove all ATM locations and entries that are not Bank of America locations.

In [13]:
boa = boa[(boa.Name == 'Bank Of America') | (boa.Name == 'Bank of America') | (boa.Name == 'Bank of America Building')].reset_index(drop=True)
print('There are', boa.shape[0], 'Bank of America locations within 10 miles of the  center of Nashville.')

There are 3 Bank of America locations within 10 miles of the  center of Nashville.
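The filter above lists each spelling of the name explicitly. As a hedged alternative, the names could be normalized before comparison so that differences in capitalization do not matter. The helper `keep_branches` and the sample rows are made up for illustration; the accepted names mirror the ones used in the cell above.

```python
import pandas as pd

def keep_branches(df, accepted=('bank of america', 'bank of america building')):
    """Keep rows whose Name, ignoring case and outer whitespace, is an accepted branch name."""
    normalized = df['Name'].str.strip().str.lower()
    return df[normalized.isin(accepted)].reset_index(drop=True)

# Made-up rows that mimic the kinds of names returned by the search
sample = pd.DataFrame({'Name': ['Bank of America ATM', 'Bank Of America',
                                'The Bank of Nashville', 'Bank of America']})
branches = keep_branches(sample)
print(branches)
```

Exact matching after normalization still excludes ATM entries and similarly named businesses, which a substring match on "Bank of America" would not.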


Display a map of the Bank of America locations, each surrounded by a circle with a 5 mile radius. A qualifying neighborhood must lie inside at least one of these circles.

In [14]:
# Map of Bank of America locations with 5 mile radius circles
boamap = folium.Map(location=[lat, long], zoom_start=11)

# add markers to map
for boalat, boalong, boaloc in zip(boa['Latitude'], boa['Longitude'], boa['Location']):
    label = '{}'.format(boaloc)
    label = folium.Popup(label, parse_html=True)
    # Circle used (not CircleMarker) in order to use a radius of meters not pixels
    folium.Circle([boalat, boalong],radius=8046, popup=label, color='blue', fill=True, fill_color='blue',
        fill_opacity=.1, parse_html=True).add_to(boamap)  
    
boamap
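The `radius=8046` passed to `folium.Circle` above is 5 miles expressed in meters. A small conversion helper, sketched below, keeps that relationship visible; `miles_to_meters` is a hypothetical name, and the three distances are the ones listed in the Introduction.

```python
METERS_PER_MILE = 1609.344  # exact, by the international definition of the mile

def miles_to_meters(miles):
    """Convert statute miles to meters, e.g. for use as a folium.Circle radius."""
    return miles * METERS_PER_MILE

# The three distance criteria from the Introduction
for miles in (0.5, 3, 5):
    print(miles, 'mile(s) =', round(miles_to_meters(miles), 1), 'm')
```

The hard-coded radii in the map cells are whole-meter approximations of these values.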

#### Dog Park Data

The dog park location data can be obtained from Foursquare. It is a requirement that the neighborhood lie within 1/2 mile of a dog park. Due to the size of the JSON response, it is not displayed below the code.

In [15]:
# search for dog parks
dogparkquery = 'Dog Park'
dogparkradius = 26100  # Search radius in meters (roughly 16 miles)

# Define the corresponding URL
dogparkurl = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'\
.format(fsid, fssecret, lat, long, version, dogparkquery, dogparkradius, limit)

# Send the GET Request and examine the results
dogparkresults = requests.get(dogparkurl).json()
# dogparkresults

As with the Bank of America dataset, the information under 'response:venues' contains the dog park locations.

In [16]:
dogparkinput = dogparkresults['response']['venues']
dogpark = json_normalize(dogparkinput)
dogpark.head(3)

Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.city,location.state,location.country,location.formattedAddress,location.address,location.crossStreet,location.postalCode
0,570171ef498e711c9192ac12,Riverfront Dog Park,"[{'id': '4bf58dd8d48988d1e5941735', 'name': 'D...",v-1585162985,False,36.16118,-86.772976,"[{'label': 'display', 'lat': 36.16118, 'lng': ...",1546,US,Nashville,TN,United States,"[Nashville, TN, United States]",,,
1,4b116d32f964a520597c23e3,Centennial Dog Park,"[{'id': '4bf58dd8d48988d1e5941735', 'name': 'D...",v-1585162985,False,36.148532,-86.81756,"[{'label': 'display', 'lat': 36.14853199422004...",5310,US,Nashville,TN,United States,"[31st Ave. S and Park Plaza (Parthenon Ave.), ...",31st Ave. S and Park Plaza,Parthenon Ave.,37203.0
2,539318ec498e4cf319a9389d,Donelson Dog Park,"[{'id': '4bf58dd8d48988d1e5941735', 'name': 'D...",v-1585162985,False,36.19135,-86.67563,"[{'label': 'display', 'lat': 36.19134979362192...",8506,US,Nashville,TN,United States,"[Nashville, TN 37214, United States]",,,37214.0


Further clean this dataset by subsetting and renaming necessary columns.

In [17]:
dogpark = dogpark[['location.formattedAddress', 'location.lat', 'location.lng', 'name' ]]
dogpark.columns = ['Location', 'Latitude', 'Longitude', 'Name']
print('There were', dogpark.shape[0], 'dog parks pulled from this search.')
dogpark.head(3)


There were 50 dog parks pulled from this search.


Unnamed: 0,Location,Latitude,Longitude,Name
0,"[Nashville, TN, United States]",36.16118,-86.772976,Riverfront Dog Park
1,"[31st Ave. S and Park Plaza (Parthenon Ave.), ...",36.148532,-86.81756,Centennial Dog Park
2,"[Nashville, TN 37214, United States]",36.19135,-86.67563,Donelson Dog Park


Display the map of the dog park locations showing the 1/2 mile radius to their locations.

In [18]:
# Map of dog parks locations with 1/2 mile radius circles
dpmap = folium.Map(location=[lat, long], zoom_start=11)

# add markers to map
for dplat, dplong, dploc in zip(dogpark['Latitude'], dogpark['Longitude'], dogpark['Location']):
    label = '{}'.format(dploc)
    label = folium.Popup(label, parse_html=True)
    # Circle used (not CircleMarker) in order to use a radius of meters not pixels
    folium.Circle([dplat, dplong], radius=805, popup=label, color='red', fill=True, fill_color='red',
        fill_opacity=.1, parse_html=True).add_to(dpmap)  

dpmap

#### Schools Data

The type and location of schools were obtained from https://data.nashville.gov/Education/Metro-Nashville-Public-School-Locations/4qyp-5xc3, downloaded as 'NashSchools.csv', and previously loaded into the variable 'school'.

In [19]:
print('There are', school.shape[0], 'schools in Nashville')

There are 169 schools in Nashville


Only charter high schools are of concern, so eliminate schools that are not charter schools or do not go through grade 12.

In [20]:
# Keep only necessary columns
school = school[['School Name', 'School Level', 'Lowest Grade', 'Highest Grade', 'Latitude', 'Longitude']]
school = school[school['Highest Grade'] == 'Grade 12']
school = school[school['School Level'] == 'Charter']
print('There are only', school.shape[0], 'schools that meet this criteria.')
school.head()

There are only 5 schools that meet this criteria.


Unnamed: 0,School Name,School Level,Lowest Grade,Highest Grade,Latitude,Longitude
59,STEM Prep High School,Charter,Grade 9,Grade 12,36.135417,-86.738622
91,KIPP Nashville Collegiate High School,Charter,Grade 9,Grade 12,36.194798,-86.769186
131,Knowledge Academies High School,Charter,Grade 9,Grade 12,36.045212,-86.651873
154,RePublic High School,Charter,Grade 9,Grade 12,36.2421,-86.779309
164,LEAD Academy,Charter,Grade 9,Grade 12,36.147885,-86.763951


Display the map of these 5 schools showing the 3 mile radius to their locations as required.

In [21]:
# Map of charter high schools with 3 mile radius circles
schoolmap = folium.Map(location=[lat, long], zoom_start=11)

# add markers to map
for schoollat, schoollong, schoolname in zip(school['Latitude'], school['Longitude'], school['School Name']):
    label = '{}'.format(schoolname)
    label = folium.Popup(label, parse_html=True)
    folium.Circle([schoollat, schoollong], radius=4828, popup=label, color='yellow', fill=True, fill_color='yellow',
        fill_opacity=.1, parse_html=True).add_to(schoolmap)

schoolmap

#### Combine and Filter Findings

All of these locations (neighborhoods, Bank of America offices, dog parks, charter high schools) can be combined into one map. To meet the project requirements, a neighborhood (designated with a green dot) must lie within 5 miles of a Bank of America office location (blue shaded circle), 1/2 mile of a dog park (red shaded circle) and 3 miles of a charter high school (yellow shaded circle).

In [22]:
mapall = folium.Map(location=[lat, long], zoom_start=11)

# BOA markers
for boalat, boalong, boaloc in zip(boa['Latitude'], boa['Longitude'], boa['Location']):
    label = '{}'.format(boaloc)
    label = folium.Popup(label, parse_html=True)
    folium.Circle([boalat, boalong],radius=8046, popup=label, color='blue', fill=True, fill_color='blue',
        fill_opacity=.1, parse_html=True).add_to(mapall)  
    
# Dog park markers
for dplat, dplong, dploc in zip(dogpark['Latitude'], dogpark['Longitude'], dogpark['Location']):
    label = '{}'.format(dploc)
    label = folium.Popup(label, parse_html=True)
    folium.Circle([dplat, dplong], radius=805, popup=label, color='red', fill=True, fill_color='red',
        fill_opacity=.1, parse_html=True).add_to(mapall)  
  
    
# Charter high school markers
for schoollat, schoollong, schoolname in zip(school['Latitude'], school['Longitude'], school['School Name']):
    label = '{}'.format(schoolname)
    label = folium.Popup(label, parse_html=True)
    folium.Circle([schoollat, schoollong], radius=4828, popup=label, color='yellow', fill=True, fill_color='yellow',
        fill_opacity=.1, parse_html=True).add_to(mapall)

# add neighborhood markers to map
for nashlat, nashlong, nashneigh in zip(nash['Latitude'], nash['Longitude'], nash['Neighborhood']):
    label = '{}'.format(nashneigh)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([nashlat, nashlong], radius=2, popup=label, color='green', fill=True, fill_color='green',
        fill_opacity=1, parse_html=True).add_to(mapall)  

mapall

This map is overwhelming when trying to find every green dot that lies within all three kinds of shaded circles. To simplify the results, all neighborhoods that do not meet the qualifying criteria need to be removed, which requires the distance between locations. The Haversine formula gives the distance between two places from their latitude and longitude coordinates. This formula is:

<img src="files/DataFiles/haversine.png">

To use this formula, it is necessary to know the volumetric mean radius of the Earth, which was obtained at
https://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html and is 6371.000 kilometers which is 3958.756 miles.
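As a minimal sketch, the Haversine formula can be written as a small standalone function. The name `haversine_miles` is an assumption made for illustration; the radius is the value quoted above, and the sample coordinates are the Nashville center and the Riverfront Dog Park entry from the Foursquare results.

```python
import math

EARTH_RADIUS_MILES = 3958.756  # volumetric mean radius quoted above

def haversine_miles(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1                  # difference in latitude, radians
    dlam = math.radians(lon2 - lon1)    # difference in longitude, radians
    # Term under the square root: sin^2(dphi/2) + cos(phi1) cos(phi2) sin^2(dlam/2)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))

# Center of Nashville to Riverfront Dog Park
print(haversine_miles(36.174465, -86.767960, 36.16118, -86.772976))
```

Note that the half-angle is taken inside the sine, that is `sin(x / 2)` rather than `sin(x) / 2`. For distances of a few miles the two differ only negligibly, but only the former matches the formula.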

First, the neighborhoods are tested against each school to see if they are within the required 3 mile radius. Then duplicates are eliminated, since a neighborhood can be within 3 miles of more than one charter high school.

In [23]:
r = 3958.756  # Volumetric mean radius of Earth, in miles

schooldistance = pd.DataFrame()
schoolnum = school.shape[0]  # for the loop
neighborhoodnum = nash.shape[0]  # for the loop
nash['School'] = ''
nash['Distance'] = ''
school.reset_index(drop=True, inplace=True)  # needed to loop through the dataframe

# Loop through all charter high schools to determine distance to each neighborhood
for j in range(schoolnum):
    schoolname = school.loc[j, 'School Name']
    schoollat = school.loc[j, 'Latitude']
    schoollong = school.loc[j, 'Longitude']
    slat = schoollat * math.pi / 180  # Converts degrees to radians
    slong = schoollong * math.pi / 180  # Converts degrees to radians
    for i in range(neighborhoodnum):
        nlat = math.pi * nash.loc[i, 'Latitude'] / 180  # Neighborhood lat to radians
        nlong = math.pi * nash.loc[i, 'Longitude'] / 180  # Neighborhood long to radians
        a = math.sin((nlat - slat) / 2)  # First term under square root of Haversine, needs to be squared
        b = math.cos(slat)  # Second term
        c = math.cos(nlat)  # Third term
        d = math.sin((nlong - slong) / 2)  # Last term of Haversine, needs to be squared
        distance = 2 * r * math.asin(math.sqrt(a**2 + b * c * d**2))  # Haversine formula for distance
        nash.loc[i, 'Distance'] = distance
        nash.loc[i, 'School'] = schoolname
    # Dataframe of all neighborhoods within 3 miles of that school
    withinrange = nash[(nash.Distance <= 3.0)]
    schooldistance = pd.concat([schooldistance, withinrange])
schooldistance.drop_duplicates(subset=['Neighborhood'], keep='first', inplace=True)
print('Of the original 288 neighborhoods, only', schooldistance.shape[0], 'are within 3 miles of a charter high school.')

Of the original 288 neighborhoods, only 91 are within 3 miles of a charter high school.


This narrows the results down to 91 neighborhoods. Now each of these 91 neighborhoods should be checked for distance to dog parks.

In [24]:
nash1 = schooldistance.reset_index(drop=True)
nash1.rename(columns={'School': 'Park'}, inplace=True)
dogparkdistance = pd.DataFrame()
dogparknum = dogpark.shape[0]  # for the loop
neighborhoodnum = nash1.shape[0]  # for the loop
dogpark.reset_index(drop=True, inplace=True)  # needed to loop through the dataframe

# Loop through all dog parks to determine distance to each neighborhood
for j in range(dogparknum):
    dpname = dogpark.loc[j, 'Name']
    dplat = dogpark.loc[j, 'Latitude']
    dplong = dogpark.loc[j, 'Longitude']
    dlat = dplat * math.pi / 180  # Converts degrees to radians
    dlong = dplong * math.pi / 180  # Converts degrees to radians
    for i in range(neighborhoodnum):
        nlat = math.pi * nash1.loc[i, 'Latitude'] / 180  # Neighborhood lat to radians
        nlong = math.pi * nash1.loc[i, 'Longitude'] / 180  # Neighborhood long to radians
        a = math.sin((nlat - dlat) / 2)  # First term under square root of Haversine, needs to be squared
        b = math.cos(dlat)  # Second term
        c = math.cos(nlat)  # Third term
        d = math.sin((nlong - dlong) / 2)  # Last term of Haversine, needs to be squared
        distance = 2 * r * math.asin(math.sqrt(a**2 + b * c * d**2))  # Haversine formula for distance
        nash1.loc[i, 'Distance'] = distance  # Fills the Distance column for this neighborhood
        nash1.loc[i, 'Park'] = dpname
    # Dataframe of all neighborhoods within 1/2 mile of that dog park
    withinrange = nash1[(nash1.Distance <= 0.5)]
    dogparkdistance = pd.concat([dogparkdistance, withinrange])
dogparkdistance.drop_duplicates(subset=['Neighborhood'], keep='first', inplace=True)
print('Of the remaining 91 neighborhoods, only', dogparkdistance.shape[0], 'are within 1/2 mile of a dog park.')

Of the remaining 91 neighborhoods, only 8 are within 1/2 mile of a dog park.


The number of neighborhoods has now been reduced to just 8 that meet the criteria. Now these final 8 should be checked to see if they are within 5 miles of a Bank of America office location.

In [25]:
nash2 = dogparkdistance.reset_index(drop=True)
nash2.rename(columns={'Park': 'Address'}, inplace=True)
boadistance = pd.DataFrame()
boanum = boa.shape[0]  # for the loop
neighborhoodnum = nash2.shape[0]  # for the loop
boa.reset_index(drop=True, inplace=True)  # needed to loop through the dataframe

# Loop through all Bank of Americas to determine distance to each neighborhood
for j in range(boanum):
    boaaddress = boa.loc[j, 'Location']
    boalat = boa.loc[j, 'Latitude']
    boalong = boa.loc[j, 'Longitude']
    blat = boalat * math.pi / 180  # Converts degrees to radians
    blong = boalong * math.pi / 180  # Converts degrees to radians
    for i in range(neighborhoodnum):
        nlat = math.pi * nash2.loc[i, 'Latitude'] / 180  # Neighborhood lat to radians
        nlong = math.pi * nash2.loc[i, 'Longitude'] / 180  # Neighborhood long to radians
        a = math.sin((nlat - blat) / 2)  # First term under square root of Haversine, needs to be squared
        b = math.cos(blat)  # Second term
        c = math.cos(nlat)  # Third term
        d = math.sin((nlong - blong) / 2)  # Last term of Haversine, needs to be squared
        distance = 2 * r * math.asin(math.sqrt(a**2 + b * c * d**2))  # Haversine formula for distance
        nash2.loc[i, 'Distance'] = distance  # Fills the Distance column for this neighborhood
        nash2.loc[i, 'Address'] = boaaddress
    # Dataframe of all neighborhoods within 5 miles of that Bank of America location
    withinrange = nash2[(nash2.Distance <= 5)]
    boadistance = pd.concat([boadistance, withinrange])
boadistance.drop_duplicates(subset=['Neighborhood'], keep='first', inplace=True)
print('Of the final 8 neighborhoods, only', boadistance.shape[0], 'are within 5 miles of a Bank of America office.')

Of the final 8 neighborhoods, only 6 are within 5 miles of a Bank of America office.


There are only 6 neighborhoods that meet the requirements specified.

In [26]:
final = boadistance[['Neighborhood', 'Latitude', 'Longitude']].copy()
print('The qualifying neighborhoods are:', ", ".join([str(x) for x in final.Neighborhood]))

The qualifying neighborhoods are: Eastwood, Greenwood, Renraw, Urban Residents, Bransford Avenue, Cambridge Forest
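The three filters above repeat the same nested loop. As a sketch of an alternative, not the approach used in this notebook, the distance from every neighborhood to its nearest site can be computed in one step with NumPy broadcasting. The helper `min_distance_miles` is hypothetical and the coordinates below are made up.

```python
import numpy as np

def min_distance_miles(lat, lon, site_lat, site_lon, radius=3958.756):
    """For each point, return the Haversine distance in miles to its nearest site."""
    phi = np.radians(np.asarray(lat, dtype=float))[:, None]            # shape (n_points, 1)
    lam = np.radians(np.asarray(lon, dtype=float))[:, None]
    site_phi = np.radians(np.asarray(site_lat, dtype=float))[None, :]  # shape (1, n_sites)
    site_lam = np.radians(np.asarray(site_lon, dtype=float))[None, :]
    # Broadcasting yields an (n_points, n_sites) matrix of Haversine terms
    h = (np.sin((phi - site_phi) / 2) ** 2
         + np.cos(phi) * np.cos(site_phi) * np.sin((lam - site_lam) / 2) ** 2)
    return (2 * radius * np.arcsin(np.sqrt(h))).min(axis=1)

# Made-up coordinates: two points checked against two sites
d = min_distance_miles([36.0, 37.0], [-86.0, -86.0], [36.0, 36.5], [-86.0, -86.0])
mask = d <= 3.0  # True where some site lies within 3 miles
print(d, mask)
```

Applied to the dataframes in this notebook, one mask per criterion (3 miles to schools, 0.5 miles to dog parks, 5 miles to bank offices) could be combined with `&`, which would make the `drop_duplicates` step unnecessary because each neighborhood appears only once.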


### Collect and Clean Venue Data

Geographic coordinates of Riverdale, The Bronx were obtained from: http://elevation.maplogs.com/poi/riverdale_bronx_ny_usa.178337.html

In [27]:
rlat = 40.8940853
rlong = -73.910997

These are the qualifying neighborhoods along with their geographic coordinates. Riverdale has also been added.

In [28]:
final.loc[8] = ['Riverdale, Bronx', rlat, rlong]
print(final.to_string(index=False))

     Neighborhood Latitude Longitude
         Eastwood  36.1883  -86.7396
        Greenwood  36.1886  -86.7475
           Renraw  36.1958   -86.748
  Urban Residents  36.1618  -86.7776
 Bransford Avenue  36.1309   -86.768
 Cambridge Forest  36.0673  -86.6489
 Riverdale, Bronx  40.8941   -73.911


The new neighborhood should be as similar as possible to the Riverdale neighborhood in The Bronx, judged by comparing nearby venues. The venues were obtained using the Foursquare API, requesting a maximum of 250 venues within 1 mile (1609 meters) of each neighborhood.

In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=1609, maxlimit = 250):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            fsid, fssecret, version, lat, lng, radius, maxlimit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], 
            v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 
                  'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return(nearby_venues)
venues = getNearbyVenues(names=final['Neighborhood'], latitudes=final['Latitude'], longitudes=final['Longitude'])
venues.head(3)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Eastwood,36.18829,-86.739641,Spot's Pet Supply & Dog Wash,36.188998,-86.745649,Pet Store
1,Eastwood,36.18829,-86.739641,Jeni's Splendid Ice Creams,36.182483,-86.735633,Ice Cream Shop
2,Eastwood,36.18829,-86.739641,Eastland Cafe,36.182728,-86.735866,American Restaurant
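The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an error if a venue came back with an empty category list. Whether the API ever does so is an assumption here; the sketch below only shows how the flattening step could tolerate it. The helper `extract_venues`, the `'Uncategorized'` label, and the mock items are all made up for illustration.

```python
def extract_venues(name, lat, lng, items):
    """Flatten explore-style items into rows, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # 'Uncategorized' is a placeholder label chosen for this sketch
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written stand-in for results['response']['groups'][0]['items']; not real API output
mock_items = [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 36.18, 'lng': -86.74},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Mystery Spot', 'location': {'lat': 36.19, 'lng': -86.75},
               'categories': []}},
]
rows = extract_venues('Eastwood', 36.1883, -86.7396, mock_items)
print(rows)
```

Separating the parsing from the HTTP request also lets the flattening logic be checked without calling the API.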


In [30]:
venuecount = venues.groupby('Neighborhood').count()
for i in range(venuecount.shape[0]):
    print('There are', venuecount.iloc[i,0], 'venues within 1 mile of', venuecount.index[i])

There are 63 venues within 1 mile of Bransford Avenue
There are 34 venues within 1 mile of Cambridge Forest
There are 72 venues within 1 mile of Eastwood
There are 91 venues within 1 mile of Greenwood
There are 45 venues within 1 mile of Renraw
There are 76 venues within 1 mile of Riverdale, Bronx
There are 100 venues within 1 mile of Urban Residents


At this point, the number of venues near each neighborhood is known, but the venues still need to be classified by type and the number of each type counted.

In [31]:
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")  # one hot encoding
onehot['Neighborhood'] = venues['Neighborhood'] # add neighborhood column back, since get_dummies was applied to the category column only
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1]) # move neighborhood column to the first column
onehot = onehot[fixed_columns]

grouped = onehot.groupby('Neighborhood').mean().reset_index()
i = 0
for hood in grouped['Neighborhood']:
    temp = grouped[grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['Venue','Freq']
    temp = temp.iloc[1:]
    temp['Freq'] = temp['Freq'].astype(float)
    temp = temp.sort_values('Freq', ascending=False)
    top3 = temp.head(3)
    print('Top 3 (out of', venuecount.iloc[i,0],') most common venues in', hood, 'are:', ', '.join([str(x) for x in top3.Venue]))
    i=i+1

Top 3 (out of 63 ) most common venues in Bransford Avenue are: Coffee Shop, Gay Bar, Thrift / Vintage Store
Top 3 (out of 34 ) most common venues in Cambridge Forest are: Pizza Place, Fast Food Restaurant, Sandwich Place
Top 3 (out of 72 ) most common venues in Eastwood are: Bar, American Restaurant, Coffee Shop
Top 3 (out of 91 ) most common venues in Greenwood are: Bar, American Restaurant, Coffee Shop
Top 3 (out of 45 ) most common venues in Renraw are: Fast Food Restaurant, Thrift / Vintage Store, Discount Store
Top 3 (out of 76 ) most common venues in Riverdale, Bronx are: Pizza Place, Burger Joint, Mexican Restaurant
Top 3 (out of 100 ) most common venues in Urban Residents are: Bar, Hotel, Music Venue
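
The one-hot encoding followed by a grouped mean yields, for each neighborhood, the share of its venues that fall in each category. As a cross-check, the same table can be sketched with `pd.crosstab` and `normalize='index'`. The toy frame below is a hypothetical stand-in for `venues`, and `top_categories` is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the `venues` frame (two neighborhoods)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Bar', 'Coffee Shop', 'Coffee Shop',
                       'Hotel', 'Pizza Place', 'Bar'],
})

# Rows are neighborhoods, columns are categories, and each row sums to 1
freq = pd.crosstab(toy['Neighborhood'], toy['Venue Category'], normalize='index')


def top_categories(freq_table, hood, n=3):
    """Return the n most frequent venue categories for one neighborhood."""
    return freq_table.loc[hood].sort_values(ascending=False).head(n).index.tolist()


top_a = top_categories(freq, 'A', n=2)
print(top_a)
```

When two categories have the same frequency, their relative order in a "top 3" list is arbitrary, which is worth keeping in mind when reading the lists above.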


From here, the neighborhoods can be clustered by the types of nearby venues, which groups them in terms of similarity.

In [32]:
kclusters = 3  # more than 3 put Riverdale in a cluster by itself

clusters = grouped.drop(columns='Neighborhood')  # keep only the venue-frequency features
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(clusters)
kmeans.labels_[:] # check cluster labels to verify Riverdale is not isolated, trial and error since number of elements is low

array([1, 2, 1, 1, 1, 1, 0])

In [33]:
# Add clustering labels
sevenvenues = grouped.copy()  # The seven neighborhoods with their venue info (copy so grouped is left unchanged)
sevencoords = final.reset_index(drop=True) # The seven neighborhoods with their geographic coordinates
sevenvenues.insert(0, 'Clusters', kmeans.labels_)
sevencoords = sevencoords.join(sevenvenues.set_index('Neighborhood'), on='Neighborhood')

# Find cluster number for Riverdale
riverdaleindex = sevencoords.loc[sevencoords['Neighborhood'] == 'Riverdale, Bronx'].index[0]
riverdaleclusternumber = sevencoords.loc[riverdaleindex, 'Clusters']
print('Riverdale is in cluster', riverdaleclusternumber)

Riverdale is in cluster 1
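
With only seven neighborhoods, the cluster assignments depend heavily on the chosen number of clusters. A complementary check, which is not the method used in this notebook, is to rank the neighborhoods by Euclidean distance to the target in the same venue-frequency space. The sketch below assumes a table shaped like `grouped`; the toy values and the helper `rank_by_distance` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frequency table shaped like `grouped`
toy_grouped = pd.DataFrame({
    'Neighborhood': ['Target', 'Near', 'Far'],
    'Bar': [0.2, 0.3, 0.9],
    'Pizza Place': [0.8, 0.7, 0.1],
})


def rank_by_distance(freq_table, target):
    """Sort neighborhoods by Euclidean distance to the target's frequency vector."""
    features = freq_table.set_index('Neighborhood')
    # Subtracting a row (Series) aligns on the category columns
    diffs = features.drop(index=target) - features.loc[target]
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    return distances.sort_values()


ranking = rank_by_distance(toy_grouped, 'Target')
print(ranking)
```

Applied to the real data, the call would be `rank_by_distance(grouped, 'Riverdale, Bronx')`, giving an ordering that can be compared against the cluster membership.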


## Results

In [34]:
# Find other neighborhoods in the same cluster
samecluster = sevencoords[sevencoords.Clusters == riverdaleclusternumber]
# Remove Riverdale
qualifying = samecluster[samecluster.Neighborhood != 'Riverdale, Bronx'].reset_index(drop=True)
print('These neighborhoods meet all location criteria and are the most similar to Riverdale, The Bronx in terms of nearby venues:')
print('\n'.join([str(x) for x in qualifying.Neighborhood]))

These neighborhoods meet all location criteria and are the most similar to Riverdale, The Bronx in terms of nearby venues:
Eastwood
Greenwood
Renraw
Bransford Avenue


In [35]:
# Map of final Nashville Neighborhoods using latitude and longitude of Nashville to center map
qualifyingmap = folium.Map(location=[lat, long], zoom_start=12)

# add markers to map
for finallat, finallong, finalneigh in zip(qualifying['Latitude'], qualifying['Longitude'], qualifying['Neighborhood']):
    label = '{}'.format(finalneigh)
    folium.CircleMarker([finallat, finallong], radius=2, color='purple', fill=True, fill_color='purple', fill_opacity=1
        ).add_child(folium.Popup(label, parse_html=True)).add_to(qualifyingmap)  # parse_html is a Popup option
    
qualifyingmap

Out of the 288 neighborhoods, only 6 met the location proximity requirements. Of these, two (Cambridge Forest and Urban Residents) fell into different clusters than Riverdale and could be excluded as less similar. The neighborhoods of Eastwood, Greenwood, Renraw, and Bransford Avenue have the best chances of being most similar to Riverdale. Three of these are in close proximity to each other while one is on the other side of town. If necessary, each of these locations could be listed along with the charter schools within the 3-mile distance so that those few schools could be compared. The same could be done for Bank of America offices and dog parks.
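
Listing what falls within a 3-mile distance of a neighborhood comes down to a great-circle distance check. The sketch below uses the haversine formula with a mean Earth radius of 3958.8 miles. The helpers `haversine_miles` and `within_radius` are hypothetical. Because school, bank, and dog park coordinates are not shown in this section, neighborhood coordinates from the table above are used as stand-in points.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def within_radius(origin, places, max_miles=3.0):
    """Return the names of places no farther than max_miles from origin."""
    return [name for name, (lat, lon) in places.items()
            if haversine_miles(origin[0], origin[1], lat, lon) <= max_miles]


# Stand-in points: neighborhood coordinates taken from the table above
bransford = (36.1309, -86.768)
places = {
    'Urban Residents': (36.1618, -86.7776),
    'Cambridge Forest': (36.0673, -86.6489),
}

nearby = within_radius(bransford, places)
print(nearby)
```

Replacing `places` with a dictionary of school names and coordinates would give the per-neighborhood list of charter schools described above, and the same call would work for bank offices and dog parks.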