# Introduction

Atlanta, my hometown, is a highly diverse city with poorly defined neighborhoods. Although the neighborhoods are often delimited by roads, they are characterized by a number of other factors. In this project, I will use data from Foursquare to examine the fluctuating popularity of restaurants of various kinds and use that information to classify locations into neighborhoods. I will then compare those results with Google-assigned neighborhoods and determine whether the two classifications are similar.

There are a number of possible uses for this neighborhood clustering information, from consumers who would like to know where to go for a particular cuisine, to developers looking to fit a new restaurant into the neighborhood. Additionally, this information could be used by city planning officials in the permitting process.

# Data

This project will use data from two sources:
1. Trending restaurant data from Foursquare. This will be pulled between 6 and 7 PM on a Friday (peak restaurant selection time) and saved to a file, since repeated calls to the Foursquare API would return different trending information. An extension of this project could examine how neighborhood boundaries fluctuate over the course of a week, but that is beyond the scope of this project.
2. Neighborhood Planning Unit data from the City of Atlanta's GIS database (https://dcp-coaplangis.opendata.arcgis.com/datasets/npu/geoservice). The city provides an API to interact with the GIS data, which I will use to extract the geometric boundaries of each neighborhood. The output of this call contains the following information:
     
        "attributes": {
            "OBJECTID": 260,
            "LOCALID": null,
            "NAME": "K",
            "GEOTYPE": "NPU",
            "FULLFIPS": null,
            "LEGALAREA": null,
            "ACRES": 1528.29,
            "SQMILES": 2.39,
            "OLDNAME": null,
            "NPU": null
        },
        "geometry": {
            "rings": [
                [
                    [
                        -84.4173772073577,
                        33.772197013770004
                    ],
                  
The `NAME` attribute is the neighborhood planning unit name, a proxy for a traditional neighborhood name like "Midtown". The geometry information will be used to build a bounding box for each unit, and each trending restaurant will be placed within those boundaries and classified.

# Initializations

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import folium
import requests
# json_normalize is exposed at the top level in pandas 1.0+; pandas.io.json is the deprecated location
from pandas import json_normalize

# Methodology
## Connecting with the City of Atlanta's GIS system
### Exploring GIS results
The GIS system provides both the neighborhood planning unit (NPU) name and the coordinates of its boundary. In this section, we will plot these boundaries as an initial exploration. The URL below was created using the City of Atlanta GIS API Explorer listed in Section 2.

In [7]:
url = 'https://gis.atlantaga.gov/dpcd/rest/services/OpenDataService/FeatureServer/4/query?where=1%3D1&outFields=NAME&outSR=4326&f=json'

We will process the JSON response into a readable dataframe.

In [8]:
results = requests.get(url).json()
dataframe = json_normalize(results)
data = dataframe['features'][0]
cleandata = json_normalize(data)
cleandata.head()

Unnamed: 0,attributes.NAME,geometry.rings
0,T,"[[[-84.41391130598113, 33.75469930680354], [-8..."
1,K,"[[[-84.4173772073577, 33.772197013770004], [-8..."
2,C,"[[[-84.4175773783347, 33.83996741007558], [-84..."
3,S,"[[[-84.45199196698579, 33.73370062523282], [-8..."
4,R,"[[[-84.45466114171202, 33.721230458664635], [-..."
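
The Data section describes building a bounding box from the boundary geometry. A minimal sketch of that step is shown here, assuming each entry of `geometry.rings` is a list of rings and each ring is a list of `[longitude, latitude]` pairs, as in the output above. The helper name `bounding_box` and the sample coordinates are hypothetical.

```python
def bounding_box(rings):
    """Return (min_lon, min_lat, max_lon, max_lat) over every vertex in every ring."""
    lons = [pt[0] for ring in rings for pt in ring]
    lats = [pt[1] for ring in rings for pt in ring]
    return min(lons), min(lats), max(lons), max(lats)

# Hypothetical square ring, in the same [longitude, latitude] order as the GIS output
example_rings = [[
    [-84.42, 33.77],
    [-84.40, 33.77],
    [-84.40, 33.79],
    [-84.42, 33.79],
    [-84.42, 33.77],
]]

box = bounding_box(example_rings)
print(box)
```

In the notebook, something like `cleandata['geometry.rings'].apply(bounding_box)` would produce one box per NPU. A bounding box is only a coarse approximation of an irregular boundary, so boxes of adjacent NPUs can overlap.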


In order to visualize the structure of the official neighborhoods listed in the city's documentation, we will plot their borders using folium and the geometry in the above dataframe.

In [10]:
venues_map = folium.Map(tiles='Stamen Toner',location=[33.755845, -84.38902], zoom_start=10)

for index, row in cleandata.iterrows():
    # Outer ring of the NPU boundary; each vertex is [longitude, latitude]
    coord = row['geometry.rings'][0]
    for ll in coord:
        lat = ll[1]
        long = ll[0]
        # Only NPUs at even row positions are drawn
        if index%2 == 0:
            folium.Circle(
                [lat, long],
                radius=3,
                fill=True
                ).add_to(venues_map)
venues_map

## Connect with Foursquare

This hidden cell provides the credentials needed to interact with the Foursquare API.

In [None]:
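# @hidden_cell
import os
# Credentials are read from environment variables (the variable names are assumed here)
# rather than hardcoded, so they are not published with the notebook.
client_secret = os.environ['FOURSQUARE_CLIENT_SECRET']
client_id = os.environ['FOURSQUARE_CLIENT_ID']
ver = '20180604'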

To query the Foursquare API, we specify a latitude, longitude, result limit, and search radius (in meters), which are used to build the request URL.

In [None]:
lat = 33.755845
long = -84.38901
limit = 5000
radius=10000

In [None]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&limit={}&radius={}'.format(client_id, client_secret, lat, long, ver, limit,radius)

Using this URL, we interact with Foursquare and then process the results into a dataframe.

In [None]:
results = requests.get(url).json()
items = results['response']['groups'][0]['items']
dataframe = json_normalize(items)
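
The Data section notes that the trending results should be saved to a file, because repeated calls return different data. One way to do that is sketched below, assuming the response is plain JSON. The helper `load_or_fetch`, the file name, and the stand-in `fake_fetch` function are all hypothetical and are not part of the original notebook.

```python
import json
import os
import tempfile

def load_or_fetch(cache_path, fetch):
    """Return the JSON saved at cache_path, or call fetch() once and save its result."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    payload = fetch()
    with open(cache_path, "w") as f:
        json.dump(payload, f)
    return payload

# Demonstration with a stand-in for the real API call
calls = []

def fake_fetch():
    calls.append(1)
    return {"response": {"groups": [{"items": []}]}}

demo_path = os.path.join(tempfile.mkdtemp(), "foursquare_snapshot.json")
first = load_or_fetch(demo_path, fake_fetch)   # calls fake_fetch and writes the file
second = load_or_fetch(demo_path, fake_fetch)  # reads the file, no second call
print(len(calls))
```

In the notebook, the call could look like `results = load_or_fetch('foursquare_friday.json', lambda: requests.get(url).json())`, so that every later run analyzes the same Friday-evening snapshot.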

Within this dataframe, the `venue.categories` column contains a list of category records, so we will define a function that extracts the name of the first category from this structure.

In [None]:
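def get_category_type(row):
    # The column is named 'categories' or 'venue.categories' depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Venues without a category get None; otherwise use the first category's name
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']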

The results dataframe contains much more data than needed for this analysis, so we will create a smaller dataframe with only the information of interest.

In [None]:
data=pd.DataFrame()
data['name'] = dataframe['venue.name']
data['categories'] = dataframe['venue.categories']
data['latitude'] = dataframe['venue.location.lat']
data['longitude'] = dataframe['venue.location.lng']
data['neighborhood'] = dataframe['venue.location.neighborhood']
data['categories'] = data.apply(get_category_type, axis=1)

In [None]:
data.head()

## Classify each restaurant into a neighborhood planning unit
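
The Data section plans to place each restaurant inside an NPU boundary. The sketch below uses a ray-casting point-in-polygon test instead of a plain bounding box, since NPU boundaries are irregular polygons. It is an illustration under stated assumptions, not the notebook's own method: the helpers `point_in_ring` and `assign_npu` are hypothetical, only the outer ring (`rings[0]`) is tested, and the square boundary labelled "K" is made up for the example.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count how many ring edges a ray heading east from the point crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        # The edge matters only if it straddles the point's latitude
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_npu(lon, lat, npus):
    """npus is an iterable of (name, rings) pairs; return the first matching name or None."""
    for name, rings in npus:
        if point_in_ring(lon, lat, rings[0]):
            return name
    return None

# Hypothetical square boundary in [longitude, latitude] order
npus = [("K", [[
    [-84.42, 33.77],
    [-84.40, 33.77],
    [-84.40, 33.79],
    [-84.42, 33.79],
    [-84.42, 33.77],
]])]

print(assign_npu(-84.41, 33.78, npus))  # point inside the square
print(assign_npu(-84.30, 33.78, npus))  # point east of the square
```

With the dataframes built earlier, the `npus` argument could be `list(zip(cleandata['attributes.NAME'], cleandata['geometry.rings']))`, and the function could be applied to each venue's `longitude` and `latitude`.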

# Clustering Analysis
## Visualize Raw Data

In [None]:
venues_map = folium.Map(location=[lat, long], zoom_start=11)

# Draw one small circle per venue, labelled with the venue name.
# Row values are used directly so the global lat/long (map center) are not overwritten.
for index, row in data.iterrows():
    folium.Circle(
        [row['latitude'], row['longitude']],
        popup=row['name'],
        radius=10,
        fill=True
        ).add_to(venues_map)

# display map
venues_map

## Perform clustering analysis on restaurant data

### Convert neighborhood strings into integer labels
We will use the `LabelEncoder` class from scikit-learn to assign an integer to each neighborhood name.

In [None]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = data['neighborhood'].to_list()
neighborhood_labels = le.fit_transform(y)
data['neighborhood'] = neighborhood_labels
data.head()

### Cross Validation Tuning

In [None]:
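from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: X_train/y_train are not defined earlier in the notebook, so they are
# built here from venue coordinates (features) and neighborhood labels (target).
X = data[['latitude', 'longitude']]
y = data['neighborhood']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

neighbors = list(range(1, 11))
cv_scores = []
for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 10 folds for this value of k
    scores = cross_val_score(knn, X_train, y_train, cv=10, scoring='accuracy')
    cv_scores.append(scores.mean())

plt.plot(neighbors, cv_scores)
plt.xlabel('K Value')
plt.ylabel('Score')
plt.show()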

### Perform K Nearest Neighbors classification with the optimal number of neighbors
We will use the graph above to select the best value of k.

In [None]:
from sklearn.neighbors import KNeighborsClassifier

k = 5
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
# score() takes (X, y); its optional third argument is sample_weight, not the predictions
print(knn.score(X_test, y_test))
print(pred)
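
To make the classification step concrete, here is a minimal pure-Python sketch of the k-nearest-neighbors idea: find the k training points closest to a query and take a majority vote of their labels. It assumes Euclidean distance on raw latitude/longitude, which is only an approximation of ground distance, and it does not reproduce scikit-learn's tie-breaking. The function `knn_predict`, the coordinates, and the labels are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical venues as (latitude, longitude) with made-up neighborhood labels
train_points = [
    (33.780, -84.380),
    (33.781, -84.381),
    (33.750, -84.390),
    (33.751, -84.391),
    (33.749, -84.389),
]
train_labels = ["Midtown", "Midtown", "Downtown", "Downtown", "Downtown"]

prediction = knn_predict(train_points, train_labels, (33.779, -84.3805), k=3)
print(prediction)
```

With `k=3`, the two closest points carry one label and the third closest carries the other, so the vote is two to one. A larger k smooths the decision but lets more distant venues influence the result.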

## Compare results of analysis to City of Atlanta Neighborhood data
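
One possible way to carry out this comparison is sketched below, assuming each venue ends up with two labels: a predicted neighborhood and an NPU letter. The sketch counts how often each pair of labels occurs together and summarizes the agreement as purity, the fraction of venues whose NPU is the most common one within their predicted neighborhood. The function names and the sample labels are hypothetical and do not come from the notebook's data.

```python
from collections import Counter

def contingency(labels_a, labels_b):
    """Count how often each (label_a, label_b) pair occurs together."""
    return Counter(zip(labels_a, labels_b))

def purity(labels_a, labels_b):
    """Fraction of items whose labels_b value is the majority value within their labels_a group."""
    table = contingency(labels_a, labels_b)
    best = {}
    for (a, _b), count in table.items():
        best[a] = max(best.get(a, 0), count)
    return sum(best.values()) / len(labels_a)

# Illustrative labels only
predicted = ["Midtown", "Midtown", "Downtown", "Downtown", "Downtown"]
npu = ["E", "E", "M", "M", "E"]

table = contingency(predicted, npu)
score = purity(predicted, npu)
print(table)
print(score)
```

A purity near 1 would suggest that the two classifications draw similar boundaries, while a low value would suggest that they divide the city differently.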