# Capstone Project - the Battle of Neighborhoods
### 1.1 Introduction
    I am a data science student pursuing a master's degree at the University of Michigan, Ann Arbor, and I hope to find a job in New York after graduation. Unlike Ann Arbor, New York is a big city, so I would like to find an affordable place to live in Manhattan with easy access to a metro station and with venues similar to those near my current home in Ann Arbor.

### 1.2 Business Problem
    This project aims to find a place to live in New York City that resembles the area where I currently live. It should meet the following criteria: an apartment with at least 1 bedroom and a monthly rent not exceeding 3,000 USD; a location within a walking distance of 1 mile from a metro station; and amenities and venues similar to those at my current location in Ann Arbor.
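
The first two criteria can be expressed as a simple table filter. The following is a minimal sketch, assuming a hypothetical table of listings: the column names (`bedrooms`, `rent_usd`, `metro_distance_miles`) and all sample rows are invented for illustration and are not part of the project data. The third criterion (venue similarity) is not covered here.

```python
import pandas as pd

# Hypothetical listings; every value here is illustrative, not real data.
listings = pd.DataFrame({
    "neighborhood": ["A", "B", "C", "D"],
    "bedrooms": [1, 0, 2, 1],
    "rent_usd": [2800, 2500, 3400, 3000],
    "metro_distance_miles": [0.4, 0.2, 0.6, 1.3],
})

# Thresholds taken from the criteria stated in the text.
MIN_BEDROOMS = 1
MAX_RENT_USD = 3000
MAX_METRO_DISTANCE_MILES = 1.0

# Keep only the rows that satisfy all three numeric conditions.
candidates = listings[
    (listings["bedrooms"] >= MIN_BEDROOMS)
    & (listings["rent_usd"] <= MAX_RENT_USD)
    & (listings["metro_distance_miles"] <= MAX_METRO_DISTANCE_MILES)
]
print(candidates["neighborhood"].tolist())
```

In this made-up sample only listing `A` passes: `B` has no bedroom, `C` is over budget, and `D` is more than a mile from a station.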

### 2.1 Data Acquisition
    The data for this project comes from two sources. The first is a Manhattan neighborhood dataset, a CSV file that I downloaded and stored in this GitHub repository. The second is the Foursquare API, which is used to retrieve data about venues in different neighborhoods. Venues retrieved from all the neighborhoods are categorized broadly into "Arts & Entertainment", "College & University", "Event", "Food", "Nightlife Spot", "Outdoors & Recreation", etc. An extract of an API call is as follows.


In [36]:
import numpy as np # library to handle data in a vectorized manner
import time
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)


!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
from folium import plugins

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import seaborn as sns

# import k-means from clustering stage
from sklearn.cluster import KMeans



print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [37]:
# Main Street, Ann Arbor
address = 'Main Street, Ann Arbor'
# user_agent is required by recent geopy releases; the name below is an arbitrary placeholder
geolocator = Nominatim(user_agent='ann_arbor_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the Ann Arbor home are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the Ann Arbor home are 42.278316, -83.446068.


In [38]:
neighborhood_latitude = 42.278316
neighborhood_longitude = -83.446068

In [39]:
# @hidden_cell
CLIENT_ID = 'PCIYT1TURHZH5PWRZQEW5OHMWG4FDECALQL52YLVKBUCSHUU' # your Foursquare ID
CLIENT_SECRET = 'ESCH1WQI5P2SR1KRGUWQARYIEJONJXK3JXB4XDOM0SBEAKGR' # your Foursquare Secret
VERSION = '20200114' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: PCIYT1TURHZH5PWRZQEW5OHMWG4FDECALQL52YLVKBUCSHUU
CLIENT_SECRET:ESCH1WQI5P2SR1KRGUWQARYIEJONJXK3JXB4XDOM0SBEAKGR


In [45]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=PCIYT1TURHZH5PWRZQEW5OHMWG4FDECALQL52YLVKBUCSHUU&client_secret=ESCH1WQI5P2SR1KRGUWQARYIEJONJXK3JXB4XDOM0SBEAKGR&v=20200114&ll=42.278316,-83.446068&radius=500&limit=100'
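
The same request URL can also be assembled from a parameter dictionary, which avoids hand-formatting the query string. This is a minimal sketch rather than the notebook's method: the helper name `build_explore_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions introduced here, and the placeholder defaults stand in for real credentials.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, radius=500, limit=100, version="20200114"):
    """Build the explore URL; credentials come from (hypothetical) env vars."""
    params = {
        # Assumed variable names; the defaults are placeholders, not real keys.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the "ll" value unescaped, as in the URL above.
    return BASE_URL + "?" + urlencode(params, safe=",")

explore_url = build_explore_url(42.278316, -83.446068)
print(explore_url)
```

Reading the keys from the environment keeps them out of the notebook output, which can matter when the notebook is stored in a public repository.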

In [46]:
# results display is hidden for report simplification 
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e1e7d121835dd001b1f42ca'},
 'response': {'headerLocation': 'Canton',
  'headerFullLocation': 'Canton',
  'headerLocationGranularity': 'city',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 42.2828160045, 'lng': -83.43999732964825},
   'sw': {'lat': 42.273815995499994, 'lng': -83.45213867035174}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '514213c23d7ca5b4135a9d9a',
       'name': "Jimmy John's",
       'location': {'address': '41439 Michigan Ave',
        'crossStreet': 'at Haggerty',
        'lat': 42.27810943782562,
        'lng': -83.44768015763326,
        'labeledLatLngs': [{'label': 'display',
          'lat': 42.27810943782562,
          'lng': -83.44768015763326}],
        'distance': 134,
        'postalCode':

### Function that extracts the category of the venue - from Foursquare lab

In [47]:
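def get_category_type(row):
    # the category list lives under 'categories' or, after flattening, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        # keep only the name of the first (primary) category
        return categories_list[0]['name']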
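
To illustrate how this helper behaves, here is a small sketch that calls it on hand-written rows. The function is restated so the block runs on its own; the sample rows are plain dictionaries made up for illustration, whereas the notebook applies the function to DataFrame rows.

```python
def get_category_type(row):
    # Look for the category list under either key name.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Illustrative rows (not API output): one per key name, plus an empty list.
row_plain = {'categories': [{'name': 'Pizza Place'}]}
row_flattened = {'venue.categories': [{'name': 'Sandwich Place'}]}
row_empty = {'venue.categories': []}

print(get_category_type(row_plain))      # Pizza Place
print(get_category_type(row_flattened))  # Sandwich Place
print(get_category_type(row_empty))      # None
```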

In [48]:

venues = results['response']['groups'][0]['items']
AAnearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
AAnearby_venues = AAnearby_venues.loc[:, filtered_columns]
# filter the category for each row
AAnearby_venues['venue.categories'] = AAnearby_venues.apply(get_category_type, axis=1)
# clean columns
AAnearby_venues.columns = [col.split(".")[-1] for col in AAnearby_venues.columns]

AAnearby_venues.shape

(4, 4)
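
The dotted column names used in `filtered_columns` come from the flattening step. The sketch below shows this on a single hand-trimmed item that keeps only the fields used above; it assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`.

```python
import pandas as pd

# One item trimmed by hand to the fields the notebook keeps.
sample_items = [
    {
        "venue": {
            "name": "Jimmy John's",
            "categories": [{"name": "Sandwich Place"}],
            "location": {"lat": 42.27810943782562, "lng": -83.44768015763326},
        }
    }
]

# Nested dictionaries become dot-separated column names; lists are kept as values.
flat = pd.json_normalize(sample_items)
print(sorted(flat.columns))
print(flat.shape)
```

Nested keys such as `venue` → `location` → `lat` become the single column `venue.location.lat`, which is why the later cleanup step splits each column name on `.` and keeps the last part.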

In [50]:
AAnearby_venues

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Jimmy John's | Sandwich Place | 42.278109 | -83.44768 |
| 1 | Leos Coney Island | Food & Drink Shop | 42.278966 | -83.449362 |
| 2 | Schwan's Consumer Brands | Food | 42.275358 | -83.446662 |
| 3 | Kraft Pizza | Pizza Place | 42.275185 | -83.448583 |
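
As a sanity check on these coordinates, the straight-line distance from the query point to a venue can be recomputed with the haversine formula. This is a minimal sketch, not part of the original notebook: the function name `haversine_m` is introduced here, and the Earth radius is the usual mean-radius approximation. The API response above reports `'distance': 134` for Jimmy John's, so the result should land close to that value.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Query point used in the notebook and the venue location from the response.
query = (42.278316, -83.446068)
jimmy_johns = (42.27810943782562, -83.44768015763326)

distance_m = haversine_m(*query, *jimmy_johns)
print(round(distance_m))
```

The same function could be used to check the 1-mile (about 1,609 m) metro-station criterion, keeping in mind that straight-line distance understates actual walking distance.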
