# Introduction/Business Problem:
John is the COO of Regal Cinemas and is tasked with analyzing locations for the company's next movie theater. John must report his findings to the Board of Directors, so the Board is both the audience and the stakeholder for this data science report. John has narrowed his options to Manhattan and Boston. He must analyze market saturation in these locations using Foursquare API data to determine whether a new movie theater would be a winning or losing venture. To compare Manhattan and Boston, I will create dataframes from Foursquare API data showing the number of existing movie theaters within a set distance of a reference location in each city.

## 1. Import and analyze data for Manhattan and Boston using Foursquare

In [1]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# library function for transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


### Foursquare API setup

In [184]:
CLIENT_ID = '01KIWQCAD1RX3XL3CMLS34D0KV00X4MGJHW1YKKQZYZVJSIV' # your Foursquare ID
CLIENT_SECRET = 'X0YCUYOZLATPQMFUP3AEQNEO0HBIPD4FTDBZYNKXGCVQ0Z4G' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 01KIWQCAD1RX3XL3CMLS34D0KV00X4MGJHW1YKKQZYZVJSIV
CLIENT_SECRET: X0YCUYOZLATPQMFUP3AEQNEO0HBIPD4FTDBZYNKXGCVQ0Z4G


### NYC lat and long

In [3]:
address = '102 North End Ave, New York, NY'

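geolocator = Nominatim(user_agent="theater_location_analysis") # arbitrary app name; newer geopy versions require a user_agent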
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)



40.7149555 -74.0153365


In [187]:
cat_url = 'https://api.foursquare.com/v2/venues/explore?&client_id=01KIWQCAD1RX3XL3CMLS34D0KV00X4MGJHW1YKKQZYZVJSIV&client_secret=X0YCUYOZLATPQMFUP3AEQNEO0HBIPD4FTDBZYNKXGCVQ0Z4G&categoryId=4bf58dd8d48988d17f941735&ll=40.7149555,-74.0153365&radius=500&limit=50&v=20181010'
categor = requests.get(cat_url).json()
#categor
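The request URL above is written out by hand, which makes it easy to mistype a parameter when switching cities. As a minimal sketch, assuming the same endpoint and query parameters as that URL, the string could be assembled from named values instead. The helper name `build_explore_url` and the placeholder credentials are hypothetical and not part of the original notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"
THEATER_CATEGORY_ID = "4bf58dd8d48988d17f941735"  # category ID copied from the request above


def build_explore_url(client_id, client_secret, lat, lng,
                      category_id=THEATER_CATEGORY_ID,
                      radius=500, limit=50, version="20181010"):
    """Assemble an explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "categoryId": category_id,
        "ll": "{},{}".format(lat, lng),  # urlencode percent-encodes the comma as %2C
        "radius": radius,
        "limit": limit,
        "v": version,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; substitute CLIENT_ID and CLIENT_SECRET in the notebook
url = build_explore_url("YOUR_ID", "YOUR_SECRET", 40.7149555, -74.0153365)
print(url)
```

Building the query from a dict keeps the coordinates, radius, and category in one place, so the Manhattan and Boston requests would differ only in the arguments passed.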

All the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab.

In [188]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
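To make the function's behavior concrete, here is a small sketch that calls it on hand-made rows. Plain dicts stand in for DataFrame rows, and the sample data is illustrative rather than taken from an API response. The function body is repeated only so the snippet runs on its own.

```python
def get_category_type(row):
    """Same logic as the function above, repeated so this snippet is self-contained."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Illustrative rows: one venue with a category, one with an empty category list
row_with_category = {'venue.categories': [{'name': 'Movie Theater'}]}
row_without_category = {'venue.categories': []}

first = get_category_type(row_with_category)
second = get_category_type(row_without_category)
print(first)   # Movie Theater
print(second)  # None
```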

Now we are ready to clean the json and structure it into a pandas dataframe.

In [189]:
venues = categor['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.shape


(5, 4)

### Verify that Pandas DF contains Movie Theaters

In [183]:
nearby_venues

|   | name | categories | lat | lng |
|---|------|------------|-----|-----|
| 0 | Regal Cinemas Battery Park 11 | Movie Theater | 40.714683 | -74.015023 |
| 1 | Express DVD | Movie Theater | 40.717675 | -74.015639 |
| 2 | Tribeca Performing Arts Center | Performing Arts Venue | 40.717594 | -74.012198 |
| 3 | Brookfield Place - BFPL | Shopping Mall | 40.713204 | -74.015619 |
| 4 | Online Theatre | Indie Movie Theater | 40.712696 | -74.012262 |


### How many Movie Theaters exist within 500m of 102 North End Ave in NYC

In [190]:
print('{} theaters returned by Foursquare.'.format(nearby_venues.shape[0]))

5 theaters returned by Foursquare.
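The count above includes every venue the API returned, and the results listed earlier show categories other than movie theaters. As a rough sketch, the rows can be tallied by category before counting. The venue names and categories are copied from the Manhattan results; treating any category that contains "Movie Theater" as a theater is an assumption made for illustration.

```python
from collections import Counter

# (name, category) pairs copied from the Manhattan results above
venues = [
    ("Regal Cinemas Battery Park 11", "Movie Theater"),
    ("Express DVD", "Movie Theater"),
    ("Tribeca Performing Arts Center", "Performing Arts Venue"),
    ("Brookfield Place - BFPL", "Shopping Mall"),
    ("Online Theatre", "Indie Movie Theater"),
]

# Tally how many venues fall into each category
category_counts = Counter(category for _, category in venues)

# Assumption: a category counts as a theater if it contains "Movie Theater"
theaters = [name for name, category in venues if "Movie Theater" in category]

print(category_counts)
print("{} of {} venues are movie theaters".format(len(theaters), len(venues)))
```

Under that assumption, the number of competing theaters is smaller than the raw row count, which matters when comparing saturation between cities.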


### Boston lat and long

In [174]:
latitudeT = 42.3601
longitudeT = -71.0589 # west longitude is negative, matching the request URL below
print(latitudeT, longitudeT)

42.3601 -71.0589
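Boston lies west of the prime meridian, so its longitude is negative, and the request URL in this notebook uses -71.0589. A quick sanity check on coordinates is to compute great-circle distances with the haversine formula. The sketch below uses a hypothetical helper `haversine_m` and a mean Earth radius of 6,371 km, and takes one venue position from the Boston results in this notebook.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


query_lat, query_lng = 42.3601, -71.0589     # coordinates used in the request URL
venue_lat, venue_lng = 42.358544, -71.059377 # "French Library" from the Boston results

# Distance from the query point to a returned venue; should fall inside the 500 m radius
to_venue = haversine_m(query_lat, query_lng, venue_lat, venue_lng)

# Distance between the query point and the same point with the longitude sign flipped
sign_error = haversine_m(query_lat, query_lng, query_lat, -query_lng)

print("venue distance: {:.0f} m".format(to_venue))
print("offset caused by a flipped longitude sign: {:.0f} km".format(sign_error / 1000))
```

A dropped minus sign moves the point thousands of kilometres away, so a check like this catches it before any API request is made.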


Create the RESTful API URL

In [175]:
urlT = 'https://api.foursquare.com/v2/venues/explore?&client_id=01KIWQCAD1RX3XL3CMLS34D0KV00X4MGJHW1YKKQZYZVJSIV&client_secret=X0YCUYOZLATPQMFUP3AEQNEO0HBIPD4FTDBZYNKXGCVQ0Z4G&categoryId=4bf58dd8d48988d17f941735&ll=42.3601,-71.0589&radius=500&v=20181010'
categor2 = requests.get(urlT).json()
#categor2

In [176]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_listT = row['categories']
    except KeyError: # fall back to the flattened column name
        categories_listT = row['venue.categories']
        
    if len(categories_listT) == 0:
        return None
    else:
        return categories_listT[0]['name']

In [177]:
venuesT = categor2['response']['groups'][0]['items']
    
nearby_venuesT = json_normalize(venuesT) # flatten JSON

# filter columns
filtered_columnsT = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venuesT = nearby_venuesT.loc[:, filtered_columnsT]

# filter the category for each row
nearby_venuesT['venue.categories'] = nearby_venuesT.apply(get_category_type, axis=1)

# clean columns
nearby_venuesT.columns = [col.split(".")[-1] for col in nearby_venuesT.columns]

nearby_venuesT.shape

(3, 4)

### Verify that Pandas DF contains Movie Theaters

In [178]:
nearby_venuesT

|   | name | categories | lat | lng |
|---|------|------------|-----|-----|
| 0 | French Library | Movie Theater | 42.358544 | -71.059377 |
| 1 | Spirt Stern Hall | Movie Theater | 42.357577 | -71.056338 |
| 2 | Video Cinema | Movie Theater | 42.3634 | -71.055801 |


### How many Movie Theaters exist within 500m of the Boston coordinates

In [191]:
print('{} theaters returned by Foursquare.'.format(nearby_venuesT.shape[0]))

3 theaters returned by Foursquare.
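One simple way to put the two counts on a common footing is to divide each by the area of the search circle. The sketch below does this for the 500 m radius used in both requests. It is only an illustration of the arithmetic: the counts are the raw numbers reported by the notebook, and venues per square kilometre is an assumed saturation measure, not one defined in the original analysis.

```python
import math

RADIUS_M = 500  # search radius used in both requests

# Raw venue counts reported in this notebook for each city
counts = {"Manhattan": 5, "Boston": 3}

# Area of the circular search region in square kilometres
area_km2 = math.pi * (RADIUS_M / 1000) ** 2

# Venues per square kilometre for each city
density = {city: n / area_km2 for city, n in counts.items()}

for city, value in density.items():
    print("{}: {:.2f} venues per km^2".format(city, value))
```

Because both searches use the same radius, the density ranking matches the raw counts; the normalization becomes useful once different radii are compared.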
