# Capstone Project - The Battle of Neighborhoods (Week 2)

## 1. A description of the problem and a discussion of the background

#### Background

It's common for people to move in search of new experiences, opportunities, and change. That whole thing about finding yourself? Relocating can help with that discovery. But the grass isn't always greener on the other side. Moving to a new city should be a well-thought-out decision rather than a rash one, in which you truly consider the things you like in the new city.

#### Business Problem

Our client is a world-renowned pizza connoisseur who reviews pizza places around the world. His next destination is the United States, a country he has never visited before. He would like to live in a city with a high density of pizza places so that he can find the best pizza that city has to offer. To solve this problem, we will analyze pizza place locations in major US cities and determine which city would be the best base for our client's reviews.

## 2. Data: A description of the data and how it will be used to solve the problem

To address this problem, we will use the Foursquare API to collect the locations of pizza places in the five major US cities listed below:

<ol>
  <li>New York, NY</li>
  <li>San Francisco, CA</li>
  <li>Jersey City, NJ</li>
  <li>Boston, MA</li>
  <li>Chicago, IL</li>
</ol>

These cities are some of the most populated in the United States, so they are likely to have a high density of pizza places for our client to review.

We will use the Foursquare API to determine which of these five cities has the highest volume of pizza places. This lets us narrow down which city has the most pizza places and is therefore best suited for our client to live in.

## 3. Methodology section

#### Collecting Data

In [1]:
!conda install -c conda-forge folium=0.5.0

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [2]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis and data manipulation
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [3]:
import os # read credentials from the environment instead of hard-coding them

# The environment variable names below are placeholders; use whatever your setup defines.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20200301' # Foursquare API version

print('Foursquare credentials imported')

Foursquare credentials imported


In [4]:
# Call data from Foursquare API
LIMIT = 500
cities = ["New York, NY", 'Chicago, IL', 'San Francisco, CA', 'Jersey City, NJ', 'Boston, MA']
results = {}
for city in cities:
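    # Pass the query parameters via `params` so that requests URL-encodes them
    # (for example, the space and comma in "New York, NY").
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        'categoryId': '4bf58dd8d48988d1ca941735',  # category ID used for pizza places
    }
    results[city] = requests.get(url, params=params).json()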

In [5]:
df_venues={}
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    # Take a copy so that renaming the columns below does not act on a view of `venues`
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']].copy()
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']

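We request up to 500 pizza venues per city (`LIMIT = 500`), but a single response may contain fewer venues than the `totalResults` count it reports; the maps below show at most the top 100 per city.
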
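
When `totalResults` is larger than the number of venues returned in one response, one option is to request the venues in pages. The sketch below only computes the paging parameters and makes no network calls. It assumes the endpoint accepts an `offset` parameter alongside `limit`, which should be verified against the Foursquare documentation; `page_offsets` and `build_page_params` are hypothetical helper names, not part of the notebook.

```python
def page_offsets(total_results, page_size=50, max_results=None):
    """Return the offsets needed to page through `total_results` items."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    wanted = total_results if max_results is None else min(total_results, max_results)
    return list(range(0, max(wanted, 0), page_size))


def build_page_params(base_params, offset, page_size=50):
    """Copy the shared query parameters and add the paging fields."""
    params = dict(base_params)      # leave the caller's dict untouched
    params["limit"] = page_size
    params["offset"] = offset       # assumed paging parameter; check the API docs
    return params


# Example with the New York total reported in this notebook (253 venues).
offsets = page_offsets(253, page_size=50)
print(offsets)  # [0, 50, 100, 150, 200, 250]
```

Each set of parameters could then be passed to `requests.get(url, params=...)` and the resulting items concatenated into one DataFrame per city.
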
#### Modeling

In [6]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of pizza places in {city} = ", results[city]['response']['totalResults'])
    print("Showing Top 100")

Total number of pizza places in New York, NY =  253
Showing Top 100
Total number of pizza places in Chicago, IL =  216
Showing Top 100
Total number of pizza places in San Francisco, CA =  169
Showing Top 100
Total number of pizza places in Jersey City, NJ =  128
Showing Top 100
Total number of pizza places in Boston, MA =  187
Showing Top 100


In [7]:
maps[cities[0]]

In [8]:
maps[cities[1]]

In [9]:
maps[cities[2]]

In [10]:
maps[cities[3]]

In [11]:
maps[cities[4]]

From the Foursquare data and the maps of pizza place locations in the five cities, we can see that New York City and Jersey City appear to have the highest density of pizza places.

To measure this density quantitatively, we use a simple statistical approach. First, we compute the mean coordinate of the pizza places in each city. If the venues are densely clustered, most of them lie close to this point; if they are spread out, many lie far from it. Then we take the average distance from each venue to the mean coordinate, measured in degrees of latitude and longitude.
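
The measure described above can be sketched on a small set of made-up points. This is a minimal illustration, not the notebook's exact cell: `mean_distance_to_centroid` is a hypothetical helper name, and the four points are arbitrary values chosen so the result is easy to check by hand.

```python
import numpy as np


def mean_distance_to_centroid(coords):
    """Return (centroid, mean Euclidean distance of the points to the centroid).

    `coords` is a sequence of (lat, lng) pairs; distances are in the same
    units as the inputs (degrees when raw coordinates are used).
    """
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)                          # mean lat, mean lng
    distances = np.linalg.norm(coords - centroid, axis=1)   # one distance per point
    return centroid, distances.mean()


# Four corners of a 2 x 2 square: the centroid is (1, 1) and every
# point lies sqrt(2) away from it.
points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
centroid, spread = mean_distance_to_centroid(points)
print(centroid, spread)
```

A smaller value means the points sit closer to their centre. Because the distance is taken directly on degrees, a degree of longitude counts the same as a degree of latitude, which is only an approximation of ground distance.
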

In [12]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="green", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

    print(city)
    print("Mean Distance from Mean coordinates")
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)))

New York, NY
Mean Distance from Mean coordinates
0.021863756439890622
Chicago, IL
Mean Distance from Mean coordinates
0.05899275452529534
San Francisco, CA
Mean Distance from Mean coordinates
0.029482815390114707
Jersey City, NJ
Mean Distance from Mean coordinates
0.019382806123706672
Boston, MA
Mean Distance from Mean coordinates
0.03585525093215624


In [13]:
maps[cities[0]]

In [14]:
maps[cities[1]]

In [15]:
maps[cities[2]]

In [16]:
maps[cities[3]]

In [17]:
maps[cities[4]]

### Conclusion

Examining the mean distance of the pizza places from their mean coordinates, we find the tightest clustering in Jersey City (about 0.019 degrees) and New York City (about 0.022 degrees). Since New York City is nearly as compact while reporting roughly twice as many pizza places (253 versus 128), we conclude that New York City has the highest density of pizza places within a small radius. Chicago and Boston, on the other hand, have their pizza places spread further apart, as their larger mean distances from the mean coordinates show. Thus, our pizza connoisseur client would make the most of his time reviewing pizza places in New York City.
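
For reference, the mean distances printed in the Modeling section can be ranked directly. The values below are copied from that output; the variable names are only illustrative.

```python
# Mean distance (in degrees) of the venues from their mean coordinate,
# as printed in the Modeling section.
mean_distance_deg = {
    "New York, NY": 0.021863756439890622,
    "Chicago, IL": 0.05899275452529534,
    "San Francisco, CA": 0.029482815390114707,
    "Jersey City, NJ": 0.019382806123706672,
    "Boston, MA": 0.03585525093215624,
}

# Sort from most tightly clustered (smallest distance) to most spread out.
ranked = sorted(mean_distance_deg.items(), key=lambda item: item[1])

for city, dist in ranked:
    print(f"{city}: {dist:.4f}")
# Jersey City, NJ: 0.0194
# New York, NY: 0.0219
# San Francisco, CA: 0.0295
# Boston, MA: 0.0359
# Chicago, IL: 0.0590
```
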

Because New York City has a higher density of pizza places than the other four cities, the client will be able to review more of them in the short time he has on his trip. Moreover, the client's best option would be to book a hotel near the mean coordinates of the New York venues, which would give him access to the 100+ pizza places that the city has to offer for review.
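
To judge how far a hotel would actually be from the venues, the degree-based distances can be replaced with great-circle distances in kilometres. The sketch below uses the haversine formula, which is a substitute for the Euclidean distance on raw coordinates used in the notebook. `haversine_km` and `mean_distance_km` are hypothetical helper names, and the coordinates in the example are illustrative values, not taken from the Foursquare data.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def mean_distance_km(origin, venue_coords):
    """Mean great-circle distance from `origin` (lat, lng) to each venue."""
    venue_coords = np.asarray(venue_coords, dtype=float)
    d = haversine_km(origin[0], origin[1], venue_coords[:, 0], venue_coords[:, 1])
    return float(d.mean())


# Illustrative coordinates only: a candidate hotel and three nearby venues.
hotel = (40.75, -73.99)
venues = [(40.76, -73.98), (40.74, -74.00), (40.73, -73.99)]
print(f"Mean distance to venues: {mean_distance_km(hotel, venues):.2f} km")
```

With the real data, `venues` could be taken from `df_venues[city][['Lat', 'Lng']].values` and `hotel` set to the mean coordinate computed for that city.
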
