# Boston vs New York City
Investigation conducted by Jacob Rossner

## Introduction
The classic Boston versus New York City rivalry thriving today had its roots planted in the early 1600s. The vast difference in religious cultures of European Settlers bred hostility: the Puritan culture of those who settled Massachusetts clashed with the religious freedom promoted in present-day New York. Sporting contests have breathed life into the rivalry; including the New York Yankees meeting the Boston Red Sox in the MLB's World Series, the New England Patriots' stand-offs against the New York Giants in the NFL's Superbowl, and the illustrious history of matchups between the NBA's Boston Celtics and New York Knicks. 

There is a culture of disdain for the other city entwined in the blood of each city, so surely they must be nothing like each other? These cities are less than 250 miles apart, so what really separates them? Besides being loyal to their city, how can a person choose their side if they come from outside the competition? Let's use Foursquare's location data to see if the features of neighborhoods in Boston and New York City are different, as well as crime data to see if the cities experience different types of crime.

## Data
I will be using Foursquare location data in conjunction with a New York City crime data set and a Boston crime data set.

#### Foursquare Data
Foursquare has built up a massive amount of location data by utilizing crowd-sourced data. People use apps that are powered by Foursquare to "check-in", which effectively gives Foursquare a geospatial point and information on the feature at that point in the form of ratings, reviews, tips, etc. Foursquare claims they register 9 billion places visited a month, and that they boast a database of over 105 million places.

How can we use this data? Well perhaps we'd like to compare the availability of pizza places in both cities, we could use Foursquare data to check out the distribution of businesses in each city with "Pizza" in their name and plot them on a map. Check out process below for creating the maps of all the pizza places within 1 mile of the epicenter of each city.

First we import the necessary tools.

In [6]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    branca-0.4.1               |             py_0          26 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    certifi-2020.6.20          |   py36h9f0ad1d_0         151 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    ------------------------------------------------------------
                       

In [7]:
CLIENT_ID = 'ZFPLKJUYQ4QRIQ54EXFUZIYTDJJMLA341EIHRQW3BTVWRKKU'
CLIENT_SECRET = 'GY5KRZEPMXPQAUSYE0KR50TF4Y1MGEECFDSWG5JA4LWC55IW'
VERSION = '20180604'
LIMIT = 30

We define the coordinates of the epicenter of each city.

In [8]:
bos_lat = 42.361145
bos_lon = -71.057083
nyc_lat = 40.730610
nyc_lon = -73.935242

search_query = "Pizza"
radius = 1609.34 # 1609.34 meters in a mile

And create our url for making calls to the Foursquare servers.

In [9]:
bos_url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, bos_lat, bos_lon, VERSION, search_query, radius, LIMIT)
nyc_url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, nyc_lat, nyc_lon, VERSION, search_query, radius, LIMIT)

Here we make our request to the Foursquare servers for the json file of the features near our coordinates.

In [10]:
bos_results = requests.get(bos_url).json()
nyc_results = requests.get(nyc_url).json()
bos_results, nyc_results

({'meta': {'code': 200, 'requestId': '5f0e6c6186c2e752a01d3e2d'},
  'response': {'venues': [{'id': '4c2a2790ce3fc928e9da6f88',
     'name': "Sal's Pizza",
     'location': {'address': '150 Tremont St',
      'crossStreet': 'at West St',
      'lat': 42.35489042678564,
      'lng': -71.06351671773069,
      'labeledLatLngs': [{'label': 'display',
        'lat': 42.35489042678564,
        'lng': -71.06351671773069}],
      'distance': 874,
      'postalCode': '02111',
      'cc': 'US',
      'city': 'Boston',
      'state': 'MA',
      'country': 'United States',
      'formattedAddress': ['150 Tremont St (at West St)',
       'Boston, MA 02111',
       'United States']},
     'categories': [{'id': '4bf58dd8d48988d1ca941735',
       'name': 'Pizza Place',
       'pluralName': 'Pizza Places',
       'shortName': 'Pizza',
       'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
        'suffix': '.png'},
       'primary': True}],
     'referralId': 'v-1594781226',
  

In [11]:
# assign relevant part of JSON to venues
bos_venues = bos_results['response']['venues']
nyc_venues = nyc_results['response']['venues']

# tranform venues into a dataframe
bos_df = json_normalize(bos_venues)
nyc_df = json_normalize(nyc_venues)

And we can check out the dataframes populated by pizza venues...

In [12]:
# keep only columns that include venue name, and anything that is associated with location
bos_filtered_columns = ['name', 'categories'] + [col for col in bos_df.columns if col.startswith('location.')] + ['id']
bos_dataframe_filtered = bos_df.loc[:, bos_filtered_columns]
nyc_filtered_columns = ['name', 'categories'] + [col for col in nyc_df.columns if col.startswith('location.')] + ['id']
nyc_dataframe_filtered = nyc_df.loc[:, nyc_filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
bos_dataframe_filtered['categories'] = bos_dataframe_filtered.apply(get_category_type, axis=1)
nyc_dataframe_filtered['categories'] = nyc_dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
bos_dataframe_filtered.columns = [column.split('.')[-1] for column in bos_dataframe_filtered.columns]
nyc_dataframe_filtered.columns = [column.split('.')[-1] for column in nyc_dataframe_filtered.columns]

bos_dataframe_filtered.head(), nyc_dataframe_filtered.head()

(                         name   categories           address  cc    city  \
 0                 Sal's Pizza  Pizza Place    150 Tremont St  US  Boston   
 1        Boston Kitchen Pizza  Pizza Place       1 Stuart St  US  Boston   
 2                 Blaze Pizza  Pizza Place    123 Stuart St,  US  Boston   
 3       Caesar's Pizza & Subs  Pizza Place       34 Essex St  US  Boston   
 4  Oath Pizza - South Station  Pizza Place  700 Atlantic Ave  US  Boston   
 
          country    crossStreet  distance  \
 0  United States     at West St       874   
 1  United States  Washington St      1240   
 2  United States            NaN      1325   
 3  United States            NaN      1013   
 4  United States   at Summer St      1085   
 
                                     formattedAddress  \
 0  [150 Tremont St (at West St), Boston, MA 02111...   
 1  [1 Stuart St (Washington St), Boston, MA 02116...   
 2  [123 Stuart St,, Boston, MA 02116, United States]   
 3     [34 Essex St, Boston, M

Now lets plot the pizza places we've found on Folium maps using these data frames.

##### Pizza places within 1 mile of the epicenter of Boston

In [13]:
bos_venues_map = folium.Map(location=[bos_lat, bos_lon], zoom_start=14)

for lat, lng, label in zip(bos_dataframe_filtered.lat, bos_dataframe_filtered.lng, bos_dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(bos_venues_map)

bos_venues_map

##### Pizza places within 1 mile of the epicenter of New York City

In [15]:
nyc_venues_map = folium.Map(location=[nyc_lat, nyc_lon], zoom_start=14)

for lat, lng, label in zip(nyc_dataframe_filtered.lat, nyc_dataframe_filtered.lng, nyc_dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        popup=label,
        fill = True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(nyc_venues_map)

nyc_venues_map

#### New York City Crime Data
The New York City crime data set includes felony, misdemeanor, and violations of city laws reported to the New York City Police Department. The data comes from NYC Open Data, a program supported by the partnership of New York's civic intelligence center (know as The Mayor's Office of Data Analystics or MODA) and The New York City Department of Information Technology and Telecommunications (DoITT).  The original data set, which can be found at this link:
https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243,
contains over 108,000 observations and 35 variables. Each observation is a complaint reported to the New York City Police Department (NYPD). 

It is important to note that this data set contains only officially reported complaints to the NYPD, and does not contain any information on if the reported crime was verified as legitimate, nor if any further actions were pursued by the person(s) who reported the crime. There may also be multiple complaints stemming from the same incident; if in a single act a person assaults a homeowner and burglarizes their home, this would likely have been recorded as separate complaints: at least one for the act of assault, and one for the act of burglary.

I am using this data set as a representation of the prevalence of crime in New York City neighborhoods. The original data set contains complaints from 1919 through 2020, but I have reduced the data set to complaints reported in 2018. This was done in the interest of making comparisons to Boston crime and adhereing to maximum file sizes accomodated by Github. Our pared-down dataset contains 229 complaints.

In [20]:
nyc = pd.read_csv("https://raw.githubusercontent.com/jrossner/Coursera_Capstone/master/NYPD_crime2018.csv", error_bad_lines=False)
nyc.head()

Unnamed: 0,CMPLNT_NUM,ADDR_PCT_CD,BORO_NM,CMPLNT_FR_DT,CMPLNT_FR_TM,CMPLNT_TO_DT,CMPLNT_TO_TM,CRM_ATPT_CPTD_CD,HADEVELOPT,HOUSING_PSA,...,SUSP_SEX,TRANSIT_DISTRICT,VIC_AGE_GROUP,VIC_RACE,VIC_SEX,X_COORD_CD,Y_COORD_CD,Latitude,Longitude,Lat_Lon
0,981432964,105,QUEENS,05/24/2018,12:00:00,,,COMPLETED,,,...,U,,45-64,WHITE,F,1056840,205482,40.730388,-73.738089,"(40.73038766700005, -73.73808927199997)"
1,358551291,84,BROOKLYN,11/01/2018,08:00:00,11/30/2019,16:00:00,COMPLETED,,,...,,,25-44,WHITE,F,984377,191776,40.693066,-73.999543,"(40.69306580400007, -73.99954347299997)"
2,151755406,88,BROOKLYN,12/28/2018,17:00:00,03/31/2020,17:12:00,COMPLETED,,,...,M,,25-44,WHITE,F,992826,192283,40.694453,-73.969075,"(40.69445324900005, -73.96907507999998)"
3,625737137,41,BRONX,10/01/2018,07:00:00,11/08/2019,15:00:00,COMPLETED,,,...,,,18-24,BLACK,F,1013232,236725,40.816392,-73.895296,"(40.816391847000034, -73.89529641399997)"
4,899799765,23,MANHATTAN,04/01/2018,00:00:00,04/01/2018,00:00:00,COMPLETED,,672.0,...,U,,25-44,BLACK,F,999243,227082,40.789959,-73.945857,"(40.78995927500005, -73.94585684099997)"


In [21]:
min(nyc['CMPLNT_FR_DT']),max(nyc['CMPLNT_FR_DT']),nyc.shape

('01/01/2018', '12/29/2018', (229, 35))

#### Boston Crime Data
The Boston crime data set contains crime incident reports . The data comes from Analyze Boston, the City of Boston's open data hub, which is managed by the City of Boston's central data organization: The Citywide Analytics Team.  The original data set, which can be found at this link:
https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system/resource/12cb3883-56f5-47de-afa5-3b1cf61b257b,
contains over 496,986 observations and 17 variables. Each observation is an individual incident Boston Police Department officers responded to. It is important to note that this data set contains only incidents which were officially responded to by the Boston Police Department. Analyze Boston also clarifies that the data they provide is collected through the new crime incident report system.

I am using this data set as a representation of the prevalence of crime in Boston neighborhoods. The original data set contains complaints from 1919 through 2020, but I have reduced the data set to complaints reported in 2018. This was done in the interest of making comparisons to New York City crime and adhereing to maximum file sizes accomodated by Github. Our pared-down dataset contains 98,888 complaints.

In [23]:
bos=pd.read_csv("https://raw.githubusercontent.com/jrossner/Coursera_Capstone/master/BOS_crime2018.csv", error_bad_lines=False)
bos.head()

Unnamed: 0,INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
0,I192077559,3115,Investigate Person,INVESTIGATE PERSON,B3,468,,2018-04-30 09:00:00,2018,4,Monday,9,Part Three,HAZLETON ST,42.279971,-71.095534,"(42.27997063, -71.09553354)"
1,I192077332,619,Larceny,LARCENY ALL OTHERS,E18,496,,2018-03-06 08:00:00,2018,3,Tuesday,8,Part One,HYDE PARK AVE,42.269224,-71.120853,"(42.26922388, -71.12085347)"
2,I192076660,2629,Harassment,HARASSMENT,E5,662,,2018-10-31 12:00:00,2018,10,Wednesday,12,Part Two,PRIMROSE ST,42.290765,-71.130211,"(42.29076521, -71.13021098)"
3,I192075386,2629,Harassment,HARASSMENT,A1,96,,2018-04-09 08:43:00,2018,4,Monday,8,Part Two,ATLANTIC AVE,42.355264,-71.050988,"(42.35526402, -71.05098788)"
4,I192075335,3208,Property Lost,PROPERTY - MISSING,D4,132,,2018-01-01 00:00:00,2018,1,Monday,0,Part Three,COMMONWEALTH AVE,42.353522,-71.072838,"(42.35352153, -71.07283786)"


In [24]:
min(bos["YEAR"]),max(bos["YEAR"]),bos.shape

(2018, 2018, (98888, 17))

The difference in our New York City and Boston crime data sets is immense, and may be due to recording procedures. According to the United States Census Bureau, the population of New York City in 2018 was almost 8.4 million people, while Boston's 2018 population was less than a tenth of that: just under 695 thousand people. There remains questions as to if Boston, a city with a population size of 10% of New York City's, really had 98,000 more criminal acts in 2018 than New York City, but that will not be a focal point of my investigation. I will be comparing what crimes are more common in New York City Neighborhoods compared to Boston Neighborhoods.