# The Battle of the Neighborhoods

### Problem background

New York City is the most populous city in the United States. It is diverse, multicultural, and the financial capital of the USA. It offers many business opportunities and a business-friendly environment, which has attracted many different players into the market. It is a global hub of business and commerce: the city is a major center for banking and finance, retailing, world trade, transportation, tourism, real estate, new media, traditional media, advertising, legal services, accountancy, insurance, theater, fashion, and the arts in the United States.

This also means that the market is highly competitive. Because it is a highly developed city, the cost of doing business is also among the highest. Any new business venture or expansion therefore needs to be analysed carefully. The insights derived from the analysis give a good understanding of the business environment, which helps in targeting the market strategically. This reduces risk and makes a reasonable return on investment more likely.

### Problem Description

Throughout its history, New York City has been a major point of entry for immigrants; the term “melting pot” was coined to describe densely populated immigrant neighbourhoods on the Lower East Side. As many as 800 languages are spoken in New York, making it the most linguistically diverse city in the world. English remains the most widely spoken language, although there are areas in the outer boroughs in which up to 25% of people speak English as an alternate language, and/or have limited or no English language fluency. English is least spoken in neighbourhoods such as Flushing, Sunset Park, and Corona.

With New York's diverse culture comes diverse food. There are many restaurants in New York City, belonging to categories such as Chinese, Italian, Indian, and French. As part of this project, we will list and visualise the major parts of New York City that have great Italian restaurants.


### Target Audience

To recommend the correct location, AH Food Company Ltd has appointed me to lead the Data Science team. The objective is to identify and recommend to management which neighborhood of New York City would be the best choice for opening an Italian restaurant.


## Data

For this project, we will be using the following Data:

1) New York City data containing the list of boroughs and neighborhoods along with their latitude and longitude, available at: https://cocl.us/new_york_dataset

2) The FourSquare API, used to find and filter all the Italian restaurants and to identify the boroughs and neighborhoods that have the fewest Italian restaurants

3) Borough boundaries data for drawing maps, found at https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm

4) Demographic data, used to find the boroughs and neighborhoods with the largest population, at https://en.wikipedia.org/wiki/Demographics_of_New_York_City

### Approach

1) Collect the details needed for the analysis, for example restaurant ratings and the population density of each borough/neighborhood

2) Using the New York data and the FourSquare API, create a dataframe of all the Italian restaurants

3) Visualize the data using folium

4) Determine the best place or places to open a restaurant

### Methodology

#### Step 1: Import necessary libraries

In [None]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import os
%pip install folium
import folium 
%pip install geopy 
from geopy.geocoders import Nominatim 
%pip install wget
import wget
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline
import seaborn as sns

print('Done!')

In [1]:
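CLIENT_ID = 'YOUR_CLIENT_ID'          # placeholder: your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # placeholder: your Foursquare client secret
VERSION = 'YYYYMMDD'                  # placeholder: Foursquare API version date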
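
The credentials in the cell above are placeholders. One possible way to keep them out of the notebook is to read them from environment variables. The sketch below is only an illustration: the variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_VERSION` and the helper `load_foursquare_credentials` are assumptions, not something the Foursquare API prescribes.

```python
import os

# Hypothetical environment variable names, chosen for this example only.
REQUIRED_KEYS = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET", "FOURSQUARE_VERSION")


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret, version) read from a mapping.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # Fail early with a clear message instead of sending a broken request.
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[key] for key in REQUIRED_KEYS)
```

In the notebook this could be used as `CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials()`, once the three variables have been set in the shell that starts Jupyter.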

#### Step 2 : Making the New York DF with boroughs, neighborhoods and Coordinates

In [None]:
wget.download('https://cocl.us/new_york_dataset', 'newyork_data.json')
print('Data downloaded!')

In [None]:
import json
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [None]:
ny_data = newyork_data['features']

In [None]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in ny_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

new_york_data = pd.DataFrame(rows, columns=column_names)
new_york_data.head()

In [None]:
new_york_data.shape

In [None]:
print('New York City has {} boroughs and {} neighborhoods.'.format(
        len(new_york_data['Borough'].unique()),
        new_york_data.shape[0]
    )
)

Let us try to visualize this.

In [None]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

In [None]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(new_york_data['Latitude'], new_york_data['Longitude'], new_york_data['Borough'], new_york_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

#### Step 3 : Getting the Italian restaurant details into a dataframe

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT=100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,  
            v['venue']['name'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Venue',
                  'Venue Category']
    
    return(nearby_venues)
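
The list comprehension in `getNearbyVenues` assumes every venue carries at least one category and that the response always contains a group. As a hedged sketch, the parsing step could be pulled into a small helper that tolerates missing fields. The helper name `extract_venues` and the sample payload are hypothetical; the payload is hand-written to mirror the keys used above and is not real API output.

```python
def extract_venues(neighborhood, response_json):
    """Flatten an explore-style payload into (neighborhood, venue, category) rows.

    Missing groups, items or categories produce empty results or None
    instead of raising IndexError/KeyError.
    """
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        category = categories[0].get("name") if categories else None
        rows.append((neighborhood, venue.get("name"), category))
    return rows


# Hand-written payload for illustration only.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Trattoria",
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"name": "Uncategorised Place", "categories": []}},
]}]}}

rows = extract_venues("Example Neighborhood", sample)
```

Rows whose category is `None` simply drop out later when the dataframe is filtered on `'Venue Category'`.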

In [None]:
ny_venues = getNearbyVenues(names=new_york_data['Neighborhood'],latitudes=new_york_data['Latitude'],longitudes=new_york_data['Longitude'])

In [None]:
ny_ir = ny_venues[ny_venues['Venue Category']=='Italian Restaurant']
ny_ir.head().reset_index(drop=True)

In [None]:
ny_ir.shape

We can see that there are 314 Italian restaurants in NYC. Let us try to visualize this.

In [None]:
#to merge ny_ir and new_york_data
ny_data1 = pd.merge(new_york_data,ny_ir,on=['Neighborhood'])
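
A point worth checking with this merge: it joins on `Neighborhood` alone, so if two boroughs share a neighborhood name, each venue found there would be attached to both boroughs and the per-borough counts would be inflated. The toy example below illustrates the effect with made-up names; it assumes the venue table also carries a `Borough` column, which `getNearbyVenues` does not currently return.

```python
import pandas as pd

# Made-up data: the same neighborhood name appears in two boroughs.
neighborhoods = pd.DataFrame({
    "Borough": ["Borough A", "Borough B"],
    "Neighborhood": ["Sharedname", "Sharedname"],
})
venues = pd.DataFrame({
    "Borough": ["Borough A"],
    "Neighborhood": ["Sharedname"],
    "Venue": ["Example Trattoria"],
})

# Joining on the name alone matches the venue to both boroughs.
by_name = pd.merge(neighborhoods, venues.drop(columns="Borough"), on="Neighborhood")

# Joining on borough and name keeps the venue in its own borough only.
by_both = pd.merge(neighborhoods, venues, on=["Borough", "Neighborhood"])

print(len(by_name), len(by_both))
```

Comparing `new_york_data['Neighborhood'].nunique()` with `new_york_data.shape[0]` is a quick way to see whether this applies to the dataset at hand.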

In [None]:
ny_data1.head()

In [None]:
# dropna() returns a new dataframe, so assign the result back
ny_data1 = ny_data1.dropna()
ny_data1.shape

In [None]:
ny_data1.groupby('Borough')['Venue'].count()

In [None]:
ny_data1.groupby('Borough')['Venue'].count().plot.bar(figsize=(10,5), color = '#9400D3')
plt.title('Italian Restaurants per Borough: NYC', fontsize = 20)
plt.xlabel('Borough', fontsize = 15)
plt.ylabel('No. of Italian Restaurants', fontsize=15)
plt.xticks(rotation = 'horizontal')
plt.show()

Let us also see the top neighborhoods in terms of number of Italian Restaurants

In [None]:
Num = 10 # number of top neighborhoods to plot; counts are all the same past the top 6
ny_data1.groupby('Neighborhood')['Venue'].count().nlargest(Num).plot.bar(figsize=(20,5), color='#9400D3')
plt.title('Italian Restaurants per Neighborhood: NYC', fontsize = 20)
plt.xlabel('Neighborhood', fontsize = 15)
plt.ylabel('Italian Restaurants', fontsize=15)
plt.xticks(rotation = 'horizontal')
plt.show()

##### Inference 1 : Manhattan is the borough with the most Italian restaurants (128), and Belmont, Bronx is the neighborhood with the most (18), although the Bronx itself has only 39 Italian restaurants. Queens has the second fewest, with 43.

#### Step 4 : Get the population data for each borough so that we can select a borough

In [None]:
url = 'https://en.wikipedia.org/wiki/Demographics_of_New_York_City'
demo=requests.get(url).text
from bs4 import BeautifulSoup
soup = BeautifulSoup(demo, 'html.parser')

print(soup.prettify())

In [None]:
table = soup.find( "table", {"class":"wikitable sortable"} )
table

In [None]:
def tableDataText(table):    
    """Parses a html segment started with tag <table> followed 
    by multiple <tr> (table rows) and inner <td> (table data) tags. 
    It returns a list of rows with inner columns. 
    Accepts only one <th> (table header/data) in the first row.
    """
    def rowgetDataText(tr, coltag='td'): # td (data) or th (header)       
        return [td.get_text(strip=True) for td in tr.find_all(coltag)]  
    rows = []
    trs = table.find_all('tr')
    headerow = rowgetDataText(trs[0], 'th')
    if headerow: # if there is a header row include first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs: # for every table row
        rows.append(rowgetDataText(tr, 'td') ) # data row       
    return rows

In [None]:
list_table = tableDataText(table)
list_table

In [None]:
dftable = pd.DataFrame(list_table[3:8],columns=['Borough','County','Estimate','billions','per capita','sqmiles','sqkm','persons/sqmiles','persons/sqkm'])
dftable

In [None]:
dftable1 = dftable[['Borough','Estimate','per capita']]
dftable1

In [None]:
# rename on a copy instead of in place to avoid a SettingWithCopyWarning
dftable1 = dftable1.rename(columns={'Estimate':'Population'})
dftable1

##### Inference 2 : We will select Queens as the borough for the new Italian restaurant because it has the second-highest population and per capita income, which suggests that many people can afford to eat out. (From Inference 1: Queens has the second-fewest Italian restaurants.)
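
The reasoning behind Inference 2 combines restaurant counts with population. One hedged way to make that comparison explicit is to compute Italian restaurants per 100,000 residents for each borough. The helpers `parse_count` and `restaurants_per_100k` are hypothetical, and the numbers below are placeholders with generic borough names, not figures from the scraped table.

```python
def parse_count(text):
    """Convert a scraped figure such as '1,234,567' to an int."""
    return int(text.replace(",", "").strip())


def restaurants_per_100k(restaurant_counts, populations):
    """Return restaurants per 100,000 residents for each borough present in both inputs."""
    return {
        borough: 100_000 * restaurant_counts[borough] / populations[borough]
        for borough in restaurant_counts
        if populations.get(borough)
    }


# PLACEHOLDER inputs for illustration; replace with the real counts and
# the 'Population' column of dftable1.
counts = {"Borough A": 10, "Borough B": 30}
populations = {"Borough A": parse_count("200,000"), "Borough B": parse_count("300,000")}

density = restaurants_per_100k(counts, populations)
```

A lower value indicates fewer Italian restaurants relative to the number of residents, which is the situation Inference 2 argues for.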

#### Step 5 : Get ratings of all Italian restaurants in Queens so that we can select a neighborhood

In [None]:
ny_data1.head()

In [None]:
ny_queens = ny_data1[ny_data1['Borough']=='Queens']
ny_queens.head().reset_index(drop=True)

In [None]:
ny_queens.shape


In [None]:
def getNearbyVenuesRatings(latitudes, longitudes, radius=500):
    LIMIT=100
    venues_list=[]

    # query the API once per coordinate pair
    for lat, lng in zip(latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            lat,
            lng, 
            radius, 
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant fields, in the same order as the column names;
        # .get() yields None when a venue has no likes, rating or tips entry
        venues_list.append([(
            v['venue']['id'],
            v['venue']['name'],
            v['venue'].get('likes', {}).get('count'),
            v['venue'].get('rating'),
            v['venue'].get('tips', {}).get('count')) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['ID','Name','Likes','Rating','Tips'])

    return nearby_venues

In [None]:
ny_rating = getNearbyVenuesRatings(ny_queens['Latitude'],ny_queens['Longitude'])

In [None]:
ny_rating.head()
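
Step 5 aims to pick a neighborhood from the ratings, but the notebook stops before that comparison. As a minimal sketch, assuming each rating has been joined back to its neighborhood name, the ratings could be averaged per neighborhood while skipping venues without a rating. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_neighborhood(rows):
    """Average the ratings per neighborhood, ignoring missing (None) ratings."""
    grouped = defaultdict(list)
    for neighborhood, rating in rows:
        if rating is not None:
            grouped[neighborhood].append(rating)
    return {name: round(mean(ratings), 2) for name, ratings in grouped.items()}


# Made-up (neighborhood, rating) pairs for illustration only.
sample_rows = [
    ("Neighborhood 1", 8.0),
    ("Neighborhood 1", 7.0),
    ("Neighborhood 2", None),
    ("Neighborhood 2", 9.0),
]

averages = mean_rating_by_neighborhood(sample_rows)
```

Whether a low or a high average makes a neighborhood more attractive is a business decision that the text leaves open.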