PROBLEM STATEMENT:

The objective is to identify and recommend to the client the New York City neighbourhood best suited to opening a restaurant.

New York City population and demographic data. Data sources: https://en.wikipedia.org/wiki/New_York_City ; https://en.wikipedia.org/wiki/Demographics_of_New_York_City. Web scraping was used to extract NYC population density and demographic data from these Wikipedia pages.

In [1]:
import pandas as pd
import numpy as np
import requests # library to handle requests
import json # library to handle JSON files

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!pip install beautifulsoup4
from bs4 import BeautifulSoup

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

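try:
    from pandas import json_normalize # pandas >= 1.0: transform JSON into a pandas dataframe
except ImportError:
    from pandas.io.json import json_normalize # fallback for older pandas versions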

import urllib.request
from urllib.request import urlopen

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib as mp
import re
import csv
import seaborn as sns
import time
from datetime import datetime
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
%matplotlib inline


# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library


!pip install lxml
!pip install et_xmlfile

print('Libraries imported.')


Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    openssl-1.1.1g             |       h516909a_1         2.1 MB  conda-forge
    certifi-2020.6.20          |   py36h9f0ad1d_0         151 KB  conda-forge
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0          conda-forge
    geopy:           

In [2]:
source = requests.get("https://en.wikipedia.org/wiki/New_York_City").text
soup = BeautifulSoup(source, "html.parser")
My_Neighborhoods_NYC_Table = soup.find('table', class_='wikitable sortable')

In [3]:
rows = My_Neighborhoods_NYC_Table.select("tbody > tr")[3:8]

My_boroughs = []
for row in rows:
    My_borough = {}
    tds = row.select("td")
    My_borough["borough"] = tds[0].text.strip()
    My_borough["county"] = tds[1].text.strip()
    My_borough["population"] = float(tds[2].text.strip().replace(",",""))
    My_borough["gdp_billions"] = float(tds[3].text.strip().replace(",",""))
    My_borough["gdp_per_capita"] = float(tds[4].text.strip().replace(",",""))
    My_borough["land_sqm"] = float(tds[5].text.strip().replace(",","")) # land area in square miles (not square metres)
    My_borough["land_sqkm"] = float(tds[6].text.strip().replace(",","")) # land area in square km
    My_borough["persons_sqm"] = float(tds[7].text.strip().replace(",","")) # persons per square mile
    My_borough["persons_sqkm"] = float(tds[8].text.strip().replace(",","")) # persons per square km
    
    My_boroughs.append(My_borough)

print(My_boroughs)

[{'borough': 'The Bronx', 'county': 'Bronx', 'population': 1418207.0, 'gdp_billions': 42.695, 'gdp_per_capita': 30100.0, 'land_sqm': 42.1, 'land_sqkm': 109.04, 'persons_sqm': 33867.0, 'persons_sqkm': 13006.0}, {'borough': 'Brooklyn', 'county': 'Kings', 'population': 2559903.0, 'gdp_billions': 91.559, 'gdp_per_capita': 35800.0, 'land_sqm': 70.82, 'land_sqkm': 183.42, 'persons_sqm': 36147.0, 'persons_sqkm': 13957.0}, {'borough': 'Manhattan', 'county': 'New York', 'population': 1628706.0, 'gdp_billions': 600.244, 'gdp_per_capita': 368500.0, 'land_sqm': 22.83, 'land_sqkm': 59.13, 'persons_sqm': 71341.0, 'persons_sqkm': 27544.0}, {'borough': 'Queens', 'county': 'Queens', 'population': 2253858.0, 'gdp_billions': 93.31, 'gdp_per_capita': 41400.0, 'land_sqm': 108.53, 'land_sqkm': 281.09, 'persons_sqm': 20767.0, 'persons_sqkm': 8018.0}, {'borough': 'Staten Island', 'county': 'Richmond', 'population': 476143.0, 'gdp_billions': 14.514, 'gdp_per_capita': 30500.0, 'land_sqm': 58.37, 'land_sqkm': 15
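The `float(tds[i].text.strip().replace(",",""))` pattern used above raises a `ValueError` as soon as a cell contains a footnote marker or a currency symbol, which Wikipedia tables sometimes do. A minimal sketch of a more tolerant parser is shown below. The helper name `parse_number` is hypothetical and not part of the original notebook, and the sample strings are made up for illustration.

```python
import re

def parse_number(cell_text):
    """Convert table cell text such as '1,418,207' or '$30,100[a]' to a float.

    Hypothetical helper: returns None when the cell holds no number.
    """
    # Drop bracketed footnote markers such as [1] or [a].
    cleaned = re.sub(r"\[[^\]]*\]", "", cell_text)
    # Drop thousands separators, then take the first decimal number found.
    cleaned = cleaned.replace(",", "")
    match = re.search(r"-?\d+(?:\.\d+)?", cleaned)
    return float(match.group()) if match else None

# Illustrative inputs only; they are not taken from the scraped page.
for sample in [" 1,418,207 ", "$30,100[a]", "n/a"]:
    print(repr(sample), "->", parse_number(sample))
```

Returning `None` instead of raising lets the caller decide whether a missing value should stop the scrape or simply be skipped.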

In [4]:
df = pd.DataFrame(My_boroughs, columns=["borough","county", "population", "gdp_per_capita", "persons_sqkm"]) 
df.head()

         borough    county  population  gdp_per_capita  persons_sqkm
0      The Bronx     Bronx   1418207.0         30100.0       13006.0
1       Brooklyn     Kings   2559903.0         35800.0       13957.0
2      Manhattan  New York   1628706.0        368500.0       27544.0
3         Queens    Queens   2253858.0         41400.0        8018.0
4  Staten Island  Richmond    476143.0         30500.0        3150.0
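As a sanity check on the scraped figures, population divided by land area should reproduce the scraped density. The sketch below uses only values printed in the output of cell 3. Staten Island is left out because its land area is cut off in that output. The function name `density_gap` is an illustrative assumption.

```python
# Population, land area (square km) and density as printed in the cell 3 output.
# Staten Island is omitted: its land area is truncated in that output.
boroughs = {
    "The Bronx": {"population": 1418207.0, "land_sqkm": 109.04, "persons_sqkm": 13006.0},
    "Brooklyn": {"population": 2559903.0, "land_sqkm": 183.42, "persons_sqkm": 13957.0},
    "Manhattan": {"population": 1628706.0, "land_sqkm": 59.13, "persons_sqkm": 27544.0},
    "Queens": {"population": 2253858.0, "land_sqkm": 281.09, "persons_sqkm": 8018.0},
}

def density_gap(record):
    """Return computed density minus scraped density, in persons per square km."""
    return record["population"] / record["land_sqkm"] - record["persons_sqkm"]

gaps = {name: density_gap(rec) for name, rec in boroughs.items()}
for name, gap in gaps.items():
    print(f"{name}: computed - scraped = {gap:+.2f} persons per square km")
```

Gaps of less than one person per square km suggest the columns were read in the right order. A large gap would point to a shifted column index in the scraping loop.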


In [5]:
source = requests.get('https://en.wikipedia.org/wiki/Demographics_of_New_York_City').text
soup = BeautifulSoup(source, "html.parser")
My_Population_Census_Table = soup.select_one('.wikitable:nth-of-type(5)') # use a CSS selector to target the correct table; the index depends on the current page layout

My_jurisdictions = []
rows = My_Population_Census_Table.select("tbody > tr")[2:8]
for row in rows:
    My_jurisdiction = {}
    tds = row.select('td')
    My_jurisdiction["jurisdiction"] = tds[0].text.strip()
    My_jurisdiction["population_census"] = tds[1].text.strip()
    My_jurisdiction["%_white"] = float(tds[2].text.strip().replace(",",""))
    My_jurisdiction["%_black_or_african_amercian"] = float(tds[3].text.strip().replace(",",""))
    My_jurisdiction["%_Asian"] = float(tds[4].text.strip().replace(",",""))
    My_jurisdiction["%_other"] = float(tds[5].text.strip().replace(",",""))
    My_jurisdiction["%_mixed_race"] = float(tds[6].text.strip().replace(",",""))
    My_jurisdiction["%_hispanic_latino_of_other_race"] = float(tds[7].text.strip().replace(",",""))
    My_jurisdiction["%_catholic"] = float(tds[10].text.strip().replace(",",""))
    My_jurisdiction["%_jewish"] = float(tds[12].text.strip().replace(",",""))
    
    My_jurisdictions.append(My_jurisdiction)

print(My_jurisdictions)

[{'jurisdiction': 'Brooklyn', 'population_census': '2,465,326', '%_white': 41.2, '%_black_or_african_amercian': 36.4, '%_Asian': 7.5, '%_other': 10.6, '%_mixed_race': 4.3, '%_hispanic_latino_of_other_race': 19.8, '%_catholic': 4.0, '%_jewish': 8.0}, {'jurisdiction': 'Queens', 'population_census': '2,229,379', '%_white': 44.1, '%_black_or_african_amercian': 20.0, '%_Asian': 17.6, '%_other': 12.3, '%_mixed_race': 6.1, '%_hispanic_latino_of_other_race': 25.0, '%_catholic': 37.0, '%_jewish': 5.0}, {'jurisdiction': 'Manhattan', 'population_census': '1,537,195', '%_white': 54.4, '%_black_or_african_amercian': 17.4, '%_Asian': 9.4, '%_other': 14.7, '%_mixed_race': 4.1, '%_hispanic_latino_of_other_race': 27.2, '%_catholic': 11.0, '%_jewish': 9.0}, {'jurisdiction': 'Bronx', 'population_census': '1,332,650', '%_white': 29.9, '%_black_or_african_amercian': 35.6, '%_Asian': 3.0, '%_other': 25.7, '%_mixed_race': 5.8, '%_hispanic_latino_of_other_race': 48.4, '%_catholic': 14.0, '%_jewish': 5.0}, {'j

In [6]:
df = pd.DataFrame(My_jurisdictions, columns=["jurisdiction","%_white", "%_black_or_african_amercian", "%_Asian", "%_other", "%_mixed_race", "%_hispanic_latino_of_other_race"])
df.head()

    jurisdiction  %_white  %_black_or_african_amercian  %_Asian  %_other  %_mixed_race  %_hispanic_latino_of_other_race
0       Brooklyn     41.2                         36.4      7.5     10.6           4.3                             19.8
1         Queens     44.1                         20.0     17.6     12.3           6.1                             25.0
2      Manhattan     54.4                         17.4      9.4     14.7           4.1                             27.2
3          Bronx     29.9                         35.6      3.0     25.7           5.8                             48.4
4  Staten Island     77.6                          9.7      5.7      4.3           2.7                             12.1
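One way to put a number on ethnic diversity is the Simpson diversity index, 1 - sum(p_i^2), where p_i is each group's share of the population. This index is an assumption introduced here for illustration; the original analysis does not say how diversity was judged. The Hispanic/Latino column is excluded on the assumption that it overlaps the race categories, since the five race columns already sum to roughly 100% for each borough.

```python
# Race shares (percent) copied from the table above, in the order:
# white, black or African American, Asian, other, mixed race.
race_shares = {
    "Brooklyn": [41.2, 36.4, 7.5, 10.6, 4.3],
    "Queens": [44.1, 20.0, 17.6, 12.3, 6.1],
    "Manhattan": [54.4, 17.4, 9.4, 14.7, 4.1],
    "Bronx": [29.9, 35.6, 3.0, 25.7, 5.8],
    "Staten Island": [77.6, 9.7, 5.7, 4.3, 2.7],
}

def simpson_diversity(shares):
    """Return 1 - sum(p_i^2), with the shares rescaled so they sum to 1."""
    total = sum(shares)
    return 1.0 - sum((s / total) ** 2 for s in shares)

diversity = {name: simpson_diversity(s) for name, s in race_shares.items()}
for name, score in sorted(diversity.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

On these figures Queens scores highest, only narrowly ahead of the Bronx, and Staten Island scores lowest. A different index or a finer breakdown of groups could change the ordering.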


Preliminary findings indicate that:

        1.Queens is the second most populous borough in New York City (NYC), behind Brooklyn. It is also the most
          ethnically diverse borough in NYC and has the highest share of Asian residents (17.6%).
          
        2.Although Manhattan is only the third most populous borough in NYC, it has a population density of
          27,544 people per square km in the scraped data, the highest of the five boroughs. It has the
          second highest share of Asian residents (9.4%) in NYC.
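
The rankings behind these findings can be reproduced from the two tables above. The sketch below is a minimal illustration, not the notebook's own method. It joins the tables by borough name, which needs a small normalisation step because one table uses "The Bronx" and the other "Bronx". The names `normalize`, `merged` and `ranking` are hypothetical.

```python
# Values copied from the two tables above.
borough_stats = {
    "The Bronx": {"population": 1418207.0, "persons_sqkm": 13006.0},
    "Brooklyn": {"population": 2559903.0, "persons_sqkm": 13957.0},
    "Manhattan": {"population": 1628706.0, "persons_sqkm": 27544.0},
    "Queens": {"population": 2253858.0, "persons_sqkm": 8018.0},
    "Staten Island": {"population": 476143.0, "persons_sqkm": 3150.0},
}
asian_share = {"Brooklyn": 7.5, "Queens": 17.6, "Manhattan": 9.4,
               "Bronx": 3.0, "Staten Island": 5.7}

def normalize(name):
    """Strip a leading 'The ' so 'The Bronx' and 'Bronx' share one key."""
    return name[4:] if name.startswith("The ") else name

# Join the two tables on the normalised borough name.
merged = {}
for name, stats in borough_stats.items():
    key = normalize(name)
    merged[key] = dict(stats, asian_pct=asian_share[key])

def ranking(field):
    """Return borough names sorted from highest to lowest value of `field`."""
    return sorted(merged, key=lambda b: merged[b][field], reverse=True)

for field in ("population", "persons_sqkm", "asian_pct"):
    print(field, ranking(field))
```

With pandas, the same join could be done with `DataFrame.merge` after normalising the name column. Plain dictionaries are used here to keep the sketch self-contained.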
          