## Table of contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

### Introduction : Business Problem <a name="introduction"></a>

During these unprecedented times with Pandemic. Medical supplies are utmost required in most populated places in New York City. In this project we are trying to find the optimal location for opening a Pharmacy in the borough of Brooklyn.

Since the borough has a lot of pharmacies already we need to find a location with less density of pharmacies in a neighborhood. 

We will use data science techniques to generate the most promising neighborhoods. This will give the stakeholders boroughs with less number of pharmacies so that they can shortlist and review other characterestics

### Data <a name="data"></a>

Based on definition of our problem, factors that will influence our decission will be:

* Number of pharmacies in the neighborhood.

Following data sources will be needed to extract/generate the required information:

* Extract NYC Neighborhoods from Wiki : https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City
* centers of candidate areas will be generated algorithmically.
* Number of Pharmacies and their type and location in every neighborhood will be obtained using **Foursquare API**

In [2]:
# Use Beautiful Soup Object to do Web Scraping of the URL
!pip install beautifulsoup4
from bs4 import BeautifulSoup
import pandas as pd
from urllib.request import urlopen
from geopy.geocoders import Nominatim # Convert Address to Latitude and Longitude Values
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

url = "https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City"
page = urlopen(url)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")




#### Now do Web Scraping and extract NYC neighborhoods from Wikipedia Communityboards

In [4]:
table = soup.find('table', attrs={'class':'wikitable sortable'})
table_body = table.find('tbody')
rows = table_body.find_all('tr')
#for row in rows:
#    cols = row.find_all('td')
#    cols = [ele.text.strip() for ele in cols]
#    data.append([ele for ele in cols if ele]) # Get rid of empty values
#data

# Define Column Names for Dataframe
column_names = ['CommunityBoard','Area', 'Census', 'Neighborhood']
#Instantiate the dataframe
neighbourhoods = pd.DataFrame(columns=column_names)
for row in rows:
    tds = row.find_all('td')
    cb = ''
    area =''
    cns=''
    nbd=''
    for n, td in enumerate(tds):
        #print(f"index {n}: {td.text.strip()}")
        if n == 0: cb = td.text.replace('\n', '')  
        if n == 1: area = td.text.replace('\n', '')  
        if n == 2: cns = td.text.replace('\n', '')  
        if n == 4: nbd = td.text.replace('\n', '')  
    neighbourhoods = neighbourhoods.append({ 'CommunityBoard' : cb,
                                              'Area' : area,
                                              'Census' : cns, 
                                              'Neighborhood' : nbd  }, ignore_index=True)
indexNames = neighbourhoods[ neighbourhoods['CommunityBoard'] == 'New York City' ].index
neighbourhoods.drop(indexNames , inplace=True)
neighbourhoods.head()

Unnamed: 0,CommunityBoard,Area,Census,Neighborhood
0,,,,
1,Bronx CB 1,7.17,91497.0,"Melrose, Mott Haven, Port Morris"
2,Bronx CB 2,5.54,52246.0,"Hunts Point, Longwood"
3,Bronx CB 3,4.07,79762.0,"Claremont, Concourse Village, Crotona Park, Mo..."
4,Bronx CB 4,5.28,146441.0,"Concourse, Highbridge"


#### Now transpose neighborhoods to make it individual rows

In [5]:
# One Way to Explode and reset the index to avoid duplicate index names

neighbourhoods_exp=neighbourhoods.assign(Neighborhood=neighbourhoods['Neighborhood']
                                            .str.split(',')).explode('Neighborhood').reset_index(drop=True)
neighbourhoods_exp

Unnamed: 0,CommunityBoard,Area,Census,Neighborhood
0,,,,
1,Bronx CB 1,7.17,91497,Melrose
2,Bronx CB 1,7.17,91497,Mott Haven
3,Bronx CB 1,7.17,91497,Port Morris
4,Bronx CB 2,5.54,52246,Hunts Point
...,...,...,...,...
326,Staten Island CB 3,58.97,152908,Prince's Bay
327,Staten Island CB 3,58.97,152908,Richmond Valley
328,Staten Island CB 3,58.97,152908,Rossville
329,Staten Island CB 3,58.97,152908,Tottenville


#### Extract Borough Names from Community Board

In [7]:
neighbourhoods_exp['Borough'] = neighbourhoods_exp['CommunityBoard'].str.split().str[0]
neighbourhoods_exp

Unnamed: 0,CommunityBoard,Area,Census,Neighborhood,Borough
0,,,,,
1,Bronx CB 1,7.17,91497,Melrose,Bronx
2,Bronx CB 1,7.17,91497,Mott Haven,Bronx
3,Bronx CB 1,7.17,91497,Port Morris,Bronx
4,Bronx CB 2,5.54,52246,Hunts Point,Bronx
...,...,...,...,...,...
326,Staten Island CB 3,58.97,152908,Prince's Bay,Staten
327,Staten Island CB 3,58.97,152908,Richmond Valley,Staten
328,Staten Island CB 3,58.97,152908,Rossville,Staten
329,Staten Island CB 3,58.97,152908,Tottenville,Staten


#### Enrich the Data Set with Latitude And Longitude

In [9]:
col01 =column_names = ['Borough','CommunityBoard','Area', 'Census', 'Neighborhood','Latitude', 'Longitude']
#Instantiate the dataframe
geo_bor = pd.DataFrame(columns=col01)
t_Borough = ''
for index, row in neighbourhoods_exp.iterrows():
    t_CommunityBoard = row['CommunityBoard']
    t_Area = row['Area']
    t_Census = row['Census']
    t_Neighborhood = row['Neighborhood']
    if t_CommunityBoard.startswith("Bronx"): t_Borough = 'Bronx'
    if t_CommunityBoard.startswith("Queens"): t_Borough = 'Queens'
    if t_CommunityBoard.startswith("Brooklyn"): t_Borough = 'Brooklyn'
    if t_CommunityBoard.startswith("Manhattan"): t_Borough = 'Manhattan'
    if t_CommunityBoard.startswith("Staten Island"): t_Borough = 'Staten Island'
    address = '{}, {}, NY'.format(t_Neighborhood,t_Borough)
    geolocator = Nominatim(user_agent="ny_explorer") 
    g = geolocator.geocode(address)
    if g != None : 
        latitude = g.latitude 
        longitude = g.longitude 
        #print(latitude,longitude) 
    else:
        latitude = '' 
        longitude = ''
    geo_bor = geo_bor.append({ 'Borough' : t_Borough,
                           'CommunityBoard' : row['CommunityBoard'],
                           'Area' : row['Area'],
                           'Neighborhood' : row['Neighborhood'],
                           'Census' : row['Census'],
                           'Neighborhood' : row['Neighborhood'],
                           'Latitude' : latitude,
                           'Longitude' : longitude
                         }, ignore_index=True)
               
geo_bor

Unnamed: 0,Borough,CommunityBoard,Area,Census,Neighborhood,Latitude,Longitude
0,,,,,,43.1562,-75.845
1,Bronx,Bronx CB 1,7.17,91497,Melrose,40.8257,-73.9152
2,Bronx,Bronx CB 1,7.17,91497,Mott Haven,40.809,-73.9229
3,Bronx,Bronx CB 1,7.17,91497,Port Morris,40.8015,-73.9096
4,Bronx,Bronx CB 2,5.54,52246,Hunts Point,40.8126,-73.884
...,...,...,...,...,...,...,...
326,Staten Island,Staten Island CB 3,58.97,152908,Prince's Bay,40.529,-74.1976
327,Staten Island,Staten Island CB 3,58.97,152908,Richmond Valley,40.5201,-74.2293
328,Staten Island,Staten Island CB 3,58.97,152908,Rossville,40.5556,-74.2129
329,Staten Island,Staten Island CB 3,58.97,152908,Tottenville,40.5112,-74.2493


#### Visualize data in a map of all neighborhoods in NYC

In [51]:
geo_bor[geo_bor['Borough'] == ''].index
indexNames = geo_bor[ geo_bor['Borough'] == '' ].index
geo_bor.drop(indexNames , inplace=True)
geo_bor_filter = geo_bor[geo_bor['Latitude'] != '' ]

import folium
address01 = 'Manhattan, NY'
geolocator = Nominatim(user_agent="ny_explorer") 
location = geolocator.geocode(address01)
latt= location.latitude
long = location.longitude


# Instantiate map of NYC Boroughs
map_nyc = folium.Map(location=[latt,long],zoom_start=10)
for lat, lng, borough, neighborhood in zip( geo_bor_filter['Latitude'],
                                            geo_bor_filter['Longitude'],
                                            geo_bor_filter['Borough'],
                                            geo_bor_filter['Neighborhood']):
    label = '{}, {}'.format(neighborhood,borough)
    label = folium.Popup(label, parse_html =True)
    folium.CircleMarker(
                [lat,lng],
                radius=5,
                popup=label,
                color='blue',
                fill=True,
                fill_color='#3186cc',
                fill_opacity=0.7,
                parse_html=False).add_to(map_nyc)
    
map_nyc


## Methodology <a name="methodology"></a>

In this project we will direct our efforts on detecting areas of Brooklyn that have low density of pharmacies, particularly in the borough of brooklyn.

Once we do the exploratory analysis and then we will create kmeans clustering algorithm to group it into clusters for further analysis.

### Analysis

Since the area we interested is brooklyn. We filter the data set.

In [15]:
geo_bor_brooklyn = geo_bor_filter[ geo_bor_filter['Borough'] == 'Brooklyn' ]
geo_bor_brooklyn.head()

Unnamed: 0,Borough,CommunityBoard,Area,Census,Neighborhood,Latitude,Longitude
61,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.7237,-73.951
62,Brooklyn,Brooklyn CB 1,12.82,160338,Williamsburg,40.7146,-73.9535
63,Brooklyn,Brooklyn CB 1,12.82,160338,Williamsburg Houses,40.7096,-73.9419
64,Brooklyn,Brooklyn CB 2,7.72,98620,Boerum Hill,40.6856,-73.9842
65,Brooklyn,Brooklyn CB 2,7.72,98620,Brooklyn Heights,40.6961,-73.995


#### Foursquare

Now that we have our location candidates, let's use Foursquare API to get info on pharmacies in each neighborhood.

In [18]:
#CLIENT_ID = 'VCVBVK4ZJW3YXYXGG0E0PHPAYZZY3FCKLDHVYQ2GUON25CBN' # your Foursquare ID
CLIENT_ID = '01V5YITO50521DZGP1LPOHU5UY2R2JHWHU3URE2QJNLNJV2R' # your Foursquare ID
#CLIENT_SECRET = 'EKKJXKVMOOPKJBQPG03G0DES5XA25B3XTMQVHYDZWRZQQ4KS' # your Foursquare Secret
CLIENT_SECRET = 'RMBOZLVD0PHGPWVD4XEXIJMGGW0QOHUGBNLPLAJX15ZQVIXQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
radius = 250 # define radius
category = '4bf58dd8d48988d104941735'
col001 =['Borough','CommunityBoard','Area', 'Census', 'Neighborhood','Latitude',
                       'Longitude', 'Pharmacy_Name', 'Pharma_Latitude', 'Pharma_Longitude']
#Instantiate the dataframe
geo_bor_brooklyn_pharma = pd.DataFrame(columns=col001)

for index, row in geo_bor_brooklyn.iterrows():    
        lat = row['Latitude']
        lon = row['Longitude']
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lon,
        category,
        radius, 
        LIMIT)          
        results = requests.get(url).json()['response']['groups'][0]['items']
        for item in results:
            item_pharma = item['venue']['name']
            item_lat = item['venue']['location']['lat']
            item_lng = item['venue']['location']['lng']
            geo_bor_brooklyn_pharma = geo_bor_brooklyn_pharma.append({ 'Borough' : row['Borough'],
                           'CommunityBoard' : row['CommunityBoard'],
                           'Area' : row['Area'],
                           'Neighborhood' : row['Neighborhood'],
                           'Census' : row['Census'],
                           'Neighborhood' : row['Neighborhood'],
                           'Latitude' : row['Latitude'],
                           'Longitude' : row['Longitude'],
                           'Pharmacy_Name' : item_pharma,
                           'Pharma_Latitude' : item_lat,
                           'Pharma_Longitude' : item_lng                                                                    
                         }, ignore_index=True)
    
geo_bor_brooklyn_pharma

Unnamed: 0,Borough,CommunityBoard,Area,Census,Neighborhood,Latitude,Longitude,Pharmacy_Name,Pharma_Latitude,Pharma_Longitude
0,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,JAG-ONE Physical Therapy,40.725820,-73.951291
1,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,Polska Przychodnia,40.724210,-73.949222
2,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,Greenpoint Footcare,40.725489,-73.950907
3,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,Tribeca Pediatrics - Greenpoint,40.722475,-73.949241
4,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,The Smilist Dental,40.724378,-73.948829
...,...,...,...,...,...,...,...,...,...,...
577,Brooklyn,Brooklyn CB 18,24.68,194653,Mill Island,40.650104,-73.949582,Diamond Braces Church Ave,40.650555,-73.949753
578,Brooklyn,Brooklyn CB 18,24.68,194653,Mill Island,40.650104,-73.949582,Advanced Dental Center,40.650555,-73.949753
579,Brooklyn,Brooklyn CB 18,24.68,194653,Mill Island,40.650104,-73.949582,Diamond Braces,40.650555,-73.949753
580,Brooklyn,Brooklyn CB 18,24.68,194653,Mill Island,40.650104,-73.949582,"Advanced Oral Surgery of Brooklyn, PLLC",40.650426,-73.949818


In [19]:
geo_bor_brooklyn_pharma.head()

Unnamed: 0,Borough,CommunityBoard,Area,Census,Neighborhood,Latitude,Longitude,Pharmacy_Name,Pharma_Latitude,Pharma_Longitude
0,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,JAG-ONE Physical Therapy,40.72582,-73.951291
1,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,Polska Przychodnia,40.72421,-73.949222
2,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,Greenpoint Footcare,40.725489,-73.950907
3,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,Tribeca Pediatrics - Greenpoint,40.722475,-73.949241
4,Brooklyn,Brooklyn CB 1,12.82,160338,Greenpoint,40.723713,-73.950971,The Smilist Dental,40.724378,-73.948829


#### Explore the data set to find out the nearby pharmacies


In [22]:
geo_bor_brooklyn_pharma_cluster = geo_bor_brooklyn_pharma.groupby('Neighborhood').count()
geo_bor_brooklyn_pharma_cluster.head(10)

Unnamed: 0_level_0,Borough,CommunityBoard,Area,Census,Latitude,Longitude,Pharmacy_Name,Pharma_Latitude,Pharma_Longitude
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Bensonhurst,10,10,10,10,10,10,10,10,10
Brighton Beach,20,20,20,20,20,20,20,20,20
Brooklyn Heights,21,21,21,21,21,21,21,21,21
Canarsie,12,12,12,12,12,12,12,12,12
Clinton Hill,6,6,6,6,6,6,6,6,6
Cobble Hill,9,9,9,9,9,9,9,9,9
Coney Island,2,2,2,2,2,2,2,2,2
Cypress Hills,1,1,1,1,1,1,1,1,1
Dumbo,5,5,5,5,5,5,5,5,5
East Flatbush,4,4,4,4,4,4,4,4,4


#### Cluster Neighborhoods

Looking good. What we have now is a clear indication of zones with low number of pharmacies in vicinity. Let us now cluster those locations to create centers of zones containing good locations.

In [23]:
# import k-means from clustering stage
from sklearn.cluster import KMeans
kclusters = 10
# run k-means clustering
#geo_bor_brooklyn_pharma_cluster = geo_bor_brooklyn_pharma_cluster.drop('Neighborhood', 1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(geo_bor_brooklyn_pharma_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([8, 2, 2, 4, 0, 8, 7, 7, 0, 5])

In [24]:
geo_bor_brooklyn_pharma_cluster.insert(0, 'Cluster Labels', kmeans.labels_)

In [25]:
geo_bor_brooklyn_pharma_cluster

Unnamed: 0_level_0,Cluster Labels,Borough,CommunityBoard,Area,Census,Latitude,Longitude,Pharmacy_Name,Pharma_Latitude,Pharma_Longitude
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Bensonhurst,8,10,10,10,10,10,10,10,10,10
Brighton Beach,2,20,20,20,20,20,20,20,20,20
Brooklyn Heights,2,21,21,21,21,21,21,21,21,21
Canarsie,4,12,12,12,12,12,12,12,12,12
Clinton Hill,0,6,6,6,6,6,6,6,6,6
Cobble Hill,8,9,9,9,9,9,9,9,9,9
Coney Island,7,2,2,2,2,2,2,2,2,2
Cypress Hills,7,1,1,1,1,1,1,1,1,1
Dumbo,0,5,5,5,5,5,5,5,5,5
East Flatbush,5,4,4,4,4,4,4,4,4,4


#### Now the kmeans has clustered the data set based on the density

Filter the dataset for cluster with lowest density

In [28]:
geo_bor_brooklyn_pharma_recom = geo_bor_brooklyn_pharma_cluster.loc[geo_bor_brooklyn_pharma_cluster['Cluster Labels'] == 7]

In [56]:
geo_bor_brooklyn_pharma_recom

Unnamed: 0_level_0,Cluster Labels,Borough,CommunityBoard,Area,Census,Latitude,Longitude,Pharmacy_Name,Pharma_Latitude,Pharma_Longitude
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Coney Island,7,2,2,2,2,2,2,2,2,2
Cypress Hills,7,1,1,1,1,1,1,1,1,1
Fulton Ferry,7,1,1,1,1,1,1,1,1,1
Gerritsen Beach,7,1,1,1,1,1,1,1,1,1
Gowanus,7,1,1,1,1,1,1,1,1,1
Mapleton,7,1,1,1,1,1,1,1,1,1
Remsen Village,7,2,2,2,2,2,2,2,2,2
Rugby,7,1,1,1,1,1,1,1,1,1
Bushwick,7,1,1,1,1,1,1,1,1,1
Carroll Gardens,7,2,2,2,2,2,2,2,2,2


#### Visualize the dataset based on the recommendation consolidated by the algorithm


In [34]:
column_final = ['Borough','Neighborhood','Latitude', 'Longitude']
#Instantiate the dataframe
geo_bor_brooklyn_pharma_recom_visual = pd.DataFrame(columns=column_final)

for index, row in geo_bor_brooklyn_pharma_recom.iterrows():
    t_fNeighborhood = index
    t_fBorough='Brooklyn'
    address = '{}, {}, NY'.format(t_Neighborhood,t_Borough)
    geolocator = Nominatim(user_agent="nybk_explorer") 
    g = geolocator.geocode(address)
    if g != None : 
        latitude = g.latitude 
        longitude = g.longitude 
        #print(latitude,longitude) 
    else:
        latitude = '' 
        longitude = ''
    geo_bor_brooklyn_pharma_recom_visual = geo_bor_brooklyn_pharma_recom_visual.append({ 'Borough' : t_fBorough,
                           'Neighborhood' : t_fNeighborhood,
                           'Latitude' : latitude,
                           'Longitude' : longitude
                         }, ignore_index=True)
               
geo_bor_brooklyn_pharma_recom_visual



Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Coney Island,40.543439,-74.197644
1,Brooklyn,Cypress Hills,40.543439,-74.197644
2,Brooklyn,Fulton Ferry,40.543439,-74.197644
3,Brooklyn,Gerritsen Beach,40.543439,-74.197644
4,Brooklyn,Gowanus,40.543439,-74.197644
5,Brooklyn,Mapleton,40.543439,-74.197644
6,Brooklyn,Remsen Village,40.543439,-74.197644
7,Brooklyn,Rugby,40.543439,-74.197644
8,Brooklyn,Bushwick,40.543439,-74.197644
9,Brooklyn,Carroll Gardens,40.543439,-74.197644


## Results and Discussion <a name="results"></a>

After reviewing the pharmacies across the Brooklyn neighborhoods the above locations came out as the least number of pharmacies.

This, of course, does not imply that those zones are actually optimal locations for a new Pharmacy! 

Purpose of this analysis was to only provide info on areas in Brooklyn where a new pharmacy can be established 
This will give an initial list of optimal locations to stakeholders where a further deep evaluation will be required to nail down the location considering additional factors.

## Conclusion <a name="conclusion"></a>


Purpose of this project was to identify areas in Brooklyn with low number of pharmacies in order to aid stakeholders in narrowing down the search for optimal location.
By calculating density distribution from Foursquare data we have first identified general boroughs that justify further analysis, and then generated extensive collection of locations which satisfy some basic requirements regarding existing nearby pharmacies. 
Clustering of those locations was then performed in order to create major zones of interest (containing greatest number of potential locations) to be used as starting points for final exploration by stakeholders.

Final decission on optimal location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, 
taking into consideration additional factors like attractiveness of each location, zoning, city approvals, proximity to hospitals, real estate availability, prices etc.