# Coursera Capstone project - Villages of New York

## This study will compare the villages of New York both by population and venues and determine the distance to their closest hospital.

Villages face many of the same issues as larger cities, but without the resources of larger cities.
Some villages are near major cities; others are very rural and may lack essential services like
hospitals or even grocery stores.

We will use Foursquare to determine the types of venues available in these villages and will measure the distance to essential services.  We will group the villages by their venues using k-means to identify villages with similar characteristics
that may require similar services.

### The audience:
State and local governments can use the results of this study to allocate resources and develop plans to help the people and their villages.  This help could be in the form of indicating where to build rural medical clinics so everyone in the state can have access to health care.

### First import the required libraries


In [389]:
from bs4 import BeautifulSoup # this module helps with web scraping
import requests  # this module lets us download a web page
print("BeautifulSoup installed")
import pandas as pd
import numpy as np

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas import json_normalize # transform a JSON file into a pandas dataframe (top-level import since pandas 1.0)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not already installed
import folium # map rendering library
print('Libraries imported.')


BeautifulSoup installed
Libraries imported.


In [390]:
# The URLs below point to pages with HTML tables of data about New York villages and hospitals.
villageurl = "https://en.wikipedia.org/wiki/List_of_villages_in_New_York_(state)"
hospitalurl = "https://en.wikipedia.org/wiki/List_of_hospitals_in_New_York_(state)"
emergencyurl ="https://profiles.health.ny.gov/hospital/county_or_region/service:Emergency+Department"
providerurl = "https://www.health.ny.gov/regulations/hcra/provider/provhosp.htm"
citylisturl ="http://www.theus50.com/newyork/cities.php"  #just has the names of the cities


## Use BeautifulSoup to get data about villages from the web site

https://en.wikipedia.org/wiki/List_of_villages_in_New_York_(state) contains a list of the incorporated villages of New York State with some data about each village, including the village latitude and longitude.


In [391]:
# get the contents of the webpage in text format and store in a variable called data
data  = requests.get(villageurl).text
#soup = BeautifulSoup(data,"lxml")
soup = BeautifulSoup(data,"html5lib")

In [392]:
#find all html tables in the web page
tables = soup.find_all('table') # in html table is represented by the tag <table>
#tables

The following code reads the rows of the village table and collects the text of each cell, which will become the rows of the dataframe

In [393]:
data = []
table = soup.find('table', attrs={'class':'wikitable sortable'})
table_body = table.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values
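One caveat with dropping empty values: if a cell in the middle of a row is blank, the remaining values shift left and no longer line up with the column names. A minimal sketch of an alternative that keeps blanks as `None`; the sample row is hypothetical and `clean_cells` is an illustrative helper, not part of the notebook:

```python
def clean_cells(cells):
    """Strip each cell and keep blanks as None so column positions are preserved."""
    stripped = [cell.strip() for cell in cells]
    return [cell if cell else None for cell in stripped]

# Hypothetical row with a blank middle cell
raw_row = ["Adams ", "  ", "1775"]

dropped = [cell.strip() for cell in raw_row if cell.strip()]  # blank removed, values shift left
kept = clean_cells(raw_row)                                   # blank kept as a placeholder

print(dropped)
print(kept)
```

Whether this matters depends on the table; for a table with no blank cells both approaches give the same rows.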

In [394]:
df=pd.DataFrame(data)
df.columns = ["Village","County","Population","Land","Water","Lat-Long","geoID","ANSIcode", "Town"]

In [395]:
df.drop([0],inplace = True)
df = df.reset_index(drop=True)
# Drop columns not needed for our analysis
df.drop(columns=['Land','Water', 'geoID','ANSIcode'],inplace = True)
df.head()

Unnamed: 0,Village,County,Population,Lat-Long,Town
0,Adams,Jefferson,1775,"43.810092, -76.022913",Adams
1,Addison,Steuben,1763,"42.106321, -77.231990",Addison
2,Afton,Chenango,822,"42.229199, -75.524750",Afton
3,Airmont,Rockland,8628,"41.099156, -74.097916",Ramapo
4,Akron,Erie,2868,"43.018115, -78.497576",Newstead


Separate the Lat-Long value into separate Latitude and Longitude columns and include them in the dataframe

In [396]:
dflatlong = pd.DataFrame(df['Lat-Long'].str.split(',').tolist(),columns = ['Latitude','Longitude'])
#dflatlong.head()

In [397]:
df = df.join(dflatlong)
#df.head(10)

In [398]:
df.drop(columns=['Lat-Long'],inplace = True)
df.shape

(533, 6)
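Splitting on the comma leaves both new columns as strings, and the longitude keeps its leading space. Distance calculations need numbers, so a conversion step is useful at some point. A minimal sketch in plain Python, where `parse_lat_long` is an illustrative helper and not part of the notebook:

```python
def parse_lat_long(text):
    """Split a 'lat, long' string and return both parts as floats."""
    lat_text, lon_text = text.split(',')
    # float() ignores surrounding whitespace, so the leading space is harmless
    return float(lat_text), float(lon_text)

latitude, longitude = parse_lat_long("43.810092, -76.022913")  # Adams, from the table above
print(latitude, longitude)
```

In the notebook itself, something like `df[['Latitude', 'Longitude']].astype(float)` should have the same effect on the two columns.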

Verify that the dataframe does not contain any null values.  Null values could create problems in later steps

In [399]:
check_nan_in_df = df.isnull().values.any()
print (check_nan_in_df)
df.shape

False


(533, 6)

## Get the latitude and longitude of New York state in order to locate villages on a map

In [400]:
address = 'NY'

geolocator = Nominatim(user_agent="NewYork_explorer")
location = geolocator.geocode(address)
nylatitude = location.latitude
nylongitude = location.longitude
print('The geographical coordinates of New York are {}, {}.'.format(nylatitude, nylongitude))

The geographical coordinates of New York are 43.1561681, -75.8449946.


In [401]:
# create map of New York state using latitude and longitude values
map_newyork = folium.Map(location=[nylatitude, nylongitude], zoom_start=7)

# add markers to map
for lat, lng, county, village in zip(df['Latitude'], df['Longitude'], df['County'], df['Village']):
    label = '{}, {}'.format(village, county)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [402]:
# set up variables for Foursquare access
import config
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
radius = 500

Set up for using Foursquare to get venues around villages

In [404]:
url4sq = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)

Build functions for use in the analysis. The first function gets the category of a venue. The next function gets the venues around each village.

In [405]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [407]:
def getNearbyVenues(names,latitudes,longitudes):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Village', 
                  'Village Latitude', 
                  'Village Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
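The function above indexes `categories[0]` directly, which would raise an `IndexError` for any venue returned with an empty category list. A minimal sketch of a more tolerant extraction step, assuming the items have the same shape as `response['groups'][0]['items']`; `extract_venue_rows` and the sample items are hypothetical:

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'items' into rows, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hypothetical items shaped like response['groups'][0]['items']
sample_items = [
    {'venue': {'name': 'Main St Diner',
               'location': {'lat': 43.81, 'lng': -76.02},
               'categories': [{'name': 'Diner'}]}},
    {'venue': {'name': 'Unlabeled Spot',
               'location': {'lat': 43.82, 'lng': -76.03},
               'categories': []}},
]
rows = extract_venue_rows('Adams', 43.810092, -76.022913, sample_items)
print(rows)
```

Rows with a `None` category can then be dropped or kept, depending on whether uncategorized venues should count toward a village.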


## Execute the venues function for each village

In [408]:
village_venues = getNearbyVenues(names=df['Village'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

## Explore the venues

In [409]:
print(village_venues.shape)
#village_venues.head()

(4106, 7)


In [410]:
village_venues.groupby('Village').count()

Unnamed: 0_level_0,Village Latitude,Village Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Village,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adams,4,4,4,4,4,4
Addison,6,6,6,6,6,6
Afton,5,5,5,5,5,5
Airmont,3,3,3,3,3,3
Akron,10,10,10,10,10,10
...,...,...,...,...,...,...
Woodridge,2,2,2,2,2,2
Woodsburgh,1,1,1,1,1,1
Wurtsboro,11,11,11,11,11,11
Yorkville,13,13,13,13,13,13


In [411]:
print('There are {} unique categories.'.format(len(village_venues['Venue Category'].unique())))

There are 304 unique categories.


## One-hot encoding of the venues to build a dataframe for analysis

In [412]:
# one hot encoding
village_onehot = pd.get_dummies(village_venues[['Venue Category']], prefix="", prefix_sep="")

# add village column back to dataframe
village_onehot['Village'] = village_venues['Village'] 

# move village column to the first column
fixed_columns = [village_onehot.columns[-1]] + list(village_onehot.columns[:-1])
village_onehot = village_onehot[fixed_columns]

#village_onehot.head()

In [413]:
village_grouped = village_onehot.groupby('Village').mean().reset_index()
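Taking the mean of one-hot columns per village gives the share of that village's venues falling in each category. A minimal sketch of the same arithmetic in plain Python, using hypothetical (village, category) pairs in place of `village_venues`:

```python
from collections import Counter, defaultdict

# Hypothetical (village, category) pairs standing in for village_venues
venues = [
    ('Adams', 'Pizza Place'),
    ('Adams', 'Diner'),
    ('Adams', 'Pizza Place'),
    ('Adams', 'Park'),
    ('Afton', 'Gift Shop'),
    ('Afton', 'Pizza Place'),
]

counts = defaultdict(Counter)
for village, category in venues:
    counts[village][category] += 1

# Share of each category within a village; this is what the mean of the
# one-hot columns works out to for each village
frequencies = {
    village: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for village, counter in counts.items()
}
print(frequencies['Adams'])
```

Because each row is a set of shares that sum to 1, a village with one venue and a village with fifty venues of the same category look identical to the clustering step.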
#village_grouped

Define function to return the most common venues

In [414]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Build a dataframe of villages with their most common venues ranked

In [415]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Village']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
villages_venues_sorted = pd.DataFrame(columns=columns)
villages_venues_sorted['Village'] = village_grouped['Village']

for ind in np.arange(village_grouped.shape[0]):
    villages_venues_sorted.iloc[ind, 1:] = return_most_common_venues(village_grouped.iloc[ind, :], num_top_venues)

#villages_venues_sorted.head()
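In the cluster tables further down, runs such as "Eye Doctor, Fair, Falafel Restaurant, Farm" recur across many villages. This appears consistent with zero-frequency categories padding the ranking whenever a village has fewer than ten distinct categories. A minimal sketch of a ranking that skips categories a village does not have; `top_nonzero_categories` and the sample row are hypothetical:

```python
def top_nonzero_categories(frequencies, num_top_venues):
    """Rank categories by frequency, skipping those the village does not have."""
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    present.sort(key=lambda pair: pair[1], reverse=True)
    return [cat for cat, _ in present[:num_top_venues]]

# Hypothetical frequency row for one village
row = {'Golf Course': 1.0, 'Eye Doctor': 0.0, 'Fair': 0.0, 'Falafel Restaurant': 0.0}
top = top_nonzero_categories(row, 10)
print(top)
```

The result can be shorter than `num_top_venues`, so a table built from it would need blank cells for villages with few venues.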

# Perform k-means clustering with 7 clusters

### I tried building 5, 6, 7, 8, and 9 clusters and found that 7 clusters gave the most even separation of groups

In [416]:
kclusters = 7

village_grouped_clustering = village_grouped.drop(columns='Village')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(village_grouped_clustering)

# check cluster labels generated for the first rows in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 5, 2, 1, 0, 1, 1, 1, 5])

In [171]:
# add clustering labels
if 'Cluster Labels' in villages_venues_sorted.columns:
    del villages_venues_sorted['Cluster Labels']
villages_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

village_merged = df
# merge the ranked venues with the village data to add latitude/longitude for each village
village_merged = village_merged.join(villages_venues_sorted.set_index('Village'), on='Village')

## Validate that all of the villages are in a cluster

In [417]:
print(village_merged.loc[~village_merged['Cluster Labels'].isin([0,1,2,3,4,5,6,7])])


Empty DataFrame
Columns: [Village, County, Population, Town, Latitude, Longitude, Cluster Labels, 1st Most Common Venue, 2nd Most Common Venue, 3rd Most Common Venue, 4th Most Common Venue, 5th Most Common Venue, 6th Most Common Venue, 7th Most Common Venue, 8th Most Common Venue, 9th Most Common Venue, 10th Most Common Venue]
Index: []


## Drop villages that are not in a cluster

In [418]:
index_names = village_merged.loc[~village_merged['Cluster Labels'].isin([0,1,2,3,4,5,6,7])].index
# drop these given row
# indexes from dataFrame
village_merged.drop(index_names, inplace = True)

Convert the cluster label to an integer and check the dataframe

In [420]:
village_merged['Cluster Labels'] = village_merged['Cluster Labels'].astype(int)

village_merged.head() # check the last columns!

Unnamed: 0,Village,County,Population,Town,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adams,Jefferson,1775,Adams,43.810092,-76.022913,1,Construction & Landscaping,Pizza Place,Sandwich Place,Diner,Convenience Store,Cupcake Shop,Filipino Restaurant,Eye Doctor,Fair,Falafel Restaurant
1,Addison,Steuben,1763,Addison,42.106321,-77.23199,6,Convenience Store,Discount Store,Bar,Gas Station,Chinese Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
2,Afton,Chenango,822,Afton,42.229199,-75.52475,1,Gift Shop,American Restaurant,Pizza Place,New American Restaurant,Yoga Studio,Farmers Market,Event Space,Eye Doctor,Fair,Falafel Restaurant
3,Airmont,Rockland,8628,Ramapo,41.099156,-74.097916,0,Music Venue,Park,Nightlife Spot,Baseball Field,Event Service,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
4,Akron,Erie,2868,Newstead,43.018115,-78.497576,6,Bar,Playground,Pizza Place,Optical Shop,American Restaurant,Deli / Bodega,Business Service,Pharmacy,Liquor Store,Sandwich Place


## Summarize the clusters of villages

In [421]:
# summarize clusters
print(kclusters," Clusters containing: ")
for clust in np.arange(kclusters):
    print("Cluster number ",clust,"has ",village_merged['Cluster Labels'].eq(clust).sum()," villages. ")

7  Clusters containing: 
Cluster number  0 has  44  villages. 
Cluster number  1 has  92  villages. 
Cluster number  2 has  6  villages. 
Cluster number  3 has  6  villages. 
Cluster number  4 has  32  villages. 
Cluster number  5 has  14  villages. 
Cluster number  6 has  293  villages. 
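One quick way to put a number on how evenly the villages are spread is the share held by the largest cluster. A minimal sketch using the sizes printed above; the choice of "largest share" as the measure is an assumption for illustration, not the criterion used in the notebook:

```python
# Cluster sizes as printed in the summary above
cluster_sizes = {0: 44, 1: 92, 2: 6, 3: 6, 4: 32, 5: 14, 6: 293}

total = sum(cluster_sizes.values())
largest_label = max(cluster_sizes, key=cluster_sizes.get)
largest_share = cluster_sizes[largest_label] / total

print('{} villages clustered'.format(total))
print('Cluster {} holds {:.1%} of them'.format(largest_label, largest_share))
```

The sizes sum to 487, fewer than the 533 rows in `df`, presumably because some villages returned no venues. Repeating the calculation for each candidate number of clusters would make the comparison between 5 and 9 clusters explicit.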


In [426]:
# create map
map_clusters = folium.Map(location=[nylatitude, nylongitude], zoom_start=7)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(village_merged['Latitude'], village_merged['Longitude'], village_merged['Village'], village_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.8).add_to(map_clusters)
       
map_clusters

# The following are the village clusters

In [190]:
village_merged.loc[village_merged['Cluster Labels'] == 0, village_merged.columns[[0] + list(range(6, village_merged.shape[1]))]]

Unnamed: 0,Village,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Airmont,0,Music Venue,Park,Nightlife Spot,Baseball Field,Event Service,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
12,Altamont,0,Arts & Crafts Store,Food,Park,American Restaurant,Restaurant,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant
20,Ardsley,0,Park,Hotel,Pharmacy,Bus Station,Donut Shop,Chinese Restaurant,Diner,Japanese Restaurant,Plaza,Filipino Restaurant
22,Arkport,0,Motorcycle Shop,Park,Convenience Store,Sporting Goods Shop,Grocery Store,Cupcake Shop,Dance Studio,Eye Doctor,Fair,Falafel Restaurant
27,Aurora,0,Food & Drink Shop,Bed & Breakfast,Park,Gastropub,Restaurant,Fish Market,Fish & Chips Shop,Financial or Legal Service,Filipino Restaurant,Flea Market
43,Bergen,0,Post Office,Bakery,Park,Deli / Bodega,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm
69,Canastota,0,Italian Restaurant,Pool,Park,Baseball Field,Yoga Studio,Farmers Market,Event Space,Eye Doctor,Fair,Falafel Restaurant
71,Canisteo,0,Park,American Restaurant,Pizza Place,Dive Bar,Yoga Studio,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant
87,Central Square,0,Chinese Restaurant,Park,Sandwich Place,Pet Store,Yoga Studio,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant
96,Chittenango,0,Park,Home Service,Yoga Studio,Farmers Market,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm


In [191]:
village_merged.loc[village_merged['Cluster Labels'] == 1, village_merged.columns[[0] + list(range(6, village_merged.shape[1]))]]

Unnamed: 0,Village,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adams,1,Construction & Landscaping,Pizza Place,Sandwich Place,Diner,Convenience Store,Cupcake Shop,Filipino Restaurant,Eye Doctor,Fair,Falafel Restaurant
2,Afton,1,Gift Shop,American Restaurant,Pizza Place,New American Restaurant,Yoga Studio,Farmers Market,Event Space,Eye Doctor,Fair,Falafel Restaurant
5,Albion,1,Business Service,Post Office,Bar,Liquor Store,Pizza Place,Donut Shop,Farmers Market,Eye Doctor,Fair,Falafel Restaurant
11,Almond,1,Convenience Store,Bar,Filipino Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Financial or Legal Service
21,Argyle,1,Convenience Store,Baseball Field,Filipino Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Financial or Legal Service
...,...,...,...,...,...,...,...,...,...,...,...,...
506,Webster,1,Intersection,Pizza Place,Yoga Studio,Filipino Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
511,West Carthage,1,Supermarket,Hotel,Pizza Place,Sandwich Place,Video Store,Yoga Studio,Event Service,Event Space,Eye Doctor,Fair
514,West Winfield,1,Construction & Landscaping,Gas Station,Pizza Place,Sandwich Place,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm
519,Whitesboro,1,Pizza Place,Convenience Store,Bakery,Clothing Store,Pharmacy,Diner,Cosmetics Shop,Hobby Shop,Flower Shop,Dumpling Restaurant


In [192]:
village_merged.loc[village_merged['Cluster Labels'] == 2, village_merged.columns[[0] + list(range(6, village_merged.shape[1]))]]

Unnamed: 0,Village,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
80,Catskill,2,Harbor / Marina,River,Park,Scenic Lookout,Dumpling Restaurant,Entertainment Service,Event Service,Event Space,Eye Doctor,Fair
82,Cayuga,2,Park,Harbor / Marina,Yoga Studio,Electronics Store,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm
88,Centre Island,2,Harbor / Marina,Yoga Studio,Electronics Store,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
216,Hewlett Neck,2,Harbor / Marina,Yoga Studio,Electronics Store,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
267,Lloyd Harbor,2,Harbor / Marina,Yoga Studio,Electronics Store,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
336,North Haven,2,Harbor / Marina,Yoga Studio,Electronics Store,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market


In [193]:
village_merged.loc[village_merged['Cluster Labels'] == 3, village_merged.columns[[0] + list(range(6, village_merged.shape[1]))]]

Unnamed: 0,Village,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
215,Hewlett Harbor,3,Beach Bar,Golf Course,Food Truck,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
286,Matinecock,3,Golf Course,Yoga Studio,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant
334,Nissequogue,3,Golf Course,Yoga Studio,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant
348,Old Brookville,3,Golf Course,Yoga Studio,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant
421,Sands Point,3,IT Services,Golf Course,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant
528,Woodsburgh,3,Golf Course,Yoga Studio,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant


In [194]:
village_merged.loc[village_merged['Cluster Labels'] == 4, village_merged.columns[[0] + list(range(6, village_merged.shape[1]))]]

Unnamed: 0,Village,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Andover,4,Post Office,Business Service,Mountain,Bar,Deli / Bodega,Filipino Restaurant,Fair,Falafel Restaurant,Farm,Farmers Market
59,Brushton,4,Post Office,Market,Gas Station,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
61,Burdett,4,Post Office,Bed & Breakfast,Dessert Shop,Food,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm
62,Burke,4,Post Office,Bar,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant
75,Cassadaga,4,Bar,Yoga Studio,Filipino Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Financial or Legal Service
79,Cato,4,Grocery Store,Post Office,Farmers Market,Entertainment Service,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm
99,Clayville,4,Bar,Post Office,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant
105,Cohocton,4,Post Office,Food,Soccer Field,Farmers Market,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm
106,Cold Brook,4,Post Office,Yoga Studio,Fast Food Restaurant,Event Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
129,DeRuyter,4,Bar,Post Office,Pizza Place,Breakfast Spot,Filipino Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market


In [195]:
village_merged.loc[village_merged['Cluster Labels'] == 5, village_merged.columns[[0] + list(range(6, village_merged.shape[1]))]]

Unnamed: 0,Village,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Antwerp,5,Post Office,Business Service,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant
68,Canaseraga,5,Business Service,Convenience Store,Post Office,Hookah Bar,Electronics Store,Event Service,Hot Dog Joint,Event Space,Eye Doctor,Fair
78,Castorland,5,Post Office,Business Service,Fast Food Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Filipino Restaurant
89,Champlain,5,Business Service,Paper / Office Supplies Store,Transportation Service,Yoga Studio,Farmers Market,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm
100,Cleveland,5,Business Service,Yoga Studio,Filipino Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Financial or Legal Service
170,Fort Johnson,5,Gas Station,Yoga Studio,Filipino Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
213,Heuvelton,5,Gas Station,Yoga Studio,Filipino Restaurant,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
287,Maybrook,5,Business Service,Park,Yoga Studio,Entertainment Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
290,McGraw,5,Business Service,Food,Baseball Field,Yoga Studio,Fast Food Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
390,Port Leyden,5,Business Service,Yoga Studio,Filipino Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Financial or Legal Service


In [428]:
village_merged.loc[village_merged['Cluster Labels'] == 6, village_merged.columns[[0] + list(range(6, village_merged.shape[1]))]]

Unnamed: 0,Village,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Addison,6,Convenience Store,Discount Store,Bar,Gas Station,Chinese Restaurant,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
4,Akron,6,Bar,Playground,Pizza Place,Optical Shop,American Restaurant,Deli / Bodega,Business Service,Pharmacy,Liquor Store,Sandwich Place
6,Alden,6,Pharmacy,Gas Station,Ice Cream Shop,Park,Grocery Store,Liquor Store,Thrift / Vintage Store,Martial Arts School,Chinese Restaurant,Diner
7,Alexander,6,Construction & Landscaping,High School,Gas Station,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
8,Alexandria Bay,6,Boat or Ferry,Fast Food Restaurant,Gift Shop,Harbor / Marina,Castle,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
...,...,...,...,...,...,...,...,...,...,...,...,...
524,Windsor,6,Arts & Entertainment,Sandwich Place,Chinese Restaurant,Italian Restaurant,Yoga Studio,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm
527,Woodridge,6,Campground,Health & Beauty Service,Yoga Studio,Entertainment Service,Event Space,Eye Doctor,Fair,Falafel Restaurant,Farm,Farmers Market
529,Wurtsboro,6,Snack Place,Spiritual Center,Chinese Restaurant,Bank,Arts & Crafts Store,Vegetarian / Vegan Restaurant,Bar,Pizza Place,Ice Cream Shop,Food
531,Yorkville,6,Fast Food Restaurant,Pub,Clothing Store,Middle Eastern Restaurant,Grocery Store,Automotive Shop,Chinese Restaurant,Bowling Alley,Deli / Bodega,Pizza Place


# Hospital locations
## Locate the hospitals in New York
This part of the project locates the hospitals in New York and then finds the distance between each village and its nearest hospital.  We will then group the villages by distance from a hospital.  This will help identify the villages that may need additional hospitals.

## Get the table of hospitals with emergency rooms

People need to reach an ER quickly, so nearby emergency rooms are critical.

In [429]:
# get the contents of the webpage in text format and store in a variable called hospitaldata
hospitaldata  = requests.get(emergencyurl).text
#soup = BeautifulSoup(data,"lxml")
hospitalsoup = BeautifulSoup(hospitaldata,"html5lib")

In [430]:
#find all html tables in the web page
tables = hospitalsoup.find_all('table') # in html table is represented by the tag <table>
#tables

In [431]:
hospdata = []
hosptable = hospitalsoup.find('table', attrs={'class':'table table-striped dataTable no-footer'})
table_body = hosptable.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    hospdata.append([ele for ele in cols if ele]) # Get rid of empty values

In [432]:
dfhosp=pd.DataFrame(hospdata)
dfhosp.columns = ["Name","Designations","County/Region"]

In [433]:
dfhosp

Unnamed: 0,Name,Designations,County/Region
0,Adirondack Medical Center-Lake Placid Site,,Essex (Capital District)
1,Adirondack Medical Center-Saranac Lake Site,Level 1 Perinatal Center,Franklin (Capital District)
2,Albany Medical Center Hospital,AIDS Center; Comprehensive Stroke Center; Leve...,Albany (Capital District)
3,Samaritan Hospital - Albany Memorial Campus,SAFE Designated Hospital,Albany (Capital District; Capital District)
4,The University of Vermont Health Network - Ali...,Level 1 Perinatal Center,Franklin (Capital District)
...,...,...,...
183,Woman's Christian Association Hospital,Level 1 Perinatal Center; Primary Stroke Cente...,Chautauqua (Western NY - Buffalo)
184,John R. Oishei Children's Hospital,AIDS Center; Level 1 Pediatric Trauma Center; ...,Erie (Western NY - Buffalo)
185,Woodhull Medical & Mental Health Center,AIDS Center; Level 3 Perinatal Center; Primary...,Kings (New York Metro - New York City)
186,Wyckoff Heights Medical Center,Level 3 Perinatal Center; Primary Stroke Center,Kings (New York Metro - New York City)


Separate the county from the region to help identify locations

In [434]:
dfcounty = pd.DataFrame(dfhosp['County/Region'].str.split(' ', n=1).tolist(),columns = ['County','Region'])

In [435]:
dfhosp = dfhosp.join(dfcounty)
dfhosp.drop(columns=['County/Region','Designations'],inplace = True)
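Splitting on the first space works for single-word county names, but it would cut a multi-word county in two and it leaves the parentheses on the region. A minimal sketch of splitting at the opening parenthesis instead; `split_county_region` is an illustrative helper and the second input is a hypothetical value, not a row from the table:

```python
def split_county_region(text):
    """Split 'County (Region)' at the opening parenthesis rather than the first space."""
    county, _, region = text.partition(' (')
    return county.strip(), region.rstrip(')').strip()

essex = split_county_region('Essex (Capital District)')
# Hypothetical multi-word county, to show why splitting on the first space is fragile
multi = split_county_region('St. Lawrence (Example Region)')

print(essex)
print(multi)
```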

So dfhosp contains a list of hospitals with emergency rooms, with county and region.  We will compare this with the hospital list later

# Get hospitals that are open
## Not all hospitals listed are currently open.
### The following website lists hospitals and their operating status.  It is important to use only open hospitals for this study.

In [436]:
# get the contents of the webpage in text format and store in a variable called hospadddata
hospadddata  = requests.get(providerurl).text
#soup = BeautifulSoup(data,"lxml")
hospaddsoup = BeautifulSoup(hospadddata,"html5lib")

In [437]:
hospadddata = []
hospaddtable = hospaddsoup.find('table', attrs={'summary':'Alphabetical Listing of Providers'})
table_body = hospaddtable.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    hospadddata.append([ele for ele in cols if ele]) # Get rid of empty values

In [438]:
dfhospadd=pd.DataFrame(hospadddata)
dfhospadd.columns = ["Opcert","Name","Address","City","State","Zip","Status","Type"]

In [439]:
dfhospadd.shape

(279, 8)

Note that the listing contains 279 hospitals.  Now determine how many of them are actually open

In [440]:
dfhospopen = dfhospadd[dfhospadd['Status']=='OPEN'].copy() # copy so later assignments do not touch dfhospadd
dfhospopen.reset_index(drop=True) # note: the result is not assigned, so the original index is kept
dfhospopen.shape

(166, 8)

Reindex the dataframe and add columns for the latitude and longitude of the hospitals

In [441]:
#dfhospopen.assign(Latitude='',Longitude='')
dfhospopen = dfhospopen.reindex(columns = dfhospopen.columns.tolist() + ["Latitude","Longitude"])


### Use geopy Nominatim to convert the address of each hospital to a latitude and longitude.
If something in the address (e.g. the address is a P.O. Box) prevents geocode from returning a latitude and longitude,
then use the city where the hospital is located to find the latitude and longitude.  While this may introduce a small error, I do not expect it to affect the final result, as hospitals tend to be located near city centers.

In [443]:
# create the geocoder once and reuse it for every lookup
geolocator = Nominatim(timeout=10,user_agent="Hospital_explorer")
lat_col = dfhospopen.columns.get_loc('Latitude')
lon_col = dfhospopen.columns.get_loc('Longitude')

for ind in np.arange(dfhospopen.shape[0]):
    row = dfhospopen.iloc[ind]
    address = '{}, {}, {}, {}'.format(row['Address'],row['City'],row['State'],row['Zip'])
    hosplocation = geolocator.geocode(address)
    if hosplocation is None:
        # fall back to the city when the street address cannot be geocoded
        address = '{}, {}, {}'.format(row['City'],row['State'],row['Zip'])
        hosplocation = geolocator.geocode(address)
    if hosplocation is not None:
        # positional assignment avoids the chained-indexing SettingWithCopyWarning
        dfhospopen.iloc[ind, lat_col] = hosplocation.latitude
        dfhospopen.iloc[ind, lon_col] = hosplocation.longitude


In [320]:
dfhospopen.head()

Unnamed: 0,Opcert,Name,Address,City,State,Zip,Status,Type,Latitude,Longitude
0.0,3801000H,A.O. FOX MEMORIAL HOSPITAL,ONE NORTON AVE,ONEONTA,NY,13820,OPEN,,42.457473,-75.052216
1.0,1623001H,ADIRONDACK MEDICAL CENTER - SARANAC LAKE SITE,"2233 RT 86, PO BOX 471",SARANAC LAKE,NY,12983,OPEN,,44.330927,-74.132613
3.0,0101000H,ALBANY MEDICAL CENTER HOSPITAL,PO BOX 619,ALBANY,NY,12201,OPEN,,42.59869,-73.9844
7.0,0701000H,ARNOT OGDEN MEDICAL CENTER,600 ROE AVE,ELMIRA,NY,14905,OPEN,,42.100184,-76.827271
8.0,0501000H,AUBURN COMMUNITY HOSPITAL,LANSING ST,AUBURN,NY,13021,OPEN,,42.940992,-76.55716


We now have a list of open hospitals with their latitude and longitude.

The next step is to get the distance in miles between each village and its nearest hospital.
One method is to compute the distance between each village and every hospital and pick the minimum value for each village.

Build an array of villages vs. hospitals, starting by dropping the hospital columns that are no longer needed.

In [444]:
# Drop the columns that are not needed for the distance calculation.
dfhospopen.drop(columns=['Opcert','Status','Type'],inplace = True)
dfhospopen.head()

Unnamed: 0,Name,Address,City,State,Zip,Latitude,Longitude
0,A.O. FOX MEMORIAL HOSPITAL,ONE NORTON AVE,ONEONTA,NY,13820,42.457473,-75.052216
1,ADIRONDACK MEDICAL CENTER - SARANAC LAKE SITE,"2233 RT 86, PO BOX 471",SARANAC LAKE,NY,12983,44.330927,-74.132613
3,ALBANY MEDICAL CENTER HOSPITAL,PO BOX 619,ALBANY,NY,12201,42.59869,-73.9844
7,ARNOT OGDEN MEDICAL CENTER,600 ROE AVE,ELMIRA,NY,14905,42.100184,-76.827271
8,AUBURN COMMUNITY HOSPITAL,LANSING ST,AUBURN,NY,13021,42.940992,-76.55716


## Find the distance between each village and each hospital.
### We use geopy's distance module to compute this value. It uses the great-circle method, which gives the shortest 'as the crow flies' distance between two locations.
We will build an array of villages vs. hospitals with the distance in miles.
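
To make the great-circle idea concrete, here is a minimal sketch using the haversine formula, one common way to compute it. It is an illustration only: geopy's own implementation may differ in detail, `haversine_miles` is a hypothetical helper, and the Earth radius is an assumed round value. The sample coordinates are those of the first two hospitals listed in the table above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed round value


def haversine_miles(point_a, point_b):
    """Great-circle distance in miles between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine term: squared half-chord length between the two points.
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# A.O. Fox Memorial Hospital (Oneonta) and Adirondack Medical Center (Saranac Lake).
oneonta = (42.457473, -75.052216)
saranac_lake = (44.330927, -74.132613)
print(round(haversine_miles(oneonta, saranac_lake), 1))
```

Because the calculation treats the Earth as a sphere, the result is a straight-line distance and will usually be shorter than the driving distance.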

In [445]:
from geopy import distance

# vilhosp[v][h] holds the great-circle distance in miles from village v to hospital h.
vilhosp = []
for vil in range(df.shape[0]):
    village = (df['Latitude'].iloc[vil], df['Longitude'].iloc[vil])
    vilhosp.append([])
    # Note: the -3 leaves out the last three hospital rows, so each village gets 163 distances.
    for hosp in range(dfhospopen.shape[0]-3):
        hospital = (dfhospopen['Latitude'].iloc[hosp], dfhospopen['Longitude'].iloc[hosp])
        vilhosp[vil].append(distance.great_circle(village, hospital).miles)

Convert the array to a dataframe and check its shape.

In [446]:
#vilhosp
dfvilhosp=pd.DataFrame(vilhosp)

dfvilhosp.shape

(533, 163)

Get the village names to use as labels for the dataframe, then take the minimum distance in each row, which is the distance from that village to its nearest hospital.

In [448]:
dfvillnames = df['Village']

In [449]:
#result = pd.concat([dfvillnames, dfvilhosp], axis=1, ignore_index=True)
result = pd.concat([dfvillnames, dfvilhosp.min(axis =1)], axis=1, ignore_index=True)

In [450]:
result.rename(columns={0:'Village',1:'Hospital Distance'}, inplace=True)

In [451]:
result.head()

Unnamed: 0,Village,Hospital Distance
0,Adams,11.975383
1,Addison,9.459687
2,Afton,20.770746
3,Airmont,2.066487
4,Akron,10.94827
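
The row minimum gives the distance to the nearest hospital but not which hospital it is. As a minimal sketch of one way to recover that, `idxmin(axis=1)` returns the column label of the smallest value in each row. The village names, hospital names, and distances below are made up for illustration and are not taken from the study data.

```python
import pandas as pd

# Toy distance matrix: rows are villages, columns are hospitals (made-up miles).
distances = pd.DataFrame(
    [[12.0, 40.5, 7.3],
     [55.1, 9.4, 30.2]],
    index=["Village A", "Village B"],
    columns=["Hospital X", "Hospital Y", "Hospital Z"],
)

nearest = pd.DataFrame({
    "Nearest Hospital": distances.idxmin(axis=1),  # column label of the row minimum
    "Hospital Distance": distances.min(axis=1),    # the minimum itself
})
print(nearest)
```

With the notebook's data, the column labels would first need to be set to the hospital names for `idxmin` to return names and not integer positions.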


# Concluding analysis:
### Build dataframes with villages that are:
 5 miles or less from a hospital, 
 more than 5 and up to 10 miles from a hospital, 
 more than 10 and up to 20 miles from a hospital,
 and more than 20 miles from a hospital

In [452]:
hospclose = result.loc[result['Hospital Distance'] <= 5]
hospmid = result.loc[(result['Hospital Distance'] > 5) & (result['Hospital Distance'] <= 10 )]
hospmidtofar = result.loc[(result['Hospital Distance'] > 10) & (result['Hospital Distance'] <= 20) ]
hospfar = result.loc[result['Hospital Distance'] > 20]

In [453]:
print('Number of villages 5 miles or less from a hospital: ',hospclose.shape[0])
print('Number of villages 5 to 10 miles from a hospital: ',hospmid.shape[0])
print('Number of villages 10 to 20 miles from a hospital: ',hospmidtofar.shape[0])
print('Number of villages more than 20 miles from a hospital: ',hospfar.shape[0])

Number of villages 5 miles or less from a hospital:  218
Number of villages 5 to 10 miles from a hospital:  134
Number of villages 10 to 20 miles from a hospital:  163
Number of villages more than 20 miles from a hospital:  18
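
The same four bands can also be produced in one step with `pd.cut`, shown here as an alternative sketch and not as the method used above. The sample distances are the first five rows of `result`; the band labels are assumptions chosen for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "Village": ["Adams", "Addison", "Afton", "Airmont", "Akron"],
    "Hospital Distance": [11.975383, 9.459687, 20.770746, 2.066487, 10.94827],
})

# Right-closed bins match the <= comparisons: [0, 5], (5, 10], (10, 20], (20, inf).
bins = [0, 5, 10, 20, float("inf")]
labels = ["<= 5", "5 to 10", "10 to 20", "> 20"]
sample["Band"] = pd.cut(
    sample["Hospital Distance"], bins=bins, labels=labels, include_lowest=True
)

# Count villages per band, keeping the bands in distance order.
counts = sample["Band"].value_counts().sort_index()
print(counts)
```

Keeping the band as a column on one dataframe avoids maintaining four separate dataframes and makes it easy to change the cut-off distances later.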


## Identify those villages that are more than 20 miles from the nearest hospital

In [455]:
print('The following villages are more than 20 miles from the nearest hospital and are candidates for some kind of emergency medical service')
print(hospfar)

The following villages are more than 20 miles from the nearest hospital and are candidates for some kind of emergency medical service
            Village  Hospital Distance
2             Afton          20.770746
64        Cambridge          21.479638
73     Cape Vincent          23.389949
128   Dering Harbor          20.485303
138    East Hampton          24.966050
179     Gainesville          23.021352
189       Granville          20.254532
228          Hunter          22.034637
244          Lacona          23.506046
333         Nichols          22.781216
394         Pulaski          20.776109
412    Rouses Point          20.719803
418      Sag Harbor          20.504951
419      Sagaponack          21.152290
422     Sandy Creek          23.804047
440  Silver Springs          20.988831
457      Speculator          35.784296
518       Whitehall          20.973112
