# IBM Data Science Capstone
## The Battle of Neighborhoods - Week Two

### Introduction:

Several large hospitals are located in the Texas Medical Center (TMC) in Houston, as well as in medical centers in other US states. Most of these hospitals have designated cafeteria space for patients, staff, and visitors. They try to offer a variety of options, including outsourced options such as franchises, casual dining places, self-serve stations, and traditional cafeteria food. The TMC is one of the world’s largest medical centers and as such receives thousands of visitors each year from around the country and the world.

### Background and Problem Description:

The TMC is a very attractive area to open a franchise or casual dining spot due to the large volume of visitors and staff in the area. However, a medical center in New York City is also attractive: it could offer more foot traffic and, being a smaller medical center, less competition. Several restaurants already exist in both areas, and it is important to understand the competition before opening a new restaurant. There are also several options for location and type of dining experience. The new restaurant will be fast casual, focusing on healthy options and offering free delivery to nearby hospitals by pedestrian couriers. The restaurant will be called Light Delights and has the funds to open in only one location. This report will explore the options of opening in either the TMC or the New York City medical center and choose the option with the least competition from healthy restaurants.

### Data and Methodology:

This project will use open source Python tools, such as pandas and NumPy, to conduct the descriptive statistical analysis and to create visualizations. Data cleaning, web scraping, and API calls will be used to gather data from multiple sources. The Foursquare API will be used to gather data about the restaurants in and around the Texas Medical Center and the New York City medical center, segment them by venue type, and compare the types within each medical center. That data will then be compared with cancer rate data to estimate how many patients are anticipated to visit each medical area. Patient volume is not readily available; however, cancer rates for each region are, and those will be used under the assumption that people get treated locally. The data for each area will help provide a recommendation of which area to offer services in.

## Load Needed Libraries and Connect to APIs

To start, we will load the libraries/packages into the environment and connect to the Foursquare API.

In [1]:
import numpy as np 
import pandas as pd 

import matplotlib.pyplot as plt
import seaborn as sns
# To see full dataframe...
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)

# library to handle requests
import requests
# library to handle JSON files
import json 

# convert an address into latitude and longitude values from Openstreetmap Data. https://nominatim.org/
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 
 
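# transform JSON file into a pandas dataframe
try:
    from pandas import json_normalize  # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize  # older pandas releases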

import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering
from sklearn.cluster import KMeans

# plotting library
!conda install -c conda-forge folium=0.5.0 --yes
import folium 

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

print('Imported Libraries...')


In [2]:
# @hidden_cell
# Credentials are read from environment variables instead of being hard-coded;
# the variable names below are an assumption, so use whatever names you export.
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
limit = 20
print('Your credentials:')
print('CLIENT_ID:Submitted!')
print('CLIENT_SECRET:Submitted!')

Your credentials:
CLIENT_ID:Submitted!
CLIENT_SECRET:Submitted!


## Obtain Venue Data from Foursquare for Restaurants in or near the Texas Medical Center's MD Anderson and Methodist Hospitals

First, we will obtain each hospital's coordinates by calling Nominatim's API with its address, then use those coordinates to create a data frame of restaurant info in the proximity of each hospital by calling Foursquare's API.

In [3]:
#All MD Anderson Restaurants
#Coordinates for Hospital
geolocator = Nominatim(user_agent="myGeocoder")
location = geolocator.geocode("1515 Holcombe Blvd, Houston, TX 77030")
latitude = location.latitude
longitude = location.longitude
VERSION = '20180604'
LIMIT = 60
print("Latitude = {}, Longitude = {}".format(location.latitude, location.longitude))

#Find Restaurants within 1000 meters

search_query = 'Restaurant'
radius = 1000
print(search_query + ' .... OK!')


#Define the corresponding URL

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

#Send the GET Request and examine the results
 
results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframeMDAnderson = json_normalize(venues)
dataframeMDAnderson.head()

Latitude = 29.70766895, Longitude = -95.39762257223782
Restaurant .... OK!


Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId
0,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",False,5809222a38fafba1ebee8ef8,6550 Bertner Ave,US,Houston,United States,,259,"[6550 Bertner Ave, Houston, TX 77030, United S...","[{'label': 'display', 'lat': 29.70999605434124...",29.709996,-95.397619,,77030,TX,Third Coast Restaurant,v-1588531164
1,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,50ca02c0245f2d4aa8c2b230,6633 Travis St,US,Houston,United States,Hilton Houston Plaza/Medical Center,632,[6633 Travis St (Hilton Houston Plaza/Medical ...,"[{'label': 'display', 'lat': 29.71007400897320...",29.710074,-95.40355,University Place,77030,TX,Garden Court Restaurant,v-1588531164


In [4]:
dataframeMDAnderson.shape

(2, 18)
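
The request URL in cell In [3] is assembled with `str.format`, which leaves the encoding of characters such as spaces to the HTTP client. A minimal sketch of an alternative follows, assuming a hypothetical helper named `build_search_url` (it is not part of the original notebook) that lets `urllib.parse.urlencode` encode each parameter. The endpoint, parameter names, and default values are copied from the cell above; the credentials passed in the example are placeholders.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL used in the notebook.
SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, latitude, longitude,
                     query, radius=1000, limit=60, version='20180604'):
    """Return a venues/search URL with every parameter URL-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials, for illustration only.
example_url = build_search_url('ID', 'SECRET', 29.70766895, -95.39762257223782,
                               'Healthy Restaurant')
print(example_url)
```

The same helper could serve all four queries in this notebook by changing only the coordinates and the `query` argument.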

In [5]:
#Healthy MD Anderson Restaurants
#Coordinates for Hospital
geolocator = Nominatim(user_agent="myGeocoder")
location = geolocator.geocode("1515 Holcombe Blvd, Houston, TX 77030")
latitude = location.latitude
longitude = location.longitude
VERSION = '20180604'
LIMIT = 60
print("Latitude = {}, Longitude = {}".format(location.latitude, location.longitude))

#Find Restaurants within 1000 meters

search_query = 'Healthy Restaurant'
radius = 1000
print(search_query + ' .... OK!')


#Define the corresponding URL

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

#Send the GET Request and examine the results

results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframeMDAndersonHealthy = json_normalize(venues)
dataframeMDAndersonHealthy.head()

Latitude = 29.70766895, Longitude = -95.39762257223782
Healthy Restaurant .... OK!


Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId
0,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",False,5809222a38fafba1ebee8ef8,6550 Bertner Ave,US,Houston,United States,,259,"[6550 Bertner Ave, Houston, TX 77030, United S...","[{'label': 'display', 'lat': 29.70999605434124...",29.709996,-95.397619,,77030,TX,Third Coast Restaurant,v-1588531165
1,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,50ca02c0245f2d4aa8c2b230,6633 Travis St,US,Houston,United States,Hilton Houston Plaza/Medical Center,632,[6633 Travis St (Hilton Houston Plaza/Medical ...,"[{'label': 'display', 'lat': 29.71007400897320...",29.710074,-95.40355,University Place,77030,TX,Garden Court Restaurant,v-1588531165


In [6]:
dataframeMDAndersonHealthy.shape

(2, 18)
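
Both Houston queries return two rows, so it is worth checking whether they are the same venues. The sketch below compares the venue `id` values shown in the two outputs above. With the live data frames, the sets could instead be built as `set(dataframeMDAnderson['id'])` and `set(dataframeMDAndersonHealthy['id'])`; the variable names used here are illustrative.

```python
# Venue ids copied from the 'Restaurant' output near MD Anderson.
all_ids = {'5809222a38fafba1ebee8ef8', '50ca02c0245f2d4aa8c2b230'}
# Venue ids copied from the 'Healthy Restaurant' output near MD Anderson.
healthy_ids = {'5809222a38fafba1ebee8ef8', '50ca02c0245f2d4aa8c2b230'}

shared = all_ids & healthy_ids        # venues returned by both queries
only_healthy = healthy_ids - all_ids  # venues unique to the healthy query

print('Shared venues:', len(shared))
print('Only in healthy query:', len(only_healthy))
```

Here every "Healthy Restaurant" hit is also a "Restaurant" hit, which suggests the query string alone may not isolate healthy venues.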

## Now use the data frames with venue information for all restaurants as well as healthy restaurants to plot maps of the venues' locations with folium

In [7]:
#Create a map of all restaurants using folium
mapMDAnderson = folium.Map(
    location=[latitude,longitude],
    tiles='cartodbpositron',
    zoom_start=16,
)
dataframeMDAnderson.apply(lambda row:folium.CircleMarker(location=[row["location.lat"], row["location.lng"]]).add_to(mapMDAnderson), axis=1)
mapMDAnderson

In [8]:
#Create a map of the Healthy restaurants using folium
mapMDAndersonHealthy = folium.Map(
    location=[latitude,longitude],
    tiles='cartodbpositron',
    zoom_start=16,
)
dataframeMDAndersonHealthy.apply(lambda row:folium.CircleMarker(location=[row["location.lat"], row["location.lng"]]).add_to(mapMDAndersonHealthy), axis=1)
mapMDAndersonHealthy


In [9]:
#All MSK (Memorial Sloan Kettering) Restaurants
#Coordinates for Hospital
geolocator = Nominatim(user_agent="myGeocoder")
location = geolocator.geocode("1275 York Avenue New York NY 10065")
latitude = location.latitude
longitude = location.longitude
VERSION = '20180604'
LIMIT = 60
print("Latitude = {}, Longitude = {}".format(location.latitude, location.longitude))

#Find Restaurants within 1000 meters

search_query = 'Restaurant'
radius = 1000
print(search_query + ' .... OK!')


#Define the corresponding URL

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

#Send the GET Request and examine the results

results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframeMSK = json_normalize(venues)
dataframeMSK.head()


Latitude = 40.76445335, Longitude = -73.95694407847097
Restaurant .... OK!


Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d146941735', 'name': 'D...",66149.0,/delivery_provider_seamless_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",seamless,https://www.seamless.com/menu/pj-bernstein-del...,False,4abf9767f964a520399120e3,1215 3rd Ave,US,New York,United States,btwn 70th & 71st St,599,"[1215 3rd Ave (btwn 70th & 71st St), New York,...","[{'label': 'display', 'lat': 40.76857336605853...",40.768573,-73.961528,Upper East Side,10075,NY,PJ Bernstein Deli Restaurant,v-1588531167,
1,"[{'id': '4bf58dd8d48988d149941735', 'name': 'T...",622279.0,/delivery_provider_seamless_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",seamless,https://www.seamless.com/menu/thep-thai-restau...,False,5997796d603d2a7019b8c58e,1439 2nd Ave,US,New York,United States,75th St,700,"[1439 2nd Ave (75th St), New York, NY 10021, U...","[{'label': 'display', 'lat': 40.77074334700357...",40.770743,-73.95706,,10021,NY,THEP Thai Restaurant,v-1588531167,451274738.0
2,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",,,,,,,False,4c056685d3842d7fea21be41,227 E 56th St,US,New York,United States,btwn 2nd & 3rd Ave,1023,"[227 E 56th St (btwn 2nd & 3rd Ave), New York,...","[{'label': 'display', 'lat': 40.7592264, 'lng'...",40.759226,-73.966934,,10022,NY,"Lips Drag Queen Show Palace, Restaurant & Bar",v-1588531167,41620102.0
3,"[{'id': '4bf58dd8d48988d147941735', 'name': 'D...",,,,,,,False,4bcb13850687ef3b3af4dccc,965 1st Ave,US,New York,United States,at E 53rd St.,1174,"[965 1st Ave (at E 53rd St.), New York, NY 100...","[{'label': 'display', 'lat': 40.75579429438176...",40.755794,-73.964896,,10022,NY,Madison Restaurant,v-1588531167,
4,"[{'id': '4f04af1f2fb6e1c99f3db0bb', 'name': 'T...",,,,,,,False,4b49123af964a520b16426e3,1435 2nd Ave,US,New York,United States,74th Street,663,"[1435 2nd Ave (74th Street), New York, NY 1002...","[{'label': 'display', 'lat': 40.770399, 'lng':...",40.770399,-73.957518,,10021,NY,Sultan Restaurant - Turkish,v-1588531167,


In [10]:
dataframeMSK.shape

(50, 25)
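
Some rows in the preview above report a `location.distance` greater than the 1000 m radius that was requested (1023 and 1174). A minimal sketch of a post-filter follows, assuming the comparison should count only venues inside the radius. The records copy the names and distances of the five preview rows and mimic the nested layout of the `venues` list; on the data frame the equivalent filter would be `dataframeMSK[dataframeMSK['location.distance'] <= radius]`.

```python
radius = 1000  # metres, as in the notebook

# Names and distances copied from the five preview rows above.
venues = [
    {'name': 'PJ Bernstein Deli Restaurant', 'location': {'distance': 599}},
    {'name': 'THEP Thai Restaurant', 'location': {'distance': 700}},
    {'name': 'Lips Drag Queen Show Palace, Restaurant & Bar',
     'location': {'distance': 1023}},
    {'name': 'Madison Restaurant', 'location': {'distance': 1174}},
    {'name': 'Sultan Restaurant - Turkish', 'location': {'distance': 663}},
]

# Keep only venues whose reported distance is inside the search radius.
within_radius = [v for v in venues if v['location']['distance'] <= radius]

for venue in within_radius:
    print(venue['name'], venue['location']['distance'])
```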

In [11]:
#Healthy MSK (Memorial Sloan Kettering) Restaurants
#Coordinates for Hospital
geolocator = Nominatim(user_agent="myGeocoder")
location = geolocator.geocode("1275 York Avenue New York NY 10065")
latitude = location.latitude
longitude = location.longitude
VERSION = '20180604'
LIMIT = 60
print("Latitude = {}, Longitude = {}".format(location.latitude, location.longitude))

#Find Restaurants within 1000 meters

search_query = 'Healthy Restaurant'
radius = 1000
print(search_query + ' .... OK!')


#Define the corresponding URL

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

#Send the GET Request and examine the results

results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframeMSKHealthy = json_normalize(venues)
dataframeMSKHealthy.head()


Latitude = 40.76445335, Longitude = -73.95694407847097
Healthy Restaurant .... OK!


Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d146941735', 'name': 'D...",66149.0,/delivery_provider_seamless_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",seamless,https://www.seamless.com/menu/pj-bernstein-del...,False,4abf9767f964a520399120e3,1215 3rd Ave,US,New York,United States,btwn 70th & 71st St,599,"[1215 3rd Ave (btwn 70th & 71st St), New York,...","[{'label': 'display', 'lat': 40.76857336605853...",40.768573,-73.961528,Upper East Side,10075,NY,PJ Bernstein Deli Restaurant,v-1588531168,
1,"[{'id': '52e81612bcbc57f1066b7a3a', 'name': 'C...",,,,,,,False,4c74f9d78d70b7137253d4ad,121 E 60th St,US,New York,United States,Park & Lexington Ave.,973,"[121 E 60th St (Park & Lexington Ave.), New Yo...","[{'label': 'display', 'lat': 40.76319412926730...",40.763194,-73.968365,,10022,NY,Healthy Choice Chiropractic,v-1588531168,
2,"[{'id': '4bf58dd8d48988d177941735', 'name': 'D...",,,,,,,False,518d2470498ea3305f83fe0b,65 E 76th St,US,New York,United States,,1166,"[65 E 76th St, New York, NY 10021, United States]","[{'label': 'display', 'lat': 40.774095, 'lng':...",40.774095,-73.962366,,10021,NY,The Healthy Memory And Aging Services,v-1588531168,
3,"[{'id': '4bf58dd8d48988d149941735', 'name': 'T...",622279.0,/delivery_provider_seamless_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",seamless,https://www.seamless.com/menu/thep-thai-restau...,False,5997796d603d2a7019b8c58e,1439 2nd Ave,US,New York,United States,75th St,700,"[1439 2nd Ave (75th St), New York, NY 10021, U...","[{'label': 'display', 'lat': 40.77074334700357...",40.770743,-73.95706,,10021,NY,THEP Thai Restaurant,v-1588531168,451274738.0
4,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",,,,,,,False,4c056685d3842d7fea21be41,227 E 56th St,US,New York,United States,btwn 2nd & 3rd Ave,1023,"[227 E 56th St (btwn 2nd & 3rd Ave), New York,...","[{'label': 'display', 'lat': 40.7592264, 'lng'...",40.759226,-73.966934,,10022,NY,"Lips Drag Queen Show Palace, Restaurant & Bar",v-1588531168,41620102.0


In [12]:
dataframeMSKHealthy.shape

(50, 25)
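
The "Healthy Restaurant" preview includes venues such as Healthy Choice Chiropractic, which appear to have matched on name rather than being places to eat. One possible way to screen these out is to inspect the first entry of each venue's `categories` list. The sketch below is illustrative only: the venue records, the category names, and the `FOOD_KEYWORDS` tuple are all hypothetical, because the real category names are truncated in the output above.

```python
# Hypothetical keyword list; adjust it to the category names actually returned.
FOOD_KEYWORDS = ('restaurant', 'deli', 'cafe', 'salad')


def primary_category(venue):
    """Return the name of the venue's first category, or '' if it has none."""
    categories = venue.get('categories') or []
    return categories[0].get('name', '') if categories else ''


def looks_like_food(venue):
    """True when the primary category name contains one of the food keywords."""
    name = primary_category(venue).lower()
    return any(keyword in name for keyword in FOOD_KEYWORDS)


# Hypothetical records shaped like entries of results['response']['venues'].
sample_venues = [
    {'name': 'Venue A', 'categories': [{'name': 'Thai Restaurant'}]},
    {'name': 'Venue B', 'categories': [{'name': 'Chiropractor'}]},
    {'name': 'Venue C', 'categories': []},
]

food_venues = [v['name'] for v in sample_venues if looks_like_food(v)]
print(food_venues)
```

A keyword match is a blunt instrument; matching on category `id` values would be stricter, but those ids would need to be looked up in the Foursquare category list first.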

In [13]:
#Create a map of all restaurants using folium
mapMSK = folium.Map(
    location=[latitude,longitude],
    tiles='cartodbpositron',
    zoom_start=16
)
dataframeMSK.apply(lambda row:folium.CircleMarker(location=[row["location.lat"], row["location.lng"]]).add_to(mapMSK), axis=1)
mapMSK

In [14]:
#Create a map of healthy restaurants using folium
mapMSKHealthy = folium.Map(
    location=[latitude,longitude],
    tiles='cartodbpositron',
    zoom_start=14,
)
dataframeMSKHealthy.apply(lambda row:folium.CircleMarker(location=[row["location.lat"], row["location.lng"]]).add_to(mapMSKHealthy), axis=1)
mapMSKHealthy


## Conclusion

Based on an analysis of the number of all restaurants, as well as healthy restaurants, near two prominent cancer centers of comparable size, located in Houston and New York, Houston's MD Anderson Cancer Center would be the better choice, since it has fewer of both: the Foursquare searches returned 2 venues near MD Anderson for each query, versus 50 near MSK. Also, Houston showed a greater percent growth of cancer volumes year over year (YOY) than the New York area according to a ranking compiled by [Men's Health](https://www.menshealth.com/health/a19531725/10-most-and-least-cancer-prone-cities/ "The 10 Worst Cities For Your Cancer Risk—and the 10 That Keep You Safest"), which should allow for growth in customer volumes over time.
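
The comparison behind this conclusion can be sketched from the row counts reported by the `.shape` calls above (2 rows for each Houston query, 50 for each New York query). The dictionary layout and the label strings are assumptions made for illustration, and the raw counts are not corrected for venues that fall outside the radius or are not restaurants.

```python
# Row counts taken from the .shape outputs in this notebook.
venue_counts = {
    'MD Anderson (Houston)': {'all': 2, 'healthy': 2},
    'MSK (New York)': {'all': 50, 'healthy': 50},
}

for area, counts in venue_counts.items():
    print('{}: {} restaurants, {} healthy'.format(
        area, counts['all'], counts['healthy']))

# Pick the area with the fewest healthy-restaurant results.
least_competition = min(venue_counts,
                        key=lambda area: venue_counts[area]['healthy'])
print('Least healthy-restaurant competition:', least_competition)
```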