# Virginia Hospital Escapes - A comparison of the venues surrounding Virginia hospitals
## IBM Capstone Project
#### Author Micah C. Gray

## Introduction

If you have ever spent a night in a hospital, either as a patient or a staff member, you were probably grateful for any contact with the outside world. You might also have appreciated having venues nearby for food, prescriptions, flowers, or just a path with some fresh air. This report compares hospitals in the state of Virginia with respect to their surrounding venues, with the intent that your next hospital stay in Virginia is a little more freeing.

This analysis draws on location data obtained from Foursquare to explore the diversity of venues surrounding the 200-plus hospitals in Virginia. I identify the five most common venue types surrounding each hospital. I also list hospitals with few or no options for nearby food, pharmacies, or nature walks.

## Data

Data about the hospitals in Virginia, including the geocoordinates, were obtained from http://www.lat-long.com/ while data about the venues surrounding the hospitals were obtained from Foursquare using a 1000 meter radius.

### Data extraction and pre-conditioning

My first step was to obtain the hospital data. I copied the results of my Virginia hospital search from Lat-Long.com and pasted them into an Excel spreadsheet. I had to add the latitude and longitude values individually, but it was a manageable task given the size of my data. Next I saved the spreadsheet and loaded it into my Jupyter notebook as a pandas dataframe, as shown below.

In [1]:
# Loading the Excel spreadsheet from my local computer into the IBM Watson project

# Some code removed to conceal IBM credentials

df_hospitals = pd.read_excel(body)
df_hospitals.head()


Unnamed: 0,Name,Feature Type,County,State,Latitude,Longitude
0,A B Adams Convalescent Center,Hospital,Emporia (city),VA,36.685705,-77.537758
1,A D Williams Memorial Clinic,Hospital,Richmond,VA,37.539869,-77.430261
2,Access Emergency Hospital,Hospital,Fairfax,VA,38.965666,-77.35693
3,Albemarle County Health Department,Hospital,Charlottesville (city),VA,38.042083,-78.482789
4,Alexander W Terrell Memorial Infirmary,Hospital,Lynchburg (city),VA,37.438475,-79.172247


In [2]:
# Next I save the dataframe as a .csv file for easy access
df_hospitals.to_csv('hospital_location_data.csv', index = None)

In [3]:
import pandas as pd
df_hospitals2 = pd.read_csv('hospital_location_data.csv')
df_hospitals2.head()

Unnamed: 0,Name,Feature Type,County,State,Latitude,Longitude
0,A B Adams Convalescent Center,Hospital,Emporia (city),VA,36.685705,-77.537758
1,A D Williams Memorial Clinic,Hospital,Richmond,VA,37.539869,-77.430261
2,Access Emergency Hospital,Hospital,Fairfax,VA,38.965666,-77.35693
3,Albemarle County Health Department,Hospital,Charlottesville (city),VA,38.042083,-78.482789
4,Alexander W Terrell Memorial Infirmary,Hospital,Lynchburg (city),VA,37.438475,-79.172247


### Cleanup
Now I begin cleaning up the hospital data.

In [4]:
# First I remove historical hospitals by dropping rows that have "historical" in the Name field.
bool_historical = df_hospitals2['Name'].str.contains('historical')
df_hospitals3=df_hospitals2[~bool_historical] # apply the boolean mask to the dataframe and save with a new name
df_hospitals3.head(14) # Let's see if it worked

Unnamed: 0,Name,Feature Type,County,State,Latitude,Longitude
0,A B Adams Convalescent Center,Hospital,Emporia (city),VA,36.685705,-77.537758
1,A D Williams Memorial Clinic,Hospital,Richmond,VA,37.539869,-77.430261
2,Access Emergency Hospital,Hospital,Fairfax,VA,38.965666,-77.35693
3,Albemarle County Health Department,Hospital,Charlottesville (city),VA,38.042083,-78.482789
4,Alexander W Terrell Memorial Infirmary,Hospital,Lynchburg (city),VA,37.438475,-79.172247
5,Alleghany Memorial Hospital,Hospital,Covington (city),VA,37.794847,-79.999502
6,Alleghany Regional Hospital,Hospital,Alleghany,VA,37.792204,-79.88099
7,Andrew Rader Clinic,Hospital,Arlington,VA,38.87039,-77.07609
8,Arlington Free Clinic,Hospital,Arlington,VA,38.882449,-77.105439
9,Ashland Convalescent Center,Hospital,Hanover,VA,37.767643,-77.49554


You can see that row 13 was dropped. Now I will reset the index.

In [5]:
df_hospitals3.reset_index(inplace=True, drop = True)
df_hospitals3.head(14)

Unnamed: 0,Name,Feature Type,County,State,Latitude,Longitude
0,A B Adams Convalescent Center,Hospital,Emporia (city),VA,36.685705,-77.537758
1,A D Williams Memorial Clinic,Hospital,Richmond,VA,37.539869,-77.430261
2,Access Emergency Hospital,Hospital,Fairfax,VA,38.965666,-77.35693
3,Albemarle County Health Department,Hospital,Charlottesville (city),VA,38.042083,-78.482789
4,Alexander W Terrell Memorial Infirmary,Hospital,Lynchburg (city),VA,37.438475,-79.172247
5,Alleghany Memorial Hospital,Hospital,Covington (city),VA,37.794847,-79.999502
6,Alleghany Regional Hospital,Hospital,Alleghany,VA,37.792204,-79.88099
7,Andrew Rader Clinic,Hospital,Arlington,VA,38.87039,-77.07609
8,Arlington Free Clinic,Hospital,Arlington,VA,38.882449,-77.105439
9,Ashland Convalescent Center,Hospital,Hanover,VA,37.767643,-77.49554
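
The filtering and re-indexing steps above can be reproduced on a toy frame. This is a minimal sketch with invented hospital names (not rows from the real dataset). The `na=False` argument and the `.copy()` call are my own additions, assumed here so that missing names do not break the mask and later edits act on an independent dataframe.

```python
import pandas as pd

# Toy data: the names below are invented for illustration only
toy = pd.DataFrame({
    "Name": ["Alpha Clinic", "Beta Hospital (historical)", None, "Gamma Infirmary"],
    "Latitude": [37.1, 37.2, 37.3, 37.4],
})

# True where the name contains the lowercase word "historical";
# missing names are treated as False rather than NaN
is_historical = toy["Name"].str.contains("historical", na=False)

# Keep the non-historical rows, copy them, and renumber the index from 0
kept = toy[~is_historical].copy().reset_index(drop=True)

print(kept)
```

Note that `str.contains` is case-sensitive by default, so a name spelled "Historical" would not be caught by this mask.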


Just in case there are missing values, I'll drop any rows that have missing Latitude coordinates.

In [6]:
print('shape prior to dropping missing values:', df_hospitals3.shape) # print the dimensions of the dataframe
# Reassign the result instead of using inplace=True on a filtered slice, which triggers pandas' SettingWithCopyWarning
df_hospitals3 = df_hospitals3.dropna(subset=["Latitude"], axis=0)
print('shape after dropping missing values:', df_hospitals3.shape) # print the dimensions again to see changes

shape prior to dropping missing values: (247, 6)
shape after dropping missing values: (247, 6)

The shape is unchanged, so none of the 247 hospitals was missing a latitude.


### Now get the Foursquare data

In [7]:
# Initialize Foursquare credentials
# Code removed to conceal credentials

In [8]:
# Import necessary libraries
import pandas as pd
import numpy as np
import requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, available since pandas 1.0)

In [9]:
# Define some parameters for the call to Foursquare
VERSION = '20180605' # Foursquare API version
radius = 1000 # Include venues within a 1 kilometer radius
INTENT = 'browse'
#search_query = 'Pharmacy' ## Optionally, search for food, garden, parks, walking trails

In [11]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
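
To see what the helper returns, it can be called on hand-written rows. The sketch below repeats the function (catching `KeyError` rather than every exception) and feeds it plain dictionaries that mimic the shape of a flattened venue record. The sample records are invented, not real API output.

```python
def get_category_type(row):
    """Return the name of the first category in a venue record, or None."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Invented records shaped like flattened venue rows
with_category = {'name': 'Example Park', 'categories': [{'name': 'Park'}]}
without_category = {'name': 'Unlabeled Place', 'categories': []}
prefixed_keys = {'venue.name': 'Example Cafe', 'venue.categories': [{'name': 'Cafe'}]}

print(get_category_type(with_category))
print(get_category_type(without_category))
print(get_category_type(prefixed_keys))
```

Only the first category is kept, so a venue with several categories is reduced to its first listed one.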

In [64]:
# Define a function that returns a dataframe of nearby venue data given a hospital name and a pair of coordinates
def getNearbyVenues(name, lat, lng, radius=500):
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&intent={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, INTENT)
    # send the get request
    results = requests.get(url).json()['response']['venues']
    nearby_venues = json_normalize(results) # flatten JSON and save as a dataframe
    
    #troubleshoot
    #print(nearby_venues.head()) # take a peek at the raw dataframe
    
    # filter columns
    filtered_columns = ['name', 'location.lat', 'location.lng', 'location.distance', 'categories']
    nearby_venues =nearby_venues.loc[:, filtered_columns]
    
    #troubleshoot
    #print(nearby_venues.head()) # take a peek at the raw dataframe
    
    # filter the category for each row
    nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)
    
    #troubleshoot
    #print('After filtering categories \n', nearby_venues.head()) # take a peek at the raw dataframe
    
    # clean columns. This part is not always necessary.
    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    #print('After cleaning columns \n', nearby_venues.head()) # take a peek at the raw dataframe
    
    # Tag every venue row with the hospital it belongs to; a scalar is broadcast to all rows.
    # (The earlier loop shadowed the built-in len and used .size, which counts cells rather than rows.)
    nearby_venues['Hospital'] = name
    
    #troubleshoot
    #print('After adding hospital name... \n', nearby_venues.head()) # take a peek at the raw dataframe
    
    # return the venue dataframe, now tagged with the hospital name
    return(nearby_venues) 
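
The request URL in `getNearbyVenues` is assembled by string formatting. As an alternative sketch, the query string can be built from a dictionary so that each parameter is named and URL-encoded. The helper name `build_search_url` and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in the function above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_search_url(client_id, client_secret, version, lat, lng, radius, intent):
    """Hypothetical helper: assemble the venue search URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'intent': intent,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials, not real ones
url = build_search_url('MY_ID', 'MY_SECRET', '20180605',
                       36.685705, -77.537758, 1000, 'browse')

# Parse the query string back to confirm the parameters survived encoding
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'])
```

With `requests`, the same dictionary could instead be passed through the `params=` argument of `requests.get`, which performs the encoding itself.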



In [65]:
# Here I call the function above to get a dataframe with venues for all hospitals in Virginia
# Start from a one-row placeholder frame; the placeholder row is removed after reloading the .csv
dictionary1 = {'name': ['value'], 'lat': ['NaN'],
               'lng': ['NaN'], 'distance': ['NaN'], 'categories': ['value'], 'Hospital': ['value']}
all_venues = pd.DataFrame(dictionary1)
for hospital, lat, lng in zip(df_hospitals3['Name'], df_hospitals3['Latitude'], df_hospitals3['Longitude']):
    # Append the nearby venue data for each hospital (pd.concat replaces DataFrame.append, removed in pandas 2.0)
    all_venues = pd.concat([all_venues, getNearbyVenues(hospital, lat, lng, radius)])

print('all_venues size:', all_venues.size) # .size counts cells (rows x columns), not rows
all_venues.head()

all_venues size: 44466


Unnamed: 0,name,lat,lng,distance,categories,Hospital
0,value,,,,value,value
0,Greensville County Courthouse,36.6866,-77.5423,417.0,Courthouse,A B Adams Convalescent Center
1,New Century Hospice - Emporia,36.6853,-77.5435,511.0,Medical Center,A B Adams Convalescent Center
2,Peggy Malone - State Farm Insurance Agent,36.6938,-77.538,898.0,Office,A B Adams Convalescent Center
3,Nationwide Insurance: Radke & Associates LLC,36.6846,-77.5432,504.0,Insurance Office,A B Adams Convalescent Center
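
An alternative to growing `all_venues` one hospital at a time is to collect the per-hospital frames in a list and concatenate once, which also avoids the placeholder row. The sketch below uses a stub named `fake_nearby_venues` (hypothetical, standing in for `getNearbyVenues` so that no API call is made) and two invented hospitals.

```python
import pandas as pd

def fake_nearby_venues(name, lat, lng):
    """Hypothetical stand-in for getNearbyVenues: returns two made-up venues."""
    return pd.DataFrame({
        'name': ['Venue A', 'Venue B'],
        'lat': [lat + 0.001, lat - 0.001],
        'lng': [lng, lng],
        'Hospital': name,  # scalar is broadcast to both rows
    })

# Invented hospitals with the same column names as the hospital dataframe
hospitals = pd.DataFrame({
    'Name': ['Hospital One', 'Hospital Two'],
    'Latitude': [37.0, 38.0],
    'Longitude': [-77.0, -78.0],
})

# Collect one frame per hospital, then concatenate them in a single step
frames = [
    fake_nearby_venues(row.Name, row.Latitude, row.Longitude)
    for row in hospitals.itertuples(index=False)
]
all_toy_venues = pd.concat(frames, ignore_index=True)

print(all_toy_venues.shape)
```

Passing `ignore_index=True` gives the combined frame a fresh 0-based index, so no separate index reset is needed afterwards.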


In [66]:
#Save the dataframe to a .csv
all_venues.to_csv('all_venues_rough.csv')

In [8]:
# Get the dataframe from the .csv
import pandas as pd
all_venues = pd.read_csv('all_venues_rough.csv')

In [9]:
# Reset the index (reset_index returns a new dataframe, so the result must be assigned)
all_venues = all_venues.reset_index(drop=True)
all_venues.head()

Unnamed: 0.1,Unnamed: 0,name,lat,lng,distance,categories,Hospital
0,0,value,,,,value,value
1,0,Greensville County Courthouse,36.686604,-77.542302,417.0,Courthouse,A B Adams Convalescent Center
2,1,New Century Hospice - Emporia,36.685332,-77.543467,511.0,Medical Center,A B Adams Convalescent Center
3,2,Peggy Malone - State Farm Insurance Agent,36.693774,-77.538012,898.0,Office,A B Adams Convalescent Center
4,3,Nationwide Insurance: Radke & Associates LLC,36.684614,-77.543248,504.0,Insurance Office,A B Adams Convalescent Center


In [10]:
# Remove the null row and old index
all_venues.drop('Unnamed: 0', axis = 1, inplace = True)
all_venues.drop(labels = 0,axis = 0, inplace = True)
all_venues.head()

Unnamed: 0,name,lat,lng,distance,categories,Hospital
1,Greensville County Courthouse,36.686604,-77.542302,417.0,Courthouse,A B Adams Convalescent Center
2,New Century Hospice - Emporia,36.685332,-77.543467,511.0,Medical Center,A B Adams Convalescent Center
3,Peggy Malone - State Farm Insurance Agent,36.693774,-77.538012,898.0,Office,A B Adams Convalescent Center
4,Nationwide Insurance: Radke & Associates LLC,36.684614,-77.543248,504.0,Insurance Office,A B Adams Convalescent Center
5,Calvary Baptist Church,36.693578,-77.542884,988.0,Church,A B Adams Convalescent Center


In [11]:
# Rename the columns
all_venues.rename(columns={"name":"Venue","distance":"Meters from Hospital", "lat":"Venue Lat", "lng":"Venue Lng", "categories":"Category"}, inplace=True)

In [13]:
all_venues.head(35)

Unnamed: 0,Venue,Venue Lat,Venue Lng,Meters from Hospital,Category,Hospital
1,Greensville County Courthouse,36.686604,-77.542302,417.0,Courthouse,A B Adams Convalescent Center
2,New Century Hospice - Emporia,36.685332,-77.543467,511.0,Medical Center,A B Adams Convalescent Center
3,Peggy Malone - State Farm Insurance Agent,36.693774,-77.538012,898.0,Office,A B Adams Convalescent Center
4,Nationwide Insurance: Radke & Associates LLC,36.684614,-77.543248,504.0,Insurance Office,A B Adams Convalescent Center
5,Calvary Baptist Church,36.693578,-77.542884,988.0,Church,A B Adams Convalescent Center
6,Veteran's Memorial Park,36.688216,-77.540897,395.0,Park,A B Adams Convalescent Center
7,LifeSafer Ignition Interlock,36.692696,-77.539833,799.0,Automotive Shop,A B Adams Convalescent Center
8,U-Haul Neighborhood Dealer,36.69247,-77.53998,778.0,Storage Facility,A B Adams Convalescent Center
9,dr. adams foot care,36.694221,-77.543526,1078.0,Doctor's Office,A B Adams Convalescent Center
10,L.U. Online,36.696852,-77.541702,1289.0,Fraternity House,A B Adams Convalescent Center
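
The `Meters from Hospital` column can be sanity-checked against the coordinates. The sketch below computes the great-circle (haversine) distance between the first hospital and the first venue in the tables above. The function name `haversine_m` and the mean Earth radius of 6,371 km are my assumptions, and Foursquare may compute its distances differently.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6371000.0):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

hospital = (36.685705, -77.537758)  # A B Adams Convalescent Center
venue = (36.686604, -77.542302)     # Greensville County Courthouse

d = haversine_m(hospital[0], hospital[1], venue[0], venue[1])
print(round(d))
```

The result should land close to the 417 meters reported for this venue. Rows 9 and 10 of the table list distances of 1078 and 1289 meters, beyond the requested 1000 meter radius, so an explicit filter on this column may be worth adding if a strict cut-off matters.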


In [14]:
# Let's see how many venues were returned for each hospital

all_venues.groupby('Hospital').count() # Just 30 venues per hospital

Hospital,Venue,Venue Lat,Venue Lng,Meters from Hospital,Category
A B Adams Convalescent Center,30,30,30,30,27
A D Williams Memorial Clinic,30,30,30,30,30
Access Emergency Hospital,30,30,30,30,30
Albemarle County Health Department,30,30,30,30,30
Alexander W Terrell Memorial Infirmary,30,30,30,30,29
Alleghany Memorial Hospital,30,30,30,30,30
Alleghany Regional Hospital,30,30,30,30,29
Andrew Rader Clinic,30,30,30,30,28
Arlington Free Clinic,30,30,30,30,30
Ashland Convalescent Center,30,30,30,30,26


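From the results listed above, exactly 30 venues were returned for each hospital, which suggests a cap on the number of results per request rather than the true number of nearby venues. The Category count falls below 30 for some hospitals because a few venues have no category assigned. Hopefully 30 venues per hospital will be enough to provide some useful insights about the venues for escape near each hospital in Virginia.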
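
The introduction mentions finding the most common venue types per hospital and flagging hospitals that lack certain kinds of venue. A minimal sketch of both steps on an invented table follows. The column names match the renamed `all_venues`, but the rows and the choice of 'Pharmacy' as the category of interest are illustrative assumptions, not results.

```python
import pandas as pd

# Invented venue rows; None marks a venue without a category
toy_venues = pd.DataFrame({
    'Hospital': ['H1', 'H1', 'H1', 'H2', 'H2'],
    'Category': ['Park', 'Pharmacy', 'Park', 'Office', None],
})

# Count venues per (hospital, category); rows with a missing category are dropped by groupby
counts = (toy_venues.groupby(['Hospital', 'Category'])
          .size()
          .reset_index(name='Count')
          .sort_values(['Hospital', 'Count'], ascending=[True, False]))

# Keep up to five of the most common categories for each hospital
top5 = counts.groupby('Hospital').head(5)

# Hospitals with no venue in the category of interest
has_pharmacy = set(toy_venues.loc[toy_venues['Category'] == 'Pharmacy', 'Hospital'])
no_pharmacy = sorted(set(toy_venues['Hospital']) - has_pharmacy)

print(top5)
print(no_pharmacy)
```

Because the data holds at most 30 venues per hospital, a hospital flagged this way may still have such a venue nearby that simply was not among the returned results.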