# IBM Data Science Capstone Project

By Jonathan de Steuben

## Introduction:

In this project, we will identify areas within Washington, D.C. that would benefit from the placement of various healthcare facilities. The target audiences for this report will be local health officials, local elected officials, and healthcare providers.

While Washington, D.C. already has healthcare facilities, this report aims to locate areas of the city with a lower concentration of facilities, as well as areas that could be considered underserved. For each such area, we will determine which type of facility would fit best and make a recommendation.

## Data:

To address this problem, the report draws on open data from several sources. The following factors will influence the recommendations:

* Neighborhood demographic data
* Number of healthcare facilities within each neighborhood
* Type of healthcare facilities
* Distance between facilities (i.e. accessibility)
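
The facility-count and distance factors above could be computed along the following lines. This is a minimal sketch rather than the method used in the analysis: the haversine formula is one possible distance measure, the two neighborhood coordinates are taken from the neighborhood data sample in this report, and the facility locations, `RADIUS_KM`, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within(neighborhoods, facilities, radius_km):
    """Count the facilities within radius_km of each neighborhood center."""
    counts = {}
    for name, (lat, lon) in neighborhoods.items():
        counts[name] = sum(
            1 for f_lat, f_lon in facilities
            if haversine_km(lat, lon, f_lat, f_lon) <= radius_km
        )
    return counts


# (LAT, LON) pairs from the neighborhood data sample.
neighborhoods = {
    "Fort Stanton": (38.855658, -76.980348),
    "Congress Heights": (38.841077, -76.99795),
}

# Hypothetical facility locations, for illustration only.
facilities = [(38.8560, -76.9800), (38.8400, -76.9990), (38.9000, -77.0400)]

RADIUS_KM = 1.0  # assumed search radius
facility_counts = count_within(neighborhoods, facilities, RADIUS_KM)
```

A neighborhood with a low count at a given radius would be a candidate for closer review; the radius itself is a tuning choice.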

For this project, we will group neighborhoods into clusters. The following sources will be used to generate our data and subsequent recommendations:

* [Urban Institute](https://greaterdc.urban.org/data-explorer?geography=cl17) - neighborhood cluster demographic data
* [Washington, D.C. Open Data](https://opendata.dc.gov/) - geoJSON files for visualization
* [Foursquare API](https://developer.foursquare.com/docs/resources/categories) - location data for healthcare facilities

### Neighborhood Data Sample:

We will use each neighborhood's coordinates to find healthcare facilities within a specified radius. We will then overlay the demographic data on choropleth maps and heatmaps to gain insight into the neighborhood clusters.
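
As an illustration of the radius search, the sketch below builds a request URL for one neighborhood. It assumes the Foursquare v2 venue search endpoint and its `ll`, `radius`, and `categoryId` parameters; the credentials, the category ID, the version date, and the `build_search_url` helper are placeholders, not values from this project. No network call is made.

```python
from urllib.parse import urlencode

# Assumed endpoint: Foursquare v2 venue search.
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lon, radius_m, category_id, client_id, client_secret,
                     version="20180605", limit=50):
    """Build a venue-search URL centered on one neighborhood coordinate."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,               # API version date, YYYYMMDD
        "ll": f"{lat},{lon}",       # latitude,longitude of the search center
        "radius": radius_m,         # search radius in meters
        "categoryId": category_id,  # restrict results to one venue category
        "limit": limit,             # maximum number of venues to return
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


# Placeholder credentials and category ID; substitute real values before use.
url = build_search_url(
    lat=38.855658, lon=-76.980348,  # Fort Stanton, from the neighborhood sample
    radius_m=1000,
    category_id="HEALTHCARE_CATEGORY_ID",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
)
```

The resulting URL could then be fetched with an HTTP client, and the JSON response parsed for venue names, categories, and coordinates.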

In [3]:
# The code was removed by Watson Studio for sharing.

In [4]:
neighborhoods.head()

| Unnamed: 0 | OBJECTID | GIS_ID | NAME | WEB_URL | LABEL_NAME | DATELASTMODIFIED | X | Y | LON | LAT |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | nhood_050 | Fort Stanton | http://NeighborhoodAction.dc.gov | Fort Stanton | 2003-04-10T00:00:00.000Z | -76.980348 | 38.855658 | -76.980348 | 38.855658 |
| 1 | 2 | nhood_031 | Congress Heights | http://NeighborhoodAction.dc.gov | Congress Heights | 2003-04-10T00:00:00.000Z | -76.99795 | 38.841077 | -76.99795 | 38.841077 |
| 2 | 3 | nhood_123 | Washington Highlands | http://NeighborhoodAction.dc.gov | Washington Highlands | 2003-04-10T00:00:00.000Z | -76.995636 | 38.830237 | -76.995636 | 38.830237 |
| 3 | 4 | nhood_008 | Bellevue | http://NeighborhoodAction.dc.gov | Bellevue | 2003-04-10T00:00:00.000Z | -77.009271 | 38.826952 | -77.009271 | 38.826952 |
| 4 | 5 | nhood_073 | Knox Hill/Buena Vista | http://NeighborhoodAction.dc.gov | Knox Hill/Buena Vista | 2003-04-10T00:00:00.000Z | -76.96766 | 38.853688 | -76.96766 | 38.853688 |



### Demographic Data Sample:

Now that we've loaded the neighborhood coordinate data, we'll import a file with population statistics. Note that this file reports statistics per neighborhood cluster rather than per individual neighborhood.

In [5]:
# `client_ff094178277849deb6d0c79aab125c5e`, `types`, `pd`, and the `__iter__` helper are assumed to be defined in the hidden cell above.
body = client_ff094178277849deb6d0c79aab125c5e.get_object(Bucket='capstoneprojectworkshop-donotdelete-pr-bvmy1ome1hfq5t',Key='pop_cluster.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"):
    body.__iter__ = types.MethodType(__iter__, body)

# If you are reading an Excel file into a pandas DataFrame, replace `read_csv` by `read_excel` in the next statement.
population = pd.read_csv(body)
population.head()

| Unnamed: 0 | timeframe | cluster2017 | TotPop | TotPop_m | cluster2017_nf | start_date | end_date | PctPopUnder18Years | PctPop65andOverYears | PctForeignBorn | ... | PctForeignBorn_m | PctBlackNonHispBridge_m | PctWhiteNonHispBridge_m | PctHisp_m | PctAPINonHispBridge_m | PctFamiliesOwnChildFH_m | PctChgTotPop | PctChgPopUnder18Years | PctChgPop65andOverYear | indc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2012-16 | Kalorama Heights, Adams Morgan, Lanier Heights | 19593 | 769 | Cluster 1 | 1-Jan-12 | 31-Dec-16 | 7.9 | 9.9 | 20.0 | ... | 2.4 | 1.7 | 2.8 | 1.8 | 1.4 | 7.2 | X | X | X | 1 |
| 1 | 2012-16 | Columbia Heights, Mt. Pleasant, Pleasant Plain... | 51220 | 1649 | Cluster 2 | 1-Jan-12 | 31-Dec-16 | 15.0 | 7.5 | 24.0 | ... | 1.8 | 1.8 | 1.4 | 2.4 | 0.8 | 5.6 | X | X | X | 1 |
| 2 | 2012-16 | Howard University, Le Droit Park, Cardozo/Shaw | 14043 | 705 | Cluster 3 | 1-Jan-12 | 31-Dec-16 | 8.4 | 6.6 | 15.0 | ... | 2.4 | 3.8 | 3.0 | 2.3 | 1.2 | 13.0 | X | X | X | 1 |
| 3 | 2012-16 | Georgetown, Burleith/Hillandale | 14991 | 855 | Cluster 4 | 1-Jan-12 | 31-Dec-16 | 9.7 | 10.0 | 18.0 | ... | 2.8 | 2.6 | 2.2 | 1.9 | 1.6 | 11.0 | X | X | X | 1 |
| 4 | 2012-16 | West End, Foggy Bottom, GWU | 16779 | 895 | Cluster 5 | 1-Jan-12 | 31-Dec-16 | 2.5 | 9.6 | 24.0 | ... | 3.1 | 2.1 | 3.3 | 1.8 | 1.7 | 37.0 | X | X | X | 1 |
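
Because the demographic file is keyed by cluster while the coordinate file is keyed by neighborhood, the two need a common key before they can be combined. One possible approach, sketched below, splits each cluster's comma-separated neighborhood list into a lookup table. The rows are copied from the sample above; the `neighborhood_to_cluster` helper is hypothetical, and the sketch assumes neighborhood names are spelled identically in both files, which may not hold in practice.

```python
# Cluster labels and member neighborhoods, copied from the sample rows above.
cluster_rows = [
    ("Cluster 1", "Kalorama Heights, Adams Morgan, Lanier Heights"),
    ("Cluster 4", "Georgetown, Burleith/Hillandale"),
    ("Cluster 5", "West End, Foggy Bottom, GWU"),
]


def neighborhood_to_cluster(rows):
    """Map each neighborhood name to the label of the cluster containing it."""
    lookup = {}
    for label, members in rows:
        for name in members.split(","):
            lookup[name.strip()] = label
    return lookup


lookup = neighborhood_to_cluster(cluster_rows)
```

With such a lookup, each row of the `neighborhoods` dataframe could be tagged with its cluster label and then joined to the cluster-level statistics.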


Both pandas dataframes will be cleaned before analysis to make them easier to work with. We will also import additional demographic files for use in the final project.
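
The kind of cleaning this involves might look like the sketch below. The specific steps are assumptions for illustration, not the notebook's actual cleaning code: dropping the leftover `Unnamed: 0` column, treating `X` as a missing-value marker, and renaming `cluster2017` and `cluster2017_nf` to friendlier names. The small dataframe stands in for `population` and uses values from the sample above.

```python
import pandas as pd

# Stand-in for the `population` dataframe, with values from the sample above.
population = pd.DataFrame({
    "Unnamed: 0": [0, 3],
    "cluster2017": ["Kalorama Heights, Adams Morgan, Lanier Heights",
                    "Georgetown, Burleith/Hillandale"],
    "TotPop": [19593, 14991],
    "cluster2017_nf": ["Cluster 1", "Cluster 4"],
    "PctChgTotPop": ["X", "X"],
})

cleaned = (
    population
    .drop(columns=["Unnamed: 0"])       # leftover index column from the CSV export
    .replace("X", pd.NA)                # assumption: "X" marks a missing value
    .dropna(axis="columns", how="all")  # drop columns that hold no data at all
    .rename(columns={"cluster2017": "neighborhoods",   # hypothetical new names
                     "cluster2017_nf": "cluster"})
    .set_index("cluster")
)
```

Indexing by the cluster label keeps later lookups and joins against other cluster-level files straightforward.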