
# Vital Signs Data Supplement 

A notebook to pull additional data from the Baltimore Neighborhood Indicators Alliance (BNIA) APIs. The plan is to pull every available indicator that I didn't pull previously, and to pull the Baltimore City values for all indicators.

## Set Up

In [2]:
# clone the GitHub repository so that we have all the necessary files
!git clone https://github.com/ejf78/cdc_vitalsigns.git

Cloning into 'cdc_vitalsigns'...
remote: Enumerating objects: 180, done.
remote: Counting objects: 100% (180/180), done.
remote: Compressing objects: 100% (159/159), done.
remote: Total 180 (delta 90), reused 69 (delta 21), pack-reused 0
Receiving objects: 100% (180/180), 75.63 MiB | 10.82 MiB/s, done.
Resolving deltas: 100% (90/90), done.
Checking out files: 100% (42/42), done.


In [3]:
!pip install geopandas

Collecting geopandas
  Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
Collecting pyproj>=2.2.0
  Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
Collecting fiona>=1.8
  Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1


In [4]:
# load packages
import pandas as pd
import numpy as np
import os # for navigating directories
import requests # for API pull 
import geopandas as gpd

In [5]:
# navigate into the directory
os.chdir("cdc_vitalsigns")

## Pull from APIs

In [41]:
# API info
# read the list of indicators
api_df = pd.read_csv("archive/VS-Indicator-APIs_EF.csv") # new version: the 'pull' column marks which API calls to make
api_df.set_index("ShortName", inplace=True, drop = False) # drop = False keeps ShortName as a column as well as the index
# add a column for the indicator name used in my own data (ShortName minus the "XX" year placeholder)
api_df["indicator"] = [string.replace("XX","") if isinstance(string, str) else None for string in api_df.ShortName]
# get the full list of indicators we intend to pull
full_indicator_list = set(api_df[api_df.pull == 1].indicator)
api_df.head()

ShortName (index),Indicator Number,Indicator,ShortName,Section,API,pull,indicator
tpopXX,1,Total Population,tpopXX,Census Demographics,https://services1.arcgis.com/mVFRs7NF4iFitgbY/...,1,tpop
maleXX,2,Total Male Population,maleXX,Census Demographics,https://services1.arcgis.com/mVFRs7NF4iFitgbY/...,1,male
femaleXX,3,Total Female Population,femaleXX,Census Demographics,https://services1.arcgis.com/mVFRs7NF4iFitgbY/...,1,female
paaXX,4,Percent of Residents - Black/African-American ...,paaXX,Census Demographics,https://services1.arcgis.com/mVFRs7NF4iFitgbY/...,1,paa
pwhiteXX,5,Percent of Residents - White/Caucasian (Non-Hi...,pwhiteXX,Census Demographics,https://services1.arcgis.com/mVFRs7NF4iFitgbY/...,1,pwhite
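The `indicator` column strips the `XX` year placeholder from each `ShortName`. A minimal vectorized sketch of the same step, on a small hypothetical frame (`api_demo` and its rows are made up for illustration). pandas' `.str` accessor passes missing entries through as missing, much as the list comprehension maps non-strings to `None`:

```python
import pandas as pd

# Hypothetical stand-in for api_df; the real table is archive/VS-Indicator-APIs_EF.csv
api_demo = pd.DataFrame({
    "ShortName": ["tpopXX", "maleXX", None, "drvaloneXX"],
    "pull": [1, 1, 0, 1],
})

# Strip the "XX" year placeholder; missing ShortNames stay missing (NaN)
api_demo["indicator"] = api_demo["ShortName"].str.replace("XX", "", regex=False)

# Indicators flagged for pulling (pull == 1), ignoring any missing names
to_pull = set(api_demo.loc[api_demo["pull"] == 1, "indicator"].dropna())
print(sorted(to_pull))  # ['drvalone', 'male', 'tpop']
```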


In [155]:
# making use of previously created functions
def getGDFfromURL(url, layer=0):
    # GDF stands for GeoDataFrame; this is the innermost function, called by getGDF
    # ArcGIS REST query: all rows (where=1=1), all fields, WGS84 coordinates (EPSG:4326), JSON output
    tail = "/" + str(layer) + "/query?where=1%3D1&outFields=*&outSR=4326&f=json"
    url += tail
    print(url)
    # EF edits: error handling for large batches
    try:
        gdf = gpd.read_file(url) # GeoPandas can read the API response directly, given the right URL
    except Exception as e: # catch request/parsing errors (e.g. error 400) without swallowing KeyboardInterrupt
        print(f"  failed: {e}")
        gdf = pd.DataFrame() # empty placeholder so that batch pulls keep going
    return gdf

def getGDF(shortname, level=0):
    # This is the outermost function, called by the user; it calls getGDFfromURL
    url = api_df.loc[shortname, "API"]
    return getGDFfromURL(url, level)

def getCollect(check_list, level = 0): # slight edit: I added level to this function
    # This function collects all the target GDFs into a list
    collect = []
    for shortname in check_list:
        gdf = getGDF(shortname, level)
        collect.append(gdf)
    return collect
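The hand-written `tail` in `getGDFfromURL` is a URL-encoded ArcGIS REST query string. A minimal sketch that builds the same string from named parameters with the standard library, so each setting is readable; `build_query_url` is a hypothetical helper, not part of the notebook:

```python
from urllib.parse import urlencode

def build_query_url(service_url, layer=0):
    """Build the query URL that getGDFfromURL assembles by hand (hypothetical helper)."""
    params = {
        "where": "1=1",    # no filter: return every row
        "outFields": "*",  # return every attribute field
        "outSR": 4326,     # geometry as WGS84 longitude/latitude
        "f": "json",       # response format
    }
    # safe="*" leaves the wildcard unescaped; "=" inside "1=1" becomes %3D
    return f"{service_url.rstrip('/')}/{layer}/query?{urlencode(params, safe='*')}"

base = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Tpop/FeatureServer"
url = build_query_url(base, layer=1)
print(url)
```

With `layer=1` this reproduces the Tpop URL printed by the Baltimore City pull further down; only the layer index differs between the CSA-level and citywide queries.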

#### Pulling new CSA-level values 

In [43]:
### which indicators are new to pull? 

# get list of indicators that already exist in the data 
existing_df = pd.read_csv("full_vital_signs.csv")
# identify the new ones
new_indicators = full_indicator_list - set(existing_df.indicator)
# but now we need the shortnames again 
new_indicator_shortnames = list(api_df[api_df.indicator.isin(new_indicators)].ShortName)
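The cell above is a set difference followed by a lookup back into the API table. A small sketch on hypothetical data (names chosen so they don't overwrite the notebook's real variables) shows why the lookup goes through `isin`: a set has no order, while filtering the table keeps its row order, which keeps positional slices like `[:21]` stable between runs:

```python
import pandas as pd

# Hypothetical stand-ins for the real objects (names chosen to avoid clobbering them)
flagged = {"tpop", "male", "cashsa", "lights"}                          # like full_indicator_list
existing_demo = pd.DataFrame({"indicator": ["tpop", "tpop", "male"]})   # like existing_df
api_demo = pd.DataFrame({
    "ShortName": ["tpopXX", "maleXX", "cashsaXX", "lightsXX"],
    "indicator": ["tpop", "male", "cashsa", "lights"],
})

# Set difference: flagged indicators missing from the existing data
new_demo = flagged - set(existing_demo["indicator"])

# Map back through the table with isin(), which preserves the table's row order
new_shortnames = list(api_demo.loc[api_demo["indicator"].isin(new_demo), "ShortName"])
print(new_shortnames)  # ['cashsaXX', 'lightsXX']
```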

In [54]:
# The '% driving alone' indicator (drvaloneXX, index 21) is broken: its API URL returns error 400 (Bad Request)
new_indicator_shortnames[21]
api_df.loc['drvaloneXX', "API"]
getGDF('drvaloneXX', level=0)

'https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Dralone/FeatureServer'

In [55]:
### make API pull
# pull in segments so that I can isolate errors 
collect_new_1 = getCollect(new_indicator_shortnames[:21])
# FOUND ERROR in new_indicator_shortnames[21] ('drvaloneXX')
collect_new_2 = getCollect(new_indicator_shortnames[22:])

https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Cashsa/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Taxlien/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Demper/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Histax/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Homtax/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Owntax/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Nomail/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFR

In [60]:
# turn the collections into dataframes
df1 = pd.concat(collect_new_1)
df2 = pd.concat(collect_new_2)
# one full dataframe (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)
new_indicator_df = pd.concat([df1, df2])
new_indicator_df.head()

,OBJECTID,CSA2010,cashsa11,cashsa12,cashsa13,cashsa14,cashsa15,cashsa16,cashsa17,cashsa18,cashsa19,cashsa20,Shape__Area,Shape__Length,geometry,taxlien15,taxlien16,taxlien17,taxlien18,taxlien19,demper11,demper12,demper13,demper14,demper15,demper16,demper17,demper18,demper19,demper20,histax12,histax13,histax14,histax15,histax16,histax17,histax18,histax19,OBJECTID_1,homtax11,...,treeplnt19,cebus11,cebus12,cebus13,cebus14,cebus15,cebus16,cebus17,cebus18,cebus19,ceemp11,ceemp12,ceemp13,ceemp14,ceemp15,ceemp16,ceemp17,Ceemp18,ceemp19,murals14,murals15,murals16,murals17,murals18,murals19,murals20,totjobs10,totjobs11,totjobs12,totjobs13,totjobs14,totjobs15,totjobs16,totjobs17,totjobs18,lights16,lights17,lights18,lights19,lights20
0,1.0,Allendale/Irvington/S. Hilton,78.22,76.086957,78.787879,76.5823,78.26087,71.038251,64.197531,57.471264,53.475936,49.565217,63770460.0,38770.165571,"POLYGON ((-76.65726 39.27600, -76.65726 39.276...",,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2.0,Beechfield/Ten Hills/West Hills,32.05,25.373134,29.032258,34.7458,27.777778,30.120482,25.925926,15.568862,20.261438,13.496933,47882530.0,37524.950533,"POLYGON ((-76.69479 39.30201, -76.69465 39.301...",,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,3.0,Belair-Edison,66.67,67.391304,67.741935,69.1542,68.468468,59.745763,53.623188,50.482315,47.457627,40.15748,44950030.0,31307.314843,"POLYGON ((-76.56761 39.32636, -76.56746 39.326...",,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,4.0,Brooklyn/Curtis Bay/Hawkins Point,73.4,72.033898,76.859504,75.4237,74.814815,73.248408,69.306931,53.846154,60.427807,56.321839,176077700.0,150987.703639,"MULTIPOLYGON (((-76.58867 39.21283, -76.58824 ...",,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,5.0,Canton,26.64,20.064725,15.460526,18.2836,18.360656,15.064103,14.438503,17.013889,12.759644,9.895833,15408540.0,23338.611948,"POLYGON ((-76.57140 39.28441, -76.57138 39.284...",,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
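Two things happen in the concatenation above: failed pulls (the empty `pd.DataFrame()` returned by `getGDFfromURL`) contribute nothing, so they disappear silently, and frames with different indicator columns are unioned, leaving NaN wherever a frame lacked a column. A minimal sketch with made-up values that counts the failures explicitly before stacking:

```python
import pandas as pd

# Hypothetical pull results: two successes and one failure (an empty DataFrame,
# which is what getGDFfromURL returns when a request fails); values are made up
collected = [
    pd.DataFrame({"CSA2010": ["Canton"], "cashsa19": [1.0]}),
    pd.DataFrame(),
    pd.DataFrame({"CSA2010": ["Canton"], "lights20": [2.0]}),
]

# Count failed pulls explicitly instead of letting them vanish inside concat
non_empty = [df for df in collected if not df.empty]
print(f"{len(collected) - len(non_empty)} of {len(collected)} pulls returned no data")

# Stack row-wise: columns are unioned, and each row is NaN for the other frames' columns
combined = pd.concat(non_empty, ignore_index=True)
print(combined)
```

The NaN pattern is why the reshaping step below drops missing values right after melting.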


In [97]:
### reformat data 
# drop geometry and ID columns (errors = "ignore" so the cell can be re-run safely)
new_indicator_df = new_indicator_df.drop(['OBJECTID', 'OBJECTID_1', 'Shape__Area', 'Shape__Length', 'geometry'],
                                         axis = 1, errors = "ignore")
# melt (pivot longer)
new_indicator_melted = new_indicator_df.melt(id_vars = ["CSA2010"], 
                                             var_name = "year-indicator", 
                                             value_name = "value")
# drop NAs (a result of appending frames that have different columns)
new_indicator_melted.dropna(subset = ["value"], inplace = True)
# drop a stray row (indicator = 'City', value = 'Baltimore City')
new_indicator_melted = new_indicator_melted[new_indicator_melted.value != "Baltimore City"].copy()
# add year columns: the last two characters of each column name are the year
new_indicator_melted["year"] = '20' + new_indicator_melted["year-indicator"].str[-2:]
new_indicator_melted["year_numeric"] = new_indicator_melted["year"].astype(int)
# add indicator column (lowercased so that e.g. 'Ceemp18' joins the other 'ceemp' years)
new_indicator_melted["indicator"] = new_indicator_melted["year-indicator"].str[:-2].str.lower()
# drop the combined year-indicator field 
new_indicator_melted = new_indicator_melted.drop(["year-indicator"], axis = 1)

# preview the long-format result
new_indicator_melted

,CSA2010,value,year,year_numeric,indicator
0,Allendale/Irvington/S. Hilton,78.22,2011,2011,cashsa
1,Beechfield/Ten Hills/West Hills,32.05,2011,2011,cashsa
2,Belair-Edison,66.67,2011,2011,cashsa
3,Brooklyn/Curtis Bay/Hawkins Point,73.4,2011,2011,cashsa
4,Canton,26.64,2011,2011,cashsa
...,...,...,...,...,...
614105,Southwest Baltimore,24.322058,2020,2020,lights
614106,The Waverlies,23.990713,2020,2020,lights
614107,Upton/Druid Heights,13.923806,2020,2020,lights
614108,Washington Village/Pigtown,32.164274,2020,2020,lights
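The reshaping above slices each column name into an indicator (`[:-2]`) and a two-digit year (`[-2:]`). A sketch of the same melt with one regular expression instead, on a made-up wide frame; an advantage of `str.extract` is that a column name not ending in two digits comes back as NaN rather than as a bogus year:

```python
import pandas as pd

# Hypothetical wide frame shaped like new_indicator_df after the ID/geometry drop;
# the values are made up
wide = pd.DataFrame({
    "CSA2010": ["Canton", "Belair-Edison"],
    "cashsa19": [1.0, 2.0],
    "Ceemp18": [3.0, None],
})

# Pivot longer: one row per (CSA, column name), then drop empty cells
long = wide.melt(id_vars=["CSA2010"], var_name="year-indicator", value_name="value")
long = long.dropna(subset=["value"])

# Split "<indicator><yy>" in one step; names that don't fit the pattern come back as NaN
parts = long["year-indicator"].str.extract(r"^(?P<indicator>[A-Za-z0-9]+?)(?P<yy>\d{2})$")
long = long.assign(
    indicator=parts["indicator"].str.lower(),     # 'Ceemp' -> 'ceemp'
    year_numeric=2000 + parts["yy"].astype(int),  # '19' -> 2019
).drop(columns="year-indicator")
print(long)
```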


In [94]:
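# troubleshooting (run before the reformatting cell above, while 'year-indicator' still existed):
# the invalid year '20ty' comes from the 'City' column, whose value is 'Baltimore City'
set(new_indicator_melted[new_indicator_melted.year == '20ty']["year-indicator"])
new_indicator_melted[(new_indicator_melted.year == '20ty')]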

,CSA2010,year-indicator,value,year
272277,,City,Baltimore City,20ty
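The `'20ty'` year surfaced only because `int('20ty')` fails. Some stray columns would not fail at all: Python's `int()` accepts underscores between digits (Python 3.6+), so a column such as `OBJECTID_1` would quietly yield the year `20_1`, i.e. 201. A minimal guard, sketched on a hypothetical column list, flags every column that is not `<letters/digits><two-digit year>` before melting:

```python
import re

# Hypothetical column list mixing indicator-year fields with stray columns
columns = ["CSA2010", "cashsa19", "Ceemp18", "City", "OBJECTID_1", "Shape__Area", "geometry"]
id_cols = ["CSA2010"]

# An indicator-year column is letters/digits ending in a two-digit year, e.g. "cashsa19"
pattern = re.compile(r"[A-Za-z0-9]+\d{2}")

stray = [c for c in columns if c not in id_cols and not pattern.fullmatch(c)]
print(stray)         # ['City', 'OBJECTID_1', 'Shape__Area', 'geometry']
print(int("20_1"))   # 201: the underscore is accepted, so no error is raised
```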


In [None]:
### combining existing and new DF 
# PLACEHOLDER FOR NOW 
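One way this placeholder step could go, sketched under an assumption: that `existing_df` is also in long format and shares the columns `CSA2010`, `indicator`, `year_numeric` and `value` with the new data (its full schema isn't shown here). The frames below are hypothetical; on overlap, the sketch keeps the existing row:

```python
import pandas as pd

# Hypothetical long-format frames; assumes both share these four columns
existing_demo = pd.DataFrame({"CSA2010": ["Canton"], "indicator": ["tpop"],
                              "year_numeric": [2010], "value": [1.0]})
new_demo = pd.DataFrame({"CSA2010": ["Canton", "Canton"], "indicator": ["tpop", "cashsa"],
                         "year_numeric": [2010, 2019], "value": [1.0, 2.0]})

key = ["CSA2010", "indicator", "year_numeric"]
cols = key + ["value"]

# Stack existing rows first, then drop later duplicates of the same (CSA, indicator, year)
combined_demo = (pd.concat([existing_demo[cols], new_demo[cols]], ignore_index=True)
                 .drop_duplicates(subset=key, keep="first"))
print(combined_demo)
```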

#### Pulling Baltimore City values

In [105]:
## using the full list of indicators, query layer 1 of each API to collect the Baltimore City data

# get shortnames 
full_indicator_shortnames = list(api_df[api_df.indicator.isin(full_indicator_list)].ShortName)
# pull in segments to isolate errors
collect_balt_1 = getCollect(full_indicator_shortnames[:51], level = 1)

https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Tpop/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Male/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Female/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Paa/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Pwhite/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Pasi/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/P2more/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitg

In [122]:
# errors
full_indicator_shortnames[51] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Domvio/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
# URL returns error 400
full_indicator_shortnames[52] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Juvarr/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
full_indicator_shortnames[53] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Juvviol/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
full_indicator_shortnames[54] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Juvdrug/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
full_indicator_shortnames[69] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Susp/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
full_indicator_shortnames[70] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Farms/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
full_indicator_shortnames[71] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Sped/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
full_indicator_shortnames[72] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Ready/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
full_indicator_shortnames[73] # url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Math3/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
full_indicator_shortnames[74:83]
full_indicator_shortnames[97:104]
# there were too many errors to track by hand, so I added try/except error handling to getGDFfromURL and stopped tracking them manually

'suspXX'
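Rather than opening each failing URL by hand, the response body can be checked in code. ArcGIS REST services often report a failed query inside the JSON body as an `error` object with `code` and `message` fields; treat that shape as an assumption to verify against these endpoints. A sketch of a hypothetical helper that reads the code from an already-parsed response (the example payloads are made up):

```python
def arcgis_error_code(payload):
    """Return the error code from a parsed ArcGIS REST JSON response, or None.

    Assumes failures look like {"error": {"code": ..., "message": ...}}.
    """
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict):
        return error.get("code")
    return None

# Usage (network call left commented out; requests is imported at the top of the notebook):
# payload = requests.get(url, timeout=30).json()
# code = arcgis_error_code(payload)

print(arcgis_error_code({"error": {"code": 400, "message": "made-up example"}}))  # 400
print(arcgis_error_code({"features": []}))  # None
```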

In [156]:
collect_balt_2 = getCollect(full_indicator_shortnames[55:69], level = 1)
collect_balt_3 = getCollect(full_indicator_shortnames[83:97], level = 1)
collect_balt_4 = getCollect(full_indicator_shortnames[104:115], level = 1)
# with the try/except in getGDFfromURL, the remaining indicators can run in one batch; failed pulls come back as empty DataFrames
collect_balt_5 = getCollect(full_indicator_shortnames[115:], level = 1)

https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Overd/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Libcard/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Artevnt/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Publart/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Artbus/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Artemp/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs7NF4iFitgbY/arcgis/rest/services/Empl/FeatureServer/1/query?where=1%3D1&outFields=*&outSR=4326&f=json
https://services1.arcgis.com/mVFRs
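With the try/except in place, a failed pull and a genuinely empty layer look the same: both come back as an empty DataFrame in the collection list. A sketch of a hypothetical variant of `getCollect` that returns results keyed by shortname plus a list of failures with reasons, so the error list from the earlier cell can be rebuilt automatically. `fetch` stands in for something like `lambda name: getGDF(name, 1)`; the shortnames and fetch function below are made up:

```python
import pandas as pd

def collect_with_failures(shortnames, fetch):
    """Call fetch(name) for each name; return ({name: frame}, [(name, reason), ...]).

    `fetch` is any callable returning a DataFrame; failures are recorded, not skipped.
    """
    results, failed = {}, []
    for name in shortnames:
        try:
            frame = fetch(name)
        except Exception as exc:  # e.g. a network or parsing error
            failed.append((name, repr(exc)))
            continue
        if frame.empty:  # getGDFfromURL returns an empty DataFrame when a pull fails
            failed.append((name, "empty result"))
        else:
            results[name] = frame
    return results, failed

# Usage with a hypothetical fetch function standing in for the API
def fake_fetch(name):
    if name == "badXX":
        raise ValueError("HTTP 400")
    if name == "emptyXX":
        return pd.DataFrame()
    return pd.DataFrame({"CSA2010": ["Canton"], "value": [1.0]})

results, failed = collect_with_failures(["okXX", "badXX", "emptyXX"], fake_fetch)
print(list(results))  # ['okXX']
print(failed)         # [('badXX', "ValueError('HTTP 400')"), ('emptyXX', 'empty result')]
```

Keying results by shortname also keeps each frame tied to its indicator, which a plain list loses once failed entries are filtered out.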