## Socio-economic Profiles 2019

This code creates the socio-economic profile data for the San Francisco Planning Department's Neighborhood Socio-Economic Profiles. Socio-economic profiles data is derived from the [American Community Survey](https://www.census.gov/programs-surveys/acs) 5-year data and is created annually by the Planning Department. This code is based off methods created by Michael Webster and others. Run the notebook to:

- Download ACS data using the Census API
- Calculate socio-economic profiles data
- Export data to csv in two formats. 

API documentation and data for the 2019 ACS data and previous years is available [here](https://www.census.gov/data/developers/data-sets/acs-5year.html)

In [1]:
#Import libraries
import requests
import pandas as pd
from collections import defaultdict 

### Attribute ID List
The attribute ID list below is a list of ASC attribute IDs that correspond to attributes included in the socio-economic data calcs. For a list of attribute ID and their meanings visit the API docs [here](https://api.census.gov/data/2019/acs/acs5/variables.html). 

Attributes included in the list below were derived from the attributes in the [Data_Items_and_Sources.xlsx](https://github.com/jsherba/socio-economic-profiles/raw/main/Data_Items_And_Sources_2019.xlsx) file developed by Michael Webster.

In [15]:
#List of attribute id
attributes_ids=['B11001_001E','B11001_002E','B11001_007E','B11001_008E','B11005_002E','B11006_002E',
                'B11002_001E','B11002_002E','B02001_001E','B02001_005E','B02001_003E','B02001_002E',
                'B02001_004E','B02001_006E','B02001_008E','B02001_007E','B03001_003E','B03001_001E']

### Census API Url
The code below builds the Url for teh census API call to get relevant 2019 ACS attribute data at the tract level for San Francisco County. Below define:
- Tract code is '*' to collect all tracts
- State code is '06' for CA
- County code is '075' for San Francisco County
- Attributes are defined by the attribute id list and includes all relevant attributes for the socio-economic data calcs

In [16]:
#Build API call for census.gov
tract_code = "*"
state_code = "06"
county_code = "075"
attributes = ','.join(attributes_ids)
census_url = r'https://api.census.gov/data/2019/acs/acs5?get={}&for=tract:{}&in=state:{}&in=county:{}'\
.format(attributes, tract_code, state_code, county_code)

In [17]:
#Print Census URL
census_url

'https://api.census.gov/data/2019/acs/acs5?get=B11001_001E,B11001_002E,B11001_007E,B11001_008E,B11005_002E,B11006_002E,B11002_001E,B11002_002E,B02001_001E,B02001_005E,B02001_003E,B02001_002E,B02001_004E,B02001_006E,B02001_008E,B02001_007E,B03001_003E,B03001_001E&for=tract:*&in=state:06&in=county:075'

In [18]:
#Make API call to census
resp = requests.get(census_url)
if resp.status_code != 200:
    # This means something went wrong.
    raise ApiError('GET {}'.format(resp.status_code))
    
#Retrieve data as json and convert to Pandas Dataframe
data = resp.json()
headers = data.pop(0)
df = pd.DataFrame(data, columns=headers)

#Convert values that are not state, county, or tract to numeric type
cols=[i for i in df.columns if i not in ["state","county","tract"]]
for col in cols:
    df[col]=pd.to_numeric(df[col])
    
#Print first rows of inot data    
df.head()

Unnamed: 0,B11001_001E,B11001_002E,B11001_007E,B11001_008E,B11005_002E,B11006_002E,B11002_001E,B11002_002E,B02001_001E,B02001_005E,...,B02001_002E,B02001_004E,B02001_006E,B02001_008E,B02001_007E,B03001_003E,B03001_001E,state,county,tract
0,891,691,200,157,355,382,2507,2250,2507,462,...,1801,0,0,230,14,104,2507,6,75,42800
1,715,613,102,93,279,336,2897,2781,2897,1473,...,277,0,159,137,466,718,2897,6,75,26404
2,899,649,250,171,313,410,2487,2099,2495,653,...,1388,0,0,241,151,339,2495,6,75,30600
3,1444,1138,306,221,459,740,3798,3322,3804,1377,...,2073,0,0,179,119,348,3804,6,75,31000
4,645,192,453,135,80,145,1838,512,4551,1035,...,1747,17,27,380,996,1618,4551,6,75,33201


### Tract/Neighborhood Lookup
The dictionary below is a lookup dictionary that defines the tracts within each neighborhood. This lookup file was derived from the San Francisco Analysis Neighborhoods layer file on the SDE

In [19]:
#Define tract lookup
tract_nb_lookup = {"Hayes Valley":["016400","016802","016200","016300","016801"],
"Western Addition":["016100","015900","015100","015802","016000","015801"],
"Japantown":["015500"],
"Pacific Heights":["015300","013400","013200","013102","015200","013500","013101"],
"Marina":["012700","012902","012602","012601","013000","012800","012901"],
"Nob Hill":["012100","012000","011200","011100","010800","011902","011901"],
"Chinatown":["011300","010700","011800","061100"],
"Russian Hill":["010900","010200","011000","010300"],
"Bayview Hunters Point":["980600","980900","061200","023103","061000","023003","023200","023400","023300","023102","023001"],
"Tenderloin":["012502","012302","012301","012401","012201","012501","012402","012202"],
"Oceanview/Merced/Ingleside":["031202","031301","031302","031400","031201"],
"Potrero Hill":["022704","061400","022702","022600"],
"Financial District/South Beach":["061500","010500","011700"],
"South of Market":["017802","017801","018000","017601"],
"North Beach":["010100","010600","010400"],
"Outer Richmond":["047802","042601","042602","047801","047902","047702","047600","047901","042700","047701"],
"Sunset/Parkside":["032902","032901","032801","032802","032602","032601","033100","035400","035201","035100","033000","035300","035202","032700"],
"McLaren Park":["980501"],
"Lakeshore":["033204","033203","033201","060400"],
"Lincoln Park":["980200"],
"Golden Gate Park":["980300"],
"Mission Bay":["060700"],
"Visitacion Valley":["060502","026403","026401","026402","026404"],
"Lone Mountain/USF":["016500","015700","015600"],
"Presidio Heights":["015400","013300"],
"Noe Valley":["021400","021300","021100","021500","021200","021600"],
"Mission":["020900","020700","020100","022903","022802","022901","022801","021000","020800","020200","017700","022902","022803"],
"Castro/Upper Market":["020300","016900","020401","020600","020500","017000"],
"Haight Ashbury":["016600","017102","017101","016700"],
"Inner Richmond":["045200","040200","040100","045100"],
"Seacliff":["042800"],
"West of Twin Peaks":["030900","031100","031000","030700","030600","030800","030400"],
"Excelsior":["026303","026004","026001","026002","026302","026003","026301","025600"],
"Portola":["025702","025900","025800","025701"],
"Twin Peaks":["030500","020402"],
"Inner Sunset":["030302","030301","030102","030101","030202","030201"],
"Outer Mission":["026200","026100","025500"],
"Bernal Heights":["025401","025200","025402","025403","025300","025100"],
"Presidio":["060100"],
"Glen Park":["021800","021700"],
"Treasure Island":["017902"]}


In [7]:
#Define helper functions
def sum_values(df, attribute_id):
    return df[attribute_id].sum()

def nomalize_values(df, attribute_id, attribute_id2):
    return (df[attribute_id].sum()/df[attribute_id2].sum())
    
def sum_normalize_values(df, attribute_id, attribute_id2, attribute_id2):
    return ((df[attribute_id].sum()+df[attribute_id2].sum())/df[attribute_id3].sum())

### Calculate socio-economic data
The `calc_socio_economic_data` function takes tract level data from the API call and the tract/neighborhood lookup dictionary. This function creates all of the socio-economic data calcs and returns a dictionary. The calcs in this function are derived from the [Data_Items_and_Sources.xlsx](https://github.com/jsherba/socio-economic-profiles/raw/main/Data_Items_And_Sources_2019.xlsx) file developed by Michael Webster

In [20]:
# Function runs all calcs for each neighborhood
def calc_socio_economic_data(df, tract_nb_lookup):
    # Create empty dictionary to add calculated attribute information to
    all_calc_data = defaultdict(dict) 
    # Calculate all stats for each neighborhood
    for nb_name, tracts in tract_nb_lookup.items():
        # Extract attribute information for tracks associated with a neighborhood
        df_tract = df[df['tract'].isin(tracts)]
        # Build dictionary with all stats for a neighborhood
        # Household Stats
        all_calc_data[nb_name]["Housholds"] = sum_values(df_tract, 'B11001_001E')
        all_calc_data[nb_name]["Family Households"] = nomalize_values(df_tract, 'B11001_002E', 'B11001_001E')
        all_calc_data[nb_name]["Non-Family Households"] = nomalize_values(df_tract, 'B11001_007E', 'B11001_001E')
        all_calc_data[nb_name]["Single Person Households, % of Total"] = nomalize_values(df_tract, 'B11001_008E', 'B11001_001E')
        all_calc_data[nb_name]["Households with Children, % of Total"] = nomalize_values(df_tract, 'B11005_002E', 'B11001_001E')
        all_calc_data[nb_name]["Households with 60 years and older, % of Total"] = nomalize_values(df_tract, 'B11006_002E', 'B11001_001E')
        all_calc_data[nb_name]["Average Household Size"] = nomalize_values(df_tract, 'B11002_001E', 'B11001_001E')
        all_calc_data[nb_name]["Average Family Household Size"] = nomalize_values(df_tract, 'B11002_002E', 'B11001_002E')
        # Race and ethnicity stats
        all_calc_data[nb_name]["Asian"] = nomalize_values(df_tract, 'B02001_005E', 'B02001_001E')
        all_calc_data[nb_name]["Black/African American"] = nomalize_values(df_tract, 'B02001_003E', 'B02001_001E')
        all_calc_data[nb_name]["White"] = nomalize_values(df_tract, 'B02001_002E', 'B02001_001E')
        all_calc_data[nb_name]["Native American Indian"] = nomalize_values(df_tract, 'B02001_005E', 'B02001_001E')
        all_calc_data[nb_name]["Native Hawaiian/Pacific Islander"] = nomalize_values(df_tract, 'B02001_006E', 'B02001_001E')
        all_calc_data[nb_name]["Other/Two or More Races"] = sum_normalize_values(df_tract, 'B02001_008E', 'B02001_007E', 'B02001_001E')
        all_calc_data[nb_name]["% Latino (of Any Race)"] = nomalize_values(df_tract, 'B03001_003E', 'B03001_001E')
    #Return calc dictionary
    return all_calc_data
    

In [21]:
#Run functions to calculate all stats and convert calc dictionary to pandas dataframe
all_calc_data = calc_socio_economic_data(df, tract_nb_lookup)
df_all_calcs = pd.DataFrame.from_dict(all_calc_data).reset_index()
df_all_calcs.rename(columns = {'index':'Attribute'}, inplace = True) 
df_all_calcs.head()

NameError: name 'sum_normalize_values' is not defined

In [10]:
#Transpose dataset for second neighborhood view of dataset
df_all_calcs2 = df_all_calcs.T.reset_index()
df_all_calcs2.columns = df_all_calcs2.iloc[0]
df_all_calcs2 = df_all_calcs2[1:].rename(columns={'Attribute':'Neighborhood'})
df_all_calcs2.head()

Unnamed: 0,Neighborhood,Housholds,Family Households,Non-Family Households,"Single Person Households, % of Total","Households with Children, % of Total","Households with 60 years and older, % of Total",Average Household Size,Average Family Household Size
1,Hayes Valley,9652.0,0.301077,0.698923,0.432864,0.104123,0.201098,1.990157,2.700964
2,Western Addition,11677.0,0.350347,0.649653,0.474094,0.114841,0.384602,1.912991,2.830604
3,Japantown,2266.0,0.227273,0.772727,0.616505,0.039718,0.572816,1.543689,2.446602
4,Pacific Heights,12815.0,0.353726,0.646274,0.48295,0.128911,0.27858,1.854155,2.731745
5,Marina,13583.0,0.309578,0.690422,0.469852,0.110211,0.228374,1.861003,2.711772


In [11]:
#Export both dataset views to csv
df_all_calcs.to_csv(r'C:\Users\Jason Sherba\Downloads\data\test.csv', index = False)
df_all_calcs2.to_csv(r'C:\Users\Jason Sherba\Downloads\data\test2.csv', index = False)