# Socio-economic Profiles

This code creates the socio-economic profile data for the San Francisco Planning Department's Neighborhood Socio-Economic Profiles. The profile data is derived from the [American Community Survey](https://www.census.gov/programs-surveys/acs) 5-year data and is created annually by the Planning Department. Tract-level socio-economic data is combined at the neighborhood and district level for the City of San Francisco. This code is based on methods created by Michael Webster and others. Run the notebook to:

- Download ACS data using the Census API
- Calculate socio-economic profiles data
- Visualize socio-economic data across Neighborhoods and Supervisor Districts
- Export data to CSV


## Import packages

In [1]:
# base libraries
import requests, json, os
import pandas as pd
import numpy as np
from collections import defaultdict

# graph libraries
import plotly.express as px
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

# map libraries
#import folium
#import branca.colormap as cm
#import geopandas

## Set analysis year

In [2]:
year = 2021

## Retrieve data from Census API
All socio-economic data comes from the Census ACS 5-year estimates and is available at the tract level through the Census API. API documentation and data for the 2019 ACS and previous years are available [here](https://www.census.gov/data/developers/data-sets/acs-5year.html).

### Census Attribute IDs
The Census API returns ACS attribute values for the attribute IDs provided. The list of attribute IDs needed for calculating the socio-economic profile data is compiled below from IDs stored in [attribute_lookup.csv](https://github.com/jsherba/socio-economic-profiles/blob/main/lookup_tables/attribute_lookup.csv). These attribute IDs were derived from the attributes in the [Data_Items_and_Sources.xlsx](https://github.com/jsherba/socio-economic-profiles/raw/main/Data_Items_And_Sources_2019.xlsx) file developed by Michael Webster.

For a full list of ACS attribute IDs and their meanings visit the API docs [here](https://api.census.gov/data/2019/acs/acs5/variables.html).

In [3]:
# Create list of attribute IDs from attribute_lookup.csv
attribute_lookup_df = pd.read_csv (r'./lookup_tables/attribute_lookup.csv', dtype=str)

attribute_ids_extracted = attribute_lookup_df['attribute_id'].tolist()
attribute_ids = []
for attribute_id in attribute_ids_extracted:
    attribute_ids.extend(attribute_id.split(", "))
attribute_ids = list(set([x+"E" for x in attribute_ids]))
attribute_ids[:10]

['B16007_003E',
 'B16007_017E',
 'B23001_064E',
 'B19001_005E',
 'B19101_016E',
 'B23001_094E',
 'B07003_001E',
 'B25063_025E',
 'B16007_004E',
 'B25038_005E']
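
As a small illustration of the step above, the sketch below expands comma-separated lookup entries into individual API variable names. The sample entries and the `expand_attribute_ids` helper are hypothetical and are not part of the notebook; the `E` suffix requests the estimate for each ACS variable. Because `set` ordering is not stable between runs, the sketch sorts the result so the output is reproducible.

```python
def expand_attribute_ids(raw_entries):
    """Split comma-separated attribute IDs, de-duplicate them, and append 'E'."""
    ids = set()
    for entry in raw_entries:
        for attribute_id in entry.split(","):
            # strip() tolerates entries written with or without a space after the comma
            ids.add(attribute_id.strip() + "E")
    return sorted(ids)


# Hypothetical lookup entries: one row lists two IDs, and one ID is repeated
sample_entries = ["B01001_003, B01001_027", "B01001_001", "B01001_001"]
expanded = expand_attribute_ids(sample_entries)
print(expanded)
```

Sorting is optional for the API call itself, but it keeps the column order of the downloaded data the same from one run to the next.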

### Build Census API URL and Make Query
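The code below builds the URL for the Census API call that retrieves the relevant ACS attribute data at the tract level for San Francisco County, for the analysis year set in the `year` variable above. The Census API accepts up to 50 attributes per request, so the attribute list is first split into sublists of 45 attribute IDs. An API call is made for each sublist, and the results are merged into a single dataframe. The query parameters are defined as follows: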
- Tract code is '*' to collect all tracts
- State code is '06' for CA
- County code is '075' for San Francisco County
- Attributes are defined by the attribute ID list, which includes all attributes needed for the socio-economic data calcs
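
To make the batching and URL format concrete, here is a minimal standalone sketch. The `chunk` and `sketch_census_url` helpers and the placeholder IDs are assumptions made for illustration; the URL pattern mirrors the one used in this notebook, and no request is actually sent.

```python
def chunk(items, size):
    """Split a list into consecutive sublists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def sketch_census_url(year, attribute_ids, tract="*", state="06", county="075"):
    """Build an ACS 5-year tract-level query URL (hypothetical helper)."""
    return (
        "https://api.census.gov/data/{}/acs/acs5?get={}"
        "&for=tract:{}&in=state:{}&in=county:{}"
    ).format(year, ",".join(attribute_ids), tract, state, county)


# 100 placeholder IDs (not real Census variables) to show the batch sizes
placeholder_ids = ["ID_{:03d}".format(n) for n in range(100)]
group_sizes = [len(group) for group in chunk(placeholder_ids, 45)]
print(group_sizes)

# A URL for two attribute IDs that appear later in this notebook
example_url = sketch_census_url(2021, ["B01001_001E", "B01001_026E"])
print(example_url)
```

Using groups of 45 rather than 50 leaves some headroom below the per-request limit.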

In [4]:
# function builds the api URL from tract_code, state_code, county_code, and attribute ids. 
def build_census_url(tract_code, state_code, county_code, attribute_ids):
    attributes = ','.join(attribute_ids)
    census_url = r'https://api.census.gov/data/{}/acs/acs5?get={}&for=tract:{}&in=state:{}&in=county:{}'\
                .format(year, attributes, tract_code, state_code, county_code)
    return census_url
    

In [5]:
# function makes a single api call and collects results in a pandas dataframe
def make_census_api_call(census_url):
    # make API call to Census
    resp = requests.get(census_url)
    if resp.status_code != 200:
        # this means something went wrong
        resp.raise_for_status()
       
    # retrieve data as json and convert to Pandas Dataframe
    data = resp.json()
    headers = data.pop(0)
    df = pd.DataFrame(data, columns=headers)

    # convert values that are not state, county, or tract to numeric type
    cols=[i for i in df.columns if i not in ["state","county","tract"]]
    for col in cols:
        df[col]=pd.to_numeric(df[col])
        
    return df
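
The Census API returns a JSON array of arrays: the first row holds the column names and every value arrives as a string. The sketch below shows that shape and the numeric conversion using only the standard library. The payload is a hand-written illustration (the two population values are taken from the output shown later in this notebook), and the `parse_census_rows` helper is hypothetical.

```python
import json

# Illustrative payload in the shape the Census API returns (header row first)
sample_response = (
    '[["B01001_001E","state","county","tract"],'
    '["2118","06","075","010101"],'
    '["1783","06","075","010102"]]'
)

GEO_COLUMNS = {"state", "county", "tract"}


def parse_census_rows(payload):
    """Turn the header-plus-rows payload into a list of dicts with numeric values."""
    rows = json.loads(payload)
    headers, records = rows[0], rows[1:]
    parsed = []
    for record in records:
        item = {}
        for name, value in zip(headers, record):
            # keep geography codes as strings so leading zeros survive
            item[name] = value if name in GEO_COLUMNS else float(value)
        parsed.append(item)
    return parsed


parsed_rows = parse_census_rows(sample_response)
print(parsed_rows[0])
```

Keeping `state`, `county`, and `tract` as strings matters because codes such as `06` and `010101` would lose their leading zeros if converted to numbers.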

In [6]:
# set geo variables for api call
tract_code = "*"
state_code = "06"
county_code = "075"

# split attributes into groups of 45, run a census query for each, merge outputs into a single df
split_attribute_ids = [attribute_ids[i:i+45] for i in range(0, len(attribute_ids), 45)]
df=None
first = True
for ids in split_attribute_ids:
    census_url = build_census_url(tract_code, state_code, county_code, ids)
    returned_df = make_census_api_call(census_url)
    if first:
        df = returned_df
        first = False
    else:
        returned_df = returned_df.drop(columns=['state', 'county'])
        df = pd.merge(df, returned_df, on='tract', how='left')

df.head()

Unnamed: 0,B16007_003E,B16007_017E,B23001_064E,B19001_005E,B19101_016E,B23001_094E,B07003_001E,B25063_025E,B16007_004E,B25038_005E,...,B11001_008E,B25034_008E,B25063_021E,B25056_023E,B25070_005E,B25075_006E,B02001_006E,B15003_011E,B25075_003E,B23001_084E
0,68,140,0,66,0,0,2118,61,0,35,...,661,412,14,130,85,0,50,0,0,11
1,136,25,0,18,38,0,1783,74,128,31,...,314,30,0,42,36,0,0,0,0,0
2,38,13,0,25,35,0,2499,158,35,14,...,562,88,39,100,158,0,0,0,0,14
3,38,14,0,14,46,0,1879,111,0,58,...,430,490,0,205,49,0,17,0,0,64
4,11,39,0,30,106,0,3993,231,0,103,...,906,125,294,149,112,0,0,0,0,84


## Prepare Lookup Dictionaries and Helper Functions

### Calculating Medians
To calculate the median for an aggregated geography, you cannot take the mean of the component geographies' medians. Instead, a statistical approximation of the median must be calculated from range tables.
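
A toy example shows why averaging medians does not work. The two "tracts" and their income values below are invented for illustration only.

```python
from statistics import median

# Invented household incomes for two tracts of different sizes
tract_a = [10_000, 20_000, 30_000]
tract_b = [100_000, 200_000, 300_000, 400_000, 500_000, 600_000, 700_000]

# Averaging the two tract medians ignores how many households each tract has
mean_of_medians = (median(tract_a) + median(tract_b)) / 2

# The true median of the combined area comes from pooling all households
pooled_median = median(tract_a + tract_b)

print(mean_of_medians, pooled_median)
```

The two numbers differ because the larger tract contributes more households to the combined distribution. Since the ACS does not publish individual household values, the pooled median has to be approximated from the range counts instead.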

Range variables in the ACS have a unique ID like any other Census variable. Each one represents the count for a variable within a specific range, e.g. the number of households with household incomes between $45,000 and $50,000. Range variable IDs and range information are stored in the median_ranges.csv file in the repository. These range variables and ranges are needed for calculating the median at the neighborhood level.

The function below calculates a median based on range data. This method follows the official ACS documentation for [calculating a median](https://www.dof.ca.gov/Forecasting/Demographics/Census_Data_Center_Network/documents/How_to_Recalculate_a_Median.pdf).
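
The interpolation idea can be sketched on a toy range table before looking at the full helper. The bins and counts below are invented, and `interpolated_median` is a simplified, hypothetical stand-in: it always interpolates inside the bin that contains the midpoint, so it does not reproduce the edge-case handling of the notebook's `calc_median`.

```python
def interpolated_median(bins):
    """Approximate a median from (range_start, range_end, count) bins.

    Bins must be sorted by range_start and use inclusive integer bounds.
    """
    total = sum(count for _, _, count in bins)
    if total == 0:
        return 0
    midpoint = total / 2
    cumulative = 0
    for start, end, count in bins:
        if cumulative + count >= midpoint:
            # share of this bin needed to reach the midpoint, scaled by bin width
            width = end - start + 1
            return start + (midpoint - cumulative) / count * width
        cumulative += count


# Invented counts: 100 households in total, so the midpoint is household 50
toy_bins = [(0, 9999, 10), (10000, 19999, 30), (20000, 29999, 60)]
toy_median = interpolated_median(toy_bins)
print(round(toy_median, 2))
```

Here the first two bins hold 40 households, so 10 more are needed from the 60 in the third bin. That is one sixth of a 10,000-wide range, which places the estimate one sixth of the way into the $20,000 to $29,999 bin.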


In [7]:
# import median tables from median_ranges csv and add empty columns 'households' and 'cumulative_total'
range_df = pd.read_csv (r'./lookup_tables/median_ranges.csv')
range_df['households']=0
range_df['cumulative_total']=0
range_df.head()

Unnamed: 0,name,id,range_start,range_end,households,cumulative_total
0,median_household_income,B19001_002E,2500.0,9999.0,0,0
1,median_household_income,B19001_003E,10000.0,14999.0,0,0
2,median_household_income,B19001_004E,15000.0,19999.0,0,0
3,median_household_income,B19001_005E,20000.0,24999.0,0,0
4,median_household_income,B19001_006E,25000.0,29999.0,0,0


In [8]:
# define median helper function
def calc_median(tract_df, range_df, median_to_calc):
    
    # subset range df for current median variable to calc
    range_df = range_df[range_df['name']==median_to_calc].copy()  # copy so the column assignments below do not write to a slice
    
    # sort dataframe low to high by range start column
    range_df = range_df.sort_values(by=['range_start'])
    
    # calculate households as sum of tract level households for each row based on range id
    range_df['households'] = range_df.apply(lambda row : tract_df[row['id']], axis = 1)
    
    # calculate the cumulative total of households
    range_df['cumulative_total'] = range_df['households'].cumsum()
    
    # calculate total households
    total_households = range_df['households'].sum()
    
    # if total households is 0, return a median of 0
    if total_households == 0:
        return 0
    
    # calculate midpoint
    midpoint = total_households/2

    # if midpoint is below first range return median as end of first range value
    if midpoint < range_df['cumulative_total'].min():
        new_median = range_df['range_end'].min()
        return new_median
    
    # if midpoint is above last range set median to end of last range value
    if midpoint > range_df['cumulative_total'].max():
        new_median = range_df['range_end'].max()
        return new_median
    
    less_midpoint_df = range_df[range_df['cumulative_total']<midpoint]
    
    # get the single row containing the range just below the mid range by getting the row with the max range start from the subsetted median df
    range_below_mid_range_df = less_midpoint_df[less_midpoint_df['range_start'] == less_midpoint_df['range_start'].max()]
    
    # get the cumulative total value for the first row of the range below mid range dictionary
    total_hh_previous_range = range_below_mid_range_df['cumulative_total'].iloc[0]
    hh_to_mid_range = midpoint - total_hh_previous_range
    
    # extract rows above midrange by subsetting median df for rows with cumulative total greater than midpoint.
    greater_midpoint_df = range_df[range_df['cumulative_total']>midpoint]
    
    # get the single row containing the mid range by getting the row with the min range start from the subsetted median df
    mid_range_df = greater_midpoint_df[greater_midpoint_df['range_start'] == greater_midpoint_df['range_start'].min()]
    
    # get the households value for the first row of the mid range dictionary
    hh_in_mid_range = mid_range_df['households'].iloc[0]
    
    # calculate proportion of number of households in the mid range that would be needed to get to the mid-point
    prop_of_hh = hh_to_mid_range/hh_in_mid_range
    
    # calculate width of the mid range
    width = (mid_range_df['range_end'].iloc[0]-mid_range_df['range_start'].iloc[0])+1
    
    # apply proportion to width of mid range
    prop_to_width = prop_of_hh*width
    beginning_of_mid_range = mid_range_df['range_start'].iloc[0]
    
    # calculate new median
    new_median = beginning_of_mid_range + prop_to_width
    
    return new_median

## Define functions for calculating socio-economic data
The `calc_socio_economic_data` function takes the tract-level data returned by the API call, runs all of the socio-economic data calcs, and returns a dictionary keyed by tract. The calcs in this function are derived from the [Data_Items_and_Sources.xlsx](https://github.com/jsherba/socio-economic-profiles/raw/main/Data_Items_And_Sources_2019.xlsx) file developed by Michael Webster.

In [9]:
# define other helper functions
def calc_sum(df, attribute_id):
 
    return df[attribute_id]

def calc_normalized(df, attribute_id, attribute_id2):
    if df[attribute_id2] == 0:
        return 0
    else:
        return (df[attribute_id]/df[attribute_id2])

def calc_sum_normalized(df, attribute_list, attribute_id2):
    if df[attribute_id2]==0:
        return 0
    else:
        sum_of_attributes = 0
        for attribute_id in attribute_list:
            sum_of_attributes+=df[attribute_id]
        return sum_of_attributes/df[attribute_id2]
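
The normalization pattern used by these helpers can be tried on a plain dictionary. The sketch below is a compact, hypothetical variant (`share`) with invented counts; it follows the same rule as the helpers above, returning 0 when the denominator is 0 so that empty tracts do not raise a division error.

```python
def share(row, numerator_ids, denominator_id):
    """Sum the numerator attributes and divide by the denominator, guarding against 0."""
    denominator = row[denominator_id]
    if denominator == 0:
        return 0
    return sum(row[attribute_id] for attribute_id in numerator_ids) / denominator


# Invented counts: total population plus male and female residents under 5 years
toy_row = {"B01001_001E": 200, "B01001_003E": 6, "B01001_027E": 4}
empty_row = {"B01001_001E": 0, "B01001_003E": 0, "B01001_027E": 0}

under_five = share(toy_row, ["B01001_003E", "B01001_027E"], "B01001_001E")
under_five_empty = share(empty_row, ["B01001_003E", "B01001_027E"], "B01001_001E")
print(under_five, under_five_empty)
```

Note that the result is a fraction between 0 and 1 rather than a percentage, which matches the values shown in the output tables of this notebook.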

In [13]:
# function runs all calcs for each neighborhood or supervisor district
def calc_socio_economic_data(df):
    # create empty dictionary to add calculated attribute information to
    all_calc_data = defaultdict(dict) 
    # calculate all stats for each neighborhood
    for index, row in df.iterrows():
        tract_df = row
        
        all_calc_data_nb = all_calc_data[row['tract']]
        # population
        all_calc_data_nb["Total Population"] = calc_sum(tract_df, 'B01001_001E')
        all_calc_data_nb["Group Quarter Population"] = calc_sum(tract_df, 'B26001_001E')
        all_calc_data_nb["Percent Female"] = calc_normalized(tract_df, 'B01001_026E', 'B01001_001E')
        # household stats
        all_calc_data_nb["Housholds"] = calc_sum(tract_df, 'B11001_001E')
        all_calc_data_nb["Family Households"] = calc_normalized(tract_df, 'B11001_002E', 'B11001_001E')
        all_calc_data_nb["Non-Family Households"] = calc_normalized(tract_df, 'B11001_007E', 'B11001_001E')
        all_calc_data_nb["Single Person Households, % of Total"] = calc_normalized(tract_df, 'B11001_008E', 'B11001_001E')
        all_calc_data_nb["Households with Children, % of Total"] = calc_normalized(tract_df, 'B11005_002E', 'B11001_001E')
        all_calc_data_nb["Households with 60 years and older, % of Total"] = calc_normalized(tract_df, 'B11006_002E', 'B11001_001E')
        all_calc_data_nb["Average Household Size"] = calc_normalized(tract_df, 'B11002_001E', 'B11001_001E')
        all_calc_data_nb["Average Family Household Size"] = calc_normalized(tract_df, 'B11002_002E', 'B11001_002E')
        # race and ethnicity stats
        all_calc_data_nb["Asian"] = calc_normalized(tract_df, 'B02001_005E', 'B02001_001E')
        all_calc_data_nb["Black/African American"] = calc_normalized(tract_df, 'B02001_003E', 'B02001_001E')
        all_calc_data_nb["White"] = calc_normalized(tract_df, 'B02001_002E', 'B02001_001E')
        all_calc_data_nb["Native American Indian"] = calc_normalized(tract_df, 'B02001_004E', 'B02001_001E')
        all_calc_data_nb["Native Hawaiian/Pacific Islander"] = calc_normalized(tract_df, 'B02001_006E', 'B02001_001E')
        all_calc_data_nb["Other/Two or More Races"] = calc_sum_normalized(tract_df, ['B02001_008E', 'B02001_007E'], 'B02001_001E')
        all_calc_data_nb["% Latino (of Any Race)"] = calc_normalized(tract_df, 'B03001_003E', 'B03001_001E')
        # age
        all_calc_data_nb["0-4 Years"] = calc_sum_normalized(tract_df, ['B01001_003E', 'B01001_027E'], 'B01001_001E')
        all_calc_data_nb["5-17 Years"] = calc_sum_normalized(tract_df, ['B01001_004E', 'B01001_005E', 'B01001_006E', 'B01001_028E', 'B01001_029E', 'B01001_030E'],'B01001_001E')
        all_calc_data_nb["18-34 Years"] = calc_sum_normalized(tract_df, ['B01001_007E','B01001_008E','B01001_009E', 'B01001_010E', 'B01001_011E', 'B01001_012E','B01001_031E','B01001_032E','B01001_033E','B01001_034E','B01001_035E','B01001_036E'], 'B01001_001E')
        all_calc_data_nb["35-59 Years"] = calc_sum_normalized(tract_df, ['B01001_013E', 'B01001_014E', 'B01001_015E', 'B01001_016E', 'B01001_017E', 'B01001_037E', 'B01001_038E', 'B01001_039E', 'B01001_040E', 'B01001_041E'], 'B01001_001E')
        all_calc_data_nb["60 and older"] = calc_sum_normalized(tract_df, ['B01001_018E', 'B01001_019E', 'B01001_020E', 'B01001_021E', 'B01001_022E', 'B01001_023E', 'B01001_024E', 'B01001_025E', 'B01001_042E', 'B01001_043E', 'B01001_044E', 'B01001_045E', 'B01001_046E', 'B01001_047E', 'B01001_048E', 'B01001_049E'], 'B01001_001E')
        #all_calc_data_nb["Median Age"]
        # educational attainment
        all_calc_data_nb["High School or Less"] = calc_sum_normalized(tract_df, ['B15003_002E', 'B15003_003E', 'B15003_004E', 'B15003_005E', 'B15003_006E', 'B15003_007E', 'B15003_008E', 'B15003_009E', 'B15003_010E', 'B15003_011E', 'B15003_012E', 'B15003_013E', 'B15003_014E', 'B15003_015E', 'B15003_016E', 'B15003_017E', 'B15003_018E'], 'B15003_001E')
        all_calc_data_nb["Some College/Associate Degree"] = calc_sum_normalized(tract_df, ['B15003_019E', 'B15003_020E', 'B15003_021E'], 'B15003_001E')
        all_calc_data_nb["College Degree"] = calc_normalized(tract_df, 'B15003_022E', 'B15003_001E')
        all_calc_data_nb["Graduate/Professional Degree"] = calc_sum_normalized(tract_df, ['B15003_023E', 'B15003_024E', 'B15003_025E'], 'B15003_001E')
        # nativity
        all_calc_data_nb["Foreign Born"] = calc_normalized(tract_df, 'B05002_013E', 'B05002_001E')
        # language spoken at home
        all_calc_data_nb["English Only"] = calc_sum_normalized(tract_df, ['B16007_003E', 'B16007_009E', 'B16007_015E'], 'B16007_001E')
        all_calc_data_nb["Spanish Only"] = calc_sum_normalized(tract_df, ['B16007_004E', 'B16007_010E', 'B16007_016E'], 'B16007_001E')
        all_calc_data_nb["Asian/Pacific Islander"] = calc_sum_normalized(tract_df, ['B16007_006E', 'B16007_012E', 'B16007_018E'], 'B16007_001E')
        all_calc_data_nb["Other European Languages"] = calc_sum_normalized(tract_df, ['B16007_005E', 'B16007_011E', 'B16007_017E'], 'B16007_001E')
        all_calc_data_nb["Other Languages"] = calc_sum_normalized(tract_df, ['B16007_007E', 'B16007_013E', 'B16007_019E'], 'B16007_001E')
        # linguistic isolation
        all_calc_data_nb["% of All Households"] = calc_sum_normalized(tract_df, ['B16003_002E', 'B16003_008E'], 'B16004_001E')
        all_calc_data_nb["% of Spanish-Speaking Households"] = calc_sum_normalized(tract_df, ['B16003_004E', 'B16003_009E'], 'B16004_001E')
        all_calc_data_nb["% of Asian-Speaking Households"] = calc_sum_normalized(tract_df, ['B16003_006E', 'B16003_011E'], 'B16004_001E')
        all_calc_data_nb["% of Other European-Speaking Households"] = calc_sum_normalized(tract_df, ['B16003_005E', 'B16003_010E'], 'B16004_001E')
        all_calc_data_nb["% of Households Speaking Other Languages"] = calc_sum_normalized(tract_df, ['B16003_007E', 'B16003_012E'], 'B16004_001E')
        # housing
        all_calc_data_nb["Total Number of Units"] = calc_sum(tract_df, 'B25001_001E')
        all_calc_data_nb["Median Year Structure Built"] = calc_median(tract_df, range_df, 'median_year_structure_built')
        all_calc_data_nb["Owner Occupied"] = calc_normalized(tract_df, 'B25007_002E', 'B25007_001E')
        all_calc_data_nb["Renter Occupied"] = calc_normalized(tract_df, 'B25007_012E', 'B25007_001E')
        all_calc_data_nb["Vacant Units"] = calc_normalized(tract_df, 'B25004_001E', 'B25001_001E')
        all_calc_data_nb["For Rent"] = calc_normalized(tract_df, 'B25004_002E', 'B25004_001E')
        all_calc_data_nb["For Sale Only"] = calc_normalized(tract_df, 'B25004_004E', 'B25004_001E')
        all_calc_data_nb["Rented or Sold, Not Occupied"] = calc_sum_normalized(tract_df, ['B25004_003E', 'B25004_005E'], 'B25004_001E')
        all_calc_data_nb["For Seasonal, Recreation or Occasional Use"] = calc_normalized(tract_df, 'B25004_006E', 'B25004_001E')
        all_calc_data_nb["Other Vacant"] = calc_normalized(tract_df, 'B25004_008E', 'B25004_001E')
        all_calc_data_nb["Median Year Moved in to Unit (Own)"] = calc_median(tract_df, range_df, 'median_year_moved_owner')
        all_calc_data_nb["Median Year Moved in to Unit (Rent)"] = calc_median(tract_df, range_df, 'median_year_moved_renter')
        all_calc_data_nb["Percent in Same House Last Year"] = calc_normalized(tract_df, 'B07001_017E', 'B07001_001E')
        all_calc_data_nb["Percent Abroad Last Year"] = calc_normalized(tract_df, 'B07003_016E', 'B07003_001E')
        # structure type
        all_calc_data_nb["Single Family Housing"] = calc_sum_normalized(tract_df, ['B25024_002E', 'B25024_003E'], 'B25024_001E')
        all_calc_data_nb["2-4 Units"] = calc_sum_normalized(tract_df, ['B25024_004E', 'B25024_005E'], 'B25024_001E')
        all_calc_data_nb["5-9 Units"] = calc_normalized(tract_df, 'B25024_006E', 'B25024_001E')
        all_calc_data_nb["10-19 Units"] = calc_normalized(tract_df, 'B25024_007E', 'B25024_001E')
        all_calc_data_nb["20 Units or More"] = calc_sum_normalized(tract_df, ['B25024_008E', 'B25024_009E'], 'B25024_001E')
        all_calc_data_nb["Other Type"] = calc_sum_normalized(tract_df, ['B25024_010E', 'B25024_011E'], 'B25024_001E')
        # unit size
        all_calc_data_nb["No Bedroom"] = calc_normalized(tract_df,'B25041_002E', 'B25041_001E')
        all_calc_data_nb["1 Bedroom"] = calc_normalized(tract_df, 'B25041_003E', 'B25041_001E')
        all_calc_data_nb["2 Bedrooms"] = calc_normalized(tract_df, 'B25041_004E', 'B25041_001E')
        all_calc_data_nb["3-4 Bedrooms"] = calc_sum_normalized(tract_df, ['B25041_005E', 'B25041_006E'], 'B25041_001E')
        all_calc_data_nb["5 or More Bedrooms"] = calc_normalized(tract_df, 'B25041_007E', 'B25041_001E')
        # housing prices
        all_calc_data_nb["Median Rent"] = calc_median(tract_df, range_df, 'median_rent')
        all_calc_data_nb["Median Contract Rent"] = calc_median(tract_df, range_df, 'median_rent_contract')
        all_calc_data_nb["Median Rent as % of Household Income"] = calc_median(tract_df, range_df, 'median_rent_percent_of_income')
        all_calc_data_nb["Median Home Value"] = calc_median(tract_df, range_df, 'median_home_value')
        # vehicles available
        all_calc_data_nb["Vehicles Available"] = calc_sum(tract_df, 'B25046_001E')
        all_calc_data_nb["Vehicles Homeowners"] = calc_normalized(tract_df, 'B25046_002E', 'B25046_001E')
        all_calc_data_nb["Vehicles Renters"] = calc_normalized(tract_df, 'B25046_003E', 'B25046_001E')
        all_calc_data_nb["Vehicles Per Capita"] = calc_normalized(tract_df, 'B25046_001E', 'B01001_001E')
        all_calc_data_nb["Households with no Vehicle"] = calc_sum_normalized(tract_df, ['B25044_003E', 'B25044_010E'], 'B25044_001E')
        all_calc_data_nb["Percent of Homeowning Households"] = calc_normalized(tract_df, 'B25044_003E', 'B25044_002E')
        all_calc_data_nb["Percent of Renting Households"] = calc_normalized(tract_df, 'B25044_010E', 'B25044_009E')
        # income
        all_calc_data_nb["Median Household Income (B19013_001)"] = calc_median(tract_df, range_df, 'median_household_income')
        all_calc_data_nb["Median Family Income (B19113_001)"] = calc_median(tract_df, range_df, 'median_family_income')
        all_calc_data_nb["Per Capita Income"] = calc_normalized(tract_df, 'B19025_001E', 'B01001_001E')
        all_calc_data_nb["Percent in Poverty"] = calc_normalized(tract_df, 'B17001_002E', 'B17001_001E')
        # employment
        all_calc_data_nb["Unemployment Rate"] = calc_normalized(tract_df, 'B23025_005E', 'B23025_002E')
        all_calc_data_nb["Percent Unemployment Female"] = calc_sum_normalized(tract_df, ['B23001_094E', 'B23001_101E', 'B23001_108E', 'B23001_115E', 'B23001_122E', 'B23001_129E', 'B23001_136E', 'B23001_143E', 'B23001_150E', 'B23001_157E', 'B23001_162E', 'B23001_167E', 'B23001_172E', 'B23001_090E', 'B23001_097E', 'B23001_104E', 'B23001_111E', 'B23001_118E', 'B23001_125E', 'B23001_132E', 'B23001_139E', 'B23001_146E', 'B23001_153E', 'B23001_160E', 'B23001_165E', 'B23001_170E'], 'B23025_002E')
        all_calc_data_nb["Percent Unemployment Male"] = calc_sum_normalized(tract_df, ['B23001_008E', 'B23001_015E', 'B23001_022E', 'B23001_029E', 'B23001_036E', 'B23001_043E', 'B23001_050E', 'B23001_057E', 'B23001_064E', 'B23001_071E', 'B23001_076E', 'B23001_081E', 'B23001_086E', 'B23001_004E', 'B23001_011E', 'B23001_018E', 'B23001_025E', 'B23001_032E', 'B23001_039E', 'B23001_046E', 'B23001_053E', 'B23001_060E', 'B23001_067E', 'B23001_074E', 'B23001_079E', 'B23001_084E'], 'B23025_002E')
        all_calc_data_nb["Employed Residents"] = calc_sum(tract_df, 'C24050_001E')
        all_calc_data_nb["Managerial Professional"] = calc_normalized(tract_df, 'C24050_015E', 'C24050_001E')
        all_calc_data_nb["Services"] = calc_normalized(tract_df, 'C24050_029E', 'C24050_001E')
        all_calc_data_nb["Sales and Office"] = calc_normalized(tract_df, 'C24050_043E', 'C24050_001E')
        all_calc_data_nb["Natural Resources"] = calc_normalized(tract_df, 'C24050_057E', 'C24050_001E')
        all_calc_data_nb["Production Transport Materials"] = calc_normalized(tract_df, 'C24050_071E', 'C24050_001E')
        # journey to work
        all_calc_data_nb["Workers 16 Years and Older"] = calc_sum(tract_df, 'B08006_001E')
        all_calc_data_nb["Car"] = calc_normalized(tract_df, 'B08006_002E', 'B08006_001E')
        all_calc_data_nb["Drove Alone"] = calc_normalized(tract_df, 'B08006_003E', 'B08006_001E')
        all_calc_data_nb["Carpooled"] = calc_normalized(tract_df, 'B08006_004E', 'B08006_001E')
        all_calc_data_nb["Transit"] = calc_normalized(tract_df, 'B08006_008E', 'B08006_001E')
        all_calc_data_nb["Bike"] = calc_normalized(tract_df, 'B08006_014E', 'B08006_001E')
        all_calc_data_nb["Walk"] = calc_normalized(tract_df, 'B08006_015E', 'B08006_001E')
        all_calc_data_nb["Other Journey Type"] = calc_normalized(tract_df, 'B08006_016E', 'B08006_001E')
        all_calc_data_nb["Worked at Home"] = calc_normalized(tract_df, 'B08006_017E', 'B08006_001E')
        # population density
        # NOTE: this currently stores total population; no area (acreage) is applied here
        all_calc_data_nb["Population Density per Acre"] = calc_sum(tract_df, 'B01001_001E')
            
    #return calc dictionary
    return all_calc_data

## Calculate Socioeconomic Profiles

### Run Socioeconomic Profiles Calcs

In [15]:
# run functions to calculate all stats and convert calc dictionary to pandas dataframe
all_calc_data = calc_socio_economic_data(df)
df_all_calcs = pd.DataFrame.from_dict(all_calc_data).reset_index()
df_all_calcs.rename(columns = {'index':'Attribute'}, inplace = True) 
df_all_calcs.head()


Unnamed: 0,Attribute,010101,010102,010201,010202,010300,010401,010402,010500,010600,...,061507,061508,980200,980300,980401,980501,980600,980900,990100,990200
0,Total Population,2118.0,1783.0,2499.0,1891.0,3996.0,2202.0,2292.0,3430.0,3277.0,...,1940.0,2006.0,171.0,25.0,0.0,104.0,970.0,296.0,0.0,0.0
1,Group Quarter Population,0.0,0.0,0.0,5.0,0.0,16.0,0.0,0.0,0.0,...,0.0,263.0,171.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Percent Female,0.507082,0.57263,0.489796,0.492332,0.488488,0.509083,0.411867,0.497959,0.414403,...,0.522165,0.415254,0.0,0.36,0.0,1.0,0.509278,0.35473,0.0,0.0
3,Housholds,1232.0,870.0,1325.0,1030.0,2151.0,993.0,1311.0,1810.0,1792.0,...,684.0,947.0,0.0,17.0,0.0,99.0,405.0,127.0,0.0,0.0
4,Family Households,0.37013,0.555172,0.394717,0.390291,0.427708,0.478348,0.323417,0.435912,0.343192,...,0.343567,0.457233,0.0,0.176471,0.0,0.050505,0.575309,0.464567,0.0,0.0


In [16]:
# transpose dataset for second geo view of dataset
df_all_calcs_tp = df_all_calcs.T.reset_index()
df_all_calcs_tp.columns = df_all_calcs_tp.iloc[0]
df_all_calcs_tp = df_all_calcs_tp[1:].rename(columns={'Attribute': "tract"})
df_all_calcs_tp = df_all_calcs_tp.sort_values(by=["tract"])
df_all_calcs_tp.head()

Unnamed: 0,tract,Total Population,Group Quarter Population,Percent Female,Housholds,Family Households,Non-Family Households,"Single Person Households, % of Total","Households with Children, % of Total","Households with 60 years and older, % of Total",...,Workers 16 Years and Older,Car,Drove Alone,Carpooled,Transit,Bike,Walk,Other Journey Type,Worked at Home,Population Density per Acre
1,10101,2118.0,0.0,0.507082,1232.0,0.37013,0.62987,0.536526,0.098214,0.324675,...,1047.0,0.221585,0.2149,0.006686,0.287488,0.043935,0.160458,0.016237,0.270296,2118.0
2,10102,1783.0,0.0,0.57263,870.0,0.555172,0.444828,0.36092,0.255172,0.466667,...,739.0,0.270636,0.247632,0.023004,0.391069,0.0,0.224628,0.0,0.113667,1783.0
3,10201,2499.0,0.0,0.489796,1325.0,0.394717,0.605283,0.424151,0.055849,0.316226,...,1762.0,0.219637,0.182747,0.03689,0.222474,0.017026,0.123156,0.092509,0.325199,2499.0
4,10202,1891.0,5.0,0.492332,1030.0,0.390291,0.609709,0.417476,0.064078,0.363107,...,1396.0,0.27149,0.229943,0.041547,0.283668,0.062321,0.148997,0.012894,0.22063,1891.0
5,10300,3996.0,0.0,0.488488,2151.0,0.427708,0.572292,0.421199,0.06927,0.402139,...,2680.0,0.229851,0.213806,0.016045,0.233582,0.025746,0.215672,0.04403,0.251119,3996.0


## Export

In [17]:
df_all_calcs = pd.merge(df_all_calcs, attribute_lookup_df[["category",'attribute_name']], how='left', left_on = 'Attribute', right_on = 'attribute_name')
df_all_calcs=df_all_calcs.drop(['attribute_name'], axis=1)
df_all_calcs.rename(columns = {'category':'Category'}, inplace = True) 
df_all_calcs

Unnamed: 0,Attribute,010101,010102,010201,010202,010300,010401,010402,010500,010600,...,061508,980200,980300,980401,980501,980600,980900,990100,990200,Category
0,Total Population,2118.000000,1783.000000,2499.000000,1891.000000,3996.000000,2202.000000,2292.000000,3430.000000,3277.000000,...,2006.000000,171.0,25.000000,0.0,104.000000,970.000000,296.000000,0.0,0.0,Population
1,Group Quarter Population,0.000000,0.000000,0.000000,5.000000,0.000000,16.000000,0.000000,0.000000,0.000000,...,263.000000,171.0,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0,Population
2,Percent Female,0.507082,0.572630,0.489796,0.492332,0.488488,0.509083,0.411867,0.497959,0.414403,...,0.415254,0.0,0.360000,0.0,1.000000,0.509278,0.354730,0.0,0.0,Population
3,Housholds,1232.000000,870.000000,1325.000000,1030.000000,2151.000000,993.000000,1311.000000,1810.000000,1792.000000,...,947.000000,0.0,17.000000,0.0,99.000000,405.000000,127.000000,0.0,0.0,Households
4,Family Households,0.370130,0.555172,0.394717,0.390291,0.427708,0.478348,0.323417,0.435912,0.343192,...,0.457233,0.0,0.176471,0.0,0.050505,0.575309,0.464567,0.0,0.0,Households
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92,Bike,0.043935,0.000000,0.017026,0.062321,0.025746,0.088665,0.000000,0.032992,0.035346,...,0.014719,0.0,0.000000,0.0,0.000000,0.011278,0.051546,0.0,0.0,Journey to Work
93,Walk,0.160458,0.224628,0.123156,0.148997,0.215672,0.365631,0.286140,0.376564,0.411880,...,0.385281,0.0,0.227273,0.0,0.000000,0.007519,0.025773,0.0,0.0,Journey to Work
94,Other Journey Type,0.016237,0.000000,0.092509,0.012894,0.044030,0.000000,0.030551,0.005119,0.042219,...,0.054545,0.0,0.000000,0.0,0.000000,0.063910,0.077320,0.0,0.0,Journey to Work
95,Worked at Home,0.270296,0.113667,0.325199,0.220630,0.251119,0.198355,0.249627,0.242321,0.116348,...,0.159307,0.0,0.227273,0.0,0.000000,0.114662,0.185567,0.0,0.0,Journey to Work


In [18]:
# export both dataset views to csv
# set path to download csvs and create the folder if it does not already exist
download_path = r"./output"
os.makedirs(download_path, exist_ok=True)
df_all_calcs.to_csv(os.path.join(download_path,"all_tracts"+"_"+'profiles_by_attribute_{}.csv'.format(year)), index = False)
df_all_calcs_tp.to_csv(os.path.join(download_path,"all_tracts"+"_"+'profiles_by_geo_{}.csv'.format(year)), index = False)