# Socio-economic Profiles 2019

This code creates the socio-economic profile data for the San Francisco Planning Department's Neighborhood Socio-Economic Profiles. The profile data is derived from the [American Community Survey](https://www.census.gov/programs-surveys/acs) 5-year data and is created annually by the Planning Department. Tract-level socio-economic data is combined at the neighborhood and zip code level for the City of San Francisco. This code is based on methods created by Michael Webster and others. Run the notebook to:

- Download ACS data using the Census API
- Calculate socio-economic profiles data
- Export data to CSV in two formats


## Import packages

In [32]:
# Import libraries
import requests
import pandas as pd
from collections import defaultdict


## Retrieve data from Census API
All socio-economic data comes from the Census ACS 5-year estimates and is available at the tract level through the Census API. API documentation and data for the 2019 ACS and previous years are available [here](https://www.census.gov/data/developers/data-sets/acs-5year.html).

### Census Attribute IDs
The attribute ID list below contains the ACS attribute IDs that correspond to the attributes included in the socio-economic data calcs. For a list of attribute IDs and their meanings, visit the API docs [here](https://api.census.gov/data/2019/acs/acs5/variables.html).

Attributes included in the list below were derived from the attributes in the [Data_Items_and_Sources.xlsx](https://github.com/jsherba/socio-economic-profiles/raw/main/Data_Items_And_Sources_2019.xlsx) file developed by Michael Webster.

In [33]:
# build the list of attribute ids from the lookup csv
attribute_lookup_df = pd.read_csv(r'./attribute_lookup.csv', dtype=str)
attribute_ids_extracted = attribute_lookup_df['attribute_id'].tolist()
# a lookup row can hold several comma-separated ids, so split each row into individual ids
attribute_ids = []
for attribute_id in attribute_ids_extracted:
    attribute_ids.extend(attribute_id.split(", "))
# append "E" (estimate) to each id and remove duplicates
attribute_ids = list(set([x+"E" for x in attribute_ids]))
len(attribute_ids)

285
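
As a rough illustration of what the cell above does, the sketch below runs the same split, suffix, and de-duplicate steps on a few made-up lookup rows. The `sample_rows` values are assumptions chosen for the example and are not read from `attribute_lookup.csv`.

```python
# Hypothetical rows standing in for the 'attribute_id' column of attribute_lookup.csv.
# A row may hold one ID or several IDs separated by ", ".
sample_rows = ["B01001_001", "B01001_003, B01001_027", "B01001_001"]

# Split each row into individual IDs.
flat_ids = []
for row in sample_rows:
    flat_ids.extend(row.split(", "))

# Append "E" (the suffix for estimate variables) and drop duplicates.
# sorted() is used here only to make the order deterministic;
# the notebook itself uses list(set(...)), whose order is arbitrary.
estimate_ids = sorted({attribute_id + "E" for attribute_id in flat_ids})
print(estimate_ids)
```

Because the notebook builds the list from a `set`, the order of `attribute_ids` (and therefore the column order of the downloaded data) can differ between runs.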

### Build Census API URL and Make Query
The code below builds the URL for the Census API call that retrieves the relevant 2019 ACS attribute data at the tract level for San Francisco County. The Census API accepts up to 50 attributes per request, so the attribute list is first split into sublists of 45. A loop then queries the API for each group of 45 attributes and merges the results on tract. The query uses the following parameters:
- Tract code is '*' to collect all tracts
- State code is '06' for CA
- County code is '075' for San Francisco County
- Attributes are defined by the attribute id list and includes all relevant attributes for the socio-economic data calcs
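
The grouping step described above can be sketched on its own as follows. The `chunk` helper and the placeholder ID names are hypothetical and exist only for this illustration; the notebook does the same thing with a list comprehension.

```python
def chunk(items, size):
    """Split a list into consecutive sublists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 285 placeholder IDs, matching the attribute count printed earlier; the names are made up.
placeholder_ids = ["VAR_{:03d}E".format(n) for n in range(285)]
groups = chunk(placeholder_ids, 45)

print(len(groups))       # number of API requests needed
print(len(groups[-1]))   # size of the final, partial group
```

With 285 attributes and groups of 45, this gives seven requests, the last of which carries the remaining 15 attributes.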

In [34]:
# function builds the api URL from tract_code, state_code, county_code, and attribute ids. 
def build_census_url(tract_code, state_code, county_code, attribute_ids):
    attributes = ','.join(attribute_ids)
    census_url = r'https://api.census.gov/data/2019/acs/acs5?get={}&for=tract:{}&in=state:{}&in=county:{}'\
                .format(attributes, tract_code, state_code, county_code)
    return census_url
    

In [35]:
# function makes a single api call and collects results in a pandas dataframe
def make_census_api_call(census_url):
    # make API call to Census
    resp = requests.get(census_url)
    # raise an HTTPError if the request failed (4xx or 5xx response)
    resp.raise_for_status()

    # retrieve data as json and convert to Pandas Dataframe
    data = resp.json()
    headers = data.pop(0)
    df = pd.DataFrame(data, columns=headers)

    # convert values that are not state, county, or tract to numeric type
    cols=[i for i in df.columns if i not in ["state","county","tract"]]
    for col in cols:
        df[col]=pd.to_numeric(df[col])
    return df
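
To show the conversion step without calling the API, here is a minimal sketch that applies the same logic to a hard-coded response. The `sample_response` values are made up; only the overall shape (a list of rows whose first row holds the column names, with every value returned as a string) reflects the Census API.

```python
import pandas as pd

# Made-up response in the shape the Census API returns: a list of rows,
# where the first row holds the column names and all values are strings.
sample_response = [
    ["B01001_001E", "state", "county", "tract"],
    ["2507", "06", "075", "000100"],
    ["2897", "06", "075", "000200"],
]

headers = sample_response[0]
sample_df = pd.DataFrame(sample_response[1:], columns=headers)

# Convert only the attribute columns; geography codes stay as strings
# so that leading zeros (e.g. "06") are preserved.
value_cols = [c for c in sample_df.columns if c not in ("state", "county", "tract")]
sample_df[value_cols] = sample_df[value_cols].apply(pd.to_numeric)

print(sample_df.dtypes)
```

Keeping `tract` as a string matters later, because the lookup dictionaries are also read with `dtype=str` and the two must match exactly.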

In [36]:
# make api calls and merge results into single df
tract_code = "*"
state_code = "06"
county_code = "075"
split_attribute_ids = [attribute_ids[i:i+45] for i in range(0, len(attribute_ids), 45)]
df=None
first = True
for ids in split_attribute_ids:
    census_url = build_census_url(tract_code, state_code, county_code, ids)
    returned_df = make_census_api_call(census_url)
    if first:
        df = returned_df
        first = False
    else:
        returned_df = returned_df.drop(columns=['state', 'county'])
        df = pd.merge(df, returned_df, on='tract', how='left')

df.head()

Unnamed: 0,B01001_033E,B16007_013E,B19101_012E,B23001_129E,B25004_002E,B02001_008E,B25024_008E,B16003_005E,B19001_012E,B15003_003E,...,B23001_018E,B16007_011E,B02001_001E,B15003_017E,B01001_009E,B15003_012E,B05002_001E,B23001_036E,B01001_035E,B01001_020E
0,0,0,42,0,0,230,0,0,62,0,...,0,74,2507,41,0,3,2507,0,46,8
1,29,0,52,7,0,137,0,0,42,0,...,26,37,2897,391,20,54,2897,0,147,53
2,13,4,0,0,0,241,0,0,18,9,...,6,39,2495,107,0,0,2495,7,38,12
3,27,27,73,0,0,179,45,0,60,0,...,32,179,3804,278,47,11,3804,0,63,28
4,213,46,0,0,45,380,212,0,28,0,...,190,199,4551,20,167,5,4551,6,136,0
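
The merge in the loop above can be tried on two small, made-up frames. This is a sketch under the assumption that each tract appears exactly once per API response; the `validate="one_to_one"` argument is an optional pandas check added here for illustration and is not part of the notebook's code.

```python
import pandas as pd

# Two made-up result frames, as if returned by two separate API calls.
first_df = pd.DataFrame({
    "B01001_001E": [2507, 2897],
    "state": ["06", "06"],
    "county": ["075", "075"],
    "tract": ["000100", "000200"],
})
second_df = pd.DataFrame({
    "B11001_001E": [891, 1324],
    "state": ["06", "06"],
    "county": ["075", "075"],
    "tract": ["000100", "000200"],
})

# Drop the duplicated geography columns before merging on tract.
# validate="one_to_one" makes pandas raise if a tract appears more than once on either side.
merged = pd.merge(
    first_df,
    second_df.drop(columns=["state", "county"]),
    on="tract",
    how="left",
    validate="one_to_one",
)
print(merged.columns.tolist())
```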


## Prepare Lookup Dictionaries and Helper Functions

### Tract/Neighborhood Lookup
Two lookup dictionaries are created below that relate neighborhoods to tracts and supervisor districts to tracts. These dictionaries are used to subset the census dataframe for each neighborhood or district so that calcs can be run on each set of tracts. The lookup dictionaries are created from geo_lookup.csv in the repository.

In [37]:
# import geo_lookup csv
geo_lookup_df = pd.read_csv(r'./geo_lookup.csv', dtype=str)

tract_nb_lookup = defaultdict(list)
tract_sd_lookup = defaultdict(list)

# create tract lookup dictionary for neighborhoods
for i, j in zip(geo_lookup_df['neighborhood'], geo_lookup_df['tractid']):
    tract_nb_lookup[i].append(j)

# create tract lookup dictionary for supervisor districts
for i, j in zip(geo_lookup_df['supervisor_district'], geo_lookup_df['tractid']):
    tract_sd_lookup[i].append(j)
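
As a small illustration of how such a lookup is used, the sketch below builds a dictionary from a made-up stand-in for geo_lookup.csv and uses it to subset tract-level data. The tract assignments and population values are invented for the example.

```python
from collections import defaultdict

import pandas as pd

# Made-up stand-in for geo_lookup.csv; the tract assignments are illustrative only.
geo_sample = pd.DataFrame({
    "neighborhood": ["Seacliff", "Presidio", "Presidio"],
    "tractid": ["000100", "000200", "000300"],
})

lookup = defaultdict(list)
for name, tract in zip(geo_sample["neighborhood"], geo_sample["tractid"]):
    lookup[name].append(tract)

# Made-up tract-level data to subset.
tract_data = pd.DataFrame({
    "tract": ["000100", "000200", "000300"],
    "B01001_001E": [2507, 2897, 1329],
})

# Keep only the rows whose tract belongs to the neighborhood, then aggregate.
presidio_df = tract_data[tract_data["tract"].isin(lookup["Presidio"])]
print(presidio_df["B01001_001E"].sum())
```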


### Calculating Medians
The median of an aggregated geography cannot be calculated by averaging the medians of its component geographies. Instead, a statistical approximation of the median must be calculated from range tables.

Range variables in the ACS have a unique ID like any other Census variable. Each one gives the count of a variable within a given range, e.g. the number of households with household incomes between $45,000 and $49,999. Range variable IDs and range information are stored in the median_ranges.csv file in the repository. These range variables and ranges are needed for calculating the median at the neighborhood level.

The function below calculates a median from range data: it finds the range that contains the midpoint of the distribution and interpolates linearly within that range. This follows the documented method for [calculating a median](https://www.dof.ca.gov/Forecasting/Demographics/Census_Data_Center_Network/documents/How_to_Recalculate_a_Median.pdf).
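
A worked example may make the interpolation easier to follow. The sketch below is a simplified, pandas-free version of the idea; the `interpolated_median` name and the three income ranges are assumptions made up for illustration, not values from median_ranges.csv.

```python
def interpolated_median(bins):
    """Approximate a median from (range_start, range_end, count) tuples sorted by range_start."""
    total = sum(count for _, _, count in bins)
    if total == 0:
        return 0
    midpoint = total / 2
    cumulative = 0
    for start, end, count in bins:
        if cumulative + count >= midpoint:
            # Share of this range needed to reach the midpoint.
            proportion = (midpoint - cumulative) / count
            width = end - start + 1
            return start + proportion * width
        cumulative += count

# 80 households in total, so the midpoint is household number 40.
# The first range holds 10, so the midpoint falls 30 households into the second range.
bins = [(0, 9999, 10), (10000, 14999, 40), (15000, 19999, 30)]
print(interpolated_median(bins))  # 10000 + (30 / 40) * 5000 = 13750.0
```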


In [38]:
# import median tables from median_ranges csv and add empty columns 'households' and 'cumulative_total'
range_df = pd.read_csv(r'./median_ranges.csv')
range_df['households']=0
range_df['cumulative_total']=0
range_df.head()

Unnamed: 0,name,id,range_start,range_end,households,cumulative_total
0,median_household_income,B19001_002E,0,9999,0,0
1,median_household_income,B19001_003E,10000,14999,0,0
2,median_household_income,B19001_004E,15000,19999,0,0
3,median_household_income,B19001_005E,20000,24999,0,0
4,median_household_income,B19001_006E,25000,29999,0,0


In [61]:
# define median helper function
def calc_median(tract_df, range_df, median_to_calc):
    # subset range df for current median variable to calc (copy so the shared range_df is not modified)
    range_df = range_df[range_df['name']==median_to_calc].copy()
    # sort dataframe low to high by range start column
    range_df = range_df.sort_values(by=['range_start'])
    # calculate households as sum of tract level households for each row based on range id
    range_df['households'] = range_df.apply(lambda row : tract_df[row['id']].sum(), axis = 1)
    # calculate the cumulative total of households
    range_df['cumulative_total'] = range_df['households'].cumsum()
    # calculate total households and return 0 if total households is 0
    total_households = range_df['households'].sum()
    if total_households == 0:
        return 0
    # calculate midpoint
    midpoint = total_households/2
    # extract rows below the mid range: rows with cumulative total less than midpoint
    less_midpoint_df = range_df[range_df['cumulative_total']<midpoint]
    # get the cumulative total of the range just below the mid range;
    # if the midpoint falls in the first range there is no lower range, so use 0
    if less_midpoint_df.empty:
        total_hh_previous_range = 0
    else:
        total_hh_previous_range = less_midpoint_df['cumulative_total'].iloc[-1]
    hh_to_mid_range = midpoint - total_hh_previous_range
    # the mid range is the first row whose cumulative total reaches the midpoint
    mid_range = range_df[range_df['cumulative_total']>=midpoint].iloc[0]
    # get the number of households in the mid range
    hh_in_mid_range = mid_range['households']
    # calculate proportion of households in the mid range that would be needed to get to the midpoint
    prop_of_hh = hh_to_mid_range/hh_in_mid_range
    # calculate width of the mid range
    width = (mid_range['range_end']-mid_range['range_start'])+1
    # apply proportion to width of mid range
    prop_to_width = prop_of_hh*width
    beginning_of_mid_range = mid_range['range_start']
    # calculate new median
    new_median = beginning_of_mid_range + prop_to_width
    return new_median

## Calculate Socio-economic Data
The `calc_socio_economic_data` function takes the tract-level data from the API call and the tract/neighborhood lookup dictionary. It runs all of the socio-economic data calcs and returns a dictionary keyed by neighborhood. The calcs in this function are derived from the [Data_Items_and_Sources.xlsx](https://github.com/jsherba/socio-economic-profiles/raw/main/Data_Items_And_Sources_2019.xlsx) file developed by Michael Webster. The small helper functions defined first compute sums and ratios, returning 0 when a denominator is 0.

In [62]:
# define other helper functions
def calc_sum(df, attribute_id):
    return df[attribute_id].sum()

def calc_normalized(df, attribute_id, attribute_id2):
    if df[attribute_id2].sum() == 0:
        return 0
    else:
        return (df[attribute_id].sum()/df[attribute_id2].sum())

def calc_sum_normalized(df, attribute_list, attribute_id2):
    if df[attribute_id2].sum()==0:
        return 0
    else:
        sum_of_attributes = 0
        for attribute_id in attribute_list:
            sum_of_attributes+=df[attribute_id].sum()
        return sum_of_attributes/df[attribute_id2].sum()


In [63]:
# function runs all calcs for each neighborhood
def calc_socio_economic_data(df, tract_nb_lookup):
    # create empty dictionary to add calculated attribute information to
    all_calc_data = defaultdict(dict) 
    # calculate all stats for each neighborhood
    for nb_name, tracts in tract_nb_lookup.items():
        # extract attribute information for tracts associated with a neighborhood
        tract_df = df[df['tract'].isin(tracts)]
        # build dictionary with all stats for a neighborhood
        all_calc_data_nb = all_calc_data[nb_name]
        # population
        all_calc_data_nb["Total Population"] = calc_sum(tract_df, 'B01001_001E')
        all_calc_data_nb["Group Quarter Population"] = calc_sum(tract_df, 'B26001_001E')
        all_calc_data_nb["Percent Female"] = calc_normalized(tract_df, 'B01001_026E', 'B01001_001E')
        # household stats
        all_calc_data_nb["Households"] = calc_sum(tract_df, 'B11001_001E')
        all_calc_data_nb["Family Households"] = calc_normalized(tract_df, 'B11001_002E', 'B11001_001E')
        all_calc_data_nb["Non-Family Households"] = calc_normalized(tract_df, 'B11001_007E', 'B11001_001E')
        all_calc_data_nb["Single Person Households, % of Total"] = calc_normalized(tract_df, 'B11001_008E', 'B11001_001E')
        all_calc_data_nb["Households with Children, % of Total"] = calc_normalized(tract_df, 'B11005_002E', 'B11001_001E')
        all_calc_data_nb["Households with 60 years and older, % of Total"] = calc_normalized(tract_df, 'B11006_002E', 'B11001_001E')
        all_calc_data_nb["Average Household Size"] = calc_normalized(tract_df, 'B11002_001E', 'B11001_001E')
        all_calc_data_nb["Average Family Household Size"] = calc_normalized(tract_df, 'B11002_002E', 'B11001_002E')
        # race and ethnicity stats
        all_calc_data_nb["Asian"] = calc_normalized(tract_df, 'B02001_005E', 'B02001_001E')
        all_calc_data_nb["Black/African American"] = calc_normalized(tract_df, 'B02001_003E', 'B02001_001E')
        all_calc_data_nb["White"] = calc_normalized(tract_df, 'B02001_002E', 'B02001_001E')
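        # B02001_004E is American Indian and Alaska Native alone (B02001_005E is Asian alone, used above);
        # this assumes B02001_004 is included in attribute_lookup.csv
        all_calc_data_nb["Native American Indian"] = calc_normalized(tract_df, 'B02001_004E', 'B02001_001E')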
        all_calc_data_nb["Native Hawaiian/Pacific Islander"] = calc_normalized(tract_df, 'B02001_006E', 'B02001_001E')
        all_calc_data_nb["Other/Two or More Races"] = calc_sum_normalized(tract_df, ['B02001_008E', 'B02001_007E'], 'B02001_001E')
        all_calc_data_nb["% Latino (of Any Race)"] = calc_normalized(tract_df, 'B03001_003E', 'B03001_001E')
        # age
        all_calc_data_nb["0-4 Years"] = calc_sum_normalized(tract_df, ['B01001_003E', 'B01001_027E'], 'B01001_001E')
        all_calc_data_nb["5-17 Years"] = calc_sum_normalized(tract_df, ['B01001_004E', 'B01001_005E', 'B01001_006E', 'B01001_028E', 'B01001_029E', 'B01001_030E'],'B01001_001E')
        all_calc_data_nb["18-34 Years"] = calc_sum_normalized(tract_df, ['B01001_007E','B01001_008E','B01001_009E', 'B01001_010E', 'B01001_011E', 'B01001_012E','B01001_031E','B01001_032E','B01001_033E','B01001_034E','B01001_035E','B01001_036E'], 'B01001_001E')
        all_calc_data_nb["35-59 Years"] = calc_sum_normalized(tract_df, ['B01001_013E', 'B01001_014E', 'B01001_015E', 'B01001_016E', 'B01001_017E', 'B01001_037E', 'B01001_038E', 'B01001_039E', 'B01001_040E', 'B01001_041E'], 'B01001_001E')
        all_calc_data_nb["60 and Older"] = calc_sum_normalized(tract_df, ['B01001_018E', 'B01001_019E', 'B01001_020E', 'B01001_021E', 'B01001_022E', 'B01001_023E', 'B01001_024E', 'B01001_025E', 'B01001_042E', 'B01001_043E', 'B01001_044E', 'B01001_045E', 'B01001_046E', 'B01001_047E', 'B01001_048E', 'B01001_049E'], 'B01001_001E')
        #all_calc_data_nb["Median Age"]
        # educational attainment
        all_calc_data_nb["High School or Less"] = calc_sum_normalized(tract_df, ['B15003_002E', 'B15003_003E', 'B15003_004E', 'B15003_005E', 'B15003_006E', 'B15003_007E', 'B15003_008E', 'B15003_009E', 'B15003_010E', 'B15003_011E', 'B15003_012E', 'B15003_013E', 'B15003_014E', 'B15003_015E', 'B15003_016E', 'B15003_017E', 'B15003_018E'], 'B15003_001E')
        all_calc_data_nb["Some College/Associate Degree"] = calc_sum_normalized(tract_df, ['B15003_019E', 'B15003_020E', 'B15003_021E'], 'B15003_001E')
        all_calc_data_nb["College Degree"] = calc_normalized(tract_df, 'B15003_022E', 'B15003_001E')
        all_calc_data_nb["Graduate/Professional Degree"] = calc_sum_normalized(tract_df, ['B15003_023E', 'B15003_024E', 'B15003_025E'], 'B15003_001E')
        # nativity
        all_calc_data_nb["Foreign Born"] = calc_normalized(tract_df, 'B05002_013E', 'B05002_001E')
        # language spoken at home
        all_calc_data_nb["English Only"] = calc_sum_normalized(tract_df, ['B16007_003E', 'B16007_009E', 'B16007_015E'], 'B16007_001E')
        all_calc_data_nb["Spanish Only"] = calc_sum_normalized(tract_df, ['B16007_004E', 'B16007_010E', 'B16007_016E'], 'B16007_001E')
        all_calc_data_nb["Asian/Pacific Islander"] = calc_sum_normalized(tract_df, ['B16007_006E', 'B16007_012E', 'B16007_018E'], 'B16007_001E')
        all_calc_data_nb["Other European Languages"] = calc_sum_normalized(tract_df, ['B16007_005E', 'B16007_011E', 'B16007_017E'], 'B16007_001E')
        all_calc_data_nb["Other Languages"] = calc_sum_normalized(tract_df, ['B16007_007E', 'B16007_013E', 'B16007_019E'], 'B16007_001E')
        # linguistic isolation
        all_calc_data_nb["% of All Households"] = calc_sum_normalized(tract_df, ['B16003_002E', 'B16003_008E'], 'B16004_001E')
        all_calc_data_nb["% of Spanish-Speaking Households"] = calc_sum_normalized(tract_df, ['B16003_004E', 'B16003_009E'], 'B16004_001E')
        all_calc_data_nb["% of Asian-Speaking Households"] = calc_sum_normalized(tract_df, ['B16003_006E', 'B16003_011E'], 'B16004_001E')
        all_calc_data_nb["% of Other European-Speaking Households"] = calc_sum_normalized(tract_df, ['B16003_005E', 'B16003_010E'], 'B16004_001E')
        all_calc_data_nb["% of Households Speaking Other Languages"] = calc_sum_normalized(tract_df, ['B16003_007E', 'B16003_012E'], 'B16004_001E')
        # housing
        all_calc_data_nb["Total Number of Units"] = calc_sum(tract_df, 'B25001_001E')
        all_calc_data_nb["Median Year Structure Built"] = calc_median(tract_df, range_df, 'median_year_structure_built')
        all_calc_data_nb["Owner Occupied"] = calc_normalized(tract_df, 'B25007_002E', 'B25007_001E')
        all_calc_data_nb["Renter Occupied"] = calc_normalized(tract_df, 'B25007_012E', 'B25007_001E')
        all_calc_data_nb["Vacant Units"] = calc_normalized(tract_df, 'B25004_001E', 'B25001_001E')
        all_calc_data_nb["For Rent"] = calc_normalized(tract_df, 'B25004_002E', 'B25004_001E')
        all_calc_data_nb["For Sale Only"] = calc_normalized(tract_df, 'B25004_004E', 'B25004_001E')
        all_calc_data_nb["Rented or Sold, Not Occupied"] = calc_sum_normalized(tract_df, ['B25004_003E', 'B25004_005E'], 'B25004_001E')
        all_calc_data_nb["For Seasonal, Recreation or Occasional Use"] = calc_normalized(tract_df, 'B25004_006E', 'B25004_001E')
        all_calc_data_nb["Other Vacant"] = calc_normalized(tract_df, 'B25004_008E', 'B25004_001E')
        #all_calc_data_nb["Median Year Moved in to Unit (Own)"] = calc_median(df, 'B25001_001E')
        #all_calc_data_nb["Median Year Moved in to Unit (Rent)"] = calc_median(df, 'B25001_001E')
        all_calc_data_nb["Percent in Same House Last Year"] = calc_normalized(tract_df, 'B07001_017E', 'B07001_001E')
        all_calc_data_nb["Percent Abroad Last Year"] = calc_normalized(tract_df, 'B07003_016E', 'B07003_001E')
        # structure type
        all_calc_data_nb["Single Family Housing"] = calc_sum_normalized(tract_df, ['B25024_002E', 'B25024_003E'], 'B25024_001E')
        all_calc_data_nb["2-4 Units"] = calc_sum_normalized(tract_df, ['B25024_004E', 'B25024_005E'], 'B25024_001E')
        all_calc_data_nb["5-9 Units"] = calc_normalized(tract_df, 'B25024_006E', 'B25024_001E')
        all_calc_data_nb["10-19 Units"] = calc_normalized(tract_df, 'B25024_007E', 'B25024_001E')
        all_calc_data_nb["20 Units or More"] = calc_sum_normalized(tract_df, ['B25024_008E', 'B25024_009E'], 'B25024_001E')
        # key is qualified so it does not collide with the journey to work "Other" entry
        all_calc_data_nb["Other (Structure Type)"] = calc_sum_normalized(tract_df, ['B25024_010E', 'B25024_011E'], 'B25024_001E')
        # unit size
        all_calc_data_nb["No Bedroom"] = calc_normalized(tract_df,'B25041_002E', 'B25041_001E')
        all_calc_data_nb["1 Bedroom"] = calc_normalized(tract_df, 'B25041_003E', 'B25041_001E')
        all_calc_data_nb["2 Bedrooms"] = calc_normalized(tract_df, 'B25041_004E', 'B25041_001E')
        all_calc_data_nb["3-4 Bedrooms"] = calc_sum_normalized(tract_df, ['B25041_005E', 'B25041_006E'], 'B25041_001E')
        all_calc_data_nb["5 Bedrooms"] = calc_normalized(tract_df, 'B25041_007E', 'B25041_001E')
        # housing prices
        #all_calc_data_nb["Median Rent"] = calc_median(df, 'B25001_001E')
        #all_calc_data_nb["Median Contract Rent"] = calc_median(df, 'B25001_001E')
        #all_calc_data_nb["Median Rent as % of Household Income"] = calc_median(df, 'B25001_001E')
        #all_calc_data_nb["Median Home Value"] = calc_median(df, 'B25001_001E')
        # vehicles available
        all_calc_data_nb["Vehicles Available"] = calc_sum(tract_df, 'B25046_001E')
        all_calc_data_nb["Vehicles Homeowners"] = calc_normalized(tract_df, 'B25046_002E', 'B25046_001E')
        all_calc_data_nb["Vehicles Renters"] = calc_normalized(tract_df, 'B25046_003E', 'B25046_001E')
        all_calc_data_nb["Vehicles Per Capita"] = calc_normalized(tract_df, 'B25046_001E', 'B01001_001E')
        all_calc_data_nb["Households with no Vehicle"] = calc_sum_normalized(tract_df, ['B25044_003E', 'B25044_010E'], 'B25044_001E')
        all_calc_data_nb["Percent of Homeowning Households"] = calc_normalized(tract_df, 'B25044_003E', 'B25044_002E')
        all_calc_data_nb["Percent of Renting Households"] = calc_normalized(tract_df, 'B25044_010E', 'B25044_009E')
        # income
        all_calc_data_nb["Median Household Income"] = calc_median(tract_df, range_df, 'median_household_income')
        all_calc_data_nb["Median Family Income"] = calc_median(tract_df, range_df, 'median_family_income')
        all_calc_data_nb["Per Capita Income"] = calc_normalized(tract_df, 'B19025_001E', 'B01001_001E')
        all_calc_data_nb["Percent in Poverty"] = calc_normalized(tract_df, 'B17001_002E', 'B17001_001E')
        # employment
        all_calc_data_nb["Unemployment Rate"] = calc_normalized(tract_df, 'B23025_005E', 'B23025_002E')
        all_calc_data_nb["Percent Unemployment Female"] = calc_sum_normalized(tract_df, ['B23001_094E', 'B23001_101E', 'B23001_108E', 'B23001_115E', 'B23001_122E', 'B23001_129E', 'B23001_136E', 'B23001_143E', 'B23001_150E', 'B23001_157E', 'B23001_162E', 'B23001_167E', 'B23001_172E', 'B23001_090E', 'B23001_097E', 'B23001_104E', 'B23001_111E', 'B23001_118E', 'B23001_125E', 'B23001_132E', 'B23001_139E', 'B23001_146E', 'B23001_153E', 'B23001_160E', 'B23001_165E', 'B23001_170E'], 'B23025_002E')
        all_calc_data_nb["Percent Unemployment Male"] = calc_sum_normalized(tract_df, ['B23001_008E', 'B23001_015E', 'B23001_022E', 'B23001_029E', 'B23001_036E', 'B23001_043E', 'B23001_050E', 'B23001_057E', 'B23001_064E', 'B23001_071E', 'B23001_076E', 'B23001_081E', 'B23001_086E', 'B23001_004E', 'B23001_011E', 'B23001_018E', 'B23001_025E', 'B23001_032E', 'B23001_039E', 'B23001_046E', 'B23001_053E', 'B23001_060E', 'B23001_067E', 'B23001_074E', 'B23001_079E', 'B23001_084E'], 'B23025_002E')
        all_calc_data_nb["Employed Residents"] = calc_sum(tract_df, 'C24050_001E')
        all_calc_data_nb["Managerial Professional"] = calc_normalized(tract_df, 'C24050_015E', 'C24050_001E')
        all_calc_data_nb["Services"] = calc_normalized(tract_df, 'C24050_029E', 'C24050_001E')
        all_calc_data_nb["Sales and Office"] = calc_normalized(tract_df, 'C24050_043E', 'C24050_001E')
        all_calc_data_nb["Natural Resources"] = calc_normalized(tract_df, 'C24050_057E', 'C24050_001E')
        all_calc_data_nb["Production Transport Materials"] = calc_normalized(tract_df, 'C24050_071E', 'C24050_001E')
        # journey to work
        all_calc_data_nb["Workers 16 Years and Older"] = calc_sum(tract_df, 'B08006_001E')
        all_calc_data_nb["Car"] = calc_normalized(tract_df, 'B08006_002E', 'B08006_001E')
        all_calc_data_nb["Drove Alone"] = calc_normalized(tract_df, 'B08006_003E', 'B08006_001E')
        all_calc_data_nb["Carpooled"] = calc_normalized(tract_df, 'B08006_004E', 'B08006_001E')
        all_calc_data_nb["Transit"] = calc_normalized(tract_df, 'B08006_008E', 'B08006_001E')
        all_calc_data_nb["Bike"] = calc_normalized(tract_df, 'B08006_014E', 'B08006_001E')
        all_calc_data_nb["Walk"] = calc_normalized(tract_df, 'B08006_015E', 'B08006_001E')
        # key is qualified so it does not collide with the structure type "Other" entry
        all_calc_data_nb["Other (Journey to Work)"] = calc_normalized(tract_df, 'B08006_016E', 'B08006_001E')
        all_calc_data_nb["Worked at Home"] = calc_normalized(tract_df, 'B08006_017E', 'B08006_001E')
        # population density
        # placeholder: this is the neighborhood's total population and is not yet divided by acreage
        all_calc_data_nb["Population Density per Acre"] = calc_sum(tract_df, 'B01001_001E')
        
        
    #return calc dictionary
    return all_calc_data


    

In [64]:
# run functions to calculate all stats and convert calc dictionary to pandas dataframe
all_calc_data = calc_socio_economic_data(df, tract_nb_lookup)
df_all_calcs = pd.DataFrame.from_dict(all_calc_data).reset_index()
df_all_calcs.rename(columns = {'index':'Attribute'}, inplace = True) 
df_all_calcs.head()

Unnamed: 0,Attribute,North Beach,Russian Hill,Financial District,Chinatown,Nob Hill,Tenderloin,Marina,Pacific Heights,Presidio Heights,...,Sunset/Parkside,Lakeshore,Inner Richmond,Outer Richmond,Seacliff,Presidio,Mission Bay,Lincoln Park,Golden Gate Park,McLaren Park
0,Total Population,11854.0,18139.0,21537.0,14438.0,26445.0,29927.0,25375.0,24279.0,10585.0,...,82682.0,14801.0,22572.0,45921.0,2507.0,4226.0,13222.0,312.0,63.0,507.0
1,Group Quarter Population,17.0,5.0,563.0,53.0,977.0,1523.0,97.0,518.0,196.0,...,452.0,2723.0,147.0,202.0,0.0,0.0,342.0,191.0,0.0,0.0
2,Percent Female,0.486081,0.503501,0.465617,0.516207,0.490641,0.429913,0.515153,0.518226,0.514218,...,0.507922,0.53895,0.547891,0.512336,0.483845,0.539991,0.519286,0.227564,0.396825,0.585799
3,Households,6297.0,9636.0,11297.0,6907.0,15247.0,17649.0,13583.0,12815.0,4805.0,...,28277.0,4924.0,9385.0,18652.0,891.0,1324.0,6312.0,70.0,48.0,239.0
4,Family Households,0.397332,0.365816,0.372754,0.522948,0.26884,0.234348,0.309578,0.353726,0.47076,...,0.658839,0.390739,0.483324,0.537905,0.775533,0.556647,0.504119,0.442857,0.0,0.472803


In [56]:
# transpose dataset for second neighborhood view of dataset
df_all_calcs2 = df_all_calcs.T.reset_index()
df_all_calcs2.columns = df_all_calcs2.iloc[0]
df_all_calcs2 = df_all_calcs2[1:].rename(columns={'Attribute':'Neighborhood'})
df_all_calcs2[["Median Household Income","Neighborhood"]]

Unnamed: 0,Median Household Income,Neighborhood
1,88275.0,North Beach
2,144372.294372,Russian Hill
3,183786.525974,Financial District
4,25255.235602,Chinatown
5,91083.865351,Nob Hill
6,35653.526971,Tenderloin
7,165038.222647,Marina
8,169015.199493,Pacific Heights
9,149321.086262,Presidio Heights
10,76238.636364,Western Addition
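
For comparison, the same reshaping can be written by setting the attribute names as the index before transposing. This is an alternative sketch on made-up data, not the notebook's method; the `wide` frame and its values are assumptions for the example.

```python
import pandas as pd

# Made-up attribute-by-neighborhood table in the same layout as df_all_calcs.
wide = pd.DataFrame({
    "Attribute": ["Total Population", "Percent Female"],
    "Seacliff": [2507.0, 0.48],
    "Presidio": [4226.0, 0.54],
})

# Use the attribute names as the index, transpose, then move the
# neighborhood names back into a regular column.
by_neighborhood = (
    wide.set_index("Attribute")
        .T
        .rename_axis("Neighborhood")
        .rename_axis(None, axis="columns")
        .reset_index()
)
print(by_neighborhood)
```

This avoids promoting the first data row to the header by hand, and the numeric columns keep a numeric dtype after the transpose.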


## Visualizations

## Export

In [None]:
# export both dataset views to csv
df_all_calcs.to_csv(r'C:\Users\Jason Sherba\Downloads\data\test.csv', index = False)
df_all_calcs2.to_csv(r'C:\Users\Jason Sherba\Downloads\data\test2.csv', index = False)