# Census data summaries for 2000, 2010, and 2020

Summaries for 2010 and 2020 are derived from the [American Community Survey](https://www.census.gov/programs-surveys/acs) (ACS). Year 2000 summaries are derived from the 2000 decennial census. County-level socio-economic data is summarized for three geographies: a 1-mile radius, the closest tracts, and the Tenderloin/SoMa neighborhoods.

## Import packages

```python
# base libraries
import json
import os

import numpy as np
import pandas as pd
import requests
from collections import defaultdict

# graph libraries
import plotly.express as px
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
```

## Retrieve data from Census API
All socio-economic data comes from the Census ACS 5-year estimates (2010 and 2020) and the 2000 decennial census (Summary Files `sf1` and `sf3`), and is available at the county level through the Census API. See the [Census API documentation](https://www.census.gov/data/developers/data-sets.html).
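As a rough illustration of how a run year and a dataset file name combine into an API endpoint, the sketch below maps each year to a dataset path. The `sf1`/`sf3` file names appear in this notebook's 2000 queries; using `acs5` for 2010 and 2020 is an assumption (it is the Census API dataset name for ACS 5-year estimates), and `dataset_url` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical helper: map a run year plus a dataset "file" name to a Census API
# dataset URL. "sf1"/"sf3" appear in this notebook's 2000 URLs; "acs5" for
# 2010/2020 is an assumption about what the extension_<year> column holds.
BASE_URL = "https://api.census.gov/data/"

YEAR_PATHS = {"2000": "2000/dec/", "2010": "2010/acs/", "2020": "2020/acs/"}


def dataset_url(run_year: str, file_name: str) -> str:
    """Return the dataset endpoint URL, e.g. .../data/2000/dec/sf1."""
    if run_year not in YEAR_PATHS:
        raise ValueError(f"unsupported run year: {run_year}")
    return BASE_URL + YEAR_PATHS[run_year] + file_name


print(dataset_url("2000", "sf3"))   # https://api.census.gov/data/2000/dec/sf3
print(dataset_url("2020", "acs5"))  # https://api.census.gov/data/2020/acs/acs5
```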

### Census Attribute IDs
The Census API returns values for the attribute IDs given in the query. The list of attribute IDs needed for the socio-economic profile calculations is compiled below from the IDs stored in `attribute_lookup.csv`.
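To make the lookup step concrete, here is a stdlib-only sketch using a made-up two-row excerpt of `attribute_lookup.csv`. The column names (`category`, `attribute_name`, `attribute_<year>`, `extension_<year>`) follow the code in this notebook; the rows, and the helper `group_ids_by_file`, are hypothetical. It groups the IDs by the dataset file they are requested from, splitting rows that list several comma-separated IDs.

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-row excerpt of attribute_lookup.csv (illustrative rows only).
SAMPLE_LOOKUP = """category,attribute_name,attribute_2000,extension_2000
Population,Asian,P007005,sf1
Housing,Example housing total,"H008002,H001001",sf3
"""


def group_ids_by_file(csv_text: str, run_year: str) -> dict:
    """Map each dataset file (e.g. sf1, sf3) to the attribute IDs requested from it."""
    attribute_col = "attribute_" + run_year
    extension_col = "extension_" + run_year
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # one lookup row may list several comma-separated IDs
        grouped[row[extension_col]] += row[attribute_col].split(",")
    return dict(grouped)


print(group_ids_by_file(SAMPLE_LOOKUP, "2000"))
# {'sf1': ['P007005'], 'sf3': ['H008002', 'H001001']}
```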

```python
# set the run year: "2000", "2010", or "2020"
run_year = "2000"
```

```python
# load the attribute lookup table, reading every column as text
attribute_lookup_df = pd.read_csv(r"./lookup_tables/attribute_lookup.csv", dtype=str)
```

```python
# pick the lookup columns for the run year; the extension column selects the API dataset file
attribute_col = "attribute_" + run_year
extension_col = "extension_" + run_year

attribute_records = attribute_lookup_df[[attribute_col, extension_col]].to_dict("records")

# group attribute IDs by dataset file (e.g. sf1, sf3);
# one lookup row may list several comma-separated IDs
attribute_dict_source = defaultdict(list)
for record in attribute_records:
    attribute_dict_source[record[extension_col]] += record[attribute_col].split(",")
```

### Build Census API URL and Make Query
The code below builds the URL for each Census API call:

- State code is `06` for California
- County code is `075` for San Francisco County
- Attributes come from the attribute ID list and include all attributes needed for the socio-economic calculations
- Attribute IDs are requested in batches of 45 per call, and the results are merged into a single table
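A minimal sketch of the county-level query string described in the list above, using the same `get=`/`in=`/`for=` layout as the notebook's `build_census_url`. The function name `county_query_url` and its argument order are illustrative assumptions, not the notebook's own code.

```python
# Illustrative sketch of the county-level Census API query built in this section.
def county_query_url(dataset_url: str, attribute_ids: list,
                     state_code: str, county_code: str) -> str:
    """Build a Census API query for a single county.

    get=  comma-separated variable IDs to return
    in=   the parent geography (the state)
    for=  the geography being requested (one county)
    """
    return "{}?get={}&in=state:{}&for=county:{}".format(
        dataset_url, ",".join(attribute_ids), state_code, county_code)


url = county_query_url("https://api.census.gov/data/2000/dec/sf1",
                       ["P007005", "P009004"], state_code="06", county_code="075")
print(url)
# https://api.census.gov/data/2000/dec/sf1?get=P007005,P009004&in=state:06&for=county:075
```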

```python
# dataset path prefix for each run year; the file name (e.g. sf1) is appended per query
census_endpoints = {
    "2000": "2000/dec/",
    "2010": "2010/acs/",
    "2020": "2020/acs/",
}
```

```python
# build a county-level Census API URL for one dataset file and a list of attribute IDs.
# tract_code is currently unused: the query requests the whole county, not tracts.
# run_year is read from the global set above.
def build_census_url(tract_code, state_code, county_code, attribute_ids, file_name):
    attributes = ",".join(attribute_ids)
    census_url = (
        "https://api.census.gov/data/" + census_endpoints[run_year] + file_name
        + "?get={}&in=state:{}&for=county:{}".format(attributes, state_code, county_code)
    )
    return census_url
```

```python
# make a single API call and return the result as a pandas DataFrame
def make_census_api_call(census_url):
    resp = requests.get(census_url)
    # raise an HTTPError for 4xx/5xx responses
    resp.raise_for_status()
    if resp.status_code != 200:
        # any other non-200 status has no JSON table to parse
        raise ValueError(f"Census API returned status {resp.status_code} for {census_url}")

    # the JSON body is a list of rows; the first row holds the column names
    data = resp.json()
    headers = data.pop(0)
    df = pd.DataFrame(data, columns=headers)

    # values are left as strings here; they are converted to numbers after transposing
    return df
```
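The parsing step in `make_census_api_call` relies on the Census API's response shape: a JSON array of arrays whose first row is the header. The sketch below reproduces that step offline with a hand-written, trimmed payload that reuses two values from this notebook's 2000 output; `parse_census_json` is a hypothetical stdlib equivalent of the `pd.DataFrame(data, columns=headers)` call. Values typically arrive as strings, which is why the notebook converts them later.

```python
import json

# Hand-written, trimmed example of a Census API JSON body (header row first).
sample_body = '[["P009004","P007005","state","county"],["8971","239565","06","075"]]'


def parse_census_json(body: str) -> list:
    """Turn a Census API JSON body into a list of {column: value} dicts."""
    rows = json.loads(body)
    headers, records = rows[0], rows[1:]
    return [dict(zip(headers, record)) for record in records]


records = parse_census_json(sample_body)
print(records)
# [{'P009004': '8971', 'P007005': '239565', 'state': '06', 'county': '075'}]
```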

```python
# set geography variables and query the API
tract_code = "*"  # unused: queries are made at the county level
state_code = "06"
county_code = "075"

df = None

for file_name, ids_for_file in attribute_dict_source.items():
    # drop duplicate IDs before querying
    attribute_ids = list(set(ids_for_file))
    # split attributes into groups of 45, run a Census query for each, and merge the outputs into a single df
    split_attribute_ids = [attribute_ids[i:i + 45] for i in range(0, len(attribute_ids), 45)]

    for ids in split_attribute_ids:
        census_url = build_census_url(tract_code, state_code, county_code, ids, file_name)
        print(census_url)
        returned_df = make_census_api_call(census_url)
        if df is None:
            df = returned_df
        else:
            # keep one copy of the geography columns, then join the new attributes on county
            returned_df = returned_df.drop(columns=["state"])
            df = pd.merge(df, returned_df, on="county", how="left")
```


https://api.census.gov/data/2000/dec/sf1?get=P009004,P015B001,P015C001,P023002,P015I001,P007006,P007008,P007005,P015H001,P015001,P015E001,P008010,P015G001,P007003,P008003,P015D001,P007001,P015F001,P007007,P019003&in=state:06&for=county:075
https://api.census.gov/data/2000/dec/sf3?get=HCT011022,HCT011016,HCT011024,P042015,P092002,P042021,HCT011023,HCT011017,HCT011019,P042004,H008002,HCT011003,P042024,H011010,H069010,HCT011025,H016002,HCT011020,P042028,H006003,P042007,HCT012003,P042048,P042031,H011002,H069008,P042038,H001001,HCT011018,HCT011021,H069007,H069009,P042045&in=state:06&for=county:075


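|  | P009004 | P015B001 | P015C001 | P023002 | P015I001 | P007006 | P007008 | P007005 | P015H001 | P015001 | ... | P042031 | H011002 | H069008 | P042038 | H001001 | HCT011018 | HCT011021 | H069007 | H069009 | P042045 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8971 | 24280 | 1334 | 78716 | 184804 | 3844 | 33255 | 239565 | 31803 | 329700 | ... | 2207 | 115315 | 11339 | 43858 | 346527 | 11119 | 31040 | 16221 | 13982 | 10249 |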
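The batch-and-merge loop above can be illustrated without network access. The notebook splits requests into batches of 45 IDs, presumably to stay under the Census API's limit of 50 variables per request; with a single county, merging each batch on `county` amounts to combining the columns of one row. `fetch_batch` is a hypothetical stand-in for an API call, and the variable IDs are made up.

```python
# Offline sketch of the batch-and-merge pattern; fetch_batch is a hypothetical
# stand-in for a Census API request and returns placeholder values.
BATCH_SIZE = 45


def chunk(ids: list, size: int = BATCH_SIZE) -> list:
    """Split a list of IDs into consecutive batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def fetch_batch(ids: list) -> dict:
    """Hypothetical fetch: one county row keyed by variable ID."""
    row = {"state": "06", "county": "075"}
    row.update({attr: "0" for attr in ids})
    return row


def fetch_all(ids: list) -> dict:
    """Request every batch and merge the single-county rows on 'county'."""
    merged = {}
    for batch in chunk(sorted(set(ids))):
        row = fetch_batch(batch)
        if merged and merged["county"] != row["county"]:
            raise ValueError("batches returned different counties")
        merged.update(row)
    return merged


ids = [f"V{n:03d}" for n in range(100)]   # 100 made-up variable IDs
print([len(b) for b in chunk(ids)])        # [45, 45, 10]
print(len(fetch_all(ids)))                 # 102 (100 variables + state + county)
```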


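```python
# transpose the single county row into long format: one (attribute, value) row per column
df_t = df.T.reset_index()
df_t.columns = ["attribute", "value"]
# convert the string values to numbers (geography codes are converted too but match no attribute)
df_t["value"] = pd.to_numeric(df_t["value"])
```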
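For readers less familiar with `DataFrame.T`, the sketch below performs the same wide-to-long reshape on a plain dict, assuming a single county row as in this notebook. Unlike the notebook, which keeps the `state` and `county` rows, this hypothetical `to_long` helper skips the geography columns, since they are identifiers rather than counts; non-integer values are kept as floats.

```python
# Plain-dict sketch of the wide-to-long reshape (assumes one county row).
GEO_COLUMNS = {"state", "county", "tract"}


def to_long(row: dict) -> list:
    """Return (attribute, value) pairs with values converted to numbers."""
    pairs = []
    for attribute, raw in row.items():
        if attribute in GEO_COLUMNS:
            continue  # geography codes are identifiers, not counts
        number = float(raw)
        pairs.append((attribute, int(number) if number.is_integer() else number))
    return pairs


row = {"P009004": "8971", "P007005": "239565", "state": "06", "county": "075"}
print(to_long(row))  # [('P009004', 8971), ('P007005', 239565)]
```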

## Define functions for calculating socio-economic data
Each row of `attribute_lookup.csv` may list several comma-separated census attribute IDs; the calculated value for that row is the sum of the values of those IDs.
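The summing rule described above can be sketched with plain Python. The IDs, values, and lookup rows here are made up for illustration; in the notebook the values come from `df_t` and the ID lists from the `attribute_<year>` column. `summarize` is a hypothetical helper; its `.strip()` call, which tolerates spaces after commas, is an addition the notebook does not make.

```python
# Minimal sketch of totalling lookup rows from comma-separated ID lists.
values = {"ID_A": 100, "ID_B": 250, "ID_C": 40}   # made-up values

lookup_rows = [
    {"category": "Example", "attribute_name": "Single ID", "ids": "ID_A"},
    {"category": "Example", "attribute_name": "Summed IDs", "ids": "ID_B,ID_C"},
]


def summarize(rows: list, values: dict) -> list:
    """Total each lookup row by summing the values of its comma-separated IDs.

    IDs missing from `values` contribute 0, similar to how filtering with
    pandas isin() and summing ignores attributes that were not returned.
    """
    return [
        {"category": r["category"],
         "attribute_name": r["attribute_name"],
         "total": sum(values.get(i.strip(), 0) for i in r["ids"].split(","))}
        for r in rows
    ]


for result in summarize(lookup_rows, values):
    print(result)
# {'category': 'Example', 'attribute_name': 'Single ID', 'total': 100}
# {'category': 'Example', 'attribute_name': 'Summed IDs', 'total': 290}
```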

```python
# one dict per lookup row: category, attribute_name, and the per-year ID/extension columns
attributes = attribute_lookup_df.to_dict("records")
```

```python
# calculate the county total for each socio-economic attribute in the lookup table
def calc_socio_economic_data(df_t, attributes):
    # list of result rows, one per lookup attribute
    all_calc_data = []
    for attribute in attributes:
        # IDs whose values are summed for this attribute (may be a single ID)
        attribute_list = attribute[attribute_col].split(",")
        subset = df_t[df_t["attribute"].isin(attribute_list)]
        total = subset["value"].sum()
        all_calc_data.append({
            "category": attribute["category"],
            "attribute_name": attribute["attribute_name"],
            "total": total,
        })
    return all_calc_data
```

## Calculate Socioeconomic Profiles

### Run Socioeconomic Profile Calculations

```python
# run the calculations and convert the list of result rows to a pandas DataFrame
all_calc_data = calc_socio_economic_data(df_t, attributes)
df = pd.DataFrame(all_calc_data)
df
```

|  | category | attribute_name | total |
|---|---|---|---|
| 0 | Population | Total Population | 776733 |
| 1 | Population | Non-Hispanic or Latino, white | 338909 |
| 2 | Population | Asian | 239565 |
| 3 | Population | Hispanic or Latino | 109504 |
| 4 | Population | Two or More Races | 33255 |
| 5 | Population | Other | 50368 |
| 6 | Population | Black or African American | 60515 |
| 7 | Population | American Indian or Alaskan Native | 8971 |
| 8 | Population | Native Hawaiian or Pacific Islander | 3844 |
| 9 | Population | Persons with Disabilities | 127937 |


## Export

```python
# export the county summary to csv, creating the output folder if needed
os.makedirs("./output", exist_ok=True)
df.to_csv(r"./output/sf_county_" + run_year + ".csv", index=False)
```
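For reference, the exported file has one row per calculated attribute with the columns `category`, `attribute_name`, and `total`. The stdlib sketch below writes that layout; `export_summary` is a hypothetical helper (the notebook itself uses `DataFrame.to_csv`), and the usage writes to a temporary directory so the example leaves no files behind.

```python
import csv
import os
import tempfile


def export_summary(rows: list, run_year: str, output_dir: str = "./output") -> str:
    """Write summary rows to <output_dir>/sf_county_<year>.csv and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"sf_county_{run_year}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["category", "attribute_name", "total"])
        writer.writeheader()
        writer.writerows(rows)
    return path


# usage: write to a temporary directory and read the file back
with tempfile.TemporaryDirectory() as tmp:
    out = export_summary(
        [{"category": "Population", "attribute_name": "Total Population", "total": 776733}],
        "2000", output_dir=tmp)
    with open(out, newline="") as f:
        print(f.read().splitlines())
# ['category,attribute_name,total', 'Population,Total Population,776733']
```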