Import all necessary libraries and set pandas options

In [None]:
import pandas as pd
import numpy as np
import requests

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)   

Define the list of variables to pull from the 2010 PL data available through the API. I recommend also pulling the 'NAME' variable so that each row has a textual description. The sample list below includes NAME and the two total population fields for P2 (total population by race, with ethnicity) and P4 (total voting-age population, or VAP, by race, with ethnicity). Feel free to add or remove variables as you see fit.

In [None]:
variables = ['NAME','P002001','P004001']

The function below pulls 2010 data and returns it as a data frame. It requires two inputs: the FIPS code for the state as a string (e.g. '10' for Delaware) and a geography specification as a string ('cnty' for county or 'place' for place). It also has two inputs with defaults: variables (a list of variables, defined in the previous cell; you could instead hard-code the list inside the function and remove it as an input) and a Census API key. If you do not yet have a Census API key, you can request one here (https://api.census.gov/data/key_signup.html) and a key will be emailed to you. Insert it as a string in place of 'YOUR_KEY_HERE'.
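To make the mapping from these inputs to an API request concrete, here is a minimal sketch that builds the request URL without sending it. The helper name preview_2010pl_url is hypothetical and not part of the notebook; it encodes the query with the standard library's urlencode, which is similar to (but not guaranteed to be byte-for-byte identical with) what requests does when given params.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def preview_2010pl_url(fip, geog, variables, api_key="YOUR_KEY_HERE"):
    """Build (but do not send) a 2010 PL request URL. Hypothetical helper."""
    # Translate the short geography name into the name the API expects
    api_geog = {"place": "place", "cnty": "county"}[geog]
    params = {
        "get": ",".join(variables),   # comma-separated variable codes
        "for": api_geog + ":*",       # every place or county ...
        "in": "state:" + fip,         # ... within the given state
        "key": api_key,
    }
    return "https://api.census.gov/data/2010/dec/pl?" + urlencode(params)

url = preview_2010pl_url("10", "place", ["NAME", "P002001", "P004001"])
print(url)
# Decode the query string again to inspect the individual predicates
print(parse_qs(urlsplit(url).query))
```

Pasting the printed URL into a browser (with a real key) is a quick way to check a query before running it through the function.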

In [None]:
def get_2010pl_data(fip, geog, variables = variables, CENSUS_API_KEY = 'YOUR_KEY_HERE'):
    HOST = "https://api.census.gov/data"
    year = "2010"
    dataset = "dec/pl"
    base_url = "/".join([HOST, year, dataset])
    print('starting to collect data for ' + geog + ' ' + fip)
    predicates = {} 
    predicates["get"] = ",".join(variables)
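    if geog == 'place':
        predicates["for"] = "place:*"
    elif geog == 'cnty':
        predicates["for"] = "county:*"
    else:
        # Fail early instead of sending a request with no geography
        raise ValueError("geog must be 'place' or 'cnty', got " + repr(geog))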
    predicates["in"] = "state:" + fip
    predicates["key"] = CENSUS_API_KEY
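    # Send the request and stop early if the server reports an HTTP error
    response = requests.get(base_url, params=predicates)
    response.raise_for_status()
    # Parse the JSON once: the first row holds the column names, the rest is data
    rows = response.json()
    col_names = rows[0]
    data = rows[1:]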
    print('done collecting data for', fip)
    pop_data = pd.DataFrame(columns=col_names, data=data)
    # Convert every column except the name and geography codes to numeric
    cols = [i for i in pop_data.columns if i not in ["NAME","place","state","county"]]
    for col in cols:
        pop_data[col] = pd.to_numeric(pop_data[col])
    # make changes here for tracts
    # GEOID is the state FIPS code followed by the place or county code
    geo_col = "place" if geog == 'place' else "county"
    pop_data["GEOID"] = pop_data["state"] + pop_data[geo_col]
    return pop_data
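The function leaves the state, place and county columns as strings while converting the population fields to numbers. The sketch below illustrates why that matters when building the GEOID. The rows are placeholders shaped like the API output; the names and counts are made up for illustration and are not real Census values.

```python
import pandas as pd

# Placeholder rows shaped like the API output; the counts are made up
sample = pd.DataFrame(
    columns=["NAME", "P002001", "state", "county"],
    data=[["Example County A", "100", "01", "001"],
          ["Example County B", "250", "01", "003"]],
)

# Population counts are safe to convert to numbers
sample["P002001"] = pd.to_numeric(sample["P002001"])

# Codes stay as strings so the leading zeros survive concatenation
sample["GEOID"] = sample["state"] + sample["county"]

# Converting a code column to a number would drop the leading zero
as_number = pd.to_numeric(sample["state"])

print(sample["GEOID"].tolist())  # ['01001', '01003']
print(as_number.tolist())        # [1, 1]
```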

Below is a test of the above function. It calls the function for Delaware (FIPS code '10') at the place level and uses the default inputs specified above. If you want to save the results to CSVs, you could write the file inside the function instead of returning the data frame (e.g. pop_data.to_csv(SPECIFIED_FILE_PATH)). You could also store the data frames in a dictionary if you are iterating over all states at the place or county level.

In [None]:
de = get_2010pl_data('10','place')
de.head(5)
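If you do iterate over several states and geographies, one possible pattern is to key the results by (FIPS code, geography). The sketch below is only an illustration: collect_states and fake_fetch are hypothetical names, and fake_fetch stands in for get_2010pl_data so that the example runs without network access or an API key.

```python
def collect_states(fips_codes, geogs, fetch):
    """Call fetch(fip, geog) for every combination; key results by (fip, geog)."""
    results = {}
    for fip in fips_codes:
        for geog in geogs:
            results[(fip, geog)] = fetch(fip, geog)
    return results

# Stand-in for get_2010pl_data so the sketch runs offline
def fake_fetch(fip, geog):
    return "table for " + geog + " in state " + fip

tables = collect_states(["10", "24"], ["place", "cnty"], fake_fetch)
print(sorted(tables))
# [('10', 'cnty'), ('10', 'place'), ('24', 'cnty'), ('24', 'place')]
```

With the real function you would pass get_2010pl_data as the fetch argument, and each value in the dictionary would be a data frame that you could then write out with to_csv.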