# API Problem Set


In this problem set, you will work with the United States Census Bureau's American Community Survey (ACS) data available in this class's Data directory. After every step in the problem set, create a new cell (code or Markdown, as appropriate) for the answer.


## Orienting
1. The next question requires the `df_multiple` dataframe made in the lesson.  Load it from where it was saved and look at rows 120 through 132.  Name the object `df_m`.

In [None]:
import pandas as pd

df_m = pd.read_csv('../../lessons/APIs/df_acs2019_mult.csv')  # must load from the location it was saved for full credit

df_m.iloc[120:133]  # iloc excludes the stop position, so 133 is required to include row 132 (and for full credit)
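`iloc` selects by position and, like Python list slicing, excludes the stop position. That is why the answer uses 133 to include row 132. A minimal check on a small hypothetical frame (`demo` stands in for `df_m`):

```python
import pandas as pd

# Hypothetical 200-row frame standing in for df_m
demo = pd.DataFrame({"value": range(200)})

# Positions 120 through 132 inclusive: the stop value 133 is not included
rows = demo.iloc[120:133]
print(len(rows))                                       # → 13
print(rows["value"].iloc[0], rows["value"].iloc[-1])   # → 120 132
```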

2. The dataframe has several variables in addition to DP04_0132E. In words, what are they?  The answer needs to be descriptive and make it clear which description is for which variable.  Where did you go to find this information?

The three other variables are DP04_0001E, DP04_0108E, and DP04_0134E.  DP04_0001E is an estimate of the total number of housing units.  DP04_0108E is an estimate of the number of housing units with a mortgage worth \$1,000 or more.  DP04_0134E is an estimate of median gross rent for occupied housing units.  This information comes from the `Data/ACS/DP04/CountyACSDP1Y2019.DP04-Column-Metadata.csv` file included with the textbook's ACS data.  <!--- Full credit does not require that specific file, but the answer should document a reputable source.  ChatGPT and other generative services do not count. --->
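Variable descriptions can also be pulled from the API's own metadata. The sketch below assumes that the 2019 ACS 1-year data profile (matching the `CountyACSDP1Y2019` file) serves a `variables.json` document whose top-level `variables` key maps each variable name to a dictionary with a `label` entry. The URL is an assumption to verify. `fetch_variables` and `describe` are hypothetical helpers, and the offline example uses placeholder labels, not the real Census wording.

```python
import json
from urllib.request import urlopen

# Assumed metadata URL for the 2019 ACS 1-year data profile; adjust year/dataset as needed
VARIABLES_URL = "https://api.census.gov/data/2019/acs/acs1/profile/variables.json"

def fetch_variables(url=VARIABLES_URL):
    """Download and parse the variables.json metadata (requires network access)."""
    with urlopen(url) as resp:
        return json.load(resp)

def describe(metadata, names):
    """Return {variable: label} for the requested names; 'unknown' if a name is absent."""
    variables = metadata.get("variables", {})
    return {n: variables.get(n, {}).get("label", "unknown") for n in names}

# Offline example with placeholder labels (NOT the real Census labels)
sample = {"variables": {"DP04_0001E": {"label": "placeholder label A"},
                        "DP04_0134E": {"label": "placeholder label B"}}}
labels = describe(sample, ["DP04_0001E", "DP04_0134E", "DP04_0108E"])
print(labels)
```

With network access, `describe(fetch_variables(), ["DP04_0001E", "DP04_0108E", "DP04_0134E"])` would return the published labels. That keeps the answer tied to a documented source.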

3. When making requests for multiple years, no data were retrieved for 2023.  Why?

The Census Bureau had not yet released the 2023 ACS data when these requests were made.  <!--- True as of 06.03.2024 --->
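A request for an unreleased year does not return data. A multi-year loop is therefore easier to manage if it records the missing years instead of stopping. A minimal sketch, assuming a hypothetical `fetch(year)` callable that returns a status code and the parsed rows. The stub below pretends 2023 is unavailable and makes no real request.

```python
def collect_years(years, fetch):
    """Call fetch(year) for each year and keep only the years that returned data.

    `fetch` is a hypothetical callable returning (status_code, rows).
    """
    results, missing = {}, []
    for year in years:
        status, rows = fetch(year)
        if status == 200 and rows:
            results[year] = rows
        else:
            missing.append(year)  # e.g., a year the Census Bureau has not released
    return results, missing

# Stub standing in for a real request: pretend 2023 is unavailable
def fake_fetch(year):
    if year == 2023:
        return 404, None
    return 200, [["NAME", "DP04_0001E"], ["County A", "100"]]

found, missing = collect_years(range(2021, 2024), fake_fetch)
print(sorted(found), missing)  # → [2021, 2022] [2023]
```

With `requests`, `fetch` could return `(response.status_code, response.json() if response.ok else None)`.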

## Programming 
4. Download the 2014 percent estimate of the population 5 years and over that speaks Spanish at home. Write a function that converts the returned data into a dataframe. Call this dataframe `perc_span2014`.

In [None]:
import requests
import pandas as pd

## URL parts
year = '2014'
name = 'acs'
acronym = 'acs5'
cols = 'DP02_0114PE,NAME'
county = '*'
keyfile = 'census_key.txt'  # Change this to the name of your key file, and adjust the path below to where it is saved.

## Read the API key from the file
with open('../../lessons/APIs/' + keyfile) as key:
    api_key = key.read().strip()

## Build the URL and request the data
base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/'
data_url = f'{base_url}profile?get={cols}&for=county:{county}&key={api_key}'
response = requests.get(data_url)

print(response)  # Inspecting: <Response [200]> means the request succeeded

perc_span2014 = response.json()

# Define the function
def response_to_df(from_census):
    # Split off the header row by slicing, which leaves the caller's list unchanged
    header, rows = from_census[0], from_census[1:]
    df = pd.DataFrame(rows, columns=header)
    print(f'The returned data contain {df.shape[0]} rows and {df.shape[1]} columns.')
    return df

# Convert to dataframe
perc_span2014 = response_to_df(from_census=perc_span2014)
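The API's JSON reply is a list of lists. Its first row holds the column names, and the geography codes (`state`, `county`) are appended after the requested variables. A minimal offline sketch on a hand-made payload (all values hypothetical) shows the conversion. It also shows why slicing is safer than `pop(0)`: slicing leaves the original list intact, so the same reply can be converted again later.

```python
import pandas as pd

# Hypothetical payload shaped like a Census API reply: header row first
payload = [
    ["DP02_0114PE", "NAME", "state", "county"],
    ["12.3", "County A, State X", "01", "001"],
    ["4.5", "County B, State X", "01", "003"],
]

# Slice instead of pop(0) so the original list keeps its header row
header, rows = payload[0], payload[1:]
df = pd.DataFrame(rows, columns=header)

print(df.shape)      # → (2, 4)
print(len(payload))  # → 3, the payload is unchanged and can be converted again
```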

5. Make a new function, `make_request`, that takes user-supplied arguments, builds a correctly formatted URL, and passes that URL to the API using your API key. Hardcode the function to request data at the county level and return the response. Use `make_request` to request the same information as in the previous problem, then convert the response to a dataframe called `perc_span2014b`. Confirm that the result has the same number of rows and columns as the dataframe made above.

In [None]:
# Define the function
def make_request(year, name, acronym, cols, keyfile):
    county = '*'  # Hardcoded: request every county

    ## Read the API key from the file
    with open('../../lessons/APIs/' + keyfile) as key:  # The path will differ depending on where the key is saved.
        api_key = key.read().strip()

    ## Build the URL and request the data
    base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/'
    data_url = f'{base_url}profile?get={cols}&for=county:{county}&key={api_key}'
    response = requests.get(data_url)

    # Return the parsed JSON (a list of lists), which response_to_df expects
    return response.json()

# Get the returned request
perc_span2014b = make_request(year='2014', name='acs', acronym='acs5', cols='DP02_0114PE,NAME', keyfile='census_key.txt')

# Make a dataframe
perc_span2014b = response_to_df(from_census=perc_span2014b)

# Compare size
perc_span2014b.shape == perc_span2014.shape  # Returns True
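Matching shapes is a weak check, because two frames can have the same dimensions and still differ in content. A sketch with two small hypothetical frames contrasts `shape` with `DataFrame.equals`, which also compares values and dtypes:

```python
import pandas as pd

# Two hypothetical frames with the same shape but one differing value
a = pd.DataFrame({"NAME": ["County A", "County B"], "DP02_0114PE": ["1.0", "2.0"]})
b = pd.DataFrame({"NAME": ["County A", "County B"], "DP02_0114PE": ["1.0", "9.9"]})

same_shape = a.shape == b.shape   # True: shape alone cannot tell them apart
same_values = a.equals(b)         # False: equals() compares values and dtypes too
print(same_shape, same_values)    # → True False
```

`perc_span2014b.equals(perc_span2014)` applies the same idea to the real data. If the two requests ever returned rows in a different order, both frames would need sorting on `state` and `county` (with `reset_index(drop=True)`) before comparing.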

6. Using the two functions you have made, download the 2016 values for three variables: DP02_0114PE, the estimate of total households with a broadband internet connection, and the population 25 years and over with a bachelor's degree. Make the data a dataframe called `mult_2016`. Look at the first 12 rows and at rows 400 through 407 (`iloc[400:408]`) to see whether everything looks normal.

In [None]:
temp = make_request(year='2016', name='acs', acronym='acs5', cols='DP02_0114PE,DP02_0152E,DP02_0064E,NAME', keyfile='census_key.txt')

# Make a dataframe
mult_2016 = response_to_df(from_census=temp)

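# Inspect (Jupyter only renders a cell's last expression, so display the first one explicitly)
from IPython.display import display
display(mult_2016.head(12))

mult_2016.iloc[400:408]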
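The API returns every value as a string, so judging whether the data "look normal" is easier after converting the estimates to numbers. The sketch below uses made-up rows. It treats large negative values as missing, following the ACS convention of annotation codes such as -666666666. That convention is an assumption here, so confirm it against the Census Bureau's documentation for the year in use.

```python
import pandas as pd

# Hypothetical rows; the API returns every value as a string
df = pd.DataFrame({
    "NAME": ["County A", "County B", "County C"],
    "DP02_0114PE": ["12.5", "3.1", "-666666666"],
})

# Convert to numbers; anything unparseable becomes NaN instead of raising
df["DP02_0114PE"] = pd.to_numeric(df["DP02_0114PE"], errors="coerce")

# Treat large negative annotation codes as missing (assumed convention, verify in Census docs)
df.loc[df["DP02_0114PE"] < -1e8, "DP02_0114PE"] = float("nan")

print(df["DP02_0114PE"].isna().sum())  # → 1
```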

7. Next, retrieve the same three variables for each year from 2013 through 2018. Make sure the final dataframe has a column indicating the ACS year, and name the final dataframe containing all the years `mult_201318`.

In [None]:
dfs = []  # Make list for dataframes, concatenate at the end
years = range(2013, 2019)

for year in years:
    t = make_request(year=str(year), name='acs', acronym='acs5', cols='DP02_0114PE,DP02_0152E,DP02_0064E,NAME', keyfile='census_key.txt')
    t = response_to_df(from_census=t)
    t['year'] = str(year)
    dfs.append(t)

mult_201318 = pd.concat(dfs, ignore_index=True)  # ignore_index avoids repeating row labels across years
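Each year's frame restarts its index at 0, so `pd.concat` without `ignore_index=True` repeats row labels. A row count per year is a quick way to spot a year whose request failed or came back partial. A sketch on hypothetical frames:

```python
import pandas as pd

# Hypothetical per-year frames standing in for the loop's output
dfs = []
for year in range(2013, 2016):
    t = pd.DataFrame({"NAME": ["County A", "County B"], "DP02_0114PE": ["1", "2"]})
    t["year"] = str(year)
    dfs.append(t)

# ignore_index=True gives a fresh 0..n-1 index instead of repeating 0, 1 for each year
combined = pd.concat(dfs, ignore_index=True)

# Rows per year: a sudden drop would flag a failed or partial request
counts = combined.groupby("year").size()
print(counts)
```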

8. Now, modify `response_to_df` so that it saves the dataframe it creates. For this functionality, add an argument called `outfile` that expects a string value. The modified function should write a .csv file to that location _if the user passes a string_; otherwise, it writes nothing. Test it with the `temp` response from above and name the file 'mult_2016.csv'.

In [None]:
# Make a new function with the same name.
def response_to_df(from_census, outfile=None):
    # Kept the same: split off the header row without modifying the input list
    header, rows = from_census[0], from_census[1:]
    df = pd.DataFrame(rows, columns=header)
    print(f'The returned data contain {df.shape[0]} rows and {df.shape[1]} columns.')

    # New: write a .csv only when outfile is a string
    if isinstance(outfile, str):
        df.to_csv(outfile, index=False)
        print(f'The dataframe was written to {outfile}.')
    else:  # Not needed for full credit
        print('To save the dataframe, provide outfile as a string.')

    return df

mult_2016 = response_to_df(from_census=temp, outfile='mult_2016.csv')
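A round-trip test confirms that the file is written and matches the returned dataframe. This sketch writes to a temporary directory so nothing is left behind. It repeats a compact version of `response_to_df` so it can run on its own, and the payload is hypothetical.

```python
import os
import tempfile

import pandas as pd

def response_to_df(from_census, outfile=None):
    """Compact copy of the function above so this block runs on its own."""
    df = pd.DataFrame(from_census[1:], columns=from_census[0])
    if isinstance(outfile, str):
        df.to_csv(outfile, index=False)
    return df

# Hypothetical payload
payload = [["NAME", "DP02_0114PE"], ["County A", "1.5"], ["County B", "2.5"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mult_2016.csv")
    original = response_to_df(payload, outfile=path)
    written = os.path.exists(path)
    # Read back as strings so the comparison matches the API's string values
    reloaded = pd.read_csv(path, dtype=str)

same = (list(reloaded.columns) == list(original.columns)
        and reloaded.values.tolist() == original.values.tolist())
print(written, same)  # → True True
```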

9. Finally, download subject table S2507 from 2018, convert it to a dataframe named `s2507`, and save it as 's2507_2018.csv'. _NB: S tables are known as subject tables and use a slightly different URL than the data profile (DP) variables requested above. Modify the code accordingly._

In [None]:
def make_request_table(year, name, acronym, cols, keyfile):
    county = '*'  # Hardcoded: request every county

    ## Read the API key from the file
    with open('../../lessons/APIs/' + keyfile) as key:  # The path will differ depending on where the key is saved.
        api_key = key.read().strip()

    ## Build the URL and request the data; subject tables use /subject instead of /profile
    base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/'
    data_url = f'{base_url}subject?get={cols}&for=county:{county}&key={api_key}'
    response = requests.get(data_url)

    return response.json()


s2507 = make_request_table(year='2018', name='acs', acronym='acs5', cols='group(S2507),NAME', keyfile='census_key.txt')
s2507 = response_to_df(from_census=s2507, outfile='s2507_2018.csv')

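The only difference between `make_request` and `make_request_table` is the endpoint path. A small helper could choose that path from the table ID. `endpoint_for` is a hypothetical name. The mapping below sends data profile `DP` tables to `/profile`, comparison profile `CP` tables to `/cprofile`, subject `S` tables to `/subject`, and detailed tables to the dataset root. It reflects the ACS API's dataset layout, but check it against the Census API's dataset list before relying on it.

```python
def endpoint_for(table, year, dataset="acs/acs5"):
    """Pick the ACS API path for a table ID (hypothetical helper).

    DP.. -> /profile, CP.. -> /cprofile, S.. -> /subject,
    anything else (e.g., detailed B.. tables) -> the dataset root.
    """
    base = f"https://api.census.gov/data/{year}/{dataset}"
    if table.startswith("DP"):
        return f"{base}/profile"
    if table.startswith("CP"):
        return f"{base}/cprofile"
    if table.startswith("S"):
        return f"{base}/subject"
    return base

print(endpoint_for("S2507", 2018))
# → https://api.census.gov/data/2018/acs/acs5/subject
```

A request URL would then be built as `f"{endpoint_for('S2507', 2018)}?get=group(S2507),NAME&for=county:*&key={api_key}"`.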

## Debugging
The final questions of the problem set provide code with errors. Debug the code as asked.

10. The below code tries to create a dataframe using functions made earlier, yet it does not work.  Why?  Answer in a Markdown cell and then provide the correct code in a code cell.

In [None]:
test = make__request(year='2014', name='acs', acronym='acs5', cols='DP02_0114PE,NAME', keyfile='census_key.txt')
test_df = response_to_fd(test)

Both function names are misspelled, so Python raises a `NameError`. `make__request` has an extra underscore (it should be `make_request`), and `response_to_fd` should end with `_df`. The correct code is:

In [None]:
test = make_request(year='2014', name='acs', acronym='acs5', cols='DP02_0114PE,NAME', keyfile='census_key.txt')
test_df = response_to_df(test)

11. There is an error in the code below.  Explain the error in a Markdown cell and then provide the correct code in a code cell.

In [None]:
mult_bad = make_request(year='2018', name='acs', acronym='acs5', cols='DP02_0072E, DP02_0127PE, DP02_0024PE DP02_0014E,NAME', keyfile='census_key.txt')

The errors are in the `cols` argument: there should not be spaces after the commas, and a comma is missing between `DP02_0024PE` and `DP02_0014E`.

In [None]:
mult_good = make_request(year='2018', name='acs', acronym='acs5', cols='DP02_0072E,DP02_0127PE,DP02_0024PE,DP02_0014E,NAME', keyfile='census_key.txt')
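Formatting mistakes like these can be caught before the request is sent. `clean_cols` below is a hypothetical helper. It splits the string on commas and whitespace and rejoins the names with single commas, which repairs both problems in `mult_bad`. It cannot tell whether a name is a valid Census variable.

```python
import re

def clean_cols(cols):
    """Normalize a Census `get=` list: split on commas/whitespace, drop blanks, rejoin."""
    names = [n for n in re.split(r"[,\s]+", cols) if n]
    return ",".join(names)

bad = "DP02_0072E, DP02_0127PE, DP02_0024PE DP02_0014E,NAME"
print(clean_cols(bad))
# → DP02_0072E,DP02_0127PE,DP02_0024PE,DP02_0014E,NAME
```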

12. The function `make_request_bad` contains two errors. Make a new function called `make_request_good` that fixes those two errors and includes comments explaining each fix. Then test the good function by using it to make a dataframe called `perc_span2014c`, and compare the shape of this new dataframe to that of `perc_span2014`.

In [None]:
def make_request_bad(year, name, acronym, cols, keyfile):
    ## URL parts
    year=year
    name= name
    acronym=acroym 
    cols= cols 
    county='*' 
    keyfile = keyfil 
    ## Read api key in from file
    with open ('../../lessons/APIs/'+keyfile) as key:
        api_key=key.read().strip()    
    ## Retrieve data, print output to screen
    base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/' 
    data_url = f'{base_url}profile?get={cols}&for=county:{county}&key={api_key}'
    response = requests.get(data_url)

    temp = response.json()
    return temp
    
def make_request_good(year, name, acronym, cols, keyfile):
    ## URL parts
    year=year
    name= name
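    acronym=acronym  # Fix 1: `acroym` was never defined (NameError); use the `acronym` argument, or rename the parameter to match
    cols= cols 
    county='*' 
    keyfile = keyfile  # Fix 2: `keyfil` was never defined (NameError); use the `keyfile` argument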
    ## Read api key in from file
    with open ('../../lessons/APIs/'+keyfile) as key:
        api_key=key.read().strip()    
    ## Retrieve data, print output to screen
    base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/' 
    data_url = f'{base_url}profile?get={cols}&for=county:{county}&key={api_key}'
    response = requests.get(data_url)

    temp = response.json()
    return temp


## Make request
perc_span2014c = make_request_good(year='2014', name='acs', acronym='acs5', cols='DP02_0114PE,NAME', keyfile='census_key.txt')

## Make a dataframe
perc_span2014c = response_to_df(from_census=perc_span2014c)

## Compare
perc_span2014c.shape == perc_span2014.shape
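Both bugs in `make_request_bad` survive the `def` statement. Python looks up a name only when the line that uses it runs, so the `NameError` appears at call time rather than when the function is defined. A tiny hypothetical function (`make_request_demo`) reproduces the behavior:

```python
def make_request_demo(acronym):
    # `acroym` is misspelled on purpose, as in make_request_bad
    return acroym  # noqa: F821

# Defining the function succeeded; the error only surfaces when it is called
try:
    make_request_demo("acs5")
    error = None
except NameError as exc:
    error = str(exc)

print(error)
```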