# Control Structures Problem Set


In the next chapter on data visualization, we would like to visualize data for specific points in time as well as across several years. The American Community Survey (ACS) DP04 housing dataset has 1-year data files from 2010 to 2022 (with no standard release for 2020), which makes it suitable for visualizing both group and time-series variables. Instead of loading twelve individual datasets by hand, you will combine all of the available files, as you practiced in the control structures chapter. This means loading multiple ACS datasets with a for loop and then cleaning the result.

This problem set is a little different in that each question leads towards a final usable dataframe. It is also a change from previous problem sets in that you will have to put several questions/problems together into a single code cell, as opposed to past problem sets where we generally asked for one line of code per question. Be sure to read the questions carefully so that your code meets all of the requirements.

The variables we are interested in visualizing from the ACS data, including descriptions and suggested new names, are in the table below.


|Variable|New Name|Label|
|:-:|:-:|:-:|
|NAME|CITY + STATE|Geographic Area Name|
|DP04_0001E|TOT_UNITS|HOUSING OCCUPANCY Total housing units|
|DP04_0089E|MED_VALUE|Estimate Median Value (dollars)|
|DP04_0134E|MED_RENT|Estimate Median Rent (dollars)|
|DP04_0002E|OCCUP_OCCUP|HOUSING OCCUPANCY Total housing units Occupied housing units|
|DP04_0003E|VACANT|HOUSING OCCUPANCY Total housing units Vacant housing units|
|DP04_0017E|BUILT_2020|Year housing built: 2020 or later|
|DP04_0018E|BUILT_2010|Year housing built: 2010 to 2019|
|DP04_0019E|BUILT_2000|Year housing built: 2000 to 2009|
|DP04_0020E|BUILT_1990|Year housing built: 1990 to 1999|
|DP04_0021E|BUILT_1980|Year housing built: 1980 to 1989|
|DP04_0022E|BUILT_1970|Year housing built: 1970 to 1979|
|DP04_0023E|BUILT_1960|Year housing built: 1960 to 1969|
|DP04_0024E|BUILT_1950|Year housing built: 1950 to 1959|
|DP04_0025E|BUILT_1940|Year housing built: 1940 to 1949|
|DP04_0026E|BUILT_1939|Year housing built: 1939 or earlier|

For this task you will need to write a for loop to read several ACS files, which you have practiced before. There is also some cleaning and manipulation, like you have seen in chapters one and two of the introduction to pandas. We'll provide step-by-step instructions for each code cell, and by the end of this assignment you should have one `housing` dataframe that contains every year of housing information in the DP04 1-Year data, from 2010 to 2022.

In [8]:
# This code cell appears in every one of our chapters in Jupyter Notebook
# This setting displays the output of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [2]:
import pandas as pd
import os

## Programming 

### Step 1 - Prepare Path and File Objects

Make the non-variable objects that will help you iterate over your data files.

1. A `path` object: a string with the relative path to the DP04 'Place' data directory. Remember to use `os.getcwd()` if you can't recall where this Jupyter Notebook is located.
2. A `csv_files` list object of strings, created with list comprehension, which contains all the filenames in the `path` directory that end in 'Data.csv'
3. Use `.sort()` on `csv_files` to sort the list alphabetically (i.e.: by year).
4. A `years` list containing the integers 2010 through 2022. Remember that `range()` stops one short of the end value you provide, so check the contents of `years`.
5. Remove 2020 from `years`, because the Census Bureau did not release standard ACS 1-year estimates for 2020 due to COVID-19 disruptions to data collection. _HINT: We showed you how to remove values from lists in Introduction to Python_

In [9]:
# loop to load all ACS DP04 Place csv files
path = '../../Data/ACS/DP04/Place/'

csv_files = [i for i in os.listdir(path) if i.endswith('Data.csv')]
csv_files.sort() # listdir doesnt put the csv files in alphabetical order, so sort them

years=list(range(2010,2023))
years
years.remove(2020) # ACS is missing data for 2020, remove it from the 'year' list

[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
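The Step 2 loop pairs `csv_files[i]` with `years[i]` purely by position, so it can help to confirm that the two lists line up before loading anything. A minimal sketch, using a throwaway directory of empty files named like the `ACSDP1Y<year>.DP04-Data.csv` files listed in Step 2 (the directory, the three years, and the `README.txt` file are assumptions for illustration):

```python
import os
import tempfile

# Hypothetical stand-in for the DP04 'Place' folder: a few empty files named like the real ones
tmp_dir = tempfile.mkdtemp()
for yr in [2022, 2019, 2021]:
    open(os.path.join(tmp_dir, f'ACSDP1Y{yr}.DP04-Data.csv'), 'w').close()
open(os.path.join(tmp_dir, 'README.txt'), 'w').close()  # should be filtered out

path = tmp_dir + os.sep
csv_files = [i for i in os.listdir(path) if i.endswith('Data.csv')]
csv_files.sort()  # alphabetical order puts the years in ascending order

years = list(range(2019, 2023))  # [2019, 2020, 2021, 2022]; the end value 2023 is excluded
years.remove(2020)               # no standard 2020 1-year file

# The loop pairs csv_files[i] with years[i], so check the pairing before loading
assert len(csv_files) == len(years), 'one year per file expected'
for name, yr in zip(csv_files, years):
    assert str(yr) in name, f'{name} is paired with {yr}'
print(list(zip(csv_files, years)))
```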

### Step 2 - Load Files in a For Loop
Make a for loop that iterates over the `csv_files` list and concatenates the opened dataframes into a new dataframe.

1. Create an empty Python object called `ACS`, then reassign it as an empty `pd.DataFrame()`.
2. Make your for loop iterate over `range(len(csv_files))`.
3. In the loop, make a `file_path` object that is the `path` string plus the csv file name at index `[i]` of `csv_files`.
4. In the loop, make a `temp` object that temporarily holds each csv file as a dataframe.
5. In the loop, create a new variable, `temp['YEAR']`, and assign it the value at index `[i]` of `years`. There is one year per iteration, so you won't need any nested loops.
6. In the last line of the loop, concatenate the `ACS` dataframe with the current `temp` dataframe.
7. Inspect the `ACS` dataframe after the loop.

In [10]:
ACS=[]
ACS=pd.DataFrame(ACS) # create an empty dataframe to take new data
for i in range(len(csv_files)):
    file_path=path+csv_files[i]
    temp=pd.read_csv(file_path, low_memory=False) # load each file into a temporary df
    temp['YEAR']=years[i] # add a year variable to the dataframe
    ACS=pd.concat([ACS,temp]) # concatenate the data into 'ACS'
    print(f'{file_path} of length {len(temp)} while ACS = {len(ACS)}')

ACS

../../Data/ACS/DP04/Place/ACSDP1Y2010.DP04-Data.csv of length 556 while ACS = 556
../../Data/ACS/DP04/Place/ACSDP1Y2011.DP04-Data.csv of length 563 while ACS = 1119
../../Data/ACS/DP04/Place/ACSDP1Y2012.DP04-Data.csv of length 569 while ACS = 1688
../../Data/ACS/DP04/Place/ACSDP1Y2013.DP04-Data.csv of length 583 while ACS = 2271
../../Data/ACS/DP04/Place/ACSDP1Y2014.DP04-Data.csv of length 592 while ACS = 2863
../../Data/ACS/DP04/Place/ACSDP1Y2015.DP04-Data.csv of length 597 while ACS = 3460
../../Data/ACS/DP04/Place/ACSDP1Y2016.DP04-Data.csv of length 606 while ACS = 4066
../../Data/ACS/DP04/Place/ACSDP1Y2017.DP04-Data.csv of length 615 while ACS = 4681
../../Data/ACS/DP04/Place/ACSDP1Y2018.DP04-Data.csv of length 631 while ACS = 5312
../../Data/ACS/DP04/Place/ACSDP1Y2019.DP04-Data.csv of length 635 while ACS = 5947
../../Data/ACS/DP04/Place/ACSDP1Y2021.DP04-Data.csv of length 635 while ACS = 6582
../../Data/ACS/DP04/Place/ACSDP1Y2022.DP04-Data.csv of length 647 while ACS = 7229


Unnamed: 0,GEO_ID,NAME,DP04_0001E,DP04_0001M,DP04_0001PE,DP04_0001PM,DP04_0002E,DP04_0002M,DP04_0002PE,DP04_0002PM,...,YEAR,DP04_0142E,DP04_0142M,DP04_0142PE,DP04_0142PM,DP04_0143E,DP04_0143M,DP04_0143PE,DP04_0143PM,Unnamed: 574
0,Geography,Geographic Area Name,Estimate!!HOUSING OCCUPANCY!!Total housing units,Estimate Margin of Error!!HOUSING OCCUPANCY!!T...,Percent!!HOUSING OCCUPANCY!!Total housing units,Percent Margin of Error!!HOUSING OCCUPANCY!!To...,Estimate!!HOUSING OCCUPANCY!!Occupied housing ...,Estimate Margin of Error!!HOUSING OCCUPANCY!!O...,Percent!!HOUSING OCCUPANCY!!Occupied housing u...,Percent Margin of Error!!HOUSING OCCUPANCY!!Oc...,...,2010,,,,,,,,,
1,1600000US0107000,"Birmingham city, Alabama",108537,3024,108537,(X),87228,2521,80.4,1.9,...,2010,,,,,,,,,
2,1600000US0121184,"Dothan city, Alabama",29249,909,29249,(X),25840,1012,88.3,2.1,...,2010,,,,,,,,,
3,1600000US0135896,"Hoover city, Alabama",34160,2162,34160,(X),30280,1881,88.6,2.8,...,2010,,,,,,,,,
4,1600000US0137000,"Huntsville city, Alabama",86495,1839,86495,(X),74841,2444,86.5,2.2,...,2010,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
642,1600000US7210334,"Caguas zona urbana, Puerto Rico",36911,1998,36911,(X),32415,1947,87.8,3.0,...,2022,4336,1279,50.3,12.5,4616,1278,(X),(X),
643,1600000US7214290,"Carolina zona urbana, Puerto Rico",72023,2021,72023,(X),60265,2552,83.7,2.4,...,2022,5819,1431,40.9,8.5,4192,1382,(X),(X),
644,1600000US7232522,"Guaynabo zona urbana, Puerto Rico",33545,1246,33545,(X),29229,1640,87.1,3.9,...,2022,2670,989,38.1,11.8,2132,686,(X),(X),
645,1600000US7263820,"Ponce zona urbana, Puerto Rico",56582,1459,56582,(X),45731,1852,80.8,2.2,...,2022,4419,1101,45.5,9.5,6404,790,(X),(X),
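Two small changes can make the loading loop less fragile, sketched below on in-memory text instead of the real files: reading each file's year from its name rather than relying on `years[i]`, and appending each `temp` to a list so that `pd.concat()` runs once after the loop instead of re-copying the growing dataframe on every pass. The helper `year_from_filename` is hypothetical and assumes the `ACSDP1Y<year>` naming shown in the output above; the file contents are toy values.

```python
import io
import re
import pandas as pd

# Hypothetical stand-ins for two DP04 files: header row, label row, one toy data row each
fake_files = {
    'ACSDP1Y2022.DP04-Data.csv': 'NAME,DP04_0001E\nGeographic Area Name,label text\n"Hoover city, Alabama",1\n',
    'ACSDP1Y2021.DP04-Data.csv': 'NAME,DP04_0001E\nGeographic Area Name,label text\n"Dothan city, Alabama",2\n',
}

def year_from_filename(name):
    """Hypothetical helper: return the 4-digit year that follows 'ACSDP1Y' in a file name."""
    match = re.search(r'ACSDP1Y(\d{4})', name)
    if match is None:
        raise ValueError(f'no year found in {name!r}')
    return int(match.group(1))

frames = []                                   # collect each year's dataframe here
for name in sorted(fake_files):
    temp = pd.read_csv(io.StringIO(fake_files[name]))
    temp['YEAR'] = year_from_filename(name)   # the year comes from the file name itself
    frames.append(temp)

ACS_sketch = pd.concat(frames)                # concatenate once, after the loop
print(ACS_sketch)
```

Because the year is read from each name, a missing or extra file can no longer shift every later year by one position.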


### Step 3 - Select Variables

1. Select the following variables from `ACS` into a new dataframe called `housing`: 
```
'NAME','YEAR','DP04_0001E','DP04_0089E','DP04_0134E','DP04_0002E','DP04_0003E','DP04_0017E','DP04_0018E', 'DP04_0019E','DP04_0020E','DP04_0021E','DP04_0022E','DP04_0023E','DP04_0024E','DP04_0025E','DP04_0026E'
```
2. Check the first few rows of `housing` and see if there is a row of labels.
3. Drop the 'labels' row and check the first few rows again.
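Why can a single `drop(0)` remove the label row of every file? `pd.concat()` keeps each input's original row index, so every file contributes its own row labelled 0. That is why `ACS` has 7,229 rows while `housing` ends up with 7,217: exactly one label row for each of the 12 files. A small sketch with two toy "files" (values taken from the 2010 rows shown in Step 2):

```python
import pandas as pd

# Two tiny "files", each starting with its own label row at index 0
file_a = pd.DataFrame({'NAME': ['Geographic Area Name', 'Dothan city, Alabama'],
                       'DP04_0001E': ['Estimate!!HOUSING OCCUPANCY!!Total housing units', '29249']})
file_b = pd.DataFrame({'NAME': ['Geographic Area Name', 'Hoover city, Alabama'],
                       'DP04_0001E': ['Estimate!!HOUSING OCCUPANCY!!Total housing units', '34160']})

combined = pd.concat([file_a, file_b])   # keeps each file's original 0, 1 index
print(combined.index.tolist())           # [0, 1, 0, 1]

cleaned = combined.drop(0, axis=0)       # removes EVERY row labelled 0
print(len(combined), len(cleaned))       # 4 2
```

If the files had been concatenated with `ignore_index=True`, the labels would run 0, 1, 2, ... and `drop(0)` would remove only the first file's label row.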

In [11]:
# select variables
housing = ACS[['NAME','YEAR','DP04_0001E','DP04_0089E','DP04_0134E','DP04_0002E','DP04_0003E','DP04_0017E','DP04_0018E',
               'DP04_0019E','DP04_0020E','DP04_0021E','DP04_0022E','DP04_0023E','DP04_0024E','DP04_0025E','DP04_0026E']]

housing.head()

# drop the label rows: pd.concat kept each file's original index, so index 0 is the label row of every one of the 12 files
housing=housing.drop(0, axis=0)

housing.head()

Unnamed: 0,NAME,YEAR,DP04_0001E,DP04_0089E,DP04_0134E,DP04_0002E,DP04_0003E,DP04_0017E,DP04_0018E,DP04_0019E,DP04_0020E,DP04_0021E,DP04_0022E,DP04_0023E,DP04_0024E,DP04_0025E,DP04_0026E
0,Geographic Area Name,2010,Estimate!!HOUSING OCCUPANCY!!Total housing units,Estimate!!MORTGAGE STATUS!!Owner-occupied units,Estimate!!GROSS RENT AS A PERCENTAGE OF HOUSEH...,Estimate!!HOUSING OCCUPANCY!!Occupied housing ...,Estimate!!HOUSING OCCUPANCY!!Vacant housing units,Estimate!!YEAR STRUCTURE BUILT!!Built 2005 or ...,Estimate!!YEAR STRUCTURE BUILT!!Built 2000 to ...,Estimate!!YEAR STRUCTURE BUILT!!Built 1990 to ...,Estimate!!YEAR STRUCTURE BUILT!!Built 1980 to ...,Estimate!!YEAR STRUCTURE BUILT!!Built 1970 to ...,Estimate!!YEAR STRUCTURE BUILT!!Built 1960 to ...,Estimate!!YEAR STRUCTURE BUILT!!Built 1950 to ...,Estimate!!YEAR STRUCTURE BUILT!!Built 1940 to ...,Estimate!!YEAR STRUCTURE BUILT!!Built 1939 or ...,Estimate!!ROOMS!!Total housing units
1,"Birmingham city, Alabama",2010,108537,44824,38684,87228,21309,3691,3872,5443,11517,18783,14725,20893,12188,17425,108537
2,"Dothan city, Alabama",2010,29249,15142,9931,25840,3409,2162,1683,4288,3723,7928,3377,3040,1958,1090,29249
3,"Hoover city, Alabama",2010,34160,20130,9865,30280,3880,2963,3438,10126,6253,5766,3464,1808,128,214,34160
4,"Huntsville city, Alabama",2010,86495,46809,25681,74841,11654,7217,5689,8984,14795,13529,21424,8446,3231,3180,86495


Unnamed: 0,NAME,YEAR,DP04_0001E,DP04_0089E,DP04_0134E,DP04_0002E,DP04_0003E,DP04_0017E,DP04_0018E,DP04_0019E,DP04_0020E,DP04_0021E,DP04_0022E,DP04_0023E,DP04_0024E,DP04_0025E,DP04_0026E
1,"Birmingham city, Alabama",2010,108537,44824,38684,87228,21309,3691,3872,5443,11517,18783,14725,20893,12188,17425,108537
2,"Dothan city, Alabama",2010,29249,15142,9931,25840,3409,2162,1683,4288,3723,7928,3377,3040,1958,1090,29249
3,"Hoover city, Alabama",2010,34160,20130,9865,30280,3880,2963,3438,10126,6253,5766,3464,1808,128,214,34160
4,"Huntsville city, Alabama",2010,86495,46809,25681,74841,11654,7217,5689,8984,14795,13529,21424,8446,3231,3180,86495
5,"Mobile city, Alabama",2010,89745,44130,28381,75328,14417,3124,2513,6998,10574,22379,14470,15797,6022,7868,89745
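The label row printed above also shows that some 2010 labels differ from the variable table at the top: in 2010, `DP04_0017E` is labelled 'Built 2005 or ...' and `DP04_0026E` is 'ROOMS!!Total housing units'. Before dropping the label rows, a sketch like the one below can group each code's label text by year. It runs on a toy frame, and the label strings are shortened from that output and from the table (assumptions, not the exact Census wording).

```python
import pandas as pd

# Toy version of ACS *before* the label rows are dropped. Labels are shortened from
# the 2010 output above and from the table at the top; they are illustrative only.
acs_toy = pd.DataFrame(
    {'YEAR': [2010, 2010, 2022],
     'DP04_0017E': ['Built 2005 or ...', '3691', 'Built 2020 or later']},
    index=[0, 1, 0],
)

label_rows = acs_toy.loc[0]  # every row labelled 0, i.e. one label row per file
years_by_label = label_rows.groupby('DP04_0017E')['YEAR'].apply(list)
print(years_by_label)

if len(years_by_label) > 1:
    print('DP04_0017E does not carry the same label in every year')
```

On the real data, the same check would run on `ACS.loc[0]` for each selected code before Step 3 drops the label rows.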


### Step 4 - Extract City and State Values from `NAME`
1. Extract the city and state names from `housing['NAME']` and assign them to new `housing` columns named 'CITY' and 'STATE'. You may use the `.insert()` method to place them at a specific column index if you prefer, or use simple assignment to create new variables at the end of the dataframe.
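An alternative to two separate `.str.split()` calls is a single split that returns both parts at once. The sketch below uses `.str.rsplit()` with `n=1` and `expand=True`; splitting from the right is a defensive choice (an assumption, not something the DP04 names are known to need) that keeps the state as everything after the last comma. The two names are taken from the output above.

```python
import pandas as pd

names = pd.Series(['Birmingham city, Alabama', 'Caguas zona urbana, Puerto Rico'])

# expand=True returns a two-column dataframe; n=1 splits only at the last ', ',
# so any earlier comma in a place name would stay with the city part
parts = names.str.rsplit(', ', n=1, expand=True)
parts.columns = ['CITY', 'STATE']
print(parts)
```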

In [12]:
# split `NAME` into city and state vectors, then insert these into the df
city = housing['NAME'].str.split(', ').str[0] # [0] is the part before the comma (the city)
state = housing['NAME'].str.split(', ').str[1] # [1] is the part after the comma (the state)
housing.insert(1, 'CITY', city)
housing.insert(2, 'STATE', state)

### Step 5 - Change DP04 Variable to Numeric

All of the `DP04_XXXXX` variables should have a numeric datatype, but pandas read them as 'object' (i.e., strings) because each column also contains the text of its label row, and some cells hold non-numeric entries. Change the datatype of the DP04_ variables to a numeric type, and check your work with the `.info()` method.
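A small illustration of what `errors='coerce'` does: any value that cannot be parsed as a number becomes `NaN` instead of raising an error, which is one reason `.info()` can report fewer non-null values after the conversion. The toy columns are assumptions; `'(X)'` is one of the non-numeric entries visible in the ACS output above, and `'-'` is a hypothetical placeholder.

```python
import pandas as pd

raw = pd.DataFrame({'DP04_0001E': ['108537', '29249'],
                    'DP04_0001PM': ['(X)', '(X)'],   # '(X)' appears in the ACS output above
                    'DP04_0089E': ['44824', '-']})   # '-' is a hypothetical non-numeric entry

print(raw.dtypes)  # all object: every value was read as text

num_cols = ['DP04_0001E', 'DP04_0001PM', 'DP04_0089E']
raw[num_cols] = raw[num_cols].apply(pd.to_numeric, errors='coerce')
print(raw.dtypes)  # numeric now; unparseable cells became NaN
print(raw)
```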

In [14]:
# force numeric variables to numeric data types, and coerce strings to NaN missing values.
num_cols = ['DP04_0001E','DP04_0089E','DP04_0134E','DP04_0002E','DP04_0003E',
            'DP04_0017E','DP04_0018E','DP04_0019E','DP04_0020E','DP04_0021E',
            'DP04_0022E','DP04_0023E','DP04_0024E','DP04_0025E','DP04_0026E']
housing[num_cols] = housing[num_cols].apply(pd.to_numeric, errors='coerce')

housing.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7217 entries, 1 to 646
Data columns (total 19 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   NAME        7217 non-null   object 
 1   CITY        7217 non-null   object 
 2   STATE       7217 non-null   object 
 3   YEAR        7217 non-null   int64  
 4   DP04_0001E  7217 non-null   int64  
 5   DP04_0089E  7205 non-null   float64
 6   DP04_0134E  7186 non-null   float64
 7   DP04_0002E  7217 non-null   int64  
 8   DP04_0003E  7217 non-null   int64  
 9   DP04_0017E  7175 non-null   float64
 10  DP04_0018E  7175 non-null   float64
 11  DP04_0019E  7175 non-null   float64
 12  DP04_0020E  7175 non-null   float64
 13  DP04_0021E  7175 non-null   float64
 14  DP04_0022E  7175 non-null   float64
 15  DP04_0023E  7175 non-null   float64
 16  DP04_0024E  7175 non-null   float64
 17  DP04_0025E  7175 non-null   float64
 18  DP04_0026E  7189 non-null   float64
dtypes: float64(12), int64(4), object(3)

### Step 6 - Clean Variable Names

Rename the variables into something more legible. We'll provide the new names for you because they will need to match in order for the code in the data visualization chapter to work!
```
'DP04_0001E' to 'TOT_UNITS',
'DP04_0089E' to 'MED_VALUE',
'DP04_0134E' to 'MED_RENT',
'DP04_0002E' to 'OCCUP_OCCUP',
'DP04_0003E' to 'VACANT',
'DP04_0017E' to 'BUILT_2020',
'DP04_0018E' to 'BUILT_2010',
'DP04_0019E' to 'BUILT_2000',
'DP04_0020E' to 'BUILT_1990',
'DP04_0021E' to 'BUILT_1980',
'DP04_0022E' to 'BUILT_1970',
'DP04_0023E' to 'BUILT_1960',
'DP04_0024E' to 'BUILT_1950',
'DP04_0025E' to 'BUILT_1940',
'DP04_0026E' to 'BUILT_1939'
```

_Hint: You can use the `inplace=True` argument of the `.rename()` method. See Introduction to Pandas chapter 1._
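With fifteen names to type, a single typo in the mapping would leave a `DP04_` column unrenamed without any warning, because `.rename()` ignores keys it cannot find by default. A sketch of the `errors='raise'` option on a toy one-row dataframe (values from the Birmingham 2010 row above):

```python
import pandas as pd

df = pd.DataFrame({'DP04_0001E': [108537], 'DP04_0089E': [44824]})

# By default, rename() silently skips keys that are not column names.
# errors='raise' turns a mistyped key into a KeyError instead.
typo_caught = False
try:
    df.rename(columns={'DP04_001E': 'TOT_UNITS'}, errors='raise')  # typo: one '0' missing
except KeyError as err:
    typo_caught = True
    print('rename failed:', err)

df.rename(columns={'DP04_0001E': 'TOT_UNITS', 'DP04_0089E': 'MED_VALUE'},
          errors='raise', inplace=True)
print(df.columns.tolist())  # ['TOT_UNITS', 'MED_VALUE']
```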

In [15]:
# rename variables to something readable
housing.rename(columns=
               {'DP04_0001E':'TOT_UNITS',
                'DP04_0089E':'MED_VALUE',
                'DP04_0134E':'MED_RENT',
                'DP04_0002E':'OCCUP_OCCUP',
                'DP04_0003E':'VACANT',
                'DP04_0017E':'BUILT_2020',
                'DP04_0018E':'BUILT_2010',
                'DP04_0019E':'BUILT_2000',
                'DP04_0020E':'BUILT_1990',
                'DP04_0021E':'BUILT_1980',
                'DP04_0022E':'BUILT_1970',
                'DP04_0023E':'BUILT_1960',
                'DP04_0024E':'BUILT_1950',
                'DP04_0025E':'BUILT_1940',
                'DP04_0026E':'BUILT_1939'},
               inplace=True
              )

housing.head(5)

Unnamed: 0,NAME,CITY,STATE,YEAR,TOT_UNITS,MED_VALUE,MED_RENT,OCCUP_OCCUP,VACANT,BUILT_2020,BUILT_2010,BUILT_2000,BUILT_1990,BUILT_1980,BUILT_1970,BUILT_1960,BUILT_1950,BUILT_1940,BUILT_1939
1,"Birmingham city, Alabama",Birmingham city,Alabama,2010,108537,44824.0,38684.0,87228,21309,3691.0,3872.0,5443.0,11517.0,18783.0,14725.0,20893.0,12188.0,17425.0,108537.0
2,"Dothan city, Alabama",Dothan city,Alabama,2010,29249,15142.0,9931.0,25840,3409,2162.0,1683.0,4288.0,3723.0,7928.0,3377.0,3040.0,1958.0,1090.0,29249.0
3,"Hoover city, Alabama",Hoover city,Alabama,2010,34160,20130.0,9865.0,30280,3880,2963.0,3438.0,10126.0,6253.0,5766.0,3464.0,1808.0,128.0,214.0,34160.0
4,"Huntsville city, Alabama",Huntsville city,Alabama,2010,86495,46809.0,25681.0,74841,11654,7217.0,5689.0,8984.0,14795.0,13529.0,21424.0,8446.0,3231.0,3180.0,86495.0
5,"Mobile city, Alabama",Mobile city,Alabama,2010,89745,44130.0,28381.0,75328,14417,3124.0,2513.0,6998.0,10574.0,22379.0,14470.0,15797.0,6022.0,7868.0,89745.0


### Step 7 - Create a Function to Load CSV Files

1. Create the non-variable objects `path` and `years` before creating your function. These are the same as in Step 1.
2. Create a new function named `dataload` that takes the path and the list of years as arguments and loads multi-year csv files from a directory into one concatenated dataframe.
3. Inside the function, create a `csv_files` object with list comprehension that holds the names of all files in the path ending in 'Data.csv', and sort it.
4. Inside the function, also create a `df` object that is an empty pandas dataframe.
5. Create a for loop inside the function that loops over the range of the length of `csv_files`.
6. Inside this loop, create a `file_path` string object that combines the `path` string and the csv file name of the current iteration.
7. Load the current iteration's csv file into a temporary `temp` object and add its `YEAR` variable, as in Step 2.
8. Concatenate the `df` object with `temp`.
9. Return the `df` object.
10. Call the `dataload()` function and assign its output to `ACS_fun`.
11. Use pandas' `.equals()` method to test whether `ACS_fun` and `ACS` are identical dataframes. The syntax is simple: `df1.equals(df2)`, which returns True or False.

In [17]:
# loop to load all ACS DP04 Place csv files
path = '../../Data/ACS/DP04/Place/'
years=list(range(2010,2023))
years.remove(2020) # ACS is missing data for 2020, remove it from the 'year' list

def dataload(path, years):
    csv_files = [i for i in os.listdir(path) if i.endswith('Data.csv')]
    csv_files.sort() # listdir doesn't return files in any particular order, so sort them
    df=[]
    df=pd.DataFrame(df) # create an empty dataframe to take new data
    for i in range(len(csv_files)):
        file_path=path+csv_files[i]
        temp=pd.read_csv(file_path, low_memory=False) # load each file into a temporary df
        temp['YEAR']=years[i] # add a year variable from the function's own 'years' argument
        df=pd.concat([df,temp]) # concatenate the data into 'df'
        print(f'{file_path} of length {len(temp)} while df = {len(df)}')
    return df

ACS_fun = dataload(path=path, years=years)

ACS.equals(ACS_fun) # compare ACS and ACS_fun that you just created

../../Data/ACS/DP04/Place/ACSDP1Y2010.DP04-Data.csv of length 556 while df = 556
../../Data/ACS/DP04/Place/ACSDP1Y2011.DP04-Data.csv of length 563 while df = 1119
../../Data/ACS/DP04/Place/ACSDP1Y2012.DP04-Data.csv of length 569 while df = 1688
../../Data/ACS/DP04/Place/ACSDP1Y2013.DP04-Data.csv of length 583 while df = 2271
../../Data/ACS/DP04/Place/ACSDP1Y2014.DP04-Data.csv of length 592 while df = 2863
../../Data/ACS/DP04/Place/ACSDP1Y2015.DP04-Data.csv of length 597 while df = 3460
../../Data/ACS/DP04/Place/ACSDP1Y2016.DP04-Data.csv of length 606 while df = 4066
../../Data/ACS/DP04/Place/ACSDP1Y2017.DP04-Data.csv of length 615 while df = 4681
../../Data/ACS/DP04/Place/ACSDP1Y2018.DP04-Data.csv of length 631 while df = 5312
../../Data/ACS/DP04/Place/ACSDP1Y2019.DP04-Data.csv of length 635 while df = 5947
../../Data/ACS/DP04/Place/ACSDP1Y2021.DP04-Data.csv of length 635 while df = 6582
../../Data/ACS/DP04/Place/ACSDP1Y2022.DP04-Data.csv of length 647 while df = 7229


True
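`.equals()` is stricter than "same numbers": it also requires matching dtypes and matching row and column labels, while treating `NaN`s in the same positions as equal. That makes the `True` above a fairly strong check. A toy sketch (values from the 2010 rows above), plus `pd.testing.assert_frame_equal()`, which reports where two frames differ instead of returning a bare `False`:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'TOT_UNITS': [108537, 29249], 'MED_RENT': [38684.0, np.nan]})

b = a.copy()
print(a.equals(b))   # True: NaN in the same position counts as equal

c = a.astype({'TOT_UNITS': 'float64'})   # same numbers, different dtype
print(a.equals(c))   # False: dtypes must match

d = a.copy()
d.index = [1, 2]     # same values, different row labels
print(a.equals(d))   # False: the row index is compared as well

# assert_frame_equal raises an AssertionError that explains where two frames differ
try:
    pd.testing.assert_frame_equal(a, c)
except AssertionError as err:
    print(err)
```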

### Step 8 - Create a Function to Clean the ACS Data

Now let's make a function that does all of the ACS cleaning steps.
1. Create a function called `cleanACS` and give it a single argument: `data`. You will feed the newly created `ACS_fun` dataframe to the `data` argument, so that the ACS_fun dataframe becomes the `data` object in your function.
2. Inside the function, create an empty pandas dataframe object named `clean_ACS`. Every manipulation you applied to `housing` will be applied to `clean_ACS` in this function.
3. Gather up __all__ of the code you used to clean `ACS` into the function.
4. Run the custom `cleanACS()` function, assign its output to an object named `clean_ACS`, and inspect it. This will be useful for testing your function as you go.
5. Use pandas' `.equals()` method to test whether `clean_ACS` and `housing` are identical dataframes.

In [18]:
def cleanACS(data):
    clean_ACS=pd.DataFrame([])
    clean_ACS = data[['NAME','YEAR','DP04_0001E','DP04_0089E','DP04_0134E','DP04_0002E','DP04_0003E',
                      'DP04_0017E','DP04_0018E','DP04_0019E','DP04_0020E','DP04_0021E','DP04_0022E',
                      'DP04_0023E','DP04_0024E','DP04_0025E','DP04_0026E']]
    clean_ACS=clean_ACS.drop(0, axis=0)
    city = clean_ACS['NAME'].str.split(', ').str[0] # [0] is the part before the comma (the city)
    state = clean_ACS['NAME'].str.split(', ').str[1] # [1] is the part after the comma (the state)
    clean_ACS.insert(1, 'CITY', city)
    clean_ACS.insert(2, 'STATE', state)
    clean_ACS[[
        'DP04_0001E','DP04_0089E','DP04_0134E','DP04_0002E','DP04_0003E','DP04_0017E','DP04_0018E','DP04_0019E',
        'DP04_0020E','DP04_0021E','DP04_0022E','DP04_0023E','DP04_0024E','DP04_0025E','DP04_0026E'
              ]] = clean_ACS[[
        'DP04_0001E','DP04_0089E','DP04_0134E','DP04_0002E','DP04_0003E','DP04_0017E','DP04_0018E','DP04_0019E',
        'DP04_0020E','DP04_0021E','DP04_0022E','DP04_0023E','DP04_0024E','DP04_0025E','DP04_0026E']].apply(pd.to_numeric, errors='coerce')
    clean_ACS.rename(columns=
               {'DP04_0001E':'TOT_UNITS',
                'DP04_0089E':'MED_VALUE',
                'DP04_0134E':'MED_RENT',
                'DP04_0002E':'OCCUP_OCCUP',
                'DP04_0003E':'VACANT',
                'DP04_0017E':'BUILT_2020',
                'DP04_0018E':'BUILT_2010',
                'DP04_0019E':'BUILT_2000',
                'DP04_0020E':'BUILT_1990',
                'DP04_0021E':'BUILT_1980',
                'DP04_0022E':'BUILT_1970',
                'DP04_0023E':'BUILT_1960',
                'DP04_0024E':'BUILT_1950',
                'DP04_0025E':'BUILT_1940',
                'DP04_0026E':'BUILT_1939'},
               inplace=True
              )
    return clean_ACS

In [19]:
clean_ACS = cleanACS(data=ACS_fun)

housing.equals(clean_ACS) # use the pandas equals function to test if both dataframes are the same. 

True
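`cleanACS()` repeats the same fifteen DP04 codes three times (selection, numeric conversion, renaming). One possible refactor, sketched below with a hypothetical `clean_acs_sketch` function, lets a single rename dictionary drive all three steps. Only two codes are shown to keep it short, and the toy frame mimics two concatenated files with their label rows at index 0 (values borrowed from the 2010 rows above; the years are illustrative).

```python
import pandas as pd

# One mapping drives selection, numeric conversion, and renaming.
# Only two of the Step 6 names are shown; the full dictionary would go here.
RENAME = {'DP04_0001E': 'TOT_UNITS', 'DP04_0089E': 'MED_VALUE'}

def clean_acs_sketch(data, rename=RENAME):
    """Hypothetical compact variant of cleanACS() driven by one dictionary."""
    codes = list(rename)
    out = data[['NAME', 'YEAR'] + codes].drop(0, axis=0).copy()  # drop every label row
    out.insert(1, 'CITY', out['NAME'].str.split(', ').str[0])
    out.insert(2, 'STATE', out['NAME'].str.split(', ').str[1])
    out[codes] = out[codes].apply(pd.to_numeric, errors='coerce')
    return out.rename(columns=rename)

# Toy stand-in for two concatenated files, each with its label row at index 0
raw = pd.DataFrame(
    {'NAME': ['Geographic Area Name', 'Birmingham city, Alabama',
              'Geographic Area Name', 'Dothan city, Alabama'],
     'YEAR': [2010, 2010, 2011, 2011],
     'DP04_0001E': ['label text', '108537', 'label text', '29249'],
     'DP04_0089E': ['label text', '44824', 'label text', '15142']},
    index=[0, 1, 0, 1],
)
result = clean_acs_sketch(raw)
print(result)
```

Adding or renaming a variable then means editing one dictionary rather than three column lists.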

### Step 9 - Save Your Work

You will need the `housing` dataframe in the next chapter on data visualization. You can use `os.getcwd()` if you need to remember your current working directory. Save `housing` with `.to_csv()` into your current working directory ('Python for Social Science/assignments/control_structures/') with the file name housing.csv. Always remember to use the `index=False` argument in .to_csv() to keep the row index from saving as a new column. 

In [20]:
import os
print(os.getcwd()) # Get our current working directory
# see what dataframes are available to save
%whos DataFrame

/home/fernando/Documents/UCLA/DataX/Python_for_Social_Science/problem_sets/control_structures
Variable    Type         Data/Info
----------------------------------
ACS         DataFrame                   GEO_ID    <...>[7229 rows x 577 columns]
ACS_fun     DataFrame                   GEO_ID    <...>[7229 rows x 577 columns]
clean_ACS   DataFrame                             <...>n[7217 rows x 19 columns]
housing     DataFrame                             <...>n[7217 rows x 19 columns]
temp        DataFrame                   GEO_ID    <...>n[647 rows x 576 columns]


In [21]:
housing.to_csv('housing.csv', index=False) # index writes row names if True, so always tell it False
os.listdir()

['housing.csv',
 '.ipynb_checkpoints',
 'ps_control_structures.ipynb',
 'answers_control_structures.ipynb']
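What `index=False` prevents can be seen with an in-memory buffer instead of a real file (a sketch; the one-row frame is a toy): when the index is written, reading the file back produces an extra `Unnamed: 0` column holding the old row labels.

```python
import io
import pandas as pd

df = pd.DataFrame({'CITY': ['Birmingham city'], 'TOT_UNITS': [108537]}, index=[1])

buf = io.StringIO()
df.to_csv(buf)                 # default index=True writes the row labels as an unnamed first column
buf.seek(0)
cols_with_index = pd.read_csv(buf).columns.tolist()

buf = io.StringIO()
df.to_csv(buf, index=False)    # only the real columns are written
buf.seek(0)
cols_without_index = pd.read_csv(buf).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'CITY', 'TOT_UNITS']
print(cols_without_index)  # ['CITY', 'TOT_UNITS']
```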

## Debugging
The final question of the problem set provides code with error(s). You have to inspect the code carefully and debug the code as asked.

10. The for loop below iterates over the `housing` dataframe and should determine whether a city's occupancy rate (occupied units divided by total units) was below 70%, that is, whether more than 30% of its housing units were unoccupied. If the rate is less than 0.7, it prints the year, city, and state that had such a low occupancy rate. Find the bugs in the code and fix them in a new code cell.

In [22]:
for i in len(housing):
    if (housing['OCCUP_OCCUP'].iloc[i]/housing['TOT_UNITS'].iloc[i]) < 0.7 :
        print('In', housing['YEAR'].iloc[i] housing['CITY'].iloc[i], housing['STATE'].iloc[i], 'had less than 70% occupancy')

SyntaxError: invalid syntax. Perhaps you forgot a comma? (2907403459.py, line 3)

In [25]:
for i in range(len(housing)):
    if (housing['OCCUP_OCCUP'].iloc[i]/housing['TOT_UNITS'].iloc[i])<0.7 :
        print('In', housing['YEAR'].iloc[i], housing['CITY'].iloc[i], housing['STATE'].iloc[i], 'had less than 70% occupancy')

In 2010 Fort Myers city Florida had less than 70% occupancy
In 2010 Miami Beach city Florida had less than 70% occupancy
In 2011 Fort Myers city Florida had less than 70% occupancy
In 2011 Miami Beach city Florida had less than 70% occupancy
In 2012 Fort Myers city Florida had less than 70% occupancy
In 2012 Miami Beach city Florida had less than 70% occupancy
In 2012 Gary city Indiana had less than 70% occupancy
In 2012 Detroit city Michigan had less than 70% occupancy
In 2013 Fort Myers city Florida had less than 70% occupancy
In 2013 Miami Beach city Florida had less than 70% occupancy
In 2013 Detroit city Michigan had less than 70% occupancy
In 2014 Miami Beach city Florida had less than 70% occupancy
In 2014 Gary city Indiana had less than 70% occupancy
In 2014 Detroit city Michigan had less than 70% occupancy
In 2015 Miami Beach city Florida had less than 70% occupancy
In 2016 Miami Beach city Florida had less than 70% occupancy
In 2016 Schenectady city New York had less than 70%