# Final Project: Places Data Wrangling

- **Vintage**:  2020
- **Geography Level**: Places     
- **Variables**:  https://api.census.gov/data/2020/acs/acs5/profile/variables.html 
- **Supported Geographies**: https://api.census.gov/data/2020/acs/acs5/profile/geography.html

### ***Question***:  
- What are the estimate and the percentage of the population (5 years and over) who speak Spanish at home for each place in California?

## 1. Import necessary packages

In [51]:
import pandas as pd
import json
import requests

## 2. Build the API Request URL

### 2.1. Base URL

In [52]:
base_url = "https://api.census.gov/data"

### 2.2. Dataset Name

In [53]:
dataset_name = "/2020/acs/acs5/profile"

### 2.3. Get Variables

- **DP02_0116E**: Estimate of the population (5 years and over) who speak Spanish at home
- **DP02_0116PE**: Percent of the population (5 years and over) who speak Spanish at home

In [54]:
get_variables = "?get=NAME,DP02_0116E,DP02_0116PE"

### 2.4. Geography Levels 

- Every place in California (state FIPS code 06)

In [55]:
geography = "&for=place:*&in=state:06"
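
The `for` predicate selects the geography to return (`place:*` means every place) and the `in` predicate restricts it to one state. As a minimal sketch of reusing the same pattern for another state, the helper below builds the clause from a state FIPS code. `place_geography` is a hypothetical name introduced here for illustration and is not part of the notebook.

```python
def place_geography(state_fips):
    """Build the `for`/`in` predicates selecting every place in one state.

    Hypothetical helper for illustration only.
    """
    # State FIPS codes are two digits, so 6 has to be sent as "06".
    state_code = str(state_fips).zfill(2)
    return f"&for=place:*&in=state:{state_code}"


print(place_geography(6))     # California
print(place_geography("48"))  # Texas
```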

### 2.5. Put it all together 

In [56]:
request_url = base_url + dataset_name + get_variables + geography
print("request_url = ", request_url)

request_url =  https://api.census.gov/data/2020/acs/acs5/profile?get=NAME,DP02_0116E,DP02_0116PE&for=place:*&in=state:06
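
String concatenation works here, but it is easy to misplace a `?` or `&`. One alternative, sketched below with the standard library's `urllib.parse.urlencode`, keeps the query parameters in a dictionary. The `params` and `query_string` names are introduced for this sketch; the `safe` argument is set so that `,`, `:` and `*` stay unescaped and the result matches the URL printed above.

```python
from urllib.parse import urlencode

base_url = "https://api.census.gov/data"
dataset_name = "/2020/acs/acs5/profile"

# One entry per query parameter; dict order is preserved in the URL.
params = {
    "get": "NAME,DP02_0116E,DP02_0116PE",
    "for": "place:*",
    "in": "state:06",
}

# Keep ",", ":" and "*" unescaped so the URL reads like the hand-built one.
query_string = urlencode(params, safe=",:*")
request_url = f"{base_url}{dataset_name}?{query_string}"
print("request_url = ", request_url)
```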


## 3. Make the API call

In [57]:
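# Make the API call and stop early on an HTTP error status
r = requests.get(request_url)
r.raise_for_status()

# Parse the JSON body: a list of rows whose first row holds the column names
api_results = r.json()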

In [58]:
type(api_results)

list
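
The response is a list of lists: the first inner list holds the column names and each later one is a data row. Before building a DataFrame it can help to check that shape. The sketch below does so on a small sample copied from the output in this notebook, so it runs without a network call. `split_header_and_rows` and `sample_payload` are hypothetical names used only for this illustration.

```python
def split_header_and_rows(payload):
    """Return (header, rows) after basic shape checks.

    Hypothetical helper for illustration only.
    """
    if not isinstance(payload, list) or len(payload) < 2:
        raise ValueError("expected a header row plus at least one data row")
    header, *rows = payload
    for row in rows:
        # Every data row should have one value per column name.
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} values, expected {len(header)}")
    return header, rows


# Sample rows in the same shape as the API response (all values are strings).
sample_payload = [
    ["NAME", "DP02_0116E", "DP02_0116PE", "state", "place"],
    ["Home Garden CDP, California", "913", "64.3", "06", "34281"],
    ["Home Gardens CDP, California", "6587", "58.5", "06", "34302"],
]

header, rows = split_header_and_rows(sample_payload)
print(header)
print("Number of data rows:", len(rows))
```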

## 4. Load the data into a DataFrame

In [59]:
data = pd.DataFrame(api_results)

print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])
data.head()

Number of rows: 1612
Number of columns: 5


Unnamed: 0,0,1,2,3,4
0,NAME,DP02_0116E,DP02_0116PE,state,place
1,"Home Garden CDP, California",913,64.3,06,34281
2,"Home Gardens CDP, California",6587,58.5,06,34302
3,"Homeland CDP, California",3226,45.3,06,34316
4,"Homestead Valley CDP, California",182,7.1,06,34392


## 5. Promote the first row to column names, then drop it

In [60]:
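# Use the first row (the API's header row) as the column names
data.columns = data.iloc[0]

# Drop that header row; .copy() avoids SettingWithCopyWarning in later cells
data = data.iloc[1:].copy()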

print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])
data.head()

Number of rows: 1611
Number of columns: 5


Unnamed: 0,NAME,DP02_0116E,DP02_0116PE,state,place
1,"Home Garden CDP, California",913,64.3,06,34281
2,"Home Gardens CDP, California",6587,58.5,06,34302
3,"Homeland CDP, California",3226,45.3,06,34316
4,"Homestead Valley CDP, California",182,7.1,06,34392
5,"Homewood Canyon CDP, California",31,12.9,06,34405
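
The two steps above (set the column names, then drop the first row) can also be done in a single step by passing the header row as `columns` when the DataFrame is created. This is a sketch of that alternative on a small sample copied from the output above; `api_results_sample` and `data_sample` are names introduced for this example. One difference to be aware of: the index starts at 0 here rather than 1.

```python
import pandas as pd

# Sample in the same shape as the API response: header row, then data rows.
api_results_sample = [
    ["NAME", "DP02_0116E", "DP02_0116PE", "state", "place"],
    ["Home Garden CDP, California", "913", "64.3", "06", "34281"],
    ["Home Gardens CDP, California", "6587", "58.5", "06", "34302"],
]

# Unpack the header from the data rows and build the DataFrame in one step.
header, *rows = api_results_sample
data_sample = pd.DataFrame(rows, columns=header)

print("Number of rows:", data_sample.shape[0])
print("Number of columns:", data_sample.shape[1])
```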


## 6. Cleaning Data

### 6.1. Splitting the NAME column into place and state names

In [61]:
two_new_cols = ['Place_Name', 'State_Name']

# Split "Place, State" on the first ", " only; pandas 2.x requires n as a keyword
data[two_new_cols] = data['NAME'].str.split(', ', n=1, expand=True)

print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])
data.head()

Number of rows: 1611
Number of columns: 7


Unnamed: 0,NAME,DP02_0116E,DP02_0116PE,state,place,Place_Name,State_Name
1,"Home Garden CDP, California",913,64.3,06,34281,Home Garden CDP,California
2,"Home Gardens CDP, California",6587,58.5,06,34302,Home Gardens CDP,California
3,"Homeland CDP, California",3226,45.3,06,34316,Homeland CDP,California
4,"Homestead Valley CDP, California",182,7.1,06,34392,Homestead Valley CDP,California
5,"Homewood Canyon CDP, California",31,12.9,06,34405,Homewood Canyon CDP,California
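
Splitting on the first `", "` works for the rows shown here. If a place name itself contains a comma, splitting from the right with `str.rsplit` keeps the state name intact. The sketch below compares the two on an illustrative pair of names; the second one is from outside the California data and is included only to show the difference.

```python
import pandas as pd

names = pd.Series([
    "Home Garden CDP, California",
    "Islamorada, Village of Islands village, Florida",  # illustrative name with an internal comma
])

# Split at the first ", " (left to right) versus the last ", " (right to left).
left_split = names.str.split(", ", n=1, expand=True)
right_split = names.str.rsplit(", ", n=1, expand=True)

print(left_split[1].tolist())
print(right_split[1].tolist())
```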


### 6.2. Dropping the redundant NAME column

In [62]:
data.drop("NAME", axis='columns', inplace=True)

print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])
data.head()

Number of rows: 1611
Number of columns: 6


Unnamed: 0,DP02_0116E,DP02_0116PE,state,place,Place_Name,State_Name
1,913,64.3,06,34281,Home Garden CDP,California
2,6587,58.5,06,34302,Home Gardens CDP,California
3,3226,45.3,06,34316,Homeland CDP,California
4,182,7.1,06,34392,Homestead Valley CDP,California
5,31,12.9,06,34405,Homewood Canyon CDP,California


### 6.3. Renaming columns

In [63]:
cols_to_rename = {
                   'DP02_0116E' : 'Language spoken at home (Spanish) (DP02_0116E)', 
                   'DP02_0116PE' : 'Language spoken at home (Spanish) - Percent (DP02_0116PE)', 
                   'state' : 'FIPS_State', 
                   'place' : 'FIPS_Place'
                 }
data.rename(columns = cols_to_rename, inplace=True)

print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])
data.head()

Number of rows: 1611
Number of columns: 6


Unnamed: 0,Language spoken at home (Spanish) (DP02_0116E),Language spoken at home (Spanish) - Percent (DP02_0116PE),FIPS_State,FIPS_Place,Place_Name,State_Name
1,913,64.3,06,34281,Home Garden CDP,California
2,6587,58.5,06,34302,Home Gardens CDP,California
3,3226,45.3,06,34316,Homeland CDP,California
4,182,7.1,06,34392,Homestead Valley CDP,California
5,31,12.9,06,34405,Homewood Canyon CDP,California


### 6.4. Reordering columns

In [64]:
cols_to_keep = [
    'Place_Name',
    'State_Name',
    'Language spoken at home (Spanish) (DP02_0116E)',
    'Language spoken at home (Spanish) - Percent (DP02_0116PE)',
    'FIPS_Place',
    'FIPS_State',
]
df = data[cols_to_keep]

print("Number of rows:", df.shape[0])
print("Number of columns:", df.shape[1])
df.head()

Number of rows: 1611
Number of columns: 6


Unnamed: 0,Place_Name,State_Name,Language spoken at home (Spanish) (DP02_0116E),Language spoken at home (Spanish) - Percent (DP02_0116PE),FIPS_Place,FIPS_State
1,Home Garden CDP,California,913,64.3,34281,06
2,Home Gardens CDP,California,6587,58.5,34302,06
3,Homeland CDP,California,3226,45.3,34316,06
4,Homestead Valley CDP,California,182,7.1,34392,06
5,Homewood Canyon CDP,California,31,12.9,34405,06
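
Because the values come from JSON, every column still holds strings at this point, including the estimate and the percent. If the data will be sorted or used in calculations, one option is to convert those two columns with `pd.to_numeric`. The sketch below uses a two-row sample copied from the output above; `est_col`, `pct_col` and `sample` are names introduced for this example, and the FIPS columns are left out because they are identifiers, not quantities.

```python
import pandas as pd

est_col = "Language spoken at home (Spanish) (DP02_0116E)"
pct_col = "Language spoken at home (Spanish) - Percent (DP02_0116PE)"

# Two-row sample with string values, as returned by the API.
sample = pd.DataFrame({
    "Place_Name": ["Homewood Canyon CDP", "Home Garden CDP"],
    est_col: ["31", "913"],
    pct_col: ["12.9", "64.3"],
})

# Convert strings to numbers; values that cannot be parsed become NaN.
numeric_cols = [est_col, pct_col]
sample[numeric_cols] = sample[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Numeric columns sort by value rather than alphabetically.
ranked = sample.sort_values(pct_col, ascending=False)
print(ranked[["Place_Name", pct_col]])
```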


## 7. Save the Dataframe as a CSV file

In [65]:
csv_file_to_create = "Places_Data.csv"

filename_with_path = "Data/" + csv_file_to_create
df.to_csv(filename_with_path, index=False)
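
Two practical points about this step: `to_csv` fails if the `Data/` folder does not exist, and reading the file back with default settings turns FIPS codes such as `06` into the integer `6`. The sketch below shows one way to handle both, assuming a small sample DataFrame and a temporary directory so that it does not touch the real `Data/` folder. `df_sample`, `output_dir` and `csv_path` are names introduced for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

df_sample = pd.DataFrame({
    "Place_Name": ["Home Garden CDP"],
    "FIPS_Place": ["34281"],
    "FIPS_State": ["06"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the output folder if it is missing before writing the file.
    output_dir = Path(tmp_dir) / "Data"
    output_dir.mkdir(parents=True, exist_ok=True)
    csv_path = output_dir / "Places_Data.csv"
    df_sample.to_csv(csv_path, index=False)

    # Default type inference drops the leading zero; dtype=str keeps it.
    default_read = pd.read_csv(csv_path)
    string_read = pd.read_csv(csv_path, dtype={"FIPS_Place": str, "FIPS_State": str})

print(default_read.loc[0, "FIPS_State"])
print(string_read.loc[0, "FIPS_State"])
```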