# Final Project: 2021 Display

- **Vintage**:  2020 and 2021 (differences)
- **Geography Level**: State     
- **Variables**:  https://api.census.gov/data/2020/acs/acs5/profile/variables.html 
- **Supported Geographies**: https://api.census.gov/data/2020/acs/acs5/profile/geography.html

### ***Questions***:  
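2. Barcharts:
    - 2.1. Top 5 States with highest increase between 2020 and 2021 in number of people speaking Spanish at home (DP02_0116E)
    - 2.2. Top 5 States with highest increase between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)
    - 2.3. Top 5 States with highest decrease between 2020 and 2021 in number of people speaking Spanish at home (DP02_0116E)
    - 2.4. Top 5 States with highest decrease between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)

3. US State Maps:
    - 3.1. US States: Differences between 2020 and 2021 in population speaking Spanish at home (DP02_0116E)
    - 3.2. US States: Differences between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)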

In [1]:
import pandas as pd
import plotly.express as px

## 1. Read CSV file

In [2]:
df = pd.read_csv('Data/2021_Data.csv', dtype={'FIPS_State': str})
print(df.shape)
df.head()

(50, 9)


Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
0,Arkansas,5,AR,153429,5.4,155476,5.5,2047,0.1
1,Washington,53,WA,602058,8.5,620206,8.7,18148,0.2
2,Kansas,20,KS,207181,7.6,212194,7.7,5013,0.1
3,Oklahoma,40,OK,269433,7.3,274323,7.4,4890,0.1
4,Wisconsin,55,WI,254258,4.6,256965,4.6,2707,0.0
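
Because the later sections rank states by the `Difference` columns, it can help to confirm that those columns really equal the 2021 value minus the 2020 value. The sketch below does this on three rows copied from the output above; the short column names (`est_2020`, `diff_pct`, and so on) are stand-ins chosen for readability, not the labels used in the CSV.

```python
import pandas as pd

# Three rows copied from the df.head() output above.
# The short column names are illustrative stand-ins for the long CSV labels.
sample = pd.DataFrame({
    "State_Name": ["Arkansas", "Washington", "Kansas"],
    "est_2020": [153429, 602058, 207181],
    "est_2021": [155476, 620206, 212194],
    "diff_est": [2047, 18148, 5013],
    "pct_2020": [5.4, 8.5, 7.6],
    "pct_2021": [5.5, 8.7, 7.7],
    "diff_pct": [0.1, 0.2, 0.1],
})

# Counts are integers, so the comparison can be exact.
counts_ok = (sample["est_2021"] - sample["est_2020"]).eq(sample["diff_est"]).all()

# Percentages are floats with one decimal; allow half a rounding step of tolerance.
pct_gap = (sample["pct_2021"] - sample["pct_2020"]) - sample["diff_pct"]
pct_ok = pct_gap.abs().lt(0.05).all()

print(counts_ok, pct_ok)
```

The same two checks could be run on the full `df` by substituting the long column labels.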


In [3]:
print("Data types: ")
df.dtypes

Data types: 


State_Name                                                                 object
FIPS_State                                                                 object
State_Abbreviation                                                         object
2020 - Language spoken at home (Spanish) (DP02_0116E)                       int64
2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE)          float64
2021 - Language spoken at home (Spanish) (DP02_0116E)                       int64
2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE)          float64
Difference - Language spoken at home (Spanish) (DP02_0116E)                 int64
Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)    float64
dtype: object
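
`FIPS_State` is read as a string, but values such as `5` in the output above have no leading zero. If two-character state codes are needed (an assumption; nothing later in this notebook depends on them), a minimal sketch of padding them looks like this:

```python
import pandas as pd

# FIPS values as they appear in the df.head() output above (already strings)
fips = pd.Series(["5", "53", "20", "40", "55"], name="FIPS_State")

# Left-pad with zeros to two characters
fips_padded = fips.str.zfill(2)
print(fips_padded.tolist())
```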

## 2. Barcharts:

### 2.1. Top 5 States with highest increase between 2020 and 2021 in number of people speaking Spanish at home (DP02_0116E)

- Sort values

In [4]:
df_high_estimate = df.sort_values(by="Difference - Language spoken at home (Spanish) (DP02_0116E)", ascending=False)
df_high_estimate.head()

Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
21,New York,36,NY,2702957,14.7,2801677,14.8,98720,0.1
23,Florida,12,FL,4376716,21.8,4469836,22.1,93120,0.3
45,New Jersey,34,NJ,1368165,16.4,1440046,16.5,71881,0.1
18,California,6,CA,10462968,28.3,10514821,28.3,51853,0.0
40,Texas,48,TX,7666020,28.8,7717053,28.7,51033,-0.1


- Get Top 5

In [5]:
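# Keep the first 5 rows as an independent copy (avoids a possible SettingWithCopyWarning on later in-place changes)
df_high_estimate = df_high_estimate.iloc[:5].copy()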
df_high_estimate

Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
21,New York,36,NY,2702957,14.7,2801677,14.8,98720,0.1
23,Florida,12,FL,4376716,21.8,4469836,22.1,93120,0.3
45,New Jersey,34,NJ,1368165,16.4,1440046,16.5,71881,0.1
18,California,6,CA,10462968,28.3,10514821,28.3,51853,0.0
40,Texas,48,TX,7666020,28.8,7717053,28.7,51033,-0.1
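
As an alternative to sorting the whole table and slicing, pandas offers `DataFrame.nlargest`, which performs both steps in one call. The sketch below uses a few difference values copied from this notebook's outputs; `diff_est` is a hypothetical short name for the long difference column.

```python
import pandas as pd

# Difference values copied from the outputs in this notebook;
# "diff_est" is an illustrative short column name.
sample = pd.DataFrame({
    "State_Name": ["Washington", "Texas", "Arizona", "New York",
                   "California", "Florida", "New Jersey"],
    "diff_est": [18148, 51033, -16213, 98720, 51853, 93120, 71881],
})

# The 5 rows with the largest values, already ordered from largest to smallest
top5 = sample.nlargest(5, "diff_est")
print(top5["State_Name"].tolist())
```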


- Sort ascending so that the largest increase appears at the top of the horizontal bar chart (Plotly draws the first row at the bottom)

In [6]:
df_high_estimate.sort_values(by="Difference - Language spoken at home (Spanish) (DP02_0116E)", ascending=True, inplace=True)

- Plot

In [15]:
fig = px.bar(df_high_estimate,              
             x='Difference - Language spoken at home (Spanish) (DP02_0116E)', 
             y='State_Name',
             text='Difference - Language spoken at home (Spanish) (DP02_0116E)',
             orientation='h',   
             template='simple_white',
             title='Top 5 States with highest increase between 2020 and 2021 in number of people speaking Spanish at home (DP02_0116E)')

# Formatting bar labels
fig.update_traces(textposition='auto', 
                  texttemplate='%{text:,.2s}'
                 )

fig.show()

### 2.2. Top 5 States with highest increase between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)

- Sort values

In [9]:
df_high_percent = df.sort_values(by="Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)", ascending=False)
df_high_percent.head()

Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
8,Rhode Island,44,RI,123763,12.3,131450,12.7,7687,0.4
23,Florida,12,FL,4376716,21.8,4469836,22.1,93120,0.3
39,Massachusetts,25,MA,593684,9.1,623189,9.4,29505,0.3
49,North Carolina,37,NC,736886,7.5,753142,7.7,16256,0.2
36,Connecticut,9,CT,403019,11.9,413847,12.1,10828,0.2


- Get Top 5

In [10]:
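# Keep the first 5 rows as an independent copy (the column assignment below would otherwise raise SettingWithCopyWarning)
df_high_percent = df_high_percent.iloc[:5].copy()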
df_high_percent

Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
8,Rhode Island,44,RI,123763,12.3,131450,12.7,7687,0.4
23,Florida,12,FL,4376716,21.8,4469836,22.1,93120,0.3
39,Massachusetts,25,MA,593684,9.1,623189,9.4,29505,0.3
49,North Carolina,37,NC,736886,7.5,753142,7.7,16256,0.2
36,Connecticut,9,CT,403019,11.9,413847,12.1,10828,0.2


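- Divide by 100 so that the `.1%` label format used in the plot shows the original value (for example, 0.4 becomes 0.004 and is displayed as 0.4%)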

In [12]:
df_high_percent['Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)'] = df_high_percent['Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)'] / 100
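
Overwriting the column in place means that re-running this cell divides by 100 a second time. One way to avoid that, sketched below under the assumption that a separate plotting column is acceptable, is to add the fraction as a new column with `assign`. The names `diff_pct` and `diff_fraction` are hypothetical.

```python
import pandas as pd

# Top 5 percentage-point increases copied from the output above
sample = pd.DataFrame({
    "State_Name": ["Rhode Island", "Florida", "Massachusetts",
                   "North Carolina", "Connecticut"],
    "diff_pct": [0.4, 0.3, 0.3, 0.2, 0.2],
})

# assign() returns a new DataFrame; the original percentage-point column is untouched
plot_df = sample.assign(diff_fraction=sample["diff_pct"] / 100)

print(plot_df[["State_Name", "diff_pct", "diff_fraction"]])
```

The plot would then use `diff_fraction` for `x` and `text`, and the cell can be re-run safely.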

- Sort ascending so that the largest increase appears at the top of the horizontal bar chart (Plotly draws the first row at the bottom)

In [13]:
df_high_percent.sort_values(by="Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)", ascending=True, inplace=True)

- Plot

In [14]:
fig = px.bar(df_high_percent,              
             x='Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)', 
             y='State_Name',
             text='Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)',
             orientation='h',   
             template='simple_white',
             title='Top 5 States with highest increase between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)')

# Formatting bar labels
fig.update_traces(textposition='auto', 
                  texttemplate='%{text:.1%}'
                 )

fig.show()

### 2.3. Top 5 States with highest decrease between 2020 and 2021 in number of people speaking Spanish at home (DP02_0116E)

- Sort values

In [20]:
df_low_estimate = df.sort_values(by="Difference - Language spoken at home (Spanish) (DP02_0116E)", ascending=True)
df_low_estimate.head()

Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
32,Arizona,4,AZ,1358980,20.2,1342767,20.1,-16213,-0.1
11,New Mexico,35,NM,514071,26.0,510402,25.7,-3669,-0.3
12,Nevada,32,NV,593610,20.9,591262,20.5,-2348,-0.4
44,Colorado,8,CO,602273,11.2,600603,11.1,-1670,-0.1
5,Mississippi,28,MS,67565,2.4,66351,2.4,-1214,0.0


- Get the 5 largest decreases

In [21]:
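# Keep the first 5 rows as an independent copy (avoids a possible SettingWithCopyWarning on later in-place changes)
df_low_estimate = df_low_estimate.iloc[:5].copy()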
df_low_estimate

Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
32,Arizona,4,AZ,1358980,20.2,1342767,20.1,-16213,-0.1
11,New Mexico,35,NM,514071,26.0,510402,25.7,-3669,-0.3
12,Nevada,32,NV,593610,20.9,591262,20.5,-2348,-0.4
44,Colorado,8,CO,602273,11.2,600603,11.1,-1670,-0.1
5,Mississippi,28,MS,67565,2.4,66351,2.4,-1214,0.0


- Sort descending so that the largest decrease (the most negative value) appears at the top of the horizontal bar chart

In [29]:
df_low_estimate.sort_values(by="Difference - Language spoken at home (Spanish) (DP02_0116E)", ascending=False, inplace=True)

- Plot

In [30]:
fig = px.bar(df_low_estimate,              
             x='Difference - Language spoken at home (Spanish) (DP02_0116E)', 
             y='State_Name',
             text='Difference - Language spoken at home (Spanish) (DP02_0116E)',
             orientation='h',   
             template='simple_white',
             title='Top 5 States with highest decrease between 2020 and 2021 in number of people speaking Spanish at home (DP02_0116E)')

# Formatting bar labels
fig.update_traces(textposition='auto', 
                  texttemplate='%{text:,.2s}'
                 )

fig.show()

### 2.4. Top 5 States with highest decrease between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)

- Sort values

In [31]:
df_low_percent = df.sort_values(by="Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)", ascending=True)
df_low_percent.head()

Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
12,Nevada,32,NV,593610,20.9,591262,20.5,-2348,-0.4
11,New Mexico,35,NM,514071,26.0,510402,25.7,-3669,-0.3
40,Texas,48,TX,7666020,28.8,7717053,28.7,51033,-0.1
31,Wyoming,56,WY,25717,4.7,25145,4.6,-572,-0.1
22,Alaska,2,AK,23785,3.5,23629,3.4,-156,-0.1


- Get the 5 largest decreases

In [32]:
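# Keep the first 5 rows as an independent copy (the column assignment below would otherwise raise SettingWithCopyWarning)
df_low_percent = df_low_percent.iloc[:5].copy()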
df_low_percent

Unnamed: 0,State_Name,FIPS_State,State_Abbreviation,2020 - Language spoken at home (Spanish) (DP02_0116E),2020 - Language spoken at home (Spanish) - Percent (DP02_0116PE),2021 - Language spoken at home (Spanish) (DP02_0116E),2021 - Language spoken at home (Spanish) - Percent (DP02_0116PE),Difference - Language spoken at home (Spanish) (DP02_0116E),Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)
12,Nevada,32,NV,593610,20.9,591262,20.5,-2348,-0.4
11,New Mexico,35,NM,514071,26.0,510402,25.7,-3669,-0.3
40,Texas,48,TX,7666020,28.8,7717053,28.7,51033,-0.1
31,Wyoming,56,WY,25717,4.7,25145,4.6,-572,-0.1
22,Alaska,2,AK,23785,3.5,23629,3.4,-156,-0.1
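
Texas appears in this list even though its count of Spanish speakers rose by 51,033. A share can fall while the count rises when the underlying population grows faster than the group. The arithmetic below illustrates this with the Texas row. The denominators are only implied from percentages rounded to one decimal, so they are approximate, and a 0.1-point change is close to the rounding step itself.

```python
# Texas values copied from the table above
est_2020, pct_2020 = 7_666_020, 28.8
est_2021, pct_2021 = 7_717_053, 28.7

# Implied denominators (approximate, because the percentages are rounded)
base_2020 = est_2020 / (pct_2020 / 100)
base_2021 = est_2021 / (pct_2021 / 100)

# Relative growth of the Spanish-speaking count and of the implied base
speaker_growth = est_2021 / est_2020 - 1
base_growth = base_2021 / base_2020 - 1

print(f"speakers: {speaker_growth:.2%}, implied base: {base_growth:.2%}")
```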


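- Divide by 100 so that the `.1%` label format used in the plot shows the original value (for example, -0.4 becomes -0.004 and is displayed as -0.4%)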

In [33]:
df_low_percent['Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)'] = df_low_percent['Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)'] / 100

- Sort descending so that the largest decrease (the most negative value) appears at the top of the horizontal bar chart

In [36]:
df_low_percent.sort_values(by="Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)", ascending=False, inplace=True)

- Plot

In [37]:
fig = px.bar(df_low_percent,              
             x='Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)', 
             y='State_Name',
             text='Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)',
             orientation='h',   
             template='simple_white',
             title='Top 5 States with highest decrease between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)')

# Formatting bar labels
fig.update_traces(textposition='auto', 
                  texttemplate='%{text:.1%}'
                 )

fig.show()
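
Sections 2.1 to 2.4 repeat the same steps: sort, keep five rows, then re-sort for plotting. A small helper could capture that pattern. The function below is a sketch, not part of the original notebook; `top_changes` and `diff_est` are hypothetical names, and the sample values are copied from the outputs above.

```python
import pandas as pd

def top_changes(df, column, n=5, largest=True):
    """Return the n largest increases (or decreases) in `column`, ordered so
    the most extreme row comes last, i.e. at the top of a horizontal px.bar."""
    if largest:
        return df.nlargest(n, column).sort_values(by=column, ascending=True)
    return df.nsmallest(n, column).sort_values(by=column, ascending=False)

# Difference values copied from the outputs in sections 2.1 and 2.3
sample = pd.DataFrame({
    "State_Name": ["New York", "Florida", "New Jersey", "California", "Texas",
                   "Washington", "Arizona", "New Mexico", "Nevada",
                   "Colorado", "Mississippi"],
    "diff_est": [98720, 93120, 71881, 51853, 51033,
                 18148, -16213, -3669, -2348, -1670, -1214],
})

increases = top_changes(sample, "diff_est")
decreases = top_changes(sample, "diff_est", largest=False)
print(increases["State_Name"].tolist())
print(decreases["State_Name"].tolist())
```

Each returned frame can be passed straight to `px.bar` with `orientation='h'`.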

## 3. US State Maps:

### 3.1. US States: Differences between 2020 and 2021 in population speaking Spanish at home (DP02_0116E)

In [39]:
fig = px.choropleth(df, 
                    scope="usa",    
                    locationmode='USA-states',          # Plot states of USA
                    locations='State_Abbreviation',     # Column containing State Abbreviations                   
                    
                    color='Difference - Language spoken at home (Spanish) (DP02_0116E)',             # Column determining map color for each State
                    hover_name='State_Name',            # Sets top label of Tooltip
                    color_continuous_scale='OrRd', 
                    title="US States: Differences between 2020 and 2021 in population speaking Spanish at home (DP02_0116E)"
                   )

fig.update_layout(margin={"r":0,"t":50,"l":0,"b":0})

fig.show()

### 3.2. US States: Differences between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)

In [40]:
fig = px.choropleth(df, 
                    scope="usa",    
                    locationmode='USA-states',          # Plot states of USA
                    locations='State_Abbreviation',     # Column containing State Abbreviations                   
                    
                    color='Difference - Language spoken at home (Spanish) - Percent (DP02_0116PE)',      # Column determining map color for each State
                    hover_name='State_Name',            # Sets top label of Tooltip
                    color_continuous_scale='OrRd', 
                    title="US States: Differences between 2020 and 2021 in percentage of people speaking Spanish at home (DP02_0116PE)"
                   )

fig.update_layout(margin={"r":0,"t":50,"l":0,"b":0})

fig.show()