# US Population by State
References:
- https://www.indexmundi.com/facts/united-states/quick-facts/all-states/population#map

Data Sources:
- https://www.census.gov/data/datasets/time-series/demo/popest/2010s-state-total.html

This notebook demonstrates the use of the following Python techniques:
- Web scraping
    - Pandas read_html
    - Much simpler than requests + Beautiful Soup
- Interactive data visualization with Plotly
    - Plotly/Plotly Express 
    - Bar chart
    - Choropleth map

Note:

A choropleth map is a type of thematic map in which areas are shaded or patterned in proportion to a statistical variable that represents an aggregate summary of a geographic characteristic within each area, such as population density or per-capita income. 

The term comes from the Greek khōra ("place") and plēthos ("multitude").

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import plotly.io as pio

In [None]:
DATA_URL = "https://simple.wikipedia.org/wiki/List_of_U.S._states_by_population"

In [None]:
df_list = pd.read_html(DATA_URL)
len(df_list)

1

In [None]:
df = df_list[0]
df.head()

In [None]:
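# Keep the state-name and population columns (positions 2 and 3 in this table);
# the explicit copy keeps later edits from touching df.
df2 = df.iloc[:, [2, 3]].copy()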
df2.columns = ["State", "Pop"]
df2

In [None]:
df2 = df2.iloc[:56]  # keep only the first 56 rows of the table
df2
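
Tables scraped with `pd.read_html` often come back with numbers stored as strings (thousands separators, footnote markers), which would make bar lengths and color scales meaningless. The following is a minimal sketch of a cleanup step, assuming the `Pop` column may contain such strings. The sample rows are made up for illustration and are not the scraped data.

```python
import pandas as pd

# Hypothetical sample mimicking a scraped table; values are illustrative only.
sample = pd.DataFrame({
    "State": ["Alpha", "Beta", "Gamma"],
    "Pop": ["1,234,567", "890,123[a]", "45,678"],
})

# Strip footnote markers such as "[a]", drop thousands separators,
# then convert the remaining digits to a nullable integer column.
cleaned = (
    sample["Pop"]
    .astype(str)
    .str.replace(r"\[.*?\]", "", regex=True)
    .str.replace(",", "", regex=False)
)
sample["Pop"] = pd.to_numeric(cleaned, errors="coerce").astype("Int64")

print(sample.dtypes)
```

If `df2["Pop"]` already has a numeric dtype, this step is unnecessary; checking `df2.dtypes` first shows whether it applies.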

In [None]:
fig = px.bar(df2, y="State", x="Pop", orientation='h', height=800)

fig.update_layout(
    title='US Population by States (2019 Estimate)',
    yaxis=dict(
        tickangle=0,
        showticklabels=True,
        type='category',
        # title='Y-axis name',
        tickmode='linear'
    )
)
fig
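
Plotly places the categories of a horizontal bar chart from the bottom up in the order the rows appear, so the chart is only ranked if the data frame is. The following is a small sketch, on made-up values, of sorting before plotting so that the largest bar ends up at the top.

```python
import pandas as pd

# Hypothetical values, for illustration only.
demo = pd.DataFrame({"State": ["A", "B", "C"], "Pop": [30, 10, 20]})

# Ascending order puts the smallest value first (bottom of the chart)
# and the largest value last (top of the chart).
demo_sorted = demo.sort_values("Pop", ascending=True).reset_index(drop=True)

print(demo_sorted)
```

The sorted frame can then be passed to `px.bar` in place of the unsorted one. Setting `categoryorder='total ascending'` on the y-axis is an alternative that leaves the data untouched.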

In [None]:
STATES_URL = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/?cid=nrcs143_013696"

# pd.read_html(STATES_URL) with no filter returns 15 tables, which is too many.
# Passing attrs picks out only the table(s) whose HTML attributes match.
df_list2 = pd.read_html(STATES_URL, attrs={'class': 'data'})

states_df = df_list2[0]  # the filtered list has only one element (one table)
states_df.head()

Unnamed: 0,Name,Postal Code,FIPS
0,Alabama,AL,1
1,Alaska,AK,2
2,Arizona,AZ,4
3,Arkansas,AR,5
4,California,CA,6
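
To see the effect of the `attrs` filter without depending on a live page, here is a minimal sketch that runs `pd.read_html` on an inline HTML string. The HTML, the `nav` class, and the sample rows are made up for illustration. Like `read_html` itself, it needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second carries class="data".
HTML = """
<html><body>
<table class="nav">
  <tr><th>Link</th></tr>
  <tr><td>Home</td></tr>
</table>
<table class="data">
  <tr><th>Name</th><th>Postal Code</th><th>FIPS</th></tr>
  <tr><td>Alabama</td><td>AL</td><td>1</td></tr>
  <tr><td>Alaska</td><td>AK</td><td>2</td></tr>
</table>
</body></html>
"""

# Without a filter, every <table> on the page is returned.
all_tables = pd.read_html(StringIO(HTML))

# With attrs, only tables whose attributes match are returned.
data_tables = pd.read_html(StringIO(HTML), attrs={"class": "data"})

print(len(all_tables), len(data_tables))
print(data_tables[0])
```

The `match` parameter of `read_html` is another way to narrow the result; it keeps only tables containing text that matches a string or regular expression.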


In [None]:
# Rename the columns so the name column matches the "State" key used in df2.
states_df.columns = ["State", "ST", "FIPS"]
states_df.head()

Unnamed: 0,State,ST,FIPS
0,Alabama,AL,1
1,Alaska,AK,2
2,Arizona,AZ,4
3,Arkansas,AR,5
4,California,CA,6


In [None]:
df3 = pd.merge(df2, states_df, on="State", how="inner")
df3
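
An inner join silently drops any row whose `State` value is spelled differently in the two tables. The following is a minimal sketch of one way to check for such rows, using a left join with `indicator=True`. The two small frames and the name mismatch in them are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables; values are illustrative only.
pop = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Washington, D.C."],
    "Pop": [100, 50, 10],
})
codes = pd.DataFrame({
    "State": ["Alabama", "Alaska", "District of Columbia"],
    "ST": ["AL", "AK", "DC"],
})

# A left join keeps every population row; the _merge column added by
# indicator=True records whether a matching row was found in codes.
check = pd.merge(pop, codes, on="State", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "State"].tolist()

print(unmatched)
```

Any names listed in `unmatched` would be missing from the inner-join result and therefore from the map.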

In [None]:
fig = px.choropleth(
    df3,
    locations='ST',               # two-letter postal codes
    locationmode='USA-states',    # interpret locations as US state abbreviations
    color='Pop',
    color_continuous_scale="Viridis",
    scope="usa",
    hover_name="State",
    labels={'ST': 'State'},
)

#fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()