# EDA
Exploratory Data Analysis

## Enhancing Milk-Report Visualizations
_Paulo G. Martinez_ 9/21/2020

In [1]:
# import packages
# for json loading and dumping
import json
# for os/platform independent path handling
from pathlib import Path
# for table manipulations
import pandas as pd
# for html friendly interactive visualizations
import plotly.graph_objects as go
# for pretty printing
import pprint
# for string manipulations
import re


## Target Visualization to Enhance:
![Monthly Milk Production 24 Selected States](pictures/monthly-milk-prod-24-states.png)

**I'll pull in more data than the original chart uses, to demo the summarizing power and flexibility of an interactive vis**
- Instead of pulling national estimates for two years at monthly periods,
- I pulled state estimates for all available years (1924 - 2020)
    - hoping to represent national estimates as the aggregated sum of the states
    - and grouping estimates by year so as to plot them as differently "hued" traces on a single line chart
    
## Download a local copy of the Quick Stats (QS) CSV for the Monthly Milk Production viz like so:
![Monthly Milk Production All States 1924 - 9-10-2020](pictures/milk-prod-all-states-1924-9-10-2020-QS.png)

In [2]:
'''(I think I may have even saved the query at:
[DEFAULT]
URL=https://quickstats.nass.usda.gov/results/1DBE1F03-7EDB-36CD-8F5E-ED46D069872C
[InternetShortcut]
URL=https://quickstats.nass.usda.gov/results/1DBE1F03-7EDB-36CD-8F5E-ED46D069872C'''

'(I think I may have even saved the query at:\n[DEFAULT]\nURL=https://quickstats.nass.usda.gov/results/1DBE1F03-7EDB-36CD-8F5E-ED46D069872C\n[InternetShortcut]\nURL=https://quickstats.nass.usda.gov/results/1DBE1F03-7EDB-36CD-8F5E-ED46D069872C'

## Read csv into data frame

In [3]:
# declare and handle path to data for local os
path_to_data = Path('data/milk-prod-all-states-1924-9-10-2020-1DBE1F03-7EDB-36CD-8F5E-ED46D069872C.csv')
# read raw QS table into pandas
qs_df = pd.read_csv(path_to_data)
# show first rows
qs_df.head(3)

Unnamed: 0,Program,Year,Period,Week Ending,Geo Level,State,State ANSI,Ag District,Ag District Code,County,...,Zip Code,Region,watershed_code,Watershed,Commodity,Data Item,Domain,Domain Category,Value,CV (%)
0,SURVEY,2020,JAN,,STATE,ARIZONA,4.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,412000000,
1,SURVEY,2020,JAN,,STATE,CALIFORNIA,6.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,3526000000,
2,SURVEY,2020,JAN,,STATE,COLORADO,8.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,422000000,


In [4]:
# show basic info
qs_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46375 entries, 0 to 46374
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Program           46375 non-null  object 
 1   Year              46375 non-null  int64  
 2   Period            46375 non-null  object 
 3   Week Ending       0 non-null      float64
 4   Geo Level         46375 non-null  object 
 5   State             46375 non-null  object 
 6   State ANSI        46368 non-null  float64
 7   Ag District       0 non-null      float64
 8   Ag District Code  0 non-null      float64
 9   County            0 non-null      float64
 10  County ANSI       0 non-null      float64
 11  Zip Code          0 non-null      float64
 12  Region            0 non-null      float64
 13  watershed_code    46375 non-null  int64  
 14  Watershed         0 non-null      float64
 15  Commodity         46375 non-null  object 
 16  Data Item         46375 non-null  object

**correct the datatype for the 'Value' column**

In [5]:
# see what the values look like
qs_df.Value.head().values

array(['412,000,000', '3,526,000,000', '422,000,000', '215,000,000',
       '161,000,000'], dtype=object)

In [6]:
# test out comma substitution
re.sub(',', '', qs_df.Value[0])

'412000000'

In [7]:
# apply the comma stripping to the whole column
qs_df.Value.apply(
    lambda s: re.sub(',', '', s)
)

0          412000000
1         3526000000
2          422000000
3          215000000
4          161000000
            ...     
46370     1287000000
46371     1540000000
46372      709000000
46373    10127000000
46374      223000000
Name: Value, Length: 46375, dtype: object

**That's strange, how many non-numeric values are in there?**

In [8]:
# set comprehension to collect all the strings that aren't numeric
{s for s in qs_df.Value if not re.sub(',', '', s).isnumeric()}

{' (D)'}

Ok. So there's only one non-numeric value. Is it a sentinel for NULL? How many times does it appear?

In [9]:
qs_df[qs_df.Value == ' (D)']

Unnamed: 0,Program,Year,Period,Week Ending,Geo Level,State,State ANSI,Ag District,Ag District Code,County,...,Zip Code,Region,watershed_code,Watershed,Commodity,Data Item,Domain,Domain Category,Value,CV (%)
25,SURVEY,2020,JAN THRU MAR,,STATE,ALASKA,2.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,(D),
34,SURVEY,2020,JAN THRU MAR,,STATE,HAWAII,15.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,(D),
148,SURVEY,2020,APR THRU JUN,,STATE,ALASKA,2.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,(D),
157,SURVEY,2020,APR THRU JUN,,STATE,HAWAII,15.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,(D),
707,SURVEY,2019,OCT THRU DEC,,STATE,ALASKA,2.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,(D),
716,SURVEY,2019,OCT THRU DEC,,STATE,HAWAII,15.0,,,,...,,,0,,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL,NOT SPECIFIED,(D),


In [10]:
qs_df[qs_df.Value == ' (D)'].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 25 to 716
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Program           6 non-null      object 
 1   Year              6 non-null      int64  
 2   Period            6 non-null      object 
 3   Week Ending       0 non-null      float64
 4   Geo Level         6 non-null      object 
 5   State             6 non-null      object 
 6   State ANSI        6 non-null      float64
 7   Ag District       0 non-null      float64
 8   Ag District Code  0 non-null      float64
 9   County            0 non-null      float64
 10  County ANSI       0 non-null      float64
 11  Zip Code          0 non-null      float64
 12  Region            0 non-null      float64
 13  watershed_code    6 non-null      int64  
 14  Watershed         0 non-null      float64
 15  Commodity         6 non-null      object 
 16  Data Item         6 non-null      object 
 17

**Come back to this later. For now the bug appears to be confined to the quarterly estimates, which I'm not working on yet**
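
When I do come back to it, one option is sketched below, assuming ` (D)` really is a placeholder for a withheld value and that `NaN` is an acceptable stand-in. The helper name `clean_value_column` and the toy series are made up for illustration; they are not part of the notebook above.

```python
import pandas as pd

def clean_value_column(values: pd.Series) -> pd.Series:
    """Strip thousands separators and coerce non-numeric entries to NaN."""
    stripped = values.str.replace(',', '', regex=False).str.strip()
    # errors='coerce' turns anything that still isn't a number into NaN
    return pd.to_numeric(stripped, errors='coerce')

# toy stand-in for qs_df.Value, including the ' (D)' string seen above
toy_values = pd.Series(['412,000,000', ' (D)', '3,526,000,000'])
cleaned = clean_value_column(toy_values)
print(cleaned)
```

Because of the `NaN`, the cleaned column comes back as floats rather than ints, which is worth keeping in mind before summing or plotting.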

# Declutter the Pivoted Data
### Drop empty columns

In [11]:
# drop columns where all cells are empty (not a number, nan)
qs_df = qs_df.dropna(
    axis = 'columns', 
    how = 'all',
)

### Split non-varying columns into a metadata object

In [12]:
# init a metadata object
metadata_dct = {}

# find columns where there is only one unique value
for col in qs_df:
    if len(qs_df[col].unique()) == 1:
        # print column header and unique value counts
        print(col, ":", qs_df[col].unique()[0])
        # add the column header and its unique value to the metadata
        metadata_dct[col] = qs_df[col].unique()[0]
        
        # drop it from the data frame
        qs_df = qs_df.drop(columns = [col])

Program : SURVEY
Geo Level : STATE
watershed_code : 0
Commodity : MILK
Data Item : MILK - PRODUCTION, MEASURED IN LB
Domain : TOTAL
Domain Category : NOT SPECIFIED
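
The same split could be wrapped in a reusable helper. This is only a sketch: `split_constant_columns` and the toy frame are hypothetical names, and it treats a column as metadata when it holds exactly one distinct value (counting `NaN` as a value, to match `len(col.unique())`).

```python
import pandas as pd

def split_constant_columns(df: pd.DataFrame):
    """Return (metadata dict, DataFrame without the constant columns)."""
    # nunique(dropna=False) counts NaN as a value, like len(col.unique())
    constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) == 1]
    metadata = {col: df[col].iloc[0] for col in constant_cols}
    return metadata, df.drop(columns=constant_cols)

# toy frame with two constant columns and one varying column
toy_df = pd.DataFrame({
    'Program': ['SURVEY', 'SURVEY', 'SURVEY'],
    'State': ['ARIZONA', 'CALIFORNIA', 'COLORADO'],
    'Commodity': ['MILK', 'MILK', 'MILK'],
})
toy_metadata, toy_varying = split_constant_columns(toy_df)
print(toy_metadata)
print(list(toy_varying.columns))
```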


#### drop null-metadata

**get dictionary of known null-sentinel-values**

In [13]:
# check for the existence of null-sentinel-dictionary
path_to_sentinel_nulls_dct = Path('sentinel-nulls.json')
# if the file is in existence
if path_to_sentinel_nulls_dct.is_file():
    # read it into a dict
    with open(path_to_sentinel_nulls_dct, 'r') as file_path:
        sentinel_nulls_dct = json.load(file_path)
    assert isinstance(sentinel_nulls_dct, dict)
# else, initialize the dict
else:
    sentinel_nulls_dct = {}

# document the known sentinel values
if 'watershed_code' not in sentinel_nulls_dct:
    sentinel_nulls_dct['watershed_code'] = 0
if 'Domain Category' not in sentinel_nulls_dct:
    sentinel_nulls_dct['Domain Category'] = 'NOT SPECIFIED'

# save the additions to the null-sentinel-dictionary
with open(path_to_sentinel_nulls_dct, 'w') as file_path:
    json.dump(sentinel_nulls_dct, file_path)

**drop values known to be sentinels for NULL from the metadata**

In [14]:
for attribute in sentinel_nulls_dct:
    if attribute in metadata_dct:
        if sentinel_nulls_dct[attribute] == metadata_dct[attribute]:
            del metadata_dct[attribute]
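
An alternative that builds a new dict instead of deleting in place is sketched here. The function name `drop_sentinel_nulls` and the toy dictionaries are hypothetical; the matching rule (same key and same value) is the one used in the loop above.

```python
def drop_sentinel_nulls(metadata: dict, sentinel_nulls: dict) -> dict:
    """Return a copy of metadata without entries whose value is a known null sentinel."""
    return {
        key: value
        for key, value in metadata.items()
        if not (key in sentinel_nulls and sentinel_nulls[key] == value)
    }

# toy inputs mirroring the shapes of metadata_dct and sentinel_nulls_dct
toy_metadata = {'Commodity': 'MILK', 'watershed_code': 0, 'Domain Category': 'NOT SPECIFIED'}
toy_sentinels = {'watershed_code': 0, 'Domain Category': 'NOT SPECIFIED'}
filtered_metadata = drop_sentinel_nulls(toy_metadata, toy_sentinels)
print(filtered_metadata)  # {'Commodity': 'MILK'}
```

Returning a copy leaves the original metadata untouched, which makes the cell safe to re-run.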

**display the metadata as dictionary (json)**

In [15]:
# display the metadata
feedback = 'Auto-detected Meta-Data: (formatted as JSON)\n'
feedback = feedback + '-'*len(feedback) + '\n'
print(feedback)
pp = pprint.PrettyPrinter()
pp.pprint(metadata_dct)

Auto-detected Meta-Data: (formatted as JSON)
---------------------------------------------

{'Commodity': 'MILK',
 'Data Item': 'MILK - PRODUCTION, MEASURED IN LB',
 'Domain': 'TOTAL',
 'Geo Level': 'STATE',
 'Program': 'SURVEY'}


**Display the metadata as a table**

In [16]:
feedback = 'Auto-detected Meta-Data: (formatted as table)\n'
feedback = feedback + '-'*len(feedback) + '\n'
print(feedback)
metadata_df = pd.DataFrame({k: [metadata_dct[k]] for k in metadata_dct})
metadata_df

Auto-detected Meta-Data: (formatted as table)
----------------------------------------------



Unnamed: 0,Program,Geo Level,Commodity,Data Item,Domain
0,SURVEY,STATE,MILK,"MILK - PRODUCTION, MEASURED IN LB",TOTAL


**Display the first few rows of the pivoted data**

In [17]:
qs_df.head()

Unnamed: 0,Year,Period,State,State ANSI,Value
0,2020,JAN,ARIZONA,4.0,412000000
1,2020,JAN,CALIFORNIA,6.0,3526000000
2,2020,JAN,COLORADO,8.0,422000000
3,2020,JAN,FLORIDA,12.0,215000000
4,2020,JAN,GEORGIA,13.0,161000000


##### I SHOULD COME BACK TO THIS AND IMPLEMENT SOME "BUSINESS RULES" TO FURTHER AUGMENT USER FRIENDLINESS
- add user friendly descriptions and data dictionary for 'Data Item'

In [18]:
# rename 'Value' in the pivoted data to something more user friendly
qs_df = qs_df.rename(columns = {'Value':'Milk Production (Lbs)'})
# reorder the columns
qs_df = qs_df[['Period', 'Year', 'State', 'Milk Production (Lbs)', 'State ANSI']]
qs_df.head()

Unnamed: 0,Period,Year,State,Milk Production (Lbs),State ANSI
0,JAN,2020,ARIZONA,412000000,4.0
1,JAN,2020,CALIFORNIA,3526000000,6.0
2,JAN,2020,COLORADO,422000000,8.0
3,JAN,2020,FLORIDA,215000000,12.0
4,JAN,2020,GEORGIA,161000000,13.0


## Check the pivoted data for periodicity inconsistencies

In [19]:
# display the value counts of the unique values in the 'Period' column
print(qs_df.Period.value_counts())

YEAR            4727
JAN             2698
FEB             2696
MAR             2696
JUL             2664
MAY             2664
APR             2664
JUN             2664
DEC             2642
NOV             2640
OCT             2640
AUG             2640
SEP             2640
APR THRU JUN    2450
JAN THRU MAR    2450
OCT THRU DEC    2401
JUL THRU SEP    2399
Name: Period, dtype: int64
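
The counts above mix three reporting frequencies in one column. A small sketch of how the labels could be bucketed, assuming every label containing ` THRU ` is a quarter (true for the four ranges listed above, but an assumption in general); `MONTHS` and `period_frequency` are hypothetical names.

```python
from collections import Counter

MONTHS = ('JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC')

def period_frequency(period: str) -> str:
    """Map a Period label to 'annual', 'monthly', 'quarterly', or 'unknown'."""
    if period == 'YEAR':
        return 'annual'
    if period in MONTHS:
        return 'monthly'
    # assumption: ranges such as 'JAN THRU MAR' are always quarters
    if ' THRU ' in period:
        return 'quarterly'
    return 'unknown'

# toy sample of Period labels
toy_periods = ['YEAR', 'JAN', 'FEB', 'JAN THRU MAR', 'OCT THRU DEC']
frequency_counts = Counter(period_frequency(p) for p in toy_periods)
print(frequency_counts)
```

On the real table, something like `qs_df.Period.map(period_frequency).value_counts()` would give the row count per frequency, and any `'unknown'` rows would flag labels not anticipated here.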


## Split out the Annual Estimates into their own table
- these are redundant in storage; we might not need to keep them in the DB and could instead serve them as an aggregation of the monthly estimates

In [20]:
# subset the annual estimates into its own table
annual_df = qs_df[qs_df.Period == 'YEAR']
# drop the now redundant 'Period' column
annual_df = annual_df.drop(columns = ['Period'])
# sort the data by descending year
annual_df = annual_df.sort_values(by = ['Year'], ascending = False)
# reset the row index to avoid confusion
annual_df = annual_df.reset_index(drop = True)
# display the top few rows
annual_df.head()

Unnamed: 0,Year,State,Milk Production (Lbs),State ANSI
0,2019,ALABAMA,60000000,1.0
1,2019,SOUTH CAROLINA,206000000,45.0
2,2019,NEW JERSEY,100000000,34.0
3,2019,NEW MEXICO,8187000000,35.0
4,2019,NEW YORK,15122000000,36.0
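
Whether the annual rows really are redundant can be checked by comparing them with the monthly sums. The sketch below uses toy tables with made-up numbers, not the QS data; on the real data I would expect mismatches wherever a state-year lacks a full set of monthly estimates.

```python
import pandas as pd

# toy monthly and annual tables with made-up numbers
toy_monthly = pd.DataFrame({
    'Year': [2019, 2019, 2019, 2019],
    'State': ['A', 'A', 'B', 'B'],
    'Milk Production (Lbs)': [10, 20, 5, 5],
})
toy_annual = pd.DataFrame({
    'Year': [2019, 2019],
    'State': ['A', 'B'],
    'Milk Production (Lbs)': [30, 12],
})

# sum the monthly estimates per state and year
monthly_totals = (
    toy_monthly
    .groupby(['Year', 'State'], as_index=False)['Milk Production (Lbs)']
    .sum()
)
# line the sums up against the published annual estimates
comparison = monthly_totals.merge(
    toy_annual, on=['Year', 'State'], suffixes=(' monthly sum', ' annual')
)
comparison['difference'] = (
    comparison['Milk Production (Lbs) annual']
    - comparison['Milk Production (Lbs) monthly sum']
)
print(comparison)
```

A non-zero `difference` marks a state-year where the annual estimate could not simply be served as the monthly aggregate.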


## Split out the Quarterly Estimates into their own table
- these are redundant in storage; we might not need to keep them in the DB and could instead serve them as an aggregation of the monthly estimates
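
I haven't written this split yet. A minimal sketch of how it might look, assuming every quarterly label contains ` THRU ` as in the value counts earlier; `toy_qs_df`, `quarterly_df`, and `first_month_to_quarter` are hypothetical names.

```python
import pandas as pd

# toy stand-in for qs_df
toy_qs_df = pd.DataFrame({
    'Period': ['JAN', 'JAN THRU MAR', 'YEAR', 'OCT THRU DEC'],
    'Year': [2020, 2020, 2019, 2019],
    'State': ['ARIZONA', 'ALASKA', 'ALABAMA', 'HAWAII'],
})

# assumption: every quarterly label contains ' THRU '
is_quarterly = toy_qs_df.Period.str.contains(' THRU ', regex=False)
quarterly_df = toy_qs_df[is_quarterly].reset_index(drop=True)

# label each row with its quarter number via the first month of the range
first_month_to_quarter = {'JAN': 1, 'APR': 2, 'JUL': 3, 'OCT': 4}
quarterly_df['Quarter'] = quarterly_df.Period.str[:3].map(first_month_to_quarter)
print(quarterly_df)
```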

## Split out the Monthly Estimates into their own table

In [21]:
# subset the monthly estimates into its own table
monthly_df = qs_df[
    # keep only the rows whose Period is a single month
    qs_df.Period.isin([
        'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
        'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC',
    ])
].reset_index(drop = True)

In [22]:
# create a map from month string to int
month_to_int_dct = {
    mo:i+1 for i,mo in enumerate([
        'JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
        'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC',])
}
month_to_int_dct

{'JAN': 1,
 'FEB': 2,
 'MAR': 3,
 'APR': 4,
 'MAY': 5,
 'JUN': 6,
 'JUL': 7,
 'AUG': 8,
 'SEP': 9,
 'OCT': 10,
 'NOV': 11,
 'DEC': 12}

In [23]:
# add the column of months as ints
monthly_df['Month'] = monthly_df.Period.map(month_to_int_dct)
monthly_df.head()

Unnamed: 0,Period,Year,State,Milk Production (Lbs),State ANSI,Month
0,JAN,2020,ARIZONA,412000000,4.0,1
1,JAN,2020,CALIFORNIA,3526000000,6.0,1
2,JAN,2020,COLORADO,422000000,8.0,1
3,JAN,2020,FLORIDA,215000000,12.0,1
4,JAN,2020,GEORGIA,161000000,13.0,1


**ensure datatypes are correct**

In [24]:
monthly_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31948 entries, 0 to 31947
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Period                 31948 non-null  object 
 1   Year                   31948 non-null  int64  
 2   State                  31948 non-null  object 
 3   Milk Production (Lbs)  31948 non-null  object 
 4   State ANSI             31948 non-null  float64
 5   Month                  31948 non-null  int64  
dtypes: float64(1), int64(2), object(3)
memory usage: 1.5+ MB


In [25]:
# see what the values look like
monthly_df['Milk Production (Lbs)'].head().values

array(['412,000,000', '3,526,000,000', '422,000,000', '215,000,000',
       '161,000,000'], dtype=object)

In [26]:
# correct the datatype for the value from string to numeric
monthly_df['Milk Production (Lbs)'] = pd.to_numeric(
    monthly_df['Milk Production (Lbs)'].apply(
        lambda s: re.sub(',', '', s)
    ),
    #errors='coerce'
)

# Ok, let's make a State line plot
Start with a simple reproduction for only one state

In [27]:
# init the figure
fig = go.FigureWidget()
# add some traces
for yr in [2019, 2020]:
    # select the rows for this year and state with a single combined mask
    mask = (monthly_df.Year == yr) & (monthly_df.State == 'VIRGINIA')
    fig.add_trace(
        go.Scatter(
            x = monthly_df.loc[mask, 'Period'],
            y = monthly_df.loc[mask, 'Milk Production (Lbs)'],
            mode = 'lines+markers',
            name = str(yr)
        )
    )
# Add title
fig.update_layout(
    title = 'Monthly Milk Production - Virginia',
    yaxis_title = 'Milk Production (Lbs)',
)
# display the figure
fig

FigureWidget({
    'data': [{'mode': 'lines+markers',
              'name': '2019',
              'type': 'sca…

### Display the visualization as json

In [28]:
print(fig.to_json(pretty = False))

{"data":[{"mode":"lines+markers","name":"2019","type":"scatter","x":["JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"],"y":[132000000,121000000,134000000,131000000,133000000,122000000,119000000,118000000,115000000,120000000,119000000,126000000]},{"mode":"lines+markers","name":"2020","type":"scatter","x":["JAN","FEB","MAR","APR","MAY","JUN","JUL"],"y":[134000000,127000000,137000000,132000000,133000000,126000000,123000000]}],"layout":{"template":{"data":{"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5}},"type":"bar"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5}},"type":"barpolar"}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"choropleth":[{"color

**Let's go ahead and add in all the years** 
- Although I'll leave the older ones "toggled off" in the legend

In [29]:
# init the figure
fig = go.FigureWidget()
# add some traces
for yr in monthly_df.Year.unique():
    # select the rows for this year and state with a single combined mask
    mask = (monthly_df.Year == yr) & (monthly_df.State == 'VIRGINIA')
    fig.add_trace(
        go.Scatter(
            x = monthly_df.loc[mask, 'Period'],
            y = monthly_df.loc[mask, 'Milk Production (Lbs)'],
            mode = 'lines+markers',
            name = str(yr),
            # show the two most recent years; leave the rest toggled off in the legend
            visible = True if yr in [2020, 2019] else 'legendonly',
        )
    )
# Add title
fig.update_layout(
    title = 'Monthly Milk Production, Virginia',
    yaxis_title = 'Milk Production (Lbs)',
)
# display the figure
fig

FigureWidget({
    'data': [{'mode': 'lines+markers',
              'name': '2020',
              'type': 'sca…

### Save the vis to html and json

In [30]:
# save to html
with open(Path('visualizations/va-monthly-milk-prod-by-year.html'), 'w') as file_path:
    file_path.write(fig.to_html())
# save to json
with open(Path('visualizations/va-monthly-milk-prod-by-year.json'), 'w') as file_path:
    file_path.write(fig.to_json())  
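
The `open(..., 'w')` calls above assume the `visualizations/` folder already exists; otherwise they fail. A small sketch of a helper that creates the folder first. `save_text` is a hypothetical name, the JSON string is a placeholder standing in for `fig.to_json()`, and a temporary directory is used so the demo does not touch the project tree.

```python
import json
import tempfile
from pathlib import Path

def save_text(path: Path, text: str) -> Path:
    """Write text to path, creating missing parent directories first."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path

# placeholder standing in for fig.to_json()
placeholder_json = json.dumps({'data': [], 'layout': {}})

with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_text(Path(tmp_dir) / 'visualizations' / 'demo.json', placeholder_json)
    # read the file back to confirm it holds valid JSON
    round_tripped = json.loads(saved_path.read_text())
print(round_tripped)
```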

## Ok, Let's Aggregate the states into a national graph

In [31]:
'''monthly_df.groupby(['Year', 'Period']).apply(
    lambda df: df['Milk Production (Lbs)'].sum()
).to_frame()'''

"monthly_df.groupby(['Year', 'Period']).apply(\n    lambda df: df['Milk Production (Lbs)'].sum()\n).to_frame()"

In [32]:
'''monthly_df.groupby(['Year', 'Period']).apply(
    lambda df: df['Milk Production (Lbs)'].sum()
).to_frame().unstack()'''

"monthly_df.groupby(['Year', 'Period']).apply(\n    lambda df: df['Milk Production (Lbs)'].sum()\n).to_frame().unstack()"

In [33]:
'''monthly_df.groupby(['Year', 'Period']).apply(
    lambda df: df['Milk Production (Lbs)'].sum()
).to_frame().unstack().reset_index()'''

"monthly_df.groupby(['Year', 'Period']).apply(\n    lambda df: df['Milk Production (Lbs)'].sum()\n).to_frame().unstack().reset_index()"

In [34]:
# group by Year and Period, and aggregate all the states as a sum of the Milk Production
national_df = monthly_df.groupby(['Year', 'Period'])['Milk Production (Lbs)'].sum().reset_index()
# map the Period to a numerical month again
national_df['Month'] = national_df.Period.map(month_to_int_dct)
# sort with the most recent month first, then display the first few rows
national_df = national_df.sort_values(by = ['Year', 'Month'], ascending=False).reset_index(drop = True)
national_df.head(15)

Unnamed: 0,Year,Period,Milk Production (Lbs),Month
0,2020,JUL,17800000000,7
1,2020,JUN,17486000000,6
2,2020,MAY,18049000000,5
3,2020,APR,17778000000,4
4,2020,MAR,18455000000,3
5,2020,FEB,17031000000,2
6,2020,JAN,17956000000,1
7,2019,DEC,17517000000,12
8,2019,NOV,16699000000,11
9,2019,OCT,17299000000,10
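
One caveat with this sum: it is only a "national" figure to the extent that the same states report in every month, and the Period counts earlier suggest coverage varies. A sketch of carrying the number of reporting states alongside the total, using a toy frame with made-up numbers; the `States Reporting` column name is my own.

```python
import pandas as pd

# toy stand-in for monthly_df with made-up numbers
toy_monthly = pd.DataFrame({
    'Year': [2020, 2020, 2020, 1950, 1950],
    'Period': ['JAN', 'JAN', 'JAN', 'JAN', 'JAN'],
    'State': ['ARIZONA', 'CALIFORNIA', 'COLORADO', 'ARIZONA', 'CALIFORNIA'],
    'Milk Production (Lbs)': [4, 35, 4, 1, 10],
})

# sum production and count distinct reporting states per Year and Period
national = (
    toy_monthly
    .groupby(['Year', 'Period'])
    .agg(**{
        'Milk Production (Lbs)': ('Milk Production (Lbs)', 'sum'),
        'States Reporting': ('State', 'nunique'),
    })
    .reset_index()
)
print(national)
```

Years with fewer reporting states would then be easy to flag or exclude before comparing traces across decades.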


# Display the line chart for the national aggregation

In [35]:
# init the figure
fig = go.FigureWidget()
# add some traces
for yr in national_df.Year.unique():
    # select this year's rows once, ordered by calendar month
    year_df = national_df[national_df.Year == yr].sort_values(by = ['Month'])
    fig.add_trace(
        go.Scatter(
            x = year_df['Period'],
            y = year_df['Milk Production (Lbs)'],
            mode = 'lines+markers',
            name = str(yr),
            # show the two most recent years; leave the rest toggled off in the legend
            visible = True if yr in [2020, 2019] else 'legendonly',
        )
    )
# Add title
fig.update_layout(
    title = 'Monthly Milk Production, National',
    yaxis_title = 'Milk Production (Lbs)',
)
# display the figure
fig

FigureWidget({
    'data': [{'mode': 'lines+markers',
              'name': '2020',
              'type': 'sca…

**Display the table that feeds this visualization**

In [36]:
national_df.head(12)

Unnamed: 0,Year,Period,Milk Production (Lbs),Month
0,2020,JUL,17800000000,7
1,2020,JUN,17486000000,6
2,2020,MAY,18049000000,5
3,2020,APR,17778000000,4
4,2020,MAR,18455000000,3
5,2020,FEB,17031000000,2
6,2020,JAN,17956000000,1
7,2019,DEC,17517000000,12
8,2019,NOV,16699000000,11
9,2019,OCT,17299000000,10


# Experiment with the visualizations a bit more
## Let's try that again with a stacked area chart 
**to better illustrate "how the states stack up"**

## This vis is good, but slows down the browser
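
The cell under this heading adds one trace per state per year, which for dozens of states and nearly a century of data means thousands of traces. That trace count is a plausible (unverified) cause of the slowdown. A minimal sketch of one way to shrink it, assuming it is acceptable to keep only the largest producers and merge the rest into a single `OTHER` series; the helper `lump_small_states`, its `top_n` parameter, and the toy numbers are all hypothetical.

```python
import pandas as pd

def lump_small_states(df: pd.DataFrame, top_n: int) -> pd.DataFrame:
    """Keep the top_n states by total production and merge the rest into 'OTHER'."""
    value_col = 'Milk Production (Lbs)'
    # rank states by their total production across all rows
    top_states = df.groupby('State')[value_col].sum().nlargest(top_n).index
    # relabel every state outside the top_n as 'OTHER'
    lumped = df.assign(State=df['State'].where(df['State'].isin(top_states), 'OTHER'))
    # re-aggregate so 'OTHER' becomes one row per Year and Period
    return lumped.groupby(['Year', 'Period', 'State'], as_index=False)[value_col].sum()

# toy stand-in for monthly_df with made-up numbers
toy_monthly = pd.DataFrame({
    'Year': [2019] * 4,
    'Period': ['JAN'] * 4,
    'State': ['CALIFORNIA', 'WISCONSIN', 'ALASKA', 'HAWAII'],
    'Milk Production (Lbs)': [35, 26, 1, 2],
})
lumped_df = lump_small_states(toy_monthly, top_n=2)
print(lumped_df)
```

Feeding a frame like `lumped_df` into the stacked area loop would cap the traces at `top_n + 1` per year; restricting the loop to a handful of years would cut the count further.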

In [37]:
'''# init the figure
fig = go.FigureWidget()
for yr in monthly_df.Year.unique():
    # add some traces
    for st in monthly_df.State.unique():
        if yr in [2020, 2019]:
            fig.add_trace(
                go.Scatter(
                    x = monthly_df['Period'][monthly_df.Year == yr][monthly_df.State == st],
                    y = monthly_df['Milk Production (Lbs)'][monthly_df.Year == yr][monthly_df.State == st],
                    mode = 'lines',
                    fillcolor='lightgrey',
                    name = st + ' ' + str(yr),
                    stackgroup= str(yr),
                    legendgroup = str(yr),
                )
            )
        # add the rest of the traces toggled off
        else:
            fig.add_trace(
                go.Scatter(
                    x = monthly_df['Period'][monthly_df.Year == yr][monthly_df.State == st],
                    y = monthly_df['Milk Production (Lbs)'][monthly_df.Year == yr][monthly_df.State == st],
                    mode = 'lines',
                    name = st + ' ' + str(yr),
                    stackgroup = str(yr),
                    visible='legendonly',
                    legendgroup = str(yr),
                )
            )
# Add title
fig.update_layout(
    title = 'Monthly Milk Production',
    yaxis_title = 'Milk Production (Lbs)',
)
# display the figure
fig'''

## Previous drafts of stacked area charts

In [38]:
'''# init the figure
fig = go.FigureWidget()
for st in monthly_df.State.unique():
    # add some traces
    for yr in monthly_df.Year.unique():
        if yr in [2020, 2019]:
            fig.add_trace(
                go.Scatter(
                    x = monthly_df['Period'][monthly_df.Year == yr][monthly_df.State == st],
                    y = monthly_df['Milk Production (Lbs)'][monthly_df.Year == yr][monthly_df.State == st],
                    mode = 'lines+markers',
                    name = st + str(yr),
                    stackgroup= str(yr)
                )
            )
        # add the rest of the traces toggled off
        else:
            fig.add_trace(
                go.Scatter(
                    x = monthly_df['Period'][monthly_df.Year == yr][monthly_df.State == st],
                    y = monthly_df['Milk Production (Lbs)'][monthly_df.Year == yr][monthly_df.State == st],
                    mode = 'lines+markers',
                    name = st + str(yr),
                    stackgroup = str(yr),
                    visible='legendonly'
                )
            )
# Add title
fig.update_layout(
    title = 'Monthly Milk Production, Virginia',
    yaxis_title = 'Milk Production (Lbs)',
)
# display the figure
fig'''

In [39]:
'''# init the figure
fig = go.FigureWidget()
#for yr in monthly_df.Year.unique():
for yr in [2019]:
    # add some traces
    for st in monthly_df.State.unique():
        if yr in [2020, 2019]:
            fig.add_trace(
                go.Scatter(
                    x = monthly_df['Period'][monthly_df.Year == yr][monthly_df.State == st],
                    y = monthly_df['Milk Production (Lbs)'][monthly_df.Year == yr][monthly_df.State == st],
                    mode = 'lines',
                    fillcolor='lightgrey',
                    name = st + ' ' + str(yr),
                    stackgroup= str(yr),
                    legendgroup = str(yr)
                )
            )
        # add the rest of the traces toggled off
        else:
            fig.add_trace(
                go.Scatter(
                    x = monthly_df['Period'][monthly_df.Year == yr][monthly_df.State == st],
                    y = monthly_df['Milk Production (Lbs)'][monthly_df.Year == yr][monthly_df.State == st],
                    mode = 'lines',
                    name = st + ' ' + str(yr),
                    stackgroup = str(yr),
                    visible='legendonly',
                    legendgroup = str(yr)
                )
            )
# Add title
fig.update_layout(
    title = 'Monthly Milk Production',
    yaxis_title = 'Milk Production (Lbs)',
)
# display the figure
fig'''

In [40]:
'''# init the figure
fig = go.FigureWidget()
for yr in monthly_df.Year.unique():
    # add some traces
    for st in monthly_df.State.unique():
        if yr in [2020, 2019]:
            fig.add_trace(
                go.Scatter(
                    x = monthly_df['Period'][monthly_df.Year == yr][monthly_df.State == st],
                    y = monthly_df['Milk Production (Lbs)'][monthly_df.Year == yr][monthly_df.State == st],
                    mode = 'lines',
                    fillcolor='lightgrey',
                    name = st + ' ' + str(yr),
                    stackgroup= str(yr),
                    legendgroup = str(yr),
                    showlegend = False
                )
            )
        # add the rest of the traces toggled off
        else:
            fig.add_trace(
                go.Scatter(
                    x = monthly_df['Period'][monthly_df.Year == yr][monthly_df.State == st],
                    y = monthly_df['Milk Production (Lbs)'][monthly_df.Year == yr][monthly_df.State == st],
                    mode = 'lines',
                    name = st + ' ' + str(yr),
                    stackgroup = str(yr),
                    visible='legendonly',
                    legendgroup = str(yr),
                    showlegend = False
                )
            )
# Add title
fig.update_layout(
    title = 'Monthly Milk Production',
    yaxis_title = 'Milk Production (Lbs)',
)
# display the figure
fig'''

**plotly (css) compatible colors**

aliceblue, antiquewhite, aqua, aquamarine, azure,
            beige, bisque, black, blanchedalmond, blue,
            blueviolet, brown, burlywood, cadetblue,
            chartreuse, chocolate, coral, cornflowerblue,
            cornsilk, crimson, cyan, darkblue, darkcyan,
            darkgoldenrod, darkgray, darkgrey, darkgreen,
            darkkhaki, darkmagenta, darkolivegreen, darkorange,
            darkorchid, darkred, darksalmon, darkseagreen,
            darkslateblue, darkslategray, darkslategrey,
            darkturquoise, darkviolet, deeppink, deepskyblue,
            dimgray, dimgrey, dodgerblue, firebrick,
            floralwhite, forestgreen, fuchsia, gainsboro,
            ghostwhite, gold, goldenrod, gray, grey, green,
            greenyellow, honeydew, hotpink, indianred, indigo,
            ivory, khaki, lavender, lavenderblush, lawngreen,
            lemonchiffon, lightblue, lightcoral, lightcyan,
            lightgoldenrodyellow, lightgray, lightgrey,
            lightgreen, lightpink, lightsalmon, lightseagreen,
            lightskyblue, lightslategray, lightslategrey,
            lightsteelblue, lightyellow, lime, limegreen,
            linen, magenta, maroon, mediumaquamarine,
            mediumblue, mediumorchid, mediumpurple,
            mediumseagreen, mediumslateblue, mediumspringgreen,
            mediumturquoise, mediumvioletred, midnightblue,
            mintcream, mistyrose, moccasin, navajowhite, navy,
            oldlace, olive, olivedrab, orange, orangered,
            orchid, palegoldenrod, palegreen, paleturquoise,
            palevioletred, papayawhip, peachpuff, peru, pink,
            plum, powderblue, purple, red, rosybrown,
            royalblue, rebeccapurple, saddlebrown, salmon,
            sandybrown, seagreen, seashell, sienna, silver,
            skyblue, slateblue, slategray, slategrey, snow,
            springgreen, steelblue, tan, teal, thistle, tomato,
            turquoise, violet, wheat, white, whitesmoke,
            yellow, yellowgreen