# **Impact of COVID-19 on Digital Learning in the United States**

**We are here to take a look at the impact of the pandemic on students and digital learning tools. The 2020-2021 school year posed a great challenge for teachers and students alike, but has it also given us a new direction that we could embrace?**

**If yes, what are the factors contributing to it?**


![](http://static01.nyt.com/images/2020/03/18/business/18Techfix-illo/18Techfix-illo-articleLarge.gif?quality=75&auto=webp&disable=upscale)

## Import libraries 📚

In [None]:
import numpy as np 
import pandas as pd
import os
import glob
import warnings
import matplotlib.pyplot as plt
import seaborn as sns
import re
import calendar
import plotly as py
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from datetime import datetime
from datetime import date
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', None)
%matplotlib inline

# District Data
distdf = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")

# Products Data
proddf = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")

# External Datasets
# U.S. Policy Data (with date of announcement)
policy_df = pd.read_csv('../input/digitallearninghelperdatasets/policydata.csv')

# Internet speed available in each state
speed_df = pd.read_csv('../input/digitallearninghelperdatasets/speedbystate.csv')

# Expenditure per student in each state
expenditure = pd.read_csv('../input/digitallearninghelperdatasets/ExpendPerStudent_Statewise.csv')

# Demography Data (Gender, Ethnic Origin, Race, Age, Population)
state_wise_demography = pd.read_csv('../input/digitallearninghelperdatasets/StatewiseDemography.csv')

# Schools data
schools_df = pd.read_csv('../input/schools-per-state-usa/state_schools.csv')

print("Data loaded successfully!")


# Let's have a look at our datasets and the available attributes


This step is crucial to any data analysis: it gives us a bird's-eye view of the details available for answering our question of how digital learning trends evolved during the pandemic.

However, this could also be worked the other way around by asking, "**What kind of questions can I answer from this dataset?**" 🤔

## Data Definition
### Engagement data
![newplot (2)](https://user-images.githubusercontent.com/69154768/134678776-b7dc6e16-5fe1-4686-bf25-1808b3ac75a9.png)
![image](https://user-images.githubusercontent.com/69154768/134679097-28b9734d-8437-43b3-898a-3e7d2961ad4d.png)
### District information data
![newplot](https://user-images.githubusercontent.com/69154768/134678803-c46f0aff-53cb-48fd-96fe-dae4cb0bd000.png)
![image](https://user-images.githubusercontent.com/69154768/134679186-979ffee4-956a-4b81-9528-c79b9017a15b.png)
### Product information data
![newplot (1)](https://user-images.githubusercontent.com/69154768/134678785-21bd5ee8-d840-43bd-9a14-5bec09a76a54.png)
![image](https://user-images.githubusercontent.com/69154768/134679282-5811f31b-46f0-489d-8dbc-4559679624bc.png)

# Time to get our hands dirty by cleaning the dataset 🧹
![](https://user-images.githubusercontent.com/69154768/134681086-f58aed7d-b7f4-4ffb-afab-19f20ca6dfdf.gif)

In [None]:
# Normalizing the column names (To allow us to make joins)
distdf.columns  = ['district_id', 'State', 'locale', 'pct_black/hispanic',
       'pct_free/reduced', 'county_connections_ratio', 'pp_total_raw']

# Dropping rows where the State is missing (NaN)
distdf.dropna(subset=['State'],inplace = True)
assert distdf['State'].isna().sum() == 0

In [None]:
# Dropping the districts that do not have engagement index data for all the days of the year

PATH = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data' 
temp = []

for district in distdf.district_id.unique():
    df = pd.read_csv(f'{PATH}/{district}.csv', index_col=None, header=0)
    df["district_id"] = district
    if df.time.nunique() == 366:
        temp.append(df)
    
engagement = pd.concat(temp)
engagement = engagement.reset_index(drop=True)

# Only consider districts with full 2020 engagement data
distdf = distdf[distdf['district_id'].isin(engagement.district_id.unique())].reset_index(drop=True)
distdf.fillna(0, inplace = True)
proddf = proddf[proddf['LP ID'].isin(engagement.lp_id.unique())].reset_index(drop=True)
engagement = engagement[engagement.lp_id.isin(proddf['LP ID'].unique())]

# Handling invalid datatypes
engagement['time'] = pd.to_datetime(engagement['time'])

# Deriving Weekday and Month from the date column 
engagement['weekday'] = engagement['time'].dt.weekday
engagement['month'] = engagement['time'].dt.month
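
The 366-day threshold used above reflects that 2020 was a leap year, so a district with complete coverage has one distinct date per calendar day. Here is a minimal sketch of the same idea on a toy frame. The district ids are made up for illustration, and `keep_full_coverage` is a hypothetical helper that is not part of this notebook.

```python
import pandas as pd

def keep_full_coverage(df, start="2020-01-01", end="2020-12-31"):
    """Keep districts whose number of distinct dates equals the number of calendar days."""
    expected_days = len(pd.date_range(start, end))  # 366 for the leap year 2020
    days_per_district = df.groupby("district_id")["time"].nunique()
    full_ids = days_per_district[days_per_district == expected_days].index
    return df[df["district_id"].isin(full_ids)]

# Toy data: district 1 covers every day of 2020, district 2 only January.
all_days = pd.date_range("2020-01-01", "2020-12-31")
toy = pd.concat([
    pd.DataFrame({"district_id": 1, "time": all_days}),
    pd.DataFrame({"district_id": 2, "time": all_days[:31]}),
])
kept = keep_full_coverage(toy)
print(len(all_days), kept["district_id"].unique())
```

Counting distinct dates with `groupby(...).nunique()` avoids hard-coding 366, which may help if the same filter is reused for a different date range.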

In [None]:
# More cleaning!

# Split the Primary Essential Function into main and sub categories, and one-hot encode the sector values
proddf['primary_function_main'] = proddf['Primary Essential Function'].apply(lambda x: x.split(' - ')[0] if x == x else x) #if condition handles Null values which could throw an attribute error
proddf['primary_function_sub'] = proddf['Primary Essential Function'].apply(lambda x: x.split(' - ')[1] if x == x else x)

# 'Primary Essential Function' is kept as-is; the renamed column list below still includes it

proddf['primary_function_sub'] = proddf['primary_function_sub'].replace({'Sites, Resources & References' : 'Sites, Resources & Reference'})

temp_sectors = proddf['Sector(s)'].str.get_dummies(sep="; ")
temp_sectors.columns = [f"sector_{re.sub(' ', '', c)}" for c in temp_sectors.columns]
proddf = proddf.join(temp_sectors)
proddf.drop("Sector(s)", axis=1, inplace=True)

# Normalizing the column names (To allow us to make joins)
proddf.columns = ['lp_id', 'URL', 'Product Name', 'Provider/Company Name',
       'Primary Essential Function', 'primary_function_main',
       'primary_function_sub', 'sector_Corporate', 'sector_HigherEd',
       'sector_PreK-12']

proddf.drop(proddf.loc[proddf['primary_function_main']=='LC/CM/SDO'].index, inplace=True)
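
The split and one-hot steps above can also be written with vectorized string methods. The sketch below is only an illustration on a toy table whose values are made up to imitate the "MAIN - sub" and "; "-separated formats. Note that `n=1` keeps everything after the first separator together, which differs slightly from taking element `[1]` of a full split.

```python
import pandas as pd

# Toy product table; the values are illustrative, not rows from products_info.csv.
toy = pd.DataFrame({
    "Primary Essential Function": ["LC - Digital Learning Platforms", "CM - Virtual Classroom", None],
    "Sector(s)": ["PreK-12; Higher Ed", "PreK-12", "Corporate"],
})

# Split "MAIN - sub" once; missing values stay missing in both new columns.
parts = toy["Primary Essential Function"].str.split(" - ", n=1, expand=True)
toy["primary_function_main"] = parts[0]
toy["primary_function_sub"] = parts[1]

# One indicator column per sector found in the "; "-separated string.
sectors = toy["Sector(s)"].str.get_dummies(sep="; ")
sectors.columns = ["sector_" + c.replace(" ", "") for c in sectors.columns]

print(toy[["primary_function_main", "primary_function_sub"]])
print(sectors)
```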

In [None]:
schools_df['No.of public schools'] = schools_df['No.of public schools'].apply(lambda x: re.sub('[^A-Za-z0-9]+', '', str(x)))
schools_df['No.of public schools'] = schools_df['No.of public schools'].astype(int)

schools_df.columns = ['State', 'Code', 'No.of public schools']

expenditure.columns = ['State', 'K12 Spending', 'Postsecondary Spending']

expenditure['K12 Spending']=  expenditure['K12 Spending'].str[1:]
expenditure['K12 Spending'] = expenditure['K12 Spending'].apply(lambda x: re.sub('[^A-Za-z0-9]+', '', str(x)))
expenditure['K12 Spending']=  expenditure['K12 Spending'].astype(int)

expenditure['Postsecondary Spending']=  expenditure['Postsecondary Spending'].str[1:]
expenditure['Postsecondary Spending'] = expenditure['Postsecondary Spending'].apply(lambda x: re.sub('[^A-Za-z0-9]+', '', str(x)))
expenditure['Postsecondary Spending']=  expenditure['Postsecondary Spending'].astype(int)


In [None]:
# Function to convert 'range' like values to their average values (Ex. [0.3,0.5] --> 0.4)
def range_to_mean(val):
        if val != 0:
                val = re.sub('[^.,A-Za-z0-9]+', '', str(val))
                val  = val.split(',')
                vals = (float(val[0]) +float(val[1]))/2
                vals = round(vals, 2)
                return vals 
        else:
                return np.nan
            

# List of all the columns with range values
range_list = ['pct_black/hispanic','pct_free/reduced','county_connections_ratio','pp_total_raw']


for ele in range_list:
    col_name = 'mean_'+ele
    distdf[col_name] = distdf[ele].apply(lambda x: range_to_mean(x))
    distdf.drop( columns = ele , inplace = True)
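
To make the range-to-mean conversion concrete, here is a small standalone sketch. `range_midpoint` is a hypothetical variant written for illustration, not the function used above: it extracts the two numbers with a regular expression and returns NaN when it cannot find exactly two. The half-open bracket style in the second call is an assumption about how the raw values look.

```python
import math
import re

def range_midpoint(text):
    """Return the midpoint of a range string such as '[0.3,0.5]', or NaN if it lacks two numbers."""
    numbers = re.findall(r"\d+(?:\.\d+)?", str(text))
    if len(numbers) != 2:
        return math.nan
    low, high = map(float, numbers)
    return round((low + high) / 2, 2)

print(range_midpoint("[0.3,0.5]"))       # the example from the comment above
print(range_midpoint("[14000, 16000["))  # assumed half-open bracket style
print(range_midpoint(0))                 # placeholder left by fillna(0)
```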

# Exploratory Data Analysis 🌟
![](http://healthcarecmi.files.wordpress.com/2020/12/growth-analysis.gif)

## Internet Speed

##### Let's start by looking at the average download speed of data available in each state

# Demography & Engagement Index

In [None]:
state_wise_demography.columns =  ['State', 'SEX_Val', 'Origin_Val', 'Race_val', 
                                  'age_bins','CENSUS2010POP', 'POPESTIMATE2019']
state_wise_demography.drop(state_wise_demography.loc[state_wise_demography['SEX_Val']=='Total'].index, inplace=True)
state_wise_demography.drop(state_wise_demography.loc[state_wise_demography['Origin_Val']=='Total'].index, inplace=True)
state_wise_demography = state_wise_demography[ (state_wise_demography['age_bins']=='[0, 10)') | (state_wise_demography['age_bins']=='[10, 20)')]

In [None]:
demo_df = state_wise_demography.groupby(['State'])['POPESTIMATE2019'].sum().reset_index()
demo_df = demo_df.sort_values(by=['POPESTIMATE2019'],ascending=False)
pop = pd.merge(demo_df,schools_df, on='State', how ='inner')

fig = go.Figure()
layout = dict(
    title_text = "<b> Number of Public Schools Per State in 2020 </b>",
    font=dict(size=18),
    title_x = 0.5,
    geo_scope='usa',
)

fig.add_trace(
    go.Choropleth(
        locations=pop["Code"],
        z = pop['No.of public schools'],
        locationmode = 'USA-states', 
        marker_line_color='black',
        colorscale=px.colors.sequential.Plasma_r
    )
)
            
fig.update_layout(layout,template = 'plotly_dark')   
fig.show()

In [None]:
demo_df = state_wise_demography.groupby(['State', 'Race_val'])['POPESTIMATE2019'].sum().reset_index()
demo_df = demo_df.sort_values(by=['POPESTIMATE2019'],ascending=False)
eng_dist = engagement.merge(distdf, on='district_id', how = 'inner')

layout = dict(
    title_text = "<b> Average Engagement Index By State </b>",
    xaxis_title="State",
    yaxis_title="Mean Engagement Index",
    title_x = 0.5)

temp = eng_dist.groupby(['State'])['engagement_index'].mean().reset_index()
temp = temp.sort_values(by = 'engagement_index', ascending = False)

fig = px.bar(x=temp['State'],y= temp['engagement_index'],
             color = temp['engagement_index'],
             color_continuous_scale=px.colors.sequential.RdBu)

fig.update_layout(layout, template = 'plotly_dark')   
fig.show()

### **Engagement Index does not appear to be correlated with the student population of the State**
> * **California**, with a population of over *2.5 million* (0-20 years), has an engagement index roughly the **same** as **Indiana**, where the population is less than *0.5 million*.
> * **New York**, the busiest state, with a population of just over *1.5 million*, has the ***highest engagement index***.
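
Comparing a few states by eye is a start, but a correlation coefficient over all states would back the observation up. Below is a minimal sketch with hypothetical numbers (they are not taken from the dataset) that computes both the Pearson coefficient and the Spearman rank coefficient, the latter as the Pearson correlation of the ranks.

```python
import pandas as pd

# Hypothetical numbers for illustration only; they are not taken from the dataset.
toy = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E"],
    "population_0_20": [2_500_000, 1_500_000, 900_000, 450_000, 300_000],
    "mean_engagement": [150.0, 400.0, 120.0, 155.0, 210.0],
})

# Pearson correlation measures linear association.
pearson = toy["population_0_20"].corr(toy["mean_engagement"])

# Spearman correlation is the Pearson correlation of the ranks.
spearman = toy["population_0_20"].rank().corr(toy["mean_engagement"].rank())

print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
```

A coefficient close to zero on the real per-state table would support the claim. With only a handful of states in the data, the estimate should be read with caution.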


In [None]:
temp1 = demo_df[demo_df.State.isin(eng_dist.State.unique())]
temp2 = eng_dist[eng_dist.State.isin(demo_df.State.unique())].groupby(['State'])['engagement_index'].mean().reset_index()
temp = temp1.merge(temp2, on= 'State', how= 'inner')

x = temp['State'].unique()[:10]

y1 = temp[temp['Race_val']=='American Indian or Alaska Native']['POPESTIMATE2019'].values[:10]
y2 = temp[temp['Race_val']=='Asian ']['POPESTIMATE2019'].values[:10]
y3 = temp[temp['Race_val']=='Black']['POPESTIMATE2019'].values[:10]
y4 = temp[temp['Race_val']=='Native Hawaiian and Other Pacific Islander ']['POPESTIMATE2019'].values[:10]
y5 = temp[temp['Race_val']=='Two or more races']['POPESTIMATE2019'].values[:10]

temp = temp.groupby(['State', 'engagement_index'])['POPESTIMATE2019'].sum().reset_index()
temp = temp.sort_values(by ='POPESTIMATE2019' , ascending = False)

y_line = temp['engagement_index'].values[:10]

fig = go.Figure(go.Bar(x=x, y=y1, name='American Indian or Alaska Native'))
fig.add_trace(go.Bar(x=x, y=y2,  name='Asian'))
fig.add_trace(go.Bar(x=x, y=y5, name='Two or more races'))
fig.add_trace(go.Bar(x=x, y=y4, name='Native Hawaiian and Other Pacific Islander'))
fig.add_trace(go.Bar(x=x, y=y3, name='Black'))
fig.add_trace(go.Scatter(x=x, name="Engagement Index",  line=dict(color="yellow") ,y=y_line*(10**3.5)))

fig.update_layout(barmode='stack', 
                  title_x = 0.5,
                  title_text = "<b> Population and Average Engagement Index By State </b>",
                  xaxis_title="State",
                  yaxis_title="Population and Engagement Index",
                  template="plotly_dark", 
                  colorway=px.colors.qualitative.Pastel)
 
fig.show()

### **K-12 spending appears to matter more for the Engagement Index**
> * **New York is the state that spends the most on both K-12 and postsecondary education.**
> * **Focusing on K-12 education appears to play a key role in the average Engagement Index of the State.**

In [None]:
exp_eng_df = pd.merge(expenditure,
                      eng_dist.groupby(['State'])['engagement_index'].mean().reset_index(),
                      on='State', how ='inner')

x = exp_eng_df['State']
y1 = exp_eng_df['K12 Spending']
y2 = exp_eng_df['Postsecondary Spending']
y3 = exp_eng_df['engagement_index']*(10**1.7)


fig = go.Figure()

fig.add_trace(go.Scatter(x=x, y=y1,
                    mode='lines+markers',
                    name='K12 Spending'))
fig.add_trace(go.Scatter(x=x, y=y2,
                    mode='lines+markers',
                    name='Postsecondary Spending'))
fig.add_trace(go.Bar(x=x, y=y3,
                    name='Engagement Index'))

fig.update_layout(title_text = "<b> Expenditure and Engagement Index By State </b>",
                  title_x = 0.5,
                  xaxis_title="State",
                  yaxis_title="Expenditure and Engagement Index", 
                  colorway= ['#fc4503','#3afa05','#4257f5'],
                  template="plotly_dark")
fig.show()

## Impact of Internet speed in Digital Learning/Classroom
High speed Internet enhances every level of education from kindergarten through high school to college to graduate school. Advances in information and communications technology means that education is no longer confined to the classroom. New broadband-enabled educational tools allow for remote collaboration among fellow students on projects, videoconferences with teachers and real-time video exploration of faraway areas. The educational advantage possible with high speed Internet has become indispensable to students preparing to enter the 21st Century workforce.

Source: https://speedmatters.org/speedmatters-2/k-12-education

* Utah has the highest internet speed statewise with 296.63 Mb/s.


In [None]:
fig = go.Figure()
layout = dict(
    title_text = "<b> Speed By State in 2020 </b>",
    font=dict(size=18),
    title_x = 0.5,
    geo_scope='usa',
)

fig.add_trace(
    go.Choropleth(zmin = 50, zmax = 200,
        locations=speed_df["stateabbr"],
        hovertext=speed_df["State"],
        z = speed_df["avg_maxaddown"],
        locationmode = 'USA-states', 
        marker_line_color='black',
        colorscale=px.colors.sequential.Turbo
    )
)
            
fig.update_layout(layout,template = 'plotly_dark')   
fig.show()


### Is Internet speed a relevant factor for Engagement index?
It seems intuitive that internet speed would play a major role in enabling digital learning. But here's the catch: let's see what the engagement index looks like for these states.

* Utah, with the highest internet speed of 296.63 Mb/s, has an engagement index of 52.
* However, Arizona, with an internet speed of 94.49 Mb/s, has an engagement index of 214.

These two examples suggest that higher internet speed does not necessarily lead to better engagement of students.

In [None]:
speed_df.drop(speed_df.loc[speed_df['Year']==2019].index, inplace=True)
speed_df = speed_df.replace(['District of Columbia'],'District Of Columbia')
plot_data = speed_df.merge(eng_dist, how = 'inner', on = 'State')
top5 = plot_data.groupby(['State','avg_maxaddown'])['engagement_index'].mean().reset_index().sort_values(by = 'avg_maxaddown', ascending = False)[:5]
last5 = plot_data.groupby(['State','avg_maxaddown'])['engagement_index'].mean().reset_index().sort_values(by = 'avg_maxaddown', ascending = False)[-5:]

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=("Top 5 states with high speed internet connections","Top 5 states with low speed internet connections"))


fig.add_trace(go.Bar(x=top5['State'], 
                         y=top5['avg_maxaddown'],
                         name='Internet Speed'),row=1, col=1)

fig.add_trace(go.Scatter(x=top5['State'], 
                            y=top5['engagement_index']*(10**-.55),mode='markers+lines',
                             name="Engagement Index", 
                             marker=dict(size=10),line=dict(width=6)),row=1, col=1)

fig.add_trace(go.Bar(x=last5['State'], 
                         y=last5['avg_maxaddown'], 
                         name='Internet Speed', showlegend=False),row=1, col=2)

fig.add_trace(go.Scatter(x=last5['State'], 
                             y=last5['engagement_index']*(10**-.6),
                             name="Engagement Index",
                             mode='markers+lines',
        marker=dict(size=10),line=dict(width=6), showlegend=False),row=1, col=2)

fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide', 
                  title_x = 0.45,
                  xaxis_title="State",
                  yaxis_title="Internet Speed (Mb/s) and Scaled Engagement Index",
                  template="plotly_dark",
                  colorway=['#f3cec9', '#479f77'])

fig['layout']['xaxis']['title']='State'
fig['layout']['xaxis2']['title']='State'
fig['layout']['yaxis']['title']='Internet Speed (Mb/s) and Scaled Engagement Index'
fig.show()

# Locale

### How does Engagement Index vary with locale ?

From the given dataset, we can observe that the engagement index from the locale 'city' is the least. Perhaps more relevant data could alter this observation.

In [None]:
temp = eng_dist.groupby(['locale'])['engagement_index'].mean().reset_index()
labels = temp['locale']
values = temp['engagement_index']

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.5, pull=[0.1, 0, 0, 0])])

fig.update_layout(
                  title_text = "<b> Locale and Engagement Index </b>",
                  title_x = 0.5, 
                  colorway=px.colors.cyclical.Edge_r
                  )
fig.show()

#### From the plot below, we can observe that pct_access is positively correlated with engagement index. 

In [None]:
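# Restrict to numeric columns so the datetime 'time' column does not break corr()
px.imshow(engagement.select_dtypes('number').corr())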

#### Is this correlation affected by the locale?

* Yes: in the plot below, the relationship stays more or less the same for all the locales except 'City'.

In [None]:
df_ = eng_dist.groupby(['locale']).agg({'pct_access':'mean', 'engagement_index':'mean'}).reset_index()
df_ = df_.sort_values(by=['pct_access'], ascending=False)
fig = go.Figure()

fig.add_trace(go.Scatterpolar(
      r=df_['engagement_index']/100,
      theta=df_['locale'],
      fill='toself',
      name='Engagement Index'
))
fig.add_trace(go.Scatterpolar(
      r=df_['pct_access'],
      theta=df_['locale'],
      fill='toself',
      name='PCT Access'
))
fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[0, 5]
    )),template="plotly_dark",
  showlegend=True,
title=" <b> Correlation between Engagement Index and Locales </b>",
title_x = 0.5
)

fig.show()
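
The radar chart compares per-locale means. To check whether the correlation itself differs by locale, one option is to compute a coefficient inside each group. The sketch below does this on hypothetical rows standing in for the merged engagement/district table; the column names follow the notebook, the values are made up.

```python
import pandas as pd

# Hypothetical rows standing in for the merged engagement/district table.
toy = pd.DataFrame({
    "locale": ["City"] * 4 + ["Rural"] * 4,
    "pct_access": [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4],
    "engagement_index": [40.0, 10.0, 30.0, 20.0, 10.0, 20.0, 30.0, 40.0],
})

# One Pearson coefficient per locale, computed on the rows of that locale only.
corr_by_locale = {
    name: group["pct_access"].corr(group["engagement_index"])
    for name, group in toy.groupby("locale")
}
print(corr_by_locale)
```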

# Products 

### Distribution of the products by their primary and sub functions

> A huge chunk of the products are primarily used for "Learning & Curriculum (LC)", followed by "Classroom Management (CM)" and "School & District Operations (SDO)".

In [None]:
labels = proddf['primary_function_main'].value_counts().index
values = proddf['primary_function_main'].value_counts().tolist()

fig = go.Figure(data=[(go.Pie(labels=labels, values=values, hole=.5))])

fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide', 
                  title_text=" <b> Main - Primary Function Distribution </b>",
                  title_x = 0.5, 
                  colorway=px.colors.cyclical.Edge_r
                  )
fig.show()

In [None]:
fig = px.bar(x=proddf['primary_function_sub'].value_counts().index,
             y=  proddf['primary_function_sub'].value_counts().tolist(),
             color = proddf['primary_function_sub'].value_counts().tolist(),
             color_continuous_scale=px.colors.sequential.Agsunset_r,
             )

fig.update_layout(template = 'plotly_dark', font_size = 10,
                  title_x = 0.5, 
                  yaxis_title = "Count of Products",
                  xaxis_title = "Sub Function Category",
                  title_text=" <b> Sub Function Distribution </b>")

### How has the mean engagement index varied over time for the different product categories?

> We can see an upward trend in the engagement index across all the primary categories over time, with the largest increase for the Classroom Management (CM) tools, which likely became a necessity for virtual schooling.

> This increase is followed by Learning & Curriculum (LC) tools, which also gained momentum, albeit slowly.

> Overall, digital learning picked up pace during the second half of 2020.

In [None]:
trend_eng = pd.merge(engagement, proddf, how="inner" , on= ['lp_id'])
trend_eng = trend_eng.groupby(['time','primary_function_main'])['engagement_index'].mean()
trend_eng = trend_eng.reset_index()
trend_eng['time'] = pd.to_datetime(trend_eng['time'])
policy_df['STEMERG'] = pd.to_datetime(policy_df['STEMERG'])
policy_df['CLSCHOOL'] = pd.to_datetime(policy_df['CLSCHOOL'])

fig = px.scatter(trend_eng, x="time", y='engagement_index',
                 color='primary_function_main',trendline="ols",template="plotly_dark",
                 color_discrete_sequence=px.colors.qualitative.Plotly_r, height =350, width = 950)

fig.add_vrect(x0= policy_df['STEMERG'].mode()[0],
              x1 = policy_df['CLSCHOOL'].mode()[0],
              annotation_text="State of emergency issued and K12 schools closed",
              annotation_position="top left",
              annotation=dict(font_size=10),
              fillcolor="red", opacity=0.7, line_width=1)

fig.update_traces(mode = 'lines')
fig.show()

**The plot below helps explain the dips in the line chart above**

> The minor dips appear to be caused by weekends.
>
> The summer holiday can explain the relatively low engagement index between June 20 and August 16.

In [None]:
df_eng = engagement.groupby(['weekday','month'])["engagement_index"].mean().reset_index()
# Keyword arguments keep this call working on newer pandas versions
df_eng = df_eng.pivot(index="month", columns="weekday", values="engagement_index")
fig = px.imshow(df_eng,color_continuous_scale='Reds')
fig.update_layout(template = 'plotly_dark')
fig.show()
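
The weekend effect can also be checked numerically by labelling each date as weekday or weekend and comparing the two means. This is a rough sketch on hypothetical daily values, with weekends set lower on purpose, so it only illustrates the mechanics.

```python
import pandas as pd

# Hypothetical daily values for two weeks; weekends are set lower on purpose.
days = pd.date_range("2020-03-02", periods=14, freq="D")  # 2020-03-02 is a Monday
toy = pd.DataFrame({"time": days})

# dt.weekday uses Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
toy["day_type"] = (toy["time"].dt.weekday >= 5).map({True: "weekend", False: "weekday"})
toy["engagement_index"] = toy["day_type"].map({"weekend": 20.0, "weekday": 100.0})

# Mean engagement for weekdays versus weekends.
by_day_type = toy.groupby("day_type")["engagement_index"].mean()
print(by_day_type)
```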

### The top 10 products that show a positive trend in engagement index in the second half of the year

In [None]:
bins = [np.datetime64(date(2020,1,1)),
         np.datetime64(date(2020,3,1)), 
         np.datetime64(date(2020,6,19)),
         np.datetime64(date(2020,8,16)),
         np.datetime64(date(2020,12,31))]
        
group_names = ['Jan-Mar', 'Apr-Jun19','Jun20-Aug16','Aug17-Dec31']
engagement['time'] = pd.to_datetime(engagement['time'])
engdf = engagement.copy()
engdf.set_index('time', inplace=True)

engdf['datecategory'] = pd.cut(engdf.index, bins, labels=group_names)
#print(engdf)

engdf = engdf.groupby(['lp_id','datecategory'])['engagement_index'].mean()
engdf = pd.DataFrame(engdf).reset_index()
engdf['lp_id'] = engdf['lp_id'].astype('category')
tdf = engdf.pivot_table( values = 'engagement_index' ,
                            index= ['lp_id'],
                             columns =['datecategory'])

tdf.rename_axis(None,axis=1).reset_index(drop= True)
tdf = tdf.rename_axis(None,axis=1).reset_index()
tdf = tdf.replace(np.nan, 0)
left = tdf[(tdf['Jan-Mar'] + tdf['Apr-Jun19'])  <  tdf['Aug17-Dec31']]
result = pd.merge(left, proddf, how="inner" , on= ['lp_id'])

# Relative change of the late-period mean against the summed early-period means
# (0.1 is added to the denominator to avoid division by zero)
result['pct_chg'] = (result['Aug17-Dec31'] - (result['Jan-Mar'] + result['Apr-Jun19'])) / ((result['Jan-Mar'] + result['Apr-Jun19']) + 0.1)
result = result[result['pct_chg'] > 0]
result = result.sort_values(['pct_chg'],ascending=False)[:10]
result = result.merge(engagement.groupby(['lp_id','time'])['engagement_index'].mean().reset_index(),
            how = 'left',
            on = 'lp_id')

fig = px.scatter(result, x=result['time'], trendline="ols", template="plotly_dark",
                 y=result['engagement_index'],color_discrete_sequence= px.colors.sequential.Rainbow,
                 color='Product Name', height =400, width = 1000)

fig.add_vrect(x0= policy_df['STEMERG'].mode()[0],
              x1 = policy_df['CLSCHOOL'].mode()[0],
             annotation_text="State of emergency issued and K12 schools closed",
              annotation_position="top left",
              annotation=dict(font_size=10),
              fillcolor="red", opacity=0.5, line_width=0.2)

fig.update_traces(mode = 'lines' )
fig.show()
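
A note on how `pd.cut` treats the bin edges used in the cell above: intervals are right-closed by default, so the very first edge (1 January) falls outside every bin unless `include_lowest=True` is passed. The sketch below reuses the same edges and labels on a few boundary dates. Despite its name, the `Jan-Mar` bin ends on 1 March.

```python
import pandas as pd

# Same bin edges and labels as in the cell above.
edges = pd.to_datetime(["2020-01-01", "2020-03-01", "2020-06-19", "2020-08-16", "2020-12-31"])
labels = ["Jan-Mar", "Apr-Jun19", "Jun20-Aug16", "Aug17-Dec31"]

# A few dates that sit exactly on, or just after, a bin edge.
dates = pd.Series(pd.to_datetime(
    ["2020-01-01", "2020-03-01", "2020-03-02", "2020-08-16", "2020-12-31"]
))

default_cut = pd.cut(dates, bins=edges, labels=labels)
inclusive_cut = pd.cut(dates, bins=edges, labels=labels, include_lowest=True)

print(pd.DataFrame({"date": dates, "default": default_cut, "include_lowest": inclusive_cut}))
```

If rows from 1 January should count towards the first period, passing `include_lowest=True` is one way to keep them.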

### Popular products in each locale

> Google Docs, Google Classroom, YouTube and Canvas remain popular across all the locales, while products such as Kahoot and Schoology have a good base in Town and Rural areas.

In [None]:

loc_pd = eng_dist.groupby(['locale','lp_id'])['engagement_index'].mean().reset_index().sort_values('engagement_index',ascending = False).groupby('locale').head().sort_values(by='locale')
res = pd.merge(loc_pd, proddf, how="inner" , on= ['lp_id'])
res = res[['locale','Product Name','engagement_index']]
res = res.sort_values(by=['locale','engagement_index'],ascending = False)

fig = go.Figure()
fig = make_subplots(2, 2, horizontal_spacing=0.1, vertical_spacing=0.3, 
                    y_title='Mean Engagement Index',
                   subplot_titles=("Town","City",'Rural','Suburb'))
fig.add_trace(go.Bar(x= res[res['locale'] == 'Town']['Product Name'], 
                     y=res[res['locale'] == 'Town']['engagement_index']),1,1)


fig.add_trace(go.Bar(x= res[res['locale'] == 'City']['Product Name'], 
                     y=res[res['locale'] == 'City']['engagement_index']),1,2)
fig.add_trace(go.Bar(x= res[res['locale'] == 'Rural']['Product Name'], 
                     y=res[res['locale'] == 'Rural']['engagement_index']),2,1)
fig.add_trace(go.Bar(x= res[res['locale'] == 'Suburb']['Product Name'], 
                     y=res[res['locale'] == 'Suburb']['engagement_index']),2,2)

fig.update_layout(title = '<b> Popular products in each locale </b>' ,
                  showlegend = False, title_x = 0.5, template = 'plotly_dark')
fig.show()

In [None]:
bins = [np.datetime64(date(2020,1,1)),
         np.datetime64(date(2020,3,1)), 
         np.datetime64(date(2020,6,19)),
         np.datetime64(date(2020,8,16)),
         np.datetime64(date(2020,12,31))]
        
group_names = ['Jan-Mar', 'Apr-Jun19','Jun20-Aug16','Aug17-Dec31']

engdf = eng_dist.copy()

engdf['datecategory'] = pd.cut(engdf.time, bins, labels=group_names,include_lowest=True)

engdf = engdf.groupby(['lp_id','datecategory','locale','district_id'])['engagement_index'].mean().reset_index()
engdf['lp_id'] = engdf['lp_id'].astype('category')
engdf = engdf.replace(np.nan, 0)
product_wise = engdf.pivot_table(values = 'engagement_index',
                                       index= ['lp_id','locale'],
                                       columns =['datecategory'])
product_wise.rename_axis(None,axis=1).reset_index(drop= True)

product_wise = product_wise.rename_axis(None,axis=1).reset_index()
left = product_wise[(product_wise['Jan-Mar'] + product_wise['Apr-Jun19'])  <  product_wise['Aug17-Dec31']]
result = pd.merge(left, proddf, how="inner" , on= ['lp_id'])

#calculating products with significant percentage change in trend
result['pct_chg'] = ((result['Aug17-Dec31'] - (result['Jan-Mar'] + result['Apr-Jun19'])))/ ((result['Jan-Mar'] + result['Apr-Jun19']) +0.1)
result  = result[['lp_id','Product Name','locale','pct_chg']]

# locale wise products
loc_pd = result.groupby(['locale','lp_id','Product Name'])['pct_chg'].mean().reset_index()

loc_pd  = loc_pd.sort_values(['pct_chg'],ascending=False).groupby(['locale']).head(5)

fig = go.Figure()
fig = make_subplots(2, 2,y_title='Relative Change in Engagement Index',
                   subplot_titles=('Suburb','Rural','City',"Town"))

fig.add_trace(go.Bar(x= loc_pd[loc_pd['locale'] == 'City']['Product Name'], 
                     y=loc_pd[loc_pd['locale'] == 'City']['pct_chg']),2,1)
fig.add_trace(go.Bar(x= loc_pd[loc_pd['locale'] == 'Rural']['Product Name'], 
                     y=loc_pd[loc_pd['locale'] == 'Rural']['pct_chg']),1,2)
fig.add_trace(go.Bar(x= loc_pd[loc_pd['locale'] == 'Suburb']['Product Name'], 
                     y=loc_pd[loc_pd['locale'] == 'Suburb']['pct_chg']),1,1)
fig.add_trace(go.Bar(x= loc_pd[loc_pd['locale'] == 'Town']['Product Name'], 
                     y=loc_pd[loc_pd['locale'] == 'Town']['pct_chg']),2,2)

fig.update_layout(title = ' <b> Products that gained popularity in each locale after summer holidays </b>', title_x = 0.5,
    template ='plotly_dark', showlegend = False)
fig.show()
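
For reference, the `pct_chg` score used above compares the mean of the last period with the sum of the means of the first two periods, with 0.1 added to the denominator. The plain-Python sketch below reproduces that arithmetic; `relative_change` is a hypothetical helper name and the input numbers are made up.

```python
def relative_change(jan_mar, apr_jun, aug_dec, eps=0.1):
    """Change of the late-period mean relative to the sum of the two early-period means."""
    baseline = jan_mar + apr_jun
    # eps keeps the denominator non-zero when a product had no early engagement.
    return (aug_dec - baseline) / (baseline + eps)

print(relative_change(10.0, 20.0, 90.0))  # product with a solid early baseline
print(relative_change(0.0, 0.0, 5.0))     # product with no early engagement
```

Because of the small constant in the denominator, products with almost no engagement early in the year can receive very large scores, so the ranking may favour products that were simply introduced late.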

# Acknowledgements

Thanks to [Leonie](https://www.kaggle.com/iamleonie) for [How To Approach Analytics Challenges](https://www.kaggle.com/iamleonie/how-to-approach-analytics-challenges), which helped us preprocess the data efficiently.

* [U.S Statewise Demography Data](https://www.kaggle.com/derinrobert/us-statewisedemography-info-20102019) wrangled from [United States Census Bureau](https://www.census.gov/data/tables/time-series/demo/popest/2010s-national-detail.html)

* [COVID-19 US State Policy Database](https://www.kaggle.com/cavfiumella/covid19-us-state-policy-database) by [Andrea Serpolla](https://www.kaggle.com/cavfiumella)

* [U.S. Public Education Spending Statistics](https://educationdata.org/public-education-spending-statistics) by [EDUCATIONDATA.ORG](https://educationdata.org/public-education-spending-statistics)


* [Number of Schools in the United States](https://www.statista.com/statistics/304974/us-public-elementary-and-secondary-schools-by-state/) from [statista.com](https://www.statista.com)


* [Fixed Broadband Deployment Data: June 2020 V1](https://opendata.fcc.gov/resource/4kuc-phrr.json) from [FCC Open Data](https://opendata.fcc.gov/Wireline/Fixed-Broadband-Deployment-Data-June-2020-V1/4kuc-phrr)

##### *Code to efficiently extract the average upload and download speed from FCC Open Data*
```
import requests
import pandas as pd

# Average the download/upload speeds per state on the server side
url = 'https://opendata.fcc.gov/resource/4kuc-phrr.json'
params = {'$select': 'stateabbr,avg(maxaddown),avg(maxadup)', '$group': 'stateabbr'}
response = requests.get(url, params=params)
print(response.status_code)  # 200 if the API call was successful
data = response.json()
speed2020df = pd.DataFrame(data)
```
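
JSON APIs of this kind may return the aggregated values as strings, so it is worth converting them to numbers before sorting or plotting. The sketch below works on a hard-coded payload in the shape the query is assumed to return. The column names mirror `avg_maxaddown` used earlier in this notebook, the two download speeds are the ones quoted above for Utah and Arizona, and the upload speeds are made up.

```python
import pandas as pd

# Hypothetical payload in the shape the query is assumed to return.
# Download speeds are the values quoted in this notebook; upload speeds are made up.
data = [
    {"stateabbr": "UT", "avg_maxaddown": "296.63", "avg_maxadup": "100.0"},
    {"stateabbr": "AZ", "avg_maxaddown": "94.49", "avg_maxadup": "20.5"},
]

speed = pd.DataFrame(data)

# Convert the string columns to floats; unparseable entries become NaN.
speed_cols = ["avg_maxaddown", "avg_maxadup"]
speed[speed_cols] = speed[speed_cols].apply(pd.to_numeric, errors="coerce")

fastest = speed.sort_values("avg_maxaddown", ascending=False).iloc[0]["stateabbr"]
print(speed.dtypes)
print(fastest)
```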

# Happy Learning!