# INTRODUCTION

COVID-19 is widely regarded as having affected nearly every aspect of human life around the world on an unprecedented scale. It is therefore important to understand how the pandemic affected each part of our lives.

This notebook analyzes the effect of COVID-19 on education. Because learning shifted to digital and online formats during the pandemic, the analysis uses data from LearnPlatform.

> This notebook is part of the analytics competition [LearnPlatform COVID-19 Impact on Digital Learning](https://www.kaggle.com/c/learnplatform-covid19-impact-on-digital-learning).

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import re
from datetime import datetime
from matplotlib.dates import date2num
import warnings
warnings.filterwarnings('ignore')

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:purple;
           font-size:100%;
           font-family:Verdana;
           letter-spacing:0.5px">
<h1 style="text-align: center;
           padding: 40px;
              color:white">
<a id="data"> </a>
1. Data Preparation
</h1>
</div>

**Competition data**

The data used in this notebook comes from [LearnPlatform COVID-19 Impact on Digital Learning](https://www.kaggle.com/c/learnplatform-covid19-impact-on-digital-learning).

It consists of three parts:

1. `products_info.csv`
2. `districts_info.csv`
3. `engagement_data` folder 

More information regarding the data is available in the link above.

**Additional Data**

In addition to the competition data, the following COVID-related datasets are used:

1. [COVID-19 US State Policy Database](https://www.kaggle.com/cavfiumella/covid19-us-state-policy-database)
    
    File name: `data.csv` - only selected columns of this data are used
2. [USA-statewise(COVID-19 cases)](https://www.kaggle.com/umeshkumar017/usastatewisecovid19-cases)
    
    File name: `all-states-history.csv`
    
    Columns used: `state`, `date` and `positiveIncrease`
3. [2019 Census US Population Data By State](https://www.kaggle.com/peretzcohen/2019-census-us-population-data-by-state)
    
    This dataset is used to express the daily increase in COVID-19 cases (the `positiveIncrease` column of the statewise COVID data) as a percentage of each state's population.
    
    File name: `2019_Census_US_Population_Data_By_State_Lat_Long.csv`
    
    Columns used: `STATE` and `POPESTIMATE2019`
4. [Summary of COVID-19 data for the United States](https://covidtracking.com/data/download)
    
    Daily increase rate of positive COVID-19 cases in the US.
    
    File name: `national-history.csv`
    
    Columns used: `date` and `positiveIncrease`

## Competition Data

In [None]:
districts_data = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")
products_data = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv",
                           usecols=["LP ID", "Product Name", "Sector(s)", "Primary Essential Function"])

In [None]:
districts_data.head()

In [None]:
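# extract the main primary function (the text before ' - ') from `Primary Essential Function`;
# `x == x` is False only for NaN, so missing values are passed through unchanged
products_data['primary_function_main'] = products_data['Primary Essential Function'].apply(lambda x: x.split(' - ')[0] if x == x else x)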
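The `x == x` test in the lambda above works because NaN is the only value that is not equal to itself. As a hedged alternative, pandas' vectorized string methods do the same extraction and propagate missing values on their own; the sample strings below are invented for illustration and only mimic the "Main - Sub" format of `Primary Essential Function`.

```python
import numpy as np
import pandas as pd

# hypothetical sample values in the "Main - Sub" format of `Primary Essential Function`
functions = pd.Series([
    "LC - Digital Learning Platforms",
    "SDO - Data, Analytics & Reporting",
    np.nan,
])

# .str.split(...).str[0] keeps the part before the first ' - '; NaN stays NaN
primary_main = functions.str.split(" - ").str[0]
print(primary_main.tolist())
```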

In [None]:
products_data.head()

In [None]:
PATH = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data' 

temp = []

for district in districts_data.district_id.unique():
    df = pd.read_csv(f'{PATH}/{district}.csv', index_col=None, header=0)
    df["district_id"] = district
    temp.append(df)
    
    
engagement = pd.concat(temp)
engagement = engagement.reset_index(drop=True)

engagement.head()
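The loop above tags each district file's rows with the district ID before concatenating. A minimal sketch of the same idea using the dictionary form of `pd.concat`, which turns the dictionary keys into an index level that can then become the `district_id` column. In-memory `io.StringIO` objects stand in for the CSV files, and the values are invented.

```python
import io
import pandas as pd

# hypothetical in-memory stand-ins for engagement_data/<district_id>.csv
files = {
    1000: io.StringIO("time,lp_id,pct_access,engagement_index\n"
                      "2020-01-01,101,0.0,\n2020-01-01,102,0.03,0.9\n"),
    1039: io.StringIO("time,lp_id,pct_access,engagement_index\n"
                      "2020-01-01,101,0.05,1.2\n"),
}

# the dict keys become the outer index level, named "district_id"
engagement_demo = (
    pd.concat({district: pd.read_csv(f) for district, f in files.items()},
              names=["district_id", None])
    .reset_index(level="district_id")   # move district_id into a column
    .reset_index(drop=True)             # fresh 0..n-1 row index
)
print(engagement_demo)
```

The explicit loop in the notebook is equally valid; the dict form simply avoids mutating each frame before concatenation.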

## Merging Datasets

Merge the competition datasets `districts_info` and `products_info` into the `engagement` data:

- use `LP ID` (matched against `lp_id`) to merge `products_info`;
- use `district_id` to merge `districts_info`.
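Both merges are left joins onto the engagement rows, so a product or district missing from the lookup table silently produces NaN values. A small sketch, on invented miniature tables, of two `DataFrame.merge` options that make such problems visible: `validate="many_to_one"` raises an error if a lookup key is duplicated, and `indicator=True` adds a `_merge` column that flags rows without a match.

```python
import pandas as pd

# hypothetical miniature versions of the three tables
engagement_demo = pd.DataFrame({"lp_id": [101, 102, 999], "district_id": [1000, 1000, 1039]})
products_demo = pd.DataFrame({"LP ID": [101, 102], "Product Name": ["Product A", "Product B"]})
districts_demo = pd.DataFrame({"district_id": [1000, 1039], "state": ["State X", "State Y"]})

# validate raises pandas.errors.MergeError if an LP ID appears twice in products_demo;
# indicator=True adds a `_merge` column ("both" or "left_only" for a left join)
merged = engagement_demo.merge(products_demo, left_on="lp_id", right_on="LP ID",
                               how="left", validate="many_to_one", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "lp_id"].tolist()
print("lp_id values without product info:", unmatched)

merged = (merged.drop(columns=["LP ID", "lp_id", "_merge"])
                .merge(districts_demo, on="district_id", how="left", validate="many_to_one"))
print(merged)
```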

In [None]:
engagement = pd.merge(engagement, products_data, left_on="lp_id", right_on="LP ID", how="left")

engagement.drop(['LP ID', "lp_id"], inplace=True, axis=1)

# convert 'time' column to data type datetime
engagement['time'] = pd.to_datetime(engagement['time'])
# create column for Month names
engagement['month'] = engagement['time'].dt.month_name()

engagement.head()

In [None]:
del products_data

In [None]:
engagement = pd.merge(engagement, districts_data, on="district_id", how="left")

engagement.head()

In [None]:
# Min-max normalize 'pct_access' and 'engagement_index' to the range [0, 1]
for column in ['pct_access', 'engagement_index']:
    engagement[column+'_norm'] = (engagement[column] - engagement[column].min()) / (engagement[column].max() - engagement[column].min())

# combined usage metric: product of the raw pct_access and engagement_index values
engagement['prdt_use'] = engagement['pct_access'] * engagement['engagement_index']

engagement.head()
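Min-max scaling maps a column to [0, 1] via x' = (x - min) / (max - min); it rescales rather than standardizes (standardizing would subtract the mean and divide by the standard deviation). A small sketch with invented numbers; the guard for a constant column, where max equals min and the formula would divide by zero, is an addition not present in the cell above.

```python
import pandas as pd

def min_max(series: pd.Series) -> pd.Series:
    """Rescale a numeric Series to [0, 1]; a constant Series maps to 0."""
    value_range = series.max() - series.min()
    if value_range == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / value_range

# invented values, for illustration only
demo = pd.DataFrame({"pct_access": [0.0, 0.5, 2.0],
                     "engagement_index": [10.0, 60.0, 110.0]})
for column in ["pct_access", "engagement_index"]:
    demo[column + "_norm"] = min_max(demo[column])

# combined metric on the raw (un-normalized) columns, as in the cell above
demo["prdt_use"] = demo["pct_access"] * demo["engagement_index"]
print(demo)
```

Here `pct_access_norm` becomes 0, 0.25, 1 and `engagement_index_norm` becomes 0, 0.5, 1.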

## Additional Data

#### US Statewise COVID Data

In [None]:
covid_statewise = pd.read_csv("../input/usastatewisecovid19-cases/all-states-history.csv", 
                              usecols=["date", "state", 'positiveIncrease'])

covid_statewise.head()

In [None]:
state_map_dict = {'AL': 'Alabama',
 'AK': 'Alaska',
 'AS': 'American Samoa',
 'AZ': 'Arizona',
 'AR': 'Arkansas',
 'CA': 'California',
 'CO': 'Colorado',
 'CT': 'Connecticut',
 'DE': 'Delaware',
 'DC': 'District of Columbia',
 'D.C.': 'District of Columbia',
 'FM': 'Federated States of Micronesia',
 'FL': 'Florida',
 'GA': 'Georgia',
 'GU': 'Guam',
 'HI': 'Hawaii',
 'ID': 'Idaho',
 'IL': 'Illinois',
 'IN': 'Indiana',
 'IA': 'Iowa',
 'KS': 'Kansas',
 'KY': 'Kentucky',
 'LA': 'Louisiana',
 'ME': 'Maine',
 'MH': 'Marshall Islands',
 'MD': 'Maryland',
 'MA': 'Massachusetts',
 'MI': 'Michigan',
 'MN': 'Minnesota',
 'MS': 'Mississippi',
 'MO': 'Missouri',
 'MT': 'Montana',
 'NE': 'Nebraska',
 'NV': 'Nevada',
 'NH': 'New Hampshire',
 'NJ': 'New Jersey',
 'NM': 'New Mexico',
 'NY': 'New York',
 'NC': 'North Carolina',
 'ND': 'North Dakota',
 'MP': 'Northern Mariana Islands',
 'OH': 'Ohio',
 'OK': 'Oklahoma',
 'OR': 'Oregon',
 'PW': 'Palau',
 'PA': 'Pennsylvania',
 'PR': 'Puerto Rico',
 'RI': 'Rhode Island',
 'SC': 'South Carolina',
 'SD': 'South Dakota',
 'TN': 'Tennessee',
 'TX': 'Texas',
 'UT': 'Utah',
 'VT': 'Vermont',
 'VI': 'Virgin Islands',
 'VA': 'Virginia',
 'WA': 'Washington',
 'WV': 'West Virginia',
 'WI': 'Wisconsin',
 'WY': 'Wyoming'}


def get_state_codes(x):
    """Return the full state name for a two-letter code, or "Others" if the code is unknown."""
    return state_map_dict.get(x, "Others")

covid_statewise["state"] = covid_statewise["state"].map(get_state_codes)


# convert 'date' column to datetime
covid_statewise['date'] = pd.to_datetime(covid_statewise['date'])
# create column for month names
covid_statewise['month'] = covid_statewise['date'].dt.month_name()

# keep only 2020, the year covered by the engagement data
covid_statewise = covid_statewise.loc[covid_statewise['date'].dt.year == 2020].copy()

covid_statewise.head()

In [None]:
posInc_monthlyMean = covid_statewise.groupby(['state','month']).agg({'positiveIncrease': 'mean'})

posInc_monthlyMean = posInc_monthlyMean.reset_index()

posInc_monthlyMean

#### US Statewise population data

In [None]:
popu_statewise = pd.read_csv("../input/2019-census-us-population-data-by-state/2019_Census_US_Population_Data_By_State_Lat_Long.csv", 
                              usecols=["STATE", "POPESTIMATE2019"])

popu_statewise.head()

In [None]:
def getPopuPercent(x):
    """Return a row's mean daily positiveIncrease as a percentage of the state's 2019
    population, or NaN if the state is not in the census data."""
    statePopulation = popu_statewise.loc[popu_statewise['STATE'] == x['state'], 'POPESTIMATE2019']
    if len(statePopulation) == 0:
        return np.nan
    positiveIncreasePercent = (x['positiveIncrease'] / statePopulation.values[0]) * 100
    return np.round(positiveIncreasePercent, 3)

posInc_monthlyMean["positiveIncreasePercent"] = posInc_monthlyMean.apply(getPopuPercent, axis=1)

posInc_monthlyMean.head()
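The row-wise `apply` above looks up the population once per row. A vectorized sketch of the same percentage, positiveIncrease / population × 100 rounded to three decimals, maps each state name to its population through a Series lookup. It assumes `STATE` values are unique in the census table (required for `Series.map` with a Series); the figures below are invented, not real census or case counts.

```python
import numpy as np
import pandas as pd

# hypothetical monthly means and populations (invented figures)
monthly = pd.DataFrame({
    "state": ["Illinois", "Illinois", "Others"],
    "month": ["April", "May", "April"],
    "positiveIncrease": [1500.0, 3000.0, 20.0],
})
population = pd.DataFrame({"STATE": ["Illinois"], "POPESTIMATE2019": [10_000_000]})

# map each state to its population; states absent from the census table get NaN
pop = monthly["state"].map(population.set_index("STATE")["POPESTIMATE2019"])
monthly["positiveIncreasePercent"] = np.round(monthly["positiveIncrease"] / pop * 100, 3)
print(monthly)
```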

In [None]:
# States as column headers
stateCovid = posInc_monthlyMean.pivot_table('positiveIncreasePercent', ['month'], 'state')

# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
stateCovid.reset_index(inplace=True)
stateCovid['month'] = pd.Categorical(stateCovid['month'], categories=months, ordered=True)
stateCovid.sort_values(by='month',inplace=True) 
stateCovid.set_index('month', inplace=True)

stateCovid
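`pivot_table` sorts month names alphabetically ("April" before "January"), which is why the cell above converts the index to an ordered `Categorical`; the same steps recur throughout the notebook. A sketch of a reusable helper that reindexes by calendar order instead. The name `order_by_month` is hypothetical, and unlike a full 12-month reindex it drops months absent from the data rather than inserting empty rows.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def order_by_month(df: pd.DataFrame) -> pd.DataFrame:
    """Return df with its month-name index in calendar order (missing months are skipped)."""
    present = [m for m in MONTHS if m in df.index]
    return df.reindex(present)

# invented data: pivot_table sorts the month index alphabetically
demo = pd.DataFrame({"month": ["March", "January", "February"],
                     "state": ["Utah"] * 3,
                     "value": [3, 1, 2]})
pivoted = demo.pivot_table("value", ["month"], "state")
ordered = order_by_month(pivoted)
print(ordered)
```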

In [None]:
# Covid increase in all states
fig,ax1 = plt.subplots(figsize=(20,16))

sns.lineplot(data=stateCovid, markers=True, lw=2, ax=ax1)

title = 'The monthly average COVID-19 increase rate in each state in the US'

ax1.set_xlabel('')
ax1.set_ylabel('Covid Increase % mean', size=14)
plt.title(title, fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)

#### Nationwide COVID-19 data summary

In [None]:
national_covid = pd.read_csv("../input/d/sreeedevi/us-covid-summary/national-history.csv", 
                              usecols=["date", "positiveIncrease"])

national_covid.head()

In [None]:
# convert 'date' column to datetime
national_covid['date'] = pd.to_datetime(national_covid['date'])
# create column for month names
national_covid['month'] = national_covid['date'].dt.month_name()

# keep only 2020, the year covered by the engagement data
national_covid = national_covid.loc[national_covid['date'].dt.year == 2020].copy()


# Min Max normalizing 'positiveIncrease'
column = 'positiveIncrease'
national_covid['positiveIncreaseNorm'] = (national_covid[column] - national_covid[column].min()) / (national_covid[column].max() - national_covid[column].min())    

national_covid.head()

#### COVID-19 US State Policy Database

In [None]:
state_policy = pd.read_csv("../input/covid19-us-state-policy-database/data.csv", 
                          usecols=["STATE", "STEMERG", "STEMERGEND", "STEMERG2", "STAYHOME", "STAYHOMENOGP", "END_STHM"])

state_policy.head()

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:purple;
           font-size:100%;
           font-family:Verdana;
           letter-spacing:0.5px">
<h1 style="text-align: center;
           padding: 40px;
              color:white">
<a id="analysis"> </a>
2. Analysis
</h1>
</div>

[Q1. What is the picture of digital connectivity and engagement in 2020?](#Q1)

[Q2. What is the effect of the COVID-19 pandemic on online and distance learning, and how might this also evolve in the future?](#Q2)

[Q3. How does student engagement with different types of education technology change over the course of the pandemic?](#Q3)

[Q4. How does student engagement with online learning platforms relate to different geography? Demographic context (e.g., race/ethnicity, ESL, learning disability)? Learning context? Socioeconomic status?](#Q4)

[Q5. Do certain state interventions, practices or policies (e.g., stimulus, reopening, eviction moratorium) correlate with the increase or decrease in online engagement?](#Q5)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:80%;
           font-family:Verdana;
           letter-spacing:0.5px">
<h1 style="text-align: center;
           padding: 20px;
              color:purple">
<a id="Q1"> </a>
Q1. What is the picture of digital connectivity and engagement in 2020?
</h1>
</div>

* How does the overall monthly mean of `pct_access` and `engagement_index` change over time?

* Which product (`Product Name`) is used most in each district, in each month, for each `Sector(s)` and `Primary Essential Function`? (Based on `pct_access` and `engagement_index`)
    - Does it change over time?
    - Is there a common product that is used in most districts?
        - Is there a trend across districts based on `pct_black/hispanic`, `pct_free/reduced`, `county_connections_ratio` and `pp_total_raw`?
        - Is there a trend across states?
        - What is the racial and economic profile of districts with lower pct values compared with districts with higher pct values *in each state*?

### How does the overall monthly mean of pct_access and engagement_index change over time?

In [None]:
overall = engagement.groupby(['month']).agg({'pct_access': 'mean','engagement_index': 'mean'})

# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
overall.reset_index(inplace=True)
overall['month'] = pd.Categorical(overall['month'], categories=months, ordered=True)
overall.sort_values(by='month',inplace=True) 
overall.set_index('month', inplace=True)

overall

In [None]:
#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 16), facecolor='#f6f5f5')
gs = fig.add_gridspec(5,1)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

title = 'Overall student engagement trend'
plt.suptitle(title, fontsize=20, weight='bold')

ax1 = fig.add_subplot(gs[0, 0])

ax2 = ax1.twinx()
sns.lineplot(data=overall.loc[:,'pct_access'], 
             markers=True, lw=2, ax=ax1, color='y', marker='o', zorder=2, label = 'pct_access',legend=0)
sns.lineplot(data=overall.loc[:,'engagement_index'], 
             markers=True, lw=2, ax=ax2, color="green", marker='o', zorder=2, label = 'engagement_index',legend=0)



ax1.set_xlabel('')
ax1.set_ylabel('pct_access', color='gray', size=14)
ax2.set_ylabel('engagement_index', color="gray", size=14)
plt.title('', fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)
ax2.tick_params(labelsize=14, width=0.5, length=1.5)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax1.set_xticklabels(months, rotation=90);


### The most used product (`Product Name`) in each district in each month in each `Sector(s)` and `Primary Essential Function`

In [None]:
def findPopularPrdtsInSector(df_engagement, sector):
    """For each (month, district), return the product(s) in `sector` with the highest
    mean pct_access. Products tied for the maximum are all kept."""
    mostUsedProduct = (df_engagement
                       .groupby(['month', 'district_id', 'Sector(s)', 'Product Name'])
                       .agg({'pct_access': 'mean', 'prdt_use': 'mean'})
                       .reset_index())

    # keep only products that belong to the requested sector
    mostUsedProduct_sec = mostUsedProduct.loc[mostUsedProduct['Sector(s)'] == sector].reset_index(drop=True)

    # product(s) with the maximum pct_access per month and district
    idx = mostUsedProduct_sec.groupby(['month', 'district_id'])['pct_access'].transform('max') == mostUsedProduct_sec['pct_access']
    popular = mostUsedProduct_sec[idx].drop(['Sector(s)', 'prdt_use'], axis=1)

    # Alternative: rank by the combined metric 'prdt_use' (pct_access * engagement_index)
    # by using 'prdt_use' instead of 'pct_access' in the two lines above.

    return popular
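The `transform('max') == ...` mask keeps every product tied for the highest `pct_access`, so a district can contribute more than one row for a month. A sketch on invented data contrasting it with `idxmax`, which returns exactly one row per group (the first maximum in row order); which behavior is preferable depends on whether ties should be reported.

```python
import pandas as pd

# invented aggregated table: mean pct_access per month, district and product
usage = pd.DataFrame({
    "month": ["January"] * 4,
    "district_id": [1000, 1000, 1039, 1039],
    "Product Name": ["Product A", "Product B", "Product A", "Product C"],
    "pct_access": [0.8, 1.2, 0.5, 0.5],
})
keys = ["month", "district_id"]

# transform("max") keeps every product tied for the maximum (district 1039 keeps two rows)
is_max = usage.groupby(keys)["pct_access"].transform("max") == usage["pct_access"]
all_ties = usage[is_max]

# idxmax keeps one row per group: the first maximum in row order
one_per_group = usage.loc[usage.groupby(keys)["pct_access"].idxmax()]
print(all_ties, one_per_group, sep="\n\n")
```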


In [None]:
popular = findPopularPrdtsInSector(engagement, "PreK-12")

popular

In [None]:
def findPopularPrdtsInState(df_popular, state):
    # merge with the district info dataset to get the state names
    popular_merged = pd.merge(df_popular, districts_data, on="district_id", how="left")
    # district IDs as strings (used as categorical labels in the plots)
    popular_merged['district_id'] = popular_merged['district_id'].astype(str)

    # keep only districts in the requested state
    popular_state = popular_merged.loc[popular_merged['state'] == state]

    return popular_state

In [None]:
popular_state = findPopularPrdtsInState(popular, 'Illinois')

popular_state.head()

In [None]:
# Plot function

def plotDistCategories(popular_state,states):
    months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

    #plt.rcParams['figure.dpi'] = 600
    fig = plt.figure(figsize=(24, 20), facecolor='#f6f5f5')
    rows, cols = 3, 4
    gs = fig.add_gridspec(rows,cols)
    gs.update(wspace=0.3, hspace=0.4)

    title = 'Popular products - ' + states
    plt.suptitle(title, fontsize=14, weight='bold')

    k = 0

    for r in range(rows):
        for c in range(cols):
            ax1 = fig.add_subplot(gs[r, c])

            eachMonth = months[k]
            k = k + 1

            popular_temp = popular_state.loc[popular_state['month']==eachMonth]
            popular_temp.reset_index(inplace=True, drop=True)
            ax1.barh(popular_temp['district_id'], popular_temp['pct_access'])

            ax1.set_xlabel('pct_access',  color='gray', size=14)
            plt.title(eachMonth, fontsize=16, weight='bold', color="gray")

            for e, p in enumerate(ax1.patches):
                x = p.get_x() + p.get_width() + 1
                y = p.get_y() + p.get_height() / 2 
                ax1.text(x, y, popular_temp.loc[e,"Product Name"], ha='center', va='center', fontsize=10)

    plt.show()

In [None]:
plotDistCategories(popular_state, 'Illinois')

In [None]:
# districts as column headers (products tied for most popular are joined into one string)
popular_prdt = popular.pivot_table("Product Name", ['month'], 'district_id', aggfunc=lambda x: ', '.join(x))
# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
popular_prdt.reset_index(inplace=True)
popular_prdt['month'] = pd.Categorical(popular_prdt['month'], categories=months, ordered=True)
popular_prdt.sort_values(by='month',inplace=True) 
popular_prdt.set_index('month', inplace=True)

popular_prdt = popular_prdt.T

popular_prdt


# districts as column headers
popular_pct = popular.pivot_table('pct_access', ['month'], 'district_id')
# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
popular_pct.reset_index(inplace=True)
popular_pct['month'] = pd.Categorical(popular_pct['month'], categories=months, ordered=True)
popular_pct.sort_values(by='month',inplace=True) 
popular_pct.set_index('month', inplace=True)

popular_pct = popular_pct.T

popular_pct

statewise_distlist = districts_data.groupby('state').agg(lambda x:list(x))['district_id']
distInEachState = statewise_distlist['Illinois']

months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
df = pd.DataFrame(index=months)

for i in distInEachState:
    df['prdt_'+str(i)] = popular_prdt.loc[i]
    df['pct_'+str(i)] = popular_pct.loc[i]

df

In [None]:
# Plot the monthly pct_access of the most popular product in each district of the selected state (`distInEachState`)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 10), facecolor='#f6f5f5')
gs = fig.add_gridspec(1,1)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

title = 'Popular products (Statewise)'
plt.suptitle(title, fontsize=20, weight='bold')

ax1 = fig.add_subplot(gs[0, 0])

n = len(distInEachState)
colors = plt.cm.jet(np.linspace(0,1,n))

for i in range(len(distInEachState)):
    ax1.plot(months, df.loc[:,'pct_'+str(distInEachState[i])], 
             color=colors[i], lw=2, marker='o', label = str(distInEachState[i]))

ax1.legend(loc=1)
ax1.set_xlabel('')
ax1.set_ylabel('pct_access', color='gray', size=14)
plt.title('', fontsize=16, weight='bold', color="gray")
ax1.tick_params(labelsize=14, width=0.5, length=1.5)
ax1.set_xticklabels(months, rotation=90);

# for x,y,prdt in zip(months,pctToPlot,prdtToPlot):
#     ax1.annotate(prdt, xy=(x,y), textcoords='data', color=color, fontweight='bold', fontsize=12)


### What is the racial and economic profile of districts with lower pct values compared with districts with higher pct values *in each state*?

First, let's find the districts with the highest and lowest pct values *in each state*.

In [None]:
def findHighLowDistID(popular_state, verbose=1):
    # mean pct_access of each district's most popular product over the year
    mean_pct_statewise = popular_state.groupby('district_id')['pct_access'].mean()

    # at least two districts are needed for a comparison
    if mean_pct_statewise.shape[0] <= 1:
        return []

    lowest_district_id, lowest_pct = mean_pct_statewise.idxmin(), mean_pct_statewise.min()
    highest_district_id, highest_pct = mean_pct_statewise.idxmax(), mean_pct_statewise.max()
    mean_pct = mean_pct_statewise.mean()

    if verbose:
        print("District ID with highest pct: %s (pct = %.3f)" % (highest_district_id, highest_pct))
        print("District ID with lowest pct: %s (pct = %.3f)" % (lowest_district_id, lowest_pct))
        print("Mean pct for the state = %.3f" % mean_pct)

    return [(lowest_district_id, lowest_pct), (highest_district_id, highest_pct), mean_pct]


In [None]:
popular = findPopularPrdtsInSector(engagement, "PreK-12")

popular_state = findPopularPrdtsInState(popular, 'Illinois')

popular_state

In [None]:
outList = findHighLowDistID(popular_state)

(lowest_district_id, lowest_pct) = outList[0]
(highest_district_id, highest_pct) = outList[1]
mean_pct = outList[2]
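The functions above handle one state at a time and are later called in a loop over states. A sketch of computing the same per-state summary (lowest and highest district, state mean) for all states in a single `groupby`, assuming a table with one row per district holding its state and mean `pct_access`. Setting `district_id` as the index makes `idxmin`/`idxmax` return district IDs directly. The data is invented.

```python
import pandas as pd

# invented per-district means of the most popular product's pct_access
district_means = pd.DataFrame({
    "state": ["Illinois", "Illinois", "Illinois", "Utah", "Utah", "Ohio"],
    "district_id": [1000, 1039, 1044, 2000, 2001, 3000],
    "pct_access": [0.9, 1.6, 0.4, 2.0, 1.0, 0.7],
})

pct = district_means.set_index("district_id").groupby("state")["pct_access"]
summary = pd.DataFrame({
    "lowest_district_id": pct.idxmin(),
    "lowest_pct": pct.min(),
    "highest_district_id": pct.idxmax(),
    "highest_pct": pct.max(),
    "mean_pct": pct.mean(),
    "n_districts": pct.size(),
})
# as in findHighLowDistID, single-district states have nothing to compare
summary = summary[summary["n_districts"] > 1]
print(summary)
```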

In [None]:
def getHighLowPctInSector(engagement, sector):
    """For each state, record the districts with the lowest and highest mean pct_access
    (of the most popular product in `sector`) together with the state mean."""
    popular = findPopularPrdtsInSector(engagement, sector)

    rows = []
    for st in engagement['state'].unique():
        popular_state = findPopularPrdtsInState(popular, st)
        outList = findHighLowDistID(popular_state, verbose=0)

        # states with fewer than two districts have nothing to compare
        if len(outList) == 0:
            continue

        (lowest_district_id, lowest_pct), (highest_district_id, highest_pct), mean_pct = outList

        # record the sector that was actually analyzed
        rows.append({'state': st, 'sector': sector, 'val_prop': 'Low', 'pct_dist': lowest_district_id, 'pct': lowest_pct})
        rows.append({'state': st, 'sector': sector, 'val_prop': 'High', 'pct_dist': highest_district_id, 'pct': highest_pct})
        rows.append({'state': st, 'sector': sector, 'val_prop': 'Mean', 'pct_dist': np.nan, 'pct': mean_pct})

    return pd.DataFrame(rows, columns=['state', 'sector', 'val_prop', 'pct_dist', 'pct'])

In [None]:
def getAnanlysis(lowhigh_pct_dists):
    # distribution of racial, economic and geographical factors in high- and low-pct districts

    lowhigh_pct_dists_1 = lowhigh_pct_dists.loc[lowhigh_pct_dists['val_prop'] != "Mean"].copy()
    lowhigh_pct_dists_1['pct_dist'] = lowhigh_pct_dists_1['pct_dist'].astype(int)
    lowhigh_pct_dists_1 = pd.merge(lowhigh_pct_dists_1, districts_data, right_on='district_id', left_on="pct_dist", how="left")

    low_temp = lowhigh_pct_dists_1.loc[lowhigh_pct_dists_1['val_prop'] == "Low"]
    high_temp = lowhigh_pct_dists_1.loc[lowhigh_pct_dists_1['val_prop'] == "High"]

    def countVal(low_temp, high_temp, category):
        # count how often each value of `category` occurs among low- and high-pct districts;
        # the columns ('index', <category>, 'type') are built explicitly so the result does
        # not depend on how the installed pandas version names value_counts() output
        counts = []
        for label, df in (("Low", low_temp), ("High", high_temp)):
            vc = df[category].value_counts()
            counts.append(pd.DataFrame({'index': vc.index, category: vc.values, 'type': label}))
        return pd.concat(counts, ignore_index=True)

    locale_count = countVal(low_temp, high_temp, 'locale')
    race_count = countVal(low_temp, high_temp, 'pct_black/hispanic')
    eco_count = countVal(low_temp, high_temp, 'pct_free/reduced')
    conn_count = countVal(low_temp, high_temp, 'county_connections_ratio')
    pp_count = countVal(low_temp, high_temp, 'pp_total_raw')

    return [locale_count, race_count, eco_count, conn_count, pp_count]

In [None]:
def plotDistStateSector(lowhigh_pct_dists, locale_count, race_count, eco_count, conn_count, pp_count):
    fig = plt.figure(figsize=(20, 16), facecolor='#f6f5f5')

    # large panel: lowest, highest and mean pct_access per state
    ax1 = plt.subplot2grid((3, 3), (0, 0), colspan=2, rowspan=2)
    sns.barplot(data=lowhigh_pct_dists, x='state', y='pct', hue='val_prop', ax=ax1)
    ax1.set_xlabel('')
    ax1.set_ylabel('pct_access', color='gray', size=14)
    ax1.tick_params(axis='x', labelrotation=90)

    plt.subplots_adjust(hspace=1)

    # small panels: how often each category occurs among low- and high-pct districts
    panels = [((0, 2), locale_count, 'locale'),
              ((1, 2), race_count, 'pct_black/hispanic'),
              ((2, 2), eco_count, 'pct_free/reduced'),
              ((2, 1), conn_count, 'county_connections_ratio'),
              ((2, 0), pp_count, 'pp_total_raw')]
    for position, counts, category in panels:
        ax = plt.subplot2grid((3, 3), position)
        sns.barplot(data=counts, x='index', y=category, hue='type', ax=ax)
        ax.set_xlabel(category, color='gray', size=14)
        ax.set_ylabel('number of districts', color='gray', size=12)
        ax.tick_params(axis='x', labelrotation=90)

In [None]:
sectorlist = engagement["Sector(s)"].unique()
sectorlist

In [None]:
lowhigh_pct_dists = getHighLowPctInSector(engagement, "PreK-12")
[locale_count, race_count, eco_count, conn_count, pp_count] = getAnanlysis(lowhigh_pct_dists)

plotDistStateSector(lowhigh_pct_dists, locale_count, race_count, eco_count, conn_count, pp_count)

In [None]:
lowhigh_pct_dists = getHighLowPctInSector(engagement, "PreK-12; Higher Ed")
[locale_count, race_count, eco_count, conn_count, pp_count] = getAnanlysis(lowhigh_pct_dists)

plotDistStateSector(lowhigh_pct_dists, locale_count, race_count, eco_count, conn_count, pp_count)

In [None]:
lowhigh_pct_dists = getHighLowPctInSector(engagement, "Higher Ed; Corporate")
[locale_count, race_count, eco_count, conn_count, pp_count] = getAnanlysis(lowhigh_pct_dists)

plotDistStateSector(lowhigh_pct_dists, locale_count, race_count, eco_count, conn_count, pp_count)

In [None]:
lowhigh_pct_dists = getHighLowPctInSector(engagement, "Corporate")
[locale_count, race_count, eco_count, conn_count, pp_count] = getAnanlysis(lowhigh_pct_dists)

plotDistStateSector(lowhigh_pct_dists, locale_count, race_count, eco_count, conn_count, pp_count)

In [None]:
lowhigh_pct_dists = getHighLowPctInSector(engagement, "PreK-12; Higher Ed; Corporate")
[locale_count, race_count, eco_count, conn_count, pp_count] = getAnanlysis(lowhigh_pct_dists)

plotDistStateSector(lowhigh_pct_dists, locale_count, race_count, eco_count, conn_count, pp_count)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:80%;
           font-family:Verdana;
           letter-spacing:0.5px">
<h1 style="text-align: center;
           padding: 20px;
              color:purple">
<a id="Q3"> </a>
Q3. How does student engagement with different types of education technology change over the course of the pandemic?
</h1>
</div>

* How does the monthly average `pct_access` in each state relate to pandemic progression?

* How did pandemic progression affect the monthly average online learning for each `primary_function_main`?

* How were some of the top subclasses of the technologies' primary functions affected as the pandemic progressed?

### How does the monthly average `pct_access` in each state relate to pandemic progression?

In [None]:
pct_access_MonthlyMean = engagement.groupby(['state','month']).agg({'pct_access': 'mean'})

pct_access_MonthlyMean = pct_access_MonthlyMean.reset_index()
# States as column headers
pct_access_MonthlyMean = pct_access_MonthlyMean.pivot_table('pct_access', ['month'], 'state')
# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
pct_access_MonthlyMean.reset_index(inplace=True)
pct_access_MonthlyMean['month'] = pd.Categorical(pct_access_MonthlyMean['month'], categories=months, ordered=True)
pct_access_MonthlyMean.sort_values(by='month',inplace=True) 
pct_access_MonthlyMean.set_index('month', inplace=True)


pct_access_MonthlyMean

In [None]:
# Find states in the online learning data that are missing from the COVID data
pctList = list(pct_access_MonthlyMean.columns)
covidList = list(stateCovid.columns)

main_list = list(set(pctList) - set(covidList))
# result: ['District Of Columbia']

# Rename 'District of Columbia' to match the spelling used in the online learning dataset
stateCovid.rename(columns={'District of Columbia': 'District Of Columbia'}, inplace=True)

# Remove states from the COVID data that do not appear in the online learning dataset

pctList = list(pct_access_MonthlyMean.columns)

covidList = list(stateCovid.columns)

del_list = list(set(covidList) - set(pctList))

del_list

stateCovid.drop(columns = del_list, inplace=True)

# Merge online learning and covid data
pct_access_covid = pct_access_MonthlyMean.merge(stateCovid, on="month", suffixes=("_pctAccess", "_popu"))

# pct_access_covid.columns
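The plots below compare the two monthly series visually. As a hedged complement, a per-state Pearson correlation gives a single number for how closely they move together. The sketch uses invented values; with only a handful of monthly points the coefficient is descriptive and says nothing about causation.

```python
import pandas as pd

# invented monthly series for one state
months = ["March", "April", "May", "June"]
state_pct_access = pd.Series([2.1, 3.4, 3.0, 1.2], index=months)
state_covid_increase = pd.Series([0.001, 0.012, 0.009, 0.004], index=months)

# Series.corr aligns on the index and uses Pearson correlation by default
r = state_pct_access.corr(state_covid_increase)
print(f"Pearson r = {r:.2f}")
```

Against the merged frame above, the equivalent call would presumably be `pct_access_covid['Illinois_pctAccess'].corr(pct_access_covid['Illinois_popu'])`, assuming Illinois appears in both tables so the merge suffixes were applied.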

In [None]:
statesList = list(pct_access_MonthlyMean.columns)

rows, cols = 12, 2

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 60), facecolor='#f6f5f5')
gs = fig.add_gridspec(rows,cols)
gs.update(wspace=0.3, hspace=0.4)


fig.tight_layout()
fig.subplots_adjust(top=0.9)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

title = 'How Covid Increase Rate affected Online learning?'
plt.suptitle(title, fontsize=20, weight='bold')

#fig.text(0.5, 0.04, 'Months', ha='center')
fig.text(0.04, 0.5, 'pct_access', va='center', rotation='vertical', fontsize=16)

k = 0
for i in range(rows):
    for j in range(cols):
        if k<len(statesList):
            statename = statesList[k]
            specificStateData = pct_access_covid[pct_access_covid.columns[pd.Series(pct_access_covid.columns).str.startswith(statename)]]
        else:
            continue
        
        
        ax1 = fig.add_subplot(gs[i, j])
            
            
        ax2 = ax1.twinx()
        
        sns.lineplot(data=specificStateData.iloc[:,0], markers=True, lw=2, ax=ax1, marker='o', zorder=2, label = 'pct_access',legend=0)
        sns.lineplot(data=specificStateData.iloc[:,1], markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=0)
        
        k = k + 1
        
        title = statename

        ax1.set_xlabel('')
        ax1.set_ylabel('', color='b', size=14)
        ax2.set_ylabel('', color='r', size=14)
        plt.title(title, fontsize=16, weight='bold', color="gray")

        ax1.tick_params(labelsize=14, width=0.5, length=1.5)
        ax2.tick_params(labelsize=14, width=0.5, length=1.5)
        
        months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
        ax1.set_xticklabels(months, rotation=90)
        
handles, labels = [(a + b) for a, b in zip(ax1.get_legend_handles_labels(), ax2.get_legend_handles_labels())]
fig.legend(handles, labels, loc='upper center')

### How did pandemic progression affect the monthly average online learning of each `primary_function_main`?

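Next, the effect of pandemic progression on each type of online learning technology is analyzed: the monthly means of `pct_access_norm` and `engagement_index_norm` for each `primary_function_main` category (CM, LC, SDO and LC/CM/SDO) are plotted below the national COVID-19 increase rate.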
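Each of the following aggregation cells repeats the same four steps: group by a category and `month`, average the normalized metrics, pivot the category into columns, and restore calendar month order. A minimal sketch of those steps as one hypothetical helper (`monthly_means_by` is not defined elsewhere in this notebook), assuming `month` holds full English month names as in the cells below. Since `pivot_table` already averages each (month, category) cell, the separate `groupby(...).agg(...)` step is folded into it; the toy rows are made up.

```python
import pandas as pd

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']


def monthly_means_by(df: pd.DataFrame, group_col: str,
                     metrics=('pct_access_norm', 'engagement_index_norm')) -> pd.DataFrame:
    """Mean of each metric per (month, group), one '<metric>_<group>' column each."""
    wide = df.pivot_table(values=list(metrics), index='month',
                          columns=group_col, aggfunc='mean')
    # Flatten the (metric, group) column MultiIndex, e.g. 'pct_access_norm_CM'
    wide.columns = ['_'.join(map(str, col)) for col in wide.columns]
    # Order rows by calendar month rather than alphabetically
    wide.index = pd.CategoricalIndex(wide.index, categories=MONTHS,
                                     ordered=True, name='month')
    return wide.sort_index()


# Toy engagement rows (made-up values)
toy = pd.DataFrame({
    'primary_function_main': ['CM', 'CM', 'LC', 'LC', 'LC'],
    'month': ['March', 'January', 'March', 'January', 'January'],
    'pct_access_norm': [0.2, 0.4, 0.6, 0.8, 1.0],
    'engagement_index_norm': [1.0, 2.0, 3.0, 4.0, 6.0],
})
result = monthly_means_by(toy, 'primary_function_main')
print(result)
```

With this helper, `monthly_means_by(engagement, 'locale')` would play the role of the `locale` table further down, under the same assumptions about the `engagement` columns.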

In [None]:
national_covid.head()

In [None]:
national_covid_mean = national_covid.groupby(['month']).agg({'positiveIncreaseNorm': 'mean'})
national_covid_mean

In [None]:
main_learning_technologies = engagement.groupby(['primary_function_main','month']).agg({'pct_access_norm': 'mean','engagement_index_norm': 'mean'})

main_learning_technologies.reset_index(inplace=True)

# primary_function_main categories as column headers
main_learning_technologies = main_learning_technologies.pivot_table(['pct_access_norm','engagement_index_norm'], ['month'], 'primary_function_main')

# join different levels of columns by '_'
main_learning_technologies.columns = main_learning_technologies.columns.map('_'.join)

# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
main_learning_technologies.reset_index(inplace=True)
main_learning_technologies['month'] = pd.Categorical(main_learning_technologies['month'], categories=months, ordered=True)
main_learning_technologies.sort_values(by='month',inplace=True) 
main_learning_technologies.set_index('month', inplace=True)

main_learning_technologies

In [None]:

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 26), facecolor='#f6f5f5')
gs = fig.add_gridspec(5,1)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

title = 'How did the national COVID-19 increase rate affect different online learning technologies?'
plt.suptitle(title, fontsize=20, weight='bold')

# Graph 1 - Pandemic progression
ax1 = fig.add_subplot(gs[0, 0])

sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
             markers=True, lw=2, ax=ax1, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=0)

ax1.set_xlabel('')
ax1.set_ylabel('positiveIncreaseRate', color='gray', size=14)
plt.title('Pandemic Progression', fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax1.set_xticklabels(months, rotation=90)

# Graphs 2-5 - one panel per primary_function_main category
for row, category in enumerate(['CM', 'LC', 'SDO', 'LC/CM/SDO'], start=1):
    ax1 = fig.add_subplot(gs[row, 0])

    ax2 = ax1.twinx()
    sns.lineplot(data=main_learning_technologies.loc[:,'pct_access_norm_' + category], 
                 markers=True, lw=2, ax=ax1, color='y', marker='o', zorder=2, label = 'pct_access',legend=0)
    sns.lineplot(data=main_learning_technologies.loc[:,'engagement_index_norm_' + category], 
                 markers=True, lw=2, ax=ax2, color="green", marker='o', zorder=2, label = 'engagement_index',legend=0)

    ax1.set_xlabel('')
    ax1.set_ylabel('pct_access', color='gray', size=14)
    ax2.set_ylabel('engagement_index', color="gray", size=14)
    plt.title(category, fontsize=16, weight='bold', color="gray")

    ax1.tick_params(labelsize=14, width=0.5, length=1.5)
    ax2.tick_params(labelsize=14, width=0.5, length=1.5)

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ax1.set_xticklabels(months, rotation=90)
    
# common legends for figure
handles, labels = [(a + b) for a, b in zip(ax1.get_legend_handles_labels(), ax2.get_legend_handles_labels())]
fig.legend(handles, labels, loc='upper right', fontsize=16)

### How were some of the top subclasses of the technologies' primary functions affected by pandemic progression?

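Below, the monthly `pct_access` and `engagement_index` of the `top_n` most common `Primary Essential Function` categories are each compared with the national COVID-19 increase rate.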
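The `set_xticklabels(months, ...)` calls in these panels label the tick positions in calendar order, which is only correct if both plotted series are in calendar order. `sub_learning_technologies` is explicitly reordered, but `national_covid_mean` comes from a plain `groupby`, which sorts its keys; assuming `month` is a string column there, that order is alphabetical. A minimal matplotlib-only sketch of a dual-axis helper that pairs the two series by index label before plotting and takes the tick labels from the aligned index; `twin_line_plot` is a hypothetical name and the toy values are made up.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def twin_line_plot(ax, left: pd.Series, right: pd.Series,
                   left_label: str, right_label: str,
                   left_color='y', right_color='red'):
    """Plot `left` on `ax` and `right` on a twin y-axis, matched by index label."""
    # Keep only labels present in both series and pair values by label, not position
    left, right = left.align(right, join='inner')
    x = list(range(len(left)))
    ax2 = ax.twinx()
    ax.plot(x, left.to_numpy(), marker='o', lw=2, color=left_color, label=left_label)
    ax2.plot(x, right.to_numpy(), marker='o', lw=2, color=right_color, label=right_label)
    # Tick labels come from the aligned index, so they always match the points
    ax.set_xticks(x)
    ax.set_xticklabels([str(label) for label in left.index], rotation=90)
    ax.set_ylabel(left_label, color='gray')
    ax2.set_ylabel(right_label, color='gray')
    return ax2


# Toy data: the right-hand series is deliberately in a different order (made-up values)
left = pd.Series([1.0, 2.0], index=['January', 'March'])
right = pd.Series([30.0, 10.0, 40.0], index=['March', 'January', 'April'])

fig, ax = plt.subplots()
ax2 = twin_line_plot(ax, left, right, 'pct_access', 'Covid increase rate')
```

Months missing from either series are dropped rather than plotted as gaps, which is a design choice of this sketch; `join='outer'` would keep them as `NaN` instead.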

In [None]:
# Select and filter for only the 'top_n' 'Primary Essential Function' categories
top_n = 7
top_learning_funct = list(engagement['Primary Essential Function'].value_counts().head(top_n).index)

boolean_series = engagement['Primary Essential Function'].isin(top_learning_funct)
engagement_filt = engagement[boolean_series]


sub_learning_technologies = engagement_filt.groupby(['Primary Essential Function','month']).agg({'pct_access_norm': 'mean','engagement_index_norm': 'mean'})

sub_learning_technologies.reset_index(inplace=True)

# Primary Essential Function categories as column headers
sub_learning_technologies = sub_learning_technologies.pivot_table(['pct_access_norm','engagement_index_norm'], ['month'], 'Primary Essential Function')

# join different levels of columns by '_'
sub_learning_technologies.columns = sub_learning_technologies.columns.map('_'.join)

# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
sub_learning_technologies.reset_index(inplace=True)
sub_learning_technologies['month'] = pd.Categorical(sub_learning_technologies['month'], categories=months, ordered=True)
sub_learning_technologies.sort_values(by='month',inplace=True) 
sub_learning_technologies.set_index('month', inplace=True)

sub_learning_technologies

In [None]:

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 26), facecolor='#f6f5f5')
gs = fig.add_gridspec(7,2)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

title = 'National COVID-19 increase rate vs top Primary Essential Function categories'
plt.suptitle(title, fontsize=20, weight='bold')



def plot_pct_and_index(sub_cat, row_no):
    # Left panel: pct_access of sub_cat vs the national covid increase rate
    ax1 = fig.add_subplot(gs[row_no, 0])

    ax2 = ax1.twinx()
    sns.lineplot(data=sub_learning_technologies.loc[:,'pct_access_norm_'+sub_cat], 
                 markers=True, lw=2, ax=ax1, color='y', marker='o', zorder=2, label = 'pct_access',legend=0)
    sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
                 markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=0)

    ax1.set_xlabel('')
    ax1.set_ylabel('pct_access', color='gray', size=14)
    ax2.set_ylabel('positiveIncreaseRate', color="gray", size=14)
    plt.title(sub_cat + ' vs pct_access', fontsize=16, weight='bold', color="gray")

    ax1.tick_params(labelsize=14, width=0.5, length=1.5)
    ax2.tick_params(labelsize=14, width=0.5, length=1.5)

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ax1.set_xticklabels(months, rotation=90)

    # Right panel: engagement_index of sub_cat vs the national covid increase rate
    ax1 = fig.add_subplot(gs[row_no, 1])

    ax2 = ax1.twinx()
    sns.lineplot(data=sub_learning_technologies.loc[:,'engagement_index_norm_'+sub_cat], 
                 markers=True, lw=2, ax=ax1, color="green", marker='o', zorder=2, label = 'engagement_index',legend=0)
    sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
                 markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=0)



    ax1.set_xlabel('')
    ax1.set_ylabel('engagement_index', color='gray', size=14)
    ax2.set_ylabel('positiveIncreaseRate', color="gray", size=14)
    plt.title(sub_cat + ' vs engagement_index', fontsize=16, weight='bold', color="gray")

    ax1.tick_params(labelsize=14, width=0.5, length=1.5)
    ax2.tick_params(labelsize=14, width=0.5, length=1.5)

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ax1.set_xticklabels(months, rotation=90)
    
# One row per top Primary Essential Function category, in alphabetical order
# (gs has 7 rows, matching top_n = 7)
for row_no, sub_cat in enumerate(sorted(top_learning_funct)):
    plot_pct_and_index(sub_cat, row_no)

# common legends for figure
handles, labels = [(a + b) for a, b in zip(ax1.get_legend_handles_labels(), ax2.get_legend_handles_labels())]
fig.legend(handles, labels, loc='upper right', fontsize=16)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:80%;
           font-family:Verdana;
           letter-spacing:0.5px">
<h1 style="text-align: center;
           padding: 20px;
              color:purple">
<a id="Q4"> </a>
Q4. How does student engagement with online learning platforms relate to different geography? Demographic context (e.g., race/ethnicity, ESL, learning disability)? Learning context? Socioeconomic status?
</h1>
</div>

* How do the monthly means of `pct_access` and `engagement_index` change over time for districts grouped by `locale`, `pct_black/hispanic`, `pct_free/reduced`, `county_connections_ratio` and `pp_total_raw`?

* How does student engagement vary across locales?

* How does student engagement vary across districts with different Black/Hispanic population percentages?

* How does student engagement vary across districts of different economic status (`pct_free/reduced`)?
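The plots in this section compare each group's monthly curve with the national COVID-19 increase rate by eye. A minimal sketch of putting a number on that comparison, assuming a wide monthly table like `locale` and a rate series like `national_covid_mean['positiveIncreaseNorm']` indexed by the same month labels: `DataFrame.corrwith` returns the Pearson correlation of every group column with the rate after the two are joined on month. The helper name `corr_with_rate` and the toy values are hypothetical. With only about a dozen monthly points per column, such coefficients are descriptive at best and say nothing about causation.

```python
import pandas as pd


def corr_with_rate(wide: pd.DataFrame, rate: pd.Series) -> pd.Series:
    """Pearson correlation of each column of `wide` with `rate`, matched by index label."""
    # Inner join keeps only months present in both, paired by label
    joined = wide.join(rate.rename('rate'), how='inner')
    return joined.drop(columns='rate').corrwith(joined['rate'])


# Toy monthly table (made-up values); the rate series is in a different order
wide = pd.DataFrame({'pct_access_norm_City': [1.0, 2.0, 3.0],
                     'pct_access_norm_Rural': [3.0, 2.0, 1.0]},
                    index=['January', 'February', 'March'])
rate = pd.Series([10.0, 20.0, 30.0], index=['March', 'January', 'February'])

print(corr_with_rate(wide, rate))
```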

### How does student engagement vary across locales?

In [None]:
locale = engagement.groupby(['locale','month']).agg({'pct_access_norm': 'mean','engagement_index_norm': 'mean'})

locale.reset_index(inplace=True)

# Locales as column headers
locale = locale.pivot_table(['pct_access_norm','engagement_index_norm'], ['month'], 'locale')

# join different levels of columns by '_'
locale.columns = locale.columns.map('_'.join)

# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
locale.reset_index(inplace=True)
locale['month'] = pd.Categorical(locale['month'], categories=months, ordered=True)
locale.sort_values(by='month',inplace=True) 
locale.set_index('month', inplace=True)

locale

In [None]:

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 26), facecolor='#f6f5f5')
gs = fig.add_gridspec(2,1)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

title = 'Online learning rates in different locales'
plt.suptitle(title, fontsize=20, weight='bold')


# Graph 1 - pct_access
ax1 = fig.add_subplot(gs[0, 0])

locale_pct = locale[locale.columns[pd.Series(locale.columns).str.startswith('pct_access_norm_')]]

ax2 = ax1.twinx()

sns.lineplot(data=locale_pct, 
             markers=True, lw=2, ax=ax1, marker='o', zorder=2, legend=1)

sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
             markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=1)

ax1.set_xlabel('')
ax1.set_ylabel('pct_access', color='gray', size=14)
ax2.set_ylabel('positiveIncreaseRate', color="gray", size=14)
plt.title('pct_access rates in different locales', fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)
ax2.tick_params(labelsize=14, width=0.5, length=1.5)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax1.set_xticklabels(months, rotation=90)

# Graph 2 - engagement_index
ax1 = fig.add_subplot(gs[1, 0])

locale_index = locale[locale.columns[pd.Series(locale.columns).str.startswith('engagement_index_norm_')]]

ax2 = ax1.twinx()

sns.lineplot(data=locale_index, 
             markers=True, lw=2, linestyle='-', ax=ax1, marker='o', zorder=2, legend=1)

sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
             markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=1)

ax1.set_xlabel('')
ax1.set_ylabel('engagement_index', color='gray', size=14)
ax2.set_ylabel('positiveIncreaseRate', color="gray", size=14)
plt.title('engagement_index rates in different locales', fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)
ax2.tick_params(labelsize=14, width=0.5, length=1.5)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax1.set_xticklabels(months, rotation=90);
    
# # common legends for figure
# handles, labels = [(a + b) for a, b in zip(ax1.get_legend_handles_labels(), ax2.get_legend_handles_labels())]
# fig.legend(handles, labels, loc='upper right', fontsize=16)

### How does student engagement vary across districts with different Black/Hispanic population percentages?

In [None]:

race = engagement.groupby(['pct_black/hispanic','month']).agg({'pct_access_norm': 'mean','engagement_index_norm': 'mean'})

race.reset_index(inplace=True)

# pct_black/hispanic ranges as column headers
race = race.pivot_table(['pct_access_norm','engagement_index_norm'], ['month'], 'pct_black/hispanic')

# join different levels of columns by '_'
race.columns = race.columns.map('_'.join)

# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
race.reset_index(inplace=True)
race['month'] = pd.Categorical(race['month'], categories=months, ordered=True)
race.sort_values(by='month',inplace=True) 
race.set_index('month', inplace=True)

race

In [None]:

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 26), facecolor='#f6f5f5')
gs = fig.add_gridspec(2,1)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

title = 'Online learning rates in different race population distributions'
plt.suptitle(title, fontsize=20, weight='bold')


# Graph 1 - pct_access
ax1 = fig.add_subplot(gs[0, 0])

race_pct = race[race.columns[pd.Series(race.columns).str.startswith('pct_access_norm_')]]

ax2 = ax1.twinx()

sns.lineplot(data=race_pct, 
             markers=True, lw=2, ax=ax1, marker='o', zorder=2, legend=1)

sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
             markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=1)

ax1.set_xlabel('')
ax1.set_ylabel('pct_access', color='gray', size=14)
ax2.set_ylabel('positiveIncreaseRate', color="gray", size=14)
plt.title('pct_access rates in different race distribution areas', fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)
ax2.tick_params(labelsize=14, width=0.5, length=1.5)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax1.set_xticklabels(months, rotation=90)

# Graph 2 - engagement_index
ax1 = fig.add_subplot(gs[1, 0])

race_index = race[race.columns[pd.Series(race.columns).str.startswith('engagement_index_norm_')]]

ax2 = ax1.twinx()

sns.lineplot(data=race_index, 
             markers=True, lw=2, linestyle='-', ax=ax1, marker='o', zorder=2, legend=1)

sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
             markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=1)

ax1.set_xlabel('')
ax1.set_ylabel('engagement_index', color='gray', size=14)
ax2.set_ylabel('positiveIncreaseRate', color="gray", size=14)
plt.title('engagement_index rates in different race distribution areas', fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)
ax2.tick_params(labelsize=14, width=0.5, length=1.5)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax1.set_xticklabels(months, rotation=90);
    
# # common legends for figure
# handles, labels = [(a + b) for a, b in zip(ax1.get_legend_handles_labels(), ax2.get_legend_handles_labels())]
# fig.legend(handles, labels, loc='upper right', fontsize=16)

### How does student engagement vary across districts of different economic status (`pct_free/reduced`)?

In [None]:

economy = engagement.groupby(['pct_free/reduced','month']).agg({'pct_access_norm': 'mean','engagement_index_norm': 'mean'})

economy.reset_index(inplace=True)

# pct_free/reduced ranges as column headers
economy = economy.pivot_table(['pct_access_norm','engagement_index_norm'], ['month'], 'pct_free/reduced')

# join different levels of columns by '_'
economy.columns = economy.columns.map('_'.join)

# order months
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
economy.reset_index(inplace=True)
economy['month'] = pd.Categorical(economy['month'], categories=months, ordered=True)
economy.sort_values(by='month',inplace=True) 
economy.set_index('month', inplace=True)

economy

In [None]:

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 26), facecolor='#f6f5f5')
gs = fig.add_gridspec(2,1)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

title = 'Online learning rates in different Economic population distributions'
plt.suptitle(title, fontsize=20, weight='bold')


# Graph 1 - pct_access
ax1 = fig.add_subplot(gs[0, 0])

economy_pct = economy[economy.columns[pd.Series(economy.columns).str.startswith('pct_access_norm_')]]

ax2 = ax1.twinx()

sns.lineplot(data=economy_pct, 
             markers=True, lw=2, ax=ax1, marker='o', zorder=2, legend=1)

sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
             markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=1)

ax1.set_xlabel('')
ax1.set_ylabel('pct_access', color='gray', size=14)
ax2.set_ylabel('positiveIncreaseRate', color="gray", size=14)
plt.title('pct_access rates in different Economic distribution areas', fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)
ax2.tick_params(labelsize=14, width=0.5, length=1.5)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax1.set_xticklabels(months, rotation=90)

# Graph 2 - engagement_index
ax1 = fig.add_subplot(gs[1, 0])

economy_index = economy[economy.columns[pd.Series(economy.columns).str.startswith('engagement_index_norm_')]]

ax2 = ax1.twinx()

sns.lineplot(data=economy_index, 
             markers=True, lw=2, linestyle='-', ax=ax1, marker='o', zorder=2, legend=1)

sns.lineplot(data=national_covid_mean.loc[:,'positiveIncreaseNorm'], 
             markers=True, lw=2, ax=ax2, color="red", marker='o', zorder=2, label = 'Covid increase rate',legend=1)

ax1.set_xlabel('')
ax1.set_ylabel('engagement_index', color='gray', size=14)
ax2.set_ylabel('positiveIncreaseRate', color="gray", size=14)
plt.title('engagement_index rates in different Economic distribution areas', fontsize=16, weight='bold', color="gray")

ax1.tick_params(labelsize=14, width=0.5, length=1.5)
ax2.tick_params(labelsize=14, width=0.5, length=1.5)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax1.set_xticklabels(months, rotation=90);
    
# # common legends for figure
# handles, labels = [(a + b) for a, b in zip(ax1.get_legend_handles_labels(), ax2.get_legend_handles_labels())]
# fig.legend(handles, labels, loc='upper right', fontsize=16)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:white;
           font-size:80%;
           font-family:Verdana;
           letter-spacing:0.5px">
<h1 style="text-align: center;
           padding: 20px;
              color:purple">
<a id="Q5"> </a>
Q5. Do certain state interventions, practices or policies (e.g., stimulus, reopening, eviction moratorium) correlate with the increase or decrease online engagement?
</h1>
</div>
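The cells below shade each state's policy window over its daily `pct_access_norm` and `engagement_index_norm` curves. To go from shading to a number, one option is to compare a metric's mean inside the window with its mean outside it. A minimal sketch on a toy date-indexed series; `mean_inside_outside` is a hypothetical helper, and a missing end date is treated as an open-ended window, mirroring the `axvline` branch in those cells. A difference in means is only suggestive, since school calendar effects such as summer break can overlap with policy dates.

```python
import pandas as pd


def mean_inside_outside(series: pd.Series, start, end):
    """Mean of a DatetimeIndex-ed series inside [start, end] and outside it.

    A missing `end` (None/NaT) is treated as still in force at the last date.
    """
    start = pd.Timestamp(start)
    end = series.index.max() if pd.isnull(end) else pd.Timestamp(end)
    inside = (series.index >= start) & (series.index <= end)
    return series[inside].mean(), series[~inside].mean()


# Toy daily metric (made-up values)
days = pd.date_range('2020-03-01', periods=6, freq='D')
metric = pd.Series([1.0, 1.0, 5.0, 5.0, 5.0, 1.0], index=days)

print(mean_inside_outside(metric, '2020-03-03', '2020-03-05'))
```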

In [None]:
#pct_access_timeseries = engagement.groupby(['state']).agg({'pct_access': 'mean'})

# States as column headers
pct_access_timeseries = engagement.pivot_table(['pct_access_norm','engagement_index_norm'], ['time'], 'state')

# join different levels of columns by '_'
pct_access_timeseries.columns = pct_access_timeseries.columns.map('_'.join)

pct_access_timeseries

In [None]:

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 60), facecolor='#f6f5f5')
gs = fig.add_gridspec(len(statesList),1)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

# title = 'How National Covid Increase Rate affected different Online learning technologies?'
# plt.suptitle(title, fontsize=20, weight='bold')

for k in range(len(statesList)):
    
    ax1 = fig.add_subplot(gs[k, 0])

    # Look up this state's state-of-emergency start/end dates; fall back to NaT
    # so a state without a policy row does not reuse the previous state's dates
    try:
        dates_to_plot_1 = pd.to_datetime(state_policy["STEMERG"].loc[state_policy['STATE']==statesList[k]].values[0])
        dates_to_plot_2 = pd.to_datetime(state_policy["STEMERGEND"].loc[state_policy['STATE']==statesList[k]].values[0])
    except (IndexError, ValueError):
        dates_to_plot_1, dates_to_plot_2 = pd.NaT, pd.NaT

    ax2 = ax1.twinx()

    sns.lineplot(data=pct_access_timeseries.loc[:,'pct_access_norm_'+statesList[k]], 
                 markers=True, lw=2, ax=ax1, color="red", zorder=2, label = 'pct_access',legend=0)
    sns.lineplot(data=pct_access_timeseries.loc[:,'engagement_index_norm_'+statesList[k]], 
                 markers=True, lw=2, ax=ax2, color="blue", zorder=2, label = 'engagement_index',legend=0)
    
    if not pd.isnull(dates_to_plot_1) and not pd.isnull(dates_to_plot_2) :
        if dates_to_plot_2 > pd.Timestamp(2020, 12, 31):
            dates_to_plot_2 = pd.Timestamp(2020, 12, 31)
            
        ax1.axvspan(dates_to_plot_1, dates_to_plot_2, 
                   label="State of emergency",color="green", alpha=0.5)
    elif not pd.isnull(dates_to_plot_1) and pd.isnull(dates_to_plot_2):
        ax1.axvline(dates_to_plot_1, 
                   label="State of emergency declared",color="green", alpha=0.5)
    

    
    ax1.set_xlabel('')
    ax1.set_ylabel('pct_access', color='gray', size=14)
    ax2.set_ylabel('engagement_index', color='gray', size=14)
    plt.title(statesList[k], fontsize=16, weight='bold', color="gray")

    ax1.tick_params(labelsize=14, width=0.5, length=1.5)
    ax2.tick_params(labelsize=14, width=0.5, length=1.5)
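Both policy cells look up a state's start and end dates the same way. A minimal sketch of that lookup as one hypothetical helper (`policy_window` is not part of this notebook or of the policy database), assuming `state_policy` has a `STATE` column plus one column per date, as used in these cells. It returns `NaT` for a state with no row or an unparseable date, so a missing state can never inherit the dates of the state plotted before it, and it clips the end date to 2020-12-31 like the plotting loops. The toy states and dates are made up.

```python
import pandas as pd


def _to_timestamp(value):
    """Parse one cell to a Timestamp; missing or unparseable values become NaT."""
    if pd.isnull(value):
        return pd.NaT
    return pd.to_datetime(value, errors='coerce')


def policy_window(state_policy: pd.DataFrame, state: str, start_col: str,
                  end_col: str, clip_to: str = '2020-12-31'):
    """Return (start, end) for one state's policy window, end clipped to `clip_to`."""
    rows = state_policy.loc[state_policy['STATE'] == state]
    if rows.empty:
        return pd.NaT, pd.NaT
    start = _to_timestamp(rows[start_col].iloc[0])
    end = _to_timestamp(rows[end_col].iloc[0])
    if pd.notnull(end) and end > pd.Timestamp(clip_to):
        end = pd.Timestamp(clip_to)
    return start, end


# Toy policy table (made-up states and dates)
policy = pd.DataFrame({'STATE': ['State A', 'State B'],
                       'STAYHOME': ['2020-03-27', None],
                       'END_STHM': ['2021-05-01', None]})

for name in ['State A', 'State B', 'State C']:
    print(name, policy_window(policy, name, 'STAYHOME', 'END_STHM'))
```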

In [None]:

#plt.rcParams['figure.dpi'] = 600
fig = plt.figure(figsize=(20, 60), facecolor='#f6f5f5')
gs = fig.add_gridspec(len(statesList),1)
gs.update(wspace=0.3, hspace=0.4)

background_color = "#f6f5f5" # same as fig facecolor
sns.set_palette(['#ffd514'])

# title = 'How National Covid Increase Rate affected different Online learning technologies?'
# plt.suptitle(title, fontsize=20, weight='bold')

for k in range(len(statesList)):
    
    ax1 = fig.add_subplot(gs[k, 0])

    # Look up this state's stay-at-home start/end dates; fall back to NaT
    # so a state without a policy row does not reuse the previous state's dates
    try:
        dates_to_plot_1 = pd.to_datetime(state_policy["STAYHOME"].loc[state_policy['STATE']==statesList[k]].values[0])
        dates_to_plot_2 = pd.to_datetime(state_policy["END_STHM"].loc[state_policy['STATE']==statesList[k]].values[0])
    except (IndexError, ValueError):
        dates_to_plot_1, dates_to_plot_2 = pd.NaT, pd.NaT

    ax2 = ax1.twinx()

    sns.lineplot(data=pct_access_timeseries.loc[:,'pct_access_norm_'+statesList[k]], 
                 markers=True, lw=2, ax=ax1, color="red", zorder=2, label = 'pct_access',legend=0)
    sns.lineplot(data=pct_access_timeseries.loc[:,'engagement_index_norm_'+statesList[k]], 
                 markers=True, lw=2, ax=ax2, color="blue", zorder=2, label = 'engagement_index',legend=0)
    
    if not pd.isnull(dates_to_plot_1) and not pd.isnull(dates_to_plot_2) :
        if dates_to_plot_2 > pd.Timestamp(2020, 12, 31):
            dates_to_plot_2 = pd.Timestamp(2020, 12, 31)
            
        ax1.axvspan(dates_to_plot_1, dates_to_plot_2, 
                   label="Stay at home",color="yellow", alpha=0.5)
    elif not pd.isnull(dates_to_plot_1) and pd.isnull(dates_to_plot_2):
        ax1.axvline(dates_to_plot_1, 
                   label="Stay at home started",color="yellow", alpha=0.5)
    

    
    ax1.set_xlabel('')
    ax1.set_ylabel('pct_access', color='gray', size=14)
    ax2.set_ylabel('engagement_index', color='gray', size=14)
    plt.title(statesList[k], fontsize=16, weight='bold', color="gray")

    ax1.tick_params(labelsize=14, width=0.5, length=1.5)
    ax2.tick_params(labelsize=14, width=0.5, length=1.5)