# LearnPlatform COVID-19 Impact on Digital Learning
Use digital learning data to analyze the impact of COVID-19 on student learning

* Challenge
We challenge the Kaggle community to explore (1) the state of digital learning in 2020 and (2) how the engagement of digital learning relates to factors such as district demographics, broadband access, and state/national level policies and events.

* We encourage you to guide the analysis with questions related to the themes described above. Below are some examples of questions that relate to our problem statement:
    * What is the picture of digital connectivity and engagement in 2020?
    * What is the effect of the COVID-19 pandemic on online and distance learning, and how might this also evolve in the future?
    * How does student engagement with different types of education technology change over the course of the pandemic?
    * How does student engagement with online learning platforms relate to different geography? Demographic context (e.g., race/ethnicity, ESL, learning disability)? Learning context? Socioeconomic status?
    * Do certain state interventions, practices or policies (e.g., stimulus, reopening, eviction moratorium) correlate with an increase or decrease in online engagement?

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# data visualization and plotting
import matplotlib.pyplot as plt 
import seaborn as sns

# Products
products = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")

# Districts
districts = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")

# Engagement (a single district file as a first look; all districts are merged in A.3)
engagement = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data/1000.csv")


In [None]:
districts.head()

> #  A. CLEANING AND PREPROCESSING 

## A.1. Dropping Rows in Districts with NaN States

In [None]:
districts = districts[districts.state.notna()].reset_index(drop=True)
districts

## A.2. Splitting the Primary Essential Function into Main and Sub Functions

In [None]:
# Values look like "Main - Sub"; NaN entries stay NaN, and a value without ' - ' gets a NaN sub function
function_parts = products['Primary Essential Function'].str.split(' - ')
products['primary_function_main'] = function_parts.str[0]
products['primary_function_sub'] = function_parts.str[1]
products
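
The behaviour of this split can be checked on a few values before running it on the full table. The snippet below is a minimal sketch on made-up sample strings (the values are assumptions modelled on the "Main - Sub" pattern, not rows copied from `products_info.csv`), using the pandas `.str` accessor so that missing values pass through as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, modelled on the "Main - Sub" pattern of the column
sample = pd.DataFrame({
    "Primary Essential Function": [
        "LC - Digital Learning Platforms",
        "CM - Virtual Classroom - Video Conferencing & Screen Sharing",
        np.nan,
    ]
})

# Split once, then pick the first and second piece; NaN rows remain NaN
parts = sample["Primary Essential Function"].str.split(" - ")
sample["primary_function_main"] = parts.str[0]
sample["primary_function_sub"] = parts.str[1]

print(sample[["primary_function_main", "primary_function_sub"]])
```

Note that a value with three parts keeps only its second part as the sub function, so any third level is dropped.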

## A.3. Merging engagement data from all districts

In [None]:
engagement_dir = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data'

temp = []

# Read one CSV per remaining district and tag its rows with the district id
for district in districts.district_id.unique():
    df = pd.read_csv(f'{engagement_dir}/{district}.csv', index_col=None, header=0)
    df["district_id"] = district
    temp.append(df)

engagement = pd.concat(temp)
engagement = engagement.reset_index(drop=True)
engagement
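
The read, tag and concatenate pattern used here can be tried without the Kaggle files. The following is a small sketch in which two in-memory CSV strings stand in for per-district files; the `fake_files` dictionary and its values are made up for illustration, and only the column names follow the engagement data.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two per-district CSV files, keyed by district id
fake_files = {
    1000: "time,lp_id,pct_access,engagement_index\n"
          "2020-01-01,1,0.5,10.0\n",
    2000: "time,lp_id,pct_access,engagement_index\n"
          "2020-01-01,1,0.2,4.0\n"
          "2020-01-02,1,0.3,6.0\n",
}

frames = []
for district_id, text in fake_files.items():
    df = pd.read_csv(io.StringIO(text))
    df["district_id"] = district_id  # remember which file each row came from
    frames.append(df)

# ignore_index=True renumbers the rows 0..n-1 in one step
merged = pd.concat(frames, ignore_index=True)
print(merged)
```

Passing `ignore_index=True` to `pd.concat` gives the same result as concatenating and then calling `reset_index(drop=True)`.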

> # EDA (Exploratory Data Analysis)

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(16,8))

sns.countplot(data=districts, x='pct_black/hispanic', order=['[0, 0.2[', '[0.2, 0.4[', '[0.4, 0.6[', '[0.6, 0.8[','[0.8, 1[', ], palette='BuGn', ax=ax[0,0])
ax[0,0].set_ylim([0,135])

sns.countplot(data=districts, x='pct_free/reduced', order=['[0, 0.2[', '[0.2, 0.4[', '[0.4, 0.6[', '[0.6, 0.8[','[0.8, 1[', ],palette='BuPu', ax=ax[0,1])
ax[0,1].set_ylim([0,60])

sns.countplot(data=districts, x='county_connections_ratio', palette='autumn', ax=ax[1,0])
ax[1,0].set_ylim([0,135])

sns.countplot(data=districts, x='pp_total_raw', order=['[4000, 6000[', '[6000, 8000[', '[8000, 10000[', '[10000, 12000[','[12000, 14000[', '[14000, 16000[', '[16000, 18000[', '[18000, 20000[', '[20000, 22000[', '[22000, 24000[', ], 
palette='Pastel1', ax=ax[1,1])
ax[1,1].set_ylim([0,60])
ax[1,1].tick_params(axis='x', labelrotation=90)  # rotate labels without resetting them

plt.tight_layout()
plt.show()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20,6))
sns.countplot(data=products, x='primary_function_main', palette ='gist_earth', ax=ax[0])
ax[0].set_title('Main Categorical Count in Primary Functions')

sns.countplot(data=products[products.primary_function_main == 'LC'], x='primary_function_sub', palette ='gist_earth_r', ax=ax[1])
ax[1].set_title('Sub-Categories Count in Primary Function LC')
ax[1].tick_params(axis='x', labelrotation=90)  # rotate labels without resetting them
plt.show()

## pct_access: percentage of students in the district who have at least one page-load event of a given product on a given day
## engagement_index: total page-load events per 1000 students of a given product on a given day

In [None]:
virtual_classroom_lp_id = products[products.primary_function_sub == 'Virtual Classroom']['LP ID'].unique()

# Remove weekends from the dataframe
engagement['weekday'] = pd.DatetimeIndex(engagement['time']).weekday
engagement_without_weekends = engagement[engagement.weekday < 5]

# Figure 1
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(24, 6))
for virtual_classroom_product in virtual_classroom_lp_id:
    temp = engagement_without_weekends[engagement_without_weekends.lp_id == virtual_classroom_product].groupby('time').pct_access.mean().to_frame().reset_index(drop=False)
    sns.lineplot(x=temp.time, y=temp.pct_access, label=products[products['LP ID'] == virtual_classroom_product]['Product Name'].values[0])
plt.legend()
plt.show()

# Figure 2
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(24, 6))
for virtual_classroom_product in virtual_classroom_lp_id:
    temp = engagement_without_weekends[engagement_without_weekends.lp_id == virtual_classroom_product].groupby('time').engagement_index.mean().to_frame().reset_index(drop=False)
    sns.lineplot(x=temp.time, y=temp.engagement_index, label=products[products['LP ID'] == virtual_classroom_product]['Product Name'].values[0])
plt.legend()
plt.show()

# Insights from the analysis above:

* The home schooling phase starts at the beginning of March.
* In July and August there are summer holidays and therefore no classes to attend.
* After the summer holidays, pct_access rises to a higher level than was observed at the beginning of the pandemic and then stays roughly constant.
* There are a few drops in pct_access visible throughout the year, which coincide with national or other holidays.
* Zoom and Meet are the two most popular products for virtual classrooms.
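
The curves behind these observations come from two steps: dropping weekend rows and averaging `pct_access` over districts for each day. A minimal sketch of those two steps on a toy frame is given below; all rows are made up, and only the column names match the engagement data.

```python
import pandas as pd

# Hypothetical engagement rows for one product across two districts
toy = pd.DataFrame({
    "time": ["2020-03-06", "2020-03-06", "2020-03-07", "2020-03-09", "2020-03-09"],
    "lp_id": [1, 1, 1, 1, 1],
    "district_id": [1000, 2000, 1000, 1000, 2000],
    "pct_access": [2.0, 4.0, 0.5, 3.0, 5.0],
})

# Parse dates so that weekday() is available (Monday=0 ... Sunday=6)
toy["time"] = pd.to_datetime(toy["time"])
toy["weekday"] = toy["time"].dt.weekday

# Keep Monday to Friday only, then average over districts per day
weekdays_only = toy[toy["weekday"] < 5]
daily_mean = weekdays_only.groupby("time")["pct_access"].mean()
print(daily_mean)
```

Here 2020-03-07 is a Saturday, so it is removed before averaging and does not show up as an artificial dip in the daily series.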

# Below we can see the top 5 most accessed products for each LC sub-category, sorted by the mean pct_access for 2020 over all districts.

In [None]:
products['lp_id'] = products['LP ID'].copy()

f, ax = plt.subplots(nrows=3, ncols=3, figsize=(18, 8))

i = 0
j = 0
for subfunction in products[products.primary_function_main == 'LC'].primary_function_sub.unique():
    print(subfunction)
    lp_ids = products[products.primary_function_sub == subfunction]['LP ID'].unique()

    temp = engagement_without_weekends[engagement_without_weekends.lp_id.isin(lp_ids)]
    temp = temp.groupby('lp_id').pct_access.mean().sort_values(ascending=False).to_frame().reset_index(drop=False)
    temp = temp.merge(products[['lp_id', 'Product Name']], on='lp_id').head()
    
    sns.barplot(data=temp, x='pct_access', y='Product Name', palette='spring', ax=ax[i, j])
    
    ax[i, j].set_title(f'Top 5 in \n{subfunction}', fontsize=12)
    ax[i, j].set_xlim([0, 20])
    j = j + 1
    if j == 3:
        i = i + 1
        j = 0
        
f.delaxes(ax[2, 1])
f.delaxes(ax[2, 2])

plt.tight_layout()
plt.show()
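
The ranking step inside the loop (mean `pct_access` per product, sorted, then joined with product names) can be sketched on toy data as follows. The product names and numbers are invented for illustration, and `nlargest` is used here as a compact stand-in for sorting followed by `head()`.

```python
import pandas as pd

# Hypothetical product table and engagement rows
toy_products = pd.DataFrame({
    "lp_id": [1, 2, 3],
    "Product Name": ["Product A", "Product B", "Product C"],
})
toy_engagement = pd.DataFrame({
    "lp_id": [1, 1, 2, 2, 3],
    "pct_access": [1.0, 3.0, 5.0, 7.0, 0.5],
})

# Mean pct_access per product, keep the two highest, then attach names
top = (
    toy_engagement.groupby("lp_id")["pct_access"]
    .mean()
    .nlargest(2)
    .reset_index()
    .merge(toy_products, on="lp_id", how="left")
)
print(top)
```

A left merge keeps the row order of the ranked frame, so the product with the highest mean stays in the first row.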

# Summary

* Depending on what you want to achieve, you might want to carefully preselect districts. Note that the approach in this notebook might not necessarily suit your individual purposes.
* When looking at digital learning, you might want to spend some time figuring out which districts actually applied digital learning during COVID-19.
* Digital learning has become a very useful alternative for gaining knowledge during COVID-19, and it is worth examining how it can be improved for such difficult times.