# LearnPlatform COVID-19 Impact on Digital Learning

Nelson Mandela believed education was the most powerful weapon to change the world. But not every student has equal opportunities to learn. Effective policies and plans need to be enacted in order to make education more equitable—and perhaps your innovative data analysis will help reveal the solution.

Current research shows educational outcomes are far from equitable. The imbalance was exacerbated by the COVID-19 pandemic. There's an urgent need to better understand and measure the scope and impact of the pandemic on these inequities.

Education technology company LearnPlatform was founded in 2014 with a mission to expand equitable access to education technology for all students and teachers. LearnPlatform’s comprehensive edtech effectiveness system is used by districts and states to continuously improve the safety, equity, and effectiveness of their educational technology. LearnPlatform does so by generating an evidence basis for what’s working and enacting it to benefit students, teachers, and budgets.

In this analytics competition, you’ll work to uncover trends in digital learning. Accomplish this with data analysis about how engagement with digital learning relates to factors like district demographics, broadband access, and state/national level policies and events. Then, submit a Kaggle Notebook to propose your best solution to these educational inequities.

Your submissions will inform policies and practices that close the digital divide. With a better understanding of digital learning trends, you may help reverse the long-term learning loss among America’s most vulnerable, making education more equitable.

## Problem Statement

The COVID-19 pandemic has disrupted learning for more than 56 million students in the United States. In the spring of 2020, most states and local governments across the U.S. closed educational institutions to stop the spread of the virus. In response, schools and teachers have attempted to reach students remotely through distance learning tools and digital platforms. To this day, concerns about the worsening digital divide and long-term learning loss among America’s most vulnerable learners continue to grow.

## Challenge

We challenge the Kaggle community to explore (1) the state of digital learning in 2020 and (2) how engagement with digital learning relates to factors such as district demographics, broadband access, and state/national level policies and events.

We encourage you to guide the analysis with questions related to the two themes described above. Below are some examples of questions that relate to our problem statement:

* What is the picture of digital connectivity and engagement in 2020?
* What is the effect of the COVID-19 pandemic on online and distance learning, and how might this evolve in the future?
* How does student engagement with different types of education technology change over the course of the pandemic?
* How does student engagement with online learning platforms relate to geography? Demographic context (e.g., race/ethnicity, ESL, learning disability)? Learning context? Socioeconomic status?
* Do certain state interventions, practices or policies (e.g., stimulus, reopening, eviction moratorium) correlate with an increase or decrease in online engagement?

## Data Fetching

In [None]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
# print(dirname,filenames)
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        pass
#         print(os.path.join(dirname, filename))

## Importing Libraries

In [None]:
import glob
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import missingno as msno
import wandb
import math

import folium
from geopy.geocoders import Nominatim
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster

In [None]:
# Don't run! Exploratory cell that loads a random sample of 20 files.
# Note: `dirname` and `filenames` are whatever the last os.walk iteration above left behind.
import random
random.seed(0)

sample_frames = [pd.read_csv(os.path.join(dirname, file))
                 for file in random.sample(filenames, 20)]
df = pd.concat(sample_frames)
df

## Reading Data

In [None]:
path = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data' 
files = glob.glob(path + "/*.csv")

csv_list = []

for filename in files:
    df = pd.read_csv(filename, index_col=None, header=0)
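    # The file name without ".csv" is the district ID; this avoids depending on the path depth
    district_id = os.path.splitext(os.path.basename(filename))[0]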
    df["district_id"] = district_id
    csv_list.append(df)
    
engagement_data = pd.concat(csv_list)
engagement_data = engagement_data.reset_index(drop=True)
engagement_data.head()
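The loop above recovers each district ID from its file name. A minimal sketch of that step in isolation, assuming the files are named `<district_id>.csv`; `district_id_from_path` is a hypothetical helper and the example path is only illustrative.

```python
from pathlib import PurePosixPath


def district_id_from_path(path: str) -> str:
    """Return the file stem, e.g. '1000' for '.../engagement_data/1000.csv'."""
    return PurePosixPath(path).stem


# Illustrative path in the same layout as the `path` variable used above
example = "../input/learnplatform-covid19-impact-on-digital-learning/engagement_data/1000.csv"
print(district_id_from_path(example))
```

Taking the stem rather than a fixed position in the split path keeps the result the same if the input directory is nested differently.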

In [None]:
engagement_data = engagement_data.rename(columns = {'lp_id':'LP ID'})

In [None]:
district_data = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")
district_data.head()

In [None]:
district_data.info()

In [None]:
district_data = district_data[district_data.isnull().sum(axis=1)<5]
district_data

In [None]:
district_data.isnull().sum()

In [None]:
district_data.dtypes

In [None]:
product_data = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")
product_data.head()

In [None]:
product_data.info()

In [None]:
product_data.isnull().sum()/len(product_data.index)

In [None]:
df_merge = pd.merge(engagement_data, product_data, how='inner', on='LP ID')
df_merge['district_id'] = df_merge['district_id'].astype('int')
df_master = pd.merge(df_merge, district_data, how='inner', on='district_id')
df_master.head()
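Both merges above are inner joins, so engagement rows without a matching product or district are dropped silently. A hedged sketch of how one might count such rows before merging, using made-up toy frames that only borrow the `LP ID` column name from the data above.

```python
import pandas as pd

# Toy stand-ins for engagement_data and product_data (values are made up)
engagement = pd.DataFrame({"LP ID": [1, 2, 3], "engagement_index": [10.0, 5.0, 2.5]})
products = pd.DataFrame({"LP ID": [1, 2], "Product Name": ["A", "B"]})

# A left merge with indicator=True adds a `_merge` column that marks
# rows from the left frame that found no match on the right.
checked = engagement.merge(products, how="left", on="LP ID", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(engagement)} engagement rows have no product match")
```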

In [None]:

round(df_master.isnull().sum()/len(df_master.index),2)

In [None]:
df_master.dropna(axis=0,inplace=True)

In [None]:
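# Take an independent copy of the cleaned data
# (a plain slice can trigger SettingWithCopyWarning on later column assignments)
df_master_clean = df_master.copy()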

## Data Formatting

In [None]:
print('before:' ,df_master_clean['pct_black/hispanic'].unique())
def clean_x(x):
    # Values look like '[0.2, 0.4[': keep the upper bound of the interval as a float
    upper = x.split(',')[1].strip().rstrip('[')
    return float(upper)
    
df_master_clean['pct_black/hispanic'] = df_master_clean['pct_black/hispanic'].apply(clean_x)
print('after:' ,df_master_clean['pct_black/hispanic'].unique())

print('before:' ,df_master_clean['pct_free/reduced'].unique())
df_master_clean['pct_free/reduced'] = df_master_clean['pct_free/reduced'].apply(clean_x)
print('after:' ,df_master_clean['pct_free/reduced'].unique())
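`clean_x` keeps only the upper bound of each range. As an alternative sketch, assuming the values are half-open intervals written like `[0.2, 0.4[`, both bounds can be parsed so that a midpoint is available if preferred; `parse_interval` is a hypothetical helper, not part of the notebook.

```python
from typing import Tuple


def parse_interval(text: str) -> Tuple[float, float]:
    """Parse a string such as '[0.2, 0.4[' into (lower, upper)."""
    # Remove the surrounding brackets, then split on the comma
    lower, upper = text.strip("[]").split(",")
    return float(lower), float(upper)


lower, upper = parse_interval("[0.2, 0.4[")
midpoint = (lower + upper) / 2
print(lower, upper, midpoint)
```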

In [None]:
df_master_clean.drop(['county_connections_ratio','URL'],axis=1,inplace=True)

In [None]:
print('before:' ,df_master_clean['pp_total_raw'].unique())
df_master_clean['pp_total_raw'] = df_master_clean['pp_total_raw'].apply(lambda x: int(x[1:-1].split(',')[1].strip()))
print('after:' ,df_master_clean['pp_total_raw'].unique())

In [None]:
df_master_clean['pp_total_raw'] = df_master_clean['pp_total_raw'].astype('int')

In [None]:
df_master_clean.head()

In [None]:
round(df_master_clean.isnull().sum()/len(df_master_clean.index),2)

In [None]:
df_master_clean['time'] = pd.to_datetime(df_master_clean['time'], format='%Y-%m-%d')
# Map month numbers to names explicitly so labels stay correct even if a month is missing from the data
month_names = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df_master_clean['month'] = pd.Categorical(df_master_clean['time'].dt.month.map(lambda m: month_names[m - 1]),
                                          categories=month_names, ordered=True)
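The month column is stored as a category so that plots sort by calendar order rather than alphabetically. A small sketch on made-up dates, assuming an ordered categorical is acceptable; it builds the labels from month numbers, so it also works when some months are absent.

```python
import pandas as pd

month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up dates covering only two months
dates = pd.to_datetime(pd.Series(["2020-03-15", "2020-01-02", "2020-03-01"]))

# Month numbers 1..12 become category codes 0..11
months = pd.Categorical.from_codes(
    (dates.dt.month - 1).to_numpy(), categories=month_names, ordered=True
)
print(list(months))
print(sorted(set(months), key=month_names.index))
```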

## Data Analysis

In [None]:
df_master_clean.loc[:, ['LP ID', 'Product Name','district_id','Provider/Company Name']].value_counts()

## Demographic vs Engagement Analysis

In [None]:
# rename_axis + reset_index(name=...) gives the same column names across pandas versions
state_df = df_master_clean['state'].value_counts().rename_axis('state').reset_index(name='count')
plt.figure(figsize=(16,5))
ax = sns.barplot(x='state', y='count', data=state_df)
plt.xlabel('State')
plt.ylabel('Count')
plt.title('State Analysis')
ax.grid(True)


plt.show()
state_df.T

In [None]:
# rename_axis + reset_index(name=...) gives the same column names across pandas versions
locale_df = df_master_clean['locale'].value_counts().rename_axis('locale').reset_index(name='count')
plt.figure(figsize=(10,8))
ax = sns.barplot(x='locale', y='count', data=locale_df)
plt.xlabel('Locale')
plt.ylabel('Count')
plt.title('Locale Analysis')
ax.grid(True)
plt.show()
locale_df

In [None]:
d1 = df_master_clean.pivot_table(values='engagement_index',
                                            index='state',
                                            columns='locale',
                                            aggfunc='sum')

ax = d1.plot.bar(figsize=(20, 6), logy=True)
ax.grid(True)
plt.ylabel('Total engagement index (log scale)')
plt.xlabel('State and Locale')
plt.title("Engagement analysis by state and locale")
plt.show()
d1.T

In [None]:
d0 = df_master_clean.pivot_table(values='engagement_index',
                                            index='state',
                                            aggfunc='sum')

ax = d0.plot.bar(figsize=(20, 6), logy=True, color='g')
ax.grid(True)
plt.xlabel('State')
plt.ylabel('Total engagement index (log scale)')
plt.title('Engagement index by state')
plt.show()
d0.T

### Observation:

* Students in Utah and Illinois were highly engaged in digital learning.
* Students in suburban districts were highly engaged in digital learning.
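These observations rest on the summed `engagement_index`, which grows with the number of districts and records a state contributes. A minimal sketch, on made-up numbers, of comparing a per-district mean alongside the total; the toy frame only borrows the column names `state`, `district_id`, and `engagement_index` from the data above.

```python
import pandas as pd

# Made-up values: one state with three small districts, one with a single large one
toy = pd.DataFrame({
    "state": ["Utah", "Utah", "Utah", "Ohio"],
    "district_id": [1, 2, 3, 4],
    "engagement_index": [100.0, 100.0, 100.0, 250.0],
})

summary = toy.groupby("state").agg(
    total=("engagement_index", "sum"),
    districts=("district_id", "nunique"),
)
summary["mean_per_district"] = summary["total"] / summary["districts"]
print(summary)
```

In this toy case the state with the largest total is not the one with the largest per-district mean, which is why the two views are worth checking side by side.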

## Month vs Engagement Analysis

In [None]:
m1 = df_master_clean.pivot_table(values='engagement_index',
                                            index='month',
                                            aggfunc='sum')
ax = m1.plot.bar(figsize=(20, 6), logy=True, color='r')
ax.grid(True)
plt.xlabel('Months')
plt.ylabel('Total engagement index (log scale)')
plt.title("Engagement vs Month analysis")
plt.show()
# m1.T

m2 = df_master_clean.pivot_table(values='pct_access',
                                            index='month',
                                            aggfunc='sum')
ax = m2.plot.bar(figsize=(20, 6), logy=True, color='g')
ax.grid(True)
plt.xlabel('Months')
plt.ylabel('Sum of pct_access (log scale)')
plt.title("Percentage of students who accessed a product vs Month analysis")
plt.show()
# m2.T

## Product vs Engagement Analysis

In [None]:
d2 = df_master_clean.pivot_table(values='engagement_index',
                                            index='Product Name',
                                            aggfunc='sum')
d2_sorted = d2.sort_values(by='engagement_index',ascending=False).head(20)

ax = d2_sorted.plot.bar(figsize=(20, 6), logy=True)
ax.grid(True)
plt.xlabel('Products')
plt.ylabel('Total engagement index (log scale)')
plt.title("Product vs Engagement analysis (top 20)")
plt.show()
d2_sorted.T

## Provider Analysis

In [None]:
# rename_axis + reset_index(name=...) gives the same column names across pandas versions
comp_name_df = (df_master_clean['Provider/Company Name'].value_counts()
                .rename_axis('company').reset_index(name='count').head(20))

plt.figure(figsize=(10,12))
ax = sns.barplot(x='count', y='company', data=comp_name_df)
plt.xlabel('Count')
plt.ylabel('Companies')
plt.title('Top 20 Companies by Number of Engagement Records')
ax.grid(True)
plt.show()
# comp_name_df

## Sector Analysis

In [None]:
# rename_axis + reset_index(name=...) gives the same column names across pandas versions
sector_df = df_master_clean['Sector(s)'].value_counts().rename_axis('sector').reset_index(name='count')
plt.figure(figsize=(12,5))
ax = sns.barplot(x='sector', y='count', data=sector_df)
plt.ylabel('Count')
plt.xlabel('Sectors')
plt.title('Sectors Analysis')
ax.grid(True)
plt.show()
sector_df.T


## Primary Essential Function Analysis

In [None]:
# rename_axis + reset_index(name=...) gives the same column names across pandas versions
pef_df = df_master_clean['Primary Essential Function'].value_counts().rename_axis('function').reset_index(name='count')
# pef_df = pef_df.head(100)
plt.figure(figsize=(15,12))
ax = sns.barplot(x='count', y='function', data=pef_df)
plt.xlabel('Count')
plt.ylabel('Primary Essential Function')
plt.title('Primary Essential Function Analysis')
plt.xticks(rotation=90)
# plt.rc('font', size=20)  
ax.grid(True)
plt.show()
# pef_df

In [None]:
# Keep the category prefix (e.g. 'LC') and strip the space left before the dash
df_master_clean['Primary Essential Function categories'] = df_master_clean['Primary Essential Function'].apply(lambda x: x.split('-')[0].strip())
df_master_clean['Primary Essential Function categories'].value_counts()

In [None]:
# rename_axis + reset_index(name=...) gives the same column names across pandas versions
pef_df1 = df_master_clean['Primary Essential Function categories'].value_counts().rename_axis('category').reset_index(name='count')
# pef_df1 = pef_df.head(10)
plt.figure(figsize=(10,6))
ax = sns.barplot(x='category', y='count', data=pef_df1)
plt.ylabel('Count')
plt.xlabel('Primary Essential Function')
plt.title('Primary Essential Function Analysis')
plt.xticks(rotation=90)
# plt.rc('font', size=20)  
ax.grid(True)
plt.show()

The Learning & Curriculum (LC) category has the highest count, i.e. the most engagement records.
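The category prefix used above comes from splitting the `Primary Essential Function` string on its dash. A hedged sketch of keeping both the prefix and the remainder, assuming values follow a `PREFIX - description` pattern; the sample strings are illustrative.

```python
import pandas as pd

# Illustrative strings in the "PREFIX - description" format
pef = pd.Series([
    "LC - Digital Learning Platforms",
    "CM - Classroom Engagement & Instruction - Classroom Management",
    "SDO - Data, Analytics & Reporting - Student Information Systems (SIS)",
])

# Split on the first " - " only, so later dashes stay in the sub-category
parts = pef.str.split(" - ", n=1, expand=True)
parts.columns = ["category", "subcategory"]
print(parts)
```

Splitting on `" - "` with `n=1` also avoids the trailing space that a plain split on `-` leaves on the prefix.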

In [None]:
# rename_axis + reset_index(name=...) gives the same column names across pandas versions
type_df = df_master_clean['pct_black/hispanic'].value_counts().rename_axis('pct_black/hispanic').reset_index(name='count')
# pef_df1 = pef_df.head(10)
plt.figure(figsize=(10,6))
ax = sns.barplot(x='pct_black/hispanic', y='count', data=type_df)
plt.ylabel('Count')
plt.xlabel('pct_black/hispanic')
plt.title('pct_black/hispanic Analysis')
plt.xticks(rotation=90) 
ax.grid(True)
plt.show()

### Work in progress..

In [None]:
type_df1 = df_master_clean.pivot_table(values='engagement_index',
                                            index='pct_black/hispanic',
                                            aggfunc='sum')

type_df1.reset_index(level=0,inplace=True)

plt.figure(figsize=(10,6))
ax = sns.barplot(y='engagement_index', x='pct_black/hispanic', data=type_df1)
plt.ylabel('Total engagement index')
plt.xlabel('pct_black/hispanic')
plt.title('Engagement index by pct_black/hispanic')
plt.xticks(rotation=90) 
ax.grid(True)
plt.show()