# Covid-19 Impact On Digital Learning

## Introduction

**Engagement Data**

* **lp_id**: The unique identifier of the product, used to link to product information in products_info.csv.
* **pct_access**: Percentage of students in the district that have at least one page-load event for a given product on a given day.
* **engagement_index**: Total page-load events per one thousand students for a given product on a given day.
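
To make the two engagement metrics concrete, here is a minimal sketch that computes them from raw daily counts for one product in one district. The function name `engagement_metrics` and the counts are hypothetical; the dataset itself only ships the already-aggregated values.

```python
def engagement_metrics(page_loads: int, students_with_load: int, total_students: int) -> tuple[float, float]:
    """Return (pct_access, engagement_index) for one product on one day.

    pct_access: percentage of students with at least one page-load event.
    engagement_index: page-load events per 1,000 students.
    """
    if total_students <= 0:
        raise ValueError("total_students must be positive")
    pct_access = 100 * students_with_load / total_students
    engagement_index = 1000 * page_loads / total_students
    return pct_access, engagement_index


# Hypothetical district: 2,000 students, 150 of whom loaded the product 600 times in total.
pct, idx = engagement_metrics(page_loads=600, students_with_load=150, total_students=2000)
print(pct, idx)  # 7.5 300.0
```

Note that the two metrics can move independently: a few heavy users raise `engagement_index` without changing `pct_access`.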

**District Information Data**

* **locale**: NCES locale classification that categorizes U.S. territory into four types of areas: City, Suburban, Town, and Rural. 
* **pct_black/hispanic**: Percentage of students in the district identified as Black or Hispanic, based on 2018-19 NCES data.
* **pct_free/reduced**: Percentage of students in the district eligible for free or reduced-price lunch, based on 2018-19 NCES data.
* **county_connections_ratio**: Ratio of residential fixed high-speed connections (over 200 kbps in at least one direction) to households, at the county level.
* **pp_total_raw**: Per-pupil total expenditure (sum of local and federal expenditure).
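
In the published `districts_info.csv`, several of these columns (for example `pct_black/hispanic` and `pp_total_raw`) appear to be stored as interval strings such as `[0.2, 0.4[` rather than as numbers; treat that format as an assumption and confirm it with `district.head()`. The hypothetical helper `bucket_midpoint` below turns such a string into a numeric midpoint so the column can be averaged or plotted on a continuous axis.

```python
import math
import re

# Assumed bucket format: "[low, high[" (a half-open interval written French-style).
_BUCKET = re.compile(r"^\[\s*([-+]?\d*\.?\d+)\s*,\s*([-+]?\d*\.?\d+)\s*\[$")


def bucket_midpoint(value):
    """Return the midpoint of an interval string like '[0.2, 0.4[', or NaN if it does not parse."""
    if not isinstance(value, str):
        return math.nan  # missing values (NaN floats) and anything non-string
    match = _BUCKET.match(value.strip())
    if match is None:
        return math.nan
    low, high = float(match.group(1)), float(match.group(2))
    return (low + high) / 2


print(bucket_midpoint("[14000, 16000["))  # 15000.0
```

Applied to the notebook's table this would look like `district['pp_total_mid'] = district['pp_total_raw'].map(bucket_midpoint)`, where `pp_total_mid` is a hypothetical new column name.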

**Product Information Data**

* **Primary Essential Function**: The basic function of the product. There are two layers of labels here. Products are first labeled as one of three categories: LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations. Each of these categories has multiple sub-categories, which form the second layer of labels.
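
The two-layer label can be read apart with a small parser. This sketch assumes labels follow a `CODE - Sub-category` pattern separated by the first ` - `; the example label and the names `CATEGORY_NAMES` and `split_function` are illustrative, not part of the dataset.

```python
# Category abbreviations as listed in the product information description.
CATEGORY_NAMES = {
    "LC": "Learning & Curriculum",
    "CM": "Classroom Management",
    "SDO": "School & District Operations",
}


def split_function(label: str) -> tuple[str, str]:
    """Split a label such as 'LC - Digital Learning Platforms' into (category, sub-category).

    The category abbreviation is expanded to its full name when it is known;
    unknown prefixes are returned unchanged.
    """
    head, _, tail = label.partition(" - ")
    head = head.strip()
    return CATEGORY_NAMES.get(head, head), tail.strip()


print(split_function("LC - Digital Learning Platforms"))
# ('Learning & Curriculum', 'Digital Learning Platforms')
```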

In [None]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from joblib import Parallel, delayed

In [None]:
# import data
products = pd.read_csv('../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv')
district = pd.read_csv('../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv')

In [None]:
products.head()

In [None]:
district.head()

In [None]:
# check for null values
print(products.isna().sum()) 
print('-'*30)
print(district.isna().sum())
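
Raw null counts are easier to compare across tables of different sizes when they are also expressed as a percentage of rows. A minimal sketch on a small made-up frame; the helper name `missing_report` and the sample values are illustrative only.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return null count and null percentage per column, most-missing first."""
    report = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().mul(100).round(1),
    })
    return report.sort_values("n_missing", ascending=False)


# Illustrative data, not taken from the competition files.
sample = pd.DataFrame({
    "state": ["Utah", None, "Ohio", None],
    "locale": ["City", "Rural", None, "Town"],
    "district_id": [1, 2, 3, 4],
})
print(missing_report(sample))
```

Calling `missing_report(district)` in the notebook would give the same counts as above plus the share of districts affected.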

## Data Visualization

### Products

In [None]:
# Pie chart for sectors
fig = px.pie(products, names='Sector(s)', height=400, width=1000, template='plotly_dark+presentation', title='Distribution of Sector')
fig.show()

In [None]:
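df = products['Primary Essential Function'].dropna().reset_index(drop=True)
# split "LC - Sub-category" style labels and strip the spaces around each part
categories = df.str.split('-').apply(lambda parts: [part.strip() for part in parts])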

In [None]:
category = []
subcategory = []
for i in range(len(categories)):
    category.append(categories[i][0])
    
    for cat in categories[i][1:]:
        subcategory.append(cat)
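
The same category/sub-category split can also be done without an explicit loop, using pandas string methods and `explode`. This sketch runs on a few illustrative labels (not the real product list) and splits on `' - '` so that a hyphen inside a sub-category name would stay intact; that separator is an assumption about the label format. The `toy_` prefixes keep it from overwriting the notebook's `category` and `subcategory` lists.

```python
import pandas as pd

# Illustrative labels in the "CODE - Sub-category" shape.
functions = pd.Series([
    "LC - Digital Learning Platforms",
    "CM - Classroom Engagement & Instruction",
    "LC - Study Tools",
    None,
])

parts = functions.dropna().str.split(" - ")
toy_category = parts.str[0].str.strip()
# Everything after the first part is a sub-category; explode gives one row per sub-category.
toy_subcategory = parts.map(lambda p: p[1:]).explode().dropna().str.strip()

print(toy_category.value_counts())
print(toy_subcategory.value_counts())
```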

In [None]:
# pie chart for category
fig = px.pie(names=category, height=400, width=1000, template='plotly_dark+presentation', title='Distribution of Category')
fig.show()

In [None]:
# horizontal bar chart of subcategory counts
title = 'Count of Subcategory'
fig = px.histogram(y=subcategory, template='plotly_dark+presentation', title=title, labels={'y': 'Subcategory'}).update_yaxes(categoryorder='total ascending')
fig.show()

In [None]:
# top providers/companies by number of products
fig = px.histogram(products, y='Provider/Company Name', height=400, template='plotly_dark+presentation', title='Top Providers/Companies').update_yaxes(categoryorder='max descending', range=(-1, 8))
fig.show()

### District

In [None]:
def plot_sunburst(df, col, path, title):
    """Drop rows with missing values in `col`, then draw a sunburst over `path`."""
    df = df.loc[:, col].dropna()
    fig = px.sunburst(df, path=path, template='plotly_dark+presentation', title=title)
    fig.show()

In [None]:
plot_sunburst(district, ['state', 'locale'], ['locale', 'state'], title='Districts by state, per locale')

In [None]:
plot_sunburst(district, ['pct_black/hispanic', 'locale'], ['locale', 'pct_black/hispanic'],
              title='Districts by share of Black or Hispanic students, per locale')

In [None]:
plot_sunburst(district, ['pct_free/reduced', 'locale'], ['locale', 'pct_free/reduced'],
              title='Districts by share of students eligible for free or reduced-price lunch, per locale')

In [None]:
plot_sunburst(district, ['pp_total_raw', 'locale'], ['locale', 'pp_total_raw'], title='Districts by per-pupil total expenditure (pp_total_raw), per locale')
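
When no `values` argument is passed, `px.sunburst` sizes each sector by the number of rows that fall under it. The sketch below computes those same counts with `groupby(...).size()` on a made-up frame, which is a simple way to check a chart's numbers or report them as a table.

```python
import pandas as pd

# Toy district table; the real one comes from districts_info.csv.
toy = pd.DataFrame({
    "locale": ["City", "City", "Rural", "Town", "Rural", None],
    "state": ["Utah", "Ohio", "Utah", "Utah", "Utah", "Ohio"],
})

clean = toy.dropna()  # same row filtering as the sunburst helper
inner_ring = clean.groupby("locale").size()              # one sector per locale
outer_ring = clean.groupby(["locale", "state"]).size()   # one sector per (locale, state)

print(inner_ring)
print(outer_ring)
```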

### Engagement data

In [None]:
def plot_engagement(dist_id):
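
    # Load the engagement file for the requested district
    engage = pd.read_csv(f'../input/learnplatform-covid19-impact-on-digital-learning/engagement_data/{dist_id}.csv')

    # Fill missing values with 0 (a missing metric is treated as no activity)
    engage.fillna(0, inplace=True)

    engage['time'] = pd.to_datetime(engage['time'])

    fig = px.histogram(engage, x='time', y='engagement_index', height=400, template='plotly_dark+presentation', histfunc='avg',
                       title=f'Average engagement_index over time (district {dist_id})')
    fig.show()

    fig = px.histogram(engage, x='time', y='pct_access', height=400, template='plotly_dark+presentation', histfunc='avg',
                       title=f'Average pct_access over time (district {dist_id})')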
    fig.show()
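
Histogram bins over `time` are chosen automatically, so their width can differ between plots. An alternative, sketched below on synthetic data, is to resample to a fixed calendar period before plotting. The helper `weekly_mean` is hypothetical and the weekly frequency is an arbitrary choice; pandas' `'W'` alias closes weeks on Sunday.

```python
import numpy as np
import pandas as pd


def weekly_mean(engage: pd.DataFrame, column: str) -> pd.Series:
    """Average `column` over calendar weeks ending on Sunday."""
    series = engage.assign(time=pd.to_datetime(engage["time"])).set_index("time")[column]
    return series.resample("W").mean()


# Synthetic data: 14 consecutive days starting Monday 2020-03-02, one row per day.
days = pd.date_range("2020-03-02", periods=14, freq="D")
toy = pd.DataFrame({"time": days.strftime("%Y-%m-%d"), "engagement_index": np.arange(14, dtype=float)})

weekly = weekly_mean(toy, "engagement_index")
print(weekly)
```

The resulting series can be passed straight to `px.line(x=weekly.index, y=weekly.values)`.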

In [None]:
plot_engagement(1000)

In [None]:
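def merge_engage(dist_id):
    """Aggregate one district's engagement file per day and attach its district attributes."""
    df = pd.read_csv(f'../input/learnplatform-covid19-impact-on-digital-learning/engagement_data/{dist_id}.csv')

    # One row per day: mean pct_access and total engagement_index across all products
    group = df.groupby(['time']).agg(pct_access=('pct_access', 'mean'),
                                     engagement_index=('engagement_index', 'sum')).reset_index()

    group['district_id'] = dist_id

    # Start from the per-day rows so only this district's days are kept
    # (merging from `district` with how='left' would add a NaN row for every other district)
    dist_group = group.merge(district, on=['district_id'], how='left')

    return dist_group

def get_dataset(dist_ids):
    """Build the per-day table for every district in parallel and stack the results."""
    dist_stat = Parallel(n_jobs=-1)(
        delayed(merge_engage)(dist_id)
        for dist_id in dist_ids
    )

    district_df = pd.concat(dist_stat, ignore_index=True)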

    return district_df
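
The direction of the merge decides how many rows each district contributes. Starting from the per-day table and left-joining the district attributes keeps one row per day for that district only, whereas left-joining from the full district table also emits an all-NaN row for every other district. A toy check with made-up ids and values:

```python
import pandas as pd

district_toy = pd.DataFrame({"district_id": [1000, 2000, 3000], "state": ["Utah", "Ohio", "Utah"]})
daily_toy = pd.DataFrame({
    "time": ["2020-01-01", "2020-01-02"],
    "pct_access": [1.5, 2.0],
    "district_id": [1000, 1000],
})

# Per-day rows first: one row per day for district 1000 only.
per_day = daily_toy.merge(district_toy, on="district_id", how="left")

# District table first: district 1000's days plus a NaN row each for 2000 and 3000.
per_district = district_toy.merge(daily_toy, on="district_id", how="left")

print(len(per_day), len(per_district))  # 2 4
```

The NaN rows are ignored by `groupby(...).mean()`, so averages come out the same either way, but the stacked table grows roughly with the square of the number of districts.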

In [None]:
dist_ids = list(district['district_id'])
district_df = get_dataset(dist_ids)

In [None]:
group_locale = district_df.groupby(['locale', 'state']).agg(pct_access=('pct_access', 'mean'),
                                                            engagement_index=('engagement_index', 'mean')).reset_index()

In [None]:
# (state, locale) pairs with the highest mean engagement index
top = 50
high_engage = group_locale.sort_values(by='engagement_index', ascending=False).head(top)
fig = px.bar(high_engage, y='state', x='engagement_index', color='locale', height=700, template='plotly_dark+presentation', title='Top (state, locale) pairs by engagement index').update_yaxes(categoryorder='total ascending')
fig.show()

In [None]:
# (state, locale) pairs with the highest mean pct_access
top = 50
high_pct_access = group_locale.sort_values(by='pct_access', ascending=False).head(top)
fig = px.bar(high_pct_access, y='state', x='pct_access', color='locale', height=700, template='plotly_dark+presentation', title='Top (state, locale) pairs by pct_access').update_yaxes(categoryorder='total ascending')
fig.show()
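
Because `group_locale` has one row per (locale, state) pair, the same state can appear in a top-N list more than once. If one bar per state is wanted, one option is to average over locales first and then take `nlargest`. This gives each locale equal weight regardless of how many districts it has, which is an assumption worth stating. Sketch on a toy stand-in:

```python
import pandas as pd

# Toy stand-in for group_locale: one row per (locale, state) pair.
toy = pd.DataFrame({
    "locale": ["City", "Rural", "City", "Suburb"],
    "state": ["Utah", "Utah", "Ohio", "Ohio"],
    "engagement_index": [400.0, 200.0, 350.0, 150.0],
})

per_state = toy.groupby("state", as_index=False)["engagement_index"].mean()
top_states = per_state.nlargest(2, "engagement_index")
print(top_states)
```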

In [None]:
# map full state names to two-letter codes for the choropleth maps
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District Of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}

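group_locale['state_codes'] = group_locale['state'].map(us_state_abbrev)

# group_locale has one row per (locale, state) pair; average over locales so
# each state is drawn once rather than once per locale
state_engagement = group_locale.groupby('state_codes', as_index=False)['engagement_index'].mean()

fig = px.choropleth(state_engagement,
                    locations='state_codes',
                    locationmode='USA-states',
                    color='engagement_index',
                    scope='usa',
                    title='Average engagement index by state')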
fig.show()
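
A plain dictionary lookup raises `KeyError` on any state spelling the dictionary lacks, while `Series.map` silently yields `NaN` for it. Either way, it helps to list unmatched names explicitly before drawing the map. The helper name `unmapped_states` and the sample names below are illustrative.

```python
import pandas as pd


def unmapped_states(states: pd.Series, abbrev: dict) -> list:
    """Return the distinct, non-null state names that have no entry in `abbrev`."""
    names = states.dropna().unique()
    return sorted(name for name in names if name not in abbrev)


abbrev = {"Utah": "UT", "Ohio": "OH", "District Of Columbia": "DC"}
states = pd.Series(["Utah", "Ohio", "District of Columbia", None, "Utah"])
print(unmapped_states(states, abbrev))  # ['District of Columbia']
```

In the notebook this would be `unmapped_states(group_locale['state'], us_state_abbrev)`; an empty list means every state found a code.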

In [None]:
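# one row per state: average pct_access over that state's locales
state_access = group_locale.groupby('state_codes', as_index=False)['pct_access'].mean()

fig = px.choropleth(state_access,
                    locations='state_codes',
                    locationmode='USA-states',
                    color='pct_access',
                    scope='usa',
                    title='Average pct_access by state')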
fig.show()