# Tasks

- Describe the state of digital learning in 2020.
- Examine how engagement with digital learning relates to factors such as district demographics, broadband access, and state- and national-level policies and events.

## Additional Tasks

- What is the effect of the COVID-19 pandemic on online and distance learning, and how might it evolve in the future?
- How does student engagement with different types of education technology change over the course of the pandemic?

In [None]:
import pandas as pd
import numpy as np

import os

import matplotlib.pyplot as plt

import seaborn as sns

%matplotlib inline

plt.rc("figure", autolayout=True)

### Engagement data
The engagement data are aggregated at the school-district level, and each file in the folder `engagement_data` represents data from one school district. The 4-digit file name is the `district_id`, which links to district information in `districts_info.csv`. The `lp_id` links to product information in `products_info.csv`.

| Name | Description |
| :--- | :----------- |
| time | date in "YYYY-MM-DD" |
| lp_id | The unique identifier of the product |
| pct_access | Percentage of students in the district who have at least one page-load event of a given product on a given day |
| engagement_index | Total page-load events per one thousand students of a given product and on a given day |
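
As a worked illustration of the two metrics defined in the table, the sketch below computes them for a hypothetical district. The function names and all numbers are made up for illustration and are not taken from the dataset.

```python
def pct_access(students_with_page_load: int, total_students: int) -> float:
    """Percentage of students with at least one page-load event that day."""
    return 100 * students_with_page_load / total_students


def engagement_index(page_loads: int, total_students: int) -> float:
    """Page-load events per one thousand students that day."""
    return 1000 * page_loads / total_students


# Hypothetical district: 2,000 students, 150 of whom generated
# 600 page loads of one product on one day.
print(pct_access(150, 2000))        # 7.5
print(engagement_index(600, 2000))  # 300.0
```

Because `engagement_index` counts events rather than students, it can grow well beyond what `pct_access` suggests when a small group of students uses a product heavily.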

In [None]:
dataset_path = '../input/learnplatform-covid19-impact-on-digital-learning/'
# Build the remaining paths from the dataset root so it is defined in one place.
engagement_path = os.path.join(dataset_path, 'engagement_data')
district_path = os.path.join(dataset_path, 'districts_info.csv')
products_path = os.path.join(dataset_path, 'products_info.csv')

In [None]:
def create_dataset_from_engagement(engagement_path='../input/learnplatform-covid19-impact-on-digital-learning/engagement_data'):
    """Read every district file and stack them into a single DataFrame."""
    frames = []

    for path, _, files in os.walk(engagement_path):
        for file in files:
            frame = pd.read_csv(os.path.join(path, file))
            # The file name (without extension) is the district id.
            frame['dis_id'] = file.split('.')[0]
            frames.append(frame)

    # Concatenate once at the end; concatenating inside the loop
    # copies the accumulated data on every pass.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
    

In [None]:
district_df = pd.read_csv(district_path)
products_df = pd.read_csv(products_path)

### District information data

| Name | Description |
| :--- | :----------- |
| district_id | The unique identifier of the school district |
| state | The state in which the district resides |
| locale | NCES locale classification that categorizes U.S. territory into four types of areas: City, Suburban, Town, and Rural. See [Locale Boundaries User's Manual](https://eric.ed.gov/?id=ED577162) for more information. |
| pct_black/hispanic | Percentage of students in the districts identified as Black or Hispanic based on 2018-19 NCES data |
| pct_free/reduced | Percentage of students in the districts eligible for free or reduced-price lunch based on 2018-19 NCES data |
| county_connections_ratio | `ratio` (residential fixed high-speed connections over 200 kbps in at least one direction/households) based on the county level data from FCC Form 477 (December 2018 version). See [FCC data](https://www.fcc.gov/form-477-county-data-internet-access-services) for more information. |
| pp_total_raw | Per-pupil total expenditure (sum of local and federal expenditure) from Edunomics Lab's National Education Resource Database on Schools (NERD$) project. The expenditure data are school-by-school, and we use the median value to represent the expenditure of a given school district. |

In [None]:
district_df.info()

In [None]:
district_df.head()

In [None]:
district_df.isnull().sum()

In [None]:
district_df.dropna(axis=0, subset=['state'], inplace=True)

In [None]:
district_df.isnull().sum()

In [None]:
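# Percentage of missing values per column. Only the last expression of a
# cell is displayed, so the intermediate result is printed explicitly.
print("Before removing null values")
print(district_df.isnull().sum()*100 / district_df.shape[0])
# Rows without a state were already dropped above, so this is a safeguard.
district_df.dropna(subset=["state"], inplace = True)
print("After removing null values")
print(district_df.isnull().sum()*100 / district_df.shape[0])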

In [None]:
def average(s):
    """Return the mean of the bounds in a range string such as '[0.2, 0.4['; NaN stays NaN."""
    return np.array(str(s).strip('[').split(',')).astype(float).mean()

district_df['average_pct_black/hispanic'] = district_df['pct_black/hispanic'].apply(average)
district_df['average_pct_free/reduced'] = district_df['pct_free/reduced'].apply(average)
district_df['average_county_connections_ratio'] = district_df['county_connections_ratio'].apply(average)
district_df['average_pp_total_raw'] = district_df['pp_total_raw'].apply(average)
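
The `average` helper above reduces each range to a single number. As a standalone illustration of that idea, here is a minimal sketch that assumes the values are stored as range strings such as `[0.2, 0.4[`; the name `range_midpoint` is hypothetical and not part of the notebook.

```python
import math


def range_midpoint(value) -> float:
    """Midpoint of a range string such as '[0.2, 0.4['; NaN for missing values."""
    if not isinstance(value, str):
        # Missing entries arrive as float NaN rather than as strings.
        return math.nan
    low, high = (float(part) for part in value.strip("[]").split(","))
    return (low + high) / 2


print(range_midpoint("[0.2, 0.4["))      # approximately 0.3
print(range_midpoint("[14000, 16000["))  # 15000.0
```

Using the midpoint treats every district in a bucket as if it sat at the centre of that bucket, which is a simplification worth keeping in mind when comparing state averages.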

In [None]:
district_df.head()

In [None]:
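# numeric_only=True skips text columns such as locale, which pandas 2.x refuses to average.
district_info = district_df.groupby(by='state', dropna=True).mean(numeric_only=True)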

In [None]:
states = district_info['average_pct_black/hispanic'].index
population_data = np.array(district_info['average_pct_black/hispanic'])
spending_data  = np.array(district_info['average_pp_total_raw'])

In [None]:
population_data.mean()

In [None]:
np.nanmean(spending_data)

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(12,10))
sns.barplot(x=states, y=population_data, ax=ax[0])
ax[0].set_title('Average Black/Hispanic Student Percentage by State')
# Rotate the existing tick labels instead of overwriting them.
ax[0].tick_params(axis='x', labelrotation=45)
ax[0].axhline(np.nanmean(population_data), c="black", linestyle="--")

sns.barplot(x=states, y=spending_data, ax=ax[1])
ax[1].set_title('Average Spending by State')
ax[1].tick_params(axis='x', labelrotation=45)
# Some states have no spending data, so ignore NaN when drawing the mean line.
ax[1].axhline(np.nanmean(spending_data), c="black", linestyle="--")
plt.tight_layout()

# District Type Distribution

In [None]:
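fig, ax = plt.subplots(figsize=(10,6))
# Take the labels from value_counts() so each label lines up with its wedge;
# unique() returns the locales in a different order.
locale_counts = district_df['locale'].value_counts()
plt.pie(x=locale_counts, autopct="%.1f%%", explode=[0.05]*len(locale_counts), labels=locale_counts.index, pctdistance=0.5)
plt.title("Locale Distribution", fontsize=14)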

## Number of Districts by State

In [None]:
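# Derive both from value_counts() so the states and their counts share one order.
state_counts = district_df['state'].value_counts()
x = state_counts.index
y = state_counts.values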

In [None]:
district_df['state'].value_counts().index

In [None]:
fig = plt.figure(figsize=(12,6))

ax = fig.add_axes([0.1,0.1,0.9,0.9])
# Pass the column as x=; seaborn 0.12+ treats the first positional argument as `data`.
sns.countplot(x=district_df['state'], order=district_df['state'].value_counts().index, ax=ax)
ax.set_title('District Count by State')
plt.xlabel('State')
plt.ylabel('District Count')
plt.xticks(rotation=40)
plt.show()


### Product information data

| Name | Description |
| :--- | :----------- |
| LP ID | The unique identifier of the product |
| URL | Web Link to the specific product |
| Product Name | Name of the specific product |
| Provider/Company Name | Name of the product provider |
| Sector(s) | Sector of education where the product is used |
| Primary Essential Function | The basic function of the product. There are two layers of labels here. Products are first labeled as one of these three categories: LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations. Each of these categories has multiple sub-categories with which the products were labeled |
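
With all three tables described, the links mentioned earlier (engagement file name to `district_id`, and `lp_id` to `LP ID`) can be sketched as a pair of joins. This is a minimal sketch on toy frames: every value below is made up for illustration, and the real column types may differ.

```python
import pandas as pd

# Toy stand-ins for the real files; all values are invented.
engagement = pd.DataFrame({
    "time": ["2020-01-01", "2020-01-01"],
    "lp_id": [101, 102],
    "pct_access": [1.5, 0.2],
    "engagement_index": [30.0, 4.0],
    "dis_id": ["1000", "1000"],  # taken from the file name, so it is a string
})
districts = pd.DataFrame({"district_id": [1000], "state": ["Utah"]})
products = pd.DataFrame({"LP ID": [101, 102], "Product Name": ["A", "B"]})

# Cast the file-name id to an integer so it matches district_id.
engagement["district_id"] = engagement["dis_id"].astype(int)

merged = (
    engagement
    .merge(districts, on="district_id", how="left")
    .merge(products, left_on="lp_id", right_on="LP ID", how="left")
)
print(merged[["time", "state", "Product Name", "engagement_index"]])
```

Left joins keep every engagement row even when a district or product has no matching entry, which makes missing links visible as NaN instead of silently dropping rows.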

In [None]:
products_df.head()

In [None]:
products_df['Sector(s)'].unique()

In [None]:
products_df.iloc[0]['Primary Essential Function'].split('-')[0]

In [None]:
def get_category(x):
    """Map the top-level code (LC, CM, SDO) to its full name; None otherwise."""
    category = str(x).split('-')[0].strip()
    
    if category == 'LC':
        return 'Learning & Curriculum'
    elif category == 'CM':
        return 'Classroom Management'
    elif category == 'SDO':
        return 'School & District Operations'

def get_sub_category(x):
    """Return the second-level label, or None when there is none."""
    sub_category = str(x).split('-')
    
    if len(sub_category) >= 2:
        return sub_category[1].strip()

def get_sub_sub_category(x):
    """Return the most specific label available (third level, else second)."""
    sub_sub_category = str(x).split('-')
    
    if len(sub_sub_category) == 2:
        return sub_sub_category[1].strip()
    
    elif len(sub_sub_category) == 3:
        return sub_sub_category[2].strip()

# product_category is required by the category count plot further down.
products_df['product_category'] = products_df['Primary Essential Function'].apply(get_category)
products_df['product_sub_category'] = products_df['Primary Essential Function'].apply(get_sub_category)
products_df['product_sub_sub_category'] = products_df['Primary Essential Function'].apply(get_sub_sub_category)

In [None]:
products_df

In [None]:
fig = plt.figure(figsize=(10,5))

ax = fig.add_axes([0.1,0.1,0.9,0.9])
# Pass the column as x=; seaborn 0.12+ treats the first positional argument as `data`.
sns.countplot(x=products_df['product_category'], order=products_df['product_category'].value_counts().index, ax=ax)
plt.title('Product Category Count')
plt.xlabel('Product Category')
plt.ylabel('Count')
plt.show()

In [None]:
products_df.columns

In [None]:
products_df['product_sub_sub_category'].unique()