It seems COVID-19 has changed our world forever. In 2020 it also caused substantial growth in digital learning all over the world. Has digital learning changed forever too?

In this analysis we'll take a deeper look at the impact of COVID-19 on digital learning.

# IMPORTING LIBRARIES

In [None]:
import numpy as np 
import pandas as pd 
import math
import glob
import os
import matplotlib.pyplot as plt
import seaborn as sns 
from wordcloud import WordCloud

In [None]:
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)

# DATA FILES

The products file contains information about the characteristics of the top 372 products with the most users in 2020.

In [None]:
products = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")
products.head()

The district file includes information about the characteristics of 233 school districts.

In [None]:
districts = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")
districts.head()

The engagement data have been aggregated at the school district level, and each file contains the data for one school district.

In [None]:
path = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data'
all_files = glob.glob(os.path.join(path, "*.csv"))

li = []

for filename in all_files:
    df = pd.read_csv(filename)
    # Each file is named after its district, so the id is the file name without ".csv"
    district_id = os.path.splitext(os.path.basename(filename))[0]
    # Store as int so it matches the district_id column of districts_info.csv
    df["district_id"] = int(district_id)
    li.append(df)

engagement = pd.concat(li, ignore_index=True)
engagement.head()

The files can be joined on lp_id (called LP ID in the products file) and district_id.
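A minimal sketch of how the three tables could be combined, using tiny made-up frames in place of the real files. It assumes the products key column is named `LP ID` (renamed here to `lp_id`) and that `district_id` has the same integer type in the engagement and districts frames; all values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the three files (values are invented for illustration)
engagement = pd.DataFrame({
    "time": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "lp_id": [101, 202, 101],
    "engagement_index": [10.0, 3.5, 12.0],
    "district_id": [1000, 1000, 2000],
})
products = pd.DataFrame({
    "LP ID": [101, 202],
    "Product Name": ["Product A", "Product B"],
})
districts = pd.DataFrame({
    "district_id": [1000, 2000],
    "state": ["State X", "State Y"],
})

# Align the product key name with the engagement data, then join on it
merged = engagement.merge(
    products.rename(columns={"LP ID": "lp_id"}), on="lp_id", how="left"
)
# Attach the district characteristics via district_id
merged = merged.merge(districts, on="district_id", how="left")
print(merged[["time", "Product Name", "state", "engagement_index"]])
```

Left joins keep every engagement row even when a product or district is missing from the lookup tables; those rows simply get NaN in the added columns.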

The data cover the period from 1 January 2020 to 31 December 2020, so we can analyze the full year.
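To check the date range rather than take it on trust, one could parse the date column of the combined engagement frame and compare it with the full calendar year. This sketch assumes the column is called `time` and holds ISO dates; the three-row frame is made up.

```python
import pandas as pd

# Toy engagement frame; the real one has many rows per day
engagement = pd.DataFrame({"time": ["2020-01-01", "2020-06-15", "2020-12-31"]})
engagement["time"] = pd.to_datetime(engagement["time"])

print(engagement["time"].min(), engagement["time"].max())

# Every day of 2020 (a leap year, so 366 days)
full_year = pd.date_range("2020-01-01", "2020-12-31", freq="D")
# Days with no engagement records at all
missing_days = full_year.difference(pd.DatetimeIndex(engagement["time"]))
print(len(missing_days))
```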

# MISSING VALUES

There are obviously a lot of NaNs in the districts data, so we drop the districts whose state is missing.

In [None]:
districts = districts[districts.state.notna()].reset_index(drop=True)
districts.head()
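Before filtering, it can help to count missing values column by column, which shows where the gaps are and how many rows the state filter removes. A small sketch on a made-up frame (the columns and values are illustrative, not taken from the real file):

```python
import numpy as np
import pandas as pd

# Toy districts frame with some missing values (values invented)
districts = pd.DataFrame({
    "district_id": [1000, 2000, 3000],
    "state": ["State X", np.nan, "State Y"],
    "locale": ["Suburb", np.nan, np.nan],
})

# Count missing values per column before deciding what to drop
print(districts.isna().sum())

# Same filter as in the cell above: keep only districts with a known state
kept = districts[districts.state.notna()].reset_index(drop=True)
print(len(districts), "->", len(kept))
```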

# EDA

Now it's time to look at some plots, starting with the districts info.

In [None]:
plt.figure(figsize=(12,10))
# Pass the column by keyword; newer seaborn reads the first positional argument as data
sns.countplot(x="locale", data=districts)
ticks = plt.xticks(rotation=90)

In [None]:
plt.figure(figsize=(12,10))
sns.countplot(x="state", data=districts)
ticks = plt.xticks(rotation=90)

Now let's look at the products info: provider names, essential functions, sectors and product names.

In [None]:
plt.figure(figsize=(12, 10))
sns.countplot(y='Provider/Company Name', data=products, order=products["Provider/Company Name"].value_counts().index[:10])
plt.title("Top 10 Provider/Company Names",font="Serif", size=20)
plt.show()
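The `order` argument tells seaborn which categories to draw and in what sequence. Because `value_counts()` returns counts sorted from most to least frequent, slicing its index gives the most common labels first, so `value_counts().index[:10]` yields the ten most common providers. A toy illustration with the top 2 of a made-up column:

```python
import pandas as pd

# Made-up provider column: "A" appears 3 times, "B" twice, "C" once
providers = pd.Series(["A", "B", "A", "C", "A", "B"], name="Provider/Company Name")

# value_counts() sorts by descending frequency, so slicing its index
# gives the N most frequent labels in plotting order
top_2 = providers.value_counts().index[:2]
print(list(top_2))
```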

In [None]:
plt.figure(figsize=(12, 10))
sns.countplot(y='Primary Essential Function', data=products, order=products["Primary Essential Function"].value_counts().index)
plt.title("Primary Essential Function",font="Serif", size=20)
plt.show()

In [None]:
# Count how often each sector appears; a product may list several sectors separated by ";"
c1 = c2 = c3 = 0
for s in products["Sector(s)"]:
    if not pd.isnull(s):
        for sub in s.split(";"):
            sub = sub.strip()
            if sub == 'PreK-12': c1 += 1
            if sub == 'Higher Ed': c2 += 1
            if sub == 'Corporate': c3 += 1

fig, ax = plt.subplots(figsize=(16, 8))
fig.suptitle('Sector Distribution', size=20, font="Serif")
explode = (0.01, 0.01, 0.01)
labels = ['PreK-12', 'Higher Ed', 'Corporate']
sizes = [c1, c2, c3]
ax.pie(sizes, explode=explode, startangle=60, labels=labels, autopct='%1.2f%%', pctdistance=0.7)
# A white circle over the centre turns the pie into a donut chart
ax.add_artist(plt.Circle((0, 0), 0.4, fc='white'))
plt.show()
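The same sector counts can also be computed without an explicit loop, using pandas string methods and `Series.explode` (available since pandas 0.25). A sketch on a made-up `Sector(s)` column; the values are illustrative only:

```python
import numpy as np
import pandas as pd

# Made-up "Sector(s)" values: entries may list several sectors or be missing
sectors = pd.Series(["PreK-12", "PreK-12; Higher Ed", np.nan, "PreK-12; Higher Ed; Corporate"])

counts = (
    sectors.dropna()       # skip products with no sector listed
    .str.split(";")        # one list of sectors per product
    .explode()             # one row per (product, sector) pair
    .str.strip()           # remove the spaces around each name
    .value_counts()
)
print(counts.to_dict())
```

The resulting Series can feed the pie chart directly, e.g. `sizes = [counts.get(label, 0) for label in labels]`, which also handles a sector that never occurs.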

In [None]:
# Build a word cloud from all product names joined into one string
cloud = WordCloud(width=1440, height=1080).generate(" ".join(products['Product Name'].astype(str)))
plt.figure(figsize=(15, 10))
plt.imshow(cloud)
plt.axis('off')
plt.show()

Please upvote if you like this notebook. To be continued...