In [None]:
import numpy as np
import pandas as pd 

import os

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# There are 3 different ways to look at this
* #### Datewise usage analysis (how various time periods affected education)
* #### Districtwise product analysis (how the demographics of a district affect the amounts and types of products used). A districtwise-time analysis is also possible; I have not been able to do it, but I did arrange the data in a format that could be a helpful starting point
* #### Productwise analysis (how a particular product or sector performs, and which products were popular at what time)


## Datewise Analysis

Make a dictionary with the ID of the district as a key and its details as the value
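
The key is simply the file name without its `.csv` extension, converted to an integer. A minimal sketch of that conversion, using made-up file names (the helper `district_id_from_filename` is hypothetical, and the real names come from the `engagement_data` folder):

```python
from pathlib import Path

def district_id_from_filename(filename: str) -> int:
    """Return the district ID encoded in a file name such as '1000.csv'."""
    # Path.stem drops the final extension, whatever its length
    return int(Path(filename).stem)

# Hypothetical file names, for illustration only
example_files = ["1000.csv", "2567.csv"]
example_ids = [district_id_from_filename(name) for name in example_files]
print(example_ids)  # [1000, 2567]
```

Compared with slicing off the last four characters, this keeps working if a file ever has a different extension length.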

In [None]:
districts_dict = {}

for dirname, _, filenames in os.walk('/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data'):
    for filename in filenames:
        districts_dict[int(filename[:-4])] = pd.read_csv(os.path.join(dirname, filename))

Get the mean usage of all products for each date of a district, like:\
On 01/01/2021, product A had engagement_index = 10, product B had engagement_index = 20\
On 02/01/2021, product A had engagement_index = 5, product B had engagement_index = 15\
So, finally we have 01/01/2021: 15 and 02/01/2021: 10\
These per-district daily means are then added together across all districts

Also, I colored the weekends separately
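
To make the arithmetic concrete, here is a small sketch on toy data. The first frame reuses the numbers from the example above; the second "district" (`other_district`) is made up purely to show how the daily means are combined.

```python
import pandas as pd

# Toy engagement data for one district (numbers from the example above)
toy = pd.DataFrame({
    "time": ["01/01/2021", "01/01/2021", "02/01/2021", "02/01/2021"],
    "product": ["A", "B", "A", "B"],
    "engagement_index": [10, 20, 5, 15],
})

# Mean engagement across products for each date
daily_mean = toy.groupby("time")["engagement_index"].mean()
print(daily_mean.to_dict())  # {'01/01/2021': 15.0, '02/01/2021': 10.0}

# A made-up second district that only has data for the first date
other_district = pd.Series({"01/01/2021": 4.0}, name="engagement_index")

# fill_value=0 treats a missing date as zero instead of producing NaN
combined = daily_mean.add(other_district, fill_value=0)
print(combined.to_dict())  # {'01/01/2021': 19.0, '02/01/2021': 10.0}
```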



In [None]:
import matplotlib.pyplot as plt

# Add up the per-district daily mean engagement across all districts
datewise_engagement_main = None
for district, file in districts_dict.items():
    daily_mean = file.groupby("time")["engagement_index"].mean()
    if datewise_engagement_main is None:
        # The first district sets up the frame (and its date index)
        datewise_engagement_main = pd.DataFrame(daily_mean)
    else:
        datewise_engagement_main["engagement_index"] = datewise_engagement_main["engagement_index"].add(daily_mean, fill_value=0)

days = ["Wednesday", "Thrusday", "Friday", "Saturday", "Sunday", "Monday", "Tuesday"]*52 + ["Wednesday", "Thrusday"]
datewise_engagement_main["days"] = days

plt.figure(figsize=(50, 50))
colors = {"Wednesday":"b", "Thrusday":"b", "Friday":"b", "Saturday":"r", "Sunday":"r", "Monday":"b", "Tuesday":"b"}

plt.bar(datewise_engagement_main.index, datewise_engagement_main["engagement_index"], width=1, color=list(datewise_engagement_main["days"].map(colors)))
plt.xticks(range(len(datewise_engagement_main.index)), datewise_engagement_main.index, rotation='vertical')
plt.show()

#### Holidays
I got the holidays in 2020 and made a bar graph coloring them separately

In [None]:
holidays = ["2020-01-01","2020-01-20","2020-02-14","2020-02-17","2020-03-17","2020-05-25","2020-07-04","2020-09-07","2020-10-12","2020-10-31","2020-11-11","2020-11-26","2020-12-25","2020-12-31"]
# Flag the holiday dates in one vectorized step instead of a row-by-row loop
datewise_engagement_main["is_holiday"] = datewise_engagement_main.index.isin(holidays)

datewise_holidays = datewise_engagement_main.copy()
plt.figure(figsize=(50, 50))
colors_holidays = {True: "r", False: "b"}
plt.bar(datewise_holidays.index, datewise_holidays["engagement_index"], width=1, color=list(datewise_holidays["is_holiday"].map(colors_holidays)))
plt.xticks(range(len(datewise_holidays.index)), datewise_holidays.index, rotation='vertical')
plt.show()

Okay, so as of now, the 2-day dips are because of the weekends. The holidays themselves have little activity compared to the general activity around that time, but they have no other visible impact (which should have been pretty obvious)\
\
Also, I am removing the weekend dips and keeping just the weekdays to get an idea of the trend
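
As an alternative to typing the weekday names by hand, they can be derived from the dates themselves. This is only a sketch on a toy frame shaped like `datewise_engagement_main` (the engagement values are made up):

```python
import pandas as pd

# Toy frame indexed by date strings (values are made up)
toy = pd.DataFrame(
    {"engagement_index": [120.0, 130.0, 125.0, 40.0, 35.0, 110.0, 128.0]},
    index=["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
           "2020-01-05", "2020-01-06", "2020-01-07"],
)

# Derive the weekday name from each date
toy["days"] = pd.to_datetime(toy.index).day_name()

# Keep weekdays only by excluding Saturday and Sunday
weekdays_only = toy[~toy["days"].isin(["Saturday", "Sunday"])]
print(weekdays_only["days"].tolist())
# ['Wednesday', 'Thursday', 'Friday', 'Monday', 'Tuesday']
```

Excluding the two weekend names is a bit more robust than listing the five weekday names, since a single misspelled name would silently drop that day.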

In [None]:
datewise_weekdays = datewise_engagement_main.loc[datewise_engagement_main['days'].isin(["Wednesday", "Thrusday",  "Friday", "Monday", "Tuesday"])]
plt.figure(figsize=(40, 40))
plt.bar(datewise_weekdays.index, datewise_weekdays["engagement_index"], width=0.5)
plt.xticks(range(len(datewise_weekdays.index)), datewise_weekdays.index, rotation='vertical')
plt.show()

In [None]:
datewise_tuesdays = datewise_engagement_main.loc[datewise_engagement_main['days'] == "Tuesday"]
datewise_tuesdays.plot(figsize=(10, 10))
plt.show()

Another interesting observation is that there's a drop on Mondays and Fridays; if you remove all the Mondays and Fridays, you get a pretty smooth trend line
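
One way to check the Monday/Friday drop is to average the engagement by weekday. The sketch below uses a hypothetical helper `mean_by_weekday` and made-up values for two working weeks; it is not run on the real data.

```python
import pandas as pd

def mean_by_weekday(engagement: pd.Series) -> pd.Series:
    """Average a date-indexed series by weekday name."""
    weekday = pd.to_datetime(engagement.index).day_name().to_numpy()
    return engagement.groupby(weekday).mean()

# Made-up values for two working weeks (Mon 2020-01-06 to Fri 2020-01-17)
dates = pd.bdate_range("2020-01-06", "2020-01-17").strftime("%Y-%m-%d")
toy = pd.Series([90, 120, 122, 118, 95, 88, 121, 119, 120, 93], index=dates, dtype=float)

weekday_means = mean_by_weekday(toy)
print(weekday_means.to_dict())
```

If Mondays and Fridays really are lower, their means should sit clearly below those of Tuesday to Thursday.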

**The main dip is during the summer break, and the rest is more or less consistent, but there is a pretty big jump around Feb-Mar. That was the time most of the schools, daycares and all closed down, so, I guess that's what caused our big jump**

**There appears to be a spike around mid-September that lasts for a few more weeks before dropping back to fairly regular levels. Although I am not American, it appears that they start their school year somewhere around mid- to end-September. So, it would appear that the jump is caused by enthusiasm at the beginning of the school year, which waned over the next few weeks. The return to normal would then reflect the students who weren't just over-eager at the beginning but were studying consistently** (PS: This is just a guess I have as a student, because I have seen this spike and dip happen)


## Productwise analysis

In [None]:
products_info = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")
products_info.columns = ["lp_id", "URL", "Product Name", "Provider/Company Name", "Sector(s)", "Primary Essential Function"]
products_info.head()

### Sectorwise, as in PreK-12, Higher-Ed, Corporate
For those with multiple categories, I created a separate entry for each sector \
Meaning, if a product had Sector(s) as PreK-12, Higher-Ed \
In the new dataframe, it has two entries, with all the details, one with sector as PreK-12, one with Higher-Ed
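
On a toy table, the idea looks like this. The product names are made up, and the sketch assumes the sectors are separated by `;`, as the code in this notebook does:

```python
import pandas as pd

# Toy product table (names are made up)
toy_products = pd.DataFrame({
    "Product Name": ["Product X", "Product Y"],
    "Sector(s)": ["PreK-12; Higher-Ed", "Corporate"],
})

# Split on ";" and give every sector its own row
toy_sectors = (
    toy_products
    .assign(Sector=toy_products["Sector(s)"].str.split(";"))
    .explode("Sector")
    .drop(columns="Sector(s)")
)
toy_sectors["Sector"] = toy_sectors["Sector"].str.strip()
print(toy_sectors["Sector"].tolist())  # ['PreK-12', 'Higher-Ed', 'Corporate']
```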

In [None]:
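# Give every sector its own row, e.g. "PreK-12; Higher Ed" becomes two rows.
# Products with a single sector are kept as one row.
# DataFrame.append no longer exists in pandas 2.0+, so split and explode instead.
sectorwise_products_info = products_info.dropna(subset=["Sector(s)"]).copy()
sectorwise_products_info["Sector"] = sectorwise_products_info["Sector(s)"].str.split(";")
sectorwise_products_info = sectorwise_products_info.explode("Sector").drop(columns="Sector(s)")
sectorwise_products_info["Sector"] = sectorwise_products_info["Sector"].str.strip()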

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
sectorwise_products_info['Sector'].value_counts().plot(ax=ax, kind='bar')

### Functionwise
First, I separated LC, CM and SDO out as Function Categories \
Then, I repeated the process used for the sector(s) on the sub-categories

In essence\
If a product had Primary Essential Function as LC - Sites, Resources & Reference - Games & Simulations \
In the new dataframe, it has two entries with all the details and an added column, Primary Essential Function Category, which shows LC for both; of those 2 entries, one will have Primary Essential Function as Sites, Resources & Reference and the other as Games & Simulations
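
The parsing step on its own can be sketched like this. It assumes the category and the sub-functions are separated by ` - `, which is what splitting on the hyphen implies; `split_function` is a hypothetical helper, not part of the notebook.

```python
def split_function(value: str) -> tuple:
    """Split 'CAT - Function A - Function B' into ('CAT', ['Function A', 'Function B'])."""
    # The first part is the category, everything after it is a function
    category, *functions = (part.strip() for part in value.split(" - "))
    return category, functions

example = "LC - Sites, Resources & Reference - Games & Simulations"
category, functions = split_function(example)
print(category)   # LC
print(functions)  # ['Sites, Resources & Reference', 'Games & Simulations']
```

Splitting on ` - ` (with spaces) rather than a bare `-` avoids breaking up a function name that happens to contain a hyphen.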

In [None]:
# Collect the rows in a list and build the DataFrame once at the end
# (DataFrame.append no longer exists in pandas 2.0+).
functionwise_rows = []
for _, row in products_info.iterrows():
    if isinstance(row["Primary Essential Function"], str):
        # "LC - A - B" -> category "LC", functions ["A", "B"]
        pef_cat, *pef = row["Primary Essential Function"].split("-")
        # One output row per function; this also covers products with a single function
        for function in pef:
            new_row = row.drop("Primary Essential Function")
            new_row["Primary Essential Function Category"] = pef_cat.strip()
            new_row["Primary Essential Function"] = function.strip()
            functionwise_rows.append(new_row)

functionwise_products_info = pd.DataFrame(functionwise_rows)

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
functionwise_products_info['Primary Essential Function Category'].value_counts().plot(ax=ax, kind='bar')

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
functionwise_products_info['Primary Essential Function'].value_counts().plot(ax=ax, kind='bar')

#### Datewise

In [None]:
# Keep only lp_id and the top-level function category (LC, CM, SDO, ...)
fwise_for_date = products_info.dropna(subset=["Primary Essential Function"])[["lp_id"]].copy()
fwise_for_date["Primary Essential Function Category"] = (
    products_info["Primary Essential Function"].str.split("-").str[0].str.strip()
)
        
fwise_for_date

In [None]:
count = 0
for dirname, _, filenames in os.walk('/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data'):
    for filename in filenames:
        district_data = pd.read_csv(os.path.join(dirname, filename))[["lp_id", "time", "engagement_index"]]
        district_data = pd.merge(fwise_for_date, district_data, on="lp_id", how='left')
        print(count)
        grouped_ei = district_data.groupby(["time", "Primary Essential Function Category"]).mean()["engagement_index"]
        if count == 0:
            final_fwise_date = grouped_ei
        else:
            final_fwise_date = final_fwise_date.add(grouped_ei, fill_value=0)
        count+=1
        
final_fwise_date = pd.DataFrame(final_fwise_date)

In [None]:
# Pivot the (time, category) MultiIndex into one column per category
final_date = final_fwise_date["engagement_index"].unstack().reset_index()

In [None]:
final_date

In [None]:
import matplotlib.pyplot as plt

ax = final_date.plot(figsize=(40,40), kind="line", stacked=True)
#plt.figure(figsize=(40,40))
datewise_engagement_main.plot(ax=ax)
#plt.show()
#fig = ax.get_figure()

To summarise this, the sectorwise and productwise results are quite clear from the bar charts \
For the functionwise details, when seen together with the engagement index, each category seems to have a fairly similar trajectory, one that matches up quite neatly with the mean_engagement_index (exception: initially, the CM values were above the mean engagement index, but they eventually dropped below it and stayed there)

Also at any given date, *SDO > CM > LC = LC/CM/SDO* 

Also, SDO sees a visibly higher peak during mid- to end-September, which would make sense if schools reopened then and there were the initial management tasks to get through before settling back to normal
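
The ordering claim above was read off the plot. A quick way to verify it numerically could look like the sketch below; the frame and its values are made up for illustration, and `ordering_holds` is a hypothetical helper.

```python
import pandas as pd

# Toy per-category engagement by date (values are made up)
toy = pd.DataFrame(
    {"SDO": [300.0, 280.0, 310.0], "CM": [200.0, 190.0, 205.0], "LC": [100.0, 95.0, 99.0]},
    index=["2020-09-14", "2020-09-15", "2020-09-16"],
)

def ordering_holds(frame: pd.DataFrame, order: list) -> bool:
    """True if frame[order[0]] > frame[order[1]] > ... holds on every row."""
    pairs = zip(order, order[1:])
    return all(bool((frame[a] > frame[b]).all()) for a, b in pairs)

print(ordering_holds(toy, ["SDO", "CM", "LC"]))  # True
```

Run against the real per-category frame, a `False` result would mean the ordering breaks on at least one date.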

## Districtwise (only the data)

In [None]:
# Mean engagement per product for each district, one column per district
per_district = [
    file.groupby("lp_id")["engagement_index"].mean().rename(f"{district}_engagement")
    for district, file in districts_dict.items()
]
# join="inner" keeps only the products that appear in every district,
# which is what merging the districts one by one on lp_id does
productwise_engagement = pd.concat(per_district, axis=1, join="inner").reset_index()
        
productwise_engagement

This would possibly have been the most interesting and insightful type of analysis, but I haven't done it yet :(
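
As a possible next step, the wide table above could be reshaped into a long format with one row per product and district, which is usually easier to group or join on. This is only a sketch on a toy frame; the IDs and values are made up.

```python
import pandas as pd

# Toy version of productwise_engagement (IDs and values are made up)
wide = pd.DataFrame({
    "lp_id": [101, 102],
    "1000_engagement": [12.5, 3.0],
    "2000_engagement": [8.0, 4.5],
})

# One row per (product, district) pair
long_form = wide.melt(id_vars="lp_id", var_name="district", value_name="engagement")

# Recover the numeric district ID from the column name
long_form["district"] = (
    long_form["district"].str.replace("_engagement", "", regex=False).astype(int)
)
print(long_form.shape)  # (4, 3)
```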

# Summary and an attempt to answer the questions posed in the competition (some theories, just theories)
Digital learning was the only option during the pandemic. I can't speak to how many people adjusted, but those who did seem to have adjusted well: the pattern you would normally expect in a graph of mean enthusiasm/interest in class is fairly similar to that of the mean_engagement_index.

Although we analysed more products related to LC and CM than SDO, it is evident that school planning (SDO) was the major utiliser of online platforms and resources to continue education, followed by classroom management (CM), and then curriculum (LC). I think this shows that the institutions were the ones who had to make the most drastic changes, rethinking the entire way of managing schedules, classes, clubs etc. Next come the teachers managing their classes, who had to go through monumental changes as well, but the change wasn't as drastic (probably because Google Classroom and similar tools were already used in countries like America; I'm not sure if this is true, but I think it is). Finally, the thing that changed the least is the curriculum, which is to be expected, as most teachers (at least in my experience) continued to teach the same curriculum, only in online mode. Some changes have been incorporated, but the base remained the same, and this would explain the comparatively low usage of LC products.

Finally, for the questions of how demographics/state policies affected learning, I couldn't begin to theorise (sorry)