# Summary


- The COVID-19 pandemic triggered product substitution: every locale abandoned more than 30 tools in the second half of the year. One possible cause is operational efficiency.

- The classes that lost product diversity are:
  - Black/Hispanic (BH)[0,0.2[ Free/Reduced (FR)[0.8,1[
  - BH[0.8,1[ FR[0.6,0.8[

- The market is segmented by expenditure class: some tools predominate only in certain classes.

- The FR distribution of class BH[0.8,1[ worsened: from April it is centered on FR[0.8,1[ instead of FR[0.6,0.8[.

- Almost all classes peaked in the number of tools used in March. A possible hypothesis is that these increases reflect the effort spent preparing for the pandemic: testing new tools and preparing for a possible school year of remote teaching.

- The virtualization of the classroom reduced the gap between classes in general, but BH[0,0.2[ FR[0.8,1[ did not catch up with the others, and some classes worsened, e.g. BH[0.4,0.6[ with FR[nan].

- A disaggregated picture of product diversity per state and per black/hispanic and free/reduced class is produced. The figures highlight a substantial difference, across states, in the effects of the pandemic within minority communities. A phenomenon common to all the figures, and an intuitive one, is that deterioration is recorded in FR classes above [0,0.2[ regardless of BH, showing that higher-FR classes were penalized more than the others during the pandemic. However, in Indiana, Arizona, New York, and California the associated BH was greater than 0.4.



In [None]:
import os
from os import listdir
from os.path import isfile, join
import gc
import random
from collections import defaultdict

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import sklearn.preprocessing as sk_p
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture
#!pip install wikipedia
#import wikipedia as wp

#sort out the month number-> month name association
month = {'1':'January',
        '2':'February',
        '3':'March',
        '4':'April',
        '5':'May',
        '6':'June',
        '7':'July',
        '8':'August',
        '9':'September',
        '10':'October',
        '11':'November',
        '12':'December'
        }

#for dirname, _, filenames in os.walk('/kaggle/input'):
#    for filename in filenames:
#        print(os.path.join(dirname, filename))



# 
# Parent categories
# LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations
# Extract categories
def encapsulate_split_primary_essential_function(products):
    """Tokenize the "Primary Essential Function" column.

    input: dataframe products
    output: (function to apply row-wise on the dataframe, set of tokens)
    Each value is split on " - " into segments and each segment on "/" into tokens,
    e.g. "A - B/C" -> {"A", "B", "C"}.
    """
    cat=[q.split("/") for c in products["Primary Essential Function"].unique() for q in c.split(" - ")]

    token_primary_essential_function=set()
    for c in cat:
        token_primary_essential_function.update(c)

    def test_a(x,s):
        """Return True if the token s appears in the row's Primary Essential Function."""
        for first_split in x["Primary Essential Function"].split(" - "):
            # a " - " segment may still hold several "/"-separated tokens
            if s in first_split.split("/"):
                return True
        return False

    return test_a,token_primary_essential_function


def load_df_month(index):
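    """Load the joined dataframe of one month from its parquet file.
    Missing values are filled with -1 and the encoded columns are cast to int.
    """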
    df=pd.read_parquet('/kaggle/working/base_month_{}.parquet'.format(index))\
        .fillna(-1)\
        .set_index("time").astype({
            "URL":int,
            "locale":int,
            "state":int,
            "Product Name":int,  
            "Provider/Company Name":int,  
            "Sector(s)":int, 
            "Primary Essential Function":int,
        })
    df["month"]=index
    
    return df


def create_synthetic_product(row):
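    """
    Collapse URL, Product Name, Provider/Company Name and Primary Essential Function
    into a single synthetic "product" feature: return the first of these encoded values
    that is known (different from -1), or -1 when all of them are unknown.
    """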
    product=-1 #unknown
    if row["URL"]!=-1:
        return row["URL"]
    if row["Product Name"]!=-1:
        return row["Product Name"]
    if row["Provider/Company Name"]!=-1:
        return row["Provider/Company Name"]
    if row["Primary Essential Function"]!=-1:
        return row["Primary Essential Function"]
    
    return product


The challenge is based on three datasets: products, engagement and districts.
The products dataset contains one combined feature, "Primary Essential Function", that can be further decomposed into multiple binary features: tokenize "Primary Essential Function" and turn each token into a binary feature that marks whether the product has it. Decomposing enables the aggregation of products by common features.
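A minimal sketch of this decomposition in plain Python, independent of the notebook's pandas code. The helper names `primary_function_tokens` and `one_hot` and the sample strings are hypothetical; the strings only mimic the " - " and "/" separators that the notebook splits on.

```python
def primary_function_tokens(value):
    """Split a 'Primary Essential Function' string into its set of tokens.

    Segments are separated by " - " and each segment may hold several
    "/"-separated tokens.
    """
    tokens = set()
    for segment in value.split(" - "):
        tokens.update(segment.split("/"))
    return tokens


def one_hot(values):
    """Return the sorted token vocabulary and one 0/1 row per input value."""
    vocabulary = sorted(set().union(*(primary_function_tokens(v) for v in values)))
    rows = []
    for v in values:
        present = primary_function_tokens(v)
        rows.append([1 if token in present else 0 for token in vocabulary])
    return vocabulary, rows


# hypothetical values in the style of the dataset
vocabulary, rows = one_hot([
    "LC - Study Tools - Tutoring/Practice",
    "CM - Classroom Engagement & Instruction",
])
print(vocabulary)
print(rows)
```

Each vocabulary entry becomes one binary column, so products can then be aggregated by any shared token.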

The district 


The first step is to load the data and drop the rows with a missing district_id, since they cannot be matched with the other tables.

The second step is to ordinal-encode the categorical columns to reduce RAM usage.
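Ordinal encoding replaces each string category with a small integer. The sketch below assumes the same sorted-order rule that scikit-learn's `OrdinalEncoder` applies with its default `categories="auto"`, and shows why the districts cell pads "[4000, 6000[" to "[04000, 6000[": without padding, string sorting places "[10000, 12000[" before "[4000, 6000[". The `ordinal_encode` helper and the sample values are hypothetical.

```python
def ordinal_encode(values):
    """Map each distinct value to its position in sorted (string) order."""
    categories = sorted(set(values))
    code = {c: i for i, c in enumerate(categories)}
    return categories, [code[v] for v in values]


raw = ["[4000, 6000[", "[10000, 12000[", "[6000, 8000[", "nan"]
raw_categories, raw_codes = ordinal_encode(raw)
print(raw_categories)  # "[10000, 12000[" sorts before "[4000, 6000["

# pad the lower bounds with a leading zero, as the districts cell does
padding = {"[4000, 6000[": "[04000, 6000[", "[6000, 8000[": "[06000, 8000["}
padded = [padding.get(v, v) for v in raw]
padded_categories, padded_codes = ordinal_encode(padded)
print(padded_categories)  # numeric order, with "nan" last
print(padded_codes)
```

Because "n" sorts after "[", the "nan" category always receives the highest code, which is why later cells can exclude it by its code.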

Load products and split the sectors and the primary essential function. Check the number of rows for each feature.

In [None]:
districts=pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv").dropna(subset=["district_id"])
#pad the lower bounds with a leading zero so that the ranges sort in numeric order
districts=districts.replace({
    "[4000, 6000[":"[04000, 6000[",
    "[6000, 8000[":"[06000, 8000[",
    "[8000, 10000[":"[08000, 10000[",
})

districts_columns_to_categorize=[
                      "pct_black/hispanic","pct_free/reduced","county_connections_ratio",
                      "pp_total_raw","state","locale"
]


districts=districts.fillna("nan")
districts_cat=sk_p.OrdinalEncoder()
districts_cat.fit(districts[districts_columns_to_categorize])
districts[districts_columns_to_categorize]=districts_cat.transform(
                                                            districts[districts_columns_to_categorize]
                                                        ).astype(np.int8)


print("districts categories")
for ind,c in enumerate(districts_cat.categories_):
    print("{}:{}".format(ind,c))

In [None]:
products=pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/products_info.csv").dropna(subset=["LP ID"])
products=products.fillna("nan")
#transform sectors into boolean indicator columns
sectors=products["Sector(s)"].str.split("; ")
products["PreK-12"]=sectors.apply(lambda tokens: "PreK-12" in tokens)
products["Higher Ed"]=sectors.apply(lambda tokens: "Higher Ed" in tokens)
products["Corporate"]=sectors.apply(lambda tokens: "Corporate" in tokens)
products["Unknown-Sector"]=products["Sector(s)"]=="nan"
#replace the "|" separator between providers with an HTML line break for the plot labels
products["Provider/Company Name"]=products["Provider/Company Name"].str.replace("|","<br>",regex=False)



lambded_split_primary_essential_function,token_primary_essential_function=encapsulate_split_primary_essential_function(products)

for s in token_primary_essential_function:
    products[s]=products.apply(lambded_split_primary_essential_function,args=(s,),axis=1)

products=products.rename({
                    "nan":"unknown-Primary-function"
                    },axis=1).replace({
                            True:1,
                            False:0
                        })
    
products_columns_to_categorize=[  "URL","Primary Essential Function",
                                  "Sector(s)","Provider/Company Name",
                                  "Product Name"]


print("matrix density (share of non-zero cells):{}\n".format(((products!=0).count().sum()-(products==0).values.sum())/(products!=0).count().sum()))
print("count nan values in the matrix {}\n\n".format(products[products=="nan"].count().sum()))
products_cat=sk_p.OrdinalEncoder()
products_cat.fit(products[products_columns_to_categorize])
products[products_columns_to_categorize]=products_cat.transform(
                                                            products[products_columns_to_categorize]
                                                        ).astype(np.int16)


bic_results=[]
for n_components in range(1,45):
    
    products_clusters=GaussianMixture(n_components).fit(products[products_columns_to_categorize])
    bic=products_clusters.bic(products[products_columns_to_categorize])
    aic=products_clusters.aic(products[products_columns_to_categorize])
    print("mixture {} --- BIC:{} --- AIC:{}\n".format(n_components,bic,aic),
         #   products_clusters.means_
         )
    bic_results.append({
        "bic":bic,
        "aic":aic,
        "model":n_components
    })
    components=products_clusters.predict(products[products_columns_to_categorize])
    #print(components)


print( sorted(bic_results, key=lambda x: float(x["bic"])))

In [None]:
#product.info()
products.sum()

Load the engagement files: each file name is a district id, so trim the extension and add the id as a column.

Join the products, districts and engagement dataframes.

Save the joined matrix to Parquet, one file per month.

This makes it possible to handle the data later without recomputing the join.
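A stdlib-only sketch of the two bookkeeping steps: deriving the district id from the file name and grouping rows by month, which is what the notebook does before writing one `base_month_<n>.parquet` file per month. The helper names, file names, dates and `lp_id` values below are hypothetical.

```python
from collections import defaultdict
from datetime import date
from pathlib import Path


def district_id_from_filename(filename):
    """The engagement file name is the district id: '1000.csv' -> 1000."""
    return int(Path(filename).stem)


def partition_by_month(records):
    """Group (day, district_id, lp_id) records by calendar month.

    Stands in for writing one file per month, so that each month's block
    can be stored once and reloaded instead of recomputing the join.
    """
    months = defaultdict(list)
    for day, district_id, lp_id in records:
        months[day.month].append((day, district_id, lp_id))
    return dict(months)


# hypothetical records
records = [
    (date(2020, 1, 2), district_id_from_filename("1000.csv"), 92844),
    (date(2020, 1, 3), district_id_from_filename("1000.csv"), 26696),
    (date(2020, 3, 16), district_id_from_filename("1044.csv"), 92844),
]
by_month = partition_by_month(records)
print({month: len(rows) for month, rows in by_month.items()})
```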

In [None]:
# load all the engagement files, the district id is the filename, so, trim the extension and add as column


kaggle_directory="/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data"
onlyfiles = [f for f in listdir(kaggle_directory) if isfile(join(kaggle_directory, f))]

list_of_districts=[]
for i in onlyfiles:
    d1=pd.read_csv(join(kaggle_directory,i))
    d1["district_id"]=int(i.split(".")[0])
    list_of_districts.append(d1)

engagement=pd.concat(list_of_districts,axis=0).dropna(subset=["lp_id"])
engagement["time"]=pd.to_datetime(engagement["time"])
engagement=engagement.set_index("time")
engagement["month"]=engagement.index.month
engagement=engagement.astype({
    "lp_id":int
}).reset_index()

#engagement.info()

In [None]:
for index, block in engagement.groupby("month"):
    if not os.path.exists('/kaggle/working/base_month_{}.parquet'.format(index)):
        gc.collect()
        temp=block.copy()
        base=pd.merge(temp,products,how="left",left_on="lp_id",right_on="LP ID").drop("LP ID",axis=1)
        base=pd.merge(base,districts,how="left",left_on="district_id",right_on="district_id")
        base["month"]=index
        base.to_parquet('/kaggle/working/base_month_{}.parquet'.format(index))
    

In [None]:
for i in range(len(products_cat.categories_)):
    print(i)
    print(products_cat.categories_[i][:5],products_cat.categories_[i][-5:])


Calculate the correlation matrix of the percentage of black/hispanic, the percentage of free/reduced, county_connections_ratio and the per pupil total expenditure over the sampled dataset. Because the ordinal codes follow the sorted order of the ranges, the mapping preserves the underlying meaning once the 'nan' category is removed: a lower category code means a lower percentage and, conversely, a higher code means a higher percentage.

The dataset shows positive correlation between pct_black/hispanic and pct_free/reduced.
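The codes are ordinal rather than interval-scaled, so a rank correlation such as Spearman's may be a useful cross-check on the Pearson coefficients returned by the default `DataFrame.corr()` (pandas also accepts `method="spearman"`). A stdlib sketch, assuming the 'nan' code is 5 for both BH and FR as in the filter above; the sample codes and helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes each sequence has at least two distinct values."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_without_nan(x, y, x_nan_code, y_nan_code):
    """Spearman correlation of two ordinal-code columns, dropping 'nan' codes."""
    kept = [(a, b) for a, b in zip(x, y) if a != x_nan_code and b != y_nan_code]
    xs = [a for a, _ in kept]
    ys = [b for _, b in kept]
    return pearson(average_ranks(xs), average_ranks(ys))


# hypothetical BH / FR codes for a few samples; 5 is the 'nan' code
bh = [0, 1, 1, 2, 4, 5]
fr = [0, 1, 2, 2, 4, 3]
rho = spearman_without_nan(bh, fr, 5, 5)
print(round(rho, 3))
```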



In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index)
    df=df[["pct_black/hispanic","pct_free/reduced","county_connections_ratio","pp_total_raw"]]
    res.append(df)

result=pd.concat(res,axis=0)
result[(result["pct_black/hispanic"]!=5)&(
        result["pct_free/reduced"]!=5)&(
        result["county_connections_ratio"]!=2)&(
        result["pp_total_raw"]!=11)][["pct_black/hispanic",
                                         "pct_free/reduced",
                                         "county_connections_ratio",
                                          "pp_total_raw"]].corr()

FIG.1 shows the number of unique product names with engagement above the average, grouped by locale. Here 'tools' means product names, and the two terms are used interchangeably.

The horizontal axis lists the months, while the vertical axis shows the number of distinct tools. The figure highlights the difference in the number of products used in cities, suburbs, rural areas and towns. The monthly pattern reflects the school year; the July low is due to the summer break.

The number of tools used in rural areas, suburbs, and non-categorized (nan) districts is higher than the number of tools used in cities or towns.

Almost all classes peaked in the number of tools used in March. A possible hypothesis is that these increases reflect the effort spent preparing for the pandemic: testing new tools and preparing for a possible school year of remote teaching. FIG.1.1 supports this hypothesis: it shows, per locale, the number of tools used in the first half of the year and abandoned in the second half, and the number of tools used in both halves. July is the cut-off month between the two halves and is counted in neither.

FIG.1.1 shows that this effort was made in all locales. The locales abandoned between 30 and 40 tools each, whilst towns abandoned 57. Possible causes are efficiency gains and product substitution.
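The abandoned/stable split behind FIG.1.1 reduces to set operations. This sketch assumes the same rule as the notebook cell (months 1-6 versus months 8-12, July excluded); the function name and tool names are hypothetical.

```python
def classify_tools(usage):
    """Split tools into 'abandoned' and 'stable' given (month, tool) pairs.

    A tool is abandoned if it appears in January-June (months 1-6) but never
    in August-December (months 8-12); it is stable if it appears in both.
    July (month 7) is the cut-off and belongs to neither half.
    """
    first_half = {tool for month, tool in usage if month < 7}
    second_half = {tool for month, tool in usage if month > 7}
    abandoned = first_half - second_half
    stable = first_half & second_half
    return abandoned, stable


usage = [(1, "Tool A"), (3, "Tool A"), (3, "Tool B"), (9, "Tool B"),
         (7, "Tool C"), (10, "Tool D")]
abandoned, stable = classify_tools(usage)
print(sorted(abandoned), sorted(stable))
```

Tools seen only in July or only in the second half fall in neither group, matching the two `if` conditions of the notebook loop.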





In [None]:



res=[]
for index in range(1,13):
    df=load_df_month(index)
    df=df[["engagement_index","month","Product Name","locale"]]
    res.append(df)

result=pd.concat(res,axis=0)

fig=go.Figure()
for index,g in result[result["engagement_index"]>np.mean(result["engagement_index"].dropna().values)].groupby("locale"):
    target_district=districts[districts["locale"]==index]
    #print(target_district["state"].values[0])
    t=g[["month","Product Name"]].drop_duplicates().groupby("month").agg("count").reset_index()
    fig.add_trace(go.Bar(x=[ month[str(i)] for i in t["month"].values],
                         y=t["Product Name"].values,
                         marker={
                                 "color":index
                             },
                         name=districts_cat.categories_[-1][target_district["locale"].values[0]]
                         ),
                  
                 )
fig.update_layout(#showlegend=False,
                  title="FIG.1: diversity of tools with high engagement per month",
                  yaxis_title="number of tools",)
fig.show()

In [None]:
fig=go.Figure()
tars2={"locale":[],"tools abandoned":[],"tools stable":[]}
for index,g in result[result["engagement_index"]>np.mean(result["engagement_index"].dropna().values)].groupby("locale"):
    target_district=districts[districts["locale"]==index]
    
    # keep one row per (month, product) pair
    t=g[["month","Product Name"]].drop_duplicates().reset_index().drop("time",axis=1)
    # "abandoned": used in January-June but never in August-December
    # "stable": used in both halves (July, the summer break, is excluded from both)
    abandoned=0
    stable=0
    for i,r in t.groupby("Product Name"):
        used_first_half=(r["month"]<7).any()
        used_second_half=(r["month"]>7).any()
        if used_first_half and not used_second_half:
            abandoned=abandoned+1
        if used_first_half and used_second_half:
            stable=stable+1
            
    tars2["locale"].append(districts_cat.categories_[-1][target_district["locale"].values[0]])
    tars2["tools abandoned"].append(abandoned)
    tars2["tools stable"].append(stable)
    

fig.add_trace(go.Bar(x=tars2["locale"], y=tars2["tools abandoned"], name="tools abandoned"))
fig.add_trace(go.Bar(x=tars2["locale"], y=tars2["tools stable"], name="tools stable"))
fig.update_layout(#showlegend=False,
                  title="FIG.1.1: Tools stable and abandoned per locale",
                  yaxis_title="number of tools",)
fig.show()

In [None]:
#black array(['[0, 0.2[', '[0.2, 0.4[', '[0.4, 0.6[', '[0.6, 0.8[', '[0.8, 1[','nan'], dtype=object)
#free array(['[0, 0.2[', '[0.2, 0.4[', '[0.4, 0.6[', '[0.6, 0.8[', '[0.8, 1[','nan'], dtype=object)
#connection array(['[0.18, 1[', '[1, 2[', 'nan']
#total array(['[04000, 6000[', '[06000, 8000[', '[08000, 10000[',
#       '[10000, 12000[', '[12000, 14000[', '[14000, 16000[',
#       '[16000, 18000[', '[18000, 20000[', '[20000, 22000[',
#       '[22000, 24000[', '[32000, 34000[', 'nan'], dtype=object)
#locale array(['City', 'Rural', 'Suburb', 'Town', 'nan'], dtype=object)
#state array(['Arizona', 'California', 'Connecticut', 'District Of Columbia',
#       'Florida', 'Illinois', 'Indiana', 'Massachusetts', 'Michigan',
#       'Minnesota', 'Missouri', 'New Hampshire', 'New Jersey', 'New York',
#       'North Carolina', 'North Dakota', 'Ohio', 'Tennessee', 'Texas',
#       'Utah', 'Virginia', 'Washington', 'Wisconsin', 'nan'], dtype=object)

res_df_1=[]
res_df_2=[]
res_df_3=[]
res_df_4=[]

for index in range(1,13):
    df=load_df_month(index)
    val1=df[["pct_black/hispanic","URL"]].groupby(["pct_black/hispanic"]).agg("count").reset_index()
    val2=df[["pct_free/reduced","URL"]].groupby(["pct_free/reduced"]).agg("count").reset_index()
    val3=df[["county_connections_ratio","URL"]].groupby(["county_connections_ratio"]).agg("count").reset_index()
    val4=df[["pp_total_raw","URL"]].groupby(["pp_total_raw"]).agg("count").reset_index()
    
    
    val1["month"]=month[str(index)]
    val2["month"]=month[str(index)]
    val3["month"]=month[str(index)]
    val4["month"]=month[str(index)]
    #pct_free/reduced
    #county_connections_ratio
    #pp_total_raw
    # x axis is the black
    # y is minority
    res_df_1.append(val1)
    res_df_2.append(val2)
    res_df_3.append(val3)
    res_df_4.append(val4)

res1=pd.concat(res_df_1,axis=0).set_index("pct_black/hispanic")
res2=pd.concat(res_df_2,axis=0).set_index("pct_free/reduced")
res3=pd.concat(res_df_3,axis=0).set_index("county_connections_ratio")
res4=pd.concat(res_df_4,axis=0).set_index("pp_total_raw")
#i would like to see how the differential between months changed per category
#print(res1)
#print(res2)
#print(res3)
#print(res4)
resa=res1.reset_index()#.set_index(["month","pct_black/hispanic"])
resb=res2.reset_index()#.set_index(["month","pct_free/reduced"])
resc=res3.reset_index()#.set_index(["month","county_connections_ratio"])
resd=res4.reset_index()#.set_index(["month","pp_total_raw"])
c_res=["pct_black/hispanic","pct_free/reduced","county_connections_ratio","pp_total_raw"]
c_names={"pct_black/hispanic": "Percentage of black/hispanic",
         "pct_free/reduced": "Percentage of free/reduced meals",
         "county_connections_ratio": "County connections ratio",
         "pp_total_raw": "Per pupil total expenditure"}
res=[resa,resb,resc,resd]

FIG.4, FIG.5, FIG.6 and FIG.7 show the monthly number of samples available in each category of percentage of black/hispanic (BH), percentage of free/reduced (FR), county connections ratio (CC), and per pupil total expenditure (PP).
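The counts behind these figures are a group-and-count on (month, category). A plain-Python sketch using `collections.Counter` instead of the pandas `groupby(...).agg("count")`; the helper names and rows are hypothetical. For any one feature, the category counts of a month add up to that month's total number of samples.

```python
from collections import Counter


def samples_per_category(rows):
    """Count samples per (month, category_label) pair, one tuple per sample."""
    return Counter(rows)


def monthly_totals(counts):
    """Sum the per-category counts of each month."""
    totals = Counter()
    for (month, _category), n in counts.items():
        totals[month] += n
    return totals


# hypothetical samples for the BH feature
rows = [("January", "[0, 0.2["), ("January", "[0, 0.2["), ("January", "nan"),
        ("February", "[0.2, 0.4[")]
counts = samples_per_category(rows)
totals = monthly_totals(counts)
print(counts)
print(totals)
```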





In [None]:


for r in range(len(res)):
    fig=go.Figure()
    for i in res[r][c_res[r]].unique():
        fig.add_trace(go.Scatter(x=res[r][res[r][c_res[r]]==i]["month"].values,
                                 y=res[r][res[r][c_res[r]]==i]["URL"].values,name="{}".format(
                                     districts_cat.categories_[r][i]
                                     )))

    fig.update_layout(title="FIG.{}: {}".format(4+r, c_names[c_res[r]]),
                      yaxis_title="number of samples",)
    
    fig.show()

FIG.8 summarizes the evolution of the pandemic through the number of samples. The lines inside each bar mark where one class (or category) ends and the next begins, so, for each feature, the classes in a bar add up to the total number of samples for that month. The number of samples is consistent with the course of the pandemic and with the school year.

Note that online usage increased significantly in the second half of the year, with the start of the new school year. This result is consistent with schools adapting to the pandemic.

In [None]:
fig=go.Figure()
# res1..res4 are indexed by the ordinal category code of their feature,
# so the code can be used directly to look up the category label
for feature_index,(frame,trace_name) in enumerate([(res1,"black/hispanic"),
                                                   (res2,"free/reduced"),
                                                   (res3,"county_connections_ratio"),
                                                   (res4,"pp_total_raw")]):
    fig.add_trace(
        go.Bar(
            x=frame["month"].values,
            y=frame["URL"].values,
            hovertext=[districts_cat.categories_[feature_index][int(code)] for code in frame.index],
            name=trace_name
        )
    )
fig.update_layout(title="FIG.8: Number of samples per feature category",
                      yaxis_title="number of samples per feature category",)
fig.show()

FIG.9 investigates the distribution of points across the black/hispanic (BH) and free/reduced (FR) classes and counts the number of points, i.e. samples (NS). A synthetic feature, BH * 10 + FR, is created to sort all the available combinations along the vertical axis; for example, the point at (January, 11) shows the number of samples with BH\[0.2, 0.4[ and FR[0.2, 0.4[ recorded in January.
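The synthetic BH * 10 + FR feature can be decoded with integer division and remainder, which also covers one-digit values (BH code 0) without special-casing them. A minimal sketch; the category labels are the ones listed in the notebook's comments, while the helper names are hypothetical.

```python
CATEGORIES = ["[0, 0.2[", "[0.2, 0.4[", "[0.4, 0.6[", "[0.6, 0.8[", "[0.8, 1[", "nan"]


def encode_class(bh_code, fr_code):
    """Combine the BH and FR ordinal codes (0-5, 5 = 'nan') into BH * 10 + FR."""
    return bh_code * 10 + fr_code


def decode_class(combined):
    """Recover (BH, FR) codes; one-digit values such as 3 mean BH code 0."""
    return divmod(combined, 10)


bh, fr = decode_class(11)
label = f"BH{CATEGORIES[bh]} FR{CATEGORIES[fr]}"
print(label)
print(decode_class(3))
```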


The minimum marker size is 5; the size grows with the share of samples that falls in each FR category, where the baseline is the sum across all FR categories for the same BH class and month.
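A sketch of the marker-size rule as implemented in the FIG.9 cell (`count / total * 10 + 5`); the helper name and the counts are hypothetical.

```python
def marker_sizes(counts_by_fr, base_size=5, scale=10):
    """Marker size for each FR category within one BH class and one month.

    counts_by_fr maps an FR label to its number of samples; the size is
    base_size plus scale times that category's share of the BH total.
    """
    total = sum(counts_by_fr.values())
    return {fr: n / total * scale + base_size for fr, n in counts_by_fr.items()}


sizes = marker_sizes({"[0, 0.2[": 60, "[0.2, 0.4[": 30, "[0.4, 0.6[": 10})
print(sizes)
```

Because the share is relative to the BH total, marker sizes are comparable within a BH class but not between months with very different absolute sample counts.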

Note that nan means not categorized.

If the sample is representative of the population, the relative distribution between categories did not change considerably during the year, although the absolute values did.

Some points that can be drawn from this figure are:
- BH=0 (\[0, 0.2[) has FR at most [0.4, 0.6[; as a very coarse approximation, the number of samples halves with each higher FR category.
- The percentage of black/hispanic is correlated with the percentage of free/reduced. If the dataset is representative of the underlying population, the data shows that communities with a higher concentration of BH also have a higher concentration of FR, with the median in the same percentage category (BH[0.2,0.4[ FR[0.2,0.4[, BH[0.4,0.6[ FR[0.4,0.6[, BH[0.6,0.8[ FR[0.6,0.8[, BH[0.8,1[ FR[0.8,1[).
- From April, the FR distribution of class BH[0.8,1[ worsened: its center moved from BH[0.8,1[ FR[0.6,0.8[ to BH[0.8,1[ FR[0.8,1[.

In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index)
    df["black+free"]=df["pct_black/hispanic"]*10+df["pct_free/reduced"]
    df["month"]=month[str(index)]
    df=df.drop(["pct_free/reduced","county_connections_ratio"],axis=1)
    
    df=df[["black+free","month","URL"]].groupby(["month","black+free"]).agg("count").reset_index()
    df["base"]=np.floor(df["black+free"]/10)
    df["color"]=(df["black+free"]-df["base"]*10)
    res.append(df)

result=pd.concat(res,axis=0).astype({
    "color":int,
    "base":int
})

color=["black","red","green","blue","brown","grey"]




result["values"]=1
for b in result["base"].unique():
    for c in result["month"].unique():
        condition=(result["base"]==b)&(result["month"]==c)
        s=result[condition]["URL"].sum()
        result.loc[condition,"values"]=s
        
result["color"]=result["color"].apply(lambda x: color[int(x)])

#print(result)
fig=go.Figure()
fig.add_trace(go.Scatter(
  x=result["month"].values,
  y=result["black+free"].values,
  # decode BH * 10 + FR: the tens digit is the BH code, the units digit the FR code
  text=[ "number:{}<br>black/hispanic {}<br>free/reduced {}".format(
             n,
             districts_cat.categories_[0][code//10],
             districts_cat.categories_[1][code%10]
         ) for n,code in zip(result["URL"].values, result["black+free"].values.astype(int)) ],
  mode="markers",
  marker={
         "size":result["URL"].values/result["values"].values*10+5,
         "line":dict(
                width=0
                ),
         "color":result["color"].values,
         "opacity":0.5
     },
))
fig.update_layout(title="FIG.9: Point distribution per category of the features black/hispanic and free/reduced",
                      yaxis_title="number of samples per feature category",)
fig.show()


The focus now shifts toward the engagement. 

FIG.10 shows the evolution over time of the diversity of products for each of the combined classes.
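Diversity here is the number of distinct products per month and class. A stdlib sketch of the `drop_duplicates()` plus count step used for FIG.10; the product codes are hypothetical, and the class codes are written, as in the notebook, as the BH code concatenated with the FR code.

```python
from collections import defaultdict


def product_diversity(samples):
    """Number of distinct products per (month, class_code).

    samples holds (month, class_code, product) tuples; repeated uses of the
    same product in the same month and class are counted once.
    """
    seen = defaultdict(set)
    for month, class_code, product in samples:
        seen[(month, class_code)].add(product)
    return {key: len(products) for key, products in seen.items()}


# hypothetical samples
samples = [("March", "44", 101), ("March", "44", 101), ("March", "44", 205),
           ("April", "44", 101), ("March", "04", 101)]
diversity = product_diversity(samples)
print(diversity)
```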

In [None]:



res=[]
for index in range(1,13):
    df=load_df_month(index).astype({
            "pct_black/hispanic":str,
            "pct_free/reduced":str
        })
    df["product"]=df.apply(lambda row: create_synthetic_product(row),axis=1)
    df["black+free"]=df["pct_black/hispanic"]+df["pct_free/reduced"]
    df["month"]=month[str(index)]
    #count the number of different products used, no distinction about the engagement 
    target=df[["month","black+free","product"]].drop_duplicates().groupby(["month","black+free"]).agg({
        "product":"count"
    }).reset_index()
    
    res.append(target)
    
    
result=pd.concat(res,axis=0)

#print(result)



fig=go.Figure()
for i in result["black+free"].unique():
    temp=result[result["black+free"]==i]
    fig.add_trace(go.Scatter(
      x=temp["month"].values,
      y=temp["product"].values,
      mode="markers+lines",
      name="BH{}<br>FR{}".format(districts_cat.categories_[0][int(i[0])],districts_cat.categories_[1][int(i[1])]
        ) if len(str(i))==2 else "BH[0, 0.2[<br>FR{}".format(districts_cat.categories_[0][int(i)]),
      text=[ "number:{}<br>black/hispanic {}<br>free/reduced {}".format(temp["product"].values[v],
                                                                   districts_cat.categories_[0][int(i[0])],
                                                                   districts_cat.categories_[1][int(i[1])]
        ) if len(str(i))==2 else "number:{}<br>black/hispanic [0, 0.2[<br>free/reduced {}".format(
                          temp["product"].values[v],
                          districts_cat.categories_[0][int(i)]) for v in range(len(temp["product"].values))  ],
    ))
fig.update_layout(title="FIG.10: Diversity of products per category of the features black/hispanic (BH) and free/reduced (FR)",
                      yaxis_title="number of products",)
fig.show()

FIG.11 restricts the sample to the most engaging platforms: only the products with an engagement above the monthly average are kept.
The vertical axis counts the monthly number of distinct products with an engagement above the average;
each line represents a synthetic combined class, the pct_black/hispanic code concatenated with the pct_free/reduced code.

The class code is the same as before: BH is the percentage of black/hispanic, FR is the percentage of free/reduced, and nan means not categorized.
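The FIG.11 filter can be sketched as a plain Python function: compute each month's mean engagement_index, keep the samples above it, then count distinct products per class. The function name and the tuples below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def high_engagement_diversity(samples):
    """Distinct products per (month, class_code), keeping only samples whose
    engagement_index is above that month's mean.

    samples holds (month, class_code, product, engagement_index) tuples.
    """
    by_month = defaultdict(list)
    for sample in samples:
        by_month[sample[0]].append(sample)

    seen = defaultdict(set)
    for month, rows in by_month.items():
        threshold = mean(row[3] for row in rows)
        for _, class_code, product, engagement in rows:
            if engagement > threshold:
                seen[(month, class_code)].add(product)
    return {key: len(products) for key, products in seen.items()}


# hypothetical samples: the March mean is 415.0
samples = [
    ("March", "44", 101, 900.0),
    ("March", "44", 205, 50.0),
    ("March", "04", 101, 700.0),
    ("March", "04", 310, 10.0),
]
print(high_engagement_diversity(samples))
```

Classes with no product above the monthly mean simply do not appear in the result, which shows up as a missing point on the corresponding line.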

The figure shows three groups:
- diversity greater than 150:
  + BH\[0.0,0.2\[ FR\[0.0,0.2\[
  + BH\[0.2,0.4\[ FR\[0.0,0.2\[
  + BH\[0.2,0.4\[ FR\[0.2,0.4\[
  + BH\[0.4,0.6\[ FR\[0.0,0.2\[
  + BH\[nan\] FR\[0.0,0.2\[
  + BH\[nan\] FR\[nan\]
- diversity between 60 and 150:
  + BH\[0.2,0.4\[ FR\[0.4,0.6\[
  + BH\[0.2,0.4\[ FR\[nan\]
  + BH\[0.4,0.6\[ FR\[0.4,0.6\[
  + BH\[0.4,0.6\[ FR\[0.6,0.8\[
  + BH\[0.4,0.6\[ FR\[nan\]
  + BH\[0.6,0.8\[ FR\[0.2,0.4\[
  + BH\[0.6,0.8\[ FR\[0.4,0.6\[
  + BH\[0.6,0.8\[ FR\[0.6,0.8\[
  + BH\[0.8,1.0\[ FR\[0.8,1.0\[
  + BH\[0.8,1.0\[ FR\[nan\]
- diversity under 60:
  + BH\[0.2,0.4\[ FR\[0,0.2\[
  + BH\[0.2,0.4\[ FR\[0.6,0.8\[
  + BH\[0.2,0.4\[ FR\[0.8,1.0\[
  + BH\[0.4,0.6\[ FR\[0.2,0.4\[
  + BH\[0.8,1.0\[ FR\[0.6,0.8\[

If we consider the diversity of products used as a proxy for the learning/teaching potential, the most disadvantaged classes that also lost diversity are:
- BH[0,0.2[ FR[0.8,1[
- BH[0.8,1[ FR[0.6,0.8[

In March almost all classes recorded a peak, which is consistent with experimenting with solutions; some classes then show a decline in diversity during the year. A possible explanation for this decline is product substitution: managing a virtual class requires more interconnected functionalities, so it makes sense that institutions adopted solutions that solved several problems at once and abandoned products specialized in a single functionality.



In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index).astype({
            "pct_black/hispanic":str,
            "pct_free/reduced":str
        })
    df["product"]=df.apply(lambda row: create_synthetic_product(row),axis=1)
    df["black+free"]=df["pct_black/hispanic"]+df["pct_free/reduced"]
    df["month"]=month[str(index)]
    #filter the products that have engagement greater than the overall average on the month
    df=df[df["engagement_index"]>np.mean(df["engagement_index"].values)]
    #count the number of distinct products among the highly engaging ones
    target=df[["month","black+free","product"]].drop_duplicates().groupby(["month","black+free"]).agg({
        "product":"count"
    }).reset_index()
    
    res.append(target)
    
    
result=pd.concat(res,axis=0)

#print(result)



fig=go.Figure()
for i in result["black+free"].unique():
    temp=result[result["black+free"]==i]
    fig.add_trace(go.Scatter(
      x=temp["month"].values,
      y=temp["product"].values,
      mode="markers+lines",
      name="BH{}<br>FR{}".format(districts_cat.categories_[0][int(i[0])],districts_cat.categories_[1][int(i[1])]
        ) if len(str(i))==2 else "BH[0, 0.2[<br>FR{}".format(districts_cat.categories_[0][int(i)]),
      text=[ "number:{}<br>black/hispanic {}<br>free/reduced {}".format(temp["product"].values[v],
                                                                   districts_cat.categories_[0][int(i[0])],
                                                                   districts_cat.categories_[1][int(i[1])]
        ) if len(str(i))==2 else "number:{}<br>black/hispanic [0, 0.2[<br>free/reduced {}".format(
                          temp["product"].values[v],
                          districts_cat.categories_[0][int(i)]) for v in range(len(temp["product"].values))  ],
    ))
fig.update_layout(title="FIG.11: Diversity of products with high engagement per category of the features <br>       black/hispanic(BH) and free/reduced(FR)",
                      yaxis_title="number of products",)
fig.show()

FIG.11.1 shows a heatmap of how the number of products used in each combination of classes changed through the year. Since the color scale is fixed, similar colors mean similar numbers, and the figure highlights the gap between the classes with the fewest and the most products. Particularly interesting is the initial gap between BH\[0,0.2\[ FR[0.8,1\[ and the rest of the classes (see February): although this class improved, it did not catch up with the others.

Another interesting point is that some classes worsened during the year, e.g. BH\[0.4,0.6\[ with FR\[nan\].

In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index).astype({
            "pct_black/hispanic":str,
            "pct_free/reduced":str
        })
    df["black+free"]=df["pct_black/hispanic"]+df["pct_free/reduced"]
    df["month"]=index #month[str(index)]
    #count the distinct products of each provider per month and class, with no engagement filter
    target=df[["month","black+free",
               "Provider/Company Name",
               "Product Name"]].drop_duplicates().groupby(["month",
                                                          "black+free",
                                                          "Provider/Company Name"]).agg({
                                                                        "Product Name":"count"
                                                                        }).reset_index()
    res.append(target)
    
    
result=pd.concat(res,axis=0)

In [None]:


sliders_dict = {
    "active": 0,
    "yanchor": "top",
    "xanchor": "left",
    "currentvalue": {
        "font": {"size": 20},
        #"prefix": "Month:",
        "visible": True,
        "xanchor": "right"
    },
    "transition": {"duration": 300, "easing": "cubic-in-out"},
    "pad": {"b": 10, "t": 50},
    "len": 0.9,
    "x": 0.1,
    "y": 0,
    "steps": []
}

frames=[]


for m in result["month"].unique():
    df_m=result[result["month"]==m].groupby("black+free").agg({
        "Product Name":"sum"
    }).reset_index()
    
    
    frames.append({"data":go.Heatmap(
                            x=[districts_cat.categories_[0][int(i[0])] for i in df_m["black+free"].values],
                            y=[districts_cat.categories_[1][int(i[1])] for i in df_m["black+free"].values],
                            z=df_m["Product Name"].values, 
                            #text=products_cat.categories_[3][c],
                            zmin=30,
                            zmax=380,
                            opacity=0.3,),
                   "name":month[str(m)]
                  })
    sliders_dict["steps"].append({"args": [
                                                [ month[str(m)]],
                                                {"frame": {"duration": 300, "redraw": True},
                                                 "mode": "immediate",
                                                 "transition": {"duration": 300}}
                                            ],
                                "label": month[str(m)],
                                "method": "animate"})

fig = go.Figure(
    data=[frames[0]["data"]],
    layout=go.Layout(#width=1000, height=700,
                     hovermode="closest",
                     updatemenus=[dict(type="buttons",
                                       buttons=[dict(label="Play",
                                                     method="animate",
                                                     args=[None])])],
                     sliders=[sliders_dict]
                    ),
    frames=[go.Frame(data=[f["data"]],name=f["name"]) for f in frames]
)


    
fig.update_layout(title="FIG.11.1: Number of products across black/hispanic and free/reduced ",
                  xaxis_title="pct_black_hispanic",
                  yaxis_title="pct_free/reduced",
                  #zaxis_title="Product",
                  showlegend=True)
fig.show()

One question to answer is how the usage of tools evolved during the pandemic. 

FIG.12 to FIG.32 show, for each company that offers more than one product on average, how many of its products were used each month in each black/hispanic and free/reduced class.

Google LLC shows an increased product adoption in every class examined. Microsoft and Houghton Mifflin Harcourt rank second, with an average of 6 products.
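The cell below counts distinct products per company with `drop_duplicates()` followed by a `count`. A minimal sketch of the same count written with `nunique`, followed by the "more than one product on average" filter applied within each class; the toy rows and product names are hypothetical.

```python
import pandas as pd

# Hypothetical usage records: one row per (month, class, company, product) observation.
toy = pd.DataFrame({
    "month": ["January"] * 4 + ["February"] * 2,
    "black+free": ["04"] * 6,
    "Provider/Company Name": ["Google LLC", "Google LLC", "Google LLC", "Zoom",
                              "Google LLC", "Zoom"],
    "Product Name": ["Docs", "Docs", "Sheets", "Zoom", "Sheets", "Zoom"],
})

# nunique counts each product once per group, like drop_duplicates() + count.
counts = (toy.groupby(["month", "black+free", "Provider/Company Name"])["Product Name"]
             .nunique()
             .reset_index())

# Within each class, keep the companies averaging more than one product per month.
mean_per_company = counts.groupby(["black+free", "Provider/Company Name"])["Product Name"].mean()
multi_product = mean_per_company[mean_per_company > 1].index.tolist()
print(multi_product)
```

Here Google LLC averages 1.5 products per month (2 in January, 1 in February) and is kept, while Zoom averages 1 and is dropped.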


In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index).astype({
            "pct_black/hispanic":str,
            "pct_free/reduced":str
        })
    
    df["black+free"]=df["pct_black/hispanic"]+df["pct_free/reduced"]
    df["month"]=month[str(index)]
    target=df[["month","black+free",
               "Provider/Company Name",
               "Product Name"]].drop_duplicates().groupby(["month",
                                                          "black+free",
                                                          "Provider/Company Name"]).agg({
                                                                        "Product Name":"count"
                                                                        }).reset_index()
    #count the number of different products used, no distinction about the engagement 
    res.append(target)
    
    
result=pd.concat(res,axis=0)



counter=12

for i in result["black+free"].unique():
    temp=result[result["black+free"]==i].set_index("month")
    #print(temp)
    #    print(temp[temp["Provider/Company Name"]==c].index.values)
    #    print(temp[temp["Provider/Company Name"]==c]["Product Name"].values)
    fig=go.Figure()
    for c in temp["Provider/Company Name"].unique():
        if np.mean(temp[temp["Provider/Company Name"]==c]["Product Name"].values)>1:
            fig.add_trace(go.Scatter(
              x=temp[temp["Provider/Company Name"]==c].index.values,
              y=temp[temp["Provider/Company Name"]==c]["Product Name"].values,
              mode="markers+lines",
              name="{}".format(products_cat.categories_[3][c]),
            ))
    if len(str(i))>1:
        fig.update_layout(title="FIG.{}:  Number of products used of each company<br>       considering only black/hispanic {} AND free/reduced {}".format(counter,
            districts_cat.categories_[0][int(str(i)[0])],districts_cat.categories_[1][int(str(i)[1])],))
    else:
        fig.update_layout(title="FIG.{}:  Number of products used of each company<br>       considering only black/hispanic AND free/reduced {}".format(counter,
            districts_cat.categories_[0][int(i)]))
    counter=counter+1
    fig.show()

FIG.33 to FIG.44 show how many products of each company were used in each month; each figure represents a class of expenditure per pupil. From March onward, Google shows an increase in the number of products used that persists through the rest of the year in almost all expenditure classes.
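The cell below fills with 0 the months in which a company has no data before plotting. A minimal sketch of that alignment with `Series.reindex`, assuming, as in the notebook, that months are stored as the strings "1" to "12"; the counts are hypothetical.

```python
import pandas as pd

# Hypothetical monthly counts for one company: most months are missing and the
# index is deliberately in lexicographic order ("1", "10", "3").
counts = pd.Series({"1": 3, "10": 5, "3": 4})

# reindex looks values up by label, so it both reorders the months numerically
# and fills the missing ones with 0.
months = [str(k) for k in range(1, 13)]
aligned = counts.reindex(months, fill_value=0)
print(aligned.tolist())
```

Matching by label rather than by position keeps each value attached to its month even when the input order is not chronological.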

In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index)
    
    #df["black+free"]=df["pct_black/hispanic"]*10+df["pct_free/reduced"]
    df["month"]=str(index)
    target=df[["month","pp_total_raw",
               "Provider/Company Name",
               "Product Name"]].drop_duplicates().groupby(["month",
                                                          "pp_total_raw",
                                                          "Provider/Company Name"]).agg({
                                                                        "Product Name":"count"
                                                                        }).reset_index()
    #count the number of different products used, no distinction about the engagement 
    res.append(target)
    
    
result=pd.concat(res,axis=0)
counter=33

for i in result["pp_total_raw"].unique():
    temp=result[result["pp_total_raw"]==i].set_index("month")
    #print(temp)
    #    print(temp[temp["Provider/Company Name"]==c].index.values)
    #    print(temp[temp["Provider/Company Name"]==c]["Product Name"].values)
    fig=go.Figure()
    for c in temp["Provider/Company Name"].unique():
        if np.mean(temp[temp["Provider/Company Name"]==c]["Product Name"].values)>1:
            
            #align the monthly counts to January..December, filling the months without data with 0
            company=temp[temp["Provider/Company Name"]==c]
            counts=dict(zip(company.index.values.tolist(),company["Product Name"].values.tolist()))
            y=[counts.get(str(k),0) for k in range(1,13)]
            x=[month[str(k)] for k in range(1,13)]
            
            
            fig.add_trace(go.Scatter(
              x=x,
              y=y,
              mode="markers+lines",
              name="{}".format(products_cat.categories_[3][c]),
            ))
    fig.update_layout(title="FIG.{}: Number of products used for each company<br>       considering only Per Pupil total expenditure class {}".format(counter,districts_cat.categories_[3][i]))
    counter=counter+1
    fig.show()

FIG.45 to FIG.56 show the monthly percentage of access per company, keeping only the companies whose mean percentage of access over the year is greater than 2%. Each figure isolates a class of expenditure per pupil.

The sequence shows the variability of the offer across expenditure capacity. We recall a hypothesis formulated from Fig.11 that will be useful for the investigation:
> A possible explanation for the reduction in diversity during the year is product substitution. Managing a virtual class requires more interconnected functionalities, so it makes sense that institutions adopted solutions that solved more problems and abandoned products specialized in a single functionality.

In almost all classes Clever loses percentage of access against other companies, for example:

- Fig.45 focuses on \[4000,6000\[ and shows a substitution of Clever and Imagine Learning by Instructure and Curriculum Associates.
- Fig.46 focuses on \[6000,8000\[ and shows a substitution of Clever and Curriculum Associates by Instructure and School Loop.
- Fig.47 focuses on \[8000,10000\[ and shows a gain of Schoology and Instructure against Clever and Curriculum Associates.

These figures show how the instruction market adapted during the year. Some companies that gained market share are Schoology, Instructure, Curriculum Associates, Classlink and Zoom, while others appear to have lost market share, e.g. Clever, CoolMath.com and Mind Research. The product substitution hypothesis therefore gains some support.

It is also interesting that some companies have a significant share only in some classes, e.g. Seesaw Learning, Google, Ed Puzzle and Blindside Networks (Fig.53 to Fig.55), whilst some classes have clear winners in the second half of the year: Schoology in Fig.50 and Fig.51; Classlink and Schoology in Fig.53.


 

In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index)
    
    #df["black+free"]=df["pct_black/hispanic"]*10+df["pct_free/reduced"]
    df["month"]=str(index)
    target=df[["month","pp_total_raw",
               "Provider/Company Name",
               "engagement_index",
               "pct_access",
               "Product Name"]].drop_duplicates().groupby(["month",
                                                          "pp_total_raw",
                                                          "Provider/Company Name"]).agg({
                                                                        "Product Name":"count",
                                                                        "pct_access":"mean",
                                                                        "engagement_index":"mean",
                                                                        }).reset_index()
    #count the number of different products used, no distinction about the engagement 
    res.append(target)
    
    
result=pd.concat(res,axis=0)
counter=45

for i in result["pp_total_raw"].unique():
    temp=result[result["pp_total_raw"]==i].set_index("month")
    #print(temp)
    #    print(temp[temp["Provider/Company Name"]==c].index.values)
    #    print(temp[temp["Provider/Company Name"]==c]["Product Name"].values)
    fig=go.Figure()
    for c in temp["Provider/Company Name"].unique():
        #keep the companies whose mean percentage of access over the year is greater than 2%
        if np.mean(temp[temp["Provider/Company Name"]==c]["pct_access"].values)>2:
            fig.add_trace(go.Scatter(
              x=[month[x] for x in temp[temp["Provider/Company Name"]==c].index.values],
              y=temp[temp["Provider/Company Name"]==c]["pct_access"].values,
              mode="markers+lines",
              name="{}".format(products_cat.categories_[3][c]),
            ))
    fig.update_layout(title="FIG.{}: Percentage of access per company<br>       considering only Per Pupil total expenditure class {}".format(counter,districts_cat.categories_[3][i]))
    counter=counter+1
    fig.show()

FIG.57 to FIG.68 compare companies by engagement index; companies whose mean engagement index over the year is 100 or lower are filtered out. Each figure shows only one per pupil total expenditure class. Comparing the engagement index of the first half of the year with that of the second half is instructive: Schoology, Instructure and Google show a substantial increase in the second half, while Kahoot! shows a loss in almost all classes.
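The first-half versus second-half comparison described above can be summarized with one number per company. A minimal sketch, assuming months are stored as the strings "1" to "12" as in the notebook; the helper `half_year_change` and the toy values are hypothetical, and the same function would work on `pct_access`.

```python
import pandas as pd

# Hypothetical monthly mean engagement index per company (months as "1".."12").
toy = pd.DataFrame({
    "month": ["1", "2", "9", "10", "1", "2", "9", "10"],
    "Provider/Company Name": ["Schoology"] * 4 + ["Kahoot!"] * 4,
    "engagement_index": [100.0, 120.0, 400.0, 500.0, 300.0, 250.0, 150.0, 100.0],
})

def half_year_change(df, value_col):
    """Mean of value_col in July-December minus its mean in January-June, per company."""
    second_half = df["month"].astype(int) > 6
    first = df[~second_half].groupby("Provider/Company Name")[value_col].mean()
    second = df[second_half].groupby("Provider/Company Name")[value_col].mean()
    # Positive values mean the company gained in the second half of the year.
    return (second - first).sort_values(ascending=False)

change = half_year_change(toy, "engagement_index")
print(change)
```

The split is crude: July and August fall in the second half, so the summer break lowers the second-half mean for every company.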



In [None]:
counter=57
for i in result["pp_total_raw"].unique():
    temp=result[result["pp_total_raw"]==i].set_index("month")
    #print(temp)
    #    print(temp[temp["Provider/Company Name"]==c].index.values)
    #    print(temp[temp["Provider/Company Name"]==c]["Product Name"].values)
    fig=go.Figure()
    for c in temp["Provider/Company Name"].unique():
        #keep the companies whose mean engagement index over the year is greater than 100
        if np.mean(temp[temp["Provider/Company Name"]==c]["engagement_index"].values)>100:
            fig.add_trace(go.Scatter(
              x=[month[x] for x in temp[temp["Provider/Company Name"]==c].index.values],
              y=temp[temp["Provider/Company Name"]==c]["engagement_index"].values,
              mode="markers+lines",
              name="{}".format(products_cat.categories_[3][c]),
            ))
    fig.update_layout(title="FIG.{}:Engagement index per company<br>       considering only Per Pupil total expenditure class {}".format(counter,districts_cat.categories_[3][i]))
    counter=counter+1
    fig.show()

Companies in each primary essential function category 

In [None]:
#print(token_primary_essential_function)
# LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations



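#for each category list the companies.
#`result` at this point only holds aggregated columns, so the function flags are read from the monthly data
companies_by_function=defaultdict(set)
for index in range(1,13):
    df=load_df_month(index)
    for col in list(token_primary_essential_function):
        if col in df.columns:
            companies_by_function[col].update(df[df[col]==1]["Provider/Company Name"].unique())
for col in list(token_primary_essential_function):
    if col in companies_by_function:
        print("\n\n",col)
        print("\n",[
            products_cat.categories_[3][int(i)] for i in companies_by_function[col]
        ])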

FIG.69 and FIG.70 show the mean engagement index and the mean percentage of access per month for each primary essential function.
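The cell below builds `dictionary_of_results` by looping over the 0/1 function columns. A minimal sketch of the same per-function monthly means computed in one pass with `DataFrame.melt`; the function column names and the values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: each product carries 0/1 indicator columns for its primary essential functions.
toy = pd.DataFrame({
    "month": ["1", "1", "2", "2"],
    "LC - Study Tools": [1, 0, 1, 0],
    "CM - Virtual Classroom": [0, 1, 1, 1],
    "pct_access": [2.0, 4.0, 6.0, 8.0],
    "engagement_index": [100.0, 300.0, 500.0, 700.0],
})
function_cols = ["LC - Study Tools", "CM - Virtual Classroom"]

# Long format: one row per (record, function) pair, keeping only the pairs flagged with 1.
long = toy.melt(id_vars=["month", "pct_access", "engagement_index"],
                value_vars=function_cols, var_name="function", value_name="flag")
long = long[long["flag"] == 1]

# Mean access and engagement per function and month.
summary = long.groupby(["function", "month"])[["pct_access", "engagement_index"]].mean()
print(summary)
```

A record flagged for several functions contributes to each of them, which matches filtering `df[df[col]==1]` separately for every column.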

In [None]:
#reduced_base=reduced_base.astype({col:int for col in list(token_primary_essential_function) if col in reduced_base.columns })

dictionary_of_results={}

for index in range(1,13):
    df=load_df_month(index)
    #cast the function flags of this month's frame to int
    df=df.astype({
            col:int for col in list(token_primary_essential_function) if col in df.columns 
        })
    df["month"]=str(index)
    #df["black+free"]=df["pct_black/hispanic"]*10+df["pct_free/reduced"]
    
    for col in list(token_primary_essential_function):
        if col in df.columns:
            if col not in dictionary_of_results:
                dictionary_of_results[col]=[]
            
            result=df[df[col]==1].groupby("month").agg({
                                                "pct_access":"mean",
                                                "engagement_index":"mean"
                                            })
            
                #print(col)
            dictionary_of_results[col].append(result)

    
    
    
for col in dictionary_of_results:
    dictionary_of_results[col]=pd.concat(dictionary_of_results[col],axis=0)


In [None]:
def replace_ampersand_with_newline(string):
    if len(string)>25:
        return string.replace("&","&<br>    ")
    return string

In [None]:
fig=go.Figure()
for col in dictionary_of_results:
        
    fig.add_trace(go.Scatter(x=[month[str(i)] for i in dictionary_of_results[col].index.values],
                             y=dictionary_of_results[col]["engagement_index"].values,
                             #z=poject[:,2],
                             mode="markers+lines",
                             name=replace_ampersand_with_newline(col)
                            ),
                 )
fig.update_layout(title="FIG.69: engagement index per Primary Essential Function")
fig.show()

In [None]:
fig=go.Figure()
for col in dictionary_of_results:
        
    fig.add_trace(go.Scatter(x=[month[str(i)] for i in dictionary_of_results[col].index.values],
                             y=dictionary_of_results[col]["pct_access"].values,
                             #z=poject[:,2],
                             mode="markers+lines",
                             name=replace_ampersand_with_newline(col)
                            ),
                 )
fig.update_layout(title="FIG.70: Percentage access per Primary Essential Function")
fig.show()

FIG.71 and FIG.72 reproduce FIG.70 and FIG.69 respectively, removing the seasonal drop of June, July and August. The aim is to better capture the Primary Essential Functions that increased or decreased during the pandemic.


The year shows an increased adoption of technology. Learning management systems, SSO and School management software record high usage. Categories with increasing adoption are Video Conferencing & Screen Sharing, Assessment & Classroom Response, Virtual Classroom, Online Courses, SDO, Streaming Services, and Content Creation & Curation.

The declining category is Mobile Device Management.

The picture is consistent with the virtualization of the learning process that happened during the year and involved all the agents of the system: on the side of schools and teachers (virtual classroom, video conferencing, and so on) and on the side of students (the sharp increase in online courses).
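Whether a category is increasing or declining is read from the figures by eye. A minimal sketch of one way to make that call explicit: drop June, July and August as in FIG.71 and FIG.72, then take the slope of a least-squares line per category. The monthly values below are hypothetical, and using a linear trend is an assumption, not the method behind the figures.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly mean pct_access for two functions, months "1".."12".
months = [str(k) for k in range(1, 13)]
toy = pd.DataFrame({
    "Video Conferencing": [1, 1, 2, 3, 3, 0, 0, 0, 4, 5, 5, 6],
    "Mobile Device Management": [6, 6, 5, 5, 4, 0, 0, 0, 4, 3, 3, 2],
}, index=months, dtype=float)

# Drop the summer break months.
school_months = toy[~toy.index.isin(["6", "7", "8"])]
month_numbers = school_months.index.astype(int).to_numpy()

# np.polyfit(..., 1) returns [slope, intercept]; slope > 0 increasing, < 0 decreasing.
slopes = {col: np.polyfit(month_numbers, school_months[col].to_numpy(), 1)[0]
          for col in school_months}
print(slopes)
```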

In [None]:

fig=go.Figure()
for col in dictionary_of_results:
    tmp=dictionary_of_results[col][~dictionary_of_results[col].index.isin(["6","7","8"])]
        
    fig.add_trace(go.Scatter(x=[month[str(i)] for i in tmp.index.values],
                             y=tmp["pct_access"].values,
                             #z=poject[:,2],
                             mode="markers+lines",
                             name=replace_ampersand_with_newline(col)
                            ),
                 )
fig.update_layout(title="FIG.71: Percentage access per Primary Essential Function")
fig.show()
#engagement index without the summer months, to spot the categories that decreased
fig=go.Figure()
for col in dictionary_of_results:
    
    tmp=dictionary_of_results[col][~dictionary_of_results[col].index.isin(["6","7","8"])]
    
    fig.add_trace(go.Scatter(x=[month[str(i)] for i in tmp.index.values],
                             y=tmp["engagement_index"].values,
                             #z=poject[:,2],
                             mode="markers+lines",
                             name=replace_ampersand_with_newline(col)
                            ),
                 )
fig.update_layout(title="FIG.72: Engagement index per Primary Essential Function")
fig.show()

FIG.73 to FIG.77 show the product diversity for the combined classes pct_black/hispanic and pct_free/reduced, using only the data of one locale per figure. The objective is to highlight differences such as those between living in an urban or a rural district.
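The locale and state cells combine the two label-encoded classes as `pct_black/hispanic*10+pct_free/reduced` and decode the key digit by digit, with a separate branch for one-digit keys (BH code 0). A minimal sketch of the same encoding with `divmod`, which handles both cases uniformly; `encode_class` and `decode_class` are hypothetical helpers, and the scheme is only unambiguous while both codes are single digits.

```python
def encode_class(bh_code: int, fr_code: int) -> int:
    """Combine the BH and FR category codes into one integer key (BH*10 + FR)."""
    if not (0 <= bh_code <= 9 and 0 <= fr_code <= 9):
        raise ValueError("codes must be single digits for the *10 encoding")
    return bh_code * 10 + fr_code

def decode_class(key) -> tuple:
    """Split a combined key back into (BH code, FR code); a key like 4 gives (0, 4)."""
    return divmod(int(key), 10)

print(encode_class(0, 4))  # 4
print(decode_class(4))     # (0, 4)
print(decode_class(34))    # (3, 4)
```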



In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index)
    df["product"]=df.apply(lambda row: create_synthetic_product(row),axis=1)
    df["black+free"]=df["pct_black/hispanic"]*10+df["pct_free/reduced"]
    df["month"]=month[str(index)]
    #filter the products that have engagement greater than the overall average on the month
    #df=df[df["engagement_index"]>np.mean(df["engagement_index"].values)]
    #count the number of different products used, no distinction about the engagement 
    target=df[["month","black+free","product","locale"]].drop_duplicates().groupby(["month","black+free","locale"]).agg({
        "product":"count"
    }).reset_index()
    
    res.append(target)
    
    
result=pd.concat(res,axis=0)

#print(result)

counter=73
for j in result["locale"].unique():
    
    fig=go.Figure()
    for i in result["black+free"].unique():
        temp=result[(result["black+free"]==i)&(result["locale"]==j)]
        fig.add_trace(go.Scatter(
          x=temp["month"].values,
          y=temp["product"].values,
          mode="markers+lines",
          name="BH{}<br>FR{}".format(districts_cat.categories_[0][int(str(i)[0])],districts_cat.categories_[1][int(str(i)[1])]
        ) if len(str(i))==2 else "BH[0, 0.2[<br>FR{}".format(districts_cat.categories_[0][int(i)]),
          text=[ "number:{}<br>black/hispanic {}<br>free/reduced {}".format(temp["product"].values[v],
                                                                       districts_cat.categories_[0][int(str(i)[0])],
                                                                       districts_cat.categories_[1][int(str(i)[1])]
            ) if len(str(i))==2 else "number:{}<br>black/hispanic [0, 0.2[<br>free/reduced {}".format(
                              temp["product"].values[v],
                              districts_cat.categories_[0][int(i)]) for v in range(len(temp["product"].values))  ],
        ))
    fig.update_layout(title="FIG.{}: product diversity considering only data from {}".format(counter,districts_cat.categories_[5][int(j)]))
    counter=counter+1
    fig.show()

In [None]:
#https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_African-American_population


try:
    table1=pd.read_html("https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_African-American_population",
                  attrs={
                      'class':'sortable wikitable'
                  })[0].set_index("Rank")
    print("Table.1 - Percentage of African-American per state, census 2019, source wikipedia\n\n")
    #pd.set_option('display.max_columns', None)
    #pd.set_option('display.max_rows', None)
    #print(table)
except Exception as error:
    print("no table, go to the link directly:", error)



Table.1 lists the percentage of African-American residents per state; the source is Wikipedia (https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_African-American_population).

In [None]:
table1

FIG.78 to FIG.101 show a disaggregated picture of the product diversity per state and per black/hispanic and free/reduced class. The figures highlight a substantial difference in the effects of the pandemic on minority communities across states. Table.1 lists the states by percentage of African-American residents; the data is downloaded from Wikipedia, which refers to the 2019 census. We refer to the percentage of African-Americans in the population of a state with the acronym PAA.

In the following description we will use BH for pct of black/hispanic, FR for pct of free/reduced, and NOP for number of products. We use the NOP as proxy to measure the relative offering of a learning system, and we compare the change of NOP across the year between BH and FR combinations in each state. 
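The comparison of NOP changes can be condensed into one number per state and class: the NOP in the last available month minus the NOP in the first. A minimal sketch, assuming a frame shaped like the state-level `result` plus a numeric month column; in the notebook `month` holds month names, so `month_num`, the helper `nop_change` and the values are hypothetical.

```python
import pandas as pd

# Hypothetical NOP counts per (state, class, month).
toy = pd.DataFrame({
    "state": ["Utah"] * 4,
    "black+free": [4, 4, 23, 23],
    "month_num": [1, 12, 1, 12],
    "product": [300, 310, 250, 120],
})

def nop_change(df):
    """NOP in the last available month minus NOP in the first, per (state, class)."""
    ordered = df.sort_values("month_num")
    grouped = ordered.groupby(["state", "black+free"])["product"]
    # first()/last() follow the row order inside each group, here chronological.
    return grouped.last() - grouped.first()

change = nop_change(toy)
print(change)
```

A negative value flags a class whose offering deteriorated over the year.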


FIG.78 (California, PAA 7\%) shows a sharp drop in the NOP between October and November; the fall affected only BH\[0.6,0.8] FR\[0.4,0.6] and BH[0.2,0.4] FR[0.2,0.4], while the other classes remained stable.

FIG.79 (Connecticut, PAA 13.2\%) shows that the lowest NOP throughout the year is in BH[0,0.2] FR[0.4,0.6], while all other classes follow a similar pattern.

FIG.80 (Illinois, PAA 15.4\%) behaves similarly to FIG.79, with the lowest NOP in BH[0,0.2] FR[0.8,1].

FIG.84 (Ohio, PAA 14.4\%) only has data from June to August. During the summer break, only BH[0.4,0.6] FR[nan] and BH[0.2,0.4] FR[0.2,0.4] show a sharp drop in NOP.

FIG.85 (Utah, PAA 1.9\%) shows that the lowest NOP is in BH[0.2,0.4] FR[0.4,0.6], which is also the only class that recorded a drop in NOP during the summer break.

FIG.86 (Washington, PAA 5.6\%) probably shows an issue in the sampling method: BH[0,0.2] FR[0.2,0.4] and BH[0,0.2] FR[0,0.2] display a suspicious pattern. The first dropped from over 300 to less than 60 in April and stayed in this range until the end of the year, while the second started the year with a NOP of around 23 and rose to over 300 by October. The other classes are stable around 300.

FIG.87 (Wisconsin, PAA 7.5\%) shows a pattern similar to FIG.86 for BH[0,0.2] FR[0.2,0.4]. Similar problems can be found in FIG.97 (Texas, PAA 13.5\%) and FIG.98 (Minnesota, PAA 8.1\%). In Texas there are only two classes: BH[0.6,0.8] FR[0.4,0.6] improved its NOP, while BH[0.4,0.6] FR[0.4,0.6] worsened.
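Suspicious series like those of Washington can be flagged automatically instead of by inspection. A minimal sketch that reports the months where the NOP changes by more than a given factor from one month to the next; the helper `suspicious_jumps`, the default factor of 3 and the series values are hypothetical choices.

```python
import pandas as pd

# Hypothetical monthly NOP series for two classes of one state.
series = {
    "BH[0,0.2[ FR[0.2,0.4[": [310, 305, 300, 55, 58, 60],
    "BH[0.2,0.4[ FR[0.2,0.4[": [290, 295, 300, 298, 302, 305],
}

def suspicious_jumps(values, ratio=3.0):
    """Return the positions where NOP changes by more than `ratio` times month over month."""
    s = pd.Series(values, dtype=float)
    change = s / s.shift(1)  # the first position is NaN and is never flagged
    # Flag a rise above `ratio` times or a fall below 1/ratio; the threshold is arbitrary.
    mask = (change > ratio) | (change < 1 / ratio)
    return s.index[mask.to_numpy()].tolist()

for name, values in series.items():
    print(name, suspicious_jumps(values))
```

A gradual drift such as the rise from 23 to over 300 would need a cumulative check instead; this sketch only catches abrupt breaks.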

FIG.88 (New Hampshire, PAA 2.2\%) only has data for BH[0,0.2] FR[0.8,1] and shows a drop during the summer break. The summer break drop is also recorded in FIG.94 (North Carolina, PAA 23.1\%), only for the class BH[0.4,0.6] FR[0.6,0.8], while in FIG.96 (Michigan, PAA 15.3\%) it is common to all classes.

FIG.89 (New York, PAA 17.6\%) has the lowest NOP in BH[0.8,1] FR[0.8,1], although its pattern along the year is similar to that of the other classes. FIG.90 (Virginia, PAA 23.3\%) is similar, with the minimum NOP in BH[0,0.2] FR[0.2,0.4].

FIG.91 (Indiana, PAA 11\%) shows that BH[0.8,1] FR[0.6,0.8] is the only class whose NOP deteriorated substantially during the year. The same applies to FIG.100 (Arizona, PAA 6\%) and the class BH[0.8,1] FR[nan].

FIG.93 (Tennessee, PAA 18\%) shows a deterioration of NOP for the class BH[0.2,0.4] FR[nan]; the time series only has data until August.

FIG.99 (District of Columbia, PAA 47.2\%) shows the lowest NOP in BH[0.4,0.6], which improved in the second part of the year without catching up with BH[0.8,1].


A common and intuitive phenomenon across the figures is that deterioration is recorded in FR classes above [0,0.2] regardless of BH, suggesting that districts with higher FR were penalized more than other classes during the pandemic. However, in Indiana, Arizona, New York and California the associated BH was greater than 0.4.

FIG.99 is counterintuitive: in a state with a PAA of about 50\%, the class BH[0.4,0.6] has a lower NOP than BH[0.8,1]. One possible hypothesis is that homogeneous districts have more political weight than mixed districts.











In [None]:
res=[]
for index in range(1,13):
    df=load_df_month(index)
    df["product"]=df.apply(lambda row: create_synthetic_product(row),axis=1)
    df["black+free"]=df["pct_black/hispanic"]*10+df["pct_free/reduced"]
    df["month"]=month[str(index)]
    #filter the products that have engagement greater than the overall average on the month
    #df=df[df["engagement_index"]>np.mean(df["engagement_index"].values)]
    #count the number of different products used, no distinction about the engagement 
    target=df[["month","black+free","product","state"]].drop_duplicates().groupby(["month","black+free","state"]).agg({
        "product":"count"
    }).reset_index()
    
    res.append(target)
    
    
result=pd.concat(res,axis=0)

#print(result)

counter=78
for j in result["state"].unique():
    
    fig=go.Figure()
    for i in result["black+free"].unique():
        temp=result[(result["black+free"]==i)&(result["state"]==j)]
        fig.add_trace(go.Scatter(
          x=temp["month"].values,
          y=temp["product"].values,
          mode="markers+lines",
          name="BH{}<br>FR{}".format(districts_cat.categories_[0][int(str(i)[0])],districts_cat.categories_[1][int(str(i)[1])]
        ) if len(str(i))==2 else "BH[0, 0.2[<br>FR{}".format(districts_cat.categories_[0][int(i)]),
          text=[ "number:{}<br>black/hispanic {}<br>free/reduced {}".format(temp["product"].values[v],
                                                                       districts_cat.categories_[0][int(str(i)[0])],
                                                                       districts_cat.categories_[1][int(str(i)[1])]
            ) if len(str(i))==2 else "number:{}<br>black/hispanic [0, 0.2[<br>free/reduced {}".format(
                              temp["product"].values[v],
                              districts_cat.categories_[0][int(i)]) for v in range(len(temp["product"].values))  ],
        ))
    fig.update_layout(title="FIG.{}: product diversity considering only data from {}".format(counter,districts_cat.categories_[4][int(j)]))
    counter=counter+1
    fig.show()

FIG.102 to FIG.125 show, for each state, the number of products per primary essential function. The figures show that the number of products used follows a similar pattern across states throughout the year.

An outlier with a unique pattern is Arizona (FIG.102), where the peak in the usage of digital learning platforms happened during the summer break.

Minnesota (FIG.111) and North Dakota (FIG.117) probably have an issue in the data sampling, because no data is available after a few months.
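States whose series stop early, like Minnesota and North Dakota, can be listed directly from the monthly records. A minimal sketch, assuming months are stored as the strings "1" to "12" as in the notebook; the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical (state, month) records.
toy = pd.DataFrame({
    "state": ["Minnesota", "Minnesota", "Minnesota", "Texas", "Texas"],
    "month": ["1", "2", "3", "1", "12"],
})

# Last month with any record, per state; converting to int avoids the
# lexicographic ordering of the month strings ("12" < "3").
last_month = toy.assign(month_num=toy["month"].astype(int)).groupby("state")["month_num"].max()
truncated = last_month[last_month < 12].index.tolist()
print(truncated)
```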

The District of Columbia (FIG.105), New Hampshire (FIG.113), New Jersey (FIG.114), Michigan (FIG.110) and New York (FIG.115) show a common drop in the number of digital learning platforms around October, which may be an effect of the COVID-19 policy response. For example, Michigan allowed schools to reopen in person in September, and later moved back to virtual learning in November (https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Michigan).

Texas (FIG.120) shows a dip in July, consistent with the summer break.

In [None]:
if "nan" in token_primary_essential_function:
    token_primary_essential_function.remove("nan")



res=[]

for index in range(1,13):
    df=load_df_month(index)
    
    
    df["month"]=str(index)
    #filter the products that have engagement greater than the overall average on the month
    #df=df[df["engagement_index"]>np.mean(df["engagement_index"].values)]
    #count the number of different products used, no distinction about the engagement 
    target=df[["month","URL","state"]+list(token_primary_essential_function)].drop_duplicates().groupby(["month","state"]+list(token_primary_essential_function)).agg({
        "URL":"count"
    }).reset_index()
    
    res.append(target)
    
    
result=pd.concat(res,axis=0)

#reduced_base=reduced_base.astype({col:int for col in list(token_primary_essential_function) if col in reduced_base.columns })
counter=102
#reduced_base["syntetic"]=reduced_base["pct_access"]/100*reduced_base["engagement_index"]
for state in result["state"].unique():
    fig=go.Figure()
    for col in list(token_primary_essential_function):

        if col in result.columns:
            temp=result[(result[col]==1)&(result["state"]==state)].groupby("month").agg({
                "URL":"mean",
            })
            #align the monthly values to January..December, filling the months without data with 0;
            #groupby sorts the string months lexicographically ("1","10","11",...), so map by label, not by position
            counts=dict(zip(temp.index.values.tolist(),temp["URL"].values.tolist()))
            y=[counts.get(str(k),0) for k in range(1,13)]
            x=[month[str(k)] for k in range(1,13)]
            fig.add_trace(go.Scatter(x=x,
                                     y=y,
                                     mode="markers+lines",
                                     name=col,
                                    ),
                         )
           
    fig.update_layout(title="FIG.{}: products number per Primary Essential Function on {}".format(counter,districts_cat.categories_[4][int(state)]))
    counter=counter+1
    fig.show()