1. [**Introduction**](#section-one)
2. [**Products and States**](#section-two) <br>
3. [**Location, Financial and Racial Factors**](#section-three) <br>
4. [**Conclusions**](#section-four) <br>
5. [**Sources**](#section-five) <br>

<a id="section-one"></a>
**1. Introduction**

The COVID-19 pandemic has disrupted learning for more than 56 million students in the United States. In the spring of 2020, most state and local governments across the U.S. closed educational institutions to stop the spread of the virus. In response, schools and teachers have attempted to reach students remotely through distance learning products and digital platforms. To this day, concerns about a widening digital divide and long-term learning loss among America’s most vulnerable learners continue to grow.

More than 1.2 billion children in 186 countries were affected by school closures during the pandemic.
While some believe that the unplanned and rapid move to online learning – with no training, insufficient bandwidth, and little preparation – will result in a poor user experience that is unconducive to sustained growth, others believe that a new hybrid model of education will emerge, with significant benefits.

In this solution, we focus on remote learning products and analyze how learning changed during 2020 by asking which learning products were used in which locations across the United States, and why.


This will be done in 3 parts:
1. Joining databases to create a master file.
2. Cleaning the dataset by removing the rows with no Product Information.
3. Creating Graphs to understand the data and provide Conclusions.
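
The first two steps can be sketched on toy data. This is a minimal illustration, not the notebook's actual pipeline: the three small frames and their values are made up, and `merge` is used here in place of the `join` calls that appear later.

```python
import pandas as pd

# Toy stand-ins for the three source tables (all values are made up).
engagement = pd.DataFrame({
    "district_id": [1, 1, 2],
    "lp_id": [10, 99, 10],
    "engagement_index": [1500.0, None, 300.0],
})
districts = pd.DataFrame({"district_id": [1, 2], "state": ["Utah", "Ohio"]})
products = pd.DataFrame({"lp_id": [10], "Product Name": ["Google Docs"]})

# Step 1: attach district and product attributes to every engagement row.
master = (
    engagement
    .merge(districts, on="district_id", how="left")
    .merge(products, on="lp_id", how="left")
)

# Step 2: drop rows whose product could not be identified (lp_id 99 here).
master = master.dropna(subset=["Product Name"]).reset_index(drop=True)

print(master)
```

A left join keeps every engagement row, so rows with an unknown product surface as missing values and can be removed explicitly in the second step.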

The language will be Python for this EDA. We will predominantly use plotly in this notebook.

I hope you enjoy reading the solution.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 14})

import re

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import warnings
warnings.filterwarnings("ignore")

districts_info = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")
products_info = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")

PATH = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data' 

temp = []

for district in districts_info.district_id.unique():
    df = pd.read_csv(f'{PATH}/{district}.csv', index_col=None, header=0)
    df["district_id"] = district
    temp.append(df)
    
    
engagement = pd.concat(temp)
engagement = engagement.reset_index(drop=True)

In [None]:
engagement.engagement_index=engagement.engagement_index.fillna(0)
products_info=products_info.rename(columns = {'LP ID': 'lp_id'}, inplace = False)
districts_info=districts_info.dropna(subset=['state'])
df=engagement.join(districts_info.set_index('district_id'),on='district_id')
df=df.join(products_info.set_index('lp_id'),on='lp_id')
df=df.dropna(subset=['Product Name'])
df.head()

Above is a view of the combined data. We have collated three key pieces of information:

1. Product Information
2. District Information
3. Information on the daily use of the Product

<a id="section-two"></a>
**2. Products and States**

We'll now look at the mean daily page loads per student across states.
We'll calculate the mean from the "engagement index", which is the ***total page-load events per one thousand students of a given product on a given day***. Dividing it by 1000, as the code does, converts it to page loads per student.
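
As a small illustration of the calculation, the sketch below averages the index by state and rescales it. It assumes that `engagement_index` is reported per 1,000 students, which is what the division by 1000 in the notebook's code suggests; the rows and the `loads_per_student` column name are made up.

```python
import pandas as pd

# Made-up rows; engagement_index is assumed to be page loads per 1,000 students.
toy = pd.DataFrame({
    "state": ["Utah", "Utah", "Ohio"],
    "engagement_index": [2000.0, 4000.0, 500.0],
})

# Mean index per state.
per_state = toy.groupby("state", as_index=False)["engagement_index"].mean()

# Dividing by 1,000 turns "per 1,000 students" into "per student".
per_state["loads_per_student"] = per_state["engagement_index"] / 1000

print(per_state)
```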

In [None]:
df1=df.dropna(subset=['state'])
eda=df1.groupby(['state'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda=eda.sort_values(by=['Mean Daily Page Load Per Student'],ascending=True)
list_of_state=eda.state
list_of_state=list_of_state.reset_index(drop=True)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
fig=px.bar(eda,x="Mean Daily Page Load Per Student",y="state",orientation="h",
           title="Mean Daily Page Load Per Student")
fig.show()

Arizona is at the top, followed by North Dakota and New York, whereas Tennessee, North Carolina and Michigan have the lowest mean daily page load per student.

**Top 10 Popular Products**

In [None]:
eda=df.groupby(['Product Name'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda=eda.sort_values(by=['Mean Daily Page Load Per Student'],ascending=True)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
temp=eda.tail(10)
fig=px.bar(temp,x="Mean Daily Page Load Per Student",y="Product Name",orientation="h",
           title="Mean Daily Page Load Per Student")
fig.show()

The top 3 are well-known Google products: Google Docs, Google Classroom and YouTube. Each has a value above 3, meaning that on average a student loads each of these products more than 3 times a day.
In the top 10 we find several other Google services, as well as websites for creating tests and quizzes (Kahoot!) and a number of other products.
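
The ranking step can also be sketched with `nlargest`, which avoids sorting and slicing by hand. This is only an illustration on made-up products "A", "B" and "C", and it assumes the index is per 1,000 students.

```python
import pandas as pd

# Made-up engagement rows for three hypothetical products.
toy = pd.DataFrame({
    "Product Name": ["A", "A", "B", "C"],
    "engagement_index": [1000.0, 3000.0, 9000.0, 500.0],
})

# Mean page loads per student for each product.
mean_loads = toy.groupby("Product Name")["engagement_index"].mean() / 1000

# Keep the two most-used products, largest first.
top2 = mean_loads.nlargest(2)

print(top2)
```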

Next, we focus on the top 10 products and look at their use in each state, taking the states in alphabetical order.

In [None]:
options=temp["Product Name"]
temp1 = df1[df1['Product Name'].isin(options)]
temp1 = temp1[temp1['state'].isin(list_of_state)]
eda=temp1.groupby(['Product Name','state'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
conditions = [
(eda['Product Name'] == 'Google Docs'),
(eda['Product Name'] =='Google Classroom'),
(eda['Product Name'] =='YouTube'),
(eda['Product Name'] =='Canvas'),
(eda['Product Name'] == 'Meet'),
(eda['Product Name'] =='Schoology'),
(eda['Product Name'] =='Kahoot!'),
(eda['Product Name'] =='Google Forms'),
(eda['Product Name'] == 'Google Drive'),
(eda['Product Name'] =='ClassLink')
]

# sort keys for each condition; integers keep the ordering numeric rather than lexicographic
values = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# create a new column and use np.select to assign values to it using our lists as arguments
eda['sort'] = np.select(conditions, values)

eda=eda.sort_values(by='sort')
eda1=eda.loc[eda['state'] == 'Arizona']
eda2=eda.loc[eda['state'] == 'California']
eda3=eda.loc[eda['state'] == 'Connecticut']
eda4=eda.loc[eda['state'] == 'District Of Columbia']
eda5=eda.loc[eda['state'] == 'Florida']
eda6=eda.loc[eda['state'] == 'Illinois']
eda7=eda.loc[eda['state'] == 'Indiana']
eda8=eda.loc[eda['state'] == 'Massachusetts']
eda9=eda.loc[eda['state'] == 'Michigan']
eda10=eda.loc[eda['state'] == 'Minnesota']
eda11=eda.loc[eda['state'] == 'Missouri']
eda12=eda.loc[eda['state'] == 'New Hampshire']
eda13=eda.loc[eda['state'] == 'New Jersey']
eda14=eda.loc[eda['state'] == 'New York']
eda15=eda.loc[eda['state'] == 'North Carolina']
eda16=eda.loc[eda['state'] == 'North Dakota']
eda17=eda.loc[eda['state'] == 'Ohio']
eda18=eda.loc[eda['state'] == 'Tennessee']
eda19=eda.loc[eda['state'] == 'Texas']
eda20=eda.loc[eda['state'] == 'Utah']
eda21=eda.loc[eda['state'] == 'Virginia']
eda22=eda.loc[eda['state'] == 'Washington']
eda23=eda.loc[eda['state'] == 'Wisconsin']

fig = make_subplots(rows=2, cols=3, subplot_titles=("Arizona", "California",
                                                    "Connecticut", "District Of Columbia",
                                                    "Florida", "Illinois"),shared_yaxes=True,column_widths=[0.33, 0.33, 0.33])
fig.add_trace(
    go.Bar(x=eda1["Mean Daily Page Load Per Student"],y=eda1["Product Name"],orientation='h'),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=eda2["Mean Daily Page Load Per Student"],y=eda2["Product Name"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda3["Mean Daily Page Load Per Student"],y=eda3["Product Name"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda4["Mean Daily Page Load Per Student"],y=eda4["Product Name"],orientation='h'),
    row=2, col=1
)

fig.add_trace(
    go.Bar(x=eda5["Mean Daily Page Load Per Student"],y=eda5["Product Name"],orientation='h'),
    row=2, col=2
)

fig.add_trace(
    go.Bar(x=eda6["Mean Daily Page Load Per Student"],y=eda6["Product Name"],orientation='h'),
    row=2, col=3
)

fig.update_xaxes(range=[0, 20],row=1, col=1)
fig.update_xaxes(range=[0, 20],row=1, col=2)
fig.update_xaxes(range=[0, 20],row=1, col=3)
fig.update_xaxes(range=[0, 20],row=2, col=1)
fig.update_xaxes(range=[0, 20],row=2, col=2)
fig.update_xaxes(range=[0, 20],row=2, col=3)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

Google Docs is the most popular product in all six states above.
It averages more than 5 page loads per student per day, and in Arizona and Illinois it exceeds 15.
Google Classroom exceeds 5 in 4 of the 6 states and is second only to Google Docs in every state above except the District of Columbia, where YouTube takes second position with an average of more than 6 page loads per student per day.

In [None]:
fig = make_subplots(rows=2, cols=3, subplot_titles=("Indiana", "Massachusetts",
                                                    "Michigan","Minnesota","Missouri",
                                                    "New Hampshire"),shared_yaxes=True,column_widths=[0.33, 0.33, 0.33])
fig.add_trace(
    go.Bar(x=eda7["Mean Daily Page Load Per Student"],y=eda7["Product Name"],orientation='h'),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=eda8["Mean Daily Page Load Per Student"],y=eda8["Product Name"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda9["Mean Daily Page Load Per Student"],y=eda9["Product Name"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda10["Mean Daily Page Load Per Student"],y=eda10["Product Name"],orientation='h'),
    row=2, col=1
)

fig.add_trace(
    go.Bar(x=eda11["Mean Daily Page Load Per Student"],y=eda11["Product Name"],orientation='h'),
    row=2, col=2
)

fig.add_trace(
    go.Bar(x=eda12["Mean Daily Page Load Per Student"],y=eda12["Product Name"],orientation='h'),
    row=2, col=3
)

fig.update_xaxes(range=[0, 20],row=1, col=1)
fig.update_xaxes(range=[0, 20],row=1, col=2)
fig.update_xaxes(range=[0, 20],row=1, col=3)
fig.update_xaxes(range=[0, 20],row=2, col=1)
fig.update_xaxes(range=[0, 20],row=2, col=2)
fig.update_xaxes(range=[0, 20],row=2, col=3)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

Google Docs is the most popular product in 4 of these 6 states.
The standout is Schoology, which is the most popular product in Minnesota, ahead of Google products such as Google Docs and Google Classroom.
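
Statements such as "most popular in 4 out of 6 states" can be checked in code rather than read off the charts. The sketch below shows one way, using `groupby(...).idxmax()` on made-up numbers; the `loads` column name is an assumption for illustration.

```python
import pandas as pd

# Made-up mean loads per product and state.
toy = pd.DataFrame({
    "state": ["Minnesota", "Minnesota", "Indiana", "Indiana"],
    "Product Name": ["Schoology", "Google Docs", "Schoology", "Google Docs"],
    "loads": [10.0, 6.0, 1.0, 7.0],
})

# For each state, keep the row with the largest mean load.
top = toy.loc[toy.groupby("state")["loads"].idxmax(), ["state", "Product Name"]]

# Count how many states each product leads.
leaders = top["Product Name"].value_counts()

print(top)
print(leaders)
```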

In [None]:
fig = make_subplots(rows=2, cols=3, subplot_titles=("New Jersey", "New York", "North Carolina",
                                                     "Ohio","North Dakota",
                                                    "Tennessee"),shared_yaxes=True,column_widths=[0.33, 0.33, 0.33])
fig.add_trace(
    go.Bar(x=eda13["Mean Daily Page Load Per Student"],y=eda13["Product Name"],orientation='h'),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=eda14["Mean Daily Page Load Per Student"],y=eda14["Product Name"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda15["Mean Daily Page Load Per Student"],y=eda15["Product Name"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda17["Mean Daily Page Load Per Student"],y=eda17["Product Name"],orientation='h'),
    row=2, col=1
)

fig.add_trace(
    go.Bar(x=eda16["Mean Daily Page Load Per Student"],y=eda16["Product Name"],orientation='h'),
    row=2, col=2
)

fig.add_trace(
    go.Bar(x=eda18["Mean Daily Page Load Per Student"],y=eda18["Product Name"],orientation='h'),
    row=2, col=3
)

fig.update_xaxes(range=[0, 20],row=1, col=1)
fig.update_xaxes(range=[0, 20],row=1, col=2)
fig.update_xaxes(range=[0, 20],row=1, col=3)
fig.update_xaxes(range=[0, 20],row=2, col=1)
fig.update_xaxes(range=[0, 20],row=2, col=2)
fig.update_xaxes(range=[0, 20],row=2, col=3)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

In the next set, Google Docs holds first position in 5 of the 6 states and exceeds 5 in 3 of them. Kahoot! leads in North Dakota, where it exceeds 5, and Meet is noticeably popular in New York, where it averages 6 page-load events per student per day.

In [None]:
fig = make_subplots(rows=2, cols=3, subplot_titles=("Texas", "Utah", "Virginia","Washington",
                                                    "Wisconsin"),shared_yaxes=True,column_widths=[0.33, 0.33, 0.33])
fig.add_trace(
    go.Bar(x=eda19["Mean Daily Page Load Per Student"],y=eda19["Product Name"],orientation='h'),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=eda20["Mean Daily Page Load Per Student"],y=eda20["Product Name"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda21["Mean Daily Page Load Per Student"],y=eda21["Product Name"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda22["Mean Daily Page Load Per Student"],y=eda22["Product Name"],orientation='h'),
    row=2, col=1
)

fig.add_trace(
    go.Bar(x=eda23["Mean Daily Page Load Per Student"],y=eda23["Product Name"],orientation='h'),
    row=2, col=2
)


fig.update_xaxes(range=[0, 20],row=1, col=1)
fig.update_xaxes(range=[0, 20],row=1, col=2)
fig.update_xaxes(range=[0, 20],row=1, col=3)
fig.update_xaxes(range=[0, 20],row=2, col=1)
fig.update_xaxes(range=[0, 20],row=2, col=2)
fig.update_xaxes(range=[0, 20],row=2, col=3)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

In the last set of 5 states, Google Docs is in first position in 4 of them, but the highlight is Schoology topping the chart in Texas, where it exceeds 20 and leaves the other products far behind. Schoology also exceeds 8 page loads per student per day in Wisconsin, where it is in second position.

Now, let's flip the analysis around and check which product is popular in which state.
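
One compact way to look at the data from the product side is a product-by-state pivot table. The sketch below uses made-up numbers and an assumed `loads` column; it illustrates the idea and is not the code behind the charts that follow.

```python
import pandas as pd

# Made-up mean loads per product and state.
toy = pd.DataFrame({
    "state": ["Texas", "Texas", "Utah", "Utah"],
    "Product Name": ["Schoology", "Canvas", "Schoology", "Canvas"],
    "loads": [20.0, 1.0, 0.5, 4.0],
})

# Rows are products, columns are states.
wide = toy.pivot_table(index="Product Name", columns="state",
                       values="loads", aggfunc="mean")

# For each product, the state where it is used most.
best_state = wide.idxmax(axis=1)

print(wide)
print(best_state)
```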

In [None]:
options=temp["Product Name"]
temp1 = df1[df1['Product Name'].isin(options)]
eda=temp1.groupby(['Product Name','state'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
conditions = [
eda['state'] == 'Arizona',
eda['state'] == 'California',
eda['state'] == 'Connecticut',
eda['state'] == 'District Of Columbia',
eda['state'] == 'Florida',
eda['state'] == 'Illinois',
eda['state'] == 'Indiana',
eda['state'] == 'Massachusetts',
eda['state'] == 'Michigan',
eda['state'] == 'Minnesota',
eda['state'] == 'Missouri',
eda['state'] == 'New Hampshire',
eda['state'] == 'New Jersey',
eda['state'] == 'New York',
eda['state'] == 'North Carolina',
eda['state'] == 'North Dakota',
eda['state'] == 'Ohio',
eda['state'] == 'Tennessee',
eda['state'] == 'Texas',
eda['state'] == 'Utah',
eda['state'] == 'Virginia',
eda['state'] == 'Washington',
eda['state'] == 'Wisconsin'
]
# sort keys for each condition; integers keep the ordering numeric rather than lexicographic
values = [32, 31, 30, 29, 28, 27, 26, 25, 24, 23,
          22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 0]

# create a new column and use np.select to assign values to it using our lists as arguments
eda['sort'] = np.select(conditions, values)

eda=eda.sort_values(by='sort')
eda1=eda.loc[eda['Product Name'] == 'Google Docs']
eda2=eda.loc[eda['Product Name'] =='Google Classroom']
eda3=eda.loc[eda['Product Name'] =='YouTube']
eda4=eda.loc[eda['Product Name'] =='Canvas']
eda5=eda.loc[eda['Product Name'] == 'Meet']
eda6=eda.loc[eda['Product Name'] =='Schoology']
eda7=eda.loc[eda['Product Name'] =='Kahoot!']
eda8=eda.loc[eda['Product Name'] =='Google Forms']
eda9=eda.loc[eda['Product Name'] == 'Google Drive']
eda10=eda.loc[eda['Product Name'] =='ClassLink']

fig = make_subplots(rows=1, cols=5, subplot_titles=("Google Docs", "Google Classroom", "YouTube", "Canvas",
                                                    "Schoology"),shared_yaxes=True,column_widths=[0.2, 0.2, 0.2,0.2,0.2])
fig.add_trace(
    go.Bar(x=eda1["Mean Daily Page Load Per Student"],y=eda1["state"],orientation='h'),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=eda2["Mean Daily Page Load Per Student"],y=eda2["state"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda3["Mean Daily Page Load Per Student"],y=eda3["state"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda4["Mean Daily Page Load Per Student"],y=eda4["state"],orientation='h'),
    row=1, col=4
)
fig.add_trace(
    go.Bar(x=eda6["Mean Daily Page Load Per Student"],y=eda6["state"],orientation='h'),
    row=1, col=5
)

fig.update_xaxes(range=[0, 20],row=1, col=1)
fig.update_xaxes(range=[0, 20],row=1, col=2)
fig.update_xaxes(range=[0, 20],row=1, col=3)
fig.update_xaxes(range=[0, 20],row=1, col=4)
fig.update_xaxes(range=[0, 20],row=1, col=5)
fig.update_yaxes(title_text="State", row=1, col=1)
fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

From the above chart, Google Docs is the most popular product in most states and is used most heavily in Arizona and Illinois, where it exceeds 17 page loads per student per day. Canvas, on the other hand, is popular in states like Wisconsin, Virginia and Missouri, where it exceeds 4 page loads per student per day. Most surprisingly, Schoology exceeds 20 page loads per student per day in Texas, 5 in Wisconsin and 10 in Minnesota.

In [None]:
fig = make_subplots(rows=1, cols=5, subplot_titles=("Kahoot!", "Meet", "Google Forms", "Google Drive",
                                                    "ClassLink"),shared_yaxes=True,column_widths=[0.2, 0.2, 0.2,0.2,0.2])
fig.add_trace(
    go.Bar(x=eda7["Mean Daily Page Load Per Student"],y=eda7["state"],orientation='h'),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=eda5["Mean Daily Page Load Per Student"],y=eda5["state"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda8["Mean Daily Page Load Per Student"],y=eda8["state"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda9["Mean Daily Page Load Per Student"],y=eda9["state"],orientation='h'),
    row=1, col=4
)
fig.add_trace(
    go.Bar(x=eda10["Mean Daily Page Load Per Student"],y=eda10["state"],orientation='h'),
    row=1, col=5
)

fig.update_xaxes(range=[0, 7],row=1, col=1)
fig.update_xaxes(range=[0, 7],row=1, col=2)
fig.update_xaxes(range=[0, 7],row=1, col=3)
fig.update_xaxes(range=[0, 7],row=1, col=4)
fig.update_xaxes(range=[0, 7],row=1, col=5)
fig.update_yaxes(title_text="State", row=1, col=1)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

Next, Meet sees its highest usage in New York, where it exceeds 6 page loads per student per day. Kahoot! exceeds 5 in North Dakota. Other products such as ClassLink are visible in a few states, such as Wisconsin and Tennessee, and are largely missing from the rest.

Next, we will look at the popularity of the products over time.

In [None]:
df['Date'] = pd.to_datetime(df['time'])
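# Series.dt.week was removed in pandas 2.0; isocalendar().week returns the same ISO week number
df['Week Number'] = pd.to_numeric(df['Date'].dt.isocalendar().week)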
options=temp["Product Name"].tail(5)

temp1 = df[df['Product Name'].isin(options)]
eda=temp1.groupby(['Product Name','Week Number'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)

conditions = [
(eda['Product Name'] == 'Google Docs'),
(eda['Product Name'] =='Google Classroom'),
(eda['Product Name'] =='YouTube'),
(eda['Product Name'] =='Canvas'),
(eda['Product Name'] == 'Meet'),
(eda['Product Name'] =='Schoology'),
(eda['Product Name'] =='Kahoot!'),
(eda['Product Name'] =='Google Forms'),
(eda['Product Name'] == 'Google Drive'),
(eda['Product Name'] =='ClassLink')
]

# sort keys for each condition; integers keep the ordering numeric rather than lexicographic
values = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# create a new column and use np.select to assign values to it using our lists as arguments
eda['sort'] = np.select(conditions, values)

eda=eda.sort_values(['sort', 'Week Number'], ascending=[False, True])

eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
eda['Mean Daily Page Load Per Student']=pd.to_numeric(eda['Mean Daily Page Load Per Student'])
fig = px.line(eda, x="Week Number", y="Mean Daily Page Load Per Student", color='Product Name')
fig.show()

We aggregated the data by week and took the mean daily page load per student for the top 5 products, which are plotted in this graph. The drop between week 25 and week 35 corresponds to the summer holidays. We see two waves of usage, one at the onset of the pandemic and one after the summer break. Google Docs is the most popular product throughout, followed by Google Classroom, whereas the data for YouTube only starts in week 24, which suggests possible noise in the dataset.
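
The weekly aggregation can be illustrated on three made-up rows. The sketch assumes ISO week numbering via `dt.isocalendar().week` and an index reported per 1,000 students; the `loads_per_student` column name is made up.

```python
import pandas as pd

# Made-up daily rows for a single product.
toy = pd.DataFrame({
    "time": ["2020-01-06", "2020-01-07", "2020-01-13"],
    "engagement_index": [1000.0, 3000.0, 500.0],
})

# Parse the dates and derive the ISO week number.
toy["Date"] = pd.to_datetime(toy["time"])
toy["Week Number"] = toy["Date"].dt.isocalendar().week.astype(int)

# Average the daily index within each week, then rescale to per student.
weekly = toy.groupby("Week Number", as_index=False)["engagement_index"].mean()
weekly["loads_per_student"] = weekly["engagement_index"] / 1000

print(weekly)
```

Week numbers restart every January, so grouping on the week number alone is only safe while the data covers a single calendar year, as it does here.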

In [None]:
options=temp["Product Name"].head(5)

temp1 = df[df['Product Name'].isin(options)]
eda=temp1.groupby(['Product Name','Week Number'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)

conditions = [
(eda['Product Name'] == 'Google Docs'),
(eda['Product Name'] =='Google Classroom'),
(eda['Product Name'] =='YouTube'),
(eda['Product Name'] =='Canvas'),
(eda['Product Name'] == 'Meet'),
(eda['Product Name'] =='Schoology'),
(eda['Product Name'] =='Kahoot!'),
(eda['Product Name'] =='Google Forms'),
(eda['Product Name'] == 'Google Drive'),
(eda['Product Name'] =='ClassLink')
]

# sort keys for each condition; integers keep the ordering numeric rather than lexicographic
values = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# create a new column and use np.select to assign values to it using our lists as arguments
eda['sort'] = np.select(conditions, values)

eda=eda.sort_values(['sort', 'Week Number'], ascending=[False, True])

eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
eda['Mean Daily Page Load Per Student']=pd.to_numeric(eda['Mean Daily Page Load Per Student'])
fig = px.line(eda, x="Week Number", y="Mean Daily Page Load Per Student", color='Product Name')
fig.show()

In the next set of 5 products, Kahoot! was more popular during the first wave and then fell behind Schoology during the second wave. The rise of Schoology is notable, since it even overtook Google products such as Google Drive and Google Forms.

<a id="section-three"></a>
**3. Location, Financial and Racial Factors**

Next, we will look at how the mean page load per student per day for each product varies with the **location of the place of study**.

In [None]:
df1=df.dropna(subset=['locale'])
options=temp["Product Name"]
temp1 = df1[df1['Product Name'].isin(options)]
eda=temp1.groupby(['Product Name','locale'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
conditions = [
eda['locale'] == 'City',
eda['locale'] == 'Suburb',
eda['locale'] == 'Town',
eda['locale'] == 'Rural',
]
# sort keys for each condition; integers keep the ordering numeric rather than lexicographic
values = [13, 12, 11, 0]

# create a new column and use np.select to assign values to it using our lists as arguments
eda['sort'] = np.select(conditions, values)

eda=eda.sort_values(by='sort')
eda1=eda.loc[eda['Product Name'] == 'Google Docs']
eda2=eda.loc[eda['Product Name'] =='Google Classroom']
eda3=eda.loc[eda['Product Name'] =='YouTube']
eda4=eda.loc[eda['Product Name'] =='Canvas']
eda5=eda.loc[eda['Product Name'] == 'Meet']
eda6=eda.loc[eda['Product Name'] =='Schoology']
eda7=eda.loc[eda['Product Name'] =='Kahoot!']
eda8=eda.loc[eda['Product Name'] =='Google Forms']
eda9=eda.loc[eda['Product Name'] == 'Google Drive']
eda10=eda.loc[eda['Product Name'] =='ClassLink']

fig = make_subplots(rows=1, cols=5, subplot_titles=("Google Docs", "Google Classroom", "YouTube", "Canvas", "Meet",
                                                    ),shared_yaxes=True,column_widths=[0.2,0.2,0.2,0.2,0.2])
fig.add_trace(
    go.Bar(x=eda1["Mean Daily Page Load Per Student"],y=eda1["locale"],orientation='h'),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=eda2["Mean Daily Page Load Per Student"],y=eda2["locale"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda3["Mean Daily Page Load Per Student"],y=eda3["locale"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda4["Mean Daily Page Load Per Student"],y=eda4["locale"],orientation='h'),
    row=1, col=4
)

fig.add_trace(
    go.Bar(x=eda5["Mean Daily Page Load Per Student"],y=eda5["locale"],orientation='h'),
    row=1, col=5
)                                                            
                                                             

fig.update_xaxes(range=[0, 15],row=1, col=1)
fig.update_xaxes(range=[0, 15],row=1, col=2)
fig.update_xaxes(range=[0, 15],row=1, col=3)
fig.update_xaxes(range=[0, 15],row=1, col=4)
fig.update_xaxes(range=[0, 15],row=1, col=5)

fig.update_yaxes(title_text="Locale", row=1, col=1)
fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

There are 4 categories of location in the data: town, suburb, rural and city.
We calculate the mean daily page load by location for each of the top 10 products.
For the first five products, the difference in means across locations is small. The exception is Google Docs, which is less active in Town; this may be attributable to a large difference in the underlying data for Town districts.

In [None]:
fig = make_subplots(rows=1, cols=5, subplot_titles=("Schoology", "Kahoot!", "Google Forms", "Google Drive",
                                                    "ClassLink"),shared_yaxes=True,column_widths=[0.2,0.2,0.2,0.2,0.2])
fig.add_trace(
    go.Bar(x=eda6["Mean Daily Page Load Per Student"],y=eda6["locale"],orientation='h'),
    row=1, col=1
)
                                                             
fig.add_trace(
    go.Bar(x=eda7["Mean Daily Page Load Per Student"],y=eda7["locale"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda8["Mean Daily Page Load Per Student"],y=eda8["locale"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda9["Mean Daily Page Load Per Student"],y=eda9["locale"],orientation='h'),
    row=1, col=4
)

fig.add_trace(
    go.Bar(x=eda10["Mean Daily Page Load Per Student"],y=eda10["locale"],orientation='h'),
    row=1, col=5
) 

fig.update_xaxes(range=[0, 5],row=1, col=1)
fig.update_xaxes(range=[0, 5],row=1, col=2)
fig.update_xaxes(range=[0, 5],row=1, col=3)
fig.update_xaxes(range=[0, 5],row=1, col=4)
fig.update_xaxes(range=[0, 5],row=1, col=5)

fig.update_yaxes(title_text="Locale", row=1, col=1)
fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

The difference in means is small here as well. Apart from Schoology, which is non-existent in Town, we don't see much difference between locations for the other products. This suggests that the location of the place of study doesn't significantly affect the popularity of any of the top products.
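
A claim like "the difference is small" can be given a number by computing, for each product, the gap between its highest and lowest locale mean. The sketch below does this on made-up values; the `loads` column and the `spread` name are assumptions for illustration.

```python
import pandas as pd

# Made-up mean loads per product and locale.
toy = pd.DataFrame({
    "Product Name": ["Meet"] * 4 + ["Canvas"] * 4,
    "locale": ["City", "Suburb", "Town", "Rural"] * 2,
    "loads": [2.0, 2.5, 2.25, 2.0, 1.0, 3.0, 1.5, 2.0],
})

# Gap between the highest and lowest locale mean for each product.
by_product = toy.groupby("Product Name")["loads"]
spread = (by_product.max() - by_product.min()).rename("spread")

print(spread)
```

A small spread relative to the product's overall mean supports the reading that locale has little effect. It is a descriptive check, not a significance test.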

Next, we will look at how the mean page load per student per day for each product varies with the **percentage of students in the district identified as Black or Hispanic**.

In [None]:
conditions2 = [df['pct_black/hispanic'] == "[0, 0.2[",df['pct_black/hispanic'] == "[0.2, 0.4[",
              df['pct_black/hispanic'] == "[0.4, 0.6[",df['pct_black/hispanic'] == "[0.6, 0.8[",
              df['pct_black/hispanic'] == "[0.8, 1["]

values_1 = ['0-20%', '20-40%', '40-60%', '60-80%','80-100%']

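# explicit string default so unmatched rows do not mix an integer 0 into the string labels
df['pct_fr'] = np.select(conditions2, values_1, default='')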

df1=df.dropna(subset=['pct_black/hispanic'])

options=temp["Product Name"]
temp1 = df1[df1['Product Name'].isin(options)]
eda=temp1.groupby(['Product Name','pct_fr'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
conditions = [
eda['pct_fr'] == '0-20%',
eda['pct_fr'] == '20-40%',
eda['pct_fr'] == '40-60%',
eda['pct_fr'] == '60-80%',
eda['pct_fr'] == '80-100%'
]
# sort keys for each condition; integers keep the ordering numeric rather than lexicographic
values = [0, 11, 12, 13, 14]

# create a new column and use np.select to assign values to it using our lists as arguments
eda['sort'] = np.select(conditions, values)

eda=eda.sort_values(by='sort')
eda1=eda.loc[eda['Product Name'] == 'Google Docs']
eda2=eda.loc[eda['Product Name'] =='Google Classroom']
eda3=eda.loc[eda['Product Name'] =='YouTube']
eda4=eda.loc[eda['Product Name'] =='Canvas']
eda5=eda.loc[eda['Product Name'] == 'Meet']
eda6=eda.loc[eda['Product Name'] =='Schoology']
eda7=eda.loc[eda['Product Name'] =='Kahoot!']
eda8=eda.loc[eda['Product Name'] =='Google Forms']
eda9=eda.loc[eda['Product Name'] == 'Google Drive']
eda10=eda.loc[eda['Product Name'] =='ClassLink']

fig = make_subplots(rows=1, cols=5, subplot_titles=("Google Docs", "Google Classroom", "YouTube", "Canvas", "Meet",
                                                    ),shared_yaxes=True,column_widths=[0.2,0.2,0.2,0.2,0.2])
fig.add_trace(
    go.Bar(x=eda1["Mean Daily Page Load Per Student"],y=eda1["pct_fr"],orientation='h'),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=eda2["Mean Daily Page Load Per Student"],y=eda2["pct_fr"],orientation='h'),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=eda3["Mean Daily Page Load Per Student"],y=eda3["pct_fr"],orientation='h'),
    row=1, col=3
)

fig.add_trace(
    go.Bar(x=eda4["Mean Daily Page Load Per Student"],y=eda4["pct_fr"],orientation='h'),
    row=1, col=4
)

fig.add_trace(
    go.Bar(x=eda5["Mean Daily Page Load Per Student"],y=eda5["pct_fr"],orientation='h'),
    row=1, col=5
)                                                            
                                                             

fig.update_xaxes(range=[0, 15],row=1, col=1)
fig.update_xaxes(range=[0, 15],row=1, col=2)
fig.update_xaxes(range=[0, 15],row=1, col=3)
fig.update_xaxes(range=[0, 15],row=1, col=4)
fig.update_xaxes(range=[0, 15],row=1, col=5)

fig.update_yaxes(title_text="Share of black/hispanic students", row=1, col=1)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student",showlegend=False)
fig.show()

The **percentage of students in the district identified as Black or Hispanic** is split into 5 buckets. We calculate the mean daily page load in each bucket for each of the top 10 products. For the first five products, the differences in means are not significant, and there is no clear change in popularity as the share of Black or Hispanic students increases.
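
The bucket labels can also be built with a dictionary lookup and an ordered categorical, which keeps the buckets in their natural order without a separate sort column. This is a sketch of an alternative to the `np.select` approach used in the notebook; `BUCKETS` and the `bucket` column are made-up names.

```python
import pandas as pd

# Interval strings as they appear in the data, mapped to readable labels.
BUCKETS = {
    "[0, 0.2[": "0-20%",
    "[0.2, 0.4[": "20-40%",
    "[0.4, 0.6[": "40-60%",
    "[0.6, 0.8[": "60-80%",
    "[0.8, 1[": "80-100%",
}

# Made-up rows, including one missing value.
toy = pd.DataFrame(
    {"pct_black/hispanic": ["[0.2, 0.4[", "[0, 0.2[", None, "[0.8, 1["]}
)

# Unmapped or missing values become NaN instead of a placeholder label.
labels = toy["pct_black/hispanic"].map(BUCKETS)
toy["bucket"] = pd.Categorical(labels, categories=list(BUCKETS.values()),
                               ordered=True)

# Sorting an ordered categorical follows the category order, not the alphabet.
ordered = toy.dropna(subset=["bucket"]).sort_values("bucket")

print(ordered)
```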

In [None]:
fig = make_subplots(rows=1, cols=5,
                    subplot_titles=("Schoology", "Kahoot!", "Google Forms", "Google Drive", "ClassLink"),
                    shared_yaxes=True, column_widths=[0.2, 0.2, 0.2, 0.2, 0.2])

# One horizontal bar chart per product, all with the same x-axis range
for col, product_df in enumerate([eda6, eda7, eda8, eda9, eda10], start=1):
    fig.add_trace(
        go.Bar(x=product_df["Mean Daily Page Load Per Student"], y=product_df["pct_fr"], orientation='h'),
        row=1, col=col
    )
    fig.update_xaxes(range=[0, 5], row=1, col=col)

fig.update_yaxes(title_text="Share of Black/Hispanic students", row=1, col=1)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student", showlegend=False)
fig.show()

Here, too, no clear dependencies are visible. Overall, the percentage of students in the districts identified as Black or Hispanic does not appear to significantly affect the popularity of the top 10 products.

Next, we will look at how the mean daily page load per student across products varies with the **percentage of students in the districts eligible for free or reduced-price lunch**.

In [None]:
conditions2 = [df['pct_free/reduced'] == "[0, 0.2[",df['pct_free/reduced'] == "[0.2, 0.4[",
              df['pct_free/reduced'] == "[0.4, 0.6[",df['pct_free/reduced'] == "[0.6, 0.8[",
              df['pct_free/reduced'] == "[0.8, 1["]

values_1 = ['0-20%', '20-40%', '40-60%', '60-80%','80-100%']

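# Pass an explicit string default: recent NumPy versions raise a TypeError when
# the implicit integer default (0) is combined with string choices.
df['pct_fr'] = np.select(conditions2, values_1, default='0')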

df1=df.dropna(subset=['pct_free/reduced'])

options=temp["Product Name"]
temp1 = df1[df1['Product Name'].isin(options)]
eda=temp1.groupby(['Product Name','pct_fr'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
conditions = [
eda['pct_fr'] == '0-20%',
eda['pct_fr'] == '20-40%',
eda['pct_fr'] == '40-60%',
eda['pct_fr'] == '60-80%',
eda['pct_fr'] == '80-100%'
]
# Sort keys for the buckets (compared as strings, so '0' < '11' < ... < '14')
values = ['0','11','12','13','14']

# Create a helper column with np.select so the buckets can be sorted in ascending order
eda['sort'] = np.select(conditions, values, default='0')

eda=eda.sort_values(by='sort')
eda1=eda.loc[eda['Product Name'] == 'Google Docs']
eda2=eda.loc[eda['Product Name'] =='Google Classroom']
eda3=eda.loc[eda['Product Name'] =='YouTube']
eda4=eda.loc[eda['Product Name'] =='Canvas']
eda5=eda.loc[eda['Product Name'] == 'Meet']
eda6=eda.loc[eda['Product Name'] =='Schoology']
eda7=eda.loc[eda['Product Name'] =='Kahoot!']
eda8=eda.loc[eda['Product Name'] =='Google Forms']
eda9=eda.loc[eda['Product Name'] == 'Google Drive']
eda10=eda.loc[eda['Product Name'] =='ClassLink']

fig = make_subplots(rows=1, cols=5,
                    subplot_titles=("Google Docs", "Google Classroom", "YouTube", "Canvas", "Meet"),
                    shared_yaxes=True, column_widths=[0.2, 0.2, 0.2, 0.2, 0.2])

# One horizontal bar chart per product, all with the same x-axis range
for col, product_df in enumerate([eda1, eda2, eda3, eda4, eda5], start=1):
    fig.add_trace(
        go.Bar(x=product_df["Mean Daily Page Load Per Student"], y=product_df["pct_fr"], orientation='h'),
        row=1, col=col
    )
    fig.update_xaxes(range=[0, 15], row=1, col=col)

fig.update_yaxes(title_text="Students eligible for free or reduced-price lunch", row=1, col=1)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student", showlegend=False)
fig.show()

The **percentage of students in the districts eligible for free or reduced-price lunch** is split into 5 buckets. It is a financial indicator: presumably, the higher this share, the lower the average spending on students. The charts show the mean daily page load per student for each of the top 10 products within each bucket. The differences between the means are not significant across the top products.
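The bucket labels in the cell above are mapped by hand from interval strings such as `[0.2, 0.4[`. As a minimal sketch, assuming every bucket string follows that `[low, high[` pattern, the label could also be derived with a small helper. The name `bucket_to_label` is hypothetical and is not part of the original notebook.

```python
import re

# Hypothetical helper (not part of the original notebook): turns an interval
# string such as "[0.2, 0.4[" into a percentage label such as "20-40%".
_INTERVAL = re.compile(r"\[\s*([0-9.]+)\s*,\s*([0-9.]+)\s*\[")

def bucket_to_label(bucket):
    """Return a 'low-high%' label, or None if the value is not an interval string."""
    match = _INTERVAL.fullmatch(str(bucket).strip())
    if match is None:
        return None
    low, high = (float(group) for group in match.groups())
    # round() guards against floating-point noise in the multiplication
    return f"{round(low * 100)}-{round(high * 100)}%"

labels = [bucket_to_label(b) for b in ["[0, 0.2[", "[0.2, 0.4[", "[0.8, 1["]]
print(labels)
```

With pandas, such a helper could be applied as `df['pct_free/reduced'].map(bucket_to_label)`; missing values would map to `None` rather than to a placeholder label.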

In [None]:
fig = make_subplots(rows=1, cols=5,
                    subplot_titles=("Schoology", "Kahoot!", "Google Forms", "Google Drive", "ClassLink"),
                    shared_yaxes=True, column_widths=[0.2, 0.2, 0.2, 0.2, 0.2])

# One horizontal bar chart per product, all with the same x-axis range
for col, product_df in enumerate([eda6, eda7, eda8, eda9, eda10], start=1):
    fig.add_trace(
        go.Bar(x=product_df["Mean Daily Page Load Per Student"], y=product_df["pct_fr"], orientation='h'),
        row=1, col=col
    )
    fig.update_xaxes(range=[0, 5], row=1, col=col)

fig.update_yaxes(title_text="Students eligible for free or reduced-price lunch", row=1, col=1)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student", showlegend=False)
fig.show()

Here as well, there is no significant difference. The percentage of students eligible for free or reduced-price lunch doesn't significantly affect the popularity of any of the products.

Next, we will look at how the mean daily page load per student across products varies with the **county connections ratio**.
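To make "no significant difference" more concrete, one option is to compare the spread of the bucket means with their average for each product. The sketch below assumes a frame shaped like `eda` (one row per product and bucket); the product names `A` and `B` and all numbers are made up for illustration and are not results from the dataset.

```python
import pandas as pd

# Illustrative values only (not taken from the dataset); the layout mirrors `eda`.
toy = pd.DataFrame({
    "Product Name": ["A", "A", "A", "B", "B", "B"],
    "pct_fr": ["0-20%", "20-40%", "40-60%"] * 2,
    "Mean Daily Page Load Per Student": [10.0, 11.0, 9.0, 1.0, 4.0, 1.0],
})

col = "Mean Daily Page Load Per Student"
stats = toy.groupby("Product Name")[col].agg(["min", "max", "mean"])

# Relative spread: (max - min) / mean. Values near 0 mean the buckets look alike.
stats["relative_spread"] = (stats["max"] - stats["min"]) / stats["mean"]
print(stats)
```

This is only a descriptive measure, not a statistical test; it simply puts a number on what the bar charts show visually.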

In [None]:
conditions2 = [df['county_connections_ratio'] == "[0.18, 1[",df['county_connections_ratio'] == "[1, 2["]

values_1 = ['<1', '>1']

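# Note: the 'pct_fr' column is reused here to hold the connection-ratio bucket.
# Pass an explicit string default: recent NumPy versions raise a TypeError when
# the implicit integer default (0) is combined with string choices.
df['pct_fr'] = np.select(conditions2, values_1, default='0')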

df1=df.dropna(subset=['county_connections_ratio'])

options=temp["Product Name"]
temp1 = df1[df1['Product Name'].isin(options)]
eda=temp1.groupby(['Product Name','pct_fr'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000
conditions = [
eda['pct_fr'] == '<1',
eda['pct_fr'] == '>1'
]
# Sort keys for the two buckets (compared as strings, so '0' < '11')
values = ['0','11']

# Create a helper column with np.select so the buckets can be sorted in ascending order
eda['sort'] = np.select(conditions, values, default='0')

eda=eda.sort_values(by='sort')
eda1=eda.loc[eda['Product Name'] == 'Google Docs']
eda2=eda.loc[eda['Product Name'] =='Google Classroom']
eda3=eda.loc[eda['Product Name'] =='YouTube']
eda4=eda.loc[eda['Product Name'] =='Canvas']
eda5=eda.loc[eda['Product Name'] == 'Meet']
eda6=eda.loc[eda['Product Name'] =='Schoology']
eda7=eda.loc[eda['Product Name'] =='Kahoot!']
eda8=eda.loc[eda['Product Name'] =='Google Forms']
eda9=eda.loc[eda['Product Name'] == 'Google Drive']
eda10=eda.loc[eda['Product Name'] =='ClassLink']

fig = make_subplots(rows=1, cols=5,
                    subplot_titles=("Google Docs", "Google Classroom", "YouTube", "Canvas", "Meet"),
                    shared_yaxes=True, column_widths=[0.2, 0.2, 0.2, 0.2, 0.2])

# One horizontal bar chart per product, all with the same x-axis range
for col, product_df in enumerate([eda1, eda2, eda3, eda4, eda5], start=1):
    fig.add_trace(
        go.Bar(x=product_df["Mean Daily Page Load Per Student"], y=product_df["pct_fr"], orientation='h'),
        row=1, col=col
    )
    fig.update_xaxes(range=[0, 15], row=1, col=col)

fig.update_yaxes(title_text="Ratio of residential fixed high-speed connections/households", row=1, col=1)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student", showlegend=False)
fig.show()

The **county connections ratio**, i.e. the ratio of **residential fixed high-speed connections (over 200 kbps in at least one direction) to households**, is split into 2 buckets: 0.18-1 (fewer than one connection per household) and 1-2 (at least one connection per household). The charts show the mean daily page load per student for each of the top 10 products within each bucket. The differences between the means are large across the top products: districts where connections are limited have a higher average load per student. The smallest difference is in Google Classroom; for the other products the dependency is strong.
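With only two buckets, the comparison can be summarised as a single ratio per product. The sketch below assumes a frame shaped like `eda` with the `<1` and `>1` labels used above; the product names `A` and `B` and the numbers are invented for illustration and are not results from the dataset.

```python
import pandas as pd

# Illustrative values only (not taken from the dataset); the layout mirrors `eda`.
toy = pd.DataFrame({
    "Product Name": ["A", "A", "B", "B"],
    "pct_fr": ["<1", ">1", "<1", ">1"],
    "Mean Daily Page Load Per Student": [6.0, 3.0, 2.0, 2.0],
})

# One row per product, one column per connection bucket.
wide = toy.pivot(index="Product Name", columns="pct_fr",
                 values="Mean Daily Page Load Per Student")

# A ratio above 1 means more page loads per student where connections are scarcer.
wide["low_vs_high"] = wide["<1"] / wide[">1"]
print(wide)
```

A ratio close to 1 would correspond to products where the two bars look nearly identical in the chart.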

In [None]:
fig = make_subplots(rows=1, cols=5,
                    subplot_titles=("Schoology", "Kahoot!", "Google Forms", "Google Drive", "ClassLink"),
                    shared_yaxes=True, column_widths=[0.2, 0.2, 0.2, 0.2, 0.2])

# One horizontal bar chart per product, all with the same x-axis range
for col, product_df in enumerate([eda6, eda7, eda8, eda9, eda10], start=1):
    fig.add_trace(
        go.Bar(x=product_df["Mean Daily Page Load Per Student"], y=product_df["pct_fr"], orientation='h'),
        row=1, col=col
    )
    fig.update_xaxes(range=[0, 5], row=1, col=col)

fig.update_yaxes(title_text="Ratio of residential fixed high-speed connections/households", row=1, col=1)
fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student", showlegend=False)
fig.show()

Again, apart from "Kahoot!" and "Google Forms", the dependency is clear. One possible explanation is that weaker connections require pages to be loaded more often, which would be consistent with the chart above.

Next, we will look at how the mean daily page load per student across products varies with the **per-pupil total expenditure (from the NERD$ project)**.

In [None]:
conditions2 = [df['pp_total_raw'] == "[4000, 6000[",df['pp_total_raw'] == "[6000, 8000[",
               df['pp_total_raw'] == "[8000, 10000[",df['pp_total_raw'] == "[10000, 12000[",
               df['pp_total_raw'] == "[12000, 14000[",df['pp_total_raw'] == "[14000, 16000[",
               df['pp_total_raw'] == "[16000, 18000[",df['pp_total_raw'] == "[18000, 20000[",
               df['pp_total_raw'] == "[20000, 22000[",df['pp_total_raw'] == "[22000, 24000[",
               df['pp_total_raw'] == "[32000, 34000["]

values_1 = ['4/6', '6/8','8/10','10/12','12/14','14/16','16/18','18/20','20/22','22/24','32/34']

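# Pass an explicit string default: recent NumPy versions raise a TypeError when
# the implicit integer default (0) is combined with string choices.
df['pp_total'] = np.select(conditions2, values_1, default='0')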

df1=df.dropna(subset=['pp_total_raw'])

options=temp["Product Name"]
temp1 = df1[df1['Product Name'].isin(options)]
eda=temp1.groupby(['Product Name','pp_total'], as_index=False)['engagement_index'].mean()
eda=eda.rename(columns = {'engagement_index': 'Mean Daily Page Load Per Student'}, inplace = False)
eda['Mean Daily Page Load Per Student']=eda['Mean Daily Page Load Per Student']/1000

conditions = [
eda['pp_total'] == '4/6',
eda['pp_total'] == '6/8',
eda['pp_total'] == '8/10',
eda['pp_total'] == '10/12',
eda['pp_total'] == '12/14',
eda['pp_total'] == '14/16',
eda['pp_total'] == '16/18',
eda['pp_total'] == '18/20',
eda['pp_total'] == '20/22',
eda['pp_total'] == '22/24',
eda['pp_total'] == '32/34'
]
# Sort keys for the buckets (compared as strings, so '0' < '1' < ... < '9' < '99')
values = ['0','1','2','3','4','5','6','7','8','9','99']

# Create a helper column with np.select so the buckets can be sorted in ascending order
eda['sort'] = np.select(conditions, values, default='0')

eda=eda.sort_values(by='sort')

eda1=eda.loc[eda['Product Name'] == 'Google Docs']
eda2=eda.loc[eda['Product Name'] =='Google Classroom']
eda3=eda.loc[eda['Product Name'] =='YouTube']
eda4=eda.loc[eda['Product Name'] =='Canvas']
eda5=eda.loc[eda['Product Name'] == 'Meet']
eda6=eda.loc[eda['Product Name'] =='Schoology']
eda7=eda.loc[eda['Product Name'] =='Kahoot!']
eda8=eda.loc[eda['Product Name'] =='Google Forms']
eda9=eda.loc[eda['Product Name'] == 'Google Drive']
eda10=eda.loc[eda['Product Name'] =='ClassLink']

fig = make_subplots(rows=1, cols=5,
                    subplot_titles=("Google Docs", "Google Classroom", "YouTube", "Canvas", "Meet"),
                    shared_yaxes=True, column_widths=[0.2, 0.2, 0.2, 0.2, 0.2])

# One horizontal bar chart per product, all with the same x-axis range
for col, product_df in enumerate([eda1, eda2, eda3, eda4, eda5], start=1):
    fig.add_trace(
        go.Bar(x=product_df["Mean Daily Page Load Per Student"], y=product_df["pp_total"], orientation='h'),
        row=1, col=col
    )
    fig.update_xaxes(range=[0, 20], row=1, col=col)

fig.update_yaxes(title_text="Per-pupil total expenditure from NERD$ project (in thousands)", row=1, col=1)

fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student", showlegend=False)
fig.show()

The **per-pupil total expenditure (from the NERD$ project)** is split into 11 buckets of 2,000 dollars each. Like the free or reduced-price lunch share, it is a financial indicator. The charts show the mean daily page load per student for each of the top 10 products within each bucket. The differences between the means are noticeable across the top products: as expenditure increases, popularity also increases for most products, such as Google Docs, Google Classroom, YouTube and Meet. The smallest difference is in Canvas; for the other products there is a dependency.
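The cell above orders the buckets through a helper `sort` column with string keys. As an alternative sketch, an ordered pandas categorical can express the same ordering directly. The rows below are invented for illustration and are not results from the dataset.

```python
import pandas as pd

# Bucket labels in ascending order of expenditure (thousands of dollars).
order = ['4/6', '6/8', '8/10', '10/12', '12/14', '14/16',
         '16/18', '18/20', '20/22', '22/24', '32/34']

# Illustrative rows only (not taken from the dataset), deliberately shuffled.
toy = pd.DataFrame({
    "pp_total": ['10/12', '4/6', '32/34', '8/10'],
    "Mean Daily Page Load Per Student": [3.0, 1.0, 5.0, 2.0],
})

# An ordered categorical sorts by position in `order`, not alphabetically.
toy["pp_total"] = pd.Categorical(toy["pp_total"], categories=order, ordered=True)
toy = toy.sort_values("pp_total")
print(toy["pp_total"].tolist())
```

The design choice here is to keep the ordering next to the labels themselves, so no separate sort key has to be maintained when buckets are added.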

In [None]:
fig = make_subplots(rows=1, cols=5,
                    subplot_titles=("Schoology", "Kahoot!", "Google Forms", "Google Drive", "ClassLink"),
                    shared_yaxes=True, column_widths=[0.2, 0.2, 0.2, 0.2, 0.2])

# One horizontal bar chart per product, all with the same x-axis range
for col, product_df in enumerate([eda6, eda7, eda8, eda9, eda10], start=1):
    fig.add_trace(
        go.Bar(x=product_df["Mean Daily Page Load Per Student"], y=product_df["pp_total"], orientation='h'),
        row=1, col=col
    )
    fig.update_xaxes(range=[0, 5], row=1, col=col)

fig.update_yaxes(title_text="Per-pupil total expenditure from NERD$ project (in thousands)", row=1, col=1)
fig.update_layout(height=700, width=1000, title_text="Mean Daily Page Load Per Student", showlegend=False)
fig.show()

The dependencies are quite strong for these products as well. As expenditure increases, popularity also increases for most of them, such as Google Forms, Schoology, Google Drive and ClassLink. The smallest difference is in "Kahoot!"; for the other products there is a dependency.
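One way to put a number on "popularity increases with expenditure" is a rank correlation between the bucket position and the mean load for a single product. The sketch below computes Spearman's coefficient as the Pearson correlation of the ranks; the column names `bucket_rank` and `mean_load` are hypothetical and the values are invented, not results from the dataset.

```python
import pandas as pd

# Illustrative values only (not taken from the dataset): bucket position
# (0 = lowest expenditure) and the mean daily page load for one product.
toy = pd.DataFrame({
    "bucket_rank": [0, 1, 2, 3, 4],
    "mean_load": [1.0, 1.5, 1.4, 2.2, 3.0],
})

# Spearman correlation is the Pearson correlation of the ranks.
rho = toy["bucket_rank"].rank().corr(toy["mean_load"].rank())
print(round(rho, 2))
```

A coefficient near 1 would indicate a mostly monotonic increase across buckets, while a value near 0 would match the "no clear dependency" cases seen earlier.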

<a id="section-four"></a>
**4. Conclusions**

<p style="text-align: justify;">The objective of this analysis was to look at the remote learning scenario in 2020 and to understand how different financial, racial and time-period factors affected remote learning. We focused our EDA on the 10 most popular products, from which we can conclude the following:</p>

* Among the products analysed, 7 had a mean daily page load per student above 1, and two Google products were the undisputed leaders: Google Docs and Google Classroom (with means of 10 and 5 loads per student per day, respectively).
* Among the states analysed, Arizona has the highest mean daily page load per student, whereas Tennessee has the lowest.
* The holiday break produces the biggest drop in activity across the products, as one would expect.
* Google Docs is the most popular product across states, apart from Texas and Minnesota, where Schoology is very popular.
* Canvas, Schoology and ClassLink are very popular in some states and practically unknown in others.
* All top products show the same trend: a bimodal distribution, with popularity rising at the onset of the pandemic and again after the summer break.
* Location doesn't affect the popularity of any particular product.
* The share of Black and Hispanic students in a district doesn't affect the popularity of the products.
* Financial metrics such as the share of students eligible for free or reduced-price lunch do not determine the popularity of any particular product.
* Internet availability (the county connections ratio) is strongly associated with the popularity of almost all products.
* Per-pupil total expenditure (from the NERD$ project) also affects the popularity of products such as Google Docs, Google Classroom, YouTube and Meet.

<a id="section-five"></a>
**5. Sources**

1. https://www.kaggle.com/c/learnplatform-covid19-impact-on-digital-learning <br>
2. https://www.kaggle.com/iamleonie/how-to-approach-analytics-challenges <br>
3. https://www.kaggle.com/michau96/most-popular-tools-in-2020-digital-learning <br>
4. https://www.weforum.org/agenda/2020/04/coronavirus-education-global-covid19-online-digital-learning/ <br>
5. https://plotly.com/python/ <br>



**Thank you for taking the time to read my kernel.**

**This is my first kernel in Kaggle**