In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## **Introduction** 

Since the COVID-19 pandemic started, many countries have been forced to close schools to stop the spread of the virus. In the US, more than 56 million students were affected by the pandemic from 2020 to 2021.

In response, education is being reshaped: some schools have attempted to reach students remotely through online learning tools and digital platforms. Online learning could become part of a permanent solution. However, the effectiveness of digital learning is open to question.

The level of student engagement with digital learning tools is a good measure of the likelihood that a learning experience will be successful, because it indicates students' interaction and cooperation with their classmates and teachers. Understanding how students use these tools can help increase both the effectiveness of online learning and product engagement.

We try to uncover the state of digital learning and study which factors may affect engagement with digital learning.

## **Objectives**

This project has three main objectives.

First, uncover the state of digital learning in the US in 2020. 

Second, analyze how the engagement of digital learning relates to district demographics and products.

Third, analyze how the user engagement of the tools is affected by the product's function or other factors.

## **Data**

The datasets in this project are provided by Kaggle. Three datasets are mainly used, and they are merged to better understand students' learning experience.

First, the district information data describes the characteristics of school districts. It mainly includes the state where the district resides, the locale classification, the percentage of students in the district identified as Black or Hispanic based on 2018-19 NCES data, and the per-pupil total expenditure of a given school district.

Second, the product information data describes the characteristics of the top 372 products with the most users in 2020. It mainly includes the name of the specific product, the product provider, and the basic function of the product.

Third, the engagement data are aggregated at the school district level. They mainly include the percentage of students in the district who have at least one page-load event of a given product on a given day, and the total page-load events per one thousand students of a given product on a given day.

## **Outline**

1. Data preparation
1. Districts characteristic
1. Digital learning tools characteristic
1. Tools and user engagement
1. Engagement index and districts characteristic
1. Conclusion

## **1. Data preparation**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
from scipy import stats
from scipy.stats import norm, skew

### **Import data**

In [None]:
districts=pd.read_csv('../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv')
products=pd.read_csv('../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv')

In [None]:
import glob
import os

path = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data'
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    # The file name without its extension is the district_id (".../<district_id>.csv")
    district_id = os.path.splitext(os.path.basename(filename))[0]
    df["district_id"] = district_id
    li.append(df)

# Stack all districts into one DataFrame with a fresh index
engagement = pd.concat(li)
engagement = engagement.reset_index(drop=True)
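
The loading loop derives `district_id` from each file name. As a minimal sketch of that single step, the helper below uses `pathlib` so the result does not depend on how deep the input directory is. The function name `district_id_from_path` and the example file name are hypothetical and only serve as an illustration.

```python
from pathlib import Path


def district_id_from_path(filename: str) -> str:
    """Return the file name without its extension, which is the district_id."""
    return Path(filename).stem


# Illustrative path only; the real file names come from glob.glob().
example = "../input/learnplatform-covid19-impact-on-digital-learning/engagement_data/1000.csv"
print(district_id_from_path(example))
```

The same helper could replace the string handling inside the loop without changing the resulting `district_id` values.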

## **2. Districts characteristic**

### **Objectives**

1. Understanding the distribution of students.
2. Uncovering the percentage of students in the districts who are eligible for free or reduced-price lunch.
3. Uncovering the percentage of students in the districts who are identified as Black or Hispanic.

### **Data description**

* **district_id**: The unique identifier of the school district
* **state**: The state where the district resides
* **locale**: NCES locale classification that categorizes U.S. territory into four types of areas: City, Suburban, Town, and Rural. See Locale Boundaries User's Manual for more information.
* **pct_black/hispanic**: Percentage of students in the districts identified as Black or Hispanic based on 2018-19 NCES data
* **pct_free/reduced**: Percentage of students in the districts eligible for free or reduced-price lunch based on 2018-19 NCES data
* **county_connections_ratio**: Ratio (residential fixed high-speed connections over 200 kbps in at least one direction/households) based on the county level data from FCC Form 477 (December 2018 version). See FCC data for more information.
* **pp_total_raw**: Per-pupil total expenditure (sum of local and federal expenditure) from Edunomics Lab's National Education Resource Database on Schools (NERD$) project. The expenditure data are school-by-school, and we use the median value to represent the expenditure of a given school district.

### **Limitation**

This dataset has a limitation: it is unbalanced, which means that some classes of a variable have many more observations than others.

In this dataset, most of the school districts are located in Connecticut, Utah and Illinois.
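
One way to quantify this imbalance is to look at the share of districts per state. The sketch below uses a small, made-up table (the state names are placeholders for illustration, not results); in the notebook the same call would be applied to `districts['state']`.

```python
import pandas as pd

# Hypothetical toy data standing in for the `districts` DataFrame.
toy = pd.DataFrame({"state": ["Connecticut", "Connecticut", "Utah", "Illinois", "Ohio"]})

# Share of districts per state; the shares sum to 1.
state_share = toy["state"].value_counts(normalize=True)
print(state_share)
```

A few states holding most of the share means that state-level comparisons later in the analysis rest on very different sample sizes.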

In [None]:
districts.head()

In [None]:
# DataFrame.info() prints its summary and returns None, so call it directly
districts.info()

In [None]:
print(districts.shape)
print('\n __Missing data__')
districts.isnull().sum()/districts.shape[0]

###  **Data cleaning**

In [None]:
districts.dropna(subset=['state'],axis=0,inplace=True)
districts.dropna(subset=['pct_free/reduced'],axis=0,inplace=True)
print(districts.shape)
print('\n __Missing data__')
print(districts.isnull().sum()/districts.shape[0])

### **School location**

Most school districts are located in Connecticut, Utah and Illinois.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl

districts['state'].value_counts().plot(kind='pie',figsize=(17,17),autopct='%1.1f%%',fontsize=11,colormap=
                                      'summer_r')
plt.title('State',y=1.03,fontsize=20)
plt.axis('equal')
plt.ylabel('')

### **Locale classification**

Most of the school districts are located in suburban and rural areas.


In [None]:
districts['locale'].value_counts().plot(kind='pie',figsize=(17,17),autopct='%1.1f%%',fontsize=11,colormap=
                                      'summer_r')
plt.title('Locale',y=1.03,fontsize=20)
plt.axis('equal')
plt.ylabel('')

### **Percentage of the students who are eligible for free or reduced-price lunch (aggregated by state)**

We can notice that students eligible for free or reduced-price lunch are mainly located in Utah, Connecticut and Illinois.

Also, students identified as Black are mainly located in Illinois, California and Connecticut, and students identified as Hispanic are mainly located in Connecticut, Illinois and Utah.

In [None]:
pd.crosstab(index=districts['state'], columns=districts['pct_free/reduced']).cumsum(axis=1)

In [None]:
pd.crosstab(index=districts['state'], columns=districts['pct_black/hispanic']).cumsum(axis=1)
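
The crosstabs above produce one column per distinct value of the percentage field. If those fields are stored as range labels such as `[0.2, 0.4[` (an assumption about the raw file format that should be checked against `districts.head()`), a rough alternative is to convert each label to its midpoint and average per state. The helper `range_midpoint` and the toy values below are hypothetical.

```python
import re

import pandas as pd


def range_midpoint(label):
    """Convert a range label such as '[0.2, 0.4[' to its midpoint; NaN stays NaN."""
    if pd.isna(label):
        return float("nan")
    # Extract the two numeric bounds from the label.
    low, high = (float(x) for x in re.findall(r"\d+(?:\.\d+)?", str(label)))
    return (low + high) / 2


# Hypothetical toy rows; the label format is assumed, not taken from the notebook output.
toy = pd.DataFrame({
    "state": ["Utah", "Utah", "Illinois"],
    "pct_free/reduced": ["[0, 0.2[", "[0.2, 0.4[", "[0.4, 0.6["],
})
toy["pct_free_reduced_mid"] = toy["pct_free/reduced"].map(range_midpoint)
print(toy.groupby("state")["pct_free_reduced_mid"].mean())
```

Midpoints are only an approximation of the underlying percentages, so the resulting averages are best read as a ranking aid rather than exact figures.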

### **Percentage of the students who are eligible for free or reduced-price lunch (aggregated by locale)**

Students who are eligible for free or reduced-price lunch are mainly located in suburbs. Students identified as Black or Hispanic are also mainly located in suburbs.

In [None]:
pd.crosstab(index=districts['locale'], columns=districts['pct_free/reduced']).cumsum(axis=1)

In [None]:
pd.crosstab(index=districts['locale'], columns=districts['pct_black/hispanic']).cumsum(axis=1)

### **County_Connections ratio**

We can notice that almost all county_connections_ratio values are between 0.18 and 1.

In [None]:
districts['county_connections_ratio'].value_counts()

In [None]:
districts['county_connections_ratio'].value_counts().plot(kind='pie',figsize=(17,17),autopct='%1.1f%%',colormap='summer_r')
plt.title('County_Connections ratio',y=1,fontsize=20)

### **Per-pupil total expenditure**

Per-pupil total expenditure (sum of local and federal expenditure) from Edunomics Lab's National Education Resource Database on Schools (NERD$) project. The expenditure data are school-by-school, and the median value is used to represent the expenditure of a given school district.



In [None]:
districts.groupby("locale")['pp_total_raw'].value_counts()

In [None]:
districts['pp_total_raw'].value_counts().plot(kind='bar',figsize=(17,17),color='yellowgreen')
plt.title('Per-pupil total expenditure',y=1,fontsize=20)

In [None]:
pp_total_raw_by_state=pd.crosstab(index=districts['state'], columns=districts['pp_total_raw']).cumsum(axis=1)
pp_total_raw_by_state

In [None]:
pd.crosstab(index=districts['locale'], columns=districts['pp_total_raw']).cumsum(axis=1)

## **3. Digital learning tools characteristic**

The products dataset includes information about the characteristics of the top 372 products with most users in 2020. 

### **Objectives**

1. Finding out the major providers and most popular products.
2. Finding out the main sectors of education where the digital learning products are used.
3. Finding out the popular functions of the digital learning products.


### **Products**
### **Data description**


* **LP ID**: The unique identifier of the product
* **URL**: Web link to the specific product
* **Product Name**: Name of the specific product
* **Provider/Company Name**: Name of the product provider
* **Sector(s)**: Sector of education where the product is used
* **Primary Essential Function**: The basic function of the product. There are two layers of labels here. Products are first labeled as one of these three categories: LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations. Each of these categories has multiple sub-categories with which the products were labeled


### **Data cleaning**

In [None]:
print(products.shape)
print("\n __Missing data__")
products.isnull().sum()

In [None]:
products.dropna(subset=['Provider/Company Name'],axis=0,inplace=True)
products.dropna(subset=['Sector(s)'],axis=0,inplace=True)
products.dropna(subset=['Primary Essential Function'],axis=0,inplace=True)
print("\n __Missing data__")
products.isnull().sum()

### **Engagement** 

The engagement data are ***aggregated at school district level***, and each file in the folder engagement_data represents data from one school district. The 4-digit file name represents the district_id, which can be used to link to district information in districts_info.csv. The lp_id can be used to link to product information in products_info.csv.

### **Data description**

* **time**: Date in "YYYY-MM-DD" format
* **lp_id**: The unique identifier of the product
* **pct_access**: Percentage of students in the district who have at least one page-load event of a given product on a given day
* **engagement_index**: Total page-load events per one thousand students of a given product on a given day
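
To make the two metrics concrete, here is a small worked sketch of how they would be computed from raw counts according to the definitions above. The function names and the district figures are hypothetical, and the 0 to 100 scale for `pct_access` is an assumption; the dataset itself only ships the final values.

```python
def engagement_index(page_loads: int, students: int) -> float:
    """Page-load events per 1,000 students for one product on one day."""
    return page_loads / students * 1000


def pct_access(students_with_page_load: int, students: int) -> float:
    """Percentage of students with at least one page-load event (0-100 scale assumed)."""
    return students_with_page_load / students * 100


# Hypothetical district: 2,000 students, 500 of whom generate 3,000 page loads in a day.
print(engagement_index(3000, 2000))  # 1500.0
print(pct_access(500, 2000))         # 25.0
```

The example shows why the two metrics can diverge: a small group of very active students raises `engagement_index` without raising `pct_access`.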

In [None]:
engagement.head()

### **Data cleaning**

In [None]:
print(engagement.shape)
print("\n __Missing data__")
engagement.isnull().sum()

In [None]:
print("\n __Missing data percentage__")
engagement.isnull().sum()/engagement.shape[0]

In [None]:
# Create a dataset that contains only the rows with a missing engagement_index.
# The engagement_index column has many missing values, so we focus on it.

missingengagementindex = engagement[engagement['engagement_index'].isna()]
missingengagementindex.head()

In [None]:
missingengagementindex['district_id'].value_counts()

In [None]:
missingengagementindex['lp_id'].value_counts()

In [None]:
engagement.dropna(subset=['engagement_index'],axis=0,inplace=True)
print("\n __Missing data__")
engagement.isnull().sum()

In [None]:
print(engagement.shape)

### **Merging Engagement index and products**

In [None]:
productsandengagement = pd.merge(products, engagement, left_on='LP ID', right_on='lp_id')
productsandengagement.head()
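
The merge above is an inner join, so engagement rows whose `lp_id` has no match in `products` are dropped silently. A minimal sketch of how to count such rows, using made-up toy tables with the same key columns, is a left join with `indicator=True`:

```python
import pandas as pd

# Hypothetical toy tables with the same key columns as the notebook.
toy_products = pd.DataFrame({"LP ID": [1, 2], "Product Name": ["A", "B"]})
toy_engagement = pd.DataFrame({"lp_id": [1, 1, 3], "engagement_index": [10.0, 20.0, 5.0]})

# A left join from the engagement side keeps every record;
# the `_merge` column flags records without a matching product.
checked = pd.merge(toy_engagement, toy_products, left_on="lp_id", right_on="LP ID",
                   how="left", indicator=True)
unmatched = (checked["_merge"] == "left_only").sum()
print(f"{unmatched} of {len(checked)} engagement rows have no product information")
```

Running the same check on the real tables would show how much engagement data is excluded from the product-level analysis.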

### **Major Providers**

The top 3 providers are Google LLC, IXL Learning, and PBS. We can notice that Google LLC is the major provider in our dataset: its products account for far more engagement records than those of any other company.


In [None]:
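# Share of engagement records per provider
providercount=productsandengagement['Provider/Company Name'].value_counts()/productsandengagement.shape[0]
# Name the column explicitly so the sort does not depend on the pandas version
providercount=providercount.rename('share_of_records').to_frame().sort_values(by='share_of_records',ascending=False)
providercounttop10=providercount.head(10)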

In [None]:
providercounttop10.plot(kind='bar',color='yellowgreen',figsize=(17,17))
plt.title("Top 10 Provider",fontsize=20)
plt.xticks(rotation=20)

### **Top Products**

The top 3 most popular products are Google Docs, Google Drive, and Google Classroom. Not surprisingly, the products provided by Google make up a significant proportion of the top 10 most popular digital learning tools in our dataset.

In [None]:
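# Share of engagement records per product
productcount=productsandengagement['Product Name'].value_counts()/productsandengagement.shape[0]
# Name the column explicitly so the sort does not depend on the pandas version
productcount=productcount.rename('share_of_records').to_frame().sort_values(by='share_of_records',ascending=False)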
productcount

In [None]:
productcounttop10=productcount.head(10)
productcounttop10

### **Sector(s)**

PreK-12 students (elementary and secondary school students) are the major users of digital learning tools. Other users include higher education students and corporate education students.

In [None]:
productsandengagement['Sector(s)'].value_counts().plot(kind='pie',figsize=(17,17),autopct='%1.1f%%',colormap='summer_r')
plt.title('Sector Percentage',fontsize=20)
plt.xticks(fontsize=17)
plt.yticks(fontsize=17)

### **Function**

The top 3 popular functions are Digital Learning Platforms; Sites, Resources & Reference; and Content Creation & Curation. Learning & Curriculum is clearly the most popular basic function.

In [None]:
productsandengagement['Primary Essential Function'].value_counts().plot(kind='pie',figsize=(60,60),autopct='%1.1f%%',colormap='summer_r',fontsize=40)
plt.title('Primary Essential Function Percentage',fontsize=50)
plt.xticks(fontsize=30)
plt.yticks(fontsize=30)

In [None]:
print('__Top10 functions__')
productsandengagement['Primary Essential Function'].value_counts().head(10)

## **4. Tools and user engagement**

### **Objectives**

1. Understanding the state of digital learning engagement in different products.
1. Understanding the change in pct_access over time.
1. Understanding the change in engagement_index over time.


### **Factor: Product**

pct_access is the percentage of students in the district who have at least one page-load event of a given product on a given day, and pct_access_average is its mean over all records of a product. It represents the user activity of the products.

engagement_index is the total page-load events per one thousand students of a given product on a given day. It represents the degree to which users interact with the product.

The graphs below show that Google Classroom and Google Docs have the highest pct_access_average, much higher than ClassLink and Google Drive, which rank third and fourth.

Also, when comparing the average engagement index, Google Docs and Google Classroom occupy the top two positions in the ranking.

We can notice that some products are popular but their pct_access_average and engagement index are not high, such as Google Sites, Chrome Web Store, Wikipedia, Google Sheets, Khan Academy, and Prodigy.
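
Comparing popularity with average activity is easier when both sit in one table. The sketch below, on made-up toy data, uses pandas named aggregation to compute the two averages and the record count per product in a single step; the column name `records` is a hypothetical choice.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the merged table.
toy = pd.DataFrame({
    "Product Name": ["A", "A", "B", "B"],
    "pct_access": [10.0, 20.0, 1.0, 3.0],
    "engagement_index": [100.0, 300.0, 10.0, 30.0],
})

# One row per product: mean of each metric plus the number of records.
summary = (
    toy.groupby("Product Name")
       .agg(pct_access_average=("pct_access", "mean"),
            engagement_index_average=("engagement_index", "mean"),
            records=("pct_access", "size"))
       .sort_values("pct_access_average", ascending=False)
)
print(summary)
```

Sorting the same table by `records` instead would list the most frequently recorded products next to their averages.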


### **pct_access (aggregated by product)**

In [None]:
pct_by_product=pd.DataFrame(productsandengagement.groupby("Product Name")["pct_access"].mean())
pct_by_product=pct_by_product.rename(columns={'pct_access':'pct_access_average'})
pct_by_product=pct_by_product.sort_values(by='pct_access_average',ascending=False)

In [None]:
pct_by_producttop10=pct_by_product.head(10)
pct_by_producttop10

In [None]:
pct_by_producttop10.plot(kind='bar',color='yellowgreen',figsize=(17,17))
plt.title("Pct_access_average by products",fontsize=20)
plt.xticks(rotation=20)

### **engagement index (aggregated by product)**

In [None]:
engagement_by_product=pd.DataFrame(productsandengagement.groupby("Product Name")["engagement_index"].mean())
engagement_by_product=engagement_by_product.rename(columns={'engagement_index':'engagement_index_average'})
engagement_by_product=engagement_by_product.sort_values(by='engagement_index_average',ascending=False)

In [None]:
engagement_by_producttop10=engagement_by_product.head(10)
engagement_by_producttop10

In [None]:
engagement_by_producttop10.plot(kind='bar',color='yellowgreen',figsize=(17,17))
plt.title("engagement_average by Product",fontsize=20)
plt.xticks(rotation=20)

### **Factor: Function**

From the table, we can see clearly that some functions play an important role in user engagement. These functions include Learning Management Systems (LMS), School Management Software - SSO, Content Creation & Curation, Classroom Engagement & Instruction - Assessment & Classroom Response, Virtual Classroom - Video Conferencing & Screen Sharing, and Online Course Providers & Technical Skills Development.

Similarly, these characteristics are also found in the products with the highest engagement index. For example, Google Docs, Google Drive and Canvas have a content creation function. Google Classroom and ClassLink are school management software and have a virtual classroom function. YouTube is an online video sharing and social media platform, but a lot of useful courses can be found on it, which means it also provides online courses.


### **pct_access (aggregated by function)**

In [None]:
pct_by_function=pd.DataFrame(productsandengagement.groupby("Primary Essential Function")["pct_access"].mean())
pct_by_function=pct_by_function.rename(columns={'pct_access':'pct_access_average'})
pct_by_function=pct_by_function.sort_values(by='pct_access_average',ascending=False)

In [None]:
pct_by_functiontop10=pct_by_function.head(10)
pct_by_functiontop10

In [None]:
pct_by_functiontop10.plot(kind='bar',color='yellowgreen',figsize=(17,17))
plt.title("Pct_access_average by function",fontsize=20)
plt.xticks(rotation=20)

### **engagement index (aggregated by function)**

In [None]:
engagement_by_function=pd.DataFrame(productsandengagement.groupby("Primary Essential Function")["engagement_index"].mean())
engagement_by_function=engagement_by_function.rename(columns={'engagement_index':'engagement_index_average'})
engagement_by_function=engagement_by_function.sort_values(by='engagement_index_average',ascending=False)

In [None]:
engagement_by_functiontop10=engagement_by_function.head(10)
engagement_by_functiontop10

In [None]:
engagement_by_functiontop10.plot(kind='bar',color='yellowgreen',figsize=(17,17))
plt.title("engagement_average by function",fontsize=20)
plt.xticks(rotation=20)

### **Factor: Provider**

According to the engagement data, the top 3 providers are Google LLC, IXL Learning and PBS. We can notice that Google LLC is the main provider in our dataset. Its share is much higher than that of other companies: it is 11% higher than IXL Learning, which ranks second.

### **pct_access (aggregated by provider)**

In [None]:
pct_by_provider=pd.DataFrame(productsandengagement.groupby(['Provider/Company Name'])['pct_access'].mean())
pct_by_provider=pct_by_provider.rename(columns={'pct_access':"pct_access_average"})
pct_by_provider=pct_by_provider.sort_values(by='pct_access_average',ascending=False)

In [None]:
pct_by_providertop10=pct_by_provider.head(10)
pct_by_providertop10

In [None]:
pct_by_providertop10.plot(kind='bar',color='yellowgreen',figsize=(17,17))
plt.title("Pct_access_average by provider",fontsize=20)
plt.xticks(rotation=20)

### **engagement index (aggregated by provider)**

In [None]:
engagement_by_provider=pd.DataFrame(productsandengagement.groupby("Provider/Company Name")["engagement_index"].mean())
engagement_by_provider=engagement_by_provider.rename(columns={'engagement_index':'engagement_index_average'})
engagement_by_provider=engagement_by_provider.sort_values(by='engagement_index_average',ascending=False)

In [None]:
engagement_by_providertop10=engagement_by_provider.head(10)

In [None]:
engagement_by_providertop10.plot(kind='bar',color='yellowgreen',figsize=(17,17))
plt.title("engagement_average by Provider",fontsize=20)
plt.xticks(rotation=20)

### **User engagement over time**

### **The change of pct_access over time**

In [None]:
productsandengagement['month'] = pd.DatetimeIndex(productsandengagement['time']).month

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=productsandengagement, x="month", y="pct_access",color='green')

### **The change of engagement_index over time**

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=productsandengagement, x="month", y="engagement_index",color='green')
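
Both line plots above pass every merged record to seaborn, which then averages the values per month and bootstraps a confidence band, and this can be slow on a large table. A lighter sketch, shown here on made-up toy data, is to compute the monthly mean first and plot the small result.

```python
import pandas as pd

# Hypothetical toy records; the notebook would use `productsandengagement`.
toy = pd.DataFrame({
    "time": ["2020-01-05", "2020-01-20", "2020-02-10", "2020-02-11"],
    "pct_access": [1.0, 3.0, 4.0, 6.0],
})

# Extract the month and average the metric per month.
toy["month"] = pd.to_datetime(toy["time"]).dt.month
monthly = toy.groupby("month", as_index=False)["pct_access"].mean()
print(monthly)
```

Passing `monthly` to `sns.lineplot(data=monthly, x="month", y="pct_access")` would draw the same mean line, without the confidence band.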

## **5. Engagement index and districts**

### **Merging engagement index and Districts**

We can see that the average pct_access and engagement_index in rural areas are much higher than in other areas.


In [None]:
engagement["district_id"] = engagement["district_id"].astype(str).astype(int)
districtsandengagement = pd.merge(districts, engagement, left_on='district_id', right_on='district_id')
districtsandengagement.head()

### **pct_access (aggregated by state)**

In [None]:
pct_by_state=districtsandengagement.groupby("state")["pct_access"].mean()
pct_by_state=pd.DataFrame(pct_by_state)
pct_by_state=pct_by_state.rename(columns={'pct_access':'pct_access_average'})
pct_by_state=pct_by_state.sort_values(by='pct_access_average',ascending=False)

In [None]:
pct_by_state.plot(kind='barh',color='yellowgreen',figsize=(17,17))
plt.title('Pct_Access_Average by State',fontsize=20)

### **pct_access (aggregated by locale)**

In [None]:
pct_by_locale=pd.DataFrame(districtsandengagement.groupby("locale")["pct_access"].mean())
pct_by_locale=pct_by_locale.rename(columns={'pct_access':'pct_access_average'})
pct_by_locale=pct_by_locale.sort_values(by='pct_access_average',ascending=False)

In [None]:
pct_by_locale.plot(kind='barh',color='yellowgreen',figsize=(17,17))
plt.title('Pct_Access_Average by locale',fontsize=20)

### **The change of  pct_access over time (aggregated by locale)**

In [None]:
pct_by_locale_time=pd.DataFrame(districtsandengagement.groupby(["locale",'time'])["pct_access"].mean())
pct_by_locale_time=pct_by_locale_time.rename(columns={'pct_access':'pct_access_average'})

In [None]:
# Reset both index levels so that 'locale' and 'time' become regular columns
pct_by_locale_time.reset_index(inplace=True)
pct_by_locale_time.head()

In [None]:
pct_by_locale_time['month'] = pd.DatetimeIndex(pct_by_locale_time['time']).month

In [None]:
pct_by_locale_time.head()

In [None]:
pct_by_locale_time_city=pct_by_locale_time[pct_by_locale_time['locale']=='City']
pct_by_locale_time_suburb=pct_by_locale_time[pct_by_locale_time['locale']=='Suburb']
pct_by_locale_time_town=pct_by_locale_time[pct_by_locale_time['locale']=='Town']
pct_by_locale_time_rural=pct_by_locale_time[pct_by_locale_time['locale']=='Rural']

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=pct_by_locale_time_city, x="month", y="pct_access_average",color='green').set(title='change of pct_access in city')


In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=pct_by_locale_time_suburb, x="month", y="pct_access_average",color='green').set(title='change of pct_access in suburb area')

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=pct_by_locale_time_town, x="month", y="pct_access_average",color='green').set(title='change of pct_access in Town area')

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=pct_by_locale_time_rural, x="month", y="pct_access_average",color='green').set(title='change of pct_access in rural area')
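
The four charts above are easier to compare on shared axes. As a minimal sketch on made-up toy data, the long table can be reshaped so that each locale becomes one column indexed by month; the values below are placeholders, not results.

```python
import pandas as pd

# Hypothetical toy data shaped like `pct_by_locale_time` after adding `month`.
toy = pd.DataFrame({
    "locale": ["City", "City", "Rural", "Rural"],
    "month": [1, 2, 1, 2],
    "pct_access_average": [0.5, 0.7, 0.9, 1.1],
})

# One row per month, one column per locale, averaging days within a month.
wide = toy.pivot_table(index="month", columns="locale",
                       values="pct_access_average", aggfunc="mean")
print(wide)
```

Calling `wide.plot()` on the result would then draw one line per locale in a single figure.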

### **engagement_index (aggregated by state)**

In [None]:
engagement_by_state=districtsandengagement.groupby("state")["engagement_index"].mean()
engagement_by_state=pd.DataFrame(engagement_by_state)
engagement_by_state=engagement_by_state.rename(columns={'engagement_index':'engagement_index_average'})
engagement_by_state=engagement_by_state.sort_values(by='engagement_index_average',ascending=False)

In [None]:
engagement_by_state.plot(kind='barh',color='yellowgreen',figsize=(17,17))
plt.title('Engagement by State',fontsize=20)

### **engagement_index (aggregated by locale)**

In [None]:
engagement_by_locale=pd.DataFrame(districtsandengagement.groupby("locale")["engagement_index"].mean())
engagement_by_locale=engagement_by_locale.rename(columns={'engagement_index':'engagement_index_average'})
engagement_by_locale=engagement_by_locale.sort_values(by='engagement_index_average',ascending=False)

In [None]:
engagement_by_locale.plot(kind='barh',color='yellowgreen',figsize=(17,17))
plt.title('Engagement index by locale',fontsize=20)

### **The change of  engagement index over time (aggregated by locale)**

In [None]:
engagement_by_locale_time=pd.DataFrame(districtsandengagement.groupby(["locale",'time'])["engagement_index"].mean())
engagement_by_locale_time=engagement_by_locale_time.rename(columns={"engagement_index":'engagement_index'})

In [None]:
# Reset both index levels so that 'locale' and 'time' become regular columns
engagement_by_locale_time.reset_index(inplace=True)
engagement_by_locale_time['month']=pd.DatetimeIndex(engagement_by_locale_time['time']).month

In [None]:
engagement_by_locale_time_city=engagement_by_locale_time[engagement_by_locale_time['locale']=='City']
engagement_by_locale_time_suburb=engagement_by_locale_time[engagement_by_locale_time['locale']=='Suburb']
engagement_by_locale_time_town=engagement_by_locale_time[engagement_by_locale_time['locale']=='Town']
engagement_by_locale_time_rural=engagement_by_locale_time[engagement_by_locale_time['locale']=='Rural']

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=engagement_by_locale_time_city, x="month", y="engagement_index",color='green').set(title='change of engagement index in city')

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=engagement_by_locale_time_suburb, x="month", y="engagement_index",color='green').set(title='change of engagement index in suburb')

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=engagement_by_locale_time_town, x="month", y="engagement_index",color='green').set(title='change of engagement index in town')

In [None]:
plt.figure(figsize = (17,17))
sns.lineplot(data=engagement_by_locale_time_rural, x="month", y="engagement_index",color='green').set(title='change of engagement index in rural')

## **6. Conclusion**

### **Product design**
Digital learning platforms, Sites, Resources & Reference, and study tools are popular functions, but they are not the major factors that contribute to student engagement. The major functions that affect engagement are Learning Management, School Management, Content Creation & Curation, Classroom Engagement and response, Virtual Classroom, and Online courses. When designing digital learning tools, we may focus on these functions.

### **Marketing Promotion Schedule**
The overall trend in user engagement is similar across locales, and the charts reflect several patterns. From January, user engagement rose and peaked in May, then fell until July. Final exams before the holiday and the summer holiday itself are possible reasons.

Usually, school starts between July and September. During this period, user engagement starts to grow and peaks in September. There was a slight drop from October to November, and then it continued to rise.

We may run some promotions from September to November because during this period, students don't use digital learning tools often but they still have classes.