### Kaggle Competition: LearnPlatform COVID-19 Impact on Digital Learning in the United States of America in 2020.


In this analytics competition, the task is to uncover trends in digital learning by analyzing how engagement with digital learning relates to factors such as district demographics, broadband access, and state- and national-level policies and events. The final step is to propose solutions to the educational inequities that the analysis reveals.

In this analytics challenge, we are given multiple .csv files. The districts_info.csv file contains information about each school district, and the products_info.csv file contains information about the top 372 tools used for digital learning. For each school district, there is an additional file that contains the engagement for each tool for every day in 2020. The files can be joined by the key columns district_id and lp_id.
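
To make the join concrete, here is a minimal sketch on toy data. All values are made up for illustration, and it assumes a `district_id` column has already been attached to the engagement rows. The product key is called `LP ID` in products_info and `lp_id` in the engagement files.

```python
import pandas as pd

# Toy stand-ins for the real files; every value below is invented for illustration.
districts = pd.DataFrame({"district_id": [1000, 2000], "state": ["Utah", "Ohio"]})
products = pd.DataFrame({"LP ID": [11, 22], "Product Name": ["Tool A", "Tool B"]})
engagement = pd.DataFrame({
    "time": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "lp_id": [11, 22, 11],
    "pct_access": [0.5, 1.2, 0.7],
    "district_id": [1000, 1000, 2000],
})

# lp_id links engagement rows to products; district_id links them to districts.
joined = (
    engagement
    .merge(products, left_on="lp_id", right_on="LP ID", how="left")
    .merge(districts, on="district_id", how="left")
)
print(joined[["time", "Product Name", "state", "pct_access"]])
```

A left join keeps every engagement row even when a key has no match, which makes missing metadata visible as NaN instead of silently dropping rows.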

#### importing libraries to be used in the project

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import os
import re
import warnings

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns  # statistical graphics
import matplotlib.pyplot as plt  # static plots
import plotly as py
import plotly.express as px  # interactive plots
import plotly.graph_objects as go  # lower-level plotly figures
import plotly.io as pio
from plotly.subplots import make_subplots
from IPython.display import Image

plt.rcParams.update({'font.size': 14})
warnings.filterwarnings("ignore")  # silence library warnings in the notebook output

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

#### Let's look at our datasets and try to understand what we are given

In [None]:
districts_info = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")
products_info = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")
engagement_sample = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data/5970.csv")

#### District information data

The district file districts_info.csv includes information about the characteristics of school districts, drawing on data from NCES (2018-19), FCC (Dec 2018), and Edunomics Lab. It contains the following columns:


**district_id:** The unique identifier of the school district

**state:** The state in which the district is located

**locale:** NCES locale classification that categorizes U.S. territory into four types of areas: City, Suburban, Town, and Rural. See the Locale Boundaries User's Manual for more information.

**pct_black/hispanic:** Percentage of students in the district identified as Black or Hispanic, based on 2018-19 NCES data

**pct_free/reduced:** Percentage of students in the district eligible for free or reduced-price lunch, based on 2018-19 NCES data

**county_connections_ratio:** Ratio of residential fixed high-speed connections (over 200 kbps in at least one direction) to households, based on county-level data from FCC Form 477 (December 2018 version). See FCC data for more information.

**pp_total_raw:** Per-pupil total expenditure (sum of local and federal expenditure) from Edunomics Lab's National Education Resource Database on Schools (NERD$) project. The expenditure data are school-by-school, and the median value is used to represent the expenditure of a given school district.
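
If the percentage, ratio, and expenditure columns turn out to be stored as interval strings such as `[0.2, 0.4[` rather than plain numbers, a numeric midpoint can be handy for sorting or correlation. That storage format is an assumption here, so check `districts_info.head()` first. The helper name `interval_midpoint` is hypothetical.

```python
import math
import re

def interval_midpoint(value):
    """Return the midpoint of a string such as '[0.2, 0.4[', or NaN if it cannot be parsed."""
    if not isinstance(value, str):
        return math.nan  # covers NaN and other non-string entries
    numbers = re.findall(r"\d+(?:\.\d+)?", value)
    if len(numbers) != 2:
        return math.nan
    low, high = (float(n) for n in numbers)
    return (low + high) / 2

print(interval_midpoint("[14000, 16000["))
print(interval_midpoint(None))
```

It could then be applied column-wise, for example `districts_info['pp_total_raw'].apply(interval_midpoint)`.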


Displaying the first 5 rows of districts_info, we can see that some districts have NaN values.

In [None]:

districts_info.head()

Our districts dataframe contains 7 columns and 233 rows in total.

In [None]:
districts_info.info()

#### Product information data
The product file products_info.csv includes information about the characteristics of the top 372 products with the most users in 2020. The categories listed in this file are part of LearnPlatform's product taxonomy. Data were labeled by LearnPlatform's team. Some products may not have labels because they are duplicates, lack an accurate URL, or for other reasons.

| Name | Description |
| --- | --- |
| LP ID | The unique identifier of the product |
| URL | Web link to the specific product |
| Product Name | Name of the specific product |
| Provider/Company Name | Name of the product provider |
| Sector(s) | Sector of education where the product is used |
| Primary Essential Function | The basic function of the product. There are two layers of labels here. Products are first labeled as one of these three categories: LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations. Each of these categories has multiple sub-categories with which the products were labeled |

Displaying the first 5 rows of products_info:

In [None]:

products_info.head()

Our products dataframe contains 6 columns and 372 rows.

In [None]:
products_info.info()

#### Engagement data

The engagement data are aggregated at the school district level, and each file in the folder engagement_data represents data from one school district. The 4-digit file name represents the district_id, which can be used to link to district information in districts_info.csv. The lp_id can be used to link to product information in products_info.csv.
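
As a small sketch of the file-name convention, the helper below recovers the district_id from a path. The function name `district_id_from_path` and the second path are hypothetical, and it assumes district_id is stored as an integer in districts_info.csv.

```python
from pathlib import Path

def district_id_from_path(path):
    """Return the district_id encoded in an engagement file name such as '5970.csv'."""
    return int(Path(path).stem)

# Hypothetical file list; on Kaggle it could come from Path(folder).glob("*.csv").
paths = ["engagement_data/5970.csv", "engagement_data/1000.csv"]
ids = [district_id_from_path(p) for p in paths]
print(ids)  # [5970, 1000]
```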

| Name | Description |
| --- | --- |
| time | Date in "YYYY-MM-DD" format |
| lp_id | The unique identifier of the product |
| pct_access | Percentage of students in the district who have at least one page-load event of a given product on a given day |
| engagement_index | Total page-load events per one thousand students of a given product on a given day |
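
To make the two metrics tangible, here is a worked sketch of how they would follow from raw counts, going only by the definitions above. The raw counts are not part of the dataset, so the numbers and function names are purely illustrative.

```python
def pct_access(students_with_page_load, total_students):
    """Percentage of students with at least one page-load event on a given day."""
    return 100 * students_with_page_load / total_students

def engagement_index(page_loads, total_students):
    """Page-load events per one thousand students on a given day."""
    return 1000 * page_loads / total_students

# Illustrative district with 500 students: 25 of them opened the product,
# generating 1200 page loads in total.
print(pct_access(25, 500))          # 5.0
print(engagement_index(1200, 500))  # 2400.0
```

The example shows why the two can diverge: a small group of heavy users yields a low pct_access but a high engagement_index.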

Displaying the first 5 rows of the engagement sample (this is one of the many engagement files, one per district):

In [None]:
engagement_sample.head()

In [None]:
engagement_sample.info()

## data preprocessing 

#### HANDLING MISSING VALUES 

In [None]:
# how many missing values exist or better still what is the % of missing values in the dataset?
def percent_missing(df,name:str):
    '''df: the dataframe you want to calculate the missing values 
    name: the name of your dataframe '''

    # Calculate total number of cells in dataframe
    totalCells = np.prod(df.shape)

    # Count number of missing values per column
    missingCount = df.isnull().sum()

    # Calculate total number of missing values
    totalMissing = missingCount.sum()

    # Calculate percentage of missing values
    print(name+' has', round(((totalMissing/totalCells) * 100), 2), "%", "missing values.")

In [None]:
percent_missing( districts_info, 'districts_info')

Now let us see which columns have missing values:

In [None]:
districts_info.isna().sum()
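
Raw counts are easier to compare when expressed as a share of rows. A minimal sketch on a toy dataframe (the values are made up) could look like this:

```python
import pandas as pd

# Toy dataframe standing in for districts_info; values are invented.
toy = pd.DataFrame({
    "state": ["Utah", None, "Ohio", None],
    "locale": ["City", None, "Rural", "Town"],
    "district_id": [1, 2, 3, 4],
})

# isna().mean() gives the fraction of missing cells per column.
missing_pct = (toy.isna().mean() * 100).round(2).sort_values(ascending=False)
print(missing_pct)
```

The same expression should work on `districts_info` directly.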

After dropping the districts with a NaN state, we are left with a reduced districts_info dataframe of 176 districts.

In [None]:
districts_info = districts_info[districts_info.state.notna()].reset_index(drop=True)
districts_info.shape  # shape of the dataframe after removing districts with a NaN state

Let us combine the engagement data of the remaining districts into one dataframe, adding the key column district_id to each engagement file as it is read.



In [None]:
PATH = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data' 

temp = []

for district in districts_info.district_id.unique():
    df = pd.read_csv(f'{PATH}/{district}.csv', index_col=None, header=0)
    df["district_id"] = district
    temp.append(df)
    
    
engagement = pd.concat(temp)
engagement = engagement.reset_index(drop=True)

In [None]:
engagement.head()

We check whether every NaN in engagement_index coincides with a pct_access of 0.0 or NaN (pct_access is the percentage of students in the district who have at least one page-load event of a given product on a given day). This tells us how to handle the missing values during preprocessing.

As the output shows, the NaN values occur where no student in the district had a page-load event for the product on that day.

In [None]:
a = engagement.loc[engagement['engagement_index'].isna()]  # rows where engagement_index is NaN
print('unique values in pct_access column are', a['pct_access'].unique())  # check which pct_access values accompany a NaN engagement_index
a.head()
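
The same check can also be written as a single boolean test. The sketch below uses a toy dataframe with made-up values; on the real data, `toy` would be replaced by `engagement`.

```python
import numpy as np
import pandas as pd

# Toy engagement rows; values are invented for illustration.
toy = pd.DataFrame({
    "pct_access": [0.0, 0.0, 1.5, np.nan],
    "engagement_index": [np.nan, np.nan, 320.0, np.nan],
})

missing_eng = toy["engagement_index"].isna()
no_access = toy["pct_access"].fillna(0).eq(0)

# True only if every row with a missing engagement_index also has no recorded access.
all_explained = bool((~missing_eng | no_access).all())
print(all_explained)  # True
```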

All NaN engagement_index values are now replaced by 0.

In [None]:
engagement['engagement_index'] = engagement['engagement_index'].fillna(0)
engagement.head()

Let us check whether the engagement dataframe still has unhandled missing values. As a result, we find that 0.01% of the values are missing, in lp_id and pct_access.

In [None]:
percent_missing( engagement, 'engagement dataframe ')
engagement.isna().sum()

To get a dataframe with all NaN values handled, let us drop the rows with no lp_id and replace NaN pct_access values with 0.

In [None]:
engagement = engagement[engagement.lp_id.notnull()].copy() # drop rows with a missing lp_id; copy so the next assignment works on an independent dataframe
engagement['pct_access'] = engagement['pct_access'].fillna(0) # replace NaN pct_access with 0
percent_missing( engagement, 'engagement dataframe ') # percentage of missing values remaining in the dataframe


Convert time to datetime64[ns] for easier handling.

In [None]:
engagement.time = engagement.time.astype('datetime64[ns]')

Splitting the Primary Essential Function into a main category and a sub-category.

In [None]:
products_info['primary_function_main'] = products_info['Primary Essential Function'].apply(lambda x: x.split(' - ')[0] if x == x else x)
products_info['primary_function_sub'] = products_info['Primary Essential Function'].apply(lambda x: x.split(' - ')[1] if x == x else x)

# Synchronize similar values
products_info['primary_function_sub'] = products_info['primary_function_sub'].replace({'Sites, Resources & References' : 'Sites, Resources & Reference'})
products_info.drop("Primary Essential Function", axis=1, inplace=True)
products_info.head()
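
As an alternative to the two `apply` calls, the split can be done in one vectorized step with `str.split(..., expand=True)`. This is only a sketch on toy labels (the second label is made up); missing values stay missing without the `x == x` trick.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Primary Essential Function": [
        "LC - Sites, Resources & Reference",
        "SDO - Example Subcategory",  # hypothetical label for illustration
        np.nan,
    ]
})

# expand=True returns one column per part; column 0 is the main category,
# column 1 the sub-category, mirroring split(' - ')[0] and split(' - ')[1].
parts = toy["Primary Essential Function"].str.split(" - ", expand=True)
toy["primary_function_main"] = parts[0]
toy["primary_function_sub"] = parts[1]
print(toy[["primary_function_main", "primary_function_sub"]])
```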

## DATA VISUALIZATION

#### DISTRICTS

We can see that a large number of districts come from Utah (29 districts) and Connecticut (30 districts).

In [None]:
state_abb = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District Of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}

districts_info['state_abb'] = districts_info['state'].map(state_abb)

fig = go.Figure()
layout = dict(
    title_text = "distribution of districts in the available States",
    title_font = dict(
            family = "monospace",
            size = 25,
            color = "black"
            ),
    geo_scope = 'usa'
)

# Count districts per state once; using the index and values directly avoids relying on
# the column names produced by value_counts().reset_index(), which differ between pandas versions
state_counts = districts_info['state_abb'].value_counts()

fig.add_trace(
    go.Choropleth(
        locations = state_counts.index,
        zmax = 1,
        z = state_counts.values,
        locationmode = 'USA-states',
        marker_line_color = 'white',
        geo = 'geo',
        colorscale = "cividis", 
    )
)
         
fig.update_layout(layout)   
fig.show()

In [None]:
def plot_bar(df,column:str,title:str,y_label:str,x_label:str):
    # Name the category and count columns explicitly so the plot does not depend on the
    # pandas-version-specific column names produced by value_counts().reset_index()
    counts = df[column].value_counts().rename_axis(x_label).reset_index(name=y_label)
    fig = px.bar(counts, x = x_label, y = y_label, text = y_label, title=title)
    fig.update_traces(marker_color='#8CC0FF')
    fig.show()
    
def plot_count(df,column:str,hue:str,title:str):
    plt.figure(figsize=(16,9))
    plt.xticks(rotation=90)

    sns.countplot(data=df, x=column, hue = hue).set(title= title)

In [None]:
plot_bar(districts_info,'state','Number of Districts in the Available States','number of districts','states')

Now let us look at how locales are distributed across the states. We can see that there are many suburban districts and few town and city districts.

In [None]:
plot_count(districts_info,'state',"locale",'distribution of area in available states')

We have 104 suburban districts, 33 rural districts, 29 city districts, and 10 town districts.

In [None]:
plot_bar(districts_info,'locale','Number of Districts in Each Type of Area','number of districts','Area')

Cities have the highest percentage of students identified as Black or Hispanic, whereas it is lowest in towns and rural areas.


In [None]:
plot_count(districts_info,'locale',"pct_black/hispanic",'distribution of pct_black/hispanic in available Area')

Cities have the largest share of students eligible for free or reduced-price lunch, whereas rural areas show the opposite.
Government spending on student development is highest in rural areas and lowest in towns.


In [None]:
plot_count(districts_info,'locale',"pct_free/reduced",'distribution pct_free/reduced in available Area')

In [None]:
plot_count(districts_info,'locale',"county_connections_ratio",'distribution county_connections_ratio in available Area')

We can now see that suburbs and rural areas are the locales with the highest pp_total_raw values.

In [None]:
plot_count(districts_info,'locale',"pp_total_raw",'distribution pp_total_raw in available Area')

#### PRODUCTS

In [None]:
print('we have {} products used for studies, let us see the 20 most used in this project '.format(products_info['Product Name'].nunique()))


In [None]:
engagement.rename(columns={"lp_id": "LP ID"}, inplace=True)
merged=pd.merge(engagement, products_info, on= "LP ID")
merged = pd.merge(merged, districts_info, on = 'district_id')
# merged.to_csv('submission.csv')
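
Note that `pd.merge` performs an inner join by default, so engagement rows whose `LP ID` does not appear in products_info are dropped from `merged`. The sketch below, on toy data with invented values, shows one way to count such rows with `indicator=True` before deciding whether that loss is acceptable.

```python
import pandas as pd

# Toy frames; LP ID 99 has no product metadata on purpose.
engagement_toy = pd.DataFrame({"LP ID": [11, 22, 99], "pct_access": [0.5, 1.2, 0.1]})
products_toy = pd.DataFrame({"LP ID": [11, 22], "Product Name": ["Tool A", "Tool B"]})

checked = engagement_toy.merge(
    products_toy,
    on="LP ID",
    how="left",              # keep all engagement rows so unmatched ones stay visible
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
    validate="many_to_one",  # raises if LP ID is duplicated in products_toy
)
unmatched = checked[checked["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(checked)} engagement rows have no product metadata")
```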


In [None]:
merged.head()

In [None]:
a=merged.groupby("Product Name")["pct_access"].mean().sort_values(ascending=False).head(20)
b=merged.groupby("Product Name")["engagement_index"].sum().sort_values(ascending=False).head(20)

# plot
plt.figure(figsize=(15,4))

plt.subplot(121)
plt.bar(a.index, a.values, color=["#6930c3","#5e60ce","#0096c7","#48cae4","#ade8f4","#ff7f51","#ff9b54","#ffbf69"])
plt.xlabel('Product Name')
plt.xticks(rotation=90)
plt.ylabel('Mean percentage of students')
plt.title("top 20 used products per pct_access")

plt.subplot(122)
plt.bar(b.index, b.values, color=["#4f000b","#720026","#ce4257","#ff7f51","#ff9b54"])
plt.xlabel('Product Name')
plt.xticks(rotation=90)
plt.ylabel('Page-load per 1000 students')
plt.title("top 20 used products per engagement_index")

In [None]:
c=merged.groupby("Provider/Company Name")["pct_access"].mean().sort_values(ascending=False).head(20)
d=merged.groupby("Provider/Company Name")["engagement_index"].sum().sort_values(ascending=False).head(20)

# plot
plt.figure(figsize=(15,4))

plt.subplot(121)
plt.bar(c.index, c.values, color=["#6930c3","#5e60ce","#0096c7","#48cae4","#ade8f4","#ff7f51","#ff9b54","#ffbf69"])
plt.xlabel('Product provider')
plt.xticks(rotation=90)
plt.ylabel('Mean percentage of students')
plt.title("top 20 product providers per pct_access")

plt.subplot(122)
plt.bar(d.index, d.values, color=["#4f000b","#720026","#ce4257","#ff7f51","#ff9b54"])
plt.xlabel('Product provider')
plt.xticks(rotation=90)
plt.ylabel('Page-load per 1000 students')
plt.title("top 20 product providers per engagement_index")

In [None]:
fig = px.pie(merged.query("primary_function_main != 'x'")['primary_function_main'].value_counts().
             rename_axis('index').reset_index(name='count'), 
             values = 'count', names = 'index', width = 700, height = 700,
            title="Count of Products by primary_function_main")
fig.show()


In [None]:
fig = px.pie(merged.query("primary_function_sub != 'x'")['primary_function_sub'].value_counts().
             rename_axis('index').reset_index(name='count'), 
             values = 'count', names = 'index', width = 700, height = 700,
            title="Count of Products by primary_function_sub")
fig.show()


In [None]:
plot_bar(products_info,'Sector(s)','Number of products available to each sector','number of products','Sector(s)')

#### ENGAGEMENT

In [None]:
# This code has been taken from: https://www.kaggle.com/fumbanibanda/eda-covid19
st_acсess = merged.groupby(['state', 'time']).agg({'pct_access': 'mean'}).reset_index()
st_eng = merged.groupby(['state', 'time']).agg({'engagement_index': 'mean'}).reset_index()
loc_acсess = merged.groupby(['locale', 'time']).agg({'pct_access': 'mean'}).reset_index()
loc_eng = merged.groupby(['locale', 'time']).agg({'engagement_index': 'mean'}).reset_index()
cat_acсess = merged.groupby(['primary_function_main', 'time']).agg({'pct_access': 'mean'}).reset_index()
cat_eng = merged.groupby(['primary_function_main', 'time']).agg({'engagement_index': 'mean'}).reset_index()
cat_eng.to_csv('submission.csv')
for i in [st_acсess, st_eng, loc_acсess, loc_eng, cat_acсess, cat_eng]:
    i['day_of_week'] = i['time'].dt.dayofweek
    
loc_acсess.head(3)
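
The `day_of_week` column makes it possible to separate school days from weekends, which otherwise pull the daily averages down. A minimal sketch on toy data (values are invented) follows; `dt.dayofweek` counts Monday as 0 and Sunday as 6.

```python
import pandas as pd

toy = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-13", "2020-03-14", "2020-03-15", "2020-03-16"]),
    "pct_access": [1.0, 0.2, 0.1, 1.4],
})
toy["day_of_week"] = toy["time"].dt.dayofweek  # Monday=0 ... Sunday=6

# Keep Monday to Friday only.
weekdays = toy[toy["day_of_week"] < 5]
print(weekdays)
```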

In [None]:
# This code has been taken from:
#https://www.kaggle.com/dmitryuarov/eda-covid-19-impact-on-digital-learning/notebook
fig = px.line(st_acсess, x="time", y="pct_access", color="state", line_group="state")

fig.update_layout(plot_bgcolor = 'white', title = 'Dynamics of pct_access of all products by states', 
                  title_font_size = 20, title_x = 0.5)
fig.update_xaxes(showline = True, linecolor = '#f5f2f2', linewidth = 2,tickfont_size = 12)
fig.update_yaxes(showline = True, linecolor = '#f5f2f2', 
                 showgrid = True, gridwidth = 1, gridcolor = '#f5f2f2',
                 linewidth = 2,tickfont_size = 12)

fig.add_vline(x = '2020-03-11', line_width = 3, line_color="red")

fig.add_annotation(
        x='2020-03-11',
        y=2.7,
        text="WHO has declared Covid-19 as pandemic",
        showarrow=True,
        font=dict(
            family="monospace",
            size=11,
            color="black"
            ),
        arrowhead=2,
        arrowsize=1,
        arrowwidth=2,
        arrowcolor="#636363",
        ax= 130,
        ay=1
        )

fig.add_vrect(x0="2020-06-01", x1="2020-08-31", fillcolor="yellow", opacity=0.25, line_width=0)

fig.add_annotation(
        x='2020-07-15',
        y=2.25,
        text="Summer holidays",
        showarrow=False,
        font=dict(
            size=11,
            )
        )

fig.update_traces(line_width=1)

fig.show()

In [None]:
fig = px.line(st_eng, x="time", y="engagement_index", color="state", line_group="state")

fig.update_layout(plot_bgcolor = 'white', title = 'Dynamics of engagement index of all products by states', 
                  title_font_size = 20, title_x = 0.5)
fig.update_xaxes(showline = True, linecolor = '#f5f2f2', linewidth = 2,tickfont_size = 12)
fig.update_yaxes(showline = True, linecolor = '#f5f2f2', 
                 showgrid = True, gridwidth = 1, gridcolor = '#f5f2f2',
                 linewidth = 2,tickfont_size = 12)

fig.add_vline(x = '2020-03-11', line_width = 3, line_color="red")

fig.add_annotation(
        x='2020-03-11',
        y=1150,
        text="WHO has declared Covid-19 a pandemic",
        showarrow=True,
        font=dict(
            size=11
            ),
        arrowhead=2,
        arrowsize=1,
        arrowwidth=2,
        arrowcolor="#636363",
        ax= 130,
        ay=1
        )

fig.add_vrect(x0="2020-06-01", x1="2020-08-31", fillcolor="yellow", opacity=0.25, line_width=0)

fig.add_annotation(
        x='2020-07-15',
        y=900,
        text="Summer holidays",
        showarrow=False,
        font=dict(
            size=11,
            )
        )

fig.update_traces(line_width=1)

fig.show()

In [None]:
months_map = {1:"January",2:"February",3:"March",4:"April",
              5:"May",6:"June",7:"July",8:"August",9:"September",
              10:"October",11:"November",12:"December"}

for i in [st_acсess, st_eng]:
    i['state_abb'] = i['state'].map(state_abb)
    i['month'] = i.time.dt.month.map(months_map)

    # Aggregate once per state and month, then reuse the result for the map
    value_col = i.columns[2]
    monthly = i.groupby(['state', 'state_abb', 'month']).agg({value_col: 'mean'}).reset_index()

    fig = px.choropleth(data_frame = monthly, locations = "state_abb", locationmode = "USA-states",
                    color = value_col, scope = "usa",
                    color_continuous_scale = "viridis", animation_frame = "month", hover_name = "state")
    
    fig.update_layout(title_text = f'Monthly dynamics of {i.columns[2]}', title_font = dict(size = 25,color = "black")) 
    
    fig.show()

In [None]:

fig = px.line(loc_acсess, x="time", y="pct_access", color="locale", line_group="locale")

fig.update_layout(plot_bgcolor = 'white', title = 'Dynamics of pct_access by locale', 
                  title_font_size = 20, title_x = 0.5)
fig.update_xaxes(showline = True, linecolor = '#f5f2f2', linewidth = 2,tickfont_size = 12)
fig.update_yaxes(showline = True, linecolor = '#f5f2f2', 
                 showgrid = True, gridwidth = 1, gridcolor = '#f5f2f2',
                 linewidth = 2,tickfont_size = 12)

fig.add_vline(x = '2020-03-11', line_width = 3, line_color="red")

fig.add_annotation(
        x='2020-03-11',
        y=2.7,
        text="WHO has declared Covid-19 as pandemic",
        showarrow=True,
        font=dict(
            family="monospace",
            size=11,
            color="black"
            ),
        arrowhead=2,
        arrowsize=1,
        arrowwidth=2,
        arrowcolor="#636363",
        ax= 130,
        ay=1
        )

fig.add_vrect(x0="2020-06-01", x1="2020-08-31", fillcolor="yellow", opacity=0.25, line_width=0)

fig.add_annotation(
        x='2020-07-15',
        y=2.25,
        text="Summer holidays",
        showarrow=False,
        font=dict(
            size=11,
            )
        )

fig.update_traces(line_width=1)

fig.show()

In [None]:
fig = px.line(loc_eng, x="time", y="engagement_index", color="locale", line_group="locale")
fig.update_layout(plot_bgcolor = 'white', title = 'Dynamics of engagement_index by locale', 
                  title_font_size = 20, title_x = 0.5)
fig.update_xaxes(showline = True, linecolor = '#f5f2f2', linewidth = 2,tickfont_size = 12)
fig.update_yaxes(showline = True, linecolor = '#f5f2f2', 
                 showgrid = True, gridwidth = 1, gridcolor = '#f5f2f2',
                 linewidth = 2,tickfont_size = 12)

fig.add_vline(x = '2020-03-11', line_width = 3, line_color="red")

fig.add_annotation(
        x='2020-03-11',
        y=2.7,
        text="WHO has declared Covid-19 as pandemic",
        showarrow=True,
        font=dict(
            family="monospace",
            size=11,
            color="black"
            ),
        arrowhead=2,
        arrowsize=1,
        arrowwidth=2,
        arrowcolor="#636363",
        ax= 130,
        ay=1
        )

fig.add_vrect(x0="2020-06-01", x1="2020-08-31", fillcolor="yellow", opacity=0.25, line_width=0)

fig.add_annotation(
        x='2020-07-15',
        y=2.25,
        text="Summer holidays",
        showarrow=False,
        font=dict(
            size=11,
            )
        )

fig.update_traces(line_width=1)

fig.show()

In [None]:
fig = px.line(cat_eng, x="time", y="engagement_index", color="primary_function_main", line_group="primary_function_main")

fig.update_layout(plot_bgcolor = 'white', title = 'Dynamics of engagement_index of all primary_function', 
                  title_font_size = 20, title_x = 0.5)
fig.update_xaxes(showline = True, linecolor = '#f5f2f2', linewidth = 2,tickfont_size = 12)
fig.update_yaxes(showline = True, linecolor = '#f5f2f2', 
                 showgrid = True, gridwidth = 1, gridcolor = '#f5f2f2',
                 linewidth = 2,tickfont_size = 12)

fig.add_vline(x = '2020-03-11', line_width = 3, line_color="red")

fig.add_annotation(
        x='2020-03-11',
        y=2.7,
        text="WHO has declared Covid-19 as pandemic",
        showarrow=True,
        font=dict(
            family="monospace",
            size=11,
            color="black"
            ),
        arrowhead=2,
        arrowsize=1,
        arrowwidth=2,
        arrowcolor="#636363",
        ax= 130,
        ay=1
        )

fig.add_vrect(x0="2020-06-01", x1="2020-08-31", fillcolor="yellow", opacity=0.25, line_width=0)

fig.add_annotation(
        x='2020-07-15',
        y=2.25,
        text="Summer holidays",
        showarrow=False,
        font=dict(
            size=11,
            )
        )

fig.update_traces(line_width=1)

fig.show()

In [None]:

fig = px.line(cat_acсess, x="time", y="pct_access", color="primary_function_main", line_group="primary_function_main")

fig.update_layout(plot_bgcolor = 'white', title = 'Dynamics of pct_access of all primary_function', 
                  title_font_size = 20, title_x = 0.5)
fig.update_xaxes(showline = True, linecolor = '#f5f2f2', linewidth = 2,tickfont_size = 12)
fig.update_yaxes(showline = True, linecolor = '#f5f2f2', 
                 showgrid = True, gridwidth = 1, gridcolor = '#f5f2f2',
                 linewidth = 2,tickfont_size = 12)

fig.add_vline(x = '2020-03-11', line_width = 3, line_color="red")

fig.add_annotation(
        x='2020-03-11',
        y=2.7,
        text="WHO has declared Covid-19 as pandemic",
        showarrow=True,
        font=dict(
            family="monospace",
            size=11,
            color="black"
            ),
        arrowhead=2,
        arrowsize=1,
        arrowwidth=2,
        arrowcolor="#636363",
        ax= 130,
        ay=1
        )

fig.add_vrect(x0="2020-06-01", x1="2020-08-31", fillcolor="yellow", opacity=0.25, line_width=0)

fig.add_annotation(
        x='2020-07-15',
        y=2.25,
        text="Summer holidays",
        showarrow=False,
        font=dict(
            size=11,
            )
        )

fig.update_traces(line_width=1)

fig.show()

In [None]:
cov_imp = pd.DataFrame(st_acсess['state'].unique().tolist()).rename(columns = {0: 'state'})

# We have no information about Texas during the start of pandemic
cov_imp = cov_imp.query("state != 'Texas'").reset_index()
cov_imp.drop('index', axis = 1, inplace = True)

for i in ['mean_access', '1w_acess_change%', '2w_acess_change%', 'mean_eng', '1w_eng_change%', '2w_eng_change%']:
    cov_imp[i] = 0.0

states = cov_imp['state'].unique().tolist()

for idx, i in enumerate(states):
    # .loc assigns in place instead of through chained indexing; cov_imp has a fresh 0..n-1 index
    cov_imp.loc[idx, 'mean_access'] = round(st_acсess.query("time >= '2020-03-09' & time <= '2020-03-13' & state == @i")['pct_access'].mean(), 2)
    cov_imp.loc[idx, '1w_acess_change%'] = round((st_acсess.query("time >= '2020-03-16' & time <= '2020-03-20' & state == @i")['pct_access'].mean() / cov_imp.loc[idx, 'mean_access'] - 1) * 100, 1)
    cov_imp.loc[idx, '2w_acess_change%'] = round((st_acсess.query("time >= '2020-03-23' & time <= '2020-03-27' & state == @i")['pct_access'].mean() / st_acсess.query("time >= '2020-03-16' & time <= '2020-03-20' & state == @i")['pct_access'].mean() - 1) * 100, 1)
    cov_imp.loc[idx, 'mean_eng'] = round(st_eng.query("time >= '2020-03-09' & time <= '2020-03-13' & state == @i")['engagement_index'].mean(), 1)
    cov_imp.loc[idx, '1w_eng_change%'] = round((st_eng.query("time >= '2020-03-16' & time <= '2020-03-20' & state == @i")['engagement_index'].mean() / cov_imp.loc[idx, 'mean_eng'] - 1) * 100, 1)
    cov_imp.loc[idx, '2w_eng_change%'] = round((st_eng.query("time >= '2020-03-23' & time <= '2020-03-27' & state == @i")['engagement_index'].mean() / st_eng.query("time >= '2020-03-16' & time <= '2020-03-20' & state == @i")['engagement_index'].mean() - 1) * 100, 1)

def color_values(val):
    color = 'red' if val < 0 else 'green'
    return 'color: %s' % color

slice_ = ['1w_acess_change%', '2w_acess_change%', '1w_eng_change%', '2w_eng_change%']
slice_2 = ['mean_access', '1w_acess_change%', '2w_acess_change%']
slice_3 = ['mean_eng', '1w_eng_change%', '2w_eng_change%']
cov_imp.style.applymap(color_values, subset = slice_).format(precision=1).set_properties(**{'background-color': '#fafafa'}, subset=slice_2).set_properties(**{'background-color': '#f7f7f7'}, subset=slice_3)
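
The change columns in the table above all follow the same pattern: the mean of one school week divided by the mean of the previous week, minus one, expressed in percent. A small sketch of that arithmetic with made-up means (the helper name `week_over_week_change` is hypothetical):

```python
def week_over_week_change(current_mean, previous_mean):
    """Percent change of one week's mean relative to the previous week's mean."""
    return round((current_mean / previous_mean - 1) * 100, 1)

# Illustrative means: access drops from 0.50 to 0.40, then recovers to 0.60.
print(week_over_week_change(0.40, 0.50))  # -20.0
print(week_over_week_change(0.60, 0.40))  # 50.0
```

A negative value is what `color_values` renders in red, a non-negative one in green.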

In [None]:
cov_imp2 = pd.DataFrame(loc_acсess['locale'].unique().tolist()).rename(columns = {0: 'locale'})

for i in ['mean_access', '1w_acess_change%', '2w_acess_change%', 'mean_eng', '1w_eng_change%', '2w_eng_change%']:
    cov_imp2[i] = 0.0

locales = cov_imp2['locale'].unique().tolist()

for idx, i in enumerate(locales):
    # .loc assigns in place instead of through chained indexing; cov_imp2 has a default 0..n-1 index
    cov_imp2.loc[idx, 'mean_access'] = round(loc_acсess.query("time >= '2020-03-09' & time <= '2020-03-13' & locale == @i")['pct_access'].mean(), 2)
    cov_imp2.loc[idx, '1w_acess_change%'] = round((loc_acсess.query("time >= '2020-03-16' & time <= '2020-03-20' & locale == @i")['pct_access'].mean() / cov_imp2.loc[idx, 'mean_access'] - 1) * 100, 1)
    cov_imp2.loc[idx, '2w_acess_change%'] = round((loc_acсess.query("time >= '2020-03-23' & time <= '2020-03-27' & locale == @i")['pct_access'].mean() / loc_acсess.query("time >= '2020-03-16' & time <= '2020-03-20' & locale == @i")['pct_access'].mean() - 1) * 100, 1)
    cov_imp2.loc[idx, 'mean_eng'] = round(loc_eng.query("time >= '2020-03-09' & time <= '2020-03-13' & locale == @i")['engagement_index'].mean(), 1)
    cov_imp2.loc[idx, '1w_eng_change%'] = round((loc_eng.query("time >= '2020-03-16' & time <= '2020-03-20' & locale == @i")['engagement_index'].mean() / cov_imp2.loc[idx, 'mean_eng'] - 1) * 100, 1)
    cov_imp2.loc[idx, '2w_eng_change%'] = round((loc_eng.query("time >= '2020-03-23' & time <= '2020-03-27' & locale == @i")['engagement_index'].mean() / loc_eng.query("time >= '2020-03-16' & time <= '2020-03-20' & locale == @i")['engagement_index'].mean() - 1) * 100, 1)

cov_imp2.style.applymap(color_values, subset = slice_).format(precision=1).set_properties(**{'background-color': '#fafafa'}, subset=slice_2).set_properties(**{'background-color': '#f7f7f7'}, subset=slice_3)