In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import plotly.offline as py


import warnings
warnings.simplefilter(action='ignore', category=Warning)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Rethinking Family Engagement During School Closures

Author: DR. RACHAEL MAHMOOD - APRIL 27, 2020

"Taking time to check assumptions about family engagement can make a huge difference in the lives of your students and their caregivers."

"Research confirms that family involvement positively impacts students’ academic experiences. And in this moment of crisis, especially, engaging our students means engaging their families: Including caretakers is one way to support our students from a distance."

"More often, however, the examples caregivers used to describe their engagement did not align with what educators frequently identify as family involvement. Parents and guardians described teaching cultural lessons and supporting social emotional learning, for example. They engaged with their children’s schooling by relocating to change school districts and navigating social services to ensure their children’s needs were met."

"When designing engaging lessons, even the most well-intentioned teachers can make incorrect assumptions about family involvement. Not all caregivers are home with their children during this distance learning time. Many students are home alone, are being cared for by older children or are taking care of younger children. Caregivers who are at home may be adjusting to working remotely. Only a small percentage of students have a caregiver free of other responsibilities who can dedicate time to guiding them through their learning packets or navigating websites."

https://www.learningforjustice.org/magazine/rethinking-family-engagement-during-school-closures

# COVID-19 and School Closures: One year of education disruption

![](https://reliefweb.int/sites/reliefweb.int/files/styles/report-small/public/resources-pdf-previews/1557862-COVID19-and-school-closures.png?itok=Rc_AOJ4g)

In [None]:
df=pd.read_csv('/kaggle/input/cusersmarildownloadsattendancecsv/attendance.csv',encoding ='ISO-8859-1',sep=";")
df.tail()

In [None]:
df.isnull().sum()

"Of course caregivers want to be able to help their children. Many may feel guilt that they cannot be there to help with schoolwork or frustration that they can’t navigate the technology their children are required to use. "

"Educators can increase opportunities for family engagement by being flexible. Providing students with menus of activities that they can pick from will help them to be more independent. And many caregivers prefer having a week’s worth of activities up front so that they can plan around their work schedules."

"Instead of expecting caregivers to become teachers, educators can complement family engagement by partnering with caregivers to address children’s social and emotional needs. For families, educators can be the person who checks in on their child. Teachers can be there to talk to their child about all that is going on when they are busy trying to make ends meet. Teachers can be an advocate for their child when there is a need. "

"It’s OK to put the worksheets down. It’s OK to spend time checking on your students. Families trust your expertise—use it to ease their stress rather than add to it during this crisis. Assure parents that you are there to support their children and that their children won’t miss any learning opportunities that cannot be made up when this moment passes."

https://www.learningforjustice.org/magazine/rethinking-family-engagement-during-school-closures

In [None]:
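# Pairwise correlations of the numeric columns only (text columns cannot be correlated)
corr = df[df.columns.sort_values()].corr(numeric_only=True)
# Boolean mask that hides the upper triangle, including the diagonal
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True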

fig = go.Figure(data=go.Heatmap(z=corr.mask(mask),
                                x=corr.columns.values,
                                y=corr.columns.values,
                                xgap=1, ygap=1,
                                colorscale="Rainbow",
                                colorbar_thickness=20,
                                colorbar_ticklen=3,
                                zmid=0),
                layout = go.Layout(title_text='Correlation Matrix', template='plotly_dark',
                height=900,
                xaxis_showgrid=False,
                yaxis_showgrid=False,
                yaxis_autorange='reversed'))
fig.show()
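
The `mask` above uses `np.triu_indices_from` to flag the upper triangle (diagonal included), and `corr.mask(mask)` turns those cells into NaN so the heatmap shows each pair of columns only once. A minimal sketch on a hypothetical three-column DataFrame, not the attendance data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the attendance dataset
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 9], "c": [4, 3, 2, 1]})
corr_toy = toy.corr()

# True on and above the diagonal; DataFrame.mask replaces those cells with NaN
mask_toy = np.zeros_like(corr_toy, dtype=bool)
mask_toy[np.triu_indices_from(mask_toy)] = True
lower = corr_toy.mask(mask_toy)

print(lower)  # only the cells below the diagonal keep their correlation values
```

Because a correlation matrix is symmetric and its diagonal is always 1, the hidden cells carry no extra information.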

In [None]:
# Bivariate analysis: male vs. poorest_20perc
sns.jointplot(x = 'male', y = 'poorest_20perc', data = df, kind = 'reg');

In [None]:
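from scipy.stats import norm
# Histogram of poorest_20perc with a fitted normal curve on top
# (sns.distplot is deprecated; histplot plus norm.pdf draws the same overlay)
vals = df['poorest_20perc'].dropna()
sns.histplot(vals, stat='density')
xs = np.linspace(vals.min(), vals.max(), 200)
plt.plot(xs, norm.pdf(xs, *norm.fit(vals)))
plt.title('Distribution of poorest_20perc');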
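
The normal curve drawn over the histogram is a maximum-likelihood fit of a normal distribution. For the normal family, SciPy's `norm.fit` returns the sample mean and the standard deviation computed with `ddof=0`. The sketch below checks this on a hypothetical random sample standing in for a column such as `poorest_20perc`:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample, not the attendance data
rng = np.random.default_rng(99)
sample = rng.normal(loc=50.0, scale=10.0, size=1_000)

# Maximum-likelihood estimates (loc, scale) of a normal distribution
loc, scale = norm.fit(sample)
print(round(loc, 2), round(scale, 2))
```

The `(loc, scale)` pair is what gets unpacked into `norm.pdf` to draw the fitted curve.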

In [None]:
y = df.loc[~df.total.isnull()][['total', 'male', 'female']]

y_log=np.log1p(y)
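
`np.log1p` computes log(1 + x), which stays finite at zero (plain `np.log` would return `-inf`) and compresses large values; `np.expm1` undoes the transform. A short sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical values, including a zero
vals = pd.Series([0.0, 9.0, 99.0])

logged = np.log1p(vals)       # log(1 + x): 0 -> 0.0, 9 -> log(10), 99 -> log(100)
restored = np.expm1(logged)   # exp(x) - 1 maps the values back to the original scale

print(logged.round(3).tolist())
```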

In [None]:
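#Code by Oniel Gracious https://www.kaggle.com/onielg/simplecatboostwithclassifierchains/notebook

# Histogram with fitted normal curve and normal probability (Q-Q) plot, side by side, for each log-transformed column
from scipy import stats
from scipy.stats import norm

target = ['total', 'male', 'female']
for t in target:
    vals = y_log[t].dropna()
    fig, (ax_hist, ax_qq) = plt.subplots(1, 2, figsize=(12, 4))
    sns.histplot(vals, stat='density', ax=ax_hist)
    xs = np.linspace(vals.min(), vals.max(), 200)
    ax_hist.plot(xs, norm.pdf(xs, *norm.fit(vals)))
    ax_hist.set_title(t)
    stats.probplot(vals, plot=ax_qq)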
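
Besides drawing the Q-Q plot, `stats.probplot` returns the least-squares line through the points, including the correlation coefficient `r`. As a rough guide, `r` close to 1 means the points hug a straight line, i.e. the data look close to normal. The sketch compares a hypothetical normal sample with a hypothetical right-skewed one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(99)
normal_like = rng.normal(size=500)    # hypothetical, roughly normal
skewed = rng.exponential(size=500)    # hypothetical, right-skewed

# probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r))
_, (_, _, r_normal) = stats.probplot(normal_like, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed, dist="norm")

print(round(r_normal, 3), round(r_skewed, 3))
```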

In [None]:
#Code by Rossin Endrew https://www.kaggle.com/endrewrossin/fast-initial-lightgbm-model-to-detect-exam-result/comments

import shap
import lightgbm as lgb
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold
import random

In [None]:
SEED = 99
random.seed(SEED)
np.random.seed(SEED)

In [None]:
dfmodel = df.copy()

# Convert each "object" (text) column to integer codes with LabelEncoder
for col in dfmodel.columns[dfmodel.dtypes == 'object']:
    le = LabelEncoder()
    dfmodel[col] = dfmodel[col].astype(str)
    le.fit(dfmodel[col])
    dfmodel[col] = le.transform(dfmodel[col])
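
`LabelEncoder` maps each distinct string to an integer in sorted (alphabetical) order, so the codes carry an arbitrary ordering that tree models such as LightGBM can still split on. A sketch with hypothetical region names:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category labels
regions = ["South Asia", "Europe", "South Asia", "Africa"]

le = LabelEncoder()
codes = le.fit_transform(regions)  # Africa -> 0, Europe -> 1, South Asia -> 2

print(list(le.classes_), codes.tolist())
print(le.inverse_transform([0]))   # maps a code back to its original label
```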

In [None]:
X = dfmodel.drop(['poorest_20perc','attendance'], axis = 1)
y = dfmodel['poorest_20perc']

In [None]:
#Code by Rossin Endrew https://www.kaggle.com/endrewrossin/fast-initial-lightgbm-model-to-detect-exam-result/comments

lgb_params = {
    'objective': 'binary',        # requires labels in {0, 1}
    'metric': 'auc',
    'n_jobs': -1,
    'learning_rate': 0.005,
    'num_leaves': 20,
    'max_depth': -1,
    'subsample': 0.9,             # alias of bagging_fraction; no effect unless bagging_freq > 0
    'n_estimators': 2500,
    'seed': SEED,
    'early_stopping_rounds': 100,
}

In [None]:
#Code by Rossin Endrew https://www.kaggle.com/endrewrossin/fast-initial-lightgbm-model-to-detect-exam-result/comments

# Choose the number of folds and create variables to accumulate the AUC scores and best iterations.
K = 5
folds = KFold(K, shuffle = True, random_state = SEED)
best_scorecv= 0
best_iteration=0

# Split the data into folds, create training and validation sets, train the model, and calculate the mean AUC.
for fold , (train_index,test_index) in enumerate(folds.split(X, y)):
    print('Fold:',fold+1)
          
    X_traincv, X_testcv = X.iloc[train_index], X.iloc[test_index]
    y_traincv, y_testcv = y.iloc[train_index], y.iloc[test_index]
    
    train_data = lgb.Dataset(X_traincv, y_traincv)
    val_data   = lgb.Dataset(X_testcv, y_testcv)
    
    LGBM = lgb.train(lgb_params, train_data, valid_sets=[train_data, val_data],
                     callbacks=[lgb.log_evaluation(250)])  # log train/valid AUC every 250 rounds
    best_scorecv += LGBM.best_score['valid_1']['auc']
    best_iteration += LGBM.best_iteration

best_scorecv /= K
best_iteration /= K
print('\n Mean AUC score:', best_scorecv)
print('\n Mean best iteration:', best_iteration)
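
With `KFold(K, shuffle=True, random_state=SEED)`, every row lands in the validation set of exactly one fold, which is why averaging the K validation AUCs gives a cross-validated estimate that covers the whole dataset. A small sketch on a hypothetical 10-row array:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical 10-row feature matrix standing in for X
X_toy = np.arange(20).reshape(10, 2)
folds_toy = KFold(n_splits=5, shuffle=True, random_state=99)

seen = []        # every validation index, across all folds
val_sizes = []   # number of rows held out in each fold
for train_idx, val_idx in folds_toy.split(X_toy):
    # Within a fold, training and validation rows never overlap
    assert set(train_idx).isdisjoint(val_idx)
    seen.extend(val_idx.tolist())
    val_sizes.append(len(val_idx))

# Each row is validated exactly once across the 5 folds
print(sorted(seen) == list(range(10)), val_sizes)
```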

In [None]:
#Code by Rossin Endrew https://www.kaggle.com/endrewrossin/fast-initial-lightgbm-model-to-detect-exam-result/comments

lgb_params = {
    'objective': 'binary',
    'metric': 'auc',
    'n_jobs': -1,
    'learning_rate': 0.005,       # same rate as in CV, so the averaged best_iteration still applies
    'num_leaves': 20,
    'max_depth': -1,
    'subsample': 0.9,
    'n_estimators': round(best_iteration),
    'seed': SEED,
}

# Train the final model on the full dataset (not on the last fold's training split)
train_data_final = lgb.Dataset(X, y)
LGBM = lgb.train(lgb_params, train_data_final)

In [None]:
print(LGBM)

In [None]:
# Tell SHAP which model to explain
explainer = shap.TreeExplainer(LGBM)
# Calculating the Shap values of X features
shap_values = explainer.shap_values(X)

In [None]:
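# Depending on the SHAP version, a binary LightGBM model yields either a list (one array per class) or a single array
sv = shap_values[1] if isinstance(shap_values, list) else shap_values
shap.summary_plot(sv, X, plot_type="bar")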

A variable importance plot lists the variables in descending order of their mean absolute SHAP value. The top variables contribute more to the model's predictions than the bottom ones and thus carry more predictive power in this model.

https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d
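
The bar chart ranks features by the mean absolute SHAP value of each column. Taking the absolute value matters, because positive and negative contributions would otherwise cancel out. A sketch on a hypothetical SHAP matrix (made-up numbers, not output from the model above):

```python
import numpy as np
import pandas as pd

# Hypothetical SHAP matrix: 4 observations x 3 features
shap_matrix = np.array([
    [0.5, -0.1, 0.0],
    [-0.7, 0.2, 0.1],
    [0.3, -0.3, 0.0],
    [-0.5, 0.2, -0.1],
])

# Mean absolute SHAP value per feature, largest first
importance = (
    pd.Series(np.abs(shap_matrix).mean(axis=0), index=["feat_a", "feat_b", "feat_c"])
    .sort_values(ascending=False)
)
print(importance)
```

Here `feat_a` has a signed mean of -0.1, yet the largest mean absolute value, so it still ranks first.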

# The SHAP value plot

The SHAP value plot can further show the positive and negative relationships of the predictors with the target variable. Calling `shap.summary_plot` without `plot_type="bar"` produces the dot plot below.

https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d

In [None]:
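# Positive-class SHAP values (a list per class in some SHAP versions, a single array in others)
sv = shap_values[1] if isinstance(shap_values, list) else shap_values
shap.summary_plot(sv, X)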

"The plot above shows one dot per observation in the training data. It demonstrates the following information:"

"Feature importance: Variables are ranked in descending order."

"Impact: The horizontal location shows whether the effect of that value is associated with a higher or lower prediction."

"Original value: Color shows whether that variable is high (in red) or low (in blue) for that observation."

"Correlation: A high level of the “alcohol” content has a high and positive impact on the quality rating. The “high” comes from the red color, and the “positive” impact is shown on the X-axis." (This example refers to the wine-quality model in the source article.) Similarly, in the plot above, high values of “richest-20perc” appear to push the prediction down, so this feature looks negatively associated with the target variable.

https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d
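
One rough way to put a number on the direction read off the dot plot is to correlate a feature's values with its SHAP values: a negative coefficient means high values (red dots) tend to sit on the negative side of the SHAP axis. This is a simple heuristic, not a SHAP library function, and the arrays below are hypothetical:

```python
import numpy as np

# Hypothetical values of one feature and its SHAP values for 5 observations
feature_values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
shap_for_feature = np.array([0.4, 0.2, 0.0, -0.2, -0.4])

# Pearson correlation between the feature and its contribution to the prediction
r = np.corrcoef(feature_values, shap_for_feature)[0, 1]
print(round(r, 3))  # negative: higher feature values push the prediction down
```

On the real data, the same idea would pair a column of `X` with the matching column of the positive-class SHAP array; a coefficient like this only summarizes a monotonic trend and can hide non-linear patterns visible in the plot.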

In [None]:
df.dtypes