#Situation Overview and Humanitarian Needs

Senegal reported its first confirmed COVID-19 case on March 2. As of June 22, there were 5,970 confirmed cases, of which 3,953 had fully recovered and 86 had died, with a significant increase in cases in recent weeks. Of 79 health districts, 51 are now affected, with the highest concentration of cases in the regions of Dakar, Thiès, Diourbel and Sédhiou. The Senegalese government is leading the response and prevention work with the support of key partners including UNICEF. Many preventive measures are in place, including a state of national emergency, school closures, a night curfew and closed borders. In mid-June the government lifted a few restrictions, such as the ban on inter-regional travel, and re-opened some public spaces. Schools are expected to partially reopen on 25 June, starting with examination classes.

Recently, the government shifted its treatment strategy for people who test positive for COVID-19. Instead of treating all positive cases in hospitals, those with no or only light symptoms are now treated in other centres (mainly hotels). UNICEF provides technical support to the Ministry of Health in this process.

A large delivery of personal protective equipment, thermometers, oxygen concentrators and hygiene items, recently brought into the country by UNICEF, is now being distributed to the regions most in need. UNICEF continues to work closely with the Ministry of Education to prepare schools for the re-opening of exam classes (planned for 25 June), for example by providing hygiene supplies and handwashing stands.

UNICEF also continues to support the return of ‘talibé’ children in the daaras (Quranic schools) to their families, as part of the coordinated initiative "Zero street children in the context of COVID-19". UNICEF contributed food support to children in the daaras, and continued to find alternative care for children without parental care and to provide psychosocial care to all vulnerable children. Source: https://reliefweb.int/report/senegal/senegal-covid-19-situation-report-06-29-may-22-june-2020

#Scouts du Senegal

![](https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcQKxvPPzXQpGL-PXzh781eq7-UC62jda-KYrw&usqp=CAU)m.facebook.com

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

#Senegal Uses U.S.-Donated Field Hospitals in Fight Against COVID-19 

In October 2019, the United States donated two deployable field hospitals to the Senegalese Armed Forces through the Africa Peacekeeping Rapid Response Partnership. The $6.5 million donation of the hospitals and associated medical training is designed to support the Senegalese military by offering heightened medical capacity in the field. This investment is already paying dividends by ensuring access to healthcare and treatment for the people of Senegal: one of these hospitals, along with the trained medical personnel, has been deployed to the Senegalese city of Touba to assist in the emergency response to the COVID-19 outbreak.

![](https://d2v9ipibika81v.cloudfront.net/uploads/sites/209/cov1-1.jpg)https://sn.usembassy.gov/senegal-uses-u-s-donated-field-hospitals-in-fight-against-covid-19/

In [None]:
import matplotlib.gridspec as gridspec
from scipy.stats import skew
from sklearn.preprocessing import RobustScaler,MinMaxScaler
from scipy import stats
import matplotlib.style as style
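# Matplotlib 3.6+ renamed the bundled seaborn styles to 'seaborn-v0_8-*'; fall back to the old name otherwise
style.use('seaborn-v0_8-colorblind' if 'seaborn-v0_8-colorblind' in style.available else 'seaborn-colorblind')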

In [None]:
df = pd.read_csv('../input/hackathon/task_2-owid_covid_data-21_June_2020.csv')
df.head()

In [None]:
senegal = df[(df['location']=='Senegal')].reset_index(drop=True)
senegal.head()

In [None]:
# Distributions of total cases, total deaths and total cases per million
fig , ax = plt.subplots(1,3,figsize = (12,5))

total_cases = senegal.total_cases.values
total_deaths = senegal.total_deaths.values
total_cases_per_million = senegal.total_cases_per_million.values

sns.distplot(total_cases , ax = ax[0] , color = 'blue').set_title('Senegal Covid19 Total Cases' , fontsize = 14)
sns.distplot(total_deaths , ax = ax[1] , color = 'cyan').set_title('Senegal Covid19 Deaths' , fontsize = 14)
sns.distplot(total_cases_per_million , ax = ax[2] , color = 'purple').set_title('Senegal Covid19 Cases per Million' , fontsize = 14)

plt.show()

In [None]:
fig , ((ax1,ax2),(ax3,ax4)) = plt.subplots(nrows = 2, ncols = 2, figsize = (14,6))

sns.violinplot(x = 'total_cases' , y = 'new_cases' , data = senegal , ax = ax1 , palette = 'Set2')
sns.violinplot(x = 'total_cases' , y = 'total_deaths' , data = senegal , ax = ax2 , palette = 'Set2')
sns.boxplot(x = 'total_cases' , y = 'total_cases_per_million', data = senegal, ax = ax3 , palette = 'Set2')
sns.boxplot(x = 'total_cases',y = 'new_cases_per_million', data = senegal, ax = ax4, palette = 'Set2')

In [None]:
n_r=0.6                # Null-ratio threshold: drop a column if more than 60% of its values are null
s_r=0.50               # Skewness threshold: transform a column if its skewness exceeds 0.50
c_r=1                  # Remove correlated columns
n_f= df.shape[1]  # Number of features to keep. df.shape[1] keeps all columns; set it to 10 to select the 10 features most correlated with the target
r_s=42                  # Random seed

#Code from Mehmet Sungur https://www.kaggle.com/medyasun/house-price-all-regressor-algorithms

#Fill null values with mode/median (mode for categorical features, median for numeric features)

In [None]:
# Fill categorical columns with their most frequent value
cat=df.select_dtypes("object")
for column in cat:
    df[column] = df[column].fillna(df[column].mode()[0])

# Fill numeric columns with their median; the target total_cases is left untouched
fl=df.select_dtypes(["float64","int64"]).drop("total_cases",axis=1)
for column in fl:
    df[column] = df[column].fillna(df[column].median())
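
As a small illustration of the mode/median rule, here is a sketch on a toy frame. The column values are invented for the example and are not taken from the dataset; the dtype check via `pd.api.types.is_numeric_dtype` is a choice made for this sketch rather than what the cell above does.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "continent": ["Africa", "Africa", None, "Europe"],
    "new_cases": [10.0, np.nan, 30.0, 1000.0],
})

for column in toy.columns:
    if pd.api.types.is_numeric_dtype(toy[column]):
        fill_value = toy[column].median()    # robust to the extreme value 1000
    else:
        fill_value = toy[column].mode()[0]   # most frequent category
    # Assign the result back instead of relying on inplace=True
    toy[column] = toy[column].fillna(fill_value)

print(toy)
```

The median of the observed `new_cases` values is 30, whereas the mean would be pulled up to about 347 by the single large value, which is one reason to prefer the median for skewed counts.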

In [None]:
# categorical features
categorical_feat = [feature for feature in df.columns if df[feature].dtypes=='O']
print('Total categorical features: ', len(categorical_feat))
print('\n',categorical_feat)

#Label Encoding

In [None]:
from sklearn import preprocessing
encoder = preprocessing.LabelEncoder()
df["iso_code"] = encoder.fit_transform(df["iso_code"].fillna('Nan'))
df["continent"] = encoder.fit_transform(df["continent"].fillna('Nan'))
df["location"] = encoder.fit_transform(df["location"].fillna('Nan'))
df["date"] = encoder.fit_transform(df["date"].fillna('Nan'))
df["tests_units"] = encoder.fit_transform(df["tests_units"].fillna('Nan'))
df.head()
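
To see what label encoding does to a column, here is a minimal sketch on a toy frame (values invented for illustration). It keeps one fitted encoder per column so that codes can be mapped back to labels later; that bookkeeping is an assumption about what is convenient, not something the cell above does.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "continent": ["Africa", "Europe", "Africa", None],
    "location": ["Senegal", "France", "Mali", "Senegal"],
})

encoders = {}  # one fitted encoder per column
for column in ["continent", "location"]:
    enc = LabelEncoder()
    toy[column + "_code"] = enc.fit_transform(toy[column].fillna("Nan"))
    encoders[column] = enc

# Classes are stored in sorted order, and the integer codes follow that order
location_classes = list(encoders["location"].classes_)
decoded = encoders["location"].inverse_transform(toy["location_code"])
print(location_classes, list(decoded))
```

Note that the integer codes impose an arbitrary ordering on the categories, which tree-based models tolerate better than linear ones.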

In [None]:
def plotting_3_chart(df, feature): 
    ## Create a customized figure with a constrained layout.
    fig = plt.figure(constrained_layout=True, figsize=(10,6))
    ## Create a grid of 3 columns and 3 rows.
    grid = gridspec.GridSpec(ncols=3, nrows=3, figure=fig)
    #gs = fig3.add_gridspec(3, 3)

    ## Customizing the histogram grid. 
    ax1 = fig.add_subplot(grid[0, :2])
    ## Set the title. 
    ax1.set_title('Histogram')
    ## plot the histogram. 
    sns.distplot(df.loc[:,feature], norm_hist=True, ax = ax1)

    # customizing the QQ_plot. 
    ax2 = fig.add_subplot(grid[1, :2])
    ## Set the title. 
    ax2.set_title('QQ_plot')
    ## Plotting the QQ_Plot. 
    stats.probplot(df.loc[:,feature], plot = ax2)

    ## Customizing the Box Plot. 
    ax3 = fig.add_subplot(grid[:, 2])
    ## Set title. 
    ax3.set_title('Box Plot')
    ## Plotting the box plot. 
    sns.boxplot(df.loc[:,feature], orient='v', ax = ax3 );
 

print('Skewness: '+ str(df['total_cases'].skew())) 
print("Kurtosis: " + str(df['total_cases'].kurt()))
plotting_3_chart(df, 'total_cases')

#The target is skewed, so it needs a transformation. Mehmet used a log transform, but you can try other transformations.

In [None]:
#log transform the target:
df["total_cases"] = np.log1p(df["total_cases"])

In [None]:
print('Skewness: '+ str(df['total_cases'].skew()))   
print("Kurtosis: " + str(df['total_cases'].kurt()))
plotting_3_chart(df, 'total_cases')
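
The effect of the log transform on skewness can be checked in isolation. The sketch below uses synthetic log-normal data as a stand-in for a cumulative case column (an assumption made purely for illustration) and also shows that `np.expm1` undoes `np.log1p`, which is needed to turn predictions on the log scale back into counts.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic right-skewed values standing in for a cumulative case column
raw = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=1000))

logged = np.log1p(raw)          # log(1 + x), defined at x = 0
skew_before = raw.skew()
skew_after = logged.skew()

# expm1 is the inverse of log1p, so values can be mapped back to counts
restored = np.expm1(logged)

print("Skewness before:", skew_before)
print("Skewness after: ", skew_after)
```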

#Auto Detect Outliers

In [None]:
train_o=df[df["total_cases"].notnull()]
from sklearn.neighbors import LocalOutlierFactor
def detect_outliers(x, y, top=5, plot=True):
    lof = LocalOutlierFactor(n_neighbors=40, contamination=0.1)
    x_ =np.array(x).reshape(-1,1)
    preds = lof.fit_predict(x_)
    lof_scr = lof.negative_outlier_factor_
    out_idx = pd.Series(lof_scr).sort_values()[:top].index
    if plot:
        f, ax = plt.subplots(figsize=(9, 6))
        plt.scatter(x=x, y=y, c=np.exp(lof_scr), cmap='RdBu')
    return out_idx

outs = detect_outliers(train_o['total_deaths'], train_o['total_cases'],top=5)
outs
plt.show()

In [None]:
outs
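
For intuition about how `LocalOutlierFactor` ranks points, here is a sketch on synthetic one-dimensional data with two planted extreme values. The data and the choice of two planted points are assumptions for illustration; the `n_neighbors` and `contamination` settings mirror the function above.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
# 200 synthetic inliers around 0 plus two planted extreme values (positions 200 and 201)
values = np.concatenate([rng.normal(0.0, 1.0, 200), [25.0, -30.0]])

lof = LocalOutlierFactor(n_neighbors=40, contamination=0.1)
labels = lof.fit_predict(values.reshape(-1, 1))   # -1 = outlier, 1 = inlier
scores = lof.negative_outlier_factor_             # more negative = more anomalous

# Positions of the two most anomalous points
top2 = pd.Series(scores).sort_values()[:2].index.tolist()
print(top2)
```

The returned positions are positional (0-based row offsets), not index labels, which matters when they are later used to drop rows.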

#Detect and Remove outliers

In [None]:
from collections import Counter
outliers=outs
all_outliers=[]
numeric_features = train_o.dtypes[train_o.dtypes != 'object'].index
for feature in numeric_features:
    try:
        outs = detect_outliers(train_o[feature], train_o['total_cases'],top=5, plot=False)
    except Exception:  # skip features that LOF cannot process
        continue
    all_outliers.extend(outs)

print(Counter(all_outliers).most_common())
for i in outliers:
    if i in all_outliers:
        print(i)
train_o = train_o.drop(train_o.index[outliers])
test_o=df[df["total_cases"].isna()]
df =  pd.concat(objs=[train_o, test_o], axis=0,sort=False).reset_index(drop=True)

#Check Skewness and fit transformations if needed.

In [None]:
from scipy.special import boxcox1p
from scipy.stats import boxcox
lam = 0.15

#log transform skewed numeric features:
numeric_feats = df.dtypes[df.dtypes != "object"].index

skewed_feats = df[numeric_feats].apply(lambda x: skew(x.dropna())) #compute skewness
skewed_feats = skewed_feats[skewed_feats > s_r]
skewed_feats = skewed_feats.index

df[skewed_feats] = boxcox1p(df[skewed_feats],lam)
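
For reference, `boxcox1p(x, lam)` computes $((1 + x)^{\lambda} - 1) / \lambda$ for $\lambda \neq 0$ and reduces to `log1p(x)` at $\lambda = 0$. The sketch below checks this on a few invented values and recovers the originals with `scipy.special.inv_boxcox1p`.

```python
import numpy as np
from scipy.special import boxcox1p, inv_boxcox1p

lam = 0.15                                # same lambda as above
x = np.array([0.0, 1.0, 10.0, 1000.0])    # invented values

transformed = boxcox1p(x, lam)
manual = ((1.0 + x) ** lam - 1.0) / lam   # closed form for lam != 0
recovered = inv_boxcox1p(transformed, lam)

print(transformed)
print(recovered)
```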

#Check that no missing values remain

In [None]:
df.columns[df.isnull().any()]

#Check Correlation between features and remove features with high correlations.

In [None]:
train_heat=df[df["total_cases"].notnull()]
train_heat=train_heat.drop(["total_deaths"],axis=1)
style.use('ggplot')
sns.set_style('whitegrid')
plt.subplots(figsize = (20,16))
## Plotting heatmap. 

# Generate a mask for the upper triangle (taken from seaborn example gallery)
mask = np.zeros_like(train_heat.corr(), dtype=bool)  # np.bool was removed in NumPy 1.24; use the builtin bool
mask[np.triu_indices_from(mask)] = True


sns.heatmap(train_heat.corr(), 
            cmap=sns.diverging_palette(255, 133, l=60, n=7), 
            mask = mask, 
            annot=True, 
            center = 0, 
           );
## Give title. 
plt.title("Heatmap of all the Features", fontsize = 30);
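
The triangular mask can be inspected without plotting. In this sketch, on a three-column toy frame with invented values, cells where the mask is `True` are the ones the heatmap leaves blank (the upper triangle, including the diagonal).

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 9], "c": [4, 3, 2, 1]})
corr = toy.corr()

mask = np.zeros(corr.shape, dtype=bool)
mask[np.triu_indices_from(mask)] = True   # True = hidden cell

visible = corr.mask(mask)   # NaN wherever the heatmap would be blank
print(visible)
```

Only the three entries below the diagonal remain, so each pairwise correlation appears exactly once.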

#Remove correlated features

In [None]:
feature_corr = train_heat.corr().abs()
target_corr=df.corr()["total_cases"].abs()
target_corr=pd.DataFrame(target_corr)
target_corr=target_corr.reset_index()
feature_corr_unstack= feature_corr.unstack()
df_fc=pd.DataFrame(feature_corr_unstack,columns=["corr"])
df_fc=df_fc[(df_fc["corr"]>=.80)&(df_fc["corr"]<1)].sort_values(by="corr",ascending=False)
df_dc=df_fc.reset_index()

#df_dc=pd.melt(df_dc, id_vars=['corr'], var_name='Name')
target_corr=df_dc.merge(target_corr, left_on='level_1', right_on='index',
          suffixes=('_left', '_right'))

cols=target_corr["level_0"].values

target_corr
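
The cell above lists highly correlated pairs but does not drop anything. One possible way to act on such a table is sketched below: for each pair whose absolute correlation is at least the threshold, drop the member that is less correlated with the target. The helper name `drop_correlated` and the synthetic data are hypothetical, and this tie-breaking rule is an assumption, not the original author's method.

```python
import numpy as np
import pandas as pd

def drop_correlated(frame, target, threshold=0.80):
    """Hypothetical helper: for each feature pair with |corr| >= threshold,
    drop the member that is less correlated with the target."""
    corr = frame.corr().abs()
    target_corr = corr[target].drop(target)
    features = list(target_corr.index)
    to_drop = set()
    for i, left in enumerate(features):
        for right in features[i + 1:]:
            if left in to_drop or right in to_drop:
                continue
            if corr.loc[left, right] >= threshold:
                weaker = left if target_corr[left] < target_corr[right] else right
                to_drop.add(weaker)
    return frame.drop(columns=sorted(to_drop)), sorted(to_drop)

# Synthetic data: x2 is a noisy copy of x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
toy = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=300),
    "x3": rng.normal(size=300),
})
toy["y"] = 2.0 * toy["x1"] + toy["x3"]

reduced, dropped = drop_correlated(toy, "y")
print("Dropped:", dropped)
```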

#Remove features with low variance

In [None]:
all_features = df.keys()
# Removing features.
df = df.drop(df.loc[:,(df==0).sum()>=(df.shape[0]*0.9994)],axis=1)
df = df.drop(df.loc[:,(df==1).sum()>=(df.shape[0]*0.9994)],axis=1) 
# Getting and printing the remaining features.
remain_features = df.keys()
remov_features = [st for st in all_features if st not in remain_features]
print(len(remov_features), 'features were removed:', remov_features)
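
To show how strict the 0.9994 ratio is, here is a sketch on a toy frame with 1,000 rows (values invented for illustration). A column with a single non-zero value survives the filter; only the fully constant column is removed.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "mostly_zero": [0] * 999 + [5],
    "all_zero": [0] * 1000,
    "varied": list(range(1000)),
})

threshold = 0.9994 * len(toy)        # same ratio as above: 999.4 rows
zero_counts = (toy == 0).sum()       # number of zeros per column
near_constant = zero_counts[zero_counts >= threshold].index.tolist()
kept = toy.drop(columns=near_constant)

print("Removed:", near_constant)
```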

#Create regression models and compare the accuracy to our best regressor.

In [None]:
train=df[df["total_cases"].notnull()]
test=df[df["total_cases"].isna()]

In [None]:
k = n_f # set k to 10 to use only the 10 features most correlated with the target
corrmat=abs(df.corr())
cols = corrmat.nlargest(k, 'total_cases')['total_cases'].index
# Build the training data from rows with a known target only
train_x=train[cols].drop("total_cases",axis=1)
train_y=train["total_cases"]
X_test=test[cols].drop("total_cases",axis=1)

#Classic Train Test Split

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(train_x, train_y, test_size=0.20, random_state=r_s)

#Do you know the names of all the models in scikit-learn? I have only just learned them.

In [None]:
from sklearn.utils import all_estimators  # moved here from sklearn.utils.testing, which was removed in scikit-learn 0.24
from sklearn import base

estimators = all_estimators()

for name, class_ in estimators:
    if issubclass(class_, base.RegressorMixin):
        print(name+"()")

In [None]:
np.random.seed(seed=r_s)

from sklearn.metrics import mean_squared_error,mean_absolute_error
from sklearn.ensemble import GradientBoostingRegressor,RandomForestRegressor,AdaBoostRegressor,ExtraTreesRegressor,HistGradientBoostingRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from xgboost import XGBRegressor
from sklearn.linear_model import Ridge,RidgeCV,BayesianRidge,LinearRegression,Lasso,LassoCV,ElasticNet,RANSACRegressor,HuberRegressor,PassiveAggressiveRegressor,ElasticNetCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import VotingRegressor
from sklearn.svm import SVR
from sklearn.kernel_ridge import KernelRidge
from sklearn.cross_decomposition import CCA
from sklearn.neural_network import MLPRegressor



my_regressors=[ 
               ElasticNet(alpha=0.001,l1_ratio=0.70,max_iter=100,tol=0.01, random_state=r_s),
               ElasticNetCV(l1_ratio=0.9,max_iter=100,tol=0.01,random_state=r_s),
               CatBoostRegressor(logging_level='Silent',random_state=r_s),
               GradientBoostingRegressor(n_estimators=3000, learning_rate=0.05, max_depth=4, max_features='sqrt', min_samples_leaf=15, min_samples_split=10, loss='huber',random_state =r_s),
               LGBMRegressor(objective='regression', 
                                       num_leaves=4,
                                       learning_rate=0.01, 
                                       n_estimators=5000,
                                       max_bin=200, 
                                       bagging_fraction=0.75,
                                       bagging_freq=5, 
                                       bagging_seed=7,
                                       feature_fraction=0.2,
                                       feature_fraction_seed=7,
                                       verbose=-1,
                                       random_state=r_s
                                       ),
               RandomForestRegressor(random_state=r_s),
               AdaBoostRegressor(random_state=r_s),
               ExtraTreesRegressor(random_state=r_s),
               SVR(C= 20, epsilon= 0.008, gamma=0.0003),
               Ridge(alpha=6),
               RidgeCV(),
               BayesianRidge(),
               DecisionTreeRegressor(),
               LinearRegression(),
               KNeighborsRegressor(),
               Lasso(alpha=0.00047,random_state=r_s),
               LassoCV(),
               KernelRidge(),
               CCA(),
               MLPRegressor(random_state=r_s),
               HistGradientBoostingRegressor(random_state=r_s),
               HuberRegressor(),
               RANSACRegressor(random_state=r_s),
               PassiveAggressiveRegressor(random_state=r_s)
               #XGBRegressor(random_state=r_s)
              ]

regressors=[]

for my_regressor in my_regressors:
    regressors.append(my_regressor)


scores_val=[]
scores_train=[]
MAE=[]
MSE=[]
RMSE=[]


for regressor in regressors:
    regressor.fit(X_train,y_train)   # fit once and reuse the fitted model for every metric
    scores_val.append(regressor.score(X_val,y_val))
    scores_train.append(regressor.score(X_train,y_train))
    y_pred=regressor.predict(X_val)
    MAE.append(mean_absolute_error(y_val,y_pred))
    MSE.append(mean_squared_error(y_val,y_pred))
    RMSE.append(np.sqrt(mean_squared_error(y_val,y_pred)))

    
results=zip(scores_val,scores_train,MAE,MSE,RMSE)
results=list(results)
results_score_val=[item[0] for item in results]
results_score_train=[item[1] for item in results]
results_MAE=[item[2] for item in results]
results_MSE=[item[3] for item in results]
results_RMSE=[item[4] for item in results]


df_results=pd.DataFrame({"Algorithms":my_regressors,"Training Score":results_score_train,"Validation Score":results_score_val,"MAE":results_MAE,"MSE":results_MSE,"RMSE":results_RMSE})
df_results
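
A comparison table like this is easier to read when it is sorted and uses class names rather than full estimator reprs. The sketch below shows one way to do that with three regressors on synthetic data from `make_regression`; the data, the choice of models and the variable names are assumptions for illustration and say nothing about results on the COVID-19 dataset.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real features and target
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.20, random_state=42)

rows = []
for model in [LinearRegression(), Ridge(alpha=6), DecisionTreeRegressor(random_state=42)]:
    model.fit(X_tr, y_tr)                      # fit once, reuse for every metric
    pred = model.predict(X_va)
    mse = mean_squared_error(y_va, pred)
    rows.append({
        "Algorithm": type(model).__name__,     # readable name instead of the repr
        "Training Score": model.score(X_tr, y_tr),
        "Validation Score": model.score(X_va, y_va),
        "MAE": mean_absolute_error(y_va, pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    })

# Lowest validation error first
ranking = pd.DataFrame(rows).sort_values("RMSE").reset_index(drop=True)
print(ranking)
```

A large gap between the training and validation scores in such a table is a quick sign of overfitting.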

#There are no missing values, although the cell above returned an error.

In [None]:
ls ../input/hackathon/task_1-google_search_txt_files_v2/SN/

In [None]:
Senegal = '../input/hackathon/task_1-google_search_txt_files_v2/SN/Senegal-fr-result-31.txt'

In [None]:
# Read the whole file and close it afterwards; undecodable bytes are skipped
with open(Senegal, 'r', encoding='utf-8', errors='ignore') as f:
    text = f.read()

In [None]:
print(text[:2000])

In [None]:
df1 = pd.read_csv('../input/hackathon/task_2-BCG_world_atlas_data-bcg_strain-7July2020.csv', encoding='utf8')
df1.head()

In [None]:
SEN = df1[(df1['country_name']=='Senegal')].reset_index(drop=True)
SEN.head()

In [None]:
fig, ax = plt.subplots(1,3, figsize = (20,6), sharex=True)
sns.countplot(x='bcg_strain_original',data=SEN, palette="nipy_spectral", ax=ax[0])
sns.countplot(x='bcg_strain_id', palette="YlOrBr", data=SEN,ax=ax[1])
sns.countplot(x='bcg_policy_first_year_original', palette="flag", data=SEN,ax=ax[2])
ax[0].title.set_text('Senegal BCG Original Strain')
ax[1].title.set_text('Senegal BCG Strain ID')
ax[2].title.set_text('Senegal BCG Policy 1st Year')
plt.xticks(rotation=45)
plt.show()

In [None]:
#word cloud
from wordcloud import WordCloud, ImageColorGenerator
text = " ".join(str(each) for each in SEN.country_name)
# Create and generate a word cloud image:
wordcloud = WordCloud(max_words=200,colormap='Set2', background_color="black").generate(text)
plt.figure(figsize=(15,10))
# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

That's all, Kaggle Notebook Runner: Marília Prata  @mpwolke 