# Data Against Covid-19  

# Covid19 : Correlation with the Google Community Mobility Report

### Data

__date__ : day of the observation <br /><br />
__retail_and_recreation_percent_change_from_baseline__ : Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters <br /><br />
__grocery_and_pharmacy_percent_change_from_baseline__ : Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies <br /><br />
__parks_percent_change_from_baseline__ : Mobility trends for places like national parks, public beaches, marinas, dog parks, plazas, and public gardens <br /><br />
__transit_stations_percent_change_from_baseline__ : Mobility trends for places like public transport hubs such as subway, bus, and train stations <br /><br />
__workplaces_percent_change_from_baseline__ : Mobility trends for places of work <br /><br />
__residential_percent_change_from_baseline__ : Mobility trends for places of residence
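

Each value is a percent change relative to Google's baseline (the median for the same day of the week over 3 Jan to 6 Feb 2020), so -88 means 88% below the baseline. Since the six mobility columns share the same suffix, they can be selected without typing every name. A minimal sketch on a hypothetical two-row extract, using `DataFrame.filter`:

```python
import pandas as pd

# Hypothetical two-row extract in the layout of the Google report
df = pd.DataFrame({
    "country_region": ["France", "France"],
    "date": ["2020-03-22", "2020-03-23"],
    "retail_and_recreation_percent_change_from_baseline": [-88.0, -85.0],
    "residential_percent_change_from_baseline": [None, 30.0],
})

# Keep only the columns whose name ends with the shared suffix
mobility = df.filter(regex=r"_percent_change_from_baseline$")
print(list(mobility.columns))
```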

### Remarks

Be careful:
- The column 'country_region' has no null values, but 'country_region_code' does: the code of Namibia is missing (see Annexe 1). That is why we work with 'country_region'.
- Some values are NaN, e.g. Corsica has no 'residential_percent_change_from_baseline' value on 22/03, 29/03, 05/04, 12/04, 19/04 and 26/04 (see Annexe 4).
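
The most likely cause of the Namibia case: Namibia's ISO 3166-1 alpha-2 code is "NA", and `pd.read_csv` treats the string "NA" as a missing value by default. A sketch on two hypothetical rows showing the effect and the `keep_default_na=False` switch:

```python
import io
import pandas as pd

# Two hypothetical rows in the layout of the Google report
sample = io.StringIO(
    "country_region_code,country_region\n"
    "NA,Namibia\n"
    "FR,France\n"
)

# Default parsing: "NA" belongs to pandas' default missing-value markers
default_df = pd.read_csv(sample)
print(default_df["country_region_code"].isnull().tolist())  # [True, False]

# keep_default_na=False disables those markers, so "NA" stays a string
sample.seek(0)
raw_df = pd.read_csv(sample, keep_default_na=False)
print(raw_df["country_region_code"].tolist())  # ['NA', 'FR']
```

On the full report, `keep_default_na=False` would also turn every empty mobility cell into an empty string instead of NaN, which is one reason to keep the default parsing and filter on 'country_region'.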

# Initialisation

In [645]:
print("Starting the program")
import pandas as pd
import urllib.request
import os
import numpy as np

In [517]:
filePath = "./Global_Mobility_Report.csv"
# Countries we are looking for: Denmark, Italy, Germany, Spain, United_Kingdom, France, Norway, Belgium, Austria, Sweden, Switzerland, Greece, Portugal, Netherlands
# Country mapping: a dictionary from the name used by Google to the name used in your database
# Annexe 2 provides a helper to find how a country is spelled in the Google report
# In this example, the only difference is United Kingdom
countries_list =  ['Denmark', 'Italy', 'Germany', 'Spain', 'United Kingdom', 'France', 'Norway', 'Belgium', 'Austria', 'Sweden', 'Switzerland', 'Greece', 'Portugal', 'Netherlands']

countries = {
 'Denmark': 'Denmark',
 'Italy': 'Italy',
 'Germany': 'Germany',
 'Spain': 'Spain',
 'United Kingdom': 'United_Kingdom',
 'France': 'France',
 'Norway': 'Norway',
 'Belgium': 'Belgium',
 'Austria': 'Austria',
 'Sweden': 'Sweden',
 'Switzerland': 'Switzerland',
 'Greece': 'Greece',
 'Portugal': 'Portugal',
 'Netherlands': 'Netherlands',
  }

# Regions we are looking for: Auvergne-Rhône-Alpes, Bourgogne-Franche-Comté, Bretagne, Centre-Val de Loire, Corse, Grand Est, Hauts-de-France, Normandie, Nouvelle-Aquitaine, Occitanie, Pays de la Loire, Provence-Alpes-Côte d'Azur, Île-de-France, France-hopitaux, France-OC19, France-EHPAD
# Region mapping: a dictionary from the name used by Google to the name used in your database
# Annexe 2 provides a helper to find how a region is spelled in the Google report
# Differences: Brittany, Normandy, Corsica

all_france = 'France-OC19'

regions = {
    
    'Auvergne-Rhône-Alpes' : 'Auvergne-Rhône-Alpes',
    'Bourgogne-Franche-Comté' : 'Bourgogne-Franche-Comté',
    'Brittany' : 'Bretagne',
    'Centre-Val de Loire' : 'Centre-Val de Loire',
    'Corsica' : 'Corse',
    'Grand Est' : 'Grand Est',
    'Hauts-de-France' : 'Hauts-de-France',
    'Normandy' : 'Normandie',
    'Nouvelle-Aquitaine' : 'Nouvelle-Aquitaine',
    'Occitanie' : 'Occitanie',
    'Pays de la Loire' : 'Pays de la Loire',
    "Provence-Alpes-Côte d'Azur" : "Provence-Alpes-Côte d'Azur",
    'Île-de-France' : 'Île-de-France'
    
}

regions_code = {
    
    'Auvergne-Rhône-Alpes' : 84,
    'Bourgogne-Franche-Comté' : 27,
    'Bretagne' : 53,
    'Centre-Val de Loire' : 24,
    'Corse' :94,
    'Grand Est' : 44,
    'Hauts-de-France' : 32,
    'Normandie' : 28,
    'Nouvelle-Aquitaine' : 75,
    'Occitanie' : 76,
    'Pays de la Loire' : 52,
    "Provence-Alpes-Côte d'Azur" : 93,
    'Île-de-France' : 11
    
}
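
Since only United Kingdom differs, the identity entries of the country mapping could be generated rather than typed. A sketch, where `country_overrides` is a hypothetical name for a dictionary holding only the differences:

```python
# Names as they appear in the Google report (same list as countries_list)
countries_list = ['Denmark', 'Italy', 'Germany', 'Spain', 'United Kingdom', 'France', 'Norway',
                  'Belgium', 'Austria', 'Sweden', 'Switzerland', 'Greece', 'Portugal', 'Netherlands']

# Hypothetical dictionary holding only the names that differ in our database
country_overrides = {'United Kingdom': 'United_Kingdom'}

# Every country missing from the overrides keeps its Google name
countries = {name: country_overrides.get(name, name) for name in countries_list}
print(countries['United Kingdom'], countries['France'])  # United_Kingdom France
```

Because `Series.replace` leaves unmatched values untouched, only the overrides are strictly needed for the renaming; the full list is still needed for the filtering step.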

## Delete the previous download if it exists

In [772]:
import os.path

if os.path.isfile(filePath):
    os.remove(filePath)
else:
    print("No previous file to delete")
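
With Python 3.8 or later, `pathlib.Path.unlink(missing_ok=True)` does the same check-and-delete in one call. A sketch, where `remove_previous_download` is a hypothetical helper name, tried on a throw-away file:

```python
import tempfile
from pathlib import Path

def remove_previous_download(path):
    """Delete the file if it exists; do nothing otherwise (Python 3.8+)."""
    Path(path).unlink(missing_ok=True)

# Usage on a throw-away file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "Global_Mobility_Report.csv"
    target.write_text("dummy")
    remove_previous_download(target)
    remove_previous_download(target)  # second call: file already gone, no error
    print(target.exists())  # False
```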


## Get the file 

In [773]:
print('Start the download')

try:
    url = 'https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv?cachebust=e0c5a582159f5662'
    urllib.request.urlretrieve(url, filePath)
    print('The file has been correctly downloaded')
except Exception as e:
    print("The file has NOT been correctly downloaded")
    print(e)

Start the download
The file has been correctly downloaded
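
If the download fails halfway, `filePath` may be left with a partial file. A sketch of a safer variant that downloads to a temporary name and only renames it on success, which would also make the deletion cell unnecessary. The `download` helper and the `.part` suffix are our own assumptions:

```python
import os
import urllib.request

def download(url, file_path):
    """Download url to file_path, replacing any previous copy only on success."""
    tmp_path = file_path + ".part"  # hypothetical temporary name
    try:
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, file_path)  # atomic rename on the same filesystem
    finally:
        # Remove a partial download if the transfer or the rename failed
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return file_path

# In the notebook: download(url, filePath)
```

It can be tried offline with a local `file://` URL, e.g. one built with `pathlib.Path(...).as_uri()`.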


## Read the file : Community Mobility Report

In [774]:
print("Start the processing")
csv_df = pd.read_csv(filePath)

Start the processing
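
The 'date' column is read as plain text. Parsing it as a datetime makes date filters and weekday checks easy; for instance, in 2020 the Corsica gaps listed in the remarks all fall on Sundays. A sketch, where `load_report` is a hypothetical helper, on a two-line extract:

```python
import io
import pandas as pd

def load_report(source):
    """Read the report and parse the 'date' column as datetime64."""
    return pd.read_csv(source, parse_dates=["date"])

# Hypothetical two-line extract of the report
sample = io.StringIO(
    "country_region_code,country_region,sub_region_1,date,parks_percent_change_from_baseline\n"
    "FR,France,,2020-02-15,5\n"
    "FR,France,Corsica,2020-03-22,\n"
)
df = load_report(sample)
print(df["date"].dt.day_name().tolist())  # ['Saturday', 'Sunday']

# In the notebook: csv_df = load_report(filePath)
```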


# Filter for the countries that we are interested in  

In [737]:
if len(csv_df[csv_df['country_region'].isin(countries_list)]['country_region'].unique()) != len (countries_list):
    print("Error : You haven't put the countries that you want in the format of the google community report")
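
The check above only says that something is wrong. A sketch that reports which names are missing, where `missing_countries` is a hypothetical helper, tried on an extract with a deliberately misspelled name:

```python
import pandas as pd

def missing_countries(report, wanted):
    """Return the wanted names that never appear in report['country_region']."""
    present = set(report["country_region"].unique())
    return sorted(set(wanted) - present)

# Hypothetical extract: 'United_Kingdom' is not the Google spelling
report = pd.DataFrame({"country_region": ["France", "United Kingdom", "Italy"]})
print(missing_countries(report, ["France", "United_Kingdom"]))  # ['United_Kingdom']
```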
    

In [738]:
df_countries = csv_df[csv_df['country_region'].isin(countries_list)].copy()

In [739]:
df_countries.count()

country_region_code                                   24928
country_region                                        24928
sub_region_1                                          23920
sub_region_2                                              0
date                                                  24928
retail_and_recreation_percent_change_from_baseline    24510
grocery_and_pharmacy_percent_change_from_baseline     24502
parks_percent_change_from_baseline                    21214
transit_stations_percent_change_from_baseline         24156
workplaces_percent_change_from_baseline               24834
residential_percent_change_from_baseline              21404
dtype: int64

## Rename the values to match our database

#### Change the name of the countries

In [740]:
df_countries['country_region'].unique()
#df_countries['country_region'].count()

array(['Austria', 'Belgium', 'Switzerland', 'Germany', 'Denmark', 'Spain',
       'France', 'United Kingdom', 'Greece', 'Italy', 'Netherlands',
       'Norway', 'Portugal', 'Sweden'], dtype=object)

In [741]:
df_countries['country_region'] = df_countries['country_region'].replace(countries)

In [742]:
df_countries['country_region'].unique()
#df_countries['country_region'].count()

array(['Austria', 'Belgium', 'Switzerland', 'Germany', 'Denmark', 'Spain',
       'France', 'United_Kingdom', 'Greece', 'Italy', 'Netherlands',
       'Norway', 'Portugal', 'Sweden'], dtype=object)

#### Fill the NaN values of 'sub_region_1' for France (national-level rows)

In [743]:
df_countries[df_countries['country_region']=='France']['sub_region_1'].unique()


array([nan, 'Auvergne-Rhône-Alpes', 'Bourgogne-Franche-Comté', 'Brittany',
       'Centre-Val de Loire', 'Corsica', 'Grand Est', 'Hauts-de-France',
       'Île-de-France', 'Normandy', 'Nouvelle-Aquitaine', 'Occitanie',
       'Pays de la Loire', "Provence-Alpes-Côte d'Azur"], dtype=object)

In [744]:
df_countries['sub_region_1'].count()

23920

In [745]:
df_countries.loc[(df_countries['country_region']=='France') & (df_countries['sub_region_1'].isnull()), 'sub_region_1'] = all_france

In [746]:
df_countries[df_countries['country_region']=='France']['sub_region_1'].unique()
#df_countries['sub_region_1'].count()

array(['France-OC19', 'Auvergne-Rhône-Alpes', 'Bourgogne-Franche-Comté',
       'Brittany', 'Centre-Val de Loire', 'Corsica', 'Grand Est',
       'Hauts-de-France', 'Île-de-France', 'Normandy',
       'Nouvelle-Aquitaine', 'Occitanie', 'Pays de la Loire',
       "Provence-Alpes-Côte d'Azur"], dtype=object)

#### Change the name of the regions

In [747]:
len(df_countries['sub_region_1'].unique())

335

In [748]:
df_countries['sub_region_1'][df_countries['country_region']=='France'].unique()

array(['France-OC19', 'Auvergne-Rhône-Alpes', 'Bourgogne-Franche-Comté',
       'Brittany', 'Centre-Val de Loire', 'Corsica', 'Grand Est',
       'Hauts-de-France', 'Île-de-France', 'Normandy',
       'Nouvelle-Aquitaine', 'Occitanie', 'Pays de la Loire',
       "Provence-Alpes-Côte d'Azur"], dtype=object)

In [749]:
df_countries['sub_region_1'] = df_countries['sub_region_1'].replace(regions)

In [750]:
len(df_countries['sub_region_1'].unique())

335

In [751]:
df_countries['sub_region_1'][df_countries['country_region']=='France'].unique()

array(['France-OC19', 'Auvergne-Rhône-Alpes', 'Bourgogne-Franche-Comté',
       'Bretagne', 'Centre-Val de Loire', 'Corse', 'Grand Est',
       'Hauts-de-France', 'Île-de-France', 'Normandie',
       'Nouvelle-Aquitaine', 'Occitanie', 'Pays de la Loire',
       "Provence-Alpes-Côte d'Azur"], dtype=object)

#### Add the region code 

In [752]:
df_countries.columns

Index(['country_region_code', 'country_region', 'sub_region_1', 'sub_region_2',
       'date', 'retail_and_recreation_percent_change_from_baseline',
       'grocery_and_pharmacy_percent_change_from_baseline',
       'parks_percent_change_from_baseline',
       'transit_stations_percent_change_from_baseline',
       'workplaces_percent_change_from_baseline',
       'residential_percent_change_from_baseline'],
      dtype='object')

In [753]:
df_countries['sub_region_1_code'] = df_countries['sub_region_1'].map(regions_code)

In [754]:
df_countries.columns

Index(['country_region_code', 'country_region', 'sub_region_1', 'sub_region_2',
       'date', 'retail_and_recreation_percent_change_from_baseline',
       'grocery_and_pharmacy_percent_change_from_baseline',
       'parks_percent_change_from_baseline',
       'transit_stations_percent_change_from_baseline',
       'workplaces_percent_change_from_baseline',
       'residential_percent_change_from_baseline', 'sub_region_1_code'],
      dtype='object')
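
`replace` and `map` behave differently on values missing from the dictionary: `replace` keeps them, which is why the number of sub-regions stays at 335, while `map` turns them into NaN, which is why 'sub_region_1_code' is only filled for the French rows (count 720) and becomes a float column. A sketch with an excerpt of the mappings and hypothetical values:

```python
import pandas as pd

# Excerpt of the notebook's mappings
regions = {"Brittany": "Bretagne", "Corsica": "Corse"}
regions_code = {"Bretagne": 53, "Corse": 94}

s = pd.Series(["Brittany", "Corsica", "Île-de-France", "Lombardy"])

# replace: values missing from the dict are kept unchanged
renamed = s.replace(regions)
print(renamed.tolist())  # ['Bretagne', 'Corse', 'Île-de-France', 'Lombardy']

# map: values missing from the dict become NaN (so the result is float)
codes = renamed.map(regions_code)
print(codes.tolist())  # [53.0, 94.0, nan, nan]
```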

In [759]:
df_countries = df_countries[['country_region_code','country_region', 'sub_region_1_code', 'sub_region_1', 'sub_region_2','date', 'retail_and_recreation_percent_change_from_baseline',
       'grocery_and_pharmacy_percent_change_from_baseline',
       'parks_percent_change_from_baseline',
       'transit_stations_percent_change_from_baseline',
       'workplaces_percent_change_from_baseline',
       'residential_percent_change_from_baseline']]
print("End of the processing")

End of the processing
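
Instead of listing every column again to move 'sub_region_1_code', `DataFrame.insert` can place the new column at a given position directly. A sketch on a hypothetical extract:

```python
import pandas as pd

regions_code = {"Bretagne": 53, "Corse": 94}  # excerpt of the mapping
df = pd.DataFrame({
    "country_region": ["France", "France"],
    "sub_region_1": ["Bretagne", "Corse"],
    "date": ["2020-03-22", "2020-03-22"],
})

# Insert the code column just before 'sub_region_1' in one step
position = df.columns.get_loc("sub_region_1")
df.insert(position, "sub_region_1_code", df["sub_region_1"].map(regions_code))
print(list(df.columns))  # ['country_region', 'sub_region_1_code', 'sub_region_1', 'date']
```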


In [761]:
df_countries.describe()

| | sub_region_1_code | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|
| count | 720.0 | 24510.0 | 24502.0 | 21214.0 | 24156.0 | 24834.0 | 21404.0 |
| mean | 51.8 | -37.591881 | -13.510815 | -7.404686 | -35.309116 | -31.857051 | 12.327415 |
| std | 27.154818 | 36.838445 | 23.235566 | 43.620709 | 32.314992 | 28.084827 | 11.654913 |
| min | 11.0 | -100.0 | -96.0 | -93.0 | -93.0 | -92.0 | -6.0 |
| 25% | 27.0 | -74.0 | -30.0 | -35.0 | -65.0 | -59.0 | 1.0 |
| 50% | 48.0 | -36.0 | -6.0 | -5.0 | -39.0 | -30.0 | 10.0 |
| 75% | 76.0 | -1.0 | 3.0 | 16.0 | -3.0 | -1.0 | 24.0 |
| max | 93.0 | 87.0 | 107.0 | 377.0 | 93.0 | 17.0 | 50.0 |


## Write the result in a CSV file

In [771]:
print("Save the result with the name : Global_Mobility_Report_prepared.csv")
df_countries.to_csv('Global_Mobility_Report_prepared.csv', index=False)

Save the result with the name : Global_Mobility_Report_prepared.csv
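
By default `to_csv` also writes the row index as an extra unnamed first column, which comes back as 'Unnamed: 0' when the file is read again. A round-trip sketch on a one-row hypothetical frame:

```python
import io
import pandas as pd

df = pd.DataFrame({"country_region": ["France"], "sub_region_1_code": [53]})

with_index = io.StringIO()
df.to_csv(with_index)  # default index=True
with_index.seek(0)
print(list(pd.read_csv(with_index).columns))  # ['Unnamed: 0', 'country_region', 'sub_region_1_code']

without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
print(list(pd.read_csv(without_index).columns))  # ['country_region', 'sub_region_1_code']
```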


# Analyze

In [247]:
csv_df.describe()

| | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|
| count | 239356.0 | 232094.0 | 123860.0 | 151104.0 | 273621.0 | 159503.0 |
| mean | -20.003526 | -4.707817 | -5.161917 | -23.779079 | -21.568458 | 9.595211 |
| std | 29.857701 | 20.486933 | 41.032907 | 30.273161 | 22.692384 | 10.010652 |
| min | -100.0 | -100.0 | -100.0 | -100.0 | -94.0 | -25.0 |
| 25% | -41.0 | -14.0 | -31.0 | -48.0 | -39.0 | 0.0 |
| 50% | -15.0 | -1.0 | -4.0 | -18.0 | -23.0 | 8.0 |
| 75% | 4.0 | 7.0 | 16.0 | 1.0 | 1.0 | 17.0 |
| max | 313.0 | 337.0 | 430.0 | 497.0 | 248.0 | 54.0 |


In [26]:
csv_df.columns

Index(['country_region_code', 'country_region', 'sub_region_1', 'sub_region_2',
       'date', 'retail_and_recreation_percent_change_from_baseline',
       'grocery_and_pharmacy_percent_change_from_baseline',
       'parks_percent_change_from_baseline',
       'transit_stations_percent_change_from_baseline',
       'workplaces_percent_change_from_baseline',
       'residential_percent_change_from_baseline'],
      dtype='object')

In [24]:
df_fr = csv_df[csv_df['country_region_code']=='FR']
df_fr.describe()

| | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|
| count | 1008.0 | 1008.0 | 1008.0 | 1008.0 | 1008.0 | 1002.0 |
| mean | -49.374008 | -23.515873 | -30.50496 | -45.993056 | -39.651786 | 16.835329 |
| std | 39.501109 | 26.761511 | 39.31029 | 39.576193 | 29.382773 | 13.527632 |
| min | -92.0 | -90.0 | -89.0 | -90.0 | -91.0 | -3.0 |
| 25% | -83.0 | -43.25 | -62.0 | -81.0 | -67.0 | 2.0 |
| 50% | -79.0 | -32.0 | -49.0 | -75.0 | -50.0 | 18.0 |
| 75% | -4.0 | 1.0 | 4.0 | -4.0 | -5.0 | 30.0 |
| max | 25.0 | 53.0 | 107.0 | 60.0 | 5.0 | 42.0 |


# Annexes

Annexe 1: Find the null values in 'country_region_code'

In [61]:
csv_df.loc[csv_df['country_region_code'].isnull(), 'country_region'].unique()

array(['Namibia'], dtype=object)

Annexe 2: Find how the country or region you want is named in the Google report

In [79]:
# Enter the country that you are looking for : 
country = 'Denmark'
cd = country[:3]
csv_df['country_region'].unique()[pd.Series(csv_df['country_region'].unique()).str.startswith(cd)]

array(['Denmark'], dtype=object)

In [157]:
# Enter the region you are looking for (named `region` so that the `regions` mapping is not overwritten):
region = 'Auvergne-Rhône-Alpes'
cd = region[:3]
reg = csv_df[csv_df['country_region_code']=='FR']['sub_region_1'].unique()
reg[pd.Series(reg).str.startswith(cd, na=False)]

array(['Auvergne-Rhône-Alpes'], dtype=object)
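
Matching on the first three letters misses names whose fragment is not at the start. A sketch of a case-insensitive substring search, where `find_names` is a hypothetical helper, on a hypothetical list of sub-regions:

```python
import pandas as pd

def find_names(names, fragment):
    """Return the distinct, non-null names containing fragment, ignoring case."""
    s = pd.Series(names).dropna().drop_duplicates()
    return s[s.str.contains(fragment, case=False, regex=False)].tolist()

# Hypothetical sub-regions as read from the report
names = ["Auvergne-Rhône-Alpes", "Centre-Val de Loire", None, "Pays de la Loire", "Centre-Val de Loire"]
print(find_names(names, "loire"))  # ['Centre-Val de Loire', 'Pays de la Loire']
```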

Annexe 3: Group by region (be careful: groupby drops the rows whose 'sub_region_1' is NaN, i.e. the national-level rows, and count() skips NaN values)

In [None]:
df_fr = csv_df[csv_df['country_region_code']=='FR']
df_rg = df_fr.groupby('sub_region_1')
df_rg.count()
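
Both caveats can be made explicit: `dropna=False` (pandas 1.1 or later) keeps the national-level rows as their own NaN group, and comparing `size()` with `count()` shows how many values are missing per group. A sketch on a hypothetical three-row extract:

```python
import numpy as np
import pandas as pd

# Hypothetical extract: one national row (sub_region_1 is NaN) and two Corsica rows
df_fr = pd.DataFrame({
    "sub_region_1": [np.nan, "Corsica", "Corsica"],
    "residential_percent_change_from_baseline": [20.0, np.nan, 25.0],
})
col = "residential_percent_change_from_baseline"

# Default groupby: the national row is silently dropped
default_counts = df_fr.groupby("sub_region_1")[col].count()
print(default_counts)

# dropna=False keeps it as its own NaN group
rows_per_group = df_fr.groupby("sub_region_1", dropna=False).size()
# size() counts rows, count() only the non-NaN values of the column
values_per_group = df_fr.groupby("sub_region_1", dropna=False)[col].count()
print(rows_per_group)
print(values_per_group)
```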

Annexe 4: Find the NaN values of a column

In [None]:
df_fr[(df_fr['sub_region_1']=='Corsica') & df_fr['residential_percent_change_from_baseline'].isnull()]
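
To see the missing dates of every region at once rather than one region at a time, the NaN rows can be grouped by sub-region. A sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows in the layout of the French extract
df_fr = pd.DataFrame({
    "sub_region_1": ["Corsica", "Corsica", "Brittany"],
    "date": ["2020-03-22", "2020-03-23", "2020-03-22"],
    "residential_percent_change_from_baseline": [np.nan, 31.0, np.nan],
})

col = "residential_percent_change_from_baseline"
missing = df_fr[df_fr[col].isnull()]
# One list of dates per sub-region that has a missing value
missing_dates = missing.groupby("sub_region_1")["date"].apply(list).to_dict()
print(missing_dates)  # {'Brittany': ['2020-03-22'], 'Corsica': ['2020-03-22']}
```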

Annexe 5: Which countries have a sub_region_2

In [None]:
csv_df.loc[csv_df['sub_region_2'].notna(), 'country_region'].unique()