# Milestone 3 research and data visualisation

### Table of Contents

* [Digital Propagation](#chapter1)
    * [Loading the data](#section_1_1)
    * [Overview of the data](#Section_1_2)
    * [Time series analysis](#Section_1_3)
    
        * [Check for stationarity](#section_1_3_1)
        * [Autocorrelation](#section_1_3_2)
        * [Decomposition](#section_1_3_3)
        
    * [Google mobility data](#Section_1_4) 
    
        * [Data processing](#section_1_4_1)
        * [Analysis per country](#section_1_4_2)
        
        
* [COVID-19 dataset](#chapter2)
    * [Downloading the data](#section_2_1)
    * [Overview of the data](#section_2_2)
    * [Time series analysis](#Section_2_3)
        * [Check for stationarity](#section_2_3_1)
        * [Autocorrelation](#section_2_3_2)
        * [Decomposition](#section_2_3_3)
        
        
* [Pearson Correlation](#chapter3)      


* [Trust dataset](#chapter4)  
    * [Visualizing Government trust](#section_4_1)
    * [Visualizing Trust in Journalists](#section_4_2)
    * [Visualizing Trust in Science](#section_4_3)
    * [Analysis](#section_4_4)


* [Clustering](#chapter5) 

We import all the libraries needed to compute and plot our results.

In [85]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
from helper import *
from scipy.stats.mstats import gmean
# General-purpose packages
import datetime
import math
import json
import zipfile
import ssl
from scipy import stats
import statsmodels.formula.api as smf
import statsmodels.api as sm
import plotly.express as px

# To download data
import requests
import io
import gzip

#To create the mapchart
import iso3166
import plotly
from iso3166 import countries
import plotly.graph_objects as go

## Digital propagation <a class="anchor" id="chapter1"></a>
### Loading the data <a class="anchor" id="section_1_1"></a>

We first load the raw data for pageviews and population, and then clean it to obtain a dataframe of pageviews, cumulative pageviews, pageviews per 100,000 inhabitants, and cumulative pageviews per 100,000 inhabitants using the `get_pageviews_df` function.

For cases and deaths due to COVID-19, we do the same, but we obtain the raw data using URLs with the `get_cases_deaths_df` function.

For our initial analysis, we consider the data from the start to the end of the COVID-19 pandemic, from **January 22, 2020** to **July 31, 2022**.

In [86]:
# Load the raw dataframes from the CSV files
pageview_df = pd.read_csv("page_views_covid_related.csv.gz")
population_df = pd.read_csv("Population_countries.csv")
# Get the cleaned, cumulative, and per-100k-inhabitants dataframes for pageviews, COVID-19 cases, and deaths
df_pageviews, df_pageviews_cumul, df_pageviews100k, df_pageviews_cumul100k = get_pageviews_df(pageview_df, population_df, get_country_dict('original'), '2020-01-22', '2020-09-01')
deaths, cases, deaths_cumul, cases_cumul, deaths100k, deaths100k_cumul, cases100k, cases100k_cumul = get_cases_deaths_df(population_df, get_country_dict('original'), '2020-01-22', '2020-09-01')
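
The helper functions `get_pageviews_df` and `get_cases_deaths_df` live in `helper` and are not reproduced here. As a rough illustration of the four kinds of tables described above, the following sketch derives cumulative and per-100,000 versions from a small made-up daily table. The column names, the values, and the `population` series are assumptions for illustration only, not the project's data or the helpers' actual implementation.

```python
import pandas as pd

# Hypothetical daily pageviews: one column per language code, one row per day
daily = pd.DataFrame(
    {"it": [100, 200, 300], "fr": [50, 50, 100]},
    index=pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24"]),
)

# Hypothetical population per column (same keys as the columns above)
population = pd.Series({"it": 1_000_000, "fr": 500_000})

# Running total over time
cumul = daily.cumsum()

# Normalise by population: values per 100,000 inhabitants
per100k = daily.div(population, axis="columns") * 100_000
cumul_per100k = cumul.div(population, axis="columns") * 100_000

print(cumul_per100k)
```

Dividing with `axis="columns"` aligns the population values on the column labels, so the column order of the two objects does not need to match.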

Here, we use the `get_country_dict` function to create a dictionary, `o_country_dict`, that maps country names to their language codes. We then build its inverse, `inv_o_country_dict`, which maps language codes back to country names. We also define a dictionary, `other_country_name`, listing the countries whose names in our data differ from the names expected by the `iso3166` package; we use it later for the map chart plots.

In [87]:
# Get a dictionary mapping country names to their language codes
o_country_dict = get_country_dict('original')

# Reverse the dictionary to map language codes to country names
inv_o_country_dict = {v: k for k, v in o_country_dict.items()}

# Add exceptions for country names that differ from their ISO 3166 names
other_country_name = {
    "Russia": "Russian Federation",
    "Turkey": "Türkiye",
    "Vietnam": "Viet Nam",
    "South Korea": "Korea, Republic of"
}
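
Inverting a dictionary with a comprehension silently drops entries when two keys share the same value. The sketch below uses a made-up mapping rather than the output of `get_country_dict`, and shows an inversion that raises an error in that case; `invert_unique` is a hypothetical helper name, not part of `helper`.

```python
def invert_unique(mapping):
    """Invert a dict, raising if two keys map to the same value."""
    inverted = {}
    for key, value in mapping.items():
        if value in inverted:
            raise ValueError(
                f"value {value!r} is shared by {inverted[value]!r} and {key!r}"
            )
        inverted[value] = key
    return inverted


# Hypothetical country -> language-code mapping
example = {"Italy": "it", "France": "fr", "Japan": "ja"}
print(invert_unique(example))
```

If every language code belongs to exactly one country, this returns the same result as the comprehension used above.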

In [88]:
def mapcharts_df(df, country_dict, interest):
    # Map language codes back to country names
    inv_country_dict = {v: k for k, v in country_dict.items()}
    # Country names that differ from their ISO 3166 names
    other_country_name = {
        "Russia": "Russian Federation",
        "Turkey": "Türkiye",
        "Vietnam": "Viet Nam",
        "South Korea": "Korea, Republic of"
    }
    # Rename the columns once, from language codes to country names
    df_named = df.rename(columns=inv_country_dict)
    frames = []
    # Iterate through each country to build the long-format dataframe for the map chart
    for country in country_dict.keys():
        # Use a separate name so the input dataframe is not overwritten
        country_df = pd.DataFrame(df_named[country]).rename(columns={country: interest})
        # Fall back to the country name itself when no exception is listed
        iso_name = other_country_name.get(country, country)
        country_df['Country_code'] = countries.get(iso_name).alpha3
        country_df['date'] = country_df.index
        country_df['country'] = country
        frames.append(country_df)
    return pd.concat(frames, axis=0)
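
The loop above turns a wide table (one column per country) into a long table (one row per date and country). As a possible alternative, `DataFrame.melt` performs the same reshape in a single call. The sketch below assumes the columns are already named by country and uses a hand-written `alpha3` lookup in place of `iso3166`; both the data and the lookup are illustrative.

```python
import pandas as pd

# Hypothetical wide table: index = date, columns = country names
wide = pd.DataFrame(
    {"Italy": [1.0, 2.0], "France": [0.5, 1.5]},
    index=pd.Index(["2020-01-22", "2020-01-23"], name="date"),
)

# Hand-written ISO 3166-1 alpha-3 codes, standing in for iso3166 lookups
alpha3 = {"Italy": "ITA", "France": "FRA"}

# Wide -> long: one row per (date, country) pair
long = wide.reset_index().melt(
    id_vars="date", var_name="country", value_name="deaths"
)
long["Country_code"] = long["country"].map(alpha3)

print(long)
```

Unlike the loop, this version keeps `date` as a regular column and uses a fresh integer index, which `px.choropleth` does not rely on.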

In [89]:
def mapcharts(df, color_serie, hover_serie, title, subtile= None, font_title= 16, font_subtile= 12, colorcode= 'Reds'):
  # Create the map chart
  fig = px.choropleth(df, locations= "Country_code",
                      color= df[color_serie],
                      animation_frame= 'date',
                      hover_name= df[hover_serie], # column to add to hover information
                      range_color= [0,np.percentile(df[color_serie],99)],
                      color_continuous_scale= colorcode, 
                      title= format_title(title, subtile, font_subtile, font_title),
                      width= 700,
                      height= 700)

  #Update layout, menus and buttons options
  fig.update_layout(transition = {'duration': 5})
  fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 5 # buttons
  fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 5
  fig.layout.updatemenus[0].buttons[1].args[1]["frame"]["duration"] = 5
  fig.layout.updatemenus[0].buttons[1].args[1]["transition"]["duration"] = 5
  fig.layout.sliders[0].steps[0].args[1]["frame"]["duration"] = 5 # slider
  fig.layout.updatemenus[0].buttons[0].args[1]["visible"] = False

  #Zoom on specific part of the map
  fig.update_geos(
      center=dict(lon=80, lat=35),
      projection_type="mercator",
      lataxis_range=[-50,80], lonaxis_range=[-10, 230]
  )

  fig.show()
  fig.write_html("data/deaths_mapchart.html",default_width= 500, default_height= 500)
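
Capping `range_color` at the 99th percentile, as `mapcharts` does, keeps a few extreme values from washing out the colour scale. A small illustration with made-up numbers (not taken from the datasets used here):

```python
import numpy as np

# Hypothetical values: mostly small, with one extreme outlier
values = np.array([1.0] * 99 + [1000.0])

upper_max = values.max()               # dominated by the single outlier
upper_p99 = np.percentile(values, 99)  # stays much closer to the bulk of the data

print(upper_max, upper_p99)
```

With the maximum as the upper bound, almost every country would sit at the pale end of the scale; with the percentile, values above the bound are simply drawn in the darkest colour.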

In [90]:
# Empty dataframe for the map chart
deaths_mapchart = pd.DataFrame({})
# Iterate through each country build the correct dataframe for the mapchart
for country in list(o_country_dict.keys()): 
  df = pd.DataFrame(deaths100k_cumul.rename(columns= inv_o_country_dict)[country])
  df = df.rename(columns= {country: 'deaths'})
  if (country in list(other_country_name.keys())):
    df['Country_code'] = [countries.get(other_country_name[country]).alpha3] * len(df)
  else:
    df['Country_code'] = [countries.get(country).alpha3] * len(df)
  df['date'] = df.index
  df['country'] = country

  deaths_mapchart = pd.concat([deaths_mapchart, df], axis= 0)

#deaths_mapchart = mapcharts_df(deaths, o_country_dict, 'deaths')
mapcharts(deaths_mapchart, 'deaths', 'country', 'Cumulative number of COVID-19 deaths per 100k inhabitants and per country',
'The colour of each country corresponds to the number of deaths per 100k inhabitants recorded since 22-01-2020.', 16, 12, 'Reds')

In [93]:
pageviews_mapchart = pd.DataFrame({})
for country in o_country_dict.keys():
    
    df = pd.DataFrame(df_pageviews_cumul100k.rename(columns= inv_o_country_dict)[country])
    df = df.rename(columns= {country: 'pageviews'})
    if (country in list(other_country_name.keys())):
        df['Country_code'] = [countries.get(other_country_name[country]).alpha3] * len(df)
    else:
        df['Country_code'] = [countries.get(country).alpha3] * len(df)
    df['date'] = df.index
    df['country'] = country

    pageviews_mapchart = pd.concat([pageviews_mapchart, df], axis= 0)

mapcharts(pageviews_mapchart, 'pageviews', 'country', 'Number of cumulative pageviews per 100,000 inhabitants and per country',
'The colour of each country corresponds to the number of pageviews per 100k inhabitants recorded since 22-01-2020.', 16, 10, 'ylorbr')

In [94]:
cases_mapchart = pd.DataFrame({})
for country in o_country_dict.keys():
    
    df = pd.DataFrame(cases100k_cumul.rename(columns= inv_o_country_dict)[country])
    df = df.rename(columns= {country: 'Cases'})
    if (country in list(other_country_name.keys())):
        df['Country_code'] = [countries.get(other_country_name[country]).alpha3] * len(df)
    else:
        df['Country_code'] = [countries.get(country).alpha3] * len(df)
    df['date'] = df.index
    df['country'] = country

    # Keep every second row to reduce the number of animation frames
    df = df.iloc[::2,:]
    cases_mapchart = pd.concat([cases_mapchart, df], axis= 0)

mapcharts(cases_mapchart, 'Cases', 'country', 'Number of cumulative cases per 100000 inhabitants and per country')
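
Slicing with `iloc[::n]` keeps every n-th row, which reduces the number of animation frames the map has to render. A minimal sketch on made-up data, where `step` is an illustrative name:

```python
import pandas as pd

# Hypothetical daily series with ten rows
frame = pd.DataFrame(
    {"Cases": range(10)},
    index=pd.date_range("2020-01-22", periods=10, freq="D"),
)

step = 2
thinned = frame.iloc[::step]  # keeps rows 0, 2, 4, ...

print(len(frame), len(thinned))
```

Because the values are cumulative, dropping intermediate rows does not change the totals shown on the remaining dates.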