<h1>Projet 1: Analysez des données de systèmes éducatifs</h1>

Nordine Oural - 29/12/2022

---

**Table of contents**<a id='toc0_'></a>    
- [Introduction](#toc1_1_)    
  - [Choix des critères retenus](#toc1_2_)    
  - [Import des différents modules](#toc1_3_)    
  - [Visualisation des données de départ](#toc1_4_)    
    - [Indicateurs](#toc1_4_1_)    
    - [Pays](#toc1_4_2_)    
  - [Paramètres initiaux](#toc1_5_)    
  - [Nettoyage des pays](#toc1_6_)    
  - [Chargement des données à analyser](#toc1_7_)    
  - [Filtrage des données par pays](#toc1_8_)    
  - [Filtrage des données par indicateurs](#toc1_9_)    
  - [Simplification des données selon les années](#toc1_10_)    
  - [Agrégation des indicateurs par pays](#toc1_11_)    
  - [Ajout de la variable Region](#toc1_12_)    
  - [Renommage des colonnes pour simplification](#toc1_13_)    
  - [Nettoyage](#toc1_14_)    
    - [Population totale](#toc1_14_1_)    
    - [Région](#toc1_14_2_)    
  - [Imputation](#toc1_15_)    
    - [PIB / habitant](#toc1_15_1_)    
    - [Population de 14 à 19 ans](#toc1_15_2_)    
    - [Population de moins de 14 ans](#toc1_15_3_)    
  - [Nettoyage des autres indicateurs](#toc1_16_)    
  - [Nettoyage](#toc1_17_)    
  - [Calcul de la population de plus de 20 ans](#toc1_18_)    
  - [Aggregation des données par régions](#toc1_19_)    
  - [Visualisation globale des régions](#toc1_20_)    
  - [Création d'un indicateur composite](#toc1_21_)    
    - [Indicateur de confiance appliqué aux régions](#toc1_21_1_)    
    - [Indicateur de confiance appliqué aux pays](#toc1_21_2_)    
  - [Conclusion](#toc1_22_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

### <a id='toc1_1_'></a>[Introduction](#toc0_)

Le but de ce projet est de fournir à une entreprise spécialisée dans la formation en ligne pour les niveaux Lycée et Université, une analyse pré-exploratoire à partir d'un ensemble de données par pays et zones géographiques.

Les données de départ sont fournies par le site [www.worldbank.org](https://www.wordlbank.org) via le lien de téléchargement : [lien](https://s3-eu-west-1.amazonaws.com/static.oc-static.com/prod/courses/files/Parcours_data_scientist/Projet+-+Donn%C3%A9es+%C3%A9ducatives/Projet+Python_Dataset_Edstats_csv.zip "lien de téléchargement du set de données")

Le fichier téléchargé est un dossier zippé contenant:
- EdStatSeries.csv : contient la liste des indicateurs utilisés dans les études, avec les indicateurs, sources des études, sujet de l'indicateur
- EdStatsCountry.csv : contient la liste des pays et zones géographiques avec informations d'identifications, région, classification économique des revenus et différentes dates des dernières études réalisées 
- EdStatsData.csv : contient toues les données recueillies par pays et indicateur. Pour chaque ligne, les données sont fournies pour différentes années entre 1970 et 2100 (certaines estimations sont fournies)
- EdStatsCountry-Series.csv : contient des commentaires concernant certains indicateurs pour certains pays (613 lignes)
- EdStatsFootNote.csv : contient une indication quant à la source pour chaque indicateur, chaque pays et chaque année disponible


Les fichiers sont extraits dans un répertoire assets.


### <a id='toc1_2_'></a>[Choix des critères retenus](#toc0_)

Pour pouvoir suivre les formations en ligne, nos usagers auront besoin d'un ordinateur ainsi que d'une connexion Internet. Nous allons retenir les indicateurs suivants:
- IT.CMP.PCMP.P2 : Personal computers (per 100 people)
- IT.NET.USER.P2 : Internet users (per 100 people) 

Afin d'analyser le potentiel économique des différents pays, nous allons aussi retenir le PIB par habitant, la population totale, ainsi que les populations de plus de 20 ans et de moins de 20 ans:
- NY.GDP.PCAP.PP.KD : GDP per capita, PPP (constant 2011 international $)
- NY.GDP.MKTP.KD : GDP (constant 2010 US$)
- SP.POP.TOTL : Population, total
- SP.POP.1419.TO.UN : Population, ages 14-19, total
- SP.POP.0014.TO : Population ages 0-14, total

Pour obtenir la population agée de 20 ans et plus, nous soustrairons les valeurs des indicateurs SP.POP.1419.TO.UN et SP.POP.0014.TO à SP.POP.TOTL

Pour obtenir la polulation agée de moins de 20 ans, nous additionnerons les valeurs des indicateurs SP.POP.1419.TO.UN et SP.POP.0014.TO


### <a id='toc1_3_'></a>[Import des différents modules](#toc0_)
Pour notre étude, nous utiliserons les librairies Pandas, NumPy ainsi que Express et graph_objects de PlotLY

In [2133]:
import pandas as pd
import numpy as np
import plotly.express as px, plotly.graph_objects as go


### <a id='toc1_4_'></a>[Visualisation des données de départ](#toc0_)

#### <a id='toc1_4_1_'></a>[Indicateurs](#toc0_)

In [2134]:
df_indicateurs = pd.read_csv("assets/EdStatsSeries.csv")
df_indicateurs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 21 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Series Code                          3665 non-null   object 
 1   Topic                                3665 non-null   object 
 2   Indicator Name                       3665 non-null   object 
 3   Short definition                     2156 non-null   object 
 4   Long definition                      3665 non-null   object 
 5   Unit of measure                      0 non-null      float64
 6   Periodicity                          99 non-null     object 
 7   Base Period                          314 non-null    object 
 8   Other notes                          552 non-null    object 
 9   Aggregation method                   47 non-null     object 
 10  Limitations and exceptions           14 non-null     object 
 11  Notes from original source    

Le dataframe df_indicateurs contient 3665 lignes.

In [2135]:
print(f"""Nombre de codes indicateurs differents: {len(df_indicateurs["Series Code"].unique())}""")
print(f"""Nombre de noms d' indicateurs differents: {len(df_indicateurs["Indicator Name"].unique())}""")

Nombre de codes indicateurs differents: 3665
Nombre de noms d' indicateurs differents: 3665


Nous n'avons pas de doublons dans les indicateurs.

Voyons combien de sujets sont abordés dans cet ensemble:

In [2136]:
print(f"""Nombre de sujets abordés: {len(df_indicateurs["Topic"].unique())}""")

Nombre de sujets abordés: 37


Voyons le nombre d' études sources différentes:

In [2137]:
print(f"""Nombre d' études sources: {len(df_indicateurs["Source"].unique())}""")

Nombre d' études sources: 31


Nous avons donc 3665 indicateurs différents répartis entre 37 sujets différents et qui proviennent de 31 sources différents.

Voyons la répartition de ces indicateurs par sujet et par source:

In [2138]:
df_indicateurs_by_topic = df_indicateurs.assign(nb_topic=1).groupby("Topic").agg({"nb_topic": "sum"}).sort_values(by="nb_topic", ascending=False).reset_index()

fig = px.pie(df_indicateurs_by_topic, values='nb_topic', names='Topic', title='Répartition des indicateurs par sujet', height=600, width=1100)
fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')
fig.show()

In [2139]:
df_indicateurs_by_source = df_indicateurs.assign(nb_source=1).groupby("Source").agg({"nb_source": "sum"}).sort_values(by="nb_source", ascending=False).reset_index()
df_indicateurs_by_source["Source"] = df_indicateurs_by_source["Source"].str[:80]
fig2 = px.pie(df_indicateurs_by_source, values='nb_source', names='Source', title='Répartition des indicateurs par étude', height=600, width=1100)
fig2.update_traces(textposition='inside')
fig2.update_layout(uniformtext_minsize=7, uniformtext_mode='hide')
fig2.show()

#### <a id='toc1_4_2_'></a>[Pays](#toc0_)
On importe la liste des pays depuis le fichier EdStatsCountry.csv 

In [2140]:
countries = pd.read_csv("assets/EdStatsCountry.csv")

countries.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 241 entries, 0 to 240
Data columns (total 32 columns):
 #   Column                                             Non-Null Count  Dtype  
---  ------                                             --------------  -----  
 0   Country Code                                       241 non-null    object 
 1   Short Name                                         241 non-null    object 
 2   Table Name                                         241 non-null    object 
 3   Long Name                                          241 non-null    object 
 4   2-alpha code                                       238 non-null    object 
 5   Currency Unit                                      215 non-null    object 
 6   Special Notes                                      145 non-null    object 
 7   Region                                             214 non-null    object 
 8   Income Group                                       214 non-null    object 
 9   WB-2 code 

In [2141]:
print(f"""Nombre de codes pays différents: {len(countries["Country Code"].unique())}""")
print(f"""Nombre de noms de pays différents: {len(countries["Table Name"].unique())}""")

Nombre de codes pays différents: 241
Nombre de noms de pays différents: 241


A priori il n'y a pas de doublons au niveau des pays.

Une variable Region est associée à chaque individu. Voyons la répartition des pays par région.

In [2142]:
countries_by_region = countries.assign(nb=1).groupby("Region").agg({"nb": "sum"}).sort_values(by="nb", ascending=False).reset_index()
fig = px.pie(countries_by_region, values='nb', names='Region', title='Répartition des pays par Région', height=500, width=700)
fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')
fig.show()

### <a id='toc1_5_'></a>[Paramètres initiaux](#toc0_)
Afin de pouvoir paramétrer facilement notre étude, nous définissons toutes les variables de paramètrage à un seul endroit

In [2143]:
''' Seuil minimal pour considérer l'étude d'un pays '''
population_threshold = 2000000

''' mapping code indicateur / label '''
final_indicators = [
    {
        "name": "% PC Population",
        "indicators": "IT.CMP.PCMP.P2"
    },
    {
        "name": "% Internet Population",
        "indicators": "IT.NET.USER.P2"
    },
    {
        "name": "PIB/habitant",
        "indicators": "NY.GDP.PCAP.PP.KD"
    },
    {
        "name": "PIB",
        "indicators": "NY.GDP.MKTP.KD"
    },
    {
        "name": "Population Totale",
        "indicators": "SP.POP.TOTL"
    },
    {
        "name": "Population 1419",
        "indicators": "SP.POP.1419.TO.UN"
    },
    {
        "name": "Population -14",
        "indicators": "SP.POP.0014.TO"
    }
]

A partir des paramètres fournis, on crée la liste des indicateurs à retenir

In [2144]:
indicators = list(map(lambda x: x["indicators"], final_indicators))


### <a id='toc1_6_'></a>[Nettoyage des pays](#toc0_)

In [2145]:
countries.head(10)

Unnamed: 0,Country Code,Short Name,Table Name,Long Name,2-alpha code,Currency Unit,Special Notes,Region,Income Group,WB-2 code,...,IMF data dissemination standard,Latest population census,Latest household survey,Source of most recent Income and expenditure data,Vital registration complete,Latest agricultural census,Latest industrial data,Latest trade data,Latest water withdrawal data,Unnamed: 31
0,ABW,Aruba,Aruba,Aruba,AW,Aruban florin,SNA data for 2000-2011 are updated from offici...,Latin America & Caribbean,High income: nonOECD,AW,...,,2010,,,Yes,,,2012.0,,
1,AFG,Afghanistan,Afghanistan,Islamic State of Afghanistan,AF,Afghan afghani,Fiscal year end: March 20; reporting period fo...,South Asia,Low income,AF,...,General Data Dissemination System (GDDS),1979,"Multiple Indicator Cluster Survey (MICS), 2010/11","Integrated household survey (IHS), 2008",,2013/14,,2012.0,2000.0,
2,AGO,Angola,Angola,People's Republic of Angola,AO,Angolan kwanza,"April 2013 database update: Based on IMF data,...",Sub-Saharan Africa,Upper middle income,AO,...,General Data Dissemination System (GDDS),1970,"Malaria Indicator Survey (MIS), 2011","Integrated household survey (IHS), 2008",,2015,,,2005.0,
3,ALB,Albania,Albania,Republic of Albania,AL,Albanian lek,,Europe & Central Asia,Upper middle income,AL,...,General Data Dissemination System (GDDS),2011,"Demographic and Health Survey (DHS), 2008/09",Living Standards Measurement Study Survey (LSM...,Yes,2012,2010.0,2012.0,2006.0,
4,AND,Andorra,Andorra,Principality of Andorra,AD,Euro,,Europe & Central Asia,High income: nonOECD,AD,...,,2011. Population figures compiled from adminis...,,,Yes,,,2006.0,,
5,ARB,Arab World,Arab World,Arab World,1A,,Arab World aggregate. Arab World is composed o...,,,1A,...,,,,,,,,,,
6,ARE,United Arab Emirates,United Arab Emirates,United Arab Emirates,AE,U.A.E. dirham,April 2013 database update: Based on data from...,Middle East & North Africa,High income: nonOECD,AE,...,General Data Dissemination System (GDDS),2010,,,,2012,,2011.0,2005.0,
7,ARG,Argentina,Argentina,Argentine Republic,AR,Argentine peso,,Latin America & Caribbean,Upper middle income,AR,...,Special Data Dissemination Standard (SDDS),2010,"Multiple Indicator Cluster Survey (MICS), 2011/12","Integrated household survey (IHS), 2012",Yes,2013,2002.0,2012.0,2000.0,
8,ARM,Armenia,Armenia,Republic of Armenia,AM,Armenian dram,,Europe & Central Asia,Lower middle income,AM,...,Special Data Dissemination Standard (SDDS),2011,"Demographic and Health Survey (DHS), 2010","Integrated household survey (IHS), 2012",Yes,2013/14,,2012.0,2007.0,
9,ASM,American Samoa,American Samoa,American Samoa,AS,U.S. dollar,,East Asia & Pacific,Upper middle income,AS,...,,2010,,,Yes,2007,,,,


En plus des pays, cette liste contient différentes zones géographiques/économiques qui ne sont pas de véritables pays.

Pour filtrer les lignes qui ne correspondent pas à des pays réels, on récupère une liste de codes pays depuis  l'URL https://datahub.io/core/country-codes/r/country-codes.csv . 


In [2146]:
country_codes = pd.read_csv("https://datahub.io/core/country-codes/r/country-codes.csv")
country_codes.head()

Unnamed: 0,FIFA,Dial,ISO3166-1-Alpha-3,MARC,is_independent,ISO3166-1-numeric,GAUL,FIPS,WMO,ISO3166-1-Alpha-2,...,Sub-region Name,official_name_ru,Global Name,Capital,Continent,TLD,Languages,Geoname ID,CLDR display name,EDGAR
0,TPE,886,TWN,ch,Yes,158.0,925,TW,,TW,...,,,,Taipei,AS,.tw,"zh-TW,zh,nan,hak",1668284.0,Taiwan,
1,AFG,93,AFG,af,Yes,4.0,1,AF,AF,AF,...,Southern Asia,Афганистан,World,Kabul,AS,.af,"fa-AF,ps,uz-AF,tk",1149361.0,Afghanistan,B2
2,ALB,355,ALB,aa,Yes,8.0,3,AL,AB,AL,...,Southern Europe,Албания,World,Tirana,EU,.al,"sq,el",783754.0,Albania,B3
3,ALG,213,DZA,ae,Yes,12.0,4,AG,AL,DZ,...,Northern Africa,Алжир,World,Algiers,AF,.dz,ar-DZ,2589581.0,Algeria,B4
4,ASA,1-684,ASM,as,Territory of US,16.0,5,AQ,,AS,...,Polynesia,Американское Самоа,World,Pago Pago,OC,.as,"en-AS,sm,to",5880801.0,American Samoa,B5


On constate une variable "ISO3166-1-Alpha-2" de country_codes qui correspond la variable "2-alpha code" de countries.

On peut donc filtrer le dataframe countries en ne gardant que les individus dont la variable "2-alpha code" est incluse dans la colonne "ISO3166-1-Alpha-2" de country_codes

In [2147]:
countries = countries.loc[countries["2-alpha code"].isin(country_codes["ISO3166-1-Alpha-2"])]

countries.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 216 entries, 0 to 240
Data columns (total 32 columns):
 #   Column                                             Non-Null Count  Dtype  
---  ------                                             --------------  -----  
 0   Country Code                                       216 non-null    object 
 1   Short Name                                         216 non-null    object 
 2   Table Name                                         216 non-null    object 
 3   Long Name                                          216 non-null    object 
 4   2-alpha code                                       213 non-null    object 
 5   Currency Unit                                      215 non-null    object 
 6   Special Notes                                      120 non-null    object 
 7   Region                                             214 non-null    object 
 8   Income Group                                       214 non-null    object 
 9   WB-2 code 

Il y avait donc dans le fichier de départ des pays 241 - 216 = 25 lignes qui ne correspondaient pas à des pays réels.

### <a id='toc1_7_'></a>[Chargement des données à analyser](#toc0_)

On charge le fichier EdStatsData.csv

In [2148]:
data= pd.read_csv("assets/EdStatsData.csv")
data.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,...,2060,2065,2070,2075,2080,2085,2090,2095,2100,Unnamed: 69
0,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2,,,,,,,...,,,,,,,,,,
1,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.F,,,,,,,...,,,,,,,,,,
2,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.GPI,,,,,,,...,,,,,,,,,,
3,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.M,,,,,,,...,,,,,,,,,,
4,Arab World,ARB,"Adjusted net enrolment rate, primary, both sex...",SE.PRM.TENR,54.822121,54.894138,56.209438,57.267109,57.991138,59.36554,...,,,,,,,,,,


In [2149]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 886930 entries, 0 to 886929
Data columns (total 70 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Country Name    886930 non-null  object 
 1   Country Code    886930 non-null  object 
 2   Indicator Name  886930 non-null  object 
 3   Indicator Code  886930 non-null  object 
 4   1970            72288 non-null   float64
 5   1971            35537 non-null   float64
 6   1972            35619 non-null   float64
 7   1973            35545 non-null   float64
 8   1974            35730 non-null   float64
 9   1975            87306 non-null   float64
 10  1976            37483 non-null   float64
 11  1977            37574 non-null   float64
 12  1978            37576 non-null   float64
 13  1979            36809 non-null   float64
 14  1980            89122 non-null   float64
 15  1981            38777 non-null   float64
 16  1982            37511 non-null   float64
 17  1983      

In [2150]:
data.assign(nb=1).groupby(["Country Code"]).agg({"nb": "sum"}).reset_index().sort_values(by="nb")


Unnamed: 0,Country Code,nb
0,ABW,3665
153,MRT,3665
154,MUS,3665
155,MWI,3665
156,MYS,3665
...,...,...
86,GTM,3665
87,GUM,3665
88,GUY,3665
60,ECS,3665


In [2151]:
data.assign(nb2=1).groupby("Country Name").agg({"nb2": "sum"}).reset_index().sort_values(by="nb2")

Unnamed: 0,Country Name,nb2
0,Afghanistan,3665
153,Myanmar,3665
154,Namibia,3665
155,Nauru,3665
156,Nepal,3665
...,...,...
86,Guatemala,3665
87,Guinea,3665
88,Guinea-Bissau,3665
60,Ecuador,3665


In [2152]:
data.assign(nb=1).groupby(["Indicator Code"]).agg({"nb": "sum"}).reset_index().sort_values(by="nb")


Unnamed: 0,Indicator Code,nb
0,BAR.NOED.1519.FE.ZS,242
2436,SP.POP.0710.MA.UN,242
2437,SP.POP.0710.TO.UN,242
2438,SP.POP.0711.FE.UN,242
2439,SP.POP.0711.MA.UN,242
...,...,...
1225,LO.PASEC.MAT5.LO.MA,242
1226,LO.PASEC.MAT5.MA,242
1227,LO.PASEC.REA.2,242
1229,LO.PASEC.REA.2.L0,242


On constate que chaque indicateur est associé à chaque pays dans data.

Vu le grand nombre de colonnes, on affiche les noms des colonnes sous forme d'une liste

In [2153]:
cols = list(data)
print(cols)

['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2020', '2025', '2030', '2035', '2040', '2045', '2050', '2055', '2060', '2065', '2070', '2075', '2080', '2085', '2090', '2095', '2100', 'Unnamed: 69']


### <a id='toc1_8_'></a>[Filtrage des données par pays](#toc0_)
Nous devons exclure du dataframe précedent les pays qui ne sont pas de réels pays

In [2154]:
real_country_codes = countries["Country Code"]
data_orig = data.copy()

data = data.loc[data["Country Code"].isin(real_country_codes)]
data.reset_index(drop=True, inplace=True)

print(f"""Nombre de lignes d'indicateurs avec tous les pays (non-réels compris): {data_orig.shape[0]}""")
print(f"""Nombre de lignes d'indicateurs pour les pays réels: {data.shape[0]}""")

Nombre de lignes d'indicateurs avec tous les pays (non-réels compris): 886930
Nombre de lignes d'indicateurs pour les pays réels: 791640


### <a id='toc1_9_'></a>[Filtrage des données par indicateurs](#toc0_)

Les indicateurs retenus sont dans la liste "indicators" définie dans le chapitre "Paramètres initiaux". On filtre donc le dataframe pour ne retenir que les individus dont la variable "Indicator Code" est contenue dans cette liste.

In [2155]:
data = data.loc[data["Indicator Code"].isin(indicators), :]
data.reset_index(drop=True, inplace=True)
data

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,...,2060,2065,2070,2075,2080,2085,2090,2095,2100,Unnamed: 69
0,Afghanistan,AFG,GDP at market prices (constant 2005 US$),NY.GDP.MKTP.KD,,,,,,,...,,,,,,,,,,
1,Afghanistan,AFG,"GDP per capita, PPP (constant 2011 internation...",NY.GDP.PCAP.PP.KD,,,,,,,...,,,,,,,,,,
2,Afghanistan,AFG,Internet users (per 100 people),IT.NET.USER.P2,,,,,,,...,,,,,,,,,,
3,Afghanistan,AFG,Personal computers (per 100 people),IT.CMP.PCMP.P2,,,,,,,...,,,,,,,,,,
4,Afghanistan,AFG,"Population, ages 0-14, total",SP.POP.0014.TO,4919245.0,5087088.0,5252450.0,5411955.0,5559831.0,5688570.0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1507,Zimbabwe,ZWE,Internet users (per 100 people),IT.NET.USER.P2,,,,,,,...,,,,,,,,,,
1508,Zimbabwe,ZWE,Personal computers (per 100 people),IT.CMP.PCMP.P2,,,,,,,...,,,,,,,,,,
1509,Zimbabwe,ZWE,"Population, ages 0-14, total",SP.POP.0014.TO,2494843.0,2583782.0,2677958.0,2775375.0,2872434.0,2967188.0,...,,,,,,,,,,
1510,Zimbabwe,ZWE,"Population, ages 14-19, total",SP.POP.1419.TO.UN,,,,,,,...,,,,,,,,,,


Visualisons le nombre d'indicateurs (retenus) fournis par année passée

In [2156]:
import datetime

today = datetime.date.today();
this_year = today.year

nb_indicators_years = []
nb_indicators_numbers = []
nb_countries_indicated = []
for year in cols[4:-1]:
    if int(year)>=this_year:
        continue
    nb_indicators_years.append(year)
    nb_indicators_numbers.append(len(data.loc[~data[year].isna(),:]["Indicator Code"].unique()))
df_nb_indicators_per_year = pd.DataFrame({'year': nb_indicators_years, 'count': nb_indicators_numbers})
px.bar(df_nb_indicators_per_year, x='year', y='count', title="Nombre max d'indicateurs renseignés par année", labels={"year":"Année", "count":"Nb d'indicateurs"}, width=1000, height=460).show()

Ici, pour l'année 1970, on compte 5 indicateurs différents renseignés au moins une fois. 

In [2157]:
nb_indicators_years = []
nb_indicators_nb = []
df_rep_indicators_per_nb = pd.DataFrame()

dict_rep_indicators_per_nb = {
    "year": [],
    "Nombre indicateurs renseignés": [],
    "Nombre pays": []
}
col_names = []
for i in range(0,8):
    col_names.append(f"""{i} indicateur{"s" if i>1 else ""}""")

for year in cols[4:-1]:
    if int(year)>=this_year:
        continue
    #df_tmp = data.loc[~data[year].isna(),:].assign(nb=1)
    df_tmp = pd.concat([data.loc[~data[year].isna(),:].assign(nb=1), data.loc[data[year].isna(),:].assign(nb=0)])
    df_tmp = df_tmp.groupby(by="Country Code").agg({"nb": "sum"}).reset_index()
    for i in range(0,8):
        dict_rep_indicators_per_nb["year"].append(year)
        dict_rep_indicators_per_nb["Nombre indicateurs renseignés"].append(col_names[i])
        dict_rep_indicators_per_nb["Nombre pays"].append(df_tmp.loc[df_tmp["nb"]==i].shape[0])
    
df_rep_indicators_per_nb = pd.DataFrame(dict_rep_indicators_per_nb)

fig = px.bar(df_rep_indicators_per_nb, x="year", y="Nombre pays", 
    title="Répartition du nombre de pays renseignés par année et par nombre d'indicateurs",
    color = "Nombre indicateurs renseignés",
    labels={"year": "Année", "Nombre indicateurs renseignés": "Nb indicateurs<br>renseignés"},
    height=460, width=1100
)
fig.show()


### <a id='toc1_10_'></a>[Simplification des données selon les années](#toc0_)

Nous avons vu que que les colonnes comportaient les valeurs pour différentes années entre 1970 et 2100 mais pas de manière uniforme.
Ici, nous ne prendrons en compte que la dernière valeur passée définie.

Pour cela, nous définissons une fonction get_latest_value qui va, pour un individu donné, partir de l'année en cours et qui va chercher la 1ère année qui contient une valeur. La recherche se fera jusqu'à 1970.

In [2091]:


def get_latest_value(row):
    current_year = this_year
    while current_year>=1970: 
        s_current_year = str(current_year)
        # cols a été calculé plus haut et contient les noms de toutes les variables du dataframe
        if s_current_year in cols and not pd.isna(row[s_current_year]):
            return row[s_current_year]
        else:
            current_year -= 1
    return None



On ajoute une variable latest_value au dataframe qui contiendra le résultat de l'appel de la fonction get_latest_value pour chaque individu

In [2092]:
data["latest_value"] = data.apply(lambda x: get_latest_value(x), axis=1)
data

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,...,2065,2070,2075,2080,2085,2090,2095,2100,Unnamed: 69,latest_value
0,Afghanistan,AFG,GDP at market prices (constant 2005 US$),NY.GDP.MKTP.KD,,,,,,,...,,,,,,,,,,2.066392e+10
1,Afghanistan,AFG,"GDP per capita, PPP (constant 2011 internation...",NY.GDP.PCAP.PP.KD,,,,,,,...,,,,,,,,,,1.739583e+03
2,Afghanistan,AFG,Internet users (per 100 people),IT.NET.USER.P2,,,,,,,...,,,,,,,,,,1.059573e+01
3,Afghanistan,AFG,Personal computers (per 100 people),IT.CMP.PCMP.P2,,,,,,,...,,,,,,,,,,3.901483e-01
4,Afghanistan,AFG,"Population, ages 0-14, total",SP.POP.0014.TO,4919245.0,5087088.0,5252450.0,5411955.0,5559831.0,5688570.0,...,,,,,,,,,,1.519997e+07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1507,Zimbabwe,ZWE,Internet users (per 100 people),IT.NET.USER.P2,,,,,,,...,,,,,,,,,,2.311999e+01
1508,Zimbabwe,ZWE,Personal computers (per 100 people),IT.CMP.PCMP.P2,,,,,,,...,,,,,,,,,,7.431140e+00
1509,Zimbabwe,ZWE,"Population, ages 0-14, total",SP.POP.0014.TO,2494843.0,2583782.0,2677958.0,2775375.0,2872434.0,2967188.0,...,,,,,,,,,,6.682925e+06
1510,Zimbabwe,ZWE,"Population, ages 14-19, total",SP.POP.1419.TO.UN,,,,,,,...,,,,,,,,,,1.981469e+06


on supprime les colonnes correspondants aux années, pour simplifier le dataframe

In [2093]:
current_year = 2100
while current_year>=1970: 
    s_current_year = str(current_year)
    if s_current_year in cols:
        del data[s_current_year]
    current_year -= 1
    
data.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,Unnamed: 69,latest_value
0,Afghanistan,AFG,GDP at market prices (constant 2005 US$),NY.GDP.MKTP.KD,,20663920000.0
1,Afghanistan,AFG,"GDP per capita, PPP (constant 2011 internation...",NY.GDP.PCAP.PP.KD,,1739.583
2,Afghanistan,AFG,Internet users (per 100 people),IT.NET.USER.P2,,10.59573
3,Afghanistan,AFG,Personal computers (per 100 people),IT.CMP.PCMP.P2,,0.3901483
4,Afghanistan,AFG,"Population, ages 0-14, total",SP.POP.0014.TO,,15199970.0


Calculons le nombre d'indicateurs renseignés par pays

In [2094]:
data.loc[(data["Indicator Code"]=="IT.CMP.PCMP.P2") & (~data["latest_value"].isna())]

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,Unnamed: 69,latest_value
3,Afghanistan,AFG,Personal computers (per 100 people),IT.CMP.PCMP.P2,,0.390148
10,Albania,ALB,Personal computers (per 100 people),IT.CMP.PCMP.P2,,4.593538
17,Algeria,DZA,Personal computers (per 100 people),IT.CMP.PCMP.P2,,1.030597
38,Angola,AGO,Personal computers (per 100 people),IT.CMP.PCMP.P2,,0.646019
45,Antigua and Barbuda,ATG,Personal computers (per 100 people),IT.CMP.PCMP.P2,,20.966370
...,...,...,...,...,...,...
1480,Virgin Islands (U.S.),VIR,Personal computers (per 100 people),IT.CMP.PCMP.P2,,2.783758
1487,West Bank and Gaza,PSE,Personal computers (per 100 people),IT.CMP.PCMP.P2,,5.477750
1494,"Yemen, Rep.",YEM,Personal computers (per 100 people),IT.CMP.PCMP.P2,,2.903922
1501,Zambia,ZMB,Personal computers (per 100 people),IT.CMP.PCMP.P2,,1.142108


In [2095]:
df_indic_pays = data.loc[~data["latest_value"].isna(),["Country Code", "Indicator Code"]].reset_index(drop=True).groupby(by="Country Code").count().reset_index().rename(columns={"Indicator Code": "Nb indicateurs"})
rep_indicateurs_pays = df_indic_pays.groupby(by="Nb indicateurs").count().reset_index().rename(columns={"Country Code": "Nb pays"})
fig = px.pie(rep_indicateurs_pays, values='Nb pays', names='Nb indicateurs', title="Répartition des pays par Nombre d'indicateurs renseignés", height=500, width=700, labels= {"7": "toto"})
fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide', legend=dict(
    title="Nombre d'indicateurs<br>renseignés"
))
fig.show()


Nous voyons que 81.4% des pays ont les 7 indicateurs choisis remplis.

### <a id='toc1_11_'></a>[Agrégation des indicateurs par pays](#toc0_)
Ici, on transforme le dataframe pour n'obtenir qu'un seul individu par pays. Les différents indicateurs seront des variables.

<ul>
<li>1. On crée un dataframe data2 vide</li>
<li>2. Pour chacun des pays:
<ul>
    <li>2.1. On crée un dictionnaire country_dict  dans lequel on met le code du pays, le nom du pays, ainsi que les valeurs de chaque indicateur.</li>
    <li>2.2. On transforme le dictionnaire en dataframe à 1 ligne: country_df</li>
    <li>2.3. On ajoute country_df à la fin de data2</li>
</ul>    
</li>
<li>3. On renomme data2 en data</li>

In [2096]:
data2 = pd.DataFrame() #1
# on boucle sur les différents pays
for country_code in data["Country Code"].unique(): #2
    '''
    2.1
    '''
    tmp = data.loc[data["Country Code"]==country_code]
    country_dict = {"Country Code": [country_code], "Country Name": [tmp["Country Name"].iloc[0]]} #2.1
    for indicator in indicators:
        country_dict[indicator] = [tmp.loc[tmp["Indicator Code"]==indicator, :]["latest_value"].iloc[0]]

    country_df = pd.DataFrame(country_dict) # 2.2
    data2 = pd.concat([data2, country_df], ignore_index=True) #2.3

data = data2 #3

data.head()


Unnamed: 0,Country Code,Country Name,IT.CMP.PCMP.P2,IT.NET.USER.P2,NY.GDP.PCAP.PP.KD,NY.GDP.MKTP.KD,SP.POP.TOTL,SP.POP.1419.TO.UN,SP.POP.0014.TO
0,AFG,Afghanistan,0.390148,10.595726,1739.583177,20663920000.0,34656032.0,4810680.0,15199971.0
1,ALB,Albania,4.593538,66.363445,11359.099157,13474440000.0,2876101.0,312759.0,509714.0
2,DZA,Algeria,1.030597,42.945527,13921.180022,196034800000.0,40606052.0,3556170.0,11777882.0
3,ASM,American Samoa,,0.0,,534555100.0,55599.0,,
4,AND,Andorra,,97.930637,,3298477000.0,77281.0,4556.0,


### <a id='toc1_12_'></a>[Ajout de la variable Region](#toc0_)

In [2097]:
countries2 = countries.loc[:,["Country Code", "Region"]]
data = data.merge(countries2, on="Country Code")
data.head()

Unnamed: 0,Country Code,Country Name,IT.CMP.PCMP.P2,IT.NET.USER.P2,NY.GDP.PCAP.PP.KD,NY.GDP.MKTP.KD,SP.POP.TOTL,SP.POP.1419.TO.UN,SP.POP.0014.TO,Region
0,AFG,Afghanistan,0.390148,10.595726,1739.583177,20663920000.0,34656032.0,4810680.0,15199971.0,South Asia
1,ALB,Albania,4.593538,66.363445,11359.099157,13474440000.0,2876101.0,312759.0,509714.0,Europe & Central Asia
2,DZA,Algeria,1.030597,42.945527,13921.180022,196034800000.0,40606052.0,3556170.0,11777882.0,Middle East & North Africa
3,ASM,American Samoa,,0.0,,534555100.0,55599.0,,,East Asia & Pacific
4,AND,Andorra,,97.930637,,3298477000.0,77281.0,4556.0,,Europe & Central Asia


### <a id='toc1_13_'></a>[Renommage des colonnes pour simplification](#toc0_)
On utilise le tableau final_indicators initialement défini pour renommer les colonnnes 

In [2098]:
def rename_column(t):
    for f_i in final_indicators:
        if f_i["indicators"]==t:
            return f_i["name"]
    return t

data = data.rename(columns=rename_column).reset_index(drop=True)
data.head()


Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
0,AFG,Afghanistan,0.390148,10.595726,1739.583177,20663920000.0,34656032.0,4810680.0,15199971.0,South Asia
1,ALB,Albania,4.593538,66.363445,11359.099157,13474440000.0,2876101.0,312759.0,509714.0,Europe & Central Asia
2,DZA,Algeria,1.030597,42.945527,13921.180022,196034800000.0,40606052.0,3556170.0,11777882.0,Middle East & North Africa
3,ASM,American Samoa,,0.0,,534555100.0,55599.0,,,East Asia & Pacific
4,AND,Andorra,,97.930637,,3298477000.0,77281.0,4556.0,,Europe & Central Asia


On cherche les infos à compléter

In [2099]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216 entries, 0 to 215
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Code           216 non-null    object 
 1   Country Name           216 non-null    object 
 2   % PC Population        193 non-null    float64
 3   % Internet Population  208 non-null    float64
 4   PIB/habitant           192 non-null    float64
 5   PIB                    202 non-null    float64
 6   Population Totale      215 non-null    float64
 7   Population 1419        192 non-null    float64
 8   Population -14         194 non-null    float64
 9   Region                 214 non-null    object 
dtypes: float64(7), object(3)
memory usage: 17.0+ KB


### <a id='toc1_14_'></a>[Nettoyage](#toc0_)

#### <a id='toc1_14_1_'></a>[Population totale](#toc0_)
On constate un pays qui n'a pas sa population totale. Examinons ses données en détails.

In [2100]:
data.loc[data["Population Totale"].isna()]

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
136,NRU,Nauru,,,,,,,,


Aucune donnée n'est fournie pour ce pays. On décide donc de le supprimer du dataframe

In [2101]:
data = data.loc[~data["Population Totale"].isna()].reset_index(drop=True)
data

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
0,AFG,Afghanistan,0.390148,10.595726,1739.583177,2.066392e+10,34656032.0,4810680.0,15199971.0,South Asia
1,ALB,Albania,4.593538,66.363445,11359.099157,1.347444e+10,2876101.0,312759.0,509714.0,Europe & Central Asia
2,DZA,Algeria,1.030597,42.945527,13921.180022,1.960348e+11,40606052.0,3556170.0,11777882.0,Middle East & North Africa
3,ASM,American Samoa,,0.000000,,5.345551e+08,55599.0,,,East Asia & Pacific
4,AND,Andorra,,97.930637,,3.298477e+09,77281.0,4556.0,,Europe & Central Asia
...,...,...,...,...,...,...,...,...,...,...
210,VIR,Virgin Islands (U.S.),2.783758,59.608316,,3.165495e+09,102951.0,,20839.0,Latin America & Caribbean
211,PSE,West Bank and Gaza,5.477750,61.178385,,1.170060e+10,4551566.0,708845.0,1812828.0,Middle East & North Africa
212,YEM,"Yemen, Rep.",2.903922,24.579208,2325.069960,1.874809e+10,27584213.0,3810911.0,11104607.0,Middle East & North Africa
213,ZMB,Zambia,1.142108,25.506579,3646.962322,2.699875e+10,16591390.0,2014677.0,7481345.0,Sub-Saharan Africa


#### <a id='toc1_14_2_'></a>[Région](#toc0_)

In [2102]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Code           215 non-null    object 
 1   Country Name           215 non-null    object 
 2   % PC Population        193 non-null    float64
 3   % Internet Population  208 non-null    float64
 4   PIB/habitant           192 non-null    float64
 5   PIB                    202 non-null    float64
 6   Population Totale      215 non-null    float64
 7   Population 1419        192 non-null    float64
 8   Population -14         194 non-null    float64
 9   Region                 214 non-null    object 
dtypes: float64(7), object(3)
memory usage: 16.9+ KB


On constate aussi qu'un pays n'a pas de région

In [2103]:
data.loc[data["Region"].isna()]

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
73,GIB,Gibraltar,54.842602,94.444472,,,34408.0,,,


En l'occurence un seul pays. Le pays étant très petit, nous décidons de le sortir de notre dataframe (on considerera qu'il appartient au Royaume-Uni):

In [2104]:
data = data.loc[data["Country Code"]!="GIB"].reset_index(drop=True)

In [2105]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Code           214 non-null    object 
 1   Country Name           214 non-null    object 
 2   % PC Population        192 non-null    float64
 3   % Internet Population  207 non-null    float64
 4   PIB/habitant           192 non-null    float64
 5   PIB                    202 non-null    float64
 6   Population Totale      214 non-null    float64
 7   Population 1419        192 non-null    float64
 8   Population -14         194 non-null    float64
 9   Region                 214 non-null    object 
dtypes: float64(7), object(3)
memory usage: 16.8+ KB


### <a id='toc1_15_'></a>[Imputation](#toc0_)
#### <a id='toc1_15_1_'></a>[PIB / habitant](#toc0_)
Pour les individus dont le PIB/habitant n'est pas fourni, on divise le PIB par le nombre total d'habitants

In [2106]:
data.loc[(data["PIB/habitant"].isna()),"PIB/habitant"] = data.loc[(data["PIB/habitant"].isna()),"PIB"] / data.loc[(data["PIB/habitant"].isna()),"Population Totale"]
data.loc[(data["PIB/habitant"].isna())]

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
38,CHI,Channel Islands,,,,,164541.0,,24052.0,Europe & Central Asia
49,CUW,Curacao,,,,,153500.0,,29702.0,Latin America & Caribbean
67,PYF,French Polynesia,10.985389,68.440419,,,280208.0,,66224.0,East Asia & Pacific
100,PRK,"Korea, Dem. People’s Rep.",,0.0,,,25368620.0,2321592.0,5294350.0,East Asia & Pacific
137,NCL,New Caledonia,17.489091,74.001753,,,278000.0,,63099.0,East Asia & Pacific
168,SXM,Sint Maarten (Dutch part),,,,,39689.0,,,Latin America & Caribbean
172,SOM,Somalia,0.885798,1.88,,,14317996.0,1434880.0,6663518.0,Sub-Saharan Africa
179,MAF,St. Martin (French part),,,,,31264.0,,,Latin America & Caribbean
186,SYR,Syrian Arab Republic,9.201759,31.869627,,,18430453.0,2726938.0,6886362.0,Middle East & North Africa
197,TCA,Turks and Caicos Islands,,0.0,,,34900.0,,,Latin America & Caribbean


#### <a id='toc1_15_2_'></a>[Population de 14 à 19 ans](#toc0_)

Nous constatons qu'il y a encore des valeurs non renseignées pour la variable "Population 1419"
Pour les imputer, nous allons faire l'hypothèse que la population est uniformément répartie en fonction de l'âge entre 0 et 19 ans. Nous aurons donc:
> Population 1419 = $\frac{"Population -14" * 6}{14}$ 

In [2107]:
data.loc[(data["Population 1419"].isna()),"Population 1419"] = data["Population -14"] * 6 / 14

data.loc[data["Population -14"].isna() | data["Population 1419"].isna()]

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
3,ASM,American Samoa,,0.0,9614.472672,534555100.0,55599.0,,,East Asia & Pacific
4,AND,Andorra,,97.930637,42681.603824,3298477000.0,77281.0,4556.0,,Europe & Central Asia
21,BMU,Bermuda,22.296631,98.0,50669.31477,5151445000.0,65331.0,4413.0,,North America
35,CYM,Cayman Islands,,79.0,49903.02854,,60765.0,3130.0,,Latin America & Caribbean
54,DMA,Dominica,18.485603,67.03,10151.342523,506021900.0,73543.0,8196.0,,Latin America & Caribbean
63,FRO,Faroe Islands,6.7251,95.108682,46850.747947,2301168000.0,49117.0,,,Europe & Central Asia
74,GRL,Greenland,10.750954,68.500874,41382.527421,2325119000.0,56186.0,,,Europe & Central Asia
91,IMN,Isle of Man,,,83474.031719,6989865000.0,83737.0,,,Europe & Central Asia
102,XKX,Kosovo,,,9331.687718,7065142000.0,1816200.0,,,Europe & Central Asia
111,LIE,Liechtenstein,,98.093904,134932.471674,5082366000.0,37666.0,2513.0,,Europe & Central Asia


#### <a id='toc1_15_3_'></a>[Population de moins de 14 ans](#toc0_)
Ici, on peut encore, en faisant encore l'hypothèse d'une population uniformément répartie sur les ages, déduire la Population de moins de 14 ans:
> Population -14 = $\frac{"Population -1419" * 14}{6}$ 

In [2108]:
data.loc[(data["Population -14"].isna()),"Population -14"] = data["Population 1419"] * 6 / 14

data.loc[data["Population -14"].isna() | data["Population 1419"].isna()]

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
3,ASM,American Samoa,,0.0,9614.472672,534555100.0,55599.0,,,East Asia & Pacific
63,FRO,Faroe Islands,6.7251,95.108682,46850.747947,2301168000.0,49117.0,,,Europe & Central Asia
74,GRL,Greenland,10.750954,68.500874,41382.527421,2325119000.0,56186.0,,,Europe & Central Asia
91,IMN,Isle of Man,,,83474.031719,6989865000.0,83737.0,,,Europe & Central Asia
102,XKX,Kosovo,,,9331.687718,7065142000.0,1816200.0,,,Europe & Central Asia
128,MCO,Monaco,,95.208178,138982.176247,5350675000.0,38499.0,,,Europe & Central Asia
142,MNP,Northern Mariana Islands,,0.0,19115.376475,1051785000.0,55023.0,,,East Asia & Pacific
160,SMR,San Marino,80.554367,49.6,56500.469107,1746352000.0,33203.0,,,Europe & Central Asia
168,SXM,Sint Maarten (Dutch part),,,,,39689.0,,,Latin America & Caribbean
179,MAF,St. Martin (French part),,,,,31264.0,,,Latin America & Caribbean


### <a id='toc1_16_'></a>[Nettoyage des autres indicateurs](#toc0_)
Il nous reste à voir les pays avec des données encore manquantes

In [2109]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Code           214 non-null    object 
 1   Country Name           214 non-null    object 
 2   % PC Population        192 non-null    float64
 3   % Internet Population  207 non-null    float64
 4   PIB/habitant           204 non-null    float64
 5   PIB                    202 non-null    float64
 6   Population Totale      214 non-null    float64
 7   Population 1419        202 non-null    float64
 8   Population -14         202 non-null    float64
 9   Region                 214 non-null    object 
dtypes: float64(7), object(3)
memory usage: 16.8+ KB


In [2110]:
data.loc[(data["% PC Population"].isna() | data["% Internet Population"].isna() | data["PIB/habitant"].isna() |  data["Population 1419"].isna() |  data["Population -14"].isna() )]

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
3,ASM,American Samoa,,0.0,9614.472672,534555100.0,55599.0,,,East Asia & Pacific
4,AND,Andorra,,97.930637,42681.603824,3298477000.0,77281.0,4556.0,1952.571,Europe & Central Asia
35,CYM,Cayman Islands,,79.0,49903.02854,,60765.0,3130.0,1341.429,Latin America & Caribbean
38,CHI,Channel Islands,,,,,164541.0,10308.0,24052.0,Europe & Central Asia
49,CUW,Curacao,,,,,153500.0,12729.43,29702.0,Latin America & Caribbean
63,FRO,Faroe Islands,6.7251,95.108682,46850.747947,2301168000.0,49117.0,,,Europe & Central Asia
67,PYF,French Polynesia,10.985389,68.440419,,,280208.0,28381.71,66224.0,East Asia & Pacific
74,GRL,Greenland,10.750954,68.500874,41382.527421,2325119000.0,56186.0,,,Europe & Central Asia
76,GUM,Guam,,77.01,32013.769377,5214915000.0,162896.0,17519.57,40879.0,East Asia & Pacific
91,IMN,Isle of Man,,,83474.031719,6989865000.0,83737.0,,,Europe & Central Asia


Nous constatons ici qu'une grosse partie des pays pour lesquels il manque des informations ont leur population relativement petite.
Nous allons donc supprimer les individus "à petite population" (défini par le seuil population_threshold)


In [2111]:
data = data.loc[data["Population Totale"]>population_threshold].reset_index(drop=True)

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147 entries, 0 to 146
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Code           147 non-null    object 
 1   Country Name           147 non-null    object 
 2   % PC Population        142 non-null    float64
 3   % Internet Population  146 non-null    float64
 4   PIB/habitant           144 non-null    float64
 5   PIB                    143 non-null    float64
 6   Population Totale      147 non-null    float64
 7   Population 1419        147 non-null    float64
 8   Population -14         147 non-null    float64
 9   Region                 147 non-null    object 
dtypes: float64(7), object(3)
memory usage: 11.6+ KB


### <a id='toc1_17_'></a>[Nettoyage](#toc0_)

In [2112]:
data.loc[(data["% PC Population"].isna() | data["% Internet Population"].isna() | data["PIB/habitant"].isna() |  data["Population 1419"].isna() |  data["Population -14"].isna() )]

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,PIB,Population Totale,Population 1419,Population -14,Region
65,KAZ,Kazakhstan,,76.8,23443.093603,188309200000.0,17797032.0,1269065.0,4876864.0,Europe & Central Asia
67,PRK,"Korea, Dem. People’s Rep.",,0.0,,,25368620.0,2321592.0,5294350.0,East Asia & Pacific
74,LBR,Liberia,,7.319106,753.557379,1627047000.0,4613823.0,659375.0,1942948.0,Sub-Saharan Africa
114,SLE,Sierra Leone,,11.773184,1368.829098,3369622000.0,7396190.0,876909.0,3137091.0,Sub-Saharan Africa
118,SOM,Somalia,0.885798,1.88,,,14317996.0,1434880.0,6663518.0,Sub-Saharan Africa
120,SSD,South Sudan,,,1964.63464,,11296173.0,2038944.0,4757537.0,Sub-Saharan Africa
126,SYR,Syrian Arab Republic,9.201759,31.869627,,,18430453.0,2726938.0,6886362.0,Middle East & North Africa


Nous supprimons les pays restant qui n'ont pas les informations complètes

In [2113]:
data = data.loc[~(data["% PC Population"].isna() | data["% Internet Population"].isna() | data["PIB/habitant"].isna() |  data["Population 1419"].isna() |  data["Population -14"].isna() )].reset_index(drop=True)

Nous n'avons plus besoin de la variable PIB (nous utiliserons la variable PIB/habitant)

In [2114]:
del data["PIB"]

### <a id='toc1_18_'></a>[Calcul de la population de plus de 20 ans](#toc0_)

Nous calculons la population de plus de 20 ans via la formule:
> "Population 20+" = "Population Totale" - "Population -14" - "Population 1419"

In [2115]:
data["Population 20+"] = data["Population Totale"] - data["Population -14"] - data["Population 1419"]

In [2116]:
del data["Population 1419"]
del data["Population -14"]

data.head()

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,Population Totale,Region,Population 20+
0,AFG,Afghanistan,0.390148,10.595726,1739.583177,34656032.0,South Asia,14645381.0
1,ALB,Albania,4.593538,66.363445,11359.099157,2876101.0,Europe & Central Asia,2053628.0
2,DZA,Algeria,1.030597,42.945527,13921.180022,40606052.0,Middle East & North Africa,25272000.0
3,AGO,Angola,0.646019,13.0,5984.640422,28813463.0,Sub-Saharan Africa,12527965.0
4,ARG,Argentina,9.05613,70.150764,18489.434893,43847430.0,Latin America & Caribbean,28761809.0


Faisons une vérification rapide de la cohérence des données

In [2117]:
data.describe()

Unnamed: 0,% PC Population,% Internet Population,PIB/habitant,Population Totale,Population 20+
count,140.0,140.0,140.0,140.0,140.0
mean,17.386068,49.248197,18052.863678,52041580.0,33435830.0
std,23.759731,28.187377,19444.178794,164863200.0,112016500.0
min,0.022211,1.177119,647.880445,2038501.0,845843.0
25%,1.715906,24.208873,3628.291879,5937780.0,3765750.0
50%,6.698709,52.840904,11343.25567,12156720.0,7646328.0
75%,22.993469,73.907339,25764.40843,38355720.0,22187690.0
max,96.170657,97.298204,118207.134623,1378665000.0,1022082000.0


Ici les pourcentages sont entre 0 et 100, les populations sont toutes au dela de 2 000 000 (seuil défini précédemment).

On voit que la population max est plus de 26 fois la moyenne de tous les pays, mais il s'agit de la Chine, ce n'est donc pas un outlier. Idem pour les autres points (Inde, Etats-Unis, Indonésie, Brésil, Pakistan, Nigéria ...)

In [2118]:
data.sort_values(by="Population Totale", ascending=False).head(10)

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,Population Totale,Region,Population 20+
26,CHN,China,5.588372,53.2,14399.44766,1378665000.0,East Asia & Pacific,1022082000.0
55,IND,India,3.192145,29.547163,6092.648097,1324171000.0,South Asia,803038800.0
131,USA,United States,79.915488,76.176737,53341.815958,323127500.0,North America,235547500.0
56,IDN,Indonesia,1.980591,25.366301,10764.547997,261115500.0,East Asia & Pacific,163646800.0
16,BRA,Brazil,16.116693,59.682747,14023.691623,207652900.0,Latin America & Caribbean,140915300.0
95,PAK,Pakistan,0.454261,15.514558,4854.632259,193203500.0,South Asia,102531200.0
92,NGA,Nigeria,0.859686,25.67,5434.734795,185989600.0,Sub-Saharan Africa,80303510.0
9,BGD,Bangladesh,2.415981,18.246938,3319.353556,162951600.0,South Asia,94912810.0
106,RUS,Russian Federation,13.220608,76.409085,24026.004916,144342400.0,Europe & Central Asia,111536600.0
80,MEX,Mexico,13.378918,59.540446,16832.463042,127540400.0,Latin America & Caribbean,80695950.0


In [2119]:
fig = go.Figure()
fig.add_trace(go.Box(x=data["Population Totale"], boxmean=True, name="Population Totale"))
fig.add_trace(go.Box(x=data["Population 20+"], boxmean=True, name="Population 20+"))
fig.update_layout(
    title="Répartition des Populations Totales des différents pays",
    height=600
)


fig.show()

### <a id='toc1_19_'></a>[Aggregation des données par régions](#toc0_)

Nous allons créer un 2ème dataframe data_regions qui reprendra les informations du premier dataframe en les regroupant par région

In [2120]:
def weighted(x, cols, w="Population Totale"):
    return pd.Series(np.average(x[cols], weights=x[w], axis=0), cols)

def summed(x, cols):
    return pd.Series(np.sum(x[cols], axis=0), cols)


data_regions1 = data.groupby("Region").apply(weighted, ["% PC Population", "% Internet Population", "PIB/habitant"]).reset_index()
data_regions2 = data.groupby("Region").apply(summed, ["Population Totale", "Population 20+"]).reset_index()

data_regions = data_regions2.merge(data_regions1, on="Region")
data_regions.head(10)

Unnamed: 0,Region,Population Totale,Population 20+,% PC Population,% Internet Population,PIB/habitant
0,East Asia & Pacific,2242385000.0,1605094000.0,9.83542,52.550446,15619.980566
1,Europe & Central Asia,885820400.0,670043900.0,33.881344,73.76564,29163.031712
2,Latin America & Caribbean,632734300.0,408090900.0,12.022304,56.355476,14201.591426
3,Middle East & North Africa,415485800.0,250832100.0,11.31006,48.367235,18318.394299
4,North America,359413900.0,263624800.0,81.382559,77.55618,52306.566379
5,South Asia,1765168000.0,1044994000.0,2.727815,26.464197,5617.309026
6,Sub-Saharan Africa,984814100.0,438337000.0,1.84865,20.378028,3476.605873


### <a id='toc1_20_'></a>[Visualisation globale des régions](#toc0_)

Nous allons utiliser graphiques en radar pour avoir une vue comparative des régions

In [2121]:

data_regions_normalized = data_regions.copy()
subjects = list(data_regions)[1:]
for s in subjects:
  data_regions_normalized[s] = data_regions_normalized[s] / np.max(data_regions_normalized[s])
subjects.append(subjects[0])

fig = go.Figure()

for region in data_regions["Region"]:
  r = data_regions_normalized.loc[data_regions["Region"]==region,].iloc[0].values.tolist()
  r = r[1:]
  r.append(r[0])
  fig.add_trace(go.Scatterpolar(
    r=r,
    theta=subjects,
    fill='toself',
    name=region
  ))

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[0, 1]
    )),
  showlegend=True,
  height=600,
  width=1000,
  title=go.layout.Title(
        text="Répartition en radar des indicateurs retenus<br><sup>Chaque indicateur est rapporté par rapport au max de la variable correspondante</sup>",
        xref="paper",
        x=0
    )
)

fig.show()

Avec la vision globale en radar, nous voyons quels sont les indicateurs en faveur de chaque région.

Nous allons comparer les valeurs de chaque région via des boites à moustaches.

In [2122]:

indicateurs= list(data_regions)
indicateurs = subjects[1:]

data_regions = data_regions2.merge(data_regions1, on="Region")

# Use x instead of y argument for horizontal plot


for ind0,indicateur in enumerate(indicateurs):

    fig = go.Figure()
    moustaches_regions = []
    region_names = []
    for ind, region in enumerate(data_regions["Region"]):
        vals = data.loc[data["Region"]==region, indicateur]
        region_names.append(region)
        fig.add_trace(go.Box(x=vals, boxmean=True, name=region))
    fig.update_layout(
        title=indicateur
    )
    fig.show()


### <a id='toc1_21_'></a>[Création d'un indicateur composite](#toc0_)
Afin de pouvoir comparer les pays entre-eux, nous allons créer un indicateur "indice_confiance" qui synthétisera les différents indicateurs.

Pour rappel, nous avons comme indicateurs:
- La population totale
- La population des plus de 20 ans
- Le PIB/habitant
- Le pourcentage de PC par rapport à la population
- Le pourcentage de connexion Internet par rapport à la population


Au niveau individuel, les prospects doivent avoir un PC et une connexion Internet. Leur PIB/habitant comptera aussi dans le calcul individuel. Nous utiliserons un calcul des aires des triangles pour ce calcul:
> conf_individu = "% PC Population" * "% Internet Population" + "% PC Population" * 100 * $\frac{"PIB/habitant"}{max("PIB/habitant")}$ + "% Internet Population" * 100 * $\frac{"PIB/habitant"}{max("PIB/habitant")}$

<img src="assets/triangles.PNG">

L'indice de confiance individuel consiste à calculer les sommes des aires des triangles formés par:
- Le centre du camembert, le PIB/habitant et le pourcentage de PC dans la population
- Le centre du camembert, le PIB/habitant et le pourcentage de connexion Internet dans la population
- Le centre du camembert, le pourcentage de PC dans la population et le pourcentage de connexion Internet dans la population

<ins>Remarque:</ins> la véritable formule de l'aire d'un triangle est longueur A * longueur B * sin(angle(A,B)) . Nous avons 3 angles égaux, et nous savons qu'ensuite les valeurs seront normalisées, on peut donc supprimer les sinus du calcul.  

Au niveau des populations, on multipliera conf_individu par la racine carrée de la some de population agée de plus de 20 ans, et la population agée de moins de 20 ans multipliée par 0.25 (pour considérer le potentiel à plus long terme)

La formule utilisée sera donc:
> conf_globale = conf_individu * $\frac{"PIB/habitant"}{max("PIB/habitant")}$) * $\sqrt{( "Population 20+" + 0.25 * "Population 20-" )}$

ou encore :
> conf_globale = ("% PC Population" * "% Internet Population" + "% PC Population" * 100 * $\frac{"PIB/habitant"}{max("PIB/habitant")}$ + "% Internet Population" * 100 * $\frac{"PIB/habitant"}{max("PIB/habitant")}$) * $\sqrt{( "Population 20+" + 0.25 * "Population 20-" )}$

Nous avons utilisé ici la racine carrée de la population pour diminuer les effets des pays à très grande population (Chine et Inde)

#### <a id='toc1_21_1_'></a>[Indicateur de confiance appliqué aux régions](#toc0_)

In [2123]:
from math import sqrt

data_regions["Population 20-"] = data_regions["Population Totale"] - data_regions["Population 20+"]

def calcul_indicateur_confiance(x):
    return x["% PC Population"] * x["% Internet Population"] + x["% Internet Population"] * 100 * x["PIB/habitant"] / x["PIB/habitant"].max()

data_regions["indicateur_confiance_individuel"] = calcul_indicateur_confiance(data_regions)
#On normalise par rapport aux min et max
data_regions["indicateur_confiance_individuel"] = (data_regions["indicateur_confiance_individuel"] - data_regions["indicateur_confiance_individuel"].min()) / (data_regions["indicateur_confiance_individuel"].max() - data_regions["indicateur_confiance_individuel"].min())

data_regions["indicateur_confiance_global"] = round(data_regions["indicateur_confiance_individuel"] * np.sqrt(data_regions["Population 20+"] + 0.25 * data_regions["Population 20-"]))
#On normalise par rapport aux min et max
data_regions["indicateur_confiance_global"] = (data_regions["indicateur_confiance_global"] - data_regions["indicateur_confiance_global"].min()) / (data_regions["indicateur_confiance_global"].max() - data_regions["indicateur_confiance_global"].min())


data_regions.head(10)

Unnamed: 0,Region,Population Totale,Population 20+,% PC Population,% Internet Population,PIB/habitant,Population 20-,indicateur_confiance_individuel,indicateur_confiance_global
0,East Asia & Pacific,2242385000.0,1605094000.0,9.83542,52.550446,15619.980566,637291200.0,0.137685,0.341019
1,Europe & Central Asia,885820400.0,670043900.0,33.881344,73.76564,29163.031712,215776500.0,0.463423,0.735287
2,Latin America & Caribbean,632734300.0,408090900.0,12.022304,56.355476,14201.591426,224643400.0,0.146427,0.186048
3,Middle East & North Africa,415485800.0,250832100.0,11.31006,48.367235,18318.394299,164653700.0,0.148824,0.149959
4,North America,359413900.0,263624800.0,81.382559,77.55618,52306.566379,95789160.0,1.0,1.0
5,South Asia,1765168000.0,1044994000.0,2.727815,26.464197,5617.309026,720173900.0,0.013191,0.027244
6,Sub-Saharan Africa,984814100.0,438337000.0,1.84865,20.378028,3476.605873,546477100.0,0.0,0.0


Représentons cet indice de confiance sous forme de camembert pour les régions:

In [2124]:
from plotly.subplots import make_subplots



fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]], subplot_titles=['Par personne', 'Global'])
fig.add_trace(go.Pie(labels=data_regions['Region'], values=data_regions['indicateur_confiance_individuel'], name="Par personne"), 1, 1)
fig.add_trace(go.Pie(labels=data_regions['Region'], values=data_regions['indicateur_confiance_global'], name="Global"), 1, 2)
fig.update_layout(
    title="Répartition de l'indicateur de confiance par région",
    width=1400,
    height=800
)
fig.show()

#### <a id='toc1_21_2_'></a>[Indicateur de confiance appliqué aux pays](#toc0_)

In [2125]:
data["Population 20-"] = data["Population Totale"] - data["Population 20+"]

data["indicateur_confiance_individuel"] = calcul_indicateur_confiance(data)
data["indicateur_confiance_individuel"] = round((data["indicateur_confiance_individuel"] - data["indicateur_confiance_individuel"].min()) / (data["indicateur_confiance_individuel"].max() - data["indicateur_confiance_individuel"].min()), 3)

data["indicateur_confiance_global"] = round(data["indicateur_confiance_individuel"] * np.sqrt(data["Population 20+"] + 0.25 * data["Population 20-"]))
data["indicateur_confiance_global"] = (data["indicateur_confiance_global"] - data["indicateur_confiance_global"].min()) / (data["indicateur_confiance_global"].max() - data["indicateur_confiance_global"].min())


Nous pouvons facilement voir les 10 premiers pays selon notre indicateur de confiance par personne:

In [2126]:
data.sort_values(by="indicateur_confiance_individuel", ascending=False).head(10)

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,Population Totale,Region,Population 20+,Population 20-,indicateur_confiance_individuel,indicateur_confiance_global
119,CHE,Switzerland,96.170657,89.405568,57430.053265,8372098.0,Europe & Central Asia,6598551.0,1773547.0,1.0,0.224744
88,NLD,Netherlands,90.987849,90.410959,47302.70257,17018408.0,Europe & Central Asia,13000732.0,4017676.0,0.915,0.289948
22,CAN,Canada,94.446705,89.84,43087.757365,36286425.0,North America,28077236.0,8209189.0,0.909,0.422559
118,SWE,Sweden,88.011928,91.506828,46662.050625,9903122.0,Europe & Central Asia,7585659.0,2317463.0,0.901,0.218054
111,SGP,Singapore,72.061563,81.0,81443.356456,5607283.0,East Asia & Pacific,4388540.0,1218743.0,0.882,0.161826
93,NOR,Norway,62.775903,97.298204,64179.038815,5232929.0,Europe & Central Asia,3921272.0,1311657.0,0.88,0.153612
104,QAT,Qatar,16.22568,94.291118,118207.134623,2569804.0,Middle East & North Africa,2141447.0,428357.0,0.847,0.107545
130,GBR,United Kingdom,80.170361,94.775801,39229.848765,65637239.0,Europe & Central Asia,49706602.0,15930637.0,0.83,0.515031
53,HKG,"Hong Kong SAR, China",69.39403,87.300343,54279.183474,7346700.0,East Asia & Pacific,6056128.0,1290572.0,0.778,0.166399
131,USA,United States,79.915488,76.176737,53341.815958,323127513.0,North America,235547543.0,87579970.0,0.736,1.0


Faisons de même pour l'indice de confiance global

In [2127]:
data.sort_values(by="indicateur_confiance_global", ascending=False).head(10)

Unnamed: 0,Country Code,Country Name,% PC Population,% Internet Population,PIB/habitant,Population Totale,Region,Population 20+,Population 20-,indicateur_confiance_individuel,indicateur_confiance_global
131,USA,United States,79.915488,76.176737,53341.815958,323127513.0,North America,235547543.0,87579970.0,0.736,1.0
130,GBR,United Kingdom,80.170361,94.775801,39229.848765,65637239.0,Europe & Central Asia,49706602.0,15930637.0,0.83,0.515031
46,DEU,Germany,64.485079,89.647101,44260.359679,82667685.0,Europe & Central Asia,66923325.0,15744360.0,0.706,0.50326
63,JPN,Japan,41.107747,92.0,38252.301138,126994511.0,East Asia & Pacific,103303791.0,23690720.0,0.522,0.461936
22,CAN,Canada,94.446705,89.84,43087.757365,36286425.0,North America,28077236.0,8209189.0,0.909,0.422559
43,FRA,France,64.677574,85.6222,38058.873046,66896109.0,Europe & Central Asia,50128686.0,16767423.0,0.641,0.400034
66,KOR,"Korea, Rep.",58.618921,92.716545,34985.849333,51245707.0,East Asia & Pacific,40659488.0,10586219.0,0.632,0.352189
88,NLD,Netherlands,90.987849,90.410959,47302.70257,17018408.0,Europe & Central Asia,13000732.0,4017676.0,0.915,0.289948
108,SAU,Saudi Arabia,65.234645,73.750904,50458.169847,32275687.0,Middle East & North Africa,20852545.0,11423142.0,0.615,0.25362
6,AUS,Australia,60.140967,88.238658,44414.029479,24127159.0,East Asia & Pacific,17899247.0,6227912.0,0.666,0.248793


Nous pouvons visualiser la répartition des indicateurs de confiance de tous les pays

In [2128]:
fig = make_subplots(rows = 2, cols = 1, horizontal_spacing = 0.05)
fig.append_trace(go.Histogram(x=data["indicateur_confiance_individuel"], nbinsx=data.shape[0], name=""), 1, 1)
fig.append_trace(go.Box(x=data["indicateur_confiance_individuel"], boxmean=True, name=""), 2, 1)
fig.update_layout(
    title="Répartition des indices de confiance par personne pour tous les pays",
    margin={"t": 50},
    showlegend= False
)
fig.show()

fig = make_subplots(rows = 2, cols = 1, horizontal_spacing = 0.05)
fig.append_trace(go.Histogram(x=data["indicateur_confiance_global"], name="", nbinsx=data.shape[0]), 1, 1)
fig.append_trace(go.Box(x=data["indicateur_confiance_global"], boxmean=True, name=""), 2, 1)
fig.update_layout(
    title="Répartition des indices de confiance global pour tous les pays",
    margin={"t": 50},
    showlegend= False
)
fig.show()

Visualisons la répartition de l'indicateur de confiance individuel au sein de chaque région du monde

In [2129]:
from IPython.display import display_markdown

for region in data["Region"].unique():
    f = data.loc[data["Region"]==region, :]
    fig = make_subplots(rows = 2, cols = 1, horizontal_spacing = 0.05)
    fig.append_trace(go.Histogram(x=f["indicateur_confiance_individuel"], nbinsx=data.shape[0], name=""), 1, 1)
    fig.append_trace(go.Box(x=f["indicateur_confiance_individuel"], boxmean=True, name=""), 2, 1)
    fig.update_layout(
        title="Répartition des indices de confiance par personne pour la région "+region,
        margin={"t": 50},
        showlegend= False
    )
    fig.show()

    fig = make_subplots(rows = 2, cols = 1, horizontal_spacing = 0.05)
    fig.append_trace(go.Histogram(x=f["indicateur_confiance_global"], name="", nbinsx=data.shape[0]), 1, 1)
    fig.append_trace(go.Box(x=f["indicateur_confiance_global"], boxmean=True, name=""), 2, 1)
    fig.update_layout(
        title="Répartition des indices de confiance global pour la région "+region,
        margin={"t": 50},
        showlegend= False
    )
    fig.show()

    t_individuel = data.loc[data["Region"]==region].sort_values(by="indicateur_confiance_individuel", ascending=False).loc[:, ["Country Name", "indicateur_confiance_individuel"]].reset_index(drop=True).to_html()
    moyenne = data.loc[data["Region"]==region]["indicateur_confiance_individuel"].mean()
    median = data.loc[data["Region"]==region]["indicateur_confiance_individuel"].median()
    ecart_type = data.loc[data["Region"]==region]["indicateur_confiance_individuel"].std()
    t2_individuel = f"<br>Moyenne: {moyenne}<br>Médiane: {median}<br>Ecart-type={ecart_type}"

    t_global = data.loc[data["Region"]==region].sort_values(by="indicateur_confiance_global", ascending=False).loc[:, ["Country Name", "indicateur_confiance_global"]].reset_index(drop=True).to_html()
    moyenne = data.loc[data["Region"]==region]["indicateur_confiance_global"].mean()
    median = data.loc[data["Region"]==region]["indicateur_confiance_global"].median()
    ecart_type = data.loc[data["Region"]==region]["indicateur_confiance_global"].std()
    t2_global = f"<br>Moyenne: {moyenne}<br>Médiane: {median}<br>Ecart-type={ecart_type}"

    with pd.option_context('display.max_rows', None, 'display.max_columns', None):
        display_markdown(f"""<table border="0" style="border-collapse:collapse;border: 1px solid black;"><tr><td style="border: 1px solid black;">{t_individuel}{t2_individuel}</td><td>{t_global}{t2_global}</td></tr></table>""", raw=True)


<table border="0" style="border-collapse:collapse;border: 1px solid black;"><tr><td style="border: 1px solid black;"><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_individuel</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Sri Lanka</td>
      <td>0.033</td>
    </tr>
    <tr>
      <th>1</th>
      <td>India</td>
      <td>0.019</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Bangladesh</td>
      <td>0.007</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Pakistan</td>
      <td>0.005</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Nepal</td>
      <td>0.004</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Afghanistan</td>
      <td>0.001</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.011500000000000002<br>Médiane: 0.006<br>Ecart-type=0.012227019260637483</td><td><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_global</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>India</td>
      <td>0.049115</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Sri Lanka</td>
      <td>0.011178</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Bangladesh</td>
      <td>0.006266</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Pakistan</td>
      <td>0.004742</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Nepal</td>
      <td>0.001440</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Afghanistan</td>
      <td>0.000339</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.012179975724729725<br>Médiane: 0.005504276399356423<br>Ecart-type=0.018499022220664196</td></tr></table>

<table border="0" style="border-collapse:collapse;border: 1px solid black;"><tr><td style="border: 1px solid black;"><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_individuel</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Switzerland</td>
      <td>1.000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Netherlands</td>
      <td>0.915</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Sweden</td>
      <td>0.901</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Norway</td>
      <td>0.880</td>
    </tr>
    <tr>
      <th>4</th>
      <td>United Kingdom</td>
      <td>0.830</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Ireland</td>
      <td>0.711</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Germany</td>
      <td>0.706</td>
    </tr>
    <tr>
      <th>7</th>
      <td>Denmark</td>
      <td>0.702</td>
    </tr>
    <tr>
      <th>8</th>
      <td>France</td>
      <td>0.641</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Austria</td>
      <td>0.640</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Finland</td>
      <td>0.566</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Slovak Republic</td>
      <td>0.514</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Belgium</td>
      <td>0.489</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Spain</td>
      <td>0.421</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Slovenia</td>
      <td>0.395</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Czech Republic</td>
      <td>0.318</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Italy</td>
      <td>0.313</td>
    </tr>
    <tr>
      <th>17</th>
      <td>Hungary</td>
      <td>0.289</td>
    </tr>
    <tr>
      <th>18</th>
      <td>Lithuania</td>
      <td>0.284</td>
    </tr>
    <tr>
      <th>19</th>
      <td>Macedonia, FYR</td>
      <td>0.261</td>
    </tr>
    <tr>
      <th>20</th>
      <td>Portugal</td>
      <td>0.224</td>
    </tr>
    <tr>
      <th>21</th>
      <td>Poland</td>
      <td>0.220</td>
    </tr>
    <tr>
      <th>22</th>
      <td>Croatia</td>
      <td>0.204</td>
    </tr>
    <tr>
      <th>23</th>
      <td>Russian Federation</td>
      <td>0.198</td>
    </tr>
    <tr>
      <th>24</th>
      <td>Romania</td>
      <td>0.170</td>
    </tr>
    <tr>
      <th>25</th>
      <td>Greece</td>
      <td>0.160</td>
    </tr>
    <tr>
      <th>26</th>
      <td>Serbia</td>
      <td>0.144</td>
    </tr>
    <tr>
      <th>27</th>
      <td>Azerbaijan</td>
      <td>0.129</td>
    </tr>
    <tr>
      <th>28</th>
      <td>Bulgaria</td>
      <td>0.121</td>
    </tr>
    <tr>
      <th>29</th>
      <td>Turkey</td>
      <td>0.119</td>
    </tr>
    <tr>
      <th>30</th>
      <td>Bosnia and Herzegovina</td>
      <td>0.085</td>
    </tr>
    <tr>
      <th>31</th>
      <td>Moldova</td>
      <td>0.085</td>
    </tr>
    <tr>
      <th>32</th>
      <td>Belarus</td>
      <td>0.082</td>
    </tr>
    <tr>
      <th>33</th>
      <td>Armenia</td>
      <td>0.080</td>
    </tr>
    <tr>
      <th>34</th>
      <td>Albania</td>
      <td>0.073</td>
    </tr>
    <tr>
      <th>35</th>
      <td>Georgia</td>
      <td>0.051</td>
    </tr>
    <tr>
      <th>36</th>
      <td>Ukraine</td>
      <td>0.044</td>
    </tr>
    <tr>
      <th>37</th>
      <td>Uzbekistan</td>
      <td>0.030</td>
    </tr>
    <tr>
      <th>38</th>
      <td>Turkmenistan</td>
      <td>0.028</td>
    </tr>
    <tr>
      <th>39</th>
      <td>Kyrgyz Republic</td>
      <td>0.013</td>
    </tr>
    <tr>
      <th>40</th>
      <td>Tajikistan</td>
      <td>0.005</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.3424634146341463<br>Médiane: 0.224<br>Ecart-type=0.29785744052826474</td><td><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_global</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>United Kingdom</td>
      <td>0.515031</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Germany</td>
      <td>0.503260</td>
    </tr>
    <tr>
      <th>2</th>
      <td>France</td>
      <td>0.400034</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Netherlands</td>
      <td>0.289948</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Switzerland</td>
      <td>0.224744</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Spain</td>
      <td>0.223812</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Sweden</td>
      <td>0.218054</td>
    </tr>
    <tr>
      <th>7</th>
      <td>Italy</td>
      <td>0.191041</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Russian Federation</td>
      <td>0.183504</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Norway</td>
      <td>0.153612</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Austria</td>
      <td>0.147599</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Denmark</td>
      <td>0.128969</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Belgium</td>
      <td>0.126683</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Ireland</td>
      <td>0.116267</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Poland</td>
      <td>0.105343</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Finland</td>
      <td>0.102210</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Slovak Republic</td>
      <td>0.092811</td>
    </tr>
    <tr>
      <th>17</th>
      <td>Czech Republic</td>
      <td>0.080532</td>
    </tr>
    <tr>
      <th>18</th>
      <td>Turkey</td>
      <td>0.076891</td>
    </tr>
    <tr>
      <th>19</th>
      <td>Hungary</td>
      <td>0.070539</td>
    </tr>
    <tr>
      <th>20</th>
      <td>Romania</td>
      <td>0.058430</td>
    </tr>
    <tr>
      <th>21</th>
      <td>Portugal</td>
      <td>0.056059</td>
    </tr>
    <tr>
      <th>22</th>
      <td>Slovenia</td>
      <td>0.044288</td>
    </tr>
    <tr>
      <th>23</th>
      <td>Greece</td>
      <td>0.040901</td>
    </tr>
    <tr>
      <th>24</th>
      <td>Lithuania</td>
      <td>0.037260</td>
    </tr>
    <tr>
      <th>25</th>
      <td>Croatia</td>
      <td>0.032264</td>
    </tr>
    <tr>
      <th>26</th>
      <td>Azerbaijan</td>
      <td>0.029892</td>
    </tr>
    <tr>
      <th>27</th>
      <td>Serbia</td>
      <td>0.029384</td>
    </tr>
    <tr>
      <th>28</th>
      <td>Macedonia, FYR</td>
      <td>0.028876</td>
    </tr>
    <tr>
      <th>29</th>
      <td>Bulgaria</td>
      <td>0.025320</td>
    </tr>
    <tr>
      <th>30</th>
      <td>Ukraine</td>
      <td>0.022949</td>
    </tr>
    <tr>
      <th>31</th>
      <td>Belarus</td>
      <td>0.019561</td>
    </tr>
    <tr>
      <th>32</th>
      <td>Moldova</td>
      <td>0.012279</td>
    </tr>
    <tr>
      <th>33</th>
      <td>Bosnia and Herzegovina</td>
      <td>0.012279</td>
    </tr>
    <tr>
      <th>34</th>
      <td>Uzbekistan</td>
      <td>0.012109</td>
    </tr>
    <tr>
      <th>35</th>
      <td>Armenia</td>
      <td>0.010331</td>
    </tr>
    <tr>
      <th>36</th>
      <td>Albania</td>
      <td>0.009315</td>
    </tr>
    <tr>
      <th>37</th>
      <td>Georgia</td>
      <td>0.007452</td>
    </tr>
    <tr>
      <th>38</th>
      <td>Turkmenistan</td>
      <td>0.004742</td>
    </tr>
    <tr>
      <th>39</th>
      <td>Kyrgyz Republic</td>
      <td>0.002286</td>
    </tr>
    <tr>
      <th>40</th>
      <td>Tajikistan</td>
      <td>0.001016</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.10848484723309422<br>Médiane: 0.0584300110085528<br>Ecart-type=0.12796655546339883</td></tr></table>

<table border="0" style="border-collapse:collapse;border: 1px solid black;"><tr><td style="border: 1px solid black;"><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_individuel</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Qatar</td>
      <td>0.847</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Saudi Arabia</td>
      <td>0.615</td>
    </tr>
    <tr>
      <th>2</th>
      <td>United Arab Emirates</td>
      <td>0.599</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Kuwait</td>
      <td>0.511</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Israel</td>
      <td>0.328</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Oman</td>
      <td>0.280</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Lebanon</td>
      <td>0.126</td>
    </tr>
    <tr>
      <th>7</th>
      <td>Iran, Islamic Rep.</td>
      <td>0.107</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Tunisia</td>
      <td>0.073</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Jordan</td>
      <td>0.070</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Morocco</td>
      <td>0.054</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Algeria</td>
      <td>0.042</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Egypt, Arab Rep.</td>
      <td>0.039</td>
    </tr>
    <tr>
      <th>13</th>
      <td>West Bank and Gaza</td>
      <td>0.036</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Iraq</td>
      <td>0.023</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Libya</td>
      <td>0.018</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Yemen, Rep.</td>
      <td>0.009</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.2221764705882353<br>Médiane: 0.073<br>Ecart-type=0.26319722531167516</td><td><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_global</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Saudi Arabia</td>
      <td>0.253620</td>
    </tr>
    <tr>
      <th>1</th>
      <td>United Arab Emirates</td>
      <td>0.143619</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Qatar</td>
      <td>0.107545</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Kuwait</td>
      <td>0.077568</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Iran, Islamic Rep.</td>
      <td>0.070709</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Israel</td>
      <td>0.069015</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Oman</td>
      <td>0.043950</td>
    </tr>
    <tr>
      <th>7</th>
      <td>Egypt, Arab Rep.</td>
      <td>0.026505</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Morocco</td>
      <td>0.023033</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Lebanon</td>
      <td>0.022864</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Algeria</td>
      <td>0.019223</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Tunisia</td>
      <td>0.018122</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Jordan</td>
      <td>0.014819</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Iraq</td>
      <td>0.009230</td>
    </tr>
    <tr>
      <th>14</th>
      <td>West Bank and Gaza</td>
      <td>0.004996</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Libya</td>
      <td>0.003218</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Yemen, Rep.</td>
      <td>0.003049</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.053593221520973525<br>Médiane: 0.023033279701922264<br>Ecart-type=0.06528875980238212</td></tr></table>

<table border="0" style="border-collapse:collapse;border: 1px solid black;"><tr><td style="border: 1px solid black;"><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_individuel</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>South Africa</td>
      <td>0.077</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Namibia</td>
      <td>0.077</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Botswana</td>
      <td>0.059</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Sudan</td>
      <td>0.030</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Zimbabwe</td>
      <td>0.016</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Ghana</td>
      <td>0.012</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Nigeria</td>
      <td>0.011</td>
    </tr>
    <tr>
      <th>7</th>
      <td>Cote d'Ivoire</td>
      <td>0.010</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Mauritania</td>
      <td>0.010</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Zambia</td>
      <td>0.008</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Senegal</td>
      <td>0.008</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Kenya</td>
      <td>0.008</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Cameroon</td>
      <td>0.007</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Gambia, The</td>
      <td>0.007</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Angola</td>
      <td>0.006</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Lesotho</td>
      <td>0.005</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Uganda</td>
      <td>0.005</td>
    </tr>
    <tr>
      <th>17</th>
      <td>Togo</td>
      <td>0.004</td>
    </tr>
    <tr>
      <th>18</th>
      <td>Rwanda</td>
      <td>0.003</td>
    </tr>
    <tr>
      <th>19</th>
      <td>Mozambique</td>
      <td>0.003</td>
    </tr>
    <tr>
      <th>20</th>
      <td>Congo, Rep.</td>
      <td>0.003</td>
    </tr>
    <tr>
      <th>21</th>
      <td>Tanzania</td>
      <td>0.003</td>
    </tr>
    <tr>
      <th>22</th>
      <td>Benin</td>
      <td>0.002</td>
    </tr>
    <tr>
      <th>23</th>
      <td>Burkina Faso</td>
      <td>0.002</td>
    </tr>
    <tr>
      <th>24</th>
      <td>Mali</td>
      <td>0.002</td>
    </tr>
    <tr>
      <th>25</th>
      <td>Ethiopia</td>
      <td>0.002</td>
    </tr>
    <tr>
      <th>26</th>
      <td>Guinea</td>
      <td>0.001</td>
    </tr>
    <tr>
      <th>27</th>
      <td>Malawi</td>
      <td>0.001</td>
    </tr>
    <tr>
      <th>28</th>
      <td>Chad</td>
      <td>0.000</td>
    </tr>
    <tr>
      <th>29</th>
      <td>Central African Republic</td>
      <td>0.000</td>
    </tr>
    <tr>
      <th>30</th>
      <td>Burundi</td>
      <td>0.000</td>
    </tr>
    <tr>
      <th>31</th>
      <td>Niger</td>
      <td>0.000</td>
    </tr>
    <tr>
      <th>32</th>
      <td>Eritrea</td>
      <td>0.000</td>
    </tr>
    <tr>
      <th>33</th>
      <td>Madagascar</td>
      <td>0.000</td>
    </tr>
    <tr>
      <th>34</th>
      <td>Congo, Dem. Rep.</td>
      <td>0.000</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.010914285714285718<br>Médiane: 0.004<br>Ecart-type=0.019731091363291455</td><td><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_global</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>South Africa</td>
      <td>0.040816</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Sudan</td>
      <td>0.012109</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Nigeria</td>
      <td>0.009654</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Namibia</td>
      <td>0.008129</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Botswana</td>
      <td>0.006182</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Zimbabwe</td>
      <td>0.004234</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Ghana</td>
      <td>0.004234</td>
    </tr>
    <tr>
      <th>7</th>
      <td>Kenya</td>
      <td>0.003641</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Cote d'Ivoire</td>
      <td>0.003133</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Cameroon</td>
      <td>0.002202</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Zambia</td>
      <td>0.002117</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Uganda</td>
      <td>0.002032</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Senegal</td>
      <td>0.002032</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Angola</td>
      <td>0.002032</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Tanzania</td>
      <td>0.001440</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Ethiopia</td>
      <td>0.001355</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Mauritania</td>
      <td>0.001355</td>
    </tr>
    <tr>
      <th>17</th>
      <td>Mozambique</td>
      <td>0.001016</td>
    </tr>
    <tr>
      <th>18</th>
      <td>Togo</td>
      <td>0.000677</td>
    </tr>
    <tr>
      <th>19</th>
      <td>Rwanda</td>
      <td>0.000677</td>
    </tr>
    <tr>
      <th>20</th>
      <td>Gambia, The</td>
      <td>0.000593</td>
    </tr>
    <tr>
      <th>21</th>
      <td>Burkina Faso</td>
      <td>0.000508</td>
    </tr>
    <tr>
      <th>22</th>
      <td>Lesotho</td>
      <td>0.000508</td>
    </tr>
    <tr>
      <th>23</th>
      <td>Mali</td>
      <td>0.000508</td>
    </tr>
    <tr>
      <th>24</th>
      <td>Congo, Rep.</td>
      <td>0.000423</td>
    </tr>
    <tr>
      <th>25</th>
      <td>Benin</td>
      <td>0.000423</td>
    </tr>
    <tr>
      <th>26</th>
      <td>Guinea</td>
      <td>0.000254</td>
    </tr>
    <tr>
      <th>27</th>
      <td>Malawi</td>
      <td>0.000254</td>
    </tr>
    <tr>
      <th>28</th>
      <td>Eritrea</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>29</th>
      <td>Congo, Dem. Rep.</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>30</th>
      <td>Chad</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>31</th>
      <td>Central African Republic</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>32</th>
      <td>Madagascar</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>33</th>
      <td>Burundi</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>34</th>
      <td>Niger</td>
      <td>0.000000</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.003215465202085577<br>Médiane: 0.0010161741044965704<br>Ecart-type=0.00715005235522369</td></tr></table>

<table border="0" style="border-collapse:collapse;border: 1px solid black;"><tr><td style="border: 1px solid black;"><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_individuel</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Costa Rica</td>
      <td>0.184</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Puerto Rico</td>
      <td>0.183</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Chile</td>
      <td>0.170</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Uruguay</td>
      <td>0.156</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Argentina</td>
      <td>0.134</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Brazil</td>
      <td>0.129</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Mexico</td>
      <td>0.127</td>
    </tr>
    <tr>
      <th>7</th>
      <td>Venezuela, RB</td>
      <td>0.108</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Panama</td>
      <td>0.100</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Colombia</td>
      <td>0.100</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Ecuador</td>
      <td>0.088</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Peru</td>
      <td>0.071</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Dominican Republic</td>
      <td>0.067</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Paraguay</td>
      <td>0.061</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Jamaica</td>
      <td>0.047</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Cuba</td>
      <td>0.033</td>
    </tr>
    <tr>
      <th>16</th>
      <td>El Salvador</td>
      <td>0.028</td>
    </tr>
    <tr>
      <th>17</th>
      <td>Bolivia</td>
      <td>0.024</td>
    </tr>
    <tr>
      <th>18</th>
      <td>Guatemala</td>
      <td>0.022</td>
    </tr>
    <tr>
      <th>19</th>
      <td>Nicaragua</td>
      <td>0.016</td>
    </tr>
    <tr>
      <th>20</th>
      <td>Honduras</td>
      <td>0.014</td>
    </tr>
    <tr>
      <th>21</th>
      <td>Haiti</td>
      <td>0.006</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.08490909090909092<br>Médiane: 0.07949999999999999<br>Ecart-type=0.058140150122587886</td><td><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_global</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Brazil</td>
      <td>0.137099</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Mexico</td>
      <td>0.103396</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Argentina</td>
      <td>0.064696</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Chile</td>
      <td>0.053773</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Colombia</td>
      <td>0.050724</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Venezuela, RB</td>
      <td>0.043357</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Costa Rica</td>
      <td>0.029892</td>
    </tr>
    <tr>
      <th>7</th>
      <td>Peru</td>
      <td>0.028622</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Puerto Rico</td>
      <td>0.025658</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Ecuador</td>
      <td>0.025404</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Uruguay</td>
      <td>0.021594</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Dominican Republic</td>
      <td>0.015412</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Panama</td>
      <td>0.014396</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Paraguay</td>
      <td>0.011093</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Cuba</td>
      <td>0.008637</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Guatemala</td>
      <td>0.006012</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Jamaica</td>
      <td>0.005843</td>
    </tr>
    <tr>
      <th>17</th>
      <td>Bolivia</td>
      <td>0.005504</td>
    </tr>
    <tr>
      <th>18</th>
      <td>El Salvador</td>
      <td>0.004912</td>
    </tr>
    <tr>
      <th>19</th>
      <td>Honduras</td>
      <td>0.002964</td>
    </tr>
    <tr>
      <th>20</th>
      <td>Nicaragua</td>
      <td>0.002794</td>
    </tr>
    <tr>
      <th>21</th>
      <td>Haiti</td>
      <td>0.001355</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.030142649289063038<br>Médiane: 0.018502836819375056<br>Ecart-type=0.034707108471095274</td></tr></table>

<table border="0" style="border-collapse:collapse;border: 1px solid black;"><tr><td style="border: 1px solid black;"><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_individuel</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Singapore</td>
      <td>0.882</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Hong Kong SAR, China</td>
      <td>0.778</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Australia</td>
      <td>0.666</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Korea, Rep.</td>
      <td>0.632</td>
    </tr>
    <tr>
      <th>4</th>
      <td>New Zealand</td>
      <td>0.563</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Japan</td>
      <td>0.522</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Malaysia</td>
      <td>0.272</td>
    </tr>
    <tr>
      <th>7</th>
      <td>China</td>
      <td>0.073</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Thailand</td>
      <td>0.073</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Mongolia</td>
      <td>0.061</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Philippines</td>
      <td>0.057</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Vietnam</td>
      <td>0.052</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Indonesia</td>
      <td>0.022</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Lao PDR</td>
      <td>0.011</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Myanmar</td>
      <td>0.010</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Papua New Guinea</td>
      <td>0.007</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Cambodia</td>
      <td>0.006</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.27570588235294113<br>Médiane: 0.073<br>Ecart-type=0.31805203283147754</td><td><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_global</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Japan</td>
      <td>0.461936</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Korea, Rep.</td>
      <td>0.352189</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Australia</td>
      <td>0.248793</td>
    </tr>
    <tr>
      <th>3</th>
      <td>China</td>
      <td>0.206029</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Hong Kong SAR, China</td>
      <td>0.166399</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Singapore</td>
      <td>0.161826</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Malaysia</td>
      <td>0.110509</td>
    </tr>
    <tr>
      <th>7</th>
      <td>New Zealand</td>
      <td>0.092048</td>
    </tr>
    <tr>
      <th>8</th>
      <td>Thailand</td>
      <td>0.046151</td>
    </tr>
    <tr>
      <th>9</th>
      <td>Philippines</td>
      <td>0.040224</td>
    </tr>
    <tr>
      <th>10</th>
      <td>Vietnam</td>
      <td>0.036752</td>
    </tr>
    <tr>
      <th>11</th>
      <td>Indonesia</td>
      <td>0.025574</td>
    </tr>
    <tr>
      <th>12</th>
      <td>Mongolia</td>
      <td>0.007621</td>
    </tr>
    <tr>
      <th>13</th>
      <td>Myanmar</td>
      <td>0.005250</td>
    </tr>
    <tr>
      <th>14</th>
      <td>Lao PDR</td>
      <td>0.001948</td>
    </tr>
    <tr>
      <th>15</th>
      <td>Cambodia</td>
      <td>0.001609</td>
    </tr>
    <tr>
      <th>16</th>
      <td>Papua New Guinea</td>
      <td>0.001355</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.11565954182502876<br>Médiane: 0.04615124057921924<br>Ecart-type=0.13573051616747278</td></tr></table>

<table border="0" style="border-collapse:collapse;border: 1px solid black;"><tr><td style="border: 1px solid black;"><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_individuel</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Canada</td>
      <td>0.909</td>
    </tr>
    <tr>
      <th>1</th>
      <td>United States</td>
      <td>0.736</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.8225<br>Médiane: 0.8225<br>Ecart-type=0.12232947314527275</td><td><table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Country Name</th>
      <th>indicateur_confiance_global</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>United States</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Canada</td>
      <td>0.422559</td>
    </tr>
  </tbody>
</table><br>Moyenne: 0.7112795325599119<br>Médiane: 0.7112795325599119<br>Ecart-type=0.40831240078847214</td></tr></table>

### <a id='toc1_22_'></a>[Conclusion](#toc0_)

Nous avons défini un indicateur de confiance rapporté à la population du pays que nous avons appelé "indicateur_confiance_global".

Nous allons prendre les 20 premiers % du classement des pays avec cet indicateur. Nous devons donc définir où se trouve la valeur de l'indicateur aux 20 premiers pourcents.

In [2130]:
threshold_20pc = data["indicateur_confiance_global"].quantile(q=0.8)


Affichons maintenant le classement des 20 premiers % des pays les plus intéressants:

In [2131]:
data.loc[data["indicateur_confiance_global"]>=threshold_20pc, ["Country Name", "Region", "indicateur_confiance_global"]].sort_values(by="indicateur_confiance_global", ascending=False).reset_index(drop=True)

Unnamed: 0,Country Name,Region,indicateur_confiance_global
0,United States,North America,1.0
1,United Kingdom,Europe & Central Asia,0.515031
2,Germany,Europe & Central Asia,0.50326
3,Japan,East Asia & Pacific,0.461936
4,Canada,North America,0.422559
5,France,Europe & Central Asia,0.400034
6,"Korea, Rep.",East Asia & Pacific,0.352189
7,Netherlands,Europe & Central Asia,0.289948
8,Saudi Arabia,Middle East & North Africa,0.25362
9,Australia,East Asia & Pacific,0.248793
