# Project 2: Analyze Education System Data

## Contents
* [Data exploration](#exp_donnees)
    * [Importing the libraries](#import_bib)
    * [Creating the table variable](#var_tab)
    * [General information about the dataset](#info_general)
    * [Configuring pandas](#cfg_pandas)
    * [Checking the columns](#verif_col)
    * [Checking for missing data](#verif_don_manq)

## Data exploration <a class="anchor" id="exp_donnees"></a>

### Importing the libraries <a class="anchor" id="import_bib"></a>
The libraries needed for the analysis are imported, with their conventional aliases where one exists.

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re

### Creating the table variable <a class="anchor" id="var_tab"></a>
The source CSV file is read into a DataFrame and the first five rows are displayed.

In [5]:
edstats = pd.read_csv('Edstats_csv/EdStatsData.csv')
edstats.head()

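| | Country Name | Country Code | Indicator Name | Indicator Code | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | ... | 2060 | 2065 | 2070 | 2075 | 2080 | 2085 | 2090 | 2095 | 2100 | Unnamed: 69 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Arab World | ARB | Adjusted net enrolment rate, lower secondary, ... | UIS.NERA.2 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Arab World | ARB | Adjusted net enrolment rate, lower secondary, ... | UIS.NERA.2.F | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Arab World | ARB | Adjusted net enrolment rate, lower secondary, ... | UIS.NERA.2.GPI | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Arab World | ARB | Adjusted net enrolment rate, lower secondary, ... | UIS.NERA.2.M | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Arab World | ARB | Adjusted net enrolment rate, primary, both sex... | SE.PRM.TENR | 54.822121 | 54.894138 | 56.209438 | 57.267109 | 57.991138 | 59.36554 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |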


### General information about the dataset <a class="anchor" id="info_general"></a>
- The number of rows and columns is printed.
- `info()` is called to produce a basic summary of the table, including the number of non-null entries in each column and each column's data type.

In [6]:
edstats.shape

(886930, 70)

In [7]:
edstats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 886930 entries, 0 to 886929
Data columns (total 70 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Country Name    886930 non-null  object 
 1   Country Code    886930 non-null  object 
 2   Indicator Name  886930 non-null  object 
 3   Indicator Code  886930 non-null  object 
 4   1970            72288 non-null   float64
 5   1971            35537 non-null   float64
 6   1972            35619 non-null   float64
 7   1973            35545 non-null   float64
 8   1974            35730 non-null   float64
 9   1975            87306 non-null   float64
 10  1976            37483 non-null   float64
 11  1977            37574 non-null   float64
 12  1978            37576 non-null   float64
 13  1979            36809 non-null   float64
 14  1980            89122 non-null   float64
 15  1981            38777 non-null   float64
 16  1982            37511 non-null   float64
 17  1983      

### Configuring pandas <a class="anchor" id="cfg_pandas"></a>
- By default, pandas displays at most 20 columns, but the dataset has 70, so the column limit is removed.
- So that the full output of a later sum operation can be shown, the maximum number of displayed rows is set to the number of columns (70).

In [8]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 70)
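
As a side note, display options can also be changed for a limited scope instead of globally. The sketch below is only an illustration of that alternative, not part of the original analysis; it uses `pd.option_context`, which restores the previous values when the block exits.

```python
import pandas as pd

# Remember the global setting so the comparison below is explicit.
before = pd.get_option("display.max_columns")

# Inside the block the column limit is lifted; None means "no limit".
with pd.option_context("display.max_columns", None, "display.max_rows", 70):
    inside_cols = pd.get_option("display.max_columns")
    inside_rows = pd.get_option("display.max_rows")

# On exit, pandas restores the previous value automatically.
after = pd.get_option("display.max_columns")
print(inside_cols, inside_rows, after == before)
```

The global `pd.set_option` calls above remain the simpler choice when every later cell should show all columns.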

### Checking the columns <a class="anchor" id="verif_col"></a>
The column names are displayed.

In [9]:
edstats.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978',
       '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987',
       '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996',
       '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005',
       '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2020', '2025', '2030', '2035', '2040', '2045',
       '2050', '2055', '2060', '2065', '2070', '2075', '2080', '2085', '2090',
       '2095', '2100', 'Unnamed: 69'],
      dtype='object')

The trailing delimiter at the end of each line of the CSV file produced a superfluous, empty column (`Unnamed: 69`), which is dropped.

In [10]:
edstats = edstats.drop("Unnamed: 69", axis=1)
edstats.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978',
       '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987',
       '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996',
       '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005',
       '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2020', '2025', '2030', '2035', '2040', '2045',
       '2050', '2055', '2060', '2065', '2070', '2075', '2080', '2085', '2090',
       '2095', '2100'],
      dtype='object')
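
To illustrate how such a column can arise, here is a minimal sketch on a tiny in-memory CSV. The data and the names `raw`, `superfluous` and `cleaned` are hypothetical and only serve the example; the assumption is that each line of the real file ends with a trailing comma in the same way.

```python
import io
import pandas as pd

# Toy CSV (hypothetical data): every line, header included, ends with a comma.
raw = "a,b,\n1,2,\n3,4,\n"

df = pd.read_csv(io.StringIO(raw))
print(df.columns.tolist())  # the empty third header is auto-named by pandas

# Drop every column that is both auto-named and entirely empty.
superfluous = [
    c for c in df.columns
    if c.startswith("Unnamed:") and df[c].isna().all()
]
cleaned = df.drop(columns=superfluous)
print(cleaned.columns.tolist())
```

Checking that the column is entirely empty before dropping it guards against discarding an unnamed column that actually holds data.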

### Checking for missing data <a class="anchor" id="verif_don_manq"></a>
* Percentage of missing values for each column:
    * The first command creates a variable `edstats_missing` and assigns it the output of `edstats.isna()`, a boolean table with the same dimensions as `edstats` that holds `False` where the corresponding cell contains data and `True` where it does not.
    * The second command calls `.sum()`, which returns a Series mapping each column name to the number of missing values in that column.
    * Finally, `/ len(edstats) * 100` divides each count by the total number of rows and multiplies by 100, giving the percentage of missing values for each column.
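
The same three steps can be tried on a small table first. This is a minimal sketch on hypothetical toy data (the frame `toy` is not part of the dataset); it also shows that taking the mean of the boolean table gives the same fractions, since `True` counts as 1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: 4 rows, one complete column and two with gaps.
toy = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC", "DDD"],
    "2000": [1.0, np.nan, np.nan, np.nan],
    "2001": [1.0, 2.0, np.nan, 4.0],
})

missing = toy.isna()                          # True where a cell is empty
pct_missing = missing.sum() / len(toy) * 100  # per-column percentage
pct_via_mean = missing.mean() * 100           # equivalent shortcut

print(pct_missing)
```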

In [11]:
edstats_missing = edstats.isna()
edstats_missing.sum() / len(edstats) * 100

Country Name       0.000000
Country Code       0.000000
Indicator Name     0.000000
Indicator Code     0.000000
1970              91.849639
1971              95.993258
1972              95.984012
1973              95.992356
1974              95.971497
1975              90.156382
1976              95.773849
1977              95.763589
1978              95.763364
1979              95.849842
1980              89.951631
1981              95.627953
1982              95.770692
1983              95.663694
1984              95.647233
1985              89.819264
1986              95.560867
1987              95.643286
1988              95.653321
1989              95.767422
1990              85.973527
1991              91.607342
1992              91.482642
1993              91.454455
1994              91.266278
1995              85.189248
1996              91.340128
1997              91.718287
1998              90.426076
1999              86.601085
2000              80.080051
2001              86

In [12]:
# Restrict the boolean missing-value table to the key years (2002 to 2030)
edstats_missing_key_yrs = edstats_missing[["2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011",
                                           "2012", "2013", "2014", "2015", "2016", "2017", "2020", "2025", "2030"]]
edstats_missing_key_yrs.head()

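| | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2020 | 2025 | 2030 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
| 1 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
| 2 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
| 3 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
| 4 | False | False | False | False | False | False | False | False | False | False | False | False | False | True | True | True | True | True | True |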


In [13]:
# Count the missing key-year values in each row (axis=1 sums across columns)
key_yrs_total_missing = edstats_missing_key_yrs.sum(axis=1)
key_yrs_total_missing

0         19
1         19
2         19
3         19
4          6
          ..
886925    17
886926    17
886927    17
886928    17
886929    17
Length: 886930, dtype: int64
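
The difference between summing per row and per column is easy to check on a small example. The sketch below uses a hypothetical three-row frame named `toy`; only the `axis` argument differs between the two calls.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three "year" columns.
toy = pd.DataFrame({
    "2002": [np.nan, 1.0, 3.0],
    "2003": [np.nan, np.nan, 4.0],
    "2004": [np.nan, 2.0, 5.0],
})

# axis=1 sums across the columns of each row;
# the default (axis=0) sums down each column instead.
missing_per_row = toy.isna().sum(axis=1)
missing_per_col = toy.isna().sum()

print(missing_per_row.tolist())
print(missing_per_col.to_dict())
```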

In [14]:
# Positions of the rows that are missing 10 or more of the 19 key years
indices_insuff_key_yr_data = np.flatnonzero(key_yrs_total_missing.values >= 10).tolist()
indices_insuff_key_yr_data

[0,
 1,
 2,
 3,
 8,
 9,
 10,
 11,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 63,
 64,
 65,
 66,
 67,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 84,
 85,
 86,
 87,
 88,
 89,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 97,
 98,
 99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125,
 126,
 127,
 128,
 129,
 130,
 131,
 132,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149,
 150,
 151,
 152,
 153,
 154,
 155,
 156,
 157,
 158,
 159,
 160,
 161,
 162,
 163,
 164,
 165,
 166,
 167,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 182,
 183,
 184,
 185,
 186,
 187,
 188,
 189,
 190,
 191,
 192,
 193,
 194,
 195,
 196,
 197

In [15]:
# Drop the rows with insufficient key-year data; the remaining rows keep their original index labels
edstats_suff_key_yr_data = edstats.drop(edstats.index[indices_insuff_key_yr_data])
edstats_suff_key_yr_data

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2020,2025,2030,2035,2040,2045,2050,2055,2060,2065,2070,2075,2080,2085,2090,2095,2100
4,Arab World,ARB,"Adjusted net enrolment rate, primary, both sex...",SE.PRM.TENR,54.822121,54.894138,56.209438,57.267109,57.991138,59.365540,60.999962,61.922680,62.693420,64.383186,65.617767,66.085152,66.608139,67.290451,68.510094,69.033211,69.944908,71.041870,71.693779,71.699097,71.995819,72.602837,70.032722,70.464821,72.645683,71.811760,73.903511,74.425201,75.110817,76.254318,77.245682,78.800522,80.051399,80.805389,81.607063,82.489487,82.685509,83.280342,84.011871,84.195961,85.211998,85.245140,86.101669,85.511940,85.320152,,,,,,,,,,,,,,,,,,,,
5,Arab World,ARB,"Adjusted net enrolment rate, primary, female (%)",SE.PRM.TENR.FE,43.351101,43.318150,44.640701,45.845718,46.449501,48.363892,50.046188,51.245281,52.242321,54.754372,56.486790,57.314659,58.226051,59.289230,60.748180,61.520870,62.734951,64.115883,65.099663,65.129204,65.827492,66.797028,63.260429,63.972111,67.033043,65.761559,68.780800,69.267052,70.435387,72.047287,73.275620,75.132980,76.641022,77.653580,78.485397,79.475769,79.608330,80.582420,81.473801,81.695686,82.871651,82.861389,84.401413,83.914032,83.820831,,,,,,,,,,,,,,,,,,,,
6,Arab World,ARB,"Adjusted net enrolment rate, primary, gender p...",UIS.NERA.1.GPI,0.658570,0.656400,0.663290,0.672040,0.672610,0.691760,0.699950,0.710140,0.718430,0.743740,0.759440,0.769250,0.779860,0.790710,0.799610,0.806770,0.816060,0.825130,0.834190,0.834760,0.844660,0.854320,0.826490,0.834080,0.859090,0.847310,0.872660,0.872690,0.884940,0.897370,0.904060,0.912750,0.919790,0.926300,0.927680,0.930840,0.929620,0.938460,0.942500,0.943470,0.947620,0.946700,0.962080,0.964090,0.966200,,,,,,,,,,,,,,,,,,,,
7,Arab World,ARB,"Adjusted net enrolment rate, primary, male (%)",SE.PRM.TENR.MA,65.826233,65.993584,67.301857,68.219078,69.059013,69.914551,71.499512,72.162064,72.717690,73.619972,74.379982,74.507133,74.662628,74.981827,75.972542,76.255493,76.875053,77.703857,78.039368,78.021889,77.933853,78.187492,76.541100,76.697418,78.028130,77.611900,78.817490,79.372040,79.593536,80.287529,81.051369,82.315048,83.324059,83.832230,84.604393,85.380287,85.635078,85.866692,86.444138,86.590691,87.452583,87.526520,87.728172,87.039879,86.753387,,,,,,,,,,,,,,,,,,,,
12,Arab World,ARB,Adjusted net intake rate to Grade 1 of primary...,UIS.NIRA.1,52.448921,52.489750,52.635593,53.327000,54.184654,54.865627,58.701626,58.876965,59.452522,62.776375,62.559566,62.594330,62.447292,63.409138,65.837379,64.613091,65.926895,66.775635,66.447350,66.931831,67.080444,67.508705,67.653282,67.444687,68.652267,69.208015,68.651680,69.611557,66.867638,68.321686,70.737579,73.594200,74.976158,75.328583,76.428513,76.131767,75.222557,75.595695,74.893944,74.814552,76.197044,76.474968,77.319366,76.566711,76.620567,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886252,Zimbabwe,ZWE,Theoretical duration of upper secondary educat...,SE.SEC.DURS.UP,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.0,4.0,,,,,,,,,,,,,,,,,,
886363,Zimbabwe,ZWE,Total outbound internationally mobile tertiary...,UIS.OE.56.40510,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6802.000000,8485.000000,10514.000000,12232.000000,15525.000000,17576.000000,15464.000000,15994.000000,16200.000000,21347.000000,24581.000000,20792.000000,25236.000000,28700.000000,16012.000000,15885.000000,,,,,,,,,,,,,,,,,,,,,
886611,Zimbabwe,ZWE,"Unemployment, female (% of female labor force)",SL.UEM.TOTL.FE.ZS,,,,,,,,,,,,,,,,,,,,,,4.500000,4.500000,4.600000,3.800000,4.200000,4.600000,4.900000,4.800000,4.500000,3.700000,4.500000,4.300000,4.000000,4.100000,4.300000,4.500000,4.800000,5.300000,6.000000,4.600000,6.000000,5.900000,5.800000,5.800000,5.1,4.9,,,,,,,,,,,,,,,,,,
886612,Zimbabwe,ZWE,"Unemployment, male (% of male labor force)",SL.UEM.TOTL.MA.ZS,,,,,,,,,,,,,,,,,,,,,,6.900000,7.100000,7.200000,6.100000,6.700000,7.300000,8.700000,8.200000,7.300000,5.700000,7.500000,6.100000,4.900000,4.300000,5.300000,6.700000,6.100000,5.500000,4.900000,7.700000,4.800000,4.700000,4.600000,4.500000,5.1,5.6,,,,,,,,,,,,,,,,,,
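
The same row selection can be written without building a list of positions. The following is a sketch on hypothetical toy data, with a shortened key-year list and an illustrative threshold (`max_missing`); it compares a boolean mask with `DataFrame.dropna`, whose `thresh` argument is the minimum number of non-missing values a row needs in order to be kept.

```python
import numpy as np
import pandas as pd

key_yrs = ["2002", "2003", "2004"]  # shortened, hypothetical key-year list
max_missing = 2                      # hypothetical: drop rows missing 2 or more

toy = pd.DataFrame({
    "Indicator Code": ["X.1", "X.2", "X.3"],
    "2002": [np.nan, 1.0, 3.0],
    "2003": [np.nan, np.nan, 4.0],
    "2004": [np.nan, 2.0, 5.0],
})

missing_per_row = toy[key_yrs].isna().sum(axis=1)

# Boolean mask: keep rows whose missing count is below the threshold.
kept_mask = toy[missing_per_row < max_missing]

# dropna: require at least this many non-missing key-year values.
min_present = len(key_yrs) - max_missing + 1
kept_dropna = toy.dropna(subset=key_yrs, thresh=min_present)

print(kept_mask.index.tolist())
```

With the notebook's 19 key years and a cut-off of 10 missing values, the corresponding `thresh` would be 19 - 10 + 1 = 10.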


In [16]:
# Unique indicator names among the remaining rows (a set has no guaranteed order)
indicators_w_suff_data = list(set(edstats_suff_key_yr_data['Indicator Name'].values.tolist()))
indicators_w_suff_data

['Enrolment in tertiary education per 100,000 inhabitants, both sexes',
 'Repetition rate in Grade 1 of lower secondary general education, female (%)',
 'Enrolment in lower secondary education, private institutions, both sexes (number)',
 'Population of the official entrance age to primary education, female (number)',
 'Pupil/trained teacher ratio in secondary education (headcount basis)',
 'Annual statutory teacher salaries in public institutions in USD. Upper Secondary. 15 years of experience',
 'Population, age 9, total',
 'Teachers in post-secondary non-tertiary education, female (number)',
 'Repeaters in Grade 2 of primary education, both sexes (number)',
 'Population, ages 10-17, male',
 'Percentage of female teachers in upper secondary education who are trained, female (%)',
 'Gross enrolment ratio, primary, gender parity index (GPI)',
 'Enrolment in pre-primary education, female (number)',
 'UIS: Percentage of population age 25+ with completed lower secondary education. Total',

In [24]:
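# Match names that contain "male" or "female" as a separate word
# (at the start or after whitespace, followed by whitespace, a comma or the end)
mf_regex = re.compile(r'^(|.*\s)([Mm]ale|[Ff]emale)(|[\s,].*)$')
# Keep only the indicators that are not broken down by sex
indicators_suff_data_no_sex = [i for i in indicators_w_suff_data if not mf_regex.match(i)]
indicators_suff_data_no_sex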

['Enrolment in tertiary education per 100,000 inhabitants, both sexes',
 'Enrolment in lower secondary education, private institutions, both sexes (number)',
 'Pupil/trained teacher ratio in secondary education (headcount basis)',
 'Annual statutory teacher salaries in public institutions in USD. Upper Secondary. 15 years of experience',
 'Population, age 9, total',
 'Repeaters in Grade 2 of primary education, both sexes (number)',
 'Gross enrolment ratio, primary, gender parity index (GPI)',
 'UIS: Percentage of population age 25+ with completed lower secondary education. Total',
 'Percentage of repeaters in Grade 6 of primary education, both sexes (%)',
 'Expenditure on tertiary as % of government expenditure on education (%)',
 'Enrolment in secondary education, private institutions, both sexes (number)',
 'Government expenditure per secondary student (constant PPP$)',
 'All staff compensation as % of total expenditure in lower secondary public institutions (%)',
 'Population, ages 
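
To see what the pattern keeps and discards, it can be applied to a handful of names. This sketch reuses the regular expression from the cell above on four indicator names taken from the earlier output; the list `names` is only a small sample chosen for illustration.

```python
import re

# Pattern copied from the cell above.
mf_regex = re.compile(r'^(|.*\s)([Mm]ale|[Ff]emale)(|[\s,].*)$')

# A small sample of indicator names from the earlier output.
names = [
    'Population, ages 10-17, male',
    'Enrolment in pre-primary education, female (number)',
    'Enrolment in tertiary education per 100,000 inhabitants, both sexes',
    'Gross enrolment ratio, primary, gender parity index (GPI)',
]

# Names broken down by sex match the pattern and are filtered out.
kept = [n for n in names if not mf_regex.match(n)]
for n in kept:
    print(n)
```

Note that "both sexes" and "gender parity index" indicators pass through, because the pattern only targets the standalone words "male" and "female".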

In [28]:
# Keep only the rows whose indicator is in the sex-neutral list
edstats_suff_data_no_sex = edstats_suff_key_yr_data[edstats_suff_key_yr_data['Indicator Name'].isin(indicators_suff_data_no_sex)]
edstats_suff_data_no_sex

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2020,2025,2030,2035,2040,2045,2050,2055,2060,2065,2070,2075,2080,2085,2090,2095,2100
4,Arab World,ARB,"Adjusted net enrolment rate, primary, both sex...",SE.PRM.TENR,54.822121,54.894138,56.209438,57.267109,57.991138,59.365540,60.999962,61.922680,62.693420,64.383186,65.617767,66.085152,66.608139,67.290451,68.510094,6.903321e+01,6.994491e+01,7.104187e+01,7.169378e+01,7.169910e+01,7.199582e+01,7.260284e+01,7.003272e+01,7.046482e+01,7.264568e+01,7.181176e+01,7.390351e+01,7.442520e+01,7.511082e+01,7.625432e+01,7.724568e+01,7.880052e+01,8.005140e+01,8.080539e+01,8.160706e+01,8.248949e+01,8.268551e+01,8.328034e+01,8.401187e+01,8.419596e+01,8.521200e+01,8.524514e+01,8.610167e+01,8.551194e+01,8.532015e+01,,,,,,,,,,,,,,,,,,,,
6,Arab World,ARB,"Adjusted net enrolment rate, primary, gender p...",UIS.NERA.1.GPI,0.658570,0.656400,0.663290,0.672040,0.672610,0.691760,0.699950,0.710140,0.718430,0.743740,0.759440,0.769250,0.779860,0.790710,0.799610,8.067700e-01,8.160600e-01,8.251300e-01,8.341900e-01,8.347600e-01,8.446600e-01,8.543200e-01,8.264900e-01,8.340800e-01,8.590900e-01,8.473100e-01,8.726600e-01,8.726900e-01,8.849400e-01,8.973700e-01,9.040600e-01,9.127500e-01,9.197900e-01,9.263000e-01,9.276800e-01,9.308400e-01,9.296200e-01,9.384600e-01,9.425000e-01,9.434700e-01,9.476200e-01,9.467000e-01,9.620800e-01,9.640900e-01,9.662000e-01,,,,,,,,,,,,,,,,,,,,
12,Arab World,ARB,Adjusted net intake rate to Grade 1 of primary...,UIS.NIRA.1,52.448921,52.489750,52.635593,53.327000,54.184654,54.865627,58.701626,58.876965,59.452522,62.776375,62.559566,62.594330,62.447292,63.409138,65.837379,6.461309e+01,6.592690e+01,6.677563e+01,6.644735e+01,6.693183e+01,6.708044e+01,6.750871e+01,6.765328e+01,6.744469e+01,6.865227e+01,6.920802e+01,6.865168e+01,6.961156e+01,6.686764e+01,6.832169e+01,7.073758e+01,7.359420e+01,7.497616e+01,7.532858e+01,7.642851e+01,7.613177e+01,7.522256e+01,7.559570e+01,7.489394e+01,7.481455e+01,7.619704e+01,7.647497e+01,7.731937e+01,7.656671e+01,7.662057e+01,,,,,,,,,,,,,,,,,,,,
14,Arab World,ARB,Adjusted net intake rate to Grade 1 of primary...,UIS.NIRA.1.GPI,0.736340,0.732880,0.734730,0.741420,0.746210,0.754740,0.750240,0.772300,0.776240,0.809390,0.812520,0.817960,0.824060,0.829120,0.832300,8.458600e-01,8.528200e-01,8.609200e-01,8.626300e-01,8.648200e-01,8.724900e-01,8.792300e-01,8.867200e-01,8.928200e-01,9.016000e-01,9.101200e-01,9.161200e-01,9.251700e-01,9.444900e-01,9.551900e-01,9.557000e-01,9.631500e-01,9.641100e-01,9.664500e-01,9.722700e-01,9.743000e-01,9.817100e-01,9.864300e-01,,,9.912800e-01,9.843900e-01,9.872000e-01,9.920000e-01,9.957100e-01,,,,,,,,,,,,,,,,,,,,
17,Arab World,ARB,"Adult illiterate population, 15+ years, both s...",UIS.LP.AG15T99,,,,,,,,,,,,,,,,6.031946e+07,6.031946e+07,6.031946e+07,6.031946e+07,6.031946e+07,6.031946e+07,6.031946e+07,6.031946e+07,6.031946e+07,6.031946e+07,6.155295e+07,6.155295e+07,6.155295e+07,6.155295e+07,6.155295e+07,6.155295e+07,6.155295e+07,6.155295e+07,6.155295e+07,6.155295e+07,5.569861e+07,5.569861e+07,5.569861e+07,5.569861e+07,5.569861e+07,5.569861e+07,5.569861e+07,5.569861e+07,5.569861e+07,5.569861e+07,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886250,Zimbabwe,ZWE,Theoretical duration of primary education (years),SE.PRM.DURS,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.000000e+00,7.0,7.0,,,,,,,,,,,,,,,,,,
886251,Zimbabwe,ZWE,Theoretical duration of secondary education (y...,SE.SEC.DURS,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.000000e+00,6.0,6.0,,,,,,,,,,,,,,,,,,
886252,Zimbabwe,ZWE,Theoretical duration of upper secondary educat...,SE.SEC.DURS.UP,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.000000e+00,4.0,4.0,,,,,,,,,,,,,,,,,,
886363,Zimbabwe,ZWE,Total outbound internationally mobile tertiary...,UIS.OE.56.40510,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6.802000e+03,8.485000e+03,1.051400e+04,1.223200e+04,1.552500e+04,1.757600e+04,1.546400e+04,1.599400e+04,1.620000e+04,2.134700e+04,2.458100e+04,2.079200e+04,2.523600e+04,2.870000e+04,1.601200e+04,1.588500e+04,,,,,,,,,,,,,,,,,,,,,


In [31]:
# Percentage of the original rows that remain after both filters
len(edstats_suff_data_no_sex) / len(edstats) * 100

5.475403921391767

In [35]:
# Codes of regional and income-group aggregates, which are not individual countries
region_country_codes = ["ARB", "EAP", "EAS", "ECA", "ECS", "EMU", "EUU", "HIC", "HPC", "LAC", "LCN", "LDC", "LIC",
                        "LMC", "LMY", "MEA", "MIC", "MNA", "NAC", "OED", "SAS", "SSA", "SSF", "UMC", "WLD"]
# ~ negates the mask, so only rows whose code is NOT an aggregate are kept
edstats_suff_data_no_sex_region = edstats_suff_data_no_sex[~edstats_suff_data_no_sex['Country Code'].isin(region_country_codes)]
edstats_suff_data_no_sex_region

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2020,2025,2030,2035,2040,2045,2050,2055,2060,2065,2070,2075,2080,2085,2090,2095,2100
92327,Afghanistan,AFG,Duration of compulsory education (years),SE.COM.DURS,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,9.0,9.0,9.0,9.0,9.0,9.0,9.0,9.0,9.0,,,,,,,,,,,,,,,,,,
92758,Afghanistan,AFG,Enrolment in Grade 1 of lower secondary genera...,UIS.E.2.GPV.G1.T,38844.0,40868.0,43644.0,46386.0,49282.0,56555.0,69782.0,70712.0,72997.0,,86774.0,93254.0,44771.0,,52607.0,53109.0,52371.0,,53708.0,56583.0,,,,74962.0,,114002.0,,,,,,,,129419.0,189028.0,208145.0,,348498.0,,563596.0,594126.0,594073.0,622065.0,630990.0,652756.0,,,,,,,,,,,,,,,,,,,,
92760,Afghanistan,AFG,"Enrolment in Grade 1 of primary education, bot...",UIS.E.1.G1,132066.0,143743.0,154080.0,157621.0,168449.0,168884.0,174099.0,187835.0,199875.0,,236940.0,244852.0,70835.0,,98763.0,108733.0,128146.0,,147970.0,140159.0,,,,141442.0,,292759.0,,,,,,,1330567.0,1410344.0,,903979.0,,811282.0,,948131.0,1032665.0,1087009.0,1159081.0,1179675.0,1159427.0,,,,,,,,,,,,,,,,,,,,
92784,Afghanistan,AFG,"Enrolment in lower secondary education, both s...",UIS.E.2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,202463.0,,291312.0,420950.0,461349.0,735697.0,750492.0,1063816.0,1273188.0,1476813.0,1465558.0,1534469.0,1569153.0,1633965.0,,,,,,,,,,,,,,,,,,,,
92792,Afghanistan,AFG,"Enrolment in lower secondary general, both sex...",UIS.E.2.GPV,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,202463.0,287490.0,291312.0,420950.0,461349.0,735697.0,750492.0,1063816.0,1273188.0,1476813.0,1465558.0,1534469.0,1569153.0,1633965.0,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886250,Zimbabwe,ZWE,Theoretical duration of primary education (years),SE.PRM.DURS,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,,,,,,,,,,,,,,,,,,
886251,Zimbabwe,ZWE,Theoretical duration of secondary education (y...,SE.SEC.DURS,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,,,,,,,,,,,,,,,,,,
886252,Zimbabwe,ZWE,Theoretical duration of upper secondary educat...,SE.SEC.DURS.UP,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,,,,,,,,,,,,,,,,,,
886363,Zimbabwe,ZWE,Total outbound internationally mobile tertiary...,UIS.OE.56.40510,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6802.0,8485.0,10514.0,12232.0,15525.0,17576.0,15464.0,15994.0,16200.0,21347.0,24581.0,20792.0,25236.0,28700.0,16012.0,15885.0,,,,,,,,,,,,,,,,,,,,,


In [36]:
len(edstats_suff_data_no_sex_region) / len(edstats) * 100

5.034331908944336
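
The exclusion step relies on negating an `isin` mask. Here is a minimal sketch on hypothetical toy data (the frame `toy` and the two-code list `aggregate_codes` are illustrative only) showing the mask, its negation, and the share of rows that remain.

```python
import pandas as pd

# Hypothetical toy data mixing aggregates and individual countries.
toy = pd.DataFrame({
    "Country Code": ["ARB", "AFG", "WLD", "ZWE"],
    "Country Name": ["Arab World", "Afghanistan", "World", "Zimbabwe"],
})
aggregate_codes = ["ARB", "WLD"]  # small subset of the region list above

is_aggregate = toy["Country Code"].isin(aggregate_codes)  # True for aggregates
countries_only = toy[~is_aggregate]                       # ~ flips the mask
share_kept = len(countries_only) / len(toy) * 100

print(countries_only["Country Code"].tolist(), share_kept)
```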

In [42]:
# Select the total-population rows, one per country
total_pops = edstats_suff_data_no_sex_region.loc[edstats_suff_data_no_sex_region['Indicator Name'] == 'Population, total']
# Lift the row and column limits for this one display call only
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(total_pops)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2020,2025,2030,2035,2040,2045,2050,2055,2060,2065,2070,2075,2080,2085,2090,2095,2100
94158,Afghanistan,AFG,"Population, total",SP.POP.TOTL,11126123.0,11417825.0,11721940.0,12027822.0,12321541.0,12590286.0,12840299.0,13067538.0,13237734.0,13306695.0,13248370.0,13053954.0,12749640.0,12389270.0,12047120.0,11783050.0,11601040.0,11502760.0,11540890.0,11777610.0,12249110.0,12993660.0,13981230.0,15095100.0,16172720.0,17099540.0,17822880.0,18381600.0,18864000.0,19403680.0,20093760.0,20966460.0,21979920.0,23064850.0,24118980.0,25070800.0,25893450.0,26616790.0,27294030.0,28004330.0,28803170.0,29708600.0,30696960.0,31731690.0,32758020.0,33736490.0,34656030.0,,,,,,,,,,,,,,,,,,
97823,Albania,ALB,"Population, total",SP.POP.TOTL,2135479.0,2187853.0,2243126.0,2296752.0,2350124.0,2404831.0,2458526.0,2513546.0,2566266.0,2617832.0,2671997.0,2726056.0,2784278.0,2843960.0,2904429.0,2964762.0,3022635.0,3083605.0,3142336.0,3227943.0,3286542.0,3266790.0,3247039.0,3227287.0,3207536.0,3187784.0,3168033.0,3148281.0,3128530.0,3108778.0,3089027.0,3060173.0,3051010.0,3039616.0,3026939.0,3011487.0,2992547.0,2970017.0,2947314.0,2927519.0,2913021.0,2905195.0,2900401.0,2895092.0,2889104.0,2880703.0,2876101.0,,,,,,,,,,,,,,,,,,
101488,Algeria,DZA,"Population, total",SP.POP.TOTL,14550034.0,14960109.0,15377093.0,15804428.0,16247113.0,16709099.0,17190239.0,17690184.0,18212326.0,18760761.0,19337715.0,19943664.0,20575700.0,21228290.0,21893850.0,22565900.0,23241270.0,23917900.0,24591490.0,25257670.0,25912370.0,26554330.0,27181090.0,27786260.0,28362250.0,28904300.0,29411420.0,29886840.0,30335730.0,30765610.0,31183660.0,31592150.0,31995050.0,32403510.0,32831100.0,33288440.0,33777920.0,34300080.0,34860720.0,35465760.0,36117640.0,36819560.0,37565850.0,38338560.0,39113310.0,39871530.0,40606050.0,,,,,,,,,,,,,,,,,,
105153,American Samoa,ASM,"Population, total",SP.POP.TOTL,27292.0,27916.0,28492.0,29014.0,29488.0,29932.0,30321.0,30689.0,31102.0,31673.0,32457.0,33493.0,34738.0,36160.0,37688.0,39241.0,40837.0,42450.0,44047.0,45593.0,47038.0,48375.0,49593.0,50720.0,51803.0,52868.0,53929.0,54941.0,55901.0,56770.0,57521.0,58175.0,58731.0,59117.0,59264.0,59118.0,58650.0,57903.0,57030.0,56227.0,55637.0,55320.0,55230.0,55307.0,55437.0,55537.0,55599.0,,,,,,,,,,,,,,,,,,
108818,Andorra,AND,"Population, total",SP.POP.TOTL,24276.0,25559.0,26892.0,28232.0,29520.0,30705.0,31777.0,32771.0,33737.0,34818.0,36067.0,37500.0,39114.0,40867.0,42706.0,44600.0,46517.0,48455.0,50434.0,52448.0,54509.0,56671.0,58888.0,60971.0,62677.0,63850.0,64360.0,64327.0,64142.0,64370.0,65390.0,67341.0,70049.0,73182.0,76244.0,78867.0,80991.0,82683.0,83861.0,84462.0,84449.0,83751.0,82431.0,80788.0,79223.0,78014.0,77281.0,,,,,,,,,,,,,,,,,,
112483,Angola,AGO,"Population, total",SP.POP.TOTL,6776381.0,6927269.0,7094834.0,7277960.0,7474338.0,7682479.0,7900997.0,8130988.0,8376147.0,8641521.0,8929900.0,9244507.0,9582156.0,9931562.0,10277320.0,10609040.0,10921040.0,11218270.0,11513970.0,11827240.0,12171440.0,12553450.0,12968340.0,13403730.0,13841300.0,14268990.0,14682280.0,15088980.0,15504320.0,15949770.0,16440920.0,16983270.0,17572650.0,18203370.0,18865720.0,19552540.0,20262400.0,20997690.0,21759420.0,22549550.0,23369130.0,24218560.0,25096150.0,25998340.0,26920470.0,27859300.0,28813460.0,,,,,,,,,,,,,,,,,,
116148,Antigua and Barbuda,ATG,"Population, total",SP.POP.TOTL,67098.0,68188.0,69176.0,70066.0,70878.0,71609.0,72285.0,72875.0,73324.0,73528.0,73442.0,73066.0,72448.0,71639.0,70725.0,69782.0,68809.0,67845.0,67058.0,66627.0,66696.0,67307.0,68427.0,69938.0,71719.0,73619.0,75628.0,77739.0,79851.0,81831.0,83584.0,85057.0,86266.0,87293.0,88257.0,89253.0,90301.0,91381.0,92478.0,93581.0,94661.0,95719.0,96777.0,97824.0,98875.0,99923.0,100963.0,,,,,,,,,,,,,,,,,,
119813,Argentina,ARG,"Population, total",SP.POP.TOTL,23973058.0,24366439.0,24782949.0,25213388.0,25644506.0,26066975.0,26477152.0,26878565.0,27277741.0,27684534.0,28105888.0,28543364.0,28993990.0,29454740.0,29920900.0,30388780.0,30857240.0,31326470.0,31795520.0,32263560.0,32729740.0,33193920.0,33655150.0,34110920.0,34558120.0,34994810.0,35419680.0,35833970.0,36241590.0,36648070.0,37057450.0,37471510.0,37889370.0,38309380.0,38728700.0,39145490.0,39558890.0,39970220.0,40382390.0,40799410.0,41223890.0,41656880.0,42096740.0,42539920.0,42981520.0,43417760.0,43847430.0,,,,,,,,,,,,,,,,,,
123478,Armenia,ARM,"Population, total",SP.POP.TOTL,2525065.0,2587706.0,2650484.0,2712781.0,2773747.0,2832757.0,2889579.0,2944379.0,2997411.0,3049105.0,3099751.0,3148092.0,3193686.0,3238594.0,3285595.0,3335935.0,3392256.0,3451942.0,3504651.0,3536469.0,3538165.0,3505251.0,3442810.0,3363098.0,3283660.0,3217342.0,3168215.0,3133086.0,3108684.0,3089017.0,3069588.0,3050655.0,3033897.0,3017806.0,3000612.0,2981259.0,2958500.0,2933056.0,2908220.0,2888584.0,2877311.0,2875581.0,2881922.0,2893509.0,2906220.0,2916950.0,2924816.0,,,,,,,,,,,,,,,,,,
127143,Aruba,ABW,"Population, total",SP.POP.TOTL,59063.0,59440.0,59840.0,60243.0,60528.0,60657.0,60586.0,60366.0,60103.0,59980.0,60096.0,60567.0,61345.0,62201.0,62836.0,63026.0,62644.0,61833.0,61079.0,61032.0,62149.0,64622.0,68235.0,72504.0,76700.0,80324.0,83200.0,85451.0,87277.0,89005.0,90853.0,92898.0,94992.0,97017.0,98737.0,100031.0,100832.0,101220.0,101353.0,101453.0,101669.0,102053.0,102577.0,103187.0,103795.0,104341.0,104822.0,,,,,,,,,,,,,,,,,,
