In this notebook we import World Bank data through its API.

Installing the required libraries

In [1]:
#%pip install requests
#%pip install Faostat
#%pip install pycountry

Importing economic data with the World Bank API (see the documentation at https://datatopics.worldbank.org/world-development-indicators/)

Before importing the variables we need, we first have to decide which ones they are. A short literature review is in order.

Idea: use the variables "agriculture's share of GDP" and "income from deforestation" (if the latter exists).
A reference that comes up often: the environmental Kuznets curve.

Articles:
- Andrée, B. P. J., Chamorro, A., Spencer, P., Koomen, E., & Dogo, H. (2019). Revisiting the relation between economic growth and the environment; a global assessment of deforestation, pollution and carbon emission. Renewable and Sustainable Energy Reviews, 114, 109221.
  - inverted U-shaped relations between per capita income and deforestation
  - Poverty and income inequality correlate with higher deforestation
  - agricultural GDP shares correlate with deforestation
- López, R., & Galinato, G. I. (2005). Trade policies, economic growth, and the direct causes of deforestation. Land Economics, 81(2), 145-169.
  - Economy-wide factors such as trade openness and economic growth explain an important portion of the variation in three key factors of deforestation: poverty, agricultural expansion, and road building.



Variables retrieved:

- Gini index: SI.POV.GINI
- Agricultural land (% of land area): AG.LND.AGRI.ZS
- Forest area (% of land area): AG.LND.FRST.ZS
- GDP (current US$): NY.GDP.MKTP.CD
- GDP growth (annual %): NY.GDP.MKTP.KD.ZG
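
As a minimal sketch, the indicator codes can be kept in one mapping together with the column names that appear in the merged table further down, and turned into request URLs with the same pattern the fetch function below uses. The names `INDICATORS` and `build_url` are ours and purely illustrative; only the four indicators that show up as columns in the merged table are listed.

```python
# Indicator code -> column name used in the merged table further down.
INDICATORS = {
    "NY.GDP.MKTP.CD": "PIB",
    "SI.POV.GINI": "GINI",
    "AG.LND.AGRI.ZS": "Part terres agricoles",
    "AG.LND.FRST.ZS": "Part terres foret",
}


def build_url(indicator, start_date="1960", end_date="2024", page=1):
    """Build the request URL for one indicator and one page, all countries."""
    return (
        "https://api.worldbank.org/v2/country/all/indicator/"
        f"{indicator}?date={start_date}:{end_date}&format=json&page={page}"
    )


# Print one URL per indicator (no request is sent here).
for code, column in INDICATORS.items():
    print(column, "->", build_url(code))
```

Keeping codes and column names side by side makes it harder to mistype a code in one cell and a label in another.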


In [None]:
import requests
import pandas as pd

def fetch_worldbank_data(indicator, start_date="1960", end_date="2024"):
    """Download one World Bank indicator for all countries, page by page."""
    base_url = f"https://api.worldbank.org/v2/country/all/indicator/{indicator}?date={start_date}:{end_date}&format=json"
    all_records = []

    # Fetch the first page; its metadata gives the total number of pages
    req = requests.get(f"{base_url}&page=1", timeout=30)
    data_eco = req.json()
    if len(data_eco) > 0 and isinstance(data_eco[0], dict) and "pages" in data_eco[0]:
        total_pages = data_eco[0]["pages"]
        print(f"{indicator} - Total number of pages: {total_pages}")
    else:
        print(f"{indicator} - Could not determine the number of pages.")
        return None

    # Keep the records of the first page (already downloaded)
    if len(data_eco) > 1 and isinstance(data_eco[1], list):
        all_records.extend(data_eco[1])

    # Download the remaining pages up to the last one
    for page in range(2, total_pages + 1):
        req = requests.get(f"{base_url}&page={page}", timeout=30)
        data_eco = req.json()
        if len(data_eco) > 1 and isinstance(data_eco[1], list):
            all_records.extend(data_eco[1])

    df = pd.DataFrame(all_records)

    if "country" in df.columns and "date" in df.columns and "value" in df.columns:
        # Extract the country name from the dictionary stored in the 'country' column
        df["country"] = df["country"].apply(lambda x: x["value"] if isinstance(x, dict) else x)
        table = df[["date", "value", "country"]]
        return table
    else:
        print(f"{indicator} - The expected columns are not available.")
        return None


data = fetch_worldbank_data("NY.GDP.MKTP.CD", start_date="1960", end_date="2024")

NY.GDP.MKTP.CD - Total number of pages: 341
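
The function above relies on each response being a two-element JSON array: a metadata object (which holds `pages`) followed by the list of records. The sketch below parses a hand-written sample of that shape without calling the API. `SAMPLE_PAGE` and `parse_page` are hypothetical names, and the values are illustrative only. The records also appear to carry a `countryiso3code` field, which would be worth checking against a live response before relying on it.

```python
import json

# Hand-written sample shaped like one page of the response (illustrative values only).
SAMPLE_PAGE = json.loads("""
[
  {"page": 1, "pages": 2, "per_page": 2, "total": 3},
  [
    {"country": {"id": "ZW", "value": "Zimbabwe"}, "countryiso3code": "ZWE",
     "date": "1962", "value": 1117602000.0},
    {"country": {"id": "ZW", "value": "Zimbabwe"}, "countryiso3code": "ZWE",
     "date": "1961", "value": 1096647000.0}
  ]
]
""")


def parse_page(payload):
    """Return (total_pages, rows) for one decoded page, or (None, []) if malformed."""
    if (
        not isinstance(payload, list)
        or not payload
        or not isinstance(payload[0], dict)
        or "pages" not in payload[0]
    ):
        return None, []
    # The second element is the list of records; it can be missing or null.
    records = payload[1] if len(payload) > 1 and isinstance(payload[1], list) else []
    rows = [
        {"date": r["date"], "value": r["value"], "country": r["country"]["value"]}
        for r in records
    ]
    return payload[0]["pages"], rows


total_pages, rows = parse_page(SAMPLE_PAGE)
print(total_pages, rows)
```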


In [4]:
data

Unnamed: 0,date,value,country
0,2023,1.245472e+12,Africa Eastern and Southern
1,2022,1.191423e+12,Africa Eastern and Southern
2,2021,1.085745e+12,Africa Eastern and Southern
3,2020,9.333918e+11,Africa Eastern and Southern
4,2019,1.009721e+12,Africa Eastern and Southern
...,...,...,...
17019,1964,1.217138e+09,Zimbabwe
17020,1963,1.159512e+09,Zimbabwe
17021,1962,1.117602e+09,Zimbabwe
17022,1961,1.096647e+09,Zimbabwe
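
The cell that builds `merged_data` is not part of this export. As a rough sketch of how such a table could be assembled, assuming one DataFrame per indicator in the shape returned by `fetch_worldbank_data` (columns `date`, `value`, `country`): rename `value` to the target column name, then outer-merge everything on `date` and `country`. The function `merge_indicators` and the toy frames are hypothetical, and the numbers are invented.

```python
from functools import reduce

import pandas as pd


def merge_indicators(frames):
    """Outer-merge {column_name: DataFrame(date, value, country)} on date and country."""
    renamed = [
        df.rename(columns={"value": name})[["date", "country", name]]
        for name, df in frames.items()
    ]
    merged = reduce(
        lambda left, right: left.merge(right, on=["date", "country"], how="outer"),
        renamed,
    )
    # Sort by year, then country, as in the merged table shown in the notebook.
    return merged.sort_values(["date", "country"]).reset_index(drop=True)


# Toy frames standing in for the output of fetch_worldbank_data (invented values).
gdp = pd.DataFrame(
    {"date": ["1960", "1961"], "value": [2.7e9, 2.4e9], "country": ["Algeria", "Algeria"]}
)
gini = pd.DataFrame({"date": ["1961"], "value": [35.0], "country": ["Algeria"]})

merged = merge_indicators({"PIB": gdp, "GINI": gini})
print(merged)
```

An outer merge keeps country-year pairs that exist for only one indicator, which is consistent with the many empty cells in the merged table.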


In [5]:
merged_data

Unnamed: 0,date,PIB,country,GINI,Part terres agricoles,Part terres foret
0,1960,,Afghanistan,,,
1,1960,2.421063e+10,Africa Eastern and Southern,,,
2,1960,1.190495e+10,Africa Western and Central,,,
3,1960,,Albania,,,
4,1960,2.723615e+09,Algeria,,,
...,...,...,...,...,...,...
17019,2023,1.742080e+10,West Bank and Gaza,,,
17020,2023,1.061717e+14,World,,,
17021,2023,,"Yemen, Rep.",,,
17022,2023,2.757796e+10,Zambia,,,


In [6]:
# Display the countries present in the dataset
#pays_bm = merged_data["country"].unique()
#print(pays_bm)

To merge the World Bank data with the two other databases (which use different country names), we use the pycountry package to obtain ISO 3166 country codes.

In [7]:
# Requires: pip install pycountry
import pycountry

# Return the ISO 3166 alpha-3 code for a country name
def get_country_code(country_name):
    try:
        # Look the name up in pycountry
        country = pycountry.countries.lookup(country_name)
        return country.alpha_3  # Three-letter code
    except LookupError:
        return None  # Name not found (e.g. aggregates such as "World")

# Add a 'code_pays' column holding the ISO 3166 codes
merged_data['code_pays'] = merged_data['country'].apply(get_country_code)

merged_data

Unnamed: 0,date,PIB,country,GINI,Part terres agricoles,Part terres foret,code_pays
0,1960,,Afghanistan,,,,AFG
1,1960,2.421063e+10,Africa Eastern and Southern,,,,
2,1960,1.190495e+10,Africa Western and Central,,,,
3,1960,,Albania,,,,ALB
4,1960,2.723615e+09,Algeria,,,,DZA
...,...,...,...,...,...,...,...
17019,2023,1.742080e+10,West Bank and Gaza,,,,
17020,2023,1.061717e+14,World,,,,
17021,2023,,"Yemen, Rep.",,,,
17022,2023,2.757796e+10,Zambia,,,,ZMB
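
In the output above, aggregates such as "World" get no code, but neither do "Yemen, Rep." or "West Bank and Gaza", whose World Bank labels differ from the ISO names. One possible way to patch the genuine countries and list what remains unmatched is a small manual override table applied after the lookup. This is only a sketch: `MANUAL_CODES` and `fill_missing_codes` are hypothetical names, the toy frame stands in for `merged_data`, and any entry added to the table should be checked by hand.

```python
import pandas as pd

# Hypothetical override table for World Bank labels that differ from ISO names.
MANUAL_CODES = {"Yemen, Rep.": "YEM", "West Bank and Gaza": "PSE"}


def fill_missing_codes(df, overrides=MANUAL_CODES):
    """Fill missing 'code_pays' values from a manual name -> code table."""
    out = df.copy()
    missing = out["code_pays"].isna()
    out.loc[missing, "code_pays"] = out.loc[missing, "country"].map(overrides)
    return out


# Toy frame standing in for merged_data after the pycountry lookup.
toy = pd.DataFrame(
    {
        "country": ["Zambia", "Yemen, Rep.", "World"],
        "code_pays": ["ZMB", None, None],
    }
)
patched = fill_missing_codes(toy)

# Names still without a code: these should only be regional or income aggregates.
unmatched = sorted(patched.loc[patched["code_pays"].isna(), "country"].unique())
print(unmatched)
```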


We now replace the NaN values with 0 so that the regressions are easier to run.

In [8]:
# Replace every NaN value with 0 (in all columns, including 'code_pays')
data_BM = merged_data.fillna(0)
data_BM

Unnamed: 0,date,PIB,country,GINI,Part terres agricoles,Part terres foret,code_pays
0,1960,0.000000e+00,Afghanistan,0.0,0.0,0.0,AFG
1,1960,2.421063e+10,Africa Eastern and Southern,0.0,0.0,0.0,0
2,1960,1.190495e+10,Africa Western and Central,0.0,0.0,0.0,0
3,1960,0.000000e+00,Albania,0.0,0.0,0.0,ALB
4,1960,2.723615e+09,Algeria,0.0,0.0,0.0,DZA
...,...,...,...,...,...,...,...
17019,2023,1.742080e+10,West Bank and Gaza,0.0,0.0,0.0,0
17020,2023,1.061717e+14,World,0.0,0.0,0.0,0
17021,2023,0.000000e+00,"Yemen, Rep.",0.0,0.0,0.0,0
17022,2023,2.757796e+10,Zambia,0.0,0.0,0.0,ZMB
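
As the output above shows, filling with 0 also writes 0 into `code_pays` and turns a missing GDP or Gini value into a literal zero. If that turns out to distort the regressions, one alternative (not what this notebook does) would be to keep the NaN values and drop incomplete rows only for the variables used in a given regression. The helper `regression_sample` is hypothetical and the toy values are invented.

```python
import pandas as pd


def regression_sample(df, variables):
    """Keep rows that have a country code and no missing value among `variables`."""
    return df.dropna(subset=["code_pays", *variables]).reset_index(drop=True)


# Toy frame standing in for merged_data (invented values).
toy = pd.DataFrame(
    {
        "date": [1960, 2020, 2020],
        "country": ["Algeria", "Zambia", "World"],
        "PIB": [2.7e9, 1.8e10, 8.5e13],
        "Part terres foret": [None, 60.0, 31.0],
        "code_pays": ["DZA", "ZMB", None],
    }
)

sample = regression_sample(toy, ["PIB", "Part terres foret"])
print(sample)
```

Rows for aggregates drop out because they have no country code, and country-years with a missing regressor drop out instead of entering the estimation as zeros.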


In [9]:
# Export the data to CSV
data_BM.to_csv('data_BM.csv', index=False)
