# **Economic Development vs. Sustainability**
# Data Preprocessing
Katlyn Goeujon-Mackness <br>
05/05/2025

## Introduction
Economic growth is often pursued at the cost of environmental sustainability. This study aims to analyze the balance between economic development and sustainable practices across different regions, industries, and policies.

In this phase of the analysis, we locate, collect, and process the raw data, applying transformations where needed. Sections are organized by key indicator. Finally, we export the processed data in CSV format for analysis.

### Key Challenge
Achieving sustainable economic growth requires balancing financial prosperity with environmental and social responsibility. Identifying actionable patterns in historical data can inform policymakers, businesses, and environmental advocates.

### Data of Interest
- GDP growth rate compared to carbon emissions per capita.
- Percentage of renewable energy adoption.
- Employment trends in green industries.
- Improvement in environmental quality indicators (air quality, water safety).
- Sustainability index scores vs. economic performance.

### Locating Relevant Data
- **World Bank**: Economic and environmental indicators.
    * [GDP per capita growth (annual %)](https://data.worldbank.org/indicator/NY.GDP.PCAP.KD.ZG)
- **United Nations SDGs Database**: Sustainable development statistics.
- **OECD**: Policy effectiveness on sustainability.
- **NASA Earth Observations**: Environmental impact metrics.
- **National Employment Data**: Job growth in sustainable sectors.

### Contents
* [GDP](#gdp)


In [866]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Prevent truncating columns and rows
pd.set_option("display.max_rows", None) 
pd.set_option("display.max_columns", None) 

---
## GDP
### Discovering
From the World Bank Group, the raw data include GDP per capita growth as an annual percentage, along with metadata tables to aid analysis.

In [None]:
gdp = pd.read_csv("data/raw/gdp/GDP-per-capita-percent-growth.csv", encoding="latin1")
gdp_meta = pd.read_csv("data/raw/gdp/Metadata-GDP-per-capita-percent-growth.csv", encoding="latin1")
gdp_meta_ind = pd.read_csv("data/raw/gdp/Metadata_Indicator_GDP-per-capita-percent-growth.csv", encoding='latin1')

In [868]:
# Metadata indicator provides information about the dataset
print(gdp_meta_ind)

  ï»¿"INDICATOR_CODE"                    INDICATOR_NAME  \
0   NY.GDP.PCAP.KD.ZG  GDP per capita growth (annual %)   

                                         SOURCE_NOTE  \
0  Annual percentage growth rate of GDP per capit...   

                                 SOURCE_ORGANIZATION  Unnamed: 4  
0  World Bank national accounts data, and OECD Na...         NaN  
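
The `ï»¿` prefix on the first column header is what a UTF-8 byte order mark (BOM) looks like when the file is decoded as `latin1`. The sketch below illustrates this on a small in-memory file; it assumes the World Bank CSVs are UTF-8 with a BOM, which the prefix suggests but which this notebook does not confirm. The sample bytes and variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for a CSV that starts with a UTF-8 BOM.
raw_bytes = '\ufeff"Country Name","Country Code"\n"Aruba","ABW"\n'.encode("utf-8")

# Decoded as latin1, the three BOM bytes become visible characters in the first header.
with_latin1 = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin1")

# Decoded as utf-8-sig, the BOM is stripped before parsing.
with_utf8_sig = pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8-sig")

print(list(with_latin1.columns))
print(list(with_utf8_sig.columns))
```

If the assumption holds for the real files, reading them with `encoding="utf-8-sig"` would make the later header rename unnecessary. Here the notebook keeps `latin1` and cleans the header in the Restructuring step.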


In [869]:
gdp.head(3)

Unnamed: 0,"ï»¿""Country Name""",Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,Unnamed: 69
0,Aruba,ABW,GDP per capita growth (annual %),NY.GDP.PCAP.KD.ZG,,,,,,,,,,,,,,,,,,,,,,,,,,,,17.593206,18.304687,10.066932,0.13448,2.813435,1.111856,0.492195,2.751524,-0.292643,-2.733864,2.978397,-0.48716,-0.125966,6.519224,3.212406,-1.628099,-0.033839,5.026912,-2.930823,-0.673258,2.322678,1.061772,-12.274936,-2.956953,2.610525,-2.484648,4.855279,-2.629615,-1.635753,0.951538,7.040657,2.234428,-2.496549,-25.79323,25.154964,8.912308,4.216132,,
1,Africa Eastern and Southern,AFE,GDP per capita growth (annual %),NY.GDP.PCAP.KD.ZG,,-2.13663,5.009835,2.794289,1.830475,2.245841,1.917904,2.418062,1.18621,2.097636,-1.747625,2.477136,-0.110166,1.595887,2.383976,-1.515149,-0.561033,-1.699047,-1.480369,-0.240458,2.346863,0.84449,-2.858611,-3.045377,0.277329,-3.091139,-0.744192,0.885399,1.287891,-0.166717,-2.655658,-2.808265,-4.910566,-3.314233,-0.745923,1.61081,2.699137,1.214009,-0.850095,-0.007474,0.570581,0.860717,1.163242,0.294752,2.803787,3.378235,3.770141,3.814958,1.563935,-1.852648,2.393038,1.308359,-0.957114,1.512944,1.218467,0.264191,-0.468619,0.007074,-0.102513,-0.549575,-5.45141,1.842087,0.903488,-0.226928,,
2,Afghanistan,AFG,GDP per capita growth (annual %),NY.GDP.PCAP.KD.ZG,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,-10.119484,22.02019,2.345672,-2.148212,7.383377,1.132485,11.692303,1.677279,17.043896,11.055031,-3.213295,8.279369,2.052068,-0.939985,-1.665057,-0.300121,-0.19557,-1.713743,0.856295,-5.382515,-22.584482,-7.576669,0.540656,,


In [870]:
gdp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 70 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   ï»¿"Country Name"  266 non-null    object 
 1   Country Code       266 non-null    object 
 2   Indicator Name     266 non-null    object 
 3   Indicator Code     266 non-null    object 
 4   1960               0 non-null      float64
 5   1961               145 non-null    float64
 6   1962               152 non-null    float64
 7   1963               152 non-null    float64
 8   1964               152 non-null    float64
 9   1965               152 non-null    float64
 10  1966               155 non-null    float64
 11  1967               159 non-null    float64
 12  1968               161 non-null    float64
 13  1969               161 non-null    float64
 14  1970               161 non-null    float64
 15  1971               185 non-null    float64
 16  1972               185 non

In [871]:
gdp.describe()

Unnamed: 0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,Unnamed: 69
count,0.0,145.0,152.0,152.0,152.0,152.0,155.0,159.0,161.0,161.0,161.0,185.0,185.0,185.0,185.0,187.0,190.0,191.0,196.0,196.0,197.0,206.0,208.0,210.0,210.0,213.0,214.0,216.0,220.0,221.0,223.0,242.0,242.0,243.0,243.0,244.0,245.0,245.0,247.0,247.0,247.0,248.0,248.0,252.0,252.0,252.0,252.0,253.0,253.0,256.0,257.0,257.0,257.0,257.0,258.0,258.0,257.0,257.0,258.0,257.0,257.0,257.0,254.0,243.0,0.0,0.0
mean,,1.551609,2.982683,2.762094,3.97995,3.57312,2.421155,1.777137,3.927512,4.517874,4.316691,2.932702,3.098565,3.293566,3.08158,-0.013086,3.903495,2.145544,2.254961,2.028502,0.668481,0.219847,-1.010194,-0.679292,1.190731,1.018527,1.317372,1.703381,2.474171,0.97366,1.138905,-1.070382,-0.635606,-0.142672,0.421589,2.292233,2.878801,3.73535,1.981517,1.718242,3.164254,1.727875,1.919283,2.419399,4.357371,3.588082,4.198835,3.970836,2.21756,-1.344087,3.061513,2.378194,1.78168,1.704722,1.914273,1.372379,1.832635,2.087969,1.884746,1.760215,-5.723905,4.818938,3.254277,2.123455,,
std,,5.57907,4.778384,5.428235,5.269092,5.565084,4.284444,6.757169,7.300174,4.525287,7.432219,4.383079,5.556503,7.383348,6.108751,5.585016,6.169807,5.011204,5.435626,6.164588,6.111777,5.837776,4.987514,5.02532,4.760954,4.633277,4.894383,5.023585,5.52647,5.721857,7.060999,7.820009,9.346929,6.866627,7.363371,5.828973,6.071763,11.406349,4.803281,4.608824,6.227211,4.934002,4.491697,5.178328,5.137998,3.928841,4.026833,4.27262,4.049335,4.995444,3.958534,5.114071,7.643561,4.394242,3.247474,4.38871,3.665226,3.991325,2.897729,3.322241,7.631557,5.708325,6.060594,6.193483,,
min,,-26.527644,-20.854896,-14.163578,-14.458423,-15.018576,-10.519981,-17.52377,-8.580169,-9.300131,-48.243296,-13.31412,-15.612517,-19.237863,-18.599207,-17.555827,-26.374675,-14.64238,-25.892434,-28.435258,-24.388705,-24.756413,-21.074529,-17.441688,-19.243349,-14.894531,-20.143467,-19.412935,-15.252919,-43.604223,-44.552252,-64.423582,-45.324884,-35.546284,-41.540612,-14.230487,-18.31943,-13.590922,-23.877852,-27.113647,-16.217142,-11.716099,-14.988201,-38.538218,-6.824979,-12.676373,-6.893259,-22.076582,-18.654258,-15.899715,-13.015172,-49.127857,-48.428726,-36.824547,-24.511054,-30.150755,-12.09711,-9.306452,-18.05056,-12.498917,-55.228911,-22.584482,-22.745681,-21.164316,,
25%,,-0.417608,1.163929,0.330003,1.924268,0.89927,-0.265863,-0.763237,1.18621,1.894998,1.258591,0.599649,-0.328982,0.881619,0.636585,-2.571406,0.631037,-0.054005,-0.37927,-0.252211,-2.299758,-2.287274,-3.528409,-3.601858,-1.435592,-1.256779,-0.928378,-1.119946,0.003492,-0.571288,-2.049457,-2.843855,-3.089044,-2.242711,-1.503778,0.354981,0.883746,1.214009,0.365098,-0.406102,1.12397,-0.016422,0.17168,0.625484,2.099233,1.705244,2.117991,1.854643,0.1747,-4.517814,0.95227,0.800573,-0.353869,0.226083,0.558534,0.00806,0.277408,0.46232,0.503853,0.071604,-8.075626,1.771729,1.349278,0.29487,,
50%,,2.198412,2.733083,3.079408,4.33008,3.386048,2.461462,2.048103,3.653098,4.358969,3.807399,2.728298,3.247397,3.788932,2.941323,-0.183188,3.784248,2.449207,2.692403,2.491293,1.13499,0.732017,-0.685287,-0.250305,1.620469,1.621519,1.77121,1.389041,2.573625,1.339109,1.260802,-0.384955,0.127308,0.459093,1.768206,2.233087,2.572464,2.794926,2.059203,1.882233,3.089054,1.630748,1.624427,2.586906,3.735167,3.20013,3.734873,3.761896,2.233752,-1.273541,3.019247,2.435003,1.526211,1.754549,1.89961,1.585844,1.895257,2.111455,2.006731,1.641704,-4.66529,4.414695,2.808548,1.708805,,
75%,,4.466541,4.806262,4.742374,5.727695,5.1467,4.892294,4.557754,5.210488,6.406064,6.386637,4.424564,5.158774,5.530668,4.579107,2.63912,5.741338,4.775645,5.307611,4.952525,4.258175,3.504438,1.444202,2.52152,3.756309,3.052309,3.379565,4.113804,4.726862,3.517981,3.498611,2.298572,3.45435,3.103502,3.740303,3.979864,4.496944,4.425325,3.899009,3.686224,4.42279,3.120589,3.842652,4.704912,5.962149,5.436045,6.271037,6.289637,4.42259,1.922682,5.028538,4.776793,3.89322,3.712311,3.690974,3.442575,3.488788,3.93498,3.8152,3.228631,-2.460782,7.06063,4.882776,3.732239,,
max,,22.113903,27.110688,32.280917,39.840747,42.016075,15.702949,62.285555,77.41001,22.461098,46.366027,23.694967,24.65025,55.823529,44.365224,19.340903,32.536867,20.201679,20.442132,23.669347,20.883835,19.147281,20.966438,13.144281,23.46106,20.570991,20.034433,25.178805,32.321813,21.194246,55.872942,46.443818,51.103509,31.01041,17.553636,61.873535,60.341092,140.490578,30.825438,20.816108,77.089569,53.235688,22.02019,14.619701,49.073853,26.660095,33.030488,23.590685,20.513208,17.043896,24.656802,18.828534,91.78137,18.140048,13.984753,23.443691,31.097891,30.395061,8.384961,21.827247,43.512346,33.768559,62.111024,74.674529,,


In [872]:
# Metadata table provides additional categorical information that could be useful in the current analysis
gdp_meta.head()

Unnamed: 0,"ï»¿""Country Code""",Region,IncomeGroup,SpecialNotes,TableName,Unnamed: 5
0,ABW,Latin America & Caribbean,High income,,Aruba,
1,AFE,,,"26 countries, stretching from the Red Sea in t...",Africa Eastern and Southern,
2,AFG,South Asia,Low income,The reporting period for national accounts dat...,Afghanistan,
3,AFW,,,"22 countries, stretching from the westernmost ...",Africa Western and Central,
4,AGO,Sub-Saharan Africa,Lower middle income,The World Bank systematically assesses the app...,Angola,


In [873]:
gdp_meta.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 265 entries, 0 to 264
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   ï»¿"Country Code"  265 non-null    object 
 1   Region             217 non-null    object 
 2   IncomeGroup        216 non-null    object 
 3   SpecialNotes       127 non-null    object 
 4   TableName          265 non-null    object 
 5   Unnamed: 5         0 non-null      float64
dtypes: float64(1), object(5)
memory usage: 12.6+ KB


In [874]:
gdp.shape

(266, 70)

#### Comments
- GDP main dataset:
    * The dataset needs to be melted from wide format to long format, with a single column for year
    * The first column header carries encoding debris (`ï»¿`) and needs to be renamed
    * Yearly coverage is inconsistent: 1960 and 2024 have no values at all, and early years cover far fewer countries. Empty years can be dropped, and some gaps could be filled by interpolation.
- GDP metadata:
    * Rows without a Region are regional and income-group aggregates; they can be used to separate aggregates from the list of countries
    * Region and IncomeGroup can be added to the main dataset for future analysis
    * SpecialNotes will not be needed for this analysis and can be dropped
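
The interpolation mentioned above is not carried out in this notebook. As a minimal sketch of one way it could be done once the data are in long format, the example below fills only interior gaps within each country and leaves leading and trailing gaps untouched. The toy values, the country codes `AAA` and `BBB`, and the `Value_filled` column are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy long-format data (hypothetical values), shaped like the melted GDP table.
toy = pd.DataFrame({
    "Country Code": ["AAA"] * 5 + ["BBB"] * 5,
    "Year": list(range(2000, 2005)) * 2,
    "Value": [np.nan, 1.0, np.nan, 3.0, np.nan,
              2.0, np.nan, np.nan, 5.0, 6.0],
})

# Interpolate within each country, in year order.
toy = toy.sort_values(["Country Code", "Year"])
toy["Value_filled"] = (
    toy.groupby("Country Code")["Value"]
       .transform(lambda s: s.interpolate(method="linear", limit_area="inside"))
)

print(toy)
```

Using `limit_area="inside"` avoids extrapolating growth rates for years before a country's first observation or after its last one. Whether linear interpolation is appropriate for growth rates at all is a modeling choice left open here.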


---
### Restructuring

In [875]:
# Remove special characters from column header
gdp.rename(columns={gdp.columns[0]: "Country Name"}, inplace=True)
gdp_meta.rename(columns={gdp_meta.columns[0]: "Country Code"}, inplace=True)

In [876]:
# Drop columns from gdp that are completely empty
gdp = gdp.dropna(axis=1, how='all')

# Drop unneeded columns
gdp.drop(columns=['Indicator Code'], inplace=True)

In [877]:
# Melt dataset from wide format to long format
gdp = gdp.melt(
    id_vars=['Country Name', 'Country Code', 'Indicator Name'],
    var_name="Year", # Creates new column to store Year data
    value_name="Value"  # Data values go here
)

# Ensure Year is coded as a numeric column
gdp["Year"] = pd.to_numeric(gdp["Year"], errors="coerce")

# preview
gdp.head(10)

Unnamed: 0,Country Name,Country Code,Indicator Name,Year,Value
0,Aruba,ABW,GDP per capita growth (annual %),1961,
1,Africa Eastern and Southern,AFE,GDP per capita growth (annual %),1961,-2.13663
2,Afghanistan,AFG,GDP per capita growth (annual %),1961,
3,Africa Western and Central,AFW,GDP per capita growth (annual %),1961,-0.247796
4,Angola,AGO,GDP per capita growth (annual %),1961,
5,Albania,ALB,GDP per capita growth (annual %),1961,
6,Andorra,AND,GDP per capita growth (annual %),1961,
7,Arab World,ARB,GDP per capita growth (annual %),1961,
8,United Arab Emirates,ARE,GDP per capita growth (annual %),1961,
9,Argentina,ARG,GDP per capita growth (annual %),1961,3.697198
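
A quick sanity check on the reshape: after melting, the long table should hold one row per country per year column, and every year label should convert to a number. The sketch below shows that check on a toy wide table; the country names, codes, and values are hypothetical.

```python
import pandas as pd

# Toy wide-format table (hypothetical values) with two year columns.
wide = pd.DataFrame({
    "Country Name": ["Aland", "Borduria"],
    "Country Code": ["AAA", "BBB"],
    "2000": [1.5, None],
    "2001": [2.0, -0.5],
})

id_cols = ["Country Name", "Country Code"]
long = wide.melt(id_vars=id_cols, var_name="Year", value_name="Value")
long["Year"] = pd.to_numeric(long["Year"], errors="coerce")

# One row per (country, year column); no year label failed to parse.
n_year_cols = wide.shape[1] - len(id_cols)
assert len(long) == len(wide) * n_year_cols
assert long["Year"].notna().all()

print(long)
```

Because `errors="coerce"` silently turns unparseable labels into `NaN`, the second assertion is what would surface a stray non-year column left among the value columns.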


---

In [878]:
print(gdp_meta.columns)

Index(['Country Code', 'Region', 'IncomeGroup', 'SpecialNotes', 'TableName',
       'Unnamed: 5'],
      dtype='object')


In [879]:
# Remove unneeded columns from gdp_meta
gdp_meta.drop(columns=['SpecialNotes', 'Unnamed: 5'], inplace=True)

In [880]:
# Rows with no Region are aggregates (regional and income groups); copy them to a new DataFrame
gdp_groups_regions = gdp_meta[gdp_meta["Region"].isna()].copy()
gdp_groups_regions.drop(columns=["Region", "IncomeGroup"], inplace=True)
gdp_groups_regions.rename(columns={'TableName': "IncomeRegionGroup"}, inplace=True)

# Reset the index after filtering
gdp_groups_regions.reset_index(drop=True, inplace=True)

# Preview the updated table
gdp_groups_regions.head(5)


Unnamed: 0,Country Code,IncomeRegionGroup
0,AFE,Africa Eastern and Southern
1,AFW,Africa Western and Central
2,ARB,Arab World
3,CEB,Central Europe and the Baltics
4,CSS,Caribbean small states


In [881]:
# Keep only rows that have a Region, i.e. individual countries rather than aggregates
gdp_meta_cleaned = gdp_meta[gdp_meta["Region"].notna()]

# Overwrite original dataframe
gdp_meta = gdp_meta_cleaned.copy()

In [882]:
# Reset the index and preview the cleaned metadata
gdp_meta.reset_index(drop=True, inplace=True)
gdp_meta.head(10)

Unnamed: 0,Country Code,Region,IncomeGroup,TableName
0,ABW,Latin America & Caribbean,High income,Aruba
1,AFG,South Asia,Low income,Afghanistan
2,AGO,Sub-Saharan Africa,Lower middle income,Angola
3,ALB,Europe & Central Asia,Upper middle income,Albania
4,AND,Europe & Central Asia,High income,Andorra
5,ARE,Middle East & North Africa,High income,United Arab Emirates
6,ARG,Latin America & Caribbean,Upper middle income,Argentina
7,ARM,Europe & Central Asia,Upper middle income,Armenia
8,ASM,East Asia & Pacific,High income,American Samoa
9,ATG,Latin America & Caribbean,High income,Antigua and Barbuda


---

In [883]:
# Select the region and income group rows from GDP
gdp_filtered_rows = gdp[gdp["Country Code"].isin(gdp_groups_regions["Country Code"])]

# Merge filtered out rows with gdp_groups_regions table
gdp_groups_regions = gdp_groups_regions.merge(gdp_filtered_rows, on="Country Code", how="left")
gdp_groups_regions.head(5)

Unnamed: 0,Country Code,IncomeRegionGroup,Country Name,Indicator Name,Year,Value
0,AFE,Africa Eastern and Southern,Africa Eastern and Southern,GDP per capita growth (annual %),1961,-2.13663
1,AFE,Africa Eastern and Southern,Africa Eastern and Southern,GDP per capita growth (annual %),1962,5.009835
2,AFE,Africa Eastern and Southern,Africa Eastern and Southern,GDP per capita growth (annual %),1963,2.794289
3,AFE,Africa Eastern and Southern,Africa Eastern and Southern,GDP per capita growth (annual %),1964,1.830475
4,AFE,Africa Eastern and Southern,Africa Eastern and Southern,GDP per capita growth (annual %),1965,2.245841


In [884]:
gdp_groups_regions.drop(columns=["Country Name"], inplace=True)
gdp_groups_regions.head(5)

Unnamed: 0,Country Code,IncomeRegionGroup,Indicator Name,Year,Value
0,AFE,Africa Eastern and Southern,GDP per capita growth (annual %),1961,-2.13663
1,AFE,Africa Eastern and Southern,GDP per capita growth (annual %),1962,5.009835
2,AFE,Africa Eastern and Southern,GDP per capita growth (annual %),1963,2.794289
3,AFE,Africa Eastern and Southern,GDP per capita growth (annual %),1964,1.830475
4,AFE,Africa Eastern and Southern,GDP per capita growth (annual %),1965,2.245841


### Export GDP Income Groups and Regions to CSV

In [885]:
gdp_groups_regions.to_csv("data/processed/gdp_groups_regions.csv", index=False)

---

In [886]:
# Remove the rows in gdp that are present in gdp_filtered_rows
gdp = gdp[~gdp.index.isin(gdp_filtered_rows.index)]
gdp.reset_index(drop=True, inplace=True)
gdp.head(10)

Unnamed: 0,Country Name,Country Code,Indicator Name,Year,Value
0,Aruba,ABW,GDP per capita growth (annual %),1961,
1,Afghanistan,AFG,GDP per capita growth (annual %),1961,
2,Angola,AGO,GDP per capita growth (annual %),1961,
3,Albania,ALB,GDP per capita growth (annual %),1961,
4,Andorra,AND,GDP per capita growth (annual %),1961,
5,United Arab Emirates,ARE,GDP per capita growth (annual %),1961,
6,Argentina,ARG,GDP per capita growth (annual %),1961,3.697198
7,Armenia,ARM,GDP per capita growth (annual %),1961,
8,American Samoa,ASM,GDP per capita growth (annual %),1961,
9,Antigua and Barbuda,ATG,GDP per capita growth (annual %),1961,
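
The Comments section noted that Region and IncomeGroup can be added to the main dataset, but the notebook exports `gdp` without them. The sketch below shows one way that join could look, using a left merge on `Country Code`. The toy tables, the `ZZZ` code, and the GDP values are hypothetical; only the Region and IncomeGroup labels for ABW and AFG are taken from the metadata preview above.

```python
import pandas as pd

# Toy stand-ins for the long GDP table and the cleaned metadata.
gdp_toy = pd.DataFrame({
    "Country Code": ["ABW", "ABW", "AFG", "ZZZ"],
    "Year": [2000, 2001, 2000, 2000],
    "Value": [1.0, 2.0, 3.0, 4.0],  # hypothetical values
})
meta_toy = pd.DataFrame({
    "Country Code": ["ABW", "AFG"],
    "Region": ["Latin America & Caribbean", "South Asia"],
    "IncomeGroup": ["High income", "Low income"],
})

# Left merge keeps every GDP row; validate guards against duplicate codes in the metadata.
enriched = gdp_toy.merge(
    meta_toy, on="Country Code", how="left", validate="many_to_one"
)

# Codes with no metadata end up with a missing Region.
unmatched = enriched.loc[enriched["Region"].isna(), "Country Code"].unique().tolist()

print(enriched)
print(unmatched)
```

The unmatched check matters here because the main table has 266 rows in wide format while the metadata has 265, so at least one code in `gdp` has no metadata row and would otherwise pass through unnoticed.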


### Export GDP by Country to CSV

In [887]:
gdp.to_csv("data/processed/gdp.csv", index=False)

---

Author: Katlyn Goeujon-Mackness