# Raw Datasets Exploratory Data Analysis
This notebook performs EDA and processing on the raw GDP datasets contained in `..\data\raw\gdp`.

## Step 0: Import and Read Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('ggplot')
pd.set_option('display.max_columns', 100)

In [2]:
df_gdp_annual_growth_wb = pd.read_csv(
    "../data/raw/gdp/gdp_annual_growth_percentage.csv",
    skiprows=4,
    skip_blank_lines=True
)
df_gdp_wb = pd.read_csv(
    "../data/raw/gdp/gdp_constant_USD2015.csv",
    skiprows=4,
    skip_blank_lines=True
)
df_gdp_per_capita_wb = pd.read_csv(
    "../data/raw/gdp/gdp_per_capita_constant_PPP.csv",
    skiprows=4,
    skip_blank_lines=True
)
df_gdp_per_capita_owid = pd.read_csv('../data/raw/gdp/gdp-per-capita-worldbank.csv')

 ---

## Step 1: Data Understanding
Inspect the DataFrames using `info()`, `head()` and `describe()`.

Below, the three datasets from [World Bank](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD) are explored. Their format differs from the [Our World in Data](https://ourworldindata.org/grapher/gdp-per-capita-worldbank) (OWID) datasets (both CO2 emissions and GDP): the World Bank files contain one column per year, whereas OWID uses a single `Year` column. Therefore, in the Data Preparation phase (Step 2), these three datasets are converted to the same `Year`-column format to ensure consistency.
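The wide-to-long conversion can be illustrated on a tiny hypothetical table (the countries and values below are made up, not taken from the World Bank files). `DataFrame.melt` stacks the year columns into rows, giving one row per country and year:

```python
import pandas as pd

# Hypothetical toy data in the World Bank "wide" layout: one column per year
wide = pd.DataFrame({
    "Country Name": ["Italy", "Spain"],
    "Country Code": ["ITA", "ESP"],
    "2020": [1.0, 2.0],
    "2021": [3.0, 4.0],
})

# melt() moves the year headers into a "Year" column and their cells into "Value"
long = wide.melt(
    id_vars=["Country Name", "Country Code"],  # identifier columns repeated on every row
    var_name="Year",
    value_name="Value",
)
long["Year"] = long["Year"].astype(int)  # the old headers are strings such as "2020"
long = long.sort_values(["Country Name", "Year"]).reset_index(drop=True)
print(long)
```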

### 1.1 World Bank
GDP annual growth in percentage

In [3]:
df_gdp_annual_growth_wb.info()

<class 'pandas.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 70 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    str    
 1   Country Code    266 non-null    str    
 2   Indicator Name  266 non-null    str    
 3   Indicator Code  266 non-null    str    
 4   1960            0 non-null      float64
 5   1961            145 non-null    float64
 6   1962            152 non-null    float64
 7   1963            152 non-null    float64
 8   1964            152 non-null    float64
 9   1965            152 non-null    float64
 10  1966            155 non-null    float64
 11  1967            159 non-null    float64
 12  1968            160 non-null    float64
 13  1969            160 non-null    float64
 14  1970            160 non-null    float64
 15  1971            184 non-null    float64
 16  1972            184 non-null    float64
 17  1973            184 non-null    float64
 18  1

In [4]:
df_gdp_annual_growth_wb['Indicator Code'].unique()

<StringArray>
['NY.GDP.MKTP.KD.ZG']
Length: 1, dtype: str

In [5]:
df_gdp_annual_growth_wb['Indicator Name'].unique()

<StringArray>
['GDP growth (annual %)']
Length: 1, dtype: str

Both `Indicator Code` and `Indicator Name` hold a single constant value, so these columns carry no information and can be removed.
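Instead of calling `unique()` column by column, a small helper (the name `constant_columns` is hypothetical) can list every column holding a single distinct value. A sketch on made-up data that mirrors the World Bank layout:

```python
import pandas as pd

def constant_columns(df: pd.DataFrame) -> list[str]:
    """Return the names of columns holding exactly one distinct non-null value."""
    return [col for col in df.columns if df[col].nunique(dropna=True) == 1]

# Hypothetical example mirroring the World Bank layout
df = pd.DataFrame({
    "Country Name": ["Italy", "Spain"],
    "Indicator Name": ["GDP growth (annual %)"] * 2,
    "Indicator Code": ["NY.GDP.MKTP.KD.ZG"] * 2,
    "2020": [1.0, 2.0],
})
print(constant_columns(df))
```

Columns that are entirely NaN (such as `1960` in the growth table) have zero distinct values, so this helper does not report them.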

### 1.2 World Bank
GDP per capita (constant PPP)

In [6]:
df_gdp_per_capita_wb.info()

<class 'pandas.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 70 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    str    
 1   Country Code    266 non-null    str    
 2   Indicator Name  266 non-null    str    
 3   Indicator Code  266 non-null    str    
 4   1960            0 non-null      float64
 5   1961            0 non-null      float64
 6   1962            0 non-null      float64
 7   1963            0 non-null      float64
 8   1964            0 non-null      float64
 9   1965            0 non-null      float64
 10  1966            0 non-null      float64
 11  1967            0 non-null      float64
 12  1968            0 non-null      float64
 13  1969            0 non-null      float64
 14  1970            0 non-null      float64
 15  1971            0 non-null      float64
 16  1972            0 non-null      float64
 17  1973            0 non-null      float64
 18  1
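The `info()` output shows that GDP per capita (PPP) has no values at all in the early years listed. Coverage per year can be summarized in one call; the sketch below assumes that year columns are exactly those whose header is all digits, and the helper names are hypothetical:

```python
import numpy as np
import pandas as pd

def year_coverage(df_wide: pd.DataFrame) -> pd.Series:
    """Count non-null values per year column of a World Bank wide table."""
    year_cols = [c for c in df_wide.columns if str(c).isdigit()]
    return df_wide[year_cols].notna().sum()

def first_year_with_data(df_wide: pd.DataFrame):
    """Return the earliest year column with at least one value, or None."""
    coverage = year_coverage(df_wide)
    years_with_data = coverage[coverage > 0].index
    return int(min(years_with_data, key=int)) if len(years_with_data) else None

# Hypothetical toy table: nothing in 1960, partial data from 1990
toy = pd.DataFrame({
    "Country Name": ["A", "B"],
    "1960": [np.nan, np.nan],
    "1990": [1.0, np.nan],
    "1991": [2.0, 3.0],
})
print(year_coverage(toy).to_dict())
print(first_year_with_data(toy))
```

Applied to `df_gdp_per_capita_wb`, `first_year_with_data` would give the first year with any per-capita value; for the country rows, the plot in Step 3 starts from 1990.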

### 1.3 World Bank
GDP (constant 2015 USD)

In [7]:
df_gdp_wb.info()

<class 'pandas.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 70 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    str    
 1   Country Code    266 non-null    str    
 2   Indicator Name  266 non-null    str    
 3   Indicator Code  266 non-null    str    
 4   1960            145 non-null    float64
 5   1961            152 non-null    float64
 6   1962            152 non-null    float64
 7   1963            152 non-null    float64
 8   1964            152 non-null    float64
 9   1965            154 non-null    float64
 10  1966            158 non-null    float64
 11  1967            159 non-null    float64
 12  1968            159 non-null    float64
 13  1969            159 non-null    float64
 14  1970            182 non-null    float64
 15  1971            182 non-null    float64
 16  1972            182 non-null    float64
 17  1973            182 non-null    float64
 18  1

 ---
### 1.4 Our World In Data (OWID)
GDP per capita

In [8]:
df_gdp_per_capita_owid.info()

<class 'pandas.DataFrame'>
RangeIndex: 7240 entries, 0 to 7239
Data columns (total 5 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Entity                          7240 non-null   str    
 1   Code                            7240 non-null   str    
 2   Year                            7240 non-null   int64  
 3   GDP per capita                  7240 non-null   float64
 4   World region according to OWID  6785 non-null   str    
dtypes: float64(1), int64(1), str(3)
memory usage: 282.9 KB


The OWID format is cleaner: it is already in long format, with one row per country (`Entity`/`Code`) and `Year`.
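A practical benefit of the long format is a natural key: each (`Code`, `Year`) pair should appear exactly once. A minimal check, assuming that key (the helper name `is_tidy_panel` is hypothetical):

```python
import pandas as pd

def is_tidy_panel(df: pd.DataFrame, keys: list[str]) -> bool:
    """True if no key value is missing and each key combination appears only once."""
    return bool(df[keys].notna().all().all() and not df.duplicated(subset=keys).any())

# Hypothetical OWID-like rows
owid_like = pd.DataFrame({
    "Code": ["ITA", "ITA", "ESP"],
    "Year": [2020, 2021, 2020],
    "GDP per capita": [1.0, 2.0, 3.0],
})
print(is_tidy_panel(owid_like, ["Code", "Year"]))  # True
```

On the real data, the call would be `is_tidy_panel(df_gdp_per_capita_owid, ["Code", "Year"])`.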

 ---
## Step 2: Data Preparation
Convert the datasets from the World Bank format (one column per year) to the Our World in Data format (a single `Year` column).

The three converted datasets are then merged into `df_gdp_wb_merged`, which is finally split into:
- Countries only: `df_gdp_countries` 
- Other aggregates: `df_gdp_aggregates`.
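Sections 2.1 to 2.3 repeat the same four steps for each table. A sketch of a reusable helper (the name `wb_wide_to_long` is hypothetical) that bundles them, assuming every World Bank table has the `Country Name`/`Country Code` identifiers and the two indicator columns:

```python
import pandas as pd

def wb_wide_to_long(df_wide: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Convert a World Bank wide table (one column per year) to OWID-style long format."""
    # drop 'Unnamed' columns and the constant indicator columns (ignored if already gone)
    df = df_wide.loc[:, ~df_wide.columns.str.contains("^Unnamed")]
    df = df.drop(columns=["Indicator Name", "Indicator Code"], errors="ignore")
    long = df.melt(id_vars=["Country Name", "Country Code"], var_name="Year", value_name=value_name)
    # non-numeric year labels become NaN and are dropped; the rest become integers
    long["Year"] = pd.to_numeric(long["Year"], errors="coerce")
    long = long.dropna(subset=["Year"]).astype({"Year": int})
    return long.sort_values(["Country Name", "Year"])

# Hypothetical toy table in the raw World Bank layout
toy = pd.DataFrame({
    "Country Name": ["Italy"], "Country Code": ["ITA"],
    "Indicator Name": ["x"], "Indicator Code": ["y"],
    "2020": [1.0], "2021": [2.0], "Unnamed: 6": [None],
})
print(wb_wide_to_long(toy, "GDP"))
```

With it, each conversion becomes a one-liner, for example `df_gdp_wb_to_owid = wb_wide_to_long(df_gdp_wb, "GDP")`.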

### 2.1 WorldBank
GDP annual growth in percentage from WB format to OWID format

In [9]:
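# remove 'Unnamed' columns (e.g. the empty trailing column of the World Bank CSV)
df_gdp_annual_growth_wb = df_gdp_annual_growth_wb.loc[:, ~df_gdp_annual_growth_wb.columns.str.contains("^Unnamed")]
# drop the constant indicator columns; reassigning avoids modifying a slice in place
df_gdp_annual_growth_wb = df_gdp_annual_growth_wb.drop(columns=["Indicator Name", "Indicator Code"])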

#transform the DataFrame from wide to long format
df_gdp_annual_growth_wb_to_owid = df_gdp_annual_growth_wb.melt(
    id_vars=["Country Name", "Country Code"], # columns to keep
    var_name="Year", # name for the new column
    value_name="GDP growth (annual %)" # name for the new value column
)

# convert the "Year" column to numeric and sort the DataFrame by "Country Name" and "Year"
# coercing errors to NaN and then dropping rows with NaN in "Year"
df_gdp_annual_growth_wb_to_owid["Year"] = pd.to_numeric(df_gdp_annual_growth_wb_to_owid["Year"], errors="coerce")
df_gdp_annual_growth_wb_to_owid = df_gdp_annual_growth_wb_to_owid.dropna(subset=["Year"])
df_gdp_annual_growth_wb_to_owid.sort_values(by=["Country Name", "Year"], inplace=True)
df_gdp_annual_growth_wb_to_owid.head()

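|      | Country Name | Country Code | Year | GDP growth (annual %) |
|-----:|--------------|--------------|-----:|----------------------:|
| 2    | Afghanistan  | AFG          | 1960 | NaN |
| 268  | Afghanistan  | AFG          | 1961 | NaN |
| 534  | Afghanistan  | AFG          | 1962 | NaN |
| 800  | Afghanistan  | AFG          | 1963 | NaN |
| 1066 | Afghanistan  | AFG          | 1964 | NaN |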


In [10]:
df_gdp_annual_growth_wb_to_owid.info()

<class 'pandas.DataFrame'>
Index: 17290 entries, 2 to 17289
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Name           17290 non-null  str    
 1   Country Code           17290 non-null  str    
 2   Year                   17290 non-null  int64  
 3   GDP growth (annual %)  14133 non-null  float64
dtypes: float64(1), int64(1), str(2)
memory usage: 675.4 KB


### 2.2 WorldBank
GDP per capita from WB format to OWID format

In [11]:
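# remove 'Unnamed' columns (e.g. the empty trailing column of the World Bank CSV)
df_gdp_per_capita_wb = df_gdp_per_capita_wb.loc[:, ~df_gdp_per_capita_wb.columns.str.contains("^Unnamed")]
# drop the constant indicator columns; reassigning avoids modifying a slice in place
df_gdp_per_capita_wb = df_gdp_per_capita_wb.drop(columns=["Indicator Name", "Indicator Code"])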

#transform the DataFrame from wide to long format
df_gdp_per_capita_wb_to_owid = df_gdp_per_capita_wb.melt(
    id_vars=["Country Name", "Country Code"], # columns to keep
    var_name="Year", # name for the new column
    value_name="GDP per capita" # name for the new value column
)

# convert the "Year" column to numeric and sort the DataFrame by "Country Name" and "Year"
# coercing errors to NaN and then dropping rows with NaN in "Year"
df_gdp_per_capita_wb_to_owid["Year"] = pd.to_numeric(df_gdp_per_capita_wb_to_owid["Year"], errors="coerce")
df_gdp_per_capita_wb_to_owid = df_gdp_per_capita_wb_to_owid.dropna(subset=["Year"])
df_gdp_per_capita_wb_to_owid.sort_values(by=["Country Name", "Year"], inplace=True)
df_gdp_per_capita_wb_to_owid.head()

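|      | Country Name | Country Code | Year | GDP per capita |
|-----:|--------------|--------------|-----:|---------------:|
| 2    | Afghanistan  | AFG          | 1960 | NaN |
| 268  | Afghanistan  | AFG          | 1961 | NaN |
| 534  | Afghanistan  | AFG          | 1962 | NaN |
| 800  | Afghanistan  | AFG          | 1963 | NaN |
| 1066 | Afghanistan  | AFG          | 1964 | NaN |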


In [12]:
df_gdp_per_capita_wb_to_owid.info()

<class 'pandas.DataFrame'>
Index: 17290 entries, 2 to 17289
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    17290 non-null  str    
 1   Country Code    17290 non-null  str    
 2   Year            17290 non-null  int64  
 3   GDP per capita  8465 non-null   float64
dtypes: float64(1), int64(1), str(2)
memory usage: 675.4 KB


### 2.3 WorldBank
GDP from WB format to OWID format

In [13]:
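# remove 'Unnamed' columns (e.g. the empty trailing column of the World Bank CSV)
df_gdp_wb = df_gdp_wb.loc[:, ~df_gdp_wb.columns.str.contains("^Unnamed")]
# drop the constant indicator columns; reassigning avoids modifying a slice in place
df_gdp_wb = df_gdp_wb.drop(columns=["Indicator Name", "Indicator Code"])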

#transform the DataFrame from wide to long format
df_gdp_wb_to_owid = df_gdp_wb.melt(
    id_vars=["Country Name", "Country Code"], # columns to keep
    var_name="Year", # name for the new column
    value_name="GDP" # name for the new value column
)

# convert the "Year" column to numeric and sort the DataFrame by "Country Name" and "Year"
# coercing errors to NaN and then dropping rows with NaN in "Year"
df_gdp_wb_to_owid["Year"] = pd.to_numeric(df_gdp_wb_to_owid["Year"], errors="coerce")
df_gdp_wb_to_owid = df_gdp_wb_to_owid.dropna(subset=["Year"])
df_gdp_wb_to_owid.sort_values(by=["Country Name", "Year"], inplace=True)
df_gdp_wb_to_owid.head()

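|      | Country Name | Country Code | Year | GDP |
|-----:|--------------|--------------|-----:|----:|
| 2    | Afghanistan  | AFG          | 1960 | NaN |
| 268  | Afghanistan  | AFG          | 1961 | NaN |
| 534  | Afghanistan  | AFG          | 1962 | NaN |
| 800  | Afghanistan  | AFG          | 1963 | NaN |
| 1066 | Afghanistan  | AFG          | 1964 | NaN |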


In [14]:
df_gdp_wb_to_owid.info()

<class 'pandas.DataFrame'>
Index: 17290 entries, 2 to 17289
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Country Name  17290 non-null  str    
 1   Country Code  17290 non-null  str    
 2   Year          17290 non-null  int64  
 3   GDP           14306 non-null  float64
dtypes: float64(1), int64(1), str(2)
memory usage: 675.4 KB


All three World Bank datasets are now in the OWID long format, with a single `Year` column.
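A quick arithmetic check confirms that no rows were lost. Each raw table has 266 rows and 70 columns: 4 identifiers, the `Unnamed` column removed above, and 65 year columns starting from 1960. Melting should therefore produce 266 × 65 = 17,290 rows, which matches every `info()` output above. A sketch of that check as a reusable function (the name `check_long_shape` is hypothetical), assuming `Country Code`/`Year` is the key:

```python
import pandas as pd

def check_long_shape(df_long: pd.DataFrame, n_entities: int, n_years: int) -> None:
    """Raise ValueError unless df_long has exactly one row per (Country Code, Year)."""
    expected = n_entities * n_years
    if len(df_long) != expected:
        raise ValueError(f"expected {expected} rows, got {len(df_long)}")
    if df_long.duplicated(subset=["Country Code", "Year"]).any():
        raise ValueError("duplicate (Country Code, Year) pairs found")

print(266 * 65)  # 17290

# Toy example: 2 countries x 2 years -> 4 rows, so the check passes silently
toy = pd.DataFrame({"Country Code": ["A", "A", "B", "B"], "Year": [2020, 2021, 2020, 2021]})
check_long_shape(toy, n_entities=2, n_years=2)
```

On the notebook data, the call would be `check_long_shape(df_gdp_wb_to_owid, n_entities=266, n_years=65)`.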


 ---
### 2.4 Dataset Merging
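Below, the three converted datasets are merged on `Country Name`, `Country Code` and `Year` into a single DataFrame, `df_gdp_wb_merged`. An outer join keeps every country-year present in at least one of the tables.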

In [15]:
df_gdp_wb_partial_merge = pd.merge(
    df_gdp_wb_to_owid,
    df_gdp_per_capita_wb_to_owid,
    on=["Country Name", "Country Code", "Year"],
    how="outer", # keep all rows from both DataFrames even if there is no match or NaNs
)
df_gdp_wb_partial_merge.info()

<class 'pandas.DataFrame'>
RangeIndex: 17290 entries, 0 to 17289
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    17290 non-null  str    
 1   Country Code    17290 non-null  str    
 2   Year            17290 non-null  int64  
 3   GDP             14306 non-null  float64
 4   GDP per capita  8465 non-null   float64
dtypes: float64(2), int64(1), str(2)
memory usage: 675.5 KB


In [16]:
df_gdp_wb_merged = pd.merge(
    df_gdp_wb_partial_merge,
    df_gdp_annual_growth_wb_to_owid,
    on=["Country Name", "Country Code", "Year"],
    how="outer",
)
df_gdp_wb_merged.info()

<class 'pandas.DataFrame'>
RangeIndex: 17290 entries, 0 to 17289
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Name           17290 non-null  str    
 1   Country Code           17290 non-null  str    
 2   Year                   17290 non-null  int64  
 3   GDP                    14306 non-null  float64
 4   GDP per capita         8465 non-null   float64
 5   GDP growth (annual %)  14133 non-null  float64
dtypes: float64(3), int64(1), str(2)
memory usage: 810.6 KB
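The two-step merge above can also be written as a fold over a list of tables. This sketch (the helper name `merge_indicators` is hypothetical) additionally passes `validate="one_to_one"`, a standard `pd.merge` option that raises `pandas.errors.MergeError` when a key combination is duplicated on either side:

```python
from functools import reduce

import pandas as pd

KEYS = ["Country Name", "Country Code", "Year"]

def merge_indicators(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Outer-merge long-format indicator tables on KEYS, rejecting duplicate keys."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=KEYS, how="outer", validate="one_to_one"),
        frames,
    )

# Hypothetical toy tables: per-capita data only exists for 2021
a = pd.DataFrame({"Country Name": ["Italy", "Italy"], "Country Code": ["ITA", "ITA"],
                  "Year": [2020, 2021], "GDP": [1.0, 2.0]})
b = pd.DataFrame({"Country Name": ["Italy"], "Country Code": ["ITA"],
                  "Year": [2021], "GDP per capita": [3.0]})
merged = merge_indicators([a, b])
print(merged)
```

On the notebook data, the call would be `merge_indicators([df_gdp_wb_to_owid, df_gdp_per_capita_wb_to_owid, df_gdp_annual_growth_wb_to_owid])`.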


Quick check that the World Bank GDP per capita values are the same as the OWID ones:

In [17]:
TOLERANCE = 0.015
temp_df_comparison = pd.merge(
    df_gdp_wb_merged,
    df_gdp_per_capita_owid,
    left_on=["Country Code", "Year"],
    right_on=["Code", "Year"],       
    suffixes=('_wb', '_owid'),       
    how='inner' # keep only rows that have a match in both DataFrames
    )

temp_df_comparison['diff_absolute'] = temp_df_comparison['GDP per capita_wb'] - temp_df_comparison['GDP per capita_owid']
temp_df_comparison['is_different'] = temp_df_comparison['diff_absolute'].abs() > TOLERANCE
temp_df_comparison['is_different'].value_counts()

is_different
False    6768
Name: count, dtype: int64

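All 6,768 matched rows agree within the tolerance (note that a row with a missing value on either side yields a NaN difference, which also counts as `False`). This confirms that the OWID data were taken from `gdp_per_capita_constant_PPP.csv`, so the merged DataFrame `df_gdp_wb_merged`, created from World Bank data, can be used instead.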
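Because `NaN > TOLERANCE` evaluates to `False`, the check above cannot tell "equal" apart from "missing". A stricter sketch (the helper name `compare_columns` is hypothetical) classifies each row into three groups using `np.isclose` with the same absolute tolerance:

```python
import numpy as np
import pandas as pd

def compare_columns(a: pd.Series, b: pd.Series, atol: float = 0.015) -> pd.Series:
    """Label each row 'missing' (NaN on either side), 'match' or 'mismatch'."""
    missing = a.isna() | b.isna()
    close = np.isclose(a, b, rtol=0.0, atol=atol)  # |a - b| <= atol; NaN gives False
    labels = np.select([missing, close], ["missing", "match"], default="mismatch")
    return pd.Series(labels, index=a.index)

# Hypothetical values: one match, one mismatch, two rows with a gap
wb = pd.Series([1.000, 2.000, np.nan, 4.0])
owid = pd.Series([1.010, 2.100, 3.0, np.nan])
result = compare_columns(wb, owid)
print(result.value_counts())
```

On the notebook data, the call would be `compare_columns(temp_df_comparison['GDP per capita_wb'], temp_df_comparison['GDP per capita_owid']).value_counts()`.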

Remove the rows in which GDP, GDP per capita and GDP growth are all NaN, since they carry no information.

In [18]:
df_gdp_wb_merged.info()

<class 'pandas.DataFrame'>
RangeIndex: 17290 entries, 0 to 17289
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Name           17290 non-null  str    
 1   Country Code           17290 non-null  str    
 2   Year                   17290 non-null  int64  
 3   GDP                    14306 non-null  float64
 4   GDP per capita         8465 non-null   float64
 5   GDP growth (annual %)  14133 non-null  float64
dtypes: float64(3), int64(1), str(2)
memory usage: 810.6 KB


In [19]:
df_gdp_wb_merged = df_gdp_wb_merged.dropna(subset=['GDP', 'GDP per capita', 'GDP growth (annual %)'], how='all')
df_gdp_wb_merged.info()

<class 'pandas.DataFrame'>
Index: 14394 entries, 40 to 17289
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Name           14394 non-null  str    
 1   Country Code           14394 non-null  str    
 2   Year                   14394 non-null  int64  
 3   GDP                    14306 non-null  float64
 4   GDP per capita         8465 non-null   float64
 5   GDP growth (annual %)  14133 non-null  float64
dtypes: float64(3), int64(1), str(2)
memory usage: 787.2 KB
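The `how='all'` argument matters here: a row is dropped only when all three indicators are missing, so partially observed rows survive. With `how='any'`, every country row before 1990 would be lost, because GDP per capita (PPP) is missing for those years. A toy illustration with made-up values:

```python
import numpy as np
import pandas as pd

cols = ["GDP", "GDP per capita", "GDP growth (annual %)"]
toy = pd.DataFrame({
    "GDP": [1.0, np.nan, np.nan],
    "GDP per capita": [np.nan, np.nan, 2.0],
    "GDP growth (annual %)": [np.nan, np.nan, np.nan],
})

# how="all": drop a row only when every listed column is NaN (row 1 here)
kept_all = toy.dropna(subset=cols, how="all")
# how="any": drop a row as soon as one listed column is NaN (every row here)
kept_any = toy.dropna(subset=cols, how="any")
print(kept_all.index.tolist(), kept_any.index.tolist())  # [0, 2] []
```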


 ---
### 2.5 Separating countries from aggregates

This block separates the dataset `df_gdp_wb_merged` into:
- Countries only: `df_gdp_countries` 
- Other aggregates: `df_gdp_aggregates`.


In [20]:
unique_countries = df_gdp_wb_merged['Country Name'].unique()

for country in sorted(unique_countries):
    print(country)

Afghanistan
Africa Eastern and Southern
Africa Western and Central
Albania
Algeria
American Samoa
Andorra
Angola
Antigua and Barbuda
Arab World
Argentina
Armenia
Aruba
Australia
Austria
Azerbaijan
Bahamas, The
Bahrain
Bangladesh
Barbados
Belarus
Belgium
Belize
Benin
Bermuda
Bhutan
Bolivia
Bosnia and Herzegovina
Botswana
Brazil
Brunei Darussalam
Bulgaria
Burkina Faso
Burundi
Cabo Verde
Cambodia
Cameroon
Canada
Caribbean small states
Cayman Islands
Central African Republic
Central Europe and the Baltics
Chad
Channel Islands
Chile
China
Colombia
Comoros
Congo, Dem. Rep.
Congo, Rep.
Costa Rica
Cote d'Ivoire
Croatia
Cuba
Curacao
Cyprus
Czechia
Denmark
Djibouti
Dominica
Dominican Republic
Early-demographic dividend
East Asia & Pacific
East Asia & Pacific (IDA & IBRD countries)
East Asia & Pacific (excluding high income)
Ecuador
Egypt, Arab Rep.
El Salvador
Equatorial Guinea
Eritrea
Estonia
Eswatini
Ethiopia
Euro area
Europe & Central Asia
Europe & Central Asia (IDA & IBRD countries)
Europe &

Split the dataset into countries and non-country aggregates:

In [21]:
# World Bank non-country aggregates (regions, income and lending groups, etc.) to separate from the main dataset
aggregates_list = [ "Africa Eastern and Southern", "Africa Western and Central", "Arab World", "Caribbean small states", 
    "Central Europe and the Baltics", "Early-demographic dividend", "East Asia & Pacific", 
    "East Asia & Pacific (IDA & IBRD countries)", "East Asia & Pacific (excluding high income)", "Euro area", 
    "Europe & Central Asia", "Europe & Central Asia (IDA & IBRD countries)", "Europe & Central Asia (excluding high income)", 
    "European Union", "Fragile and conflict affected situations", "Heavily indebted poor countries (HIPC)", "High income", 
    "IBRD only", "IDA & IBRD total", "IDA blend", "IDA only", "IDA total", "Late-demographic dividend", "Latin America & Caribbean", 
    "Latin America & Caribbean (excluding high income)", "Latin America & the Caribbean (IDA & IBRD countries)", 
    "Least developed countries: UN classification", "Low & middle income", "Low income", "Lower middle income",
    "Middle East, North Africa, Afghanistan & Pakistan", "Middle East, North Africa, Afghanistan & Pakistan (IDA & IBRD)", 
    "Middle East, North Africa, Afghanistan & Pakistan (excluding high income)", "Middle income", "North America", 
    "Not classified", "OECD members", "Other small states", "Pacific island small states", "Post-demographic dividend", 
    "Pre-demographic dividend", "Small states", "South Asia", "South Asia (IDA & IBRD)", "Sub-Saharan Africa", 
    "Sub-Saharan Africa (IDA & IBRD countries)", "Sub-Saharan Africa (excluding high income)", "Upper middle income", "World"
]

# take everything that is NOT in the aggregates list as nations, and everything that IS in the aggregates list as aggregates
df_gdp_countries = df_gdp_wb_merged[~df_gdp_wb_merged['Country Name'].isin(aggregates_list)].copy()
df_gdp_aggregates = df_gdp_wb_merged[df_gdp_wb_merged['Country Name'].isin(aggregates_list)].copy()

print(f"Total rows original: {len(df_gdp_wb_merged)}")
print(f"Rows Nations: {len(df_gdp_countries)}")
print(f"Rows Aggregates: {len(df_gdp_aggregates)}")

# Quick check
print("\nExample Nations remaining:")
print(df_gdp_countries['Country Name'].unique()[:10])

Total rows original: 14394
Rows Nations: 11427
Rows Aggregates: 2967

Example Nations remaining:
<StringArray>
[        'Afghanistan',             'Albania',             'Algeria',
      'American Samoa',             'Andorra',              'Angola',
 'Antigua and Barbuda',           'Argentina',             'Armenia',
               'Aruba']
Length: 10, dtype: str
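A hand-written list like `aggregates_list` is fragile: a misspelled name silently fails to match, and Python concatenates adjacent string literals, so a missing comma merges two names into one. A sketch of a guard (the helper name `check_aggregate_names` is hypothetical) that reports list entries never found in the data:

```python
import pandas as pd

def check_aggregate_names(names_in_data, aggregates) -> list[str]:
    """Return aggregate names that never occur in the data (likely typos)."""
    return sorted(set(aggregates) - set(names_in_data))

# Adjacent string literals without a comma are joined into a single string
split_literal = ["Middle East, "
                 "North Africa, Afghanistan & Pakistan"]
print(split_literal)  # ['Middle East, North Africa, Afghanistan & Pakistan']

# Hypothetical data: "Eurozone" is not a name used in the table
names = pd.Series(["Italy", "World", "Euro area"])
print(check_aggregate_names(names, ["World", "Euro area", "Eurozone"]))  # ['Eurozone']
```

On the notebook data, `check_aggregate_names(df_gdp_wb_merged['Country Name'], aggregates_list)` should return an empty list if every entry is spelled as in the World Bank file.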


Dataset containing only geographical regions:

In [22]:
# regions = [
#     "Africa Eastern and Southern",
#     "Africa Western and Central",
#     "East Asia & Pacific",
#     "European Union",
#     "Latin America & Caribbean",
#     "Middle East, North Africa, Afghanistan & Pakistan",
#     "North America",
#     "Sub-Saharan Africa",
#     "South Asia"
# ]
# df_gdp_regions = df_gdp_aggregates[df_gdp_aggregates['Country Name'].isin(regions)]
# df_gdp_regions.head()

Dataset based on income:

In [23]:
# income = [
#     "High income",
#     "Low & middle income",
#     "Low income",
#     "Lower middle income",
#     "Middle income",
#     "Upper middle income"
# ]
# df_gdp_income = df_gdp_aggregates[df_gdp_aggregates['Country Name'].isin(income)]
# df_gdp_income.head()

 ---
## Step 3: Feature Understanding
Feature distributions are visualized with animated choropleth maps (one frame per year).
Below, three maps are created from the `df_gdp_countries` DataFrame:
- Global GDP Over Time
- Global GDP Per Capita Over Time
- Annual GDP Growth (%) Over Time
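GDP in constant USD spans several orders of magnitude across countries, so a linear color scale fixed at `[0, max]` pushes most countries into the lowest colors. One common alternative, sketched here as an assumption rather than the method used below, is to color by `log10(GDP)`; the helper `add_log10` and the column name `log10 GDP` are hypothetical:

```python
import numpy as np
import pandas as pd

def add_log10(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a copy of df with a log10 version of col (non-positive values become NaN)."""
    out = df.copy()
    values = out[col].where(out[col] > 0)  # log10 is undefined for values <= 0
    out[f"log10 {col}"] = np.log10(values)
    return out

# Hypothetical GDP values spanning four orders of magnitude, plus a gap
toy = pd.DataFrame({"GDP": [1e9, 1e11, 1e13, np.nan]})
print(add_log10(toy, "GDP"))
```

The new column could then be passed as `color="log10 GDP"` to `px.choropleth`, keeping the original `GDP` column for hover labels.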

In [24]:
df_gdp_countries.info()

<class 'pandas.DataFrame'>
Index: 11427 entries, 40 to 17289
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country Name           11427 non-null  str    
 1   Country Code           11427 non-null  str    
 2   Year                   11427 non-null  int64  
 3   GDP                    11339 non-null  float64
 4   GDP per capita         6785 non-null   float64
 5   GDP growth (annual %)  11214 non-null  float64
dtypes: float64(3), int64(1), str(2)
memory usage: 624.9 KB


### 3.1 Global GDP Over Time

In [25]:
import plotly.express as px

fig = px.choropleth(
    df_gdp_countries.sort_values("Year"), 
    locations="Country Code",        # Use the 3-letter ISO code (e.g., ITA, USA)
    color="GDP",                     # The column to color
    hover_name="Country Name",       # What to show when hovering
    animation_frame="Year",          # Create slider
    color_continuous_scale=px.colors.sequential.Plasma, # scale colors
    projection="natural earth",      # Map projection
    title="Global GDP Over Time (Constant USD 2015)", 
    range_color=[0, df_gdp_countries['GDP'].max()] # Fix color scale to the max GDP for better comparison
)

fig.update_layout(margin={"r":0,"t":50,"l":0,"b":0})
fig.show()

### 3.2 Global GDP Per Capita Over Time

In [26]:
first_valid_year = df_gdp_countries.dropna(subset=['GDP per capita'])['Year'].min()

print(f"The chart will start from the year: {first_valid_year}")

df_plot = df_gdp_countries[df_gdp_countries['Year'] >= first_valid_year].sort_values("Year")

fig = px.choropleth(
    df_plot, 
    locations="Country Code",
    color="GDP per capita",
    hover_name="Country Name",
    animation_frame="Year",
    color_continuous_scale=px.colors.sequential.Plasma,
    projection="natural earth",
    title=f"Global GDP Per Capita Over Time (Starting {first_valid_year})",
    range_color=[0, df_plot['GDP per capita'].max()] 
)

fig.update_layout(margin={"r":0,"t":50,"l":0,"b":0})
fig.show()

The chart will start from the year: 1990


### 3.3 Global GDP Growth (annual %) Over Time

In [27]:
# Growth can be negative, so the color range must extend below 0.
# Clip at the 99th percentile of |growth| (a judgment call) so a few extreme years do not flatten the scale.
growth_limit = df_gdp_countries['GDP growth (annual %)'].abs().quantile(0.99)

fig = px.choropleth(
    df_gdp_countries.sort_values("Year"), 
    locations="Country Code",        # Use the 3-letter ISO code (e.g., ITA, USA)
    color="GDP growth (annual %)",   # The column to color
    hover_name="Country Name",       # What to show when hovering
    animation_frame="Year",          # Create slider
    color_continuous_scale=px.colors.diverging.RdBu, # diverging scale: red = contraction, blue = growth
    projection="natural earth",      # Map projection
    title="Global GDP growth (annual %) Over Time", 
    range_color=[-growth_limit, growth_limit] # symmetric around 0 so negative growth stays visible
)

fig.update_layout(margin={"r":0,"t":50,"l":0,"b":0})
fig.show()

 ---
## Step 4: Data Saving
The processed DataFrame is saved into `..\data\processed\`:
- `df_gdp_countries` --> `..\data\processed\gdp_countries.csv`

In [28]:
df_gdp_countries.to_csv('../data/processed/gdp_countries.csv', index=False)
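To catch problems at write time (a missing output folder, columns altered by the CSV round trip), the save can be followed by a read-back check. A sketch, with the helper name `save_and_verify` being hypothetical and floats compared with `pandas.testing.assert_frame_equal`'s default tolerance:

```python
import os
import tempfile

import pandas as pd

def save_and_verify(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Write df to CSV (creating the folder if needed) and check that it reads back equal."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)
    # same columns and values; dtypes may differ slightly after a CSV round trip
    pd.testing.assert_frame_equal(
        df.reset_index(drop=True), reloaded, check_dtype=False, check_exact=False
    )
    return reloaded

# Hypothetical one-row table written to a throwaway folder
toy = pd.DataFrame({"Country Name": ["Italy"], "Country Code": ["ITA"], "Year": [2020], "GDP": [1.5]})
with tempfile.TemporaryDirectory() as tmp:
    out = save_and_verify(toy, os.path.join(tmp, "processed", "gdp_countries.csv"))
    print(out.shape)  # (1, 4)
```

In the notebook, the call would be `save_and_verify(df_gdp_countries, '../data/processed/gdp_countries.csv')`.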