# Country, Regional and World GDP Dataset

**Notebook introduction**

In this notebook we are going to analyse the GDP (Gross Domestic Product) of the countries of the world, using several plots to illustrate it. The data is sourced from the World Bank, an international organization that offers developmental assistance to middle-income and low-income countries.

![](https://s3.envato.com/files/160905269/Money%20World.jpg)

**Import libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium

**Read csv data**

In [None]:
filepath = '../input/country-regional-and-world-gdp/gdp_csv.csv'
data = pd.read_csv(filepath)

**Make a copy of the data**

In [None]:
df = data.copy()

# **Easy Data Exploration**

**First of all, let's look at the shape of our dataframe**

In [None]:
df.shape

**Secondly we want to know what type of data we are going to use**

In [None]:
df.info()

**Then we are going to check whether there are any missing values**

In [None]:
df.isnull().sum()

Fortunately, there are no missing values. What a great find!

**Then let's take a look at the data by observing the first lines of our dataframe**

In [None]:
df.head()

**As you can see, the 'Country Name' column contains the names of regions as well as the names of countries. So now I want to list all the unique values of this column in order to split the dataframe in two parts: one for the regions and one for the countries**

In [None]:
df['Country Name'].unique()

**Now that we know the values, I'm going to split the dataframe in two parts as described above**

In [None]:
region =['Arab World', 'Caribbean small states',
       'Central Europe and the Baltics', 'Early-demographic dividend',
       'East Asia & Pacific',
       'East Asia & Pacific (excluding high income)',
       'East Asia & Pacific (IDA & IBRD countries)', 'Euro area',
       'Europe & Central Asia',
       'Europe & Central Asia (excluding high income)',
       'Europe & Central Asia (IDA & IBRD countries)', 'European Union',
       'Fragile and conflict affected situations',
       'Heavily indebted poor countries (HIPC)', 'High income',
       'IBRD only', 'IDA & IBRD total', 'IDA blend', 'IDA only',
       'IDA total', 'Late-demographic dividend',
       'Latin America & Caribbean',
       'Latin America & Caribbean (excluding high income)',
       'Latin America & the Caribbean (IDA & IBRD countries)',
       'Least developed countries: UN classification',
       'Low & middle income', 'Low income', 'Lower middle income',
       'Middle East & North Africa',
       'Middle East & North Africa (excluding high income)',
       'Middle East & North Africa (IDA & IBRD countries)',
       'Middle income', 'North America', 'OECD members',
       'Other small states', 'Pacific island small states',
       'Post-demographic dividend', 'Pre-demographic dividend',
       'Small states', 'South Asia', 'South Asia (IDA & IBRD)',
       'Sub-Saharan Africa', 'Sub-Saharan Africa (excluding high income)',
       'Sub-Saharan Africa (IDA & IBRD countries)', 'Upper middle income',
       'World']

regions = df[df['Country Name'].isin(region)].reset_index(drop=True)

countries = df[~df['Country Name'].isin(region)].reset_index(drop=True)
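The split above relies on `isin` with a hand-written list of aggregate names. Here is a minimal sketch of the same idea on made-up rows; the values and the short `aggregates` list are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Toy rows in the same long format as the dataset (values are made up)
toy = pd.DataFrame({
    "Country Name": ["World", "Italy", "Euro area", "Japan"],
    "Year": [2016, 2016, 2016, 2016],
    "Value": [100.0, 2.0, 12.0, 5.0],
})

# Hypothetical short list of aggregate names, standing in for `region`
aggregates = ["World", "Euro area"]

# Boolean mask: True where the row is an aggregate, False where it is a country
is_aggregate = toy["Country Name"].isin(aggregates)

# Chaining reset_index returns a new frame instead of modifying a slice in place
toy_regions = toy[is_aggregate].reset_index(drop=True)
toy_countries = toy[~is_aggregate].reset_index(drop=True)

print(toy_regions["Country Name"].tolist())
print(toy_countries["Country Name"].tolist())
```

Because the two frames come from a mask and its negation, every row ends up in exactly one of them.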

# **Regions and Countries dataframes**

**First of all, let's check both dataframes by observing the first 5 lines of both and observing their shapes to understand how they have been divided**

In [None]:
regions.head()

In [None]:
regions.shape

In [None]:
countries.head()

In [None]:
countries.shape

# **Countries dataframe analysis**

**To start the analysis in the right way, I think the best thing to do is check whether all the countries have the same amount of data in the dataframe**

In [None]:
countries.groupby("Country Name")["Year"].count()

**Unfortunately, the counts in the right column show that countries have different amounts of data in the dataframe. So I think we should build a dataframe with the same number of years for every country**

1. *First, we group the dataframe by the 'Country Name' column and look at the latest year of data available for each country*

In [None]:
countries1 = countries.groupby("Country Name", as_index = False)["Year"].max()
countries1

2. *Secondly, we keep only the countries that have data for 2016, the latest year in this dataframe*

In [None]:
countries2 = countries1[countries1["Year"] == 2016]
countries2

3. *Thirdly, we will store in the variable countries3 the list of names for the countries that have 2016 data and then create another dataframe called 'countries4' in which we will put all the data concerning the countries contained in the list just created and fetch them from the first countries dataframe*

In [None]:
countries3 = countries2['Country Name']

countries4 = countries[countries['Country Name'].isin(countries3)]

countries4

4. *Fourthly, we find the earliest year of data for each of these countries, then look at which countries start reporting latest and exclude them. I repeated this procedure until the latest starting year was 2001, which I think is a good start and gives us a good period of time to work with, from 2001 to 2016*

In [None]:
# Earliest year per country; the maximum shows which series starts latest
min_year = countries4.groupby("Country Name", as_index = False)["Year"].min()
min_year['Year'].max()

# Somalia's data starts in 2013, so drop it
min_year[min_year['Year'] == 2013]

countries5 = countries4[countries4["Country Name"]!="Somalia"]

min_year = countries5.groupby("Country Name", as_index=False)["Year"].min()

min_year["Year"].max()

# Nauru's data starts in 2007, so drop it
min_year[min_year['Year'] == 2007]

countries6 = countries5[countries5["Country Name"]!="Nauru"]

min_year = countries6.groupby("Country Name", as_index=False)["Year"].min()

min_year["Year"].max()

# Three territories start in 2002, so drop them as well
min_year[min_year['Year'] == 2002]

countries7 = countries6[~countries6["Country Name"].isin(['American Samoa','Guam','Northern Mariana Islands'])]

min_year = countries7.groupby("Country Name", as_index=False)["Year"].min()

# The latest starting year is now 2001
min_year["Year"].max()

min_year[min_year['Year'] == 2001]

5. *Now, after this long process, we can create a final dataframe using only the data from 2001 to 2016 from our country list*

In [None]:
country = countries7[countries7["Year"]>=2001]

country

6. *Sixth, we check whether any years are missing, that is, whether every country has data for all 16 years*

In [None]:
missing = country.groupby("Country Name", as_index=False)["Year"].count()

missing[missing["Year"]!=16]

**As we can see, there are 2 countries with missing years, so we are going to exclude them**

In [None]:
country_new = pd.DataFrame(country[~country["Country Name"].isin(["Congo, Dem. Rep.","Iraq"])])

country_new.reset_index(drop=True, inplace=True)

country_new.sort_values(by='Value',ascending=False,inplace=True)

country_new["Country Name"].nunique()

**Finally, we can have a look at our new dataframe**

In [None]:
country_new.head()
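The six filtering steps above can be sketched as a single function. This is an illustrative alternative, not the code used in this notebook: the name `balanced_panel` is hypothetical, and it assumes each country-year pair appears at most once and that the window ends at the last year in the data. Under those assumptions it should select the same countries as the manual procedure.

```python
import pandas as pd

def balanced_panel(frame, start_year, end_year):
    """Keep only countries with one row for every year in [start_year, end_year]."""
    # Restrict to the window of interest (between is inclusive on both ends)
    window = frame[frame["Year"].between(start_year, end_year)]
    n_years = end_year - start_year + 1
    # Count distinct years per country inside the window
    years_per_country = window.groupby("Country Name")["Year"].nunique()
    # A country is complete when it covers every year of the window
    complete = years_per_country[years_per_country == n_years].index
    return window[window["Country Name"].isin(complete)].reset_index(drop=True)

# Made-up data: country B has no row for 2001
toy = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "Year":         [2001, 2002, 2003, 2002, 2003, 2001, 2002, 2003],
    "Value":        [1.0, 1.1, 1.2, 5.0, 5.5, 3.0, 3.1, 3.3],
})

panel = balanced_panel(toy, 2001, 2003)
print(sorted(panel["Country Name"].unique()))
```

On the real data the call would be `balanced_panel(countries, 2001, 2016)`, which drops late starters and countries with gaps in one pass.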

# **Making Plots to illustrate the situation**

**First of all we are going to create the variables for the various plots**

In [None]:
value_country_2001 = country_new[country_new["Year"] == 2001]
value_country_2001.reset_index(drop=True, inplace=True)

value_country_2007 = country_new[country_new["Year"] == 2007]
value_country_2007.reset_index(drop=True, inplace=True)

value_country_2016 = country_new[country_new["Year"] == 2016]
value_country_2016.reset_index(drop=True, inplace=True)

# Pair the 2001 and 2016 values by country. Both yearly frames are sorted by
# Value, so a positional concat would put different countries on the same row.
countries_2001_value = value_country_2001[["Country Name", "Value"]].rename(columns={"Value": "Value 2001"})

countries_2001_2016 = value_country_2016.merge(countries_2001_value, on="Country Name")

# Growth between 2001 and 2016, expressed as a percentage
countries_2001_2016["GDP Growth Rate(%)"] = (countries_2001_2016["Value"]-countries_2001_2016["Value 2001"])/countries_2001_2016["Value 2001"]*100

countries_2001_2016.drop(labels=["Year"], axis=1, inplace=True)

countries_2001_2016.sort_values("GDP Growth Rate(%)", axis=0, ascending=False, kind='quicksort', inplace=True)

top_GDP = countries_2001_2016.head(15).reset_index(drop=True)

# Round the growth rate to two decimals for display
top_GDP["GDP Growth Rate(%)"] = top_GDP["GDP Growth Rate(%)"].round(2)
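The growth rate is (last value - first value) / first value, and it only makes sense when both values belong to the same country. Here is a minimal sketch on made-up numbers that pairs the two years with a merge on the country name; the frame names and the `growth_pct` column are assumptions for illustration.

```python
import pandas as pd

# Made-up values; note that the two frames list the countries in different orders
first = pd.DataFrame({"Country Name": ["A", "B", "C"], "Value": [100.0, 50.0, 10.0]})
last = pd.DataFrame({"Country Name": ["C", "A", "B"], "Value": [40.0, 150.0, 50.0]})

# Merge on the country key so each row holds the same country's two values
both = last.merge(first, on="Country Name", suffixes=("_last", "_first"))

# Relative change, multiplied by 100 to express it as a percentage
both["growth_pct"] = (both["Value_last"] - both["Value_first"]) / both["Value_first"] * 100

print(both.sort_values("growth_pct", ascending=False))
```

Pairing rows by position would only give the same result if both frames happened to list the countries in the same order, which is not guaranteed once each year is sorted by value.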

# **2001 situation: Which are the richest and poorest countries?**

In [None]:
colors = ['lightslategray',] * 10  # one color per bar
colors[0] = 'crimson'
top_valuables_countries_01 = value_country_2001[:10]
fig = go.Figure(data=[go.Bar(
    x= top_valuables_countries_01['Country Name'],
    y= top_valuables_countries_01['Value'],
    marker_color=colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Top 10 most valuable countries in 2001')

In [None]:
least_valuables_countries_01 = value_country_2001[-10:]
colors = ['lightslategray',] * 10
colors[9] = 'crimson'
fig = go.Figure(data=[go.Bar(
    x= least_valuables_countries_01['Country Name'],
    y= least_valuables_countries_01['Value'],
    marker_color=colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Top 10 least valuable countries in 2001')
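Since `marker_color` takes one color per bar when given a list, the list length should match the number of bars plotted. A small helper can build it; `highlight_colors` is a hypothetical name introduced here for illustration, not part of Plotly.

```python
def highlight_colors(n_bars, highlight, base="lightslategray", accent="crimson"):
    """Return one color per bar, with the bar at index `highlight` accented."""
    colors = [base] * n_bars
    colors[highlight] = accent
    return colors

top_colors = highlight_colors(10, 0)      # accent the first (largest) bar
bottom_colors = highlight_colors(10, -1)  # accent the last (smallest) bar

print(top_colors)
print(bottom_colors)
```

Either list can then be passed as `marker_color` to `go.Bar`, as in the cells above.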

In [None]:
fig1 = px.choropleth(value_country_2001, locations="Country Name", locationmode='country names', color="Value",
                    color_continuous_scale=px.colors.sequential.Darkmint)

fig1.update_layout(title_text="Country values in 2001", title_font_size=24,
                  height=800, width=1000, yaxis_title="GDP($)", xaxis_title="Country",
                  title_y=0.85, title_x=0.45)

fig1.show()

# **2007 situation: Are there changes among the richest and poorest countries?**

In [None]:
colors = ['lightslategray',] * 10  # one color per bar
colors[0] = 'crimson'
top_valuables_countries_07 = value_country_2007[:10]
fig = go.Figure(data=[go.Bar(
    x= top_valuables_countries_07['Country Name'],
    y= top_valuables_countries_07['Value'],
    marker_color=colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Top 10 most valuable countries in 2007')

**Observations: as we can see there have been some changes from 2001 to 2007. We can see that China has had a big increase and Brazil is a new entry in the top 10 by 'stealing' Mexico's place.**

In [None]:
least_valuables_countries_07 = value_country_2007[-10:]
colors = ['lightslategray',] * 10
colors[9] = 'crimson'
fig = go.Figure(data=[go.Bar(
    x= least_valuables_countries_07['Country Name'],
    y= least_valuables_countries_07['Value'],
    marker_color=colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Top 10 least valuable countries in 2007')

**Observations: unfortunately there are two new entries, Solomon Islands and Dominica, which take the place of Samoa and Vanuatu. We can also see that Comoros and Tonga have had a small increase in these 6 years, good for them**

In [None]:
fig2 = px.choropleth(value_country_2007, locations="Country Name", locationmode='country names', color="Value",
                    color_continuous_scale=px.colors.sequential.Darkmint)

fig2.update_layout(title_text="Country values in 2007", title_font_size=24,
                  height=800, width=1000, yaxis_title="GDP($)", xaxis_title="Country",
                  title_y=0.85, title_x=0.45)

fig2.show()

# **2016 situation: What are the changes among the richest and poorest countries?**

In [None]:
colors = ['lightslategray',] * 10  # one color per bar
colors[0] = 'crimson'
top_valuables_countries = value_country_2016[:10]
fig = go.Figure(data=[go.Bar(
    x= top_valuables_countries['Country Name'],
    y= top_valuables_countries['Value'],
    marker_color=colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Top 10 most valuable countries in 2016')

**Observations: We can see that China has continued its growth, becoming the second most valuable country in the world, while India is a new entry in this top 10, surpassing large countries such as Italy, Brazil and Canada. What news!**

In [None]:
least_valuables_countries = value_country_2016[-10:]
colors = ['lightslategray',] * 10
colors[9] = 'crimson'
fig = go.Figure(data=[go.Bar(
    x= least_valuables_countries['Country Name'],
    y= least_valuables_countries['Value'],
    marker_color=colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Top 10 least valuable countries in 2016')

**Observations: unfortunately there is one new entry, 'St. Vincent and the Grenadines', which takes the place of the Solomon Islands. We can also see that Sao Tome and Principe has had a small increase in these 9 years, good for them**

In [None]:
fig3 = px.choropleth(value_country_2016, locations="Country Name", locationmode='country names', color="Value",
                    color_continuous_scale=px.colors.sequential.Darkmint)
fig3.update_layout(title_text="Country values in 2016", title_font_size=24,
                  height=800, width=1000, yaxis_title="GDP($)", xaxis_title="Country",
                  title_y=0.85, title_x=0.45)
fig3.show()

# **Countries with best and worst growth rates**

In [None]:
most_increased_countries = countries_2001_2016[:10]
colors = ['lightslategray',] * 10  # one color per bar
colors[0] = 'crimson'
fig = go.Figure(data=[go.Bar(
    x= most_increased_countries['Country Name'],
    y= most_increased_countries['GDP Growth Rate(%)'],
    marker_color=colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Countries with the greatest growth in value from 2001 to 2016 in %')

In [None]:
least_increased_countries = countries_2001_2016[-10:]
colors = ['lightslategray',] * 10
colors[9] = 'crimson'
fig = go.Figure(data=[go.Bar(
    x= least_increased_countries['Country Name'],
    y= least_increased_countries['GDP Growth Rate(%)'],
    marker_color=colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Countries with the least growth in value from 2001 to 2016 in %')

In [None]:
fig4 = px.choropleth(countries_2001_2016, locations="Country Name", locationmode='country names', color="GDP Growth Rate(%)",
                    color_continuous_scale=px.colors.sequential.Darkmint)

fig4.update_layout(title_text="Heatmap of GDP Growth Rate % (2001-2016)", title_font_size=24,
                  height=800, width=1000, yaxis_title="GDP($)", xaxis_title="Country",
                  title_y=0.85, title_x=0.45)

fig4.show()

# **Thank you so much for looking at this notebook. I hope you enjoyed it, and if so I invite you to leave an upvote. If you have found any errors or have suggestions for improving the notebook, please let me know in the comments. Thank you very much again and good Kaggling!**