In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Suppress scientific notation

In [None]:
pd.set_option('display.float_format', lambda x: '%.2f' % x)

## Load dataset

In [None]:
df = pd.read_csv('../input/country-regional-and-world-gdp/gdp_csv.csv')

In [None]:
print(df.shape)
df.head()

In [None]:
df.describe()

* We have GDP data from 1960 to 2016.
* The lowest GDP recorded was 8,824,447.74 (~ 8.8 million dollars) and the highest was 79,049,230,590,610.91 (~ 79 trillion dollars).
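
`describe()` gives the extreme values but not the rows they belong to. A minimal sketch of how those rows could be looked up with `idxmin` and `idxmax` follows. The `toy` frame and its values are made up for illustration and only mirror the column names of `df`.

```python
import pandas as pd

# Toy stand-in for the notebook's `df`; names and values are hypothetical.
toy = pd.DataFrame({
    'Country Name': ['A', 'A', 'B', 'B'],
    'Country Code': ['AAA', 'AAA', 'BBB', 'BBB'],
    'Year': [1960, 2016, 1960, 2016],
    'Value': [10.0, 250.0, 5.0, 900.0],
})

# idxmin/idxmax return the index label of the extreme value,
# so .loc gives back the full row (name, code, year, value).
lowest = toy.loc[toy['Value'].idxmin()]
highest = toy.loc[toy['Value'].idxmax()]

print(lowest['Country Name'], lowest['Year'], lowest['Value'])
print(highest['Country Name'], highest['Year'], highest['Value'])
```

On the real data, replacing `toy` with `df` should show which entry and year produced each extreme.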

## Categorical variable **Country Name**

In [None]:
print(df['Country Name'].nunique())
df['Country Name'].unique()

## Categorical variable **Country Code**

In [None]:
df['Country Code'].unique()

Not every **Country Name** value is actually a country. Let's filter out the following regions and/or categories:

In [None]:
regions = ['Arab World', 'Caribbean small states',
           'Central Europe and the Baltics', 'Early-demographic dividend',
           'East Asia & Pacific',
           'East Asia & Pacific (excluding high income)',
           'East Asia & Pacific (IDA & IBRD countries)', 'Euro area',
           'Europe & Central Asia',
           'Europe & Central Asia (excluding high income)',
           'Europe & Central Asia (IDA & IBRD countries)', 'European Union',
           'Fragile and conflict affected situations',
           'Heavily indebted poor countries (HIPC)', 'High income',
           'IBRD only', 'IDA & IBRD total', 'IDA blend', 'IDA only',
           'IDA total', 'Late-demographic dividend',
           'Latin America & Caribbean',
           'Latin America & Caribbean (excluding high income)',
           'Latin America & the Caribbean (IDA & IBRD countries)',
           'Least developed countries: UN classification',
           'Low & middle income', 'Low income', 'Lower middle income',
           'Middle East & North Africa',
           'Middle East & North Africa (excluding high income)',
           'Middle East & North Africa (IDA & IBRD countries)',
           'Middle income', 'North America', 'OECD members',
           'Other small states', 'Pacific island small states',
           'Post-demographic dividend', 'Pre-demographic dividend',
           'Small states', 'South Asia', 'South Asia (IDA & IBRD)',
           'Sub-Saharan Africa', 'Sub-Saharan Africa (excluding high income)',
           'Sub-Saharan Africa (IDA & IBRD countries)', 'Upper middle income',
           'World']
print(len(regions))

In this analysis, we'll use only countries. You can use these categories for future analysis if you want.

In [None]:
df_country = df.loc[~df['Country Name'].isin(regions)]
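
Because the filter matches on exact strings, a misspelled entry in the list would silently leave an aggregate in the data. The sketch below shows one possible sanity check. `toy` and `toy_regions` are hypothetical stand-ins for `df` and `regions`, with one name misspelled on purpose.

```python
import pandas as pd

# Toy stand-ins for `df` and `regions`; values are made up.
toy = pd.DataFrame({
    'Country Name': ['World', 'Brazil', 'Euro area', 'Japan'],
    'Value': [4.0, 1.0, 2.0, 3.0],
})
toy_regions = ['World', 'Euro area', 'Euro zone']  # last entry is a deliberate typo

# Names in the list that never occur in the data (likely typos).
unmatched = sorted(set(toy_regions) - set(toy['Country Name']))

# Same filter as in the notebook: keep rows whose name is not in the list.
toy_country = toy.loc[~toy['Country Name'].isin(toy_regions)]

print(unmatched)
print(toy_country['Country Name'].tolist())
```

An empty `unmatched` list on the real data would suggest every entry in `regions` matched at least one row.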

In [None]:
df_country['Country Name'].unique()

In [None]:
print(df_country.shape)
df_country.describe()

## GDP over the years in **Plotly**

[Plotly](https://plotly.com/) was chosen to visualize this data because it is interactive. In particular, we can show or hide individual countries in the plot.

In [None]:
# One line per country; hovering shows the country name
fig = px.line(df_country, x="Year", y="Value", color="Country Name",
              line_group="Country Name", hover_name="Country Name")

# Title placed as an annotation anchored to the top-left of the plot area
annotations = [dict(xref='paper', yref='paper', x=0.0, y=1.05,
                    xanchor='left', yanchor='bottom',
                    text='GDP over the years (1960 - 2016)',
                    font=dict(family='Arial', size=30, color='rgb(37,37,37)'),
                    showarrow=False)]
fig.update_layout(annotations=annotations)
fig.show()

Feel free to select the combination of countries you want on the right panel.

Looking at the most recent years, the United States, China, Japan, Germany and the United Kingdom stand out, as the plot shows.
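
The ranking can also be checked numerically instead of being read off the plot. This is a minimal sketch using `nlargest` on the rows of the latest year. `toy_country` is a made-up stand-in for `df_country`, and the values are hypothetical.

```python
import pandas as pd

# Toy stand-in for `df_country`; names and values are hypothetical.
toy_country = pd.DataFrame({
    'Country Name': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Year': [2015, 2016, 2015, 2016, 2015, 2016],
    'Value': [5.0, 6.0, 9.0, 8.0, 1.0, 2.0],
})

# Keep only the rows of the most recent year, then take the largest values.
latest_year = toy_country['Year'].max()
latest = toy_country[toy_country['Year'] == latest_year]
top = latest.nlargest(2, 'Value')[['Country Name', 'Value']]

print(latest_year)
print(top)
```

With the real data, `nlargest(5, 'Value')` would list the five largest economies for the last year. Countries with no value reported for that year would not appear in the result.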

It is also worth noting that a better analysis would use GDP per capita, to account for differences in population size between countries.
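
As a rough sketch of that idea, GDP per capita could be computed by merging the GDP table with a population table on country code and year. This dataset has no population column, so the `population` frame and its `Population` column are assumptions, and all values are made up.

```python
import pandas as pd

# Toy GDP table with the same key columns as the notebook's data.
gdp = pd.DataFrame({
    'Country Code': ['AAA', 'AAA', 'BBB'],
    'Year': [2015, 2016, 2016],
    'Value': [100.0, 120.0, 300.0],
})

# Hypothetical population table; it would have to come from another source.
population = pd.DataFrame({
    'Country Code': ['AAA', 'AAA', 'BBB'],
    'Year': [2015, 2016, 2016],
    'Population': [10, 12, 100],
})

# A left merge keeps every GDP row, even if no population figure matches.
merged = gdp.merge(population, on=['Country Code', 'Year'], how='left')
merged['GDP per capita'] = merged['Value'] / merged['Population']

print(merged)
```

Merging on `Country Code` rather than `Country Name` is a design choice here, since codes tend to be spelled more consistently across sources than names.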

It would be interesting to have more recent data (2017 or later). If you have it, please share it so we can analyse it.

We could also run a regression analysis here. Feel free to fork this notebook and try it.
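
As one possible starting point, a log-linear trend can be fitted for a single country with `np.polyfit`. This is only a sketch, not the analysis itself. The series below is synthetic, built to grow at exactly 5% per year, and the names `toy`, `slope` and `growth` are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic series for one hypothetical country: 5% growth per year.
years = np.arange(2000, 2010)
toy = pd.DataFrame({'Year': years, 'Value': 100.0 * 1.05 ** (years - 2000)})

# Fit log(Value) = slope * (Year - first year) + intercept.
# Shifting the years keeps the fit numerically well conditioned.
x = toy['Year'] - toy['Year'].min()
slope, intercept = np.polyfit(x, np.log(toy['Value']), deg=1)

# The slope of the log series corresponds to the average annual growth rate.
growth = np.exp(slope) - 1
print(round(growth, 4))
```

On real data, the same fit could be applied to the rows of one country in `df_country`. Actual GDP series are noisier, so the fitted rate would only be an average over the period.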

Thanks for reading.