# UN Data Exploration

## 3. Import Required Packages

In [3]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## 4. Read in GDP Data Set
I examine the first 10 and last 10 rows of the DataFrame to see the columns and the index.

In [5]:
gdp_df = pd.read_csv('../data/gdp_per_capita.csv/UNdata_Export_20241004_025527195.csv') # read in data file
gdp_df.head(10)

Unnamed: 0,Country or Area,Year,Value,Value Footnotes
0,Afghanistan,2021,1673.964059,
1,Afghanistan,2020,2078.595086,
2,Afghanistan,2019,2168.133765,
3,Afghanistan,2018,2110.239384,
4,Afghanistan,2017,2096.093111,
5,Afghanistan,2016,2023.834656,
6,Afghanistan,2015,2128.125938,
7,Afghanistan,2014,2110.829568,
8,Afghanistan,2013,2062.059176,
9,Afghanistan,2012,1958.447627,


In [6]:
gdp_df.tail(10)

Unnamed: 0,Country or Area,Year,Value,Value Footnotes
7718,Zimbabwe,1999,2279.549784,
7719,Zimbabwe,1998,2299.395445,
7720,Zimbabwe,1997,2246.209391,
7721,Zimbabwe,1996,2185.928529,
7722,Zimbabwe,1995,1977.675574,
7723,Zimbabwe,1994,1958.125362,
7724,Zimbabwe,1993,1765.451299,
7725,Zimbabwe,1992,1731.232787,
7726,Zimbabwe,1991,1907.652489,
7727,Zimbabwe,1990,1794.153646,


## 5. Drop Value Footnotes Column + Rename Columns

In [8]:
gdp_df = (
    gdp_df
    .drop('Value Footnotes', axis=1)  # drop the footnotes column, which is empty in the rows displayed above
    .rename(columns={'Country or Area': 'Country', 'Value': 'GDP_per_Capita'})  # shorter, more descriptive names; 'Year' is unchanged
)

In [9]:
gdp_df.head()

Unnamed: 0,Country,Year,GDP_per_Capita
0,Afghanistan,2021,1673.964059
1,Afghanistan,2020,2078.595086
2,Afghanistan,2019,2168.133765
3,Afghanistan,2018,2110.239384
4,Afghanistan,2017,2096.093111


## 6. Number of Rows and Columns
The DataFrame has 7728 rows and 3 columns:
- Country column = object
- Year column = int64
- GDP_per_Capita column = float64

In [11]:
gdp_df.info() # info() reports the number of rows and columns together with each column's data type.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7728 entries, 0 to 7727
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country         7728 non-null   object 
 1   Year            7728 non-null   int64  
 2   GDP_per_Capita  7728 non-null   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 181.3+ KB
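
If only the counts and types are needed, without the full `info()` report, `shape` and `dtypes` give them directly. The sketch below uses a small hypothetical stand-in (`sample_df`) rather than the real file, so the numbers it prints are not the dataset's.

```python
import pandas as pd

# Hypothetical stand-in for gdp_df with the same three column names.
sample_df = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Afghanistan", "Zimbabwe"],
        "Year": [2021, 2020, 1990],
        "GDP_per_Capita": [1673.964059, 2078.595086, 1794.153646],
    }
)

# shape is a (rows, columns) tuple.
n_rows, n_cols = sample_df.shape

# dtypes maps each column name to its data type.
column_types = sample_df.dtypes.astype(str).to_dict()

print(n_rows, n_cols)
print(column_types)
```

On the real data, `gdp_df.shape` should agree with the row and column counts that `info()` reports above.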


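## 7. Display Years and Observations per Year
The observations span the years 1990 through 2022. The number of observations per year ranges from 208 to 244. It rises from 208 in 1990 to a peak of 244 in 2011 and 2013, stays between 241 and 243 through 2021, and drops to 232 in 2022.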

In [14]:
gdp_df['Year'].value_counts().sort_values(ascending=False) # observations per year, sorted from most to fewest.

Year
2013    244
2011    244
2009    243
2015    243
2014    243
2010    243
2019    242
2018    242
2016    242
2008    242
2012    242
2017    242
2021    241
2020    241
2007    240
2006    240
2005    239
2004    239
2003    238
2002    238
2001    237
2000    236
2022    232
1999    228
1998    227
1997    227
1996    226
1995    226
1994    216
1993    214
1992    213
1991    210
1990    208
Name: count, dtype: int64

In [15]:
gdp_df['Year'].value_counts().max() # I want to know the max number of observations

244

In [16]:
gdp_df['Year'].value_counts().min() # I want to know the min number of observations

208
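
Sorting the counts by size makes the first and last year hard to spot. One alternative, sketched here on a hypothetical miniature frame (`sample_df`, not the real data), is to take the minimum and maximum of the `Year` column and to sort the counts by year instead.

```python
import pandas as pd

# Hypothetical miniature of gdp_df; only the 'Year' column matters here.
sample_df = pd.DataFrame(
    {"Year": [1990, 1990, 1991, 2021, 2021, 2021, 2022, 2022, 2022, 2022]}
)

# First and last year present in the data.
first_year = sample_df["Year"].min()
last_year = sample_df["Year"].max()

# sort_index() orders by the year itself, so the counts read chronologically.
counts_by_year = sample_df["Year"].value_counts().sort_index()

# idxmax/idxmin return the year (index label) with the most/fewest observations.
busiest_year = counts_by_year.idxmax()
sparsest_year = counts_by_year.idxmin()

print(first_year, last_year)
print(counts_by_year)
print(busiest_year, sparsest_year)
```

When several years tie for the maximum or minimum, `idxmax` and `idxmin` return only the first of them, so the full sorted table is still worth a look.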

## 8. Number of Countries in Dataset - Least Represented Countries
The Country column has 246 unique values. Not all of them are countries: the output below also lists aggregate groups such as "Low income" and "Lower middle income". South Sudan (8), Somalia (10), Djibouti (10), Turks and Caicos Islands (12), and Sint Maarten (Dutch part) (14) have the fewest observations. Possible explanations are that some of these places have limited capacity to collect and report data, in some cases because of conflict, or that their records begin at a later date.

In [72]:
gdp_df['Country'].nunique() # I want to know how many countries are in the dataset. 

246

In [75]:
gdp_df['Country'].value_counts() # I want to know what countries have the lowest number of observations.

Country
Zimbabwe                     33
Marshall Islands             33
Low income                   33
Lower middle income          33
Luxembourg                   33
                             ..
Sint Maarten (Dutch part)    14
Turks and Caicos Islands     12
Djibouti                     10
Somalia                      10
South Sudan                   8
Name: count, Length: 246, dtype: int64
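
To check whether the least represented entries simply start later, it may help to look at the first and last year recorded for each one alongside its count. The sketch below does this on a hypothetical miniature frame; `sample_df` and the placeholder names in it are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of gdp_df; the rows are placeholders, not real data.
sample_df = pd.DataFrame(
    {
        "Country": ["Country A", "Country A", "Country A",
                    "Country B", "Country B", "Country C"],
        "Year": [2000, 2001, 2002, 2001, 2002, 2002],
    }
)

# Per-country number of observations plus the first and last year observed.
coverage = (
    sample_df.groupby("Country")["Year"]
    .agg(observations="count", first_year="min", last_year="max")
    .sort_values("observations")
)

# After the ascending sort, the least represented entries come first.
least_represented = coverage.head(2)
print(least_represented)
```

A late `first_year` would point to a series that begins later, while an early `first_year` with few observations would suggest gaps within the series.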