## Reading in the Dataset

First, we import the os and pandas modules so that we can read the CSV file into a pandas dataframe. Cells containing the strings 'NA' or '?' are read in as missing values (NaN).

In [1]:
import os
import pandas as pd

# Read data from CSV into a pandas dataframe
path = "../data/"
filename_read = os.path.join(path, "emissions.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])

# Output first 5 rows of the dataframe
print(df.head())

                  Country Name        1960        1961        1962  \
0                        Aruba  204.631696  208.837879  226.081890   
1  Africa Eastern and Southern    0.906060    0.922474    0.930816   
2                  Afghanistan    0.046057    0.053589    0.073721   
3   Africa Western and Central    0.090880    0.095283    0.096612   
4                       Angola    0.100835    0.082204    0.210533   

         1963        1964        1965        1966        1967        1968  \
0  214.785217  207.626699  185.213644  172.158729  210.819017  194.917536   
1    0.940570    0.996033    1.047280    1.033908    1.052204    1.079727   
2    0.074161    0.086174    0.101285    0.107399    0.123409    0.115142   
3    0.112376    0.133258    0.184803    0.193676    0.189305    0.143989   
4    0.202739    0.213562    0.205891    0.268937    0.172096    0.289702   

   ...      2009      2010      2011      2012      2013      2014      2015  \
0  ...       NaN       NaN       NaN
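To see what the na_values argument does, here is a minimal sketch on a made-up, three-row stand-in for emissions.csv. The country names and numbers are hypothetical, not taken from the real file. Both 'NA' and '?' are parsed as NaN. pandas already treats 'NA' as missing by default, so '?' is the string that actually depends on the argument. Counting NaNs per row then shows how much data each entry is missing before any cleaning.

```python
import io

import pandas as pd

# Hypothetical miniature of the emissions CSV: one column per year,
# with missing readings written as 'NA' or '?'
csv_text = """Country Name,1960,1961,1962
Country A,1.5,NA,1.7
Country B,?,0.2,0.3
Country C,2.0,2.1,2.2
"""

# na_values tells read_csv to treat these extra strings as missing (NaN)
df = pd.read_csv(io.StringIO(csv_text), na_values=["NA", "?"])

# Count missing values per row to see how incomplete each entry is
missing_per_row = df.set_index("Country Name").isna().sum(axis=1)
print(missing_per_row)
```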

## Cleaning the data
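The next step is to clean the data. The output above shows that a lot of values are missing, so we have chosen to simply drop any row that contains at least one NA value. This leaves 191 rows with complete data to work with. Not all of these rows are individual countries: the dataset also contains regional and income-group aggregates such as "World", "Arab World" and "Upper middle income". Towards the end of this section we also compute the mean value for each entry. The largest of these means appears to lie within a reasonable range, which suggests there are no extreme outliers among the per-country averages.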

In [2]:
# Drop rows in the dataframe that contain null values
df = df.dropna()
print(df)

                    Country Name      1960      1961      1962      1963  \
1    Africa Eastern and Southern  0.906060  0.922474  0.930816  0.940570   
2                    Afghanistan  0.046057  0.053589  0.073721  0.074161   
3     Africa Western and Central  0.090880  0.095283  0.096612  0.112376   
4                         Angola  0.100835  0.082204  0.210533  0.202739   
5                        Albania  1.258195  1.374186  1.439956  1.181681   
..                           ...       ...       ...       ...       ...   
257                      Vietnam  0.181947  0.183099  0.217694  0.196997   
259                        World  3.121158  3.068090  3.114839  3.221195   
260                        Samoa  0.135031  0.163542  0.158377  0.184037   
262                  Yemen, Rep.  0.011038  0.013599  0.012729  0.014518   
263                 South Africa  5.727223  5.832621  5.887168  5.961337   

         1964      1965      1966      1967      1968  ...      2009  \
1    0.996033  

In [3]:
print(f"Countries with complete data throughout the dataset: {len(df)}")

Countries with complete data throughout the dataset: 191
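Calling dropna() with no arguments uses how="any", so a single missing year is enough to remove a row. The sketch below uses a hypothetical three-row frame (names and values are made up) to show the row count before and after dropping, mirroring the 191-row count above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only Country C has a value for every year
df = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B", "Country C"],
        "1960": [1.5, np.nan, 2.0],
        "1961": [1.6, 0.2, 2.1],
        "2015": [np.nan, 0.3, 2.2],
    }
)

rows_before = len(df)
# dropna() defaults to how="any": a row is removed if ANY of its cells is NaN
complete = df.dropna()
rows_after = len(complete)

print(f"Rows before: {rows_before}, rows after dropna(): {rows_after}")
print(complete["Country Name"].tolist())
```

Dropping whole rows is the simplest option, but it discards an entry even if only one year is missing.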


In [4]:
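# Use the country names as the index, then transpose (swap rows with columns)
# so that each row is a year and each column is a country or region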
df = df.set_index('Country Name').T
df.head()

| Country Name | Africa Eastern and Southern | Afghanistan | Africa Western and Central | Angola | Albania | Arab World | United Arab Emirates | Argentina | Antigua and Barbuda | Australia | ... | Upper middle income | Uruguay | United States | St. Vincent and the Grenadines | Venezuela, RB | Vietnam | World | Samoa | Yemen, Rep. | South Africa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1960 | 0.90606 | 0.046057 | 0.09088 | 0.100835 | 1.258195 | 0.609268 | 0.119037 | 2.383343 | 0.677418 | 8.582937 | ... | 2.573291 | 1.701585 | 15.999779 | 0.135865 | 7.009414 | 0.181947 | 3.121158 | 0.135031 | 0.011038 | 5.727223 |
| 1961 | 0.922474 | 0.053589 | 0.095283 | 0.082204 | 1.374186 | 0.662618 | 0.109136 | 2.458551 | 0.866667 | 8.641569 | ... | 2.408432 | 1.602728 | 15.681256 | 0.133884 | 6.153191 | 0.183099 | 3.06809 | 0.163542 | 0.013599 | 5.832621 |
| 1962 | 0.930816 | 0.073721 | 0.096612 | 0.210533 | 1.439956 | 0.727117 | 0.163542 | 2.538447 | 1.838457 | 8.835688 | ... | 2.370116 | 1.54066 | 16.013937 | 0.132162 | 6.188716 | 0.217694 | 3.114839 | 0.158377 | 0.012729 | 5.887168 |
| 1963 | 0.94057 | 0.074161 | 0.112376 | 0.202739 | 1.181681 | 0.853116 | 0.175833 | 2.330685 | 1.487469 | 9.22644 | ... | 2.435563 | 1.639287 | 16.482762 | 0.174204 | 6.208593 | 0.196997 | 3.221195 | 0.184037 | 0.014518 | 5.961337 |
| 1964 | 0.996033 | 0.086174 | 0.133258 | 0.213562 | 1.111742 | 0.972381 | 0.132815 | 2.553442 | 1.590448 | 9.759073 | ... | 2.523331 | 1.710104 | 16.968119 | 0.215409 | 6.041541 | 0.20987 | 3.324046 | 0.208106 | 0.01755 | 6.332343 |
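The set_index(...).T step can be checked on a small example. This sketch uses a hypothetical two-country, three-year table. It also shows one detail worth knowing: because the year labels come from the CSV header, they are strings after the transpose. Casting them to int, as done here, is an optional extra step that is not part of the notebook above.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year
wide = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B"],
        "1960": [1.0, 3.0],
        "1961": [2.0, 4.0],
        "1962": [5.0, 6.0],
    }
)

# Make the country names the row labels, then swap rows and columns
by_year = wide.set_index("Country Name").T

# Year labels come from the CSV-style header, so they are strings; cast them to int
by_year.index = by_year.index.astype(int)

print(by_year.shape)  # (3, 2): 3 years x 2 countries
print(by_year.loc[1961, "Country B"])  # 4.0
```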


In [5]:
# Mean of each column (one value per country or region) across all years
mean_vals = df.mean().values
print(f"{mean_vals}\n\nMax value from mean values: {mean_vals.max()}")

[ 1.08940549  0.14938204  0.44069151  0.68754025  1.66043288  2.92316364
 30.41657644  3.54462142  5.38209619 14.74163966  7.30147811 11.06564421
  0.22223818  0.07693901  0.22775958  7.01099706 19.69587727 13.01814302
  1.53015729  0.95318804  1.48616998  3.65982706 19.65388816  0.06512205
 15.66106749  8.0915212   5.85743582  2.82756422  2.81821594  0.41111735
  0.31221343  0.08726443  0.72282786  1.48099365  0.15622533  0.4699592
  1.1064521   4.95424628  2.49978089  5.16749411  0.68102688  1.26622291
 10.04102072  1.42046764  2.38212135  2.30775894  1.38993548  3.02796468
 11.00220976  9.41038116  1.55714578  1.45443895  5.2060304   0.05915819
  7.84481343  0.89520284  9.61967992  1.11966072  6.72827315  4.36506694
  9.53232306  0.30104306  0.19024711  0.18733115  0.14602582  2.96366733
  5.90056986  1.46541238  0.62322171  2.04485343 11.03999274  0.65977374
  0.19368719  0.15145459  6.11987044  2.95400638  2.44904198  0.46815651
  0.84609324  0.97370212  0.27221359  0.75507461  8.
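Looking only at the largest mean does not by itself rule out outliers. One common heuristic is Tukey's fences, which flag values more than 1.5 interquartile ranges outside the quartiles. It is swapped in here for illustration and is not the check used in this notebook. The sketch below applies it to a hypothetical year-by-country table and also uses idxmax() to report which column the maximum mean belongs to.

```python
import pandas as pd

# Hypothetical year-by-country table (rows = years, columns = countries)
by_year = pd.DataFrame(
    {
        "Country A": [1.0, 1.2, 1.1],
        "Country B": [0.5, 0.6, 0.7],
        "Country C": [9.0, 9.5, 10.0],
        "Country D": [2.0, 2.2, 2.4],
    },
    index=[1960, 1961, 1962],
)

# Mean of each column, i.e. one average per country across all years
means = by_year.mean()
print(f"Max mean: {means.max():.2f} ({means.idxmax()})")

# Tukey's fences: flag means more than 1.5 IQRs outside the quartiles
q1, q3 = means.quantile(0.25), means.quantile(0.75)
iqr = q3 - q1
outliers = means[(means < q1 - 1.5 * iqr) | (means > q3 + 1.5 * iqr)]
print(outliers)
```

With these made-up numbers, Country C's mean is both the maximum and outside the upper fence. An explicit check like this makes an "no outliers" conclusion easier to verify than inspecting the maximum alone.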