<h3>2016 Ecological Footprint Exploratory Data Analysis</h3>

<h5>This EDA aims to answer the following questions:</h5>

* Ecological Deficit / Reserve of South East Asian countries.
* Countries with the largest and smallest populations in the world / South East Asia.
* Countries with the highest and lowest ecological footprint / biocapacity.
* Comparison between forest footprint and forest land.
* Comparison between cropland footprint and cropland area.
* Comparison between grazing footprint and grazing land.
* Comparison between fish footprint and fishing water.
* Comparison between carbon footprint and urban land.
* Relationship between GDP per capita and Ecological Footprint.
* Relationship between Population and Ecological Footprint.
* Countries with the highest and lowest carbon emissions, and how those emissions relate to population.


In [4]:
#Importing the necessary libraries
import pandas as pd

In [6]:
#Loading the dataset

EF_df = pd.read_csv('EcologicalFootprint.csv')

#Looking at the first five rows
EF_df.head()

Unnamed: 0,Country,Region,Population (millions),HDI,GDP per Capita,Cropland Footprint,Grazing Footprint,Forest Footprint,Carbon Footprint,Fish Footprint,...,Cropland,Grazing Land,Forest Land,Fishing Water,Urban Land,Total Biocapacity,Biocapacity Deficit or Reserve,Earths Required,Countries Required,Data Quality
0,Afghanistan,Middle East/Central Asia,29.82,0.46,$614.66,0.3,0.2,0.08,0.18,0.0,...,0.24,0.2,0.02,0.0,0.04,0.5,-0.3,0.46,1.6,6
1,Albania,Northern/Eastern Europe,3.16,0.73,"$4,534.37",0.78,0.22,0.25,0.87,0.02,...,0.55,0.21,0.29,0.07,0.06,1.18,-1.03,1.27,1.87,6
2,Algeria,Africa,38.48,0.73,"$5,430.57",0.6,0.16,0.17,1.14,0.01,...,0.24,0.27,0.03,0.01,0.03,0.59,-1.53,1.22,3.61,5
3,Angola,Africa,20.82,0.52,"$4,665.91",0.33,0.15,0.12,0.2,0.09,...,0.2,1.42,0.64,0.26,0.04,2.55,1.61,0.54,0.37,6
4,Antigua and Barbuda,Latin America,0.09,0.78,"$13,205.10",,,,,,...,,,,,,0.94,-4.44,3.11,5.7,2


In [11]:
#Some information about the dataset
print('Shape: ', EF_df.shape)
print('Size: ', EF_df.size)

Shape:  (188, 21)
Size:  3948


The dataset has 188 rows (one per country) and 21 columns. `size` counts individual cells rather than records, so 3948 is simply 188 × 21.
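The relationship between `shape` and `size` can be checked on a small frame. This is a minimal sketch; `toy_df` and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical 3 x 2 frame standing in for EF_df (illustrative values only).
toy_df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Population (millions)": [1.0, 2.0, 3.0],
})

n_rows, n_cols = toy_df.shape

# size is the number of cells: rows multiplied by columns.
print("Shape:", toy_df.shape)
print("Size: ", toy_df.size)
print("rows * cols:", n_rows * n_cols)
```

The same arithmetic applied to the output above gives 188 * 21 = 3948.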

In [9]:
#Analyzing the columns
EF_df.columns

Index(['Country', 'Region', 'Population (millions)', 'HDI', 'GDP per Capita',
       'Cropland Footprint', 'Grazing Footprint', 'Forest Footprint',
       'Carbon Footprint', 'Fish Footprint', 'Total Ecological Footprint',
       'Cropland', 'Grazing Land', 'Forest Land', 'Fishing Water',
       'Urban Land', 'Total Biocapacity', 'Biocapacity Deficit or Reserve',
       'Earths Required', 'Countries Required', 'Data Quality'],
      dtype='object')

Some columns are not necessary for this analysis, so we will drop them later.

In [15]:
#Check for descriptive data
EF_df.describe(include='all')

Unnamed: 0,Country,Region,Population (millions),HDI,GDP per Capita,Cropland Footprint,Grazing Footprint,Forest Footprint,Carbon Footprint,Fish Footprint,...,Cropland,Grazing Land,Forest Land,Fishing Water,Urban Land,Total Biocapacity,Biocapacity Deficit or Reserve,Earths Required,Countries Required,Data Quality
count,188,188,188.0,172.0,173,173.0,173.0,173.0,173.0,173.0,...,173.0,173.0,173.0,173.0,173.0,188.0,188.0,188.0,188.0,188.0
unique,188,7,,,173,,,,,,...,,,,,,,,,,7.0
top,Afghanistan,Africa,,,$614.66,,,,,,...,,,,,,,,,,5.0
freq,1,52,,,1,,,,,,...,,,,,,,,,,66.0
mean,,,37.342372,0.68636,,0.578208,0.263179,0.373815,1.804913,0.122486,...,0.53185,0.45659,2.459191,0.595145,0.06711,4.019681,0.702074,1.915745,4.037397,
std,,,140.756836,0.15604,,0.355691,0.352067,0.359349,1.898283,0.158427,...,0.672567,1.014738,10.593956,1.661872,0.054844,11.689075,11.771339,1.369624,12.444616,
min,,,0.0,0.34,,0.07,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.05,-14.14,0.24,0.02,
25%,,,2.0375,0.5575,,0.35,0.08,0.17,0.42,0.02,...,0.18,0.03,0.06,0.03,0.03,0.675,-1.935,0.855,0.9425,
50%,,,7.97,0.72,,0.52,0.18,0.26,1.14,0.07,...,0.35,0.12,0.34,0.11,0.05,1.31,-0.73,1.58,1.705,
75%,,,24.87,0.8025,,0.7,0.32,0.46,2.6,0.15,...,0.59,0.34,1.17,0.37,0.09,2.815,0.2125,2.6775,2.8475,


The world is commonly counted as having 195 countries, but this dataset has only 188 rows, so at least 7 countries are missing.

In [16]:
#Display information about the columns
EF_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188 entries, 0 to 187
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Country                         188 non-null    object 
 1   Region                          188 non-null    object 
 2   Population (millions)           188 non-null    float64
 3   HDI                             172 non-null    float64
 4   GDP per Capita                  173 non-null    object 
 5   Cropland Footprint              173 non-null    float64
 6   Grazing Footprint               173 non-null    float64
 7   Forest Footprint                173 non-null    float64
 8   Carbon Footprint                173 non-null    float64
 9   Fish Footprint                  173 non-null    float64
 10  Total Ecological Footprint      188 non-null    float64
 11  Cropland                        173 non-null    float64
 12  Grazing Land                    173 

The 'GDP per Capita' column has dtype object because its values are stored as strings such as "$4,534.37". To sort the values numerically, we will convert the column to float.
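One possible way to do that conversion is sketched below. It is not necessarily the approach this notebook takes later; `gdp_raw` and `gdp_clean` are hypothetical names, and the sample strings are copied from the `head()` output above, with `None` standing in for a missing entry.

```python
import pandas as pd

# Sample values from the 'GDP per Capita' column; None marks a missing entry.
gdp_raw = pd.Series(
    ["$614.66", "$4,534.37", "$13,205.10", None],
    name="GDP per Capita",
)

# Strip the currency symbol and thousands separators, then convert to float.
gdp_clean = pd.to_numeric(
    gdp_raw.str.replace(r"[$,]", "", regex=True),
    errors="coerce",  # values that cannot be parsed become NaN instead of raising
)

print(gdp_clean.dtype)
print(gdp_clean.sort_values(ascending=False))
```

Note that the `.str` accessor returns NaN for non-string entries, so if missing values have already been replaced with the integer 0, those rows come back as NaN and would need to be filled again after the conversion.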

In [21]:

#Check for the number of missing values
EF_df.isna().sum()

Country                            0
Region                             0
Population (millions)              0
HDI                               16
GDP per Capita                    15
Cropland Footprint                15
Grazing Footprint                 15
Forest Footprint                  15
Carbon Footprint                  15
Fish Footprint                    15
Total Ecological Footprint         0
Cropland                          15
Grazing Land                      15
Forest Land                       15
Fishing Water                     15
Urban Land                        15
Total Biocapacity                  0
Biocapacity Deficit or Reserve     0
Earths Required                    0
Countries Required                 0
Data Quality                       0
dtype: int64

Now that we have reviewed the basic structure of the dataset, we will perform some basic transformations.

We will fill all the missing values with zero.

In [22]:
#Filling the missing values with zero. 
EF_df.fillna(0, inplace=True)
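To see what zero-filling does to summary statistics, the sketch below compares a mean computed with the missing value skipped against the mean after `fillna(0)`. The `cropland` series is a hypothetical stand-in built from the 'Cropland Footprint' values of the first five rows shown earlier, where Antigua and Barbuda has no value.

```python
import numpy as np
import pandas as pd

# Cropland Footprint for the first five rows; the last one is missing.
cropland = pd.Series([0.30, 0.78, 0.60, 0.33, np.nan], name="Cropland Footprint")

# By default, mean() ignores NaN and averages the four known values.
mean_skipping_nan = cropland.mean()

# After fillna(0), the zero is treated as a real observation and averaged in.
mean_after_fill = cropland.fillna(0).mean()

print("Mean ignoring missing values:", round(mean_skipping_nan, 4))
print("Mean after filling with zero:", round(mean_after_fill, 4))
```

The filled mean is lower because the denominator grows while the sum stays the same, which is worth keeping in mind when reading averages or rankings computed from the filled columns.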


In [30]:
#Check again if the results are different.
EF_df.isna().sum()

Country                           0
Region                            0
Population (millions)             0
HDI                               0
GDP per Capita                    0
Cropland Footprint                0
Grazing Footprint                 0
Forest Footprint                  0
Carbon Footprint                  0
Fish Footprint                    0
Total Ecological Footprint        0
Cropland                          0
Grazing Land                      0
Forest Land                       0
Fishing Water                     0
Urban Land                        0
Total Biocapacity                 0
Biocapacity Deficit or Reserve    0
Earths Required                   0
Countries Required                0
Data Quality                      0
dtype: int64

The 'Data Quality' column is not necessary for this analysis, so we will drop it.


In [None]:
#Removing the 'Data Quality' column
EF_df.drop('Data Quality', axis=1, inplace=True)


In [38]:
#Check that the 'Data Quality' column was removed.
EF_df.columns

Index(['Country', 'Region', 'Population (millions)', 'HDI', 'GDP per Capita',
       'Cropland Footprint', 'Grazing Footprint', 'Forest Footprint',
       'Carbon Footprint', 'Fish Footprint', 'Total Ecological Footprint',
       'Cropland', 'Grazing Land', 'Forest Land', 'Fishing Water',
       'Urban Land', 'Total Biocapacity', 'Biocapacity Deficit or Reserve',
       'Earths Required', 'Countries Required'],
      dtype='object')

<h4>Now we can proceed to analyse the data and answer our queries.</h4>

We will create a subset dataframe for South East Asian countries.

In [44]:
#Creating the dataframe

SEA_countries = ['Brunei', 'Burma', 'Cambodia', 'Timor-Leste', 'Indonesia', 'Laos', 'Malaysia', 'Philippines', 'Singapore', 'Thailand', 'Vietnam']

SEA_EF_df = EF_df.loc[EF_df['Country'].isin(SEA_countries)]

SEA_EF_df

Unnamed: 0,Country,Region,Population (millions),HDI,GDP per Capita,Cropland Footprint,Grazing Footprint,Forest Footprint,Carbon Footprint,Fish Footprint,Total Ecological Footprint,Cropland,Grazing Land,Forest Land,Fishing Water,Urban Land,Total Biocapacity,Biocapacity Deficit or Reserve,Earths Required,Countries Required
30,Cambodia,Asia-Pacific,14.86,0.55,$877.64,0.0,0.0,0.0,0.0,0.0,1.21,0.0,0.0,0.0,0.0,0.0,1.09,-0.11,0.7,1.11
80,Indonesia,Asia-Pacific,246.86,0.68,"$3,688.53",0.44,0.03,0.2,0.64,0.21,1.58,0.46,0.06,0.3,0.38,0.06,1.26,-0.32,0.91,1.25
106,Malaysia,Asia-Pacific,29.24,0.77,"$10,252.60",0.67,0.12,0.38,2.1,0.36,3.71,0.75,0.01,0.73,0.84,0.07,2.41,-1.3,2.14,1.54
135,Philippines,Asia-Pacific,96.71,0.66,"$2,379.44",0.36,0.03,0.09,0.34,0.23,1.1,0.32,0.02,0.09,0.07,0.05,0.54,-0.56,0.64,2.03
152,Singapore,Asia-Pacific,5.3,0.91,"$53,122.40",0.67,0.24,0.91,5.91,0.22,7.97,0.0,0.0,0.0,0.01,0.03,0.05,-7.92,4.61,159.47
167,Thailand,Asia-Pacific,66.78,0.72,"$5,479.29",0.67,0.02,0.24,1.54,0.13,2.66,0.77,0.01,0.2,0.19,0.07,1.24,-1.42,1.54,2.14
168,Timor-Leste,Asia-Pacific,1.11,0.6,"$5,167.86",0.25,0.07,0.04,0.06,0.02,0.48,0.21,0.06,0.52,0.94,0.04,1.78,1.3,0.28,0.27
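The list contains 11 names but the subset has only 7 rows, so four names found no exact match in the 'Country' column. A quick way to list them is sketched below. The `matched` series is a stand-in for `EF_df['Country']` built from the rows returned above. Whether the remaining countries appear in the dataset under a different spelling or are absent altogether is not known from this output.

```python
import pandas as pd

SEA_countries = ['Brunei', 'Burma', 'Cambodia', 'Timor-Leste', 'Indonesia', 'Laos',
                 'Malaysia', 'Philippines', 'Singapore', 'Thailand', 'Vietnam']

# The seven countries returned by the filter (stand-in for EF_df['Country']).
matched = pd.Series(['Cambodia', 'Indonesia', 'Malaysia', 'Philippines',
                     'Singapore', 'Thailand', 'Timor-Leste'], name='Country')

# Names in the list that have no exact match in the Country column.
unmatched = sorted(set(SEA_countries) - set(matched))
print(unmatched)
```

With the real dataframe, a partial-match search such as `EF_df[EF_df['Country'].str.contains('Lao')]` can help reveal whether a country is listed under a longer or alternative name.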


In [46]:
#Check the Ecological Deficit / Reserve of each SEA country

SEA_EF_df[['Country', 'Biocapacity Deficit or Reserve']].sort_values('Biocapacity Deficit or Reserve', ascending=False)

Unnamed: 0,Country,Biocapacity Deficit or Reserve
168,Timor-Leste,1.3
30,Cambodia,-0.11
80,Indonesia,-0.32
135,Philippines,-0.56
106,Malaysia,-1.3
167,Thailand,-1.42
152,Singapore,-7.92
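A positive value means biocapacity exceeds the footprint (a reserve), and a negative value means the footprint exceeds biocapacity (a deficit). The sketch below makes that reading explicit by adding a label column. The `sea_balance` frame is rebuilt from the output above, and the `Status` column is a hypothetical addition that is not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Values copied from the sorted output above.
sea_balance = pd.DataFrame({
    'Country': ['Timor-Leste', 'Cambodia', 'Indonesia', 'Philippines',
                'Malaysia', 'Thailand', 'Singapore'],
    'Biocapacity Deficit or Reserve': [1.3, -0.11, -0.32, -0.56, -1.3, -1.42, -7.92],
})

# Label each country by the sign of its balance (hypothetical 'Status' column).
sea_balance['Status'] = np.where(
    sea_balance['Biocapacity Deficit or Reserve'] >= 0, 'Reserve', 'Deficit'
)

print(sea_balance)
print(sea_balance['Status'].value_counts())
```

Among the seven countries in this subset, only Timor-Leste has a reserve, and Singapore has by far the largest deficit.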
