For this notebook, only the pandas library is imported.

In [1]:
import pandas as pd

Now we'll take a look at the three datasets provided.

In [2]:
df_1 = pd.read_csv('wastestats.csv')
df_2 = pd.read_csv('2018_2019_waste.csv')
df_3 = pd.read_csv('energy_saved.csv',skiprows=2)

display(df_1.head(5))
display(df_2.head(5))
display(df_3.head(5))

Unnamed: 0,waste_type,waste_disposed_of_tonne,total_waste_recycled_tonne,total_waste_generated_tonne,recycling_rate,year
0,Food,679900,111100.0,791000,0.14,2016
1,Paper/Cardboard,576000,607100.0,1183100,0.51,2016
2,Plastics,762700,59500.0,822200,0.07,2016
3,C&D,9700,1585700.0,1595400,0.99,2016
4,Horticultural waste,111500,209000.0,320500,0.65,2016


Unnamed: 0,Waste Type,Total Generated ('000 tonnes),Total Recycled ('000 tonnes),Year
0,Construction& Demolition,1440,1434,2019
1,Ferrous Metal,1278,1270,2019
2,Paper/Cardboard,1011,449,2019
3,Plastics,930,37,2019
4,Food,7440,136,2019


Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5
0,material,Plastic,Glass,Ferrous Metal,Non-Ferrous Metal,Paper
1,energy_saved,5774 Kwh,42 Kwh,642 Kwh,14000 Kwh,4000 kWh
2,crude_oil saved,16 barrels,,1.8 barrels,40 barrels,1.7 barrels


After inspecting the datasets, we can see several issues that must be addressed before the requested information can be produced:
- Not all years are relevant: the analysis only covers the years **2015** to **2019**.
- Not all types of waste are relevant: the Singapore Government only wants to count the energy saved from recycling `Glass`, `Plastic`, `Ferrous Metal` and `Non-Ferrous Metal`.
- **`df_3`** is transposed (materials are in columns instead of rows).
- Some columns are not relevant (`waste_disposed_of_tonne`, `total_waste_generated_tonne`, `recycling_rate`, `Total Generated ('000 tonnes)` and `crude_oil saved`), because the analysis only needs the tonnes recycled and the energy saved per tonne.
- Some material names need correcting, since they differ slightly but refer to the same type of waste (e.g. "Plastics" and "Plastic").
- Column names should be normalized so the tables can be combined (e.g. `waste_type` in **`df_1`** is the same as `Waste Type` in **`df_2`**).
- Waste quantities should use uniform units (**`df_1`** is in *tonnes* while **`df_2`** is in *thousands of tonnes*).
- The `energy_saved` attribute holds strings (a number followed by a unit), and it needs integer values before the necessary calculations can be performed.
- Data is spread across three tables, and it should be unified and aggregated into a single one.

We'll first work with `df_1`, dropping the columns we don't need.

In [3]:
# Drop "waste_disposed_of_tonne", "recycling_rate" and "total_waste_generated_tonne" since we don't need them
df_1 = df_1.drop(columns=['waste_disposed_of_tonne','recycling_rate','total_waste_generated_tonne'])

display(df_1.head(4))

Unnamed: 0,waste_type,total_waste_recycled_tonne,year
0,Food,111100.0,2016
1,Paper/Cardboard,607100.0,2016
2,Plastics,59500.0,2016
3,C&D,1585700.0,2016


For `df_2`, we'll drop a column and rename the others to match `df_1`. We'll also multiply the `Total Recycled ('000 tonnes)` column by 1000 to normalize the units to tonnes.

In [4]:
# Rename columns to match df_1, using snake_case for easier use
df_2 = df_2.rename(columns={
    "Waste Type": "waste_type",
    "Total Generated ('000 tonnes)": "total_waste_generated",
    "Total Recycled ('000 tonnes)": "total_waste_recycled_tonne",
    "Year": "year",
})

df_2 = df_2.drop(columns=['total_waste_generated'])


df_2['total_waste_recycled_tonne'] = df_2['total_waste_recycled_tonne']*1000
df_2.head(4)

Unnamed: 0,waste_type,total_waste_recycled_tonne,year
0,Construction& Demolition,1434000,2019
1,Ferrous Metal,1270000,2019
2,Paper/Cardboard,449000,2019
3,Plastics,37000,2019


Then, we'll concatenate `df_1` and `df_2` into one dataframe (`df_new`)

In [5]:
df_new = pd.concat([df_1,df_2])
df_new.head(5)

Unnamed: 0,waste_type,total_waste_recycled_tonne,year
0,Food,111100.0,2016
1,Paper/Cardboard,607100.0,2016
2,Plastics,59500.0,2016
3,C&D,1585700.0,2016
4,Horticultural waste,209000.0,2016
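
Note that `pd.concat` keeps each frame's original row labels, so the combined frame can contain repeated index values. The following is a minimal sketch of that behaviour, not part of the original analysis: `part_a` and `part_b` are hypothetical two-row frames whose values are copied from the previews above, and `ignore_index=True` is shown as one way to get a fresh index.

```python
import pandas as pd

# Two small illustrative frames with the same columns (values from the previews above)
part_a = pd.DataFrame({
    "waste_type": ["Food", "Plastics"],
    "total_waste_recycled_tonne": [111100.0, 59500.0],
    "year": [2016, 2016],
})
part_b = pd.DataFrame({
    "waste_type": ["Ferrous Metal", "Plastics"],
    "total_waste_recycled_tonne": [1270000, 37000],
    "year": [2019, 2019],
})

# Default behaviour: the row labels 0 and 1 each appear twice
stacked = pd.concat([part_a, part_b])

# ignore_index=True renumbers the rows from 0 to 3
stacked_clean = pd.concat([part_a, part_b], ignore_index=True)

print(stacked.index.tolist())        # [0, 1, 0, 1]
print(stacked_clean.index.tolist())  # [0, 1, 2, 3]
```

The repeated labels are harmless for the steps in this notebook, which never select rows by label, but they can be surprising if `.loc` is used later.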


We'll now apply necessary transformations to `df_3`

In [6]:
# Transpose dataframe
df_3 = df_3.T

# Set column names, remove first row
df_3.columns = df_3.iloc[0]
df_3 = df_3.iloc[1:,].reset_index(drop=True)

# Normalize materials column name
df_3 = df_3.rename(columns={"material":"waste_type"})

# Drop "crude_oil saved" column since we don't need this data right now
df_3 = df_3.drop(columns='crude_oil saved')

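# Remove the unit suffix (" Kwh"/" kWh") from the "energy_saved" column;
# regex=True is needed because pandas 2.0+ treats the pattern as a literal string by default
df_3['energy_saved'] = df_3['energy_saved'].str.replace(r"\s.*", "", regex=True)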
df_3

Unnamed: 0,waste_type,energy_saved
0,Plastic,5774
1,Glass,42
2,Ferrous Metal,642
3,Non-Ferrous Metal,14000
4,Paper,4000
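
As an alternative to stripping the suffix, the number can be extracted directly. This is only a sketch, assuming every value starts with a whole number followed by a unit; the series `raw` reproduces the values shown in the `energy_saved` row above, and `kwh_per_tonne` is a hypothetical name.

```python
import pandas as pd

# Values as they appear in the source file (note the mixed "Kwh"/"kWh" spelling)
raw = pd.Series(["5774 Kwh", "42 Kwh", "642 Kwh", "14000 Kwh", "4000 kWh"])

# Capture the leading digits; expand=False returns a Series instead of a DataFrame
kwh_per_tonne = raw.str.extract(r"^(\d+)", expand=False).astype("int64")

print(kwh_per_tonne.tolist())  # [5774, 42, 642, 14000, 4000]
```

Extracting and casting in one step means a value without a leading number becomes `NaN` and makes the cast fail, rather than passing through unnoticed as text.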


Next, we create a list of the unique materials and remove paper, since it's not relevant to the present analysis.

In [7]:
# Create list of relevant materials, remove "Paper" since it's not relevant
list_materials = df_3['waste_type'].unique().tolist()
list_materials.remove('Paper')
list_materials

['Plastic', 'Glass', 'Ferrous Metal', 'Non-Ferrous Metal']

We then correct material names that are spelled inconsistently across the datasets.

In [8]:
# Normalize names
df_new.replace("Plastics", "Plastic",inplace=True)
df_new.replace(['Ferrous metal','Ferrous Metals'], "Ferrous Metal",inplace=True)
df_new.replace(["Non-ferrous Metals",'Non-ferrous metal','Non-ferrous metals'], "Non-Ferrous Metal",inplace=True)
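
The replacements above list each variant by hand. A more general approach could be sketched as follows, assuming the variants differ only in letter case, surrounding whitespace and a trailing plural "s". The helper `normalize_material`, the `LOOKUP` table and the sample values are hypothetical and not part of the original notebook.

```python
import pandas as pd

CANONICAL = ["Plastic", "Glass", "Ferrous Metal", "Non-Ferrous Metal"]
# Lookup table keyed by the lowercase form of each canonical name
LOOKUP = {name.lower(): name for name in CANONICAL}

def normalize_material(name: str) -> str:
    """Map a spelling variant to its canonical name; return other names unchanged."""
    key = name.strip().lower()
    if key in LOOKUP:
        return LOOKUP[key]
    # Try again without a trailing plural "s" (e.g. "plastics" -> "plastic")
    if key.endswith("s") and key[:-1] in LOOKUP:
        return LOOKUP[key[:-1]]
    return name

variants = pd.Series(["Plastics", "Ferrous metal", "Non-ferrous Metals", "Glass", "Food"])
print(variants.map(normalize_material).tolist())
# ['Plastic', 'Ferrous Metal', 'Non-Ferrous Metal', 'Glass', 'Food']
```

Applying the function with `df_new['waste_type'].map(normalize_material)` would limit the change to the material column, whereas `df_new.replace(...)` scans every column.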

We select from `df_new` only the relevant materials (those contained in `list_materials`).

In [9]:
# Select material types that are "glass, plastic, ferrous and non-ferrous metals"
df_new = df_new[df_new['waste_type'].isin(list_materials)]

We now merge `df_new` with `df_3`, set `year` as the index, and cast all numeric columns to integers. We use `int64` rather than `int32` or smaller because totals this large would overflow a narrower type.

In [10]:
# Join first merged dataframe with df_3 to add the energy saved per tonne
df_new_final = df_new.merge(df_3,on='waste_type')

# Set year as index
annual_energy_savings = df_new_final.set_index('year')

# Set total_waste_recycled_tonne and energy_saved as int64 types
annual_energy_savings.total_waste_recycled_tonne = annual_energy_savings.total_waste_recycled_tonne.astype('int64')
annual_energy_savings.energy_saved = annual_energy_savings.energy_saved.astype('int64')
display(annual_energy_savings.head(5))

Unnamed: 0_level_0,waste_type,total_waste_recycled_tonne,energy_saved
year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2016,Plastic,59500,5774
2015,Plastic,57800,5774
2014,Plastic,80000,5774
2013,Plastic,91100,5774
2012,Plastic,82100,5774
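
`merge` performs an inner join by default, so any row whose `waste_type` has no match in the other table is dropped without warning. The sketch below shows one way to check for that, using the `how`, `indicator` and `validate` parameters of `DataFrame.merge`. The frames `recycled` and `factors` are hypothetical samples built from values shown earlier in this notebook.

```python
import pandas as pd

recycled = pd.DataFrame({
    "waste_type": ["Plastic", "Plastic", "Ferrous Metal", "Food"],
    "total_waste_recycled_tonne": [59500, 37000, 1270000, 111100],
    "year": [2016, 2019, 2019, 2016],
})
factors = pd.DataFrame({
    "waste_type": ["Plastic", "Ferrous Metal"],
    "energy_saved": [5774, 642],  # kWh per tonne
})

# how="left" keeps every recycled row; indicator=True adds a "_merge" column;
# validate raises MergeError if a waste_type appears more than once in factors
checked = recycled.merge(factors, on="waste_type", how="left",
                         indicator=True, validate="many_to_one")

unmatched = checked.loc[checked["_merge"] == "left_only", "waste_type"].unique().tolist()
print(unmatched)  # ['Food']
```

In this notebook the frame is filtered with `isin(list_materials)` before merging, so an empty `unmatched` list would confirm that the name normalization caught every variant.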


We then multiply the energy saved per tonne by the total tonnes recycled for each material.

In [11]:
# Calculate total energy saved taking into account amount of tonnes recycled and drop non relevant columns
annual_energy_savings['total_energy_saved'] = annual_energy_savings['total_waste_recycled_tonne'] * annual_energy_savings['energy_saved']
annual_energy_savings = annual_energy_savings.drop(columns=['total_waste_recycled_tonne','waste_type','energy_saved'])
display(annual_energy_savings.head(5))

Unnamed: 0_level_0,total_energy_saved
year,Unnamed: 1_level_1
2016,343553000
2015,333737200
2014,461920000
2013,526011400
2012,474045400
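
The choice of `int64` matters at this step. The sketch below illustrates why, under stated assumptions: the 200,000-tonne figure is hypothetical, while the 14,000 kWh factor and the 2015 total come from tables in this notebook. Integer arithmetic on `int32` columns is expected to wrap around silently rather than raise an error.

```python
import numpy as np
import pandas as pd

INT32_MAX = np.iinfo(np.int32).max  # 2147483647

# The 2015 total reported in the final table of this notebook
total_2015 = 3_435_929_000
print(total_2015 > INT32_MAX)  # True

# Hypothetical tonnage (200,000 t) times the non-ferrous factor (14,000 kWh per tonne)
tonnes_32 = pd.Series([200_000], dtype="int32")
factor_32 = pd.Series([14_000], dtype="int32")

wrapped = tonnes_32 * factor_32                                  # stays int32 and wraps around
correct = tonnes_32.astype("int64") * factor_32.astype("int64")  # 2_800_000_000

print(wrapped.iloc[0], correct.iloc[0])
```

Casting before multiplying, as done in the previous cell, keeps every intermediate product within range.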


We now aggregate all energy savings for each year

In [12]:
# Aggregate by year (the relevant years are selected in the next step)
annual_energy_savings = annual_energy_savings.groupby(['year']).sum()
annual_energy_savings.head(5)

Unnamed: 0_level_0,total_energy_saved
year,Unnamed: 1_level_1
2003,1800181800
2004,1850495000
2005,2023445800
2006,1861336800
2007,1920549600
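
The multiply-then-aggregate logic can be checked by hand on a few rows. The sketch below is a partial illustration only: it uses three rows whose tonnages and factors appear earlier in this notebook, so the resulting sums cover only those rows and are not the full annual totals. The names `rows` and `per_year` are hypothetical.

```python
import pandas as pd

rows = pd.DataFrame({
    "year": [2019, 2019, 2016],
    "waste_type": ["Plastic", "Ferrous Metal", "Plastic"],
    "total_waste_recycled_tonne": [37_000, 1_270_000, 59_500],
    "energy_saved": [5774, 642, 5774],  # kWh per tonne
})

# Energy saved per row = tonnes recycled * kWh saved per tonne
rows["total_energy_saved"] = rows["total_waste_recycled_tonne"] * rows["energy_saved"]

# One row per year; groupby sorts the years in ascending order by default
per_year = rows.groupby("year")["total_energy_saved"].sum()

print(per_year)
```

The 2016 plastic row gives 59,500 × 5,774 = 343,553,000 kWh, which matches the first row of the per-material table above.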


Finally, we obtain the table showing how many kWh Singapore saved each year from 2015 to 2019 by recycling plastic, ferrous metals, non-ferrous metals and glass.

In [13]:
annual_energy_savings = annual_energy_savings.loc[2015:2019]
annual_energy_savings.head(5)

Unnamed: 0_level_0,total_energy_saved
year,Unnamed: 1_level_1
2015,3435929000
2016,2554433400
2017,2470596000
2018,2698130000
2019,2765440000
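
The selection above relies on `.loc` slicing by label, which includes both endpoints, unlike positional slicing. A small sketch with placeholder values (the frame `totals` and its numbers are hypothetical) makes the difference visible:

```python
import pandas as pd

totals = pd.DataFrame(
    {"total_energy_saved": [10, 20, 30, 40, 50, 60, 70]},  # placeholder values
    index=pd.Index(range(2014, 2021), name="year"),
)

# Label-based slice: 2015 and 2019 are both included
by_label = totals.loc[2015:2019]

# Position-based slice: the stop position is excluded
by_position = totals.iloc[1:5]

print(by_label.index.tolist())     # [2015, 2016, 2017, 2018, 2019]
print(by_position.index.tolist())  # [2015, 2016, 2017, 2018]
```

Label slicing on a range like this expects a sorted index, which the preceding `groupby` provides by default.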
