# Annual Energy Savings from Recycled Materials in Singapore

## Project Goals
The goal of this project is to analyze total waste generation and recycling rates in Singapore, and to estimate the amount of energy saved by recycling.

In this analysis, we will answer questions such as:
1. How much energy was saved per year? In which years was this amount highest and lowest?
2. What is the trend in energy savings from recycling in Singapore from 2003 to 2022?
3. What was the greatest source of energy savings from recycling in 2022, and how has this changed over time?

For more information about how recycling can save energy, please refer here: https://greentumble.com/how-does-recycling-save-energy

## Data
- Recycled energy data for 2003 to 2016 is taken from a csv file in the reference repository for this project, [kingabzpro](https://github.com/kingabzpro/Annual-Recycled-Energy-Saved-in-Singapore/tree/main/Data)
- Recycled energy data for 2017 to 2021 is taken from the Waste and Recycling Statistics [document](https://www.nea.gov.sg/docs/default-source/default-document-library/waste-and-recycling-statistics-2017-to-2021.pdf) on the NEA website. The data has been extracted to an Excel file.
- Recycled energy data for 2022 is taken from the [Waste Statistics and Overall Recycling NEA webpage](https://www.nea.gov.sg/our-services/waste-management/waste-statistics-and-overall-recycling)

**Data Dictionary**

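|Variable|Description|
|-----|-----|
|waste_type|Type of waste (e.g. food, plastics, ferrous metal)|
|total_waste_generated_tonne|Total waste generated, in tonnes|
|total_waste_recycled_tonne|Total waste recycled, in tonnes|
|recycling_rate|Fraction of generated waste that was recycled (0 to 1)|
|waste_disposed_of_tonne|Total waste disposed of, in tonnes|
|year|Year of the record|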

- Energy conversion data is taken from the [Greentumble](https://greentumble.com/how-does-recycling-save-energy) website and has been saved to a csv file.

**Data Dictionary**

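|Variable|Description|
|-----|-----|
|material|Material type (plastic, glass, ferrous metal, non-ferrous metal, paper)|
|energy_saved_kwh|Energy saved, in kWh, by recycling 1 metric tonne of the material|
|crude_oil_saved_barrels|Crude oil saved, in barrels, by recycling 1 metric tonne of the material|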

## Table of Contents
1. Data Acquisition
2. Data Cleaning and Pre-processing
3. Data Exploration and Visualization
4. Conclusions

***

## 1. Data Acquisition

### 1.1 Import Libraries

In [1]:
import pandas as pd
import re

import requests
from bs4 import BeautifulSoup

from sqlalchemy import create_engine
import psycopg2

### 1.2 2003-2016: Import Data from `.csv`

In [2]:
# 2003 - 2016
df_03to16 = pd.read_csv('data/waste-and-recycling-statistics-2003-to-2016.csv')

In [3]:
df_03to16.head()

Unnamed: 0,waste_type,waste_disposed_of_tonne,total_waste_recycled_tonne,total_waste_generated_tonne,recycling_rate,year
0,Food,679900,111100.0,791000,0.14,2016
1,Paper/Cardboard,576000,607100.0,1183100,0.51,2016
2,Plastics,762700,59500.0,822200,0.07,2016
3,C&D,9700,1585700.0,1595400,0.99,2016
4,Horticultural waste,111500,209000.0,320500,0.65,2016


### 1.3 2017-2021: Import Data from `.xlsx`

In [4]:
# 2017-2021
sheets = ['2017', '2018', '2019', '2020', '2021']

df_17to21_list = []
for sheet in sheets:
    df = pd.read_excel('data/waste-and-recycling-statistics-2017-to-2021.xlsx', sheet_name=sheet)
    df = df.rename(columns=df.iloc[0]).loc[1:]
    df['year'] = sheet
    df_17to21_list.append(df)
    
df_17to21 = pd.concat(df_17to21_list, axis=0)

In [5]:
df_17to21.head()

Unnamed: 0,Waste Type,Total Generated\n('000 tonnes),Total Recycled\n('000 tonnes),Recycling Rate,Total Disposed\n('000 tonnes),year
1,C&D,1609,1600,99%,9,2017
2,Ferrous metal,1379,1371,99%,8,2017
3,Paper/Cardboard,1145,569,50%,576,2017
4,Plastics,815,52,6%,763,2017
5,Food,810,133,16%,677,2017
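
The header-promotion step used in the loop above (`df.rename(columns=df.iloc[0]).loc[1:]`) can be seen in isolation on a small frame. This is a minimal sketch: the toy frame is a trimmed stand-in for one sheet, and `promote_first_row` is a hypothetical helper name, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for one raw sheet: the real header sits in the first data row.
raw = pd.DataFrame(
    [
        ["Waste Type", "Total Generated", "Total Recycled"],
        ["C&D", 1609, 1600],
        ["Plastics", 815, 52],
    ]
)


def promote_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Use the first row as column labels and drop it from the data."""
    promoted = df.rename(columns=df.iloc[0]).loc[1:]
    return promoted.reset_index(drop=True)


sheet = promote_first_row(raw)
sheet["year"] = 2017  # stored as an integer rather than the sheet-name string
```

Storing `year` as an integer at this point would avoid mixing strings and integers once the yearly frames are concatenated.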


### 1.4 2022: Scrape Data with BeautifulSoup

In [6]:
# 2022
url = 'https://www.nea.gov.sg/our-services/waste-management/waste-statistics-and-overall-recycling'
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail here on a bad response instead of a NameError below

soup = BeautifulSoup(response.content, "html.parser")
table = soup.find('table')
# one list of cell texts per table row
data = [[cell.text for cell in row.find_all('td')] for row in table.find_all('tr')]

df_22 = pd.DataFrame(data)

In [7]:
df_22.head()

Unnamed: 0,0,1,2,3,4
0,,,,,
1,Ferrous metal,1338.0,1331.0,99%,7.0
2,Paper/Cardboard,1064.0,394.0,37%,671.0
3,Construction & Demolition,1424.0,1419.0,99%,5.0
4,Plastics,1001.0,57.0,6%,944.0
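
The row-by-row table extraction above can be tried offline on a small HTML string. This is a sketch under stated assumptions: the fragment below is hypothetical and not the real NEA markup, and the short column names are chosen only for illustration. It also shows one plausible reason for an empty first row: a header row built from `<th>` cells contains no `<td>` cells.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the scraped page.
html = """
<table>
  <tr><th>Waste Type</th><th>Generated</th><th>Recycled</th></tr>
  <tr><td>Ferrous metal</td><td>1,338</td><td>1,331</td></tr>
  <tr><td>Plastics</td><td>1,001</td><td>57</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# One list of stripped cell strings per row; the <th> header row yields [].
rows = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in table.find_all("tr")
]
rows = [row for row in rows if row]  # drop rows that had no <td> cells

scraped = pd.DataFrame(rows, columns=["waste_type", "generated", "recycled"])
```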


### 1.5 Import Energy Conversion

In [8]:
energy_saved = pd.read_csv('data/energy_saved.csv')

In [9]:
energy_saved.head()

Unnamed: 0,The table gives the amount of energy saved in kilowatt hour (kWh) and the amount of crude oil (barrels) by recycling 1 metric tonne (1000 kilogram) per waste type,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5
0,1 barrel oil is approximately 159 litres of oil,,,,,
1,,,,,,
2,material,Plastic,Glass,Ferrous Metal,Non-Ferrous Metal,Paper
3,energy_saved,5774 Kwh,42 Kwh,642 Kwh,14000 Kwh,4100 kWh
4,crude_oil saved,16 barrels,0.12 barrels,1.8 barrels,40 barrels,11 barrels


***

## 2. Data Cleaning and Pre-processing

### 2.1 Cleaning `df_03to16`

In [10]:
df_03to16.head()

Unnamed: 0,waste_type,waste_disposed_of_tonne,total_waste_recycled_tonne,total_waste_generated_tonne,recycling_rate,year
0,Food,679900,111100.0,791000,0.14,2016
1,Paper/Cardboard,576000,607100.0,1183100,0.51,2016
2,Plastics,762700,59500.0,822200,0.07,2016
3,C&D,9700,1585700.0,1595400,0.99,2016
4,Horticultural waste,111500,209000.0,320500,0.65,2016


In [11]:
df_03to16.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 6 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   waste_type                   225 non-null    object 
 1   waste_disposed_of_tonne      225 non-null    int64  
 2   total_waste_recycled_tonne   225 non-null    float64
 3   total_waste_generated_tonne  225 non-null    int64  
 4   recycling_rate               225 non-null    float64
 5   year                         225 non-null    int64  
dtypes: float64(2), int64(3), object(1)
memory usage: 10.7+ KB


In [12]:
# change data types waste_disposed_of_tonne,total_waste_generated_tonne to float
dtype= {'waste_disposed_of_tonne': 'float64', 
        'total_waste_generated_tonne': 'float64'}

df_03to16 = df_03to16.astype(dtype)

In [13]:
# reorder columns
df_03to16 = df_03to16[['waste_type',
                       'total_waste_generated_tonne',
                       'total_waste_recycled_tonne',
                       'recycling_rate',
                       'waste_disposed_of_tonne',
                       'year']]

In [14]:
# check update
df_03to16.reset_index(drop=True).head(1)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,Food,791000.0,111100.0,0.14,679900.0,2016


### 2.2 Cleaning `df_17to21`

In [15]:
df_17to21.head()

Unnamed: 0,Waste Type,Total Generated\n('000 tonnes),Total Recycled\n('000 tonnes),Recycling Rate,Total Disposed\n('000 tonnes),year
1,C&D,1609,1600,99%,9,2017
2,Ferrous metal,1379,1371,99%,8,2017
3,Paper/Cardboard,1145,569,50%,576,2017
4,Plastics,815,52,6%,763,2017
5,Food,810,133,16%,677,2017


In [16]:
df_17to21.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 75 entries, 1 to 15
Data columns (total 6 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   Waste Type                     75 non-null     object
 1   Total Generated
('000 tonnes)  75 non-null     object
 2   Total Recycled
('000 tonnes)   75 non-null     object
 3   Recycling Rate                 75 non-null     object
 4   Total Disposed
('000 tonnes)   75 non-null     object
 5   year                           75 non-null     object
dtypes: object(6)
memory usage: 4.1+ KB


In [17]:
# rename columns by position so they match df_03to16
col_list = df_03to16.columns.tolist()
for idx,col in enumerate(col_list):
    df_17to21 = df_17to21.rename(columns={df_17to21.columns[idx]:col})

In [18]:
df_17to21.head(1)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
1,C&D,1609,1600,99%,9,2017
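
The positional rename above can also be written as a single `rename` call by pairing old and new names with `zip`. This is a minimal sketch on a toy frame; `target_cols` is a shortened, illustrative version of the real column list.

```python
import pandas as pd

target_cols = [
    "waste_type",
    "total_waste_generated_tonne",
    "total_waste_recycled_tonne",
]

toy = pd.DataFrame(
    [["C&D", 1609, 1600]],
    columns=[
        "Waste Type",
        "Total Generated\n('000 tonnes)",
        "Total Recycled\n('000 tonnes)",
    ],
)

# Pair old and new names by position, then rename in one call.
renamed = toy.rename(columns=dict(zip(toy.columns, target_cols)))
```

Because `zip` stops at the shorter input, this relies on both frames listing their columns in the same order, which is the same assumption the loop makes.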


In [19]:
# remove punctuation (e.g. commas, %) from the numeric columns; note that this pattern would also strip decimal points
cols = ['total_waste_generated_tonne','total_waste_recycled_tonne','recycling_rate','waste_disposed_of_tonne']
df_17to21[cols] = df_17to21[cols].replace(r'[^\w\s]', '', regex=True)

In [20]:
# update data types (year was assigned from the sheet name, so it is still a string)
dtype = {'total_waste_generated_tonne':'float64', 'total_waste_recycled_tonne':'float64', 'waste_disposed_of_tonne':'float64',
        'recycling_rate':'float64', 'year':'int64'}
df_17to21 = df_17to21.astype(dtype)

In [21]:
cols = ['total_waste_generated_tonne','total_waste_recycled_tonne','waste_disposed_of_tonne']
df_17to21[cols] = df_17to21[cols] * 1000
df_17to21['recycling_rate'] = df_17to21['recycling_rate'] / 100

In [22]:
df_17to21.reset_index(drop=True).head(1)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,C&D,1609000.0,1600000.0,0.99,9000.0,2017
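
The unit handling in this section (strip punctuation, cast to float, scale '000 tonnes to tonnes, turn percentages into fractions) can be sketched on a toy frame. The comma in `"1,600"` is an assumed formatting for illustration, and the narrower pattern `[,%]` is a swapped-in alternative to the notebook's `[^\w\s]`, chosen so that decimal points would survive.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "total_waste_recycled_tonne": ["1,600", "52"],
        "recycling_rate": ["99%", "6%"],
    }
)

# Strip only commas and percent signs, then cast the remaining text to float.
cleaned = toy.replace(r"[,%]", "", regex=True).astype("float64")

cleaned["total_waste_recycled_tonne"] *= 1000  # '000 tonnes -> tonnes
cleaned["recycling_rate"] /= 100               # percent -> fraction
```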


### 2.3 Cleaning `df_22`

In [23]:
df_22.head()

Unnamed: 0,0,1,2,3,4
0,,,,,
1,Ferrous metal,1338.0,1331.0,99%,7.0
2,Paper/Cardboard,1064.0,394.0,37%,671.0
3,Construction & Demolition,1424.0,1419.0,99%,5.0
4,Plastics,1001.0,57.0,6%,944.0


In [24]:
df_22.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       15 non-null     object
 1   1       15 non-null     object
 2   2       15 non-null     object
 3   3       15 non-null     object
 4   4       15 non-null     object
dtypes: object(5)
memory usage: 768.0+ bytes


In [25]:
df_22['year'] = 2022

In [26]:
# rename columns by position so they match df_03to16
col_list = df_03to16.columns.tolist()
for idx,col in enumerate(col_list):
    df_22 = df_22.rename(columns={df_22.columns[idx]:col})

In [27]:
df_22.head(2)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,,,,,,2022
1,Ferrous metal,1338.0,1331.0,99%,7.0,2022


In [28]:
# drop the empty first row; copy so later assignments do not act on a slice
df_22 = df_22.loc[1:].copy()

In [29]:
# remove punctuation (e.g. commas, %) from the numeric columns; note that this pattern would also strip decimal points
cols = ['total_waste_generated_tonne','total_waste_recycled_tonne','recycling_rate','waste_disposed_of_tonne']
df_22[cols] = df_22[cols].replace(r'[^\w\s]', '', regex=True)

In [30]:
# update data types
dtype = {'total_waste_generated_tonne':'float64', 'total_waste_recycled_tonne':'float64', 'waste_disposed_of_tonne':'float64'}
df_22 = df_22.astype(dtype)
df_22['recycling_rate'] = pd.to_numeric(df_22['recycling_rate'],errors='coerce')

In [31]:
cols = ['total_waste_generated_tonne','total_waste_recycled_tonne','waste_disposed_of_tonne']
df_22[cols] = df_22[cols] * 1000
df_22['recycling_rate'] = df_22['recycling_rate'] / 100

In [32]:
df_22.reset_index(drop=True).head(1)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,Ferrous metal,1338000.0,1331000.0,0.99,7000.0,2022


### 2.4 Putting it all together

In [33]:
df0 = pd.concat([df_03to16, df_17to21, df_22], ignore_index=True)

In [34]:
df0.head()

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,Food,791000.0,111100.0,0.14,679900.0,2016
1,Paper/Cardboard,1183100.0,607100.0,0.51,576000.0,2016
2,Plastics,822200.0,59500.0,0.07,762700.0,2016
3,C&D,1595400.0,1585700.0,0.99,9700.0,2016
4,Horticultural waste,320500.0,209000.0,0.65,111500.0,2016


In [35]:
# strip punctuation, split words at capital letters, collapse whitespace, and lower-case
df0['waste_type'] = df0['waste_type'].str.replace(r'[^A-Za-z0-9\s]+', '', regex=True) \
                                     .apply(lambda x: ' '.join((' '.join(re.findall('[a-zA-Z][^A-Z]*', x))).split())) \
                                     .str.lower()


In [36]:
df0['waste_type'].value_counts()

paper cardboard                      21
used slag                            21
glass                                21
textile leather                      21
scrap tyres                          21
plastics                             20
ferrous metal                        17
horticultural waste                  15
total                                15
others stones ceramics rubber etc    13
nonferrous metals                    12
construction debris                  12
food waste                           11
sludge                               11
wood timber                          11
food                                 10
wood                                 10
ash sludge                            9
nonferrous metal                      9
c d                                   6
horticultural                         6
others stones ceramics etc            6
overall                               6
ferrous metals                        4
construction demolition c d           2
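
The label normalization applied to `waste_type` above can be traced on individual strings, which helps explain entries such as `c d` in the counts. This is a sketch of the same three steps as a plain function; `normalize_label` is a hypothetical helper name.

```python
import re


def normalize_label(label: str) -> str:
    """Mirror the three chained steps applied to df0['waste_type']."""
    # 1. Drop everything except letters, digits, and whitespace.
    stripped = re.sub(r"[^A-Za-z0-9\s]+", "", label)
    # 2. Start a new chunk at each capital letter, then collapse repeated spaces.
    pieces = re.findall(r"[a-zA-Z][^A-Z]*", stripped)
    collapsed = " ".join(" ".join(pieces).split())
    # 3. Lower-case the result.
    return collapsed.lower()


examples = ["Paper/Cardboard", "C&D", "Non-ferrous metals"]
normalized = {label: normalize_label(label) for label in examples}
# "C&D" loses the ampersand and is then split at the capital "D", giving "c d".
```

The step normalizes spelling and case but does not merge synonyms, so `ferrous metal` and `ferrous metals` remain separate categories.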


In [37]:
# TODO: standardize the waste_type labels and
# keep only paper/plastic/glass/metals (the line below is a placeholder no-op)
df0 = df0
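
One way the placeholder step above could be filled in is to map the cleaned labels onto the material names used in the conversion table and drop everything else. This is a sketch only: `MATERIAL_MAP` and its groupings are assumptions made for illustration, not a mapping taken from the source.

```python
import pandas as pd

# Assumed mapping from cleaned labels to conversion-table materials.
MATERIAL_MAP = {
    "paper cardboard": "Paper",
    "plastics": "Plastic",
    "glass": "Glass",
    "ferrous metal": "Ferrous Metal",
    "ferrous metals": "Ferrous Metal",
    "nonferrous metal": "Non-Ferrous Metal",
    "nonferrous metals": "Non-Ferrous Metal",
}

toy = pd.DataFrame(
    {
        "waste_type": ["plastics", "food", "ferrous metal"],
        "total_waste_recycled_tonne": [59500.0, 111100.0, 1331000.0],
    }
)

# Labels missing from the mapping become NaN and are dropped.
toy["material"] = toy["waste_type"].map(MATERIAL_MAP)
filtered = toy.dropna(subset=["material"]).reset_index(drop=True)
```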

In [38]:
# check duplicates
df0.duplicated().sum()

0

In [39]:
# check missing values
df0.isna().sum()

waste_type                     0
total_waste_generated_tonne    0
total_waste_recycled_tonne     0
recycling_rate                 1
waste_disposed_of_tonne        0
year                           0
dtype: int64

In [40]:
df0[df0['recycling_rate'].isna()]

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
313,others stones ceramics etc,249000.0,30000.0,,219000.0,2022


In [41]:
# fill the missing recycling rate: recycled / generated
df0.loc[313,'recycling_rate'] = round(df0.loc[313,'total_waste_recycled_tonne'] / 
                                      df0.loc[313,'total_waste_generated_tonne'],2)

In [42]:
df0.isna().sum()

waste_type                     0
total_waste_generated_tonne    0
total_waste_recycled_tonne     0
recycling_rate                 0
waste_disposed_of_tonne        0
year                           0
dtype: int64
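
The fill above targets row 313 by index. A vectorized version that recomputes the rate and uses it only where the value is missing might look like the sketch below; the toy rows reuse figures shown earlier in this notebook.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "total_waste_generated_tonne": [249000.0, 791000.0],
        "total_waste_recycled_tonne": [30000.0, 111100.0],
        "recycling_rate": [float("nan"), 0.14],
    }
)

# Recompute the rate for every row, then use it only where the value is missing.
computed = (
    toy["total_waste_recycled_tonne"] / toy["total_waste_generated_tonne"]
).round(2)
toy["recycling_rate"] = toy["recycling_rate"].fillna(computed)
```

This avoids hard-coding the row label, which would change if the concatenation order or the source data changed.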

### 2.5 Cleaning `energy_saved` 

In [43]:
energy_saved.head()

Unnamed: 0,The table gives the amount of energy saved in kilowatt hour (kWh) and the amount of crude oil (barrels) by recycling 1 metric tonne (1000 kilogram) per waste type,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5
0,1 barrel oil is approximately 159 litres of oil,,,,,
1,,,,,,
2,material,Plastic,Glass,Ferrous Metal,Non-Ferrous Metal,Paper
3,energy_saved,5774 Kwh,42 Kwh,642 Kwh,14000 Kwh,4100 kWh
4,crude_oil saved,16 barrels,0.12 barrels,1.8 barrels,40 barrels,11 barrels


In [44]:
conversion = energy_saved.T.iloc[1:, 2:] \
                         .reset_index(drop=True) \
                         .rename(columns={2: "material", 
                                          3: "energy_saved_kwh", 
                                          4: "crude_oil_saved_barrels"})
conversion

Unnamed: 0,material,energy_saved_kwh,crude_oil_saved_barrels
0,Plastic,5774 Kwh,16 barrels
1,Glass,42 Kwh,0.12 barrels
2,Ferrous Metal,642 Kwh,1.8 barrels
3,Non-Ferrous Metal,14000 Kwh,40 barrels
4,Paper,4100 kWh,11 barrels


In [45]:
# strip the unit labels (Kwh, kWh, barrels) from the values
cols = ['energy_saved_kwh','crude_oil_saved_barrels']
conversion[cols] = conversion[cols].replace(r'[A-Za-z]*', '', regex=True)

In [46]:
conversion

Unnamed: 0,material,energy_saved_kwh,crude_oil_saved_barrels
0,Plastic,5774,16.0
1,Glass,42,0.12
2,Ferrous Metal,642,1.8
3,Non-Ferrous Metal,14000,40.0
4,Paper,4100,11.0
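
Removing the letters leaves the values as text, possibly with trailing spaces, so they cannot yet be multiplied by tonnages. A minimal sketch of also converting them to numbers is shown below on a toy copy of three rows; `conversion_toy` is a hypothetical name used to keep the example self-contained.

```python
import pandas as pd

conversion_toy = pd.DataFrame(
    {
        "material": ["Plastic", "Glass", "Paper"],
        "energy_saved_kwh": ["5774 Kwh", "42 Kwh", "4100 kWh"],
        "crude_oil_saved_barrels": ["16 barrels", "0.12 barrels", "11 barrels"],
    }
)

cols = ["energy_saved_kwh", "crude_oil_saved_barrels"]
# Remove letters and whitespace, then convert the remaining text to numbers.
conversion_toy[cols] = (
    conversion_toy[cols]
    .replace(r"[A-Za-z\s]+", "", regex=True)
    .apply(pd.to_numeric)
)
```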


### 2.6 Load Tables Into PostgreSQL Database

In [47]:
%load_ext sql

In [48]:
try:
    connection = psycopg2.connect(
        user="postgres",
        password="password",
        host="127.0.0.1",
        port="5432",
        database="testdb")
    cursor = connection.cursor()
    print("Connected to the database successfully")
except psycopg2.Error as error:
    print("Error while connecting to PostgreSQL", error)

Connected to the database successfully


In [49]:
engine = create_engine('postgresql+psycopg2://postgres:password@localhost:5432/testdb')
conn = engine.connect()
df0.to_sql('recycling', con=conn, if_exists='replace', index=False)
conversion.to_sql('conversion', con=conn, if_exists='replace', index=False)

5
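
A quick way to confirm that `to_sql` wrote what was expected is to read the table back and compare it with the frame. The sketch below uses an in-memory SQLite database purely as a stand-in so that it runs without a PostgreSQL server; the table name and toy rows are illustrative.

```python
import sqlite3

import pandas as pd

toy = pd.DataFrame(
    {"material": ["Plastic", "Glass"], "energy_saved_kwh": [5774, 42]}
)

# In-memory SQLite is only a stand-in for the PostgreSQL database used above.
con = sqlite3.connect(":memory:")
try:
    toy.to_sql("conversion", con=con, if_exists="replace", index=False)
    loaded = pd.read_sql_query(
        "SELECT * FROM conversion ORDER BY energy_saved_kwh", con
    )
finally:
    con.close()
```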

In [50]:
# connect to database
%sql postgresql+psycopg2://postgres:password@localhost:5432/testdb

'Connected: postgres@testdb'

In [51]:
%%sql
-- check the recycling table

SELECT * FROM recycling
LIMIT 1;

 * postgresql+psycopg2://postgres:***@localhost:5432/testdb
1 rows affected.


waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
food,791000.0,111100.0,0.14,679900.0,2016


In [52]:
%%sql
-- check the conversion table

SELECT * FROM conversion
LIMIT 1;

 * postgresql+psycopg2://postgres:***@localhost:5432/testdb
1 rows affected.


material,energy_saved_kwh,crude_oil_saved_barrels
Plastic,5774,16


***

## 3. Data Exploration and Visualization
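
As a starting point for the first project question, the energy saved can be estimated as recycled tonnes multiplied by the kWh saved per tonne, summed per year. The sketch below is illustrative only: it uses two 2016 rows from the recycling data, assumes they have already been mapped to conversion-table materials (treating Paper/Cardboard as Paper is an assumption), and all variable names are hypothetical.

```python
import pandas as pd

# Two 2016 rows, assumed to be already mapped to conversion-table materials.
recycling_toy = pd.DataFrame(
    {
        "year": [2016, 2016],
        "material": ["Plastic", "Paper"],
        "total_waste_recycled_tonne": [59500.0, 607100.0],
    }
)
conversion_toy = pd.DataFrame(
    {"material": ["Plastic", "Paper"], "energy_saved_kwh": [5774, 4100]}
)

# Attach the per-tonne saving to each row, then multiply by tonnes recycled.
merged = recycling_toy.merge(conversion_toy, on="material", how="inner")
merged["total_energy_saved_kwh"] = (
    merged["total_waste_recycled_tonne"] * merged["energy_saved_kwh"]
)

# Sum across materials to get one figure per year.
energy_by_year = merged.groupby("year")["total_energy_saved_kwh"].sum()
```

The resulting figure covers only the two materials in the toy frame, so it is not a full-year total.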

## 4. Conclusions

***