# Annual Energy Savings from Recycled Materials in Singapore

## Project Goals
The goal of this project is to analyze the total garbage collection and recycling rate in Singapore, and to determine the amount of energy saved from recycling.

In this analysis, we will answer questions such as:
1. How much energy was saved per year? In which year was this amount the highest? The lowest? 
2. What is the trend for recycled energy savings in Singapore from 2003 to 2022?
3. What is the greatest source of recycled energy savings in 2022 and how has this changed over time?

For more information about how recycling can save energy, please refer here: https://greentumble.com/how-does-recycling-save-energy

## Data
- Recycled energy data for 2003 to 2016 is taken from a CSV file in the reference project for this analysis, [kingabzpro](https://github.com/kingabzpro/Annual-Recycled-Energy-Saved-in-Singapore/tree/main/Data)
- Recycled energy data for 2017 to 2021 is taken from the Waste and Recycling Statistics [document](https://www.nea.gov.sg/docs/default-source/default-document-library/waste-and-recycling-statistics-2017-to-2021.pdf) on the NEA website. The data has been extracted to an Excel file.
- Recycled energy data for 2022 is taken from the [Waste Statistics and Overall Recycling NEA webpage](https://www.nea.gov.sg/our-services/waste-management/waste-statistics-and-overall-recycling)

**Data Dictionary**

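|Variable|Description|
|-----|-----|
|waste_type|Category of waste (e.g. food, plastic, ferrous metal)|
|total_waste_generated_tonne|Total waste generated, in tonnes|
|total_waste_recycled_tonne|Total waste recycled, in tonnes|
|recycling_rate|Fraction of generated waste that was recycled (0 to 1)|
|waste_disposed_of_tonne|Total waste disposed of, in tonnes|
|year|Year of the record (2003 to 2022)|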

## Table of Contents
1. Data Acquisition
2. Data Cleaning and Pre-processing
3. Data Exploration and Visualization
4. Conclusions

***

## 1. Data Acquisition

#### Import Libraries

In [177]:
import pandas as pd
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

import requests
from bs4 import BeautifulSoup

import sqlite3
from sqlalchemy import create_engine
import psycopg2

In [176]:
pip install psycopg2-binary

Collecting psycopg2-binary
  Downloading psycopg2_binary-2.9.7-cp39-cp39-win_amd64.whl (1.2 MB)
Installing collected packages: psycopg2-binary
Successfully installed psycopg2-binary-2.9.7
Note: you may need to restart the kernel to use updated packages.


#### 2003-2016: Import Data from `.csv`

In [13]:
# 2003 - 2016
df_03to16 = pd.read_csv('data/waste-and-recycling-statistics-2003-to-2016.csv')

In [14]:
df_03to16.head()

Unnamed: 0,waste_type,waste_disposed_of_tonne,total_waste_recycled_tonne,total_waste_generated_tonne,recycling_rate,year
0,Food,679900,111100.0,791000,0.14,2016
1,Paper/Cardboard,576000,607100.0,1183100,0.51,2016
2,Plastics,762700,59500.0,822200,0.07,2016
3,C&D,9700,1585700.0,1595400,0.99,2016
4,Horticultural waste,111500,209000.0,320500,0.65,2016


#### 2017-2021: Import Data from `.xlsx`

In [15]:
# 2017-2021
sheets = ['2017', '2018', '2019', '2020', '2021']

df_17to21_list = []
for sheet in sheets:
    df = pd.read_excel('data/waste-and-recycling-statistics-2017-to-2021.xlsx', sheet_name=sheet)
    df = df.rename(columns=df.iloc[0]).loc[1:]
    df['year'] = sheet
    df_17to21_list.append(df)
    
df_17to21 = pd.concat(df_17to21_list, axis=0)

In [16]:
df_17to21.head()

Unnamed: 0,Waste Type,Total Generated\n('000 tonnes),Total Recycled\n('000 tonnes),Recycling Rate,Total Disposed\n('000 tonnes),year
1,C&D,1609,1600,99%,9,2017
2,Ferrous metal,1379,1371,99%,8,2017
3,Paper/Cardboard,1145,569,50%,576,2017
4,Plastics,815,52,6%,763,2017
5,Food,810,133,16%,677,2017


#### 2022: Scrape Data with BeautifulSoup

In [56]:
# 2022
url = 'https://www.nea.gov.sg/our-services/waste-management/waste-statistics-and-overall-recycling'
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early instead of leaving `data` undefined

soup = BeautifulSoup(response.content, "html.parser")
table = soup.find('table')
# one list of cell texts per table row; rows without <td> cells become empty rows
data = [[cell.text for cell in row.find_all('td')] for row in table.find_all('tr')]

df_22 = pd.DataFrame(data)
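
The scraping step can be tried offline first. The sketch below parses a small hand-written HTML fragment instead of the live NEA page. The fragment, its header names, and the comma-formatted numbers are assumptions for illustration; the two data rows reuse figures shown in the 2022 output. `df_demo` is a hypothetical name.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the NEA page; the layout is an assumption.
html = """
<table>
  <tr><th>Waste Type</th><th>Generated</th><th>Recycled</th><th>Rate</th><th>Disposed</th></tr>
  <tr><td>Ferrous metal</td><td>1,338</td><td>1,331</td><td>99%</td><td>7</td></tr>
  <tr><td>Plastics</td><td>1,001</td><td>57</td><td>6%</td><td>944</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # header rows only contain <th>, so they yield no <td> cells
        rows.append(cells)

df_demo = pd.DataFrame(rows)
print(df_demo)
```

Skipping rows without `<td>` cells at parse time avoids the empty first row that otherwise has to be dropped during cleaning.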

In [57]:
df_22

Unnamed: 0,0,1,2,3,4
0,,,,,
1,Ferrous metal,1338.0,1331.0,99%,7.0
2,Paper/Cardboard,1064.0,394.0,37%,671.0
3,Construction & Demolition,1424.0,1419.0,99%,5.0
4,Plastics,1001.0,57.0,6%,944.0
5,Food,813.0,146.0,18%,667.0
6,Horticultural,221.0,188.0,85%,32.0
7,Wood,419.0,298.0,71%,121.0
8,Ash & sludge,241.0,27.0,11%,213.0
9,Textile/Leather,254.0,5.0,2%,249.0


***

## 2. Data Cleaning and Pre-processing

### Cleaning `df_03to16`

In [19]:
df_03to16.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 6 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   waste_type                   225 non-null    object 
 1   waste_disposed_of_tonne      225 non-null    int64  
 2   total_waste_recycled_tonne   225 non-null    float64
 3   total_waste_generated_tonne  225 non-null    int64  
 4   recycling_rate               225 non-null    float64
 5   year                         225 non-null    int64  
dtypes: float64(2), int64(3), object(1)
memory usage: 10.7+ KB


In [20]:
# change data types waste_disposed_of_tonne,total_waste_generated_tonne to float
dtype= {'waste_disposed_of_tonne': 'float64', 
        'total_waste_generated_tonne': 'float64'}

df_03to16 = df_03to16.astype(dtype)

In [22]:
# reorder columns
df_03to16 = df_03to16[['waste_type',
                       'total_waste_generated_tonne',
                       'total_waste_recycled_tonne',
                       'recycling_rate',
                       'waste_disposed_of_tonne',
                       'year']]

In [23]:
# check update
df_03to16.reset_index(drop=True).head(1)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,Food,791000.0,111100.0,0.14,679900.0,2016


### Cleaning `df_17to21`

In [24]:
df_17to21.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 75 entries, 1 to 15
Data columns (total 6 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   Waste Type                     75 non-null     object
 1   Total Generated
('000 tonnes)  75 non-null     object
 2   Total Recycled
('000 tonnes)   75 non-null     object
 3   Recycling Rate                 75 non-null     object
 4   Total Disposed
('000 tonnes)   75 non-null     object
 5   year                           75 non-null     object
dtypes: object(6)
memory usage: 4.1+ KB


In [25]:
# rename columns by position so they match df_03to16
col_list = df_03to16.columns.tolist()
for idx,col in enumerate(col_list):
    df_17to21 = df_17to21.rename(columns={df_17to21.columns[idx]:col})
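
Positional renaming can also be done in one assignment. This is a minimal sketch rather than the notebook's own code: `raw`, `renamed`, and `target_cols` are hypothetical names, and the single row mirrors the 2017 output shown earlier. The length check guards against silently mislabelling columns if a sheet layout changes.

```python
import pandas as pd

target_cols = [
    "waste_type",
    "total_waste_generated_tonne",
    "total_waste_recycled_tonne",
    "recycling_rate",
    "waste_disposed_of_tonne",
    "year",
]

# One row shaped like the 2017 Excel sheet (column names as read by pandas).
raw = pd.DataFrame(
    [["C&D", "1609", "1600", "99%", "9", "2017"]],
    columns=[
        "Waste Type",
        "Total Generated\n('000 tonnes)",
        "Total Recycled\n('000 tonnes)",
        "Recycling Rate",
        "Total Disposed\n('000 tonnes)",
        "year",
    ],
)

# Positional renaming is only safe when both frames have the same column count.
if len(raw.columns) != len(target_cols):
    raise ValueError("column count mismatch; check the sheet layout")

renamed = raw.copy()
renamed.columns = target_cols
print(renamed.columns.tolist())
```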

In [26]:
df_17to21.head(1)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
1,C&D,1609,1600,99%,9,2017


In [27]:
# remove special characters from columns (comma, %)
cols = ['total_waste_generated_tonne','total_waste_recycled_tonne','recycling_rate','waste_disposed_of_tonne']
df_17to21[cols] = df_17to21[cols].replace(r'[^\w\s]', '', regex=True)

In [28]:
# update data types (year was filled from the sheet name, so it is a string until cast)
dtype = {'total_waste_generated_tonne':'float64', 'total_waste_recycled_tonne':'float64', 'waste_disposed_of_tonne':'float64',
        'recycling_rate':'float64', 'year':'int64'}
df_17to21 = df_17to21.astype(dtype)

In [29]:
cols = ['total_waste_generated_tonne','total_waste_recycled_tonne','waste_disposed_of_tonne']
df_17to21[cols] = df_17to21[cols] * 1000
df_17to21['recycling_rate'] = df_17to21['recycling_rate'] / 100
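
The strip, cast, and rescale steps above can be wrapped in one helper. The sketch below is an illustration under assumptions: `to_number` and `demo` are hypothetical names, and the sample strings are assumed formats. It removes only commas and percent signs, whereas the `[^\w\s]` pattern used above would also strip a decimal point if one ever appeared.

```python
import pandas as pd

def to_number(series: pd.Series) -> pd.Series:
    """Strip thousands separators and percent signs, then convert to float."""
    cleaned = series.astype(str).str.replace(r"[,%]", "", regex=True).str.strip()
    # Anything that still is not numeric becomes NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")

demo = pd.DataFrame({
    "total_waste_generated_tonne": ["1,609", "815"],  # in '000 tonnes
    "recycling_rate": ["99%", "6%"],
})

# '000 tonnes -> tonnes, percent -> fraction
demo["total_waste_generated_tonne"] = to_number(demo["total_waste_generated_tonne"]) * 1000
demo["recycling_rate"] = to_number(demo["recycling_rate"]) / 100
print(demo)
```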

In [30]:
df_17to21.reset_index(drop=True).head(1)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,C&D,1609000.0,1600000.0,0.99,9000.0,2017


### Cleaning `df_22`

In [58]:
df_22.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       15 non-null     object
 1   1       15 non-null     object
 2   2       15 non-null     object
 3   3       15 non-null     object
 4   4       15 non-null     object
dtypes: object(5)
memory usage: 768.0+ bytes


In [59]:
df_22['year'] = 2022

In [60]:
# rename columns by position so they match df_03to16
col_list = df_03to16.columns.tolist()
for idx,col in enumerate(col_list):
    df_22 = df_22.rename(columns={df_22.columns[idx]:col})

In [61]:
df_22.head(2)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,,,,,,2022
1,Ferrous metal,1338.0,1331.0,99%,7.0,2022


In [62]:
# drop the first (empty) row; .copy() avoids a SettingWithCopyWarning in later assignments
df_22 = df_22.loc[1:].copy()

In [63]:
# remove special characters from columns (comma, %)
cols = ['total_waste_generated_tonne','total_waste_recycled_tonne','recycling_rate','waste_disposed_of_tonne']
df_22[cols] = df_22[cols].replace(r'[^\w\s]', '', regex=True)

In [64]:
# update data types
dtype = {'total_waste_generated_tonne':'float64', 'total_waste_recycled_tonne':'float64', 'waste_disposed_of_tonne':'float64'}
df_22 = df_22.astype(dtype)
df_22['recycling_rate'] = pd.to_numeric(df_22['recycling_rate'],errors='coerce')

In [65]:
cols = ['total_waste_generated_tonne','total_waste_recycled_tonne','waste_disposed_of_tonne']
df_22[cols] = df_22[cols] * 1000
df_22['recycling_rate'] = df_22['recycling_rate'] / 100

In [66]:
df_22.reset_index(drop=True).head(1)

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
0,Ferrous metal,1338000.0,1331000.0,0.99,7000.0,2022


### Putting it all together

In [152]:
df0 = pd.concat([df_03to16, df_17to21, df_22], ignore_index=True)

In [153]:
df0['waste_type'] = df0['waste_type'].str.replace(r'[^A-Za-z0-9\s]+', '', regex=True) \
                                     .apply(lambda x: ' '.join((' '.join(re.findall('[a-zA-Z][^A-Z]*', x))).split())) \
                                     .str.lower()


In [154]:
wnl = WordNetLemmatizer()
stop = stopwords.words('english')
df0['waste_type'] = df0['waste_type'].apply(word_tokenize) \
                                .apply(lambda row: ' '.join([str(wnl.lemmatize(word,pos='n')) 
                                                             for word in row if word not in stop]))

In [155]:
df0['waste_type'].value_counts()

scrap tyre                         21
plastic                            21
ferrous metal                      21
nonferrous metal                   21
used slag                          21
glass                              21
textile leather                    21
paper cardboard                    21
horticultural waste                15
total                              15
others stone ceramic rubber etc    14
construction debris                12
food waste                         11
sludge                             11
wood timber                        11
food                               10
ash sludge                         10
wood                               10
c                                   6
horticultural                       6
others stone ceramic etc            6
overall                             6
construction demolition c           2
others                              1
construction demolition             1
Name: waste_type, dtype: int64

In [156]:
df0['waste_type'] = df0['waste_type'].replace(['construction demolition c','construction debris','c'],
                                              'construction demolition')
df0['waste_type'] = df0['waste_type'].replace(['others','others stone ceramic etc'],
                                              'others stone ceramic rubber etc')
df0['waste_type'] = df0['waste_type'].str.replace('overall','total')
df0['waste_type'] = df0['waste_type'].str.replace('horticultural waste','horticultural')
df0['waste_type'] = df0['waste_type'].str.replace('wood timber','wood')
df0['waste_type'] = df0['waste_type'].str.replace('ash sludge','sludge')
df0['waste_type'] = df0['waste_type'].str.replace('food waste','food')
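
The label consolidation above could also be kept in a single lookup table. This is a sketch, not the notebook's method: `ALIASES` and `canonical` are hypothetical names, and the mapping simply restates the replacements listed above. It matches whole labels exactly, whereas `.str.replace` substitutes substrings.

```python
import re

# Alias -> canonical label, restating the replacements in the cell above.
ALIASES = {
    "construction demolition c": "construction demolition",
    "construction debris": "construction demolition",
    "c": "construction demolition",
    "others": "others stone ceramic rubber etc",
    "others stone ceramic etc": "others stone ceramic rubber etc",
    "overall": "total",
    "horticultural waste": "horticultural",
    "wood timber": "wood",
    "ash sludge": "sludge",
    "food waste": "food",
}

def canonical(label: str) -> str:
    """Lower-case, collapse whitespace, then map known aliases to one name."""
    key = re.sub(r"\s+", " ", label.strip().lower())
    return ALIASES.get(key, key)

print(canonical("food waste"), "|", canonical("Wood  Timber"), "|", canonical("glass"))
```

With a pandas column this would be applied as `df0['waste_type'].map(canonical)`.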

In [157]:
df0['waste_type'].value_counts()

food                               21
paper cardboard                    21
plastic                            21
construction demolition            21
horticultural                      21
wood                               21
ferrous metal                      21
nonferrous metal                   21
used slag                          21
sludge                             21
glass                              21
textile leather                    21
scrap tyre                         21
others stone ceramic rubber etc    21
total                              21
Name: waste_type, dtype: int64

In [158]:
df0.duplicated().sum()

0

In [159]:
df0.isna().sum()

waste_type                     0
total_waste_generated_tonne    0
total_waste_recycled_tonne     0
recycling_rate                 1
waste_disposed_of_tonne        0
year                           0
dtype: int64

In [160]:
df0[df0['recycling_rate'].isna()]

Unnamed: 0,waste_type,total_waste_generated_tonne,total_waste_recycled_tonne,recycling_rate,waste_disposed_of_tonne,year
313,others stone ceramic rubber etc,249000.0,30000.0,,219000.0,2022


In [161]:
# fill the missing recycling rate as recycled / generated
df0.loc[313,'recycling_rate'] = round(df0.loc[313,'total_waste_recycled_tonne'] / 
                                      df0.loc[313,'total_waste_generated_tonne'],2)
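
The same formula can fill every missing rate at once and double as a consistency check on the published rates. The sketch below uses two rows copied from the outputs above. `check`, `computed`, and `mismatch` are hypothetical names, and the 0.01 tolerance is an assumption based on rates being published as whole percentages.

```python
import pandas as pd

check = pd.DataFrame({
    "total_waste_generated_tonne": [1609000.0, 249000.0],
    "total_waste_recycled_tonne": [1600000.0, 30000.0],
    "recycling_rate": [0.99, None],  # second rate is missing, as in row 313
})

# recycling rate = recycled / generated, rounded to two decimals
computed = (check["total_waste_recycled_tonne"]
            / check["total_waste_generated_tonne"]).round(2)

# Fill only the gaps; published values are left untouched.
check["recycling_rate"] = check["recycling_rate"].fillna(computed)

# Flag rows where the published rate disagrees with the recomputed one.
mismatch = (check["recycling_rate"] - computed).abs() > 0.01
print(check)
print("rows to review:", int(mismatch.sum()))
```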

In [162]:
df0.isna().sum()

waste_type                     0
total_waste_generated_tonne    0
total_waste_recycled_tonne     0
recycling_rate                 0
waste_disposed_of_tonne        0
year                           0
dtype: int64

***

## 3. Data Exploration and Visualization

In [171]:
# create new database
conn=sqlite3.connect('mydb.db')

In [172]:
# use pandas `.to_sql` to create a table 'recycling' from dataframe df0
df0.to_sql(name='recycling', con=conn, if_exists='replace', index=False)
conn.commit()
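
The SQLite table can also be queried through `pandas.read_sql_query`, which uses the standard-library `sqlite3` driver and needs no SQLAlchemy dialect. This is a minimal sketch: `conn_demo`, `sample`, and `result` are hypothetical names, the database is in-memory, and the three rows reuse rates shown in the outputs above.

```python
import sqlite3
import pandas as pd

# In-memory database so the sketch leaves no file behind.
conn_demo = sqlite3.connect(":memory:")

sample = pd.DataFrame({
    "waste_type": ["food", "plastic", "food"],
    "recycling_rate": [0.14, 0.07, 0.18],
    "year": [2016, 2016, 2022],
})
sample.to_sql("recycling", conn_demo, if_exists="replace", index=False)

# Parameterised query: the ? placeholder is filled from `params`.
result = pd.read_sql_query(
    "SELECT year, waste_type, recycling_rate "
    "FROM recycling WHERE waste_type = ? ORDER BY year",
    conn_demo,
    params=("food",),
)
conn_demo.close()
print(result)
```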

In [173]:
# connect to database
%load_ext sql
%sql sqlite:///mydb.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql
Traceback (most recent call last):
  File "C:\Users\elsf1\AppData\Roaming\Python\Python39\site-packages\sql\connection.py", line 45, in __init__
    engine = sqlalchemy.create_engine(
  File "<string>", line 2, in create_engine
  File "C:\Users\elsf1\anaconda3\lib\site-packages\sqlalchemy\util\deprecations.py", line 309, in warned
  File "C:\Users\elsf1\anaconda3\lib\site-packages\sqlalchemy\engine\create.py", line 534, in create_engine
  File "C:\Users\elsf1\anaconda3\lib\site-packages\sqlalchemy\engine\url.py", line 661, in _get_entrypoint
    self.database,
  File "C:\Users\elsf1\anaconda3\lib\site-packages\sqlalchemy\util\langhelpers.py", line 343, in load
    def __init__(
sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:sqlite

Connection info needed in SQLAlchemy format, example:
               postgresql://username:password@hostname/dbname
               or an existing connection: 

In [178]:
from sqlalchemy import create_engine
import psycopg2

try:
    connection = psycopg2.connect(
        user="postgres",
        password="password",
        host="127.0.0.1",
        port="5432",
        database="testdb"
    )
    cursor = connection.cursor()
    print("Connected to the database successfully")
except (Exception, psycopg2.Error) as error:
    print("Error while connecting to PostgreSQL", error)

Connected to the database successfully


In [181]:
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://postgres:password@localhost:5432/testdb')

# write the cleaned data to a 'recycling' table, the same name used for the SQLite table
df0.to_sql('recycling', engine, if_exists='replace', index=False)

Unexpected exception formatting exception. Falling back to standard exception


Traceback (most recent call last):
  File "C:\Users\elsf1\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3369, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\elsf1\AppData\Local\Temp\ipykernel_18504\2023597314.py", line 3, in <cell line: 3>
    engine = create_engine('postgresql+psycopg2://postgres:password@localhost:5432/testdb')
  File "<string>", line 2, in create_engine
  File "C:\Users\elsf1\anaconda3\lib\site-packages\sqlalchemy\util\deprecations.py", line 309, in warned
  File "C:\Users\elsf1\anaconda3\lib\site-packages\sqlalchemy\engine\create.py", line 534, in create_engine
  File "C:\Users\elsf1\anaconda3\lib\site-packages\sqlalchemy\engine\url.py", line 661, in _get_entrypoint
    self.database,
  File "C:\Users\elsf1\anaconda3\lib\site-packages\sqlalchemy\util\langhelpers.py", line 343, in load
    def __init__(
sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:postgresql.psycopg2

During handli

In [None]:
# start querying!

In [None]:
%%sql
-- recycling rate of individual waste types per year

In [None]:
%%sql
-- total energy saved per year
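
Energy saved per year can be estimated as recycled tonnage multiplied by a per-material conversion factor, then summed by year. The sketch below shows that arithmetic in pandas. The factors in `ENERGY_KWH_PER_TONNE` are hypothetical placeholders, not values from NEA or the reference project, and must be replaced with sourced figures. The tonnages reuse 2016 and 2022 rows shown earlier.

```python
import pandas as pd

# HYPOTHETICAL kWh saved per tonne recycled; placeholders for illustration only.
ENERGY_KWH_PER_TONNE = {"plastic": 5000.0, "paper cardboard": 4000.0}

recycled = pd.DataFrame({
    "waste_type": ["plastic", "paper cardboard", "food", "plastic", "paper cardboard"],
    "total_waste_recycled_tonne": [59500.0, 607100.0, 111100.0, 57000.0, 394000.0],
    "year": [2016, 2016, 2016, 2022, 2022],
})

# Materials without a factor (here: food) map to NaN and are excluded.
factors = recycled["waste_type"].map(ENERGY_KWH_PER_TONNE)
recycled["energy_saved_kwh"] = recycled["total_waste_recycled_tonne"] * factors

energy_by_year = (recycled.dropna(subset=["energy_saved_kwh"])
                          .groupby("year")["energy_saved_kwh"]
                          .sum())
print(energy_by_year)
print("highest year:", energy_by_year.idxmax())
```

Keeping the factors in one dictionary makes it easy to swap in sourced values without touching the aggregation.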

***

## 4. Conclusions

***