# Singapore Recycling and Waste Management
Learn how much energy Singapore saves each year by recycling plastics, paper, glass, ferrous and non-ferrous metals.

> **Before going through my notebook, please check out my [Medium](https://medium.com/@kingabzpro/annual-recycled-energy-saved-in-singapore-2d6bad49bfb2) article, which explains everything in detail.**

In this project, we will clean our data and prepare it for analysis. We will use the [Singapore NEA Energy Savings | Kaggle](https://www.kaggle.com/eminbasturk/singapore-nea-energy-savings) data to analyze total waste generation and the recycling rate. The material names differ because the data were collected from different sources. We will add the latest 2020 data from the NEA [waste-statistics-and-overall-recycling](https://www.nea.gov.sg/our-services/waste-management/waste-statistics-and-overall-recycling) page so that the analysis is up to date. We will then estimate how much energy recycling saves using key figures from [Greentumble](https://greentumble.com/how-does-recycling-save-energy/).

We will use the **Recycling statistics** to calculate the energy saved each year from 2003 to 2020 for five waste types: plastics, paper, glass, ferrous and non-ferrous metals.

## Loading Data

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

energy_saved = pd.read_csv('../input/singapore-waste-management/waste_energy_stat.csv')
waste_03_17 = pd.read_csv('../input/singapore-waste-management/2003_2017_waste.csv')
waste_18_20 = pd.read_csv('../input/singapore-waste-management/2018_2020_waste.csv')

## Cleaning Data

In [None]:
clean_waste_18_20 = waste_18_20.rename(
    columns={
        "Waste Type": "waste_type",
        "Total Generated ('000 tonnes)": "total_waste_generated_tonne",
        "Total Recycled ('000 tonnes)": "total_waste_recycled_tonne",
        "Year": "year",
    }
)
clean_waste_18_20["total_waste_generated_tonne"] = (
    clean_waste_18_20["total_waste_generated_tonne"] * 1000
)
clean_waste_18_20["total_waste_recycled_tonne"] = (
    clean_waste_18_20["total_waste_recycled_tonne"] * 1000
)


The energy-saving figures per material come from Greentumble: https://greentumble.com/how-does-recycling-save-energy/

In [None]:
energy_saved

To tidy this table, we:

- Transpose it
- Remove the first row and the first two columns
- Reset the index
- Rename the columns

The result has three columns: `material`, `energy_saved`, and `crude_oil_saved`.

In [None]:
clean_energy_saved = (
    energy_saved.T.iloc[1:, 2:]
    .reset_index(drop=True)
    .rename(columns={2: "material", 3: "energy_saved", 4: "crude_oil_saved"})
)
clean_energy_saved
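
The transpose-and-trim steps above can be tried on a small stand-in table. This is only a sketch: the layout of `raw` is a hypothetical guess at how such a table might look, not the actual contents of `waste_energy_stat.csv`, and every value in it is invented.

```python
import pandas as pd

# Hypothetical raw layout (an assumption): materials run across the columns,
# the first column holds row labels, and the first two rows are not needed.
raw = pd.DataFrame(
    [
        ["note", "a", "b", "c"],
        ["note", "d", "e", "f"],
        ["material", "Plastic", "Glass", "Paper"],
        ["energy_saved", "100 Kwh", "200 kWh", "300 kWh"],  # invented values
        ["crude_oil_saved", "10 barrels", "20 barrels", "30 barrels"],
    ]
)

tidy = (
    raw.T.iloc[1:, 2:]          # drop the label row and the two unused columns
    .reset_index(drop=True)     # renumber rows from 0
    .rename(columns={2: "material", 3: "energy_saved", 4: "crude_oil_saved"})
)
print(tidy)
```

After the transpose, the remaining column labels are the integers 2, 3 and 4, which is why the rename maps integer keys to names.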

In [None]:
clean_waste_03_17 = waste_03_17.loc[
    :,
    [
        "waste_type",
        "total_waste_generated_tonne",
        "total_waste_recycled_tonne",
        "recycling_rate",
        "year",
    ],
]


In [None]:
# clean_waste.iloc[16,2] = 1260000

Let's add the recycling rate to our DataFrame, as we will use it later in the analysis.

In [None]:
clean_waste_18_20["recycling_rate"] = round(
    clean_waste_18_20["total_waste_recycled_tonne"]
    / clean_waste_18_20["total_waste_generated_tonne"],
    2,
)
clean_waste_18_20.head()
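
As a quick sanity check of the unit conversion and the rate calculation, here is a minimal sketch on invented numbers. The short input column names `generated_000_tonnes` and `recycled_000_tonnes` are hypothetical stand-ins for the NEA headers.

```python
import pandas as pd

# Invented figures, reported in thousands of tonnes like the 2018-2020 file.
waste = pd.DataFrame(
    {
        "waste_type": ["Glass", "Plastic"],
        "generated_000_tonnes": [80.0, 900.0],
        "recycled_000_tonnes": [20.0, 45.0],
    }
)

# Convert '000 tonnes to tonnes so both sources share the same unit.
waste["total_waste_generated_tonne"] = waste["generated_000_tonnes"] * 1000
waste["total_waste_recycled_tonne"] = waste["recycled_000_tonnes"] * 1000

# Recycling rate as a fraction between 0 and 1, rounded to two decimals.
waste["recycling_rate"] = (
    waste["total_waste_recycled_tonne"] / waste["total_waste_generated_tonne"]
).round(2)
print(waste[["waste_type", "recycling_rate"]])
```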

## Data Analysis

In [None]:
data = pd.concat([clean_waste_18_20, clean_waste_03_17]).sort_values(by="year")
overall = data[(data["waste_type"] == "Overall") | (data["waste_type"] == "Total")]


fig = go.Figure()

fig.add_trace(
    go.Bar(
        x=overall["year"],
        y=overall["total_waste_generated_tonne"],
        name="Waste Generated",
    )
)

fig.add_trace(
    go.Bar(
        x=overall["year"],
        y=overall["total_waste_recycled_tonne"],
        name="Waste Recycled",
    )
)

fig.show()


In [None]:
data['waste_type'].value_counts()

In [None]:
data["waste_type"] = data["waste_type"].str.replace(
    "Non-ferrous metal", "Non-Ferrous Metal"
    )
data["waste_type"] = data["waste_type"].str.replace(
    "Non-ferrous metals", "Non-Ferrous Metal"
    )
data["waste_type"] = data["waste_type"].str.replace(
    "Non-Ferrous Metals", "Non-Ferrous Metal"
    )
data["waste_type"] = data["waste_type"].str.replace(
    "Plastics", "Plastic"
    )
data["waste_type"] = data["waste_type"].str.replace(
    "Ferrous metal", "Ferrous Metal"
    )
data["waste_type"] = data["waste_type"].str.replace(
    "Paper/Cardboard", "Paper"
    )
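
An alternative to the chain of substring replacements is a single lookup table of exact labels. The sketch below uses the spelling variants handled above; it is not the notebook's method, and the name `canonical` is an assumption introduced here.

```python
import pandas as pd

labels = pd.Series(
    [
        "Non-ferrous metal",
        "Non-ferrous metals",
        "Non-Ferrous Metals",
        "Plastics",
        "Ferrous metal",
        "Paper/Cardboard",
        "Glass",
    ]
)

# Map each known variant to one canonical label (whole-value matches only).
canonical = {
    "Non-ferrous metal": "Non-Ferrous Metal",
    "Non-ferrous metals": "Non-Ferrous Metal",
    "Non-Ferrous Metals": "Non-Ferrous Metal",
    "Plastics": "Plastic",
    "Ferrous metal": "Ferrous Metal",
    "Paper/Cardboard": "Paper",
}

cleaned = labels.replace(canonical)
print(cleaned.value_counts())
```

Because `Series.replace` with a dict matches whole values rather than substrings, the order of entries does not matter, but any variant missing from the dict is left unchanged.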


In [None]:
total_data = data.merge(
    clean_energy_saved, how="left", left_on="waste_type", right_on="material"
).dropna()

total_data["energy_saved"] = total_data.loc[:, "energy_saved"].str.replace("kWh", "")

total_data["energy_saved"] = (
    total_data.loc[:, "energy_saved"].str.replace("Kwh", "").astype(int)
)

total_data.head()
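
The two unit-stripping passes could also be done with one case-insensitive pattern. A minimal sketch on invented strings, assuming the unit always follows the number:

```python
import pandas as pd

# Invented examples covering both spellings of the unit seen in the data.
energy = pd.Series(["100 Kwh", "200 kWh", "300kWh"], name="energy_saved")

# Remove the unit (and any space before it), then convert to integers.
energy_kwh = (
    energy.str.replace(r"\s*kwh", "", case=False, regex=True)
    .astype(int)
)
print(energy_kwh)
```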


In [None]:
total_data["total_energy_saved"] = (
    total_data.loc[:, "total_waste_recycled_tonne"] * total_data.loc[:, "energy_saved"]
)

total_data.head()


## Visualization

In [None]:
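# Select the numeric column before aggregating so that mean() is never
# applied to the text columns (material, crude_oil_saved).
(
    total_data.groupby("waste_type")["recycling_rate"]
    .mean()
    .to_frame()
    .style.background_gradient(cmap="Pastel1_r", subset=["recycling_rate"])
)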

I wanted to check the final data for outliers and patterns. The box plot shows an anomaly in 2018, and to understand it we have to look at the dataset.

In [None]:
fig = px.box(total_data, x="year", y="total_waste_recycled_tonne")
fig.update_traces(quartilemethod="exclusive")
fig.show()


In [None]:
total_data[total_data['year']==2018]
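
A numeric check can complement the box plot when looking for rows like the 2018 anomaly. The sketch below compares each row's recycling rate with the median for its waste type. The data are invented and the 0.5 threshold is an arbitrary assumption.

```python
import pandas as pd

# Invented rates; one row is deliberately far from the others of its type.
df = pd.DataFrame(
    {
        "waste_type": ["Ferrous Metal"] * 4 + ["Glass"] * 4,
        "year": [2016, 2017, 2018, 2019] * 2,
        "recycling_rate": [0.99, 0.99, 0.10, 0.99, 0.15, 0.17, 0.19, 0.14],
    }
)

# Typical rate per waste type, aligned back to every row.
median_rate = df.groupby("waste_type")["recycling_rate"].transform("median")

# Flag rows whose rate is far from what is typical for that waste type.
df["rate_gap"] = (df["recycling_rate"] - median_rate).abs()
suspects = df[df["rate_gap"] > 0.5]  # threshold is an assumption
print(suspects)
```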

After going through the 2018 records, we found that the total waste generated for Ferrous Metal was 1,269,000 tonnes but the total recycled was only 126,000 tonnes. The mean recycling rate for Ferrous Metal is about 90 percent, yet this row showed 10 percent, which was odd, so I went back to the original data on the NEA site and found the mistake. The [PDF](https://www.nea.gov.sg/docs/default-source/our-services/waste-management/waste-recycling-statistics-2016-to-2019.pdf) clearly shows that a zero was missing from the recycled figure.

In [None]:
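# Restore the zero that was dropped from the 2018 Ferrous Metal row
# (index 237 in this merged DataFrame), then recompute the energy saved.
total_data.loc[237, "total_waste_recycled_tonne"] = 1260000
total_data["total_energy_saved"] = total_data.loc[:, "total_waste_recycled_tonne"] * (
    total_data.loc[:, "energy_saved"]
)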

fig = px.box(total_data, x="year", y="total_waste_recycled_tonne")
fig.update_traces(quartilemethod="exclusive") 
fig.show()
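
The correction above addresses the row by its index label. A variant that selects the row by waste type and year, and also refreshes the recycling rate, might look like this sketch. The figures in the small frame are illustrative assumptions, and the Glass row is invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "waste_type": ["Ferrous Metal", "Glass"],
        "year": [2018, 2018],
        "total_waste_generated_tonne": [1269000, 64000],  # illustrative
        "total_waste_recycled_tonne": [126000, 12000],    # first value lacks a zero
    }
)

# Select the faulty row by its content instead of a hard-coded index.
mask = (df["waste_type"] == "Ferrous Metal") & (df["year"] == 2018)
df.loc[mask, "total_waste_recycled_tonne"] = 1260000

# Recompute the rate so it reflects the corrected figure.
df["recycling_rate"] = (
    df["total_waste_recycled_tonne"] / df["total_waste_generated_tonne"]
).round(2)
print(df)
```

Selecting by content keeps the fix valid even if the row order or index changes after a different merge.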


The box plot of total energy saved is much more spread out, because some materials save far more energy (kWh) per tonne recycled than others.

In [None]:
fig = px.box(total_data, x="year", y="total_energy_saved")
fig.update_traces(quartilemethod="exclusive") 
fig.show()

We can interact with the data further and look for patterns in a multilevel scatter plot. As we can see, the total energy saved from paper and plastic has dropped significantly in the past few years due to government initiatives to control the waste produced.

In [None]:
fig = px.scatter(
    total_data,
    x="year",
    y="total_energy_saved",
    size="total_waste_recycled_tonne",
    color="material",
    size_max=60,
)
fig.show()


In [None]:
total_data.energy_saved.value_counts()

## Energy saved per year

It's time to calculate the energy saved each year from 2003 to 2020 for the five waste types: plastics, paper, glass, ferrous and non-ferrous metals.

- Group by year
- Sum and extract the total energy saved
- Convert the result into a pandas DataFrame
- Convert `total_energy_saved` from float to integer

In [None]:
annual_energy_savings = (
    total_data.groupby("year")["total_energy_saved"]
    .sum()
    .to_frame()
    .astype({"total_energy_saved": int})
)


In [None]:
# Convert kWh to GWh (1 GWh = 1,000,000 kWh) and format as a label.
annual_energy_savings["total_energy_saved"] = (
    round(annual_energy_savings["total_energy_saved"] / 1000000, 2).astype(str)
    + " GWh"
)
annual_energy_savings.tail()
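
The whole calculation, from tonnes recycled to GWh per year, can be traced on a few invented rows. In this sketch the per-tonne figures are made up, and the GWh value is kept numeric in a separate, hypothetical `total_energy_saved_gwh` column so that it can still be sorted or plotted.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "year": [2019, 2019, 2020, 2020],
        "material": ["Plastic", "Glass", "Plastic", "Glass"],
        "total_waste_recycled_tonne": [1000, 2000, 1500, 1000],
        "energy_saved": [300, 100, 300, 100],  # kWh per tonne, invented
    }
)

# Energy saved per row = tonnes recycled x kWh saved per tonne.
df["total_energy_saved"] = df["total_waste_recycled_tonne"] * df["energy_saved"]

# Sum per year, then convert kWh to GWh (1 GWh = 1,000,000 kWh).
annual = df.groupby("year")["total_energy_saved"].sum().to_frame()
annual["total_energy_saved_gwh"] = (annual["total_energy_saved"] / 1_000_000).round(2)
print(annual)
```

For 2019 this gives 1000 × 300 + 2000 × 100 = 500,000 kWh, or 0.5 GWh.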


## Final Thoughts
We have cleaned our data and made sure that it is ready for merging with other datasets. We have also learned how to detect anomalies in datasets and create new features. This project was simple, but it taught us a lot about data cleaning and data visualization.
