In [78]:
import pandas as pd
import altair as alt
import requests
import io

# Greenhouse Gas Emissions from Supply Chains - 2023

The source of my data is the United States Environmental Protection Agency (US EPA) website.
#### **Source: https://catalog.data.gov/dataset/supply-chain-greenhouse-gas-emission-factors-v1-2-by-naics-6**
My dataset covers greenhouse gas (GHG) emissions from supply chains. <br>
Its key attributes are the supply chain emission factors for each industry category, classified by NAICS (North American Industry Classification System) code.

**About my Data:** Environmental impacts from supply chains include water pollution, toxic waste, deforestation, air emissions, and greenhouse gas emissions. According to the US EPA, supply chains account for more than 90% of GHG emissions. Because supply chains consume resources in large quantities, they produce substantial carbon emissions.

#### A sneak peek at the data

In [79]:
ghg_url = "https://raw.githubusercontent.com/rupakrish78/GHG-Astair/main/SupplyChainGHGEmissionFactors_v1.2_NAICS_CO2e_USD2021.csv"
# Download the raw CSV bytes from GitHub
ghg_supply = requests.get(ghg_url).content

# Decode the bytes and parse them into a DataFrame
data = pd.read_csv(io.StringIO(ghg_supply.decode('utf-8')))
data.head()

Unnamed: 0,2017 NAICS Code,2017 NAICS Title,GHG,Unit,Supply Chain Emission Factors without Margins,Margins of Supply Chain Emission Factors,Supply Chain Emission Factors with Margins,Reference USEEIO Code
0,562212,Solid Waste Landfill,All GHGs,"kg CO2e/2021 USD, purchaser price",10.989,0.0,10.989,562212
1,327310,Cement Manufacturing,All GHGs,"kg CO2e/2021 USD, purchaser price",3.768,0.09,3.858,327310
2,112111,Beef Cattle Ranching and Farming,All GHGs,"kg CO2e/2021 USD, purchaser price",3.227,0.071,3.298,1121A0
3,112112,Cattle Feedlots,All GHGs,"kg CO2e/2021 USD, purchaser price",3.227,0.071,3.298,1121A0
4,112130,Dual-Purpose Cattle Ranching and Farming,All GHGs,"kg CO2e/2021 USD, purchaser price",3.227,0.071,3.298,1121A0
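
Each factor in the table is expressed in kg CO2e per 2021 USD at purchaser prices (see the `Unit` column), so multiplying a factor by an amount spent gives a rough estimate of the associated emissions. Here is a minimal sketch of that arithmetic, using factors copied from the rows above. The helper name `estimate_emissions_kg` and the 1,000 USD spend are illustrative assumptions, not part of the EPA dataset.

```python
# Emission factors with margins (kg CO2e per 2021 USD), copied from the rows above.
FACTORS_WITH_MARGINS = {
    "Solid Waste Landfill": 10.989,
    "Cement Manufacturing": 3.858,
    "Beef Cattle Ranching and Farming": 3.298,
}


def estimate_emissions_kg(naics_title: str, spend_usd: float) -> float:
    """Estimate kg CO2e for a purchase (hypothetical helper for illustration)."""
    return FACTORS_WITH_MARGINS[naics_title] * spend_usd


# Hypothetical spend of 1,000 USD (2021 dollars) in each category.
for title in FACTORS_WITH_MARGINS:
    print(f"{title}: {estimate_emissions_kg(title, 1000.0):.0f} kg CO2e")
```

Read this way, the factor is an intensity (emissions per dollar), not a total for the industry.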


**Goal:** With air pollution a concern everywhere, it is necessary to find which materials and food items drive the increase in emissions.
<br>
According to the EIA, “This imbalance between greenhouse gas emissions and the ability for natural processes to absorb those emissions has resulted in a continued increase in atmospheric concentrations of greenhouse gases.”
<br>
As noted above, supply chains are one of the main contributors to greenhouse gas emissions.
<br>
My objective is to find which supply chains emit the most CO2 equivalents per dollar spent (the dataset's unit is kg CO2e per 2021 USD) and how this impacts the environment. This will help identify a good sustainability approach for those products and reduce their emissions.

**Workflow:** I am considering a bar graph which shows
- the industry category (`2017 NAICS Title`) on the x-axis and
- the supply chain emission factor with margins, in kg CO2e per 2021 USD, on the y-axis (one kg of CO2 equivalent has the same warming effect as one kg of CO2).

In [80]:
# Bar chart of emission factors, sorted from highest to lowest
bargraph = alt.Chart(data).mark_bar().encode(
    alt.X("2017 NAICS Title", sort='-y'),
    y="Supply Chain Emission Factors with Margins"
)

# Value labels for each bar, rotated to read vertically
text = bargraph.mark_text(baseline="middle", align="left", color="black", angle=270).encode(
    text="Supply Chain Emission Factors with Margins"
)

bargraph + text

I created the graph with all the supply chain categories and realized that one category (**Solid Waste Landfill**) was an outlier (graph above). While all the other categories fell between 0 and 4, Solid Waste Landfill was close to 11.
<br>
Based on this graph, we can see that **Solid Waste Landfill** has by far the highest emission factor: it produces the most greenhouse gas per dollar spent of any category in this US dataset.
<br>
We can also see that **Cement Manufacturing** is the next biggest contributor, followed closely by **Beef/Cattle Ranching and Farming**.
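
The ranking read off the chart can also be checked numerically. The sketch below is a minimal example that uses only the five rows shown in the `data.head()` output; the variable names `sample`, `top3`, and `outlier_ratio` are my own. With the full dataset loaded, the same `nlargest` call could be applied to `data` instead.

```python
import pandas as pd

factor_col = "Supply Chain Emission Factors with Margins"

# Sample rows copied from the data.head() output (not the full dataset).
sample = pd.DataFrame(
    {
        "2017 NAICS Title": [
            "Solid Waste Landfill",
            "Cement Manufacturing",
            "Beef Cattle Ranching and Farming",
            "Cattle Feedlots",
            "Dual-Purpose Cattle Ranching and Farming",
        ],
        factor_col: [10.989, 3.858, 3.298, 3.298, 3.298],
    }
)

# Rank categories by emission factor and keep the top three.
top3 = sample.nlargest(3, factor_col)

# Ratio of the largest factor to the runner-up shows how far the outlier stands apart.
outlier_ratio = top3[factor_col].iloc[0] / top3[factor_col].iloc[1]

print(top3.to_string(index=False))
print(f"Top factor is {outlier_ratio:.1f}x the second-highest")
```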

### Emissions caused by the food industry

The next thing I wanted to check was how much of the emissions were caused by the food industry, which accounts for about a third of greenhouse gas emissions.
<br>
The source of my data is the food consumption dataset on GitHub. Its author had already cleaned the data, so I could use it directly for my analysis. I decided to concentrate only on the USA data, since my supply chain analysis was also based on the US.
<br>  
#### **Source: https://github.com/brendanoct/food-cons-co2/blob/main/food_consumption.csv**

In [81]:
food_url = "https://raw.githubusercontent.com/rupakrish78/GHG-Astair/main/foodusa.csv"
# Download the raw CSV bytes from GitHub
ghg_food = requests.get(food_url).content

# Decode the bytes and parse them into a DataFrame
food_data = pd.read_csv(io.StringIO(ghg_food.decode('utf-8')))
food_data.head()

Unnamed: 0,country,food_category,consumption,co2_emmission
0,USA,Beef,36.24,1118.29
1,USA,Milk - inc. cheese,254.69,362.78
2,USA,Pork,27.64,97.83
3,USA,Poultry,50.01,53.72
4,USA,Fish,12.35,19.72


**Workflow:** I am considering a horizontal bar graph which shows
- the food category on the y-axis and
- the carbon dioxide emissions on the x-axis.

In [82]:
# Horizontal bar chart of CO2 emissions, sorted from highest to lowest
foodgraph = alt.Chart(food_data).mark_bar().encode(
    alt.Y("food_category", sort='-x'),
    x="co2_emmission"
)

# Value labels placed at the end of each bar
foodtext = foodgraph.mark_text(baseline="middle", align="left", color="black").encode(
    text="co2_emmission"
)

foodgraph + foodtext

Based on this graph, we can see that **Beef** is the main source of food-related emissions in the US.  
<br>
We can also see that **Milk and Cheese** is the next biggest contributor, followed by **Pork**.
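
The totals in the chart combine two effects: how much of a food is consumed and how emission-intensive it is. One way to separate them is to divide `co2_emmission` by `consumption`. The sketch below is a minimal example using only the five rows from the `food_data.head()` output. The column name `co2_per_unit` is my own, and the units of the two source columns are not stated here, so the ratio is only assumed to be comparable across rows.

```python
import pandas as pd

# Rows copied from the food_data.head() output (not the full dataset).
food_sample = pd.DataFrame(
    {
        "food_category": ["Beef", "Milk - inc. cheese", "Pork", "Poultry", "Fish"],
        "consumption": [36.24, 254.69, 27.64, 50.01, 12.35],
        "co2_emmission": [1118.29, 362.78, 97.83, 53.72, 19.72],
    }
)

# Emissions per unit consumed (hypothetical derived column; units assumed consistent).
food_sample["co2_per_unit"] = food_sample["co2_emmission"] / food_sample["consumption"]

# Rank categories by intensity rather than by total emissions.
ranked = food_sample.sort_values("co2_per_unit", ascending=False).reset_index(drop=True)

print(ranked[["food_category", "co2_per_unit"]].round(2).to_string(index=False))
```

In this sample, beef remains first by intensity as well as by total, while the order of the other categories changes.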

**Inference/Conclusion:** In the first graph, **Beef/Cattle Ranching and Farming** was among the top three contributors to greenhouse gas emissions. Looking at the food industry alone, beef is the **highest** contributor.