# Research Question & Data Collection
In this project we compare the environmental impacts of the fashion industry across countries, paying particular attention to the differences between developed and developing countries. To do this, we use data from the non-profit Climate Watch that lists greenhouse gas emissions by country and by year across a number of sectors. Using this data, we will answer the following question: **How does the impact of the fast fashion industry vary across countries based on a country's level of development?**

The data we collected come from five CSV files: agriculture, energy, waste, industrial-processes, and bunker-fuels. These files contain MtCO2 emissions for almost 200 countries for the years 1991 to 2018. Some files contain data from before this window, but to keep our observations consistent we will not take those values into account.
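As a rough illustration of restricting a table to the 1991 to 2018 window, here is a minimal sketch. It assumes the wide layout used later in this notebook (a `Country/Region` column plus one column per year, named as strings); the helper `restrict_years` and the toy values are hypothetical and not part of the Climate Watch data.

```python
import pandas as pd

# Hypothetical miniature of one emissions table (made-up values):
# one row per country, one string-named column per year.
sample = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "1990": [1.0, 2.0],
    "1991": [1.1, 2.1],
    "2018": [1.5, 2.9],
    "2019": [1.6, 3.0],
})

def restrict_years(df, first=1991, last=2018):
    """Keep non-year columns plus year columns inside [first, last]."""
    keep = [
        col for col in df.columns
        if not str(col).isdigit() or first <= int(col) <= last
    ]
    return df[keep]

trimmed = restrict_years(sample)
print(list(trimmed.columns))
```

Applied to the real tables, the same helper would drop any year columns outside the window while leaving the country labels untouched.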

In [None]:
## load libraries

## our old friends...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## ...and some new ones
import seaborn as sns


# Data Cleaning, Data Description, Beginnings of EDA

Below we load the CSV files into Python variables. The CSV files contained values that made pandas interpret some columns' values as strings.
Using the na_values parameter, we converted these cells into NA values so that the columns could be processed as floats without their summary statistics being affected by these unusable cells.

# Load CSVs, treating boolean-like placeholder strings as missing values
na_tokens = ['false', 'False', 'FALSE']
agriculture = pd.read_csv('agriculture.csv', na_values=na_tokens)
energy = pd.read_csv('energy.csv', na_values=na_tokens)
waste = pd.read_csv('waste.csv', na_values=na_tokens)
industrial = pd.read_csv('industrial-processes.csv', na_values=na_tokens)
bunker_fuels = pd.read_csv('bunker-fuels.csv', na_values=na_tokens)
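To see why the na_values argument matters, the following sketch reads the same small, made-up CSV text twice, once without and once with the placeholder tokens marked as missing. The in-memory CSV and its values are assumptions for illustration only; the real files are much larger.

```python
import io
import pandas as pd

# Made-up CSV text where one cell holds the placeholder string "false".
csv_text = (
    "Country/Region,1991,2018\n"
    "A,1.5,2.5\n"
    "B,false,3.0\n"
)

# Without na_values, the "1991" column mixes numbers and text,
# so pandas cannot store it as a numeric column.
plain = pd.read_csv(io.StringIO(csv_text))

# With na_values, the placeholder becomes NaN and the column is numeric.
cleaned = pd.read_csv(
    io.StringIO(csv_text), na_values=["false", "False", "FALSE"]
)

print(plain.dtypes)
print(cleaned.dtypes)
print(cleaned["1991"].isna().sum())
```

In the cleaned frame, summary statistics such as the mean skip the missing cell instead of failing on a text value.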

#### Agriculture
We have chosen to look at agriculture emissions because cotton production is a major source of these emissions and cotton is a material used in much of fashion production. We use .head() and .describe() to give a brief overview of our data, and .dtypes to confirm that the CSV file was read properly.

In [None]:
agriculture.head()

In [None]:
agriculture.describe()

In [None]:
agriculture.dtypes

#### Bunker Fuel
We have chosen to look at bunker fuel emissions because this fuel is mostly used by large container ships, which could be indicative of how fashion products are shipped. We use .head() and .describe() to give a brief overview of our data, and .dtypes to confirm that the CSV file was read properly.

In [None]:
bunker_fuels.head()

In [None]:
bunker_fuels.describe()

In [None]:
bunker_fuels.dtypes

#### Energy
We have chosen to look at energy emissions as a proxy for industrialization, which could help us find a relationship between development, energy, and clothing production. We use .head() and .describe() to give a brief overview of our data, and .dtypes to confirm that the CSV file was read properly.

In [None]:
energy.head()

In [None]:
energy.describe()

In [None]:
energy.dtypes

#### Industrial Processes
Clothing production is an industrial process, which is why we have included this category of emissions. We use .head() and .describe() to give a brief overview of our data, and .dtypes to confirm that the CSV file was read properly.

In [None]:
industrial.head()

In [None]:
industrial.describe()

In [None]:
industrial.dtypes

#### Waste
Fashion production generates a lot of waste, which is why we look at emissions from this sector. We use .head() and .describe() to give a brief overview of our data, and .dtypes to confirm that the CSV file was read properly.

In [None]:
waste.head()

In [None]:
waste.describe()

In [None]:
waste.dtypes
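Rather than reading each .dtypes listing by eye, the same check could be run over all tables at once. This is a minimal sketch under the assumption that year columns have all-digit names; the helper `non_numeric_year_columns` and the two toy tables are hypothetical stand-ins for the five real DataFrames.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def non_numeric_year_columns(df):
    """Return year-named columns that were not parsed as numbers."""
    return [
        col for col in df.columns
        if str(col).isdigit() and not is_numeric_dtype(df[col])
    ]

# Toy stand-ins (made-up values): one clean table, one with a stray string.
tables = {
    "good": pd.DataFrame({
        "Country/Region": ["A", "B"],
        "1991": [1.0, 2.0],
        "2018": [1.5, 2.5],
    }),
    "bad": pd.DataFrame({
        "Country/Region": ["A", "B"],
        "1991": [1.0, 2.0],
        "2018": ["1.5", "FALSE"],
    }),
}

problems = {name: non_numeric_year_columns(df) for name, df in tables.items()}
print(problems)
```

An empty list for every table would indicate that all year columns were read as numbers.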

# More Exploratory Data Analysis

A good starting point for this data is looking at emissions over time. Not only has fast fashion uptake increased with globalization, but its impact has not been spread evenly across countries. In the following Seaborn plots, 1991 emissions are plotted on the x-axis and 2018 emissions on the y-axis, with each point representing a specific country.

In [2]:
# agriculture plot
agricultureplot = sns.scatterplot(data = agriculture, x = '1991', y = '2018') 


In [None]:
# bunker fuel (shipping) plot
shippingplot = sns.scatterplot(data = bunker_fuels, x = '1991', y = '2018')

In [None]:
# energy plot
energyplot = sns.scatterplot(data = energy, x = '1991', y = '2018')

In [None]:
# industrial ghg emission plot
industrialplot = sns.scatterplot(data = industrial, x = '1991', y = '2018')

In [3]:
# waste plot
wasteplot = sns.scatterplot(data = waste, x = '1991', y = '2018')


While useful, these plots contain some outliers that squeeze the remaining points into one corner; let's remove them. In almost all cases the outlier is China (which makes sense given its massive growth in recent decades), but in the case of bunker fuels it is actually Singapore (a global trading hub).
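Before re-plotting, it can help to confirm which country sits furthest out in a table rather than relying on the plot alone. A minimal sketch, assuming the same `Country/Region` and year columns as above; the helper `top_emitters` and the toy values are hypothetical.

```python
import pandas as pd

# Made-up table in which country "D" dwarfs the others.
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "1991": [10.0, 12.0, 11.0, 300.0],
    "2018": [14.0, 15.0, 13.0, 900.0],
})

def top_emitters(df, year="2018", n=1):
    """Return the names of the n countries with the largest value in `year`."""
    return df.nlargest(n, year)["Country/Region"].tolist()

outliers = top_emitters(toy)

# Drop the identified countries before plotting the rest.
filtered = toy[~toy["Country/Region"].isin(outliers)]
print(outliers)
print(len(filtered))
```

Picking the largest value is only one possible outlier rule; it happens to match the by-eye choice described above when a single country dominates.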

In [None]:
#agriculture plot, again
newagplot = sns.scatterplot(data = agriculture[agriculture['Country/Region'] != 'China'], 
                            x = '1991', y = '2018')

In [None]:
#shipping plot, again
newbfplot = sns.scatterplot(data = bunker_fuels[bunker_fuels['Country/Region'] != 'Singapore'], 
                            x = '1991', y = '2018')

In [None]:
#energy plot, again
newegplot = sns.scatterplot(data = energy[energy['Country/Region'] != 'China'], 
                            x = '1991', y = '2018')

In [4]:
#industrial plot, again
newidplot = sns.scatterplot(data = industrial[industrial['Country/Region'] != 'China'],
                            x = '1991', y = '2018')


In [None]:
#waste plot, again
newwsplot = sns.scatterplot(data = waste[waste['Country/Region'] != 'China'],
                            x = '1991', y = '2018')

This is interesting! While many of the relationships seem linear (good for exploring later), a piece of evidence that we might be on the right track comes from the industrial GHG emission scatterplot.

Many of the points sitting well above the general trend in that plot (like India, South Korea, Thailand, Cameroon, and Vietnam) are actually top clothes-producing countries! This suggests that fashion is likely contributing, at least in part, to the rise in industrial emissions there, which is outpacing the global "trend line."
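One way to make "above the trend line" concrete is to fit a straight line of 2018 emissions against 1991 emissions and rank countries by how far they sit above it. This is only a sketch of that idea, not the analysis used in this notebook: the least-squares fit via `np.polyfit`, the helper `above_trend`, and the toy values are all assumptions.

```python
import numpy as np
import pandas as pd

# Made-up data: 2018 is exactly double 1991, except for country "C".
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D", "E"],
    "1991": [1.0, 2.0, 3.0, 4.0, 5.0],
    "2018": [2.0, 4.0, 12.0, 8.0, 10.0],
})

def above_trend(df, x="1991", y="2018", n=1):
    """Return the n countries with the largest positive residual
    from a least-squares line of y on x."""
    clean = df.dropna(subset=[x, y])
    slope, intercept = np.polyfit(clean[x], clean[y], deg=1)
    residual = clean[y] - (slope * clean[x] + intercept)
    ranked = clean.assign(residual=residual).nlargest(n, "residual")
    return ranked["Country/Region"].tolist()

print(above_trend(toy))
```

On the real industrial table, the countries returned by such a ranking could then be compared against a list of major clothing producers.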