# 3. Main contaminants emitted
The exploratory data analysis showed that some contaminant names are inconsistent and that contaminants are not grouped into main categories. Both issues need to be addressed before the main pollutants can be analysed.

In this section:
* we review the list of contaminants to make it more consistent and add relevant categories
* we assess the breakdown of air pollution by contaminant
* we identify the 3 main types of pollutants emitted
* we identify the main facilities that emitted these pollutants.

In [1]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import plotly.offline as pyo
import plotly.graph_objs as go
from plotly import tools
import plotly.figure_factory as ff

In [2]:
# display the plotly plots inside the Jupyter notebook
pyo.init_notebook_mode(connected=True)

In [3]:
# load data
path = " "
df = pd.read_csv(path+"largest-emissions-in-lbs.csv")
df.head()

Unnamed: 0,report_id,Event began,Event ended,Regulated entity RN number,Regulated entity name,Type(s) of air emissions event,County,contaminant,authorization,limit,quantity,units
0,266261,2017-08-27 00:00:00,2017-09-06 00:00:00,RN103919817,CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT,AIR SHUTDOWN,HARRIS,Carbon Monoxide,1504A,1892.04 LBS/HR,244040.0,lbs (est.)
1,266378,2017-08-29 11:50:00,2017-09-01 11:50:00,RN100217389,FLINT HILLS RESOURCES PORT ARTHUR FACILITY,AIR SHUTDOWN,JEFFERSON,Carbon Monoxide,No specific Authorization,0.0,240000.0,lbs (est.)
2,267078,2017-09-08 08:00:00,2017-09-22 08:00:00,RN100217389,FLINT HILLS RESOURCES PORT ARTHUR FACILITY,AIR STARTUP,JEFFERSON,Ethylene (gaseous),No specific authorization,0.0,150000.0,lbs (est.)
3,267078,2017-09-08 08:00:00,2017-09-22 08:00:00,RN100217389,FLINT HILLS RESOURCES PORT ARTHUR FACILITY,AIR STARTUP,JEFFERSON,Carbon Monoxide,No specific authorization,0.0,150000.0,lbs (est.)
4,266566,2017-09-01 05:00:00,2017-10-01 05:00:00,RN100221662,EQUISTAR CORPUS CHRISTI PLANT,AIR STARTUP,NUECES,Carbon Monoxide,Permit 83864,721.67 LBS/HR,121000.0,lbs (est.)


## Revised and categorised list of contaminants
As the designation of contaminants was not consistent in the dataset, I reviewed it to provide a consistent designation where relevant, and added pollutant categories. To identify relevant categories, I drew on what I remembered from my courses on air pollution and chemistry. With 161 components listed, there were still components (some with significant quantities) that I could not categorise. Not being an expert in air pollution from the petrochemical industry, I asked a friend who is to help me finish this task!

-> The result is available in the "list_of_contaminants_v3.csv" file.
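
As a minimal sketch of how such a lookup table harmonises names, the example below maps several spellings of butane onto one designation. The lookup rows are copied from the first rows of the CSV shown further down; the `raw` records and their quantities are made up for illustration.

```python
import pandas as pd

# Excerpt of the lookup table (values copied from the first rows of the CSV)
lookup = pd.DataFrame({
    "contaminant": ["Butane", "Butane, N-", "Butane, i", "Butanes", "n-butane"],
    "consistent_designation": ["Butanes"] * 5,
})

# Hypothetical raw records using three different spellings
raw = pd.DataFrame({
    "contaminant": ["Butane, N-", "n-butane", "Butane"],
    "quantity": [10.0, 5.0, 2.5],
})

# Map each raw name to its consistent designation, then aggregate
name_map = lookup.set_index("contaminant")["consistent_designation"]
raw["consistent_designation"] = raw["contaminant"].map(name_map)
totals = raw.groupby("consistent_designation")["quantity"].sum()
print(totals)
```

Once the names are harmonised, the three spellings collapse into a single row instead of being counted as three separate contaminants.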

In [4]:
df[df['contaminant']=='Carbon Dioxide']

Unnamed: 0,report_id,Event began,Event ended,Regulated entity RN number,Regulated entity name,Type(s) of air emissions event,County,contaminant,authorization,limit,quantity,units
1350,266110,2017-09-04 10:00:00,2017-09-05 16:10:00,RN100218973,FORMOSA POINT COMFORT PLANT,AIR STARTUP,CALHOUN,Carbon Dioxide,19168/PSDTX1226,761.65 LBS/HR,1.0,lbs (est.)
1405,266116,2017-09-04 07:49:00,2017-09-04 12:21:00,RN100218973,FORMOSA POINT COMFORT PLANT,AIR STARTUP,CALHOUN,Carbon Dioxide,19168/PSDTX1226,761.65 LBS/HR,0.01,lbs (est.)


Carbon dioxide (CO2) is usually the main pollutant emitted, and it is not of the same order of magnitude as the other pollutants: it typically accounts for 98% of the pollution. Here almost no facility reported it, so we categorise it as "Other material".

In [5]:
# load categorised list of contaminants
df_conta = pd.read_csv(path+"../4_analyses/contaminants/list_of_contaminants_v3.csv", sep = ";")
df_conta.head()

Unnamed: 0,ID,contaminant,consistent_designation,carbon no (for hydrocarbon),sub-category,category
0,25,Butane,Butanes,C4,Hydrocarbons,"VOCs, solvents & hydrocarbons"
1,26,"Butane, N-",Butanes,C4,Hydrocarbons,"VOCs, solvents & hydrocarbons"
2,27,"Butane, i",Butanes,C4,Hydrocarbons,"VOCs, solvents & hydrocarbons"
3,28,Butanes,Butanes,C4,Hydrocarbons,"VOCs, solvents & hydrocarbons"
4,156,n-butane,Butanes,C4,Hydrocarbons,"VOCs, solvents & hydrocarbons"


In [6]:
# add the category column to the main dataframe
df = pd.merge(df,df_conta[['contaminant','category']], on = ['contaminant'])
df.tail()

Unnamed: 0,report_id,Event began,Event ended,Regulated entity RN number,Regulated entity name,Type(s) of air emissions event,County,contaminant,authorization,limit,quantity,units,category
1450,266269,2017-08-27 10:45:00,2017-09-07 10:45:00,RN100224815,PASADENA TERMINAL,EMISSIONS EVENT,HARRIS,t-octene-2,NSR 5171,0.0,0.9473,lbs (est.),"VOCs, solvents & hydrocarbons"
1451,266556,2017-08-30 08:00:00,2017-08-31 20:00:00,RN100224815,PASADENA TERMINAL,EMISSIONS EVENT,HARRIS,t-octene-2,NSR 5171,0.0,0.151,lbs (est.),"VOCs, solvents & hydrocarbons"
1452,266269,2017-08-27 10:45:00,2017-09-07 10:45:00,RN100224815,PASADENA TERMINAL,EMISSIONS EVENT,HARRIS,2-methylnaphthalene,NSR 5171,0.0,0.2932,lbs (est.),"VOCs, solvents & hydrocarbons"
1453,266556,2017-08-30 08:00:00,2017-08-31 20:00:00,RN100224815,PASADENA TERMINAL,EMISSIONS EVENT,HARRIS,2-methylnaphthalene,NSR 5171,0.0,0.0467,lbs (est.),"VOCs, solvents & hydrocarbons"
1454,266116,2017-09-04 07:49:00,2017-09-04 12:21:00,RN100218973,FORMOSA POINT COMFORT PLANT,AIR STARTUP,CALHOUN,Ethyl mercaptan,19168/PSDTX1226,761.65 LBS/HR,0.02,lbs (est.),Other material
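
Because `pd.merge` defaults to an inner join, any contaminant missing from the lookup file would be dropped silently. The sketch below shows one way to check for that, using a left join with `indicator=True`. The two small dataframes are illustrative, and "Unknown compound" is a made-up name.

```python
import pandas as pd

emissions = pd.DataFrame({
    "contaminant": ["Carbon Monoxide", "Butane", "Unknown compound"],  # last name is made up
    "quantity": [100.0, 20.0, 1.0],
})
categories = pd.DataFrame({
    "contaminant": ["Carbon Monoxide", "Butane"],
    "category": ["Carbon Monoxide", "VOCs, solvents & hydrocarbons"],
})

# A left join keeps every emission row; the "_merge" column flags rows without a category
checked = pd.merge(emissions, categories, on="contaminant", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "contaminant"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every contaminant in the emissions data has a category.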


In [7]:
df['category'].nunique()

7

In [8]:
df['category'].unique()

array(['Carbon Monoxide', 'VOCs, solvents & hydrocarbons',
       'Sulfur dioxide', 'Propylene and propylene oxide',
       'Particulate Matter', 'NOx', 'Other material'], dtype=object)

According to the expert, the main pollutants from combustion and hydrocarbon storage are:
* carbon dioxide (CO2),
* sulfur dioxide (SO2),
* nitrogen oxides (NOx),
* particulate matter (PM),
* carbon monoxide (CO),
* hydrocarbons (HC) / Volatile Organic Compounds (VOC),
* and specific pollutants (such as heavy metals, ammonia).



In the category called "VOCs, solvents & hydrocarbons", we include the following components:
* Methane (most of the natural gas)
* Hydrocarbons
* Solvents
* VOCs: Volatile Organic Compounds can be solvents, hydrocarbons or other VOCs.

NB: VOCs, solvents & hydrocarbons come from fugitive emissions (from damaged tanks and pipes) and possibly also from incomplete combustion.

In [9]:
# check what's in "Other material"
# numeric_only=True restricts the sum to numeric columns (report_id, quantity)
df[df['category']=='Other material'].groupby(by=['contaminant']).sum(numeric_only=True).sort_values(by='quantity',ascending = False)

Unnamed: 0_level_0,report_id,quantity
contaminant,Unnamed: 1_level_1,Unnamed: 2_level_1
Mineral Spirit,267578,11175.39
Hydrogen Sulfide,20250665,9003.6502
Hydrogen,1064779,5130.23
Ammonia,2131380,1716.1
Methyl acrylate,267064,1184.0
Methyl Acetate,267064,717.0
2-Ethylhexanol,267679,527.41
Hydrofluoric acid,532839,400.0
Ethyl Methyl Disulfide,532839,400.0
H2S,266643,300.0


## Breakdown of air pollution by contaminant

In [10]:
# assess quantities per contaminant type
df2 = df.groupby(by=['category'])[['quantity']].sum()
df2.columns = ['quantity']
df2.sort_values(by = ['quantity'], ascending = False, inplace = True)
print(df2)

                                   quantity
category                                   
VOCs, solvents & hydrocarbons  2.014639e+06
Carbon Monoxide                1.993716e+06
Sulfur dioxide                 6.365035e+05
NOx                            4.226846e+05
Propylene and propylene oxide  2.343888e+05
Particulate Matter             8.626717e+04
Other material                 3.170353e+04


In [11]:
# draw horizontal bar charts of main pollutants emitted
overall_title = "Harvey-related emissions from the chemical plants"
x_title = "Quantities (lbs)"
export_filename = path+"../6_communication/plots/by_contaminant_summary.html"
### --------------------------------------------------------

trace= [go.Bar(
        y=df2.index,
        x=df2['quantity'],
        orientation = 'h'
        )]
layout = go.Layout(title = overall_title, 
                   hovermode = 'closest',
                   xaxis=dict(dict(title=x_title,
                                   domain=[0.25, 1.0], anchor='y1')),
                   yaxis=dict(dict(domain=[0.0, 1.0], anchor='x1')))
fig = go.Figure(data = trace, layout = layout)
pyo.iplot(fig)

# export plot
pyo.plot(fig,filename = export_filename);

The quantities of emitted contaminants are in line with the pollution breakdown expected from petrochemical facilities (once CO2 is set aside).
* VOCs, solvents & hydrocarbons are the most emitted category, with approximately 2.01 million pounds, closely followed by carbon monoxide with 1.99 million pounds.
* Sulfur dioxide accounts for about 637,000 pounds and nitrogen oxides for about 423,000 pounds.
* Propylene and propylene oxide are specific contaminants that were also emitted (about 234,000 pounds). These components are quite volatile and might come from fugitive emissions (from damaged tanks and pipes, for example).
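
To put these quantities in perspective, each category can be expressed as a share of the total. This is a minimal sketch that hard-codes the totals printed by cell 10; in the notebook itself the same calculation could be applied directly to `df2['quantity']`.

```python
import pandas as pd

# Category totals in lbs, copied from the output of cell 10
totals = pd.Series({
    "VOCs, solvents & hydrocarbons": 2.014639e6,
    "Carbon Monoxide": 1.993716e6,
    "Sulfur dioxide": 6.365035e5,
    "NOx": 4.226846e5,
    "Propylene and propylene oxide": 2.343888e5,
    "Particulate Matter": 8.626717e4,
    "Other material": 3.170353e4,
})

# Percentage of the total reported quantity, rounded to one decimal
shares = (totals / totals.sum() * 100).round(1)
print(shares)
```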

## What are the main emitting facilities? 
### VOCs, solvents & hydrocarbons

In [12]:
# assess quantities per facility and per contaminant category
df2c = pd.pivot_table(df, values = 'quantity', index = 'Regulated entity name', columns = 'category',
                     aggfunc = 'sum')

In [13]:
# draw horizontal bar chart of selected contaminant by facility
contaminant = 'VOCs, solvents & hydrocarbons'
overall_title = "Contaminant: " + contaminant
x_title = "Quantities (lbs)"
export_filename = path+"../6_communication/plots/contribution_facility_"+contaminant+"_summary.html"
### --------------------------------------------------------
# sort values to be displayed (descending order)
b_df = df2c[[contaminant]].dropna().sort_values(by = [contaminant], ascending = False)
### --------------------------------------------------------
trace= [go.Bar(
        y=b_df.index,
        x=b_df[contaminant],
        orientation = 'h'
        )]
layout = go.Layout(title = overall_title, 
                   hovermode = 'closest',
                   xaxis=dict(dict(title=x_title,
                                   domain=[0.25, 1.0], anchor='y1')),
                   yaxis=dict(dict(domain=[0.0, 1.0], anchor='x1')))
fig = go.Figure(data = trace, layout = layout)
pyo.iplot(fig)

# export plot
pyo.plot(fig,filename = export_filename);

In [14]:
print('Quantity of %s (lbs): %s' % (contaminant,"{:,}".format(b_df[contaminant].sum())))

Quantity of VOCs, solvents & hydrocarbons (lbs): 2,014,639.1369999996


In [15]:
b_df[contaminant].iloc[:9]

Regulated entity name
FLINT HILLS RESOURCES PORT ARTHUR FACILITY               477500.0000
CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT              277998.0000
MARATHON PETROLEUM TEXAS CITY REFINERY                   175947.5400
PASADENA TERMINAL                                        165948.7077
CHOCOLATE BAYOU PLANT                                    152594.2700
CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES    131181.3500
EQUISTAR CORPUS CHRISTI PLANT                             90950.0000
GALENA PARK TERMINAL                                      76482.0000
FORMOSA POINT COMFORT PLANT                               69814.3300
Name: VOCs, solvents & hydrocarbons, dtype: float64

In [16]:
# We consider only facilities that emit more than 100,000 pounds of contaminant
b_df[contaminant].iloc[:6].sum()/b_df[contaminant].sum()

0.68556688010970601

6 plants account for about 69% of the emissions of VOCs, solvents & hydrocarbons.
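
The same selection is repeated for each pollutant, so it could be wrapped in a small helper. The function `main_emitters` below is a hypothetical name, not part of the original notebook. It selects facilities above a threshold and returns their share of a given total. The quantities and the overall total are copied from the outputs above.

```python
import pandas as pd

def main_emitters(quantities, threshold, total=None):
    """Return the facilities emitting more than `threshold` lbs and their share of `total`."""
    if total is None:
        total = quantities.sum()
    selected = quantities[quantities > threshold].sort_values(ascending=False)
    return list(selected.index), selected.sum() / total

# Nine largest emitters of VOCs, solvents & hydrocarbons (lbs), from cell 15
vocs = pd.Series({
    "FLINT HILLS RESOURCES PORT ARTHUR FACILITY": 477500.0,
    "CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT": 277998.0,
    "MARATHON PETROLEUM TEXAS CITY REFINERY": 175947.54,
    "PASADENA TERMINAL": 165948.7077,
    "CHOCOLATE BAYOU PLANT": 152594.27,
    "CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES": 131181.35,
    "EQUISTAR CORPUS CHRISTI PLANT": 90950.0,
    "GALENA PARK TERMINAL": 76482.0,
    "FORMOSA POINT COMFORT PLANT": 69814.33,
})

# The total over all facilities is taken from the output of cell 14
names, share = main_emitters(vocs, threshold=100_000, total=2_014_639.137)
print(len(names), round(share, 3))
```

Selecting by threshold rather than by position (`iloc[:6]`) keeps the comment and the code in agreement if the underlying data changes.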

In [17]:
# make the list of main emitting facilities
most_emitting_facilities = list(b_df[contaminant].iloc[:6].index)
print(most_emitting_facilities)

['FLINT HILLS RESOURCES PORT ARTHUR FACILITY', 'CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT', 'MARATHON PETROLEUM TEXAS CITY REFINERY', 'PASADENA TERMINAL', 'CHOCOLATE BAYOU PLANT', 'CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES']


### Carbon monoxide

In [24]:
# draw horizontal bar chart of selected contaminant by facility
contaminant = 'Carbon Monoxide'
overall_title = "Contaminant: " + contaminant
x_title = "Quantities (lbs)"
export_filename = path+"../6_communication/plots/contribution_facility_"+contaminant+"_summary.html"
### --------------------------------------------------------
# sort values to be displayed (descending order)
b_df = df2c[[contaminant]].dropna().sort_values(by = [contaminant], ascending = False)
### --------------------------------------------------------
trace= [go.Bar(
        y=b_df.index,
        x=b_df[contaminant],
        orientation = 'h'
        )]
layout = go.Layout(title = overall_title, 
                   hovermode = 'closest',
                   xaxis=dict(dict(title=x_title,
                                   domain=[0.25, 1.0], anchor='y1')),
                   yaxis=dict(dict(domain=[0.0, 1.0], anchor='x1')))
fig = go.Figure(data = trace, layout = layout)
pyo.iplot(fig)

# export plot
pyo.plot(fig,filename = export_filename);

In [19]:
print('Quantity of %s (lbs): %s' % (contaminant,"{:,}".format(b_df[contaminant].sum())))

Quantity of Carbon Monoxide (lbs): 1,993,715.7523


In [20]:
b_df[contaminant].iloc[:7]

Regulated entity name
FLINT HILLS RESOURCES PORT ARTHUR FACILITY               390000.00
CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT              364966.00
VALERO PORT ARTHUR REFINERY                              191900.00
EQUISTAR CORPUS CHRISTI PLANT                            166000.00
CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES    156820.00
CHOCOLATE BAYOU PLANT                                    125249.61
MARATHON PETROLEUM TEXAS CITY REFINERY                   110000.00
Name: Carbon Monoxide, dtype: float64

In [21]:
# We consider only facilities that emit more than 100,000 pounds of contaminant
b_df[contaminant].iloc[:7].sum()/b_df[contaminant].sum()

0.75483960452430032

7 plants account for about 75% of carbon monoxide emissions.

In [22]:
# add the facilities to the list of main emitting facilities
most_emitting_facilities += list(b_df[contaminant].iloc[:7].index)
print(most_emitting_facilities)

['FLINT HILLS RESOURCES PORT ARTHUR FACILITY', 'CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT', 'MARATHON PETROLEUM TEXAS CITY REFINERY', 'PASADENA TERMINAL', 'CHOCOLATE BAYOU PLANT', 'CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES', 'FLINT HILLS RESOURCES PORT ARTHUR FACILITY', 'CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT', 'VALERO PORT ARTHUR REFINERY', 'EQUISTAR CORPUS CHRISTI PLANT', 'CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES', 'CHOCOLATE BAYOU PLANT', 'MARATHON PETROLEUM TEXAS CITY REFINERY']


### Sulfur dioxide

In [25]:
# draw horizontal bar chart of selected contaminant by facility
contaminant = 'Sulfur dioxide'
overall_title = "Contaminant: " + contaminant
x_title = "Quantities (lbs)"
export_filename = path+"../6_communication/plots/contribution_facility_"+contaminant+"_summary.html"
### --------------------------------------------------------
# sort values to be displayed (descending order)
b_df = df2c[[contaminant]].dropna().sort_values(by = [contaminant], ascending = False)
### --------------------------------------------------------
trace= [go.Bar(
        y=b_df.index,
        x=b_df[contaminant],
        orientation = 'h'
        )]
layout = go.Layout(title = overall_title, 
                   hovermode = 'closest',
                   xaxis=dict(dict(title=x_title,
                                   domain=[0.25, 1.0], anchor='y1')),
                   yaxis=dict(dict(domain=[0.0, 1.0], anchor='x1')))
fig = go.Figure(data = trace, layout = layout)
pyo.iplot(fig)

# export plot
pyo.plot(fig,filename = export_filename);

In [26]:
print('Quantity of %s (lbs): %s' % (contaminant,"{:,}".format(b_df[contaminant].sum())))

Quantity of Sulfur dioxide (lbs): 636,503.4752000001


In [27]:
b_df[contaminant].iloc[:5]

Regulated entity name
EXXON MOBIL BAYTOWN REFINERY                                 216934.3002
VALERO PORT ARTHUR REFINERY                                  147172.7200
TOTAL PETRO CHEMICALS & REFINING USA PORT ARTHUR REFINERY     67000.0000
FHR CORPUS CHRISTI WEST PLANT                                 60621.4000
SWEENY REFINERY                                               47666.0000
Name: Sulfur dioxide, dtype: float64

In [28]:
# We consider only facilities that emit more than 60,000 pounds of contaminant
b_df[contaminant].iloc[:4].sum()/b_df[contaminant].sum()

0.77254632434723269

4 plants account for about 77% of sulfur dioxide emissions.

In [29]:
# add the facilities to the list of main emitting facilities
most_emitting_facilities += list(b_df[contaminant].iloc[:4].index)
print(most_emitting_facilities)

['FLINT HILLS RESOURCES PORT ARTHUR FACILITY', 'CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT', 'MARATHON PETROLEUM TEXAS CITY REFINERY', 'PASADENA TERMINAL', 'CHOCOLATE BAYOU PLANT', 'CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES', 'FLINT HILLS RESOURCES PORT ARTHUR FACILITY', 'CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT', 'VALERO PORT ARTHUR REFINERY', 'EQUISTAR CORPUS CHRISTI PLANT', 'CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES', 'CHOCOLATE BAYOU PLANT', 'MARATHON PETROLEUM TEXAS CITY REFINERY', 'EXXON MOBIL BAYTOWN REFINERY', 'VALERO PORT ARTHUR REFINERY', 'TOTAL PETRO CHEMICALS & REFINING USA PORT ARTHUR REFINERY', 'FHR CORPUS CHRISTI WEST PLANT']


### Summary of the main emitting facilities

In [30]:
df_most_emitting_facilities  = pd.DataFrame(most_emitting_facilities)
df_most_emitting_facilities.drop_duplicates(inplace = True)
print(df_most_emitting_facilities)

                                                    0
0          FLINT HILLS RESOURCES PORT ARTHUR FACILITY
1         CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT
2              MARATHON PETROLEUM TEXAS CITY REFINERY
3                                   PASADENA TERMINAL
4                               CHOCOLATE BAYOU PLANT
5   CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FAC...
8                         VALERO PORT ARTHUR REFINERY
9                       EQUISTAR CORPUS CHRISTI PLANT
13                       EXXON MOBIL BAYTOWN REFINERY
15  TOTAL PETRO CHEMICALS & REFINING USA PORT ARTH...
16                      FHR CORPUS CHRISTI WEST PLANT
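
The gaps in the index above (0 to 5, then 8, 9, 13...) come from dropping duplicates after concatenating the three lists. As an alternative sketch, the duplicates can be removed from the plain Python list while preserving order, using `dict.fromkeys`. The three lists below are copied from the printed outputs.

```python
vocs_top = [
    "FLINT HILLS RESOURCES PORT ARTHUR FACILITY",
    "CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT",
    "MARATHON PETROLEUM TEXAS CITY REFINERY",
    "PASADENA TERMINAL",
    "CHOCOLATE BAYOU PLANT",
    "CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES",
]
co_top = [
    "FLINT HILLS RESOURCES PORT ARTHUR FACILITY",
    "CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT",
    "VALERO PORT ARTHUR REFINERY",
    "EQUISTAR CORPUS CHRISTI PLANT",
    "CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FACILITIES",
    "CHOCOLATE BAYOU PLANT",
    "MARATHON PETROLEUM TEXAS CITY REFINERY",
]
so2_top = [
    "EXXON MOBIL BAYTOWN REFINERY",
    "VALERO PORT ARTHUR REFINERY",
    "TOTAL PETRO CHEMICALS & REFINING USA PORT ARTHUR REFINERY",
    "FHR CORPUS CHRISTI WEST PLANT",
]

# dict keys are unique and keep insertion order, so this removes duplicates in place order
unique_facilities = list(dict.fromkeys(vocs_top + co_top + so2_top))
print(len(unique_facilities))
```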


In [31]:
df_most_emitting_facilities.columns = ['Regulated entity name']

In [32]:
len(df_most_emitting_facilities)

11

These 11 facilities emit most of the main pollutants (around 70%). Let's check where these facilities are located.

In [33]:
# identify the location of these facilities
df_most_emitting_facilities = pd.merge(df_most_emitting_facilities,df[['Regulated entity name','County']].drop_duplicates(), how = 'inner', on = ['Regulated entity name'])

In [34]:
print(df_most_emitting_facilities.sort_values(by=['County']))

                                Regulated entity name     County
4                               CHOCOLATE BAYOU PLANT   BRAZORIA
5   CHEVRON PHILLIPS CHEMICAL SWEENY OLD OCEAN FAC...   BRAZORIA
2              MARATHON PETROLEUM TEXAS CITY REFINERY  GALVESTON
1         CHEVRON PHILLIPS CHEMICAL CEDAR BAYOU PLANT     HARRIS
3                                   PASADENA TERMINAL     HARRIS
8                        EXXON MOBIL BAYTOWN REFINERY     HARRIS
0          FLINT HILLS RESOURCES PORT ARTHUR FACILITY  JEFFERSON
6                         VALERO PORT ARTHUR REFINERY  JEFFERSON
9   TOTAL PETRO CHEMICALS & REFINING USA PORT ARTH...  JEFFERSON
7                       EQUISTAR CORPUS CHRISTI PLANT     NUECES
10                      FHR CORPUS CHRISTI WEST PLANT     NUECES


The 11 highest-emitting facilities are located in 5 counties, as summarised in the map below.
![title](img/most_emitting_facilities.png)

## Contaminant emissions by facility

In [35]:
# assess quantities per contaminant type and per facility
df3 = pd.pivot_table(df, values = 'quantity', index = 'category', columns = 'Regulated entity name',
                     aggfunc = 'sum')

In [36]:
# draw horizontal bar chart of pollutants emitted by facility
# select the facility
facility = 'ARKEMA CROSBY PLANT' #df3.columns[0]
### --------------------------------------------------------
overall_title = "Contaminants - "+ facility
x_title = "Quantities (lbs)"
### --------------------------------------------------------
# select only emitted contaminants to be displayed
b_df = df3[[facility]].dropna().sort_values(by = [facility], ascending = False)
### --------------------------------------------------------
trace= [go.Bar(
        y=b_df.index,
        x=b_df[facility],
        orientation = 'h'
        )]
layout = go.Layout(title = overall_title, 
                   hovermode = 'closest',
                   xaxis=dict(dict(title=x_title,
                                   domain=[0.25, 1.0], anchor='y1')),
                   yaxis=dict(dict(domain=[0.0, 1.0], anchor='x1')))
fig = go.Figure(data = trace, layout = layout)
pyo.iplot(fig)

In [37]:
# to create the plot for each facility
for facility in df3.columns:
#for facility in df3.columns[:3]: #select only the first 3 facilities
    ### --------------------------------------------------------
    overall_title = "Contaminants - "+ facility
    x_title = "Quantities (lbs)"
    export_filename = path+"../6_communication/plots/by_contaminant_facility_"+facility+".html"
    ### --------------------------------------------------------
    # select only emitted contaminants to be displayed
    b_df = df3[[facility]].dropna().sort_values(by = [facility], ascending = False)
    ### --------------------------------------------------------
    trace= [go.Bar(
            y=b_df.index,
            x=b_df[facility],
            orientation = 'h'
            )]
    layout = go.Layout(title = overall_title, 
                       hovermode = 'closest',
                       xaxis=dict(dict(title=x_title,
                                       domain=[0.25, 1.0], anchor='y1')),
                       yaxis=dict(dict(domain=[0.0, 1.0], anchor='x1')))
    fig = go.Figure(data = trace, layout = layout)
    pyo.plot(fig,filename =  export_filename)
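
The export filenames above embed the raw facility or contaminant name, which can contain spaces, commas and `&`. If such names cause trouble on a given file system or web server, a small helper could normalise them first. `safe_filename` below is a hypothetical helper, not part of the original notebook, and the replacement rule is only one possible choice.

```python
import re

def safe_filename(label):
    """Replace every run of characters other than letters, digits, '-' or '_' with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", label)
    return cleaned.strip("_")

print(safe_filename("VOCs, solvents & hydrocarbons"))
print(safe_filename("TOTAL PETRO CHEMICALS & REFINING USA PORT ARTHUR REFINERY"))
```

It would be applied when building the path, for example `"by_contaminant_facility_" + safe_filename(facility) + ".html"`.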

# 4. Conclusion
We've reviewed and analysed the emissions of contaminants from petrochemical facilities impacted by Hurricane Harvey. Although the data had already been scraped and pre-processed by the BuzzFeed News team, the list of contaminants needed further treatment before meaningful insights about the main pollutants could be extracted. This took up a large share of the time I dedicated to this project.

With more time available, what could be further addressed includes:
* Check levels of Harvey-related emissions against pollution exceedance thresholds.
* Compare Harvey-related pollution to the baseline pollution for each type of contaminant.
* Assess emissions of the main contaminants over time, as this study only considers the whole period. We could check, for example, whether emissions were highest close to Harvey's landfall and more spread out once startups occurred.
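
As a starting point for the last item, the sketch below groups quantities by the day each event began. It uses the five rows shown by `df.head()` at the top of the notebook. Assigning an event's whole quantity to its start date is a simplifying assumption, since many events last several days.

```python
import pandas as pd

# The five rows displayed by df.head(), reduced to the two columns needed here
events = pd.DataFrame({
    "Event began": [
        "2017-08-27 00:00:00",
        "2017-08-29 11:50:00",
        "2017-09-08 08:00:00",
        "2017-09-08 08:00:00",
        "2017-09-01 05:00:00",
    ],
    "quantity": [244040.0, 240000.0, 150000.0, 150000.0, 121000.0],
})

# Parse the timestamps, then sum quantities per calendar day of event start
events["Event began"] = pd.to_datetime(events["Event began"])
daily = events.groupby(events["Event began"].dt.normalize())["quantity"].sum()
print(daily)
```

A more faithful version would spread each quantity over the interval between "Event began" and "Event ended".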