# Overview of the EIE Analytics Challenge

**DS4G: Environmental Insights Explorer**
 * Exploring alternatives for emissions factor calculations

Current [emissions factors methodologies](https://www.epa.gov/air-emissions-factors-and-quantification/basic-information-air-emissions-factors-and-quantification#About%20Emissions%20Factors) are based on time-consuming data collection and may include errors derived from a lack of access to granular datasets, an inability to refresh data frequently, overly general modeling assumptions, and inaccurate reporting of emissions sources like fuel consumption. This raises the question: What if there were a different way to calculate or measure emissions factors? We’re challenging the Kaggle community to see if it’s possible to use remote sensing techniques to better model emissions factors. You will develop a methodology to calculate an [average historical emissions factor](https://www.eia.gov/tools/faqs/faq.php?id=74&t=11) for electricity generation in a sub-national region.

Participants will be tasked with developing a methodology to calculate an average annual historical emissions factor for the sub-national region. Participants will also be asked to explain the conditions that would result in a higher or lower emissions factor, and to recommend how the methodology could be applied to calculate the emissions factor of electricity for another geospatial area using similar techniques. Bonus points will be awarded for smaller time slices of the average historical emissions factors, such as one per month for the 12-month period, and additional bonus points will be awarded to participants who develop methodologies for calculating [marginal emissions factors](https://www.bloomenergy.com/sites/default/files/watttime_the_rocky_mountain_institute.pdf) for the sub-national region.

[The general equation for emissions estimation is:](https://www.epa.gov/air-emissions-factors-and-quantification/basic-information-air-emissions-factors-and-quantification#About%20Emissions%20Factors)
# E = A x EF x (1-ER/100)

where:

* E = emissions
* A = activity rate
* EF = emission factor
* ER = overall emission reduction efficiency, %

therefore

# EF = E / [A x (1-ER/100)]


To simplify things a bit, I'll assume no emission reduction (ER = 0), which reduces that equation to: EF = E / A
* Simplified Emissions Factor = Emissions / Activity Rate

For this notebook, that can be simplified again to the following:
* Simplified Emissions Factor = (Measure of NO2 emissions) / (Quantity of electricity generated)
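
As a quick numerical check of the algebra above, the sketch below implements the general equation and its inverse. The function names and the sample numbers are illustrative assumptions, not values from the competition data.

```python
def emissions(activity_rate, ef, reduction_efficiency_pct=0.0):
    """General equation: E = A x EF x (1 - ER/100)."""
    return activity_rate * ef * (1 - reduction_efficiency_pct / 100)


def emission_factor(emissions_value, activity_rate, reduction_efficiency_pct=0.0):
    """Inverse: EF = E / [A x (1 - ER/100)]. With ER = 0 this is simply E / A."""
    denominator = activity_rate * (1 - reduction_efficiency_pct / 100)
    if denominator == 0:
        raise ValueError("activity rate must be non-zero and ER must be below 100%")
    return emissions_value / denominator


# Made-up numbers, purely to exercise the formulas
e = emissions(activity_rate=200.0, ef=0.5, reduction_efficiency_pct=20.0)
recovered_ef = emission_factor(e, activity_rate=200.0, reduction_efficiency_pct=20.0)
simplified_ef = emission_factor(e, activity_rate=200.0)  # ignores ER, i.e. E / A
print(e, recovered_ef, simplified_ef)
```

Note that dropping ER changes the result whenever ER is non-zero: the simplified factor describes emissions after controls, not the uncontrolled factor.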



**We can find a "measure of NO2 emissions" in the Sentinel-5p dataset. Likewise, we can find a "quantity of electricity generated" in the power plant dataset. To demonstrate how to get started with the data, we will load and preview those two data sources, and then we will calculate a simplified emissions factor for a single power plant on the island of Vieques in Puerto Rico.**

# Preview the power plant data:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import rasterio as rio
import folium
import tifffile as tiff 

        
def plot_points_on_map(dataframe,begin_index,end_index,latitude_column,latitude_value,longitude_column,longitude_value,zoom):
    # Plot one marker per row of dataframe[begin_index:end_index] on a map centered at the given point
    df = dataframe[begin_index:end_index]
    location = [latitude_value,longitude_value]
    plot = folium.Map(location=location,zoom_start=zoom)
    for i in range(0,len(df)):
        # Use the fuel type itself as the popup text, not the repr of a one-row Series
        popup = folium.Popup(str(df['primary_fuel'].iloc[i]))
        folium.Marker([df[latitude_column].iloc[i],df[longitude_column].iloc[i]],popup=popup).add_to(plot)
    return plot

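def overlay_image_on_puerto_rico(file_name,band_layer,lat,lon,zoom):
    # Read a single band and close the file handle when done
    with rio.open(file_name) as src:
        band = src.read(band_layer)
    m = folium.Map([lat, lon], zoom_start=zoom)
    folium.raster_layers.ImageOverlay(
        image=band,
        # Two opposite corners of Puerto Rico as [latitude, longitude]
        bounds=[[18.6,-67.3],[17.9,-65.2]],
        # Draw each pixel in red, with its value as the opacity
        colormap=lambda x: (1, 0, 0, x),
    ).add_to(m)
    return m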

def split_column_into_new_columns(dataframe,column_to_split,new_column_one,begin_column_one,end_column_one):
    # Copy characters [begin_column_one:end_column_one] of each string into a new column
    dataframe[new_column_one] = dataframe[column_to_split].str[begin_column_one:end_column_one]
    return dataframe

In [None]:
power_plants = pd.read_csv('/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gppd/gppd_120_pr.csv')
# The '.geo' column stores each plant's coordinates as text; slice out the latitude and longitude substrings
power_plants = split_column_into_new_columns(power_plants,'.geo','latitude',50,66)
power_plants = split_column_into_new_columns(power_plants,'.geo','longitude',31,48)
power_plants['latitude'] = power_plants['latitude'].astype(float)
# Some sliced latitudes come out as 8.x instead of 18.x; add 10 to correct them
a = power_plants['latitude'].to_numpy()
power_plants['latitude'] = np.where(a < 10, a + 10, a)
lat=18.200178; lon=-66.664513
plot_points_on_map(power_plants,0,425,'latitude',lat,'longitude',lon,9)
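
The fixed character offsets used above are fragile, as the 8-versus-18 latitude correction shows. A minimal alternative sketch follows, assuming the `.geo` column holds GeoJSON `Point` strings of the form `{"type":"Point","coordinates":[lon,lat]}`; that format is an assumption inferred from the slicing offsets, and `add_lat_lon_from_geo` is a hypothetical helper name.

```python
import json

import pandas as pd


def add_lat_lon_from_geo(dataframe, geo_column='.geo'):
    """Return a copy with float 'latitude' and 'longitude' columns parsed from GeoJSON text."""
    # GeoJSON stores coordinates as [longitude, latitude]
    coords = [json.loads(text)['coordinates'] for text in dataframe[geo_column]]
    result = dataframe.copy()
    result['longitude'] = [float(c[0]) for c in coords]
    result['latitude'] = [float(c[1]) for c in coords]
    return result


# Hypothetical one-row example; with the real data you would pass power_plants instead
example = pd.DataFrame({'.geo': ['{"type":"Point","coordinates":[-65.4440010699994,18.1429005246921]}']})
parsed = add_lat_lon_from_geo(example)
print(parsed[['latitude', 'longitude']])
```

Parsing the JSON makes both columns numeric and avoids any dependence on how many digits each coordinate has.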

# Preview the NO2 emissions data:

In [None]:
image = '/kaggle/input/ds4g-environmental-insights-explorer/eie_data/s5p_no2/s5p_no2_20180708T172237_20180714T190743.tif'
latitude=18.1429005246921; longitude=-65.4440010699994
overlay_image_on_puerto_rico(image,band_layer=1,lat=latitude,lon=longitude,zoom=8)

# Zoom in on an individual power plant:

In [None]:
lat=18.1429005246921; lon=-65.4440010699994
plot_points_on_map(power_plants,0,425,'latitude',lat,'longitude',lon,12)

In [None]:
image = '/kaggle/input/ds4g-environmental-insights-explorer/eie_data/s5p_no2/s5p_no2_20180708T172237_20180714T190743.tif'
latitude =18.1429005246921; longitude =-65.4440010699994
overlay_image_on_puerto_rico(image,band_layer=1,lat=latitude,lon=longitude,zoom=12)

**Look up the estimated_generation_gwh value for the power plant on the island of Vieques:**

In [None]:
power_plants_df = power_plants.sort_values('capacity_mw',ascending=False).reset_index()
# Row 29 of the capacity-sorted table is the plant on Vieques.
# Print it explicitly, since a notebook only auto-displays the last expression in a cell.
print(power_plants_df[['name','latitude','longitude','primary_fuel','capacity_mw','estimated_generation_gwh']][29:30])
quantity_of_electricity_generated = power_plants_df['estimated_generation_gwh'][29:30].values
print('Quantity of Electricity Generated: ', quantity_of_electricity_generated)
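
Selecting the plant by a hard-coded row position breaks if the table is re-sorted or filtered. One possible alternative, sketched here under the assumption that the `latitude` and `longitude` columns can be converted to floats, is to pick the plant closest to the Vieques coordinates used earlier. `nearest_plant` and the sample table are hypothetical.

```python
import numpy as np
import pandas as pd


def nearest_plant(plants, lat, lon, lat_col='latitude', lon_col='longitude'):
    """Return the row whose coordinates are closest to (lat, lon).

    Uses a flat-earth approximation, which is adequate over an island-sized area.
    """
    dlat = plants[lat_col].astype(float) - lat
    # Shrink longitude differences by cos(latitude) so both axes are comparable
    dlon = (plants[lon_col].astype(float) - lon) * np.cos(np.radians(lat))
    distance_sq = dlat ** 2 + dlon ** 2
    return plants.loc[distance_sq.idxmin()]


# Hypothetical table; with the real data you would pass power_plants
sample = pd.DataFrame({
    'name': ['Plant A', 'Plant B', 'Plant C'],
    'latitude': [18.20, 18.14, 18.45],
    'longitude': [-66.66, -65.44, -66.10],
    'estimated_generation_gwh': [100.0, 5.0, 50.0],
})
closest = nearest_plant(sample, lat=18.1429005246921, lon=-65.4440010699994)
print(closest['name'], closest['estimated_generation_gwh'])
```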

**Look up the NO2_emissions value for the region surrounding that same power plant:**

In [None]:
# This is just an example to illustrate that you can extract numerical values from .tiff files
# Ideally you would limit to only the bands that are related to NO2 emissions
# Likewise you might want to limit the data to only the region of interest
average_no2_emission = [np.average(tiff.imread(image))]
print('Average NO2 emissions value: ', average_no2_emission)
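
The comments in the cell above suggest limiting the average to a region of interest. A rough sketch of that idea follows, assuming the band is a north-up, regularly spaced grid that exactly spans the bounding box passed to `overlay_image_on_puerto_rico`; this is an assumption about the raster layout, not something read from the file metadata. `mean_near_point` is a hypothetical helper, and the array is synthetic.

```python
import numpy as np


def mean_near_point(band, bounds, lat, lon, half_width_deg=0.1):
    """Mean of the pixels whose centers lie within half_width_deg of (lat, lon).

    band   : 2D array, row 0 at the northern edge, column 0 at the western edge
    bounds : (north, west, south, east) in degrees
    """
    north, west, south, east = bounds
    n_rows, n_cols = band.shape
    # Latitude and longitude of each pixel center
    row_lats = north - (np.arange(n_rows) + 0.5) * (north - south) / n_rows
    col_lons = west + (np.arange(n_cols) + 0.5) * (east - west) / n_cols
    rows = np.abs(row_lats - lat) <= half_width_deg
    cols = np.abs(col_lons - lon) <= half_width_deg
    window = band[np.ix_(rows, cols)]
    if window.size == 0:
        raise ValueError("no pixels fall inside the requested box")
    # nanmean skips missing pixels (e.g. cloud-masked values stored as NaN)
    return float(np.nanmean(window))


# Synthetic band with an elevated patch around the Vieques coordinates
pr_bounds = (18.6, -67.3, 17.9, -65.2)
synthetic_band = np.zeros((70, 210))
synthetic_band[40:50, 180:190] = 5.0
local_mean = mean_near_point(synthetic_band, pr_bounds, 18.1429005246921, -65.4440010699994, half_width_deg=0.03)
print(local_mean, synthetic_band.mean())
```

With the real file you would pass a single band, for example the array returned by `rio.open(image).read(1)`, instead of the synthetic one.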

# Calculate a simplified emissions factor
* Simplified Emissions Factor = (Measure of NO2 emissions) / (Quantity of electricity generated)


In [None]:
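# Both inputs are one-element sequences: divide element-wise, then extract the single value as a Python float
simplified_emissions_factor = (np.asarray(average_no2_emission) / np.asarray(quantity_of_electricity_generated)).item()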
print('Simplified emissions factor (S.E.F.) for a single power plant on the island of Vieques =  \n\n', simplified_emissions_factor, 'S.E.F. units')

To win the Kaggle competition, you should do this with more rigor and fewer simplifications, and you should also explain why you think your methodology is accurate.

# HOW TO PARTICIPATE

To make a submission, complete the [submission form](https://www.kaggle.com/page/environmental-insights-explorer-submission-form). Only one submission will be judged per participant, so if you make multiple submissions we will only review the most recent entry.

**To be valid, a submission must be contained in one notebook, made public on or before the submission deadline. Participants are free to use any datasets in addition to the [official Kaggle dataset](https://www.kaggle.com/c/ds4g-environmental-insights-explorer), but those datasets must also be publicly available on either Earth Engine or Kaggle for the submission to be valid.**

# SUBMISSIONS WILL BE EVALUATED ON THE FOLLOWING

**Documentation** 

* Is the code documented in a way that is easily reproducible (i.e. thorough comments, organized notebook, clear scripts/code)?
* Does the notebook narrative clearly state all assumptions that are factored into the value and the potential impact of fluctuations? (i.e. Which plants / fuel types did the author use for their analysis, and why?)
* Does the notebook contain data visualizations (e.g. time graphs, etc.) that help convey the author’s findings and/or recommendations? 
* Did the author upload and properly cite all files for any supporting datasets that were used for their analysis?


**Recommendation**

* Did the author write a compelling and coherent narrative explaining their rationale for the scalability and accuracy of their model and recommendation?
* Does the recommendation include an explanation of what data/assumptions could be substituted to produce the value for another geospatial area or location?
* Is there documentation about the pros and cons of the model, and the geographic nuances that may have impacted the emissions factor?
* Does the explanation convey why/how this model improves current emissions factors calculations? 
* Does the recommendation indicate other datasets/factors/assumptions (and why) that could be useful to include to make future emissions factor calculation methodologies more robust?


**Accuracy**

* Does the model produce a value for an annual average historical grid-level electricity emissions factor (based on rolling 12-months of data from July 2018 - July 2019) for the sub-national region?
* Bonus points for smaller time slices of the average historical emissions factors, such as one per month for the 12-month period.
* Bonus points for participants that develop a methodology to calculate a marginal emissions factor for the sub-national region using the provided datasets. 
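
For the monthly time slices mentioned above, one possible starting point is to group the Sentinel-5P files by month using the timestamps in their file names. The sketch below assumes the first timestamp in a name such as `s5p_no2_20180708T172237_20180714T190743.tif` marks the start of the observation window; that reading of the name is an assumption. The helper names and the NO2 values are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches names like s5p_no2_20180708T172237_20180714T190743.tif
FILENAME_PATTERN = re.compile(r's5p_no2_(\d{8})T\d{6}_(\d{8})T\d{6}\.tif$')


def start_month(file_name):
    """Return 'YYYY-MM' for the first timestamp in an S5P NO2 file name."""
    match = FILENAME_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")
    return datetime.strptime(match.group(1), '%Y%m%d').strftime('%Y-%m')


def monthly_means(values_by_file):
    """Average per-file values (dict of file name -> number) within each month."""
    grouped = defaultdict(list)
    for file_name, value in values_by_file.items():
        grouped[start_month(file_name)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(grouped.items())}


# Made-up per-file NO2 averages, only to show the grouping
values = {
    's5p_no2_20180708T172237_20180714T190743.tif': 2.0,
    's5p_no2_20180715T171000_20180721T185000.tif': 4.0,
    's5p_no2_20180801T170000_20180807T184000.tif': 6.0,
}
by_month = monthly_means(values)
print(by_month)
```

Each monthly NO2 average could then be divided by that month's share of generation to get a monthly simplified emissions factor, subject to the same caveats as the annual one.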


# PRIZES

* First Prize: USD 10,000
* Second Prize: USD 7,500
* Third Prize: USD 5,000
* Fourth Prize: USD 2,500



**More getting started material is available here:**
* [Explore Image Metadata](https://www.kaggle.com/paultimothymooney/explore-image-metadata-s5p-gfs-gldas)
 - bounding boxes, band names, etc
* [How to get started with the Earth Engine data](https://www.kaggle.com/paultimothymooney/how-to-get-started-with-the-earth-engine-data/) 
 - connect to earthengine-API, load and preview data, etc