This notebook is part of the solution for the DSG: EIE competition. The solution is split into 4 parts. Here is the list of notebooks in the correct order. The part you are currently reading is highlighted in bold.

[1. Introduction to the solution of DSG: EIE](https://www.kaggle.com/niyamatalmass/1-introduction-to-the-solution-of-dsg-eie)

[**2. Sub-region and State wise E.F and Evaluation**](https://www.kaggle.com/niyamatalmass/2-sub-region-and-state-wise-e-f-and-evaluation)

[3. Individual Power plant E.F and Evaluation](https://www.kaggle.com/niyamatalmass/3-individual-power-plant-e-f-and-evaluation)

[4. Final thoughts, recommendation](https://www.kaggle.com/niyamatalmass/4-final-thoughts-recommendation)
***
<br/>

<h1 align="center"><font color="#5831bc" face="Comic Sans MS">Sub-region and State wise E.F and Evaluation</font></h1> 

# Notebook overview
This notebook implements the methodology for calculating the emission factor of power generation at the sub-region and state level. The first notebook of my solution describes the basic theory behind the methodology. In this notebook, I describe the theory in more detail and implement it. After that, I evaluate the results and see how our model performs on a real-world example. 

In [None]:
##### 
# importing necessary libraries
####

import numpy as np
import math
import pandas as pd
from scipy.ndimage import gaussian_filter 
import glob 
import os
import time
from tqdm import tqdm_notebook as tqdm

import geopandas 
import rasterio as rio
import folium
import tifffile as tiff

import ee
from kaggle_secrets import UserSecretsClient
from google.oauth2.credentials import Credentials

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)


import matplotlib.pyplot as plt
%matplotlib inline

# Abbreviation
* GEE - Google Earth Engine
* TVCD - Tropospheric vertical column density
* E.F - Emission Factor 
* AOI - Area Of Interest

# Earth engine overview and why?
We will use Google Earth Engine (GEE) extensively for our research and implementation, because it is one of the most advanced cloud-based geospatial processing platforms available. The purpose of Earth Engine is to:

* Perform highly-interactive algorithm development at a global scale
* Push the edge of the envelope for big data in remote sensing
* Enable high-impact, data-driven science
* Make substantive progress on global challenges that involve large geospatial datasets

Because our Kaggle kernel doesn't have enough compute resources, we will rely on the GEE backend for our analysis. Some background in GEE will help in understanding the technical details and implementation, but I will try my best to explain the important parts. First, we have to authenticate and initialize the GEE Python API. 

In [None]:
# Trigger the authentication flow. 
# ee.Authenticate()

In [None]:
# !cat ~/.config/earthengine/credentials

In [None]:
# initializing earth engine

user_secret = "earth_engine_3" # Your user secret, defined in the add-on menu of the notebook editor
refresh_token = UserSecretsClient().get_secret(user_secret)
credentials = Credentials(
        None,
        refresh_token=refresh_token,
        token_uri=ee.oauth.TOKEN_URI,
        client_id=ee.oauth.CLIENT_ID,
        client_secret=ee.oauth.CLIENT_SECRET,
        scopes=ee.oauth.SCOPES)
ee.Initialize(credentials=credentials) 

In [None]:
band_viz = {
  'min': 0,
  'max': 0.0002,
  'palette': ['black', 'blue', 'purple', 'cyan', 'green', 'yellow', 'red']
}

# Define a method for displaying Earth Engine image tiles on a folium map.
def add_ee_layer(self, ee_object, vis_params, name):
    
    try:    
        # display ee.Image()
        if isinstance(ee_object, ee.image.Image):    
            map_id_dict = ee.Image(ee_object).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.ImageCollection()
        elif isinstance(ee_object, ee.imagecollection.ImageCollection):    
            ee_object_new = ee_object.mosaic()
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.Geometry()
        elif isinstance(ee_object, ee.geometry.Geometry):    
            folium.GeoJson(
            data = ee_object.getInfo(),
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
        # display ee.FeatureCollection()
        elif isinstance(ee_object, ee.featurecollection.FeatureCollection):  
            ee_object_new = ee.Image().paint(ee_object, 0, 2)
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
        # display ee.Feature()
        elif isinstance(ee_object, ee.feature.Feature):  
            ee_object_new = ee.Image().paint(ee_object, 0, 2)
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
    
    except Exception as error:
        print("Could not display {}: {}".format(name, error))
    
def plot_ee_data_on_map(dataset,minimum_value,maximum_value,latitude,longitude,zoom):
    # https://github.com/google/earthengine-api/blob/master/python/examples/ipynb/ee-api-colab-setup.ipynb
    folium.Map.add_ee_layer = add_ee_layer
    vis_params = {
      'min': minimum_value,
      'max': maximum_value,
      'palette': ['black', 'blue', 'purple', 'cyan', 'green', 'yellow', 'red']}
    my_map = folium.Map(location=[latitude,longitude], zoom_start=zoom, height=500)
    my_map.add_ee_layer(dataset, vis_params, 'Color')
    my_map.add_child(folium.LayerControl())
    display(my_map)

# Methodology for calculating E.F for sub-region and states
Overview: For every power plant, we draw a circle centred on the plant with a radius proportional to its annual electricity generation (GWh). We then mask out every satellite pixel outside these circles, so that only the area around each power plant remains. Next, we sum all NO2 using a weighted approach, where the weight depends on how far a pixel is from the centre (the power plant). This is just an overview; below we break down each step with a clear explanation of our method. We are going to calculate the emission and emission factor for Puerto Rico, and from that we will learn how this methodology works.

1. [Importing and filtering satellite images](#importing_satellite_images)
2. [Importing power plant database, filter them and calculate AOI](#calculate_aoi)
3. [Convert daily scattered image tiles to daily mosaic](#daily_mosaic)
4. [Unit conversion](#unit_convert)
5. [Building a yearly composite of AOI](#composite_aoi)
6. [Finally, calculate total NO2 for a region using weighted reductions](#finally_weighted_reduction)

<a id='importing_satellite_images'></a>
### 1. Importing and filtering satellite images
Our first step is to import the Sentinel-5P dataset, select the TVCD band and filter the bounds to Puerto Rico. 

Why TVCD (Tropospheric Vertical Column Density)? Sentinel-5P provides daily global coverage, and the troposphere is the lowest layer of the atmosphere, where all weather takes place. Because NO2 has a very short lifetime, the tropospheric column is the ideal band for our research.

In [None]:
peurto_rico_state_code = ['72'] # Puerto Rico state FIPS code

# import TIGER state boundary data to get the state boundary
peurto_rico_geometry = ee.FeatureCollection("TIGER/2018/States")\
.filter(ee.Filter.inList('STATEFP', peurto_rico_state_code))

# import Sentinel-5P dataset, select the TVCD band and filter bounds (Puerto Rico)
no2_satellite_data_for_peurto_rico = ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_NO2')\
    .select('tropospheric_NO2_column_number_density')\
    .filterBounds(peurto_rico_geometry.geometry())

<a id='calculate_aoi'></a>
### 2. Importing power plant database, filter them and calculate AOI
We want to exclude all the pixels that don't fall in our AOI. But how do we calculate our AOI? We take each power plant and draw a circle around it whose radius is proportional to the plant's annual electricity generation (GWh). We do that for all power plants in the desired states and exclude the pixels that fall outside the AOI.

First, we import the power plant database from GEE, where it is stored as a FeatureCollection, and keep only the power plants located in the USA. Each power plant is stored as a point holding its latitude and longitude. We use that point to draw a circle of relative radius around the power plant, and we do that for each plant in the USA. Then we paint these circles into a mask image, which we later apply to the NO2 image to exclude the pixels outside our AOI. 
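
To make the radius rule concrete, here is a minimal plain-Python sketch of the arithmetic behind the circle radius. It assumes the buffer distance is interpreted in metres (the Earth Engine default when no projection is given). The helper name `buffer_radius_m` and the example generation values are hypothetical and are not taken from the power plant database.

```python
import math

RADIUS_FACTOR = 1.1  # same multiplier as used in calc_buffer


def buffer_radius_m(gwh_2016: float) -> float:
    """Radius of the AOI circle, in metres, for a plant's 2016 generation in GWh."""
    return gwh_2016 * RADIUS_FACTOR


# hypothetical generation values (GWh), for illustration only
for gwh in (100.0, 1000.0, 5000.0):
    radius = buffer_radius_m(gwh)
    area_km2 = math.pi * radius ** 2 / 1e6
    print(f"{gwh:>7.0f} GWh -> radius {radius:>6.0f} m, area {area_km2:.2f} km^2")
```

Under this rule, a plant with little recorded generation gets a very small circle, so it contributes only a small area to the AOI.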

In [None]:
# import power plant database from GEE
power_plant_feature_collection = ee.FeatureCollection("WRI/GPPD/power_plants")

# filter USA power plant only
country_filter = ee.Filter.eq('country', 'USA') 
power_plant_feature_collection = power_plant_feature_collection.filter(country_filter)

def calc_buffer(feature):
    """
    Calculate the circle around a power plant using a radius relative to its
    generation, and return a feature with the updated geometry.
    """
    keepProperties = ['country','country_lg','name','gppd_idnr','capacitymw',
                      'latitude','longitude','fuel1','fuel2','fuel3','fuel4','comm_year',
                      'owner','source','url','src_latlon','cap_year','gwh_2013',
                      'gwh_2014','gwh_2015','gwh_2016','gwh_estimt']
    buffer_amount = ee.Number(feature.get('gwh_2016')).multiply(1.1) # circle radius
     # create that circle using buffer
    buffer = feature.geometry().buffer(buffer_amount, maxError=200)
    return ee.Feature(buffer).copyProperties(feature, keepProperties)

# apply the above function to create circle around each power plant of USA
power_plant_feature_collection_buffer = power_plant_feature_collection.map(
    lambda feature: calc_buffer(feature))

# create the mask for excluding pixels that don't fall in the circle
pp_masks = ee.Image().toByte().paint(power_plant_feature_collection_buffer, 1)
pp_masks = pp_masks.updateMask(pp_masks)

In [None]:
# let's plot the mask
minimum_value = 0
maximum_value = 0.0002
latitude = 37.5010
longitude = -122.1899
zoom =4.3
plot_ee_data_on_map(pp_masks, minimum_value, maximum_value, latitude, longitude, zoom)

On the map, our AOIs are shown in red; these are the areas where we will calculate NO2. 

<a id='daily_mosaic'></a>
### 3. Convert daily scattered image tiles to daily mosaic

Sentinel-5P images come as separate tiles, and sometimes a single location is covered by multiple tiles. We will use the Earth Engine mosaic feature to turn all tiles into a single image for each day, using Puerto Rico as an example. We do that for every day from 2018-06-28 to 2019-06-28, generating up to 365 images (one for each day that has data). 
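
As a quick sanity check on the number of daily windows, the date arithmetic can be sketched in plain Python. This is only an illustration of the day counting, not the GEE implementation; the name `daily_windows` is hypothetical, and the end date is treated as exclusive, which is how `filterDate` treats its end argument.

```python
from datetime import date, timedelta

start = date(2018, 6, 28)
finish = date(2019, 6, 28)  # exclusive end, as in filterDate

n_days = (finish - start).days
# one (window_start, window_end) pair per day
daily_windows = [
    (start + timedelta(days=i), start + timedelta(days=i + 1))
    for i in range(n_days)
]

print(n_days)             # number of daily windows
print(daily_windows[0])   # first window
print(daily_windows[-1])  # last window
```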

In [None]:
# yearly first date and last date 
start = ee.Date('2018-06-28')
finish = ee.Date('2019-06-28')

diff = finish.difference(start, 'day') # 365 days 
range_date = ee.List.sequence(0, diff.subtract(1)).map(lambda day: start.advance(day,'day'))

def day_mosaics(date, newlist):
    """
    Convert the daily image tiles of a region into a single mosaic image.
    """
    date = ee.Date(date)
    newlist = ee.List(newlist)
    filtered = no2_satellite_data_for_peurto_rico.filterDate(date, date.advance(1,'day'))
    image = ee.Image(filtered.mosaic())
    return ee.List(ee.Algorithms.If(filtered.size(), newlist.add(image), newlist))

# generate one mosaic image per day of the year
no2_satellite_data_for_peurto_rico = ee.ImageCollection(ee.List(range_date.iterate(day_mosaics,
                                                                                   ee.List([]))))

<a id='unit_convert'></a>
### 4. Unit conversion
Sentinel-5P data comes in mol/m^2. We want to convert that to a locally used unit such as tons or lb. For now, let's convert mol/m^2 to tons. But how? Let's understand the unit first. 

Consider a single pixel. A Sentinel-5P pixel covers 7 x 3.5 km^2 (24,500,000 m^2), and the data gives a value for each pixel (e.g. a pixel value of 0.89 mol/m^2). Multiplying that value by the pixel area cancels the m^2 and leaves mol. Next, we convert mol to grams by multiplying by 46 (the molar mass of NO2 in g/mol), and grams to tons by multiplying by a gram-to-ton factor. We apply this process to every pixel in the Sentinel-5P images, which gives us the NO2 amount in tons.

This conversion method was inspired by [a. this paper](http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S0187-62362018000300189), [b. question 1](https://gis.stackexchange.com/questions/353368/correlating-mol-m2-with-ton-and-lb), [c. question 2](https://earthscience.stackexchange.com/questions/19444/how-to-convert-mol-m2-to-total-mass-e-g-gram-kg-etc)
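
The conversion chain can be checked on a single number before applying it to whole images. The sketch below is plain Python and uses the same constants as this notebook; the function name `mol_per_m2_to_tons` and the example pixel value are hypothetical.

```python
PIXEL_AREA_M2 = 7_000 * 3_500   # 7 km x 3.5 km = 24,500,000 m^2
NO2_MOLAR_MASS_G_PER_MOL = 46   # value used in this notebook
GRAM_TO_US_TON = 1.10231e-6     # grams -> US short tons


def mol_per_m2_to_tons(column_density: float) -> float:
    """Convert a pixel value in mol/m^2 to tons of NO2 for that pixel."""
    mol = column_density * PIXEL_AREA_M2
    gram = mol * NO2_MOLAR_MASS_G_PER_MOL
    return gram * GRAM_TO_US_TON


# hypothetical pixel value, for illustration only
example_value = 0.0001  # mol/m^2
print(mol_per_m2_to_tons(example_value))  # roughly 0.124 tons
```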

In [None]:
# multiplication factor for converting grams to US (short) tons
us_ton_conve_mult_factor = (1.10231e-6) 
no2_satellite_data_for_peurto_rico = no2_satellite_data_for_peurto_rico.map(
    lambda image: image.multiply(24500000)) # convert mol/m^2 to mol
no2_satellite_data_for_peurto_rico = no2_satellite_data_for_peurto_rico.map(
    lambda image: image.multiply(46)) # convert mol no2 to gram no2
no2_satellite_data_for_peurto_rico = no2_satellite_data_for_peurto_rico.map(
    lambda image: image.multiply(us_ton_conve_mult_factor)) # convert gram to ton

<a id='composite_aoi'></a>
### 5. Building a yearly composite of AOI
Now we want to combine those 365 images into one image using the composite features of Earth Engine, where each pixel is the yearly sum of NO2. Then, from that image, we exclude the pixels that don't fall in our AOI. 

In [None]:
# convert 365 images to one image composite using ee.Reducer.sum()
no2_satellite_data_for_peurto_rico = \
no2_satellite_data_for_peurto_rico.reduce(ee.Reducer.sum())

# exclude pixels that don't fall in the AOIs
no2_satellite_data_for_peurto_rico = no2_satellite_data_for_peurto_rico.updateMask(pp_masks)

<a id='finally_weighted_reduction'></a>
### 6. Finally, calculate total NO2 from power plant using weighted reduction
Finally, we calculate the total NO2 of Puerto Rico. We are not doing a plain reduction (sum), however; we use a weighted reduction with the distance from the power plant as the weight. Calculating a sum reduction over an AOI is easy, but how do we calculate the weights? After a lot of research, I found a solution. 

We want each pixel to get less weight the further it is from the source (the power plant). To implement this, we first declare a maximum distance, then use the GEE FeatureCollection distance function to calculate the distance of each pixel from the nearest power plant, and finally turn that distance into a weight between 0 and 1. After that, we use GEE reduceRegion with a sum reducer to reduce the NO2 of our state to a single value. 

This method is useful because it takes several factors into account. For example, wind moves NO2 away from its source. If NO2 is drifting away from its source when the satellite passes over our AOI, the measurements towards the edge of the circle will contain a lot of noise. The distance weighting gives less weight to NO2 the further it is from its source, which lets us reduce much of that noise. 
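
The weighting idea can be illustrated with a few numbers in plain Python. This is a sketch of the linear weight formula only, not of the GEE reducer; it assumes every pixel lies within the maximum distance, and the `pixels` list is made up for illustration.

```python
MAX_DIST = 5000  # metres, same role as maxDist in the notebook


def distance_weight(distance_m: float, max_dist: float = MAX_DIST) -> float:
    """Linear weight: 1 at the plant, falling to 0 at max_dist."""
    return abs(distance_m - max_dist) / max_dist


# hypothetical (distance in metres, NO2 in tons) pairs, all within max_dist
pixels = [(0, 2.0), (2500, 1.0), (5000, 4.0)]

plain_sum = sum(no2 for _, no2 in pixels)
weighted_sum = sum(no2 * distance_weight(d) for d, no2 in pixels)

print(plain_sum)     # 7.0
print(weighted_sum)  # 2.5
```

The pixel at the edge of the circle contributes nothing to the weighted sum, while the pixel at the plant contributes its full value.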

In [None]:
# calculating weights 
maxDist = 5000
distance = power_plant_feature_collection.distance(searchRadius=maxDist, maxError=100)
weight = distance.subtract(maxDist).abs().divide(maxDist)
# adding weights as a band to our no2 composite images 
no2_satellite_data_for_peurto_rico = no2_satellite_data_for_peurto_rico.addBands(weight)

# we will calculate total no2 of a region, using reduceRegion function
# but first we will calculate without weight
total_no2_peurto_rico = no2_satellite_data_for_peurto_rico.reduceRegion(
    reducer = ee.Reducer.sum(),
    geometry= peurto_rico_geometry.geometry(),
    scale= 1113).getInfo()
print('yearly total no2 emission from power generation (ton) without weight: ' \
      + str(total_no2_peurto_rico.get('tropospheric_NO2_column_number_density_sum'))) 

In [None]:
total_no2_peurto_rico = no2_satellite_data_for_peurto_rico.reduceRegion(reducer = ee.Reducer.sum().splitWeights(),
                                                  geometry= peurto_rico_geometry.geometry(),
                                                  scale= 1113).getInfo()
print('yearly total no2 emission from power generation (ton) with weight: ' \
      + str(total_no2_peurto_rico.get('sum')))

Finally, we get the annual total emission from power generation in Puerto Rico. As we already have data on the yearly total energy generation, we can easily calculate the emission factor. But how do we evaluate the result? To answer that, we need another major step. 

# How do we evaluate our methodology?

To know how correct and efficient a model is, it is crucial to evaluate it against a real-world scenario. For this reason, we have developed a well-defined evaluation method to test our model. Sentinel-5P only has data from 2018-06-28 onwards, so we can't compare our calculated values against older historical values. So how do we evaluate our model? What we did is calculate the emission for all states and compare it with 2018 and 2019 ground data. 

After searching and reading a lot, we found only two datasets that are useful for us. The first is bottom-up emission data from the EPA, and the second is individual power plant emission data from the EPA. The problem is that the EPA has not yet published statewide emission data for 2019; the [latest data is for 2018](https://www.epa.gov/energy/emissions-generation-resource-integrated-database-egrid). This dataset contains the yearly sum of NO2 emissions and the emission factor for each state and plant, as yearly totals and averages. But our Sentinel-5P satellite data is only available from the end of June 2018, which leaves a six-month gap relative to the full-year 2018 data. 

On the other hand, we have [NO2 data for 2019](https://www.epa.gov/airmarkets/power-plant-emission-trends) that contains emissions per power plant, but it doesn't cover all power plants. That makes it unsuitable for this evaluation. 

Then what to do? Don’t worry! After some research, we found a solution. 

* The 2018 EPA data contains information for the ozone season. That is good news for us, because the ozone season mostly falls after the first 5-6 months of the year. So we calculated the total emission from satellite imagery within those date bounds and evaluated it against the ozone season emission. This gives us a good understanding of how well our methodology works. 

We will use the 2019 per-plant emission data to evaluate our per-plant emission calculation (marginal emission) in the next notebook. 

# Calculate emission and E.F for all US states
For the evaluation, we want to calculate the emission and emission factor for all US states. This will help us measure the performance of our methodology more accurately.

1. [Create a function for calculating emission for all US states](#function)
2. [Create a batch of running for computation limit and calculate](#batch)
3. [Process for evaluation](#process_evaluation)

<a id='function'></a>
### 1. Create a function for calculating emission for all US states
To calculate the emission for all US states, we wrap the steps described above into a function and apply it to every state. The function works the same way as before. 

In [None]:
us_ton_conve_mult_factor = (1.10231e-6)

def calc_each_states_total_no2(feature): 
    
    """
    This function calculates the emission for each state.
    Here, feature represents a single state.
    All the functionality is the same as described above.
    """
    
    collection_for_specific_states = ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_NO2')\
    .select('tropospheric_NO2_column_number_density')\
    .filterBounds(feature.geometry())
    
    start = ee.Date('2018-06-28')
    finish = ee.Date('2018-11-28')

    diff = finish.difference(start, 'day')
    range_date = ee.List.sequence(0, diff.subtract(1)).map(
        lambda day: start.advance(day,'day'))
    
    def day_mosaics(date, newlist):
        date = ee.Date(date)
        newlist = ee.List(newlist)
        filtered = collection_for_specific_states.filterDate(date, date.advance(1,'day'))
        image = ee.Image(filtered.mosaic())
        return ee.List(ee.Algorithms.If(filtered.size(), newlist.add(image), newlist))
    
    s5p_mosaic_for_each_states = ee.ImageCollection(
        ee.List(range_date.iterate(day_mosaics, ee.List([]))))
    
#     s5p_mosaic_for_each_states = s5p_mosaic_for_each_states.map(
        #lambda image: image.convolve(gauss_kernel))
    
    s5p_mosaic_for_each_states = s5p_mosaic_for_each_states.map(
        lambda image: image.multiply(24500000))
    s5p_mosaic_for_each_states = s5p_mosaic_for_each_states.map(
        lambda image: image.multiply(46))
    s5p_mosaic_for_each_states = s5p_mosaic_for_each_states.map(
        lambda image: image.multiply(us_ton_conve_mult_factor))
    s5p_mosaic_for_each_states = s5p_mosaic_for_each_states.reduce(ee.Reducer.sum())
    
    #########
    maxDist = 20000
    distance = power_plant_feature_collection.distance(searchRadius=maxDist, maxError=1000)
    weight = distance.subtract(maxDist).abs().divide(maxDist)

    s5p_mosaic_for_each_states = s5p_mosaic_for_each_states.addBands(weight)
    #########
    
    s5p_mosaic_for_each_states = s5p_mosaic_for_each_states.updateMask(pp_masks)
        
    total_no2_yearly_each_states = s5p_mosaic_for_each_states.reduceRegion(
        reducer = ee.Reducer.sum().splitWeights(),
        geometry= feature.geometry(),scale= 1113)
    
    return feature.set({'total_no2' : total_no2_yearly_each_states.get('sum')})

<a id='batch'></a>
### 2. Create a batch of running for computation limit and calculation
Because we have more than 50 states to process, the calculation takes quite some time. To avoid a GEE time-out error, we split the calculation into two batches. The first batch calculates the emission for one half of the states and the second batch for the other half. 

After that, we calculate the emission for all states. 
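
If two batches still run into time-outs, the same idea extends to any number of batches. The helper below is a hypothetical plain-Python sketch (`split_into_batches` is not part of the notebook or of the GEE API), and the state codes are placeholders for illustration.

```python
from typing import List, Sequence


def split_into_batches(items: Sequence[str], n_batches: int) -> List[List[str]]:
    """Split items into n_batches contiguous chunks of nearly equal size."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    size, remainder = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # the first `remainder` batches get one extra item
        end = start + size + (1 if i < remainder else 0)
        batches.append(list(items[start:end]))
        start = end
    return batches


# hypothetical state codes, for illustration only
codes = ['01', '02', '04', '05', '06', '08', '09']
print(split_into_batches(codes, 2))
```

Each resulting list could then be passed to `ee.Filter.inList('STATEFP', ...)` in the same way as the two halves in the next cell.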

In [None]:
us_states_collection = ee.FeatureCollection("TIGER/2018/States")
all_us_state_code = us_states_collection.reduceColumns(
    reducer=ee.Reducer.toList().repeat(2),
    selectors=['STATEFP', 'NAME']).getInfo().get('list')[0]

first_batch_us_state_code = all_us_state_code[:len(all_us_state_code)//2]
second_batch_us_state_code = all_us_state_code[len(all_us_state_code)//2:]

# create feature collection for half of states
first_batch_states_collection = ee.FeatureCollection("TIGER/2018/States")\
.filter(ee.Filter.inList('STATEFP', first_batch_us_state_code))

# create feature collection with second half of states
second_batch_states_collection = ee.FeatureCollection("TIGER/2018/States")\
.filter(ee.Filter.inList('STATEFP', second_batch_us_state_code))


total_no2_first_batch = first_batch_states_collection.map(
    lambda feature: calc_each_states_total_no2(feature))
total_no2_second_batch = second_batch_states_collection.map(
    lambda feature: calc_each_states_total_no2(feature))

In [None]:
# start the computation and fetch the desired columns as lists
first_batch_results = total_no2_first_batch.reduceColumns(
    reducer=ee.Reducer.toList().repeat(4),
    selectors=['NAME', 'STATEFP', 'STUSPS', 'total_no2']).getInfo()

In [None]:
second_batch_results = total_no2_second_batch.reduceColumns(
    reducer=ee.Reducer.toList().repeat(4),
    selectors=['NAME', 'STATEFP', 'STUSPS', 'total_no2']).getInfo()

<a id='process_evaluation'></a>
### 3. Process for evaluation
Finally, we have calculated the emission for all states. To evaluate it against bottom-up emissions, we have to process our results and merge them with the bottom-up data. First, we convert the satellite data into a pandas dataframe and then merge it with the bottom-up emission data. **After that, we convert the emission data from tons to lb and calculate the emission factor by dividing the emission by the total electricity generation.**
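
The emission factor arithmetic can be sketched on a single state in plain Python. The sketch assumes the generation figure is in MWh, so that the result is in lb/MWh; the function name and the input values are hypothetical and are not results from this notebook.

```python
LB_PER_US_TON = 2000


def emission_factor_lb_per_mwh(total_no2_tons: float, generation_mwh: float) -> float:
    """Emission factor in lb/MWh from emissions in tons and generation in MWh."""
    if generation_mwh <= 0:
        raise ValueError("generation must be positive")
    return total_no2_tons * LB_PER_US_TON / generation_mwh


# hypothetical values, for illustration only
print(emission_factor_lb_per_mwh(total_no2_tons=5000.0, generation_mwh=20_000_000.0))  # 0.5
```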

In [None]:
# create empty dataframes for storing the feature collection results from the computation
df_first_batch_total_no2 = pd.DataFrame(columns=['NAME', 'STATEFP', 'STUSPS', 'total_no2'])
df_second_batch_total_no2 = pd.DataFrame(columns=['NAME', 'STATEFP', 'STUSPS', 'total_no2'])

def convert_feature_collection_results_to_df(results, dataframe):
    dataframe['NAME'] = pd.Series(results.get('list')[0])
    dataframe['STATEFP'] = pd.Series(results.get('list')[1])
    dataframe['STUSPS'] = pd.Series(results.get('list')[2])
    dataframe['total_no2'] = pd.Series(results.get('list')[3])
    return dataframe

In [None]:
# convert our calculated featurecollection to pandas dataframe
df_first_batch_total_no2 = convert_feature_collection_results_to_df(
    first_batch_results, df_first_batch_total_no2)
df_second_batch_total_no2 = convert_feature_collection_results_to_df(
    second_batch_results, df_second_batch_total_no2)

df_total_no2 = pd.concat([df_first_batch_total_no2, df_second_batch_total_no2])

In [None]:
# import epa 2018 yearly emission data and merge with our calculated emission data

columns = ['FIPSST', 'STNGENAN', 'STNGENOZ', 'STNOXAN', 'STNOXOZ', 'STNOXRTA', 'STNOXRTO']
df_2018_epa_emission = pd.read_csv(
    '../input/state-wise-2018-epa-data/state_wise_epa_data_2018_version_3.csv',
    header=0, skiprows=1, usecols=columns)
df_2018_epa_emission.columns = ['STATEFP', 'annual net electricity generation',
                                'ozone net electricity generation', 'annual nox emissions',
                                'ozone nox emissions', 'annual nox output rate',
                                'ozone nox output rate']
df_total_no2['STATEFP'] = df_total_no2['STATEFP'].astype(int)

df_total_no2_with_bottom_up = df_total_no2.merge(
    df_2018_epa_emission, on='STATEFP', how='inner').sort_values('STATEFP')

# convert tons to lb, because the ground E.F values are in lb/MWh

df_total_no2_with_bottom_up['total_no2_lb'] = df_total_no2_with_bottom_up['total_no2'] * 2000

# calculate ozone season emission factor
df_total_no2_with_bottom_up['ozone_emission_factor'] = \
df_total_no2_with_bottom_up['total_no2_lb'] / \
df_total_no2_with_bottom_up['ozone net electricity generation']

df_total_no2_with_bottom_up.head()

# Evaluation
After a lot of work, we have finally come to the evaluation part. Here we evaluate our methodology against bottom-up emissions to see how our model performs. The evaluation approach is described in the section above: we compare the emission calculated from satellite images from 2018-06-28 to 2018-11-28 with the 2018 bottom-up emission data from the EPA. We have already discussed the date mismatch in that section, so let's just see what happens. 

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df_total_no2_with_bottom_up['total_no2'].values.tolist(),
    y=df_total_no2_with_bottom_up['NAME'].values.tolist(),
    marker=dict(color="crimson", size=12),
    mode="markers",
    name="Satellite (2018-06-28 to 2018-11-28)",
))

fig.add_trace(go.Scatter(
    x=df_total_no2_with_bottom_up['ozone nox emissions'].values.tolist(),
    y=df_total_no2_with_bottom_up['NAME'].values.tolist(),
    marker=dict(color="gold", size=12),
    mode="markers",
    name="Bottom up (Ozone season 2018)"
))

fig.update_layout(title="Satellite emissions vs bottom up emissions for ozone season in 2018",
                  xaxis_title="Emissions (tons)",autosize=False,
                  width=700,height=1200,
                  yaxis_title="State names", yaxis={'nticks':60})

fig.show()

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df_total_no2_with_bottom_up['ozone_emission_factor'].values.tolist(),
    y=df_total_no2_with_bottom_up['NAME'].values.tolist(),
    marker=dict(color="crimson", size=12), 
    mode="markers",
    name="Satellite (2018-06-28 to 2018-11-28)",
))

fig.add_trace(go.Scatter(
    x=df_total_no2_with_bottom_up['ozone nox output rate'].values.tolist(),
    y=df_total_no2_with_bottom_up['NAME'].values.tolist(),
    marker=dict(color="gold", size=12),
    mode="markers",
    name="Bottom up (Ozone season 2018)"
))

fig.update_layout(title="Satellite emissions factor vs bottom up emission factor",
                  xaxis_title="Emissions factor(lb/MWh)",autosize=False,
                  width=700,height=1200,
                  yaxis_title="State names", yaxis={'nticks':60})

fig.show()

Wow! We are seeing an astonishing result. The satellite emission data correlates strongly with the bottom-up data: for each state, our satellite-derived values are very similar to the bottom-up values. Of course, there is some mismatch between the date ranges of the two datasets, which explains part of the error. But overall, we are confident that our methodology has successfully calculated emissions from satellite images. 
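
One way to put a number on this visual impression would be a Pearson correlation coefficient between the two series. The sketch below is plain Python with placeholder inputs; in the notebook it could be applied to the `total_no2` and `ozone nox emissions` columns of `df_total_no2_with_bottom_up`. The function `pearson_r` is a hypothetical helper, and the numbers are not results from this analysis.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# placeholder numbers, NOT results from this notebook
satellite = [1.0, 2.0, 3.0, 4.0]
bottom_up = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(satellite, bottom_up))
```

A value close to 1 would support the claim of a strong linear relationship, while a value near 0 would contradict it.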

# Conclusion
We have finally implemented our methodology for calculating the emission factor for sub-national regions and individual states. We also evaluated our results and found them very promising. In the next notebook, we will see how to use this methodology for individual power plants.