# DS4G - Geospatial Analysis, Exploring alternatives for emission factor

Several great kernels, such as

https://www.kaggle.com/paultimothymooney/overview-of-the-eie-analytics-challenge
https://www.kaggle.com/paultimothymooney/how-to-get-started-with-the-earth-engine-data
https://www.kaggle.com/ragnar123/exploratory-data-analysis-and-factor-model-idea
https://www.kaggle.com/parulpandey/understanding-the-data-wip

already cover the background, including videos, on the green environment, emission factor analysis, emission sources, and the remote sensing process of acquiring images and data.

I also referred to these discussions, which are very useful for this project:

https://www.kaggle.com/c/ds4g-environmental-insights-explorer/discussion/130055
https://www.kaggle.com/c/ds4g-environmental-insights-explorer/discussion/130221


## Why do we use Google Earth Engine?


Google's mission is to organize the world's information and make it universally accessible and useful. In line with this mission, Earth Engine organizes geospatial information and makes it available for analysis. More generally, Google strives to make the world a better place through the use of technology.

**About Google Earth Engine**

Google Earth Engine is a cloud-based platform for planetary-scale environmental data analysis.

**The statements below are taken from the Kaggle project description page:**

https://www.kaggle.com/c/ds4g-environmental-insights-explorer/overview

## PROJECT OVERVIEW

Develop a methodology to calculate an average historical emissions factor of electricity generated for a sub-national region, using remote sensing data and techniques.

## PROBLEM STATEMENT

Current emissions factor methodologies are based on time-consuming data collection and may include errors derived from a lack of access to granular datasets, an inability to refresh data on a frequent basis, overly general modeling assumptions, and inaccurate reporting of emissions sources like fuel consumption.

The goal is to develop a methodology to calculate an average historical emissions factor for electricity generation in a sub-national region.


## DATASET PROVIDED
An initial list of datasets covering the geographic boundary of ***Puerto Rico*** serves as the foundation for this analysis. As an island, it has fewer confounding factors from nearby areas.

The main task is to calculate the annual historical emissions factor. In addition:

1. Bonus points will be awarded for smaller time slices of the average historical emissions factor, such as one per month for the 12-month period.
2. Additional bonus points will be awarded to participants who develop methodologies for calculating **marginal emissions factors** for the sub-national region.
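
To make the annual and monthly factors concrete, here is a minimal sketch that assumes monthly totals of emissions and electricity generation are already available. The column names (`emissions_t`, `generation_mwh`) and all numbers are hypothetical and are not taken from the competition data.

```python
import pandas as pd

# Hypothetical monthly totals for one region (made-up numbers).
monthly = pd.DataFrame({
    "month": list(range(1, 13)),
    "emissions_t": [100.0, 110.0, 95.0, 90.0, 85.0, 80.0,
                    82.0, 78.0, 88.0, 92.0, 105.0, 115.0],   # emissions, tonnes
    "generation_mwh": [50.0, 55.0, 50.0, 45.0, 42.5, 40.0,
                       41.0, 39.0, 44.0, 46.0, 52.5, 57.5],  # electricity generated
})

# Monthly emission factor: emissions per unit of electricity generated.
monthly["ef_t_per_mwh"] = monthly["emissions_t"] / monthly["generation_mwh"]

# Annual factor: total emissions over total generation. This is a
# generation-weighted average, not the plain mean of the monthly factors.
annual_ef = monthly["emissions_t"].sum() / monthly["generation_mwh"].sum()

print(monthly[["month", "ef_t_per_mwh"]])
print("annual emission factor:", annual_ef)
```

Dividing the totals, rather than averaging the twelve monthly factors, keeps months with little generation from distorting the annual figure.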

### What is a marginal emission factor?

**Marginal emission factor (MEF) is an effective tool for estimating incremental changes in carbon emissions as a result of a change in demand. ... Under this plan, the power sector, responsible for 27% of carbon emissions and constituting the single largest source of carbon in 2010, must cut emissions to almost zero by 2050.**

https://www.tmrow.com/blog/marginal-emissions-what-they-are-and-when-to-use-them
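
The difference between an average and a marginal factor can be sketched with a few numbers. The example below estimates the marginal factor as the slope of a least-squares line of emissions against demand. That estimator is an assumption chosen for illustration, not a method prescribed by the competition, and the observations are made up.

```python
import numpy as np

# Hypothetical observations (made-up numbers, not competition data).
demand_mwh = np.array([100.0, 120.0, 140.0, 160.0, 180.0])
emissions_t = np.array([60.0, 75.0, 90.0, 105.0, 120.0])

# Average emission factor: total emissions per unit of total demand served.
average_ef = emissions_t.sum() / demand_mwh.sum()

# Marginal emission factor: change in emissions per unit change in demand,
# estimated here as the slope of a degree-1 least-squares fit.
marginal_ef, intercept = np.polyfit(demand_mwh, emissions_t, deg=1)

print("average EF :", average_ef)
print("marginal EF:", marginal_ef)
```

In this toy data the marginal factor is higher than the average one, which is the situation where an extra unit of demand is met by a dirtier-than-average source.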

**Loading basic libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import rasterio as rio
import folium 
import seaborn as sns

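**The helper functions below are adapted from @paulmooney's kernel: [https://www.kaggle.com/paultimothymooney/how-to-get-started-with-the-earth-engine-data](https://www.kaggle.com/paultimothymooney/how-to-get-started-with-the-earth-engine-data)**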

**Step 1: Explore the power plant data**

In [None]:

def plot_points_on_map(dataframe,begin_index,end_index,latitude_column,latitude_value,longitude_column,longitude_value,zoom):
    """Plot rows begin_index:end_index of dataframe as markers on a folium map."""
    df = dataframe[begin_index:end_index]
    location = [latitude_value,longitude_value]
    plot = folium.Map(location=location,zoom_start=zoom)
    for i in range(0,len(df)):
        # Show the plant's primary fuel in the marker popup
        popup = folium.Popup(str(df['primary_fuel'].iloc[i]))
        folium.Marker([df[latitude_column].iloc[i],df[longitude_column].iloc[i]],popup=popup).add_to(plot)
    return plot

def overlay_image_on_puerto_rico(file_name,band_layer):
    """Overlay one band of a raster file on a map of Puerto Rico.

    Relies on the module-level `lat` and `lon` set in a later cell.
    """
    with rio.open(file_name) as src:
        band = src.read(band_layer)
    m = folium.Map([lat, lon], zoom_start=8)
    folium.raster_layers.ImageOverlay(
        image=band,
        bounds = [[18.6,-67.3],[17.9,-65.2]],
        colormap=lambda x: (1, 0, 0, x),
    ).add_to(m)
    return m

def plot_scaled(band):
    """Display a 2-D array in grayscale with a 5-95% percentile stretch."""
    vmin, vmax = np.nanpercentile(band, (5,95))
    plt.imshow(band, cmap='gray', vmin=vmin, vmax=vmax)
    plt.show()

def split_column_into_new_columns(dataframe,column_to_split,new_column_one,begin_column_one,end_column_one):
    """Copy characters begin:end of a string column into a new column."""
    dataframe[new_column_one] = dataframe[column_to_split].str[begin_column_one:end_column_one]
    return dataframe

In [None]:
pd.set_option('display.max_columns', 30)

In [None]:
power_plants = pd.read_csv('/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gppd/gppd_120_pr.csv')

power_plants.head(35)

In [None]:
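# The '.geo' column stores the coordinates as text; slice them out by character position
power_plants = split_column_into_new_columns(power_plants,'.geo','latitude',50,66)
power_plants = split_column_into_new_columns(power_plants,'.geo','longitude',31,48)
# Only latitude is converted to float; longitude stays a string slice
power_plants['latitude'] = power_plants['latitude'].astype(float)
# The fixed-position slice can cut the leading '1' (8.x instead of 18.x), so add 10 back
a = power_plants['latitude'].to_numpy()
power_plants['latitude'] = np.where(a < 10, a+10, a)
# Map centre used for all Puerto Rico plots
lat = 18.200178
lon = -66.664513
plot_points_on_map(power_plants,0,425,'latitude',lat,'longitude',lon,9)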

**The dataset gppd_120_pr.csv contains 24 columns; we keep only those that are required for now.**
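
Extracting coordinates from `.geo` by fixed character positions is fragile, since a shorter or longer number shifts every later character. A minimal alternative sketch is shown below, assuming each `.geo` value is a valid GeoJSON Point string with coordinates in `[longitude, latitude]` order. The sample rows and the helper name `extract_lon_lat` are hypothetical, and the real strings may contain extra keys.

```python
import json
import pandas as pd

# Hypothetical sample rows standing in for the '.geo' column.
sample = pd.DataFrame({
    ".geo": [
        '{"type":"Point","coordinates":[-66.664513,18.200178]}',
        '{"type":"Point","coordinates":[-65.9,18.0]}',
    ]
})

def extract_lon_lat(geo_string):
    """Return (longitude, latitude) parsed from a GeoJSON Point string."""
    lon, lat = json.loads(geo_string)["coordinates"]
    return float(lon), float(lat)

# Parse every row once, then split the pairs into two float columns.
coords = [extract_lon_lat(g) for g in sample[".geo"]]
sample["longitude"] = [lon for lon, _ in coords]
sample["latitude"] = [lat for _, lat in coords]

print(sample[["longitude", "latitude"]])
```

Parsing the JSON yields both coordinates as floats regardless of how many digits each number has, so no correction step is needed afterwards.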

In [None]:
power_plants.columns

In [None]:
import pandas_profiling
eda_analysis = pandas_profiling.ProfileReport(power_plants)
eda_analysis.to_file('eie_analysis.html')

In [None]:
power_plants.info()

In [None]:
power_plants.describe(include = 'all')

In [None]:
#power_plants_df = power_plants.sort_values('estimated_generation_gwh',ascending=False).reset_index()
dsg_df = power_plants[['name','latitude','longitude','primary_fuel','capacity_mw','estimated_generation_gwh','source','owner','country','commissioning_year','year_of_capacity_data']]

In [None]:
dsg_df.head(25)

In [None]:
dsg_df.describe()

In [None]:
plt.figure(figsize=(25,15))
sns.barplot(x='capacity_mw', y='estimated_generation_gwh', hue='primary_fuel', data=dsg_df[dsg_df['primary_fuel'].isin(['Coal','Oil','Gas'])])
#plt.set_xticklabels(a.get_xticklabels(), rotation=45)
plt.ylabel('Estimated Generation')
plt.title('Capacity Vs Estimated Generation');

In [None]:
# Correlate the numeric columns only; text columns such as 'name' are excluded
dsg_df_corr = dsg_df.select_dtypes('number').corr()
sns.heatmap(dsg_df_corr, cmap="Accent",  annot= True)

In [None]:
from matplotlib import style
from matplotlib.pyplot import pie,show

In [None]:
power_plants_df_fueltype = power_plants.groupby (['primary_fuel']).agg({'capacity_mw': 'sum',
                                                'estimated_generation_gwh': 'sum'           
                                                }).reset_index()

In [None]:
power_plants_df_fueltype[['primary_fuel','capacity_mw','estimated_generation_gwh']]

In [None]:
plt.figure(figsize=(22,15))
# The pie shows the number of plants per fuel type, not the energy generated
plt.title("Share of power plants by primary fuel")
colors = ['green', 'orange', 'pink', 'c', 'm', 'y']
power_plants['primary_fuel'].value_counts().plot(kind='pie', colors=colors,
                                                 autopct='%1.1f%%',
                                                 counterclock=False, shadow=True)


In [None]:
power_plants['primary_fuel'].value_counts()

In [None]:
%matplotlib inline

fig, ax = plt.subplots(figsize=[8,6])
ax.set_title('Various Power Plants Capacity (in MW)')
ax.set_xlabel('Fuel Type')
ax.set_ylabel('Capacity (MW)')
# Cycle yellow, green and red across the bars
ax.bar(power_plants_df_fueltype['primary_fuel'], power_plants_df_fueltype['capacity_mw'], color=['y','g','r'])

In [None]:
# Joint density of total capacity and total estimated generation per fuel type
sns.jointplot(x="capacity_mw", y="estimated_generation_gwh", data=power_plants_df_fueltype, kind = 'kde', color = 'orange')

In [None]:
df = power_plants[['name','latitude','longitude','primary_fuel','capacity_mw','estimated_generation_gwh']]
d = df.select_dtypes('number').corr()  # numeric columns only
plt.figure(figsize=(10,7))
a = sns.heatmap(d, cmap="viridis",  annot= True)
a.set_title('Fuel Type power Generation - Heatmap')
rotx = a.set_xticklabels(a.get_xticklabels(), rotation=45)
roty = a.set_yticklabels(a.get_yticklabels(), rotation=45)
plt.show()

In [None]:
from skimage.io import imread
image = imread('/kaggle/input/ds4g-environmental-insights-explorer/eie_data/s5p_no2/s5p_no2_20180708T172237_20180714T190743.tif')
print(image.shape)
plt.imshow(image[:,:,0], cmap = 'cool')
plt.axis('off')  # hide the axis ticks and frame

In [None]:
image = imread('/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gfs/gfs_2018070400.tif')
print (image.shape)
plt.imshow(image[:,:,2], cmap = 'viridis')

**Explore the NO2 emissions data**

**Sentinel-5P OFFL NO2: Offline Nitrogen Dioxide - NO2_column_number_density**
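
One way to turn a raster band into a single number per image is to average the pixel values while skipping missing pixels. The sketch below uses a small synthetic array in place of a real band such as the output of `rio.open(path).read(1)`. The values are made up, and the choice of a plain mean as the summary statistic is an assumption for illustration.

```python
import numpy as np

# Synthetic 3x4 array standing in for one raster band (made-up values).
band = np.array([
    [1.0e-5, 2.0e-5, np.nan, 4.0e-5],
    [2.0e-5, np.nan, 3.0e-5, 1.0e-5],
    [3.0e-5, 2.0e-5, 1.0e-5, 1.0e-5],
])

# Mean over valid pixels only; NaN marks pixels without a measurement.
mean_no2 = np.nanmean(band)

# Share of pixels that carry data, useful for judging how reliable the mean is.
valid_fraction = np.isfinite(band).mean()

print("mean NO2 over valid pixels:", mean_no2)
print("fraction of valid pixels  :", valid_fraction)
```

Tracking the valid fraction alongside the mean makes it easier to discard images where most of the island is missing.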

In [None]:
image = '/kaggle/input/ds4g-environmental-insights-explorer/eie_data/s5p_no2/s5p_no2_20180714T170945_20180720T185244.tif'
image_band = rio.open(image).read(1)
plot_scaled(image_band)
overlay_image_on_puerto_rico(image,band_layer=1)

**Explore the weather data**

**GLDAS-2.1: Global Land Data Assimilation System**

In [None]:
image = '/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gldas/gldas_20180702_1500.tif'
image_band = rio.open(image).read(3)
plot_scaled(image_band)

image = '/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gfs/gfs_2018072118.tif'
image_band = rio.open(image).read(3)
plot_scaled(image_band)

overlay_image_on_puerto_rico(image,band_layer=3)

**Work in progress...**