# OVERVIEW

The Environmental Insights Explorer team at Google is keen to gather insights on ways to improve calculations of global emissions factors for sub-national regions. The ultimate goal of this challenge is to test whether emission factors can be calculated using remote sensing techniques.

# Remote Sensing
  
Remote sensing is the acquisition of information about an object or phenomenon, especially the Earth, without making physical contact with it, in contrast to on-site observation. Remote sensing is used in numerous fields, including geography, land surveying, and most Earth science disciplines; it also has military, intelligence, commercial, economic, planning, and humanitarian applications.

# Remote Sensing Techniques
 
Remote sensing uses satellite and/or airborne sensors to collect information about a given object or area. Remote sensing data collection methods can be passive or active. Passive sensors (e.g., spectral imagers) detect natural radiation that is emitted or reflected by the object or area being observed. In active remote sensing, energy is emitted and the resulting signal that is reflected back is measured.
![](http://th.bing.com/th/id/OIP.cr73jDfOvCNKcH3-1jcSXgHaFj?w=223&h=180&c=7&o=5&dpr=2.5&pid=1.7)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Earth science data & analysis

The following datasets are provided as a starter kit for the competition.

1. **Sentinel-5P Carbon Monoxide**
Concentrations of carbon monoxide (CO) and water vapor. CO is an important atmospheric trace gas for our understanding of tropospheric chemistry. The main sources of CO are combustion of fossil fuels, biomass burning, and atmospheric oxidation of methane and other hydrocarbons.

Sentinel-5 Precursor is a satellite launched on 13 October 2017 by the European Space Agency to monitor air pollution. The onboard sensor is frequently referred to as Tropomi.

2. **Sentinel-5P Nitrogen Dioxide**
Tropospheric and stratospheric nitrogen dioxide concentration. Nitrogen dioxide enters the atmosphere as a result of anthropogenic activities such as fossil fuel combustion and biomass burning, as well as natural processes including microbiological processes in soils, wildfires, and lightning.

This product also comes from the Tropomi sensor on board Sentinel-5 Precursor.


**Importing necessary libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import datetime as dt
from datetime import datetime
# color palette
cnf, dth, rec, act = '#393e46', '#ff2e63', '#21bf73', '#fe9801'
import folium 
from folium import plugins
import plotly.express as px
import rasterio as rio
import warnings
warnings.filterwarnings('ignore')

# Global Power Plant Database (GPPD)

The Global Power Plant Database is a comprehensive, open-source database of power plants around the world. It centralizes power plant data to make it easier to navigate, compare, and draw insights. Each power plant is geolocated, and entries contain information on plant capacity, generation, ownership, and fuel type.

In [None]:
power_plant=pd.read_csv('/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gppd/gppd_120_pr.csv')
display(power_plant.shape)
display(power_plant.columns)
display(power_plant['system:index'].is_unique)
display(power_plant.head())

In [None]:
power_plant.set_index('system:index',inplace=True)
power_plant.head().T

Let's understand some of the attributes:
* **capacity_mw** - electrical generating capacity in megawatts
* **commissioning_year** - year the plant began operating
* **country_long** - full name of the country
* **estimated_generation_gwh** - estimated annual electricity generation in gigawatt-hours
* **generation_gwh_2013** - electricity generation in gigawatt-hours for the year 2013
* **geolocation_source** - provider of the plant's geolocation information
* **gppd_idnr** - identifier of the power plant in the Global Power Plant Database
* **name** - name or title of the power plant
* **primary_fuel** - energy source used in primary electricity generation
* **wepp_id** - a reference to a unique plant identifier in the widely used PLATTS-WEPP database
* **year_of_capacity_data** - year the capacity information was reported
* **source** - entity reporting the data
* **owner** - majority shareholder of the power plant
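
The two numeric attributes above use different units: `capacity_mw` is a power rating, while `estimated_generation_gwh` is energy produced over a year. As a minimal sketch of how they relate, the helper below computes a capacity factor (the share of the theoretical maximum annual output that was actually generated). The function name, the 8760-hour year, and the example plant are assumptions made for illustration; they are not part of the dataset or the competition material.

```python
HOURS_PER_YEAR = 8760  # assumption: a non-leap year


def capacity_factor(generation_gwh, capacity_mw, hours=HOURS_PER_YEAR):
    """Return annual generation as a fraction of the theoretical maximum."""
    if capacity_mw <= 0:
        raise ValueError("capacity_mw must be positive")
    # Running at full capacity for `hours` hours yields MWh; divide by 1000 for GWh.
    max_generation_gwh = capacity_mw * hours / 1000.0
    return generation_gwh / max_generation_gwh


# Hypothetical plant: 100 MW capacity, 438 GWh estimated annual generation.
print(round(capacity_factor(438.0, 100.0), 3))
```

A value well above 1 would suggest that the capacity and generation figures for a plant are inconsistent with each other.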


In [None]:
display("Country")
display(power_plant['country'].value_counts())
display("ID of Global Power Plant Database")
display(power_plant['gppd_idnr'].value_counts())
display("Owner of power plant")
display(power_plant['owner'].value_counts())
display("Entity reporting the data")
display(power_plant['source'].value_counts())
display("Types of primary fuel used:")
display(power_plant['primary_fuel'].value_counts())

In [None]:
power_plant['name'].value_counts()

In [None]:
#plot the number of plants per primary fuel
sns.countplot(x='primary_fuel', data=power_plant)

In [None]:
#Year of establishment of plant
power_plant['commissioning_year'].value_counts()

In [None]:
#number of plants per owner
power_plant['owner'].value_counts(ascending=True).plot(kind='barh',title='Number of plants per owner')

In [None]:
#types of source
power_plant['source'].value_counts(ascending=True).plot(kind='barh',title='Types of source')

In [None]:
#Totals for the most recent commissioning year (a list selects multiple columns after groupby)
temp=power_plant.groupby('commissioning_year')[['estimated_generation_gwh','capacity_mw']].sum().reset_index()
temp=temp[temp['commissioning_year']==max(temp['commissioning_year'])].reset_index(drop=True)
tm=temp.melt(id_vars="commissioning_year",value_vars=["estimated_generation_gwh","capacity_mw"])
fig=px.treemap(tm,path=["variable"],values="value",height=225,width=1200,color_discrete_sequence=[act,rec])
fig.data[0].textinfo='label+text+value'
fig.show()

In [None]:
#Estimated generation and capacity by commissioning year
temp=power_plant.groupby('commissioning_year')[['estimated_generation_gwh','capacity_mw']].sum().reset_index()
temp=temp.melt(id_vars="commissioning_year",value_vars=["estimated_generation_gwh","capacity_mw"],var_name='Metric',value_name='Value')
fig=px.area(temp,x='commissioning_year',y='Value',color='Metric',height=600,title='Estimated generation and capacity by commissioning year',color_discrete_sequence=[rec,dth])
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

In [None]:
full_grouped=power_plant.groupby(['source','primary_fuel'])[['capacity_mw','estimated_generation_gwh']].sum().reset_index()


In [None]:
temp_1=full_grouped.sort_values(by='estimated_generation_gwh',ascending=False)
temp_1=temp_1.reset_index(drop=True)
temp_1.style.background_gradient(cmap='Blues')

In [None]:
#total estimated electricity generation in gigawatt-hours for one year
total_gen=power_plant['estimated_generation_gwh'].sum()
print('Total Generation: '+'{:.3f}'.format(total_gen)+' GWh')

In [None]:
# percentage of total generation in gigawatthour
generation = (power_plant.groupby(['primary_fuel'])['estimated_generation_gwh'].sum()).to_frame()
generation = generation.sort_values('estimated_generation_gwh',ascending=False)
generation['percentage_of_total'] = (generation['estimated_generation_gwh']/total_gen)*100
generation

In [None]:
fig = plt.gcf()
fig.set_size_inches(10, 6)
colors = ['dodgerblue', 'plum', '#F0A30A','#8c564b','orange','green','yellow'] 
generation['percentage_of_total'].plot(kind='bar',color=colors)

In [None]:

generation = (power_plant.groupby(['source'])['estimated_generation_gwh'].sum()).to_frame()
generation = generation.sort_values('estimated_generation_gwh',ascending=False)
generation['percentage_of_total'] = (generation['estimated_generation_gwh']/total_gen)*100
generation

In [None]:
#total production capacity in megawatts
total_cap=power_plant['capacity_mw'].sum()
print('Total Capacity: '+'{:.3f}'.format(total_cap)+' MW')

In [None]:
capcity = (power_plant.groupby(['primary_fuel'])['capacity_mw'].sum()).to_frame()
capcity = capcity.sort_values('capacity_mw',ascending=False)
capcity['percentage_of_total'] = (capcity['capacity_mw']/total_cap)*100
capcity

In [None]:
fig = plt.gcf()
fig.set_size_inches(10, 6)
colors = ['dodgerblue', 'plum', '#F0A30A','#8c564b','orange','green','yellow'] 
capcity['percentage_of_total'].plot(kind='bar',color=colors)

In [None]:
capcity = (power_plant.groupby(['source'])['capacity_mw'].sum()).to_frame()
capcity = capcity.sort_values('capacity_mw',ascending=False)
capcity['percentage_of_total'] = (capcity['capacity_mw']/total_cap)*100
capcity

# Second Part


In [None]:
def plot_points_on_map(dataframe,begin_index,end_index,latitude_column,latitude_value,longitude_column,longitude_value,zoom):
    # Plot one marker per row of dataframe[begin_index:end_index] on a map centered at the given point
    df = dataframe[begin_index:end_index]
    location = [latitude_value,longitude_value]
    plot = folium.Map(location=location,zoom_start=zoom)
    for i in range(0,len(df)):
        # Show only the plant's primary fuel in the popup, not the whole Series repr
        popup = folium.Popup(str(df['primary_fuel'].iloc[i]))
        folium.Marker([df[latitude_column].iloc[i],df[longitude_column].iloc[i]],popup=popup).add_to(plot)
    return plot

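def overlay_image_on_puerto_rico(file_name,band_layer):
    # Note: relies on the global lat/lon map center defined in the next cell
    with rio.open(file_name) as src:  # close the raster file after reading
        band = src.read(band_layer)
    m = folium.Map([lat, lon], zoom_start=8)
    folium.raster_layers.ImageOverlay(
        image=band,
        bounds = [[18.6,-67.3],[17.9,-65.2]],  # opposite corners of the Puerto Rico box as [lat, lon]
        colormap=lambda x: (1, 0, 0, x),  # red, with alpha taken from the pixel value
    ).add_to(m)
    return m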

def plot_scaled(band):
    # Show a 2-D array in grayscale, clipped to its 5th-95th percentile range for contrast
    vmin, vmax = np.nanpercentile(band, (5,95))  # 5-95% stretch
    plt.imshow(band, cmap='gray', vmin=vmin, vmax=vmax)
    plt.show()

def split_column_into_new_columns(dataframe,column_to_split,new_column_one,begin_column_one,end_column_one):
    # Copy a fixed character range of a string column into a new column
    dataframe[new_column_one] = dataframe[column_to_split].str[begin_column_one:end_column_one]
    return dataframe


In [None]:
power_plant = pd.read_csv('/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gppd/gppd_120_pr.csv')
power_plant = split_column_into_new_columns(power_plant,'.geo','latitude',50,66)
power_plant = split_column_into_new_columns(power_plant,'.geo','longitude',31,48)
power_plant['latitude'] = power_plant['latitude'].astype(float)
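# The fixed-width slice can drop the leading digit of the latitude (8.x instead of 18.x),
# so add 10 back to any value below 10
a = np.array(power_plant['latitude'].values.tolist())
power_plant['latitude'] = np.where(a < 10, a+10, a).tolist()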
lat=18.200178; lon=-66.664513
plot_points_on_map(power_plant,0,425,'latitude',lat,'longitude',lon,9)
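
The fixed character offsets used above depend on every coordinate having the same number of digits, which is why the latitude needs a manual correction. As a possible alternative, here is a minimal sketch that parses the coordinates as JSON instead. It assumes each `.geo` value is a GeoJSON `Point` string, which the slicing offsets suggest but which is not confirmed here; the function name and the sample string (built from the map center used in this notebook) are illustrative only.

```python
import json


def parse_geo_point(geo_string):
    """Return (latitude, longitude) from a GeoJSON Point string."""
    geometry = json.loads(geo_string)
    if geometry.get("type") != "Point":
        raise ValueError("expected a GeoJSON Point")
    # GeoJSON stores coordinates as [longitude, latitude].
    longitude, latitude = geometry["coordinates"][:2]
    return float(latitude), float(longitude)


# Illustrative sample, not a row from the dataset.
sample = '{"type":"Point","coordinates":[-66.664513,18.200178]}'
print(parse_geo_point(sample))
```

If the assumption holds, applying this function to each value of the `.geo` column would give both coordinates as floats without the fixed offsets or the "add 10" correction.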

# Explore the weather data

In [None]:
image='/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gldas/gldas_20181210_0600.tif'
image_band=rio.open(image).read(1)
plot_scaled(image_band)
overlay_image_on_puerto_rico(image,band_layer=1)
image='/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gfs/gfs_2019051106.tif'
image_band=rio.open(image).read(1)
plot_scaled(image_band)
overlay_image_on_puerto_rico(image,band_layer=1)
image='/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gfs/gfs_2019031218.tif'
image_band=rio.open(image).read(1)
plot_scaled(image_band)
overlay_image_on_puerto_rico(image,band_layer=1)

# Check the Automated land use classification

In [None]:
# check a random image:
image='../input/inputravi/l3-ne43h01-094-059-01feb2013-band2.tif'
image_band=rio.open(image).read(1)
plot_scaled(image_band)
overlay_image_on_puerto_rico(image,band_layer=1)

# Connect to the Google Earth Engine API

In [None]:
from kaggle_secrets import UserSecretsClient
from google.oauth2.credentials import Credentials
import ee
import folium

def add_ee_layer(self, ee_image_object, vis_params, name):
  # https://github.com/google/earthengine-api/blob/master/python/examples/ipynb/ee-api-colab-setup.ipynb
  map_id_dict = ee.Image(ee_image_object).getMapId(vis_params)
  folium.raster_layers.TileLayer(
    tiles = map_id_dict['tile_fetcher'].url_format,
    attr = 'Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
    name = name,
    overlay = True,
    control = True
  ).add_to(self)

def plot_ee_data_on_map(dataset,column,begin_date,end_date,minimum_value,maximum_value,latitude,longitude,zoom):
    # https://github.com/google/earthengine-api/blob/master/python/examples/ipynb/ee-api-colab-setup.ipynb
    folium.Map.add_ee_layer = add_ee_layer
    vis_params = {
      'min': minimum_value,
      'max': maximum_value,
      'palette': ['006633', 'E5FFCC', '662A00', 'D8D8D8', 'F5F5F5']}
    my_map = folium.Map(location=[latitude,longitude], zoom_start=zoom, height=500)
    s5p = ee.ImageCollection(dataset).filterDate(
        begin_date, end_date)
    my_map.add_ee_layer(s5p.first().select(column), vis_params, 'Color')
    my_map.add_child(folium.LayerControl())
    display(my_map)


After registering at [https://earthengine.google.com/signup/](https://earthengine.google.com/signup/), navigate to the Add-ons menu of the notebook editor and create a new user secret called "earth_engine" that contains the refresh token from ee.Authenticate(). This step only needs to be run once. You can generate the refresh token by running ee.Authenticate() in a new code cell. Next, follow the instructions, paste the value into the input box, and run the following command in a new code cell: **!cat ~/.config/earthengine/credentials**. This should return a refresh token that can then be saved as your Kaggle user secret.

In summary:

* Step 0: Register your account using both of the following links: [Link #1](https://earthengine.google.com/signup/), [Link #2](https://docs.google.com/forms/d/e/1FAIpQLScFk_pkrrDDF4O8imsEBMaryLDU-Ghf44eHbgujIAl_SXJTJQ/viewform)
* Step 1: Open an internet-enabled notebook and retrieve your token by opening the ee.Authenticate() link in a new tab. You will also need to copy and paste that value into the relevant input box.
* Step 2: After completing Step 1, retrieve your refresh token by running **!cat ~/.config/earthengine/credentials**
* Step 3: Save your refresh token as a Kaggle user secret
* Step 4: Run the code snippet that contains ee.Initialize()


Steps #0, #1, #2, and #3 only need to be performed once (for the initial setup).


Step #4 is run every time that you run your code. Hopefully that helps!
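
For Step 4, a minimal sketch of the initialization is given below. It assumes the refresh token was saved under the secret name "earth_engine" as described above, and that the `kaggle_secrets`, `google-auth`, and `earthengine-api` packages are available in the notebook. The wrapper function name is hypothetical, and newer versions of the Earth Engine client may additionally require a Cloud project, so treat this as a starting point and not a definitive setup.

```python
def initialize_earth_engine(secret_name="earth_engine"):
    """Initialize Earth Engine from a refresh token stored as a Kaggle user secret."""
    # Imported here so that defining the function does not require these
    # packages; they are only needed when the function is actually called.
    import ee
    from google.oauth2.credentials import Credentials
    from kaggle_secrets import UserSecretsClient

    refresh_token = UserSecretsClient().get_secret(secret_name)
    credentials = Credentials(
        None,  # no access token yet; one is obtained using the refresh token
        refresh_token=refresh_token,
        token_uri=ee.oauth.TOKEN_URI,
        client_id=ee.oauth.CLIENT_ID,
        client_secret=ee.oauth.CLIENT_SECRET,
    )
    ee.Initialize(credentials=credentials)
```

Calling `initialize_earth_engine()` once per session, before `plot_ee_data_on_map`, should be enough for the Earth Engine calls in this notebook.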