# **Explore Earth Engine Data, Hi Hi**

This is my first try on Kaggle. At first I wanted to use R, but I later realized that it does not support the Google Earth Engine API, so I had to start learning both the Google Earth Engine API and Python. It has been fun to try, and here are my findings.

Electricity shutdowns in Puerto Rico:
1. In September 2017 (20/9/2017), [Hurricane Maria](https://www.mercycorps.org/blog/quick-facts-hurricane-maria-puerto-rico) tore through the island. Electricity was cut off to 100 percent of the island, and access to clean water and food became limited for most. It was not until August 2018, nearly a year after the storm, that the Puerto Rico Electric Power Authority (PREPA) announced that power had been restored to 100 percent of customers.
2. According to the [EIA](https://www.eia.gov/state/analysis.php?sid=RQ), Hurricane Maria damaged the island's one coal-fired electricity generating plant, at Guayama, and the transmission grid, but the plant resumed generating electricity in February 2018.
3. According to [CNN](https://edition.cnn.com/2020/01/09/us/puerto-rico-earthquake-power-outages-satellite-images-trnd/index.html) on 09/01/2020, a serious earthquake very close to Puerto Rico caused power outages on the island.

Assumptions:
1. NO2 emission levels are expected to decline during the above-mentioned events in September 2017 and January 2020. If this is the case, the levels observed then could be used as a reference for the background emission rate on the island.
2. Hydro, solar, and wind power generations are emission-free and will be ignored.

Models:
1. A [box model](http://acmg.seas.harvard.edu/people/faculty/djj/book/bookchap3.html#pgfId=112721) will be adopted for calculating emission. This requires choosing the dimensions of a box around each power plant (pp) for sampling. To calculate emission, I need to know the lifetime of NO2 and the background NO2 emission on the island. The NO2 lifetime can be estimated from the wind speed and the dimensions of the bounding box around each power plant. Finding the background NO2 emission, on the other hand, is tricky.
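
The box-model arithmetic can be sketched as follows. This is a minimal sketch, assuming a steady state in which ventilation by the wind is the only sink, so that the residence time is the box length divided by the wind speed and the emission rate is the excess NO2 in the box divided by that time. The function names and all numbers are hypothetical placeholders, not values from this notebook.

```python
def residence_time_s(box_length_m, wind_speed_m_s):
    """Time for air to cross the box: tau = L / U, in seconds."""
    if wind_speed_m_s <= 0:
        raise ValueError("wind speed must be positive")
    return box_length_m / wind_speed_m_s


def box_model_emission(column_mol_m2, background_mol_m2, box_area_m2, tau_s):
    """Steady-state emission rate in mol/s: excess NO2 in the box divided by tau."""
    excess_mol = (column_mol_m2 - background_mol_m2) * box_area_m2
    return excess_mol / tau_s


# Hypothetical inputs: a 2000 m x 2000 m box, a 4 m/s wind,
# a column of 8e-5 mol/m^2 and a background of 5e-5 mol/m^2.
tau = residence_time_s(2000.0, 4.0)
emission = box_model_emission(8e-5, 5e-5, 2000.0 * 2000.0, tau)
print(tau, emission)
```

A chemical loss term would shorten the effective lifetime; it is left out here to keep the sketch close to the wind-and-box-size description above.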

Definitions:
1. [Power capacity](https://www.eia.gov/tools/faqs/faq.php?id=101&t=3) - the maximum electric output an electricity generator can produce under specific conditions. 
2. Electricity generation - the amount of electricity a generator produces over a period of time. A generator with 1 megawatt (MW) of capacity that operates at that capacity consistently for one hour produces 1 megawatthour (MWh) of electricity.
3. Estimated yearly generation by plant (in GPPD) - an estimate produced by two methods: scaling aggregate generation by plant size, and a machine learning approach.
4. [Capacity Factor](https://www.eia.gov/tools/faqs/faq.php?id=101&t=3) is a measure (expressed as a percentage) of how often an electricity generator operates during a specific period of time using a ratio of the actual output to the maximum possible output during that time period. 
5. Emission is the mass emitted per unit of time.
6. Emission Factor is the total emission within a specified period (e.g. an hour) divided by the total electricity generation within the same period.
7. [Watt](https://en.wikipedia.org/wiki/Watt) is the unit of power, or energy rate, in joules per second (J/s).
8. kWh (kilowatt-hour) is a unit of energy: the energy generated or used at a rate of 1 kW for one hour.
9. Marginal Emission Factor is the change in emission per unit change in electricity generation.
10. Calm wind conditions: wind speed <= 2 meters per second below 500m.
11. Windy conditions: Wind speed > 2 meters per second.
12. Wind direction: u is the ZONAL VELOCITY, i.e. the component of the horizontal wind TOWARDS EAST. v is the MERIDIONAL VELOCITY, i.e. the component of the horizontal wind TOWARDS NORTH.
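
The emission factor and marginal emission factor definitions above can be illustrated with a few lines of arithmetic. This is only a sketch: the function names, the units (kg and MWh), and the numbers are assumptions chosen for illustration.

```python
def emission_factor(emission_kg, generation_mwh):
    """Emission per unit of electricity over the same period (kg/MWh)."""
    return emission_kg / generation_mwh


def marginal_emission_factor(emission_kg_a, generation_mwh_a,
                             emission_kg_b, generation_mwh_b):
    """Change in emission per unit change in generation between two periods."""
    return (emission_kg_b - emission_kg_a) / (generation_mwh_b - generation_mwh_a)


# Hypothetical hours: 100 MWh generated with 50 kg emitted,
# then 150 MWh generated with 80 kg emitted.
ef = emission_factor(50.0, 100.0)
mef = marginal_emission_factor(50.0, 100.0, 80.0, 150.0)
print(ef, mef)
```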

Duration of Data To Be Considered:
From July 2018 to July 2019; however, imagery data may not be available for some time slots.

Reading:

1. [de Foy et al. (2015) Estimates of power plant NOx emissions and lifetimes from OMI NO2 satellite retrievals, Atmospheric Environment, 116, 1-11.](http://dx.doi.org/10.1016/j.atmosenv.2015.05.056)
2. [Liu et al. (2020) A methodology to constrain carbon dioxide emissions from coal-fired power plants using satellite observations of co-emitted nitrogen dioxide, Atmospheric Chemistry and Physics, 20, 99-116.](http://doi.org/10.5194/acp-20-99-2020)


**Step 1: Set up mapping functions and start to explore powerplant data**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import rasterio as rio
import folium
        
# These are standard functions from folium
# This use folium to plot a map showing location of powerplants classified by fuel type.
def plot_points_on_map(dataframe,begin_index,end_index,latitude_column,latitude_value,longitude_column,longitude_value,zoom):
    # Marker colour per fuel type; any other fuel falls back to orange.
    fuel_colors = {'Gas': 'green', 'Coal': 'black', 'Hydro': 'blue', 'Solar': 'red', 'Wind': 'beige'}
    df = dataframe[begin_index:end_index]
    location = [latitude_value,longitude_value]
    plot = folium.Map(location=location,zoom_start=zoom)
    for i in range(0,len(df)):
        # Positional access, so a slice that does not start at row 0 still works.
        pf = str(df['primary_fuel'].iloc[i])
        popup = folium.Popup(pf+'\n<b>'+str(df['name'].iloc[i])+'</b>')
        color = fuel_colors.get(pf, 'orange')
        folium.Marker([df[latitude_column].iloc[i],df[longitude_column].iloc[i]],popup=popup,icon=folium.Icon(color=color)).add_to(plot)
    return plot

def overlay_image_on_puerto_rico(file_name,band_layer):
    # Relies on the notebook-level `lat` and `lon` (map centre) being defined before the call.
    with rio.open(file_name) as src:
        band = src.read(band_layer)
    m = folium.Map([lat, lon], zoom_start=8)
    folium.raster_layers.ImageOverlay(
        image=band,
        bounds = [[18.6,-67.3],[17.9,-65.2]],
        colormap=lambda x: (1, 0, 0, x),
    ).add_to(m)
    return m

def plot_scaled(file_name):
    vmin, vmax = np.nanpercentile(file_name, (5,95))  # 5-95% stretch
    img_plt = plt.imshow(file_name, cmap='gray', vmin=vmin, vmax=vmax)
    plt.show()

def split_column_into_new_columns(dataframe,column_to_split,new_column_one,begin_column_one,end_column_one):
    for i in range(0, len(dataframe)):
        dataframe.loc[i, new_column_one] = dataframe.loc[i, column_to_split][begin_column_one:end_column_one]
    return dataframe

**Step 1.1: Import GPPD dataset using geopandas**

I use geopandas and shapely.geometry because they handle JSON-encoded location information in a natural way. After creating a geometry for each power plant, I can extract the latitude and longitude and store them in separate series.
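
For illustration, the same extraction can be sketched with the standard library alone. The `.geo` string below is a hypothetical example of the GeoJSON point format the column is assumed to contain; GeoJSON stores coordinates as longitude first, then latitude.

```python
import json

# Hypothetical '.geo' value, assumed to be a GeoJSON Point.
geo_text = '{"type": "Point", "coordinates": [-66.15, 17.95]}'


def lon_lat_from_geo(text):
    """Parse a GeoJSON Point string and return (longitude, latitude)."""
    geo = json.loads(text)
    if geo.get("type") != "Point":
        raise ValueError("expected a GeoJSON Point")
    lon, lat = geo["coordinates"][:2]
    return lon, lat


longitude, latitude = lon_lat_from_geo(geo_text)
print(longitude, latitude)
```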

In [None]:
import json
import geopandas as gpd
from shapely.geometry import shape
pp = gpd.read_file('/kaggle/input/ds4g-environmental-insights-explorer/eie_data/gppd/gppd_120_pr.csv')
pp['geometry']=pp['.geo'].apply(lambda x: shape(json.loads(x))) #convert str to shape.point and store it in a geometry field.
print('converted data type:', pp['geometry'][0]) #Print out to see whether above conversion is okay
pp['latitude']=pp['geometry'].apply(lambda x:(x.y)) #Generate latitude using the point feature in geometry field.
pp['longitude']=pp['geometry'].apply(lambda x:(x.x)) #Generate longitude using the point feature in geometry field

In [None]:
# Plot the powerplants on map
lat=18.200178; lon=-66.664513 # set center of map
end=len(pp.index)
plot_points_on_map(pp,0,end,'latitude',lat,'longitude',lon,9) # begin index: 0; an end index larger than the number of rows is okay

In [None]:
# Examine data type for each variable
pp.dtypes

In [None]:
pp.info()

In [None]:
# capacity_mw is the electrical generating capacity in MW
power_plants_df = pp.copy()
power_plants_df['capacity_mw'] = power_plants_df['capacity_mw'].astype('float')
power_plants_df['commissioning_year'] = power_plants_df['commissioning_year'].apply(lambda x: x.replace('.0', '')).astype('int')
power_plants_df['estimated_generation_gwh'] = power_plants_df['estimated_generation_gwh'].astype('float')
power_plants_df[['name','latitude','longitude','primary_fuel','capacity_mw','estimated_generation_gwh','commissioning_year']]

In [None]:
#sorting by capacity_mw by descending order
power_plants_df = power_plants_df.sort_values(by=['capacity_mw'],ascending=False).reset_index()
power_plants_df[['name','latitude','longitude','primary_fuel','capacity_mw','estimated_generation_gwh','commissioning_year']]

In [None]:
#power_plants_df['capacity_mw']=power_plants_df['capacity_mw'].astype(float)
#power_plants_df['estimated_generation_gwh']=power_plants_df['estimated_generation_gwh'].astype(float)
power_plants_df['capacity_factor'] = power_plants_df['estimated_generation_gwh']/(power_plants_df['capacity_mw']*24*365/1000)
power_plants_df[['name', 'capacity_mw', 'primary_fuel', 'estimated_generation_gwh', 'capacity_factor']]

The capacity factor for AES needs fixing, as it is over 100%, which is impossible. I adopted the ratios suggested in https://www.kaggle.com/c/ds4g-environmental-insights-explorer/discussion/130537
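
The arithmetic behind the fix can be sketched for a single plant. The plant figures below are hypothetical (they are not the AES values), and `capped_generation_gwh` is an illustrative helper name; the 0.55 fallback is the coal ratio from the discussion linked above.

```python
HOURS_PER_YEAR = 24 * 365


def capacity_factor(generation_gwh, capacity_mw):
    """Actual yearly generation divided by the maximum possible generation."""
    max_generation_gwh = capacity_mw * HOURS_PER_YEAR / 1000
    return generation_gwh / max_generation_gwh


def capped_generation_gwh(generation_gwh, capacity_mw, fallback_factor):
    """Replace an impossible estimate (capacity factor > 1) with a typical one."""
    if capacity_factor(generation_gwh, capacity_mw) > 1:
        return fallback_factor * capacity_mw * HOURS_PER_YEAR / 1000
    return generation_gwh


# Hypothetical coal plant: 100 MW capacity with an estimated 1000 GWh per year.
cf = capacity_factor(1000.0, 100.0)                 # above 1, so impossible
fixed = capped_generation_gwh(1000.0, 100.0, 0.55)  # falls back to 55% of 876 GWh
print(cf, fixed)
```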

In [None]:
#Adopted from https://www.kaggle.com/c/ds4g-environmental-insights-explorer/discussion/130537
source_capacity_factors = {"Coal": 0.55, "Hydro": 0.40, "Gas": 0.80, "Oil": 0.64, "Solar": 0.25, "Wind": 0.30, "Nuclear": 0.85}

# "force_fix": - if False, the source_capacity_factors dictionary values are applied only to the 
#                "estimated_generation_gwh" values whose Capacity Factor is > 1
#              - if True, all the "estimated_generation_gwh" values are fixed with the source_capacity_factors dictionary values
def fix_estimated_generation(gpp_df, source_capacity_factors, force_fix=False):
    gpp_df["capacity_factor"] = np.where(gpp_df["capacity_mw"] > 0, gpp_df["estimated_generation_gwh"] / (gpp_df["capacity_mw"]*24*365/1000), 0)
    for idx in range(gpp_df.shape[0]):
        if (gpp_df.loc[idx, 'capacity_factor'] > 1) or force_fix: 
            gpp_df.loc[idx, 'capacity_factor'] = source_capacity_factors[gpp_df.loc[idx, "primary_fuel"]]
            gpp_df.loc[idx, 'estimated_generation_gwh'] = gpp_df.loc[idx, "capacity_factor"] * gpp_df.loc[idx, "capacity_mw"] * 24*365/1000
    return gpp_df

fix_estimated_generation(power_plants_df,source_capacity_factors)
power_plants_df[['name', 'capacity_mw', 'primary_fuel', 'estimated_generation_gwh', 'capacity_factor']]

## **Step 2: Connect to the Google Earth Engine API**

In [None]:
#ee.Authenticate()

In [None]:
from kaggle_secrets import UserSecretsClient
from google.oauth2.credentials import Credentials
import ee
import folium

# Define a method for displaying Earth Engine image tiles on a folium map.
def add_ee_layer(self, ee_object, vis_params, name):
    
    try:    
        # display ee.Image()
        if isinstance(ee_object, ee.image.Image):    
            map_id_dict = ee.Image(ee_object).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.ImageCollection()
        elif isinstance(ee_object, ee.imagecollection.ImageCollection):    
            ee_object_new = ee_object.mosaic()
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.Geometry()
        elif isinstance(ee_object, ee.geometry.Geometry):    
            folium.GeoJson(
            data = ee_object.getInfo(),
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
        # display ee.FeatureCollection()
        elif isinstance(ee_object, ee.featurecollection.FeatureCollection):  
            ee_object_new = ee.Image().paint(ee_object, 0, 2)
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
    
    except Exception as e:
        print("Could not display {}: {}".format(name, e))

def plot_ee_data_on_map(dataset,column,begin_date,end_date,minimum_value,maximum_value,latitude,longitude,zoom):
    # https://github.com/google/earthengine-api/blob/master/python/examples/ipynb/ee-api-colab-setup.ipynb
    folium.Map.add_ee_layer = add_ee_layer
    vis_params = {
      'min': minimum_value,
      'max': maximum_value,
      'palette': ['006633', 'E5FFCC', '662A00', 'D8D8D8', 'F5F5F5']}
    my_map = folium.Map(location=[latitude,longitude], zoom_start=zoom, height=500)
    s5p = ee.ImageCollection(dataset).filterDate(
        begin_date, end_date)
    my_map.add_ee_layer(s5p.first().select(column), vis_params, 'Color')
    my_map.add_child(folium.LayerControl())
    display(my_map)

In [None]:
#!cat ~/.config/earthengine/credentials

In [None]:
from kaggle_secrets import UserSecretsClient
user_secret = "earth_engine" # Your user secret, defined in the add-on menu of the notebook editor
refresh_token = UserSecretsClient().get_secret(user_secret)
credentials = Credentials(
        None,
        refresh_token=refresh_token,
        token_uri=ee.oauth.TOKEN_URI,
        client_id=ee.oauth.CLIENT_ID,
        client_secret=ee.oauth.CLIENT_SECRET,
        scopes=ee.oauth.SCOPES)
ee.Initialize(credentials=credentials)

# **Step 3: Preparing data for analysis using the Google Earth Engine API**


Here I would like to create the following functions for further analysis
1. A function that accepts a power plant location and generates a bounding box. This bounding box is used for sampling different satellite measurements such as NO2 VCD, temperature, wind direction and so on. For NO2 VCD, I would use the tropospheric NO2 VCD directly, as it reflects the situation within a sensible elevation range.
2. A function that reduces images with a summarizing function such as mean or median over a fixed duration such as a month, week or day. This helps to compare days with and without power plant operation and identify the difference in emission. Of course, this assumes that the power plants were shut down for some reason and that such scenarios were recorded by the satellites.
3. Some comparison functions, to compare data at two different levels: the island-wide scale and the power-plant scale. By observing the island-wide change of NO2 VCD over time, I would like to identify the background emission level. I could then apply the background emission to each power plant, compare over time and find the relevant emission.
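
The first function can be sketched in plain Python to show the geometry involved. This is an approximation under stated assumptions: it treats one degree of latitude as about 111,320 m and scales longitude by the cosine of the latitude, whereas Earth Engine's `buffer().bounds()` does the equivalent work on the server. The function name and constant are hypothetical.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # rough spherical approximation


def bounding_box(lat, lon, half_size_m):
    """Return (west, south, east, north) of a square box centred on a point."""
    dlat = half_size_m / METERS_PER_DEGREE_LAT
    dlon = half_size_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


# A 2000 m x 2000 m box around a hypothetical plant location.
west, south, east, north = bounding_box(17.95, -66.15, 1000.0)
print(west, south, east, north)
```

The box is slightly wider in degrees of longitude than in degrees of latitude because meridians converge away from the equator.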

In [None]:
#Filter points from power_plants_df
#Args:
#        gdf (geoPandas.GeoDataFrame) : the input geodataframe
#        fuel type : 'Hydro', 'Oil', 'Solar', 'Gas', 'Coal', 'Wind'
#    Returns:
#        fc (ee.FeatureCollection) : feature collection (server side)  
def FilterPointbyFuelType(df,fuelcolumn,fueltype):
    #'primary_fuel'
    fy = df[df[fuelcolumn].isin(fueltype)]
    features=[]
    for index, row in fy.iterrows():
        g=ee.Geometry.Point([row['geometry'].x,row['geometry'].y])
        feature = ee.Feature(g,{'name':ee.String(row['name']), 'primary_fuel':ee.String(row['primary_fuel']),\
                                'capacity_mw':ee.Number(row['capacity_mw']), 'estimated_generation_gwh':ee.Number(row['estimated_generation_gwh']),\
                                'capacity_factor':ee.Number(row['capacity_factor'])})
        features.append(feature)
    return ee.FeatureCollection(features)

In [None]:
# Select power plants based on primary fuel type 'Coal' only
ftype = ['Coal']
# Convert geopandas dataframe to ee.featurecollection
sel = FilterPointbyFuelType(power_plants_df,'primary_fuel',ftype)
#sel.getInfo()
print('No.of power plants selected:', sel.size().getInfo(), ' for ', ftype)
type(sel)

In [None]:
#print(sel.size().getInfo())
#print(type(sel))
#Create a bounding box for each powerplant for sampling purpose
#create a buffer zone for given point type geodataframe, then find its bounding box
#use the bounding box to define feature

#Define buffer distance, i.e. 1000 m
size=1000
#Define a bounding boxes with dimensions, 2x size * 2x size, i.e. 2000 m x 2000 m
boundingBoxes = sel.map(lambda f: f.buffer(size).bounds())
#print(boundingBoxes.)
#apply the bounding box to extract image
#Calculate mean value by applying chart by region        
    


Here I perform the data preparation, i.e. obtain images, reduce them, and generate images.

In [None]:
import folium
from folium import plugins

# Define a bounding box for the island and it should be a bit larger to cover enough sea area at South.
PR_geometry = ee.Geometry.Rectangle([-67.32, 17.70, -65.19, 18.56])

testimg = ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_NO2').filter(ee.Filter.calendarRange(2020, 2020, 'year'))\
        .filter(ee.Filter.calendarRange(1, 1, 'month')).filterBounds(PR_geometry)

# select() returns a new collection, so the result has to be assigned.
testimg = testimg.select('tropospheric_NO2_column_number_density')
print('no.of images in test', testimg.size().getInfo())

reducedtest = testimg.reduce(ee.Reducer.mean())

# Add EE drawing method to folium.
folium.Map.add_ee_layer = add_ee_layer

# Set visualization parameters.
band_viz = {
  'min': 0,
  'max': 0.00010,
  'palette': ['black', 'blue', 'purple', 'cyan', 'green', 'yellow', 'red']
}

# Create a folium map object.
my_map = folium.Map(location=[17.95,-66.15], zoom_start=9, height=500)

# Add the reduced image to the map object (reduce() appends '_mean' to the band name).
my_map.add_ee_layer(reducedtest.select('tropospheric_NO2_column_number_density_mean').clip(PR_geometry), band_viz, 'reducedtest')
#Add the boundingBoxes
my_map.add_ee_layer(boundingBoxes, {'palette': ['FF0000']},'boundingBoxes') #FF0000: red, FFFF00: yellow

# Add a layer control panel to the map.
my_map.add_child(folium.LayerControl())
plugins.Fullscreen().add_to(my_map)

# Display the map.
display(my_map)


In [None]:
# Here I will select images by a date range, and bounding box, then reduce it by month. However, not all the months have images,
# so I need to remove entry without image before further processing.

import folium
from folium import plugins

# Define a bounding box for the island and it should be a bit larger to cover enough sea area at South.
PR_geometry = ee.Geometry.Rectangle([-67.32, 17.70, -65.19, 18.56])

# Define time range
startyear = 2017
endyear = 2020
 
# Set date in ee date format
startdate = ee.Date.fromYMD(startyear,7,1) # 2017/7/1
enddate = ee.Date.fromYMD(endyear,2,28) # 2020/2/28

# Select sentinel 5P NO2 images filtered by PR_geometry
imgcollection = ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_NO2').filterBounds(PR_geometry)

print('total no. of selected images:', imgcollection.size().getInfo())
print(imgcollection.first().bandNames().getInfo())
# There are 12 bands.
#['NO2_column_number_density', 'tropospheric_NO2_column_number_density', 'stratospheric_NO2_column_number_density', 
# 'NO2_slant_column_number_density', 'tropopause_pressure', 'absorbing_aerosol_index', 'cloud_fraction', 'sensor_altitude', 
# 'sensor_azimuth_angle', 'sensor_zenith_angle', 'solar_azimuth_angle', 'solar_zenith_angle']

# Add EE drawing method to folium.
folium.Map.add_ee_layer = add_ee_layer

# Set visualization parameters.
band_viz = {
  'min': 0,
  'max': 0.00010,
  'palette': ['black', 'blue', 'purple', 'cyan', 'green', 'yellow', 'red']
}
# Create a folium map object.
my_map = folium.Map(location=[17.95,-66.15], zoom_start=9, height=500)

# Create list of dates for time series
# first, find out number of months between end date and start date
n_months = enddate.difference(startdate, 'month').round()
# Make a day of month sequence from 0 to n_months-1 with a 1-month step.
months = ee.List.sequence(0, n_months.subtract(1), 1)

# Use start date as anchor point and generate list of date, each entry will be by n-month advancement
def make_datelist(n):
    return startdate.advance(n, 'month')

# Here apply the make_datelist function and generate dates
dates =  months.map(make_datelist)

# getm reduces images by averaging the values of all images within the same month as the given date.
def getm(d1):
    d1 = ee.Date(d1)
    m = d1.get('month')
    y = d1.get('year')
    # Create image by mean reducer
    s = imgcollection.filter(ee.Filter.calendarRange(y, y, 'year'))\
    .filter(ee.Filter.calendarRange(m, m, 'month')).mean()
    return s\
           .set('month', m)\
           .set('year', y)\
           .set('count', s.bandNames().length())

# This will create a list of images
listofimages = dates.map(getm)
# Create an ImageCollection
monthlyCol = ee.ImageCollection.fromImages(listofimages)    

# Remove null band images by selecting count greater than or equal to 1
monthlyCol = monthlyCol.filter(ee.Filter.gte('count', 1))

# print out no. of images
print('no. of remaining images with bands:', monthlyCol.size().getInfo())

# print('bandnames: ', monthlyCol.first().bandNames().getInfo())

# Call the first image for visual check
img = ee.Image(monthlyCol.first())

# Add img to folium
my_map.add_ee_layer(img.select('tropospheric_NO2_column_number_density').clip(PR_geometry), band_viz, 'tropospheric_NO2_column_number_density')
# Add bounding Boxes
my_map.add_ee_layer(boundingBoxes, {'palette': ['FF0000']},'boundingBoxes') #FF0000: red, FFFF00: yellow
# Add study area
my_map.add_ee_layer(PR_geometry, {'palette': ['FFFF00']},'PR') #FF0000: red, FFFF00: yellow

# Add a layer control panel to the map.
my_map.add_child(folium.LayerControl())
plugins.Fullscreen().add_to(my_map)

# Display the map.
display(my_map)


Now I have a bounding box for each selected power plant (the red boxes) and monthly average NO2 measurements for the entire island from 7/2017 to 2/2020, i.e. up to 32 images once months without imagery are removed. Next, I need to find the average tropospheric NO2 measurement for each box, that is, for each power plant, and compile the results into a pandas dataframe for time series analysis. The dataframe has the following fields: power plant name, NO2 value (mean), month, and year.
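
As a quick check of the month count and of the intended table layout, here is a standard-library sketch. The plant name is a placeholder, and `no2_mean` is left as `None` because no measured values are assumed here.

```python
from datetime import date


def month_starts(start, end):
    """First day of every month from start's month through end's month."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


months = month_starts(date(2017, 7, 1), date(2020, 2, 28))

# One record per plant and month, matching the fields listed above.
records = [
    {"name": "Plant A (placeholder)", "year": m.year, "month": m.month, "no2_mean": None}
    for m in months
]
print(len(months), records[0])
```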

In [None]:
# ==========================================================================
# Function to Convert Feature Classes to Pandas Dataframe
# Adapted from: https://events.hpc.grnet.gr/event/47/material/1/12.py
def fc2df(fc):
    # Convert a FeatureCollection into a pandas DataFrame
    # Features is a list of dict with the output
    features = fc.getInfo()['features']

    dictarr = []

    for f in features:
        # Store all attributes in a dict
        attr = f['properties']
        # and treat geometry separately
        attr['geometry'] = f['geometry']  # GeoJSON Feature!
        # attr['geometrytype'] = f['geometry']['type']
        dictarr.append(attr)

    df = pd.DataFrame(dictarr)
    # Convert GeoJSON geometries to shapely shapes (map() is lazy in Python 3, so use apply)
    df['geometry'] = df['geometry'].apply(shape)
    return gpd.GeoDataFrame(df, geometry='geometry')

# ==========================================================================
# Function to iterate over image collection, returning a pandas dataframe
def extract_point_values(img_id, polys):
    image = ee.Image(img_id)

    fc_image_red = image.reduceRegions(collection=polys,
                                  reducer=ee.Reducer.mean(),
                                  scale=30)

    # Convert to Pandas Dataframe
    df_image_red = fc2df(fc_image_red)

    # Record the band names (fetched client-side) as a single string per row
    df_image_red['band'] = ', '.join(image.bandNames().getInfo())

    return df_image_red

# ==========================================================================
# Function to iterate over image collection, returning a pandas dataframe
def _extract_point_values_map(img):
    image = ee.Image(img)
    # fc_image_red is feature collection containing mean value after reduceRegions
    fc_image_mean = image.reduceRegions(collection=boundingBoxes,
                                  reducer=ee.Reducer.mean(),
                                  scale=30)
    # Convert to Pandas Dataframe
    df_image_mean = fc2df(fc_image_mean)
    
    # Select required field only
    df_image_mean = df_image_mean[['NO2_column_number_density',
     'NO2_slant_column_number_density',
     'absorbing_aerosol_index',
     'capacity_factor',
     'capacity_mw',
     'cloud_fraction',
     'estimated_generation_gwh',
     'name',
     'primary_fuel',
     'stratospheric_NO2_column_number_density',
     'tropopause_pressure',
     'tropospheric_NO2_column_number_density',
     'geometry']]
    # Add Date as Variable    
    df_image_mean['month'] = image.getInfo()['properties']['month']
    df_image_mean['year'] = image.getInfo()['properties']['year']
    
    return df_image_mean
# ==========================================================================

# ==========================================================================
# Function to iterate over image collection, returning a pandas dataframe
# mean
def extract_point_values_map2(img):
    image = ee.Image(img)
    # fc_image_red is feature collection containing mean value after reduceRegions
    fc_image_mean = image.reduceRegions(collection=boundingBoxes,
                                  reducer=ee.Reducer.mean(),
                                  scale=30)
    # Convert to Pandas Dataframe
    df_image_mean = fc2df(fc_image_mean) 
    # Add Date as Variable    
    df_image_mean['month'] = image.getInfo()['properties']['month']
    df_image_mean['year'] = image.getInfo()['properties']['year']
    
    return df_image_mean
# ==========================================================================

# ==========================================================================
# Function to iterate over image collection, returning a pandas dataframe
# min
def extract_point_values_map3(img):
    image = ee.Image(img)
    # fc_image_red is feature collection containing mean value after reduceRegions
    fc_image_min = image.reduceRegions(collection=boundingBoxes,
                                  reducer=ee.Reducer.min(),
                                  scale=30)
    # Convert to Pandas Dataframe
    df_image_min = fc2df(fc_image_min) 
    # Add Date as Variable    
    df_image_min['month'] = image.getInfo()['properties']['month']
    df_image_min['year'] = image.getInfo()['properties']['year']
    
    return df_image_min
# ==========================================================================


# ==========================================================================
# Function to iterate over image collection, returning a pandas dataframe
def reduceImg(img):
    image = ee.Image(img)
    # image_mean is a FeatureCollection holding the mean per bounding box after reduceRegions
    image_mean = image.reduceRegions(collection=boundingBoxes,
                                  reducer=ee.Reducer.mean(),
                                  scale=30)
    return image_mean
# ==========================================================================

In [None]:
varImage = monthlyCol
listOfImages = varImage.toList(varImage.size())
numOfImage = varImage.size().getInfo()

#### Build one dataframe per image, then concatenate (DataFrame.append was removed in pandas 2.0)
df_all_mean = pd.concat([extract_point_values_map2(listOfImages.get(i)) for i in range(numOfImage)], ignore_index=True)
df_all_min = pd.concat([extract_point_values_map3(listOfImages.get(i)) for i in range(numOfImage)], ignore_index=True)

#Convert ['year','month'] to date
df_all_mean['DATE'] = pd.to_datetime(df_all_mean[['year', 'month']].assign(day=1)).dt.to_period('M')
df_all_min['DATE'] = pd.to_datetime(df_all_min[['year', 'month']].assign(day=1)).dt.to_period('M')


In [None]:
#df_all_mean.drop(df_all_mean.index, inplace=True)
#df_all_min.drop(df_all_mean.index, inplace=True)

In [None]:
#### Display Results
df_all_min

In [None]:
df_all_mean

In [None]:
df_all_min[['NO2_column_number_density',
 'NO2_slant_column_number_density',
 'absorbing_aerosol_index',
 'capacity_factor',
 'capacity_mw',
 'cloud_fraction',
 'estimated_generation_gwh',
 'name',
 'primary_fuel',
 'stratospheric_NO2_column_number_density',
 'tropopause_pressure',
 'tropospheric_NO2_column_number_density',
 'geometry',
 'month',
 'year']]

In [None]:
df_all_mean[['NO2_column_number_density',
 'NO2_slant_column_number_density',
 'absorbing_aerosol_index',
 'capacity_factor',
 'capacity_mw',
 'cloud_fraction',
 'estimated_generation_gwh',
 'name',
 'primary_fuel',
 'stratospheric_NO2_column_number_density',
 'tropopause_pressure',
 'tropospheric_NO2_column_number_density',
 'geometry',
 'month',
 'year']]

**Finding Background NO2**

To determine the NO2 emission generated by power plants, it is crucial to understand the background NO2. One quick way to do this is to observe NO2 measurements when the power plants were out of service or generating very little electricity. According to the news, [Hurricane Maria](https://www.mercycorps.org/blog/quick-facts-hurricane-maria-puerto-rico) hit the island and electricity was cut off to 100 percent of the island at the end of September 2017. As such, we would take NO2 samples from October to December 2017 and compare them with those recorded from July to September 2017. A decreasing NO2 trend is expected between the two periods.

Further, I would take samples before and after the events to see any difference. 
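
The before/after comparison can be sketched as below. The monthly values are placeholders in mol/m^2, not measurements, and treating the post-event mean as the background estimate is an assumption that follows the reasoning above.

```python
from statistics import mean


def relative_change(before, after):
    """Fractional change of the mean of `after` relative to the mean of `before`."""
    return (mean(after) - mean(before)) / mean(before)


# Placeholder monthly mean tropospheric NO2 columns (mol/m^2).
before_event = [4.0e-5, 4.2e-5, 3.8e-5]
after_event = [3.0e-5, 3.1e-5, 2.9e-5]

change = relative_change(before_event, after_event)
background_estimate = mean(after_event)
print(change, background_estimate)
```

A negative `change` would be consistent with the expected decline; with only three values per period, a significance test would still be needed before drawing conclusions.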

Things to do:
1. Identify any significant difference of NO2 emission over time.
    - Preprocess images from S5P and GFS
2. Calculate true NO2 emission generated by power plants
3. Calculate Emission Factor




In [None]:
list(df_all_mean.columns)

In [None]:
# Plotting
#
import matplotlib.pyplot as plt
import seaborn as sns
df_all_date=df_all_mean.set_index('DATE')
sns.set(rc={'figure.figsize':(11, 4)})
cols_plot = ['tropospheric_NO2_column_number_density', 'NO2_column_number_density','NO2_slant_column_number_density']
df_all_date[cols_plot].plot(linewidth=0.5);

In [None]:
#gldas gives 

gldas = ee.ImageCollection('NASA/GLDAS/V021/NOAH/G025/T3H').filterDate(startdate, enddate).filterBounds(boundingBoxes)
 #.filterBounds(ee.Geometry.Point(-66.15, 17.95)) \

# Get the number of images.
count = gldas.size()
print('Counting gldas: ', str(count.getInfo())+'\n')

# Get information about the bands as a list.
bandName = gldas.first().bandNames()
print('Band name: ', bandName.getInfo()); # ee.List of band names

**Step 6: Explore the weather data using the Google Earth Engine API**
1. Try to get the low wind dates according to the bounding boxes.
2. Check timeseries for 'u_component_of_wind_10m_above_ground' and 'v_component_of_wind_10m_above_ground'.
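
Once the u and v components are available per date, picking the low-wind dates is a small calculation. The sketch below uses hypothetical daily values and the calm threshold of 2 m/s from the definitions; the dictionary layout and function names are assumptions.

```python
import math

CALM_THRESHOLD_M_S = 2.0  # calm wind: speed <= 2 m/s


def wind_speed(u, v):
    """Horizontal wind speed from the eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_direction_to(u, v):
    """Bearing the wind blows towards, in degrees clockwise from north."""
    return math.degrees(math.atan2(u, v)) % 360


# Hypothetical daily mean (u, v) values in m/s for one bounding box.
daily_wind = {
    "2019-01-01": (1.0, 1.0),
    "2019-01-02": (3.0, 4.0),
    "2019-01-03": (0.0, -1.5),
}

calm_dates = [d for d, (u, v) in daily_wind.items()
              if wind_speed(u, v) <= CALM_THRESHOLD_M_S]
print(calm_dates)
```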


In [None]:

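# ui.Chart belongs to the JavaScript Code Editor, so the time series is built
# with reduceRegions and plotted with pandas instead.
# Note: GFS holds many images; narrow the date range if the request times out.
gfs_temp = (ee.ImageCollection('NOAA/GFS0P25')
            .filterDate(startdate, enddate)
            .select('temperature_2m_above_ground'))

def mean_temperature_per_box(img):
    # Mean temperature inside each bounding box, tagged with the image timestamp.
    stats = img.reduceRegions(collection=boundingBoxes, reducer=ee.Reducer.mean(), scale=200)
    return stats.map(lambda f: f.set('time', img.get('system:time_start')))

rows = gfs_temp.map(mean_temperature_per_box).flatten().getInfo()['features']
temp_df = pd.DataFrame([r['properties'] for r in rows])
temp_df['time'] = pd.to_datetime(temp_df['time'], unit='ms')

# One series per power plant, drawn as a scatter of points.
temp_df.pivot_table(index='time', columns='name', values='mean').plot(
    style='.', title='Temperature over time in regions of the bounding boxes')
plt.ylabel('Temperature (°C)')
plt.show()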

In [None]:
gfs = ee.ImageCollection('NOAA/GFS0P25').filterDate(startdate, enddate).filterBounds(boundingBoxes)
 #.filterBounds(ee.Geometry.Point(-66.15, 17.95)) \

# Get the number of images.
count = gfs.size()
print('Counting gfs: ', str(count.getInfo())+'\n')

# Get information about the bands as a list.
bandName = gfs.first().bandNames()
print('Band name(gfs): ', bandName.getInfo()); # ee.List of band names

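#gfsv2 = gfs.filter(ee.Filter.lte('v_component_of_wind_10m_above_ground', 2))

# aggregate_stats and reduceColumns read image properties, not band values, so
# first store each image's mean v-component over the island as a property.
v_band = 'v_component_of_wind_10m_above_ground'
def add_v_mean(img):
    stats = img.select(v_band).reduceRegion(reducer=ee.Reducer.mean(), geometry=PR_geometry, scale=27830)
    return img.set('v_mean', stats.get(v_band))
gfs_v = gfs.map(add_v_mean).filter(ee.Filter.notNull(['v_mean']))

windStats = gfs_v.aggregate_stats('v_mean')
print('v_component_of_wind_10m_above_ground statistics: ', windStats.getInfo())

# Named wind_range so the Python builtin range() is not shadowed.
wind_range = gfs_v.reduceColumns(ee.Reducer.minMax(), ['v_mean'])
print('v-component range: ', ee.Number(wind_range.get('min')).getInfo(), ee.Number(wind_range.get('max')).getInfo())
type(wind_range)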


# Get the number of images.
#count = gfsv2.size()
#print('Counting gfs_v1: ', str(count.getInfo())+'\n')

##########
## Want to know which dates had calm wind, to get error-free NO2 observations.