**PROJECT OVERVIEW**

***Develop a methodology to calculate an average historical emissions factor of electricity generated for a sub-national region, using remote sensing data and techniques.***

**PROBLEM STATEMENT**

***Measuring Emissions Factors from Satellite Data***

**To develop a methodology to calculate an average historical emissions factor for electricity generation in a sub-national region.**

**DATA PROVIDED**

An initial list of datasets covering the geographic boundary of Puerto Rico is provided to serve as the foundation for this analysis. Because Puerto Rico is an island, there are fewer confounding factors from nearby areas. The task is to calculate the annual historical emissions factor.

Bonus points will be awarded for smaller time slices of the average historical emissions factor, such as one per month for the 12-month period.
Additional bonus points will be awarded to participants who develop methodologies for calculating marginal emissions factors for the sub-national region.

* An emissions factor is a representative value that attempts to relate the quantity of a pollutant released to the atmosphere with an activity associated with the release of that pollutant. 


> These factors are usually expressed as the weight of pollutant divided by a unit weight, volume, distance, or duration of the activity emitting the pollutant (e.g., kilograms of particulate emitted per megagram of coal burned).

> Such factors facilitate estimation of emissions from various sources of air pollution. In most cases, these factors are simply averages of all available data of acceptable quality, and are generally assumed to be representative of long-term averages for all facilities in the source category (i.e., a population average).

**The general equation for emissions estimation is:**

*E = A x EF x (1-ER/100)*

**where:**

    E = emissions;
    
    A = activity rate;
    
    EF = emission factor; and
    
    ER = overall emission reduction efficiency, %
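
As a quick illustration of the equation above, the sketch below evaluates it for made-up numbers and also inverts it to recover an emission factor from a known emission total. The function names and the input values are hypothetical, chosen only for illustration.

```python
def estimate_emissions(activity_rate, emission_factor, reduction_efficiency_pct=0.0):
    """E = A x EF x (1 - ER/100)."""
    return activity_rate * emission_factor * (1 - reduction_efficiency_pct / 100)


def emission_factor_from_emissions(emissions, activity_rate, reduction_efficiency_pct=0.0):
    """Invert the equation: EF = E / (A x (1 - ER/100))."""
    return emissions / (activity_rate * (1 - reduction_efficiency_pct / 100))


# Hypothetical inputs: 1000 MWh generated, 0.5 kg of pollutant per MWh, 50 % control efficiency.
emissions = estimate_emissions(1000, 0.5, 50)
print(emissions)  # 250.0
print(emission_factor_from_emissions(emissions, 1000, 50))  # 0.5
```

The second function is the direction this notebook works in: emissions are estimated first, and the emission factor follows by dividing by the activity (electricity generated).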





**PROBLEM APPROACH OR METHODOLOGY**

    1. Analyse the temperature, wind speed and wind direction data to see if they correlate with the NO2 concentration
    2. Perform principal component analysis to isolate pollution attributable to the power units
    3. Calculate the emission of NO2
    4. Calculate the emission factor
    5. Calculate the marginal emission factor
    
  

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import os
import rasterio as rio
import folium 
import seaborn as sns
from kaggle_secrets import UserSecretsClient
from google.oauth2.credentials import Credentials
from os import listdir
import ee

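# Import the display helpers from the IPython.display module.
from IPython.display import Image, display
from folium import plugins
# Note: PIL's Image is imported last, so the name `Image` refers to PIL.Image from here on.
from PIL import Image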
# Analysing datetime
import datetime as dt
from datetime import datetime 

import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import matplotlib.ticker as mticker


%matplotlib inline
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

# import os
# for dirname, t, filenames in os.walk('/kaggle/input'): 
#     for filename in filenames:
#         print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.




In [None]:
# retrieve the credentials
!cat ~/.config/earthengine/credentials

In [None]:


user_secret = "credientials" # Your user secret, defined in the add-on menu of the notebook editor
refresh_token = UserSecretsClient().get_secret(user_secret)
credentials = Credentials(
       None,
       refresh_token=refresh_token,
       token_uri=ee.oauth.TOKEN_URI,
       client_id=ee.oauth.CLIENT_ID,
       client_secret=ee.oauth.CLIENT_SECRET,
       scopes=ee.oauth.SCOPES)

# Initialize the library.

ee.Initialize(credentials=credentials)

**Analysing images**

In [None]:

# Load images from Earth Engine


roi = ee.Geometry.Polygon( [[[-67.32297404549217, 18.563112930177304],
                            [-67.32297404549217,17.903121359128956],
                            [-65.19437297127342, 17.903121359128956],
                            [-65.19437297127342, 18.563112930177304]]],
                         None, False);

power_plants_table = ee.FeatureCollection("WRI/GPPD/power_plants")\
.filterBounds(roi)


EXPLORE THE PLANT TABLE

In [None]:
# Get the plant data from earth engine into a dataframe

def plant_database(ft):
    
    
    names=[]
    fuel_types=[]
    capa=[]
    est_growth=[]
    lats=[]
    lons=[]
    
    
    for feat in ft.getInfo()['features']:
        
        lat=feat['properties']['latitude']
        long=feat['properties']['longitude']
        fuel_type=feat['properties']['fuel1']
        name=feat['properties']['name']
        capacity=feat['properties']['capacitymw']
        # 'gwh_estimt' is the estimated annual generation in GWh; it is stored below as 'estimated_growth'
        est_gr=feat['properties']['gwh_estimt']

        names.append(name)
        fuel_types.append(fuel_type)
        lats.append(lat)
        lons.append(long)
        capa.append(capacity)
        est_growth.append(est_gr)
    
    data=pd.DataFrame({'latittude':lats,'longitude':lons,'name':names,'fuel_type':fuel_types,'capacity':capa,'estimated_growth':est_growth})
    
    return data




plant_datab=plant_database(power_plants_table)


In [None]:
plant_datab.head()

In [None]:
# plot the fuel type by the sum of capacity
plant_datab.groupby(['fuel_type']).sum()['capacity'].plot(kind='bar')

Oil has the highest total installed capacity, followed by Gas.
This suggests that oil-fired plants would produce more emissions than the others.

In [None]:
# plot the distribution of electricity generation with the plant unit location
plant_datab.plot(kind="scatter", x="longitude", y="latittude",
    s=plant_datab['capacity'], label="electricity generated",
    c="estimated_growth", cmap=plt.get_cmap("jet"),
    colorbar=True, alpha=0.4, figsize=(10,7),
)
plt.legend()
plt.show()

The two plant units coloured towards the red end of the scale have the highest estimated generation but only a medium-sized capacity; these are possibly units with Oil as fuel type.

More than half of the plant units have about the same capacity and estimated generation.

The largest points, shown as blue circles, indicate that the plant units with the largest capacity have a lower estimated generation.

Some points on the chart have both a very small capacity and a very small estimated generation; this may be due to the fuel type used by the plant unit.

In [None]:
plant_datab.groupby(['capacity','fuel_type'])['estimated_growth'].sum().plot(kind='barh',figsize=(10,10))
plt.xlabel('estimated growth');

It looks like plant units with the same fuel type produce about the same amount of electricity.

**FUEL TYPE CAN BE A GOOD CHOICE TO USE WHEN ASSIGNING CLASSES TO THE FEATURE TABLE FOR CLASSIFICATION**

Exploring the satellite images

In [None]:
# code for adding layer to Map


# Define a method for displaying Earth Engine image tiles on a folium map.
def add_ee_layer(self, ee_object, vis_params, name):
    
    try:    
        # display ee.Image()
        if isinstance(ee_object, ee.image.Image):    
            map_id_dict = ee.Image(ee_object).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.ImageCollection()
        elif isinstance(ee_object, ee.imagecollection.ImageCollection):    
            ee_object_new = ee_object.mosaic()
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
            ).add_to(self)
        # display ee.Geometry()
        elif isinstance(ee_object, ee.geometry.Geometry):    
            folium.GeoJson(
            data = ee_object.getInfo(),
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
        # display ee.FeatureCollection()
        elif isinstance(ee_object, ee.featurecollection.FeatureCollection):  
            ee_object_new = ee.Image().paint(ee_object, 0, 2)
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
        # display ee.Feature()
        elif isinstance(ee_object, ee.feature.Feature):  
            ee_object_new = ee.Image().paint(ee_object, 0, 2)
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
            tiles = map_id_dict['tile_fetcher'].url_format,
            attr = 'Google Earth Engine',
            name = name,
            overlay = True,
            control = True
        ).add_to(self)
    
    except Exception as e:
        print("Could not display {}: {}".format(name, e))
    


# Add EE drawing method to folium.
folium.Map.add_ee_layer = add_ee_layer




**Satellite image analysis and visualization**

 Map a function to generate one time slice per month for the 12-month period
 
 Data for Puerto Rico from July 2018 to July 2019. Exported from Earth Engine
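
One way to prepare the monthly slices mentioned above is to build the list of month boundaries first. The sketch below is plain Python and does not call Earth Engine; the helper name `monthly_ranges` is hypothetical. It assumes the 12 months start in July 2018.

```python
from datetime import date


def monthly_ranges(start_year, start_month, n_months=12):
    """Return (start, end) ISO date strings for consecutive calendar months.

    The end date is the first day of the following month, which suits an
    exclusive end date such as the one used by Earth Engine's filterDate.
    """
    ranges = []
    year, month = start_year, start_month
    for _ in range(n_months):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        ranges.append((date(year, month, 1).isoformat(),
                       date(next_year, next_month, 1).isoformat()))
        year, month = next_year, next_month
    return ranges


months = monthly_ranges(2018, 7)
print(months[0])   # ('2018-07-01', '2018-08-01')
print(months[-1])  # ('2019-06-01', '2019-07-01')
```

Each pair could then be passed to `filterDate(start, end)` on an image collection and reduced with `.mean()` to give one composite per month.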

**DERIVING THE NO2 EMISSIONS FROM THE SENTINEL IMAGES**


Wind direction plays an important role in air pollution and air quality.

the tro

In [None]:

# column density of NO2
tropospheric_NO2_column_number_density=ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_NO2")\
.filterDate("2018-07-01","2018-07-03")\
.filterBounds(roi)\
.select('tropospheric_NO2_column_number_density')\


# wind data 
gfs_data=ee.ImageCollection("NOAA/GFS0P25")\
.filterDate("2018-07-01","2018-07-03")\
.filterBounds(roi)

# near-surface wind speed from GLDAS (the 'Wind_f_inst' band is wind speed, not direction)
wind_direction=ee.ImageCollection("NASA/GLDAS/V021/NOAH/G025/T3H")\
.filterDate("2018-07-01","2018-07-03")\
.filterBounds(roi)\
.select('Wind_f_inst')

# temperature
temperature_data=ee.ImageCollection('NOAA/GFS0P25')\
.filterDate("2018-07-01","2018-07-03")\
.filterBounds(roi)\
.select('temperature_2m_above_ground')\



In [None]:
# display the mean NO2 column density over the two days
no2_mean = ee.ImageCollection(tropospheric_NO2_column_number_density).mean()
no2_max = no2_mean.reduceRegion(ee.Reducer.max(), roi, 3000).getInfo()['tropospheric_NO2_column_number_density']
no2_min = no2_mean.reduceRegion(ee.Reducer.min(), roi, 3000).getInfo()['tropospheric_NO2_column_number_density']

# distinct names are used so the built-in min() and max() are not shadowed
vizAnomaly = {
'min':no2_min, 'max':no2_max, 
'palette': ','.join(['#ffffcc','#ffeda0','#fed976','#feb24c','#fd8d3c','#fc4e2a','#e31a1c','#bd0026','#800026'])
 }



# Create a folium map object.
lat=18.200178; lon=-66.664513
my_map = folium.Map(location=[lat, lon],tiles="OpenStreetMap"  ,  zoom_start=9.4, height=700)
my_map.add_ee_layer(tropospheric_NO2_column_number_density.mean().clip(roi),vizAnomaly,'NO2 emission')
folium.TileLayer(opacity=0.42).add_to(my_map)
# Display the map.

for g in plant_datab.itertuples():

    folium.Marker(
    location=[g[1],g[2]],
    popup=str(g[3]),
    icon=folium.Icon(color='red', icon='bolt')).add_to(my_map)

display(my_map)


In [None]:
# export the latitude, longitude and array
def NO2_desnity_to_array(img):
    img = img.addBands(ee.Image.pixelLonLat())
 
    img = img.reduceRegion(reducer=ee.Reducer.toList(),\
                                        geometry=roi,\
                                        maxPixels=1e13,\
                                        scale=1000);
     
    
    no2 = np.array((ee.Array(img.get("tropospheric_NO2_column_number_density")).getInfo())) 
       
    lats = np.array((ee.Array(img.get("latitude")).getInfo()))
    lons = np.array((ee.Array(img.get("longitude")).getInfo()))
    
    
    df=pd.DataFrame({'lats':lats,'lons':lons,'no2_density':no2})
    
    return df



NO2_density=NO2_desnity_to_array(tropospheric_NO2_column_number_density.mean())

In [None]:

# convert the lat, lon and value columns into a 2D image array
def toImage(data,image_name):
    img_data=data[image_name]
    # get the unique coordinates
    
    lats=data['lats']
    lons=data['lons']
    
    uniqueLats = np.unique(data['lats'])
    uniqueLons = np.unique(data['lons'])
 
    # get number of columns and rows from coordinates
    ncols = len(uniqueLons)
    nrows = len(uniqueLats)
 
    # determine pixelsizes
    ys = uniqueLats[1] - uniqueLats[0]
    xs = uniqueLons[1] - uniqueLons[0]
 
    # create an array with dimensions of image
    arr = np.zeros([nrows, ncols], np.float32) #-9999
 
    # fill the array: look up each pixel's row and column from its coordinates,
    # so the result does not depend on the order of the input rows
    rows = nrows - 1 - np.searchsorted(uniqueLats, np.asarray(lats))  # row 0 is the northernmost latitude
    cols = np.searchsorted(uniqueLons, np.asarray(lons))
    arr[rows, cols] = np.asarray(img_data)
    return arr
 

In [None]:
# calculate wind speed and wind direction from the U and V wind components

# U = u_component_of_wind, V = v_component_of_wind

# wind speed = np.sqrt(U**2 + V**2)

# calculate wind direction in radians:

# wind direction = np.arctan2(V, U)


def wind_speed_and_direction_mapper(imageCollection):
    # u component of wind
    U = imageCollection.select('u_component_of_wind_10m_above_ground')
    # v component of wind
    V = imageCollection.select('v_component_of_wind_10m_above_ground')
    
    U_component=ee.Image(U)
    
    V_component=ee.Image(V)
    
    wind_sped= U_component.pow(2).add( V_component.pow(2) ).sqrt()\
                                .set('system:time_start',imageCollection.get('system:time_start') )\
                                .rename("wind_speed")
    
    wind_direction= V_component.atan2(U_component)\
                                .set('system:time_start',imageCollection.get('system:time_start') ) \
                                .rename("wind_direction")
    U=U.set('system:time_start',imageCollection.get('system:time_start') )
    V=V.set('system:time_start',imageCollection.get('system:time_start') )
    
    return ee.Image(1).addBands([wind_sped,wind_direction,U.rename('U') ,V.rename('V')])




wind_data=ee.ImageCollection(gfs_data).map(wind_speed_and_direction_mapper)
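
The two formulas in the comments at the top of this cell can be tried on plain arrays. The sketch below uses NumPy with hypothetical U and V values; it illustrates the arithmetic only and is not the Earth Engine mapper itself.

```python
import numpy as np

# Hypothetical U (eastward) and V (northward) wind components in m/s.
U = np.array([3.0, 0.0, -4.0])
V = np.array([4.0, 2.0, 0.0])

wind_speed = np.sqrt(U**2 + V**2)
# Mathematical angle of the wind vector in radians, measured
# counter-clockwise from east (the positive U axis).
wind_direction = np.arctan2(V, U)

print(wind_speed)
print(np.degrees(wind_direction))
```

Note that this angle is the direction the wind vector points towards in the mathematical convention; a meteorological bearing (the direction the wind blows from, clockwise from north) would need a further conversion.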




In [None]:
# get the data for wind speed and wind direction into a dataframe


# export the latitude, longitude and array
def LatLonResults(img):
    img = img.addBands(ee.Image.pixelLonLat())
 
    img = img.reduceRegion(reducer=ee.Reducer.toList(),\
                                        geometry=roi,\
                                        maxPixels=1e13,\
                                        scale=1000);
 
    wind_speed = np.array((ee.Array(img.get("wind_speed")).getInfo()))
    wind_direction = np.array((ee.Array(img.get("wind_direction")).getInfo()))
    
    U=np.array((ee.Array(img.get("U")).getInfo()))
    V=np.array((ee.Array(img.get("V")).getInfo()))
    
    lats = np.array((ee.Array(img.get("latitude")).getInfo()))
    lons = np.array((ee.Array(img.get("longitude")).getInfo()))
    
    df=pd.DataFrame({'lats':lats,'lons':lons,'wind_speed':wind_speed,'wind_direction':wind_direction,'U':U,'V':V })
    
    return df



data=LatLonResults(wind_data.mean())

In [None]:
# plotting wind speed against no2

# plot the mean wind speeds against the mean NO2 values
sns.regplot(x=data['wind_speed'], y=NO2_density['no2_density'],x_jitter=.05,x_estimator=np.mean);

The regression line shows a negative slope.

This means that an increase in wind speed is associated with a decrease in the NO2 tropospheric column density.

This is plausible, as pollution levels due to gas emissions are likely to be higher on days with low wind speed.
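
To put a number on a relationship like the one read off the regression plot, the slope and the Pearson correlation coefficient can be computed directly. The sketch below uses hypothetical paired values, not the notebook's data.

```python
import numpy as np

# Hypothetical paired samples: wind speed (m/s) and NO2 column density (mol/m^2).
wind_speed = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
no2_density = np.array([9.1e-5, 8.4e-5, 8.6e-5, 7.2e-5, 6.9e-5, 6.1e-5])

# Least-squares slope and intercept of NO2 against wind speed.
slope, intercept = np.polyfit(wind_speed, no2_density, 1)
# Pearson correlation coefficient between the two variables.
r = np.corrcoef(wind_speed, no2_density)[0, 1]

print(f"slope = {slope:.3e}, r = {r:.3f}")
```

A negative slope with `r` close to -1 would support the reading above; a value of `r` near zero would mean the tilt of the line carries little information.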

In [None]:
# plotting wind direction against no2

# plot the mean wind directions against the mean NO2 values
sns.regplot(x=data['wind_direction'], y=NO2_density['no2_density'],x_jitter=.05,x_estimator=np.mean);

The regression line is nearly level, with a slight negative tilt.

This suggests that a large change in wind direction is associated with only a small decrease in the NO2 tropospheric column density.

Wind direction still shows a relationship with NO2 and will therefore be used in the classification process to isolate pollution due to the plant units.

In [None]:
# temperature data


# export the latitude, longitude and array
def temperature_to_array(img):
    img = img.addBands(ee.Image.pixelLonLat())
 
    img = img.reduceRegion(reducer=ee.Reducer.toList(),\
                                        geometry=roi,\
                                        maxPixels=1e13,\
                                        scale=1000);
     
    
    temperature = np.array((ee.Array(img.get("temperature_2m_above_ground")).getInfo())) 
       
    lats = np.array((ee.Array(img.get("latitude")).getInfo()))
    lons = np.array((ee.Array(img.get("longitude")).getInfo()))
    
    
    df=pd.DataFrame({'lats':lats,'lons':lons,'temerature':temperature})
    
    return df



temp=temperature_to_array(temperature_data.mean())

In [None]:
temp_img=toImage(temp,'temerature')


fig=plt.figure(figsize=(8,6))
plt.imshow(temp_img)
plt.colorbar()
plt.show()

In [None]:
# plot temperature 2 m above ground against NO2 concentration

sns.regplot(x=temp['temerature'], y=NO2_density['no2_density'],x_jitter=.05,x_estimator=np.mean);

The regression line shows a slightly negative slope.

This means that an increase in the temperature 2 m above ground is associated with a decrease in the NO2 concentration.

This is plausible, as pollution levels due to gas emissions are likely to be higher on low-temperature days.

Most of the mean NO2 values are scattered at temperatures of 26 degrees and above.

This shows that most of the NO2 is observed when the temperature is higher.

So temperature is also a good indicator of NO2 pollution.


For Temperature with Plant units

In [None]:
# get temperature of each plant unit and plot with electricity generation 

temp_data=[]
lats=[]
lons=[]
capa=[]
fuel_type=[]
est_growth=[]
for g in plant_datab.itertuples():

    lats.append(g[1])
    lons.append(g[2])
    fuel_type.append(g[4])
    capa.append(g[5])
    est_growth.append(g[6])
    location=[g[2],g[1]]
    
    area=ee.Geometry.Point(location)
    
    info=ee.ImageCollection(temperature_data).mean().reduceRegion(ee.Reducer.sum(), area, 1000).getInfo()['temperature_2m_above_ground']
    
    
    temp_data.append(info)
    

    
    
temperature_plant_dt=pd.DataFrame({'lats':lats,'lons':lons,'fuel_type':fuel_type,'capacity':capa,'est_growth':est_growth,'temp':temp_data})    
    

In [None]:
temperature_plant_dt.plot(kind="scatter", x="lons", y="lats",
    s=temperature_plant_dt['capacity'], label="temperature",
    c="temp", cmap=plt.get_cmap("jet"),
    colorbar=True, alpha=0.4, figsize=(10,7),
)
plt.legend()
plt.show()

Most of the plant units here show a very high temperature when their electricity generating capacity is also high.

Just a few plant units with low capacity show a low temperature; this might be because plant units use different fuel types.

In [None]:
temperature_plant_dt.groupby(['fuel_type']).sum()['temp'].plot(kind='bar')
plt.ylabel('temperature');

On this selected date range, Hydro and Gas gave the highest cumulative temperatures.

In [None]:
temperature_plant_dt.groupby(['capacity','fuel_type'])['temp'].sum().plot(kind='barh',figsize=(10,10))
plt.xlabel('Temperature');

All plant units gave a fairly equal temperature regardless of their capacity, with one plant unit giving the highest temperature.

*For NO2 with plant units*

In [None]:
# get no2 concentration of each plant unit and plot with electricity generation 

no2_data=[]
lats=[]
lons=[]
capa=[]
fuel_type=[]
est_growth=[]
for g in plant_datab.itertuples():

    lats.append(g[1])
    lons.append(g[2])
    fuel_type.append(g[4])
    capa.append(g[5])
    est_growth.append(g[6])
    location=[g[2],g[1]]
    
    area=ee.Geometry.Point(location)
    
    info=ee.ImageCollection(tropospheric_NO2_column_number_density)\
                                        .mean().reduceRegion(ee.Reducer.sum(), area, 1000)\
                                        .getInfo()['tropospheric_NO2_column_number_density']
    
    
    no2_data.append(info)
    

    
    
no2_data_plant_dt=pd.DataFrame({'lats':lats,'lons':lons,'fuel_type':fuel_type,'capacity':capa,'est_growth':est_growth,'no2':no2_data})    
    

In [None]:
no2_data_plant_dt.plot(kind="scatter", x="lons", y="lats",
    s=no2_data_plant_dt['capacity'], label="NO2 concentration",
    c="no2", cmap=plt.get_cmap("jet"),
    colorbar=True, alpha=0.4, figsize=(10,7),
)
plt.legend()
plt.show()

For the selected dates, the NO2 values at most plant units were fairly low, even though their capacity was high.

Just two of the plant units with low capacity gave very high NO2 values.

In [None]:
no2_data_plant_dt.groupby(['fuel_type']).sum()['no2'].plot(kind='bar')
plt.ylabel('NO2 concentration');

Again, Hydro and Gas gave the highest cumulative NO2 concentration on this day. This may be due to the number of Hydro and Gas plants compared to the other plants.

In [None]:
no2_data_plant_dt.groupby(['capacity','fuel_type'])['no2'].sum().plot(kind='barh',figsize=(10,10))
plt.xlabel('NO2 concentration');

**TEMPERATURE, WIND SPEED AND WIND DIRECTION HAVE AN INFLUENCE ON NO2 CONCENTRATION**

**USE A MONTH'S DATA TO EXPERIMENT WITH THE METHODOLOGY**

 The project is to calculate the emission factor with respect to electricity generation only.
 
 It is true that Puerto Rico has a distinct set of energy sources, which will be a major contributor to NO2 concentration.
 
 But other factors in Puerto Rico, such as traffic, also contribute to NO2 concentration.
 
 Therefore we have to use a statistical model to distinguish the sources of NO2 and isolate the concentrations attributable to electricity generation.
 
 Principal Component Analysis (PCA) is a statistical method that has been widely used for analysing air pollution sources.
 
**PCA is a multivariate statistical technique that groups related variables into a smaller set of new variables, each known as a principal component.**
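
To make the idea concrete, the sketch below runs the same steps (mean-centre, covariance, eigen decomposition, projection, scaling by the standard deviation) on a small synthetic table with NumPy. The data are random and hypothetical; the code only illustrates the technique and is not the Earth Engine implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical table: 200 pixels x 3 variables, the second correlated with the first.
x = rng.normal(size=200)
data = np.column_stack([x, 0.8 * x + 0.2 * rng.normal(size=200), rng.normal(size=200)])

# 1. Mean-centre each variable.
centered = data - data.mean(axis=0)
# 2. Covariance matrix between the variables.
covariance = np.cov(centered, rowvar=False)
# 3. Eigen decomposition; eigh returns eigenvalues in ascending order, so reorder.
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
# 4. Project onto the eigenvectors and scale each component by its standard deviation.
scores = centered @ eigenvectors / np.sqrt(eigenvalues)

print(eigenvalues / eigenvalues.sum())  # fraction of variance per component
```

After the scaling in step 4 every component has unit variance, which is why the scores of different components can be shown on comparable colour scales.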

In [None]:
# wind data 
gfs_data=ee.ImageCollection("NOAA/GFS0P25")\
.filterDate("2018-07-25","2018-07-30")\
.filterBounds(roi)


wind_data=ee.ImageCollection(gfs_data).map(wind_speed_and_direction_mapper)

# temperature
temp_d=ee.ImageCollection('NOAA/GFS0P25')\
.filterDate("2018-07-25","2018-07-30")\
.filterBounds(roi)\
.select('temperature_2m_above_ground')\


# NO2 imagecollection
NO2=ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_NO2")\
.filterDate("2018-07-01","2018-07-30")\
.filterBounds(roi)\
.select('tropospheric_NO2_column_number_density')\

selector=['tropospheric_NO2_column_number_density','temperature_2m_above_ground','wind_speed','wind_direction']



ninput=NO2.mean().addBands([temp_d.mean(),wind_data.mean()]).select(selector)


#************************************************* PCA************************************************

# Get some information about the input to be used later.
scale = ninput.projection().nominalScale().getInfo()

bandNames = ninput.bandNames()

#Mean center the data to enable a faster covariance reducer and an SD stretch of the principal components.
#reducer, geometry, scale, crs, crsTransform, bestEffort, maxPixels, tileScale
meanDict = ninput.reduceRegion(ee.Reducer.mean(),roi,scale,maxPixels=1e9)

means = ee.Image.constant(meanDict.values(bandNames))
centered = ninput.subtract(means)





#This helper function returns a list of new band names.
def getNewBandNames(prefix):
    seq = ee.List.sequence(1, bandNames.length())    
    mlist=[]
    for x in seq.getInfo():
        new=f"{prefix}{x}"
        mlist.append(new)
    return ee.List(mlist)
                 
# This function accepts mean centered imagery, a scale and a region in which to perform the analysis.  
# It returns the Principal Components (PC) in the region as a new image.
                  
def getPrincipalComponents(centered, scale, region):
    #Collapse the bands of the image into a 1D array per pixel.
    arrays = centered.toArray()
  
    #Compute the covariance of the bands within the region.
    covar = arrays.reduceRegion(ee.Reducer.centeredCovariance(),region, scale,maxPixels=1e9)
    
    #Get the 'array' covariance result and cast to an array. This represents the band-to-band covariance within the region.
    covarArray = ee.Array(covar.get('array'))
  
    #Perform an eigen analysis and slice apart the values and vectors.
    eigens = covarArray.eigen()
    
    #This is a P-length vector of Eigenvalues.
    eigenValues = eigens.slice(1, 0, 1)
    
    #This is a PxP matrix with eigenvectors in rows.
    eigenVectors = eigens.slice(1, 1)
    
    #Convert the array image to 2D arrays for matrix computations.
    arrayImage = arrays.toArray(1)
    
    #Left multiply the image array by the matrix of eigenvectors.
    principalComponents = ee.Image(eigenVectors).matrixMultiply(arrayImage)
    
    
    #Turn the square roots of the Eigenvalues into a P-band image.
    sdImage = ee.Image(eigenValues.sqrt()).arrayProject([0]).arrayFlatten([getNewBandNames('sd')])
    #Turn the PCs into a P-band image, normalized by SD.
    
    return principalComponents.arrayProject([0]).arrayFlatten([getNewBandNames('pc')]).divide(sdImage)


In [None]:
#Get the PCs at the specified scale and in the specified region
pcImage = getPrincipalComponents(centered, scale, roi)


In [None]:
band = pcImage.bandNames().get(0).getInfo()
print(band)
print(pcImage.select([band]).reduceRegion(ee.Reducer.minMax(),roi,1000).getInfo())

The PCA images for NO2 concentration, temperature, wind speed and wind direction contain both positive and negative PC scores.

When different variables are subjected to PCA, not all of them are positively correlated with each other; some are negatively correlated. That is why the PC scores may be negative or positive, even though the eigenvalues (the variances of the components) are not negative.

In this analysis the components are read as follows:

PCA 1 = variance of the other variables with respect to NO2 concentration

PCA 2 = variance of the other variables with respect to temperature

PCA 3 = variance of the other variables with respect to wind speed

PCA 4 = variance of the other variables with respect to wind direction


Therefore, for PCA 1, 2, 3 and 4, positive values represent areas that are positively correlated in terms of NO2 concentration, temperature, wind speed and wind direction.

This gives us a fair idea of the pollution sources.

Hence highly positively correlated areas can represent pollution due to the power plants.
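
One way to check which input variable a component mostly reflects is to look at the eigenvector loadings and the share of variance each component explains. The sketch below does this with NumPy on a hypothetical covariance matrix for the four variables; the numbers are made up and are not results from this notebook.

```python
import numpy as np

variables = ["no2", "temperature", "wind_speed", "wind_direction"]
# Hypothetical covariance matrix between the four (standardised) variables.
covariance = np.array([
    [1.0, 0.6, -0.4, 0.1],
    [0.6, 1.0, -0.3, 0.0],
    [-0.4, -0.3, 1.0, 0.2],
    [0.1, 0.0, 0.2, 1.0],
])

# Eigen decomposition, reordered so the largest eigenvalue comes first.
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

for i in range(len(variables)):
    loadings = eigenvectors[:, i]
    dominant = variables[int(np.argmax(np.abs(loadings)))]
    share = eigenvalues[i] / eigenvalues.sum()
    print(f"pc{i + 1}: {share:.1%} of variance, largest loading on {dominant}")
```

If the dominant variable printed for a component differs from the one assumed in the list above, the interpretation of that component would need to be revisited.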


In [None]:
#*****************   PCA1  tropospheric_NO2_column_number_density
 
# selector=['tropospheric_NO2_column_number_density','temperature_2m_above_ground','wind_speed','wind_direction']

# band = pcImage.bandNames().get(0).getInfo()
# print(band)
# print(pcImage.select([band]).reduceRegion(ee.Reducer.minMax(),roi,1000))
 
#Map.addLayer(pcImage.select([band]), {min:-3.2, max:2 ,palette:p1}, band);


vizAnomaly = {
'min':-1.3491126012357098, 'max': 6.6613191472965845, 
'palette': ','.join(['#ffffcc','#ffeda0','#fed976','#feb24c','#fd8d3c','#fc4e2a','#e31a1c','#bd0026','#800026'])
 }



# Create a folium map object.
lat=18.200178; lon=-66.664513
my_map = folium.Map(location=[lat, lon],tiles="OpenStreetMap"  ,  zoom_start=9.4, height=700)
my_map.add_ee_layer(pcImage.select(['pc1']).clip(roi),vizAnomaly,'pc1')
folium.TileLayer(opacity=0.1).add_to(my_map)

#add the power plants location
for g in plant_datab.itertuples():

    folium.Marker(
    location=[g[1],g[2]],
    popup=str(g[3]),
    icon=folium.Icon(color='blue', icon='info-sign')).add_to(my_map)



# Display the map.
display(my_map)


From the PCA 1 image, we see that places with no power plant have much higher PC scores than places with power plants.

This might be due to pollution from population, traffic or other sources other than electricity generation at the power plants.

In [None]:
band = pcImage.bandNames().get(1).getInfo()
print(band)
print(pcImage.select([band]).reduceRegion(ee.Reducer.minMax(),roi,1000).getInfo())

In [None]:
#*****************PCA2 temperature

# band = pcImage.bandNames().get(1).getInfo()
 
# print(pcImage.select([band]).reduceRegion(ee.Reducer.minMax(),roi,1000))


vizAnomaly = {
'min':-2.8838120828363216, 'max': 24.497258166422363, 
'palette': ','.join(['#ffffcc','#ffeda0','#fed976','#feb24c','#fd8d3c','#fc4e2a','#e31a1c','#bd0026','#800026'])
 }



# Create a folium map object.
lat=18.200178; lon=-66.664513
my_map = folium.Map(location=[lat, lon],tiles="OpenStreetMap"  ,  zoom_start=9.4, height=700)
my_map.add_ee_layer(pcImage.select(['pc2']).clip(roi),vizAnomaly,'pc2')
folium.TileLayer(opacity=0.1).add_to(my_map)
# Display the map.

#add the power plants location
for g in plant_datab.itertuples():

    folium.Marker(
    location=[g[1],g[2]],
    popup=str(g[3]),
    icon=folium.Icon(color='red', icon='info-sign')).add_to(my_map)



display(my_map)


From the PCA 2 image, we see that places with few or no power plants have lower PC scores than places with power plants.

The temperatures from power plants are higher than those of other pollution sources such as traffic.

The temperatures might also differ between power plants because of their different fuel types.

**Hence PCA 2 is a good predictor of pollution source**


In [None]:
band = pcImage.bandNames().get(2).getInfo()
print(band)
print(pcImage.select([band]).reduceRegion(ee.Reducer.minMax(),roi,1000).getInfo())

In [None]:
#*****************PCA3

vizAnomaly = {
'min': -215968294.192117, 'max': 32334311.396379523, 
'palette': ','.join(['#ffffcc','#ffeda0','#fed976','#feb24c','#fd8d3c','#fc4e2a','#e31a1c','#bd0026','#800026'])
 }


 
# Create a folium map object.
lat=18.200178; lon=-66.664513
my_map = folium.Map(location=[lat, lon],tiles="OpenStreetMap"  ,  zoom_start=9.4, height=700)
my_map.add_ee_layer(pcImage.select(['pc3']).clip(roi),vizAnomaly,'pc3')
folium.TileLayer(opacity=0.1).add_to(my_map)

#add the power plants location
for g in plant_datab.itertuples():

    folium.Marker(
    location=[g[1],g[2]],
    popup=str(g[3]),
    icon=folium.Icon(color='green', icon='info-sign')).add_to(my_map)


# Display the map.
display(my_map)


From the PCA 3 image, we see that the distribution of PC scores does not really give us the information we need to identify pollution sources.

Even areas with no power plant can have either a high or a low PC score, **hence PCA 3 is not a good predictor of pollution source**

USING THE PCA 2 IMAGE TO MASK OUT PIXELS IN THE NO2 TROPOSPHERIC VERTICAL COLUMN TO GET NO2 CONCENTRATIONS ATTRIBUTABLE TO POWER PLANTS
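
The masking step named in the heading above can be sketched on plain arrays. The example below uses NumPy with two small hypothetical grids; it mirrors the idea of keeping NO2 pixels only where the PC2 score is positive, and is not the Earth Engine call itself.

```python
import numpy as np

# Hypothetical 2x3 grids: PC2 scores and NO2 column density (mol/m^2).
pc2 = np.array([[0.8, -0.2, 1.5],
                [-1.1, 0.3, -0.4]])
no2 = np.array([[9e-5, 7e-5, 1.2e-4],
                [6e-5, 8e-5, 5e-5]])

# Keep NO2 only where the PC2 score is positive; other pixels become NaN
# so that they are ignored by NaN-aware statistics.
mask = pc2 > 0
masked_no2 = np.where(mask, no2, np.nan)

print(masked_no2)
print(np.nanmean(masked_no2))
```

Using NaN for the excluded pixels keeps the grid shape intact, so the masked array can still be plotted or summed per pixel.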

In [None]:
# pca 2 image
pca_2=pcImage.select(['pc2'])

#no2 concentration image
no2_img=NO2.mean()

# since pca2 gave both positive and negative PC scores, we mask out pixels in the no2 image that had a negative score in the pca2 image

#Create a binary mask.
mask = pca_2.gt(0) # pixel greater than zero

# Update the composite mask with the pca 2 mask.
maskedComposite = no2_img.updateMask(mask)

print(maskedComposite.getInfo())

In [None]:
masked_max = maskedComposite.reduceRegion(ee.Reducer.max(), roi, 3000).getInfo()['tropospheric_NO2_column_number_density']


masked_min = maskedComposite.reduceRegion(ee.Reducer.min(), roi, 3000).getInfo()['tropospheric_NO2_column_number_density']


print(masked_min,masked_max)

# distinct names are used so the built-in min() and max() are not shadowed
vizAnomaly = {
'min':masked_min, 'max':masked_max, 
'palette': ','.join(['#e7e1ef','#c994c7','#dd1c77'])
 }



# Create a folium map object.
lat=18.200178; lon=-66.664513
my_map = folium.Map(location=[lat, lon],tiles="OpenStreetMap"  ,  zoom_start=9.4, height=700)
my_map.add_ee_layer(maskedComposite.clip(roi),vizAnomaly,'NO2 concentration')
folium.TileLayer(opacity=0.1).add_to(my_map)
# Display the map.

for g in plant_datab.itertuples():

    folium.Marker(
    location=[g[1],g[2]],
    popup=str(g[3]),
    icon=folium.Icon(color='red', icon='info-sign')).add_to(my_map)

display(my_map)




The image displayed gives a fair picture of the NO2 concentrations due to the power plants

NO2 emission = NO2 Concentration * Area occupied by the NO2 concentrations
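
The relationship above can be checked on made-up numbers. This is only a sketch: the pixel values and areas are hypothetical, the helper name `total_no2_mol` is an assumption, and the molar mass of NO2 (about 46.0055 g/mol) is used to convert moles to kilograms.

```python
NO2_MOLAR_MASS_G_PER_MOL = 46.0055  # standard molar mass of NO2

def total_no2_mol(column_density_mol_m2, pixel_area_m2):
    """Sum column density (mol/m^2) times pixel area (m^2) over all pixels."""
    return sum(c * a for c, a in zip(column_density_mol_m2, pixel_area_m2))

# Three hypothetical pixels, each 1 km x 1 km.
densities = [2.0e-5, 3.0e-5, 5.0e-5]   # mol/m^2
areas = [1.0e6, 1.0e6, 1.0e6]          # m^2

mol = total_no2_mol(densities, areas)
kg = mol * NO2_MOLAR_MASS_G_PER_MOL / 1000.0
print(f"{mol:.1f} mol of NO2, about {kg:.3f} kg")
```

Multiplying a column density in mol/m^2 by an area in m^2 leaves moles, so the result is an amount of NO2 rather than an area.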

In [None]:
no2_concentration = maskedComposite.clip(roi)

# the tropospheric NO2 vertical column is in mol/m^2;
# multiplying by the pixel area (m^2) gives the amount of NO2 per pixel, in mol
area_no2 = no2_concentration.multiply(ee.Image.pixelArea())




# reducing the statistics for the area
stat = area_no2.reduceRegion(ee.Reducer.sum(),roi,1000, maxPixels=1e9).getInfo()['tropospheric_NO2_column_number_density']

# total amount of NO2 (in mol) summed over the masked pixels
print('Total NO2 over the masked area (in mol) is', stat)


In [None]:
no2_emission= no2_concentration.multiply(stat)

In [None]:
# use descriptive names so the built-in min() and max() are not shadowed
emission_max = no2_emission.reduceRegion(ee.Reducer.max(), roi, 3000).getInfo()['tropospheric_NO2_column_number_density']

emission_min = no2_emission.reduceRegion(ee.Reducer.min(), roi, 3000).getInfo()['tropospheric_NO2_column_number_density']

print(emission_min, emission_max)

vizAnomaly = {
    'min': emission_min, 'max': emission_max,
    'palette': ','.join(['#ffeda0','#feb24c','#f03b20'])
}



# Create a folium map object.
lat=18.200178; lon=-66.664513
my_map = folium.Map(location=[lat, lon],tiles="OpenStreetMap"  ,  zoom_start=9.4, height=700)
my_map.add_ee_layer(no2_emission.clip(roi),vizAnomaly,'NO2 emission')
folium.TileLayer(opacity=0.1).add_to(my_map)
# Display the map.

for g in plant_datab.itertuples():

    folium.Marker(
    location=[g[1],g[2]],
    popup=str(g[3]),
    icon=folium.Icon(color='red', icon='info-sign')).add_to(my_map)

display(my_map)


**EMISSION FACTOR (EF) = EMISSION / ACTIVITY**

THE ACTIVITY GENERATING THE NO2 EMISSION IS THE ELECTRICITY GENERATION FROM THE POWER PLANTS
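
A minimal sketch of this ratio on made-up numbers. The function name `emission_factor`, the plant capacities, and the assumption that plants run at full capacity for a given number of hours are all hypothetical; in the notebook the activity is taken from the `capacity` column of `plant_datab`.

```python
def emission_factor(emission_mol, capacities_mw, hours=24.0):
    """Emission per unit of electricity (mol per MWh).

    Activity is approximated as installed capacity times operating hours,
    which overstates generation when plants run below full capacity.
    """
    activity_mwh = sum(capacities_mw) * hours
    if activity_mwh <= 0:
        raise ValueError("activity must be positive")
    return emission_mol / activity_mwh

capacities = [450.0, 300.0, 250.0]  # hypothetical plant capacities in MW
ef = emission_factor(emission_mol=12000.0, capacities_mw=capacities)
print(ef)
```

Using measured generation instead of capacity, where it is available, would give a more representative activity value.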

In [None]:
# total installed capacity of the power plants, used here as a proxy
# for the electricity generated
sum_capacity = plant_datab['capacity'].sum()

electricity_gen = sum_capacity

In [None]:
#Emission factor

emission_factor= no2_emission.divide(electricity_gen)

In [None]:
# reducing the statistics for the area
stat = emission_factor.reduceRegion (ee.Reducer.sum(),roi,1000, maxPixels=1e9).getInfo()['tropospheric_NO2_column_number_density']


print('Emission factor for the sub-national region of Puerto Rico is', stat)


**MARGINAL EMISSION FACTOR**

The correct way to measure the impact of environmental decisions is to use marginal emissions factors.

Marginal emissions factors measure the actual environmental consequences of taking different potential actions on the power grid.

If the example city is evaluating an energy efficiency measure to conserve one megawatt-hour of electricity consumption, this program will reduce local emissions by reducing output at one or more power plants. But which power plants? Many sources of power, for example most solar

**Conserving energy only affects some power plants: those which can scale up or down in response, known as the “marginal” power plants.**

**Marginal emissions measure the emissions per kilowatt-hour only from these power plants, thus accurately measuring real-world results.**



**CALCULATING MARGINAL EMISSION FACTOR**

   * FIRST CALCULATE THE EFFICIENCY OF ELECTRICITY GENERATION FOR EACH POWER UNIT
    
   * SUM THE EFFICIENCIES OF ALL POWER UNITS TO GET THE NET EFFICIENCY
    
   * DIVIDE THE MEAN NO2 EMISSION BY THE NET EFFICIENCY
    
**MEF = MEAN NO2 EMISSION / NET EFFICIENCY**
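
The three steps above can be sketched on made-up numbers. This follows the notebook's own definition of efficiency (estimated generation minus capacity converted to gigawatt hours per day); the function names and plant values are hypothetical.

```python
def plant_efficiency(capacity_mw, estimated_gwh, hours=24.0):
    """Notebook-style efficiency: estimated generation minus capacity in GWh."""
    capacity_gwh = capacity_mw / 1000.0 * hours
    return estimated_gwh - capacity_gwh

def marginal_emission_factor(mean_emission, plants):
    """Divide the mean emission by the summed efficiency of all plants."""
    net_efficiency = sum(plant_efficiency(c, e) for c, e in plants)
    if net_efficiency == 0:
        raise ValueError("net efficiency is zero; MEF is undefined")
    return mean_emission / net_efficiency

# (capacity in MW, estimated generation in GWh) for three hypothetical plants
plants = [(500.0, 20.0), (250.0, 10.0), (250.0, 4.0)]
mef = marginal_emission_factor(mean_emission=50.0, plants=plants)
print(mef)
```

Individual plants can have a negative efficiency under this definition, so the sign of the net efficiency is worth checking before interpreting the result.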

In [None]:
# CALCULATE THE EFFICIENCY OF ELECTRICITY GENERATION FOR EACH POWER UNIT
# EFFICIENCY = ESTIMATED ELECTRICITY - ELECTRICITY CAPACITY

# convert capacity from megawatts to gigawatt hours per day (MW / 1000 * 24 h)
capacity_elect = (plant_datab['capacity'] / 1000) * 24

estimated_elect= plant_datab['estimated_growth']

plant_datab['efficiency']= estimated_elect - capacity_elect 

net_efficiency = plant_datab.efficiency.sum()

print('Net efficiency for the power plants of Puerto Rico is', f'{net_efficiency} gigawatt hours')

In [None]:
# CALCULATING MARGINAL EMISSION FACTOR

emf= no2_emission.divide(net_efficiency)

# reducing the statistics for the area
stat = emf.reduceRegion (ee.Reducer.sum(),roi,1000, maxPixels=1e9).getInfo()['tropospheric_NO2_column_number_density']


print('Marginal emission factor for the sub-national region of Puerto Rico is', stat)



**NOW CALCULATE THE AVERAGE HISTORICAL EMISSION FACTOR AND MARGINAL EMISSION FACTOR USING THE ABOVE PROCEDURES FOR PUERTO RICO**

This time, use only a short window of temperature and wind data at the start of each month (day 1 up to day 3), average it, and recompute the emission factor. The full GFS data is huge and consistently causes memory issues.
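
The sampling windows can be sketched with the standard library. This is an illustration only: the helper name `monthly_windows` and the `days` parameter are assumptions, and the month range mirrors the July 2018 to July 2019 condition used in the functions below.

```python
from datetime import date, timedelta

def monthly_windows(start, end, days=3):
    """Yield (window_start, window_end) for the first `days` days of each month.

    `end` is inclusive at month granularity; window_end is exclusive.
    """
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        first = date(year, month, 1)
        yield first, first + timedelta(days=days)
        month += 1
        if month > 12:
            year, month = year + 1, 1

windows = list(monthly_windows(date(2018, 7, 1), date(2019, 7, 1)))
for w_start, w_end in windows[:2]:
    print(w_start.isoformat(), "to", w_end.isoformat())
```

Because the end of each window is exclusive, as it is in `ee.Filter.date`, covering three full days requires an end date on the 4th of the month.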

In [None]:

# calculate monthly data for temperature
# because of memory issues with the Google Earth Engine API, use only the first days of each month (day 1 up to day 3) and average them
def calcMonthlyMean_temp(years,months):
    mylist = ee.List([])
    for y in years:
        for m in months:
            if(y==2018 and m>6 ) or (y==2019 and m<8) :
                
                
                gfs_imagecollection=ee.ImageCollection("NOAA/GFS0P25")\
                                .filter(ee.Filter.date(ee.Date.fromYMD(y,m,1),ee.Date.fromYMD(y,m,3)))\
                                .filterBounds(roi)\
                                .select('temperature_2m_above_ground')
              
                
                w = gfs_imagecollection.filter(ee.Filter.calendarRange(y, y, 'year'))\
                .filter(ee.Filter.calendarRange(m, m, 'month')).mean()
            
                mylist = mylist.add(w.set('year', y)\
                                    .set('month', m)\
                                    .set('date', ee.Date.fromYMD(y,m,1))\
                                    .set('system:time_start',\
                                         ee.Date.fromYMD(y,m,1)))
            
    return ee.ImageCollection.fromImages(mylist)

In [None]:
years = [2018, 2019]

months = range(1,13)

# get monthly temperature data from gfs data
monthly_temperature= calcMonthlyMean_temp(years,months)

In [None]:
# calculate monthly data for wind speed and wind direction
# because of memory issues with the Google Earth Engine API, use only the first days of each month (day 1 up to day 3) and average them
def calcMonthlyMean_wind(years,months):
    mylist = ee.List([])
    for y in years:
        for m in months:
            if(y==2018 and m>6 ) or (y==2019 and m<8) :
                
                
                gfs_imagecollection=ee.ImageCollection("NOAA/GFS0P25")\
                                .filter(ee.Filter.date(ee.Date.fromYMD(y,m,1),ee.Date.fromYMD(y,m,3)))\
                                .filterBounds(roi)\
                                .select(['u_component_of_wind_10m_above_ground','v_component_of_wind_10m_above_ground'])

              
                
                w = gfs_imagecollection.filter(ee.Filter.calendarRange(y, y, 'year'))\
                .filter(ee.Filter.calendarRange(m, m, 'month')).mean()
            
                mylist = mylist.add(w.set('year', y)\
                                    .set('month', m)\
                                    .set('date', ee.Date.fromYMD(y,m,1))\
                                    .set('system:time_start',\
                                         ee.Date.fromYMD(y,m,1)))
            
    return ee.ImageCollection.fromImages(mylist)

In [None]:
# wind data 

years = [2018, 2019]

months = range(1,13)

# get monthly wind speed and wind direction data from gfs data
monthly_wind_s_d= calcMonthlyMean_wind(years,months)
                

In [None]:
#no2 concentration
no2=ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_NO2")\
.filterDate("2018-07-01","2019-07-01")\
.filterBounds(roi)\
.select('tropospheric_NO2_column_number_density')

#temperature
temp = monthly_temperature

#wind_data
wind = monthly_wind_s_d
wind_data=ee.ImageCollection(wind).map(wind_speed_and_direction_mapper).select(['wind_speed','wind_direction'])


selector=['tropospheric_NO2_column_number_density','temperature_2m_above_ground','wind_speed','wind_direction']

ninput=no2.mean().addBands([temp.mean(),wind_data.mean()]).select(selector)


#************************************************* PCA ************************************************

# Get some information about the input to be used later.
scale = ninput.projection().nominalScale().getInfo()

bandNames = ninput.bandNames()

#Mean center the data to enable a faster covariance reducer and an SD stretch of the principal components.
#reducer, geometry, scale, crs, crsTransform, bestEffort, maxPixels, tileScale
meanDict = ninput.reduceRegion(ee.Reducer.mean(),roi,scale,maxPixels=1e9)

means = ee.Image.constant(meanDict.values(bandNames))
centered = ninput.subtract(means)

#Get the PCs at the specified scale and in the specified region
pcImage = getPrincipalComponents(centered, scale, roi)

# pca 2 image
pca_2=pcImage.select(['pc2'])

#************************************************* NO2 EMISSION ************************************************                             
#no2 concentration image
no2_img=no2.mean()

#Create a binary mask.
mask = pca_2.gt(0) # pixel greater than zero

# Update the composite mask with the pca 2 mask.
maskedComposite = no2_img.updateMask(mask)


no2_concentration = maskedComposite.clip(roi)


# the tropospheric NO2 vertical column is in mol/m^2;
# multiplying by the pixel area (m^2) gives the amount of NO2 per pixel, in mol
area_no2 = no2_concentration.multiply(ee.Image.pixelArea())

# reducing the statistics for the area
stat = ee.Image(area_no2).reduceRegion(ee.Reducer.sum(),roi,10000, maxPixels=1e9).getInfo()['tropospheric_NO2_column_number_density']


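# same procedure as in the earlier cells: scale the masked NO2 concentration by the regional total
no2_emission = no2_concentration.multiply(stat)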

#************************************************* NO2 EMISSION FACTOR ************************************************                             
# convert capacity from megawatts to gigawatt hours per day and sum it as the total electricity generated
sum_capacity=((plant_datab['capacity'] /1000) * 24 ).sum()


electricity_gen= sum_capacity

#Emission factor
emission_factor= no2_emission.divide(electricity_gen).set('system:time_start',no2.get('system:time_start') ).rename("ef")

ef_stat = emission_factor.reduceRegion (ee.Reducer.sum(),roi,10000, maxPixels=1e9).getInfo()['ef']

#************************************************* NO2 MARGINAL EMISSION FACTOR ************************************************                             
# CALCULATE THE EFFICIENCY OF ELECTRICITY GENERATION FOR EACH POWER UNIT
# EFFICIENCY = ESTIMATED ELECTRICITY - ELECTRICITY CAPACITY

# convert capacity from megawatts to gigawatt hours per day (MW / 1000 * 24 h)
capacity_elect = (plant_datab['capacity'] / 1000) * 24

estimated_elect= plant_datab['estimated_growth']

plant_datab['efficiency']= estimated_elect - capacity_elect 

net_efficiency = plant_datab.efficiency.sum()

# CALCULATING MARGINAL EMISSION FACTOR

mef= no2_emission.divide(net_efficiency).set('system:time_start',no2.get('system:time_start') ).rename("mef")

mef_stat = mef.reduceRegion (ee.Reducer.sum(),roi,10000, maxPixels=1e9).getInfo()['mef']

#************************************************* RESULTS ************************************************                             

print('Average historical emission factor for the sub-national region of Puerto Rico is', ef_stat)
print('-'*40)
print('Average historical marginal emission factor for the sub-national region of Puerto Rico is', mef_stat)

**MODEL SCALABILITY AND ACCURACY**

Because all computations were done with Earth Engine algorithms, which can process a large corpus of data and return results quickly, this method is scalable: it can take a very large geographic bounding box.

In this analysis, principal component analysis identified the pollution sources in the image data well (PCA 2 was a good predictor of pollution sources), which supports the accuracy of the method.


**MODEL DEPENDENCY**

1. This model improves on the current emission factor calculation because it is able to distinguish, or isolate, pollution related to electricity generation.

2. There is no need to collect data on fuel consumption or other hard-to-obtain parameters to compute the emission factor.


**RECOMMENDATION**

To make this model more robust, we recommend using classification algorithms and additional inputs such as aerosol data, ozone data and other weather data.