# Exploring L2B CH4 Plume Complexes

**Summary**  

In this notebook, we'll search for and visualize the available methane plume complex observations, then select a region of interest (ROI) containing several plumes that appear to come from the same source. We will then visualize a time series of plume data for the ROI and calculate the integrated methane enhancement (IME) for each plume. Lastly, to add more context to the plume complex time series, we will look at all EMIT acquisitions over the target region and determine whether factors such as clouds limited plume detection, or whether there were simply no methane emissions during those acquisitions.

> **Note**: Throughout this notebook, several complex functions and workflows are used to process and visualize data. These can be found in the `emit_tools.py` and `tutorial_utils.py` modules.

**Background**

The EMIT instrument is an imaging spectrometer that measures light in visible and infrared wavelengths. These measurements display unique spectral signatures that correspond to the composition of the Earth's surface. The EMIT mission focuses specifically on mapping the composition of minerals to better understand the effects of mineral dust on the Earth system and on human populations, now and in the future. The EMIT instrument has also been used to successfully map methane point source emissions within its target mask. More details about EMIT and its associated products can be found in the **README.md** and on the [EMIT website](https://earth.jpl.nasa.gov/emit/).

The L2B Estimated Methane Plume Complexes ([EMITL2BCH4PLM](https://lpdaac.usgs.gov/products/emitl2bch4plmv001/)) product provides estimated methane plume complexes in parts per million meter (ppm m) along with uncertainty data. This product is a plume specific subset of the EMIT L2B Methane Enhancement Data ([EMITL2BCH4ENH](https://lpdaac.usgs.gov/products/emitl2bch4enhv001/)) product. Each EMITL2BCH4PLM granule is sized to a specific plume complex but may cross multiple EMITL2BCH4ENH granules. A list of source EMITL2BCH4ENH granules is included in the GeoTIFF file metadata as well as the GeoJSON file. Each EMITL2BCH4PLM granule contains two files: one Cloud Optimized GeoTIFF (COG) file at a spatial resolution of 60 meters (m) and one GeoJSON file. The EMITL2BCH4PLM COG file contains a raster image of a methane plume complex extracted from EMITL2BCH4ENH v001 data. The EMITL2BCH4PLM GeoJSON file contains a vector outline of the plume complex, a list of source scenes, coordinates of the maximum enhancement values, and the uncertainty of the plume complex. 

The EMITL2BCH4ENH product only includes granules where methane plume complexes have been identified. To reduce the risk of false positives, all EMITL2BCH4ENH data undergo a manual review (or identification and confirmation) process before being designated as a plume complex. For more information on the manual review process, see Section 4.2.2 of the [EMIT GHG Algorithm Theoretical Basis Document (ATBD)](https://lpdaac.usgs.gov/documents/1696/EMIT_GHG_ATBD_V1.pdf). 


**Requirements** 
 - Set up Python Environment - See **setup_instructions.md** in the `/setup/` folder
 - NASA Earthdata Login Account

**Data Used** 
 - [EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM)](https://lpdaac.usgs.gov/products/emitl2bch4plmv001/)
 - [EMIT L2A Estimated Surface Reflectance and Uncertainty and Masks (EMITL2ARFL)](https://lpdaac.usgs.gov/products/emitl2arflv001/)

**Learning Objectives** 
 - Search for EMIT L2B Estimated Methane Plume Complexes
 - Visualize search results
 - Retrieve and visualize the EMIT L2B Estimated Methane Plume Complexes Metadata
 - Select a region of interest and build a time-series of plume data
 - Further investigate plume detection by looking at browse images and quality information

**Tutorial Outline**  

1. [**Search for EMIT L2B Estimated Methane Plume Complexes**](#search)
2. [**Creating a Timeseries from Plume Data**](#plume-timeseries)
3. [**Further investigation into plume detection**](#plume-detection)
4. [**Calculating the Integrated Methane Enhancement for Plumes**](#ime)

In [None]:
# Import required libraries
import sys
import os
import glob
import requests
import numpy as np
import pandas as pd
import xarray as xr
from osgeo import gdal
import geopandas as gpd

from datetime import datetime
import folium
import earthaccess
import folium.plugins
import rasterio as rio
import rioxarray as rxr

import hvplot
import hvplot.xarray
import hvplot.pandas

from branca.element import Figure
from IPython.display import display
from shapely.geometry.polygon import orient
from shapely.geometry import Point

sys.path.append('../modules/')
from emit_tools import emit_xarray, ortho_xr, ortho_browse
from tutorial_utils import list_metadata_fields, results_to_geopandas, convert_bounds

All of the data we use or save will go into the `methane_tutorial/` directory, so we can go ahead and define that filepath now, relative to this notebook.

In [None]:
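methane_dir = '../../data/methane_tutorial/'
# Create the directory if it doesn't already exist, since later cells write files into it
os.makedirs(methane_dir, exist_ok=True)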

## 1. Search for EMIT L2B Estimated Methane Plume Complexes<a id='search'></a>

Use `earthaccess` to find all EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM) data available from 2023. Define the date range and the concept IDs (unique product identifiers) for the EMIT products we want to search for, but leave spatial arguments such as `polygon` and `bbox` empty so we can preview detected methane plumes globally.

In [None]:
# Data Collections for our search, using a dictionary
concept_ids = {'plumes':'C2748088093-LPCLOUD', 'reflectance':'C2408750690-LPCLOUD', 'enhancement': 'C2748097305-LPCLOUD'}
# Define Date Range
date_range = ('2023-01-01','2023-12-31')

In [None]:
results = earthaccess.search_data(
    concept_id=concept_ids['plumes'],
    temporal=date_range,
    count=2000
)

Convert the results to a `geopandas.GeoDataFrame` using a function from our tutorials module. This gives a nice way to organize and visualize the search results.

In [None]:
gdf = results_to_geopandas(results)
gdf

By default this function includes some fields, but you can add fields with a `fields` argument. To see all of the metadata available use the `list_metadata_fields` function imported from the `tutorial_utils.py` module.

In [None]:
list_metadata_fields(results)

Add an index column to the dataframe to include it in the tooltips for our visualization.

In [None]:
# Specify index so we can reference it with gdf.explore()
gdf['index']=gdf.index

In [None]:
# Set up Figure and Basemap tiles
fig = Figure(width="1080px",height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}',name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                name='ESRI World Imagery',
                attr='Tiles &copy; Esri &mdash; Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                overlay=True).add_to(map1)
fig.add_child(map1)
# Add Search Results gdf ("weight" sets the outline width in Leaflet/folium styles)
gdf.explore("_single_date_time",
            categorical=True,
            style_kwds={"fillOpacity":0.1,"weight":2},
            name="EMIT L2B CH4PLM",
            tooltip=[
                "index",
                "native-id",
                "_single_date_time",
            ],
            m=map1,
            legend=False
)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(gdf.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)

In this example, we'll choose a region that appears to have several plumes emitted from the same source: a landfill in Jordan. To create a simple bounding box around the target region, we use the plumes that extend furthest in the cardinal directions; the envelope of these plumes becomes the bounding box for the rest of our analysis.

Create a list of these plumes.

In [None]:
# Note these index values can change if new data is added
plumes =[143,191,235]

Use our list of plumes to index our `GeoDataFrame` (gdf) and create a bounding box enveloping those geometries, then create a new `GeoDataFrame` with our new bounding box as the geometry.

In [None]:
bbox = gdf.loc[plumes].geometry.unary_union.envelope
bbox = orient(bbox, sign=1)
plume_bbox = gpd.GeoDataFrame({"name":['plume_bbox'], "geometry":[bbox]},crs=gdf.crs)
plume_bbox
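The cell above relies on two `shapely` operations: the envelope of a union of geometries (the smallest axis-aligned rectangle enclosing all of them), and `orient(..., sign=1.0)`, which rewrites a polygon's exterior ring in counter-clockwise order. Counter-clockwise order is likely needed here because the bounding box coordinates are later passed to a CMR polygon search, which expects that ordering. A minimal sketch using hypothetical geometries in arbitrary units:

```python
from shapely.geometry import Polygon, box
from shapely.geometry.polygon import orient
from shapely.ops import unary_union

# Hypothetical plume outlines in arbitrary lon/lat units (illustration only)
plumes = [box(0.0, 0.0, 0.5, 0.5), box(0.3, 0.4, 1.0, 0.8)]
# Envelope of the union: smallest axis-aligned rectangle enclosing every plume
envelope = unary_union(plumes).envelope

# A ring listed clockwise (up, right, then down)
clockwise = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
# orient(..., sign=1.0) returns the same polygon with a counter-clockwise exterior ring
fixed = orient(clockwise, sign=1.0)

print(envelope.bounds, clockwise.exterior.is_ccw, fixed.exterior.is_ccw)
```

The envelope only depends on the extreme coordinates of the inputs, which is why a handful of well-chosen plumes is enough to define the box.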

Visualize our selected region, plumes, and bounding box.

In [None]:
# Set up Figure and Basemap tiles
fig = Figure(width="1080px",height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}',name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                name='ESRI World Imagery',
                attr='Tiles &copy; Esri &mdash; Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                overlay=True).add_to(map1)
fig.add_child(map1)
# Add the bounding box and search results ("weight" sets the outline width)
plume_bbox.explore("name",
                   name='Plume BBox',
                   style_kwds={"fillOpacity":0,"weight":2},
                   m=map1,
                   legend=False)

gdf.explore("_single_date_time",
            categorical=True,
            style_kwds={"fillOpacity":0.1,"weight":2},
            name="EMIT L2B CH4PLM",
            tooltip=[
                "index",
                "native-id",
                "_single_date_time",
            ],
            m=map1,
            legend=False
)
# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(plume_bbox.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)

We can save our bounding box to a `geojson` file if desired using the cell below.

In [None]:
plume_bbox.to_file(f'{methane_dir}jordan_plumes_bbox.geojson', driver='GeoJSON')

Subset our geodataframe of plumes to only those that intersect our bounding box.

In [None]:
# .copy() so that adding columns later doesn't raise a SettingWithCopyWarning
plm_gdf = gdf[gdf.geometry.intersects(plume_bbox.geometry[0])].copy()
plm_gdf

In [None]:
plm_gdf['_related_urls'].iloc[0]

Retrieve the additional plume metadata stored in each EMIT L2B Estimated Methane Plume Complexes (EMITL2BCH4PLM) granule's GeoJSON file, which includes the maximum enhancement value and its coordinates, the uncertainty of the plume complex, and the list of source scenes.

In [None]:
def get_asset_url(row,asset, key='Type',value='GET DATA'):
    """
    Retrieve a url from the list of dictionaries for a row in the _related_urls column.
    Asset examples: CH4PLM, CH4PLMMETA, RFL, MASK, RFLUNCERT 
    """
    # Add _ to asset so string matching works
    asset = f"_{asset}_"
    # Retrieve URL matching parameters
    for _dict in row['_related_urls']:
        if _dict.get(key) == value and asset in _dict['URL'].split('/')[-1]:
            return _dict['URL']
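Padding `asset` with underscores matters because some asset names are prefixes of others (`CH4PLM` and `CH4PLMMETA`, `RFL` and `RFLUNCERT`). A small check on hypothetical filenames that follow the EMIT naming pattern (the helper `has_asset_token` is illustrative, not part of the tutorial modules):

```python
# Hypothetical filenames following the EMIT naming pattern (illustration only)
filenames = [
    "EMIT_L2B_CH4PLM_001_20230805T060818_000109.tif",
    "EMIT_L2B_CH4PLMMETA_001_20230805T060818_000109.json",
    "EMIT_L2A_RFL_001_20230805T060818_2321704_011.nc",
    "EMIT_L2A_RFLUNCERT_001_20230805T060818_2321704_011.nc",
]

def has_asset_token(filename, asset):
    """True if `asset` appears as a complete underscore-delimited token in `filename`."""
    return f"_{asset}_" in filename

# A plain substring test also matches the longer asset name
naive_plm = [f for f in filenames if "CH4PLM" in f]
# Padding with underscores, as get_asset_url does, keeps only the exact asset
token_plm = [f for f in filenames if has_asset_token(f, "CH4PLM")]
token_rfl = [f for f in filenames if has_asset_token(f, "RFL")]
print(naive_plm, token_plm, token_rfl, sep="\n")
```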

In [None]:
#TODO - Improve implementation using asyncio/aiohttp
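def fetch_ch4_metadata(row, session=None):
    """
    Download the CH4PLMMETA GeoJSON for a plume and return the properties of its first feature.
    Pass an authenticated session (e.g. from earthaccess.get_requests_https_session()) if needed.
    """
    http = session if session is not None else requests
    response = http.get(get_asset_url(row, 'CH4PLMMETA'))
    # Fail loudly on HTTP errors instead of trying to parse an error page as JSON
    response.raise_for_status()
    return response.json()['features'][0]['properties']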

In [None]:
# Apply the function to each row and convert the result to a DataFrame
# plm_meta = plm_gdf.apply(fetch_ch4_metadata, axis=1).apply(pd.Series)

We can add the points with highest methane concentration to our visualization.

Create an index column, as we did for the plumes, then convert the latitude and longitude of max concentration to a shapely `Point` object and add it to our `GeoDataFrame`.

In [None]:
# # Specify index so we can reference it with gdf.explore()
# plm_meta['index'] = plm_meta.index
# # Add Geometry and convert to geodataframe
# plm_meta['geometry'] = plm_meta.apply(lambda row: Point(row['Longitude of max concentration'], row['Latitude of max concentration']), axis=1)
# plm_meta = gpd.GeoDataFrame(plm_meta, geometry='geometry', crs='EPSG:4326')

In [None]:
# plm_meta

Now add this to our visualization.

In [None]:
# # Set up Figure and Basemap tiles
# fig = Figure(width="1080px",height="540")
# map1 = folium.Map(tiles=None)
# folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}',name='Google Satellite', attr='Google', overlay=True).add_to(map1)
# folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
#                 name='ESRI World Imagery',
#                 attr='Tiles &copy; Esri &mdash; Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
#                 overlay='True').add_to(map1)
# fig.add_child(map1)
# # Add Search Results gdf
# plume_bbox.explore("name",
#                    name='Plume BBox',
#                    style_kwds={"fillOpacity":0,"width":2},
#                    m=map1,
#                    legend=False)

# plm_gdf.explore("index",
#             categorical=True,
#             style_kwds={"fillOpacity":0.1,"width":2},
#             name="EMIT L2B CH4PLM",
#             tooltip=[
#                 "index",
#                 "native-id",
#                 "_single_date_time",
#             ],
#             m=map1,
#             legend=False
# )

# plm_meta.explore("index",
#             categorical=True,
#             style_kwds={"fillOpacity":0.1,"width":2},
#             name="Location of Max Concentration (ppm m)",
#             tooltip=[
#                 "DAAC Scene Names",
#                 "UTC Time Observed",
#                 "Max Plume Concentration (ppm m)",
#                 "Concentration Uncertainty (ppm m)",
#                 "Orbit"
#             ],
#             m=map1,
#             legend=False
# )
# # Zoom to Data
# map1.fit_bounds(bounds=convert_bounds(plume_bbox.unary_union.bounds))
# # Add Layer controls
# map1.add_child(folium.LayerControl(collapsed=False))
# display(fig)

## 2. Creating a Timeseries from Plume Data<a id='plume-timeseries'></a>

We can visualize a timeseries of these plumes that appear to be from the same source. To do this we'll generate a list of the COG urls for the plumes, then use rioxarray to build a timeseries based on the dates in the filenames.

In [None]:
# Iterate over rows in the plm_gdf and get the CH4PLM urls and store them in a list
plm_urls = plm_gdf.apply(lambda row: get_asset_url(row, asset='CH4PLM'), axis=1).tolist()
plm_urls

Now that we have a list of COG urls we can set our `gdal` configuration options to pass our NASA Earthdata login credentials when we access each COG.

In [None]:
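# Log in so GDAL can read the NASA Earthdata Login credentials persisted to ~/.netrc
earthaccess.login(persist=True)
# GDAL configurations used to successfully access LP DAAC Cloud Assets via vsicurl
# (expand "~" explicitly rather than relying on GDAL/curl to do it)
cookie_file = os.path.expanduser('~/cookies.txt')
gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE', cookie_file)
gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', cookie_file)
gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','EMPTY_DIR')
gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF')
gdal.SetConfigOption('GDAL_HTTP_UNSAFESSL', 'YES')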

Since the plumes have different extents and some are from the same scene, our first step to create a timeseries will be to create a standard grid to project all of our plumes onto. We'll use the first plume in our list to define the resolution of the grid, and our bounding box to define the extent.

Open the first plume in our list of urls and squeeze the `band` dimension.

In [None]:
plm = rxr.open_rasterio(plm_urls[0], masked=True).squeeze('band',drop=True)
plm

Create a new `xarray.DataArray` that defines a common grid, using our bounding box for the extent and the first plume's resolution, onto which we'll reproject all of the plumes.

In [None]:
# Target extent from our bounding box: [minx, miny, maxx, maxy]
minx, miny, maxx, maxy = plume_bbox.total_bounds
# Keep the native resolution of the first plume (the y resolution is negative for north-up rasters)
xres, yres = (abs(r) for r in plm.rio.resolution())
# Number of columns and rows needed to cover the extent
width = int(np.ceil((maxx - minx) / xres))
height = int(np.ceil((maxy - miny) / yres))
# Pixel-center coordinates; y decreases from north to south, as in a standard north-up raster
x = minx + (np.arange(width) + 0.5) * xres
y = maxy - (np.arange(height) + 0.5) * yres
to_match = xr.DataArray(np.full((height, width), np.nan), coords={'y': y, 'x': x}, dims=('y', 'x'))
to_match = to_match.rio.write_crs(plm.rio.crs)
to_match

In [None]:
del plm

Next, we'll open the plumes from each url in our list, merge plumes acquired at the same time, then concatenate all of the plumes along a time dimension based on the acquisition time in the granule name.

To do this we will loop over our list of urls, open each plume and reproject it onto our `to_match` grid, merge plumes acquired at the same time, and store them in a dictionary where keys correspond to the acquisition time and values are the plume data in an `xarray.DataArray`.
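The grouping step on its own can be sketched without any network access. The helper names below are hypothetical and the urls are made up, but they follow the CH4PLM filename pattern the loop relies on, where the acquisition time is the second-to-last underscore-separated token of the filename.

```python
from collections import defaultdict

def parse_acquisition_time(url):
    """Timestamp token from a CH4PLM url, e.g. '20230805T060818'."""
    return url.split('/')[-1].split('.')[-2].split('_')[-2]

def group_by_acquisition(urls):
    """Map each acquisition time to the list of plume urls sharing it (order preserved)."""
    groups = defaultdict(list)
    for url in urls:
        groups[parse_acquisition_time(url)].append(url)
    return dict(groups)

# Hypothetical urls following the CH4PLM naming pattern (illustration only)
urls = [
    "https://example.com/EMIT_L2B_CH4PLM_001_20230805T060818_000109.tif",
    "https://example.com/EMIT_L2B_CH4PLM_001_20230805T060818_000110.tif",
    "https://example.com/EMIT_L2B_CH4PLM_001_20230912T081203_000201.tif",
]
groups = group_by_acquisition(urls)
print({time: len(members) for time, members in groups.items()})
```

Each group corresponds to one entry in the timeseries dictionary; the loop additionally opens, merges, and regrids the members of each group.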

In [None]:
from rioxarray.merge import merge_arrays

def get_acquisition_time(url):
    """Return the acquisition time (e.g. 20230805T060818) from a CH4PLM COG url."""
    return url.split('/')[-1].split('.')[-2].split('_')[-2]

plm_ts_dict = {}
# Set max retries for vsicurl errors
max_retries = 5
# Iterate over plm urls
for url in plm_urls:
    # Retrieve acquisition time from url
    acquisition_time = get_acquisition_time(url)
    # Prevent duplicate processing of plumes from the same scene
    if acquisition_time in plm_ts_dict:
        continue
    # List all plumes identified in the same scene (including this one)
    same_scene = [u for u in plm_urls if get_acquisition_time(u) == acquisition_time]
    to_merge = []
    for _plm in same_scene:
        print(f"Opening {_plm.split('/')[-1]}")
        # Retry loop for intermittent vsicurl/unrecognized format errors
        for retry in range(max_retries):
            try:
                # Open COG, squeeze band dimension, and add to the list of plumes to merge
                to_merge.append(rxr.open_rasterio(_plm).squeeze('band', drop=True))
                break
            except Exception as e:
                print(f'{e} Retrying...')
        else:
            # Only reached if every attempt failed (the loop never hit `break`)
            print(f"Failed to process {_plm} after {max_retries} retries. Please check to see you're authenticated with earthaccess.")
    if to_merge:
        # Merge plumes from the same scene. We also need `reproject_match` here in case a plume
        # extends outside of our bounding box, because of our lazy bbox construction.
        plm_ts_dict[acquisition_time] = merge_arrays(to_merge, bounds=to_match.rio.bounds()).rio.reproject_match(to_match)

Now that we have all of our plumes on a standard grid, we can concatenate them along a time dimension to create a timeseries. Create an xarray variable called 'time' from our dictionary keys, then use `xarray.concat` to concatenate all of our plumes along the time dimension.

In [None]:
plm_time = xr.Variable('time', [datetime.strptime(t,'%Y%m%dT%H%M%S') for t in list(plm_ts_dict.keys())])
plm_time

In [None]:
plm_ts_ds = xr.concat(list(plm_ts_dict.values()), dim=plm_time)
plm_ts_ds

In [None]:
plm_ts_ds.data[plm_ts_ds.data == -9999] = np.nan

In [None]:
# Plot each plume in the timeseries over ESRI imagery, with our bounding box outlined in red
(
    plm_ts_ds.hvplot.image(x='x', y='y', geo=True, tiles='ESRI', crs='EPSG:4326', cmap='inferno',
                           clim=(np.nanmin(plm_ts_ds.data), np.nanmax(plm_ts_ds.data)),
                           clabel=f'Methane Concentration ({plm_ts_ds.Units})',
                           frame_width=600, frame_height=600, rasterize=True)
    * plume_bbox.hvplot(color='red', crs='EPSG:4326', fill_color=None, line_color='red')
)

## 3. Further investigation into plume detection<a id='plume-detection'></a>

The time-series shown above doesn't necessarily give us a full picture of what's happening. Although this plume is fairly persistent, an absence of observations does not mean the source isn't emitting methane. Because EMIT is mounted on the International Space Station (ISS), its revisit period varies, which limits the number of observations we can use. Additionally, clouds and other obstructions can limit plume detection. To add more context to the plume complex timeseries, we will look at all EMIT acquisitions over the target region and try to determine whether factors affected plume detection, or whether there were simply no methane emissions during those acquisitions.

Create an ROI for a new search from our bounding box.

In [None]:
roi = list(plume_bbox.geometry[0].exterior.coords)

Now conduct a search for reflectance data over our ROI.

In [None]:
rfl_results = earthaccess.search_data(
    concept_id=concept_ids['reflectance'],
    polygon=roi,
    temporal=date_range,
    count=2000
)
rfl_gdf = results_to_geopandas(rfl_results)
rfl_gdf

We can visualize the footprints of these scenes to gain some insight into coverage over our ROI. 

In [None]:
# Set up Figure and Basemap tiles
fig = Figure(width="1080px",height="540")
map1 = folium.Map(tiles=None)
folium.TileLayer(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}',name='Google Satellite', attr='Google', overlay=True).add_to(map1)
folium.TileLayer(tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}.png',
                name='ESRI World Imagery',
                attr='Tiles &copy; Esri &mdash; Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
                overlay=True).add_to(map1)
fig.add_child(map1)

# Add all reflectance scenes returned by the search ("weight" sets the outline width)
rfl_gdf.explore(color='red',
                style_kwds={"fillOpacity":0,"weight":2},
                name="EMIT L2A RFL Scenes",
                tooltip=[
                    "native-id",
                    "_beginning_date_time",
                ],
                m=map1,
                legend=False)

# Add Plume BBox to Map
plume_bbox.explore(m=map1,
                   name='Plumes Bounding Box',
                   legend=False)

# Zoom to Data
map1.fit_bounds(bounds=convert_bounds(rfl_gdf.unary_union.bounds))
# Add Layer controls
map1.add_child(folium.LayerControl(collapsed=False))
display(fig)

From this we can see that several scenes intersect our ROI but likely contain little relevant information, since they cover only a small portion or a corner of it.

We can use a process similar to the one we used for the methane product to construct a time series that helps us understand the data gathered on each overpass. To do this, we will use the browse imagery and the masks included in the EMITL2ARFL product. By default, the mask files and browse images are not orthorectified, so we must orthorectify them as part of our workflow.

First, get the urls for the browse images and masks for each scene in our `rfl_gdf` search results using the `get_asset_url` function.

In [None]:
png_urls = rfl_gdf.apply(lambda row: get_asset_url(row, asset='RFL', value='GET RELATED VISUALIZATION'), axis=1).tolist()
png_urls

In [None]:
mask_urls = rfl_gdf.apply(lambda row: get_asset_url(row, asset='MASK'), axis=1).tolist()
mask_urls

We can write these to a text file so we don't need to search again, although we will still use the `rfl_gdf` GeoDataFrame later in the tutorial.

In [None]:
# # Save URL List
# with open(f'{methane_dir}rfl_mask_urls.txt', 'w') as f:
#     for line in mask_urls:
#         f.write(f"{line}\n")

Since the mask files are not chunked, it's quicker to download them before processing.

Login with `earthaccess` and download these files.

In [None]:
earthaccess.login(persist=True)
# Get requests https Session using Earthdata Login Info
fs = earthaccess.get_requests_https_session()
# Retrieve granule asset ID from URL (to maintain existing naming convention)
for url in mask_urls:
    granule_asset_id = url.split('/')[-1]
    # Define Local Filepath
    fp = f'{methane_dir}{granule_asset_id}'
    # Download the Granule Asset if it doesn't exist
    if not os.path.isfile(fp):
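        with fs.get(url, stream=True) as src:
            # Stop on HTTP errors (e.g. failed authentication) instead of saving an error page
            src.raise_for_status()
            with open(fp, 'wb') as dst: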
                for chunk in src.iter_content(chunk_size=64*1024*1024):
                    dst.write(chunk)

For each of these scenes, we will open the EMIT L2A Mask data, subset it spatially, and select only the variable we need: the `Aggregate Flag` band of the `mask` variable. The `Aggregate Flag` combines the cloud mask, dilated cloud mask, cirrus mask, water mask, and an aerosol optical depth (AOD) threshold of 0.5, most of which we can expect to affect methane plume detection.

As we do this, we will also use the geometry lookup table (GLT) to orthorectify the browse image, then clip it to our ROI. This works because the browse PNG files are at the native (unorthorectified) resolution of the scene, so their pixels can be placed onto an orthorectified grid using the GLT.
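Conceptually, a GLT stores, for each pixel of the orthorectified output grid, the column and row of the raw pixel that belongs there. A minimal numpy sketch of that lookup follows. It assumes 1-based indices with 0 marking output pixels that have no source pixel, an assumption based on the EMIT GLT convention; it is an illustration, not the `emit_tools` implementation, and `apply_glt` is a hypothetical name.

```python
import numpy as np

def apply_glt(raw, glt_x, glt_y, fill_value=np.nan):
    """Orthorectify a raw (row, col[, band]) array with a geometry lookup table.

    Assumes glt_x / glt_y hold 1-based column / row indices into `raw`,
    with 0 marking output pixels that have no source pixel.
    """
    out = np.full(glt_x.shape + raw.shape[2:], fill_value, dtype=float)
    valid = (glt_x != 0) & (glt_y != 0)
    # Shift to 0-based indices and copy the source pixel (all bands) into each valid output pixel
    out[valid] = raw[glt_y[valid] - 1, glt_x[valid] - 1]
    return out

# Tiny example: a 2x3 raw image mapped onto a 2x2 output grid
raw = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
glt_x = np.array([[1, 3], [0, 2]])   # source columns (1-based, 0 = no data)
glt_y = np.array([[1, 2], [0, 1]])   # source rows (1-based, 0 = no data)
out = apply_glt(raw, glt_x, glt_y)
print(out)
```

Because the lookup only copies pixels, it works the same way for a single-band mask or a three-band browse image.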

First, get the filepaths for our downloaded mask data.

In [None]:
# List the downloaded files
fps = glob.glob(f'{methane_dir}*.nc')
fns = [os.path.basename(fp) for fp in fps]
fns

Create a function that loops through our files, orthorectifies the mask and browse image, clips them to our ROI, reprojects them onto our predefined `to_match` grid, and finally saves the outputs as COGs.

In [None]:
def process_scenes(fns, outdir, gdf):
    """
    Process a list of downloaded EMIT L2A Mask files. For each scene, orthorectify the mask and the
    matching browse image, clip both to `gdf`, reproject them onto the `to_match` grid, and write them
    to `outdir` as COGs. Scenes whose mask output already exists are skipped.
    Relies on the notebook-level `methane_dir`, `png_urls`, and `to_match` variables.
    """
    for fn in fns:
        # Get the Granule Asset ID (the filename without its extension)
        granule_asset_id = fn.split('.')[-2]
        # Set Output Path
        outpath_mask = f"{outdir}{granule_asset_id}_aggregate_flag.tif"
        outpath_browse = f"{outdir}{granule_asset_id}_ortho_browse.tif"
        # Check if the file exists
        if not os.path.isfile(outpath_mask):
            # Open Mask Dataset
            emit_ds = emit_xarray(f'{methane_dir}{fn}', ortho=False)
            # Retrieve GLT, spatial_ref, and geotransform to use on browse image
            glt = np.nan_to_num(np.stack([emit_ds["glt_x"].data, emit_ds["glt_y"].data], axis=-1),nan=0).astype(int)
            spatial_ref = emit_ds.spatial_ref
            gt = emit_ds.geotransform
            # Select browse image url corresponding to the scene
            png_url = [url for url in png_urls if fn.split('.')[-2].split('_')[-3] in url][0]
            # Orthorectify browse and mask
            rgb = ortho_browse(png_url, glt, spatial_ref, gt)
            emit_ds = ortho_xr(emit_ds)
            # Clip to Geometry using rioxarray
            emit_ds = emit_ds.rio.clip(gdf.geometry.values,gdf.crs, all_touched=True)          
            rgb = rgb.rio.clip(gdf.geometry.values,gdf.crs, all_touched=True)
            # Select only mask array and desired quality flag and reproject to match our chosen extent
            mask_da = emit_ds['mask'].sel(mask_bands='Aggregate Flag')
            # Drop elevation
            mask_da = mask_da.drop_vars('elev')
            mask_da.name = 'Aggregate Flag'
            mask_da.data = np.nan_to_num(mask_da.data, nan=-9999)
            mask_da = mask_da.rio.reproject_match(to_match, nodata=-9999)
            #mask_da.rio.write_nodata(np.nan, inplace=True)
            # Reproject rgb
            rgb = rgb.rio.reproject_match(to_match, nodata=0)
            # Write cog outputs        
            mask_da.rio.to_raster(outpath_mask,driver="COG")
            rgb.rio.to_raster(outpath_browse,driver="COG")

Run the function.

In [None]:
process_scenes(fns, methane_dir, plume_bbox)

Create a list of the processed files to use in creation of a timeseries. We'll use a similar process to what we did for the plumes, adding a `time` variable to our datasets and concatenating.

In [None]:
mask_files = sorted(glob.glob(f'{methane_dir}*aggregate_flag.tif'))
mask_files

In [None]:
rgb_files = sorted(glob.glob(f'{methane_dir}*ortho_browse.tif'))
rgb_files

Build a time index from the filenames.

In [None]:
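def time_index_from_filenames(file_names, datetime_pos):
    """
    Helper function to parse acquisition datetimes from filenames.
    `datetime_pos` is the index of the timestamp (YYYYMMDDTHHMMSS) in the underscore-split filename.
    """
    return [datetime.strptime(os.path.basename(f).split('_')[datetime_pos], '%Y%m%dT%H%M%S') for f in file_names]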
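Both output types end with two extra underscore-separated tokens (`aggregate`, `flag.tif` and `ortho`, `browse.tif`), which is why the same position, `-5`, finds the timestamp in either list. A quick check on hypothetical paths that follow the naming used by `process_scenes` (`timestamp_from_path` is an illustrative helper):

```python
import os
from datetime import datetime

# Hypothetical output paths following the naming used by process_scenes (illustration only)
paths = [
    "../../data/methane_tutorial/EMIT_L2A_MASK_001_20230805T060818_2321704_011_aggregate_flag.tif",
    "../../data/methane_tutorial/EMIT_L2A_MASK_001_20230805T060818_2321704_011_ortho_browse.tif",
]

def timestamp_from_path(path, datetime_pos=-5):
    # Split only the basename, so underscores in directory names (e.g. methane_tutorial) can't interfere
    token = os.path.basename(path).split('_')[datetime_pos]
    return datetime.strptime(token, '%Y%m%dT%H%M%S')

times = [timestamp_from_path(p) for p in paths]
print(times)
```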

In [None]:
mask_time = xr.Variable('time', time_index_from_filenames(mask_files, -5))

Open and concatenate our datasets along the time dimension, then set the fill values (-9999) and unflagged pixels (0) to `np.nan` so that only flagged pixels are drawn in our visualization.

In [None]:
quality_ts_da = xr.concat([rxr.open_rasterio(f).squeeze('band', drop=True).rio.reproject_match(to_match) for f in mask_files], dim=mask_time)
quality_ts_da.data[quality_ts_da.data <= 0] = np.nan
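The maps below are a visual check. To summarize each overpass numerically, one option is to compute how much of the bounding box a scene actually covers and what share of the covered pixels is clear. The sketch below is a hypothetical helper (`overpass_summary` is not part of the tutorial modules); it assumes the values written by `process_scenes`, where the aggregate flag is 1 for flagged pixels, 0 for clear pixels, and -9999 outside the scene.

```python
import numpy as np

def overpass_summary(flag, nodata=-9999):
    """Summarize one aggregate-flag array over the ROI.

    Returns (coverage, clear_fraction): the share of ROI pixels with valid data,
    and the share of valid pixels that are not flagged (0 when nothing is valid).
    """
    flag = np.asarray(flag, dtype=float)
    valid = (flag != nodata) & ~np.isnan(flag)
    n_valid = int(valid.sum())
    coverage = n_valid / flag.size
    clear_fraction = float((flag[valid] == 0).sum()) / n_valid if n_valid else 0.0
    return coverage, clear_fraction

# Hypothetical 2x2 ROI: one flagged pixel, two clear pixels, one pixel outside the scene
flag = np.array([[0, 1], [-9999, 0]])
coverage, clear_fraction = overpass_summary(flag)
print(coverage, clear_fraction)
```

It would be applied to the saved COGs, e.g. `overpass_summary(rxr.open_rasterio(f).squeeze('band', drop=True).data)` for each `f` in `mask_files`, rather than to `quality_ts_da`, whose 0 and -9999 values have already been replaced with NaN. Low coverage points to a scene that missed the ROI; high coverage with a low clear fraction points to clouds or other flagged conditions.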

Now, add a column to our plume geodataframe containing a time index formatted similarly to our quality `time` dimension so we can visualize our plume extents with our quality data.

In [None]:
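# Parse as UTC, then drop the timezone so values are comparable with the timezone-naive quality `time` coordinate
plm_gdf['time'] = pd.to_datetime(plm_gdf['_single_date_time'], utc=True).dt.tz_convert(None)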

Create a plot of the quality timeseries, grouped by time. We'll overlay it with the browse imagery, bounding box, and plume extents below.

In [None]:
quality_ts_map = quality_ts_da.hvplot.image(x='x',y='y',cmap='greys',groupby='time',clim=(0,1),geo=True,frame_height=400, alpha=0.4)


RGB browse images add visual context that is easier to interpret than the mask layers alone. Follow the same process as above to build an RGB timeseries, then plot it together with the quality mask, bounding box, and plume extents.

In [None]:
rgb_ts_ds = xr.concat([rxr.open_rasterio(f).rio.reproject_match(to_match) for f in rgb_files], dim=mask_time)
rgb_ts_ds.data[rgb_ts_ds.data == -1] = 0

In [None]:
rgb_ts_map = rgb_ts_ds.hvplot.rgb(x='x',y='y', bands='band',groupby='time',geo=True, frame_height=400, crs='EPSG:4326')

In [None]:
(
    rgb_ts_map
    * quality_ts_map
    * plm_gdf.hvplot(groupby='time', geo=True, line_color='red', fill_color=None)
    * plume_bbox.hvplot(geo=True, line_color='yellow', fill_color=None)
)

We can see from this timeseries that most of the overpasses we included did not cover enough of the region to capture a plume, even if one was being emitted. For further analysis we could omit these overpasses, or, in a similar analysis in the future, choose a smaller, more specific bounding polygon for our search.
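One way to decide which overpasses to omit is to measure what fraction of the bounding box each scene footprint covers. A minimal sketch with `shapely` on hypothetical unit squares; the helper names and the 0.9 threshold are illustrative assumptions, and area ratios computed in degrees are only approximate (adequate for a small ROI).

```python
from shapely.geometry import box

def coverage_fraction(footprint, roi):
    """Fraction of the ROI area covered by a scene footprint (0 to 1)."""
    if roi.area == 0:
        return 0.0
    return footprint.intersection(roi).area / roi.area

def scenes_covering_roi(footprints, roi, min_fraction=0.9):
    """Indices of footprints that cover at least `min_fraction` of the ROI."""
    return [i for i, fp in enumerate(footprints) if coverage_fraction(fp, roi) >= min_fraction]

# Hypothetical ROI and scene footprints (illustration only)
roi = box(0, 0, 1, 1)
footprints = [
    box(-1, -1, 2, 2),    # covers the whole ROI
    box(0.5, 0.5, 2, 2),  # clips only one corner
    box(0, 0, 1, 0.95),   # misses a thin strip along the top
]
keep = scenes_covering_roi(footprints, roi)
print(keep)
```

Applied to our search results, this might look like `rfl_gdf[rfl_gdf.geometry.apply(lambda g: coverage_fraction(g, plume_bbox.geometry[0]) >= 0.9)]`.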

## 4. Calculating the Integrated Methane Enhancement (IME) for each plume<a id='ime'></a>

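The integrated methane enhancement (IME) is the total mass of methane above background contained in a plume. To estimate it, we convert each pixel's enhancement (ppm m) to kilograms of methane, using a 60 m pixel, the molar volume of an ideal gas at standard temperature and pressure (22.4 L/mol), and the molar mass of methane (0.01604 kg/mol), then sum over all plume pixels:

$$\text{kg per pixel} = (\text{ppm}\cdot\text{m}) \times \frac{1}{10^{6}\ \text{ppm}} \times \frac{60\ \text{m} \times 60\ \text{m}}{1} \times \frac{1000\ \text{L}}{1\ \text{m}^{3}} \times \frac{1\ \text{mol}}{22.4\ \text{L}} \times \frac{0.01604\ \text{kg}}{1\ \text{mol}}$$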
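The chain of unit conversions can be checked numerically. The sketch below collapses it into a single factor (kg of CH4 per pixel per ppm m, assuming a 60 m pixel and the 22.4 L/mol molar volume at STP used in this section) and sums it over a small hypothetical plume array, ignoring NaN pixels as `np.nansum` does. The constant and function names are illustrative.

```python
import numpy as np

PPM = 1e-6                 # 1 ppm as a fraction: ppm m -> m
PIXEL_AREA_M2 = 60 * 60    # nominal 60 m x 60 m pixel: m -> m^3
L_PER_M3 = 1000            # m^3 -> L
MOLAR_VOLUME_L = 22.4      # L/mol at STP: L -> mol
MOLAR_MASS_CH4 = 0.01604   # kg/mol: mol -> kg

# kg of CH4 in one pixel for an enhancement of 1 ppm m (about 0.00258)
KG_PER_PPM_M = PPM * PIXEL_AREA_M2 * L_PER_M3 / MOLAR_VOLUME_L * MOLAR_MASS_CH4

def ime_kg(enhancement_ppm_m):
    """Sum kg over plume pixels, ignoring NaN (non-plume) pixels."""
    return float(np.nansum(np.asarray(enhancement_ppm_m, dtype=float)) * KG_PER_PPM_M)

# Hypothetical 2x2 plume: 1000 + 500 + 0 ppm m, plus one NaN pixel outside the plume
example = np.array([[1000.0, np.nan], [500.0, 0.0]])
print(KG_PER_PPM_M, ime_kg(example))
```

For example, a single pixel with an enhancement of 1000 ppm m corresponds to roughly 2.58 kg of methane.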

In [None]:
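def calc_ime(plume_da, pixel_size_m=60):
    """
    Integrated methane enhancement (kg) for one plume array of enhancement values in ppm m.
    NaN pixels (outside the plume) are ignored.
    """
    molar_volume = 22.4  # L/mol at STP
    molar_mass_ch4 = 0.01604  # kg/mol

    # ppm m -> m (x 1e-6) -> m^3 (x pixel area) -> L (x 1000) -> mol (/ molar volume) -> kg (x molar mass)
    kg = plume_da * (1/1e6) * (pixel_size_m**2) * 1000 * (1/molar_volume) * molar_mass_ch4
    ime = np.nansum(kg)
    return ime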

In [None]:
# Apply the function along the 'x' and 'y' dimensions
ime_ts = xr.apply_ufunc(calc_ime, plm_ts_ds, input_core_dims=[['y', 'x']], vectorize=True)
ime_ts.name = 'IME (kg)'

In [None]:
ime_ts.hvplot.line(x='time', y='IME (kg)', title='Observed Integrated Methane Enhancement (kg), 2023', color='black', xticks=list(ime_ts.time.data), rot=90)