# 1 Finding Concurrent ECOSTRESS and EMIT Data

**Summary**  

Both the ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) and the Earth surface Mineral dust source InvesTigation (EMIT) instruments are located on the International Space Station (ISS). Their overlapping fields of view provide an unprecedented opportunity to demonstrate the compounded benefits of working with both datasets. In this notebook we will show how to utilize the [`earthaccess` Python library](https://github.com/nsidc/earthaccess) to find concurrent ECOSTRESS and EMIT data. 

<div>
<img src="../img/concurrent_data.png" width="750"/>
</div>

**Background**

The **ECOSTRESS** instrument is a multispectral thermal imaging radiometer designed to answer three overarching science questions:

- How is the terrestrial biosphere responding to changes in water availability?
- How do changes in diurnal vegetation water stress impact the global carbon cycle?
- Can agricultural vulnerability be reduced through advanced monitoring of agricultural water consumptive use and improved drought estimation?

The ECOSTRESS mission is answering these questions by accurately measuring the temperature of plants. Plants regulate their temperature by releasing water through tiny pores on their leaves called stomata. If they have sufficient water they can maintain their temperature, but if there is insufficient water, their temperatures rise, and this temperature rise can be measured with ECOSTRESS. The images acquired by ECOSTRESS are the most detailed temperature images of the surface ever acquired from space and can be used to measure the temperature of an individual farmer's field.

More details about ECOSTRESS and its associated products can be found on the [ECOSTRESS website](https://ecostress.jpl.nasa.gov/) and [ECOSTRESS product pages](https://lpdaac.usgs.gov/product_search/?query=ECOSTRESS&status=Operational&view=cards&sort=title) hosted by the Land Processes Distributed Active Archive Center (LP DAAC).

The **EMIT** instrument is an imaging spectrometer that measures light in visible and infrared wavelengths. These measurements display unique spectral signatures that correspond to the composition of the Earth's surface. The EMIT mission focuses specifically on mapping the composition of minerals to better understand the effects of mineral dust throughout the Earth system and on human populations, now and in the future. In addition, the EMIT instrument can be used in other applications, such as mapping of greenhouse gases, snow properties, and water resources.

More details about EMIT and its associated products can be found on the [EMIT website](https://earth.jpl.nasa.gov/emit/) and [EMIT product pages](https://lpdaac.usgs.gov/product_search/?query=EMIT&status=Operational&view=cards&sort=title) hosted by the LP DAAC.

**Requirements**  
 - [NASA Earthdata Account](https://urs.earthdata.nasa.gov/home)   
 - *No Python setup requirements if connected to the workshop cloud instance!*  
 - **Local Only** Set up Python Environment - See **setup_instructions.md** in the `/setup/` folder to set up a local compatible Python environment 

**Learning Objectives**  
- How to use `earthaccess` to find concurrent EMIT and ECOSTRESS data.  
- How to export a list of files and download them programmatically.  

**Tutorial Outline**  

1. Setup
2. Search for ECOSTRESS and EMIT Data
3. Organizing and Filtering Results
4. Visualizing Intersecting Coverage
5. Generating a list of URLs
6. Streaming or Downloading Data

## 1. Setup

Import the required Python libraries.

In [None]:
# Import required libraries
import os
import json
import folium
import earthaccess
import warnings
# import folium.plugins
# import folium.raster_layers
import pandas as pd
import geopandas as gpd
import math

from branca.element import Figure
from IPython.display import display
from shapely import geometry
from shapely.geometry import MultiPolygon
# from skimage import io
from datetime import timedelta
from shapely.geometry.polygon import orient
from matplotlib import pyplot as plt
from matplotlib import colors as mcolors

### 1.2 NASA Earthdata Login Credentials

To download or stream NASA data you will need an Earthdata account, which you can create [here](https://urs.earthdata.nasa.gov/home). We will use the `login` function from the `earthaccess` library for authentication before downloading at the end of the notebook. This function can also be used to create a local `.netrc` file if it doesn't exist or add your login info to an existing `.netrc` file. If no Earthdata Login credentials are found in the `.netrc` you'll be prompted for them. This step is not necessary to conduct searches but is needed to download or stream data.

## 2. Search for ECOSTRESS and EMIT Data

Both EMIT and ECOSTRESS products are hosted by the Land Processes Distributed Active Archive Center (LP DAAC). In this example we will use the cloud-hosted EMIT_L2A_RFL and ECOSTRESS_L2T_LSTE products available from the LP DAAC to find data. Any results we find for these products should be available for other products within the EMIT and ECOSTRESS collections.

To find data we will use the [`earthaccess` Python library](https://github.com/nsidc/earthaccess). `earthaccess` searches NASA's Common Metadata Repository (CMR), a metadata system that catalogs Earth Science data and associated metadata records. The results can then be used to download granules or generate lists of granule search result URLs.

Using `earthaccess` we can search based on the attributes of a granule, which can be thought of as a spatiotemporal scene from an instrument containing multiple assets (ex: Reflectance, Reflectance Uncertainty, Masks for the EMIT L2A Reflectance Collection). We can search using attributes such as collection, acquisition time, and spatial footprint. This process can also be used with other EMIT or ECOSTRESS products, other collections, or different data providers, as well as across multiple catalogs with some modification. 

### 2.1 Define Spatial Region of Interest

For this example, our spatial region of interest (ROI) will be a region near Santa Barbara, CA that contains the [Jack and Laura Dangermond Preserve](https://www.dangermondpreserve.org/) and the [Sedgwick Reserve](https://sedgwick.nrs.ucsb.edu/).

In this example, we will create a rectangular ROI surrounding these two reserves as well as some of the agricultural region between. Even though the shape is rectangular we elect to search using a polygon rather than a standard bounding box in `earthaccess` because bounding boxes will typically have a larger spatial extent, capturing a lot of area we may not be interested in. This becomes more important for searches with larger ROIs than our example here. To search for intersections with a polygon using earthaccess, we need to format our ROI as a counterclockwise list of coordinate pairs. 

Open the `geojson` file containing the Dangermond and Sedgwick boundaries as a `geodataframe`, and check the coordinate reference system (CRS) of the data.

In [None]:
polygon = gpd.read_file('../data/agu_workshop_roi.geojson')
polygon.crs

The CRS is **EPSG:4326** (WGS84), which is also the CRS we want the data in to submit for our search.

Next, let's examine our polygon a bit closer.

In [None]:
polygon

We can see this `geodataframe` consists of two polygons that we want to include in our study site. We need to create an exterior boundary polygon containing these, and make sure the vertices are in counterclockwise order to submit them in our query. To do this, create a polygon consisting of all the geometries, then create a bounding rectangle. This will give us a simple exterior polygon around our two ROIs. After that, use the `orient` function to place our coordinate pairs in counterclockwise order.

In [None]:
# Merge all Polygon geometries and create external boundary
roi_poly = polygon.union_all().envelope
# Re-order vertices to counterclockwise
roi_poly = orient(roi_poly, sign=1.0)
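
To see what counterclockwise ordering means in practice, the following is a minimal standard-library sketch (not part of the original workflow) that checks the winding order of a coordinate ring with the shoelace formula and reverses it if needed. The helper names `signed_area` and `ensure_ccw` and the rectangle coordinates are hypothetical, chosen only for illustration.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    area = 0.0
    # Pair each vertex with the next one, wrapping around at the end
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2.0


def ensure_ccw(ring):
    """Return the ring in counterclockwise order, reversing it if necessary."""
    return list(ring) if signed_area(ring) > 0 else list(reversed(ring))


# Hypothetical clockwise rectangle as (lon, lat) pairs, closed like a polygon exterior ring
clockwise = [(-120.5, 34.4), (-120.5, 34.8), (-120.0, 34.8), (-120.0, 34.4), (-120.5, 34.4)]
ccw = ensure_ccw(clockwise)
print(signed_area(clockwise) < 0, signed_area(ccw) > 0)
```

In the notebook itself, `orient(roi_poly, sign=1.0)` handles this step, so the sketch is only meant to make the requirement concrete.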

Make a `GeoDataFrame` consisting of the bounding box geometry.

In [None]:
agu_bbox = gpd.GeoDataFrame({"Name": ["ROI Bounding Box"], "geometry": [roi_poly]}, crs="EPSG:4326")
agu_bbox

We can write this bounding box to a file for use in future notebooks.

In [None]:
# agu_bbox.to_file('../data/roi_bbox.geojson', driver='GeoJSON')

We can go ahead and visualize our region of interest and the exterior boundary polygon containing ROIs. First add a function to help reformat bounding box coordinates to work with leaflet notation.

In [None]:
# Function to convert a bounding box for use in leaflet notation

def convert_bounds(bbox, invert_y=False):
    """
    Helper method for changing bounding box representation to leaflet notation

    ``(lon1, lat1, lon2, lat2) -> ((lat1, lon1), (lat2, lon2))``
    """
    x1, y1, x2, y2 = bbox
    if invert_y:
        y1, y2 = y2, y1
    return ((y1, x1), (y2, x2))

In [None]:
fig = Figure(width="750px", height="375px")
map1 = folium.Map(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', attr='Google')
fig.add_child(map1)

# Add Bounding Box Polygon
folium.GeoJson(agu_bbox,
                name='bounding_box',
                ).add_to(map1)

# Add roi geodataframe
polygon.explore(
    "Name",
    popup=True,
    categorical=True,
    cmap='Set3',
    style_kwds=dict(opacity=0.7, fillOpacity=0.4),
    name="Regions of Interest",
    m=map1
)

map1.add_child(folium.LayerControl())
map1.fit_bounds(bounds=convert_bounds(polygon.union_all().bounds))
display(fig)

Above we can see our regions of interest (ROIs) and the exterior boundary polygon containing the ROIs that we opened. We can hover over different areas to see the name of each ROI.

Lastly, we need to convert our polygon to a list of coordinate pairs, so it will be accepted as a 'polygon' search parameter in `earthaccess`, as it expects a list of coordinate pairs in a counterclockwise order.

In [None]:
# Set ROI as list of exterior polygon vertices as coordinate pairs
roi = list(roi_poly.exterior.coords)

### 2.2 Define Collections of Interest

We need to specify which products we want to search for. The best way to do this is using their concept-id. As mentioned above, we will conduct our search using the EMIT Level 2A Reflectance (EMITL2ARFL) and ECOSTRESS Level 2 Tiled Land Surface Temperature and Emissivity (ECO_L2T_LSTE) products. We can do some quick collection queries using `earthaccess` to retrieve the concept-id for each dataset.

>Note: Here we use the Tiled ECOSTRESS LSTE Product. This will also work with the gridded LSTE and the swath; however, the swath product does not have a browse image for the visualization in section 4 and will require additional processing for subsequent analysis.

In [None]:
# EMIT Collection Query
emit_collection_query = earthaccess.collection_query().keyword('EMIT L2A Reflectance')
emit_collection_query.fields(['ShortName','EntryTitle','Version']).get()

In [None]:
# ECOSTRESS Collection Query
eco_collection_query = earthaccess.collection_query().keyword('ECOSTRESS L2 Tiled LSTE')
eco_collection_query.fields(['ShortName','EntryTitle','Version']).get()

If your search returns multiple products, be sure to select the right concept-id. For this example it will be the first one. We want to use the `LPCLOUD` ECOSTRESS Tiled Land Surface Temperature and Emissivity (concept-id: "C2076090826-LPCLOUD"). Create a list of these concept-ids for our data search.

In [None]:
# Data Collections for our search
concept_ids = ['C2408750690-LPCLOUD', 'C2076090826-LPCLOUD']

### 2.3 Define Date Range

For our date range, we'll look at data collected between January and October 2023. The `date_range` can be specified as a pair of dates, start and end (up to, not including).

In [None]:
# Define Date Range
date_range = ('2023-01-01','2023-11-01')

### 2.4 Searching

Submit a query using `earthaccess`.

In [None]:
results = earthaccess.search_data(
    concept_id=concept_ids,
    polygon=roi,
    temporal=date_range,
)
print(f"Granules Found: {len(results)}")

In [None]:
results[:2]

## 3. Organizing and Filtering Results

As we can see from above, the results object contains a list of objects with metadata and links. We can convert this to a more readable format, a dataframe. In addition, we can make it a `GeoDataFrame` by taking the spatial metadata and creating a shapely polygon representing the spatial coverage, and further customize which information we want to use from other metadata fields.

First, we define some functions to help us create a shapely object for our geodataframe, and retrieve the specific browse image URLs that we want. By default, the browse image selected by `earthaccess` is the first one in the list, but the ECO_L2T_LSTE product has several browse images, and we want to make sure we retrieve the `png` file, which is a preview of the LSTE.

In [None]:
# Functions to Build Dataframe
def get_shapely_object(result: earthaccess.results.DataGranule):
    """
    Create a shapely polygon of spatial coverage from results metadata.
    Will work for BoundingRectangles or GPolygons.
    """
    # Get Geometry Keys
    geo = result["umm"]["SpatialExtent"]["HorizontalSpatialDomain"]["Geometry"]
    keys = geo.keys()

    if "BoundingRectangles" in keys:
        bounding_rectangle = geo["BoundingRectangles"][0]
        # Create bbox tuple
        bbox_coords = (
            bounding_rectangle["WestBoundingCoordinate"],
            bounding_rectangle["SouthBoundingCoordinate"],
            bounding_rectangle["EastBoundingCoordinate"],
            bounding_rectangle["NorthBoundingCoordinate"],
        )
        # Create shapely geometry from bbox
        shape = geometry.box(*bbox_coords, ccw=True)
    elif "GPolygons" in keys:
        points = geo["GPolygons"][0]["Boundary"]["Points"]
        # Create shapely geometry from polygons
        shape = geometry.Polygon([[p["Longitude"], p["Latitude"]] for p in points])
    else:
        raise ValueError(
            "Provided result does not contain bounding boxes/polygons or is incompatible."
        )
    return shape


def get_png(result: earthaccess.results.DataGranule):
    """
    Retrieve a png browse image if it exists or first jpg in list of urls
    """
    https_links = [link for link in result.dataviz_links() if "https" in link]
    if len(https_links) == 1:
        browse = https_links[0]
    elif len(https_links) == 0:
        browse = "no browse image"
        warnings.warn(f"There is no browse imagery for {result['umm']['GranuleUR']}.")
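    else:
        # Prefer the png preview; fall back to the first link if no png is present
        pngs = [link for link in https_links if ".png" in link]
        browse = pngs[0] if pngs else https_links[0]
    return browse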
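
The selection logic can also be tried on plain lists of URLs, without an `earthaccess` result object. The sketch below is a simplified stand-in, assuming a hypothetical helper `pick_browse` and made-up example links; it prefers an HTTPS `.png`, falls back to the first HTTPS link, and returns `None` when nothing is available.

```python
def pick_browse(links):
    """Prefer a .png over HTTPS; otherwise fall back to the first HTTPS link, or None."""
    https_links = [link for link in links if link.startswith("https")]
    pngs = [link for link in https_links if link.lower().endswith(".png")]
    if pngs:
        return pngs[0]
    return https_links[0] if https_links else None


# Hypothetical browse links for a single granule
links = [
    "s3://example-bucket/browse/scene_A.png",
    "https://example.com/browse/scene_A.jpeg",
    "https://example.com/browse/scene_A.png",
]
print(pick_browse(links))
```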

Now we can create an additional function using those to create a `GeoDataFrame` containing our results data, a geometry extracted from the metadata, the browse imagery, product shortname, and data links.

In [None]:
def results_to_gdf(results: list):
    """
    Takes a list of results from earthaccess and converts to a geodataframe.
    """
    # Create Dataframe of Results Metadata
    results_df = pd.json_normalize(results)
    # Shorten column Names
    results_df.columns = [
        col.split(".")[-1] if "." in col else col for col in results_df.columns
    ]
    # Create shapely polygons for result
    geometries = [
        get_shapely_object(results[index]) for index in results_df.index.to_list()
    ]
    # Convert to GeoDataframe
    gdf = gpd.GeoDataFrame(results_df, geometry=geometries, crs="EPSG:4326")
    # Add browse imagery and data links and product shortname
    gdf["browse"] = [get_png(granule) for granule in results]
    gdf["shortname"] = [
        result["umm"]["CollectionReference"]["ShortName"] for result in results
    ]
    gdf["data"] = [granule.data_links() for granule in results]
    return gdf
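
To illustrate what the column shortening step does, here is a minimal standard-library sketch. It assumes a simplified, hypothetical record shaped loosely like a granule result, flattens it into dotted keys (similar in spirit to what `pd.json_normalize` produces), and then keeps only the last part of each key. The `flatten` helper is an illustration, not part of the notebook.

```python
def flatten(record, prefix=""):
    """Flatten nested dictionaries into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical, heavily simplified granule record
record = {
    "meta": {"native-id": "scene_1"},
    "umm": {
        "TemporalExtent": {
            "RangeDateTime": {"BeginningDateTime": "2023-02-19T20:29:39Z"}
        }
    },
}
flat = flatten(record)
# Keep only the text after the last "." in each key
short = {key.split(".")[-1]: value for key, value in flat.items()}
print(list(flat))
print(list(short))
```

One caveat of shortening names this way is that two nested fields sharing the same final key would collapse into one name, so it is worth checking the column list after this step.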

Use our function to create a `GeoDataFrame`.

In [None]:
gdf = results_to_gdf(results)

Preview our geodataframe to get an idea what it looks like.

In [None]:
pd.set_option('display.max_columns', None)
gdf.head()

There are a lot of columns with data that is not relevant to our goal, so we can drop those. To do that, list the names of columns.

In [None]:
# List Column Names
gdf.columns

Now create a list of columns to keep and use it to filter the dataframe.

In [None]:
# Create a list of columns to keep
keep_cols = [
    "native-id",
    "collection-concept-id",
    "BeginningDateTime",
    "EndingDateTime",
    "CloudCover",
    "DayNightFlag",
    "geometry",
    "browse",
    "shortname",
    "data",
]
# Remove unneeded columns
gdf = gdf[gdf.columns.intersection(keep_cols)]
gdf.head()

Now we will separate the results into two dataframes, one for ECOSTRESS and one for EMIT, and print the number of results for each so we can monitor how many granules we're filtering.

In [None]:
# Split into two dataframes - ECO and EMIT
eco_gdf = gdf[gdf['native-id'].str.contains('ECO')]
emit_gdf = gdf[gdf['native-id'].str.contains('EMIT')]
print(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \n EMIT Granules: {emit_gdf.shape[0]}')

There are some additional filtering steps that we can do here for both instruments. Since we're using the ECOSTRESS tiled data (ECO_L2T_LSTE), there are two tiles available for our area of interest because it's located where the tiles overlap, and we'll want to select one. Another caveat with this ECOSTRESS dataset is that there are sometimes multiple builds available for a scene. We'll want to select the newest build available for each.

First filter to a single tile, then remove duplicate scenes produced with older software builds.

In [None]:
# Select an ECOSTRESS Tile
eco_gdf = eco_gdf[eco_gdf["native-id"].str.contains("10SGD")]

# Remove Duplicate Granules from Older Builds
df = eco_gdf["native-id"].str.rsplit("_", n=2, expand=True)
df.columns = ["base", "build_major", "build_minor"]
df["build_major"] = df["build_major"].astype(int)
df["build_minor"] = df["build_minor"].astype(int)
# Sort so the newest (major, minor) build is last within each scene, then keep only that row
newest = df.sort_values(["build_major", "build_minor"]).groupby("base").tail(1)
eco_gdf = eco_gdf[eco_gdf.index.isin(newest.index)]
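
The same idea can be sketched without `pandas`. The example below assumes, as the cell above does, that each native-id ends in `_<build major>_<build minor>`; the IDs themselves are hypothetical and the helper name `newest_builds` is made up for illustration. Comparing `(major, minor)` tuples keeps the highest build for each scene.

```python
def newest_builds(native_ids):
    """Keep one ID per scene: the one with the highest (major, minor) build."""
    newest = {}
    for native_id in native_ids:
        # Split off the last two underscore-separated fields as the build numbers
        base, major, minor = native_id.rsplit("_", 2)
        build = (int(major), int(minor))
        if base not in newest or build > newest[base][0]:
            newest[base] = (build, native_id)
    return [native_id for _, native_id in newest.values()]


# Hypothetical IDs following the <scene>_<build major>_<build minor> pattern
ids = [
    "ECOv002_L2T_LSTE_26223_012_10SGD_20230219T202851_0710_01",
    "ECOv002_L2T_LSTE_26223_012_10SGD_20230219T202851_0710_02",
    "ECOv002_L2T_LSTE_26860_001_10SGD_20230401T203733_0710_01",
    "ECOv002_L2T_LSTE_26860_001_10SGD_20230401T203733_0712_01",
]
print(newest_builds(ids))
```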

# Show change in results quantity
print(f" ECOSTRESS Granules: {eco_gdf.shape[0]} \n EMIT Granules: {emit_gdf.shape[0]}")

Notice that this reduced the number of ECOSTRESS results by quite a bit.

Now we can filter some EMIT scenes as well by utilizing the `CloudCover` field from the results metadata. This field is not available for ECOSTRESS.

In [None]:
# Cloud Filter using EMIT Metadata - Not available for ECOSTRESS
emit_gdf = emit_gdf[emit_gdf["CloudCover"] <= 60]
print(f" ECOSTRESS Granules: {eco_gdf.shape[0]} \n EMIT Granules: {emit_gdf.shape[0]}")

We still haven't filtered the locations where EMIT and ECOSTRESS have data at the same spatial location and time. The EMIT acquisition mask has been added to ECOSTRESS, so in most cases if EMIT is collecting data, so will ECOSTRESS, but there are edge cases where this is not true. To find intersecting scenes we'll write a function that finds spatial and temporal intersections for each row between the EMIT and ECOSTRESS geodataframes. 

First write a function to convert the `BeginningDateTime` and `EndingDateTime` columns to datetime objects.

> **You may have noticed that the date format is similar for ECOSTRESS and EMIT, but the ECOSTRESS data also includes fractional seconds. The `format='ISO8601'` option handles both. If using a different version of `pandas` that does not support it, you may need to drop the `format` argument to the `to_datetime` function, for example by passing `fmt=None` to the function below.**

In [None]:
# Define Function to Convert Columns to datetime
def convert_col_dt(df:pd.DataFrame, columns:list, fmt="ISO8601"):
    """
    Convert a column to a datetime. Default format of ISO8601.
    df: a pd.DataFrame.
    columns: a list of column names containing datetime strings from earthaccess results.
    fmt: the datetime format of the string.
    """
    for col in columns:
        df[col] = pd.to_datetime(df[col], format=fmt)
    return df
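
For a quick look at why a single ISO 8601 parser covers both timestamp styles, here is a small standard-library sketch. The timestamps are hypothetical, and the helper `parse_iso` is an illustration rather than part of the notebook; it parses strings with and without fractional seconds into timezone-aware datetimes so they can be compared directly.

```python
from datetime import datetime


def parse_iso(timestamp):
    """Parse an ISO 8601 string with or without fractional seconds into an aware datetime."""
    # Older Python versions do not accept a trailing "Z", so spell out the UTC offset
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


# Hypothetical timestamps: one without and one with fractional seconds
a = parse_iso("2023-02-19T20:29:39Z")
b = parse_iso("2023-02-19T20:28:51.310Z")
print((a - b).total_seconds())
```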

Now we can write a function to find spatial and temporal intersections of the data. To do this, we'll conduct a spatial join to find intersecting geometries from our `emit_gdf` and `eco_gdf`, then we'll use the time windows defined by `BeginningDateTime` and `EndingDateTime` to evaluate if granules (rows) from each dataframe intersect. Additionally, we want to provide a suffix for the left and right `GeoDataFrames` so columns will be renamed in a meaningful way.

This works great for ECOSTRESS and EMIT because they are both on the ISS and acquiring data within seconds of each other (this can vary due to FOV and other instrument parameters).

For other applications we may want to find data from another orbiting instrument where an overpass falls within an hour or day of collection. We can accomplish this by manually providing a time delta, then using the `BeginningDateTime` and `EndingDateTime` to check whether the acquisition times fall within the provided window.

Lastly, because we want to visualize each set of concurrent scenes, we'll want our function to keep the geometries from the left and right geodataframes. By default, the spatial join function only keeps the left. To do this, we'll add new columns using the provided suffixes to maintain both geometries, then instead of keeping just the left geometry, we will create a multipolygon for the main geometry of each row that consists of two polygons. This two-polygon geometry can be used later for plotting.

> **Note that the spatial join and temporal filtering can be computationally intensive. If the area of interest is very large or initial search parameters have a long temporal range, breaking down the initial search into sections spatially or temporally is recommended.**


In [None]:
def find_concurrent_scenes(lgdf, rgdf, lsuffix, rsuffix, time_delta=None):
    """
    Finds the concurrent scenes between two results converted geodataframes.
    lgdf: a geodataframe converted from earthaccess results
    rgdf: a second geodataframe converted from earthaccess results
    lsuffix: appended to lgdf column names
    rsuffix: appended to rgdf column names
    time_delta (seconds): an optional time delta to define an acceptable window between acquisitions
                          useful for data acquired hours or days apart

    """
    # Copy lgdf, rgdf Geometry (work on copies so the input geodataframes are not modified)
    lgdf = lgdf.copy()
    rgdf = rgdf.copy()
    lgdf[f"geometry_{lsuffix}"] = lgdf.geometry
    rgdf[f"geometry_{rsuffix}"] = rgdf.geometry

    # Conduct Spatial Join
    joined = gpd.sjoin(
        lgdf,
        rgdf,
        how="inner",
        predicate="intersects",
        lsuffix=lsuffix,
        rsuffix=rsuffix,
    )

    # Convert Datetime fields
    date_columns = [
        f"BeginningDateTime_{lsuffix}",
        f"BeginningDateTime_{rsuffix}",
        f"EndingDateTime_{lsuffix}",
        f"EndingDateTime_{rsuffix}",
    ]
    joined = convert_col_dt(joined, date_columns)

    # Filter Based on Collection Times
    if time_delta:
        td = timedelta(seconds=time_delta)
        joined[f"MidDateTime_{lsuffix}"] = (
            joined[f"BeginningDateTime_{lsuffix}"]
            + (
                joined[f"EndingDateTime_{lsuffix}"]
                - joined[f"BeginningDateTime_{lsuffix}"]
            )
            / 2
        )
        joined[f"MidDateTime_{rsuffix}"] = (
            joined[f"BeginningDateTime_{rsuffix}"]
            + (
                joined[f"EndingDateTime_{rsuffix}"]
                - joined[f"BeginningDateTime_{rsuffix}"]
            )
            / 2
        )
        concurrent_df = joined[
            abs(joined[f"MidDateTime_{lsuffix}"] - joined[f"MidDateTime_{rsuffix}"])
            <= td
        ]
    else:
        concurrent_df = joined[
            (
                joined[f"BeginningDateTime_{lsuffix}"]
                <= joined[f"EndingDateTime_{rsuffix}"]
            )
            & (
                joined[f"EndingDateTime_{lsuffix}"]
                >= joined[f"BeginningDateTime_{rsuffix}"]
            )
        ]

    # Combine Geometry
    concurrent_df.loc[:, "geometry"] = concurrent_df.apply(
        lambda row: MultiPolygon(
            [row[f"geometry_{lsuffix}"], row[f"geometry_{rsuffix}"]]
        ),
        axis=1,
    )

    return concurrent_df
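
The temporal part of this function reduces to two tests on pairs of time windows. The sketch below restates them with the standard library only, using hypothetical acquisition times and a made-up helper name `is_concurrent`: without a tolerance, two windows are concurrent if they overlap; with a tolerance, their midpoints must differ by no more than that amount.

```python
from datetime import datetime, timedelta


def is_concurrent(start_a, end_a, start_b, end_b, tolerance=None):
    """True if two windows overlap, or if their midpoints differ by at most `tolerance`."""
    if tolerance is None:
        # Standard interval-overlap test
        return start_a <= end_b and end_a >= start_b
    mid_a = start_a + (end_a - start_a) / 2
    mid_b = start_b + (end_b - start_b) / 2
    return abs(mid_a - mid_b) <= tolerance


# Hypothetical acquisition windows (start, end)
emit = (datetime(2023, 2, 19, 20, 29, 39), datetime(2023, 2, 19, 20, 29, 51))
eco = (datetime(2023, 2, 19, 20, 28, 51), datetime(2023, 2, 19, 20, 29, 43))
later = (datetime(2023, 2, 19, 21, 0, 0), datetime(2023, 2, 19, 21, 0, 52))

print(is_concurrent(*emit, *eco))
print(is_concurrent(*emit, *later))
print(is_concurrent(*emit, *later, tolerance=timedelta(hours=1)))
```

The midpoint comparison is a design choice: it treats each acquisition as a single instant, which is convenient when the two instruments are on different platforms and their windows never strictly overlap.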

Now run our function. Since we're working with ECOSTRESS and EMIT, we don't need to provide the `time_delta` argument.

In [None]:
# Find Concurrent Scenes
concurrent_df = find_concurrent_scenes(
    emit_gdf, eco_gdf, lsuffix="EMIT", rsuffix="ECO"
)
concurrent_df.head()

When viewing our concurrent dataframe, we can see that all of the info from both our left (EMIT) and right (ECO) geodataframes is still there, including geometries from each, and our new multipolygon geometry.


## 4. Visualizing Intersecting Coverage

Now that we have a geodataframe containing concurrent data within our initial search parameters, we can visualize them on a map using `folium`. To improve our visual, we'll first create a function to help style each polygon and set up a colormap for each row.

In [None]:
# Define a function to style polygons by row
def style_by_row(row, cmap, ckey, **kwargs):
    """
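    Applies style by row when iterating with iterrows.

    row: A row (series) from a GeoDataFrame.
    cmap: A dictionary mapping values to hex color strings.
    ckey: The column name whose value is used to look up the color in cmap.
    kwargs: additional style options for style dict.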
    """
    color = cmap.get(row[ckey], "#000000") if cmap else "#000000"

    return {"color": color, **kwargs}

# Create a colormap for the data
palette = plt.get_cmap("tab20").colors
ckeys = concurrent_df["native-id_EMIT"].unique()
cmap = {ckey: mcolors.to_hex(palette[i % len(palette)]) for i, ckey in enumerate(ckeys)}

Plot using `folium`.

In [None]:
# Create Figure and Select Background Tiles
fig = Figure(width="750", height="375")
map1 = folium.Map(
    tiles="https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}", attr="Google"
)
fig.add_child(map1)

# Iterate through GeoDataFrame row
for idx, row in concurrent_df.iterrows():

    # Set up feature group to pair browse image with multipolygon feature
    fg = folium.FeatureGroup(
        name=f"{row['native-id_EMIT']}, {row['native-id_ECO']}", show=False
    )

    # Create Tooltip
    tooltip = folium.Tooltip(f"""
        <b> EMIT Granule-ID: </b> {row["native-id_EMIT"]}<br>
        <b> ECOSTRESS-ID: </b> {row["native-id_ECO"]}<br>
        <b> EMIT BeginningDateTime: </b> {row["BeginningDateTime_EMIT"]}<br>
        <b> ECO BeginningDateTime: </b> {row["BeginningDateTime_ECO"]}<br>
        <b> EMIT CloudCover: </b> {row["CloudCover_EMIT"]}<br>
        <b> ECO CloudCover: </b> {row["CloudCover_ECO"]}<br>
        """)

    # Add each row as a geoJson layer
    folium.GeoJson(
        data=row["geometry"].__geo_interface__,
        tooltip=tooltip,
        style_function=lambda feature, row=row: style_by_row(
            row=row, cmap=cmap, ckey="native-id_EMIT", fillOpacity=0, weight=2
        ),
    ).add_to(fg)

    # Add the ECO browse Image
    if row["browse_ECO"] != "no browse image":
        folium.raster_layers.ImageOverlay(
            image=row["browse_ECO"],
            # geometry bounds: minx, miny, maxx, maxy. Change to miny, minx, maxy, maxx for folium
            bounds=[
                [row["geometry_ECO"].bounds[1], row["geometry_ECO"].bounds[0]],
                [row["geometry_ECO"].bounds[3], row["geometry_ECO"].bounds[2]],
            ],
            interactive=False,
            cross_origin=False,
            opacity=0.75,
            zindex=1,
        ).add_to(fg)

    # Add the FeatureGroup to the map
    map1.add_child(fg)

# Plot Region of Interest
polygon.explore(
    popup=False,
    style_kwds=dict(color="#FFFF00", fillOpacity=0, weight=2),
    name="Region of Interest",
    m=map1,
)
# Add Our Bounding Box
folium.GeoJson(
    roi_poly,
    name="bounding_box",
    style_function=lambda feature: {"color": "#FFFF00", "fillOpacity": 0, "weight": 2},
).add_to(map1)

# Add Layer Control and set zoom.
map1.fit_bounds(bounds=convert_bounds(concurrent_df.union_all().bounds))
map1.add_child(folium.LayerControl())
display(fig)

In the figure above, we start without loading any concurrent layers. Add or remove layers using the layer control in the top right. Each layer will contain an EMIT scene footprint, concurrent ECOSTRESS scene footprint, and ECOSTRESS browse scene. Note that there can be duplicates if multiple scenes intersect with a single scene.


## 5. Generating a list of URLs

With our concurrent geodataframe we can pull the links to scenes out of our `data_EMIT` and `data_ECO` columns. To do this, define a function that groups the data by the `native-id_EMIT` and builds a dictionary of links corresponding to each. We use a sublist to help remove duplicates and lump multiple corresponding scenes to a single `native-id_EMIT` scene.

In [None]:
# Define Function to retrieve list of files intersecting with emit scene
def get_file_list(concurrent_df):
    """
    Retrieve a dictionary of EMIT scenes with a list of files from both instruments
    acquired at the location and time of each EMIT scene.
    """

    files = (
        concurrent_df.groupby("native-id_EMIT")[["data_EMIT", "data_ECO"]]
        .apply(
            lambda group: list(
                dict.fromkeys(
                    [item for sublist in group["data_EMIT"] for item in sublist]
                    + [item for sublist in group["data_ECO"] for item in sublist]
                )
            )
        )
        .to_dict()
    )
    return files
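
The core of this function is flattening several lists of links and removing duplicates while keeping the original order. A minimal standard-library sketch of that step, using hypothetical file names and a made-up helper `unique_in_order`, looks like this:

```python
from itertools import chain


def unique_in_order(list_of_lists):
    """Flatten nested lists and drop duplicates while preserving first-seen order."""
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(chain.from_iterable(list_of_lists)))


# Hypothetical links: one EMIT scene matched against two ECOSTRESS scenes,
# so the EMIT links appear once per matching row
emit_links = [["emit_rfl.nc", "emit_mask.nc"], ["emit_rfl.nc", "emit_mask.nc"]]
eco_links = [["eco_a_LST.tif"], ["eco_b_LST.tif"]]
print(unique_in_order(emit_links + eco_links))
```

Using `dict.fromkeys` rather than `set` keeps the EMIT links ahead of the ECOSTRESS links, which makes the resulting lists easier to read.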

Use the function to retrieve the dictionary of links.

In [None]:
# Retrieve Links
links_dict = get_file_list(concurrent_df)

In [None]:
# List EMIT Scenes (Keys)
list(links_dict.keys())

In [None]:
# Show links associated with Scene
links_dict['EMIT_L2A_RFL_001_20230219T202939_2305013_002']

Granules often have several assets associated with them. For example, `ECO_L2T_LSTE` has several assets:

 - Water Mask (water)
 - Cloud Mask (cloud)
 - Quality (QC)
 - Land Surface Temperature (LST)
 - Land Surface Temperature Error (LST_err)
 - Wide Band Emissivity (EmisWB)
 - Height (height)

and `EMIT_L2A_RFL` has:

 - Reflectance
 - Reflectance Uncertainty
 - Masks

We can use string matching to iterate through our dictionary and remove assets we don't want. Carefully choose some strings that can be used to identify desired assets, then use list comprehension to remove assets that don't contain the strings from the desired assets.

In [None]:
# Identify Strings
desired_assets = ['_RFL_', '_MASK_', '_LST.']

# Iterate through dictionary and filter
filtered_links_dict = {
    key: [asset for asset in assets if any(desired_asset in asset.split('/')[-1] for desired_asset in desired_assets)]
    for key, assets in links_dict.items()
}
list(filtered_links_dict.keys())

In [None]:
filtered_links_dict['EMIT_L2A_RFL_001_20230219T202939_2305013_002']

Now that we've filtered our results down to just the assets we want, we can write this dictionary out to a .json file to use later.

In [None]:
with open('../data/search_results.json', 'w') as f:
    json.dump(filtered_links_dict, f, indent=4)
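
Reading the file back in a later session is the mirror image of the write step. The sketch below is self-contained, so it writes a small hypothetical dictionary to a temporary directory first instead of using `../data/search_results.json`; the scene names and URLs are placeholders.

```python
import json
import os
import tempfile

# Hypothetical stand-in for filtered_links_dict
example = {
    "scene_1": ["https://example.com/scene_1_RFL_.nc", "https://example.com/scene_1_LST.tif"],
    "scene_2": ["https://example.com/scene_2_RFL_.nc"],
}

# Write to a temporary location so the sketch does not touch the repository data folder
path = os.path.join(tempfile.mkdtemp(), "search_results.json")
with open(path, "w") as f:
    json.dump(example, f, indent=4)

# Load the dictionary back and flatten it into a single list of URLs
with open(path) as f:
    loaded = json.load(f)
all_urls = [url for urls in loaded.values() for url in urls]
print(len(loaded), len(all_urls))
```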

## 6. Streaming or Downloading Data  

For the workshop, we will stream the data, but either method can be used, and each has trade-offs based on the internet speed, storage space, or use case. The EMIT files are very large due to the number of bands, so operations can take some time if streaming with a slower internet connection. Since the workshop is hosted in a Cloud workspace, we can stream the data directly to the workspace.

### 6.1 Streaming Data Workflow

For an example of streaming both **netCDF** and **Cloud Optimized GeoTIFF (COG) data** please see notebook 2, [Working With EMIT Reflectance and ECOSTRESS LST](02_Working_with_EMIT_Reflectance_and_ECOSTRESS_LST.ipynb).

If you plan to stream the data, you can stop here and move to the next notebook.

### 6.2 Downloading Data Workflow

To download the scenes, we can use the `earthaccess` library to authenticate then download the files.

First, log into Earthdata using the `login` function from the `earthaccess` library. The `persist=True` argument will create a local `.netrc` file if it doesn't exist, or add your login info to an existing `.netrc` file. If no Earthdata Login credentials are found in the `.netrc` you'll be prompted for them. As mentioned in section 1.2, this step is not necessary to conduct searches, but is needed to download or stream data.

To simplify the notebooks, we've included the canopy water content files in the repository so users don't need to generate them for the examples. This means that only 3 granules from our list are required to work through the notebooks in the repository. These are listed in a separate text file, `required_granules.txt`.

These can be downloaded by uncommenting and running the following cells.

>Note: If interested, users can download all of the files using the cell below and recreate all of the canopy water content files for all of the necessary scenes by following a workflow similar to the example in notebooks 2 and 3. To do this, uncomment the `file_list` object with the `search_results.txt` filepath to download all of the results rather than just what is required.

In [None]:
# Authenticate using earthaccess
earthaccess.login(persist=True)

Download the required granules. The function below writes all assets to a directory named using the `links_dict` keys.

In [None]:
# Get requests https Session using Earthdata Login Info
fs = earthaccess.get_requests_https_session()
# Retrieve granule asset ID from URL (to maintain existing naming convention)
for granule, assets in links_dict.items():
    out_dir = f"../data/{granule}"
    os.makedirs(out_dir, exist_ok=True)
    for asset in assets:
        out_fn = f"{out_dir}{os.sep}{asset.split('/')[-1]}"
        if not os.path.isfile(out_fn):
            with fs.get(asset,stream=True) as src:
                with open(out_fn,'wb') as dst:
                    for chunk in src.iter_content(chunk_size=64*1024*1024):
                        dst.write(chunk)
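
The output filename in the loop above is taken from the last segment of each asset URL. As a small standalone illustration, the sketch below does the same with `urllib.parse`, which also ignores any query string. The URL and the helper name `url_to_filename` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def url_to_filename(url):
    """Return the last path component of a URL, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)


# Hypothetical asset URL
url = "https://example.com/protected/EMIT_L2A_RFL_001_20230219T202939_2305013_002.nc?token=abc"
print(url_to_filename(url))
```

For URLs without a query string this gives the same result as `url.split('/')[-1]`.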

Congratulations, now you have downloaded concurrent data from the ECOSTRESS and EMIT instruments on the ISS.


## Contact Info:  

Email: LPDAAC@usgs.gov  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://lpdaac.usgs.gov/>  

¹Work performed under USGS contract 140G0121D0001 for NASA contract NNG14HH33I. 