<a href="https://colab.research.google.com/github/edwardoughton/satellite-image-analysis/blob/main/02_01_ggs416_26_02_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🛰️ GGS416 Satellite Image Analysis Week 2 🛰️

This week we will cover:
 * Script-based accessing of satellite imagery
 * Creating a natural color composite from Landsat data
 * Creating a false color composite from Landsat data

## Learning outcomes

By the end of this activity, students will be able to:

* Access satellite imagery programmatically using a STAC API (Planetary Computer) to search for and retrieve Landsat imagery based on location, time range, and cloud cover.
* Describe the role of spectral bands in remote sensing to identify and explain the purpose of red and near-infrared (NIR) bands in vegetation analysis.
* Compute the Normalized Difference Vegetation Index (NDVI) by applying the NDVI formula to satellite image bands, and interpret its numerical range (–1 to +1).
* Visualize raster data in Python to generate and customize natural-color and NDVI maps using Python plotting tools.
* Interpret spatial patterns in NDVI imagery, distinguishing between vegetated, urban, and water-covered areas based on NDVI values and spatial distribution.
* Analyze seasonal vegetation change to compare NDVI maps from different dates to visually assess changes in vegetation over time.
* Apply basic geospatial data processing techniques to subset imagery using a bounding box and filter scenes based on metadata (e.g., cloud cover).
* Evaluate data quality and limitations, such as recognizing how clouds, water, and built surfaces affect NDVI values and interpretation.
* Reuse and adapt code for new study areas, so you can modify bounding boxes and date ranges to generate maps for different locations and time periods.
* Communicate geospatial results effectively, such as describing NDVI maps in written form and explaining what they reveal about land cover and vegetation health.

# Downloading satellite data using scripting methods

You could just go to [USGS EarthExplorer](https://earthexplorer.usgs.gov/) to manually sift through satellite imagery data.

However, there are a range of limitations to this approach:

* Does not scale easily
* Is not really scientifically reproducible


In real workflows, we want to access satellite imagery assets from:

* Web services
* Cloud storage
* Via Application Programming Interfaces (APIs)

Here we demonstrate downloading imagery directly from the Microsoft Planetary Computer.

This is not necessarily the definitive way for you to access data, but it is a simple way to start working with the imagery you want to process without unnecessarily complicated code.

# Download a Landsat sample image

To use the Microsoft Planetary Computer you can view the [Landsat Collection 2 Level-2](https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2#overview) page to get an introductory understanding.

This approach means we can readily access and download imagery without needing:
* Access credentials (e.g., API keys, M2M passwords etc.). These can also accidentally become embedded in online content if you are not careful.
* An AWS account to allow reverse charging.

In the code below, we will retrieve a list of possible Landsat product IDs based on the provided Area of Interest (AoI) and time period.




First, we need to install our packages.

Conceptually, this pipeline is:

* `pystac-client` finds the satellite scene
* `planetary-computer` gives permission to access the data
* `odc-stac` loads the image bands
* `matplotlib` displays the result

However, we need to begin with the following because these three packages are not available by default in standard Python or Colab:

`!pip install -q pystac-client planetary-computer odc-stac`

Without this, Python would not know how to search satellite image catalogs, access the Landsat data, or load the image bands into a usable format.

In [None]:
# Install the external packages, then import everything we need
!pip install -q pystac-client planetary-computer odc-stac #install external packages

import pystac_client #import the packages into this session
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

#import a single class from a package
from pystac.extensions.eo import EOExtension as eo

These libraries allow us to search and load satellite imagery directly into Python, detailed as follows.

Via `pystac-client` we can search a STAC (SpatioTemporal Asset Catalog) for satellite images to find the correct Landsat scene by:
* location (bounding box)
* date range
* and cloud cover

Via `planetary-computer` we can access secure, signed download links to satellite imagery hosted by Microsoft’s Planetary Computer. The benefit here is that we can access Landsat data without creating user accounts or logging in.

Via `odc-stac` we can load the satellite image bands (red, green, blue, infrared, etc.) from STAC into Python as a multi-band dataset.
This converts remote satellite files into arrays we can analyze and visualize.

You should be familiar with `matplotlib.pyplot` from the previous class, which we use to display images and plots (for example, natural-color images and NDVI maps).

Additionally, the `EOExtension` from `pystac.extensions.eo` allows us to read Electro-Optical (EO) metadata such as cloud cover from each satellite scene so we can choose the clearest image.



**Next**, we will use `pystac_client.Client.open()` to open a connection to a STAC API (SpatioTemporal Asset Catalog).

A STAC catalog is like an online database of satellite images that you can search by:

* location (bounding box)
* date range
* cloud cover
* satellite type (Landsat, etc.)

We will provide the web address of Microsoft’s Planetary Computer STAC catalog, which hosts metadata for Landsat and other datasets, e.g., `"https://planetarycomputer.microsoft.com/api/stac/v1"`.

Additionally, we will pass `modifier=planetary_computer.sign_inplace`, which automatically adds temporary access permissions (signatures) to the image download links so you can download the satellite files without creating an account or logging in.

In [None]:
# Use Microsoft Planetary Computer to search images
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

#open a connection to the SpatioTemporal Asset Catalog (STAC) API
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
catalog

Next, we need to set a bounding box and time period, so we can constrain our image search by location/time.

We will set Northern Virginia as the bounding box of interest (in EPSG:4326, ordered as minimum longitude, minimum latitude, maximum longitude, maximum latitude):

 `bbox_of_interest = [-77.50, 38.60, -77, 39.00]`

And also the time period of interest (in year-month-day format):

`time_of_interest = "2025-08-01/2025-09-28"`

We can then pass these to our `catalog.search()` function which searches the Landsat image catalog for scenes that:

* are from the Landsat Collection 2 Level-2 dataset (`collections=["landsat-c2-l2"]`)
* cover my area of interest (`bbox=bbox_of_interest`)
* were taken during my chosen time period (`datetime=time_of_interest`)
* and have less than 10% cloud cover (`query={"eo:cloud_cover": {"lt": 10}}`)
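
Before running a search, it can help to build and check these two inputs programmatically. The following is a minimal sketch, not part of the Planetary Computer tooling: the helper names `month_interval` and `check_bbox` are hypothetical, and the bounding box check assumes the area does not cross the antimeridian.

```python
import calendar


def month_interval(year: int, month: int) -> str:
    """Return a 'YYYY-MM-DD/YYYY-MM-DD' string covering one calendar month."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return f"{year:04d}-{month:02d}-01/{year:04d}-{month:02d}-{last_day:02d}"


def check_bbox(bbox) -> bool:
    """Check [min_lon, min_lat, max_lon, max_lat] ordering and EPSG:4326 ranges."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (-180 <= min_lon < max_lon <= 180) and (-90 <= min_lat < max_lat <= 90)


print(month_interval(2024, 2))                   # 2024-02-01/2024-02-29
print(check_bbox([-77.50, 38.60, -77, 39.00]))   # True
print(check_bbox([-77, 38.60, -77.50, 39.00]))   # False (longitudes swapped)
```

The string returned by `month_interval` has the same start/end form as `time_of_interest`, so it could be passed to `datetime=` in the search.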



In [None]:
# Specify our catalog search
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bbox_of_interest = [-77.50, 38.60, -77, 39.00] #bbox coordinates
time_of_interest = "2025-08-01/2025-09-28" #time period of interest
search = catalog.search( #search catalog
    collections=["landsat-c2-l2"],    #use landsat
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}}, #only get <10% cloud cover
)
#print our search object
print(search)

To actually start viewing our search results, we need to unpack the search object.

To do that, you can use a for loop, e.g. `for i in my_data_structure`:


In [None]:
# Loop simple example
my_list = [0,1,2,3]
for i in my_list:
  print(i)

In [None]:
# Unpack our catalog search
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bbox_of_interest = [-77.50, 38.60, -77, 39.00] #bbox coordinates
time_of_interest = "2025-08-01/2025-09-28" #time period of interest
search = catalog.search( #search catalog
    collections=["landsat-c2-l2"],    #use landsat
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}}, #only get <10% cloud cover
)
#iterate over our search object printing the contents
items = search.item_collection() #converts the contents to an item collection
for item in items:
  print(item) #this prints our landsat item IDs

## Exercise

Find how many images are <50% cloud cover in January 2024, versus July 2024.

You need to figure out how to change the time period and cloud cover parameters.


**Next**, using this code, we are able to select the image with the lowest cloud cover by using a lambda function, like so:

`selected_item = min(items, key=lambda item: eo.ext(item).cloud_cover)`

This includes the following:
* `items` - This is a list of Landsat scenes returned by the search.
* `min(items, key=...)` - Here we find the one item in the list with the smallest value according to the rule in `key=`.
* `lambda item: eo.ext(item).cloud_cover` - This is the rule used for comparison. So for each item, we look up the cloud cover percentage from the metadata.
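
The same `min(..., key=...)` pattern can be tried on plain Python data first. This is a small sketch in which the scene IDs and cloud cover values are made up for illustration; dictionaries stand in for the STAC items.

```python
# Hypothetical scenes: IDs and cloud cover percentages are invented for this example
scenes = [
    {"id": "scene_a", "cloud_cover": 7.2},
    {"id": "scene_b", "cloud_cover": 0.4},
    {"id": "scene_c", "cloud_cover": 3.9},
]

# The lambda tells min() which value to compare for each scene
clearest = min(scenes, key=lambda scene: scene["cloud_cover"])

print(clearest["id"], clearest["cloud_cover"])  # scene_b 0.4
```

Note that `min()` does not sort the list. It walks through the items once and keeps the one with the smallest key value.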


In [None]:
# We can access imagery and sort by cloud cover!
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bbox_of_interest = [-77.50, 38.60, -77, 39.00] #bbox coordinates
time_of_interest = "2025-08-01/2025-09-28" #time period of interest
search = catalog.search( #search catalog
    collections=["landsat-c2-l2"],    #use landsat
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}}, #only get <10% cloud cover
)
#retrieve the search results
items = search.item_collection() #converts the contents to an item collection

#min() compares the cloud cover of every item and returns the lowest one
selected_item = min(items, key=lambda item: eo.ext(item).cloud_cover)

#print the results below, inserting the metadata into the f-strings
#formatted string literals (f-strings) allow concise/readable printing
print(
    f"Choosing {selected_item.id} from {selected_item.datetime.date()}"
    + f" with {selected_item.properties['eo:cloud_cover']}% cloud cover"
)

You can view the metadata to see all assets. Each Electro-Optical (EO) band is a separate asset we can utilize.

As the data structure is a dictionary, we iterate over it using a for loop with two variables, one for the key and one for the value, hence `for key, value... in my_data_structure:`.

The `.items()` method returns each dictionary entry as a key–value pair, allowing both to be accessed on each iteration. A simple example is below:


In [None]:
# simple dictionary loop example
my_dictionary = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

for key, value in my_dictionary.items():
  print(key, value)

In [None]:
# We can view image metadata for all layers
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bbox_of_interest = [-77.50, 38.60, -77, 39.00]
time_of_interest = "2025-08-01/2025-09-28"
search = catalog.search(
    collections=["landsat-c2-l2"],
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},
)

items = search.item_collection()
selected_item = min(items, key=lambda item: eo.ext(item).cloud_cover)

#now print
for key, asset in selected_item.assets.items():
  print(key, asset.title)

To start working with the imagery, we need to load the red, green, and blue bands into our working memory. We do this using `odc-stac`, which loads them into an xarray dataset called `data`.

`xarray` is a Python library for working with labeled, multi-dimensional data (like satellite images, time series, etc.).

Instead of using plain arrays where you only access data by index (e.g., `data[0, 10, 20]`), xarray lets you access data by variable, dimension, and coordinate names (e.g., `data["red"]`).

**Next**, we can start by specifying our bands of interest, as follows:

`bands_of_interest = ["nir08", "red", "green", "blue", "qa_pixel", "lwir11"]`

And then we can load the specified bands from the selected Landsat scene, crop them to my area of interest, and extract the single image from the time dimension so we can work with it directly, as follows:

`data = odc.stac.stac_load([selected_item], bands=bands_of_interest, bbox=bbox_of_interest).isel(time=0)`

Where:

* `odc.stac.stac_load()` reads the metadata from the STAC item, downloads the requested image bands, and loads them into an xarray dataset (which is a labeled, multi-band data structure).
* `[selected_item]` is a list containing the chosen Landsat scene we previously selected.
* `bands=bands_of_interest` tells the function which bands to load.
* `bbox=bbox_of_interest` tells the function where to crop the image.
* `.isel(time=0)` is applied to the loaded dataset, which has a time dimension. Because we passed a single scene, we select the first (and only) time slice by its index.
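
To see what `.isel(time=0)` does without downloading anything, here is a minimal sketch using a small synthetic dataset in place of a real Landsat scene. The pixel values are made up, and the `time`, `y`, `x` dimension names are assumptions chosen to resemble a loaded scene.

```python
import numpy as np
import xarray as xr

# Synthetic dataset: 1 time step, 2 x 3 pixels, two bands (values are invented)
ds = xr.Dataset(
    {
        "red": (("time", "y", "x"), np.arange(6).reshape(1, 2, 3)),
        "nir08": (("time", "y", "x"), np.arange(6, 12).reshape(1, 2, 3)),
    }
)

# Select the first time step by integer position; the time dimension is dropped
single = ds.isel(time=0)

print(ds["red"].dims)      # ('time', 'y', 'x')
print(single["red"].dims)  # ('y', 'x')
```

After the selection each band is a plain two-dimensional image, which is the shape the plotting code expects.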

In [None]:
# Load the bands of interest for the AOI
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bbox_of_interest = [-77.50, 38.60, -77, 39.00]
time_of_interest = "2025-08-01/2025-09-28"
search = catalog.search(
    collections=["landsat-c2-l2"],
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},
)

items = search.item_collection()
selected_item = min(items, key=lambda item: eo.ext(item).cloud_cover)

bands_of_interest = ["nir08", "red", "green", "blue", "qa_pixel", "lwir11"]
data = odc.stac.stac_load(
    [selected_item], bands=bands_of_interest, bbox=bbox_of_interest
).isel(time=0)
data

We can now convert this xarray Dataset to a DataArray and plot the RGB image.

We will do that using `matplotlib`, which you should remember from last week.
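
The Dataset-to-DataArray conversion can be tried on synthetic data first. This sketch uses made-up 2 x 2 pixel values and assumes only that `xarray` and `numpy` are installed (both are available in Colab).

```python
import numpy as np
import xarray as xr

# Synthetic single-scene dataset with three bands (values are invented)
ds = xr.Dataset(
    {
        "red": (("y", "x"), np.full((2, 2), 10)),
        "green": (("y", "x"), np.full((2, 2), 20)),
        "blue": (("y", "x"), np.full((2, 2), 30)),
    }
)

# Stack the three bands along a new leading dimension called "variable"
rgb = ds[["red", "green", "blue"]].to_array()

print(rgb.dims)   # ('variable', 'y', 'x')
print(rgb.shape)  # (3, 2, 2)
```

The new `variable` dimension has length three, which is what lets `plot.imshow` treat the stacked array as an RGB image.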


In [None]:
# Render a natural color image of the AOI
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bbox_of_interest = [-77.50, 38.60, -77, 39.00]
time_of_interest = "2025-08-01/2025-09-28"
search = catalog.search(
    collections=["landsat-c2-l2"],
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},
)

items = search.item_collection()
selected_item = min(items, key=lambda item: eo.ext(item).cloud_cover)

bands_of_interest = ["nir08", "red", "green", "blue", "qa_pixel", "lwir11"]
data = odc.stac.stac_load(
    [selected_item], bands=bands_of_interest, bbox=bbox_of_interest
).isel(time=0)

## plot using matplotlib
#setup our figure and axis objects
fig, ax = plt.subplots(figsize=(10, 10))
#put the red, green and blue layers in a new xarray
my_xarray = data[["red", "green", "blue"]].to_array()
#plot our data
my_xarray.plot.imshow(robust=True, ax=ax)
#add a title
ax.set_title("Natural Color, Northern Virginia, VA")
#set output filename
output_filename = "natural_color_northern_virginia.png"
#save the figure to a .png file
plt.savefig(output_filename, dpi=300, bbox_inches="tight")


# Exercise

Make a true color composite which manipulates the bounding box to only include half the area around Fairfax (you do not need to be exact, just show you can subset the image).

Tip, you could pass a new `bbox` list to `data = odc.stac.stac_load(...)` in a new cell, which might be more computationally efficient (and then continue to manipulate the following code accordingly).

# False color composite images

A false color composite is a satellite image with non-natural coloration. By that we mean that the colors do not represent what the human eye would normally see.

This is achieved by going beyond only the visible light bands which the human eye can see (e.g., red, green, and blue), to utilize non-visible wavelengths, such as near-infrared (NIR) or shortwave infrared (SWIR).

This enables scientists to highlight features that are difficult or impossible to see in natural color images.

**Next**, we will develop a Normalized Difference Vegetation Index (NDVI) for Northern Virginia during late summer 2025. NDVI is calculated from the red and near-infrared (NIR) bands of the Landsat satellite image using the formula:




$$
\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}
$$


NDVI values range from –1 to +1 and indicate the presence and condition of vegetation.

Higher values (yellow to green colors when using the viridis colorscale) represent areas with healthy, dense vegetation such as forests and croplands.

Lower values (dark colors when using the viridis colorscale) correspond to urban areas, bare soil, or water, where little or no vegetation is present.

The plot below highlights spatial differences in vegetation across the region and provides a quantitative way to compare plant cover and health over the area of interest.
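
As a worked example of the formula, the sketch below computes NDVI for three single pixels. The reflectance values are illustrative assumptions chosen to be typical of each land cover type, not measurements from the Landsat scene.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / total


# (NIR, Red) reflectance pairs: assumed values for illustration only
pixels = {
    "dense vegetation": (0.45, 0.05),
    "bare soil / urban": (0.25, 0.20),
    "water": (0.02, 0.05),
}

for name, (nir, red) in pixels.items():
    print(f"{name}: {ndvi(nir, red):+.2f}")
# dense vegetation: +0.80
# bare soil / urban: +0.11
# water: -0.43
```

Vegetation reflects strongly in the NIR and absorbs red light, so the numerator is large and positive. Water absorbs NIR, which pushes the value below zero.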

In [None]:
# Render an NDVI image of the AOI
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bbox_of_interest = [-77.50, 38.60, -77, 39.00]
time_of_interest = "2025-08-01/2025-09-28"
search = catalog.search(
    collections=["landsat-c2-l2"],
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},
)

items = search.item_collection()
selected_item = min(items, key=lambda item: eo.ext(item).cloud_cover)

bands_of_interest = ["nir08", "red", "green", "blue", "qa_pixel", "lwir11"]
data = odc.stac.stac_load(
    [selected_item], bands=bands_of_interest, bbox=bbox_of_interest
).isel(time=0)

red = data["red"].astype("float")
nir = data["nir08"].astype("float")
ndvi = (nir - red) / (nir + red)

## plot using matplotlib
#setup our figure and axis objects
fig, ax = plt.subplots(figsize=(10, 10))
#plot our data
ndvi.plot.imshow(robust=True, ax=ax)
#add a title
ax.set_title("NDVI, Northern Virginia, VA")
#set output filename
output_filename = "ndvi_false_color_northern_virginia.png"
#save the figure to a .png file
plt.savefig(output_filename, dpi=300, bbox_inches="tight")

## Validation

Our colorbar spans roughly –0.4 to +0.45, which is a realistic NDVI range:

* Water (rivers/lakes) has light/blue tones (negative NDVI)
* Urban and built-up areas have a near 0 value (whitish/light tones)
* Vegetated areas (forests, parks, fields) have higher positive values (reds)

This pattern is what we would normally expect.
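
One caveat when interpreting the numbers: the cells in this notebook compute NDVI directly on the stored integer values. USGS documents that Landsat Collection 2 Level-2 surface reflectance bands are stored as scaled integers, with a scale factor of 0.0000275 and an offset of –0.2. The sketch below, using made-up pixel values, shows that NDVI computed on the raw integers is compressed compared with NDVI computed on the rescaled reflectance.

```python
SCALE = 0.0000275  # Collection 2 Level-2 surface reflectance scale factor
OFFSET = -0.2      # Collection 2 Level-2 surface reflectance offset


def to_reflectance(dn: float) -> float:
    """Convert a stored integer value to surface reflectance."""
    return dn * SCALE + OFFSET


def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)


# Hypothetical stored values for one vegetated pixel
nir_dn, red_dn = 20000, 9000

raw_ndvi = ndvi(nir_dn, red_dn)
scaled_ndvi = ndvi(to_reflectance(nir_dn), to_reflectance(red_dn))

print(f"NDVI on raw integers: {raw_ndvi:.2f}")     # 0.38
print(f"NDVI on reflectance:  {scaled_ndvi:.2f}")  # 0.76
```

The spatial pattern is the same either way, but the absolute values differ. In the notebook, the equivalent step would be something like `data["red"] * 0.0000275 - 0.2` before computing NDVI, which is offered here as a suggestion rather than part of the original workflow.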



## Plotting

You should experiment with changing different aspects of the map, e.g., size, color etc.

Here, we can start by changing the colorscale to viridis.

In [None]:
# Change the colorscale to viridis
!pip install -q pystac-client planetary-computer odc-stac

import pystac_client
import planetary_computer
import odc.stac
import matplotlib.pyplot as plt

from pystac.extensions.eo import EOExtension as eo

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bbox_of_interest = [-77.50, 38.60, -77, 39.00]
time_of_interest = "2025-08-01/2025-09-28"
search = catalog.search(
    collections=["landsat-c2-l2"],
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},
)

items = search.item_collection()
selected_item = min(items, key=lambda item: eo.ext(item).cloud_cover)

bands_of_interest = ["nir08", "red", "green", "blue", "qa_pixel", "lwir11"]
data = odc.stac.stac_load(
    [selected_item], bands=bands_of_interest, bbox=bbox_of_interest
).isel(time=0)

red = data["red"].astype("float")
nir = data["nir08"].astype("float")
ndvi = (nir - red) / (nir + red)

## plot using matplotlib
#setup our figure and axis objects
fig, ax = plt.subplots(figsize=(10, 10))
#plot our data
ndvi.plot.imshow(robust=True, ax=ax, cmap="viridis")
#add a title
ax.set_title("NDVI, Northern Virginia, VA")
#set output filename
output_filename = "ndvi_viridis_false_color_northern_virginia.png"
#save the figure to a .png file
plt.savefig(output_filename, dpi=300, bbox_inches="tight")


# Exercise

* Identify three areas with high NDVI and describe what land cover they likely represent.
* Identify three areas with low NDVI and describe what land cover they likely represent.
* Explain why rivers and lakes have negative or near-zero NDVI values.
* Modify the bounding box and date range for a new location (e.g., Vermont) to create three new maps over fall 2025 which capture visibly changing NDVI.
