# Notebook 5: Accessing satellite imagery of OU campus

In previous notebooks we have begun to explore vector and raster data related to land use on the OU campus. Now we are going to learn about using Python to access and work with satellite imagery.

In this notebook we will:

- **acquire satellite image data** (both Sentinel-2 and Landsat) from **Microsoft's Planetary Computer**,
- build a basic understanding of the **structure and data content** of these satellite-created images,
- explore basic **viewing and manipulation of satellite imagery**.

In [None]:
# Need to do some date math and need to work with file paths
from datetime import timedelta
from pathlib import Path

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd

In [None]:
from IPython.display import Image
#from PIL import Image as PILImage

In [None]:
%matplotlib inline

The general steps we'll use to pull satellite data are:

   1. Establish a connection to the Planetary Computer's STAC API using the `planetary_computer` and `pystac_client` Python packages.

   2. Query the STAC API for recent scenes that capture the OU campus. The main walkthrough uses only Sentinel-2 L2A data; Landsat appears in the challenge at the end.

   3. Select one image that is recent and has low cloud cover.

Using the Planetary Computer's STAC browser, I searched for images containing the OU campus and picked one from April 1, 2024 that looked relatively cloud-free. After selecting an image item, you can click on the curly brackets icon to get a Python code snippet for accessing this item via the PySTAC API. Here's the snippet:

In [None]:
import pystac
import planetary_computer
import rioxarray

item_url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20240401T162831_R083_T17TLH_20240402T014917"

# Load the individual item metadata and sign the assets
item = pystac.Item.from_file(item_url)

signed_item = planetary_computer.sign(item)

# Open one of the data assets (other asset keys to use: 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B11', 'B12', 'B8A', 'SCL', 'WVP', 'visual', 'preview')
asset_href = signed_item.assets["AOT"].href
ds = rioxarray.open_rasterio(asset_href)
ds




There's much to be learned from a careful look at this code. First, the imports.

- `pystac` - we need this to work with MPC's STAC API
- `planetary_computer` - even though MPC allows free access via the STAC API, we need this library in order to *sign* the item we are trying to retrieve. As we'll see later, this results in a long token string getting appended to the item URL. More on signing later.
- `rioxarray` - as we saw in the introduction to raster data, rioxarray is needed to open the actual raster image file and tuck it into an xarray `DataArray`.

Now for the `item_url`. It looks like this:

    https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/
    S2A_MSIL2A_20240401T162831_R083_T17TLH_20240402T014917
    
We see from the URL that:

- we are going to be using the STAC API
- this image is from the Sentinel-2 mission
- this is Level 2A data
- this particular image is one of a larger collection of image items
- the particular image has a unique id of `S2A_MSIL2A_20240401T162831_R083_T17TLH_20240402T014917`

Notice that the *datatake sensing time* (a date and time) is embedded in the `id`. The `id` naming conventions are explained at [https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/naming-convention](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/naming-convention) which also includes links to detailed product specification pages.

The `S2A` is the *mission id* and the `MSIL2A` is:

> MSIL2A denotes the Level-2A product level

The `R083` is the *relative orbit number*, the `T17TLH` is a *tile number field*, and the second datetime is:

> The second date is the \<Product Discriminator> field, which is 15 characters in length, and is used to distinguish between different end user products from the same datatake. Depending on the instance, the time in this field can be earlier or slightly later than the datatake sensing time.
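
To make the naming convention concrete, here is a minimal sketch that splits this particular `id` into its fields using only the standard library. It assumes the six-field, underscore-separated layout of the `id` shown above; the variable names are ours, not part of any official API.

```python
from datetime import datetime

item_id = "S2A_MSIL2A_20240401T162831_R083_T17TLH_20240402T014917"

# The id shown above has six underscore-separated fields
mission, product_level, sensing, rel_orbit, tile, discriminator = item_id.split("_")

# Both timestamps use the compact form YYYYMMDDTHHMMSS
sensing_time = datetime.strptime(sensing, "%Y%m%dT%H%M%S")

print(mission, product_level, rel_orbit, tile)
print(sensing_time.isoformat())
```

The parsed `sensing_time` is an ordinary `datetime`, so it can be compared against a search date range directly.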


Grab the image item and sign it.

```python
item = pystac.Item.from_file(item_url)
signed_item = planetary_computer.sign(item)
```

We are just passing the URL for the item we want, using `pystac` to fetch it and `planetary_computer` to sign it. The unsigned item's metadata is readable (we just loaded it that way), but its asset URLs need the signing token before the data files themselves can be read. Let's explore this PySTAC `Item` object.

In [None]:
print(signed_item)

What about the attributes of an `Item`?

In [None]:
[att for att in dir(signed_item) if not att.startswith('_')]

Let's check out a few basic things.

In [None]:
print(f'The item is id {signed_item.id}')
print(f'The bounding box for this item is {signed_item.bbox}')

In [None]:
signed_item.self_href

The `properties` property is a dictionary containing quite a bit of information.

In [None]:
signed_item.properties

The `'proj:epsg': 32617` corresponds to the WGS 84 / UTM zone 17N coordinate reference system. See [https://epsg.io/32617](https://epsg.io/32617). The units are in meters. 
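
As a quick sanity check on that EPSG code, the sketch below derives the northern-hemisphere WGS 84 / UTM EPSG code from a longitude. The longitude used is an assumed, approximate value chosen only for illustration, and the helper name `utm_epsg_north` is ours. The formula ignores the special-case zones around Norway and Svalbard and expects a longitude in the range -180 up to (but not including) 180.

```python
import math

def utm_epsg_north(lon_deg):
    """EPSG code of the northern-hemisphere WGS 84 / UTM zone containing lon_deg."""
    # UTM zones are 6 degrees wide and numbered from 1 starting at 180 degrees West
    zone = math.floor((lon_deg + 180) / 6) + 1
    # Northern-hemisphere WGS 84 / UTM codes are 32600 + zone number
    return 32600 + zone

approx_lon = -83.2  # assumed longitude for illustration only
print(utm_epsg_north(approx_lon))
```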

The actual data we are after lives in the `assets` dictionary. Before diving in, let's see what assets are available.

In [None]:
for asset_key, asset in signed_item.assets.items():
    print(f"{asset_key:<25} - {asset.title}")

There are visible bands (red, green, and blue), as well as a number of other spectral ranges and a few algorithmic bands. The Sentinel-2 [mission guide](https://docs.sentinel-hub.com/api/latest/data/sentinel-2-l2a/#available-bands-and-data) has more details about what these bands are and how to use them.

The SCL, AOT (haze), and WVP are considered *Quality Assurance* bands and can be useful in filtering out low quality images. See [https://docs.digitalearthafrica.org/en/latest/data_specs/Sentinel-2_Level-2A_specs.html](https://docs.digitalearthafrica.org/en/latest/data_specs/Sentinel-2_Level-2A_specs.html) for a nice summary of these.

As we'll soon see, the `visual` band contains the red, green, and blue bands (not surprising). The `rendered_preview` is a png file. So, that is pretty straightforward to view using `IPython.display.Image`.

In [None]:
Image(url=signed_item.assets['rendered_preview'].href)

Obviously, we are only interested in a small portion of this image. In a future notebook, we'll learn how to *clip* or *crop* images using **rioxarray**. 

Also, we manually found our area of interest and obtained the code snippet for acquiring that image based on a specific URL. Now, let's learn how to do this programmatically by finding images that intersect our area of interest.

# Code-driven search for images with the STAC API

We used the Planetary Computer's Explore feature to find an image of interest. Now, we'll use a bounding box along with a date range to find all the images available for that location at that time. For the bounding box, we'll use the bounds we found for the OU campus polygon back in the **ou_land_use_03_crs.ipynb** notebook.


In [None]:
ou_boundary_file = Path('../data', 'ou_boundary.geojson')
ou_boundary_gdf = gpd.read_file(ou_boundary_file)
bbox = ou_boundary_gdf.total_bounds
bbox

We can pass a date range to the Planetary Computer as a string such as `2024-02-29/2024-03-30`. Let's create a function in which we can pass an end date and the number of days back to include in the search.

In [None]:
# get our date range to search, and format correctly for query
def get_date_range(end_date, time_buffer_days=10):
    """Build a date range string for a Planetary Computer search.

    The range ends at end_date and starts time_buffer_days days earlier.

    Returns a string of the form 'YYYY-MM-DD/YYYY-MM-DD'."""
    datetime_format = "%Y-%m-%d"
    range_start = pd.to_datetime(end_date) - timedelta(days=time_buffer_days)
    date_range = f"{range_start.strftime(datetime_format)}/{pd.to_datetime(end_date).strftime(datetime_format)}"

    return date_range

In [None]:
target_date = "2024-04-03"

In [None]:
target_date_range = get_date_range(target_date, time_buffer_days=10)
target_date_range
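
As a cross-check on the string we expect back, here is a minimal standard-library sketch of the same calculation. The function name `date_range_stdlib` is ours and is only meant for comparison with `get_date_range`; it assumes the end date is given in `YYYY-MM-DD` form.

```python
from datetime import date, timedelta

def date_range_stdlib(end_date, time_buffer_days=10):
    """Return 'start/end' where start is time_buffer_days before end_date."""
    end = date.fromisoformat(end_date)
    start = end - timedelta(days=time_buffer_days)
    return f"{start.isoformat()}/{end.isoformat()}"

print(date_range_stdlib("2024-04-03"))
```

Both endpoints are part of the string, so the scene from April 1 that we picked by hand falls inside this window.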

This next step essentially "signs in" to the MPC catalog of data so that we can search and acquire the data we are interested in.

In [None]:
# Establish a connection to the STAC API
from pystac_client import Client

In [None]:
catalog = Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace
)

catalog

To search the catalog we will supply three different types of criteria:

- which collections to search (e.g. "sentinel-2-l2a")
- a bounding box of coordinates
- a date range

Any item in the specified collection(s) that intersects the bounding box and was acquired within the date range will be returned.
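
To build some intuition for what "intersects the bounding box" means, here is a simplified sketch of an overlap test between two axis-aligned boxes given as `(min_x, min_y, max_x, max_y)`. This is only an illustration: the actual search runs on the server and may test against each item's footprint geometry rather than its bounding box. All coordinates below are hypothetical.

```python
def bboxes_intersect(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    # Boxes overlap only if their extents overlap on both axes
    return (a_minx <= b_maxx and b_minx <= a_maxx
            and a_miny <= b_maxy and b_miny <= a_maxy)

search_box = (-83.22, 42.66, -83.19, 42.68)  # hypothetical small search area
scene_box = (-84.5, 42.3, -83.1, 43.3)       # hypothetical scene covering it
far_box = (-80.0, 40.0, -79.0, 41.0)         # hypothetical scene elsewhere

print(bboxes_intersect(search_box, scene_box))
print(bboxes_intersect(search_box, far_box))
```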

In [None]:
#help(Client.search)

In [None]:
# search the planetary computer sentinel-2-l2a collection
search = catalog.search(
    collections=["sentinel-2-l2a"], bbox=bbox, datetime=target_date_range
)

# see how many items were returned
items = search.item_collection()
print(f'{len(items)} items found')
print(f'items is a {type(items)}')
print(f'items[0] is a {type(items[0])}')

Great, it worked. By looking at the `id` values, we can see the specific Sentinel-2 images we found.

In [None]:
for item in items:
    print(item.id)

Look at the properties for one of the items.

In [None]:
# Sentinel-2 item
items[0].properties

We can use the `eo:cloud_cover` property (from the `eo` extension) to pick the item with the least cloud cover.

In [None]:
selected_item = min(items, key=lambda item: item.properties["eo:cloud_cover"])
print(selected_item)
print(selected_item.properties["eo:cloud_cover"])
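
If we wanted a ranked list rather than a single winner, the same key function works with `sorted()`. The sketch below uses plain dictionaries as stand-ins for STAC items so that it runs without a network connection; the ids, cloud cover values, and the 20 percent threshold are all made up for illustration.

```python
# Stand-ins for STAC items: only the fields we need, with made-up values
mock_items = [
    {"id": "scene_a", "properties": {"eo:cloud_cover": 42.5}},
    {"id": "scene_b", "properties": {"eo:cloud_cover": 3.1}},
    {"id": "scene_c", "properties": {"eo:cloud_cover": 17.8}},
]

def cloud_cover(it):
    """Key function: the cloud cover percentage of one item."""
    return it["properties"]["eo:cloud_cover"]

# Least cloudy first
ranked = sorted(mock_items, key=cloud_cover)

# Keep only scenes under an arbitrary threshold
clear_enough = [it["id"] for it in ranked if cloud_cover(it) < 20]

print([it["id"] for it in ranked])
print(clear_enough)
```

With real `pystac` items, the key would read `item.properties["eo:cloud_cover"]` instead of indexing a dictionary.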

### Previewing the item imagery

As we saw earlier, each STAC item has one or more [Assets](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#asset-object), which include links to the actual files.

Let's take a look at the `rendered_preview` asset as it contains the viewable image that we are interested in right now. You might wonder what all those other spectral bands are used for. We'll revisit that topic later.

In [None]:
selected_item.assets["rendered_preview"].to_dict()

Ah, so the `href` key contains the URL of the actual image file (a png file). Let's take a look!

In [None]:
Image(url=selected_item.assets["rendered_preview"].href, width=500)

You can see the clouds near the top of the image. Hopefully, they aren't over the OU campus. 

Now let's take a look at the `visual` asset. This is raw data and is stored in what is known as a [cloud optimized GeoTIFF (COG)](https://www.cogeo.org/) in Azure Blob Storage.



In [None]:
selected_item.assets["visual"].to_dict()

A few things to note:

- This is the 'True color image', so we'd expect to find red, green and blue bands included in it.
- Yep, the three included bands are B04, B03, and B02 (red, green and blue).
- The raster is 10980 rows by 10980 columns.
- The bounding box is expressed in a projected CRS set of coordinates (i.e. those aren't lat-lon values). Earlier we saw that this item uses EPSG:32617 (WGS84 UTM Zone 17N).
- There is a URL from which we can grab the TIFF file.

Obviously, this image contains much more data than we actually need. How can we select just the part of the raster corresponding to the OU campus? We'll address this in the next section on raster file manipulation.

Before ending this notebook, let's save the `visual` asset as a GeoTIFF to disk. 

In [None]:
import requests

In [None]:
signed_href = signed_item.assets['visual'].href

We'll keep the original filename and we can extract it from the URL by splitting on the `'?'` and then using Pathlib's `name` attribute.

In [None]:
base_url = signed_item.assets['visual'].href.split('?')[0]
print(base_url)
filename = Path(base_url).name
print(filename)
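
An alternative to splitting on `'?'` is to let `urllib.parse` separate the path from the query string. This is a sketch on a made-up URL; the host, path, and token below are hypothetical and only mimic the general shape of a signed link.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical signed URL: a file path followed by a token in the query string
example_href = (
    "https://example.blob.core.windows.net/container/path/to/"
    "scene_TCI_10m.tif?st=2024-04-03&sig=abc123"
)

# urlsplit separates the path from the query, so the token never reaches the name
example_filename = PurePosixPath(urlsplit(example_href).path).name
print(example_filename)
```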

Ok, let's save it in the `../data` subfolder.


In [None]:
# use requests to grab the file and write it out

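redownload = False  # Set to True for the first download; avoids re-downloading afterwards.

if redownload:
    response = requests.get(signed_href)
    # Signed URLs expire, so fail loudly instead of writing an error page to disk
    response.raise_for_status()
    with open(Path('../data', filename), "wb") as f:
        f.write(response.content)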

Now, this is a big file (more than 130 MB). That's not surprising, as it's approximately a 10k by 10k matrix with 3 bands. Later we will learn how to find just the part of large TIFF files that we need.
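
A little arithmetic shows where that size comes from. The sketch below assumes each band value is stored as a single byte (8-bit), which is our assumption about the `visual` asset rather than something we have verified here.

```python
rows, cols, bands = 10980, 10980, 3
bytes_per_value = 1  # assumption: 8-bit values in the true color image

uncompressed_bytes = rows * cols * bands * bytes_per_value
print(f"{uncompressed_bytes / 1e6:.0f} MB uncompressed")
```

Under that assumption, the raw pixel data is considerably larger than the file on disk, which is consistent with the GeoTIFF being stored compressed.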

### Using rioxarray to load a satellite image

One of the bands in the Sentinel-2 COGs is B08, the near infrared band. This band is used in computing things like the NDVI (normalized difference vegetation index), a commonly used vegetation index for classifying raster data.
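
For reference, the vegetation index mentioned above is computed per pixel as (NIR - Red) / (NIR + Red), where for Sentinel-2 the near infrared band is B08 and the red band is B04. Here is a minimal sketch on single values; the reflectance numbers are hypothetical and the function is ours, not part of **rioxarray**.

```python
def ndvi(nir, red):
    """Normalized difference of near infrared and red for one pixel."""
    total = nir + red
    if total == 0:
        # Avoid dividing by zero for empty (no data) pixels
        return float("nan")
    return (nir - red) / total

# Hypothetical reflectance values for two kinds of surface
vegetation = ndvi(nir=0.50, red=0.08)
bare_soil = ndvi(nir=0.30, red=0.25)

print(round(vegetation, 3), round(bare_soil, 3))
```

Healthy vegetation reflects strongly in the near infrared and absorbs red light, so its index is closer to 1 than that of bare ground.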

In [None]:
nir_href = signed_item.assets['B08'].href

In [None]:
import rioxarray

In [None]:
nir_da = rioxarray.open_rasterio(nir_href)
nir_da

In [None]:
base_url = signed_item.assets['B08'].href.split('?')[0]
print(base_url)
filename = Path(base_url).name
print(filename)

Now we can save it to disk using the `rio.to_raster()` method that **rioxarray** adds to the `DataArray`.

In [None]:
#nir_da.rio.to_raster(Path('../data/', filename))

Let's plot it to see what it looks like. However, this is a huge raster and may very well crash our Jupyter kernel. So, let's just plot a subset of it.

In [None]:
nir_da[0, 3000:3100, 3000:3100].plot()

### Landsat challenge

Repeat the above search for satellite images, but only search the Landsat Level 2 collection ("landsat-c2-l2"). Some questions and tasks to attempt:

- Does the Landsat item contain a visual asset like Sentinel?
- Plot the rendered preview
- What is the resolution of Landsat images and what property tells us this?

The following two resources have some useful information but are not critical to completing the task.

- [https://bitsofanalytics.org/posts/algaebloom-part3/#what-about-those-landsat-images](https://bitsofanalytics.org/posts/algaebloom-part3/#what-about-those-landsat-images)
- [https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/)



### Answer

In [None]:
catalog = Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace
)

search = catalog.search(
    collections=["landsat-c2-l2"], bbox=bbox, datetime=target_date_range
)

# see how many items were returned
items_landsat = search.item_collection()
print(f'{len(items_landsat)} items found')
print(f'items_landsat is a {type(items_landsat)}')
print(f'items_landsat[0] is a {type(items_landsat[0])}')

Great, it worked. By looking at the `id` values, we can see the specific Landsat images we found.

In [None]:
for item in items_landsat:
    print(item.id)

Look at the properties for one of the items.

In [None]:
# Landsat item (`item` is the last one from the loop above)
item.properties

Let's look at the `assets`.


In [None]:
for asset_key, asset in item.assets.items():
    print(f"{asset_key:<25} - {asset.title}")

Yep, some of the assets are different, though some are shared. 

- Sentinel-2 contains a `visual` asset that combines the red, green, and blue bands.
- Landsat has individual red, green, and blue bands, but not a convenient `visual` asset.
- The `gsd` property of the Landsat item indicates that the resolution is 30m. Sentinel-2 gives us 10m resolution for several of the bands.
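
To put the two resolutions side by side, this small sketch works out how many 10m Sentinel-2 pixels fit inside the footprint of one 30m Landsat pixel. The variable names are ours; the two values are the ones quoted in the list above.

```python
sentinel_gsd_m = 10  # Sentinel-2 resolution for several bands, as noted above
landsat_gsd_m = 30   # Landsat `gsd` value, as noted above

# Area ratio: how many Sentinel-2 pixels cover one Landsat pixel
pixels_per_landsat_pixel = (landsat_gsd_m / sentinel_gsd_m) ** 2
print(pixels_per_landsat_pixel)
```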

In [None]:
Image(url=item.assets["rendered_preview"].href, width=500)