# Create covariate stacks

### Covariates will be derived by reducing satellite imagery to useful modelling layers
#### Covariates can include: Landsat 8, NASADEM, Sentinel-2, Sentinel-1

### Inputs:
#### bbox_img_tile: The extent of an image tile provides the bbox used to access cloud-optimized GeoTIFFs
#### filters: Search for imagery by days-of-year, cloud cover, and year range
##### Note: The image tile extent is limited by the data volume of the covariates and the available RAM

In [40]:
# Search for imagery
# https://github.com/developmentseed/example-jupyter-notebooks/blob/landsat-search/notebooks/Landsat8-Search/L8-USGS-satapi.ipynb
import os
import json
import requests
import datetime

sat_api_url = "https://landsatlook.usgs.gov/sat-api"

def query_satapi(query):
    headers = {
            "Content-Type": "application/json",
            "Accept-Encoding": "gzip",
            "Accept": "application/geo+json",
        }

    url = f"{sat_api_url}/stac/search"
    data = requests.post(url, headers=headers, json=query).json()
    
    return data

def query_year(year, bbox, min_cloud, max_cloud):
    '''Given the year, queries scenes between June 1 and September 15 within bbox and returns the full response.'''
    date_min = '-'.join([str(year), "06-01"])
    date_max = '-'.join([str(year), "09-15"])
    start_date = datetime.datetime.strptime(date_min, "%Y-%m-%d")
    end_date = datetime.datetime.strptime(date_max, "%Y-%m-%d") 
    start = start_date.strftime("%Y-%m-%dT00:00:00Z")
    end = end_date.strftime("%Y-%m-%dT23:59:59Z")
    
    query = {
    "time": f"{start}/{end}",
    "bbox":bbox,
    "query": {
        "eo:platform": {"eq": "LANDSAT_8"},
        "eo:cloud_cover": {"gte": min_cloud, "lt": max_cloud},
        "collection":{"eq": "landsat-c2l2-sr"},
        "landsat:collection_category":{"eq": "T1"}
        },
    "limit": 20  # Items per page (request); a small page size keeps sat-api from failing on large feature collections
    }
    
    data = query_satapi(query)
    
    # Return the full response so the results can be inspected when troubleshooting
    return data
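
The inputs above mention filtering by days-of-year, while `query_year` hard-codes June 1 to September 15. A minimal sketch of how a day-of-year window could be turned into the `time` string used in the query follows. The helper name `season_window` and its parameters are hypothetical, not part of the notebook or of sat-api.

```python
import datetime


def season_window(year, start_doy, end_doy):
    """Build a 'start/end' UTC time string for days-of-year start_doy..end_doy.

    Hypothetical helper: day 1 is January 1 of `year`.
    """
    if not 1 <= start_doy <= end_doy <= 366:
        raise ValueError("expected 1 <= start_doy <= end_doy <= 366")
    jan1 = datetime.date(year, 1, 1)
    start = jan1 + datetime.timedelta(days=start_doy - 1)
    end = jan1 + datetime.timedelta(days=end_doy - 1)
    if end.year != year:
        raise ValueError(f"day {end_doy} does not exist in {year}")
    return f"{start:%Y-%m-%d}T00:00:00Z/{end:%Y-%m-%d}T23:59:59Z"


# June 1 to September 15 is days 152..258 in a non-leap year
print(season_window(2019, 152, 258))
```

Note that the same calendar dates fall one day-of-year later in a leap year, so a fixed day-of-year window and a fixed calendar window are not interchangeable.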

In [44]:
# Accessing imagery
# Select an area of interest
bbox_list = [[-105,45,-100,50], [-101,45,-100,46]]
min_cloud = 0
max_cloud = 20
for bbox in bbox_list:
    # One GeoJSON response per year; the total scene count is in response['meta']['found']
    response_by_year = [query_year(year, bbox, min_cloud, max_cloud) for year in range(2015, 2020 + 1)]
    scene_totals = [each['meta']['found'] for each in response_by_year]
    print(scene_totals)

[78, 80, 99, 95, 58, 97]
[12, 20, 19, 17, 17, 15]


## Debugging

The next few code chunks were added to debug why the query returned so many results for such a small bounding box. The cause was that `bbox` must sit at the top level of the request, outside the `query` block. Adding a platform and a collection tier to `query` narrows the results further. Note: everything inside `query` follows a very specific format. TODO: link the STAC docs explaining the query language.

In [41]:
# It's better to just return the whole response, so you can iterate and debug through it.
bbox_list = [[-105,45,-100,50], [-101,45,-100,46]]
min_cloud = 0
max_cloud = 20

response = query_year(2020, bbox_list[1], min_cloud, max_cloud) 
scenes = response['meta']['found']
scenes

15

In [45]:
# Some helpful ways to debug
# response['features'][0]
for item in response['features']:
    print(item['id'])

LC08_L2SP_031029_20200915_20200919_02_T1
LC08_L2SP_032029_20200906_20200918_02_T1
LC08_L2SP_031029_20200830_20200906_02_T1
LC08_L2SP_033028_20200828_20200906_02_T1
LC08_L2SP_031029_20200814_20200920_02_T1
LC08_L2SP_033028_20200812_20200918_02_T1
LC08_L2SP_032029_20200805_20200916_02_T1
LC08_L2SP_032029_20200805_20200821_02_T1
LC08_L2SP_033028_20200727_20200908_02_T1
LC08_L2SP_033028_20200727_20200806_02_T1
LC08_L2SP_032029_20200704_20200913_02_T1
LC08_L2SP_031029_20200627_20200823_02_T1
LC08_L2SP_031028_20200627_20200824_02_T1
LC08_L2SP_031028_20200611_20200824_02_T1
LC08_L2SP_031029_20200611_20200824_02_T1
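
The listing above appears to contain the same acquisition more than once with different processing dates (for example path/row 032029 acquired on 20200805), which would inflate the scene counts. The sketch below splits a Collection 2 product ID into its underscore-separated fields and keeps only the most recently processed entry per path, row, and acquisition date. The function names are hypothetical, and whether the latest processing date is the right one to keep is an assumption.

```python
from datetime import datetime


def parse_product_id(product_id):
    """Split an ID such as LC08_L2SP_032029_20200805_20200916_02_T1 into fields."""
    parts = product_id.split("_")
    if len(parts) != 7:
        raise ValueError(f"unexpected product ID format: {product_id}")
    _sensor, _level, path_row, acquired, processed, _collection, tier = parts
    return {
        "path": path_row[:3],
        "row": path_row[3:],
        "acquired": datetime.strptime(acquired, "%Y%m%d").date(),
        "processed": datetime.strptime(processed, "%Y%m%d").date(),
        "tier": tier,
    }


def latest_per_acquisition(product_ids):
    """Keep one ID per (path, row, acquisition date): the latest processed."""
    latest = {}
    for pid in product_ids:
        info = parse_product_id(pid)
        key = (info["path"], info["row"], info["acquired"])
        current = latest.get(key)
        if current is None or info["processed"] > current[0]:
            latest[key] = (info["processed"], pid)
    return sorted(pid for _, pid in latest.values())


ids = [
    "LC08_L2SP_032029_20200805_20200916_02_T1",
    "LC08_L2SP_032029_20200805_20200821_02_T1",
    "LC08_L2SP_031029_20200915_20200919_02_T1",
]
print(latest_per_acquisition(ids))
```

In the notebook this could be applied to `[item['id'] for item in response['features']]` before any assets are fetched.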


## Accessing Data

Now that we have queried for the Landsat scenes to use in a given analysis tile, we need to set up AWS authentication and iterate over the assets, subsetting each one to the tile (bbox). The imagery sits in a requester-pays bucket, so an AWS IAM access key and secret are required so that any access costs are billed to the right account.

In [None]:
aws_access_key_id='?'
aws_secret_access_key='?'
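
As an alternative to pasting keys into the notebook, the credentials could be read from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. This is a minimal sketch; the helper name `load_aws_credentials` is hypothetical.

```python
import os


def load_aws_credentials(environ=os.environ):
    """Return (access_key_id, secret_access_key) from environment variables.

    Hypothetical helper; raises if either variable is missing or empty.
    """
    key = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        raise RuntimeError(
            "Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before running"
        )
    return key, secret


# Usage in the notebook (commented out so this cell runs without credentials):
# aws_access_key_id, aws_secret_access_key = load_aws_credentials()
```

This keeps secrets out of the saved notebook and its version history.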

In [None]:
# Processing imagery (band math)
# Is this done on the fly? No.
import boto3
from rasterio.session import AWSSession

# Authenticate to AWS to access a requester-pays bucket
aws_session = AWSSession(
    boto3.Session(aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key),
    requester_pays=True)

In [54]:
# The returned data is GeoJSON, so it could also be loaded into a GeoPandas GeoDataFrame

# Get the first feature, and lookup the url of Band 4 asset
asset = response_by_year[0]['features'][0]['assets']['SR_B4.TIF']['href']

# Convert to S3 url for use with requester pays
cog = asset.replace('https://landsatlook.usgs.gov/data/', 's3://usgs-landsat/')
print(cog)

s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2015/032/028/LC08_L2SP_032028_20150909_20200908_02_T1/LC08_L2SP_032028_20150909_20200908_02_T1_SR_B4.TIF
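
Iterating over several assets of a scene could follow the same pattern as the single-band lookup above. The sketch below wraps the URL rewrite in a function and applies it to a list of asset keys. The function names are hypothetical, and the `SR_B5.TIF` key is assumed by analogy with `SR_B4.TIF`; check the keys actually present in `feature['assets']`.

```python
HTTPS_PREFIX = "https://landsatlook.usgs.gov/data/"
S3_PREFIX = "s3://usgs-landsat/"


def to_s3_url(href):
    """Rewrite a landsatlook HTTPS asset href to its S3 equivalent."""
    if not href.startswith(HTTPS_PREFIX):
        raise ValueError(f"unexpected asset href: {href}")
    return S3_PREFIX + href[len(HTTPS_PREFIX):]


def band_urls(feature, bands=("SR_B4.TIF", "SR_B5.TIF")):
    """Map each requested asset key to its S3 URL for one STAC feature."""
    return {band: to_s3_url(feature["assets"][band]["href"]) for band in bands}


# Stand-in feature with made-up paths, shaped like the search response
example_feature = {
    "assets": {
        "SR_B4.TIF": {"href": HTTPS_PREFIX + "example/scene_SR_B4.TIF"},
        "SR_B5.TIF": {"href": HTTPS_PREFIX + "example/scene_SR_B5.TIF"},
    }
}
print(band_urls(example_feature))
```

Raising on an unexpected prefix avoids silently passing an unconverted HTTPS URL to the requester-pays session.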


In [None]:
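import rasterio as rio
from matplotlib.pyplot import imshow

# Open the COG through the authenticated session and read band 1 in full
with rio.Env(aws_session):
    with rio.open(cog) as src:
        profile = src.profile
        arr = src.read(1)

# Plot the raster
imshow(arr)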

In [None]:
# Adding processed imagery to a covariate stack