Using the Basemaps API
======================

The Basemaps API provides access to data and metadata for the basemaps available to your account, and lets you search for and download quad data.

This tutorial introduces key concepts such as the relationship between Series, Mosaics, and Quads.  Along the way, we'll construct a small Python client that you can use to do things like download quads through time for an area of interest (AOI) and time of interest (TOI).

Dependencies
--------------------

We'll only be using two non-stdlib Python libraries, both very common dependencies: `requests` and `urllib3`.  If you have `requests` installed, then you also have `urllib3` installed. If you don't, use `pip install requests` or a similar approach to install the `requests` library.


Planet API Concepts
-----------------------------

For all Planet APIs, you'll need to set up authentication, retry "slow down" responses automatically, and handle pagination of responses.  None of these are specific to the basemaps API, but it's important to understand how to handle all of them in whatever language or environment you're using whenever you're working with any Planet API.

First off, let's set up your API key and set up the `requests` library to pass it along and automatically retry some types of error codes.  Read the overview of API mechanics for any of Planet's APIs here to more fully understand what is being set up: https://developers.planet.com/docs/analytics/api-mechanics/

In [None]:
import os
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Let's get your API key. I'm assuming it's been set in the environment in this example
# but you can also pass it along here...
api_key = os.getenv("PL_API_KEY")

# Next, let's set up requests so that it will pass along the API key automatically
session = requests.Session()
session.auth = (api_key, '')

# We also need to set up requests to honor and retry common "slow down" status codes
# that the API may respond with as well as other retryable statuses.
retries = Retry(total=10, backoff_factor=1, status_forcelist=[429, 502, 503])
session.mount('https://', HTTPAdapter(max_retries=retries))
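Because `os.getenv` quietly returns `None` when the variable is missing, a forgotten key only surfaces later, when a request is rejected. Here is a minimal sketch of failing early instead. The `load_api_key` helper is hypothetical and not part of any Planet library:

```python
import os


def load_api_key(env_var="PL_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    api_key = os.getenv(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Planet API key"
        )
    return api_key
```

You could then write `api_key = load_api_key()` in place of the bare `os.getenv` call.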

Just to aid readability and avoid displaying any sensitive information, let's also set up a quick utility printing method:

In [None]:
import re
import pprint

def display(content):
    result = pprint.pformat(content)
    result = re.sub(r'api_key=\w+', 'api_key=XXXX', result)
    print(result)
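To see what that substitution does, here is a small sketch that applies the same regular expression to a made-up response. The URL and key below are invented for illustration, and `redact` is a hypothetical variant of `display` that returns the string rather than printing it:

```python
import pprint
import re


def redact(content):
    """Format content like display() does, but return the string."""
    text = pprint.pformat(content)
    return re.sub(r"api_key=\w+", "api_key=XXXX", text)


# Invented sample resembling a link that embeds a key in its query string.
sample = {"_links": {"tiles": "https://example.com/tiles.png?api_key=abc123"}}
print(redact(sample))
```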

Next, let's list all mosaics you have access to.  This may be anywhere from just a few to tens of thousands.  To do so, we'll also need to understand a key part of all Planet APIs: pagination.  The results from any search or listing response are paginated and return "chunks" of values rather than the entire set at once to reduce response time.  Therefore, for all Planet APIs, you need to handle pagination in the client code you write to interface with the APIs.  When results are paginated, there will be a `_next` URL inside the `_links` section of the response to retrieve the next chunk of results.

So for example, let's try to list all mosaics that you have access to. Most users will only get the first chunk of results, along with a link for the next chunk. However, if you only have access to a few basemaps, there might only be one chunk and therefore no `_next` link.

In [None]:
basemaps_api_url = 'https://api.planet.com/basemaps/v1'
url = f'{basemaps_api_url}/mosaics'

rv = session.get(url)
rv.raise_for_status()
result = rv.json()
display(result)

Okay, that's a big blob of JSON that's hard to read. Let's take a closer look at its structure...

In [None]:
print(f"This chunk has {len(result['mosaics'])} mosaics")
if '_next' in result['_links']:
    print("There are more chunks to fetch - we did not get all results in the first page")
else:
    print("We received all results in the first page")

For most users, you'll have multiple pages of results.  Even if you don't, you should still write the code in a way that expects multiple pages, as we'll use a similar pattern when searching for quads and doing other operations where there will always be multiple pages of results.

Here's an example of what that might look like:

In [None]:
basemaps_api_url = 'https://api.planet.com/basemaps/v1'
url = f'{basemaps_api_url}/mosaics'

pages = []
while True:
    rv = session.get(url)
    rv.raise_for_status()
    page = rv.json()
    pages.append(page)
    if '_next' in page['_links']:
        url = page['_links']['_next']
    else:
        break

That will work, but it's quite inconvenient because the pages are an implementation detail that we shouldn't need to care about.  We just want an iterator over the individual items in each page.  Therefore, let's rewrite that a bit to something that's a touch more general:

In [None]:
def paginated_get(session, url, item_key, **kwargs):
    while True:
        rv = session.get(url, **kwargs)
        rv.raise_for_status()
        page = rv.json()
        
        for item in page[item_key]:
            yield item
            
        if '_next' in page['_links']:
            url = page['_links']['_next']
        else:
            break
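To check the pagination logic without touching the network, here is a sketch that runs the same generator against an in-memory stand-in for the session. `FakeSession`, `FakeResponse`, and the page contents are all hypothetical and exist only for this illustration:

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response with a canned payload."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # canned responses are always treated as successful

    def json(self):
        return self._payload


class FakeSession:
    """Hypothetical stand-in for requests.Session that serves pages by URL."""

    def __init__(self, pages):
        self.pages = pages

    def get(self, url, **kwargs):
        return FakeResponse(self.pages[url])


def paginated_get(session, url, item_key, **kwargs):
    # Same logic as the generator above, repeated so this sketch runs alone.
    while True:
        rv = session.get(url, **kwargs)
        rv.raise_for_status()
        page = rv.json()
        yield from page[item_key]
        if '_next' in page['_links']:
            url = page['_links']['_next']
        else:
            break


pages = {
    'page1': {'mosaics': [{'name': 'a'}, {'name': 'b'}], '_links': {'_next': 'page2'}},
    'page2': {'mosaics': [{'name': 'c'}], '_links': {}},
}
names = [m['name'] for m in paginated_get(FakeSession(pages), 'page1', 'mosaics')]
print(names)
```

The caller never sees the page boundary: items from both pages arrive in one flat sequence.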

The general structure of making requests in the example above needs to be applied to any listing or searching operation for all Planet APIs.  Otherwise, you won't get all results and you'll see only the first handful.  With that out of the way, we can dive a bit deeper into interacting with the Planet Basemaps API and look at mosaics, each of which is a dataset for a single time interval.

Understanding Mosaic Metadata
----------------------------------------------

Let's back up a bit and take a look at one of the items that we listed earlier:

In [None]:
display(result['mosaics'][0])

That's the metadata for an individual mosaic.  The https://api.planet.com/basemaps/v1/mosaics endpoint retrieves info for mosaics, and each mosaic is a spatially consistent dataset for a specific time period.  The time period of data used is indicated by the `first_acquired` and `last_acquired` timestamps. Note that these are full timestamps in UTC, and per ISO 8601 a time of 00:00 is the midnight that begins the given date.  In other words, `2024-01-01T00:00:00.000Z` is midnight UTC between December 31st 2023 and January 1st 2024.  See https://developers.planet.com/docs/basemaps/reference/#tag/Basemaps-and-Mosaics/operation/listMosaics for a description of all fields.
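As a rough illustration of working with these timestamps, the sketch below parses them into timezone-aware `datetime` objects. It assumes the millisecond-plus-`Z` format shown in the example above (other formats would need a different format string), and the metadata values are invented:

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a UTC timestamp such as '2024-01-01T00:00:00.000Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


# Invented metadata for a one-month mosaic.
mosaic = {
    "first_acquired": "2021-02-01T00:00:00.000Z",
    "last_acquired": "2021-03-01T00:00:00.000Z",
}
start = parse_timestamp(mosaic["first_acquired"])
end = parse_timestamp(mosaic["last_acquired"])
print(f"{start.date()} to {end.date()} spans {(end - start).days} days")
```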

Note that the `_self` entry under `_links` uses the `id`, which is a relatively non-human-readable UUID and not the mosaic name.  In the basemaps API, you can't directly request an item by name. Instead, you need to search for that name.  You can directly request an item by `id`, but it's rare that you'll have the `id` for a mosaic or series.  Most of the time, you'll know the name, not the `id`.

Let's look up a mosaic by name. To do this, we'll use the `name__is` parameter (there is also `name__contains` if you want to search for mosaics that have a specific string in their names).

In [None]:
mosaic_name = 'global_monthly_2021_02_mosaic'

rv = session.get(f'{basemaps_api_url}/mosaics', params={'name__is': mosaic_name})
rv.raise_for_status()
display(rv.json()['mosaics'][0])
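Indexing with `[0]` raises a bare `IndexError` if nothing matches the name. A small sketch of a friendlier lookup follows, assuming the API answers an unmatched name with an empty list rather than an error status. `first_match` is a hypothetical helper and the payloads are invented:

```python
def first_match(payload, item_key, name):
    """Return the first item in a listing response, or raise if none matched."""
    items = payload.get(item_key, [])
    if not items:
        raise LookupError(f"No {item_key} found with name {name!r}")
    return items[0]


# Invented payloads standing in for rv.json().
found = {"mosaics": [{"name": "global_monthly_2021_02_mosaic", "id": "hypothetical-id"}]}
missing = {"mosaics": []}

print(first_match(found, "mosaics", "global_monthly_2021_02_mosaic")["id"])
```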

Working with Series
-----------------------------

The basemaps product is a timeseries-focused, analysis-ready dataset.  While these examples focus on visual mosaics, note that surface reflectance (SR) mosaics are also common and readily available.  Mosaics represent individual timesteps.  A series represents a timeseries of related data, and each mosaic in the series covers a different time interval. While not all mosaics belong to a larger time series, for most use cases you'll want to start with a series rather than individual mosaics.

Let's start by listing the names of all series that you have access to:

In [None]:
all_series = paginated_get(session, f'{basemaps_api_url}/series', 'series')
for item in all_series:
    print(item['name'])

Similar to mosaics, we can search based on a substring within the name and also look up a specific item by name. For example, let's search for all standard "select basemap" series by listing all series with "subscription" in their names:

In [None]:
results = paginated_get(
    session, 
    f'{basemaps_api_url}/series', 
    'series', 
    params={'name__contains': 'subscription'},
)
for item in results:
    print(item['name'])

Similar to mosaics, we can look up an individual series with a `name__is` query when we know its name but not its ID:

In [None]:
series_name = 'Global Monthly'

rv = session.get(f'{basemaps_api_url}/series', params={'name__is': series_name})
rv.raise_for_status()
display(rv.json()['series'][0])

Note the `interval` field in the series metadata displayed above. A series is a time-ordered set of mosaics on a regular cadence. Each mosaic within the series will have the same `interval` as the series.

A key part of using a series is listing mosaics within that series. We can also limit the results to a particular time window via the `acquired__lt` and `acquired__gt` parameters.  Let's take a quick look at what that looks like:

In [None]:
def series_metadata(session, series_name):
    rv = session.get(f'{basemaps_api_url}/series', params={'name__is': series_name})
    rv.raise_for_status()
    return rv.json()['series'][0]

def mosaics_in_series(session, series_name, start_date=None, end_date=None):
    info = series_metadata(session, series_name)
    url = info['_links']['mosaics']
    mosaics = paginated_get(
        session, 
        url, 
        'mosaics', 
        params={
            'acquired__lt': end_date, 
            'acquired__gt': start_date,
        },
    )
    return mosaics

mosaics = mosaics_in_series(session, 'Global Monthly', start_date='2018-01-01', end_date='2019-01-01')
for mosaic in mosaics:
    display(mosaic)
    print('\n')
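The function above passes `None` for an unset bound and relies on `requests` dropping parameters whose value is `None`. If you prefer to work with `date` objects and build the parameters explicitly, a sketch might look like this. `acquired_params` is a hypothetical helper, not part of the API:

```python
from datetime import date


def acquired_params(start_date=None, end_date=None):
    """Build query parameters for a time window, skipping unset bounds."""
    params = {}
    if start_date is not None:
        params["acquired__gt"] = start_date.isoformat()
    if end_date is not None:
        params["acquired__lt"] = end_date.isoformat()
    return params


print(acquired_params(date(2018, 1, 1), date(2019, 1, 1)))
print(acquired_params(start_date=date(2018, 1, 1)))
```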

Searching for Quads
------------------------------

Each mosaic is a single seamless dataset. However, the data is stored in smaller pieces called quads.  To download data via the basemaps API, you need to search for quads within a mosaic and download the data for each one.  Quads also contain critical localized metadata such as the scenes that contributed the data for each quad.

There are two ways to search for quads: an arbitrary polygon search and a rectangular bounding box search. Let's explore some quick examples of both. To start with, we'll use a bounding box search:

In [None]:
def mosaic_metadata(session, mosaic_name):
    rv = session.get(f'{basemaps_api_url}/mosaics', params={'name__is': mosaic_name})
    rv.raise_for_status()
    return rv.json()['mosaics'][0]

def bbox_quad_search(session, mosaic_name, bbox):
    info = mosaic_metadata(session, mosaic_name)
    url = f'{basemaps_api_url}/mosaics/{info["id"]}/quads'
    quads = paginated_get(
        session, 
        url, 
        'items', 
        params={'bbox': ','.join(map(str, bbox))}
    )
    return quads
    
# lon_min, lat_min, lon_max, lat_max
bbox = -87.1, 35, -87, 35.1

for item in bbox_quad_search(session, 'global_monthly_2021_02_mosaic', bbox):
    display(item)
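It is easy to swap the corner order by accident. The sketch below validates a bounding box before formatting it the same way as the search above; `format_bbox` is a hypothetical helper, and it does not attempt to handle boxes that cross the antimeridian:

```python
def format_bbox(bbox):
    """Validate (lon_min, lat_min, lon_max, lat_max) and join with commas."""
    lon_min, lat_min, lon_max, lat_max = bbox
    if lon_min >= lon_max or lat_min >= lat_max:
        raise ValueError(
            f"Expected lon_min, lat_min, lon_max, lat_max but got {bbox}"
        )
    return ','.join(map(str, bbox))


print(format_bbox((-87.1, 35, -87, 35.1)))
```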

Note that each quad that's returned has some basic metadata such as a bounding box for the quad, the percentage of coverage (i.e. non-nodata pixels) and an ID.  You can also retrieve a quad directly by its ID. See the structure of the `_self` link to see an example of a URL for a single quad.

The `download` link gives a URL that can be used to download the data for each quad.  We'll come back to that in a future example.

The `items` link can be used to retrieve contributing scene information for each quad.  The results will be a list of Planet Data API URLs to the scenes that contributed. Let's take a quick look at an example of those:

In [None]:
for quad in bbox_quad_search(session, 'global_monthly_2021_02_mosaic', bbox):
    url = quad['_links']['items']
    rv = session.get(url)
    rv.raise_for_status()
    print(quad['id'])
    print(rv.json()['items'])
    print('\n')

Let's go back to searching for quads for a bit. In addition to bounding box searches, we can also make arbitrary polygon searches for quads. 

To do that, let's get an AOI set up. You could use `fiona` to read a polygon from a shapefile, if that's easier. Note that this is a single geometry (can be a multipolygon or a polygon), and not a `FeatureCollection` or a `Feature`, just the raw geometry portion.

In [None]:
# The API expects a geometry, not a feature or a feature collection!
# This is a large AOI in the southeastern US.
polygon = {
    "type": "Polygon",
    "coordinates": [[[-84.8280089534599, 33.593106818093858],
                     [-85.054128959451859, 33.140866806109933],
                     [-84.266356035350825, 32.783451312767802],
                     [-83.693761826629256, 33.523811977547936],
                     [-84.295532810317539, 33.954169408306832],
                     [-84.8280089534599, 33.593106818093858]]]
}
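If your AOI arrives as a GeoJSON `Feature` or a single-feature `FeatureCollection`, you need to unwrap it first. One possible sketch is shown here; `as_geometry` is a hypothetical helper and the sample feature is invented:

```python
def as_geometry(geojson):
    """Return the bare geometry from a geometry, Feature, or one-feature collection."""
    kind = geojson.get("type")
    if kind == "Feature":
        return geojson["geometry"]
    if kind == "FeatureCollection":
        features = geojson["features"]
        if len(features) != 1:
            raise ValueError("Expected exactly one feature; merge or pick one first")
        return features[0]["geometry"]
    return geojson  # already a bare geometry such as a Polygon or MultiPolygon


feature = {
    "type": "Feature",
    "properties": {"name": "example"},
    "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
}
print(as_geometry(feature)["type"])
```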

Arbitrary polygon quad searches work slightly differently than bounding box searches. Instead of using a GET request to the quads endpoint with a bbox parameter, we'll need to POST a geometry to the `quads/search` endpoint.  That means the pattern is slightly different:

In [None]:
def paginated_query(session, url, payload, item_key, **kwargs):
    rv = session.post(url, json=payload, **kwargs)
    rv.raise_for_status()
    page = rv.json()
    for item in page[item_key]:
        yield item
    
    if '_next' in page['_links']:
        url = page['_links']['_next']
        for item in paginated_get(session, url, item_key, **kwargs):
            yield item
            
def polygon_quad_search(session, mosaic_name, polygon):
    info = mosaic_metadata(session, mosaic_name)
    url = f'{basemaps_api_url}/mosaics/{info["id"]}/quads/search'
    return paginated_query(session, url, polygon, 'items')

for quad in polygon_quad_search(session, 'global_monthly_2021_02_mosaic', polygon):
    display(quad)
    print('\n')
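One caveat with POST requests: `urllib3`'s `Retry` only applies `status_forcelist` to methods listed in `allowed_methods`, and the default set does not include POST. If you want the polygon search retried on "slow down" responses too, a sketch along these lines may help. It assumes the search POST has no side effects, so repeating it is safe:

```python
from urllib3.util.retry import Retry

# Assumption: the search POST has no side effects, so repeating it is safe.
post_retries = Retry(
    total=10,
    backoff_factor=1,
    status_forcelist=[429, 502, 503],
    allowed_methods=["GET", "POST"],  # requires urllib3 1.26 or newer
)

# Same settings as the session configured earlier, for comparison.
default_retries = Retry(total=10, backoff_factor=1, status_forcelist=[429, 502, 503])

print(default_retries.is_retry("POST", 429))
print(post_retries.is_retry("POST", 429))
```

To use it, mount an `HTTPAdapter(max_retries=post_retries)` on the session in the same way as the original retry configuration.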

Building a Client
------------------------

Up until now, we've been doing a lot of rather suboptimal things like passing around a `session` object everywhere and relying on a lot of global variables from previous cells.  These get clunky quickly, and you can see how it would be more convenient to have a few classes that handle this state for us with convenient methods to do common tasks with the basemaps API.  Put another way, the examples shown so far aren't reusable.  It would be nice to have something a bit more reusable and cleaner.

With that in mind, let's tie all of these examples together into a simple client for the basemaps API that we can use to search for quads, list mosaics in a series, etc. This client is also provided in the `basemaps_client.py` file alongside this notebook. Let's load that file and inspect what it does:

In [None]:
%load basemaps_client.py

Using the Demo Client
--------------------------------

Now that we've created a client, we can use it to download data and/or retrieve information.  For example, let's repeat the previous example displaying quads inside an AOI for a specific mosaic:

In [None]:
# Assumes PL_API_KEY is set in environment variables. If not, pass it in.
client = BasemapsClient(api_key=None) 
mosaic = client.mosaic(name='global_monthly_2021_02_mosaic')
for quad in mosaic.quads(region=polygon):
    display(quad.info)
    print('\n')

More usefully, though, we can operate on series easily as well. For example, let's summarize coverage for that same AOI for the full series:

In [None]:
series = client.series(name='Global Monthly')
for mosaic in series.mosaics(start_date='2017-01-01', end_date='2018-01-01'):
    coverage = [quad.coverage for quad in mosaic.quads(region=polygon)]
    avg = sum(coverage) / (len(coverage) or 1)
    print(f'{mosaic.name} has {len(coverage)} quads in the AOI averaging {avg:0.1f}% coverage')

Note that the number of quads changes through time from 168 to 51.  That's because the tiling system changes through time. Therefore, it's important to always search for quads rather than trying to guess what quad IDs will be.  The same quad ID may be in a very different location for a different mosaic, as ID depends on the tiling system. If you do need to check if the tiling system for two mosaics is the same, check the `grid.quad_size`, `grid.resolution`, and `coordinate_system` metadata.
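A rough sketch of that comparison, working on mosaic metadata dictionaries, is shown below. `same_grid` is a hypothetical helper, the sample values are invented, and it assumes both mosaics report these fields with exactly matching values when the grids are the same:

```python
def same_grid(mosaic_a, mosaic_b):
    """Return True if two mosaic metadata dicts describe the same tiling system."""
    def grid_key(info):
        return (
            info["grid"]["quad_size"],
            info["grid"]["resolution"],
            info["coordinate_system"],
        )
    return grid_key(mosaic_a) == grid_key(mosaic_b)


# Invented metadata; real values come from the mosaic metadata shown earlier.
older = {"coordinate_system": "EPSG:3857", "grid": {"quad_size": 2048, "resolution": 4.77}}
newer = {"coordinate_system": "EPSG:3857", "grid": {"quad_size": 4096, "resolution": 4.77}}
print(same_grid(older, older), same_grid(older, newer))
```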

Contributing Scenes
------------------------------

Another common type of metadata we might want is the scenes that contributed to each mosaic quad. Let's get all contributing scenes for the AOI we've been working with for a single timestep. Note that adjacent quads will always use some of the same scenes, so it's important to de-duplicate contributing scenes when working with more than one quad.  We can do this by searching for quads and using the `items` endpoint for the quad, which returns contributing scenes in the form of Planet Data API URLs.  In our client, this is accessed via the `contribution` method of the `MosaicQuad`:

In [None]:
contributing_scenes = []
for quad in client.mosaic('global_monthly_2017_06_mosaic').quads(region=polygon):
    contributing_scenes += quad.contribution()
    
for item in sorted(set(contributing_scenes)):
    print(item)

Quad Download
-----------------------

If you have download access, you can also download quads for the series or for an individual mosaic. Let's use a bit of a smaller AOI for this example to avoid downloading large amounts of data:

In [None]:
bbox = -87, 35, -86.9, 35.1
# Note that we need to exhaust the iterator to actually download things!
for item in series.download_quads(bbox=bbox, start_date='2023-01-01', end_date='2024-01-01'):
    print(item)

As you can see, we downloaded quads overlapping the bounding box into directories based on the mosaic name.  Feel free to inspect the files.  Then, let's clean those up:

In [None]:
import glob
import shutil

for dirname in glob.glob('global_monthly_*'):
    shutil.rmtree(dirname)