# Traffic Density Index: Usage Instructions and Documentation

## Introduction

This notebook documents the Traffic Density Index (TDI) analysis workflow. <p> The process uses medium spatial-resolution PlanetScope imagery to detect image pixels likely to contain vehicles. 
This work was heavily inspired by the following paper:<br> 
- Chen, Y., Qin, R., Zhang, G., & Albanwan, H. (2021). Spatial Temporal Analysis of Traffic Patterns during the COVID-19 Epidemic by Vehicle Detection Using Planet Remote-Sensing Satellite Images. Remote Sensing, 13(2), Article 208. 
<a href="https://doi.org/10.3390/rs13020208">https://doi.org/10.3390/rs13020208</a>

The workflow is run using the following two notebooks found in the `notebooks/tdi` folder in this repository:
- [skywatch-api-notebook.ipynb](skywatch-api-notebook.ipynb)
- [tdi-notebook.ipynb](tdi-notebook.ipynb)

The `skywatch-api-notebook` is run first, and is used to query and download PlanetScope images for a desired area-of-interest (AOI) and time-of-interest (TOI).<p>

Once the images have been downloaded, the `tdi-notebook` is run to generate the TDI outputs.

## Python Environment Setup

Before doing anything, set up the Python environment. The recommended method is to build it with conda (Anaconda); using <a href="https://mamba.readthedocs.io/en/latest/installation.html">Mamba</a> is strongly recommended, as it builds the environment much faster than regular conda.<p>

To build the environment:
- Open a new terminal in the `notebooks` folder
- Using mamba, enter the following command into the terminal: 
    - `mamba env create --name wb-spk-notebook-env --file environment.yml`

To activate the new Python environment, enter the following command into the terminal: 
- `mamba activate wb-spk-notebook-env`
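
Once JupyterLab is running, one quick way to confirm that the notebook kernel is using the new environment is to inspect `sys.prefix`. This is a minimal sketch that assumes a named conda/mamba environment, whose directory is named after the environment; `running_in_env` is a hypothetical helper, not part of the repository.

```python
import sys
from pathlib import Path

ENV_NAME = "wb-spk-notebook-env"


def running_in_env(env_name: str = ENV_NAME, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter prefix is a conda/mamba env with this name.

    Named environments live in a directory called after the environment
    (e.g. .../envs/wb-spk-notebook-env), so the last path component of
    sys.prefix identifies the active environment.
    """
    return Path(prefix).name == env_name


if not running_in_env():
    print(f"Warning: kernel is not running in '{ENV_NAME}' (sys.prefix={sys.prefix})")
```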

## Notebook 1: skywatch-api-notebook

### Initial Setup

Before running the notebook, the user must complete the following setup:
- Retrieve an API key from the Skywatch Earthcache <a href="https://console.earthcache.com/account">website</a>.
- Create a polygon file which outlines the image download AOI.
    - This can be created in any standard desktop GIS program (e.g. QGIS or ArcGIS) and must be in a format readable by the <a href="https://geopandas.org/en/stable/docs/user_guide/io.html">geopandas</a> package (e.g. GeoPackage, shapefile, or GeoJSON).
    - The Earthcache API allows retrieved images to be cropped to complex geometries, which is a good way to reduce image download costs by selecting only the areas needed. However, it is recommended to include a buffer around the areas of interest so that important data are not missed due to image georegistration shifts.
    - Note that for the Beitbridge study AOI, a geometry file has been created already, located [here](../../data/processed/Beitbridge_PS_AOI_Final.gpkg).

### Input Query Parameters

Open JupyterLab by typing `jupyter lab` into the terminal. Inside JupyterLab, navigate to and open `skywatch-api-notebook.ipynb`.<p>
At the top of the notebook (2nd code cell), specify the following variables:
- Put your Earthcache API key in the `api_key` variable.
- If desired, change the `out_path` variable to the desired output directory, or keep the default.
- Set the `aoi_file` variable to the path to the geometry file created in the [Initial Setup](#initial-setup) section.
- Set the `o_start_date` variable to the starting date for the image download query.
- Set the `o_end_date` variable to the ending date for the image download query.
- Set the `cc_thresh` variable to the desired cloud cover threshold to use for the query. Images with a cloud cover percentage exceeding this threshold will be omitted from the returned query results.
    - Note: prior to image download it is possible to use a more stringent cloud cover threshold. Therefore it is fine to use a higher value in `cc_thresh` for the initial query.
- Set the `coverage_thresh` variable to the desired AOI coverage percentage. Images which cover less of the AOI than the specified threshold will be omitted from the returned query results.
    - Note: at a later stage in the notebook it is possible to use a more stringent AOI coverage threshold. Therefore it is fine to use a lower value in `coverage_thresh` for the initial query.
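
For illustration, a filled-in parameter cell might look like the following. All values are hypothetical; the `YYYY-MM-DD` date format is an assumption (check which format the notebook expects), and reading the key from an `EARTHCACHE_API_KEY` environment variable is an assumption made here to keep the key out of the saved notebook. `out_path` is left at its notebook default, so it is not set.

```python
import os
from datetime import date

# All values below are hypothetical examples; replace them with your own.
# Assumption: the key is read from an environment variable named
# EARTHCACHE_API_KEY so it is not stored in the saved notebook.
api_key = os.environ.get("EARTHCACHE_API_KEY", "<your-earthcache-api-key>")
# out_path is left at the notebook's default and so is not set here.
aoi_file = "../../data/processed/Beitbridge_PS_AOI_Final.gpkg"
o_start_date = "2018-01-01"  # assumed YYYY-MM-DD format
o_end_date = "2023-12-31"
cc_thresh = 50        # omit images with more than 50% cloud cover
coverage_thresh = 80  # omit images covering less than 80% of the AOI

# Quick sanity checks before any API request is made.
assert 0 <= cc_thresh <= 100, "cc_thresh is a percentage"
assert 0 <= coverage_thresh <= 100, "coverage_thresh is a percentage"
assert date.fromisoformat(o_start_date) <= date.fromisoformat(o_end_date), "start date is after end date"
```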

### Stage 1: Search API

The first stage of the notebook queries the Earthcache API using the specified AOI and TOI and returns the search results.<p>

For longer time periods, the API may not return complete results. To compensate, time intervals longer than 90 days are split into separate queries, and the results of these queries are combined at the end of the process.<p> 
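
The 90-day splitting can be sketched with the standard library alone. This is a minimal sketch assuming inclusive start and end dates; `split_date_range` is a hypothetical helper, and the notebook's own chunk boundaries may differ.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_date_range(start: date, end: date, max_days: int = 90) -> List[Tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive chunks of at most max_days days."""
    if end < start:
        raise ValueError("end must not be before start")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # Inclusive chunk end: max_days days in total, clipped to the overall end date.
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks
```

For example, a query covering all of 2023 would be split into five sub-queries, the first running from 1 January to 31 March; each sub-query is sent separately and the results are concatenated afterwards.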

The image results are stored in a [pandas](https://pandas.pydata.org/) dataframe. This dataframe is then serialized into a [pickle file](https://docs.python.org/3/library/pickle.html), allowing the results to be retrieved if the notebook is shut down and resumed later.
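
The checkpointing pattern can be sketched as a small helper that loads a pickle if one already exists and otherwise runs the query and saves the result. `load_or_run` and any path passed to it are hypothetical; pandas also provides `DataFrame.to_pickle` and `pandas.read_pickle` for the same purpose. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Union


def load_or_run(pickle_path: Union[str, Path], compute: Callable[[], Any]) -> Any:
    """Return the object cached at pickle_path, or compute it and cache it there.

    Works for any picklable object, including a pandas DataFrame.
    """
    path = Path(pickle_path)
    if path.exists():
        # Resume: reuse the results saved by an earlier session.
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```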

### Stage 2: Filter Query DataFrame

In the first stage, all image results which cover at least 50% of the AOI and are within the TOI are returned. This includes images with up to 100% cloud cover, and may include multiple images on a specific date.<p>

In stage 2, the results are filtered down based on the AOI coverage and cloud coverage thresholds specified in the `coverage_thresh` and `cc_thresh` variables.<p>
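
A minimal sketch of the threshold filter follows, written over a list of dicts so it runs without pandas. The keys `cloud_cover` and `aoi_coverage` are hypothetical stand-ins for the dataframe's columns; on a dataframe, the same logic is a boolean mask such as `df[(df["cloud_cover"] <= cc_thresh) & (df["aoi_coverage"] >= coverage_thresh)]`.

```python
def filter_by_thresholds(images, cc_thresh, coverage_thresh):
    """Keep images with cloud cover <= cc_thresh and AOI coverage >= coverage_thresh.

    images: iterable of dicts with hypothetical keys "cloud_cover" and
    "aoi_coverage", both expressed as percentages.
    """
    return [
        img
        for img in images
        if img["cloud_cover"] <= cc_thresh and img["aoi_coverage"] >= coverage_thresh
    ]
```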

Finally, if multiple images are present on a given day, a selection function is applied to pick the "best" image for that day. The selection criteria are as follows:
- If any of the images have significantly lower AOI coverage than the others, they are removed.
- If there are still multiple images remaining, the program checks if any of the images are from the Dove-R or SuperDove satellites, as these are superior to the original Dove-Classic satellites. If this is the case, then any Dove-Classic scenes are removed.
- If there are STILL multiple images remaining, then the program simply takes the one with the highest AOI coverage.
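
The three rules above can be sketched as follows. The documentation does not quantify "significantly lower AOI coverage", so the `coverage_tolerance` of 10 percentage points is hypothetical, as are the sensor label strings and the dict keys; this is an illustration of the rules, not the notebook's selection function.

```python
from collections import defaultdict

# Hypothetical sensor labels; the notebook's actual values may differ.
CLASSIC = "Dove-Classic"
NEWER = {"Dove-R", "SuperDove"}


def select_best_per_day(images, coverage_tolerance=10.0):
    """Pick one image per date using the three selection rules.

    coverage_tolerance (percentage points) is a hypothetical stand-in for
    "significantly lower AOI coverage".
    """
    by_date = defaultdict(list)
    for img in images:
        by_date[img["date"]].append(img)

    selected = []
    for day in sorted(by_date):
        candidates = by_date[day]
        # Rule 1: drop images whose AOI coverage is far below the best on that day.
        best_cov = max(c["aoi_coverage"] for c in candidates)
        candidates = [c for c in candidates if best_cov - c["aoi_coverage"] <= coverage_tolerance]
        # Rule 2: if any Dove-R or SuperDove scene remains, drop the Dove-Classic scenes.
        if any(c["sensor"] in NEWER for c in candidates):
            candidates = [c for c in candidates if c["sensor"] != CLASSIC]
        # Rule 3: take the remaining image with the highest AOI coverage.
        selected.append(max(candidates, key=lambda c: c["aoi_coverage"]))
    return selected
```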

The filtered dataframe is once again serialized into a pickle file, allowing the notebook to be shut down and resumed later.

### Stage 3: Graph Generation

In this stage, a number of interactive graphs may be generated to allow the user to explore and visualize the returned query results.<p>

For example, the figure below displays the number of images per year for a test AOI at different levels of cloud cover, along with the estimated cost to download all the images.

```python
from pathlib import Path

from IPython.display import HTML

# Render a pre-generated interactive HTML figure inline in the notebook.
HTML(filename=Path("../../data/tdi_demo_files/search_queries/figures/PlanetScope_Annual_Image_Count_and_Cost_2018_to_2023_BeitBridge.html"))
```

While this stage is optional, it is helpful to understand the query results, and can be informative for the final image selection criteria to be used for image downloading.
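
The kind of summary shown in the annual image count figure can be approximated with a simple aggregation. This sketch assumes each query result carries a `date` string, a `cloud_cover` percentage and a per-image `cost`; those field names, the cloud cover levels, and treating cost as a per-image value are all assumptions, and the notebook may estimate cost differently (for example from area).

```python
from collections import defaultdict


def annual_counts_and_cost(images, cc_levels=(10, 25, 50, 100)):
    """Return {cc_level: {year: (image_count, total_cost)}}.

    Only images with cloud cover <= cc_level are counted for each level.
    images: dicts with hypothetical keys "date" ("YYYY-MM-DD"),
    "cloud_cover" (percent) and "cost" (USD per image).
    """
    summary = {}
    for level in cc_levels:
        per_year = defaultdict(lambda: [0, 0.0])
        for img in images:
            if img["cloud_cover"] <= level:
                year = int(img["date"][:4])
                per_year[year][0] += 1
                per_year[year][1] += img["cost"]
        summary[level] = {year: (n, cost) for year, (n, cost) in sorted(per_year.items())}
    return summary
```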

### Stage 4: Download Imagery

The last stage of this notebook downloads the images. The download criteria can differ from those used in the original query: for instance, a narrower date range, a stricter cloud cover or AOI coverage threshold, or only selected sensor types can be specified. Alternatively, the original query criteria can be reused for downloading the images.<p>

Once the final selection criteria are specified, the program displays the total number of images to download and the total cost, for example:

`Total number of images to download: 90. Total cost: $676.67 USD`

**Warning: Continuing with the image download in the next steps will incur the specified costs. There is no safety mechanism such as a confirmation window. Proceed with caution.**<p>
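
Because the notebook has no built-in confirmation step, one option is to add a manual guard in a cell before pipelines are created. The helper below is a hypothetical addition, not part of the notebook; `input_fn` is injectable only so the check can be tested without a keyboard.

```python
def confirm_download(n_images, total_cost, input_fn=input):
    """Ask the user to type 'yes' before any billable pipelines are created."""
    prompt = (
        f"About to order {n_images} images for a total of ${total_cost:,.2f} USD. "
        "Type 'yes' to continue: "
    )
    return input_fn(prompt).strip().lower() == "yes"


# Example use in the notebook, placed just before the pipeline-creation cell:
# if not confirm_download(90, 676.67):
#     raise RuntimeError("Download cancelled; no pipelines were created.")
```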

Images are downloaded through the API by setting up image download "pipelines". A separate pipeline is created for each image, and each pipeline appears in the Earthcache online dashboard. The pipeline starts image retrieval and processing, which must finish before the image can be downloaded. Note that once a pipeline is created, you will be billed for the image, even if you do not download it locally.<p>

Image download proceeds as follows:
- Image download pipelines are created for each image in the filtered dataframe. The IDs for each pipeline are appended to the dataframe.
- The dataframe with the appended pipeline IDs is saved to a pickle file, allowing it to be retrieved in case the notebook session terminates.
- Each pipeline may take up to an hour to finish processing. The program loops through the pipelines and checks their status. If some pipelines are still processing, the program waits for up to 30 minutes and then checks again, using progressively shorter wait intervals until all pipelines reach a "completed" status.
- Once all image pipelines are finished processing, the next step downloads the images locally. Both the image file and associated metadata are downloaded and placed in a directory named after the image ID.
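
The status-polling loop can be sketched as follows. `get_status` is a hypothetical callable wrapping whatever status request the notebook makes to the Earthcache API, and the halving schedule with a 5-minute floor is an assumption; the documentation only states that the waits start at up to 30 minutes and get progressively shorter.

```python
import time


def wait_for_pipelines(pipeline_ids, get_status, first_wait=30 * 60, min_wait=5 * 60, sleep=time.sleep):
    """Block until every pipeline reports a "completed" status.

    get_status(pipeline_id) -> str is a hypothetical callable wrapping the
    Earthcache status request. The wait starts at first_wait seconds and is
    halved after each round, but never drops below min_wait (both assumptions).
    """
    pending = set(pipeline_ids)
    wait = first_wait
    while True:
        # Re-check only the pipelines that were still processing last round.
        pending = {pid for pid in pending if get_status(pid) != "completed"}
        if not pending:
            return
        print(f"{len(pending)} pipeline(s) still processing; checking again in {wait / 60:.0f} min")
        sleep(wait)
        wait = max(min_wait, wait / 2)
```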

The first notebook is now finished, and you may move on to the second notebook to generate the TDI outputs.

## Notebook 2: tdi-notebook

### Initial Setup

As with Notebook 1, there is some initial work required prior to running Notebook 2.<p>

Firstly, if images were downloaded in multiple batches (e.g. using multiple date ranges), all downloaded images must be placed in a common directory.
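
If the batches were written to separate output folders, the per-image directories (each named after its image ID) can be gathered into one folder with the standard library. This sketch copies rather than moves, so the original batch folders are left intact; `consolidate_batches` and any folder names passed to it are hypothetical.

```python
import shutil
from pathlib import Path


def consolidate_batches(batch_dirs, common_dir):
    """Copy each per-image subdirectory from several batch folders into one folder.

    Returns the list of directories created in common_dir.
    """
    common = Path(common_dir)
    common.mkdir(parents=True, exist_ok=True)
    copied = []
    for batch in batch_dirs:
        for image_dir in sorted(Path(batch).iterdir()):
            if not image_dir.is_dir():
                continue
            target = common / image_dir.name
            if target.exists():
                # Same image ID already present (e.g. overlapping batches); keep the first copy.
                print(f"Skipping duplicate image directory: {image_dir.name}")
                continue
            shutil.copytree(image_dir, target)
            copied.append(target)
    return copied
```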

Secondly, a geometry polygon file must be created which outlines the roads and parking areas within the AOI that are to be included in the analysis.

### Setting Run Parameters

In JupyterLab, navigate to and open `tdi-notebook.ipynb`.<p>
At the top of the notebook (2nd code cell), specify the following variables:
- Set the `inpath` variable to the path of the directory containing the image files.<p>

### Stage 1: Align Images

### Stage 2: Computing Median Image

**Important Caveat:** As part of the TDI process, a reference image is created from the full image time series. Each pixel of this reference image holds the median value of that pixel across the entire time series, computed separately for each band. This median reference image is then compared with each individual image in the time series and used to calculate TDI. 
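
The median reference image can be sketched with NumPy, assuming the aligned images have been stacked into one array of shape `(time, bands, rows, cols)` on a common pixel grid and that nodata pixels are stored as NaN. `median_reference` is a hypothetical helper; how the notebook actually stacks images and handles nodata is not documented here.

```python
import numpy as np


def median_reference(stack: np.ndarray) -> np.ndarray:
    """Per-pixel, per-band median over the time axis.

    stack: array of shape (time, bands, rows, cols) of co-registered images.
    NaN values (assumed nodata) are ignored by np.nanmedian.
    Returns an array of shape (bands, rows, cols).
    """
    if stack.ndim != 4:
        raise ValueError("expected an array of shape (time, bands, rows, cols)")
    return np.nanmedian(stack.astype(np.float64), axis=0)
```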

### Stage 3: Tophat Filtering, Vehicle Detection, & Computing TDI