### Planet Analytics API Tutorial

# Summary Statistics: Ships

## Overview
    
1. [Introduction](#1.-Introduction)
2. [Post a stats job request](#2.-Post-a-stats-job-request)
3. [Poll the stats job endpoint](#3.-Poll-the-stats-job-endpoint)
4. [Get the job report results](#4.-Get-the-job-report-results)
5. [Restructure the results into a pandas dataframe](#5.-Restructure-the-results-into-a-pandas-dataframe)
6. [Visualize the time series](#6.-Visualize-the-time-series)
7. [Normalize and clean the report data](#7.-Normalize-and-clean-the-report-data)


## 1. Introduction

This notebook demonstrates how to request ship summary statistics for a subscription using the Analytics Feeds Stats API and visualize them as a time series, which supports further analyses such as patterns of life, development trends, and anomaly detection. Access to an object detection subscription (ships or planes) is required to run the notebook.

The workflow involves:
- Posting a stats job request
- Polling the stats job endpoint
- Getting the job report results
- Restructuring the results into a pandas dataframe
- Visualizing the time series
- Normalizing and cleaning the report data

#### Import and install external dependencies
This notebook requires hvplot, which may not be available in the main notebook Docker image.

In [None]:
!pip install hvplot

In [None]:
import os
import requests
import json
import pprint
import time
import pandas as pd
import holoviews as hv
import hvplot.pandas
from bokeh.models.formatters import DatetimeTickFormatter
from collections import defaultdict

## 2. Post a stats job request

### a) Check API Connection
_**Note:** If you do not have access to the Analytics Feeds API, you may not be able to run through these examples. Contact [Sales](https://go.planet.com/getintouch) to learn more._

In [None]:
ANALYTICS_BASE_URL = 'https://api.planet.com/analytics/'
# change this line if your API key is not set as an env var
API_KEY = os.environ['PL_API_KEY']
# alternatively, you can just set your API key directly as a string variable:
# API_KEY = "YOUR_PLANET_API_KEY_HERE"
# set up a reusable session with required headers
session = requests.Session()
session.headers.update({'content-type':'application/json','Authorization': 'api-key ' + API_KEY})
# make a request to the analytics api
resp = session.get(ANALYTICS_BASE_URL)
if resp.ok:
    print("Yay, you are able to connect to the Planet Analytics API!")
else:
    print("Something is wrong:", resp.content)


### b) Select your subscription
The Analytics stats API lets you create summary stats reports for your analytics subscriptions. You need the ID of a subscription of interest to make a stats request. This notebook uses the Singapore Strait ships subscription by default (`f3aef23c-a540-458e-a3b5-979b7920d2ea`).

In [None]:
# Make sure you have access to the subscription
subscription_id = 'f3aef23c-a540-458e-a3b5-979b7920d2ea'
resp = session.get(f"{ANALYTICS_BASE_URL}subscriptions/{subscription_id}")
if not resp.ok:
    raise Exception('Bad response:', resp.content)
else:
    print("Subscription info:")
    print(resp.json())

### c) Post a stats report job request to the Analytics Feeds API

In [None]:
request_body = {
    "title": "Stats Demo - Ships",
    "subscriptionID": subscription_id,
    "interval": "day",  # most object detection feeds generate results on a daily cadence
#     "collection": collection,  # remove this line if you want to use the default subscription geometry
#     "startTime": start_time,  # remove this line if you want to use the default subscription startTime
#     "endTime": end_time  # remove this line if you want to use the default subscription endTime
}

stats_post_url = ANALYTICS_BASE_URL + 'stats'

job_post_resp = session.post(
    stats_post_url, 
    data=json.dumps(request_body)
)

pprint.pprint(job_post_resp.json())
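
The commented-out keys in the request body above are optional. The following is a minimal sketch of building the body so that optional keys are sent only when a value is supplied. `build_stats_request` is a hypothetical helper and not part of the API, and the timestamp format in the usage lines is an assumption that should be checked against the API documentation.

```python
import json


def build_stats_request(subscription_id, title, interval="day",
                        start_time=None, end_time=None, collection=None):
    """Hypothetical helper: build a stats request body, omitting unset optional keys."""
    body = {
        "title": title,
        "subscriptionID": subscription_id,
        "interval": interval,
    }
    optional = {"startTime": start_time, "endTime": end_time, "collection": collection}
    # Only include keys the caller actually set, so subscription defaults apply otherwise.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


example_body = build_stats_request(
    "f3aef23c-a540-458e-a3b5-979b7920d2ea",
    "Stats Demo - Ships",
    start_time="2020-01-01T00:00:00Z",  # assumed timestamp format
)
print(json.dumps(example_body, indent=2))
```

The resulting dictionary can be passed to `session.post` in the same way as `request_body` above.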

## 3. Poll the stats job endpoint

In [None]:
job_link = job_post_resp.json()['links'][0]['href']
status = "pending"
while status != "completed":
    report_status_resp = session.get(
        job_link,
    )
    status = report_status_resp.json()['status']
    print(status)
    time.sleep(2)
    
    
pprint.pprint(report_status_resp.json())
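
The loop above runs until the status is `"completed"`, so it would never exit if a job stalls or fails. One possible guard, sketched below, is to bound the wait with a timeout. `poll_until_complete` is a hypothetical helper, and the fake status sequence stands in for the real `session.get(job_link).json()['status']` call so that the sketch runs offline. The `"pending"` value is just the placeholder used in the cell above, not a confirmed API status.

```python
import time


def poll_until_complete(fetch_status, interval_s=2.0, timeout_s=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns "completed" or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "completed":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} s")
        sleep(interval_s)


# Stand-in for the real status request; values are placeholders.
fake_statuses = iter(["pending", "pending", "completed"])
final_status = poll_until_complete(lambda: next(fake_statuses), interval_s=0.0)
print(final_status)
```

With the real API, `fetch_status` could be `lambda: session.get(job_link).json()['status']`.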

## 4. Get the job report results

In [None]:
report_results_link = report_status_resp.json()['links'][-1]['href']
report_results_link

In [None]:
results_resp = session.get(
    report_results_link,
)
print(results_resp.status_code)

## 5. Restructure the results into a pandas dataframe

In [None]:
def restructure_results(results_json):
    cols = results_json['cols']
    rows = results_json['rows']
    
    records = []
    for r in rows:
        rec = {}  # a plain dict is enough here; no default factory is needed
        for i, cell in enumerate(r):
            rec[cols[i]['label']] = cell
        records.append(rec)
        
    df = pd.DataFrame.from_records(records)
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df = df.set_index('Start Time')
    return df

In [None]:
df = restructure_results(results_resp.json())
df.head()
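
To make the restructuring step concrete, the sketch below runs the label-to-cell pairing on a tiny payload in plain Python. The payload is hypothetical: only the `cols`/`rows`/`label` layout and the two column names come from this notebook, and the values are invented.

```python
# Hypothetical payload in the shape restructure_results expects; values are invented.
sample_results = {
    "cols": [{"label": "Start Time"}, {"label": "Total Object Count"}],
    "rows": [
        ["2020-01-01T00:00:00Z", 12],
        ["2020-01-02T00:00:00Z", 15],
    ],
}

labels = [col["label"] for col in sample_results["cols"]]
# Pair each cell with its column label to get one dict per time point.
records = [dict(zip(labels, row)) for row in sample_results["rows"]]
print(records[0])
```

Each record becomes one dataframe row once it is passed to `pd.DataFrame.from_records`.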

## 6. Visualize the time series

In [None]:
hv.extension('bokeh')
formatter = DatetimeTickFormatter(months='%b %Y')

In [None]:
df['Total Object Count'].hvplot().options(xformatter=formatter, width=800)

## 7. Normalize and clean the report data

The graph above is likely very noisy because of clouds, haze, and variation in the amount of imagery per day. The steps below normalize the object count by the estimated area of usable imagery that the model observed. Planet currently provides two versions of an unusable data mask (UDM) for most scenes. The original UDM (version 1) is less accurate but is available for every scene. UDM2 is more accurate but is sometimes unavailable. The steps below use UDM2 to estimate the percentage of pixels that are usable (i.e., not cloudy), and the original UDM to estimate the total imaged area per day.

In [None]:
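# Use the fully qualified option name; the bare 'precision' key is ambiguous in recent pandas versions
pd.set_option('display.precision', 15)

# Get the total area of the subscription or submitted feature (sq m)
# .iloc[0] selects the first row by position, since the index is a DatetimeIndex
submitted_area = df['Submitted Area'].iloc[0]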

### a) Remove time points that contain < 50% clear imagery
On cloudy days, results are less likely to be accurate.

In [None]:
df['Clear Percentage'] = df['Clear Area (udm2_band_1)'] / df['Total Area (udm2)']
# .copy() avoids a SettingWithCopyWarning when new columns are added to the filtered frame
df = df[df['Clear Percentage'] > 0.5].copy()

### b) Remove time points where imagery coverage is < 50%
If only a small section of the AOI contains imagery, inferring the object count for the whole AOI is less accurate.

In [None]:
df['Imagery Coverage'] = df['Total Area (udm2)'] / submitted_area
# .copy() avoids a SettingWithCopyWarning when new columns are added to the filtered frame
df = df[df['Imagery Coverage'] > 0.5].copy()

### c) Estimate usable area per time point
Models can often detect objects through light haze and sometimes through heavy haze, so we use that rough information to create an estimated "usable percentage" metric. You can adjust the parameters if you know the model you're using performs better or worse in haze.

In [None]:
# Count 100% of light haze area as usable
light_haze_weight = 1.0
# Count 50% of heavy haze area as usable
heavy_haze_weight = 0.5

# Create a column that estimates the percentage of imagery where the model is expected to perform.
df['Usable Percentage'] = (df['Clear Area (udm2_band_1)'] + (df['Light Haze Area (udm2_band_4)'] * light_haze_weight) + (df['Heavy Haze Area (udm2_band_5)'] * heavy_haze_weight)) / df['Total Area (udm2)']
# Create a column that estimates usable area. In some cases udm2 assets are missing, so the most accurate measurement of total area that the model has seen comes from the udm Total Area column.
df['Usable Area'] = df['Usable Percentage'] * df['Total Area (udm)']
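
As a worked example of the weighting above, the sketch below applies the same formula to a single day in plain Python. `usable_fraction` is a hypothetical helper, and all area values are invented for illustration.

```python
def usable_fraction(clear, light_haze, heavy_haze, total_udm2,
                    light_haze_weight=1.0, heavy_haze_weight=0.5):
    """Weighted share of the udm2 area where the model is expected to perform."""
    weighted = clear + light_haze * light_haze_weight + heavy_haze * heavy_haze_weight
    return weighted / total_udm2


# Invented areas in square meters for a single day.
fraction = usable_fraction(clear=600.0, light_haze=100.0, heavy_haze=200.0, total_udm2=1000.0)
# 1200.0 stands in for that day's 'Total Area (udm)' value.
usable_area = fraction * 1200.0
print(fraction, usable_area)
```

Here the heavy haze area counts at half weight, so 600 + 100 + 100 of the 1000 square meters seen by udm2 are treated as usable, and that share is then applied to the udm total area.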

### d) Normalize the object count
Create a normalized object count by computing the object count per usable square meter and multiplying it by the total AOI size.

In [None]:
df['Normalized Count'] = round((df['Total Object Count'] / df['Usable Area']) * submitted_area).astype(int)
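
The normalization can be checked by hand on a single time point. The sketch below uses a hypothetical `normalized_count` helper and invented numbers to show the same arithmetic in plain Python.

```python
def normalized_count(object_count, usable_area, submitted_area):
    """Scale a raw count to the full AOI using the detection density in usable imagery."""
    density = object_count / usable_area  # objects per usable square meter
    return int(round(density * submitted_area))


# Invented numbers: 24 ships detected in 960 sq m of usable imagery, AOI of 2000 sq m.
print(normalized_count(24, 960.0, 2000.0))
```

A day with less usable imagery than the full AOI is scaled up, which is what makes counts from different days comparable.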

### e) Visualize the normalized data

In [None]:
max_count = df['Normalized Count'].max()
df['Normalized Count'].hvplot().options(xformatter=formatter, width=800, ylim=(0,max_count + (max_count * .1)))

If you're using the demo Singapore Strait ships subscription, the graph above should appear roughly flat, meaning that no major changes in ship counts were found in the subscription.