# Track ARs at consecutive time steps to form tracks

This notebook links the ARs detected at individual time steps into tracks.

Specifically, assume we have detected $n$ ARs at time $t$, and $m$ ARs at time $t+1$. There are theoretically $n \times m$ possible associations to link these two groups of ARs. Of course, not all of them are meaningful. The rules applied in the association process are:

1. **nearest neighbor** principle: for any AR at time $t$, the nearest AR at time $t+1$ "wins" and is associated with it, subject to the conditions below.
2. the **inter-AR distance (H)** must be $\le 1200 \, km$.
3. no merging or splitting is allowed: any AR at time $t$ can be linked to at most one AR at time $t+1$, and any AR at time $t+1$ can be linked to at most one AR at time $t$.
4. after all associations at any given time point have been created, any left-over AR at time $t+1$ forms a track of its own and waits to be associated in the next iteration between $t+1$ and $t+2$.
5. all tracks that do not get updated during the $t$ - $t+1$ process terminate. This assumes that no gap in the track is allowed.
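
The rules above can be sketched as a greedy matching on the $n \times m$ distance matrix. This is only an illustration under stated assumptions, not the code used by `trackARs()`: the function name `associate` and its return values are hypothetical, and competing candidates are resolved here by always accepting the closest remaining pair first.

```python
import numpy as np

def associate(dist, max_dist):
    """Greedy one-to-one linking (hypothetical helper, for illustration).

    dist[i, j] is the inter-AR distance (km) between AR i at time t
    and AR j at time t+1.
    """
    dist = np.asarray(dist, dtype=float)
    n, m = dist.shape
    # Rule 2: keep only candidate pairs within the distance threshold,
    # then visit them from the closest pair to the farthest (rule 1).
    pairs = sorted((dist[i, j], i, j)
                   for i in range(n) for j in range(m)
                   if dist[i, j] <= max_dist)
    used_i, used_j, links = set(), set(), []
    for _, i, j in pairs:
        # Rule 3: no merging or splitting, each AR is linked at most once.
        if i in used_i or j in used_j:
            continue
        links.append((i, j))
        used_i.add(i)
        used_j.add(j)
    # Rule 4: left-over ARs at t+1 start new tracks.
    new_tracks = [j for j in range(m) if j not in used_j]
    # Rule 5: ARs at t that were not linked end their tracks.
    ended = [i for i in range(n) if i not in used_i]
    return links, new_tracks, ended

# Made-up distances (km) between 2 ARs at time t and 3 ARs at time t+1
dist = [[300, 2000, 900],
        [500, 1500, 800]]
links, new_tracks, ended = associate(dist, max_dist=1200)
print(links, new_tracks, ended)   # [(0, 0), (1, 2)] [1] []
```

In this made-up case both ARs at time $t$ are closest to AR 0 at time $t+1$. The closer one takes it, the other falls back to its next candidate within the threshold, and AR 1 at time $t+1$ starts a new track.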

The remaining important question is how to define the **inter-AR distance (H)**. Here we adopt a modified *Hausdorff distance* definition:

\begin{equation}
	H(A,B) \equiv \min \{ h_f(A,B), h_b(A,B) \}
\end{equation}

where $H(A, B)$ is the *modified Hausdorff distance* between AR $A$ and AR $B$,
$h_f(A,B)$ is the *forward Hausdorff distance* from $A$ to $B$, and $h_b(A,B)$ is the *backward Hausdorff distance*, i.e. the directed distance from $B$ to $A$. They are defined, respectively, as:

\begin{equation}
	h_f(A, B) \equiv \operatorname*{max}_{a \in A} \{ \operatorname*{min}_{b \in B} \{
		d_g(a,b) \} \}
\end{equation}


namely, the largest of the great circle distances $d_g(a,b)$ from each point in $A$ to
its closest point in $B$. The backward Hausdorff distance is:

\begin{equation}
	h_b(A, B) \equiv \operatorname*{max}_{b \in B} \{ \operatorname*{min}_{a \in A} \{
		d_g(a,b) \} \}
\end{equation}

Note that in general $h_f \neq h_b$. Unlike the standard definition of
Hausdorff distance that takes the maximum of $h_f$ and $h_b$, we take the
minimum of the two. 
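
The three definitions above can be written down in a few lines of NumPy. This is a minimal sketch rather than the implementation inside `ipart`: the function names, the haversine formula used for $d_g$, and the Earth radius of 6371 km are all assumptions made for illustration, and each AR is represented simply as an array of (lat, lon) points.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle(lat1, lon1, lat2, lon2):
    """Haversine great circle distance in km; inputs in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def modified_hausdorff(A, B):
    """Return (H, h_f, h_b) for point sets A and B with (lat, lon) rows."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # d[i, j] is the distance between point i of A and point j of B
    d = great_circle(A[:, None, 0], A[:, None, 1],
                     B[None, :, 0], B[None, :, 1])
    h_f = d.min(axis=1).max()   # max over a in A of min over b in B
    h_b = d.min(axis=0).max()   # max over b in B of min over a in A
    return min(h_f, h_b), h_f, h_b

# Two made-up axes on the equator: A spans 0-20E, B only 0-10E
A = [(0.0, lon) for lon in range(0, 21, 5)]
B = [(0.0, lon) for lon in range(0, 11, 5)]
H, h_f, h_b = modified_hausdorff(A, B)
print(H, h_f, h_b)   # h_f is roughly 1112 km, while h_b and H are 0
```

Here every point of `B` coincides with a point of `A`, so $h_b$ vanishes, while the eastern end of `A` is 10 degrees of longitude away from `B`, which gives the non-zero $h_f$.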

The rationale behind this modification is that merging/splitting of ARs mostly
happens in an end-to-end manner, during which a sudden increase/decrease in the
length of the AR induces misalignment among the anchor points. Specifically,
merging (splitting) tends to induce a large backward (forward) Hausdorff
distance. Therefore $\min \{ h_f(A,B), h_b(A,B) \}$ offers a more faithful
description of the spatial closeness of ARs. For merging/splitting events in a
side-to-side manner, this definition works just as well.
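
A toy calculation may help to see the end-to-end effect. To keep the arithmetic obvious, the sketch below assumes the anchor points lie on a straight line and uses plain 1-D positions in km instead of great circle distances. All numbers are made up, and the helper name `directed` is hypothetical.

```python
import numpy as np

def directed(P, Q):
    """Largest distance from a point in P to its closest point in Q."""
    d = np.abs(P[:, None] - Q[None, :])
    return d.min(axis=1).max()

# An AR at time t that is 4000 km long, with anchor points every 500 km
A = np.arange(0, 4001, 500.0)
# After an end-to-end split, only the first 2000 km remain at time t+1
B = np.arange(0, 2001, 500.0)

h_f = directed(A, B)   # forward: the far end of A has no nearby point in B
h_b = directed(B, A)   # backward: every point of B still sits on A

print('forward  =', h_f)              # forward  = 2000.0
print('backward =', h_b)              # backward = 0.0
print('standard =', max(h_f, h_b))    # standard = 2000.0
print('modified =', min(h_f, h_b))    # modified = 0.0
```

With a 1200 km threshold, the standard definition would reject this link even though the shortened AR lies entirely on top of the original one. The modified definition keeps it.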



In production you can use the `scripts/trace_ARs.py` script for this step.


## Input data

* `ar_records.csv`: a csv table storing various attributes for each detected AR appearance at all time steps. This is the output from the previous step.


## Steps

1. Make sure you have successfully run the previous notebook.
2. Execute the following code blocks in sequence.


## Results

* `ar_tracks_1984.csv`: a csv table listing various attributes for each AR track.
* `plots/ar_track_198405.png` (optional): plot of the geographical locations of the track with id `198405` during its evolution.
* `plots/linkages_scheme_simple_YYYY-MM-DD_HH-00-00.png` (optional): schematic illustration of the association process using the modified Hausdorff distance as the inter-AR distance measure between the time step of `YYYY-MM-DD_HH-00-00` and the one before it.


# Set parameters

As before, we first specify the locations of the input and output data using `RECORD_FILE` and `OUTPUTDIR`.

`SCHEMATIC` is a boolean flag that controls whether a schematic illustration of the track association process is created.

`LAT1`, `LAT2`, `LON1` and `LON2` define the domain over which results are plotted.

In [None]:
%matplotlib inline
import os

RECORD_FILE=os.path.join('1984', 'ar_records.csv')
OUTPUTDIR=os.path.join('.', '1984')

SCHEMATIC=True   # plot schematic or not

LAT1=0; LAT2=90; LON1=80; LON2=440         # domain to plot

Below are the important parameters used in the tracking process.

* `TIME_GAP_ALLOW`: int, hours, the temporal gap allowed to link 2 records. For instance, if the input data has 6-hourly resolution and `TIME_GAP_ALLOW` is set to 6, then only consecutive records are allowed to be linked. If `TIME_GAP_ALLOW` is 12, then a single gap in time is allowed.
* `TRACK_SCHEME`: 'simple' or 'full'. If 'simple', each track is a topological simple path, i.e. no merging or splitting is allowed. If 'full', merging and splitting are allowed. For most applications 'simple' makes good sense. The 'full' scheme is useful for case studies, e.g. when you are interested in how 2 particular ARs merge or split.
* `MAX_DIST_ALLOW`: float, $km$, maximum modified Hausdorff distance (H) to link 2 ARs. About $1000 \, km$ is a good choice for 6-hourly data, and this does not seem to be a sensitive parameter. If using daily data, you should probably choose a larger number.
* `MIN_DURATION`: int, minimum required number of hours of a track.
* `MIN_NONRELAX`: int, minimum required number of non-relaxed records in a track.
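
To make the meaning of `TIME_GAP_ALLOW` concrete, the check it implies can be sketched as below. The helper `within_gap` is hypothetical and only mirrors the description above. It is not taken from `ipart`.

```python
import pandas as pd

def within_gap(t_prev, t_next, time_gap_allow):
    """True if t_next follows t_prev by at most time_gap_allow hours."""
    dt = pd.Timestamp(t_next) - pd.Timestamp(t_prev)
    return pd.Timedelta(0) < dt <= pd.Timedelta(hours=time_gap_allow)

# 6-hourly data: consecutive records are 6 hours apart
print(within_gap('1984-01-01 00:00', '1984-01-01 06:00', 6))    # True
# One missing time step means a 12-hour separation
print(within_gap('1984-01-01 00:00', '1984-01-01 12:00', 6))    # False
print(within_gap('1984-01-01 00:00', '1984-01-01 12:00', 12))   # True
```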

In [None]:
# Int, hours, gap allowed to link 2 records. Should be the time resolution of
# the data.
TIME_GAP_ALLOW=6

# tracking scheme. 'simple': all tracks are simple paths.
# 'full': use the network scheme, tracks are connected by their joint points.
TRACK_SCHEME='simple'  # 'simple' | 'full'

# int, max Hausdorff distance in km to define a neighborhood relationship
MAX_DIST_ALLOW=1200  # km

# int, min duration in hrs to keep a track.
MIN_DURATION=24

# int, min number of non-relaxed records in a track to keep a track.
MIN_NONRELAX=1

Import modules

In [None]:
#--------Import modules-------------------------
import os, sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from ipart.AR_tracer import readCSVRecord, trackARs, filterTracks, plotAR

Then read in the data from the previous step -- the `csv` table containing AR records at individual time points.

Also make sure the output folder exists.

In [None]:
print('\n# Read in file:\n', RECORD_FILE)
ardf=readCSVRecord(RECORD_FILE)

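if not os.path.exists(OUTPUTDIR):
    os.makedirs(OUTPUTDIR)

# plot_dir is passed to trackARs() and used by the save step below
# regardless of SCHEMATIC, so always define and create it.
plot_dir=os.path.join(OUTPUTDIR, 'plots')
if not os.path.exists(plot_dir):
    os.makedirs(plot_dir)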

The tracking process is handled by a single function, `trackARs()`.

* `ardf` is the `pandas.DataFrame` object we just read in.
* `track_list` is a list of `AR` objects. The class definition can be found in `ipart.AR_tracer.py`.

In [None]:
track_list=trackARs(ardf, TIME_GAP_ALLOW, MAX_DIST_ALLOW,
        track_scheme=TRACK_SCHEME, isplot=SCHEMATIC, plot_dir=plot_dir)

We can take a peek at what `track_list` contains:

In [None]:
print('Number of AR tracks = ', len(track_list))
print(track_list[0])

In [None]:
track_list[0].data

In [None]:
track_list[0].duration

In [None]:
track_list[6].data

In [None]:
track_list[6].duration

Each `AR` object in `track_list` stores a sequence of AR records that form a single track. The 1st one, `track_list[0]`, is a short track with only 1 record. It will be filtered out by the minimum duration requirement of 24 hours.

The 7th one, `track_list[6]`, lasted for 36 hours.

To filter out short tracks, and those that consist only of *relaxed* AR records: 

In [None]:
#------------------Filter tracks------------------
track_list=filterTracks(track_list, MIN_DURATION, MIN_NONRELAX)
print(len(track_list))

Note that now the number of tracks has dropped to 9.
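
The two filtering criteria can be illustrated on plain `pandas` tables. This sketch is not the body of `filterTracks()`. It assumes that each track is a `DataFrame` with a `time` column and a boolean `is_relaxed` column (the latter name is an assumption), and that the duration is the time between the first and last record. The helper `keep_track` is hypothetical.

```python
import pandas as pd

def keep_track(df, min_duration, min_nonrelax):
    """True if the track is long enough and has enough non-relaxed records."""
    # Duration in hours between the first and the last record
    duration = (df['time'].max() - df['time'].min()) / pd.Timedelta(hours=1)
    # Number of records that are not flagged as relaxed
    n_nonrelax = int((~df['is_relaxed'].astype(bool)).sum())
    return duration >= min_duration and n_nonrelax >= min_nonrelax

six_hourly = pd.date_range('1984-01-01', periods=7, freq=pd.Timedelta(hours=6))

# A single record: duration 0 h
single = pd.DataFrame({'time': six_hourly[:1], 'is_relaxed': [False]})
# Seven 6-hourly records: duration 36 h, none relaxed
long_track = pd.DataFrame({'time': six_hourly, 'is_relaxed': [False] * 7})
# Same length, but every record is relaxed
all_relaxed = pd.DataFrame({'time': six_hourly, 'is_relaxed': [True] * 7})

for name, df in [('single', single), ('long', long_track),
                 ('all relaxed', all_relaxed)]:
    print(name, keep_track(df, min_duration=24, min_nonrelax=1))
# single False
# long True
# all relaxed False
```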

Let's plot the sequence of an AR. Only the AR axis is plotted, and a black-to-yellow color scheme indicates the evolution of the AR.

In [None]:
latax=np.arange(LAT1, LAT2)
lonax=np.arange(LON1, LON2)

plot_ar=track_list[7]

figure=plt.figure(figsize=(12,6),dpi=100)
ax=figure.add_subplot(111)
plotAR(plot_ar,latax,lonax,True,ax=ax)

## As the last step, save the results to disk.

In [None]:
#-------------------Save output-------------------
for ii in range(len(track_list)):
    tii=track_list[ii]
    trackidii='%d%d' %(tii.data.loc[0,'time'].year, ii+1)
    tii.data.loc[:,'trackid']=trackidii
    tii.trackid=trackidii

    if ii==0:
        trackdf=tii.data
    else:
        trackdf=pd.concat([trackdf,tii.data],ignore_index=True)

    figure=plt.figure(figsize=(12,6),dpi=100)
    ax=figure.add_subplot(111)
    plotAR(tii,latax,lonax,True,ax=ax)

    #----------------- Save plot------------
    plot_save_name='ar_track_%s' %trackidii
    plot_save_name=os.path.join(plot_dir,plot_save_name)
    print('\n# <river_tracker2>: Save figure to', plot_save_name)
    figure.savefig(plot_save_name+'.png',dpi=100,bbox_inches='tight')

    plt.close(figure)

#--------Save------------------------------------
abpath_out=os.path.join(OUTPUTDIR,'ar_tracks_1984.csv')
print('\n# Saving output to:\n',abpath_out)
if sys.version_info.major==2:
    np.set_printoptions(threshold=np.inf)
elif sys.version_info.major==3:
    np.set_printoptions(threshold=sys.maxsize)
trackdf.to_csv(abpath_out,index=False)
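
Once saved, the table can be read back and grouped by the `trackid` column written above. The snippet below uses a tiny made-up CSV with only two columns as a stand-in for `ar_tracks_1984.csv`. The real file carries many more attributes, and the values here are invented for illustration.

```python
import io
import pandas as pd

# Made-up stand-in for the saved track table
csv_text = """trackid,time
19841,1984-01-01 00:00:00
19841,1984-01-01 06:00:00
19842,1984-01-03 12:00:00
"""

# Read track ids as strings so they are not converted to integers
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['time'],
                 dtype={'trackid': str})

# First time, last time and number of records of each track
summary = df.groupby('trackid')['time'].agg(['min', 'max', 'count'])
summary['hours'] = (summary['max'] - summary['min']) / pd.Timedelta(hours=1)
print(summary)
```

For the real output, replace `io.StringIO(csv_text)` with the path stored in `abpath_out`.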