<div align="center" style="color:#336699"><b><h2>pyForTraCC - Radar Data Track Example</h2></b></div>
<hr style="border:2px solid #0077b9;">
<br/>
<div style="text-align: center;font-size: 90%;">
   <sup><a href="https://www.linkedin.com/in/helvecio-leal/"> Helvécio B. Leal Neto, <i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup><t>&nbsp;</t> 
    <sup><a href="https://www.linkedin.com/in/alan-calheiros-64a252160/">Alan J. P. Calheiros<i class="fab fa-lg fa-orcid" style="color: #a6ce39"></i></a></sup>
   <br/><br/>
    National Institute for Space Research (INPE)
    <br/>
    Avenida dos Astronautas, 1758, Jardim da Granja, São José dos Campos, SP 12227-010, Brazil
    <br/><br/>
    Contact: <a href="mailto:helvecio.neto@inpe.br">helvecio.neto@inpe.br</a>, <a href="mailto:alan.calheiros@inpe.br">alan.calheiros@inpe.br</a>
    <br/><br/>
    Last Update: Feb 12, 2025
</div>
<br/>
<div style="text-align: justify;  margin-left: 25%; margin-right: 25%;">
<b>Abstract.</b> This Jupyter notebook demonstrates the tracking process for radar data and shows some utilities of the pyForTraCC algorithm.
</div>    
<br/>
<div style="text-align: justify;  margin-left: 15%; margin-right: 15%;font-size: 75%; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 5px;">
    <b>This notebook is part of the <a href="https://github.com/fortracc/pyfortracc">pyfortracc</a> examples gallery</b>
    <div style="margin-left: 10px; margin-right: 10px; margin-top:10px">
      <p> Leal Neto, H.B.; Calheiros, A.J.P.;  pyForTraCC Algorithm. São José dos Campos, INPE, 2024. <a href="https://github.com/fortracc-project/pyfortracc" target="_blank"> Online </a>. </p>
    </div>
</div>

### Schedule

 [1. Installation](#install)<br>
 [2. Radar Input Data](#data)<br>
 [3. Read Function](#read)<br>
 [4. Tracking Parameters](#parameters)<br>
 [5. Tracking Routine](#track)<br>
 [6. Tracking Table](#trackingtable)<br>
 [7. Spatial Conversions](#convect)<br>
 [8. Explore Results](#results)

<a id='install'></a>
#### 1. Installation

The pyForTraCC package can be installed with the pip install command.

All dependencies will be installed in the current Python environment and the code will be ready to use.

In [None]:
# Install latest version of pyfortracc
!python -m pip install -q -U pyfortracc

In [None]:
# Import libraries and check pyfortracc version
import pyfortracc
print(f'pyfortracc version: {pyfortracc.__version__}')

<a id='data'></a>
#### 2. Radar Input Data

The data in this example is a small sample of scans from the S-band radar located in the city of Manaus, AM, Brazil.<br>
 The data were processed and published by Schumacher, Courtney and Funk, Aaron (2018), <br>
 and the full dataset is available on the ARM platform: https://www.arm.gov/research/campaigns/amf2014goamazon.<br>
 It is part of the GoAmazon2014/5 project and is named "Three-dimensional Gridded S-band Reflectivity and Radial Velocity<br>
 from the SIPAM Manaus S-band Radar dataset".<br>
https://doi.org/10.5439/1459573


In [None]:
# Download the example input files
!python -m pip install -q -U gdown

import gdown, zipfile, os, shutil

# Remove the existing input files
shutil.rmtree('input', ignore_errors=True)

# Download the input files
url = 'https://drive.google.com/uc?id=1UVVsLCNnsmk7_wOzVrv4H7WHW0sz8spg'
gdown.download(url, 'input.zip', quiet=False)
with zipfile.ZipFile('input.zip', 'r') as zip_ref:
    zip_ref.extractall()
os.remove('input.zip')

<a id='read'></a>
#### 3. Read Function

The downloaded files are gzip-compressed (.gz extension), but their content is in netCDF4 format. The variable that represents reflectivity is DBZc. The data has multiple elevation levels, and we arbitrarily chose level 5, which corresponds to the volumetric scan level at 2.5 km height. We extract this level and replace the fill value -9999 with NaN.
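
The two array operations in this step, flipping the rows and masking the fill value, can be tried on a small synthetic grid before touching the real files. The sketch below uses made-up values and a hypothetical `mask_and_flip` helper that is not part of pyForTraCC; it only assumes NumPy is available.

```python
import numpy as np

FILL_VALUE = -9999.0  # fill value used in the radar files, per the text above


def mask_and_flip(level):
    """Flip a 2D field top to bottom and replace the fill value with NaN."""
    flipped = level[::-1, :].astype(float)  # astype returns a copy, input untouched
    flipped[flipped == FILL_VALUE] = np.nan
    return flipped


# Synthetic 3x2 "reflectivity" level with two missing cells
level = np.array([[10.0, FILL_VALUE],
                  [25.0, 30.0],
                  [FILL_VALUE, 42.5]])
result = mask_and_flip(level)
print(result)
```

Working on a copy keeps the original array intact, which makes it easy to compare the field before and after masking.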

In [None]:
# Define the Read function
import gzip
import netCDF4
import numpy as np

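def read_function(path):
    """Read one radar file and return the 2D reflectivity field at 2.5 km."""
    variable = "DBZc"
    z_level = 5 # Elevation level 2.5 km
    # Decompress in memory and open the bytes as a netCDF dataset
    with gzip.open(path) as gz:
        with netCDF4.Dataset("dummy", mode="r", memory=gz.read()) as nc:
            data = nc.variables[variable][:].data[0,z_level, :, :][::-1, :]
            data[data == -9999] = np.nan # Replace the fill value with NaN
    return data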

To visualize the data, we use a pyForTraCC function that reads the files and creates an animation of the radar scans.

In [None]:
from pyfortracc import plot_animation

In [None]:
# Visualize the data using the plot_animation function
plot_animation(path_files='input/*.gz', # Path to the files
                          read_function=read_function, # Read function
                          num_frames=10, min_val=0, max_val=60, # Number of frames and colorbar range
                          cbar_title='dBZ') # Colorbar title

<a id='parameters'></a>
#### 4. Tracking Parameters (Name List)

For this example, we track reflectivity clusters at multiple thresholds and minimum sizes. <br>We arbitrarily select thresholds of 20, 30 and 35 dBZ with minimum cluster sizes of 5, 4 and 3 pixels, respectively. <br>The segmentation operator is `>=`, so pixels with values greater than or equal to each threshold are grouped into clusters, <br>and the time interval between observations is 12 minutes.<br>
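
To make the threshold and minimum-size rule concrete, the sketch below segments a tiny synthetic grid in plain Python. The `segment` helper and the grid values are hypothetical and only mimic the idea (4-connected labelling of pixels that satisfy `>=`); pyForTraCC's own clustering is more elaborate.

```python
from collections import deque


def segment(grid, threshold, min_size):
    """Return 4-connected clusters of cells >= threshold with at least min_size cells."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or grid[r][c] < threshold:
                continue
            # Breadth-first flood fill starting from this seed cell
            queue, cells = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] >= threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            # Discard clusters smaller than the minimum size
            if len(cells) >= min_size:
                clusters.append(sorted(cells))
    return clusters


# Synthetic reflectivity field in dBZ
grid = [
    [0, 22, 24, 0, 0, 21],
    [0, 31, 36, 25, 0, 0],
    [0, 32, 37, 36, 0, 0],
    [0, 23, 33, 0, 0, 0],
]

# Same thresholds and minimum sizes as used in this notebook
for threshold, min_size in zip([20, 30, 35], [5, 4, 3]):
    found = segment(grid, threshold, min_size)
    print(threshold, [len(cells) for cells in found])
```

In this toy field the isolated 21 dBZ pixel is dropped at the 20 dBZ level because it is smaller than the minimum size, while the main cell shrinks from 10 to 6 to 3 pixels as the threshold rises.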

*   ***Clustering Method***: The algorithm has two clustering methods: ndimage.label and ***DBSCAN***. The ndimage.label method simply labels connected pixels in the image, while the DBSCAN method groups pixels that lie within a spatial distance (epsilon) of each other. DBSCAN is the more robust of the two and can handle clusters with more complex shapes. For this example, we use the DBSCAN method with epsilon = 3 pixels of spatial distance to find the clusters.

*   ***Vector Correction Methods***: Several factors can distort the estimated movement of rain cells along their trajectory. One of them is the use of the centroid as the tracking target. This problem is associated with the shape of the tracked objects and with the mergers and splits that may occur during the development of precipitating systems. These problems are discussed in https://doi.org/10.3390/rs14215408. For this example, we use the Split, Merge, Inner Cores, Optical Flow and Ellipse methods to find the best vector for the tracking.
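
The difference between contiguous labelling and distance-based grouping can be illustrated without any library. The sketch below is a simplified stand-in for DBSCAN (it ignores the minimum-samples rule and simply links pixels that are at most `eps` apart); the `group_by_distance` helper and the pixel coordinates are hypothetical.

```python
import math


def group_by_distance(points, eps):
    """Group points so that points within eps of each other share a group."""
    groups = []
    unvisited = set(points)
    while unvisited:
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Collect every unvisited point within eps of the current point
            near = [p for p in unvisited if math.dist(p, current) <= eps]
            for p in near:
                unvisited.remove(p)
                group.append(p)
                frontier.append(p)
        groups.append(sorted(group))
    return sorted(groups)


# Two patches of pixels (row, col) separated by a gap, plus one distant pixel
pixels = [(0, 0), (0, 1), (0, 4), (0, 5), (9, 9)]

print(group_by_distance(pixels, eps=1))  # only touching pixels are linked
print(group_by_distance(pixels, eps=3))  # the gap between the patches is bridged
```

With `eps=1` the two patches stay separate, whereas `eps=3` joins them into a single group, which is the kind of behaviour that distinguishes a distance-based method from plain connected-pixel labelling.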

In [None]:
# Example Name list dictionary of mandatory parameters
name_list = {}
name_list['input_path'] = 'input/' # path to the input data
name_list['output_path'] = 'output/' # path to the output data
name_list['timestamp_pattern'] = 'sbmn_cappi_%Y%m%d_%H%M.nc.gz' # timestamp file pattern
name_list['thresholds'] = [20, 30, 35] # in dBZ
name_list['min_cluster_size'] = [5,4,3] # in number of points per cluster
name_list['operator'] = '>=' # '>=', '<=' or '=='
name_list['delta_time'] = 12 # in minutes
name_list['min_overlap'] = 20 # Minimum overlap between clusters in percentage

# Clustering method
name_list['cluster_method'] = 'dbscan' # DBSCAN Clustering method
name_list['eps'] = 3 # in pixels

# Vector correction methods
name_list['spl_correction'] = True # Perform the Splitting events
name_list['mrg_correction'] = True # Perform the Merging events
name_list['inc_correction'] = True # Perform the Inner Core vectors
name_list['opt_correction'] = True # Perform the Optical Flow method (New Method)
name_list['elp_correction'] = True # Perform the Ellipse method (New Method)
name_list['validation'] = True # Perform the validation of the best correction between times (t-1 and t)

# Optional parameters, if not set, the algorithm will not use geospatial information
name_list['lon_min'] = -62.1475 # Min longitude of data in degrees
name_list['lon_max'] = -57.8461 # Max longitude of data in degrees
name_list['lat_min'] = -5.3048 # Min latitude of data in degrees
name_list['lat_max'] = -0.9912 # Max latitude of data in degrees
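
The `timestamp_pattern` entry is written with `strftime`/`strptime` format codes, so it can be checked against a file name with the standard library alone. In the sketch below the file names are hypothetical examples that merely follow the pattern; how pyForTraCC parses the names internally is not shown here.

```python
from datetime import datetime, timedelta

pattern = 'sbmn_cappi_%Y%m%d_%H%M.nc.gz'  # same pattern as in name_list
delta_time = timedelta(minutes=12)        # same delta_time as in name_list

# Hypothetical file names that follow the pattern
files = ['sbmn_cappi_20140216_1200.nc.gz',
         'sbmn_cappi_20140216_1212.nc.gz',
         'sbmn_cappi_20140216_1236.nc.gz']

timestamps = [datetime.strptime(name, pattern) for name in files]

# A gap larger than delta_time suggests a missing scan between two files
for previous, current in zip(timestamps, timestamps[1:]):
    gap = current - previous
    status = 'ok' if gap == delta_time else 'gap of {}'.format(gap)
    print(current, status)
```

A check like this is a quick way to confirm that the pattern matches the input files before starting a long tracking run.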

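<a id='track'></a>
#### 5. Tracking Routine

The `track` function runs the tracking routine using the name list and the read function defined above.

In [None]: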
# Import the pyfortracc module
import pyfortracc as pf

# Track the clusters
pf.track(name_list, read_function, parallel=False)

<a id='trackingtable'></a>
#### 6. Tracking Table

The output of the concatenate step is an entity called the tracking table. The tracking table is the generalized output stored under the directory given by `name_list['output_path']` (here `output/track/trackingtable/`). The information obtained in the tracking process is stored in a tabular format and is organized according to the tracking time. Listed below are the names of the columns (output variables) and what they represent.

*   Each row of the tracking table is related to a cluster at its corresponding threshold level. 
*   The Tracking table structure provides a comprehensive view of grouped entities, facilitating analysis and understanding of patterns across different threshold levels.

Tracking table columns:

*   **timestamp** (datetime64[us]): Temporal information of cluster.
*   **uid** (float64): Unique identifier of cluster.
*   **iuid** (float64): Internal unique identifier of cluster.
*   **threshold_level** (int64): Level of threshold.
*   **threshold** (float64): Specific threshold.
*   **status** (object): Entity status (NEW, CONTINUOUS, SPLIT, MERGE, SPLIT/MERGE)
*   **u_, v_** (float64): Vector components.
*   **inside_clusters** (object): Number of inside clusters.
*   **size** (int64): Cluster size in pixels.
*   **min, mean, max, std** (float64): Descriptive statistics.
*   **delta_time** (timedelta64[us]): Temporal variation.
*   **file** (object): Associated file name.
*   **array_y, array_x** (object): Cluster array coordinates.
*   **vector_field** (object): Vector field associated with the cluster (MultiLineString).
*   **trajectory** (object): Cluster's trajectory.
*   **geometry** (object):  Boundary geometric representation of the cluster.
*   **lifetime** (int64): Cluster lifespan in minutes.
*   **expansion** The expansion rate between the clusters of consecutive times.
*   **u_spl** The u component of the cluster by Split Correction.
*   **v_spl** The v component of the cluster by Split Correction.
*   **u_mrg** The u component of the cluster by Merge Correction.
*   **v_mrg** The v component of the cluster by Merge Correction.
*   **u_inc** The u component of the cluster by Inner Cores Correction.
*   **v_inc** The v component of the cluster by Inner Cores Correction.
*   **u_opt** The u component of the cluster by Optical Flow Correction.
*   **v_opt** The v component of the cluster by Optical Flow Correction.
*   **u_elp** The u component of the cluster by Elliptical Correction.
*   **v_elp** The v component of the cluster by Elliptical Correction.
*   **u_noc** The u component of the cluster by No Correction.
*   **v_noc** The v component of the cluster by No Correction.
*   **far** The False Alarm Rate of the method, available when `name_list['validation']` is True.
*   **method** The best correction method, available when `name_list['validation']` is True.

One simple way to explore the tracking table is to query it with the duckdb library. DuckDB is a simple and efficient in-process database for Python that is queried with SQL and can return the results as pandas DataFrames.

In [None]:
import duckdb

# Connect to the database
con = duckdb.connect(database=':memory:', read_only=False)

# Read and filter the data from track table
tracking_table = con.execute(f"""SELECT * 
                             FROM parquet_scan('output/track/trackingtable/*.parquet',
                             union_by_name=True)
                             """).fetch_df()
# Display the tracking table
display(tracking_table.tail())

<a id='convect'></a>
#### 7. Convert Tracking Table to GeoSpatial Data

pyForTraCC has utility functions that convert the tracking table to geospatial data formats. These functions are useful for creating shapefiles, GeoJSON, and other spatial data formats. The converted data can be visualized in GIS software such as QGIS or ArcGIS. The output files are saved under the directory given by `name_list['output_path']`.
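
Geospatial output depends on the `lon_min`, `lon_max`, `lat_min` and `lat_max` entries of the name list. As a rough illustration, the sketch below maps an array index to a coordinate by linear interpolation over a regular grid. This is an assumption made for the example, not a description of pyForTraCC internals; the `pixel_to_lonlat` helper, the grid shape and the convention that row 0 lies at `lat_max` are all hypothetical.

```python
def pixel_to_lonlat(row, col, shape, bounds):
    """Map the centre of pixel (row, col) to (lon, lat) on a regular grid.

    Assumes row 0 is the northern edge and column 0 the western edge.
    """
    n_rows, n_cols = shape
    lon_min, lon_max, lat_min, lat_max = bounds
    lon_step = (lon_max - lon_min) / n_cols
    lat_step = (lat_max - lat_min) / n_rows
    lon = lon_min + (col + 0.5) * lon_step
    lat = lat_max - (row + 0.5) * lat_step
    return lon, lat


# Bounds as set in the name list of this notebook; the grid shape is made up
bounds = (-62.1475, -57.8461, -5.3048, -0.9912)
shape = (240, 240)

lon, lat = pixel_to_lonlat(row=120, col=120, shape=shape, bounds=bounds)
print(round(lon, 4), round(lat, 4))
```

If the bounds are omitted from the name list, no such mapping is possible and the results stay in array coordinates.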

In [None]:
# Import the pyfortracc.spatial_conversions module
from pyfortracc.spatial_conversions import boundaries, trajectories, vectorfield, clusters

The code below shows how to convert the cluster boundaries, trajectories and vector field from the tracking table to a geospatial data format. To check the converted data, look in the output directory given by `name_list['output_path']`.
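
For readers unfamiliar with GeoJSON, the sketch below builds a generic FeatureCollection containing one trajectory as a LineString, using only the standard library. It shows the general GeoJSON layout, not the exact files pyForTraCC writes; the coordinates and the `uid` property are hypothetical.

```python
import json

# Hypothetical trajectory of one cluster as (lon, lat) pairs
trajectory = [(-60.10, -3.20), (-60.05, -3.18), (-60.01, -3.15)]

feature = {
    'type': 'Feature',
    'geometry': {
        'type': 'LineString',
        'coordinates': [list(point) for point in trajectory],
    },
    'properties': {'uid': 1},  # hypothetical attribute name and value
}
collection = {'type': 'FeatureCollection', 'features': [feature]}

text = json.dumps(collection, indent=2)
print(text)

# Reading the text back yields the same structure
restored = json.loads(text)
```

Because GeoJSON is plain JSON, files in this format can be inspected with any text editor as well as opened in GIS software.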

In [None]:
print('The output path is:', name_list['output_path'] + 'geometry/')

# Get the boundaries of tracked clusters
boundaries(name_list=name_list,
           start_time=str(tracking_table.timestamp.min()),
           end_time=str(tracking_table.timestamp.max()),
           driver='GeoJSON')

# Get the trajectories of tracked clusters
trajectories(name_list=name_list,
             start_time=str(tracking_table.timestamp.min()),
             end_time=str(tracking_table.timestamp.max()),
             driver='GeoJSON')

# Get the vector field of tracked clusters
vectorfield(name_list=name_list,
             start_time=str(tracking_table.timestamp.min()),
             end_time=str(tracking_table.timestamp.max()),
             driver='GeoJSON')

The tracked clusters can be converted to netCDF format with the `clusters` function, which writes the clusters from the tracking table to a netCDF file. Each individual cluster is saved with its UID as the content of the Band variable. The output file is saved under the directory given by `name_list['output_path']`.

In [None]:
clusters(name_list=name_list,
         read_function=read_function,
         start_time=str(tracking_table.timestamp.min()),
         end_time=str(tracking_table.timestamp.max()))

<a id='results'></a>
#### 8. Explore the Results in the Tracking Table

To explore the results of the tracking process, we can use the tracking table. For this example, we find the clusters with the longest lifetime and show their tracks as an animation.
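
The selection that follows counts how many rows each `uid` has in the tracking table and keeps the two largest counts, using the row count as a proxy for lifetime. The same idea is sketched below in plain Python on hypothetical rows, assuming one row per `uid` and time step.

```python
from collections import Counter

# Hypothetical (uid, time) rows standing in for the tracking table
rows = [
    (1, '12:00'), (1, '12:12'), (1, '12:24'), (1, '12:36'),
    (2, '12:00'), (2, '12:12'),
    (3, '12:12'), (3, '12:24'), (3, '12:36'),
]

# Count rows per uid and keep the two uids with the most rows
counts = Counter(uid for uid, _ in rows)
longest = [uid for uid, _ in counts.most_common(2)]
print(longest)

# With scans 12 minutes apart, n rows span (n - 1) * 12 minutes
delta_time = 12
spans = {uid: (n - 1) * delta_time for uid, n in counts.items()}
print(spans)
```

Since the real tracking table holds one row per cluster and threshold level, filtering on a single threshold before counting gives a cleaner estimate.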

In [None]:
# Get the two clusters with the longest lifetime from the tracking table
maxlifetime = 2
max_lifetimes = tracking_table.groupby('uid').size().nlargest(maxlifetime).index.values
max_clusters = tracking_table[tracking_table['uid'].isin(max_lifetimes)]
print('The clusters with the highest lifetime are the uids: {}'.format(max_lifetimes))

Visualize the longest-lived systems from the tracking table.

In [None]:
# Visualize as an animation (note: in Colab the animation may occasionally fail; run the cell again if it does)
plot_animation(read_function=read_function, # Read function
                          name_list=name_list, # Name list dictionary
                          uid_list=max_lifetimes.tolist(), # List of uids
                          start_timestamp = max_clusters['timestamp'].min().strftime('%Y-%m-%d %H:%M:%S'), 
                          end_timestamp= max_clusters['timestamp'].max().strftime('%Y-%m-%d %H:%M:%S'),
                          cbar_title='dBZ', # Colorbar title
                          threshold_list=[20], # Threshold list
                          trajectory=True, # Plot the trajectory
                          max_val=60, # Maximum value for the colorbar
                          min_val=20, # Minimum value for the colorbar
                          info_cols=['uid','method','far'], # Information columns from the tracking table
                          background='google', # Background type: 'default', 'stock', 'satellite' or 'google'
                          traj_color='red', # Trajectory color
                          parallel=False
                          )