<span style="color:#888888">Copyright (c) 2014-2021 National Technology and Engineering Solutions of Sandia, LLC. Under the terms of Contract DE-NA0003525 with National Technology and Engineering Solutions of Sandia, LLC, the U.S. Government retains certain rights in this software.     Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</span>

<span style="color:#888888">1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.</span>

<span style="color:#888888">2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.</span>

<span style="color:#888888">THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>

# <span style="color:#0054a8">**Tutorial 6:**</span> <span style="color:#555555">Filtering Trajectories</span>

In [None]:
import tracktable.examples.tutorials.tutorial_helper as tutorial 

## Purpose

It is often desirable to remove unwanted trajectories from a dataset before beginning analysis.  Tracktable's `geomath` module includes a large suite of functions that support trajectory filtering as well as general analysis.  We demonstrate several filtering techniques here.

## Import Example Trajectories

The function below will assemble trajectories from our sample data file $^1$.  For details, please see Tutorials [1](Tutorial_01.ipynb) & [2](Tutorial_02.ipynb).

In [None]:
trajectories = tutorial.get_trajectory_list()

Let's print some summary statistics about our trajectories.  Each statistic is explained in the section below that filters on it.

In [None]:
tutorial.print_statistics(trajectories, 'all')

## Filtering using Tracktable's `geomath`

### Filtering by Distance Traveled

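Suppose we know that our dataset contains many short trajectories (say, less than 5 km in length) that we wish to filter out.  This can be done with `geomath`'s `length` function, which returns the distance traveled along a trajectory (in kilometers for terrestrial trajectories).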
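For intuition about what `length` measures, here is a plain-Python sketch (not Tracktable's implementation) that sums haversine great-circle distances between consecutive (longitude, latitude) points. The helper names and the 6371 km mean Earth radius are assumptions for illustration; Tracktable's own formula and constant may differ slightly.

```python
import math

# Assumed mean Earth radius in km; Tracktable's exact constant may differ.
EARTH_RADIUS_KM = 6371.0

def great_circle_km(p, q):
    """Haversine distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def path_length_km(points):
    """Distance traveled: sum of great-circle distances between consecutive points."""
    return sum(great_circle_km(p, q) for p, q in zip(points, points[1:]))

# Hypothetical track: three points along the equator, 0.05 degrees apart.
track = [(0.0, 0.0), (0.05, 0.0), (0.10, 0.0)]
print(round(path_length_km(track), 2))  # 11.12
```

This hypothetical track covers about 11.1 km, so it would survive a 5 km length threshold.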

In [None]:
length_threshold = 5 # km

In [None]:
from tracktable.core.geomath import length

trajectories_filtered_by_length = [trajectory for trajectory in trajectories if length(trajectory) > length_threshold]

How many are left?

In [None]:
len(trajectories_filtered_by_length)

What lengths do the remaining trajectories have?

In [None]:
tutorial.print_statistics(trajectories_filtered_by_length, 'length')

### Filtering by Straightness

In some datasets (such as air traffic) we can anticipate that many of our trajectories will be nearly straight, and we may wish to filter these out as "uninteresting" to make analysis faster on the remaining trajectories.

The function below will calculate the "straightness" of a trajectory.  This is done by comparing the distance between the trajectory's endpoints to the distance traveled along the trajectory.  If their ratio is 1, the trajectory traveled in a straight line.  As the ratio decreases, we consider the trajectory to be less straight.  A ratio of zero means the trajectory's origin and destination are the same, so it could not have traveled a straight line.

In [None]:
from tracktable.core.geomath import length, end_to_end_distance

def calculate_straightness(trajectory):
    
    # get the distance between endpoints of a trajectory
    end_to_end_dist = end_to_end_distance(trajectory)
    
    # get the distance traveled along the trajectory
    dist_traveled = length(trajectory)
    
    # if the trajectory doesn't move, it is not straight
    if dist_traveled == 0:
        return 0
    
    # measure how well the trajectory followed the straight path
    return end_to_end_dist / dist_traveled

Since straightness ranges from 0 to 1, so must our threshold.  Raising the threshold means fewer trajectories are discarded for being too straight.
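A worked example may make the ratio more concrete. The sketch below is a hypothetical planar version of `calculate_straightness` that uses ordinary Euclidean distances on (x, y) points in place of Tracktable's `length` and `end_to_end_distance`: a straight path scores 1, an L-shaped path of 3 then 4 units scores 5/7 ≈ 0.714, and a round trip scores 0.

```python
import math

def planar_straightness(points):
    """End-to-end distance divided by path length, for (x, y) points in a flat plane."""
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if path == 0:
        # A trajectory that never moves is treated as not straight.
        return 0.0
    return math.dist(points[0], points[-1]) / path

straight = [(0, 0), (1, 0), (2, 0)]    # 2 / 2 = 1.0
l_shaped = [(0, 0), (3, 0), (3, 4)]    # 5 / 7, about 0.714
round_trip = [(0, 0), (2, 0), (0, 0)]  # 0 / 4 = 0.0

# Keep only trajectories that are less straight than the threshold.
example_threshold = 0.9
examples = [("straight", straight), ("l_shaped", l_shaped), ("round_trip", round_trip)]
kept = [name for name, pts in examples if planar_straightness(pts) < example_threshold]
print(kept)  # ['l_shaped', 'round_trip']
```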

In [None]:
straightness_threshold = 0.9

Let's discard all trajectories with straightness 0.9 or higher.

In [None]:
trajectories_filtered_by_straightness = [trajectory for trajectory in trajectories if calculate_straightness(trajectory) < straightness_threshold]

How many are left?

In [None]:
len(trajectories_filtered_by_straightness)

How straight are the remaining trajectories?

In [None]:
tutorial.print_statistics(trajectories_filtered_by_straightness, 'straightness')

### Filtering by Area Covered

The convex hull is the smallest convex polygon that encloses the entire trajectory (imagine each trajectory point to be a peg on a board, and we are stretching a rubber band around this set of pegs).  The area of this polygon can give us insight into the breadth of travel of a trajectory, and can be calculated using the `convex_hull_area` function of `geomath`.
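To illustrate the idea, the following sketch computes a convex hull with Andrew's monotone chain algorithm and its area with the shoelace formula. It assumes the points have already been projected onto a flat plane in kilometers; Tracktable's `convex_hull_area` works with longitude/latitude directly and may use a different algorithm, so treat this only as an illustration. The helper names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices of planar (x, y) points, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a left (counter-clockwise) turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so drop it.
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula; returns 0 for fewer than three vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    s = sum(vertices[i][0] * vertices[(i + 1) % n][1] - vertices[(i + 1) % n][0] * vertices[i][1]
            for i in range(n))
    return abs(s) / 2.0

# Hypothetical tracks in km: one sweeps a 1 km square, one drifts near an anchorage.
moving = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
anchored = [(0, 0), (0.1, 0), (0.05, 0.1), (0.05, 0.02)]
print(polygon_area(convex_hull(moving)), polygon_area(convex_hull(anchored)))
```

The drifting track encloses only about 0.005 km², well under a 0.2 km² threshold, while the sweeping track encloses 1 km².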

For example, in some maritime datasets, we may wish to remove anchored boats from our dataset.  This can be done by filtering out trajectories with a small `convex_hull_area`, e.g., less than 0.2 km$^2$, as shown below.

In [None]:
convex_hull_area_threshold = 0.2 # square km

In [None]:
from tracktable.core.geomath import convex_hull_area

trajectories_filtered_by_area = [trajectory for trajectory in trajectories if convex_hull_area(trajectory) > convex_hull_area_threshold]

How many are left?

In [None]:
len(trajectories_filtered_by_area)

What are the convex hull areas of the remaining trajectories?

In [None]:
tutorial.print_statistics(trajectories_filtered_by_area, 'convex hull area')

### Filtering by Average Speed

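We may wish to remove slow-moving objects (such as tugboats in maritime data) from our dataset.  Applying the `speed_between` function to a trajectory's first and last points (the straight-line distance between them divided by the elapsed time) gives a quick estimate of its average speed, which we can use to filter out slow trajectories as shown below.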
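Because the speed is computed only from the endpoints, it is a net speed: a trajectory that returns to where it started looks stationary no matter how fast it moved. The hypothetical planar sketch below (plain Python, with points as (x_km, y_km, timestamp) tuples rather than Tracktable objects) contrasts that with the average speed along the path.

```python
import math
from datetime import datetime, timedelta, timezone

def net_speed_kmh(points):
    """Straight-line distance between first and last point divided by elapsed hours.
    Assumes the first and last timestamps differ."""
    (x0, y0, t0), (x1, y1, t1) = points[0], points[-1]
    hours = (t1 - t0).total_seconds() / 3600
    return math.hypot(x1 - x0, y1 - y0) / hours

def path_speed_kmh(points):
    """Distance traveled along the path divided by elapsed hours."""
    dist = sum(math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(points, points[1:]))
    hours = (points[-1][2] - points[0][2]).total_seconds() / 3600
    return dist / hours

start = datetime(2020, 6, 30, 0, 0, tzinfo=timezone.utc)
# A hypothetical boat that runs 5 km out and 5 km back in one hour.
shuttle = [(0, 0, start),
           (5, 0, start + timedelta(minutes=30)),
           (0, 0, start + timedelta(hours=1))]
print(net_speed_kmh(shuttle))   # 0.0
print(path_speed_kmh(shuttle))  # 10.0
```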

In [None]:
avg_speed_threshold = 1 # km/h

In [None]:
from tracktable.core.geomath import speed_between

trajectories_filtered_by_avg_speed = [trajectory for trajectory in trajectories if speed_between(trajectory[0], trajectory[-1]) > avg_speed_threshold]

How many are left?

In [None]:
len(trajectories_filtered_by_avg_speed)

What average speeds do the remaining trajectories have?

In [None]:
tutorial.print_statistics(trajectories_filtered_by_avg_speed, 'average speed')

### Filtering by Spatial Window

If we only want to keep trajectories within a given spatial window, we can filter as demonstrated below.  This filter is fast because it examines only each trajectory's bounding box rather than every point.

*Reminder:* Tracktable uses the ordering (longitude, latitude) to match the traditional Cartesian (x,y) ordering.
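The two modes of a window test reduce to two bounding-box comparisons, sketched below in plain Python on (longitude, latitude) tuples with hypothetical helper names (not part of Tracktable). Containment requires the trajectory's box to lie strictly inside the window; overlap only requires the two boxes to intersect. Overlap is the appropriate test for trajectories that cross the window boundary, since a trajectory that passes all the way through the window has no bounding-box corner inside it.

```python
def bbox_of(points):
    """Return ((min_lon, min_lat), (max_lon, max_lat)) for a list of (lon, lat) tuples."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats)), (max(lons), max(lats))

def box_within(inner, outer):
    """True if box `inner` lies strictly inside box `outer`."""
    (ilo_x, ilo_y), (ihi_x, ihi_y) = inner
    (olo_x, olo_y), (ohi_x, ohi_y) = outer
    return olo_x < ilo_x and ihi_x < ohi_x and olo_y < ilo_y and ihi_y < ohi_y

def boxes_overlap(a, b):
    """True if boxes `a` and `b` share some interior area."""
    (alo_x, alo_y), (ahi_x, ahi_y) = a
    (blo_x, blo_y), (bhi_x, bhi_y) = b
    return alo_x < bhi_x and blo_x < ahi_x and alo_y < bhi_y and blo_y < ahi_y

window = ((-74.1, 40.5), (-73.9, 40.6))
inside = [(-74.05, 40.52), (-74.00, 40.55)]
crossing = [(-74.2, 40.55), (-73.8, 40.56)]  # passes through the window west to east

print(box_within(bbox_of(inside), window))      # True
print(box_within(bbox_of(crossing), window))    # False
print(boxes_overlap(bbox_of(crossing), window)) # True
```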

In [None]:
from tracktable.core.geomath import compute_bounding_box

def remove_trajectories_outside(trajectories, min_lon, min_lat, max_lon, max_lat, strictly_within=True):
    
    trajectories_in_window = []
    
    for trajectory in trajectories:
        
        # Get the bounding box of the current trajectory.
        bounding_box = compute_bounding_box(trajectory)
        box_min_lon, box_min_lat = bounding_box.min_corner[0], bounding_box.min_corner[1]
        box_max_lon, box_max_lat = bounding_box.max_corner[0], bounding_box.max_corner[1]
        
        if strictly_within:
            # The trajectory is entirely inside the window only if both corners of its bounding box are.
            keep = (min_lon < box_min_lon and box_max_lon < max_lon
                    and min_lat < box_min_lat and box_max_lat < max_lat)
        else:
            # Otherwise, keep the trajectory if its bounding box overlaps the window at all.
            # This is approximate: an overlapping bounding box does not guarantee that
            # any individual point of the trajectory lies inside the window.
            keep = (box_min_lon < max_lon and box_max_lon > min_lon
                    and box_min_lat < max_lat and box_max_lat > min_lat)
        
        if keep:
            trajectories_in_window.append(trajectory)
            
    return trajectories_in_window

In [None]:
trajectories_in_window = remove_trajectories_outside(trajectories, -74.1, 40.5, -73.9, 40.6, strictly_within=True) # change strictly_within to False to include trajectories that cross the boundary of our window

How many trajectories are strictly within the window with bottom left corner (-74.1, 40.5) and top right corner (-73.9, 40.6)?

In [None]:
len(trajectories_in_window)

## Other Filtering Techniques

### Trim Redundant Points

A dataset's storage footprint can be significantly reduced by removing redundant points: the interior points of any run of consecutive points with unchanged longitude/latitude coordinates.  We can use Tracktable to write a function, `trim_redundant_points`, that removes them.

*Example:* If our trajectory consists of the sequential points $p_0$, $p_1$, $p_2$, $p_3$, $p_4$, $p_5$, $p_6$, $p_7$, $p_8$, and points $p_1$, $p_2$, $p_3$, $p_4$, and $p_6$ all occur at exactly the same location, we can remove $p_2$ and $p_3$ without losing any spatiotemporal information.  Point $p_6$ is kept because $p_5$, at a different location, breaks the run.
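The same run-trimming idea can be sketched without Tracktable on plain (longitude, latitude, label) tuples. This hypothetical `trim_runs` helper keeps the first and last point of each run of consecutive colocated points; unlike the Tracktable version below, it does not compare timestamps. Applied to the example above, it drops only $p_2$ and $p_3$.

```python
def trim_runs(points):
    """Keep only the first and last point of each run of consecutive colocated points.
    Points are hypothetical (lon, lat, label) tuples; only lon/lat are compared."""
    trimmed = []
    i = 0
    while i < len(points):
        j = i
        # Advance j to the last point of the run that starts at i.
        while j + 1 < len(points) and points[j + 1][:2] == points[i][:2]:
            j += 1
        trimmed.append(points[i])
        if j > i:
            trimmed.append(points[j])
        i = j + 1
    return trimmed

A = (-74.00, 40.60)  # shared by p1, p2, p3, p4 and p6
pts = [(-74.05, 40.52, "p0"),
       (*A, "p1"), (*A, "p2"), (*A, "p3"), (*A, "p4"),
       (-74.02, 40.58, "p5"),
       (*A, "p6"),
       (-73.98, 40.61, "p7"), (-73.95, 40.62, "p8")]
print([p[2] for p in trim_runs(pts)])  # ['p0', 'p1', 'p4', 'p5', 'p6', 'p7', 'p8']
```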

In [None]:
from tracktable.domain.terrestrial import Trajectory

# Determines if the longitude and latitude of two points match.
def colocated(point1, point2):
    return point1[0] == point2[0] and point1[1] == point2[1]

# Determines if the timestamps of two points match.
def cotimed(point1, point2):
    return point1.timestamp == point2.timestamp

# Removes the middle points from any sequence of colocated points.
def trim_redundant_points(trajectory):
    trimmed_trajectory = Trajectory()
    if len(trajectory) == 0:
        return trimmed_trajectory

    # Initialize our interval of points at the same location to include only the first point.
    first_point_at_location = trajectory[0]
    last_point_at_location = trajectory[0]
    
    for point in trajectory[1:]:
        if colocated(first_point_at_location, point):
            # Extend our interval of points at the same location to include this point.
            last_point_at_location = point
        else:
            # Keep the first point of the interval of points at the same location.
            trimmed_trajectory.append(first_point_at_location)
            # Keep the last point of the interval too, unless it is identically timed to the first.
            if not cotimed(first_point_at_location, last_point_at_location):
                trimmed_trajectory.append(last_point_at_location)
            # Reinitialize the interval of points at the same location to include only the current point.
            first_point_at_location = point
            last_point_at_location = point
    
    # The loop never closes the final interval, so always keep its first point,
    # plus its last point if the two are timed differently.
    trimmed_trajectory.append(first_point_at_location)
    if not cotimed(first_point_at_location, last_point_at_location):
        trimmed_trajectory.append(last_point_at_location)

    return trimmed_trajectory

In [None]:
trajectories_with_redundant_points_removed = [trim_redundant_points(trajectory) for trajectory in trajectories]

We still have the same number of trajectories...

In [None]:
len(trajectories)

In [None]:
len(trajectories_with_redundant_points_removed)

... but we have reduced the number of trajectory points in our data:

In [None]:
tutorial.print_statistics(trajectories, 'total points')

In [None]:
tutorial.print_statistics(trajectories_with_redundant_points_removed, 'total points')

### Filtering by Time Window

We can keep only the trajectories that lie entirely within a given time range, discarding any that begin before it or end after it, as follows:
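The filter below keeps a trajectory only if its whole time span lies inside the window. If you instead want every trajectory observed at any moment during the window, an interval-overlap test is the usual alternative. Both are sketched here in plain Python with hypothetical helper names, using timezone-aware datetimes.

```python
from datetime import datetime, timezone

def within_window(start, finish, begin, end):
    """True if the interval [start, finish] lies entirely inside [begin, end]."""
    return begin <= start and finish <= end

def overlaps_window(start, finish, begin, end):
    """True if [start, finish] shares at least one instant with [begin, end]."""
    return start <= end and begin <= finish

utc = timezone.utc
window_begin = datetime(2020, 6, 30, 0, 0, tzinfo=utc)
window_end = datetime(2020, 6, 30, 0, 30, tzinfo=utc)

# A hypothetical trajectory observed from 00:20 to 00:45.
obs_start = datetime(2020, 6, 30, 0, 20, tzinfo=utc)
obs_end = datetime(2020, 6, 30, 0, 45, tzinfo=utc)
print(within_window(obs_start, obs_end, window_begin, window_end))    # False
print(overlaps_window(obs_start, obs_end, window_begin, window_end))  # True
```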

In [None]:
from datetime import timedelta, timezone, datetime

# This format can be changed to match your data.
time_format = '%Y-%m-%d %H:%M:%S'

# Let's trim down to the first thirty minutes of the day.
begin = datetime.strptime('2020-06-30 00:00:00', time_format)
end = datetime.strptime('2020-06-30 00:30:00', time_format)

# Tracktable timestamps are timezone-aware (UTC), and Python cannot compare
# naive and aware datetimes, so attach the UTC time zone to begin/end.
begin = begin.replace(tzinfo=timezone.utc)
end = end.replace(tzinfo=timezone.utc)

In [None]:
trajectories_in_first_thirty_minutes = [trajectory for trajectory in trajectories
                                        if begin <= trajectory[0].timestamp and trajectory[-1].timestamp <= end]

How many trajectories lie entirely within the first thirty minutes of the day?

In [None]:
len(trajectories_in_first_thirty_minutes)

## Additional Functionality

Tracktable's `geomath` module contains many additional functions not shown here.  For a complete list, please see the [geomath documentation](https://tracktable.readthedocs.io/en/latest/api/python/tracktable.core.geomath.html?highlight=geomath#tracktable-core-geomath-module).

<span style="color:gray">$^1$ Bureau of Ocean Energy Management (BOEM) and National Oceanic and Atmospheric Administration (NOAA). MarineCadastre.gov. *AIS Data for 2020.* Retrieved February 2021 from [marinecadastre.gov/data](https://marinecadastre.gov/data/).  Trimmed down to the first hour of June 30, 2020, and restricted to NY Harbor.</span>