## Analyzing the agent filtering and masking strategy

In [1]:
import os
import debugpy

debugpy.listen(5678)

from l5kit.data import ChunkedDataset, LocalDataManager

In [2]:
os.environ["L5KIT_DATA_FOLDER"] = "/home/nisarkavungal_gmx_com/lyft-data"

In [3]:
dm = LocalDataManager()
train_zarr_path = dm.require("scenes/train.zarr")
train_zarr_path

'/home/nisarkavungal_gmx_com/lyft-data/scenes/train.zarr'

In [4]:
train_chunked_ds = ChunkedDataset(train_zarr_path)
train_chunked_ds = train_chunked_ds.open()

In [5]:
train_conf = {
'format_version': 4,
'model_params': {    # these parameters determine the model architecture and the dataset input/output shapes
    'model_architecture': 'resnet50',
    'history_num_frames': 10, # number of past frames returned with each dataset item
    'history_step_size': 1,
    'history_delta_time': 0.1,
    'future_num_frames': 50, # number of future frames used as the target; the model predicts 50 frames ahead
    'future_step_size': 1,
    'future_delta_time': 0.1
    },
'raster_params': {
    'raster_size': [512, 512], # size of the rasterized image in pixels
    'pixel_size': [0.5, 0.5], # meters on the map covered by each pixel
    'ego_center': [0.25, 0.5], # position of the agent of interest within the raster; [0.5, 0.5] puts it at the center
    'map_type': 'py_satellite', # satellite or semantic map
    'satellite_map_key': 'aerial_map/aerial_map.png', # full aerial map of the region
    'semantic_map_key': 'semantic_map/semantic_map.pb', # full semantic map of the region
    'dataset_meta_key': 'meta.json',
    'filter_agents_threshold': 0.5 # agents are labelled with class probabilities in each frame; ignore agents below 0.5
    },
'val_data_loader': {
    'key': 'scenes/sample.zarr',
    'batch_size': 12,
    'shuffle': False,
    'num_workers': 16
    }
}
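To make the configuration concrete, the sketch below works out what these values imply: the ground area covered by one raster, the pixel position of the agent, and the history and future horizons in seconds. It assumes `ego_center` is a fraction of the raster size along each axis (as the config comment suggests); the helper names `raster_extent_m` and `ego_pixel` are made up for illustration and are not part of l5kit.

```python
# Values copied from train_conf above.
raster_size = (512, 512)   # pixels
pixel_size = (0.5, 0.5)    # meters per pixel
ego_center = (0.25, 0.5)   # assumption: fraction of the raster along each axis
history_num_frames, future_num_frames = 10, 50
delta_time = 0.1           # seconds between consecutive frames


def raster_extent_m(size_px, px_m):
    """Ground area covered by the raster, in meters (hypothetical helper)."""
    return tuple(s * p for s, p in zip(size_px, px_m))


def ego_pixel(size_px, center):
    """Pixel position of the agent inside the raster (hypothetical helper)."""
    return tuple(s * c for s, c in zip(size_px, center))


extent = raster_extent_m(raster_size, pixel_size)
ego_px = ego_pixel(raster_size, ego_center)
history_s = round(history_num_frames * delta_time, 6)
future_s = round(future_num_frames * delta_time, 6)

print(extent)  # (256.0, 256.0)
print(ego_px)  # (128.0, 256.0)
print(history_s, future_s)
```

Under these assumptions each raster covers 256 m by 256 m, and the model sees about one second of history and predicts five seconds ahead.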

In [6]:
from l5kit.rasterization import build_rasterizer

In [10]:
debugpy.breakpoint()
raster_sat = build_rasterizer(train_conf, dm)

In [8]:
from l5kit.dataset import AgentDataset

In [9]:
agent_ds = AgentDataset(train_conf, train_chunked_ds, raster_sat)

Digging into the `get_valid_agents` function in `/home/nisarkavungal_gmx_com/l5kit/l5kit/l5kit/dataset/select_agents.py`.

Let's assume we are filtering the first scene, which spans the frame index interval (0, 248).

In [10]:
frames_range = (0, 248)
dataset = train_chunked_ds
th_agent_filter_probability_threshold = 0.5
th_yaw_degree = 30
th_extent_ratio = 1.1
th_distance_av = 50

Extracting the frames

In [11]:
frames = dataset.frames[slice(*frames_range)]

In [12]:
frames[0]

(1572643684801892606, [ 0, 38], [0, 0], [  680.61975098, -2183.32763672,   288.5411377 ], [[ 0.54673314, -0.83729434,  0.00459086], [ 0.83528739,  0.54502565, -0.07240184], [ 0.05811952,  0.04341917,  0.997365  ]])

In [13]:
agents_range_start = frames[0]["agent_index_interval"][0] # 1st frame, 1st value of the interval tuple
agents_range_start

0

In [14]:
agents_range_end = frames[-1]["agent_index_interval"][-1] # last frame, last value of the interval tuple
agents_range_end

21309

Extracting the agent details in the scene

In [15]:
agents = dataset.agents[agents_range_start: agents_range_end]
len(agents)

21309

In [16]:
frames["agent_index_interval"] -= agents_range_start # set index to start from 0 
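The slicing and rebasing steps above can be illustrated on toy data. In this sketch, plain Python lists and dicts stand in for the zarr-backed arrays, and the interval values are invented. The point is that each frame stores a half-open interval into the global agents array, so after cutting out the scene's agents the intervals must be shifted by the scene's first agent index.

```python
# Toy stand-in for dataset.frames: each frame stores a half-open
# [start, end) interval into the global agents array.
frames = [
    {"agent_index_interval": [100, 103]},
    {"agent_index_interval": [103, 105]},
]
all_agents = [f"agent_{i}" for i in range(200)]  # hypothetical global array

start = frames[0]["agent_index_interval"][0]   # first agent of the scene
end = frames[-1]["agent_index_interval"][1]    # one past the last agent
scene_agents = all_agents[start:end]

# Rebase so the intervals index into scene_agents instead of all_agents.
for frame in frames:
    frame["agent_index_interval"] = [i - start for i in frame["agent_index_interval"]]

first_frame_agents = scene_agents[slice(*frames[0]["agent_index_interval"])]
print(first_frame_agents)  # ['agent_100', 'agent_101', 'agent_102']
```

For the first scene in this notebook the start index is 0, so the subtraction changes nothing; it only matters for scenes that begin later in the agents array.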

#### Starting the filtering

In [17]:
from collections import defaultdict, Counter
import numpy as np

In [18]:
agents_dict = defaultdict(list)

In [19]:
# for every agent -> (available_past_frame, available_future_frame)
agents_mask = np.zeros((len(agents), 2), dtype=np.uint32)
agents_mask.shape

(21309, 2)

In [20]:
report = Counter()

##### First stage of filtering

Based on the `label_probabilities`

In [21]:
from l5kit.dataset.select_agents import _get_label_filter

In [22]:
of_interest = _get_label_filter(agents["label_probabilities"], th_agent_filter_probability_threshold)
of_interest

array([ True,  True,  True, ..., False, False, False])

`_get_label_filter` masks out any agent whose detection probability is below the specified threshold.

The threshold here is 0.5, so, for example, the first agent is kept:

In [23]:
agents[0]["label_probabilities"]

array([0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
      dtype=float32)

This is because its detected probability (1.0) is above 0.5.

The last agent, however, is masked:

In [24]:

agents[-1]["label_probabilities"]

array([0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
      dtype=float32)

> This is because, by default, l5kit only cares about certain agent types. Although the detected probability here is 1, it belongs to label index 1 (`PERCEPTION_LABEL_UNKNOWN`), a type that l5kit does not treat as relevant.

In [25]:
from l5kit.data import PERCEPTION_LABEL_TO_INDEX
PERCEPTION_LABEL_TO_INDEX

{'PERCEPTION_LABEL_NOT_SET': 0,
 'PERCEPTION_LABEL_UNKNOWN': 1,
 'PERCEPTION_LABEL_DONTCARE': 2,
 'PERCEPTION_LABEL_CAR': 3,
 'PERCEPTION_LABEL_VAN': 4,
 'PERCEPTION_LABEL_TRAM': 5,
 'PERCEPTION_LABEL_BUS': 6,
 'PERCEPTION_LABEL_TRUCK': 7,
 'PERCEPTION_LABEL_EMERGENCY_VEHICLE': 8,
 'PERCEPTION_LABEL_OTHER_VEHICLE': 9,
 'PERCEPTION_LABEL_BICYCLE': 10,
 'PERCEPTION_LABEL_MOTORCYCLE': 11,
 'PERCEPTION_LABEL_CYCLIST': 12,
 'PERCEPTION_LABEL_MOTORCYCLIST': 13,
 'PERCEPTION_LABEL_PEDESTRIAN': 14,
 'PERCEPTION_LABEL_ANIMAL': 15,
 'AVRESEARCH_LABEL_DONTCARE': 16}

This is the full set of label annotations.

In [26]:
from l5kit.data.filter import PERCEPTION_LABELS_TO_KEEP
PERCEPTION_LABELS_TO_KEEP

['PERCEPTION_LABEL_CAR',
 'PERCEPTION_LABEL_VAN',
 'PERCEPTION_LABEL_TRAM',
 'PERCEPTION_LABEL_BUS',
 'PERCEPTION_LABEL_TRUCK',
 'PERCEPTION_LABEL_EMERGENCY_VEHICLE',
 'PERCEPTION_LABEL_OTHER_VEHICLE',
 'PERCEPTION_LABEL_BICYCLE',
 'PERCEPTION_LABEL_MOTORCYCLE',
 'PERCEPTION_LABEL_CYCLIST',
 'PERCEPTION_LABEL_MOTORCYCLIST',
 'PERCEPTION_LABEL_PEDESTRIAN',
 'PERCEPTION_LABEL_ANIMAL']

These are the labels l5kit treats as valid agents. Agents carrying any other label are masked.
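The behaviour observed for the first and last agents is consistent with a filter that sums each agent's probability over the kept labels and compares that sum with the threshold. The sketch below is an illustration under that assumption, not a copy of l5kit's implementation; `label_filter` is a hypothetical name, and plain lists stand in for the NumPy arrays.

```python
NUM_LABELS = 17
# In the mapping above, the kept labels (CAR through ANIMAL) occupy
# the contiguous indices 3..15.
KEEP_INDICES = range(3, 16)


def label_filter(label_probabilities, threshold):
    """Return one bool per agent: True when the total probability
    assigned to the kept labels exceeds the threshold."""
    return [
        sum(row[i] for i in KEEP_INDICES) > threshold
        for row in label_probabilities
    ]


# The two probability vectors shown earlier in the notebook.
first_agent = [0.0] * NUM_LABELS
first_agent[3] = 1.0   # PERCEPTION_LABEL_CAR
last_agent = [0.0] * NUM_LABELS
last_agent[1] = 1.0    # PERCEPTION_LABEL_UNKNOWN

mask = label_filter([first_agent, last_agent], threshold=0.5)
print(mask)  # [True, False]
```

The last agent is rejected even though one of its probabilities is 1, because none of that probability falls on a kept label.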

##### Second stage of filtering

Going frame by frame

In [27]:
frame_idx = 0

In [28]:
frame = frames[frame_idx]

In [29]:
agents_frame = agents[slice(*(frame["agent_index_interval"]))] # agents in the frame

In [31]:
len(agents_frame)

38

Going agent by agent in that frame

In [32]:
global_agent_index = -1

In [33]:
agent = agents_frame[0]

In [34]:
global_agent_index += 1

In [35]:
agents_dict[agent["track_id"]].append((frame_idx, global_agent_index, agent))

Since `track_id` is unique for an agent throughout a scene, this dictionary keeps track of a particular agent's records across all the frames in the scene.

- First filter: is this agent `of_interest` (computed earlier from the label probabilities)?
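The bookkeeping done by `agents_dict` can be sketched on a toy scene. The track ids below are invented, and the stored tuple omits the agent record itself for brevity; the loop structure (a running `global_agent_index` across frames) follows the cells above.

```python
from collections import defaultdict

# Hypothetical toy scene: each inner list holds the track_ids seen in one frame.
frames_track_ids = [[1, 2], [1, 3], [1, 2]]

agents_dict = defaultdict(list)
global_agent_index = -1  # running index into the scene's flat agents array

for frame_idx, track_ids in enumerate(frames_track_ids):
    for track_id in track_ids:
        global_agent_index += 1
        # Record in which frame, and at which flat index, this track appears.
        agents_dict[track_id].append((frame_idx, global_agent_index))

print(agents_dict[1])  # [(0, 0), (1, 2), (2, 4)]
print(agents_dict[2])  # [(0, 1), (2, 5)]
```

Track 2 is missing from frame 1, which is visible as a gap in its list of frame indices. This is the kind of information a check over consecutive frames needs.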

In [36]:
of_interest[global_agent_index]

True

- Second filter: is the agent farther from the AV than the distance threshold? If so, discard it.

In [37]:
from l5kit.dataset.select_agents import in_av_distance

In [38]:
in_av_distance(frame["ego_translation"], agent["centroid"], 50) # 50 is the default distance threshold

True

The agent is within the distance threshold of the AV, so it is kept.

The two filters above are direct filters that look only at the current frame. They are also known as `POINT-WISE FILTERS`.

Other filters look at two consecutive frames to decide whether to mask the agent. These are known as `COUPLE-WISE FILTERS`.  
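A distance check of this kind can be sketched as a planar Euclidean distance compared with the threshold. This is an assumption about how such a filter might work, not l5kit's code: `within_av_distance` is a hypothetical name, the agent positions are invented, and ignoring the z coordinate and using a strict `<` are choices made for this sketch.

```python
import math


def within_av_distance(ego_translation, agent_centroid, threshold_m):
    """True if the agent lies closer than threshold_m to the AV in the
    x-y plane (the z coordinate of the AV is ignored in this sketch)."""
    dx = agent_centroid[0] - ego_translation[0]
    dy = agent_centroid[1] - ego_translation[1]
    return math.hypot(dx, dy) < threshold_m


# ego_translation of frames[0], as printed earlier in the notebook.
ego = (680.61975098, -2183.32763672, 288.5411377)

near_agent = (ego[0] + 3.0, ego[1] + 4.0)    # hypothetical, about 5 m away
far_agent = (ego[0] + 60.0, ego[1] + 80.0)   # hypothetical, about 100 m away

print(within_av_distance(ego, near_agent, 50))  # True
print(within_av_distance(ego, far_agent, 50))   # False
```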

These filters include  

- Checking if the agent is there in consecutive frames