This notebook serves as an introduction to the new functionality added to the nuScenes devkit for the prediction challenge.

It is organized into the following sections:

1. Data splits for the challenge
2. Getting past and future data for an agent
3. Changes to the Map API
4. Overview of Input Representation
5. Baseline Model Implementations

In [1]:
from nuscenes import NuScenes

In [2]:
nuscenes = NuScenes('v1.0-mini', dataroot='/home/freddyboulton/prediction_nas/sets/nuscenes/')

Loading NuScenes tables for version v1.0-mini...
23 category,
8 attribute,
4 visibility,
911 instance,
12 sensor,
120 calibrated_sensor,
31206 ego_pose,
8 log,
10 scene,
404 sample,
31206 sample_data,
18538 sample_annotation,
4 map,
Done loading in 2.5 seconds.
Reverse indexing ...
Done reverse indexing in 0.1 seconds.


## 1. Data Splits for the Prediction Challenge

This section assumes basic familiarity with the nuScenes [schema](https://www.nuscenes.org/data-format?externalData=all&mapData=all&modalities=Any).

The goal of the nuScenes prediction challenge is to predict the future location of agents in the nuScenes dataset. Each agent is identified by an instance token together with a sample token. To get the list of agents in the train and val splits of the challenge, use the `get_prediction_challenge_split` function.

In [3]:
from nuscenes.eval.predict.splits import get_prediction_challenge_split

In [4]:
mini_train = get_prediction_challenge_split("mini_train")

The `get_prediction_challenge_split` function returns a list of strings of the form `{instance_token}_{sample_token}`. In the next section, we show how to use an instance token and sample token to query data for the prediction challenge.

In [5]:
mini_train[:5]

['48d58b69b40149aeb2e64aa4b1a9192f_ca9a282c9e77460f8360f564131a8af5',
 '48d58b69b40149aeb2e64aa4b1a9192f_39586f9d59004284a7114a68825e8eec',
 '48d58b69b40149aeb2e64aa4b1a9192f_356d81f38dd9473ba590f39e266f54e5',
 '48d58b69b40149aeb2e64aa4b1a9192f_e0845f5322254dafadbbed75aaa07969',
 '48d58b69b40149aeb2e64aa4b1a9192f_c923fe08b2ff4e27975d2bf30934383b']
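
Because every entry has the form `{instance_token}_{sample_token}`, the list can be regrouped per agent with plain string handling. The snippet below is a minimal sketch; `group_samples_by_instance` is a hypothetical helper, not part of the devkit, and it only uses the entries printed above.

```python
from collections import defaultdict


def group_samples_by_instance(split_entries):
    """Map each instance token to the sample tokens it is paired with."""
    grouped = defaultdict(list)
    for entry in split_entries:
        # Tokens are hex strings, so the underscore only appears as the separator.
        instance_token, sample_token = entry.split("_")
        grouped[instance_token].append(sample_token)
    return dict(grouped)


# The first two entries of mini_train shown above.
example_entries = [
    "48d58b69b40149aeb2e64aa4b1a9192f_ca9a282c9e77460f8360f564131a8af5",
    "48d58b69b40149aeb2e64aa4b1a9192f_39586f9d59004284a7114a68825e8eec",
]
grouped = group_samples_by_instance(example_entries)
print(grouped)
```

Grouping this way makes it easy to see how many prediction instants the split contains for a given agent.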

## 2. Getting past and future data for an agent

We provide a class called `PredictHelper` with methods for querying past and future data for an agent. It is instantiated by wrapping an instance of the `NuScenes` class.

In [6]:
from nuscenes.predict import PredictHelper
helper = PredictHelper(nuscenes)

To get the data for an agent at a particular point in time, use the `get_sample_annotation` method.

In [7]:
instance_token, sample_token = mini_train[0].split("_")

In [8]:
annotation = helper.get_sample_annotation(instance_token, sample_token)

In [9]:
annotation

{'token': '6b89da9bf1f84fd6a5fbe1c3b236f809',
 'sample_token': 'ca9a282c9e77460f8360f564131a8af5',
 'instance_token': '48d58b69b40149aeb2e64aa4b1a9192f',
 'visibility_token': '2',
 'attribute_tokens': ['ab83627ff28b465b85c427162dec722f'],
 'translation': [378.888, 1153.348, 0.865],
 'size': [0.775, 0.769, 1.711],
 'rotation': [-0.5527590208259255, 0.0, 0.0, 0.8333411455673865],
 'prev': '',
 'next': '216bbbd8e01c450a8fabe9d47433c10a',
 'num_lidar_pts': 2,
 'num_radar_pts': 0,
 'category_name': 'human.pedestrian.adult'}

To get the past or future of an agent, use the `get_past_for_agent` or `get_future_for_agent` method. If the `in_agent_frame` parameter is set to `True`, the coordinates will be in the agent's local coordinate frame. Otherwise, they will be in the global frame.

In [11]:
future_xy_local = helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=True)
future_xy_local

array([[-0.01285847,  0.62956228],
       [-0.02610585,  1.26004583],
       [-0.03804305,  1.88999702],
       [ 0.00629423,  2.46120994],
       [ 0.05024259,  3.03334414],
       [ 0.09457987,  3.60455707]])

In [13]:
future_xy_global = helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=False)
future_xy_global

array([[ 378.655, 1152.763],
       [ 378.422, 1152.177],
       [ 378.188, 1151.592],
       [ 377.925, 1151.083],
       [ 377.662, 1150.573],
       [ 377.399, 1150.064]])
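
The two outputs above are consistent with an agent frame whose origin is the agent's current position and whose +y axis points along its heading. The sketch below reproduces the first local row from the first global row under that assumption, using the `translation` and `rotation` fields of the annotation printed earlier. The function names are illustrative and are not claimed to be the devkit's own.

```python
import math


def quaternion_yaw(rotation):
    """Yaw (rotation about the z axis) of a [w, x, y, z] quaternion, in radians."""
    w, x, y, z = rotation
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


def global_to_agent_frame(point_xy, translation, rotation):
    """Express a global (x, y) point in an agent-centred frame.

    Assumed convention: the agent sits at the origin and its heading
    is the +y axis of the local frame.
    """
    yaw = quaternion_yaw(rotation)
    angle = math.pi / 2.0 - yaw  # rotate so that the heading maps onto +y
    dx = point_xy[0] - translation[0]
    dy = point_xy[1] - translation[1]
    local_x = math.cos(angle) * dx - math.sin(angle) * dy
    local_y = math.sin(angle) * dx + math.cos(angle) * dy
    return local_x, local_y


# Pose of the agent at the current sample (from the annotation above).
translation = [378.888, 1153.348, 0.865]
rotation = [-0.5527590208259255, 0.0, 0.0, 0.8333411455673865]

# First future position in the global frame.
local_xy = global_to_agent_frame((378.655, 1152.763), translation, rotation)
print(local_xy)  # approximately (-0.0129, 0.6296)
```

The agent is walking almost straight ahead, which is why the local x values stay close to zero while y grows by roughly 0.6 m per step.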

Note that you can also return the entire annotation record by passing `just_xy=False`. In this case, however, `in_agent_frame` is not taken into account.

In [15]:
helper.get_future_for_agent(instance_token, sample_token, seconds=3, in_agent_frame=False, just_xy=False)

[{'token': '216bbbd8e01c450a8fabe9d47433c10a',
  'sample_token': '39586f9d59004284a7114a68825e8eec',
  'instance_token': '48d58b69b40149aeb2e64aa4b1a9192f',
  'visibility_token': '2',
  'attribute_tokens': ['ab83627ff28b465b85c427162dec722f'],
  'translation': [378.655, 1152.763, 0.873],
  'size': [0.775, 0.769, 1.711],
  'rotation': [-0.5527590208259255, 0.0, 0.0, 0.8333411455673865],
  'prev': '6b89da9bf1f84fd6a5fbe1c3b236f809',
  'next': 'ea18f90581fc46978c4954d6f0147fb0',
  'num_lidar_pts': 2,
  'num_radar_pts': 0,
  'category_name': 'human.pedestrian.adult'},
 {'token': 'ea18f90581fc46978c4954d6f0147fb0',
  'sample_token': '356d81f38dd9473ba590f39e266f54e5',
  'instance_token': '48d58b69b40149aeb2e64aa4b1a9192f',
  'visibility_token': '2',
  'attribute_tokens': ['ab83627ff28b465b85c427162dec722f'],
  'translation': [378.422, 1152.177, 0.882],
  'size': [0.775, 0.769, 1.711],
  'rotation': [-0.5527590208259255, 0.0, 0.0, 0.8333411455673865],
  'prev': '216bbbd8e01c450a8fabe9d47433c

If you would like to return the data for the entire sample, as opposed to one agent in the sample, you can use the `get_annotations_for_sample` method. This will return a list of records for each annotated agent in the sample.

In [16]:
sample = helper.get_annotations_for_sample(sample_token)

In [17]:
len(sample)

69

Note that there are `get_future_for_sample` and `get_past_for_sample` methods that are analogous to the `get_future_for_agent` and `get_past_for_agent` methods.

We also provide methods to compute the velocity, acceleration, and heading change rate of an agent at a given point in time.

In [18]:
# We get a new instance and sample token because these methods require computing difference between records.
instance_token_2, sample_token_2 = mini_train[5].split("_")

# Meters / second
helper.get_velocity_for_agent(instance_token_2, sample_token_2)

1.2616850959705042

In [19]:
# Meters / second^2
helper.get_acceleration_for_agent(instance_token_2, sample_token_2)

0.2569005623454393

In [20]:
# radians / second
helper.get_heading_change_rate_for_agent(instance_token_2, sample_token_2)

0.0
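
These quantities depend on differences between consecutive records, which is why an annotation with an empty `prev` field (like the first one shown earlier) cannot be used. The sketch below shows one plausible finite-difference formulation. It is an assumption about the general approach, not a copy of the devkit's implementation, and all function names are hypothetical.

```python
import math


def speed_between(prev_xy, curr_xy, time_diff_s):
    """Speed in m/s from two consecutive positions."""
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    return math.hypot(dx, dy) / time_diff_s


def acceleration_between(prev_speed, curr_speed, time_diff_s):
    """Acceleration in m/s^2 from two consecutive speeds."""
    return (curr_speed - prev_speed) / time_diff_s


def heading_change_rate(prev_yaw, curr_yaw, time_diff_s):
    """Heading change rate in rad/s, wrapping the difference to [-pi, pi)."""
    diff = (curr_yaw - prev_yaw + math.pi) % (2.0 * math.pi) - math.pi
    return diff / time_diff_s


# Toy values: the agent moves 5 m in 2 s, so its speed is 2.5 m/s.
speed = speed_between((0.0, 0.0), (3.0, 4.0), time_diff_s=2.0)
accel = acceleration_between(prev_speed=2.0, curr_speed=speed, time_diff_s=0.5)
turn_rate = heading_change_rate(prev_yaw=3.0, curr_yaw=-3.0, time_diff_s=0.5)
print(speed, accel, turn_rate)
```

Wrapping the yaw difference matters when the heading crosses the ±π boundary; without it, a small turn would look like a nearly full rotation.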

## 3. Changes to the Map API

We've added a couple methods to the Map API to help query lane centerline information.

In [21]:
from nuscenes.map_expansion.map_api import NuScenesMap
onenorth = NuScenesMap(map_name='singapore-onenorth')



To get the closest lane to a location, use the `get_closest_lane` method. To see the internal data representation of the lane, use the `get_lane` method.
You can also explore the connectivity of the lanes with the `get_outgoing_lane_ids` and `get_incoming_lane_ids` methods.

In [22]:
x, y, yaw = 395, 1095, 0
closest_lane = onenorth.get_closest_lane(x, y)
closest_lane

'5933500a-f0f2-4d69-9bbc-83b875e4a73e'

In [23]:
lane_record = onenorth.get_lane(closest_lane)
lane_record

[{'start_pose': [421.2419602954602, 1087.9127960414617, 2.739593514975998],
  'end_pose': [391.7142849867393, 1100.464077182952, 2.7365754617298705],
  'shape': 'LSR',
  'radius': 999.999,
  'segment_length': [0.23651121617864976,
   28.593481378991886,
   3.254561444252876]}]

In [24]:
onenorth.get_incoming_lane_ids(closest_lane)

['f24a067b-d650-47d0-8664-039d648d7c0d']

In [25]:
onenorth.get_outgoing_lane_ids(closest_lane)

['0282d0e3-b6bf-4bcd-be24-35c9ce4c6591',
 '28d15254-0ef9-48c3-9e06-dc5a25b31127']
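
Since the connectivity queries return lists of lane ids, they can drive a simple graph traversal. The sketch below walks outgoing connections breadth-first up to a fixed number of hops. `reachable_lanes` is a hypothetical helper, and the toy graph uses made-up lane names in place of real map data.

```python
from collections import deque


def reachable_lanes(start_lane, get_outgoing, max_hops):
    """Breadth-first walk returning {lane_id: hops from start_lane}."""
    hops = {start_lane: 0}
    queue = deque([start_lane])
    while queue:
        lane = queue.popleft()
        if hops[lane] == max_hops:
            continue  # do not expand beyond the hop limit
        for next_lane in get_outgoing(lane):
            if next_lane not in hops:
                hops[next_lane] = hops[lane] + 1
                queue.append(next_lane)
    return hops


# Toy connectivity standing in for the map (lane names are invented).
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
hops = reachable_lanes("a", lambda lane: toy_graph.get(lane, []), max_hops=2)
print(hops)
```

With the map loaded as above, passing `onenorth.get_outgoing_lane_ids` as `get_outgoing` and `closest_lane` as the start should work the same way, assuming the method returns a list of ids as in the output shown.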

To help manipulate the lanes, we've added an `arcline_path_utils` module. For example, something you might want to do is discretize a lane into a sequence of poses.

In [26]:
from nuscenes.map_expansion import arcline_path_utils
poses = arcline_path_utils.discretize_lane(lane_record, resolution_meters=1)
poses

[(421.2419602954602, 1087.9127960414617, 2.739593514975998),
 (420.34712994585345, 1088.2930152148274, 2.739830026428688),
 (419.45228865726136, 1088.6732086473173, 2.739830026428688),
 (418.5574473686693, 1089.0534020798073, 2.739830026428688),
 (417.66260608007724, 1089.433595512297, 2.739830026428688),
 (416.76776479148515, 1089.813788944787, 2.739830026428688),
 (415.87292350289306, 1090.1939823772768, 2.739830026428688),
 (414.97808221430097, 1090.5741758097668, 2.739830026428688),
 (414.0832409257089, 1090.9543692422567, 2.739830026428688),
 (413.1883996371168, 1091.3345626747464, 2.739830026428688),
 (412.29355834852475, 1091.7147561072363, 2.739830026428688),
 (411.39871705993266, 1092.0949495397263, 2.739830026428688),
 (410.5038757713406, 1092.4751429722162, 2.739830026428688),
 (409.6090344827485, 1092.8553364047061, 2.739830026428688),
 (408.7141931941564, 1093.2355298371958, 2.739830026428688),
 (407.81935190556436, 1093.6157232696858, 2.739830026428688),
 (406.92451061697

Given a query pose, you can also find the closest pose on a lane.

In [27]:
closest_pose_on_lane, distance_along_lane = arcline_path_utils.project_pose_to_lane((x, y, yaw), lane_record)

In [28]:
print(x, y, yaw)
closest_pose_on_lane

395 1095 0


(396.25524909914367, 1098.5289922434013, 2.739830026428688)

In [29]:
# Meters
distance_along_lane

27.5
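
Conceptually, projecting onto a lane can be thought of as discretizing the lane and picking the nearest pose. The sketch below does this for a straight segment only; it is a simplified illustration with hypothetical helper names, not the algorithm used by `arcline_path_utils`, which also handles curved segments.

```python
import math


def discretize_straight_segment(start_pose, length, resolution):
    """Evenly spaced (x, y, yaw) poses along a straight segment."""
    x0, y0, yaw = start_pose
    n_steps = max(1, math.ceil(length / resolution))
    step = length / n_steps  # shrink the step so the poses end exactly at `length`
    return [
        (x0 + i * step * math.cos(yaw), y0 + i * step * math.sin(yaw), yaw)
        for i in range(n_steps + 1)
    ]


def project_to_poses(query_xy, poses):
    """Return the pose closest to query_xy and the path distance up to it."""
    best_index = min(
        range(len(poses)),
        key=lambda i: math.hypot(poses[i][0] - query_xy[0], poses[i][1] - query_xy[1]),
    )
    distance_along = sum(
        math.hypot(poses[i + 1][0] - poses[i][0], poses[i + 1][1] - poses[i][1])
        for i in range(best_index)
    )
    return poses[best_index], distance_along


# A 10 m segment along the x axis, sampled every 0.5 m.
poses = discretize_straight_segment((0.0, 0.0, 0.0), length=10.0, resolution=0.5)
closest, along = project_to_poses((3.2, 1.0), poses)
print(closest, along)
```

With this approach the returned distance is always a multiple of the sampling step, so a finer resolution gives a more precise projection at the cost of more poses to search.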

To find the entire length of the lane, you can use the `length_of_lane` function.

In [30]:
arcline_path_utils.length_of_lane(lane_record)

32.08455403942341

You can also compute the curvature of a lane at a given length along the lane.

In [31]:
# A curvature of 0 means the lane is straight at this point
arcline_path_utils.get_curvature_at_distance_along_lane(distance_along_lane, lane_record)

0

## 4. Input Representation

It is common in the prediction literature to represent the state of an agent as a tensor containing information about the semantic map (such as the drivable area and walkways) as well as the past locations of surrounding agents.

Each paper in the field chooses to represent the input in a slightly different way. For example, [CoverNet](https://arxiv.org/pdf/1911.10298.pdf) and [MTP](https://arxiv.org/pdf/1808.05819.pdf) rasterize the map information and agent locations into a three-channel RGB image, whereas [Rules of the Road](http://openaccess.thecvf.com/content_CVPR_2019/papers/Hong_Rules_of_the_Road_Predicting_Driving_Behavior_With_a_Convolutional_CVPR_2019_paper.pdf) uses a taller tensor with different kinds of information stored in separate channels.

We provide a module called `input_representation` that is meant to make it easy for you to define your own input representation. In short, you need to define your own `StaticLayerRepresentation`, `AgentRepresentation`, and `Combinator`.

The `StaticLayerRepresentation` controls how the static map information is represented. The `AgentRepresentation` controls how the locations of the agents in the scene are represented. The `Combinator` controls how these two sources of information are combined into a single tensor.

For more information, consult `input_representation/interface.py`.
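
To make the division of labour concrete, here is a toy sketch of the three roles. The class and method names below are hypothetical stand-ins chosen for illustration; the actual abstract classes and their signatures are defined in `input_representation/interface.py` and may differ. The overlay rule in the combinator is likewise an assumption.

```python
import numpy as np


class ToyStaticLayer:
    """Stands in for a static map representation: a uniform grey image."""

    def make_representation(self, instance_token, sample_token):
        return np.full((4, 4, 3), 128, dtype=np.uint8)


class ToyAgentLayer:
    """Stands in for an agent representation: one red pixel on black."""

    def make_representation(self, instance_token, sample_token):
        image = np.zeros((4, 4, 3), dtype=np.uint8)
        image[1, 1] = (255, 0, 0)
        return image


class ToyOverlayCombinator:
    """Combines layers by drawing non-black pixels of later layers on top."""

    def combine(self, layers):
        combined = layers[0].copy()
        for layer in layers[1:]:
            mask = layer.any(axis=-1)  # True wherever the layer has content
            combined[mask] = layer[mask]
        return combined


def make_toy_input(static_layer, agent_layer, combinator, instance_token, sample_token):
    """Build both representations and merge them into a single array."""
    layers = [
        static_layer.make_representation(instance_token, sample_token),
        agent_layer.make_representation(instance_token, sample_token),
    ]
    return combinator.combine(layers)


toy_image = make_toy_input(
    ToyStaticLayer(), ToyAgentLayer(), ToyOverlayCombinator(),
    instance_token="placeholder_instance", sample_token="placeholder_sample",
)
print(toy_image.shape)
```

Keeping the three pieces separate means a different combination rule, such as stacking layers along the channel axis, only requires swapping the combinator.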

To help get you started, we've provided implementations of the input representation used in CoverNet and MTP.

In [32]:
from nuscenes.predict.input_representation.static_layers import StaticLayerRasterizer
from nuscenes.predict.input_representation.agents import AgentBoxesWithFadedHistory
from nuscenes.predict.input_representation.interface import InputRepresentation
from nuscenes.predict.input_representation.combinators import Rasterizer

In [None]:
static_layer_rasterizer = StaticLayerRasterizer(helper)
agent_rasterizer = AgentBoxesWithFadedHistory(helper)
mtp_input_representation = InputRepresentation(static_layer_rasterizer, agent_rasterizer, Rasterizer())

static_layers.py - Loading Map: singapore-onenorth


In [None]:
instance_token_img, sample_token_img = 'bc38961ca0ac4b14ab90e547ba79fbb6', '7626dde27d604ac28a0240bdd54eba7a'

In [None]:
anns = [ann for ann in nuscenes.sample_annotation if ann['instance_token'] == instance_token_img]

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
img = mtp_input_representation.make_input_representation(instance_token_img, sample_token_img)

In [None]:
plt.imshow(img)

## 5. Baseline Model Implementations

We've provided PyTorch implementations of CoverNet and MTP. Below, we show how to make predictions on the previously created input representation.

In [None]:
from nuscenes.predict.models.backbone import ResNetBackbone
from nuscenes.predict.models.mtp import MTP
from nuscenes.predict.models.covernet import CoverNet
import torch

In [None]:
backbone = ResNetBackbone('resnet50')
mtp = MTP(backbone, num_modes=2)

# Note that the value of num_modes depends on the size of the lattice used for CoverNet
covernet = CoverNet(backbone, num_modes=2)

In [None]:
agent_state_vector = torch.Tensor([[helper.get_velocity_for_agent(instance_token_img, sample_token_img),
                                    helper.get_acceleration_for_agent(instance_token_img, sample_token_img),
                                    helper.get_heading_change_rate_for_agent(instance_token_img, sample_token_img)]])

In [None]:
image_tensor = torch.Tensor(img).permute(2, 0, 1).unsqueeze(0)

In [None]:
# Output has 50 entries.
# The first 24 are x,y coordinates over the next 6 seconds at 2 Hz for the first mode.
# The second 24 are the x,y coordinates for the second mode.
# The last 2 are the logits of the mode probabilities
mtp(image_tensor, agent_state_vector)
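
The layout described in the comments above (two blocks of 24 coordinates followed by 2 logits) can be unpacked as follows. This is a sketch in plain Python operating on a flat list of 50 numbers. It assumes the coordinates within each mode are interleaved as x0, y0, x1, y1, and so on, which the comments do not state explicitly, and `split_mtp_output` is a hypothetical helper.

```python
import math


def split_mtp_output(flat_output, num_modes=2, seconds=6, frequency_hz=2):
    """Split a flat MTP-style output into per-mode trajectories and mode probabilities."""
    steps = seconds * frequency_hz   # 12 future timesteps
    per_mode = steps * 2             # 24 numbers: one x and one y per timestep
    expected = num_modes * per_mode + num_modes
    if len(flat_output) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat_output)}")

    trajectories = []
    for mode in range(num_modes):
        chunk = flat_output[mode * per_mode:(mode + 1) * per_mode]
        # Assumed ordering: x and y interleaved per timestep.
        trajectories.append([(chunk[2 * t], chunk[2 * t + 1]) for t in range(steps)])

    # Softmax over the trailing logits gives the mode probabilities.
    logits = flat_output[num_modes * per_mode:]
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    probabilities = [value / total for value in exps]
    return trajectories, probabilities


# Dummy output: 48 coordinate values followed by two equal logits.
dummy_output = [float(i) for i in range(48)] + [0.0, 0.0]
trajectories, probabilities = split_mtp_output(dummy_output)
print(len(trajectories), len(trajectories[0]), probabilities)
```

For a real model output, the tensor would first need to be converted to a flat list, for example with `.squeeze(0).tolist()` on a batch of size one.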

In [None]:
# CoverNet outputs a probability distribution over the lattice.
# These are the logits of the probabilities
covernet(image_tensor, agent_state_vector)