# Scenario Description 

[![Click and Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/metadriverse/metadrive/blob/main/documentation/source/scenario_description.ipynb)



Many real-world autonomous driving (AD) datasets exist, such as Waymo and nuPlan.
These datasets contain rich information about road maps, traffic lights, and traffic participants and their motions.
This data can be imported into MetaDrive, so that we can reconstruct interactive environments and replay (or even alter!) scenarios recorded in the real world.

To achieve that, we maintain a separate project called ScenarioNet (https://metadriverse.github.io/scenarionet/), which converts various sources of AD data into a unified data structure, manages that data, and imports it into MetaDrive to create interactive environments.

As the following figure of ScenarioNet shows, the key functionalities developed in ScenarioNet are:

* **Unified Scenario Description**
* **Scenario Data Management**
* **Real-world Scenario Simulation**

 
<img src="figs/scenarionet.png" width="800">


MetaDrive implements the **unified scenario description** in `metadrive/scenario/scenario_description.py` and [ScenarioNet](https://metadriverse.github.io/scenarionet/) implements the data management toolkits. 

For completeness, we will walk through both the ScenarioNet dataset, which contains many scenarios, and the Scenario Description, a nested dict that stores the information of one scenario.

## What is the ScenarioNet dataset?

**TL;DR: An established ScenarioNet dataset is a folder containing `dataset_summary.pkl` and, optionally, `dataset_mapping.pkl`, along with a set of `scenario_version_id.pkl` files, each of which stores the unified scenario description of one scenario.**

To be specific, the functions of these files are as follows.

- `dataset_summary.pkl` summarizes the meta information of each scenario, such as the number of objects and the moving distance of each object.

- `dataset_mapping.pkl` contains the mapping from each scenario ID to its relative path on your machine. When this file is missing, scenarios are looked up in the directory that contains `dataset_summary.pkl`.

- `scenario_version_id.pkl` stores the serialized scenario description, i.e. a python dict, of a scenario.

**Note: All these files are human-readable after being loaded and decoded by `pickle`.**

The `scenario_version_id.pkl` files, which store the detailed scenario information, live either in the same folder as `dataset_summary.pkl` or in other folders whose paths relative to `dataset_summary.pkl` are registered in `dataset_mapping.pkl`. `dataset_mapping.pkl` thus works like a soft link or a pointer: to build a new dataset, we only need to combine a set of relative paths pointing to the target scenarios, without copying the `scenario_version_id.pkl` files themselves. In the rest of this section, we refer to scenario files as `.pkl` files for convenience.
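The soft-link idea can be sketched in a few lines. The helper below, `merge_mappings`, is a hypothetical name used only for illustration and is not part of MetaDrive or ScenarioNet. It assumes that each mapping value is a folder path relative to its dataset folder, as described above, and produces a mapping for a new dataset folder without copying any scenario file.

```python
import os


def merge_mappings(new_dataset_dir, sources):
    """Build a mapping for a new dataset that points at existing scenario files.

    `sources` maps each existing dataset folder to its own mapping
    (scenario file name -> folder relative to that dataset folder).
    Hypothetical helper for illustration; not a MetaDrive/ScenarioNet API.
    """
    new_mapping = {}
    for dataset_dir, mapping in sources.items():
        for file_name, rel_dir in mapping.items():
            # Folder that actually hosts the scenario file.
            host_dir = os.path.normpath(os.path.join(dataset_dir, rel_dir))
            # Re-express that folder relative to the new dataset folder.
            new_mapping[file_name] = os.path.relpath(host_dir, new_dataset_dir)
    return new_mapping


# Made-up folders and file names, only to show the shape of the result.
merged = merge_mappings(
    "/data/merged",
    {
        "/data/waymo": {"a.pkl": ""},
        "/data/nuscenes": {"b.pkl": "nuscenes_0"},
    },
)
print(merged)
```

Only the relative paths change; the scenario files stay where they are, which is why building a new dataset this way is cheap.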

In this section, we demonstrate how to use the utilities from MetaDrive to easily access scenarios. We prepare two demo datasets, split from the Waymo Open Dataset and the nuScenes dataset, and show how to load their files. You can also find some useful tutorials in the ScenarioNet repo: https://github.com/metadriverse/scenarionet/tree/main/tutorial


In [1]:
from metadrive.engine.asset_loader import AssetLoader
import os

waymo_data =  AssetLoader.file_path(AssetLoader.asset_path, "waymo", unix_style=False) # Use the built-in datasets with simulator
os.listdir(waymo_data) # there are 3 waymo scenario files with one 'dataset_summary.pkl' in this example dataset.

['sd_training.tfrecord-00000-of-01000_8a346109094cd5aa.pkl',
 'sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl',
 'dataset_summary.pkl',
 'sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl']

In [2]:
nuscenes_data = AssetLoader.file_path(AssetLoader.asset_path, "nuscenes", unix_style=False) # Use the built-in datasets with simulator
os.listdir(nuscenes_data) # there are 10 nuscenes scenario files stored in subfolders, with a 'dataset_summary.pkl' and a 'dataset_mapping.pkl'

['nuscenes_6',
 'nuscenes_7',
 'nuscenes_2',
 'nuscenes_5',
 'dataset_mapping.pkl',
 'nuscenes_0',
 'dataset_summary.pkl',
 'nuscenes_1',
 'nuscenes_3',
 'nuscenes_4']

As you can see, for the Waymo dataset we put all scenarios in the same folder, so we don't need the `dataset_mapping.pkl` that routes each scenario ID to its corresponding `.pkl` file. The nuScenes dataset has both `dataset_mapping.pkl` and `dataset_summary.pkl` because it stores the scenarios in a hierarchical file structure.

## How to read a ScenarioNet dataset?


### Read the dataset summary

The first step to read a ScenarioNet dataset is to call the utility function `read_dataset_summary` from MetaDrive.

This function does not read all the data and exhaust your memory, as it only reads `dataset_summary.pkl` and, optionally, `dataset_mapping.pkl`. The `dataset_summary` contains the summary (a set of stats) of each scenario in the dataset, and the `dataset_mapping` contains the paths of the scenario `.pkl` files relative to the folder that hosts `dataset_summary.pkl`.

The output of the `read_dataset_summary` is a tuple of three elements:

1) the summary dict mapping from scenario ID to its metadata,
2) the list of all scenario IDs, and
3) a dict mapping from scenario IDs to the folder that hosts their files.
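To make the shape of this tuple concrete, here is a minimal pure-Python sketch of the same idea. `read_summary_sketch` is an illustrative re-implementation written for this page, not MetaDrive's `read_dataset_summary`, and the toy file names and stats are made up. It assumes that a missing mapping file means every scenario sits next to `dataset_summary.pkl`.

```python
import os
import pickle
import tempfile


def read_summary_sketch(dataset_dir):
    """Return (summary, scenario file names, mapping) for a dataset folder.

    Illustrative re-implementation, not MetaDrive's `read_dataset_summary`.
    """
    with open(os.path.join(dataset_dir, "dataset_summary.pkl"), "rb") as f:
        summary = pickle.load(f)

    mapping_path = os.path.join(dataset_dir, "dataset_mapping.pkl")
    if os.path.isfile(mapping_path):
        with open(mapping_path, "rb") as f:
            mapping = pickle.load(f)
    else:
        # No mapping file: every scenario sits next to the summary file.
        mapping = {file_name: "" for file_name in summary}

    return summary, list(summary.keys()), mapping


# Toy dataset with a summary file only (file names and stats are made up).
with tempfile.TemporaryDirectory() as toy_dir:
    toy_summary = {"sd_toy_0.pkl": {"dataset": "toy"}, "sd_toy_1.pkl": {"dataset": "toy"}}
    with open(os.path.join(toy_dir, "dataset_summary.pkl"), "wb") as f:
        pickle.dump(toy_summary, f)
    summary, scenario_ids, mapping = read_summary_sketch(toy_dir)

print(scenario_ids)
print(mapping)
```

Only the two small index files are opened here; the scenario `.pkl` files themselves are never read.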

In [3]:
from metadrive.scenario import utils as sd_utils

dataset_path = waymo_data

print(f"Reading the summary file from Waymo data at: {dataset_path}")

waymo_dataset_summary = sd_utils.read_dataset_summary(dataset_path)
print(f"The dataset summary is a {type(waymo_dataset_summary)}, with lengths {len(waymo_dataset_summary)}.")


Reading the summary file from Waymo data at: /home/zhenghao/metadrive/metadrive/assets/waymo
The dataset summary is a <class 'tuple'>, with lengths 3.


In [4]:
waymo_scenario_summary, waymo_scenario_ids, waymo_scenario_files = waymo_dataset_summary


print(f"The scenario summary is a dict with keys: {waymo_scenario_summary.keys()} \nwhere each value of the dict is the summary of a scenario.\n")

The scenario summary is a dict with keys: dict_keys(['sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl', 'sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl', 'sd_training.tfrecord-00000-of-01000_8a346109094cd5aa.pkl']) 
where each value of the dict is the summary of a scenario.



### What is the summary dict of a scenario?

As you can see, the first return value is a dict, `waymo_scenario_summary`, that summarizes all scenarios in the dataset.

Each item in this dict represents the summary of one scenario.

Now let's dive into the detailed structure and see what is included in the summary of one scenario.

In [5]:
example_scenario_summary = waymo_scenario_summary['sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl']
print(f"The summary of a scenario is a dict with keys: {example_scenario_summary.keys()}")

The summary of a scenario is a dict with keys: dict_keys(['coordinate', 'ts', 'metadrive_processed', 'sdc_id', 'dataset', 'scenario_id', 'source_file', 'track_length', 'current_time_index', 'sdc_track_index', 'objects_of_interest', 'tracks_to_predict', 'object_summary', 'number_summary'])


In [6]:
# A string indicating the coordinate system of the scenario.
print(example_scenario_summary["coordinate"])

waymo


In [7]:
# A list containing the wall time of each time step
print(example_scenario_summary["ts"])

[0.      0.09998 0.19996 0.29995 0.39994 0.49991 0.59994 0.69993 0.79995
 0.89997 0.99999 1.09998 1.2     1.30002 1.40001 1.5     1.59998 1.69997
 1.79994 1.89994 1.99996 2.09995 2.19998 2.3     2.40003 2.50002 2.60003
 2.70002 2.80004 2.90004 3.00001 3.09995 3.19989 3.2998  3.3997  3.49961
 3.59951 3.69942 3.79937 3.89928 3.99918 4.09914 4.19905 4.29897 4.39888
 4.49875 4.59866 4.69856 4.79844 4.89836 4.99823 5.09814 5.19802 5.2979
 5.3978  5.49771 5.59762 5.69753 5.79748 5.89743 5.99738 6.09736 6.19736
 6.29735 6.39737 6.4974  6.59746 6.69749 6.79759 6.89762 6.99768 7.09773
 7.1978  7.2979  7.39799 7.49809 7.59822 7.69832 7.79839 7.89849 7.99855
 8.09861 8.19868 8.29874 8.39879 8.49882 8.59888 8.6989  8.79893 8.89896
 8.99898]
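As a quick sanity check on such a timestamp list, the average step size can be estimated from consecutive differences. This is a small sketch with a hypothetical helper name, `mean_step`; it uses only the first few values printed above and assumes the timestamps are in seconds, which the summary itself does not state.

```python
def mean_step(ts):
    """Average time between consecutive steps, in the units of `ts`."""
    steps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return sum(steps) / len(steps)


# First few timestamps from the output above.
ts = [0.0, 0.09998, 0.19996, 0.29995, 0.39994]
dt = mean_step(ts)
print(f"mean step: {dt:.4f}, approx. rate: {1.0 / dt:.1f} steps per unit time")
```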


In [8]:
# A boolean indicating whether the scenario is exported from MetaDrive.
# Note: MetaDrive has a replay-and-record system that generates
# scenarios with `metadrive_processed=True`.
print(example_scenario_summary["metadrive_processed"])

False


In [9]:
# The object ID of the self-driving car (if exists)
print(example_scenario_summary["sdc_id"])

1629


In [10]:
# Where the scenario comes from
print(example_scenario_summary["dataset"])

waymo


In [11]:
# The ID of the scenario
print(example_scenario_summary["scenario_id"])

2a1e44d405a6833f


In [12]:
# The number of objects of each type. This field is very useful when filtering datasets.
print(example_scenario_summary["number_summary"])

{'num_objects': 211, 'object_types': {'CYCLIST', 'PEDESTRIAN', 'VEHICLE'}, 'num_objects_each_type': {'VEHICLE': 184, 'PEDESTRIAN': 25, 'CYCLIST': 2}, 'num_moving_objects': 69, 'num_moving_objects_each_type': defaultdict(<class 'int'>, {'VEHICLE': 52, 'PEDESTRIAN': 15, 'CYCLIST': 2}), 'num_traffic_lights': 8, 'num_traffic_light_types': {'LANE_STATE_STOP', 'LANE_STATE_UNKNOWN'}, 'num_traffic_light_each_step': {'LANE_STATE_UNKNOWN': 164, 'LANE_STATE_STOP': 564}, 'num_map_features': 358, 'map_height_diff': 2.4652252197265625}
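Because `number_summary` lives in the dataset summary, scenarios can be pre-selected without opening any scenario `.pkl` file. The sketch below assumes the key names shown in the output above (`num_moving_objects`, `num_traffic_lights`); `select_scenarios` and the toy summaries are hypothetical and only illustrate the pattern.

```python
def select_scenarios(summaries, min_moving_objects=0, require_traffic_light=False):
    """Return file names whose summary satisfies simple thresholds."""
    selected = []
    for file_name, summary in summaries.items():
        numbers = summary["number_summary"]
        if numbers["num_moving_objects"] < min_moving_objects:
            continue
        if require_traffic_light and numbers["num_traffic_lights"] == 0:
            continue
        selected.append(file_name)
    return selected


# Toy summaries with made-up file names; real ones come from `read_dataset_summary`.
toy_summaries = {
    "busy.pkl": {"number_summary": {"num_moving_objects": 69, "num_traffic_lights": 8}},
    "quiet.pkl": {"number_summary": {"num_moving_objects": 3, "num_traffic_lights": 0}},
}
print(select_scenarios(toy_summaries, min_moving_objects=10))
```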


In [13]:
# The summary for each object, including information like the moving distance, type, valid time steps.
print(example_scenario_summary["object_summary"])

{'325': {'type': 'VEHICLE', 'object_id': '325', 'track_length': 91, 'moving_distance': 2.7487233877182007, 'valid_length': 3, 'continuous_valid_length': 2}, '327': {'type': 'VEHICLE', 'object_id': '327', 'track_length': 91, 'moving_distance': 5.598679721355438, 'valid_length': 15, 'continuous_valid_length': 15}, '332': {'type': 'VEHICLE', 'object_id': '332', 'track_length': 91, 'moving_distance': 103.90164065361023, 'valid_length': 91, 'continuous_valid_length': 91}, '333': {'type': 'VEHICLE', 'object_id': '333', 'track_length': 91, 'moving_distance': 111.29066771268845, 'valid_length': 91, 'continuous_valid_length': 91}, '336': {'type': 'VEHICLE', 'object_id': '336', 'track_length': 91, 'moving_distance': 5.664326503872871, 'valid_length': 23, 'continuous_valid_length': 21}, '337': {'type': 'VEHICLE', 'object_id': '337', 'track_length': 91, 'moving_distance': 0.0, 'valid_length': 43, 'continuous_valid_length': 43}, '339': {'type': 'VEHICLE', 'object_id': '339', 'track_length': 91, 'mo
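The per-object entries can be queried the same way. As a sketch, the hypothetical helper `fully_tracked_movers` below picks objects that are valid at every step and moved at least a given distance; it relies only on the keys visible in the output above, and the three entries are copied from that output with rounded distances.

```python
def fully_tracked_movers(object_summary, min_distance=1.0):
    """IDs of objects valid at every step that moved at least `min_distance`."""
    return [
        object_id
        for object_id, info in object_summary.items()
        if info["valid_length"] == info["track_length"]
        and info["moving_distance"] >= min_distance
    ]


# Three entries copied (with rounded distances) from the output above.
object_summary = {
    "325": {"type": "VEHICLE", "track_length": 91, "moving_distance": 2.75, "valid_length": 3},
    "332": {"type": "VEHICLE", "track_length": 91, "moving_distance": 103.90, "valid_length": 91},
    "337": {"type": "VEHICLE", "track_length": 91, "moving_distance": 0.0, "valid_length": 43},
}
print(fully_tracked_movers(object_summary))
```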

In [14]:
# You might also find that the summaries of scenarios from different datasets might vary.

print(f"Keys of the scenario from Waymo: {example_scenario_summary.keys()}. \n")
print("Keys of the scenario from nuScene: ", sd_utils.read_dataset_summary(nuscenes_data)[0]['sd_nuscenes_v1.0-mini_scene-0061.pkl'].keys())

Keys of the scenario from Waymo: dict_keys(['coordinate', 'ts', 'metadrive_processed', 'sdc_id', 'dataset', 'scenario_id', 'source_file', 'track_length', 'current_time_index', 'sdc_track_index', 'objects_of_interest', 'tracks_to_predict', 'object_summary', 'number_summary']). 

Keys of the scenario from nuScene:  dict_keys(['dataset', 'metadrive_processed', 'map', 'date', 'coordinate', 'scenario_token', 'id', 'scenario_id', 'sample_rate', 'ts', 'sdc_id', 'object_summary', 'number_summary'])


### What are the other two return values of `read_dataset_summary`?

Recall that `sd_utils.read_dataset_summary(dataset_path)` will return a tuple of three elements. 

The second element is a list containing the `.pkl` file names of all scenarios.

The third element is a dict mapping each `.pkl` file name to its folder path relative to the folder that contains `dataset_summary.pkl`.

**Case 1: All `.pkl` files are in the same folder as `dataset_summary.pkl`**

In [15]:
_, waymo_scenario_ids, waymo_scenario_files = sd_utils.read_dataset_summary(waymo_data)

print("Scenarios ID: ", waymo_scenario_ids)

# As those .pkl files are in the same folder as `dataset_summary.pkl`, their relative paths are empty strings.
print("The mapping: ", waymo_scenario_files)

Scenarios ID:  ['sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl', 'sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl', 'sd_training.tfrecord-00000-of-01000_8a346109094cd5aa.pkl']
The mapping:  {'sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl': '', 'sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl': '', 'sd_training.tfrecord-00000-of-01000_8a346109094cd5aa.pkl': ''}


**Case 2: The `.pkl` files are not in the same folder as `dataset_summary.pkl`**

In [16]:
print("Reading from: ", nuscenes_data)
_, nuscenes_scenario_ids, nuscenes_scenario_files = sd_utils.read_dataset_summary(nuscenes_data)


print("Scenarios ID: ", nuscenes_scenario_ids)

# As the .pkl files of the nuScenes dataset are stored in subfolders of the folder containing `dataset_summary.pkl`,
# the mapping holds their relative paths.
#
# For example, the relative path of `sd_nuscenes_v1.0-mini_scene-0061.pkl` can be created by concatenating:
# nuscenes_scenario_files['sd_nuscenes_v1.0-mini_scene-0061.pkl'] + '/' + 'sd_nuscenes_v1.0-mini_scene-0061.pkl'

print("The mapping: ", nuscenes_scenario_files)

Reading from:  /home/zhenghao/metadrive/metadrive/assets/nuscenes
Scenarios ID:  ['sd_nuscenes_v1.0-mini_scene-0061.pkl', 'sd_nuscenes_v1.0-mini_scene-0103.pkl', 'sd_nuscenes_v1.0-mini_scene-0553.pkl', 'sd_nuscenes_v1.0-mini_scene-0655.pkl', 'sd_nuscenes_v1.0-mini_scene-0757.pkl', 'sd_nuscenes_v1.0-mini_scene-0796.pkl', 'sd_nuscenes_v1.0-mini_scene-0916.pkl', 'sd_nuscenes_v1.0-mini_scene-1077.pkl', 'sd_nuscenes_v1.0-mini_scene-1094.pkl', 'sd_nuscenes_v1.0-mini_scene-1100.pkl']
The mapping:  {'sd_nuscenes_v1.0-mini_scene-0061.pkl': 'nuscenes_0', 'sd_nuscenes_v1.0-mini_scene-0103.pkl': 'nuscenes_1', 'sd_nuscenes_v1.0-mini_scene-0553.pkl': 'nuscenes_2', 'sd_nuscenes_v1.0-mini_scene-0655.pkl': 'nuscenes_3', 'sd_nuscenes_v1.0-mini_scene-0757.pkl': 'nuscenes_4', 'sd_nuscenes_v1.0-mini_scene-0796.pkl': 'nuscenes_5', 'sd_nuscenes_v1.0-mini_scene-0916.pkl': 'nuscenes_6', 'sd_nuscenes_v1.0-mini_scene-1077.pkl': 'nuscenes_7', 'sd_nuscenes_v1.0-mini_scene-1094.pkl': 'nuscenes_7', 'sd_nuscenes_v1.0
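Both cases can be handled by the same path join, because an empty relative folder simply drops out. The sketch below uses `PurePosixPath` so that it behaves the same on every platform; `scenario_path` and the toy folder and file names are hypothetical.

```python
from pathlib import PurePosixPath


def scenario_path(dataset_dir, mapping, file_name):
    """Join dataset folder, relative folder from the mapping, and file name."""
    # An empty relative folder (Case 1) is simply skipped by the join.
    return PurePosixPath(dataset_dir) / mapping[file_name] / file_name


# Toy values mirroring the two cases above (POSIX-style paths for clarity).
print(scenario_path("/data/waymo", {"a.pkl": ""}, "a.pkl"))
print(scenario_path("/data/nuscenes", {"b.pkl": "nuscenes_0"}, "b.pkl"))
```

In real code on your own machine, `os.path.join` or `pathlib.Path` would be used with the actual dataset folder instead.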

## What is the Scenario Description? ⭐

**TL;DR: Scenario Description is a nested python dict that stores the information of one AD scenario.**

On the input side, defining a unified data structure lets us build a set of toolkits that convert each source dataset into the Scenario Description format.

Then we can store each scenario into a `.pkl` file in the local file system and manage those files with [ScenarioNet](https://github.com/metadriverse/scenarionet).

On the output side, no matter where the source data comes from, we can reconstruct interactive environments in MetaDrive and conduct downstream tasks such as RL training by reading from a dataset of those `.pkl` files.

Here is an example of what the Scenario Description looks like:


```python
scenario = {

    # ===== Meta data about the scenario =====
    # string. The name of the scenario
    "id": "Waymo-001",

    # string. The version of data format.
    "version": "MetaDrive v0.3.0.1",

    # int. The length of all trajectory and state arrays (T).
    "length": 200,

    # ===== Meta data ===
    "metadata": {

        # np.ndarray in (T, ). The time stamp of each time step.
        "ts": np.array([0.0, 0.1, 0.2, ...], dtype=np.float32),


        # bool. Whether the scenario is processed and exported by MetaDrive.
        # Some operations may be done, such as interpolating the lane to
        # make way points uniformly scattered in given interval.
        "metadrive_processed": True,

        # string. Coordinate system.
        "coordinate": "metadrive",

        # optional keys
        "source_file": "training_20s.tfrecord-00014-of-01000",
        "dataset": "waymo",
        "scenario_id": "dd0c8c27fdd6ef59",  # Used in Waymo dataset
        "seed": 512,
        "history_metadata": {},

        "sdc_id": "172",  # A key exists in tracks

    },

    # ===== Trajectories of active participants, e.g. vehicles, pedestrians =====
    # a dict mapping object ID to its state dict.
    "tracks": {
        "vehicle1": {

            # The type string in metadrive.type.MetaDriveType
            "type": "VEHICLE",

            # The state dict. All values must have T elements.
            "state": {
                "position": np.ones([200, 3], dtype=np.float32),
                ...
            },

            # The meta data dict. Store useful information about the object. type in metadata could be those from
            # different dataset
            "metadata": {
                "type": "VEHICLE",
                "track_length": 200,
                "object_id": "vehicle1",

                # Optional keys
                "agent_name": "default_agent",
                "policy_spawn_info": {  # Information needed to re-instantiate the policy
                    "policy_class": ("metadrive.policy.idm_policy", "IDMPolicy"),
                    "args": ...,
                    "kwargs": ...,
                }
            }
        },

        "pedestrian1": ...
    },

    # ===== State sequences of dynamic objects, e.g. traffic lights =====
    # a dict mapping object ID to its state dict.
    "dynamic_map_states": {
        "trafficlight1": {

            # The type string in metadrive.type.MetaDriveType
            "type": "TRAFFIC_LIGHT",

            # The state dict. All values must have T elements.
            "state": {
                "object_state": np.ones([200, ], dtype=int),
                ...
            },

            # The meta data dict. Store useful information about the object
            "metadata": {
                "type": "TRAFFIC_LIGHT",
                "track_length": 200,
            }
        }
    },

    # ===== Map features =====
    # A dict mapping from map feature ID to a line segment
    "map_features": {
        "219": {
            "type": "LANE_SURFACE_STREET",
            "polyline": np.array in [21, 2],  # A set of 2D points describing a line segment
            # optional, only works for lane
            "polygon": np.array in [N, 2] # A set of 2D points representing convexhull
        },
        "182": ...
        ...
    }
}

# Note: Example is from the docstring of metadrive/scenario/scenario_description.py
```
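The rule that every state array must have `T` elements can be checked with a few lines of plain Python. The sketch below is not MetaDrive's `sanity_check`; `check_state_lengths` is a hypothetical helper, and plain lists stand in for the NumPy arrays used in real scenarios.

```python
def check_state_lengths(scenario):
    """Return (object_id, state_name) pairs whose length differs from scenario['length']."""
    expected = scenario["length"]
    mismatches = []
    for group in ("tracks", "dynamic_map_states"):
        for object_id, obj in scenario.get(group, {}).items():
            for state_name, values in obj["state"].items():
                if len(values) != expected:
                    mismatches.append((object_id, state_name))
    return mismatches


# Toy scenario with T = 3; plain lists stand in for the NumPy arrays used in practice.
toy_scenario = {
    "id": "toy-001",
    "length": 3,
    "metadata": {"ts": [0.0, 0.1, 0.2], "sdc_id": "vehicle1"},
    "tracks": {
        "vehicle1": {
            "type": "VEHICLE",
            # "heading" is deliberately one element short.
            "state": {"position": [[0, 0, 0], [1, 0, 0], [2, 0, 0]], "heading": [0.0, 0.0]},
            "metadata": {"type": "VEHICLE", "track_length": 3, "object_id": "vehicle1"},
        }
    },
    "dynamic_map_states": {},
    "map_features": {},
}
print(check_state_lengths(toy_scenario))
```

The toy scenario reports `('vehicle1', 'heading')` as the only mismatch, since that list has two elements instead of three.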

### Read a scenario description in `.pkl` with MetaDrive utils

We avoid relying on any advanced tools to ensure maximal compatibility. Here we demonstrate how to read a `.pkl` file and instantiate a Scenario Description from it.

We will use `read_scenario_data` to read a given `.pkl` file; it returns a `ScenarioDescription` instance. A `ScenarioDescription` is a wrapper around a native Python dict with some useful utility functions.

In [17]:
# Get the dataset path
dataset_path = waymo_data
print("Dataset path: ", dataset_path)

# Get the scenario .pkl file name
_, scenario_ids, dataset_mapping = sd_utils.read_dataset_summary(dataset_path)
# Just pick the first scenario
scenario_pkl_file = scenario_ids[0]

# Get the relative path to the .pkl file
print("The pkl file relative path: ", dataset_mapping[scenario_pkl_file])  # An empty path

# Get the absolute path to the .pkl file
abs_path_to_pkl_file = os.path.join(dataset_path, dataset_mapping[scenario_pkl_file], scenario_pkl_file)
print("The pkl file absolute path: ", abs_path_to_pkl_file)

# Call utility function in MD and get the Scenario Description object
scenario = sd_utils.read_scenario_data(abs_path_to_pkl_file)

print(f"\nThe raw data type after reading the .pkl file is {type(scenario)}")

Dataset path:  /home/zhenghao/metadrive/metadrive/assets/waymo
The pkl file relative path:  
The pkl file absolute path:  /home/zhenghao/metadrive/metadrive/assets/waymo/sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl

The raw data type after reading the .pkl file is <class 'metadrive.scenario.scenario_description.ScenarioDescription'>


In [18]:
print(f"The keys in a ScenarioDescription are: {scenario.keys()}")

The keys in a ScenarioDescription are: dict_keys(['id', 'version', 'length', 'tracks', 'dynamic_map_states', 'map_features', 'metadata'])


In [19]:
# Now you can play with the data in scenario as it's like a python dict:

print(f"The SDC vehicle name is: {scenario['metadata']['sdc_id']}")

sdc_id = scenario['metadata']['sdc_id']
print(f"The SDC vehicle type is: {scenario['tracks'][sdc_id]['type']}")

print(f"The SDC vehicle position at last time step is: {scenario['tracks'][sdc_id]['state']['position'][-1]}")

print(f"There are traffic lights: {scenario['dynamic_map_states'].keys()}")

print(f"The traffic light 316's state is: {scenario['dynamic_map_states']['316']['state']['object_state']}")


The SDC vehicle name is: 1629
The SDC vehicle type is: VEHICLE
The SDC vehicle position at last time step is: [ 2976.9563  -3251.819     186.91092]
There are traffic lights: dict_keys(['226', '227', '272', '316', '317', '321', '354', '355'])
The traffic light 316's state is: ['LANE_STATE_UNKNOWN', 'LANE_STATE_UNKNOWN', 'LANE_STATE_UNKNOWN', 'LANE_STATE_UNKNOWN', 'LANE_STATE_UNKNOWN', 'LANE_STATE_UNKNOWN', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'LANE_STATE_STOP', 'L
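A per-step state list like this is easy to summarize with the standard library. The sketch below counts how often each state occurs and finds the first step at which the state changes; the sequence is a short made-up example in the same format, not the real data of traffic light 316.

```python
from collections import Counter

# A short made-up state sequence in the same format as the output above.
states = ["LANE_STATE_UNKNOWN"] * 2 + ["LANE_STATE_STOP"] * 4

counts = Counter(states)
# Index of the first step at which the state differs from the previous one.
first_change = next((i for i in range(1, len(states)) if states[i] != states[i - 1]), None)

print(counts)
print(first_change)
```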

### Useful utility functions from ScenarioDescription

Note that these functions are defined at the class level, so they can be called on the `ScenarioDescription` class as well as on an instance.

In [20]:
# Check the self-consistency of a scenario and whether the required fields are filled.
scenario.sanity_check?
scenario.sanity_check(scenario)  # If no error is raised, the check passes.

Signature:
scenario.sanity_check(
    scenario_dict,
    check_self_type=False,
    valid_check=False,
)
Docstring:
Check if the input scenario dict is self-consistent and has filled required fields.

The required high-level fields include tracks, dynamic_map_states, metadata, map_features.
For each object, the tracks[obj_id] should at least contain type, state, metadata.
For each object, the tracks[obj_id]['state'] should at least contain position, heading.
For each lane in map_features, map_feature[map_feat_id] should at least contain polyline.
For metadata, it should at least contain metadrive_processed, coordinate and timestep.
We have more checks to ensure the consistency of the data.

Args:
    scen

In [21]:
# Convert the ScenarioDescription to a python dict.
scenario_dict = scenario.to_dict()
print(f"The keys in the converted dict are: {scenario_dict.keys()}")

The keys in the converted dict are: dict_keys(['id', 'version', 'length', 'tracks', 'dynamic_map_states', 'map_features', 'metadata'])


In [22]:
# Return the object info dict of the SDC.
sdc_track = scenario.get_sdc_track()

print("The 'tracks' field stores the information of an object, including its type, state and metadata. "
      f"The object info dict contains these keys: {sdc_track.keys()}\n")
print("We demo the state vectors stored in the SDC's info dict: \n", {k: v.shape for k, v in sdc_track['state'].items()})
print("\nBut note that different type of objects might have different state vectors.")

The 'tracks' field stores the information of an object, including its type, state and metadata. The object info dict contains these keys: dict_keys(['type', 'state', 'metadata'])

We demo the state vectors stored in the SDC's info dict: 
 {'position': (91, 3), 'length': (91,), 'width': (91,), 'height': (91,), 'heading': (91,), 'velocity': (91, 2), 'valid': (91,)}

But note that different type of objects might have different state vectors.
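The `valid` vector marks the steps at which an object is actually observed. As a sketch, and assuming that `valid_length` counts the valid steps while `continuous_valid_length` is the longest run of consecutive valid steps (our reading of the field names, not a confirmed definition), both numbers could be derived from such a mask as follows. `longest_valid_run` is a hypothetical helper.

```python
def longest_valid_run(valid):
    """Length of the longest run of consecutive truthy entries in `valid`."""
    best = current = 0
    for flag in valid:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


# Toy mask: valid at steps 0-1 and step 3, invalid elsewhere.
valid = [1, 1, 0, 1, 0]
print(sum(bool(v) for v in valid), longest_valid_run(valid))
```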


In [23]:
# Compute the moving distance of SDC.
sd_utils.ScenarioDescription.sdc_moving_dist(scenario)

84.20105910301208
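One plausible way to define such a moving distance is the length of the path traced by consecutive positions. The sketch below implements that definition in 2D with a hypothetical helper, `path_length`; whether `sdc_moving_dist` uses exactly this formula, or how it treats invalid steps, is not stated here, so treat it as an illustration only.

```python
import math


def path_length(positions):
    """Sum of straight-line distances between consecutive (x, y) points."""
    return sum(
        math.dist(p[:2], q[:2])
        for p, q in zip(positions, positions[1:])
    )


# Toy trajectory: 3 units along x, then 4 units along y.
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(path_length(positions))
```

For the toy trajectory the result is `7.0`, the sum of the two segments rather than the straight-line distance between start and end.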

In [24]:
# Verify if the .pkl file is valid.
sd_utils.ScenarioDescription.is_scenario_file(abs_path_to_pkl_file)

True

In [25]:
# Get the summary of an object. Here we use the SDC.
sd_utils.ScenarioDescription.get_object_summary(
    object_dict=scenario.get_sdc_track(), 
    object_id=scenario['metadata']['sdc_id']
)

{'type': 'VEHICLE',
 'object_id': '1629',
 'track_length': 91,
 'moving_distance': 84.20105910301208,
 'valid_length': 91,
 'continuous_valid_length': 91}

In [26]:
# Get the stats of all objects in a scenario.
sd_utils.ScenarioDescription.get_number_summary(
    scenario=scenario
)

{'num_objects': 211,
 'object_types': {'CYCLIST', 'PEDESTRIAN', 'VEHICLE'},
 'num_objects_each_type': {'VEHICLE': 184, 'PEDESTRIAN': 25, 'CYCLIST': 2},
 'num_moving_objects': 69,
 'num_moving_objects_each_type': defaultdict(int,
             {'VEHICLE': 52, 'PEDESTRIAN': 15, 'CYCLIST': 2}),
 'num_traffic_lights': 8,
 'num_traffic_light_types': {'LANE_STATE_STOP', 'LANE_STATE_UNKNOWN'},
 'num_traffic_light_each_step': {'LANE_STATE_UNKNOWN': 164,
  'LANE_STATE_STOP': 564},
 'num_map_features': 358,
 'map_height_diff': 2.4652252197265625}

In [27]:
# Get the number of all objects in a scenario.
sd_utils.ScenarioDescription.get_num_objects(
    scenario=scenario
)

211

In [28]:
# Get the number of all moving objects in a scenario.
sd_utils.ScenarioDescription.get_num_moving_objects(
    scenario=scenario
)

69

## How to filter Scenarios?

To ensure maximal compatibility and simplicity, we use the native file system to store all `.pkl` files, so you can read and filter scenarios with a plain Python loop.

We have defined some high-level commands to conduct filtering in ScenarioNet: https://scenarionet.readthedocs.io/en/latest/operations.html#filter

To give more flexibility, here we provide the full example of how you can build a ScenarioNet dataset by filtering existing data.

In this example, we keep only the scenarios that contain traffic lights.

*Note that if you replace "waymo" in AssetLoader by "nuscenes", you will find that the new dataset does not include any scenario as none of the scenarios in the example nuscenes dataset contains traffic light.*


In [29]:
import os
import pickle
from pathlib import Path

import tqdm

import metadrive.scenario.utils as sd_utils
from metadrive.engine.asset_loader import AssetLoader
from metadrive.scenario.scenario_description import ScenarioDescription as SD

# === Specify the path to the original and the new dataset ===
waymo_data = AssetLoader.file_path(AssetLoader.asset_path, "waymo", unix_style=False)

dataset_path = waymo_data
new_dataset_path = "./filtered_dataset"
# ============================================================

new_dataset_path = Path(new_dataset_path).resolve()
if not os.path.isdir(new_dataset_path):
    os.makedirs(new_dataset_path)

print(f"Reading existing dataset at: {dataset_path}")
print(f"Will save new dataset at: {new_dataset_path}")
print("\n")

# Read the summary
scenario_summaries, scenario_ids, dataset_mapping = sd_utils.read_dataset_summary(dataset_path)


# Define a filter function that return True if the scenario is accepted
def filter(scenario):
    # Get the number of traffic light
    num_tl = len(scenario[SD.DYNAMIC_MAP_STATES])
    # You can also get the number from metadata (recommended)
    num_tl_from_metadata = scenario[SD.METADATA][SD.SUMMARY.NUMBER_SUMMARY][SD.SUMMARY.NUM_TRAFFIC_LIGHTS]
    assert num_tl_from_metadata == num_tl
    print(f"We found {num_tl} traffic lights in scenario {scenario[SD.ID]}")
    has_traffic_light = num_tl > 0
    return has_traffic_light


# Iterate over all scenarios
remove_scenario = []
new_mapping = {}
for file_name in tqdm.tqdm(scenario_summaries.keys(), desc="Filter Scenarios"):
    abs_path = Path(dataset_path) / dataset_mapping[file_name] / file_name
    abs_path_to_file_dir = Path(dataset_path) / dataset_mapping[file_name]

    # Call utility function in MD and get the ScenarioDescription object
    scenario = sd_utils.read_scenario_data(abs_path)

    # Filter
    accepted = filter(scenario)
    print(f"Processing: {file_name}. This scenario is {'accepted' if accepted else 'rejected'}.")
    if not accepted:
        remove_scenario.append(file_name)

    # Translate the relative path in mapping to the new location
    new_mapping[file_name] = os.path.relpath(abs_path_to_file_dir, new_dataset_path)
print("\n")

for file in remove_scenario:
    scenario_summaries.pop(file)
    new_mapping.pop(file)

summary_file_path = new_dataset_path / SD.DATASET.SUMMARY_FILE
with open(summary_file_path, "wb") as file:
    pickle.dump(scenario_summaries, file)
print(f"Summary file is saved at: {summary_file_path}")

mapping_file_path = new_dataset_path / SD.DATASET.MAPPING_FILE
with open(mapping_file_path, "wb") as file:
    pickle.dump(new_mapping, file)
print(f"Mapping file is saved at: {mapping_file_path}")

# Verify
_, _, new_mapping = sd_utils.read_dataset_summary(new_dataset_path)
print(f"\nLet's verify if the new dataset is valid."
      f"\n{len(new_mapping)} scenarios are in the new dataset."
      f"\nThe new mapping:\n{new_mapping}")


Reading existing dataset at: /home/zhenghao/metadrive/metadrive/assets/waymo
Will save new dataset at: /home/zhenghao/metadrive/documentation/source/filtered_dataset




Filter Scenarios: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 421.41it/s]

We found 8 traffic lights in scenario 2a1e44d405a6833f
Processing: sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl. This scenario is accepted.
We found 8 traffic lights in scenario c403d5992cab9e0
Processing: sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl. This scenario is accepted.
We found 19 traffic lights in scenario 8a346109094cd5aa
Processing: sd_training.tfrecord-00000-of-01000_8a346109094cd5aa.pkl. This scenario is accepted.


Summary file is saved at: /home/zhenghao/metadrive/documentation/source/filtered_dataset/dataset_summary.pkl
Mapping file is saved at: /home/zhenghao/metadrive/documentation/source/filtered_dataset/dataset_mapping.pkl

Let's verify if the new dataset is valid.
3 scenarios are in the new dataset.
The new mapping:
{'sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl': '../../../metadrive/assets/waymo', 'sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl': '../../../metadrive/assets/waymo', 'sd_training.tfrecord-00000-of-01000_8a


