# Observation and Action

<img align="right" src="figs/observation_demo.png" width=260>

MetaDrive provides various kinds of sensory input, as illustrated in the figure.
For low-level sensors, RGB cameras, depth cameras, semantic cameras, instance cameras and Lidar can be placed anywhere in the scene with adjustable
parameters such as the field of view and the number of lasers.
High-level scene information, including road information and the velocity and heading of nearby vehicles, can also be provided as the observation.

Note that MetaDrive aims to provide an efficient platform for benchmarking RL research,
so we favor simulation efficiency over photorealistic rendering.

On this page, we describe the observation forms available in the current MetaDrive version and discuss how to implement new forms of observation for your own tasks.


## Observations
There are three kinds of observations commonly used for training agents:
- LidarStateObservation
- ImageStateObservation
- TopDownObservation

### LidarStateObservation
MetaDrive provides a state vector containing the information necessary for navigation tasks.
We use this state vector in almost all existing RL experiments, such as the Generalization, MARL and Safe RL experiments.
The state vector consists of three parts:
1. **Ego State**: current states such as the steering, heading, velocity and relative distance to boundaries, implemented in the `vehicle_state` function of [StateObservation](https://github.com/metadriverse/metadrive/blob/main/metadrive/obs/state_obs.py#L9). Please find the detailed meaning of each state dimension in the code.
2. **Navigation**: the navigation information that guides the vehicle toward the destination. Concretely, MetaDrive first computes the route from the spawn point to the destination of the ego vehicle. Then a set of checkpoints is scattered along the whole route at certain intervals. The relative distance and direction to the next checkpoint and to the one after it are given as the navigation information. This part is implemented in the `_get_info_for_checkpoint` function of [Navigation Class](https://github.com/metadriverse/metadrive/blob/9a89962e72c709e60d4a5bc19ce5f27d96027401/metadrive/component/vehicle_navigation_module/base_navigation.py#L13C10-L13C10).
3. **Surrounding**: the surrounding information is encoded by a vector containing a Lidar-like point cloud. The data is generated by the [Lidar Class](https://github.com/metadriverse/metadrive/blob/main/metadrive/component/vehicle_module/lidar.py#L16). We typically use 240 lasers (single-agent) and 70 lasers (multi-agent) to scan the neighboring area within a radius of 50 meters.

The above information is normalized to [0,1] and concatenated into a state vector by the [LidarStateObservation Class](https://github.com/metadriverse/metadrive/blob/main/metadrive/envs/observation_type.py) and fed to the RL agents.
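
The assembly described above can be sketched in plain Python. This is only an illustration, not MetaDrive's implementation: the helper names (`checkpoint_offset`, `to_unit_range`, `build_state_vector`), the 50 m normalization of checkpoint offsets, and the three placeholder ego features are all assumptions. Only the 240 lasers and the 50 m scan radius come from the text.

```python
import math


def checkpoint_offset(ego_xy, ego_heading, checkpoint_xy):
    """Express a checkpoint position in the ego frame as (forward, left) meters."""
    dx = checkpoint_xy[0] - ego_xy[0]
    dy = checkpoint_xy[1] - ego_xy[1]
    cos_h, sin_h = math.cos(ego_heading), math.sin(ego_heading)
    return dx * cos_h + dy * sin_h, -dx * sin_h + dy * cos_h


def to_unit_range(value, max_abs):
    """Clip value to [-max_abs, max_abs], then rescale linearly to [0, 1]."""
    clipped = max(-max_abs, min(max_abs, value))
    return clipped / (2.0 * max_abs) + 0.5


def build_state_vector(ego_state, navigation, lidar_distances, lidar_range=50.0):
    """Concatenate the three parts; lidar distances are scaled by the scan radius."""
    surrounding = [max(0.0, min(1.0, d / lidar_range)) for d in lidar_distances]
    return list(ego_state) + list(navigation) + surrounding


# Hypothetical example: ego at the origin facing +y, checkpoint 10 m ahead.
forward, left = checkpoint_offset((0.0, 0.0), math.pi / 2, (0.0, 10.0))
navigation = [to_unit_range(forward, 50.0), to_unit_range(left, 50.0)]
ego_state = [0.5, 0.5, 0.2]  # placeholder ego features, assumed already in [0, 1]
lidar = [50.0] * 240         # 240 lasers, each reporting the full 50 m range
state = build_state_vector(ego_state, navigation, lidar)
print(len(state))  # 3 + 2 + 240 = 245
```

The real ego and navigation parts have more dimensions than this sketch; refer to the linked source files for the exact layout.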


### ImageStateObservation


.. image:: figs/rgb_obs.png
   :width: 350
   :align: center

.. image:: figs/depth_obs.jpg
   :width: 350
   :align: center


MetaDrive supports visuomotor tasks by turning on rendering during training.
The above figure shows the images captured by the RGB camera (left) and the depth camera (right).
In this section, we discuss how to use such observations on a **headless** machine, such as a compute node in a cluster
or another remote server.
Before using this feature in your project, please make sure that offscreen rendering works on your
machine. The setup tutorial is at :ref:`install_headless`.

Now we can set up the vision-based observation in MetaDrive:

* Step 1. Set `config["image_observation"] = True` to tell MetaDrive to maintain an image buffer in memory even when no popup window exists.
* Step 2. Set `config["vehicle_config"]["image_source"]` to `"rgb_camera"` or `"depth_camera"` as needed.
* Step 3. The image size (width and height) is determined by the camera parameters. The default setting is (84, 84), following the image size in Atari. You can customize the size by configuring `config["vehicle_config"]["rgb_camera"]`. For example, `config["vehicle_config"]["rgb_camera"] = (200, 88)` means that the image has 200 pixels in width and 88 pixels in height.
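
Put together, the three steps amount to a config fragment like the one below. It uses only the keys named in the steps. The variable name `config` is arbitrary, and passing the dict to the environment constructor is assumed rather than shown.

```python
# Config fragment for vision-based observation, built from the three steps above.
config = {
    # Step 1: keep an image buffer in memory even without a popup window.
    "image_observation": True,
    "vehicle_config": {
        # Step 2: choose "rgb_camera" or "depth_camera" as the image source.
        "image_source": "rgb_camera",
        # Step 3: camera parameters, here 200 pixels wide and 88 pixels high.
        "rgb_camera": (200, 88),
    },
}

width, height = config["vehicle_config"]["rgb_camera"]
print(width, height)  # 200 88
```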

There is a demo script using RGB camera as observation::

    python -m metadrive.examples.drive_in_single_agent_env --observation rgb_camera

The script should print a message:

.. code-block:: text

    The observation is a dict with numpy arrays as values:  {'image': (84, 84, 3), 'state': (21,)}

Image rendering consumes memory on the first GPU of your machine (if any). Please be careful when using this.


If visual data collection feels slow, try our advanced offscreen renderer: :ref:`install_render_cuda`.
After verifying your installation, set `config["image_on_cuda"] = True` to get **10x** faster data collection!



### TopDownObservation
<img align="center" src="figs/top_down_obs.png" width=600>

MetaDrive also supports top-down semantic maps. We provide a handy example to illustrate the use of the top-down observation in [top_down_metadrive.py](https://github.com/metadriverse/metadrive/blob/main/metadrive/examples/top_down_metadrive.py).
You can run this demo via
```bash
python -m metadrive.examples.top_down_metadrive
```
The following is a minimal script that uses the top-down observation.
`TopDownMetaDrive` is a wrapper class around `MetaDriveEnv` that replaces the observation with the output of the pygame top-down renderer.
The native observation of this setting is a numpy array with shape `[84, 84, 5]`, and all entries fall into [0, 1].
The above figure shows the semantic meaning of each channel.

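```python
from metadrive import TopDownMetaDrive

env = TopDownMetaDrive()
try:
    obs, info = env.reset()
    for step in range(1, 100000):
        # Action is [steering, throttle]: drive straight at full throttle.
        obs, reward, terminated, truncated, info = env.step([0, 1])
        env.render(mode="top_down")
        # Stop at the end of the first episode.
        if terminated or truncated:
            break
finally:
    # Always release the simulator, even if an error occurs.
    env.close()
```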
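
Many convolutional policies expect channel-first input, so a frame of shape `[84, 84, 5]` is often rearranged before use. The sketch below assumes NumPy is available and uses a randomly generated stand-in frame instead of a real observation. The helper name `to_channel_first` is hypothetical and not part of MetaDrive.

```python
import numpy as np


def to_channel_first(top_down_obs):
    """Validate a top-down frame and reorder it from (H, W, C) to (C, H, W)."""
    obs = np.asarray(top_down_obs, dtype=np.float32)
    if obs.shape != (84, 84, 5):
        raise ValueError(f"unexpected shape {obs.shape}")
    if obs.min() < 0.0 or obs.max() > 1.0:
        raise ValueError("entries must lie in [0, 1]")
    return np.moveaxis(obs, -1, 0)


# Stand-in frame with the documented shape; a real one would come from env.step().
fake_obs = np.random.default_rng(0).random((84, 84, 5))
channels = to_channel_first(fake_obs)
print(channels.shape)  # (5, 84, 84)
```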

## Action


MetaDrive receives normalized action as input to control each target vehicle: :math:`\mathbf a = [a_1, a_2]^T \in [-1, 1]^2`.

At each environment time step, MetaDrive converts the normalized action into the steering :math:`u_s` (degree), acceleration :math:`u_a` (hp) and brake signal :math:`u_b` (hp) as follows:


.. math::

    u_s & = S_{max} a_1 ~\\
    u_a & = F_{max} \max(0, a_2) ~\\
    u_b & = -B_{max} \min(0, a_2)

where :math:`S_{max}` (degree) is the maximal steering angle, :math:`F_{max}` (hp) is the maximal engine force, and :math:`B_{max}` (hp) is the maximal brake force.
Because the exact values of these parameters vary across vehicle types, please refer to the `VehicleParameterSpace Class <https://github.com/metadriverse/metadrive/blob/main/metadrive/utils/space.py#L219>`_ for details.
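
The conversion above can be written as a small standalone function. This is a sketch of the three formulas, not MetaDrive's own code: the function name `convert_action` and the default maxima (40, 500, 100) are placeholders chosen for illustration, and the input clipping is an added safeguard that the formulas do not state.

```python
def convert_action(action, max_steering=40.0, max_engine_force=500.0, max_brake_force=100.0):
    """Map a normalized action [a1, a2] to (steering, engine force, brake force)."""
    # Added safeguard (assumption): clip both components to [-1, 1].
    a1 = max(-1.0, min(1.0, action[0]))
    a2 = max(-1.0, min(1.0, action[1]))
    steering = max_steering * a1                 # u_s = S_max * a1
    engine = max_engine_force * max(0.0, a2)     # u_a = F_max * max(0, a2)
    brake = -max_brake_force * min(0.0, a2)      # u_b = -B_max * min(0, a2)
    return steering, engine, brake


# Half steering with half braking, using the placeholder maxima.
print(convert_action([0.5, -0.5]))  # (20.0, 0.0, 50.0)
```

A single second component thus drives both pedals: positive values accelerate, negative values brake, and the two are never applied at once.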

With this design, the action space for each agent is always fixed to `gym.spaces.Box(low=-1.0, high=1.0, shape=(2, ))`. However, we provide a config named `extra_action_dim` (int) which allows users to add more dimensions to the action space.
For example, if we set `config["extra_action_dim"] = 1`, then the action space for each agent becomes `Box(-1.0, 1.0, shape=(3, ))`. This allows users to write environment wrappers that introduce more input action dimensions.
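
A wrapper built on `extra_action_dim` typically needs to separate the driving command from the additional inputs. The sketch below shows one way to do that in plain Python. The helper name `split_action` is hypothetical, and it assumes the first two components remain the driving action, which the text above does not state explicitly.

```python
def split_action(action, extra_action_dim=1):
    """Split an extended action into (driving, extra) parts."""
    expected = 2 + extra_action_dim
    if len(action) != expected:
        raise ValueError(f"expected {expected} action dimensions, got {len(action)}")
    if any(not -1.0 <= a <= 1.0 for a in action):
        raise ValueError("every action component must lie in [-1, 1]")
    # Assumption: the first two components are the driving action.
    return list(action[:2]), list(action[2:])


driving, extra = split_action([0.1, 0.8, -1.0], extra_action_dim=1)
print(driving, extra)  # [0.1, 0.8] [-1.0]
```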