diffusion_policy_quadrotor

This repository provides a demonstration of imitation learning using a diffusion policy. The implementation is adapted from the official Diffusion Policy repository.

Result

The control task is to drive the quadrotor from the initial position (0, 0) to the goal position (5, 5) without colliding with the obstacles. The animation shows the denoising process of the diffusion policy predicting a future trajectory, followed by the quadrotor applying the actions (a sketch of this control loop follows the animation).

Figure: animation of the denoising process and the resulting closed-loop quadrotor trajectory.
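The closed-loop simulation follows the pattern common to diffusion policies: predict a short horizon of future actions from recent observations and the obstacle encoding, execute a few of them, and replan. The sketch below only illustrates that pattern; `DiffusionPolicy`, `QuadrotorSim`, and the horizon lengths are hypothetical stand-ins, not the repository's actual API.

```python
import numpy as np

# Hypothetical stand-ins for illustration; the real classes live in the notebooks
# and may have different names and signatures.
policy = DiffusionPolicy.load("pretrained_weights.pt")   # predicts future (position, velocity) actions
sim = QuadrotorSim(start=(0.0, 0.0), goal=(5.0, 5.0))     # 2D quadrotor simulation with obstacles

obs_horizon, n_exec = 2, 8                # assumed observation history length / actions executed per replan
obs_history = [sim.observe()] * obs_horizon

for _ in range(200):
    # Predict a trajectory of future actions conditioned on recent observations
    # and the obstacle encoding (see the "Diffusion policy" section below).
    actions = policy.predict(np.stack(obs_history), sim.obstacle_encoding())

    # Execute only the first few actions, then replan.
    for a in actions[:n_exec]:
        sim.step(a)
        obs_history = obs_history[1:] + [sim.observe()]

    if sim.reached_goal():
        break
```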

Usage

The notebook demo.ipynb demonstrates a closed-loop simulation using the diffusion policy controller for quadrotor collision avoidance. You can also run it in Colab.

The training script is provided as train.ipynb.

Dependencies

The program was developed and tested in the following environment.

  • Python 3.10
  • torch==2.2.1
  • jax==0.4.26
  • jaxlib==0.4.26
  • diffusers==0.27.2
  • torchvision==0.14.1
  • gdown (to download pre-trained weights)
  • joblib (to load the training data)

Diffusion policy

The policy takes as input 1) the latest N steps of observations $o_t$ (position and velocity) and 2) an encoding of the obstacle information $O_{BST}$ (a flattened 7x7 grid whose values are the obstacle radii). The output is N steps of actions $a_t$ (future positions and velocities).

Figure: overview of the diffusion policy's inputs and outputs.

*The quadrotor icon is from flaticon.
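Sampling follows the usual conditional DDPM recipe: start from Gaussian noise over the action horizon and iteratively denoise it, conditioning every step on the observation history and the obstacle encoding. A rough sketch using `diffusers.DDPMScheduler` is below; the network wrapper `noise_pred_net`, the horizon lengths, and the tensor shapes are assumptions for illustration rather than the repository's exact code.

```python
import torch
from diffusers import DDPMScheduler

N_OBS, N_ACT, ACT_DIM = 2, 16, 4   # assumed horizons and action dimension (2D position + velocity)

def sample_actions(noise_pred_net, obs_history, obstacle_grid, num_steps=50, device="cpu"):
    """Draw one action trajectory from the conditioned diffusion policy.

    obs_history:   (N_OBS, 4) tensor of recent normalized positions/velocities
    obstacle_grid: (7, 7) tensor whose entries are obstacle radii
    """
    # Conditioning vector: flattened observation history + flattened 7x7 obstacle grid.
    cond = torch.cat([obs_history.reshape(-1), obstacle_grid.reshape(-1)])
    cond = cond.unsqueeze(0).to(device)                        # (1, N_OBS*4 + 49)

    scheduler = DDPMScheduler(num_train_timesteps=100)
    scheduler.set_timesteps(num_steps)

    # Start from Gaussian noise over the action horizon and iteratively denoise it.
    actions = torch.randn(1, N_ACT, ACT_DIM, device=device)
    for t in scheduler.timesteps:
        with torch.no_grad():
            noise_pred = noise_pred_net(actions, t, global_cond=cond)
        actions = scheduler.step(noise_pred, t, actions).prev_sample

    return actions.squeeze(0)                                  # (N_ACT, ACT_DIM), still normalized
```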

Deviations from the original implementation

  • A linear layer is added before the Mish activation in the condition encoder of ConditionalResidualBlock1D. This prevents the activation from truncating large negative values in the normalized observations (see the sketch after this list).
  • A CLF-CBF-QP controller is implemented that can modify the noisy actions during the denoising process of the policy; it is disabled by default (a sketch follows the figure below).
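With the extra linear layer, the condition encoder could look roughly like this; the layer sizes are placeholders, and only the additional `nn.Linear` in front of `nn.Mish` reflects the change described above.

```python
import torch.nn as nn

cond_dim, out_channels = 68, 256   # placeholder sizes

# Upstream Diffusion Policy encodes the condition as Mish -> Linear -> reshape.
# Here an extra Linear is placed in front, so Mish acts on a learned projection
# instead of directly on the normalized observations, which avoids truncating
# large negative condition values.
cond_encoder = nn.Sequential(
    nn.Linear(cond_dim, cond_dim),
    nn.Mish(),
    nn.Linear(cond_dim, out_channels * 2),      # FiLM scale and bias
    nn.Unflatten(-1, (out_channels * 2, 1)),
)
```

As in the upstream block, the encoder output provides the FiLM scale and bias applied to the convolutional features.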

Figure: effect of applying the CLF-CBF-QP controller during the denoising process.
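The guidance idea can be sketched as a small QP solved per predicted waypoint inside the denoising loop: stay as close as possible to the noisy action while satisfying a control-barrier-function constraint around each obstacle. The snippet below shows only the CBF part for a single circular obstacle and uses cvxpy for the QP; the repository's actual CLF-CBF-QP formulation and solver may differ.

```python
import numpy as np
import cvxpy as cp

def cbf_filter_waypoint(p_noisy, v_noisy, p_obs, r_obs, gamma=1.0, dt=0.1):
    """Project one noisy (position, velocity) waypoint onto a CBF-feasible set.

    Barrier: h(p) = ||p - p_obs||^2 - r_obs^2 >= 0.
    Discrete-time CBF condition, linearized in the decision variable v:
        h(p + v*dt) >= (1 - gamma) * h(p)   =>   2*(p - p_obs)^T v * dt >= -gamma * h(p)
    """
    h = float(np.dot(p_noisy - p_obs, p_noisy - p_obs) - r_obs**2)

    # Stay close to the noisy velocity while satisfying the barrier constraint.
    v = cp.Variable(2)
    objective = cp.Minimize(cp.sum_squares(v - v_noisy))
    constraints = [2 * (p_noisy - p_obs) @ v * dt >= -gamma * h]
    cp.Problem(objective, constraints).solve()

    v_safe = v.value if v.value is not None else v_noisy
    return p_noisy + v_safe * dt, v_safe

# Example: nudge a waypoint whose velocity points into an obstacle at (2.5, 2.5).
p_new, v_safe = cbf_filter_waypoint(
    p_noisy=np.array([2.0, 2.0]), v_noisy=np.array([1.0, 1.0]),
    p_obs=np.array([2.5, 2.5]), r_obs=0.5,
)
```

Applied to the intermediate noisy actions at each denoising step, such a filter nudges the sampled trajectory away from obstacles without retraining the policy.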

Learning note

Failure case: the diffusion policy controller failed to extrapolate from training data

Figure: A failure case of the controller.

  • The left figure shows a trajectory from the training data.
  • The middle figure shows the closed-loop simulation result of the controller starting from the SAME initial position as in the training data.
  • The right figure shows the closed-loop simulation result of the controller starting from a DIFFERENT initial position, which results in a collision.


Refer to learning_note.md for other notes.
