Description
Motivation
I created an environment with a compound action space: a list of continuous values (robot joint angles) and a boolean value (suction gripper on or off).
In the PPO tutorial, the policy_module is a ProbabilisticActor that takes "loc" and "scale" inputs. I want to build an actor that combines this (for the joint angles) with a second component that samples the boolean gripper action from a Bernoulli distribution.
It looks like this may already be supported through TensorDictSequential, but it's not clear how that would work.
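To make the request concrete, here is a minimal sketch in plain PyTorch (not TorchRL) of the behaviour I am after. The class name `CompoundPolicy`, the layer sizes, and the observation dimension are made up for illustration. It assumes the joint angles and the gripper flag are independent given the observation, so the log-probability of the full action is the sum of the two parts.

```python
import torch
from torch import nn
from torch.distributions import Bernoulli, Normal


class CompoundPolicy(nn.Module):
    """Hypothetical policy: Gaussian joint angles plus a Bernoulli gripper flag."""

    def __init__(self, obs_dim: int, n_joints: int, hidden: int = 64):
        super().__init__()
        # Shared feature extractor for both action components.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # Continuous part: mean per joint, with a learned state-independent scale.
        self.loc_head = nn.Linear(hidden, n_joints)
        self.log_scale = nn.Parameter(torch.zeros(n_joints))
        # Boolean part: a single logit for "suction on".
        self.gripper_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.trunk(obs)
        joint_dist = Normal(self.loc_head(h), self.log_scale.exp())
        gripper_dist = Bernoulli(logits=self.gripper_head(h).squeeze(-1))
        return joint_dist, gripper_dist

    def act(self, obs):
        joint_dist, gripper_dist = self(obs)
        joints = joint_dist.sample()
        gripper = gripper_dist.sample()
        # Independence assumption: sum per-joint log-probs, then add the gripper term.
        log_prob = joint_dist.log_prob(joints).sum(-1) + gripper_dist.log_prob(gripper)
        return joints, gripper.bool(), log_prob


policy = CompoundPolicy(obs_dim=8, n_joints=3)
joints, gripper, log_prob = policy.act(torch.randn(5, 8))
```

What I can't work out is how to express the same thing with ProbabilisticActor so that both action entries and a single combined log-probability end up where the collector and the PPO loss expect them.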
Solution
I would like to see an example in the docs of a compound action space like this.
Alternatives
Maybe there's another way, where one actor is created for each type of action? If so, how would they be combined for use with a DataCollector?
Additional context
The environment is a robot arm manipulation scenario using box2d.
Checklist
- I have checked that there is no similar issue in the repo (required)