
Issue with diffusion head #43

Open
seann999 opened this issue Jan 29, 2024 · 8 comments
seann999 commented Jan 29, 2024

I was able to fine-tune with a modified version of example 2 with the following action head:

config["model"]["heads"]["action"] = ModuleSpec.create(
    L1ActionHead,
    pred_horizon=9,
    action_dim=11,
    readout_key="readout_action",
)

The policy works reasonably well on the robot.
Since then, I've been trying to fine-tune with a diffusion head instead, but the robot goes out of control with it:

config["model"]["heads"]["action"] = ModuleSpec.create(
    # L1ActionHead,
    DiffusionActionHead,
    use_map=False,

    pred_horizon=9,
    action_dim=11,
    readout_key="readout_action",
)

The rest of the script is unchanged.
What else could be the problem?

Update: the diffusion-based model seems to output fairly extreme action values, i.e. values below the dataset minimum or above the dataset maximum.

These are the action statistics:

'max': array([6.92096353e-03, 7.15068638e-01, 2.65712190e+00, 1.22000003e+00, 4.09653854e+00, 9.43594933e-01, 1.17203128e+00, 2.65219069e+00, 1.00000000e+00, 1.50034428e-01, 4.94167267e-04]),
'mean': array([-1.45641267e+00,  2.27537051e-01, -2.96192672e-02, -6.16574585e-01, -7.61023015e-02,  2.06921268e-02,  4.98067914e-03, -1.26738450e-04, 5.67098975e-01,  7.91745202e-04, -7.36813426e-01]),
'min': array([-2.61999989, -0.05653883, -1.91999996, -1.75      , -2.45071816, -0.88507879, -1.3124876 , -1.5       ,  0.        , -0.09809657, -1.35774779]),
'std': array([0.50778913, 0.21539085, 0.41647774, 0.72304732, 0.8022927, 0.11436515, 0.08398118, 0.30631647, 0.49528143, 0.01135448, 0.26658419])}

and these are sample outputs after unnormalization:

act: [-3.9953585   1.3044913   2.0527694  -4.231811    3.9353614   0.59251785   -0.41492522 -1.5317091   3.0435061  -0.05598066 -2.0697343 ]
act: [ 1.082533    1.3044913   2.0527694   2.998662   -4.087566    0.59251785
 -0.41492522  1.5314556   3.0435061  -0.05598066 -2.0697343 ]

which are clearly out of bounds.
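For reference, this is roughly how I'm checking the outputs against the dataset statistics. It's just a debugging sketch with numpy: the min/max arrays are copied from the statistics above, `act` is one of the unnormalized samples above, and the clipping at the end is only a stopgap for the robot, not a fix for the head itself.

import numpy as np

# Dataset action statistics (copied from above).
act_min = np.array([-2.61999989, -0.05653883, -1.91999996, -1.75, -2.45071816,
                    -0.88507879, -1.3124876, -1.5, 0.0, -0.09809657, -1.35774779])
act_max = np.array([6.92096353e-03, 7.15068638e-01, 2.65712190e+00, 1.22000003e+00,
                    4.09653854e+00, 9.43594933e-01, 1.17203128e+00, 2.65219069e+00,
                    1.00000000e+00, 1.50034428e-01, 4.94167267e-04])

# One of the unnormalized diffusion-head outputs above.
act = np.array([-3.9953585, 1.3044913, 2.0527694, -4.231811, 3.9353614,
                0.59251785, -0.41492522, -1.5317091, 3.0435061, -0.05598066, -2.0697343])

out_of_bounds = (act < act_min) | (act > act_max)
print("out-of-bounds dims:", np.where(out_of_bounds)[0])

# Stopgap before sending the command to the robot; does not fix the underlying issue.
safe_act = np.clip(act, act_min, act_max)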

@seann999 (Author)

Okay, I was able to figure out that the prediction horizon was too large; a pred_horizon of 1–3 was fine, but 9 led to an unstable model.
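For reference, the configuration that ended up stable for me just lowers pred_horizon (shown here with 3 as an example; anything in the 1–3 range worked, while 9 did not):

config["model"]["heads"]["action"] = ModuleSpec.create(
    DiffusionActionHead,
    use_map=False,
    pred_horizon=3,  # 1-3 was stable for me; 9 produced extreme, unstable actions
    action_dim=11,
    readout_key="readout_action",
)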

@seann999 (Author)

Reopening: although the output values are more stable now, the policy is still considerably worse than an L1-based policy.

@seann999 seann999 reopened this Jan 29, 2024
@kpertsch (Collaborator)

Thanks for digging into this!
We are debugging the ALOHA fine-tuning on our side as well, and have also found that the diffusion head is substantially worse than the L1 head in this particular case and outputs nonsensical values. It should work in principle, so we are digging into it. Will update here if we find a solution!


zwbx commented Apr 8, 2024

Similar problem here. The paper indicates that the diffusion head outperforms the L1 head, but I guess that conclusion does not necessarily extend to the fine-tuning setting.


zwbx commented Apr 8, 2024

Maybe we could set up a Discord to discuss these questions; the official replies have been slow these days.

WenchangGaoT pushed a commit to WenchangGaoT/octo1 that referenced this issue May 10, 2024
Merge in R2D2 eval script and small QOL changes to dataset.py
@BUAAZhangHaonan

> Thanks for digging into this! We are debugging the ALOHA fine-tuning on our side as well, and have also found that the diffusion head is substantially worse than the L1 head in this particular case and outputs nonsensical values. It should work in principle, so we are digging into it. Will update here if we find a solution!

Compared with the L1 and MSE heads, the diffusion head seems to behave much less stably when trained for fewer steps. In MuJoCo, the simulation becomes unstable, resulting in the following error:

Nan, Inf or huge value in QACC at DOF 0. The simulation is unstable. Time = 0.0840.
dm_control.rl.control.PhysicsError: Physics state is invalid. Warning(s) raised: mjWARN_BADQACC

Even reducing the horizon parameter leads to similar problems, and fine-tuning with the diffusion head also seems more expensive. I don't know whether the head parameters in the config need to be tuned more carefully, or whether this is simply a flaw in the diffusion approach.
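As a stopgap while debugging, I wrap the environment step so that a bad action ends the episode instead of crashing the whole run. This is only a sketch: `env`, `action_low`, and `action_high` are placeholders for whatever your own wrapper exposes, not Octo or dm_control names.

import numpy as np
from dm_control.rl.control import PhysicsError

def safe_step(env, action, action_low, action_high):
    """Clip the predicted action to known bounds and catch MuJoCo instability."""
    action = np.clip(action, action_low, action_high)
    try:
        return env.step(action)
    except PhysicsError as exc:
        # mjWARN_BADQACC: the physics state blew up, so end this episode early.
        print(f"Physics became unstable, aborting episode: {exc}")
        return None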


zwbx commented May 13, 2024

Hi,

> Compared with the L1 and MSE heads, the diffusion head seems to behave much less stably when trained for fewer steps. In MuJoCo, the simulation becomes unstable, resulting in the following error:
>
> Nan, Inf or huge value in QACC at DOF 0. The simulation is unstable. Time = 0.0840.
> dm_control.rl.control.PhysicsError: Physics state is invalid. Warning(s) raised: mjWARN_BADQACC
>
> Even reducing the horizon parameter leads to similar problems, and fine-tuning with the diffusion head also seems more expensive. I don't know whether the head parameters in the config need to be tuned more carefully, or whether this is simply a flaw in the diffusion approach.

Have you checked what actually causes the error? Try printing the actions the model outputs; you will likely see that the values are out of bounds or can't be solved by the IK optimizer.

@BUAAZhangHaonan

> Have you checked what actually causes the error? Try printing the actions the model outputs; you will likely see that the values are out of bounds or can't be solved by the IK optimizer.

Thank you for your attention. I tried the ALOHA simulation with the diffusion head at 1000 and 5000 training steps; unfortunately, I could not reproduce that error, so maybe the problem itself is as unstable as the diffusion head.
But I did record the action predictions from the two heads.
The first is the output of the L1 head, which simulates normally and performs well:
action[100]: [ 0.57204974 -0.23319605 0.76363015 -0.09170198 -0.07699968 0.77580905 0.96044242 0.75630909 -0.45451248 1.13872492 -0.46336147 -0.58958995 0.5670839 1.06122994]
action[200]: [-0.99930573 0.55937028 -0.52284944 -0.59289199 -0.51960701 0.75874555 0.97301495 0.14920622 0.19651346 -0.29161143 -0.02436048 0.51398659 0.05542321 -1.52965879]
action[300]: [-0.46441016 0.91491914 -1.4249717 -0.39001578 -0.50952721 0.77543199 -0.09499384 -0.16243899 0.69015974 -1.14362192 0.15943812 0.9602797 -0.15596175 0.2628575 ]
Then there is the output of the diffusion head, which essentially freezes after step 100 and keeps producing the same action:
action[100]: [-3.59837627 5. 5. -4.92470407 5. 4.25408506 -4.62254953 -4.55121994 -5. -5. 1.16204941 4.00920582 -4.54648066 5. ]
action[200]: [-3.59454226 5. 5. -4.92671156 5. 4.25115156 -4.61913872 -4.55262327 -5. -5. 1.16107273 4.01593781 -4.55090284 5. ]
action[300]: [-3.59459043 5. 5. -4.92677879 5. 4.25059795 -4.61849213 -4.55246067 -5. -5. 1.16351843 4.01590443 -4.55066919 5. ]
Comparing the two, the latter clearly contains extreme values, which again shows its instability.
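The diffusion outputs also barely change between steps 100, 200, and 300, and several dimensions sit exactly at ±5, which looks like the samples are saturating at some clipping bound rather than tracking the observations. A quick sanity check I use (just a sketch: `actions` is a hypothetical (num_steps, action_dim) array of recorded diffusion-head outputs, and the bound of 5.0 is only what the printed values suggest):

import numpy as np

def saturation_report(actions: np.ndarray, bound: float = 5.0) -> None:
    """Print how often each dimension is pinned at +/-bound and how much it drifts step to step."""
    pinned = np.isclose(np.abs(actions), bound).mean(axis=0)
    drift = np.abs(np.diff(actions, axis=0)).max(axis=0)
    print("fraction of steps pinned at the bound, per dim:", pinned)
    print("max step-to-step change, per dim:", drift)

# actions = np.stack(recorded_diffusion_actions)  # hypothetical list of per-step predictions
# saturation_report(actions, bound=5.0)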
