Issue of gym_wrapper and action_heads #93

Open
BUAAZhangHaonan opened this issue May 13, 2024 · 0 comments

About gym_wrapper

What is the correct order of `HistoryWrapper`, `RHCWrapper`, `TemporalEnsembleWrapper`, and `UnnormalizeActionProprio`?
According to the code in 03_eval_finetuned.py, the order is:

`HistoryWrapper` -> `RHCWrapper` -> `UnnormalizeActionProprio`

```python
env = HistoryWrapper(env, horizon=1)
```
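Expanded, that nesting looks roughly like the sketch below; the module path, the `exec_horizon` value, and the `UnnormalizeActionProprio` arguments are my guesses, not copied from the script:

```python
# module path and argument values are assumptions for illustration
from octo.utils.gym_wrappers import HistoryWrapper, RHCWrapper, UnnormalizeActionProprio

env = HistoryWrapper(env, horizon=1)
env = RHCWrapper(env, exec_horizon=50)  # exec_horizon value is a guess
env = UnnormalizeActionProprio(
    env, model.dataset_statistics, normalization_type="normal"  # arguments are guesses
)
# the wrapper applied last (UnnormalizeActionProprio) ends up outermost at runtime
```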

The order in visualization_lib.py is:

`HistoryWrapper` -> `RHCWrapper` -> `TemporalEnsembleWrapper` -> `UnnormalizeActionProprio`

```python
self._env = HistoryWrapper(self._env, self.history_length)
```

But in gym_wrapper.py it is:

`UnnormalizeActionProprio` -> `RHCWrapper` -> `HistoryWrapper`

```python
def add_octo_env_wrappers(
```
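For comparison, assuming the arrows above describe the order in which the wrappers are applied, a reconstruction of that helper would look roughly like the following; the signature and arguments are my guesses, not the actual gym_wrapper.py code:

```python
def add_octo_env_wrappers(env, dataset_statistics, exec_horizon=1, horizon=2):
    # hypothetical sketch: applying UnnormalizeActionProprio first and HistoryWrapper last
    # leaves HistoryWrapper as the outermost wrapper at runtime, i.e. the reverse nesting
    # of the 03_eval_finetuned.py snippet above
    env = UnnormalizeActionProprio(env, dataset_statistics, normalization_type="normal")
    env = RHCWrapper(env, exec_horizon=exec_horizon)
    env = HistoryWrapper(env, horizon=horizon)
    return env
```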

Does the nesting order, especially the relative order of `HistoryWrapper` and `RHCWrapper`, have any impact on the results?
At the same time, I noticed that when the `horizon` parameter of `HistoryWrapper` exceeds 5, it hurts the prediction results, which seems counterintuitive: providing a longer observation history should, if anything, give better results. With `horizon` equal to 1 or 2 the results are good. Is this normal? How does `OctoTransformer` handle the stacked observations produced by `HistoryWrapper`?
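To make the question concrete, here is a small standalone NumPy sketch (not octo code) of what I understand a history-stacked observation to look like: every observation key gains a leading time axis of length `horizon`, plus a padding mask marking which slots hold real frames. The key names are assumptions for illustration:

```python
import numpy as np

horizon = 5
frame = np.zeros((256, 256, 3), dtype=np.uint8)  # one camera image

# conceptual stacking: the last `horizon` frames along a new leading axis,
# with slots before the episode start padded and masked out
stacked_obs = {
    "image_primary": np.stack([frame] * horizon),                      # shape (5, 256, 256, 3)
    "timestep_pad_mask": np.array([False, False, False, True, True]),  # only 2 real frames so far
}
print(stacked_obs["image_primary"].shape)  # (5, 256, 256, 3)
```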

About action_heads

I ran fine-tuning tests with three different action heads and found that the L1 and MSE heads are stable, while the diffusion head is very unstable: without enough training steps, the simulation cannot even run properly: #43 (comment)

aloha_finetuning_local_0.0.mp4

I trained the diffusion head for 50,000 steps on ACT, and the results were still unsatisfactory, far worse than the L1 and MSE heads trained for only 5,000 steps. How should the parameters or network structure be adjusted to get the best performance out of the diffusion policy?
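For context, this is a hypothetical sketch of swapping the diffusion head into the fine-tuning config, mirroring the pattern used for the L1 head in the finetuning examples; the constructor arguments shown (and any diffusion-specific knobs such as the number of denoising steps or MLP width) are assumptions that should be checked against octo/model/components/action_heads.py:

```python
from octo.model.components.action_heads import DiffusionActionHead  # class name assumed
from octo.utils.spec import ModuleSpec

# `config` is the fine-tuning ConfigDict from the finetune script (assumed);
# argument names/values below are illustrative guesses, not the verified signature
config["model"]["heads"]["action"] = ModuleSpec.create(
    DiffusionActionHead,
    readout_key="readout_action",
    pred_horizon=50,   # ALOHA-style action chunk length (assumed)
    action_dim=14,     # bimanual ALOHA action dimension (assumed)
)
```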
I also noticed that the L1 head already shows roughly correct movements after 1,000 training steps, but the details are poor (it cannot grasp objects correctly); this improves after 5,000 steps.

aloha_finetuning_local_0.0.mp4

Does this indicate that the fine-tuning is overfitting? The training cost required to reach a usable policy also seems relatively high.
