Issue of gym_wrapper and action_heads #93

Open
BUAAZhangHaonan opened this issue May 13, 2024 · 0 comments

About gym_wrapper

What is the correct order of `HistoryWrapper`, `RHCWrapper`, `TemporalEnsembleWrapper`, and `UnnormalizeActionProprio`?
According to the code in 03_eval_finetuned.py, the order is:

`HistoryWrapper` -> `RHCWrapper` -> `UnnormalizeActionProprio`

```python
env = HistoryWrapper(env, horizon=1)
```
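Expanded, that nesting looks roughly like the sketch below; the module path, the `exec_horizon` value, and the `UnnormalizeActionProprio` arguments are my guesses, not copied from the script:

```python
# module path and argument values are assumptions for illustration
from octo.utils.gym_wrappers import HistoryWrapper, RHCWrapper, UnnormalizeActionProprio

env = HistoryWrapper(env, horizon=1)
env = RHCWrapper(env, exec_horizon=50)  # exec_horizon value is a guess
env = UnnormalizeActionProprio(
    env, model.dataset_statistics, normalization_type="normal"  # arguments are guesses
)
# the wrapper applied last (UnnormalizeActionProprio) ends up outermost at runtime
```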

The order in visualization_lib.py is:

`HistoryWrapper` -> `RHCWrapper` -> `TemporalEnsembleWrapper` -> `UnnormalizeActionProprio`

```python
self._env = HistoryWrapper(self._env, self.history_length)
```

But in gym_wrapper.py it is:

`UnnormalizeActionProprio` -> `RHCWrapper` -> `HistoryWrapper`

```python
def add_octo_env_wrappers(
```
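For comparison, assuming the arrows above describe the order in which the wrappers are applied, a reconstruction of that helper would look roughly like the following; the signature and arguments are my guesses, not the actual gym_wrapper.py code:

```python
def add_octo_env_wrappers(env, dataset_statistics, exec_horizon=1, horizon=2):
    # hypothetical sketch: applying UnnormalizeActionProprio first and HistoryWrapper last
    # leaves HistoryWrapper as the outermost wrapper at runtime, i.e. the reverse nesting
    # of the 03_eval_finetuned.py snippet above
    env = UnnormalizeActionProprio(env, dataset_statistics, normalization_type="normal")
    env = RHCWrapper(env, exec_horizon=exec_horizon)
    env = HistoryWrapper(env, horizon=horizon)
    return env
```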

Does the nesting order, especially the relative order of `HistoryWrapper` and `RHCWrapper`, have any impact on the results?
At the same time, I noticed that when the `horizon` parameter of `HistoryWrapper` exceeds 5, it hurts the prediction results, which seems counterintuitive: providing a longer observation history should, if anything, give better results. With `horizon` equal to 1 or 2 the results are good. Is this normal? How does `OctoTransformer` handle the stacked observations produced by `HistoryWrapper`?
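To make the question concrete, here is a small standalone NumPy sketch (not octo code) of what I understand a history-stacked observation to look like: every observation key gains a leading time axis of length `horizon`, plus a padding mask marking which slots hold real frames. The key names are assumptions for illustration:

```python
import numpy as np

horizon = 5
frame = np.zeros((256, 256, 3), dtype=np.uint8)  # one camera image

# conceptual stacking: the last `horizon` frames along a new leading axis,
# with slots before the episode start padded and masked out
stacked_obs = {
    "image_primary": np.stack([frame] * horizon),                      # shape (5, 256, 256, 3)
    "timestep_pad_mask": np.array([False, False, False, True, True]),  # only 2 real frames so far
}
print(stacked_obs["image_primary"].shape)  # (5, 256, 256, 3)
```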

About action_heads

I ran fine-tuning tests with three different action heads and found that the L1 and MSE heads are stable, while the diffusion head is very unstable: without enough training steps, the simulation cannot even run properly: #43 (comment)

aloha_finetuning_local_0.0.mp4

I trained the diffusion head for 50,000 steps on ACT, and the results were still unsatisfactory, far worse than the L1 and MSE heads trained for only 5,000 steps. How should the parameters or network structure be adjusted to get the best performance out of the diffusion policy?
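For context, this is a hypothetical sketch of swapping the diffusion head into the fine-tuning config, mirroring the pattern used for the L1 head in the finetuning examples; the constructor arguments shown (and any diffusion-specific knobs such as the number of denoising steps or MLP width) are assumptions that should be checked against octo/model/components/action_heads.py:

```python
from octo.model.components.action_heads import DiffusionActionHead  # class name assumed
from octo.utils.spec import ModuleSpec

# `config` is the fine-tuning ConfigDict from the finetune script (assumed);
# argument names/values below are illustrative guesses, not the verified signature
config["model"]["heads"]["action"] = ModuleSpec.create(
    DiffusionActionHead,
    readout_key="readout_action",
    pred_horizon=50,   # ALOHA-style action chunk length (assumed)
    action_dim=14,     # bimanual ALOHA action dimension (assumed)
)
```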
I also noticed that the L1 head already shows roughly correct movements after 1,000 training steps, but the details are poor (it cannot grasp objects correctly); this improves after 5,000 steps.

aloha_finetuning_local_0.0.mp4

Does this indicate that the fine-tuning is overfitting? The training cost required to reach a usable policy also seems relatively high.
