
Unable to reproduce the performance of the pretrained checkpoint for Calvin-Sim #6

Closed
Ping-C opened this issue Oct 31, 2023 · 11 comments



Ping-C commented Oct 31, 2023

Hello Kevin,

I am back again. Thank you for looking at this issue!

So I attempted to reproduce the performance of the pretrained goal-conditioned policy, but was unable to match the results of your pretrained checkpoint in Calvin-Sim, and was wondering whether you could potentially shed some light on what I may be missing. The answer can be short and doesn't have to be complete; some simple pointers would likely suffice.

First, I downloaded the calvin-sim data and preprocessed it with experiments/susie/calvin/data_conversion_scripts/goal_conditioned.py on the ABC training + D validation dataset. To get it working, I had to modify raw_dataset_path and tfrecord_dataset_path, and then comment out the following section of code:

if start_idx <= scene_info["calvin_scene_D"][1]:
    ctr = D_ctr
    D_ctr += 1
    letter = "D"
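
For reference, the path edits were just placeholder-style assignments like these (paths are machine-specific placeholders; your locations will differ):

raw_dataset_path = "/path/to/calvin/task_ABC_D"        # raw CALVIN download
tfrecord_dataset_path = "/path/to/output/tfrecords"    # converted output location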

Second, I trained the goal-conditioned policy on calvin-sim using the following script:

python experiments/susie/calvin/calvin_gcbc.py \
    --config experiments/susie/calvin/configs/gcbc_train_config.py:gc_ddpm_bc \
    --calvin_dataset_config experiments/susie/calvin/configs/gcbc_data_config.py:all

after updating data_path and save_dir in bridge_data_v2/experiments/susie/calvin/configs/gcbc_train_config.py.

I trained the model for 2 million steps as specified in the config, and the loss went from ~2.5 to roughly 0.65 by the end of training (see the plot below). Note that I did have to resume training from checkpoints multiple times. I then ran evaluations on several checkpoints from throughout training, paired with the pretrained diffusion model, and these are roughly the success rates I got for each number of instructions chained:

1: 57.0%
2: 21.0%
3: 7.0%
4: 2.0%
5: 1.0%

which is much worse than your pretrained GC policy + your pretrained diffusion model:

1: 81.0%
2: 65.0%
3: 46.0%
4: 30.0%
5: 21.0%

[Screenshot: training loss curve]

If you could potentially give me some pointers on what I may be doing incorrectly, it would be greatly appreciated! :)


Ping-C commented Oct 31, 2023

And qualitatively, the model also seems to move more erratically compared to the pretrained model.

[Video: combined_test.mov]

@pranavatreya

Hi Ping-C,

Can you un-comment line 198 of calvin-sim/calvin_models/calvin_agent/evaluation/diffusion_gc_policy.py and re-evaluate? I suspect the issue is that the policy was trained with normalized actions, but is being evaluated without the assumption that actions are normalized.
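
For context, the un-normalization that line applies looks roughly like this (an illustrative sketch; the variable names and the exact statistics used in diffusion_gc_policy.py are assumptions):

import numpy as np

def unnormalize_action(normalized_action, action_mean, action_std):
    # The policy was trained on actions normalized with dataset statistics,
    # so its raw outputs must be mapped back to the environment's action
    # scale before being executed
    return normalized_action * action_std + action_mean

# e.g. for a 7-DoF action, action_mean and action_std would be the
# per-dimension statistics computed over the training dataset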


Ping-C commented Oct 31, 2023

It worked! Kevin, you are amazing!

Ping-C closed this as completed Oct 31, 2023
@houyaokun

Hello Ping-C,
I downloaded the diffusion model and goal-conditioned policy checkpoints from https://huggingface.co/patreya/susie-calvin-checkpoints and set the values of the environment variables in eval_susie.sh,
[Screenshot: environment variable settings in eval_susie.sh]

but the results are not good:
Average successful sequence length: 0.4666666666666667
Success rates for i instructions in a row:
1: 33.3%
2: 13.3%
3: 0.0%
4: 0.0%
5: 0.0%
turn_on_led: 2 / 2 | SR: 100.0%
open_drawer: 4 / 4 | SR: 100.0%
turn_on_lightbulb: 1 / 1 | SR: 100.0%
push_blue_block_right: 0 / 1 | SR: 0.0%
rotate_blue_block_right: 0 / 1 | SR: 0.0%
lift_blue_block_slider: 0 / 1 | SR: 0.0%
lift_blue_block_table: 0 / 1 | SR: 0.0%
push_pink_block_left: 0 / 2 | SR: 0.0%
move_slider_left: 0 / 3 | SR: 0.0%
push_blue_block_left: 0 / 2 | SR: 0.0%
lift_red_block_slider: 0 / 1 | SR: 0.0%
push_red_block_left: 0 / 1 | SR: 0.0%
rotate_red_block_left: 0 / 1 | SR: 0.0%
lift_red_block_table: 0 / 1 | SR: 0.0%

I noticed that you got a high success rate when evaluating the pretrained models, so I wanted to ask whether there are any other steps you perform during evaluation besides downloading the models and modifying the paths.


The actions of the robotic arm seem strange in some tasks, and I suspect it may be an issue with GCBC. The robotic arm has even moved outside the field of view.
I would greatly appreciate it if you could tell me how to properly evaluate the pretrained models. :)

@houyaokun

> Hi @Ping-C and @pranavatreya,
>
> Can you please help me? I did the same. I am trying to train the model on the CALVIN ABC dataset. When I run:
>
>     python experiments/susie/calvin/calvin_gcbc.py --config experiments/susie/calvin/configs/gcbc_train_config.py:gc_ddpm_bc --calvin_dataset_config experiments/susie/calvin/configs/gcbc_data_config.py:all
>
> I got this error:
>
>     Traceback (most recent call last):
>       File "experiments/susie/calvin/calvin_gcbc.py", line 186, in <module>
>         app.run(main)
>       File "/home/gaurav/miniconda3/envs/susie-calvin/lib/python3.8/site-packages/absl/app.py", line 308, in run
>         _run_main(main, args)
>       File "/home/gaurav/miniconda3/envs/susie-calvin/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
>         sys.exit(main(argv))
>       File "experiments/susie/calvin/calvin_gcbc.py", line 77, in main
>         task_paths = [
>       File "experiments/susie/calvin/calvin_gcbc.py", line 78, in <listcomp>
>         glob_to_path_list(
>       File "/media/local/gaurav/Music/calvin-sim/bridge_data_v2/jaxrl_m/data/calvin_dataset.py", line 27, in glob_to_path_list
>         assert len(filtered_paths) > 0, f"{glob_str} came up empty"
>     AssertionError: training/A/?/? came up empty
>
> I ensured that the dataset is located at the path expected by the script. The glob pattern training/A/?/? suggests it's looking for directories or files within training/A/ where each subdirectory has a single-character name. So what should I do?
>
> I will appreciate your help. Thanks in advance!
Have you converted the dataset into tfrecord format?
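
A quick way to sanity-check whether the converted layout the loader expects exists (the data root here is a placeholder for your tfrecord_dataset_path):

import glob
import os

data_path = "/path/to/tfrecord_dataset"  # placeholder: your tfrecord_dataset_path
matches = glob.glob(os.path.join(data_path, "training/A/?/?"))
print(len(matches), "matches for training/A/?/?")
# zero matches means the conversion script hasn't been run, or wrote elsewhere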

@lightorange0v0

@houyaokun Hi, have you solved the problem? I am struggling with the same issue. 😭


houykun commented Jun 5, 2024

> @houyaokun Hi, have you solved the problem? I am struggling with the same issue. 😭

Yeah, os.environ.pop("DISPLAY") may work.

@lightorange0v0

@houyaokun Thank you for your quick reply. 😄 Actually, I am having problems with reproduction. My results are quite bad with the provided pretrained models. How did you handle this problem? Thank you so much for your reply. I have been struggling with this for about a week.

@lightorange0v0

@Ping-C Hi, I am struggling with the performance. I did everything exactly as instructed, but the performance is bad with the provided pretrained models. How did you reproduce the results? Is there anything else I need to do besides the default settings?


houykun commented Jun 5, 2024

> @houyaokun Thank you for your quick reply. 😄 Actually, I am having problems with reproduction. My results are quite bad with the provided pretrained models. How did you handle this problem? Thank you so much for your reply. I have been struggling with this for about a week.
For me, I simply added os.environ.pop("DISPLAY"). By doing this, you will be able to use EGL normally. Otherwise, there will be a domain gap between your rendered observations and the images in the dataset (which were rendered with EGL).
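
Concretely, something like this at the very top of the evaluation entry point, before the simulator modules are imported (exact placement is my assumption; the point is that it must run before rendering is initialized):

import os

# Drop the X11 display so rendering falls back to headless EGL, matching
# how the dataset images were rendered; the None default avoids a KeyError
# when DISPLAY is already unset
os.environ.pop("DISPLAY", None)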

@lightorange0v0

So the problem is a domain gap with the dataset. I will take a closer look at the rendering part.
Thank you so much for your reply. Hope you have a wonderful day. 😃
