
Unable to reproduce the performance of the pretrained checkpoint for Calvin-Sim #6

Closed
Ping-C opened this issue Oct 31, 2023 · 11 comments



Ping-C commented Oct 31, 2023

Hello Kevin,

I am back again. Thank you for looking at this issue!

So I attempted to reproduce the performance of the pretrained goal-conditioned policy, but was unable to match the results of your pretrained checkpoint in Calvin-Sim, and was wondering whether you could potentially shed some light on what I may be missing. The answer can be short and doesn't have to be complete; some simple pointers would likely suffice.

First, I downloaded the calvin-sim data and preprocessed it with experiments/susie/calvin/data_conversion_scripts/goal_conditioned.py on the ABC training + D validation dataset. To get it working, I had to modify raw_dataset_path and tfrecord_dataset_path, and then comment out the following section of code:

if start_idx <= scene_info["calvin_scene_D"][1]:
    ctr = D_ctr
    D_ctr += 1
    letter = "D"
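
For reference, the path edits were just placeholder-style assignments like these (paths are machine-specific placeholders; your locations will differ):

raw_dataset_path = "/path/to/calvin/task_ABC_D"        # raw CALVIN download
tfrecord_dataset_path = "/path/to/output/tfrecords"    # converted output location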

Second, I trained the goal-conditioned policy on calvin-sim using the following script:

python experiments/susie/calvin/calvin_gcbc.py \
    --config experiments/susie/calvin/configs/gcbc_train_config.py:gc_ddpm_bc \
    --calvin_dataset_config experiments/susie/calvin/configs/gcbc_data_config.py:all

after updating data_path and save_dir in bridge_data_v2/experiments/susie/calvin/configs/gcbc_train_config.py.

I trained the model for 2 million steps as specified in the config, and the loss went from ~2.5 to roughly 0.65 by the end of training (see the plot below). Note that I did have to resume training from checkpoints multiple times. I then ran evaluations on several checkpoints from throughout training, paired with the pretrained diffusion model, and these are roughly the success rates I got for each number of instructions chained:

1: 57.0%
2: 21.0%
3: 7.0%
4: 2.0%
5: 1.0%

which is much worse than your pretrained GC policy + your pretrained diffusion model:

1: 81.0%
2: 65.0%
3: 46.0%
4: 30.0%
5: 21.0%

[Screenshot: training loss curve]

If you could potentially give me some pointers on what I may be doing incorrectly, it would be greatly appreciated! :)


Ping-C commented Oct 31, 2023

And qualitatively, the model also seems to move more erratically compared to the pretrained model.

[Video: combined_test.mov]

@pranavatreya

Hi Ping-C,

Can you un-comment line 198 of calvin-sim/calvin_models/calvin_agent/evaluation/diffusion_gc_policy.py and re-evaluate? I suspect the issue is that the policy was trained with normalized actions, but is being evaluated without the assumption that actions are normalized.
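
For context, the un-normalization that line applies looks roughly like this (an illustrative sketch; the variable names and the exact statistics used in diffusion_gc_policy.py are assumptions):

import numpy as np

def unnormalize_action(normalized_action, action_mean, action_std):
    # The policy was trained on actions normalized with dataset statistics,
    # so its raw outputs must be mapped back to the environment's action
    # scale before being executed
    return normalized_action * action_std + action_mean

# e.g. for a 7-DoF action, action_mean and action_std would be the
# per-dimension statistics computed over the training dataset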


Ping-C commented Oct 31, 2023

It worked! Kevin, you are amazing!

Ping-C closed this as completed Oct 31, 2023
@houyaokun

Hello Ping-C,
I downloaded the diffusion model and goal-conditioned policy checkpoints from https://huggingface.co/patreya/susie-calvin-checkpoints and set the values of the environment variables in eval_susie.sh,
[Screenshot: environment variable settings in eval_susie.sh]

but the results are not good:
Average successful sequence length: 0.4666666666666667
Success rates for i instructions in a row:
1: 33.3%
2: 13.3%
3: 0.0%
4: 0.0%
5: 0.0%
turn_on_led: 2 / 2 | SR: 100.0%
open_drawer: 4 / 4 | SR: 100.0%
turn_on_lightbulb: 1 / 1 | SR: 100.0%
push_blue_block_right: 0 / 1 | SR: 0.0%
rotate_blue_block_right: 0 / 1 | SR: 0.0%
lift_blue_block_slider: 0 / 1 | SR: 0.0%
lift_blue_block_table: 0 / 1 | SR: 0.0%
push_pink_block_left: 0 / 2 | SR: 0.0%
move_slider_left: 0 / 3 | SR: 0.0%
push_blue_block_left: 0 / 2 | SR: 0.0%
lift_red_block_slider: 0 / 1 | SR: 0.0%
push_red_block_left: 0 / 1 | SR: 0.0%
rotate_red_block_left: 0 / 1 | SR: 0.0%
lift_red_block_table: 0 / 1 | SR: 0.0%

I noticed that you got a high success rate when evaluating the pretrained models, so I wanted to ask whether there are any other steps you perform during evaluation besides downloading the models and modifying the paths.


The actions of the robotic arm seem strange in some tasks, and I suspect it may be an issue with GCBC. The robotic arm has even moved outside the field of view.
I would greatly appreciate it if you could tell me how to properly evaluate the pretrained models. :)

@houyaokun

> Hi @Ping-C and @pranavatreya,
>
> Can you please help me? I did the same. I am trying to train the model on the CALVIN ABC dataset. When I run:
>
>     python experiments/susie/calvin/calvin_gcbc.py --config experiments/susie/calvin/configs/gcbc_train_config.py:gc_ddpm_bc --calvin_dataset_config experiments/susie/calvin/configs/gcbc_data_config.py:all
>
> I got this error:
>
>     Traceback (most recent call last):
>       File "experiments/susie/calvin/calvin_gcbc.py", line 186, in <module>
>         app.run(main)
>       File "/home/gaurav/miniconda3/envs/susie-calvin/lib/python3.8/site-packages/absl/app.py", line 308, in run
>         _run_main(main, args)
>       File "/home/gaurav/miniconda3/envs/susie-calvin/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
>         sys.exit(main(argv))
>       File "experiments/susie/calvin/calvin_gcbc.py", line 77, in main
>         task_paths = [
>       File "experiments/susie/calvin/calvin_gcbc.py", line 78, in <listcomp>
>         glob_to_path_list(
>       File "/media/local/gaurav/Music/calvin-sim/bridge_data_v2/jaxrl_m/data/calvin_dataset.py", line 27, in glob_to_path_list
>         assert len(filtered_paths) > 0, f"{glob_str} came up empty"
>     AssertionError: training/A/?/? came up empty
>
> I ensured that the dataset is located at the path expected by the script. The glob pattern training/A/?/? suggests it's looking for directories or files within training/A/ where each subdirectory has a single-character name. So what should I do?
>
> I will appreciate your help. Thanks in advance!
Have you converted the dataset into tfrecord format?
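
A quick way to sanity-check whether the converted layout the loader expects exists (the data root here is a placeholder for your tfrecord_dataset_path):

import glob
import os

data_path = "/path/to/tfrecord_dataset"  # placeholder: your tfrecord_dataset_path
matches = glob.glob(os.path.join(data_path, "training/A/?/?"))
print(len(matches), "matches for training/A/?/?")
# zero matches means the conversion script hasn't been run, or wrote elsewhere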

@lightorange0v0

@houyaokun Hi, have you solved the problem? I am struggling with the same issue. 😭


houykun commented Jun 5, 2024

> @houyaokun Hi, have you solved the problem? I am struggling with the same issue. 😭

Yeah, os.environ.pop("DISPLAY") may work.

@lightorange0v0

@houyaokun Thank you for your quick reply. 😄 Actually, I am having problems with reproduction. My results are quite bad with the provided pretrained models. How did you handle this problem? Thank you so much for your reply. I have been struggling with this for about a week.

@lightorange0v0

@Ping-C Hi, I am struggling with the performance. I did everything exactly as instructed, but the performance is bad with the provided pretrained models. How did you reproduce the results? Is there anything else I need to do besides the default settings?


houykun commented Jun 5, 2024

> @houyaokun Thank you for your quick reply. 😄 Actually, I am having problems with reproduction. My results are quite bad with the provided pretrained models. How did you handle this problem? Thank you so much for your reply. I have been struggling with this for about a week.
For me, I simply added os.environ.pop("DISPLAY"). By doing this, you will be able to use EGL normally. Otherwise, there will be a domain gap between your rendered observations and the images in the dataset (which were rendered with EGL).
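
Concretely, something like this at the very top of the evaluation entry point, before the simulator modules are imported (exact placement is my assumption; the point is that it must run before rendering is initialized):

import os

# Drop the X11 display so rendering falls back to headless EGL, matching
# how the dataset images were rendered; the None default avoids a KeyError
# when DISPLAY is already unset
os.environ.pop("DISPLAY", None)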

@lightorange0v0

So the problem is a domain gap with the dataset. I will take a closer look at the rendering part.
Thank you so much for your reply. Hope you have a wonderful day. 😃
