SonicAloha_ML

Main Project Website: https://sonicaloha.github.io/

This repository is used for robot teleoperation, dataset collection, and imitation learning algorithms. It can be placed anywhere on your computer.

📂 Repo Structure

  • 3D_Print 3D-printed models for the stapler press and for mounting the contact audio microphone used in the comparison experiments
  • aloha_scripts Scripts for controlling the robots, cameras, and microphone. Use them to test teleoperation, collect and visualize datasets, and put the robots to sleep. You can define your task in aloha_scripts/constants.py
  • detr Model definitions
  • imitate_episodes_multi_gpu.py Train and evaluate a policy with multiple GPUs
  • policy.py An adaptor for the policy
  • utils.py Utilities such as data loading and other helper functions

πŸ—οΈ Quick Start Guide

🖥️ Software Selection – OS Compatibility

This project has been tested and confirmed to work with the following configuration:

  • ✅ Ubuntu 20.04 + ROS 1 Noetic (Fully tested and verified)

Other configurations may work as well, but they have not been tested yet. If you successfully run this project on a different setup, feel free to contribute by sharing your experience! 🚀

🔧 Hardware Setting

  1. Install the four robots and four cameras according to the original ALOHA.

  2. Plug in the USB microphone and mount it next to the top-view camera. We use the FIFINE K053 USB Lavalier Lapel Microphone, available at this link.

  3. The contact microphone for the comparison experiment is available at this link.

  4. You can check your microphone by running:

    python aloha_scripts/record_audio.py  # Change TARGET_DEVICE_NAME to test a different mic.
    

    This will generate an audio file named "selected_audio.wav".
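
    If you are not sure which name to use for TARGET_DEVICE_NAME, the snippet below lists the input devices visible to your system. It is a minimal sketch, not part of the repo, and assumes the sounddevice package (pip install sounddevice):

    # List audio input devices to find a value for TARGET_DEVICE_NAME.
    import sounddevice as sd

    for idx, dev in enumerate(sd.query_devices()):
        if dev["max_input_channels"] > 0:  # keep only devices that can record
            print(f"[{idx}] {dev['name']} ({dev['max_input_channels']} ch)")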

  5. ❗ Camera Focus Configuration (Not described in ALOHA):

    The cameras in the ALOHA series are set to fixed focus in ROS launch. The focus value is configured through aloha.launch in aloha/launch:

    <param name="focus" value="40"/>

    It is necessary to determine the appropriate focus value for each camera; otherwise, the camera image may appear blurry during manipulation.

    The recommended procedure to find a suitable focus value is:

    • To check the available video devices, run the following command:
      ls /dev/CAM_*  # or ls /dev/video*

If you have set your camera serial numbers according to ALOHA, you will see a camera list like the following:

      /dev/CAM_HIGH  /dev/CAM_LEFT_WRIST  /dev/CAM_LOW  /dev/CAM_RIGHT_WRIST
    • Use the following command to open the camera and adjust the focus:
      guvcview -d /dev/CAM_HIGH
    • Test and note the appropriate focus value for each camera.
  6. ❗ Disable Auto Focus:

    • You must disable the continuous autofocus by setting the focus_automatic_continuous control parameter as follows:
      v4l2-ctl -d /dev/CAM_HIGH --set-ctrl focus_automatic_continuous=0
    • For the other cameras, replace /dev/CAM_HIGH with /dev/CAM_LOW, /dev/CAM_LEFT_WRIST, etc.
    • To check whether the cameras are configured correctly, run roslaunch aloha aloha.launch and ensure that no warning like the following appears:
    Error setting controls: Permission denied
    VIDIOC_S_EXT_CTRLS: failed: Permission denied
    • You can add the following function to your .bashrc so that you can conveniently run cameras-autofocus from the command line.
    # Define the cameras-autofocus function to disable autofocus
    cameras-autofocus() {
        # List of camera devices to configure
        cameras=("/dev/CAM_HIGH" "/dev/CAM_LEFT_WRIST" "/dev/CAM_LOW" "/dev/CAM_RIGHT_WRIST")
    
        # Function to disable autofocus for a single camera
        disable_autofocus() {
            local camera=$1
            echo "Disabling autofocus for $camera..."
            
            # Disable automatic continuous focus
            v4l2-ctl -d "$camera" --set-ctrl focus_automatic_continuous=0
    
            echo "Autofocus disabled for $camera."
        }
    
        # Main loop to process all cameras
        echo "Configuring autofocus for cameras..."
        for cam in "${cameras[@]}"; do
            if [ -e "$cam" ]; then
                disable_autofocus "$cam"
            else
                echo "Camera $cam not found. Skipping."
            fi
        done
        echo "Autofocus configuration complete."
    }

    Note: You will need to reapply the focus_automatic_continuous=0 setting whenever you reboot the computer or unplug and replug the cameras.

πŸ› οΈ Installation

    git clone https://github.com/sonicaloha/sonicaloha_ml.git
    cd sonicaloha_ml
    conda create -n sonicaloha python=3.8.10
    conda activate sonicaloha
    pip install -r requirements.txt
    cd detr && pip install -e .

📑 Dataset

Dataset Description:

We use human teleoperation to collect demonstration data. During each task's data collection, information is recorded at every timestep within an episode: the current robot joint values, images from the active cameras (top-view, right arm, left arm, and front-view), and the audio segment captured during that timestep. Episodes are compressed and saved as HDF5 files. The structure of the dataset is illustrated in the following tree:

<dataset_root>/
└── <task_name>/                # e.g., alarm_shutting, stapler_checking, etc.
    ├── episode_0/
    ├── ...
    ├── episode_18/
    │   ├── timestep_0/
    │   ├── ...
    │   ├── timestep_t/
    │   │   ├── robot_joint_value         # Joint positions
    │   │   ├── camera/                   # Camera images (number of cameras is configurable)
    │   │   │   ├── rgb_cam_top
    │   │   │   ├── rgb_cam_right_arm
    │   │   │   ├── rgb_cam_left_arm
    │   │   │   ├── rgb_cam_front
    │   │   ├── audio_current_recorded    # Audio segment for this timestep
    │   │
    │   ├── timestep_t+1/
    │   ├── ...
    ├── episode_19/
    ├── ...
This timestep-aligned data collection method can help address synchronization issues in multimodal information, especially in long-horizon tasks. However, because different timesteps may have inconsistent wall-clock durations, the length of audio_current_recorded may vary. In our code, for each timestep, we allocate a fixed-size array large enough to store the corresponding audio segment. The audio segment data for each timestep is stored sequentially from the beginning of the array, with the last element used to record the actual length of the audio data for that timestep.

For each timestep, the audio segment data is stored in a fixed-size array as follows:

+----------------+----------------+-----+----------------+-----+------------------+
| sample[0]      | sample[1]      | ... | sample[N-1]    | ... | length_of_audio  |
+----------------+----------------+-----+----------------+-----+------------------+
      ↑                ↑                      ↑                      ↑
  Audio sample 0   Audio sample 1      Last audio sample      Number of valid samples

This design facilitates efficient retrieval of the audio data for each timestep and allows for flexible composition of fixed-length audio segments during subsequent training. Although HDF5's variable-length storage was considered, it was found that this approach could easily lead to out-of-memory errors during training.
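
Decoding a stored segment is then a matter of reading the fixed-size array and slicing it by the length recorded in its last element. A minimal sketch (the function name is illustrative, not the repo's API):

    import numpy as np

    def decode_audio_segment(padded):
        """Recover the valid audio samples from a fixed-size padded array.

        The last element stores the number of valid samples, so the real
        segment is simply the first `length` entries.
        """
        length = int(padded[-1])
        return padded[:length]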

Dataset Collection Process:

  1. 🤖 SonicAloha robot system launch: We assume you have installed your robot system according to ALOHA. This step launches the four robot arms and the four cameras.

    # ROS terminal
    conda deactivate
    source /opt/ros/noetic/setup.sh && source ~/interbotix_ws/devel/setup.sh
    roslaunch aloha aloha.launch
    
  2. πŸ“ Define the type of robotic manipulation dataset:
    Including the task name, dataset_dir, length of each episode, and the cameras used. You can set this information in TASK_CONFIGS of aloha_scripts/constants.py. An example is as follows:

    'alarm_shutting': {
        'dataset_dir': DATA_DIR + '/saved_folder_name',
        'episode_len': 900,  # This value may be modified according to the length of your task.
        'camera_names': ['cam_high',
                         'cam_left_wrist',
                         'cam_right_wrist',
                         'cam_low']}
  3. 🚀 Begin teleoperation to perform the task manipulation:

    cd sonicaloha_ml
    source ~/interbotix_ws/devel/setup.sh
    # You can change the audio sensor by modifying TARGET_DEVICE_NAME.
    python aloha_scripts/record_episodes_compress_audio.py \
    --task_name Task_name \
    --start_idx 0 --end_idx 50
    

    The code for recording two audio streams simultaneously is also provided in record_episodes_compress_audio.py. After each episode is collected, enter c to save the episode and continue, r to recollect it, or q to quit. We use foot pedals for this confirmation, which makes the process more convenient.

  4. 📊 Data Visualization and Listening:

    python aloha_scripts/visualize_episodes_audio.py --dataset_dir <data save dir> --episode_idx 0
    

    To visualize multiple episodes, pass the --episode_idx parameter a comma-separated list or a range, e.g. --episode_idx 3,5,8 or --episode_idx 5-19.
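
    For reference, one way such index specifications can be expanded (an illustrative sketch, not the repo's actual argument parsing):

    def expand_episode_idx(spec):
        """Expand a spec like "3,5,8" or "5-19" into a list of episode indices."""
        indices = []
        for part in spec.replace(" ", "").split(","):
            if "-" in part:
                start, end = part.split("-")
                indices.extend(range(int(start), int(end) + 1))
            else:
                indices.append(int(part))
        return indices

    # expand_episode_idx("3,5,8") -> [3, 5, 8]
    # expand_episode_idx("5-19")  -> [5, 6, ..., 19]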

  5. 🔄 Robot shutdown or sleep:

    python aloha_scripts/sleep_plus.py --shut_down          # All robots move to the zero position and torque is turned off.
    python aloha_scripts/sleep_plus.py --shut_down_puppet   # Only the puppet robots move to the zero position and torque is turned off.
    python aloha_scripts/sleep_plus.py --sleep              # All robots move to the zero position, but torque stays on.
    

🧠 Policy Training

  1. ✅ Set up your training configuration in aloha_scripts/constants.py:

    'alarm_ast_dora_cross_100audio': {
      'dataset_dir': DATA_DIR + '/alarm_random_pos',
      'episode_len': 900,
      'camera_names': ['cam_high',
                       # 'cam_low',
                       'cam_left_wrist',
                       'cam_right_wrist',
                       ]
     },
    
     'boxlockdown_ast_dora_cross_200audio': {
         'dataset_dir': DATA_DIR + '/boxlockdown',
         'episode_len': 750,
         'camera_names': ['cam_high',
                          # 'cam_low',
                          'cam_left_wrist',
                          'cam_right_wrist',
                          ]
     },
    
    
     'stapler_checking_ast_dora_cross_150audio': {
         'dataset_dir': DATA_DIR + '/stapler_checking',
         'episode_len': 600,
         'camera_names': ['cam_high',
                          # 'cam_low',
                          'cam_left_wrist',
                          'cam_right_wrist',
                          ]
     },
  2. Set the audio length in aloha_scripts/constants.py: Audio_Lenght_For_Learning = 100 # 100, 200, or 150, matching the task configs above
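
    During training, the decoded per-timestep segments are composed into fixed-length audio inputs. A minimal sketch of one way such a window could be assembled (illustrative only; the function name and the zero-padding choice are assumptions, not the repo's implementation):

    import numpy as np

    def fixed_length_audio(segments, target_len):
        """Concatenate recent audio segments, then trim or left-pad to target_len.

        `segments` is a list of 1-D arrays (most recent last); `target_len`
        plays the role of Audio_Lenght_For_Learning.
        """
        audio = np.concatenate(segments) if segments else np.zeros(0)
        if len(audio) >= target_len:
            return audio[-target_len:]            # keep the most recent samples
        pad = np.zeros(target_len - len(audio))   # pad the front with silence
        return np.concatenate([pad, audio])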

  3. 🚀 Train your policy:

    export CUDA_VISIBLE_DEVICES=0,1
    python imitate_episodes_multi_gpu.py  \
    --task_name alarm_ast_dora_cross_100audio \
    --ckpt_dir  <data save dir>  \
    --policy_class SonicAloha \
    --kl_weight 10 --chunk_size 100 \
    --hidden_dim 512 --batch_size 16 \
    --dim_feedforward 3200 --lr 1e-5 --seed 0 \
    --num_steps 100000 --eval_every 2000 \
    --validate_every 2000 --save_every 2000

📑 Policy Deployment

  1. In imitate_episodes_multi_gpu.py, you may set the Length_multiple value to control the maximum number of manipulation timesteps, and the k value for the temporal ensemble, as sketched below.
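
    For intuition, ACT-style temporal ensembling fuses the actions that overlapping chunks predict for the current timestep, weighting them with an exponential decay controlled by k. A minimal sketch (illustrative, not the repo's exact implementation):

    import numpy as np

    def temporal_ensemble(predicted_actions, k):
        """Fuse overlapping chunk predictions for the current timestep.

        `predicted_actions` holds the actions proposed for this timestep by
        all chunks seen so far, ordered oldest first; the i-th prediction
        gets weight exp(-k * i), as in ACT-style ensembling.
        """
        actions = np.stack(predicted_actions)           # (num_chunks, action_dim)
        weights = np.exp(-k * np.arange(len(actions)))  # exponential decay
        weights /= weights.sum()
        return (weights[:, None] * actions).sum(axis=0)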

  2. Run your trained policy:

    export CUDA_VISIBLE_DEVICES=0
    python imitate_episodes_multi_gpu.py  \
    --task_name alarm_ast_dora_cross_100audio \
    --ckpt_dir  <data save dir>  \
    --policy_class SonicAloha \
    --kl_weight 10 --chunk_size 100 \
    --hidden_dim 512 --batch_size 16 \
    --dim_feedforward 3200 --lr 1e-5 --seed 0 \
    --num_steps 100000 --eval_every 2000 \
    --validate_every 2000 --save_every 2000 \
    --temporal_ensemble --eval --num_rollouts 20

πŸ™ Acknowledgements

This codebase is built on top of ALOHA and ACT.
