Main Project Website: https://sonicaloha.github.io/
This repository is used for robot teleoperation, dataset collection, and imitation learning algorithms. It can be placed anywhere on your computer.
| Path | Description |
| --- | --- |
| `3D_Print` | 3D-printed models for the stapler press and for mounting the contact audio microphone used in the comparison experiments |
| `aloha_scripts` | Scripts for controlling the robots, cameras, and microphone. Use them to test teleoperation, collect datasets, put the robots to sleep, and visualize data. You can define your task in `aloha_scripts/constants.py` |
| `detr` | Model definitions |
| `imitate_episodes_multi_gpu.py` | Train and evaluate a policy with multiple GPUs |
| `policy.py` | An adaptor for the policy |
| `utils.py` | Utilities such as data loading and helper functions |
This project has been tested and confirmed to work with the following configuration:
- ✅ Ubuntu 20.04 + ROS 1 Noetic (fully tested and verified)
Other configurations may work as well, but they have not been tested yet. If you successfully run this project on a different setup, feel free to contribute by sharing your experience!
- Install the 4 robots and 4 cameras according to the original ALOHA.
- Plug in the USB microphone and mount it next to the top-view camera. We use the FIFINE K053 USB Lavalier Lapel Microphone, available at this link.
- The contact microphone for the comparison experiment is available at this link.
- You can check your microphone by running:

  ```bash
  python aloha_scripts/record_audio.py  # Change TARGET_DEVICE_NAME to test a different mic.
  ```

  This will generate an audio file named `selected_audio.wav`.
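  If you are unsure which string to use for `TARGET_DEVICE_NAME`, one quick way to list the input devices visible to Python is the third-party `sounddevice` package. This is an optional sketch; `sounddevice` is not a dependency of this repo:

  ```python
  # Optional sketch: list audio input devices to find a name for TARGET_DEVICE_NAME.
  # Assumes the third-party `sounddevice` package (pip install sounddevice).
  import sounddevice as sd

  for idx, dev in enumerate(sd.query_devices()):
      if dev["max_input_channels"] > 0:  # only show devices that can record
          print(idx, dev["name"])
  ```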
- ⚠️ Camera Focus Configuration (not described in ALOHA):

  The cameras in the ALOHA series are set to fixed focus in the ROS launch file. The focus value is configured through `aloha.launch` in `aloha/launch`:

  ```xml
  <param name="focus" value="40"/>
  ```

  It is necessary to determine an appropriate focus value for each camera; otherwise, the camera image may appear blurry during manipulation.
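  For illustration, a per-camera configuration might look like the sketch below. The node names and `usb_cam` layout are assumptions for illustration only; this README only confirms the `focus` parameter itself:

  ```xml
  <!-- Hypothetical sketch: per-camera focus values; node layout is illustrative only. -->
  <node name="usb_cam_high" pkg="usb_cam" type="usb_cam_node">
    <param name="video_device" value="/dev/CAM_HIGH"/>
    <param name="focus" value="40"/>  <!-- value found with guvcview (see below) -->
  </node>
  <node name="usb_cam_low" pkg="usb_cam" type="usb_cam_node">
    <param name="video_device" value="/dev/CAM_LOW"/>
    <param name="focus" value="35"/>  <!-- each camera may need a different value -->
  </node>
  ```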
The recommended procedure to find a suitable focus value is:
- To check the available video devices, run the following command:

  ```bash
  ls /dev/CAM_*   # or: ls /dev/video*
  ```

  If you have set your camera serial numbers according to ALOHA, you should see a camera list like this:

  ```
  /dev/CAM_HIGH  /dev/CAM_LEFT_WRIST  /dev/CAM_LOW  /dev/CAM_RIGHT_WRIST
  ```
- Use the following command to open the camera and adjust the focus:

  ```bash
  guvcview -d /dev/CAM_HIGH
  ```

- Test and note the appropriate focus value for each camera.
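  Once you have a value you like, you can also try applying it directly from the command line. This is a hedged sketch: control names vary between cameras, so first list the controls your device actually exposes.

  ```bash
  # Sketch: apply a fixed focus value via v4l2 (control names vary by camera).
  v4l2-ctl -d /dev/CAM_HIGH --list-ctrls                             # inspect available controls
  v4l2-ctl -d /dev/CAM_HIGH --set-ctrl focus_automatic_continuous=0  # autofocus must be off first
  v4l2-ctl -d /dev/CAM_HIGH --set-ctrl focus_absolute=40             # then apply the value you noted
  ```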
- ⚠️ Disable Auto Focus:
  - You must disable continuous autofocus by setting the `focus_automatic_continuous` control parameter as follows:

    ```bash
    v4l2-ctl -d /dev/CAM_HIGH --set-ctrl focus_automatic_continuous=0
    ```
  - For the other cameras, please modify `/dev/CAM_HIGH` to `/dev/CAM_LOW`, `/dev/CAM_LEFT_WRIST`, etc.
  - To check whether the cameras are set correctly, run `roslaunch aloha aloha.launch` and ensure that no warning like this appears:

    ```
    Error setting controls: Permission denied
    VIDIOC_S_EXT_CTRLS: failed: Permission denied
    ```
- You can add the following to your `.bashrc`, which allows you to conveniently run `cameras-autofocus` from the command line:

  ```bash
  # Define the cameras-autofocus function to disable autofocus
  cameras-autofocus() {
      # List of camera devices to configure
      cameras=("/dev/CAM_HIGH" "/dev/CAM_LEFT_WRIST" "/dev/CAM_LOW" "/dev/CAM_RIGHT_WRIST")

      # Disable autofocus for a single camera
      disable_autofocus() {
          local camera=$1
          echo "Disabling autofocus for $camera..."
          # Disable automatic continuous focus
          v4l2-ctl -d "$camera" --set-ctrl focus_automatic_continuous=0
          echo "Autofocus disabled for $camera."
      }

      # Main loop to process all cameras
      echo "Configuring autofocus for cameras..."
      for cam in "${cameras[@]}"; do
          if [ -e "$cam" ]; then
              disable_autofocus "$cam"
          else
              echo "Camera $cam not found. Skipping."
          fi
      done
      echo "Autofocus configuration complete."
  }
  ```
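  After reloading your shell configuration, you can run the function directly:

  ```bash
  source ~/.bashrc
  cameras-autofocus
  ```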
Note: You will need to reapply the `focus_automatic_continuous=0` setting whenever you reboot the computer or unplug and replug the cameras.
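If reapplying this by hand becomes tedious, one common approach is a udev rule that runs `v4l2-ctl` whenever a video device appears. This is a sketch under assumptions (the rule file name is arbitrary, and you should verify the control name for your cameras), not part of this repo:

```
# /etc/udev/rules.d/99-disable-autofocus.rules  (file name is arbitrary)
# Run v4l2-ctl each time a video4linux device is added, e.g. after a replug or reboot.
ACTION=="add", SUBSYSTEM=="video4linux", RUN+="/usr/bin/v4l2-ctl -d $devnode --set-ctrl focus_automatic_continuous=0"
```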
```bash
git clone https://github.com/sonicaloha/sonicaloha_ml.git
cd sonicaloha_ml
conda create -n sonicaloha python=3.8.10
conda activate sonicaloha
pip install -r requirements.txt
cd detr && pip install -e .
```

We utilize human teleoperation to collect demonstration data. During each task's data collection, information is recorded at every timestep within an episode. Specifically, each timestep includes the current robot joint values, images from the active cameras (top-view, right arm, left arm, and front-view), and the corresponding audio segment captured at that timestep, all of which are compressed and saved as HDF5 files. The structure of the dataset is illustrated in the following tree:
```
<dataset_root>/
└── <task_name>/                         # e.g., alarm_shutting, stapler_checking, etc.
    ├── episode_0/
    ├── ...
    ├── episode_18/
    │   ├── timestep_0/
    │   ├── ...
    │   ├── timestep_t/
    │   │   ├── robot_joint_value        # Joint positions
    │   │   ├── camera/                  # Camera images (number of cameras is configurable)
    │   │   │   ├── rgb_cam_top
    │   │   │   ├── rgb_cam_right_arm
    │   │   │   ├── rgb_cam_left_arm
    │   │   │   └── rgb_cam_front
    │   │   └── audio_current_recorded   # Audio segment for this timestep
    │   │
    │   ├── timestep_t+1/
    │   └── ...
    ├── episode_19/
    └── ...
```
This timestep-aligned data collection method can help address synchronization issues in multimodal information, especially in long-horizon tasks. However, because different timesteps may have inconsistent wall-clock durations, the length of audio_current_recorded may vary. In our code, for each timestep, we allocate a fixed-size array large enough to store the corresponding audio segment. The audio segment data for each timestep is stored sequentially from the beginning of the array, with the last element used to record the actual length of the audio data for that timestep.
For each timestep, the audio segment data is stored in a fixed-size array as follows:

```
+----------------+----------------+-----+----------------+-----+------------------+
|   sample[0]    |   sample[1]    | ... |  sample[N-1]   | ... | length_of_audio  |
+----------------+----------------+-----+----------------+-----+------------------+
        ↑                ↑                      ↑                        ↑
  Audio sample 0   Audio sample 1      Last audio sample      Number of valid samples
```
This design facilitates efficient retrieval of the audio data for each timestep and allows for flexible composition of fixed-length audio segments during subsequent training. Although HDF5's variable-length storage was considered, it was found that this approach could easily lead to out-of-memory errors during training.
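To make this layout concrete, here is a minimal sketch of packing, unpacking, and composing such buffers with `numpy`. The function names, the capacity constant, and the 16-bit sample type are assumptions for illustration; this is not the repo's actual API.

```python
# Illustrative sketch of the fixed-size audio buffer described above.
import numpy as np

MAX_AUDIO_SAMPLES = 8000  # assumed per-timestep capacity (fits in the int16 length field)

def pack_audio(segment: np.ndarray) -> np.ndarray:
    """Store a variable-length segment at the front of a fixed-size array;
    the last element records the number of valid samples."""
    assert len(segment) <= MAX_AUDIO_SAMPLES, "segment exceeds the allocated buffer"
    buf = np.zeros(MAX_AUDIO_SAMPLES + 1, dtype=np.int16)
    buf[:len(segment)] = segment
    buf[-1] = len(segment)  # actual audio length for this timestep
    return buf

def unpack_audio(buf: np.ndarray) -> np.ndarray:
    """Recover only the valid samples from a packed buffer."""
    return buf[:int(buf[-1])]

def compose_training_clip(buffers, target_len):
    """Concatenate consecutive timesteps' valid audio into a fixed-length clip,
    keeping the most recent samples and zero-padding short clips at the front."""
    audio = np.concatenate([unpack_audio(b) for b in buffers])
    clip = audio[-target_len:]
    if len(clip) < target_len:
        clip = np.pad(clip, (target_len - len(clip), 0))
    return clip
```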
- 🤖 SonicAloha robot system launch: We assume you have installed your robot system according to ALOHA. This step launches the four robot arms and the four cameras.

  ```bash
  # ROS terminal
  conda deactivate
  source /opt/ros/noetic/setup.sh && source ~/interbotix_ws/devel/setup.sh
  roslaunch aloha aloha.launch
  ```
- 📝 Define the type of robotic manipulation dataset:
  This includes the task name, `dataset_dir`, the length of each episode, and the cameras used. You can set this information in `TASK_CONFIGS` of `aloha_scripts/constants.py`. An example is as follows:

  ```python
  'alarm_shutting': {
      'dataset_dir': DATA_DIR + '/saved_folder_name',
      'episode_len': 900,  # This value may be modified according to the length of your task.
      'camera_names': ['cam_high', 'cam_left_wrist', 'cam_right_wrist', 'cam_low']
  }
  ```
- 🎮 Begin teleoperation to perform task manipulation:

  ```bash
  cd sonicaloha_ml
  source ~/interbotix_ws/devel/setup.sh
  # You can change the audio sensor by modifying TARGET_DEVICE_NAME.
  python aloha_scripts/record_episodes_compress_audio.py \
      --task_name Task_name \
      --start_idx 0 --end_idx 50
  ```

  The author also provides code for recording two audio streams simultaneously in `record_episodes_compress_audio.py`. After each episode is collected, you can enter `c` to save the episode and continue, or enter `r` to recollect it. If you want to quit, enter `q`. We use foot pedals to assist with this confirmation, which makes the process more convenient.
- 📊 Data Visualization and Listening:

  ```bash
  python aloha_scripts/visualize_episodes_audio.py --dataset_dir <data save dir> --episode_idx 0
  ```

  If you want to visualize multiple episodes, you can pass the `--episode_idx` parameter like this: `--episode_idx 3,5,8` or `--episode_idx 5-19`.
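  For reference, an index specification like `3,5,8` or `5-19` can be expanded with a small helper such as the hypothetical sketch below (this is not the repo's actual parsing code):

  ```python
  # Hypothetical sketch: expand "3,5,8" or "5-19" into a list of episode indices.
  from typing import List

  def parse_episode_idx(spec: str) -> List[int]:
      indices = []
      for part in spec.split(","):
          part = part.strip()
          if "-" in part:
              start, end = part.split("-")
              indices.extend(range(int(start), int(end) + 1))  # inclusive range
          else:
              indices.append(int(part))
      return indices

  print(parse_episode_idx("3,5,8"))  # [3, 5, 8]
  print(parse_episode_idx("5-19"))   # [5, 6, ..., 19]
  ```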
- 💤 Robot shut down or sleep:

  ```bash
  python aloha_scripts/sleep_plus.py --shut_down         # All robots move to the zero position and turn off the torque.
  python aloha_scripts/sleep_plus.py --shut_down_puppet  # Only the puppet robots move to the zero position and turn off the torque.
  python aloha_scripts/sleep_plus.py --sleep             # All robots move to the zero position but keep the torque on.
  ```
- ⚙️ Set up your training configuration in `aloha_scripts/constants.py`:

  ```python
  'alarm_ast_dora_cross_100audio': {
      'dataset_dir': DATA_DIR + '/alarm_random_pos',
      'episode_len': 900,
      'camera_names': ['cam_high',
                       # 'cam_low',
                       'cam_left_wrist',
                       'cam_right_wrist',
                       ]
  },
  'boxlockdown_ast_dora_cross_200audio': {
      'dataset_dir': DATA_DIR + '/boxlockdown',
      'episode_len': 750,
      'camera_names': ['cam_high',
                       # 'cam_low',
                       'cam_left_wrist',
                       'cam_right_wrist',
                       ]
  },
  'stapler_checking_ast_dora_cross_150audio': {
      'dataset_dir': DATA_DIR + '/stapler_checking',
      'episode_len': 600,
      'camera_names': ['cam_high',
                       # 'cam_low',
                       'cam_left_wrist',
                       'cam_right_wrist',
                       ]
  },
  ```
- Set the audio length in `aloha_scripts/constants.py`:

  ```python
  Audio_Lenght_For_Learning = 100  # 100 / 200 / 150
  ```
- 🚀 Train your policy:

  ```bash
  export CUDA_VISIBLE_DEVICES=0,1
  python imitate_episodes_multi_gpu.py \
      --task_name alarm_ast_dora_cross_100audio \
      --ckpt_dir <data save dir> \
      --policy_class SonicAloha \
      --kl_weight 10 --chunk_size 100 \
      --hidden_dim 512 --batch_size 16 \
      --dim_feedforward 3200 --lr 1e-5 --seed 0 \
      --num_steps 100000 --eval_every 2000 \
      --validate_every 2000 --save_every 2000
  ```
- You may set the `Length_multiple` value to control the maximum manipulation timesteps, and the `k` value for temporal ensembling, in `imitate_episodes_multi_gpu.py`. (A sketch of the temporal ensemble idea follows after this list.)
- Run your trained policy:

  ```bash
  export CUDA_VISIBLE_DEVICES=0
  python imitate_episodes_multi_gpu.py \
      --task_name alarm_ast_dora_cross_100audio \
      --ckpt_dir <data save dir> \
      --policy_class SonicAloha \
      --kl_weight 10 --chunk_size 100 \
      --hidden_dim 512 --batch_size 16 \
      --dim_feedforward 3200 --lr 1e-5 --seed 0 \
      --num_steps 100000 --eval_every 2000 \
      --validate_every 2000 --save_every 2000 \
      --temporal_ensemble --eval --num_rollouts 20
  ```
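For context, temporal ensembling (as introduced in the original ACT/ALOHA work) averages the overlapping action predictions that successive chunks make for the same timestep, with exponential weights controlled by `k`. The sketch below illustrates that weighting only; it is not the implementation in `imitate_episodes_multi_gpu.py`.

```python
# Illustrative sketch of temporal ensembling over overlapping action chunks.
import numpy as np

def temporal_ensemble(predictions: np.ndarray, k: float = 0.01) -> np.ndarray:
    """predictions: (num_chunks, action_dim) actions proposed for one timestep,
    ordered from the oldest chunk to the newest."""
    weights = np.exp(-k * np.arange(len(predictions)))  # w_i = exp(-k * i), i = 0 is oldest
    weights /= weights.sum()                            # normalize into a weighted average
    return (weights[:, None] * predictions).sum(axis=0)

# Example: three chunks proposed slightly different 2-DoF actions for this timestep.
acts = np.array([[0.10, 0.00], [0.12, 0.01], [0.11, 0.02]])
print(temporal_ensemble(acts, k=0.01))
```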