We can either predict touch from vision or vision from touch, leading to two subtasks: 1) Vision2Touch: Given an image of a local region on the object’s surface, predict the corresponding tactile RGB image that aligns with the visual image patch in both position and orientation; and 2) Touch2Vision: Given a tactile reading on the object’s surface, predict the corresponding local image patch where the contact happens.
The dataset used to train the baseline models can be downloaded from here
Start the training process, and test the best model on the test set after training:
# Train VisGel as an example
python main.py --lr 1e-4 --batch_size 64 \
--model VisGel \
--src_modality touch --des_modality vision \
--patience 500 \
--exp touch2vision
Evaluate the best model on the touch2vision task:
# Evaluate VisGel as an example
python main.py --lr 1e-4 --batch_size 64 \
--model VisGel \
--src_modality touch --des_modality vision \
--patience 500 \
--exp touch2vision \
--eval
To train and test your new model on the ObjectFolder Visuo-Tactile Cross Generation Benchmark, you only need to modify a few files in models. You may follow these simple steps:
- Create a new model directory
mkdir models/my_model
- Design the new model (a minimal sketch of what my_model.py might contain is shown after these steps)
cd models/my_model
touch my_model.py
- Build the new model and its optimizer
Add the following code into models/build.py:
elif args.model == 'my_model':
    from my_model import my_model
    model = my_model.my_model(args)
    optimizer = optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
- Add the new model to the pipeline
Once the new model is built, it can be trained and evaluated similarly:
python main.py --lr 1e-4 --batch_size 64 \
--model my_model \
--src_modality touch --des_modality vision \
--exp touch2vision
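For reference, below is a minimal sketch of what models/my_model/my_model.py could look like. The exact interface expected by main.py (constructor arguments, forward signature, loss computation) is not documented here, so the class below, its layer sizes, and the assumption that the model maps a source-modality RGB image to a destination-modality RGB image of the same resolution are illustrative placeholders; consult the existing baselines under models (e.g., VisGel) for the authoritative interface.

import torch
import torch.nn as nn

class my_model(nn.Module):
    # Hypothetical encoder-decoder that maps a source-modality image
    # (e.g., a tactile reading) to a destination-modality image
    # (e.g., the corresponding visual patch). Layer sizes are placeholders.
    def __init__(self, args):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, src_img):
        # src_img: (B, 3, H, W) image from the source modality
        return self.decoder(self.encoder(src_img))

The class name matches the my_model.my_model(args) call added to models/build.py above.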
We choose 50 objects with rich tactile features and of reasonable size, and sample 1,000 visuo-tactile image pairs on each of them, resulting in 50 × 1,000 = 50,000 image pairs. We conduct both cross-contact and cross-object experiments by respectively splitting the 1,000 visuo-tactile pairs of each object into train/validation/test = 800/100/100 and splitting the 50 objects into train/validation/test = 40/5/5. The two settings require the model to generalize to new contact areas or to new objects during testing, respectively.
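To make the difference between the two settings concrete, the following sketch builds both splits from hypothetical object and pair IDs; the actual split files are defined by the benchmark dataset, so the names and shuffling here are purely illustrative.

import random

random.seed(0)
objects = [f"obj_{i:02d}" for i in range(50)]            # 50 hypothetical object IDs
pairs = {o: [f"{o}_pair_{j:04d}" for j in range(1000)]   # 1,000 pairs per object
         for o in objects}

# Cross-contact: split the 1,000 pairs of every object into 800/100/100,
# so every object is seen during training but the test contacts are new.
cross_contact = {"train": [], "val": [], "test": []}
for o in objects:
    p = pairs[o][:]
    random.shuffle(p)
    cross_contact["train"] += p[:800]
    cross_contact["val"] += p[800:900]
    cross_contact["test"] += p[900:]

# Cross-object: split the 50 objects into 40/5/5, so the test objects
# are entirely unseen during training.
objs = objects[:]
random.shuffle(objs)
cross_object = {
    "train": [p for o in objs[:40] for p in pairs[o]],
    "val":   [p for o in objs[40:45] for p in pairs[o]],
    "test":  [p for o in objs[45:] for p in pairs[o]],
}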
Cross-contact results:

| Method | Vision->Touch PSNR | Vision->Touch SSIM | Touch->Vision PSNR | Touch->Vision SSIM |
| --- | --- | --- | --- | --- |
| pix2pix | 22.85 | 0.71 | 9.16 | 0.28 |
| VisGel | 29.60 | 0.87 | 14.56 | 0.61 |
Cross-object results:

| Method | Vision->Touch PSNR | Vision->Touch SSIM | Touch->Vision PSNR | Touch->Vision SSIM |
| --- | --- | --- | --- | --- |
| pix2pix | 18.91 | 0.63 | 7.03 | 0.12 |
| VisGel | 25.91 | 0.82 | 12.61 | 0.38 |
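The tables above report PSNR and SSIM between generated and ground-truth images. As a reference for how such numbers can be computed (not necessarily the exact evaluation code used by the benchmark), here is a short sketch using scikit-image; the function name and the assumption of uint8 RGB images are illustrative.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_metrics(pred, target):
    # pred, target: (H, W, 3) uint8 RGB images (prediction and ground truth)
    psnr = peak_signal_noise_ratio(target, pred, data_range=255)
    # channel_axis requires scikit-image >= 0.19; older versions use multichannel=True
    ssim = structural_similarity(target, pred, channel_axis=-1, data_range=255)
    return psnr, ssim

# Toy usage; real evaluation would average the scores over the whole test set
pred = np.random.randint(0, 256, (160, 120, 3), dtype=np.uint8)
target = np.random.randint(0, 256, (160, 120, 3), dtype=np.uint8)
print(image_metrics(pred, target))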