UNIC: Learning Unified Multimodal Extrinsic Contact Estimation


Abstract

Contact-rich manipulation requires reliable estimation of extrinsic contacts—the interactions between a grasped object and its environment—which provide essential contextual information for planning, control, and policy learning. However, existing approaches often rely on restrictive assumptions, such as predefined contact types, fixed grasp configurations, or camera calibration, that hinder generalization to novel objects and deployment in unstructured environments.

UNIC is a unified multimodal framework for extrinsic contact estimation that operates without any prior knowledge or camera calibration. UNIC directly encodes visual observations in the camera frame and integrates them with proprioceptive and tactile modalities in a fully data-driven manner. It introduces a unified contact representation based on scene affordance maps that captures diverse contact formations and employs a multimodal fusion mechanism with random masking, enabling robust multimodal representation learning.


Demo Videos

Diverse Contact Types

UNIC handles multiple contact scenarios including single-object contact, multi-object interactions, and no-contact states.



Contact Estimation Under Robot Motion

UNIC performs reliable estimation during robot motion and in-hand object slip.


Real-time Estimation with Dynamic Camera

UNIC adapts to dynamic camera viewpoints without requiring recalibration.


Robustness Across Configurations

UNIC generalizes to diverse object configurations and contact locations.


Generalization to Unseen Objects

UNIC demonstrates strong generalization to objects not seen during training.





Method Overview

Architecture

UNIC integrates four sensing modalities:

  • 🔵 Point clouds - 3D information from RGB-D camera
  • 🟠 Tactile signals - Marker displacement maps from GelSight sensors
  • 🟢 Force-torque - 6D wrench from wrist-mounted sensor
  • 🟡 Proprioception - End-effector rotation

Key Technical Contributions

  1. Prior-free Contact Affordance Representation

    • Unified representation based on scene affordance maps
    • Captures diverse contact types: point, line, patch
    • Models complex contact chains (gripper–object–object–environment)
    • No camera calibration or object geometry required
  2. Masked Multimodal Fusion

    • Random masking during training
    • Learns robust cross-modal representations
    • Enables reliable estimation even with missing modalities at deployment
    • Flexible sensor configuration without retraining
  3. Efficient Sampling Strategy

    • Decouples global multimodal fusion from point-wise affordance generation
    • Lightweight point-wise computation
    • Supports real-time inference (>600 Hz)
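Two of the ideas above, random modality masking and "fuse once, then score many points", can be illustrated with a toy sketch. This is not the UNIC implementation: the averaging fusion, the dot-product scoring, the feature sizes, and all function names (`mask_modalities`, `fuse`, `pointwise_affordance`) are assumptions made for illustration, standing in for the learned networks in the real model.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mask_modalities(
    features: Dict[str, Vector], drop_prob: float, rng: random.Random
) -> Dict[str, Optional[Vector]]:
    """Randomly drop whole modalities (set to None), always keeping at least one."""
    names = sorted(features)
    kept = [name for name in names if rng.random() >= drop_prob]
    if not kept:  # never mask everything
        kept = [rng.choice(names)]
    return {name: (features[name] if name in kept else None) for name in names}


def fuse(masked: Dict[str, Optional[Vector]]) -> Vector:
    """Stand-in fusion (assumption): average the vectors of the present modalities."""
    present = [vec for vec in masked.values() if vec is not None]
    dim = len(present[0])
    return [sum(vec[i] for vec in present) / len(present) for i in range(dim)]


def pointwise_affordance(fused: Vector, point_features: Sequence[Vector]) -> List[float]:
    """Score every point against the single fused vector; sigmoid maps to (0, 1)."""
    scores = []
    for point in point_features:
        logit = sum(f * x for f, x in zip(fused, point))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Hypothetical 3-dimensional features, one vector per modality.
features = {
    "point_cloud": [0.2, -0.1, 0.4],
    "tactile": [0.0, 0.3, 0.1],
    "force_torque": [0.5, 0.5, -0.2],
    "proprioception": [0.1, 0.0, 0.0],
}
rng = random.Random(0)
masked = mask_modalities(features, drop_prob=0.3, rng=rng)
fused = fuse(masked)  # global fusion runs once per observation
point_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
affordance = pointwise_affordance(fused, point_features)  # cheap per-point step
```

Because the fused vector is computed once and reused for every query point, the per-point cost stays small as the number of points grows, which is the property the sampling strategy relies on. At deployment, a missing sensor corresponds to passing `None` for that modality, the same situation the model was exposed to during masked training.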

Installation

Setup

  1. Install Miniforge (recommended). Miniforge is the conda-forge–recommended installer and includes mamba out of the box.

  2. Create conda environment:

mamba env create -f conda_env.yaml
conda activate unic
  3. Install third-party dependencies:
bash third_party.sh

Dataset

For all dataset merge and usage instructions, see dataset_readme.md.

The training dataset is a Zarr archive (~98 GB unzipped). For distribution it is split into two balanced zip parts, each ~48.9 GB, hosted on Zenodo:

Released under CC-BY-SA-4.0.
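Given the sizes quoted above, it can help to check free disk space before downloading. The sketch below uses only the Python standard library and assumes both zip parts remain on disk while the archive is extracted, and that 1 GB means 10^9 bytes. The helper names are hypothetical, and the actual merge procedure in dataset_readme.md may need more or less temporary space.

```python
import shutil

ZIP_PART_GB = 48.9   # size of each zip part, as stated above
NUM_PARTS = 2
UNZIPPED_GB = 98.0   # approximate size of the extracted Zarr archive


def peak_space_gb() -> float:
    """Space needed if both zip parts and the extracted archive coexist on disk."""
    return ZIP_PART_GB * NUM_PARTS + UNZIPPED_GB


def has_room(path: str, needed_gb: float) -> bool:
    """Return True if the filesystem containing `path` has at least `needed_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1e9


needed = peak_space_gb()
print(f"Peak space needed: about {needed:.1f} GB; enough room here: {has_room('.', needed)}")
```

Once the zip parts are deleted, the steady-state footprint drops back to roughly the size of the extracted archive.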


Training

Train the UNIC model with:

python train.py --config-dir=./unic/config --config-name=train_unic

Training Configuration

Training configurations are located in unic/config.

Monitoring Training

Training logs and metrics are automatically tracked with Weights & Biases (wandb). Checkpoints are saved periodically in the output directory specified in the config.
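For machines without a wandb login or network access, the wandb library reads the `WANDB_MODE` environment variable (`online`, `offline`, or `disabled`). The sketch below builds the training command from this README with that variable set. It assumes the training script does not pass an explicit mode to wandb that would override the environment; the helper name `build_training_run` is hypothetical.

```python
import os
import sys
from typing import Dict, List, Tuple

VALID_WANDB_MODES = {"online", "offline", "disabled"}


def build_training_run(wandb_mode: str = "offline") -> Tuple[List[str], Dict[str, str]]:
    """Return the training command and an environment with WANDB_MODE set."""
    if wandb_mode not in VALID_WANDB_MODES:
        raise ValueError(f"unsupported WANDB_MODE: {wandb_mode!r}")
    cmd = [
        sys.executable,  # same interpreter as the active conda environment
        "train.py",
        "--config-dir=./unic/config",
        "--config-name=train_unic",
    ]
    env = dict(os.environ)
    env["WANDB_MODE"] = wandb_mode
    return cmd, env


cmd, env = build_training_run("offline")
# To launch from the repository root:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

Runs recorded in offline mode are stored locally and can be uploaded later with the `wandb sync` command.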


Citation

If you find this work useful, please consider citing:

@inproceedings{xu2026unic,
    author = {Xu, Zhengtong and Shirai, Yuki},
    title = {UNIC: Learning Unified Multimodal Extrinsic Contact Estimation},
    booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
    year = {2026}
}

License

Released under the AGPL-3.0-or-later license, as found in the LICENSE.md file.

All files:

Copyright (C) 2025 Mitsubishi Electric Research Laboratories (MERL)

SPDX-License-Identifier: AGPL-3.0-or-later

Contact

For questions or issues, please contact:
