A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images


SwinMTL: Multi-Task Learning with Swin Transformer

A multi-task learning framework designed for simultaneous depth estimation and semantic segmentation using the Swin Transformer architecture.

Project Page Paper Papers with Code

News

  • [30th June] Paper accepted at the IROS 2024 conference 🔥🔥🔥

Qualitative Results

Installation

To get started, follow these steps:

  1. ROS installation only (skip this step if you are not using ROS):

    cd catkin_ws/src
    catkin_create_pkg SwinMTL_ROS std_msgs rospy   # create the ROS package
    cd ..
    catkin_make                                    # build the workspace
    source devel/setup.bash                        # overlay the workspace
    cd src/SwinMTL_ROS/src
    git clone https://github.com/PardisTaghavi/SwinMTL.git
    chmod +x inference_ros.py                      # make the inference node executable
    mv ./launch/ ./..                              # move the launch files into the package root
  2. Clone the repository:

    git clone https://github.com/PardisTaghavi/SwinMTL.git
    cd SwinMTL
  3. Create a conda environment and activate it:

    conda env create --file environment.yml
    conda activate prc

Testing

To test the project, follow these steps:

  1. Download Pretrained Models:

    • Access the pretrained models here.
    • Download the pretrained models you need.
  2. Move Pretrained Models:

    • Create a folder named model_zoo in the project directory and move the pretrained models into it.
    • Refer to testLive.ipynb for testing.
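The setup above can be sketched as shell commands. This is a minimal sketch: the checkpoint filename is a placeholder, not an actual release name, and the commands assume you are already inside the cloned SwinMTL directory.

```shell
# Create the folder the test notebook expects
mkdir -p model_zoo

# Move the downloaded checkpoints into it (filename is a placeholder):
# mv ~/Downloads/<pretrained_model>.pth model_zoo/

ls -d model_zoo   # confirm the folder exists before opening testLive.ipynb
```
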

ROS Launch

    roslaunch SwinMTL_ROS swinmtl_launch.launch

3D Mapping


Zero-shot Results on the KITTI Dataset


Contributions

  • Introduction of a multi-task learning approach for joint depth estimation and semantic segmentation.
  • Achievement of state-of-the-art performance on Cityscapes and NYUv2 datasets.
  • Utilization of an efficient shared encoder-decoder architecture coupled with novel techniques to enhance accuracy.

We welcome feedback and contributions to the SwinMTL project. Feel free to contact taghavi.pardis@gmail.com.


Acknowledgments

Special thanks to the authors of the following projects for laying the foundation of this work. Our code relies on: