
SwinMTL: Multi-Task Learning with Swin Transformer

Welcome to the SwinMTL project, a multi-task learning framework designed for simultaneous depth estimation and semantic segmentation using the Swin Transformer architecture.

Project Page | Paper | Papers with Code

Qualitative Results

Installation

To get started, follow these steps:

  1. ROS installation only (skip this step if you are not using ROS):

    cd catkin_ws/src
    catkin_create_pkg SwinMTL_ROS std_msgs rospy
    cd ..
    catkin_make
    source devel/setup.bash
    cd src/SwinMTL_ROS/src
    git clone https://github.com/PardisTaghavi/SwinMTL.git
    chmod +x inference_ros.py
    mv ./launch/ ./..  
  2. Clone the repository:

    git clone https://github.com/PardisTaghavi/SwinMTL.git
    cd SwinMTL
  3. Create a conda environment and activate it:

    conda env create --file environment.yml
    conda activate prc
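
Optionally, a quick sanity check run inside the activated environment can confirm that PyTorch, which the Swin Transformer backbone depends on, was installed correctly. This is a generic snippet and not part of the repository:

    # Generic environment check, not part of SwinMTL itself.
    import torch

    print(torch.__version__)           # confirm PyTorch is importable
    print(torch.cuda.is_available())   # True if a CUDA-capable GPU is visible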

Testing

To run testing for the project, follow these steps:

  1. Download Pretrained Models:

    • Click here to access the pretrained models.
    • Download the pretrained models you need.
  2. Move Pretrained Models:

    • Create a new folder named model_zoo in the project directory.
    • Move the downloaded pretrained models into the model_zoo folder.
    • Refer to testLive.ipynb for testing; a generic checkpoint-loading sketch follows this list.
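
The sketch below only illustrates the general PyTorch checkpoint-loading pattern; the file name is a placeholder for whichever checkpoint you downloaded, and testLive.ipynb shows the actual model construction and state-dict loading used by this repository.

    # Illustrative only: the path below is a placeholder, and the real loading
    # code lives in testLive.ipynb.
    import torch

    ckpt_path = "model_zoo/<downloaded_checkpoint>.ckpt"    # placeholder file name
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    print(list(checkpoint))   # inspect the top-level keys (most checkpoints are dicts)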

ROS Launch

roslaunch SwinMTL_ROS swinmtl_launch.launch
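
The launch file starts the inference node (inference_ros.py). The sketch below shows the rough shape of such a rospy node, assuming standard cv_bridge plumbing; the topic names and the commented-out model call are placeholders rather than the node's actual interface, so refer to inference_ros.py for the real topics and parameters.

    # Rough rospy node skeleton; topic names are examples, not the node's real interface.
    import rospy
    from sensor_msgs.msg import Image
    from cv_bridge import CvBridge

    bridge = CvBridge()

    def callback(msg):
        # Convert the incoming ROS image message to a numpy array
        frame = bridge.imgmsg_to_cv2(msg, desired_encoding="rgb8")
        # depth, seg = model(frame)  # placeholder: run the SwinMTL forward pass here
        # depth_pub.publish(bridge.cv2_to_imgmsg(depth, encoding="32FC1"))

    rospy.init_node("swinmtl_inference")
    depth_pub = rospy.Publisher("/swinmtl/depth", Image, queue_size=1)   # example topic
    rospy.Subscriber("/camera/image_raw", Image, callback)               # example topic
    rospy.spin()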

3D Mapping

3D Mapping Results

Zero-shot Results on the KITTI Dataset

Zero-shot Results

Contributions

  • Introduction of a multi-task learning approach for joint depth estimation and semantic segmentation.
  • Achievement of state-of-the-art performance on Cityscapes and NYUv2 datasets.
  • Utilization of an efficient shared encoder-decoder architecture coupled with novel techniques to enhance accuracy.
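
To make the shared encoder-decoder idea concrete, here is a deliberately simplified PyTorch sketch of one backbone feeding two task-specific heads (depth regression and semantic segmentation). It is a toy stand-in, not the SwinMTL architecture, which uses a Swin Transformer encoder and the decoder described in the paper:

    # Toy illustration of one shared encoder feeding two task heads.
    # NOT the SwinMTL model; it only mirrors the one-encoder / two-heads idea.
    import torch
    import torch.nn as nn

    class TinyMultiTaskNet(nn.Module):
        def __init__(self, num_classes=19):           # 19 classes, as in Cityscapes
            super().__init__()
            self.encoder = nn.Sequential(             # shared feature extractor
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            self.depth_head = nn.Conv2d(64, 1, 1)            # per-pixel depth
            self.seg_head = nn.Conv2d(64, num_classes, 1)    # per-pixel class logits

        def forward(self, x):
            feats = self.encoder(x)                   # shared features computed once
            return self.depth_head(feats), self.seg_head(feats)

    net = TinyMultiTaskNet()
    depth, seg = net(torch.randn(1, 3, 64, 64))
    print(depth.shape, seg.shape)   # (1, 1, 64, 64) and (1, 19, 64, 64)

Because both heads read the same features, the image is encoded only once per forward pass; SwinMTL applies the same principle with a far stronger shared backbone.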

We welcome feedback and contributions to the SwinMTL project. Feel free to reach out at taghavi.pardis@gmail.com.


Acknowledgments

Special thanks to the authors of the following projects for laying the foundation of this work. Our code relies on: