
DTL for Memory-Efficient Tuning

This repo is the official implementation of our AAAI2024 paper "DTL: Disentangled Transfer Learning for Visual Recognition" (arXiv).

LAMDA, Nanjing University

TL;DR

Different from current efficient tuning methods such as Adapter, LoRA and VPT, which closely entangle the small trainable modules with the huge frozen backbone, we disentangle the weight updates from the backbone network using a lightweight Compact Side Network (CSN). DTL not only greatly reduces the GPU memory footprint, but also achieves high accuracy in knowledge transfer.
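
To make the idea concrete, here is a minimal PyTorch sketch of the disentangled design, assuming a stack of frozen transformer blocks. The adapter shape, rank, and aggregation rule are illustrative choices for this README, not the exact CSN used in DTL (see the paper and code for that):

import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    # Low-rank residual update h -> up(down(h)); parameter count is
    # 2 * dim * rank instead of dim * dim.
    def __init__(self, dim, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a zero update

    def forward(self, h):
        return self.up(self.down(h))

class DisentangledModel(nn.Module):
    def __init__(self, blocks, dim, rank=4):
        super().__init__()
        self.blocks = blocks  # frozen backbone blocks, e.g. an nn.ModuleList
        for p in self.blocks.parameters():
            p.requires_grad_(False)
        self.csn = nn.ModuleList([LowRankAdapter(dim, rank) for _ in blocks])

    def forward(self, x):
        # The frozen backbone runs under no_grad, so its activations are
        # not kept for backward -- this is where the memory saving comes from.
        feats = []
        with torch.no_grad():
            h = x
            for blk in self.blocks:
                h = blk(h)
                feats.append(h)
        # Only the tiny side network carries gradients; it aggregates
        # low-rank corrections computed from the detached features.
        side = torch.zeros_like(feats[-1])
        for f, adapter in zip(feats, self.csn):
            side = side + adapter(f)
        return feats[-1] + side

Because backpropagation never traverses the backbone, training memory scales with the side network rather than with the full model.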


Environment

  • python 3.8
  • pytorch >= 1.7
  • torchvision >= 0.8
  • timm 0.5.4

Data Preparation

1. Visual Task Adaptation Benchmark (VTAB)

Please refer to SSF or VPT for preparing the 19 datasets included in VTAB-1K. For convenience, you can download our packaged copy of the extracted datasets (VTAB.zip).

2. Few-Shot Classification

We follow NOAH to conduct the few-shot evaluation. There are two parts you should pay attention to:

  • Images

    For improved organization and indexing, images from five datasets (fgvc-aircraft, food101, oxford-flowers102, oxford-pets, stanford-cars) should be consolidated into a folder named FGFS.

  • Train/Val/Test splits

    The content, copied from the data/few-shot directory in NOAH, should be placed in the FGFS folder and renamed few-shot_split so that the paths resolve correctly (a sketch for reading these files follows the file tree below).

The file structure should look like:

FGFS
├── few-shot_split
│   ├── fgvc-aircraft
│   │   └── annotations
│   │       ├── train_meta.list.num_shot_1.seed_0
│   │       └── ...
│   │    ...
│   └── food101
│       └── annotations
│           ├── train_meta.list.num_shot_1.seed_0
│           └── ...
├── fgvc-aircraft
│   ├── img1.jpeg
│   ├── img2.jpeg
│   └── ...
│   ...
└── food101
    ├── img1.jpeg
    ├── img2.jpeg
    └── ...

For convenience, the extracted datasets are uploaded (FGFS.zip).
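
Here is a minimal sketch for reading one of these split files, assuming each line holds a relative image path followed by an integer label, separated by whitespace; verify this against the actual files copied from NOAH:

from pathlib import Path

def read_split(split_file, image_root):
    # Returns a list of (image_path, label) pairs.
    samples = []
    for line in Path(split_file).read_text().splitlines():
        if not line.strip():
            continue
        path, label = line.rsplit(maxsplit=1)
        samples.append((Path(image_root) / path, int(label)))
    return samples

# e.g. the 1-shot split of fgvc-aircraft under seed 0:
pairs = read_split(
    "FGFS/few-shot_split/fgvc-aircraft/annotations/train_meta.list.num_shot_1.seed_0",
    "FGFS/fgvc-aircraft",
)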

3. Domain Generalization

For convenience, the extracted datasets are uploaded (DG.zip).

Note: the ImageNet training set (the train directory in DG/imagenet/images) has not been uploaded due to its large size, so you will need to prepare it yourself, e.g., via a symbolic link as shown below.
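
For example, a symbolic link can be created as follows; the source path is a placeholder for your own copy of the ImageNet training set:

import os

src = "/path/to/your/imagenet/train"   # placeholder: your existing copy
dst = "DG/imagenet/images/train"       # where the scripts expect it
if not os.path.exists(dst):
    os.symlink(src, dst)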

Usage

Pre-trained Models

  • The pre-trained weights of ViT-B/16 are stored at this link.
  • For Swin-B, the pre-trained weights will be downloaded automatically to the cache directory when you run the training scripts.
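
As an optional sanity check, the downloaded ViT-B/16 checkpoint can be loaded into a timm model as sketched below; the checkpoint filename is a placeholder and its key layout may differ (the training scripts handle the real loading via load_path):

import timm
import torch

model = timm.create_model("vit_base_patch16_224_in21k", pretrained=False)
state = torch.load("ViT-B_16.pth", map_location="cpu")  # placeholder filename
missing, unexpected = model.load_state_dict(state, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")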

Fine-tuning ViT-B/16 on VTAB

bash train_scripts/vit/vtab/$DATASET_NAME/train_dtl(+).sh
  • Replace $DATASET_NAME with the name of the target dataset, and choose train_dtl.sh or train_dtl+.sh for the DTL or DTL+ variant.
  • Update the data_dir and load_path variables in the script to your specified values.

Fine-tuning ViT-B/16 on Few-shot Learning

bash train_scripts/vit/few_shot/$DATASET_NAME/train_dtl(+)_shot_$SHOT.sh
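  • Replace $SHOT with the desired number of shots per class; the available shot counts and seeds correspond to the split files above (e.g., train_meta.list.num_shot_1.seed_0).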

Fine-tuning ViT-B/16 on Domain Generalization

bash train_scripts/vit/domain_generalization/$DATASET_NAME/train_dtl(+).sh

Fine-tuning Swin-B on VTAB

bash train_scripts/swin/vtab/$DATASET_NAME/train_dtl(+).sh

Citation

If you find this project helpful, please consider citing our paper:

@inproceedings{fu2024dtl,
      title={DTL: Disentangled Transfer Learning for Visual Recognition},
      author={Fu, Minghao and Zhu, Ke and Wu, Jianxin},
      booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
      year={2024},
}

Acknowledgement

The code is built upon SSF, NOAH, VPT and timm.
