
Dual Cross-Attention Learning (DCAL)


📋 Table of Contents

  • Overview
  • Architecture Diagrams
  • Installation
  • Dataset Preparation
  • Usage
  • Project Structure
  • Citation
  • Acknowledgments
  • License

Overview

DCAL addresses the challenge of learning subtle feature embeddings for fine-grained recognition tasks. The method introduces two complementary cross-attention mechanisms:

  1. Global-Local Cross-Attention (GLCA): Uses attention rollout to identify high-response regions and computes cross-attention between selected local queries and global key-value pairs, reinforcing spatial-wise discriminative clues.

  2. Pair-Wise Cross-Attention (PWCA): A training-only regularization technique that introduces confusion by computing cross-attention between query vectors of one image and combined key-value vectors from both images in a pair, helping discover more discriminative regions and reducing overfitting.
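The GLCA step described above can be sketched in PyTorch. This is a minimal single-head illustration, not the paper's exact implementation: the function names, the head-averaged rollout, and selecting local queries from the CLS row of the rollout are assumptions based on the description.

```python
import torch

def attention_rollout(attn_maps):
    # attn_maps: list of (B, heads, N, N) attention matrices from the L SA blocks
    rollout = None
    for attn in attn_maps:
        a = attn.mean(dim=1)                              # average over heads: (B, N, N)
        a = a + torch.eye(a.size(-1), device=a.device)    # add identity for the residual path
        a = a / a.sum(dim=-1, keepdim=True)               # renormalize rows
        rollout = a if rollout is None else a @ rollout   # accumulate across layers
    return rollout                                        # (B, N, N)

def glca(q, k, v, rollout, ratio=0.1):
    # q, k, v: (B, N, D) token embeddings; rollout row 0 gives CLS-to-token attention
    B, N, D = q.shape
    cls_attn = rollout[:, 0, 1:]                          # (B, N-1) scores for patch tokens
    top_k = max(1, int(ratio * (N - 1)))                  # keep the top-R% responses
    idx = cls_attn.topk(top_k, dim=-1).indices + 1        # shift indices past the CLS token
    local_q = torch.gather(q, 1, idx.unsqueeze(-1).expand(-1, -1, D))
    # cross-attention: selected local queries against the full global key/value set
    attn = torch.softmax(local_q @ k.transpose(-2, -1) / D ** 0.5, dim=-1)
    return attn @ v                                       # (B, top_k, D)
```

With R = 10% on a 17-token sequence (CLS + 16 patches), only the single highest-response patch query attends to the global tokens.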

Architecture Diagrams

Figure: GLCA (Global-Local Cross-Attention) and PWCA (Pair-Wise Cross-Attention) architecture diagrams (see figures/).

The architecture consists of:

  • L=12 Self-Attention (SA) blocks
  • M=1 GLCA blocks
  • T=12 PWCA blocks (training only)

During inference, only SA and GLCA modules are used, with no additional computational cost from PWCA.
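In practice, the PWCA step reduces to concatenating the key/value tokens of the two paired images before an otherwise standard attention computation, which is why it adds no inference cost. A minimal single-head sketch, with function name and tuple packing assumed:

```python
import torch

def pwca(q1, kv1, kv2):
    # q1: queries of image 1, (B, N, D); kv1/kv2: (key, value) tuples from both images
    k = torch.cat([kv1[0], kv2[0]], dim=1)    # combined keys:   (B, 2N, D)
    v = torch.cat([kv1[1], kv2[1]], dim=1)    # combined values: (B, 2N, D)
    d = q1.size(-1)
    # image 1's queries attend over tokens from BOTH images, injecting confusion
    attn = torch.softmax(q1 @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ v                           # (B, N, D), same shape as a standard SA output
```

The output shape matches ordinary self-attention, so the block drops out cleanly at inference time.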

🔧 Installation

Setup

# Clone the repository
git clone https://github.com/mossbee/dcal.git
cd dcal

# Install requirements
pip install -r requirements.txt

Download pre-trained ViT weights (e.g., ViT-B_16) from Google Cloud Storage and place them in the weights/ directory.

📦 Dataset Preparation

CUB-200-2011 (Fine-Grained Visual Categorization)

  1. Download the dataset from the official website
  2. Extract and organize the dataset as follows:
data/CUB_200_2011/
├── images/
│   ├── 001.Black_footed_Albatross/
│   ├── 002.Laysan_Albatross/
│   └── ...
├── images.txt
├── train_test_split.txt
├── classes.txt
└── image_class_labels.txt
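Given this layout, a loader can pair each image path with its label by joining the three metadata files on the shared image id. The sketch below is illustrative only (function names assumed; the repository's actual loader lives in src/datasets/cub.py):

```python
from pathlib import Path

def _read_pairs(path):
    # Each metadata line is "<image_id> <value>"
    return dict(line.split() for line in Path(path).read_text().splitlines() if line.strip())

def load_cub_split(root, train=True):
    # Join the three CUB metadata files on image id -> (relative path, 0-indexed class)
    root = Path(root)
    images = _read_pairs(root / "images.txt")
    labels = _read_pairs(root / "image_class_labels.txt")
    split = _read_pairs(root / "train_test_split.txt")
    want = "1" if train else "0"   # train_test_split.txt marks training images with 1
    return [(images[i], int(labels[i]) - 1) for i in images if split[i] == want]
```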

VeRi-776 (Vehicle Re-Identification)

  1. Download the dataset from the official source
  2. Organize the dataset as follows:
data/VeRi_776/
├── image_train/
├── image_query/
├── image_test/
├── name_train.txt
├── name_query.txt
├── name_test.txt
├── train_label.xml
└── test_label.xml

🚀 Usage

Training

Fine-Grained Visual Categorization on CUB-200-2011

PYTHONPATH=src python3 -m tasks.fgvc_cub \
  --data-root data/CUB_200_2011 \
  --weights weights/ViT-B_16.npz \
  --output runs/fgvc_cub \
  --log-interval 25 \
  --val-interval 1 \
  --wandb --wandb-project dcal --wandb-run-name cub-run

Training Configuration:

  • Input size: 448×448 (resized from 550×550, then randomly cropped)
  • Batch size: 16
  • Optimizer: AdamW (weight decay: 0.05)
  • Learning rate: 5e-4 / 512 × batch_size with cosine decay
  • Epochs: 100
  • Local query ratio (R): 10%
  • Stochastic depth: Enabled
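The scaled learning-rate schedule above can be written out explicitly. This helper is purely illustrative (the name and the per-step cosine granularity are assumptions):

```python
import math

def cub_lr(step, total_steps, batch_size=16, base_lr=5e-4):
    # Peak LR scaled linearly with batch size (5e-4 / 512 * batch_size),
    # then annealed to zero with cosine decay, as in the configuration above.
    peak = base_lr / 512 * batch_size
    return 0.5 * peak * (1 + math.cos(math.pi * step / total_steps))
```

With batch size 16 this gives a peak learning rate of 5e-4 / 512 × 16 = 1.5625e-5, decaying to 0 by the final step.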

Vehicle Re-Identification on VeRi-776

PYTHONPATH=src python3 -m tasks.reid_veri \
  --data-root data/VeRi_776 \
  --weights weights/ViT-B_16.npz \
  --output runs/reid_veri \
  --log-interval 50 \
  --val-interval 1 \
  --wandb --wandb-project dcal --wandb-run-name veri-run

Training Configuration:

  • Input size: 256×256
  • Batch size: 64 (4 images per identity)
  • Optimizer: SGD (momentum: 0.9, weight decay: 1e-4)
  • Learning rate: 0.008 with cosine decay
  • Epochs: 120
  • Local query ratio (R): 30%
  • Loss: Cross-entropy + Triplet loss
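The combined objective can be sketched as cross-entropy on the ID logits plus a triplet term on the embeddings. Batch-hard mining and the 0.3 margin below are common ReID defaults, not confirmed details of this repository:

```python
import torch
import torch.nn.functional as F

def reid_loss(logits, embeddings, labels, margin=0.3):
    # Cross-entropy on ID logits plus a batch-hard triplet loss on the embeddings
    ce = F.cross_entropy(logits, labels)
    dist = torch.cdist(embeddings, embeddings)              # pairwise distances (B, B)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)       # same-identity mask
    hardest_pos = (dist * same.float()).max(dim=1).values   # farthest same-ID sample
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    triplet = F.relu(hardest_pos - hardest_neg + margin).mean()
    return ce + triplet
```

Batch-hard mining is why the sampler draws 4 images per identity: each anchor then has in-batch positives to mine.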

📁 Project Structure

dcal/
├── src/
│   ├── models/          # ViT backbone and DCAL implementation
│   │   ├── vit_backbone.py
│   │   └── vit_dcal.py
│   ├── attention/       # Attention mechanisms
│   │   ├── rollout.py
│   │   └── stochastic_depth.py
│   ├── datasets/        # Dataset loaders
│   │   ├── cub.py
│   │   └── veri.py
│   ├── tasks/           # Training entrypoints
│   │   ├── fgvc_cub.py
│   │   └── reid_veri.py
│   ├── losses/          # Loss functions
│   │   └── uncertainty.py
│   └── utils/           # Utilities
│       ├── data.py
│       ├── metrics.py
│       └── wandb_logging.py
├── weights/             # Pre-trained model weights
├── figures/             # Architecture diagrams
└── refs/                # Reference implementations

📝 Citation

If you use this code in your research, please cite the original paper:

@inproceedings{zhu2022dual,
  title={Dual cross-attention learning for fine-grained visual categorization and object re-identification},
  author={Zhu, Haowei and Ke, Wenjing and Li, Dong and Liu, Ji and Tian, Lu and Shan, Yi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={4692--4702},
  year={2022}
}

🙏 Acknowledgments

This implementation builds on several excellent open-source projects. I am grateful to their authors for making their code publicly available, which greatly facilitated this implementation.

📄 License

This project is released for research purposes. Please refer to the original paper and dataset licenses for usage terms.


Note: This is an unofficial implementation. The original authors have not released their official code. This implementation has been verified against the paper's methodology (see DCAL_VERIFICATION_REPORT.md for details).

About

Unofficial PyTorch implementation of DCAL: Dual Cross-Attention Learning for FGVC and Re-ID (CVPR 2022)
