Official implementation of BRIDGE
BRIDGE is a vision-language model that introduces efficient cross-modal interaction layers to bridge pretrained vision and text encoders. Key features:
- Dual-stream architecture: Separate vision (ViT) and text (BERT-style) encoders with lightweight cross-modal interaction layers
- Dual embeddings: Maintains both cross-modal and unimodal representations for fast retrieval
- Gated residual connections: Learnable gates control information flow between modalities (see the sketch after this list)
- Multi-objective training: Combines ITC (contrastive), MLM (masked language modeling), MIM (masked image modeling), ITM (image-text matching), and cycle consistency losses
- Staged training curriculum: Gradually unfreezes encoders from interaction layers to deeper blocks
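
To make the interaction-layer idea concrete, here is a minimal PyTorch sketch of one bidirectional cross-attention block with gated residual connections. All names (`GatedCrossModalLayer`, the zero-initialized scalar gates, the dimensions) are illustrative assumptions, not the repository's actual modules or hyperparameters:

```python
import torch
import torch.nn as nn

class GatedCrossModalLayer(nn.Module):
    """Illustrative bidirectional cross-attention block with learnable gates.

    Names and details are assumptions, not this repo's actual modules.
    """

    def __init__(self, dim: int = 768, num_heads: int = 12):
        super().__init__()
        # Vision attends to text tokens, and text attends to vision tokens.
        self.v2t_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.t2v_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)
        # Scalar gates, zero-initialized so the layer starts as an identity
        # mapping (an assumption; a common trick for stabilizing frozen encoders).
        self.gate_v = nn.Parameter(torch.zeros(1))
        self.gate_t = nn.Parameter(torch.zeros(1))

    def forward(self, vis: torch.Tensor, txt: torch.Tensor):
        # vis: (B, N_v, dim) patch tokens; txt: (B, N_t, dim) word tokens.
        v_ctx, _ = self.v2t_attn(self.norm_v(vis), txt, txt)
        t_ctx, _ = self.t2v_attn(self.norm_t(txt), vis, vis)
        # Gated residuals control how much cross-modal context flows in.
        vis = vis + torch.tanh(self.gate_v) * v_ctx
        txt = txt + torch.tanh(self.gate_t) * t_ctx
        return vis, txt
```

Zero-initialized gates make each interaction layer behave as an identity mapping at the start of training, which is one common way to keep pretrained encoders stable while the new layers warm up.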
BRIDGE processes vision and text inputs through separate encoders, then applies bidirectional cross-attention in a shared latent space at selected layers. This design enables:
- Efficient bi-encoder retrieval using unimodal embeddings (see the retrieval sketch after this list)
- Rich cross-modal understanding via interaction layers
- Flexible adaptation to downstream tasks (VQA, captioning, retrieval, classification)
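
For instance, the fast retrieval path can be sketched as a plain bi-encoder: embed images and texts once with the unimodal heads, then rank by cosine similarity. The embeddings below are assumed to come from those heads, and `retrieve` is a hypothetical helper, not part of the released API:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(image_emb: torch.Tensor, text_emb: torch.Tensor, k: int = 5):
    """Rank texts for each image by cosine similarity of unimodal embeddings.

    image_emb: (N_img, D), text_emb: (N_txt, D) -- assumed outputs of the
    model's unimodal projection heads (hypothetical names here).
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    sim = image_emb @ text_emb.t()      # (N_img, N_txt) cosine scores
    scores, idx = sim.topk(k, dim=-1)   # top-k candidate texts per image
    return scores, idx
```

Because no cross-attention runs at query time, corpus embeddings can be precomputed and indexed; the interaction layers come into play only for tasks that need joint cross-modal reasoning.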
Training follows a three-stage curriculum:
- Stage A (Stabilize): Freeze both encoders; train only the interaction layers and gates (see the first sketch after this list)
- Stage B (Align): Unfreeze the top encoder blocks; train with the full objective mix (see the second sketch after this list)
- Stage C (Task Tuning): Fine-tune for specific downstream tasks
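
A minimal sketch of the Stage A setup, assuming the interaction layers and gates can be selected by parameter name (the `interaction`/`gate` substrings are hypothetical, not the repository's actual naming convention):

```python
def configure_stage_a(model):
    """Freeze both encoders; leave only interaction layers and gates trainable.

    The name filters below are assumptions about module naming, not this
    repository's actual convention.
    """
    for name, param in model.named_parameters():
        param.requires_grad = ("interaction" in name) or ("gate" in name)
    trainable = [p for p in model.parameters() if p.requires_grad]
    return trainable  # pass to the optimizer, e.g. torch.optim.AdamW(trainable)
```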
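
And the full objective mix can be read as a weighted sum of the five pretraining losses; the weights below are placeholders, not values from the paper or training config:

```python
import torch

# Hypothetical loss weights; actual values would come from the training config.
LOSS_WEIGHTS = {"itc": 1.0, "mlm": 1.0, "mim": 1.0, "itm": 1.0, "cycle": 0.5}

def total_loss(losses: dict[str, torch.Tensor]) -> torch.Tensor:
    """Weighted sum of the ITC, MLM, MIM, ITM, and cycle-consistency losses."""
    return sum(LOSS_WEIGHTS[k] * v for k, v in losses.items())
```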
This codebase is adapted from the BLIP repository, with modifications for the BRIDGE architecture and training procedure.
See the LICENSE file for details.