Authors: Savya Khosla, Sethuraman TV, Aryan Chadha, Alex Schwing, Derek Hoiem
T-REN (Text-aligned Region Encoder Network) is an image encoder that produces region-level tokens aligned with text, built on top of the DINOv3 ViT-L/16 backbone. Compared to its patch-based backbone, T-REN delivers +5.9 mIoU on ADE20K open-vocabulary segmentation, +18.4% recall on COCO object-level text-image retrieval, +15.6% recall on Ego4D video object localization, and +17.6% mIoU on VSPW video scene parsing, all while reducing token counts by more than 24× for images and 187× for videos.
This repository contains training code, a small inference demo (tren.py), and evaluation scripts for several benchmarks: semantic segmentation, video query search, video scene parsing, and Visual Haystacks.
```shell
git clone https://github.com/savya08/T-REN.git
cd T-REN
conda env create -f setup.yaml
conda activate tren
```
Pretrained T-REN RegionEncoder weights are hosted on Hugging Face at savyak2/T-REN. To download them, run:
```shell
./download.sh
```
This creates `logs/tren-ckpts/` and downloads `tren_region_encoder.pth`. To use a different directory:
```shell
./download.sh /path/to/my-ckpts
```
If you use a custom path, set `logging.save_dir` and `logging.exp_name` in your configs so that `save_dir/exp_name/` matches that folder.
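The relationship between those two config fields and the download location can be sketched as below. The helper function is purely illustrative (not part of the repository); only the field names `save_dir` and `exp_name` come from the configs.

```python
from pathlib import Path

def checkpoint_dir(save_dir: str, exp_name: str) -> Path:
    """Illustrative: configs resolve checkpoints under save_dir/exp_name/."""
    return Path(save_dir) / exp_name

# If you downloaded to /path/to/my-ckpts, the matching config values would be
# logging.save_dir = "/path/to" and logging.exp_name = "my-ckpts":
print(checkpoint_dir("/path/to", "my-ckpts"))  # /path/to/my-ckpts
```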
model.py expects the following files next to `tren_region_encoder.pth` (in the same directory), with these exact names:

- `dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth`
- `dinov3_vitl16_dinotxt_vision_head_and_text_encoder-a442d8f5.pth`
They are not on the savyak2/T-REN model card. Obtain them from the DINOv3 release / torch.hub workflow and copy them into logs/tren-ckpts/ (or your chosen checkpoint directory).
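A quick way to confirm the checkpoint directory is complete before running anything is to check for all three files. This is a minimal sketch, not part of the repository; pass whatever directory you gave download.sh.

```python
from pathlib import Path

# The three files model.py expects to find side by side.
REQUIRED = [
    "tren_region_encoder.pth",
    "dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth",
    "dinov3_vitl16_dinotxt_vision_head_and_text_encoder-a442d8f5.pth",
]

def missing_checkpoints(ckpt_dir: str) -> list[str]:
    """Return the required files that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in REQUIRED if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints("logs/tren-ckpts")
    if missing:
        print("Missing from logs/tren-ckpts:", *missing)
```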
Set dataset paths in the configs before training or evaluation. These paths are currently set to /path/to/... placeholders.
| Config File | Purpose |
|---|---|
| `configs/train_dinov3_vitl16.yaml` | Multi-dataset training paths and hyperparameters |
| `semantic_segmentation/config.yaml` | ADE20K / Cityscapes roots |
| `video_query_search/config.yaml` | VQ2D validation records |
| `video_scene_parsing/config.yaml` | VSPW root |
| `visual_haystacks/config.yaml` | COCO 2017 and Visual Haystacks roots |
The checkpoint directory in the task configs is set to `../logs/tren-ckpts/`. Update it if you downloaded the checkpoints to a custom path.
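Since the shipped configs use `/path/to/...` placeholders, a small check can flag any config you forgot to edit. A rough sketch using plain text search (no YAML parsing); the file list is taken from the table above.

```python
from pathlib import Path

# Config files listed in the table above.
CONFIGS = [
    "configs/train_dinov3_vitl16.yaml",
    "semantic_segmentation/config.yaml",
    "video_query_search/config.yaml",
    "video_scene_parsing/config.yaml",
    "visual_haystacks/config.yaml",
]

def unedited_configs(repo_root: str = ".") -> list[str]:
    """Return configs that still contain the /path/to/ placeholder."""
    stale = []
    for rel in CONFIGS:
        path = Path(repo_root) / rel
        if path.exists() and "/path/to/" in path.read_text():
            stale.append(rel)
    return stale

if __name__ == "__main__":
    for rel in unedited_configs():
        print(f"still has placeholder paths: {rel}")
```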
```shell
# optional: Weights & Biases for logging (off by default)
export USE_WANDB=1

python train.py
```
Training reads `configs/train_dinov3_vitl16.yaml`, uses `aux_files/cat_to_idx.json` for category indexing, and writes checkpoints under the configured `logging.save_dir` / `logging.exp_name`.
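The exact schema of `aux_files/cat_to_idx.json` is not documented here; assuming it is a flat JSON object mapping category names to integer indices, loading it (and building the inverse map for decoding predictions) would look like:

```python
import json

def load_category_maps(path: str = "aux_files/cat_to_idx.json"):
    """Assumed schema: {"category name": integer index, ...}."""
    with open(path) as f:
        cat_to_idx = json.load(f)
    # Inverse map, useful for turning predicted indices back into names.
    idx_to_cat = {idx: cat for cat, idx in cat_to_idx.items()}
    return cat_to_idx, idx_to_cat
```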
```shell
python tren.py
```
This downloads a sample image from a public URL, runs T-REN, and writes visualizations under `region_vis/`.
Each task lives in its own directory with a `config.yaml` and an `eval.py`. After setting the dataset paths, run the corresponding `eval.py` from that task's directory.
This project is released under the MIT License. See LICENSE for details.
```bibtex
@misc{khosla2026tren,
  title={T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability},
  author={Savya Khosla and Sethuraman T V and Aryan Chadha and Alexander Schwing and Derek Hoiem},
  year={2026},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
}
```