🤖
There is no easy day. The only easy day was yesterday - Pritom Mojumder

Stars

visual_language_models_ucf

46 repositories

Explorations into the recently proposed Taylor Series Linear Attention

Python 100 3 Updated Aug 18, 2024

[CVPR 2023 & TPAMI 2025] Explicit Visual Prompting for Low-Level Structure Segmentations

Python 221 16 Updated Oct 22, 2025

Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts

Python 1,486 121 Updated Dec 1, 2025

Implementation of MagViT2 Tokenizer in Pytorch

Python 661 34 Updated Jan 12, 2025
Python 360 11 Updated Jan 27, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,554 2,745 Updated Aug 12, 2024

MERLOT: Multimodal Neural Script Knowledge Models

Python 226 25 Updated Mar 15, 2022

Reading list for research topics in multimodal machine learning

6,835 897 Updated Aug 20, 2024

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 17,456 1,582 Updated Sep 5, 2024

Awesome-DragGAN: A curated list of papers, tutorials, repositories related to DragGAN

83 2 Updated Nov 8, 2023

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Python 2,800 262 Updated Mar 25, 2025

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 890 76 Updated Nov 26, 2025

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Python 677 32 Updated Sep 19, 2022

[ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383

Python 420 35 Updated Oct 28, 2022

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook 3,377 214 Updated May 19, 2025

This repository reproduces the results of the paper: "Fixing the train-test resolution discrepancy" https://arxiv.org/abs/1906.06423

Python 1,044 147 Updated Aug 11, 2021

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,183 1,102 Updated Nov 18, 2024

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Python 947 963 Updated Jul 4, 2024

[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Python 58 1 Updated Sep 4, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 2,215 142 Updated Dec 15, 2025

A curated list of research papers in Referring Expression Comprehension (REC)

46 6 Updated May 13, 2021
Python 87 15 Updated Apr 15, 2022

Torch Implementation of Speaker-Listener-Reinforcer for Referring Expression Generation and Comprehension

Jupyter Notebook 34 12 Updated Mar 8, 2018

tiny vision language model

Python 9,419 738 Updated Nov 14, 2025

(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.

Python 84 7 Updated Aug 5, 2025

A Survey on Data Selection for Language Models

255 15 Updated Apr 29, 2025

A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.

Python 842 63 Updated Jul 1, 2024

Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT workshop

Python 35 5 Updated Aug 10, 2023