Ifty Mohammad Rezwan imr555
- Florida, United States
- https://imr555.github.io/
- @imr165
Highlights
- visual_language_models_ucf
- Explorations into the recently proposed Taylor Series Linear Attention
- [CVPR 2023 & TPAMI 2025] Explicit Visual Prompting for Low-Level Structure Segmentations
- Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts
- Implementation of MagViT2 Tokenizer in Pytorch
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
- MERLOT: Multimodal Neural Script Knowledge Models
- Reading list for research topics in multimodal machine learning
- Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
- Awesome-DragGAN: A curated list of papers, tutorials, and repositories related to DragGAN
- [CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
- [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
- [ICLR 2022] Code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
- This repository reproduces the results of the paper "Fixing the train-test resolution discrepancy" https://arxiv.org/abs/1906.06423
- LAVIS - A One-stop Library for Language-Vision Intelligence
- A flexible and efficient codebase for training visually-conditioned language models (VLMs)
- [ECCV'24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
- [ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
- A curated list of research papers in Referring Expression Comprehension (REC)
- Torch Implementation of Speaker-Listener-Reinforcer for Referring Expression Generation and Comprehension
- [WACV 2025 Oral] Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.
- A Survey on Data Selection for Language Models
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
- Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers, presented at the ICCV 2023 NIVT workshop