infly-ai/INF-MLLM

Release

  • [12/06] The models and evaluation code are now available; manuscript v2 will be posted on arXiv in two days.
  • [11/06] Uploaded the initial version of the manuscript to arXiv.

Contents

  • Install
  • Model Zoo
  • Evaluation
  • Demo
  • Citation
  • Acknowledgments

Install

conda create -n infmllm python=3.9
conda activate infmllm
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
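
As a quick, optional sanity check, you can verify that the pinned PyTorch build installed correctly and can see your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"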

Model Zoo

Both the multitask and instruction-tuned models are now available on Hugging Face!
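
If you prefer fetching the weights from the command line, the sketch below uses the huggingface_hub CLI (pip install -U huggingface_hub). The repository IDs are placeholders; substitute the actual model names from the Hugging Face pages linked above.

# Placeholder repo IDs -- replace with the real Hugging Face repositories of the released models.
huggingface-cli download infly-ai/INF-MLLM-7B
huggingface-cli download infly-ai/INF-MLLM-7B-Chat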

Evaluation

We evaluated the InfMLLM-7B multitask model on five VQA (Visual Question Answering) datasets and three visual grounding datasets. The instruction-tuned InfMLLM-7B-Chat model was evaluated on four VQA datasets and six multi-modal benchmarks. For detailed evaluation procedures, please refer to Evaluation.

Demo

Trying InfMLLM-7B-Chat is straightforward. We provide a demo script that runs on an example image:

CUDA_VISIBLE_DEVICES=0 python demo.py

The script prints the generated conversation for the example image.

Citation

@misc{zhou2023infmllm,
      title={InfMLLM: A Unified Framework for Visual-Language Tasks}, 
      author={Qiang Zhou and Zhibin Wang and Wei Chu and Yinghui Xu and Hao Li and Yuan Qi},
      year={2023},
      eprint={2311.06791},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgments

This work would not have been possible without the incredible open-source code of several prior projects. Huge thanks!
