In this repo, we provide our pretrained model. It uses an input resolution of 224×224. If you want the high-resolution model (after all four training stages), please refer to our Hugging Face page: https://huggingface.co/Infi-MM/infimm-hd.
Download the model from https://huggingface.co/lllliuhhhhggg/infimm_pretrain/tree/main. We provide the two pretraining models (stage 1 only) described in our paper: https://arxiv.org/abs/2403.01487. It is a Flamingo-style model; the only difference is that we remove the perceiver resampler. We use EVA ViT-E as the vision encoder and Vicuna as the language model. These models are pretrained on MMC4, OBELICS, COYO-238M (sampled from COYO-700M), LAION-115M, and LAION-COCO. With the same amount of data, our model trains much faster than LLaVA, thanks to cross-attention information fusion. Feel free to build something from our pretrained models.
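To illustrate the Flamingo-style fusion mentioned above, here is a minimal single-head sketch of gated cross-attention in NumPy: text hidden states query the visual features, and a tanh gate (initialized at zero in Flamingo so training starts from the frozen LLM's behavior) scales the visual contribution. All names and shapes here are illustrative, not the actual layer from this repo.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(text_h, vis_h, Wq, Wk, Wv, gate):
    """Single-head sketch: text tokens (T, d) attend to visual features (V, d)."""
    q = text_h @ Wq                                   # queries from text, (T, d)
    k = vis_h @ Wk                                    # keys from vision, (V, d)
    v = vis_h @ Wv                                    # values from vision, (V, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (T, V) attention weights
    # tanh gate: at gate=0 the layer is an identity on the text stream
    return text_h + np.tanh(gate) * (attn @ v)

# Illustrative usage with random weights
rng = np.random.default_rng(0)
d = 8
text_h = rng.standard_normal((4, d))   # 4 text tokens
vis_h = rng.standard_normal((6, d))    # 6 visual tokens
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
fused = gated_cross_attention(text_h, vis_h, Wq, Wk, Wv, gate=1.0)
print(fused.shape)  # (4, 8): fusion keeps the text sequence length
```

Because fusion happens in side cross-attention layers, visual tokens do not lengthen the LLM's input sequence, which is one intuition for the training-speed advantage over LLaVA-style token concatenation.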
Because we use our company's internal training framework, we cannot release the training code directly. Instead, demo_forward.py demonstrates the data processing and a forward pass.
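For orientation before reading demo_forward.py, here is a hedged sketch of the input shapes a Flamingo-style forward pass typically expects at the 224 pretraining resolution. The function and argument names below are assumptions for illustration only; the authoritative interface is demo_forward.py in this repo.

```python
import numpy as np

IMG_SIZE = 224  # pretraining resolution stated in this README

def make_dummy_batch(batch_size=2, num_images=1, seq_len=32, vocab_size=32000):
    """Build dummy inputs with typical Flamingo-style shapes (illustrative only)."""
    # images: one or more per sample, channels-first, normalized floats
    pixel_values = np.random.rand(
        batch_size, num_images, 3, IMG_SIZE, IMG_SIZE
    ).astype(np.float32)
    # token ids for the interleaved text (image placeholder tokens omitted here)
    input_ids = np.random.randint(0, vocab_size, size=(batch_size, seq_len))
    attention_mask = np.ones((batch_size, seq_len), dtype=np.int64)
    return pixel_values, input_ids, attention_mask

pixel_values, input_ids, attention_mask = make_dummy_batch()
print(pixel_values.shape)  # (2, 1, 3, 224, 224)
```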
The copyright of the images belongs to the original authors.
See LICENSE for more information.
Acknowledgements: https://github.com/baaivision/EVA and https://github.com/mlfoundations/open_flamingo
If you find our work useful, please cite:

@misc{liu2024infimmhdleapforwardhighresolution,
title={InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding},
author={Haogeng Liu and Quanzeng You and Xiaotian Han and Yiqi Wang and Bohan Zhai and Yongfei Liu and Yunzhe Tao and Huaibo Huang and Ran He and Hongxia Yang},
year={2024},
eprint={2403.01487},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2403.01487},
}