llama3v

llama3v is a SOTA vision model that is powered by Llama3 8B and siglip-so400m.

[ GitHub ] [ Model Weights ] [ Blog Post ]

Features

SOTA open-source vision-language model (VLM)
Model is available on Hugging Face
Fast local inference
Inference code is released (training code is coming soon, once it has been cleaned up)

Check out Hugging Face for the model weights.

Usage

You can use llama3v with the Transformers library.

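from transformers import AutoTokenizer, AutoModel
from PIL import Image

# Load the model onto the GPU and fetch the matching tokenizer.
model = AutoModel.from_pretrained("mustafaaljadery/llama3v").cuda()
tokenizer = AutoTokenizer.from_pretrained("mustafaaljadery/llama3v")

# Open the input image and normalize it to three-channel RGB.
image = Image.open("test_image.png").convert("RGB")

# Ask a question about the image; a low temperature keeps the answer close to deterministic.
answer = model.generate(image=image, message="What is this image?", temperature=0.1, tokenizer=tokenizer)

print(answer)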

The model first passes the image through the vision model to extract features, then passes those features through the language model to generate the answer. Here is a sample inference pipeline:
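As a rough illustration of that data flow, the sketch below wires together tiny stand-in modules in PyTorch. It does not load the real siglip-so400m or Llama3 8B weights. The class name, all dimensions, and the choice to prepend the image tokens to the text tokens are assumptions made for illustration, not the released implementation.

```python
import torch
import torch.nn as nn


class ToyVisionLanguageModel(nn.Module):
    """Hypothetical stand-in for the vision encoder -> projection -> LLM flow."""

    def __init__(self, patch_dim=48, vision_dim=32, text_dim=64, vocab_size=100):
        super().__init__()
        # Stands in for siglip-so400m: turns raw patches into image features.
        self.vision_encoder = nn.Linear(patch_dim, vision_dim)
        # Maps image features into the language model's embedding space.
        self.projection = nn.Linear(vision_dim, text_dim)
        self.token_embedding = nn.Embedding(vocab_size, text_dim)
        # Stands in for Llama3 8B (one small, non-causal transformer layer).
        self.language_model = nn.TransformerEncoderLayer(
            d_model=text_dim, nhead=4, batch_first=True
        )
        self.lm_head = nn.Linear(text_dim, vocab_size)

    def forward(self, patches, input_ids):
        image_features = self.vision_encoder(patches)      # (B, P, vision_dim)
        image_tokens = self.projection(image_features)     # (B, P, text_dim)
        text_tokens = self.token_embedding(input_ids)      # (B, T, text_dim)
        # Assumption: image tokens are placed in front of the text tokens.
        sequence = torch.cat([image_tokens, text_tokens], dim=1)
        hidden = self.language_model(sequence)             # (B, P + T, text_dim)
        return self.lm_head(hidden)                        # (B, P + T, vocab_size)


model = ToyVisionLanguageModel().eval()
patches = torch.randn(1, 9, 48)             # 1 image, 9 patches
input_ids = torch.randint(0, 100, (1, 5))   # 1 prompt, 5 tokens
with torch.no_grad():
    logits = model(patches, input_ids)
print(logits.shape)  # torch.Size([1, 14, 100])
```

The output has one row of vocabulary logits per position in the combined sequence (9 image tokens plus 5 text tokens); a real decoder would sample the answer from the logits at the final position.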

Training Process

In our training process, we combine the siglip-so400m vision model with the Llama3 8B language model so that the combined model accepts multimodal image and text input and generates text.

We add a projection layer to the siglip-so400m model to project the image features to the LLaMA embedding space for the model to better understand the image.
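The README does not say how the projection is built, so the following is only a minimal sketch that assumes a single linear layer. The hidden sizes (1152 for siglip-so400m, 4096 for Llama3 8B) are the published sizes of those models; the patch count of 729 assumes the 384-pixel, patch-14 SigLIP variant, and the class name `ImageProjection` is hypothetical.

```python
import torch
import torch.nn as nn

SIGLIP_HIDDEN_SIZE = 1152  # hidden size of siglip-so400m
LLAMA_HIDDEN_SIZE = 4096   # hidden size of Llama3 8B
NUM_PATCHES = 729          # assumption: 384x384 input, 14x14 patches (27 x 27 grid)


class ImageProjection(nn.Module):
    """Hypothetical projection: one linear map applied to every patch feature."""

    def __init__(self, vision_dim=SIGLIP_HIDDEN_SIZE, text_dim=LLAMA_HIDDEN_SIZE):
        super().__init__()
        self.proj = nn.Linear(vision_dim, text_dim)

    def forward(self, image_features):
        # (batch, patches, vision_dim) -> (batch, patches, text_dim)
        return self.proj(image_features)


projection = ImageProjection()
image_features = torch.randn(1, NUM_PATCHES, SIGLIP_HIDDEN_SIZE)
image_embeddings = projection(image_features)
print(image_embeddings.shape)  # torch.Size([1, 729, 4096])
```

After projection, each image patch has the same width as a Llama3 token embedding, which is what lets the language model consume image patches and text tokens in one sequence.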

In the pretraining process, we freeze all the weights other than the projection layer. We train on about 600K images.

In the fine-tuning process, we update the weights of the Llama3 8B model while freezing the weights of the siglip-so400m model and the projection layer. We train on approximately 1M images. In addition, we generate synthetic multimodal data with the Yi model family for multimodal text generation, and we fine-tune our model on this synthetic data.
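The two stages above differ mainly in which parameters receive gradients. Since the training code is not yet released, the sketch below only illustrates that freezing schedule in PyTorch on a toy model. The attribute names (`vision_encoder`, `projection`, `language_model`) and the helpers `set_stage` and `trainable_components` are hypothetical.

```python
import torch.nn as nn


class ToyLlama3V(nn.Module):
    """Tiny stand-in with the three components named in the text."""

    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)     # stands in for siglip-so400m
        self.projection = nn.Linear(8, 16)        # projection layer
        self.language_model = nn.Linear(16, 16)   # stands in for Llama3 8B


# Which top-level components are trainable in each stage, as described above.
TRAINABLE_BY_STAGE = {
    "pretrain": ("projection",),
    "finetune": ("language_model",),
}


def set_stage(model, stage):
    """Enable gradients only for the components trained in the given stage."""
    trainable = TRAINABLE_BY_STAGE[stage]
    for name, param in model.named_parameters():
        component = name.split(".")[0]
        param.requires_grad = component in trainable


def trainable_components(model):
    """Return the sorted names of components that currently receive gradients."""
    return sorted(
        {name.split(".")[0] for name, p in model.named_parameters() if p.requires_grad}
    )


model = ToyLlama3V()
set_stage(model, "pretrain")
print(trainable_components(model))  # ['projection']
set_stage(model, "finetune")
print(trainable_components(model))  # ['language_model']
```

When building an optimizer for either stage, passing only the parameters with `requires_grad` set to `True` avoids allocating optimizer state for the frozen components.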

Read more about our training process here.

Acknowledgements

Citations

This was built with the help of the following resources:
