🌋 LLaVA: Large Language and Vision Assistant

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities. (NeurIPS 2023 Oral)

Windows Install

  1. Clone this repository and navigate to the LLaVA folder
git clone https://github.com/natlamir/LLaVA-Windows.git llava
cd llava
  2. Create the conda environment and install dependencies
conda create -n llava python=3.10
conda activate llava
pip install -r requirements.txt
  3. Install PyTorch with CUDA support (command from the PyTorch website)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  4. Downgrade pydantic to fix the model-queue infinite-load issue
pip install pydantic==1.10.9
  5. Install bitsandbytes for Windows so the quantized model can run (a quick sanity-check sketch follows this list)
pip install git+https://github.com/Keith-Hon/bitsandbytes-windows.git
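
Before launching anything, it can help to confirm the environment. The short Python sketch below is only a suggested check, not part of the official setup: it verifies that PyTorch sees the GPU (needed for --load-8bit), that pydantic is at the downgraded version, and that the Windows bitsandbytes build imports. The file name check_env.py is illustrative; run it inside the activated llava environment.

# check_env.py (illustrative name) - quick environment sanity check
import torch
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # should print True for GPU inference
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

import pydantic
print("pydantic:", pydantic.VERSION)  # expected 1.10.9 after the downgrade above

import bitsandbytes  # the Windows build installed above
print("bitsandbytes imported OK")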

Manually Download Model (Optional)

Available models are listed in the Model Zoo. For this example, I will use the "liuhaotian/llava-v1.5-7b" model.

  1. Create a folder called "models" within the "llava" install folder

  2. cd into the "models" folder from an Anaconda prompt

  3. Download the 7B model from Hugging Face into the models folder (an alternative Python download sketch follows these commands)

git lfs install
git clone https://huggingface.co/liuhaotian/llava-v1.5-7b
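
As an alternative to git lfs, the same snapshot can be fetched with the huggingface_hub library. This is an optional sketch, assuming huggingface_hub is present in the llava environment (it is pulled in as a dependency of transformers); it writes the files to the same models/llava-v1.5-7b path used by the model worker below. The script name is illustrative.

# download_llava.py (illustrative name) - fetch the model without git lfs
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="liuhaotian/llava-v1.5-7b",  # model from the Model Zoo
    local_dir="models/llava-v1.5-7b",    # path the model worker command expects below
)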

Usage

  1. Launch three Anaconda prompts. In each one, activate the llava environment and cd into the llava install folder.

  2. In the 1st Anaconda prompt, launch the controller

python -m llava.serve.controller --host 0.0.0.0 --port 10000
  3. In the 2nd Anaconda prompt, launch the model worker with the 8-bit quantized model

Using the Hugging Face model ID (this will also download the model on first run):

python -m llava.serve.model_worker --host "0.0.0.0" --controller-address "http://localhost:10000" --port 40000 --worker-address "http://localhost:40000" --model-path "liuhaotian/llava-v1.5-7b" --load-8bit

Or, using the manually downloaded model from the optional step above (expected in the models folder):

python -m llava.serve.model_worker --host "0.0.0.0" --controller-address "http://localhost:10000" --port 40000 --worker-address "http://localhost:40000" --model-path "models/llava-v1.5-7b" --load-8bit
  4. In the 3rd Anaconda prompt, launch the Gradio web UI
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
  5. Open a browser and navigate to http://127.0.0.1:7860. If no model appears in the dropdown, see the controller check sketch after this list.
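
If the web UI loads but the model dropdown stays empty, the controller can be queried directly. The sketch below is a troubleshooting aid, assuming the requests package is available in the environment and that the controller exposes the /refresh_all_workers and /list_models endpoints used internally by the Gradio server; adjust the URL if you changed the controller port. The script name is illustrative.

# check_controller.py (illustrative name) - confirm the worker registered with the controller
import requests

controller = "http://localhost:10000"  # matches --port 10000 above
requests.post(controller + "/refresh_all_workers")  # ask the controller to re-poll its workers
models = requests.post(controller + "/list_models").json()["models"]
print("Registered models:", models)  # expect ['llava-v1.5-7b'] once the model worker is up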

Citation

If you find LLaVA useful for your research and applications, please cite using this BibTeX:

@misc{liu2023improvedllava,
      title={Improved Baselines with Visual Instruction Tuning}, 
      author={Liu, Haotian and Li, Chunyuan and Li, Yuheng and Lee, Yong Jae},
      publisher={arXiv:2310.03744},
      year={2023},
}

@misc{liu2023llava,
      title={Visual Instruction Tuning}, 
      author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
      publisher={arXiv:2304.08485},
      year={2023},
}

Acknowledgement

  • Vicuna: the codebase we built upon, and our base model Vicuna-13B with its amazing language capabilities!

Related Projects

For future project ideas, please check out:
