Jupyter Notebook used: <https://colab.research.google.com/drive/1L28bJX14-Y5lJvswYwydsletYFMIxVH5?usp=sharing>

- Changes
  - Removed the setup and test sections.
  - Changed the arguments passed to `demo.launch`.

# LLaVA Chatbot

This notebook shows how to create a chatbot with the [multimodal LLaVA model](https://llava-vl.github.io/). See the [related blog post for context](https://medium.com/p/b06f88ce8efa).

The notebook is divided into two parts:

- Load a LLaVA model and make it generate an answer from an image and a prompt

- Create a LLaVA chatbot interface using Gradio

The notebook can be run on Google Colab with a T4 GPU.


In [1]:
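import locale
# Force the reported preferred encoding to UTF-8 (a common workaround on Colab
# when shell commands complain about a non-UTF-8 locale).
locale.getpreferredencoding = lambda: "UTF-8"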

## Load the LLaVA model

The model is loaded from its [HuggingFace repository](https://huggingface.co/llava-hf/llava-1.5-7b-hf) using the `pipeline` function of the transformers library.

We quantize it to 4 bits to reduce GPU memory consumption.
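To get a feel for the saving, here is a rough back-of-the-envelope estimate of the memory taken by the weights alone. It is only a sketch: the 7 billion parameter count is the nominal size of a "7B" model (an assumption, not a measured value), and it ignores activations, the key/value cache, and the bookkeeping overhead of quantization.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7e9  # assumption: nominal parameter count of a "7B" model

fp16_gib = weight_memory_gib(N_PARAMS, 16)  # half-precision weights
int4_gib = weight_memory_gib(N_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f" 4-bit weights: ~{int4_gib:.1f} GiB")
```

Under these assumptions the weights shrink from about 13 GiB to a little over 3 GiB, which leaves much more headroom on a 16 GB GPU.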

In [2]:
from transformers import pipeline, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

# 13B version can be loaded with llava-hf/llava-1.5-13b-hf
model_id = "llava-hf/llava-1.5-7b-hf"


pipe = pipeline("image-to-text", model=model_id, model_kwargs={"quantization_config": quantization_config})



Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
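Before building the chat interface, it can help to send a single question through the pipeline. The following is a minimal sketch, not the notebook's original test code: `ask_llava` and the stand-in `fake_pipe` are hypothetical names, the call signature mirrors the `pipe(image, prompt=..., generate_kwargs=...)` call used in this notebook, and splitting on the last `ASSISTANT:` marker assumes that the generated text echoes the prompt before the answer.

```python
def ask_llava(pipe, image, question, max_new_tokens=200):
    """Send one image and one question through an image-to-text pipeline."""
    prompt = f"USER: <image>\n{question}\nASSISTANT:"
    result = pipe(image, prompt=prompt,
                  generate_kwargs={"max_new_tokens": max_new_tokens})
    generated = result[0]["generated_text"]
    # Assumption: the output echoes the prompt, so the answer is whatever
    # follows the last "ASSISTANT:" marker.
    return generated.rsplit("ASSISTANT:", 1)[-1].strip()


def fake_pipe(image, prompt, generate_kwargs):
    """Stand-in for the real pipeline so the sketch runs without a GPU."""
    return [{"generated_text": "USER:  \nWhat is shown?\nASSISTANT: A cat."}]


answer = ask_llava(fake_pipe, image=None, question="What is shown?")
print(answer)
```

In the notebook itself, `fake_pipe` would be replaced by the `pipe` object created above and `image` by a PIL image.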


## LLaVA chatbot with Gradio


In [3]:
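def update_conversation(new_message, history, image):
    """Build a LLaVA prompt from the chat history and return the model's reply."""

    if image is None:
        return "Please upload an image first using the widget on the left"

    # Skip the turns answered with the "Please upload an image" placeholder,
    # so the prompt only contains exchanges made once an image was available.
    conversation_starting_from_image = [[user, assistant] for [user, assistant] in history if not assistant.startswith('Please')]

    prompt = "USER: <image>\n"

    for user, assistant in conversation_starting_from_image:
        prompt += user + '\nASSISTANT: ' + assistant + "\nUSER: "

    prompt = prompt + new_message + '\nASSISTANT: '

    outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200
                                                          #, "do_sample" : True,
                                                          #"temperature" : 0.7
                                                          })[0]['generated_text']

    # generated_text starts with an echo of the prompt; skip it to return only
    # the new answer. The offset of 6 presumably compensates for the `<image>`
    # placeholder not being echoed verbatim.
    return outputs[len(prompt)-6:]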
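The prompt assembly can be checked on its own, without loading the model. The sketch below uses a hypothetical helper, `build_prompt`, that follows the same `USER:` / `ASSISTANT:` layout and leaves out the turns that were answered with the "Please upload an image" placeholder. It assumes `history` is a list of `[user, assistant]` pairs.

```python
def build_prompt(new_message, history):
    """Assemble a multi-turn LLaVA 1.5 style prompt from chat history."""
    # Keep only real exchanges, not the "Please upload an image" replies.
    turns = [(user, assistant) for user, assistant in history
             if not assistant.startswith("Please")]

    prompt = "USER: <image>\n"
    for user, assistant in turns:
        prompt += f"{user}\nASSISTANT: {assistant}\nUSER: "
    return prompt + f"{new_message}\nASSISTANT: "


history = [
    ["hi", "Please upload an image first using the widget on the left"],
    ["What is it?", "A cat."],
]
prompt = build_prompt("What color?", history)
print(prompt)
```

Printing the prompt for a short, hand-written history like this one makes it easy to confirm that the image token appears once and that every earlier answer is followed by a new `USER:` turn.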


In [5]:
import gradio as gr

with gr.Blocks() as demo:

    with gr.Row():
        image = gr.Image(type='pil', interactive=True)

        gr.ChatInterface(
            update_conversation, additional_inputs=[image],
            chatbot=gr.Chatbot()
        )

demo.launch(inbrowser=True, share=False, server_name='0.0.0.0', server_port=7860, auth=None)

Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
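The arguments passed to `demo.launch` control how the app is exposed. As a hedged sketch, the same settings can be kept in a dictionary with a note on each one. The helper `with_basic_auth` and the credentials are hypothetical and only illustrate that `auth` also accepts a `(username, password)` tuple, which may be worth considering when the server listens on `0.0.0.0`.

```python
# Keyword arguments mirroring the demo.launch(...) call above.
launch_kwargs = {
    "inbrowser": True,         # open the UI in a local browser tab when possible
    "share": False,            # do not create a public share link
    "server_name": "0.0.0.0",  # listen on all network interfaces, not only localhost
    "server_port": 7860,       # port the app is served on
    "auth": None,              # no login required
}


def with_basic_auth(kwargs, username, password):
    """Return a copy of the launch arguments that asks for a login."""
    secured = dict(kwargs)
    secured["auth"] = (username, password)
    return secured


# Hypothetical credentials, for illustration only.
secured_kwargs = with_basic_auth(launch_kwargs, "demo-user", "change-me")
# demo.launch(**secured_kwargs)  # uncomment inside the notebook, where `demo` exists
```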




Related notebooks:

- Loading LLaVA in 4 bits with Colab: https://colab.research.google.com/drive/1qsl6cd2c8gGtEW1xV5io7S8NHh-Cp1TV?usp=sharing (from the official transformers documentation at https://huggingface.co/docs/transformers/model_doc/llava)

