# LLaVA 13B Model Inference
This notebook demonstrates inference with the LLaVA 13B model loaded with 4-bit quantization. It covers setting up the environment, loading the model, and running inference on visual and textual inputs.



## Check Python Version
Checking the Python version first confirms that the environment meets the requirements of the LLaVA library before any other cell is run.


In [1]:
# Check the Python version to ensure compatibility with LLaVA requirements

!python --version

Python 3.10.12
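The same check can be done programmatically so that the notebook reports a problem early instead of relying on a visual read of the version string. The sketch below is only illustrative: the helper name `python_is_compatible` is hypothetical, and the minimum of Python 3.8 is an assumption that should be confirmed against the `pyproject.toml` of the LLaVA release in use.

```python
import sys

# Assumed minimum version; confirm against the pyproject.toml of your LLaVA checkout.
MIN_PYTHON = (3, 8)


def python_is_compatible(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the assumed minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if python_is_compatible():
    print("Python version looks compatible:", sys.version.split()[0])
else:
    print(f"Expected Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer, found {sys.version.split()[0]}")
```

Comparing tuples rather than strings avoids the usual pitfall where "3.10" sorts before "3.8" lexicographically.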


## Import Necessary Libraries
This section imports the libraries and modules used throughout the notebook. They cover image handling, conversation templates, tokenization, and inference with the LLaVA model.


In [3]:
# Import essential libraries and set up the environment for LLaVA model operations.
# This includes utilities for handling images, managing conversation templates, and performing tokenization and inference.


import os
import requests
from PIL import Image
from io import BytesIO
from llava.conversation import conv_templates, SeparatorStyle
from llava.utils import disable_torch_init
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN
from llava.mm_utils import tokenizer_image_token, get_model_name_from_path, KeywordsStoppingCriteria
from transformers import TextStreamer
from transformers import AutoTokenizer, BitsAndBytesConfig
from llava.model import LlavaLlamaForCausalLM
import torch



## Verify Directory Contents
Before loading the model, list the current directory to confirm that the required files are present, in particular the `llava` package and the `llava-v1.5-13b-3GB` checkpoint directory.


In [6]:
# Display the contents of the current working directory to verify the presence of necessary files and directories.

!ls

docs	 LLaVA_13b_4bit_vanilla_inference_code.ipynb  README.md
images	 llava-v1.5-13b-3GB			      scripts
LICENSE  playground				      Untitled.ipynb
llava	 pyproject.toml
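Reading the listing by eye works for a quick check; a small helper can make the same verification explicit. The following is a minimal sketch, not part of the original notebook: `missing_paths` is a hypothetical helper, and the two required names are taken from the directory listing above on the assumption that they are the ones the later cells depend on.

```python
from pathlib import Path


def missing_paths(required, root="."):
    """Return the entries in `required` that do not exist under `root`."""
    base = Path(root)
    return [name for name in required if not (base / name).exists()]


# Names taken from the directory listing; adjust them to your own layout.
REQUIRED = ["llava", "llava-v1.5-13b-3GB"]

missing = missing_paths(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required paths are present.")
```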


## Model Loading and Configuration

In this section, we load the LLaVA model with settings chosen for memory efficiency. The model path points to a local checkpoint directory. The quantization settings load the weights in 4-bit NF4 precision with double quantization, which substantially reduces the memory footprint during inference.


In [None]:


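model_path = "llava-v1.5-13b-3GB"

# bitsandbytes 4-bit kernels require a CUDA GPU, so let accelerate place the layers automatically
kwargs = {"device_map": "auto"}
# quantization_config already requests 4-bit loading; a separate load_in_4bit kwarg is redundant
kwargs['quantization_config'] = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # the compute dtype must be a floating-point type
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep any modules offloaded to the CPU in fp32
    bnb_4bit_quant_type='nf4'
)

model = LlavaLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)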



Loading checkpoint shards:  33%|██████            | 3/9 [01:14<02:29, 24.96s/it]
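To put the memory saving from 4-bit loading in perspective, a back-of-the-envelope estimate of weight storage helps. The sketch below assumes a nominal 13 billion parameters and counts only the language-model weights; it ignores quantization constants, the vision tower, activations, and the key-value cache, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes), ignoring all overhead."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 13e9  # nominal "13B"; the exact parameter count differs slightly

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the weights shrink from roughly 26 GB at 16-bit precision to roughly 6.5 GB at 4-bit precision, a factor of four.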

In [None]:
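# Load the vision encoder if it was not loaded with the language model
vision_tower = model.get_vision_tower()
if not vision_tower.is_loaded:
    vision_tower.load_model()
# Keep the vision tower in float16 on the GPU so it matches the half-precision image tensors
vision_tower.to(device='cuda', dtype=torch.float16)
image_processor = vision_tower.image_processor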

## Define Inference Function
The `interact_image` function loads an image, preprocesses it, and runs inference. It takes an image path or URL and a text prompt, builds the conversation prompt for the LLaVA model, and returns the original image together with the model's generated response.


In [None]:

def interact_image(image_file, prompt):
    """
    Load an image, preprocess it, and perform inference using the LLaVA model.

    Args:
        image_file (str): Path or HTTP(S) URL of the image file.
        prompt (str): The prompt to guide the model's response generation, including queries about the image.

    Returns:
        tuple: The original image and the model's textual output.
    """
    # Load the image from a URL or a local path and convert it to RGB
    if image_file.startswith('http'):  # also covers 'https'
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(image_file).convert('RGB')
    disable_torch_init()
    # LLaVA v1.5 checkpoints use the "llava_v1" conversation template
    conv_mode = "llava_v1"
    conv = conv_templates[conv_mode].copy()
    image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'].half().cuda()

    # Prepend the image placeholder; the start/end markers apply only to models trained with them
    if getattr(model.config, 'mm_use_im_start_end', False):
        inp = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN + '\n' + prompt
    else:
        inp = DEFAULT_IMAGE_TOKEN + '\n' + prompt
    # Append the user message and an empty assistant slot; the template adds the role prefixes
    conv.append_message(conv.roles[0], inp)
    conv.append_message(conv.roles[1], None)
    # Build the full prompt and tokenize it, mapping the image placeholder to IMAGE_TOKEN_INDEX
    raw_prompt = conv.get_prompt()
    input_ids = tokenizer_image_token(raw_prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()
    # Stop generation at the template's separator
    stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
    keywords = [stop_str]
    stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)
    # inference_mode disables gradient tracking, which saves memory
    with torch.inference_mode():
        output_ids = model.generate(input_ids, images=image_tensor, do_sample=True, temperature=0.2,
                                    max_new_tokens=1024, use_cache=True, stopping_criteria=[stopping_criteria])
    # Decode only the newly generated tokens and drop the trailing end-of-sequence marker
    outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
    conv.messages[-1][-1] = outputs
    output = outputs.rsplit('</s>', 1)[0]
    return image, output
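The last step of the function, trimming the end-of-sequence marker from the decoded text, can be tried in isolation without loading the model. The sketch below mirrors the `rsplit('</s>', 1)` call; the helper name `trim_at_stop` is hypothetical, and the extra `.strip()` is an addition for tidiness rather than something the function above does at that point.

```python
def trim_at_stop(text, stop_str="</s>"):
    """Drop the last occurrence of `stop_str` and everything after it."""
    return text.rsplit(stop_str, 1)[0].strip()


# With a trailing marker, the marker is removed.
print(trim_at_stop("The image is a technical drawing.</s>"))

# Without a marker, the text is returned unchanged (apart from surrounding whitespace).
print(trim_at_stop("The image is a technical drawing."))
```

Because `rsplit` with a limit of 1 splits at the last occurrence only, an earlier literal `</s>` inside the response would be preserved.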

## Run Inference
With `interact_image` defined, this cell runs it on a local screenshot of a drawing. The prompt asks the model to describe the image, its colors, and the dimensions and measurements shown, and the cell prints the model's response.


In [None]:
# Run inference to analyze an image and generate a description based on the provided prompt.
# This demonstrates the model's ability to integrate visual and textual information.



image, output = interact_image(
    'Screenshot from 2024-08-24 13-52-43.png',
    'Describe the image and color details. as well what are this drawings? which dimensions are provided here? and what are the measurements available?'
)
print(output)

The image is a black and white drawing of a model, likely a 3D model, featuring a long object with a curve. The drawing includes various measurements and dimensions, such as 1.5" and 1.25". The measurements are provided in inches, and the drawing appears to be a blueprint or a technical drawing. The image also has a few notes, which might provide additional information or instructions for the model.
