## Liquid AI LFM2-VL-1.6B

In [1]:
import time
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

# Load model and processor
model_id = "LiquidAI/LFM2-VL-1.6B"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

`torch_dtype` is deprecated! Use `dtype` instead!



In [11]:
# Load image and create conversation
image = Image.open("../tasks/bird.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What do you see in the image? Answer in 100 words."},
        ],
    },
]

# Generate Answer
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

# perf_counter is a monotonic clock intended for measuring elapsed intervals
t1 = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
t2 = time.perf_counter()

gen = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(gen)

print(f"\n Generation Time : {round(t2-t1, 2)} s")

user
What do you see in the image? Answer in 100 words.
assistant
The image showcases a hummingbird in flight, hovering near a vibrant flower. The hummingbird's wings are blurred, capturing its rapid movement. The flower is striking, with a green stem and a mix of orange and yellow petals. The background is a soft blur of yellow and green, creating a warm, natural setting. This scene beautifully illustrates the symbiotic relationship between hummingbirds and flowers, highlighting the intricate details of both creatures in their natural habitat.

 Generation Time : 7.95 s
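
The decoded string above still contains the prompt (`user` ... `assistant`) before the model's answer. A minimal sketch of pulling out only the reply, assuming the decoded text always puts the role marker `assistant` on its own line as in the output shown; `extract_assistant_reply` is a hypothetical helper, not part of `transformers`:

```python
def extract_assistant_reply(decoded: str, marker: str = "assistant") -> str:
    """Return the text that follows the last line consisting solely of `marker`.

    Assumption: the chat template renders role names on their own lines,
    as in the decoded output printed above. If no such line is found,
    the whole (stripped) string is returned unchanged.
    """
    lines = decoded.splitlines()
    # Walk backwards so that the last role marker wins.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == marker:
            return "\n".join(lines[i + 1:]).strip()
    return decoded.strip()


# Example with a shortened version of the decoded text above.
sample = "user\nWhat do you see in the image?\nassistant\nThe image showcases a hummingbird in flight.\n"
reply = extract_assistant_reply(sample)
```

This string-based approach would cut the answer short if the model itself emitted a line containing only `assistant`. An alternative that avoids string matching is to slice the generated ids past the prompt length before decoding, for example `outputs[:, inputs["input_ids"].shape[1]:]`; that is the usual pattern for decoder-only `generate` output, but it has not been checked against this model here.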


In [13]:
# Load image and create conversation
image = Image.open("../tasks/bird.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Find the bird and answer it's bbox center coordinate in (x,y) format normalized using image height and width"},
        ],
    },
]

# Generate Answer
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

# perf_counter is a monotonic clock intended for measuring elapsed intervals
t1 = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
t2 = time.perf_counter()

gen = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(gen)

print(f"\n Generation Time : {round(t2-t1, 2)} s")

user
Find the bird and answer it's bbox center coordinate in (x,y) format normalized using image height and width
assistant
The hummingbird in the image is located in the upper right quadrant, facing left. Its center coordinate in normalized (x,y) format is:

(0.5, 0.3)

This coordinate represents the approximate center of the hummingbird's body relative to the image's width and height.

 Generation Time : 5.96 s
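
To check the answer against the image, the normalized point has to be parsed out of the free-form reply and scaled to pixels. A minimal sketch, assuming the reply contains the coordinate as a parenthesized pair of numbers in [0, 1] as in the output above; `parse_normalized_point` and `to_pixels` are hypothetical helper names, and the 640x480 size in the example is made up for illustration:

```python
import re
from typing import Optional, Tuple

# Matches "(0.5, 0.3)" but not the literal "(x,y)" that appears in the prompt.
_POINT_RE = re.compile(r"\(\s*(\d*\.?\d+)\s*,\s*(\d*\.?\d+)\s*\)")


def parse_normalized_point(text: str) -> Optional[Tuple[float, float]]:
    """Return the first (x, y) pair found in `text`, or None.

    Pairs with a value outside [0, 1] are rejected, since they cannot be
    normalized coordinates.
    """
    match = _POINT_RE.search(text)
    if match is None:
        return None
    x, y = float(match.group(1)), float(match.group(2))
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return None
    return x, y


def to_pixels(point: Tuple[float, float], width: int, height: int) -> Tuple[int, int]:
    """Scale a normalized (x, y) point to integer pixel coordinates."""
    x, y = point
    return round(x * width), round(y * height)


reply_text = (
    "Its center coordinate in normalized (x,y) format is:\n\n"
    "(0.5, 0.3)\n\n"
    "This coordinate represents the approximate center of the hummingbird's body."
)
point = parse_normalized_point(reply_text)
pixel = to_pixels(point, width=640, height=480)  # illustrative image size
```

With a PIL image, `image.size` gives `(width, height)` and can be unpacked straight into `to_pixels`. Plotting the resulting pixel on the image is a quick sanity check; in the reply above, x = 0.5 lies on the vertical midline, which does not obviously agree with the stated "upper right quadrant".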
