<a href="https://colab.research.google.com/github/nypstud/Awesome-Nano-Banana-images/blob/main/%F0%9F%92%A7_LFM2_VL_Inference_with_transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 💧 LFM2-VL Inference with transformers

This notebook runs LFM2-VL models (such as [`LiquidAI/LFM2-VL-1.6B`](https://huggingface.co/LiquidAI/LFM2-VL-1.6B)) with Hugging Face's [transformers](https://github.com/huggingface/transformers) library.

You can run it on GPU or CPU by switching the runtime (`Runtime` → `Change runtime type`).
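
Because the model is loaded with `device_map="auto"`, it can help to check which device the runtime actually exposes before downloading the weights. The following is a minimal sketch, assuming PyTorch is the backend; `pick_device` is a hypothetical helper introduced here for illustration and is not part of transformers.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    # Fall back to CPU if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Runtime device: {pick_device()}")
```

If this prints `cpu` on a runtime you expected to have a GPU, switching the runtime type and restarting the session is the usual fix.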

In [None]:
!pip install -qqqU transformers pillow --progress-bar off

from transformers import AutoProcessor, AutoModelForImageTextToText, TextStreamer
from transformers.image_utils import load_image

# Load model and processor
model_id = "LiquidAI/LFM2-VL-1.6B"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
# Print tokens to stdout as they are generated; skip_prompt hides the echoed prompt
streamer = TextStreamer(processor, skip_prompt=True, skip_special_tokens=False)

In [None]:
from IPython.display import display, Image as IPImage

# Load image
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = load_image(url)
display(IPImage(url=url))

# Create conversation
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is in this image?"},
        ],
    },
]

# Generate answer
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, streamer=streamer)

This image shows a bustling street scene in what appears to be a Chinatown area. The focal point is a red stop sign with white lettering, positioned on a metal pole. Behind the stop sign, there's a large red archway with Chinese characters, likely marking the entrance to a Chinatown district.

The street is lined with various shops and businesses, including a store called "Optus" and another with a sign that says "KUO". There are also other businesses visible, though their names aren't clearly identifiable.

A black SUV is driving down the street, and there are a few pedestrians visible in the background. The scene is set during the day, with sunlight illuminating the area.

The architecture and signage strongly suggest this is a Chinatown district in a city, possibly in Australia given the "Optus" sign, which is an Australian telecommunications company. The combination of Chinese characters, the style of the buildings, and the presence of Chinese cultural elements like the archway and
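
The streamed answer above ends mid-sentence, most likely because generation reached the `max_new_tokens=256` limit before an end-of-sequence token. The streamer only prints text, so to keep the answer as a string one option is to decode `outputs` after dropping the prompt tokens. The sketch below assumes each generated sequence starts with the prompt ids, as is typical for decoder-only models; `strip_prompt` is a hypothetical helper, and plain lists stand in for tensors.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the newly generated token ids for each sequence.

    Assumes every sequence begins with the same prompt_length prompt ids.
    """
    return [list(seq[prompt_length:]) for seq in output_ids]


# Toy example with plain lists standing in for tensors
prompt = [101, 7, 8]
generated = [prompt + [42, 43, 44]]
new_ids = strip_prompt(generated, len(prompt))
print(new_ids)  # [[42, 43, 44]]

# With the notebook's objects, the analogous calls (not run here) would be:
#   prompt_length = inputs["input_ids"].shape[1]
#   text = processor.batch_decode(
#       outputs[:, prompt_length:], skip_special_tokens=True
#   )[0]
```

Raising `max_new_tokens` in the `generate` call is the simplest way to get a longer answer, at the cost of more generation time.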