## Install all the requirements

In [1]:
!pip install -r requirements.txt

Collecting fastapi==0.115.4 (from -r requirements.txt (line 1))
  Downloading fastapi-0.115.4-py3-none-any.whl.metadata (27 kB)
Collecting Pillow==11.0.0 (from -r requirements.txt (line 2))
  Using cached pillow-11.0.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (9.1 kB)
Collecting torch==2.4.0 (from -r requirements.txt (line 3))
  Using cached torch-2.4.0-cp312-none-macosx_11_0_arm64.whl.metadata (26 kB)
Collecting transformers==4.44.2 (from -r requirements.txt (line 4))
  Using cached transformers-4.44.2-py3-none-any.whl.metadata (43 kB)
Collecting uvicorn==0.32.0 (from -r requirements.txt (line 5))
  Downloading uvicorn-0.32.0-py3-none-any.whl.metadata (6.6 kB)
Collecting groq==0.12.0 (from -r requirements.txt (line 7))
  Using cached groq-0.12.0-py3-none-any.whl.metadata (13 kB)
Collecting starlette<0.42.0,>=0.40.0 (from fastapi==0.115.4->-r requirements.txt (line 1))
  Downloading starlette-0.41.3-py3-none-any.whl.metadata (6.0 kB)
Collecting tokenizers<0.20,>=0.19 (from transforme

In [2]:
import os
import base64
from groq import Groq
from dotenv import load_dotenv

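## Authenticating with Groq using an API key
Get an API key from the [Groq console quickstart](https://console.groq.com/docs/quickstart), then store it as `GROQ_API_KEY` in a `.env` file so that `load_dotenv()` can pick it up.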

In [4]:
load_dotenv()
client = Groq(api_key=os.getenv("GROQ_API_KEY"))
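
`os.getenv` returns `None` when the variable is missing, so a misconfigured `.env` file may only surface later as an authentication error. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (not part of `groq` or `python-dotenv`):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Fail early with a message that names the missing variable
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value


# Possible usage with the client from the cell above (requires the groq package):
# client = Groq(api_key=require_env("GROQ_API_KEY"))
```

Calling the helper right after `load_dotenv()` keeps the failure close to its cause instead of deferring it to the first request.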

## Selecting the vision model

In [5]:
MODEL = "llama-3.2-11b-vision-preview"

## Function definition

In [6]:
# Encode an image file as a base64 string
def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Ask the vision model (MODEL) a question about a local image
def ask_llama_vision(question: str, image_path: str) -> str:
    base64_image = encode_image(image_path)

    # Reuses the module-level `client` created in the earlier cell
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            # The MIME type is hardcoded, so this assumes a JPEG input
                            "url": f"data:image/jpeg;base64,{base64_image}",
                        },
                    },
                ],
            }
        ],
        model=MODEL,
    )

    # Return the text of the first completion choice
    return chat_completion.choices[0].message.content
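
The data URL in the function hardcodes `image/jpeg`, which only matches JPEG files. A minimal sketch of one way to handle other formats, using the standard library's `mimetypes` module to guess the type from the file extension. The helper name `image_to_data_url` is hypothetical and not part of the notebook:

```python
import base64
import mimetypes


def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a base64 data URL with a guessed MIME type."""
    # guess_type looks only at the file extension, not the file contents
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could be passed as the `"url"` value in the `image_url` entry. Which image formats the model accepts is not stated in this notebook, so check that separately.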

## Testing the function

In [7]:
# === CALLING THE FUNCTION ===
image_path = "./imgs/ai-generated-stray-cat-in-danger-background-animal-background-photo.jpg"
question = "What animal can you identify from this image?"
response = ask_llama_vision(question, image_path)
print(response)

The image shows an orange and white cat sitting on a gray brick floor. The cat is centered in the photo, facing the camera, with its body oriented toward the left side of the image. It has long hair with orange fur on its back and white fur on its underside(patch near the base of the neck is white), face, chest, and legs. It has small white paws, darker and thicker compared to the fur on its body. Its face has orange-striped markings on the forehead, whiskers, and pink nose. It also has small ears, and tufts of white hair on the inside area of each ear and white circular markings around the eyes. A long furry orange tail curls around its torso.

The background in blurred and indistinct, but seems to show a row of colorful buildings and what appears to be a water hose. The image and the pose of the cat let us know this could be a domesticated feline rather than a wild feline.
