**Image Description Generator**:

Build a tool that generates detailed, accurate text descriptions of uploaded images to improve accessibility.
The task tests the ability to integrate multimodal AI capabilities; architecture diagrams make a good test case.
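For accessibility, it can help to split the tool's output into two fields: a short alt text for the `alt` attribute, and a longer description for complex images such as architecture diagrams, which rarely fit in one sentence. The sketch below assumes that split. `ImageDescription`, `first_sentence`, and `describe` are hypothetical names. The 125-character alt-text limit is a common rule of thumb, not a WCAG requirement.

```python
import re
from dataclasses import dataclass


@dataclass
class ImageDescription:
    """Hypothetical container for the two accessibility outputs."""
    alt_text: str          # short text for an <img alt="..."> attribute
    long_description: str  # full description for complex images such as diagrams


def first_sentence(text: str, limit: int = 125) -> str:
    """Return the first sentence of `text`, clipped to at most `limit` characters.

    Sentence splitting is naive: it stops at the first '.', '!' or '?' followed
    by whitespace, so abbreviations such as "e.g." end the sentence early.
    """
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    match = re.match(r"(.+?[.!?])(?:\s|$)", text)
    sentence = match.group(1) if match else text
    if len(sentence) <= limit:
        return sentence
    # Cut at the last word boundary that fits, leaving room for "...".
    return sentence[: limit - 3].rsplit(" ", 1)[0] + "..."


def describe(long_description: str, limit: int = 125) -> ImageDescription:
    """Derive short alt text from a long description (e.g. a vision-model answer)."""
    cleaned = long_description.strip()
    return ImageDescription(alt_text=first_sentence(cleaned, limit), long_description=cleaned)
```

For example, `describe(completion.choices[0].message.content)` would turn the Groq answer produced below into a one-sentence alt text plus the full description.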

In [3]:
import requests
from io import BytesIO
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import torch

# Load the BLIP large captioning checkpoint once; use a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large").to(device)
model.eval()

def generate_caption(image_path):
    """Generate a caption for an image given as an http(s) URL or a local file path.

    BLIP produces short, one-sentence captions, so the result suits alt text
    better than a detailed description.
    """
    if image_path.startswith(("http://", "https://")):
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(image_path, headers=headers, timeout=30)
        response.raise_for_status()  # raise on 4xx/5xx instead of continuing silently
        content_type = response.headers.get("Content-Type", "")
        if not content_type.startswith("image/"):
            raise ValueError(f"URL did not return an image (Content-Type: {content_type!r})")
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_path).convert("RGB")

    inputs = processor(images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        # do_sample=True together with num_beams > 1 means beam-search sampling,
        # so repeated runs can return different captions.
        # BLIP's generate() inserts its own BOS token, so no decoder_start_token_id is passed.
        output = model.generate(
            **inputs,
            max_new_tokens=100,  # captions are short; this caps only the generated tokens
            num_beams=10,
            repetition_penalty=2.5,
            do_sample=True,
            temperature=0.5,
            top_k=50,
            num_return_sequences=1,
        )

    return processor.decode(output[0], skip_special_tokens=True)

image_url = "https://www.cybermedian.com/wp-content/uploads/2022/02/0j3G8oZH4Yj5voOmG.png"

try:
    caption = generate_caption(image_url)
    print("Generated Caption:", caption)
except Exception as e:
    print("Error:", e)


Generated Caption: this is an image of a flow diagram that shows how to use the system
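The BLIP caption above opens with "this is an image of". That phrase is redundant in alt text, because screen readers already announce the element as an image. A small post-processing step can strip such lead-ins. `tidy_caption` is a hypothetical helper, and its regular expression and phrase list are illustrative assumptions, not part of BLIP or `transformers`.

```python
import re

# Illustrative list of redundant lead-ins ("this is an image of", "a photo of", ...).
_REDUNDANT_PREFIX = re.compile(
    r"^\s*(?:this is\s+)?(?:an?\s+)?(?:image|picture|photo|photograph|illustration)\s+of\s+",
    re.IGNORECASE,
)


def tidy_caption(caption: str) -> str:
    """Drop a redundant 'image of' style lead-in and capitalize the result."""
    cleaned = _REDUNDANT_PREFIX.sub("", caption).strip()
    if not cleaned:
        # Nothing left after stripping: fall back to the original caption.
        return caption.strip()
    return cleaned[0].upper() + cleaned[1:]
```

Applied to the output above, `tidy_caption(caption)` yields "A flow diagram that shows how to use the system". Captions without such a lead-in are only capitalized.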


In [19]:
import os
from groq import Groq

# Groq() reads the API key from the GROQ_API_KEY environment variable. Set it outside
# the notebook (e.g. in the shell that launches Jupyter) instead of hard-coding it.
if not os.environ.get("GROQ_API_KEY"):
    raise RuntimeError("Set the GROQ_API_KEY environment variable before running this cell.")

client = Groq()
completion = client.chat.completions.create(
    # Preview models on Groq can be retired; if this name is rejected, pick a
    # current vision-capable model from Groq's model list.
    model="llama-3.2-11b-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Explain the image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.cybermedian.com/wp-content/uploads/2022/02/0j3G8oZH4Yj5voOmG.png"
                    }
                }
            ]
        }
    ],
    temperature=1,
    max_completion_tokens=1024,
    top_p=1,
    stream=False,
    stop=None,
)

print("Image Description:", completion.choices[0].message.content)


Image Description: This infographic presents a flowchart illustrating the operation of a specific process.

The flowchart begins at the top left corner and tracks 15 main steps, with the starting position outlined. The first three actions are: 1. "Power On" (blue box, accompanied by "start")
2. "Plan Route" (yellow box), and
3. "Scan Environment" (blue box, no accompanying action). The fourth step is to "Generates Map and Location" (yellow box), which will be used to define the "Route" (blue box). From this point onwards, the process proceeds mainly in shades of yellow and light blue.

The chart provides information about the battery, environment, and route by indicating the various states such as "battery Low", "vacuum Full", and "Follow Route", along with subsequent decision points and actions. The final main action is returned to the battery area to "stop moving". In summary, the chart presents a sequence of actions that are governed by conditions. The chart starts and ends with "st
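The description above is fluent but hard to check. It says the chart tracks 15 main steps, yet enumerates only some of them. The generic "Explain the image." prompt also gives the model no guidance on what an accessible description needs. One option is a more specific prompt. Uploaded files that have no public URL can be sent inline as base64 data URLs in the same `image_url` content part; Groq's vision documentation describes this alongside public URLs, with size limits worth checking. `ACCESSIBILITY_PROMPT`, `to_data_url`, and `build_messages` are hypothetical names. The prompt wording is an untested assumption that will need tuning.

```python
import base64
from io import BytesIO
from typing import Union

from PIL import Image

# Hypothetical accessibility-oriented prompt; the wording is an assumption to tune.
ACCESSIBILITY_PROMPT = (
    "Describe this image for a blind reader. Start with one sentence stating what kind "
    "of image it is and its purpose. Then describe the content in reading order: for "
    "diagrams, list each box or node with its exact label and the arrows or decision "
    "branches connecting it to other nodes. Transcribe visible text verbatim, and say "
    "so when text is illegible instead of guessing."
)


def to_data_url(image: Image.Image) -> str:
    """Encode a PIL image as a base64 PNG data URL (PNG keeps diagram text sharp)."""
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def build_messages(image: Union[str, Image.Image], prompt: str = ACCESSIBILITY_PROMPT) -> list:
    """Build one user turn with a text part and an image_url part.

    `image` is either an http(s) URL or a PIL image (e.g. an uploaded file),
    which is sent inline as a data URL.
    """
    url = image if isinstance(image, str) else to_data_url(image)
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }
    ]
```

The result drops into the existing call as `messages=build_messages(...)`. Pairing it with a lower `temperature` than 1 (for example 0.2) may reduce embellishment, but that remains an assumption to verify against the source image.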