**Image Description Generator**:

Build a tool that generates detailed, accurate text descriptions of uploaded images to improve accessibility.
The task tests the ability to integrate multimodal AI capabilities. Architecture pictures are worth considering as test inputs too.

In [5]:
from io import BytesIO
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import torch

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

def generate_caption(image_path):
    # Load the image from an http(s) URL or from a local file path.
    if image_path.startswith(("http://", "https://")):
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(image_path, headers=headers, timeout=30)
        response.raise_for_status()  # Raises requests.HTTPError on 4xx/5xx responses.
        content_type = response.headers.get("Content-Type", "")
        if "image" not in content_type:
            raise ValueError(f"URL did not return an image (Content-Type: {content_type!r})")
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_path).convert("RGB")

    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_length=500,              # Upper bound on caption length, in tokens.
            num_beams=10,
            repetition_penalty=2.5,      # Penalizes tokens that already appear in the caption.
            do_sample=True,              # Samples instead of decoding deterministically; with num_beams > 1 this is beam-sample decoding, so captions can vary between runs.
            temperature=0.5,             # Values below 1 sharpen the distribution, making sampling less random.
            top_k=50,                    # Samples only from the 50 most likely tokens at each step.
            num_return_sequences=1,
            decoder_start_token_id=model.config.bos_token_id,  # Specifies where decoding starts.
        )

    # skip_special_tokens=True strips tokenizer markers such as [CLS] and [SEP].
    return processor.decode(output[0], skip_special_tokens=True)

image_url = "https://images.unsplash.com/photo-1517649763962-0c623066013b"

try:
    caption = generate_caption(image_url)
    print("Generated Caption:", caption)
except Exception as e:
    print("Error:", e)

Generated Caption: there are many bicyclists riding down the road in a race
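The `repetition_penalty`, `temperature` and `top_k` arguments in the cell above reshape the model's next-token scores before a token is drawn. The pure-Python sketch below illustrates that per-step logic on a toy score table whose tokens and values are made up. It models plain single-sequence sampling only: `model.generate` with `num_beams=10` and `do_sample=True` runs beam-sample decoding, and the `transformers` internals differ in detail. The penalty rule mirrors the one used by the `transformers` repetition-penalty processor, where positive scores are divided by the penalty and negative scores are multiplied by it.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Push down the score of every token that already appears in the output."""
    adjusted = dict(logits)
    for token in set(generated_ids):
        if token in adjusted:
            score = adjusted[token]
            # Multiplying a negative score or dividing a positive one both lower it.
            adjusted[token] = score * penalty if score < 0 else score / penalty
    return adjusted


def top_k_temperature_probs(logits, k, temperature):
    """Keep the k highest scores, divide them by temperature, then softmax the result."""
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    scaled = [(token, score / temperature) for token, score in top]
    max_score = max(score for _, score in scaled)  # Subtract the max for numerical stability.
    weights = [(token, math.exp(score - max_score)) for token, score in scaled]
    total = sum(w for _, w in weights)
    return {token: w / total for token, w in weights}


def sample_next_token(logits, generated_ids, k=50, temperature=0.5, penalty=2.5, rng=random):
    """One sampling step: repetition penalty, then top-k + temperature, then a weighted draw."""
    adjusted = apply_repetition_penalty(logits, generated_ids, penalty)
    probs = top_k_temperature_probs(adjusted, k, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy next-token scores (made-up values) after the caption so far: ["road"].
toy_logits = {"road": 3.0, "race": 2.0, "bike": 1.0, "the": 0.5}
print(top_k_temperature_probs(apply_repetition_penalty(toy_logits, ["road"], 2.5), k=2, temperature=0.5))
print(sample_next_token(toy_logits, ["road"], k=1))  # prints "race"
```

With the toy scores, the penalty lowers `"road"` from 3.0 to 1.2, so when only the single best token survives (`k=1`) the sampler picks `"race"`. Lowering `temperature` widens the gap between the surviving probabilities, which is why the cell's `temperature=0.5` keeps sampled captions close to the most likely wording.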


In [6]:
%pip install groq
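The Groq cell that follows chooses between a URL and a local file by checking the string prefix, and sends local files inline as base64 data URLs. Those two steps can also be factored into standalone helpers that are testable without an API key, as in the sketch below: `urlparse` checks for an actual `http`/`https` scheme, `mimetypes` labels the data URL with the file's real type, and the returned list mirrors the `messages` structure passed to `client.chat.completions.create`. The helper names and the diagram-oriented prompt are hypothetical and not part of the Groq SDK; the prompt wording is an untested assumption aimed at architecture pictures.

```python
import base64
import mimetypes
from urllib.parse import urlparse

# Hypothetical accessibility prompt for diagrams; the wording is an untested assumption.
DIAGRAM_PROMPT = (
    "Describe this image for a screen-reader user. If it is a diagram or flowchart, "
    "name every labelled element and explain how the arrows connect them, in reading order."
)


def is_remote(image_input):
    """True only for http(s) URLs, so a local file such as 'http_diagram.png' stays local."""
    return urlparse(image_input).scheme in ("http", "https")


def to_data_url(image_path):
    """Base64-encode a local image and label it with a MIME type guessed from its extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_messages(image_input, prompt=DIAGRAM_PROMPT):
    """Build a chat message list with the same shape as the Groq cell's request."""
    url = image_input if is_remote(image_input) else to_data_url(image_input)
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }
    ]
```

Keeping the payload construction separate from the network call means the URL/path decision and the MIME labelling can be checked offline; the request itself would then pass `messages=build_messages(image_input)`.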



In [3]:
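import base64
import mimetypes
from groq import Groq

from google.colab import userdata
groq_api_key = userdata.get("groq_api_key")  # Read from Colab's Secrets panel.

client = Groq(api_key=groq_api_key)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def generate_image_description(image_input):
    # Remote images are passed by URL; local files are sent inline as base64 data URLs.
    if image_input.startswith(("http://", "https://")):
        image_data = {"type": "image_url", "image_url": {"url": image_input}}
    else:
        base64_image = encode_image(image_input)
        # Label the data URL with the file's actual type (e.g. image/png) instead of always JPEG.
        mime_type = mimetypes.guess_type(image_input)[0] or "image/jpeg"
        image_data = {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{base64_image}"}}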

    completion = client.chat.completions.create(
        model="llama-3.2-11b-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe the image."},
                    image_data
                ]
            }
        ],
        temperature=0.7,
        max_completion_tokens=512,
        top_p=1,
    )

    return completion.choices[0].message.content

image_input = "https://www.cybermedian.com/wp-content/uploads/2022/02/0j3G8oZH4Yj5voOmG.png"
# image_input = "cycles.png"  # Uncomment to describe a local file instead.

image_description = generate_image_description(image_input)
print("Image Description:", image_description)



Image Description: The image depicts a flowchart with multiple nodes and arrows, illustrating the process of navigating a route. The chart is divided into two columns, with the left column featuring seven yellow boxes containing text, including "Start", "Vacuum On", "Scan Environment", "Generate Map and Location", "Plan Route", "Route", and "Vacuum On". The right column also contains seven yellow boxes with text, including "Follow route", "Finished Route", "Battery Low", "Vacuum Off", "Return power dock", "Power Off", and "End".

The chart's background is white, providing a clean and clear visual representation of the process. The flowchart appears to be a detailed guide for navigating a route, with various steps and conditions that determine the next action. The use of different colors and shapes adds visual interest and helps to distinguish between different parts of the process.

Overall, the image effectively communicates the steps involved in navigating a route, making it easy to 