<a href="https://colab.research.google.com/github/hannahbanjo/AssociationOfDataScience/blob/main/ai_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Build-An-Agent**

In this notebook, we'll build an AI agent that generates an image from a text prompt using a pre-trained Stable Diffusion model, then analyzes that image with a pre-trained object detection model. The activity shows how an agent can chain multiple machine learning pipelines, from image generation to object detection, into a single tool for creating and analyzing visual content.



## Creating the image

Let's begin by importing the libraries we need. We'll use diffusers for image generation, torch for GPU support, matplotlib for displaying images, and PIL for image manipulation. We'll import transformers for object detection later in the notebook.

In [None]:
from diffusers import StableDiffusionPipeline
import torch
import matplotlib.pyplot as plt
from PIL import Image

Here we generate an image from the prompt we provide and save it to the file generated_image.jpg.

Note: If you encounter a runtime error or image generation is very slow, you may need to change the runtime type. Go to Runtime --> Change runtime type --> T4 GPU.

In [None]:
# Load pre-trained model (Stable Diffusion)
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float32)

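# Move the pipeline to the GPU if one is available; otherwise fall back to the CPU (much slower)
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipe.to(device)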

# Generate image based on text prompt
prompt = "A meadow with a dog and cat"
image = pipe(prompt).images[0]

# Save the generated image to a file
image.save("/content/generated_image.jpg")

# Display the generated image
plt.figure(figsize=(8, 8))
plt.imshow(image)
plt.axis("off")  # Turn off axis
plt.show()

## Object Detection

Next, we will import the necessary libraries for performing object detection on the generated image. We'll use the transformers library to load a pre-trained object detection model.

In [None]:
from transformers import pipeline
from PIL import Image, ImageDraw

Now we run object detection on the generated image and print the results, which list each detected object's label, confidence score, and location within the image.

In [None]:
# Load a pre-trained object detection model
# (no model name is given, so transformers falls back to its default checkpoint)
image_analyzer = pipeline("object-detection")

# Load the image (use your generated image)
image = Image.open("/content/generated_image.jpg")

# Run object detection, keeping only detections with a confidence score above 0.3
result = image_analyzer(image, threshold=0.3)  # Adjust threshold to your preference

# Print the result to inspect the structure
print(result)
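
The printed `result` is a list of dictionaries, one per detection, each with `score`, `label`, and `box` keys. A raw list like this can be hard to scan, so here is a minimal sketch of one way to summarize it. The `sample_result` data and the `summarize_detections` helper are hypothetical, written only for illustration; in the notebook you would pass the real `result` instead.

```python
from collections import Counter

# Hypothetical stand-in for the pipeline output; real labels, scores, and boxes will differ.
sample_result = [
    {"score": 0.98, "label": "dog", "box": {"xmin": 40, "ymin": 210, "xmax": 250, "ymax": 460}},
    {"score": 0.91, "label": "cat", "box": {"xmin": 300, "ymin": 260, "xmax": 450, "ymax": 470}},
    {"score": 0.35, "label": "dog", "box": {"xmin": 10, "ymin": 20, "xmax": 60, "ymax": 90}},
]


def summarize_detections(detections):
    """Return (number of detections per label, best score per label)."""
    counts = Counter(d["label"] for d in detections)
    best_scores = {}
    for d in detections:
        label = d["label"]
        best_scores[label] = max(best_scores.get(label, 0.0), d["score"])
    return counts, best_scores


counts, best_scores = summarize_detections(sample_result)
for label, count in counts.most_common():
    print(f"{label}: {count} detection(s), best score {best_scores[label]:.2f}")
```

Several detections with the same label can mean several objects, or the same object detected more than once with different confidence, so the counts are only a rough guide.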

To make the detections easier to interpret, let's draw bounding boxes around the detected objects so we can see where each one is in the image.

In [None]:
# Visualize detected objects
draw = ImageDraw.Draw(image)

# Iterate over each detection result in the returned list
for detection in result:
    score = detection["score"]
    label = detection["label"]
    box = detection["box"]  # This should be in the format {'xmin': value, 'ymin': value, 'xmax': value, 'ymax': value}

    # Extract the bounding box coordinates from the dictionary
    left = int(box['xmin'])
    top = int(box['ymin'])
    right = int(box['xmax'])
    bottom = int(box['ymax'])

    # Draw the box and label only when the score exceeds 0.5 (stricter than the 0.3 used at detection time)
    if score > 0.5:  # You can adjust the confidence threshold
        draw.rectangle([left, top, right, bottom], outline="red", width=3)
        draw.text((left, top), str(label), fill="red")

# Display the image with bounding boxes
plt.imshow(image)
plt.axis("off")
plt.show()
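
The loop above labels each box with the class name only. As a possible variation, the sketch below separates "deciding what to draw" from the drawing itself and adds the confidence score to each caption. The `build_annotations` helper and the `sample_result` data are hypothetical and not part of the original notebook.

```python
def build_annotations(detections, min_score=0.5):
    """Return (box_tuple, caption) pairs for detections scoring above min_score."""
    annotations = []
    for d in detections:
        if d["score"] > min_score:
            box = d["box"]
            # Convert the box dictionary into an (xmin, ymin, xmax, ymax) tuple of ints
            xyxy = (int(box["xmin"]), int(box["ymin"]), int(box["xmax"]), int(box["ymax"]))
            caption = f'{d["label"]} {d["score"]:.2f}'
            annotations.append((xyxy, caption))
    return annotations


# Hypothetical stand-in for the pipeline output; real values will differ.
sample_result = [
    {"score": 0.98, "label": "dog", "box": {"xmin": 40, "ymin": 210, "xmax": 250, "ymax": 460}},
    {"score": 0.91, "label": "cat", "box": {"xmin": 300, "ymin": 260, "xmax": 450, "ymax": 470}},
    {"score": 0.35, "label": "dog", "box": {"xmin": 10, "ymin": 20, "xmax": 60, "ymax": 90}},
]

for xyxy, caption in build_annotations(sample_result):
    print(xyxy, caption)
```

In the notebook, each pair could then be drawn with `draw.rectangle(xyxy, outline="red", width=3)` and `draw.text(xyxy[:2], caption, fill="red")`. Drawing on `image.copy()` rather than on `image` itself would keep the original image unchanged when the cell is re-run.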

Experiment with different confidence thresholds to see how they affect which objects are detected.


In [None]:
# add your code here
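
One possible starting point for this experiment is to filter the same detection list at several thresholds and compare what survives. This is only a sketch: `filter_by_score` and `sample_result` are hypothetical names and data, and in the notebook you would use the real `result` list.

```python
def filter_by_score(detections, min_score):
    """Keep only detections whose confidence score exceeds min_score."""
    return [d for d in detections if d["score"] > min_score]


# Hypothetical stand-in for the pipeline output; real values will differ.
sample_result = [
    {"score": 0.98, "label": "dog", "box": {"xmin": 40, "ymin": 210, "xmax": 250, "ymax": 460}},
    {"score": 0.91, "label": "cat", "box": {"xmin": 300, "ymin": 260, "xmax": 450, "ymax": 470}},
    {"score": 0.35, "label": "dog", "box": {"xmin": 10, "ymin": 20, "xmax": 60, "ymax": 90}},
]

# Compare how many detections survive at each threshold
for min_score in (0.3, 0.5, 0.9, 0.95):
    kept = filter_by_score(sample_result, min_score)
    labels = sorted(d["label"] for d in kept)
    print(f"threshold {min_score:.2f}: {len(kept)} kept {labels}")
```

Because `image_analyzer` was called with `threshold=0.3`, lower-scoring detections are already absent from `result`. To explore thresholds below 0.3, re-run the detection call with a smaller `threshold` value.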

Create your own prompt! How well does the object detection do with more or fewer objects in the image?

In [None]:
# add your code here
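
To judge how well detection matches a prompt, one option is to compare the objects you asked for against the labels the detector returned. The sketch below assumes a hypothetical prompt that mentions a dog, a cat, and a bird. The `compare_labels` helper and `sample_result` data are illustrative only, and the labels a real model can return depend on the classes it was trained on.

```python
def compare_labels(expected, detections, min_score=0.5):
    """Compare expected object names with labels detected above min_score."""
    detected = {d["label"] for d in detections if d["score"] > min_score}
    expected = set(expected)
    return {
        "found": sorted(expected & detected),       # requested and detected
        "missed": sorted(expected - detected),      # requested but not detected
        "unexpected": sorted(detected - expected),  # detected but not requested
    }


# Hypothetical stand-in for the pipeline output; real values will differ.
sample_result = [
    {"score": 0.97, "label": "dog", "box": {"xmin": 40, "ymin": 210, "xmax": 250, "ymax": 460}},
    {"score": 0.88, "label": "cat", "box": {"xmin": 300, "ymin": 260, "xmax": 450, "ymax": 470}},
    {"score": 0.62, "label": "bench", "box": {"xmin": 5, "ymin": 300, "xmax": 120, "ymax": 400}},
]

report = compare_labels(["dog", "cat", "bird"], sample_result)
print(report)
```

A missed object can mean either that the detector failed to recognize it or that the image generator never drew it, so it is worth checking the image by eye as well.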