# Build-An-Agent
In this notebook, we'll build an AI agent that combines image generation, object detection, and language modeling. First, the agent generates an image from a text prompt using a pre-trained Stable Diffusion model. Then it runs a pre-trained object detection model on the generated image to identify and label objects. Finally, an LLM refines prompts and writes an analysis based on the detected objects. Together, these steps show how multiple machine learning pipelines can be chained into one system for creating and interpreting visual content.

Note: If you encounter a runtime error, you may need to change the runtime type. Go to Runtime > Change runtime type > T4 GPU.
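
To confirm whether a GPU is visible before loading any models, a check along these lines can help. It is a minimal sketch: `pick_device` is a hypothetical helper name, not part of the agent class, and it assumes that falling back to `"cpu"` is acceptable when `torch` is not installed yet.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    # Hypothetical helper for illustration only.
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # torch is not installed yet, so assume CPU
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Selected device: {pick_device()}")
```

If this prints `cpu` on a hosted notebook, switching the runtime type as described in the note above is the likely fix.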

Let's begin by installing the libraries we need.

In [None]:
!pip install diffusers transformers torch matplotlib pillow openai

In [None]:
# transformers is already installed above; accelerate is needed for device_map="auto"
!pip install ctransformers accelerate

Next, let's import the libraries we need.

In [None]:
import os
import torch
import argparse
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw, ImageFont
from diffusers import StableDiffusionPipeline
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
import json
import time
from IPython.display import display


The class below handles prompt enhancement, image generation, and image analysis by combining Stable Diffusion, an object detection pipeline, and a language model for a more interactive and creative experience. Let's run the cell to define it!
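
Before reading the full class, it may help to see how the stages chain together. The sketch below uses stand-in callables instead of real models; `run_pipeline` and the lambda stages are hypothetical names chosen for illustration, and the real class calls Stable Diffusion, an object detector, and an LLM at the corresponding points.

```python
def run_pipeline(prompt, enhance, generate, detect, analyze):
    """Chain the four stages; every argument after `prompt` is a callable."""
    enhanced = enhance(prompt)       # LLM rewrites the prompt
    image = generate(enhanced)       # image model renders it
    detections = detect(image)       # detector labels objects
    analysis = analyze(detections)   # LLM comments on the detections
    return {
        "prompt": enhanced,
        "image": image,
        "detections": detections,
        "analysis": analysis,
    }


# Stand-in stages (hypothetical) so the flow can run without any models.
result = run_pipeline(
    "a red bicycle",
    enhance=lambda p: p + ", golden hour lighting",
    generate=lambda p: f"<image for: {p}>",
    detect=lambda img: [{"label": "bicycle", "score": 0.91}],
    analyze=lambda dets: f"{len(dets)} object(s) found",
)
print(result["prompt"])
print(result["analysis"])
```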

In [None]:
class AIImageLLMAgent:
    """An AI agent for generating and analyzing images with LLM assistance"""

    def __init__(self,
                 sd_model="stabilityai/stable-diffusion-2-1-base",
                 output_dir="./generated_images",
                 device=None,
                 llm_model="TheBloke/Llama-2-7B-Chat-GGML",  # Open source LLM
                 llm_enabled=True):
        """Initialize the AI Image Agent with models and configuration"""
        self.output_dir = output_dir
        self.llm_enabled = llm_enabled

        # Create output directory if it doesn't exist
        os.makedirs(output_dir, exist_ok=True)

        # Determine device (GPU or CPU)
        if device is None:
            self.device = "cuda" if torch.cuda.is_available() else "cpu"
        else:
            self.device = device

        print(f"Using device: {self.device}")

        # Load the image generation model
        print("Loading Stable Diffusion model...")
        torch_dtype = torch.float16 if self.device == "cuda" else torch.float32
        self.sd_pipeline = StableDiffusionPipeline.from_pretrained(
            sd_model,
            torch_dtype=torch_dtype
        )
        self.sd_pipeline = self.sd_pipeline.to(self.device)

        # Load the object detection model on the same device
        print("Loading object detection model...")
        self.object_detector = pipeline("object-detection", device=self.device)

        # Initialize the LLM if enabled
        self.llm = None
        self.tokenizer = None
        if llm_enabled:
            try:
                print(f"Loading LLM model: {llm_model}")
                # Initialize the model - choose appropriate model type
                if "GGML" in llm_model or "ggml" in llm_model:
                    # For GGML quantized models, use CTransformers
                    from ctransformers import AutoModelForCausalLM as CTAutoModelForCausalLM
                    self.llm = CTAutoModelForCausalLM.from_pretrained(
                        llm_model,
                        model_type="llama" if "llama" in llm_model.lower() else "gpt2"
                    )
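                    # Note: GGML repos typically ship no Hugging Face tokenizer files, so this
                    # call may raise and trigger the fallback in the except block below
                    self.tokenizer = AutoTokenizer.from_pretrained(llm_model)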
                else:
                    # For regular HF models, use transformers
                    self.tokenizer = AutoTokenizer.from_pretrained(llm_model)
                    self.llm = AutoModelForCausalLM.from_pretrained(
                        llm_model,
                        torch_dtype=torch_dtype,
                        low_cpu_mem_usage=True,
                        device_map="auto"
                    )
                print("Successfully loaded LLM model")
            except Exception as e:
                print(f"Error loading LLM model: {e}")
                print("Continuing without LLM capabilities")
                self.llm_enabled = False
        else:
            print("LLM features disabled.")

    def generate_image(self, prompt, negative_prompt="",
                       num_inference_steps=50, guidance_scale=7.5,
                       width=512, height=512, save=True):
        """Generate an image based on text prompt"""
        print(f"Generating image for prompt: '{prompt}'")

        # Generate the image
        image = self.sd_pipeline(
            prompt=prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            width=width,
            height=height
        ).images[0]

        # Save the image if requested
        if save:
            # Create a filename from the prompt
            filename = "_".join(prompt.split()[:5]).lower()
            filename = "".join(c if c.isalnum() or c == "_" else "_" for c in filename)
            filepath = os.path.join(self.output_dir, f"{filename}.png")
            image.save(filepath)
            print(f"Image saved to {filepath}")

        return image

    def _generate_llm_response(self, system_prompt, user_prompt, max_length=1000):
        """Generate a response from the local LLM model"""
        if not self.llm_enabled or self.llm is None or self.tokenizer is None:
            return "LLM functionality not available."

        try:
            # Format prompt based on Llama 2 chat template
            # This format is for Llama 2 - adjust for other models
            formatted_prompt = f"""<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt} [/INST]"""

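            # Tokenize prompt and move the tensors to the selected device
            inputs = self.tokenizer(formatted_prompt, return_tensors="pt").to(self.device)

            # Generate response (pass the attention mask along with the input IDs)
            with torch.no_grad():
                output = self.llm.generate(
                    inputs["input_ids"],
                    attention_mask=inputs["attention_mask"],
                    max_new_tokens=max_length,
                    temperature=0.7,
                    top_p=0.9,
                    repetition_penalty=1.1,
                    do_sample=True
                )

            # Decode only the newly generated tokens, skipping the echoed prompt
            prompt_length = inputs["input_ids"].shape[1]
            response = self.tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)

            # Guard against a stray instruction marker in the generated text
            response = response.split("[/INST]")[-1].strip()
            return response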

        except Exception as e:
            print(f"Error generating LLM response: {e}")
            return "Error generating response from LLM."

    def enhance_prompt(self, user_prompt):
        """Use LLM to enhance a basic prompt into a more detailed one for better image generation"""
        if not self.llm_enabled:
            print("LLM enhancement not available: LLM features are disabled")
            return user_prompt

        print("Enhancing prompt with local LLM...")

        system_message = """
        You are an expert at creating detailed prompts for AI image generation.
        Take the user's basic prompt and enhance it with rich, descriptive details that will help
        Stable Diffusion create a better image. Focus on visual elements, lighting, style, mood,
        and composition. Return ONLY the enhanced prompt text with no explanation or additional content.
        """

        enhanced_prompt = self._generate_llm_response(
            system_message,
            f"Enhance this image prompt: {user_prompt}"
        )

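        # Fall back to the original prompt if the LLM call failed
        if enhanced_prompt in ("LLM functionality not available.", "Error generating response from LLM."):
            return user_prompt

        # Clean up the response: drop a short leading label such as "Enhanced prompt:"
        # without truncating prompts that legitimately contain a colon
        label, sep, rest = enhanced_prompt.partition(":")
        if sep and len(label.split()) <= 4 and rest.strip():
            enhanced_prompt = rest.strip()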

        print(f"Original prompt: {user_prompt}")
        print(f"Enhanced prompt: {enhanced_prompt}")
        return enhanced_prompt

    def analyze_image_with_llm(self, image, detections):
        """Use LLM to provide an insightful analysis of the image and detected objects"""
        if not self.llm_enabled:
            print("LLM analysis not available: LLM features are disabled")
            return None

        print("Analyzing image with local LLM...")

        # Prepare detection data for LLM
        detection_text = "Objects detected:\n"
        if detections:
            for i, detection in enumerate(detections):
                detection_text += f"{i+1}. {detection['label']} (confidence: {detection['score']:.2f})\n"
        else:
            detection_text += "No objects detected by the object detection model.\n"

        system_message = """
        You are an expert art critic and image analyst. Provide a thoughtful, insightful analysis of the image
        based on the detected objects and what you can infer about the composition, style, and content.
        Consider both what the object detection model found and what might be missing or misinterpreted.
        Keep your analysis to 3-4 paragraphs of insightful observations.
        """

        analysis = self._generate_llm_response(
            system_message,
            f"Here is the object detection result:\n\n{detection_text}\n\nPlease provide your analysis of this image."
        )

        print("\n--- LLM's Analysis ---")
        print(analysis)
        return analysis

    def detect_objects(self, image, threshold=0.3, visualize=True, llm_analyze=False):
        """Detect objects in the image.

        Returns the annotated image when visualize is True, otherwise the raw detection list.
        """
        print("Analyzing image for objects...")

        # Perform object detection
        results = self.object_detector(image, threshold=threshold)

        # Print detection results
        if results:
            print(f"Detected {len(results)} objects:")
            for detection in results:
                print(f"- {detection['label']} ({detection['score']:.2f})")
        else:
            print("No objects detected.")

        # Get LLM analysis if requested
        if llm_analyze and self.llm_enabled:
            self.analyze_image_with_llm(image, results)

        # Visualize the results if requested
        if visualize:
            return self._visualize_detections(image, results)

        return results

    def _visualize_detections(self, image, detections):
        """Visualize detected objects on the image"""
        # Create a copy of the image to draw on
        image_with_boxes = image.copy()
        draw = ImageDraw.Draw(image_with_boxes)

        # Try to get a font
        try:
            font = ImageFont.truetype("arial.ttf", 15)
        except IOError:
            font = ImageFont.load_default()

        # Draw boxes and labels for each detection
        for detection in detections:
            score = detection["score"]
            label = detection["label"]
            box = detection["box"]

            left = box['xmin']
            top = box['ymin']
            right = box['xmax']
            bottom = box['ymax']

            # Draw bounding box
            draw.rectangle([left, top, right, bottom], outline="red", width=3)

            # Draw label with score (textbbox replaces textsize, which was removed in Pillow 10)
            text = f"{label}: {score:.2f}"
            if hasattr(draw, "textbbox"):
                text_box = draw.textbbox((left, top), text, font=font)
            else:
                text_width, text_height = draw.textsize(text, font=font)
                text_box = (left, top, left + text_width, top + text_height)
            draw.rectangle(text_box, fill="red")
            draw.text((left, top), text, fill="white", font=font)

        return image_with_boxes

    def get_creative_image_ideas(self, topic):
        """Generate creative image prompt ideas based on a topic using LLM"""
        if not self.llm_enabled:
            print("Feature not available: LLM features are disabled")
            return None

        print(f"Generating creative image ideas for topic: '{topic}'...")

        system_message = """
        You are a creative director specializing in visual concepts.
        Generate 5 unique, interesting image prompt ideas related to the user's topic.
        Each prompt should be detailed and visually descriptive, ready to use with Stable Diffusion.
        Format your response as a numbered list with one prompt per line, starting each line with a number and a period.
        Example:
        1. First prompt idea
        2. Second prompt idea
        3. Third prompt idea
        4. Fourth prompt idea
        5. Fifth prompt idea
        """

        response_text = self._generate_llm_response(
            system_message,
            f"Generate 5 creative and unique image prompt ideas related to: {topic}."
        )

        # Parse the response into a list of ideas
        ideas = []
        for line in response_text.split('\n'):
            line = line.strip()
            if line and line[0].isdigit():
                # Strip the leading number and its "." or ")" separator
                prompt = line.lstrip("0123456789").lstrip(".) ").strip()
            elif line.startswith("- "):
                # Strip the bullet marker
                prompt = line[2:].strip()
            else:
                continue
            if prompt:
                ideas.append(prompt)

        # If parsing failed, just split by newlines and take up to 5 items
        if not ideas:
            ideas = [line.strip() for line in response_text.split('\n') if line.strip()][:5]

        print("\n--- Creative Image Ideas ---")
        for i, idea in enumerate(ideas):
            print(f"{i+1}. {idea}")

        return ideas

    def interactive_session(self):
        """Start an interactive session with the agent"""
        print("\n===== AI Image LLM Agent Interactive Session =====")
        print("Type 'exit' or 'quit' to end the session.\n")

        current_image = None
        last_filename = None

        while True:
            print("\nAvailable commands:")
            print("1. generate - Generate an image from a text prompt")
            print("2. enhance - Use LLM to enhance a basic prompt")
            print("3. analyze - Detect and analyze objects in an image")
            print("4. ideas - Get creative image prompt ideas")
            print("5. help - Show detailed help")
            print("6. quit - Exit the session")

            command = input("\nEnter command (1-6): ").strip().lower()

            if command in ['quit', 'exit', '6']:
                print("Exiting interactive session.")
                break

            elif command in ['help', '5']:
                self._print_help()

            elif command in ['generate', '1']:
                prompt = input("Enter image prompt: ")
                negative_prompt = input("Enter negative prompt (optional): ")

                current_image = self.generate_image(prompt, negative_prompt=negative_prompt)
                self._display_image(current_image)

            elif command in ['enhance', '2']:
                if not self.llm_enabled:
                    print("Feature not available: LLM features are disabled")
                    continue

                basic_prompt = input("Enter basic image concept: ")
                enhanced_prompt = self.enhance_prompt(basic_prompt)

                use_enhanced = input("Generate image with this enhanced prompt? (y/n): ").lower()
                if use_enhanced == 'y':
                    current_image = self.generate_image(enhanced_prompt)
                    self._display_image(current_image)

            elif command in ['analyze', '3']:
                filepath = input("Enter path to image (or press Enter to use the current image): ").strip()
                if filepath:
                    try:
                        current_image = Image.open(filepath).convert("RGB")
                    except Exception as e:
                        print(f"Error loading image: {e}")
                        continue
                elif current_image is None:
                    print("No image to analyze. Generate or load an image first.")
                    continue

                threshold = input("Detection threshold (0.0-1.0, default 0.3): ").strip()
                try:
                    threshold = float(threshold) if threshold else 0.3
                except ValueError:
                    print("Invalid threshold, using the default of 0.3.")
                    threshold = 0.3

                llm_analyze = input("Use LLM for in-depth analysis? (y/n, default n): ").lower() == 'y'

                image_with_detections = self.detect_objects(current_image, threshold=threshold, llm_analyze=llm_analyze)
                self._display_image(image_with_detections)

            elif command in ['ideas', '4']:
                if not self.llm_enabled:
                    print("Feature not available: LLM features are disabled")
                    continue

                topic = input("Enter a topic or theme for image ideas: ")
                ideas = self.get_creative_image_ideas(topic)

                if ideas:
                    selection = input("Enter idea number to generate (or 0 to skip): ")
                    try:
                        idx = int(selection) - 1
                        if 0 <= idx < len(ideas):
                            current_image = self.generate_image(ideas[idx])
                            self._display_image(current_image)
                    except ValueError:
                        print("Invalid selection, not generating an image.")

            else:
                print("Unknown command. Type 'help' to see available commands.")

    def _print_help(self):
        """Print help information for interactive mode"""
        print("\n--- Command Details ---")
        print("1. generate - Generate a new image from a text prompt")
        print("   - Enter a detailed text description of the image you want to create")
        print("   - Optionally provide a negative prompt (things to avoid in the image)")

        print("\n2. enhance - Use LLM to enhance a basic prompt")
        print("   - Enter a simple concept and LLM will expand it with visual details")
        print("   - Great for getting better results from Stable Diffusion")

        print("\n3. analyze - Detect objects in the current or specified image")
        print("   - Uses computer vision to identify objects in the image")
        print("   - Can optionally use LLM to provide an in-depth analysis")

        print("\n4. ideas - Get creative image prompt ideas on a topic")
        print("   - Enter a general topic and LLM will suggest specific image concepts")
        print("   - You can select one of the ideas to generate immediately")

        print("\n5. help - Show this help message")
        print("\n6. quit/exit - End the interactive session")

    def _display_image(self, image):
        """Display an image using matplotlib"""
        plt.figure(figsize=(10, 10))
        plt.imshow(np.array(image))
        plt.axis('off')
        plt.show()


def main():
    """Main function to run the AI Image LLM Agent"""
    parser = argparse.ArgumentParser(description="AI Image Generator and Analyzer with LLM")
    parser.add_argument("--sd_model", default="stabilityai/stable-diffusion-2-1-base",
                        help="Stable Diffusion model to use")
    parser.add_argument("--output_dir", default="./generated_images",
                        help="Directory to save generated images")
    parser.add_argument("--interactive", action="store_true",
                        help="Start in interactive mode")
    parser.add_argument("--prompt",
                        help="Text prompt for image generation (non-interactive mode)")
    parser.add_argument("--device", choices=["cuda", "cpu"],
                        help="Device to use (default: auto-detect)")
    parser.add_argument("--llm_model", default="TheBloke/Llama-2-7B-Chat-GGML",
                        help="LLM model to use for text generation")
    parser.add_argument("--llm_disabled", action="store_true",
                        help="Disable LLM features")
    parser.add_argument("--enhance", action="store_true",
                        help="Use LLM to enhance the prompt")
    parser.add_argument("--analyze", action="store_true",
                        help="Analyze the generated image with LLM")

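    # parse_known_args ignores extra arguments that notebook kernels pass (such as -f)
    args, _ = parser.parse_known_args()

    # Nothing to do: show usage before loading any models
    if not args.interactive and not args.prompt:
        parser.print_help()
        return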

    # Create the agent
    agent = AIImageLLMAgent(
        sd_model=args.sd_model,
        output_dir=args.output_dir,
        device=args.device,
        llm_model=args.llm_model,
        llm_enabled=not args.llm_disabled
    )

    # Run in interactive mode or generate a single image
    if args.interactive:
        agent.interactive_session()
    elif args.prompt:
        # Enhance the prompt if requested and possible
        prompt = args.prompt
        if args.enhance and not args.llm_disabled:
            prompt = agent.enhance_prompt(prompt)

        # Generate the image
        image = agent.generate_image(prompt)

        # Detect objects (and optionally run the LLM analysis); with the default
        # visualize=True, detect_objects returns the annotated image
        llm_analyze = args.analyze and not args.llm_disabled
        annotated_image = agent.detect_objects(image, llm_analyze=llm_analyze)

        # Display the image with detections
        plt.figure(figsize=(10, 10))
        plt.imshow(np.array(annotated_image))
        plt.axis('off')
        plt.show()
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
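
The class above wraps its system and user messages in the Llama 2 chat layout and then keeps only the text after the closing instruction marker. That step can be tried on its own without loading a model. This is a minimal sketch: `build_llama2_prompt` and `extract_reply` are hypothetical helper names, and the leading BOS marker is left out on the assumption that the tokenizer adds it.

```python
def build_llama2_prompt(system_prompt, user_prompt):
    """Wrap a system and a user message in the Llama 2 chat layout."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} [/INST]"


def extract_reply(decoded_text):
    """Keep only the text that follows the last [/INST] marker."""
    return decoded_text.split("[/INST]")[-1].strip()


prompt = build_llama2_prompt("You are concise.", "Describe a sunset.")

# Simulate a decoded model output that echoes the prompt before the reply.
fake_output = prompt + " Warm orange light over calm water."
print(extract_reply(fake_output))
```

Other chat models, such as the TinyLlama checkpoint used in the next cell, may expect a different template. Recent versions of `transformers` provide `tokenizer.apply_chat_template` for that purpose.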

Let's start an interactive session in which you can generate images, improve prompts, and analyze results using both the image model and the language model.

In [None]:
agent = AIImageLLMAgent(
    sd_model="stabilityai/stable-diffusion-2-1-base",
    output_dir="./generated_images",
    llm_model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # Smaller model
    llm_enabled=True
)
agent.interactive_session()
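
The `ideas` command in the session depends on turning the LLM's free-form reply into a clean list of prompts. A standalone sketch of that parsing step is shown below; `parse_numbered_ideas` is a hypothetical function name, and the sample reply is made up for illustration.

```python
def parse_numbered_ideas(text, limit=5):
    """Extract list items from lines that start with a number or a bullet."""
    ideas = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line[0].isdigit():
            # Drop the leading number and its "." or ")" separator
            item = line.lstrip("0123456789").lstrip(".) ").strip()
        elif line.startswith(("- ", "* ")):
            # Drop the bullet marker
            item = line[2:].strip()
        else:
            continue  # skip chatter such as "Here are some ideas:"
        if item:
            ideas.append(item)
    return ideas[:limit]


reply = (
    "Sure! Here are some ideas:\n"
    "1. A fox in snow. Soft light.\n"
    "2) Neon city at night\n"
    "- Desert under stars"
)
print(parse_numbered_ideas(reply))
```

Stripping only the leading marker keeps any periods inside the idea itself, which matters for multi-sentence prompts.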

Congratulations! You've created your first AI agent. Keep exploring its capabilities: try different prompts and detection thresholds, and see how the generated images and the detected objects change. There's a lot to discover, so dive in!
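
To get a feel for what the detection threshold does, the filtering can be reproduced on a hand-written result list. This is only a sketch: `filter_detections` is a hypothetical helper, and the sample scores are invented, though the dictionaries use the same `label` and `score` keys the agent reads from the detector output.

```python
def filter_detections(detections, threshold=0.3):
    """Keep detections at or above the threshold, highest score first."""
    kept = [d for d in detections if d["score"] >= threshold]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


# Made-up detections for illustration
sample = [
    {"label": "sofa", "score": 0.41},
    {"label": "cat", "score": 0.92},
    {"label": "remote", "score": 0.12},
]

for t in (0.1, 0.3, 0.5):
    labels = [d["label"] for d in filter_detections(sample, t)]
    print(f"threshold={t}: {labels}")
```

A lower threshold surfaces more candidate objects at the cost of more false positives, while a higher one keeps only the detections the model is most sure about.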
