In this project, we are building a multi-agent system using the crewai library to analyze an image of the product interface of a B2B digital menu startup. The system consists of four agents:

* Image Description Agent: Provides a detailed description of the image.
* UX Critique Agent: Critically analyzes the image based on the description.
* UX Suggestion Agent: Offers actionable suggestions to improve the image design.
* AI Product Manager: Writes user stories and prioritizes the suggestions based on probable customer feedback.

# Setup

We start by setting up the working directory and installing the necessary libraries.

In [None]:
# Mount Google Drive, then change the working directory to the project folder
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/GenAI/AI Agents/Capstone Project - the AI Product Manager

/content/drive/MyDrive/GenAI/AI Agents/Capstone Project - the AI Product Manager


In [1]:
# Install the 'crewai' library and its tools, along with 'openai' and 'langchain-openai' (imported below)
!pip install crewai
!pip install openai
!pip install 'crewai[tools]'
!pip install langchain-openai


In [None]:
# Retrieve the OpenAI API key securely from Google Colab's user data
from google.colab import userdata
api_key = userdata.get('genai_course')

We are working in Google Colab and use Google Drive to store the project files. The crewai library provides the building blocks for defining and orchestrating AI agents. We read the OpenAI API key from Colab's Secrets store (the secret named genai_course) so that it never appears in the notebook itself.
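The cell above only works inside Colab, because google.colab.userdata does not exist elsewhere. If you also want to run the notebook locally, a small fallback helps. A minimal sketch, assuming the key is exported as OPENAI_API_KEY when you are outside Colab; get_openai_key is a hypothetical helper name, and genai_course is the secret name used above.

```python
import os


def get_openai_key(secret_name: str = "genai_course") -> str:
    """Return the OpenAI API key, preferring a Colab secret over the environment.

    Assumption (hypothetical helper): outside Colab, the key is already
    exported as the OPENAI_API_KEY environment variable.
    """
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(secret_name)
    except Exception:
        # Not in Colab, or the secret is missing / notebook access was not granted
        key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            f"No API key found: add a Colab secret named {secret_name!r} "
            "or set the OPENAI_API_KEY environment variable."
        )
    return key
```

In the notebook you would then write api_key = get_openai_key() in place of the direct userdata.get call.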

Next, we import all the necessary libraries for our project.

In [1]:
# Import essential libraries for image processing and agent creation
import os
from PIL import Image  # For image manipulation
from crewai_tools import VisionTool  # Specialized tool for image analysis
from crewai import Agent, Task, Crew, Process  # Core components of the crewai library
from langchain_openai import ChatOpenAI  # Interface to interact with OpenAI's language models
from IPython.display import display, Markdown  # For displaying outputs in Jupyter notebooks


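We import Pillow (PIL) for image handling, the core crewai classes for defining agents, tasks, and the crew, and langchain_openai to interface with OpenAI's GPT-4o model. VisionTool is a prebuilt tool from the crewai_tools package that sends an image to an OpenAI vision model and returns a text description of it.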
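The import cell fails with ModuleNotFoundError if it runs in a fresh runtime where the install cell was not executed. A quick way to see which modules are missing before importing them is importlib.util.find_spec, which returns None for a top-level module that cannot be found. A minimal sketch; missing_modules is a hypothetical helper, and the module list simply mirrors the imports above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names used by the import cell. Pip package names differ:
# crewai_tools comes from 'crewai[tools]', PIL from 'pillow',
# langchain_openai from 'langchain-openai'
required = ["PIL", "crewai", "crewai_tools", "langchain_openai", "IPython"]
missing = missing_modules(required)
if missing:
    print("Missing:", ", ".join(missing), "(run the install cell, then re-run the imports)")
else:
    print("All required modules are importable")
```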

In [None]:
# Set the OpenAI API key as an environment variable
os.environ['OPENAI_API_KEY'] = api_key

In [None]:
# Initialize the vision tool for image analysis
vision_tool = VisionTool()

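Setting the key as the OPENAI_API_KEY environment variable makes it available to every OpenAI client in the process: both ChatOpenAI and the VisionTool read it from the environment by default. The VisionTool lets our agents send the image to a vision-capable model and get a text description back.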

In [None]:
# Load the Image using the Pillow library
image_path = "translation.png"
image = Image.open(image_path)

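Pillow opens translation.png and stores it in the image variable, which is handy for a quick visual check (for example, display(image)). The agents never use this object, though: each one receives only image_path, and the VisionTool reads the file itself.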
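Because the agents only ever see image_path, it can be worth failing fast on a bad path before any API calls are made. A minimal stdlib sketch, assuming the file is a PNG as translation.png is: png_size is a hypothetical helper that checks the 8-byte PNG signature and reads width and height from the IHDR chunk, which the PNG specification requires to come first and stores as 4-byte big-endian integers.

```python
import struct

# Every PNG file starts with these 8 bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) of a PNG file by reading its IHDR chunk.

    Raises ValueError if the file does not look like a PNG.
    """
    with open(path, "rb") as f:
        # 8-byte signature + 4-byte chunk length + 4-byte chunk type + width + height
        header = f.read(24)
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

With Pillow already loaded, png_size(image_path) should agree with image.size; the stdlib version only matters where Pillow is unavailable or you want to avoid decoding the file.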

# AGENT 1 - Image Description Agent

We create the first agent responsible for describing the image.

In [None]:
# Build the description agent with its role, goal, and backstory
description_agent = Agent(
    role="Image Description Agent",
    goal=f"Fully describe the digital image ({image_path}), of a B2B Digital Menu startup, including its visible elements, design, and intended purpose.",
    backstory="You are responsible for analyzing images and describing their purpose in detail.",
    verbose=True,
    tools=[vision_tool],
    llm=ChatOpenAI(model_name="gpt-4o", temperature=0.8)
)


In [None]:
# Create the description task assigned to the description agent
description_task = Task(
    description="Identify and fully describe the digital image and explain its purpose.",
    expected_output="A complete description of the image and its purpose.",
    agent=description_agent
)

The Image Description Agent uses the VisionTool to analyze the image and provides a detailed description. We set the language model to GPT-4o with a temperature of 0.8, which makes the output more varied at the cost of some run-to-run consistency.
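All four agents in this notebook share the same verbose flag, tool list, and ChatOpenAI settings, and differ only in role, goal, and backstory. If you add more agents, it can help to keep the per-agent text in one place. A plain-Python sketch of that idea: AgentSpec, agent_kwargs, and SHARED_SETTINGS are hypothetical names, and the crewai call is left as a comment so the block runs without crewai installed.

```python
from dataclasses import dataclass

image_path = "translation.png"  # same value as in the notebook

# Settings that all four agents in the notebook repeat verbatim
# (there: verbose=True, tools=[vision_tool], ChatOpenAI(model_name="gpt-4o", temperature=0.8))
SHARED_SETTINGS = {"verbose": True, "model_name": "gpt-4o", "temperature": 0.8}


@dataclass(frozen=True)
class AgentSpec:
    """The only fields that differ from one agent to the next."""
    role: str
    goal: str
    backstory: str


def agent_kwargs(spec: AgentSpec, shared: dict = SHARED_SETTINGS) -> dict:
    """Merge one agent's prompt fields with the shared settings into a new dict."""
    return {"role": spec.role, "goal": spec.goal, "backstory": spec.backstory, **shared}


description_spec = AgentSpec(
    role="Image Description Agent",
    goal=(f"Fully describe the digital image ({image_path}), of a B2B Digital Menu startup, "
          "including its visible elements, design, and intended purpose."),
    backstory="You are responsible for analyzing images and describing their purpose in detail.",
)

kw = agent_kwargs(description_spec)
# With crewai installed, the dict would be unpacked roughly like this (not run here):
#   Agent(role=kw["role"], goal=kw["goal"], backstory=kw["backstory"],
#         verbose=kw["verbose"], tools=[vision_tool],
#         llm=ChatOpenAI(model_name=kw["model_name"], temperature=kw["temperature"]))
print(kw["role"], kw["model_name"], kw["temperature"])
```

Changing the model or temperature for every agent then becomes a one-line edit to SHARED_SETTINGS.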

# AGENT 2 - Critique Agent

We create the second agent to critique the image based on the description.

In [None]:
# Build the critique agent with its role, goal, and backstory
critique_agent = Agent(
    role="UX Critique Agent",
    goal=f"Critique the image {image_path} based on its description and intended purpose provided by the Image Description Agent.",
    backstory="You critically evaluate images, specifically UX designs, and point out flaws, weaknesses, and areas of improvement.",
    verbose=True,
    tools=[vision_tool],
    llm=ChatOpenAI(model_name="gpt-4o", temperature=0.8)
)

In [None]:
# Create the critique task assigned to the critique agent, using the description task as context
critique_task = Task(
    description="Critically analyze the image based on its description and intended purpose.",
    expected_output="A complete critique of the image, highlighting design flaws and areas of improvement.",
    agent=critique_agent,
    context=[description_task]
)

The UX Critique Agent evaluates the image using the description from the first agent. The context parameter allows the agent to access the output of the description_task.
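To make the context mechanism concrete, here is a toy model in plain Python. It is not crewai's actual implementation: it only assumes that a task's prompt is its own description followed by the outputs of the tasks in its context. ToyTask, build_prompt, run_sequential, and fake_llm are hypothetical names, and fake_llm stands in for the real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyTask:
    name: str
    description: str
    context: List["ToyTask"] = field(default_factory=list)
    output: Optional[str] = None


def build_prompt(task: ToyTask) -> str:
    """Append the outputs of the context tasks to the task's own description."""
    parts = [task.description]
    for ctx in task.context:
        if ctx.output is None:
            raise RuntimeError(f"{task.name!r} needs {ctx.name!r}, which has not run yet")
        parts.append(f"Context from {ctx.name}:\n{ctx.output}")
    return "\n\n".join(parts)


def run_sequential(tasks: List[ToyTask], llm: Callable[[str], str]) -> List[str]:
    """Run tasks in list order, storing each output so later tasks can use it."""
    for task in tasks:
        task.output = llm(build_prompt(task))
    return [task.output for task in tasks]


def fake_llm(prompt: str) -> str:
    """Stand-in for the model: reports how many context sections its prompt contained."""
    return f"answer using {prompt.count('Context from')} context block(s)"


describe = ToyTask("description", "Describe the image.")
critique = ToyTask("critique", "Critique the image.", context=[describe])
print(run_sequential([describe, critique], fake_llm))
# → ['answer using 0 context block(s)', 'answer using 1 context block(s)']
```

The critique prompt carries the description's output, which is the same idea the notebook relies on when it passes context=[description_task].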

# AGENT 3 - UX Suggestion

We create the third agent to provide suggestions for improving the image.

In [None]:
# Create the UX suggestion agent with its role, goal, and backstory
ux_agent = Agent(
    role="UX Suggestion Agent",
    goal=f"Provide design and layout suggestions for the image {image_path} based on the context from the Image Description Agent and UX Critique Agent.",
    backstory="You specialize in providing actionable suggestions to improve the design of website images.",
    verbose=True,
    tools=[vision_tool],
    llm=ChatOpenAI(model_name="gpt-4o", temperature=0.8)
)

In [None]:
# Create the UX suggestion task assigned to the UX suggestion agent, using previous tasks as context
ux_task = Task(
    description="Provide suggestions for improving the image design and layout, based on the context from the description and critique agents.",
    expected_output="A list of actionable suggestions for improving the image design and layout based on the image's purpose and critiques.",
    agent=ux_agent,
    context=[description_task, critique_task]
)

The UX Suggestion Agent leverages the outputs from both the description and critique agents to generate improvement suggestions.

# AGENT 4 - AI Product Manager

We create the fourth agent to write user stories and prioritize suggestions.

In [None]:
# Define the AI Product Manager Agent with its role, goal, and backstory
pm_agent = Agent(
    role="AI Product Manager",
    goal=f"Write user stories based on the suggestions from the UX agent for the image {image_path} and prioritize the suggestions based on probable customer feedback.",
    backstory="You act as a product manager for a digital company, prioritizing suggestions and creating user stories to guide improvements.",
    verbose=True,
    tools=[vision_tool],
    llm=ChatOpenAI(model_name="gpt-4o", temperature=0.8)
)

In [None]:
# Create a task for the AI Product Manager, using all previous tasks as context
pm_task = Task(
    description="Write user stories based on the suggestions from the UX agent for the image and prioritize the suggestions based on probable customer feedback.",
    expected_output="A list of prioritized improvements based on expected impact on the customer, and the user stories.",
    agent=pm_agent,
    context=[description_task, critique_task, ux_task]
)

The AI Product Manager consolidates the suggestions and critiques into actionable user stories, prioritized by their expected impact on customers.

# Run the AI Product Manager

We now define the crew and run the tasks sequentially.

In [None]:
# Define the crew with all agents and tasks, set to run sequentially
crew = Crew(
    agents=[description_agent, critique_agent, ux_agent, pm_agent],
    tasks=[description_task, critique_task, ux_task, pm_task],
    verbose=True,
    process=Process.sequential
)

# Kick off the crew to start processing the tasks
result = crew.kickoff()



[1m[95m# Agent:[00m [1m[92mImage Description Agent[00m
[95m## Task:[00m [92mIdentify and fully describe the digital image and explain its purpose[00m


[1m[95m# Agent:[00m [1m[92mImage Description Agent[00m
[95m## Thought:[00m [92mTo identify and fully describe the digital image and explain its purpose, I need to analyze the image using the Vision Tool.[00m
[95m## Using tool:[00m [92mVision Tool[00m
[95m## Tool Input:[00m [92m
"{\"image_path_url\": \"translation.png\"}"[00m
[95m## Tool Output:[00m [92m
The image displays a user interface for a restaurant menu translation center. It includes sections for managing and translating items on a menu. 

- On the left panel, there are options related to managing restaurants, such as editing the menu, printing it, and selecting categories and subcategories.
- The main section shows options to translate menu items, specifically with a focus on "Entradas" (starters) and "Pizzas."
- The user can select the language (

The Crew object manages all agents and tasks. By setting the process to Process.sequential, we ensure that each task is completed before the next one starts, maintaining the context flow.
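Because a sequential crew runs tasks strictly in list order, a task's context can only be filled by tasks that appear earlier in the tasks list. A small static check over task names can catch a mis-ordered list before any API calls are made. A sketch in plain Python (no crewai objects); context_order_problems is a hypothetical helper, and the wiring below is the notebook's, written out by name.

```python
def context_order_problems(task_order, contexts):
    """Return context references that a strictly sequential run could not satisfy.

    task_order: task names in the order passed to Crew(tasks=[...])
    contexts:   mapping of task name -> names listed in that task's context
    """
    position = {name: i for i, name in enumerate(task_order)}
    problems = []
    for task, deps in contexts.items():
        if task not in position:
            problems.append(f"{task}: not in the crew's task list")
            continue
        for dep in deps:
            if dep not in position:
                problems.append(f"{task}: context task {dep!r} is not in the crew's task list")
            elif position[dep] >= position[task]:
                problems.append(f"{task}: context task {dep!r} does not run earlier")
    return problems


# The notebook's wiring, written out by name
order = ["description_task", "critique_task", "ux_task", "pm_task"]
contexts = {
    "critique_task": ["description_task"],
    "ux_task": ["description_task", "critique_task"],
    "pm_task": ["description_task", "critique_task", "ux_task"],
}
print(context_order_problems(order, contexts))  # → []
```

Swapping the first two entries of order would report that critique_task depends on a task that has not run yet.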

Finally, we extract and display the outputs from each agent. We loop through each task's output and display it using Markdown formatting for better readability in the Jupyter notebook.

In [None]:
# Extract and display the output of each agent
for idx, task_output in enumerate(result.tasks_output):
    display(Markdown(f"### Agent {idx+1}: {task_output.agent}\n{task_output.raw}"))


### Agent 1: Image Description Agent
The digital image "translation.png" displays a user interface for a restaurant menu translation center. The purpose of this interface is to aid restaurant administrators in managing and translating their menu items into different languages, specifically Portuguese and English in this instance. 

The layout of the image includes:

1. **Left Panel Options**: This section provides various functionalities related to restaurant menu management. These options allow users to edit the menu, print it, and organize items by selecting categories and subcategories, facilitating easy navigation and organization of the menu content.

2. **Main Translation Section**: This is the focal area of the interface, dedicated to translating menu items. It highlights sections like "Entradas" (starters) and "Pizzas," indicating that these are parts of the menu that can be worked on for translations.

3. **Language Selection**: Users can choose between languages for translation purposes. In the provided interface, there are specific fields for entering translations for menu items in Portuguese and English. This functionality is crucial for restaurants that serve a diverse clientele, ensuring that language barriers do not impede customer satisfaction.

4. **Translation Entry and Saving**: There are input fields for translating specific menu items, such as "Pizzas," in both the source and target languages (Portuguese and English). After inputting the necessary translations, users have the option to save these translations, indicating an emphasis on ease of use and functionality in updating menu items.

The primary purpose of this interface is to streamline the process of managing and translating restaurant menus for businesses, ensuring that they can cater to a multilingual customer base effectively. This is particularly valuable for businesses looking to expand their reach and improve customer experience by providing easily accessible information in multiple languages.

### Agent 2: UX Critique Agent
The image "translation.png" presents a user interface designed for restaurant menu translation and management. Here is a detailed critique of the user experience (UX) design, focusing on design flaws and areas for improvement:

1. **Left Panel Options:**
   - **Design Flaw:** The left panel, which includes options for menu management, may be suffering from overcrowding. If the options are not categorized or do not display tooltips, it may overwhelm users, particularly those who are not tech-savvy.
   - **Improvement Suggestion:** Introduce collapsible menus or categorize options more distinctly to enhance navigation. Adding tooltips or brief descriptions for each option can aid users in understanding functionalities at a glance.

2. **Main Translation Section:**
   - **Design Flaw:** While the main focus on "Entradas" and "Pizzas" is clear, the interface lacks visual hierarchy. If the sections do not have distinct visual separation or titles that stand out, users might find it challenging to differentiate between categories quickly.
   - **Improvement Suggestion:** Use contrasting colors, borders, or larger headers to separate different menu categories visually. This will help users quickly locate the section they need to work on.

3. **Language Selection:**
   - **Design Flaw:** The language selection process might be cumbersome if not intuitively placed or if it requires multiple steps to switch between languages.
   - **Improvement Suggestion:** Implement a more straightforward language toggle, such as a dropdown or a button that visually changes states to indicate the current language, ensuring users can switch languages with minimal effort.

4. **Translation Entry and Saving:**
   - **Design Flaw:** The input fields for translations, if not adequately labeled or sized, might cause confusion, especially if users accidentally mix up languages.
   - **Improvement Suggestion:** Clearly label each input field with the respective language and consider using placeholder text to guide users. Ensure that fields are adequately sized to input translations comfortably. A confirmation dialogue or message after saving can assure users their changes are recorded.

5. **Overall User Experience:**
   - **Design Flaw:** The interface might not be accommodating to users with accessibility needs. For instance, small text or insufficient contrast may hinder usability for visually impaired users.
   - **Improvement Suggestion:** Apply accessibility best practices, such as ensuring high contrast between text and background, using readable font sizes, and supporting keyboard navigation. Adding a help section or guided tutorial could further aid users in navigating the interface without frustration.

In summary, while the interface fulfills its primary function of aiding restaurant administrators in translating menu items, enhancing the UI's clarity, accessibility, and user guidance are crucial steps in improving the overall user experience.

### Agent 3: UX Suggestion Agent
1. **Left Panel Options Enhancements:**
   - Implement collapsible menus or categorize options more distinctly to avoid overcrowding. This will improve the navigation experience, especially for non-tech-savvy users. 
   - Add tooltips or brief descriptions for each option to help users quickly understand their functionality. This can be achieved by hovering over options to display this information.

2. **Main Translation Section Improvements:**
   - Enhance the visual hierarchy by using contrasting colors, borders, or larger headers to clearly separate different menu categories like "Entradas" and "Pizzas." This will help users locate the sections they need to work on more quickly.
   - Consider adding icons or visuals associated with each category to make it more intuitive and engaging.

3. **Language Selection Optimization:**
   - Simplify the language selection process with a more intuitive toggle or dropdown menu. This should visually indicate the current language and allow users to switch with minimal steps.
   - Use clear labels and possibly flags to represent the languages, making it visually intuitive.

4. **Translation Entry and Saving Improvements:**
   - Ensure that each input field is clearly labeled with the respective language, using both text and placeholder hints to guide users effectively.
   - Make sure that input fields are large enough to comfortably accommodate text entry for long menu items.
   - Provide a confirmation message or dialogue after saving translations to reassure users that their changes have been successfully recorded.

5. **Overall Accessibility and Usability Enhancements:**
   - Apply accessibility best practices by ensuring high contrast between text and background, using readable font sizes, and supporting keyboard navigation for users with disabilities.
   - Consider adding a help section or guided tutorial to assist users in navigating the interface without frustration, especially for first-time users.

By implementing these suggestions, the interface can enhance its usability, accessibility, and overall user experience, ensuring it effectively serves its purpose of aiding restaurant administrators in managing and translating their menus.

### Agent 4: AI Product Manager
### Prioritized List of Improvements and User Stories

1. **Improve Left Panel Navigation**
   - **Priority Level: High**
   - **User Story:** As a restaurant administrator, I want to easily navigate through the menu management options, so that I can efficiently organize and update the menu without feeling overwhelmed by too many options.
     - **Acceptance Criteria:** Implement collapsible menus and categorize options distinctly. Add tooltips or brief descriptions that appear on hover to guide users.

2. **Enhance Main Translation Section Visibility**
   - **Priority Level: High**
   - **User Story:** As a restaurant administrator, I want the translation sections to be visually distinct and easily identifiable, so that I can quickly locate and work on the menu categories I need.
     - **Acceptance Criteria:** Use contrasting colors, borders, or larger headers to separate different categories like "Entradas" and "Pizzas." Add icons or visuals associated with each category for intuitive navigation.

3. **Optimize Language Selection Process**
   - **Priority Level: Medium**
   - **User Story:** As a restaurant administrator, I want to switch languages quickly and easily, so that I can seamlessly manage translations without confusion.
     - **Acceptance Criteria:** Implement a more intuitive language toggle or dropdown menu with clear labels and possibly flags indicating the current language.

4. **Improve Translation Entry and Confirmation**
   - **Priority Level: Medium**
   - **User Story:** As a restaurant administrator, I need clear guidance and confirmation when entering translations, so that I can ensure accuracy and feel confident that my changes are saved.
     - **Acceptance Criteria:** Clearly label input fields with respective languages using text and placeholder hints. Ensure input fields are adequately sized. Provide a confirmation dialogue or message after saving translations.

5. **Increase Accessibility and Usability**
   - **Priority Level: Medium**
   - **User Story:** As a restaurant administrator with potential accessibility needs, I want a user-friendly interface, so that I can manage the menu without facing usability issues due to design limitations.
     - **Acceptance Criteria:** Ensure high contrast between text and background, use readable font sizes, support keyboard navigation. Include a help section or guided tutorial for first-time users.

By implementing these improvements, the interface will be more user-friendly for restaurant administrators, enhancing their ability to manage and translate menus efficiently and accurately.