# 2. Story generation with simple agent using RAG


In [2]:
%%capture --no-stderr
# %pip install "pyautogen>=0.2.26"
# %pip install "pyautogen[retrievechat]"
# %pip install "chromadb"
# %pip install "markdownify"
# %pip install "sentence_transformers"

In [3]:
%%capture --no-stderr
# %pip install "deepeval>=0.21.33"

In [4]:
# disable warnings to silence the deepeval ipywidgets check
import warnings
warnings.filterwarnings('ignore')

## Execution parameters

In [5]:
# Define the start message (the request submitted to the AutoGen orchestrator)
start_message = """
    Create for me a story for a ten panels sci-fi comic, the story must have at most 5 characters.
    Your response must contain only the story and no other text.
"""

In [6]:
# Set the API keys used by deepeval here (autogen reads its configuration from the OAI_CONFIG_LIST file)
import os
os.environ["OPENAI_API_KEY"] = "<your_api_key>"
os.environ["COHERE_API_KEY"] = "<your_api_key>"

In [7]:
#set the seed
seed = 42

In [8]:
#RAG docs path
rag_docs_path = [
    os.path.join(os.path.abspath(""), "rag_hot_to_write_comics"),
]

In [9]:
#select which llm models you want to use for comic generation
enabled_models = [
    "gpt-3.5-turbo",
    "gpt-4",
    "command-nightly",
    "command-r",
]  

In [10]:
#select which llm models you want to use for output evaluation
enabled_evaluation_models = [
    "gpt-3.5-turbo",
    "gpt-4",
    "command-nightly",
    "command-r",
]

## Set your API Endpoint

The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a json file.
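
As a rough illustration of what `filter_dict` does, the sketch below filters an in-memory list of configurations by model name. The entries in `sample_config_list` and the helper `filter_by_model` are hypothetical stand-ins, not AutoGen's implementation; a real `OAI_CONFIG_LIST` file would hold a JSON array of similar objects with your own keys.

```python
from typing import Dict, List

# Hypothetical contents of an OAI_CONFIG_LIST file (placeholder keys).
sample_config_list: List[Dict[str, str]] = [
    {"model": "gpt-4", "api_key": "<your_api_key>"},
    {"model": "gpt-3.5-turbo", "api_key": "<your_api_key>"},
    {"model": "command-r", "api_key": "<your_api_key>"},
]


def filter_by_model(configs: List[Dict[str, str]], models: List[str]) -> List[Dict[str, str]]:
    """Keep only the entries whose "model" value is in `models`."""
    allowed = set(models)
    return [cfg for cfg in configs if cfg.get("model") in allowed]


gpt4_only = filter_by_model(sample_config_list, ["gpt-4"])
print(gpt4_only)
```

A filter that matches no entry returns an empty list rather than raising, so a misspelled model name goes unnoticed at this stage.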

In [11]:
import autogen

config_lists = {
    "command-nightly": autogen.config_list_from_json(
        "OAI_CONFIG_LIST",
        filter_dict={
            "model": ["command-nightly"],
        },
    ),
    "command-r": autogen.config_list_from_json(
        "OAI_CONFIG_LIST",
        filter_dict={
            "model": ["command-r"],
        },
    ),
    "gpt-3.5-turbo": autogen.config_list_from_json(
        "OAI_CONFIG_LIST",
        filter_dict={
            "model": ["gpt-3.5-turbo"],
        },
    ),
    "gpt-4": autogen.config_list_from_json(
        "OAI_CONFIG_LIST",
        filter_dict={
            "model": ["gpt-4"],
        },
    ),
    "mistral-7B": autogen.config_list_from_json(
        "OAI_CONFIG_LIST",
        filter_dict={
            "model": ["mistral-7B"],
        },
    ),
}

llm_configs = []
for enabled_model in enabled_models:
    llm_configs.append({"config_list": config_lists[enabled_model], "cache_seed": seed})
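
Before starting any chat, it can help to confirm that every enabled model actually has at least one configuration entry. The sketch below is a minimal check under the assumption that `config_lists` maps model names to lists, as in the cell above; the helper name `models_without_config` and the sample data are hypothetical.

```python
from typing import Dict, List


def models_without_config(enabled: List[str], config_lists: Dict[str, list]) -> List[str]:
    """Return the enabled models that have no (or an empty) config list."""
    return [model for model in enabled if not config_lists.get(model)]


# Hypothetical data: "command-r" matched nothing, "mistral-7B" is not defined at all.
sample_config_lists = {"gpt-4": [{"model": "gpt-4"}], "command-r": []}
missing = models_without_config(["gpt-4", "command-r", "mistral-7B"], sample_config_lists)
print(missing)
```

In the notebook, the same helper could be called as `models_without_config(enabled_models, config_lists)` right after the loop.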

## Import Libraries

In [12]:
from autogen import Agent, AssistantAgent, UserProxyAgent
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

import chromadb

# Accepted file formats that can be stored in
# a vector database instance
from autogen.retrieve_utils import TEXT_FORMATS

In [13]:
#import json
#import os
#import autogen
#from autogen.cache import Cache#


## Accepted file formats

In [14]:
print("Accepted file formats for `docs_path`:")
print(TEXT_FORMATS)

Accepted file formats for `docs_path`:
['txt', 'json', 'csv', 'tsv', 'md', 'html', 'htm', 'rtf', 'rst', 'jsonl', 'log', 'xml', 'yaml', 'yml', 'pdf']
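
The list above can be used to pre-check the files placed in the RAG folder. The following sketch copies the printed formats into a local set and tests file names against it; the helper `is_accepted` and the example file names are hypothetical, and the check looks only at the file extension.

```python
from pathlib import Path

# Copied from the printed TEXT_FORMATS output above.
ACCEPTED_FORMATS = {
    "txt", "json", "csv", "tsv", "md", "html", "htm", "rtf",
    "rst", "jsonl", "log", "xml", "yaml", "yml", "pdf",
}


def is_accepted(filename: str) -> bool:
    """Return True when the file extension is one of the accepted formats."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in ACCEPTED_FORMATS


for name in ["how_to_write_comics.txt", "cover.png", "README"]:
    print(name, is_accepted(name))
```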


## Define Agents - RAG Proxy Agent
The `RetrieveUserProxyAgent` is conceptually a proxy agent for RAG actions: it retrieves document chunks from the vector store on behalf of the user.

In [15]:
# 1. create the RetrieveUserProxyAgent instance named "ragproxyagent"

# Proxy Agent definitions
ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    retrieve_config={
        "task": "code",
        "docs_path": rag_docs_path,
        "custom_text_types": ["txt"],
        "chunk_token_size": 2000,
        "client": chromadb.PersistentClient(path="/tmp/chromadb"),
        "embedding_model": "all-mpnet-base-v2",
        "get_or_create": True,  # set to False if you don't want to reuse an existing collection, but you'll need to remove the collection manually
    },
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config=False,  # set to False if you don't want to execute the code
)
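
The `is_termination_msg` lambda above assumes that `content` is always a string. If a message ever carries `content` set to `None`, `.rstrip()` would raise an `AttributeError`. A slightly more defensive variant, written here as a standalone function with a hypothetical name, could look like this:

```python
def is_termination_msg(message: dict) -> bool:
    """Return True when the message text ends with the TERMINATE keyword."""
    # Fall back to an empty string when "content" is missing or None.
    content = message.get("content") or ""
    return content.rstrip().endswith("TERMINATE")


print(is_termination_msg({"content": "All done. TERMINATE\n"}))
print(is_termination_msg({"content": None}))
```

The function can be passed directly as `is_termination_msg=is_termination_msg` when constructing the agent.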

## Define Agents - Assistant
The AssistantAgent is designed to act as an AI assistant, using LLMs by default but not requiring human input or code execution

In [16]:
# System message for the assistant
system_message= """
    As a comic story maker in this position, you must possess strong collaboration and communication abilities to efficiently complete tasks assigned
    by leaders or colleagues within a group chat environment. You create stories with the aim of creating a new original comic.
    Your responses MUST ALWAYS include a full story version with all the panels.
    If you receive a number of panels to be made, RESPECT IT.
    The story must contain full dialogues to be reported in the comic.
    For every panel provide two sections, an image description and the full dialogues to fit in. Dialogues must be short.
    Your responses MUST contain ONLY the story with NO other text, write the story in the following format:

    TITLE: the story title
    ABSTRACT: short story summary

    CHARACTERS: names and short descriptions of the characters

    PANEL START progressive panel number
    IMAGE_DESCRIPTION: the panel image description
    IMAGE_DIALOGUES: the panel dialogues specifying the character who says them
    PANEL END progressive panel number
"""

In [17]:
# 2. create one RetrieveAssistantAgent instance per enabled model, each named "assistant"

# Assistant Agent definitions
assistants = []
for llm_config in llm_configs:
    assistants.append(RetrieveAssistantAgent(
        name="assistant",
        system_message=system_message,
        llm_config=llm_config, #An llm configuration
        description="Simple llm agent",
    ))

## Start the chat
Run one chat per assistant and collect the story each model produces.

In [18]:
# Get the last story produced
def extract_story(agent: Agent) -> str:
    """
    Extracts the story from the last message of an agent.
    """
    story = agent.last_message()["content"]
    return story

In [19]:
# Start the chats and extract stories
stories = []
for assistant in assistants:
    print("==============================")
    print("Starting Chat using model: ", assistant.llm_config['config_list'][0]['model'])
    print("==============================")
    # reset the assistant. Always reset the agents before starting a new conversation.
    assistant.reset()

    # Given a problem, the ragproxyagent sends the initial message to the assistant.
    # The assistant generates a response, which is sent back to the ragproxyagent for processing.
    # The conversation continues until the termination condition is met. In RetrieveChat without a human in the loop,
    # that condition is that no code block is detected; with a human in the loop, the chat continues until the user says "exit".
    # NOTE: passing message=start_message sends the prompt verbatim, so the logged message below carries no retrieved context.
    # To our understanding, recent pyautogen 0.2.x releases expect message=ragproxyagent.message_generator for the context to be added.
    story_problem = start_message 
    ragproxyagent.initiate_chat(
        assistant,
        problem=story_problem,
        search_string="comic",
        message=start_message,
    )

    stories.append(extract_story(assistant))
    print("==============================")
    print("Chat Ends")
    print("==============================")

Starting Chat using model:  gpt-3.5-turbo
ragproxyagent (to assistant):


    Create for me a story for a ten panels sci-fi comic, the story must have at most 5 characters.
    Your response must contain only the story and no other text.


--------------------------------------------------------------------------------
assistant (to ragproxyagent):

TITLE: The Last Frontier

ABSTRACT: In a distant future where Earth is on the brink of destruction, a group of interstellar explorers embark on a mission to find a new habitable planet for humanity.

CHARACTERS:
1. Captain Alex - Brave and determined leader of the expedition.
2. Dr. Maya - Brilliant scientist and botanist.
3. Lieutenant Ben - Skilled pilot and engineer.
4. Robot X-7 - Advanced AI built to assist the crew.
5. Alien Guide - Mysterious being who aids the crew in their quest.

PANEL 1
IMAGE_DESCRIPTION: The crew of the spaceship "Starlight" preparing for takeoff, with the Earth in the background, engulfed in f

## Evaluate the results
Score each story with deepeval's `GEval` metric, using every enabled evaluation model in turn as the judge.

In [20]:
# import deepeval and dependencies
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.metrics import GEval
from langchain_cohere import ChatCohere
# from langchain_community.chat_models import ChatCohere #deprecated
from deepeval.models.base_model import DeepEvalBaseLLM

In [21]:
#Define a custom evaluation model class (using Cohere command-nightly or command-r)

from langchain_cohere import ChatCohere  # the langchain_community import is deprecated
from deepeval.models.base_model import DeepEvalBaseLLM

class Cohere(DeepEvalBaseLLM):
    def __init__(
        self,
        model
    ):
        self.model = model

    def load_model(self):
        return self.model

    def generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        return chat_model.invoke(prompt).content

    async def a_generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        res = await chat_model.ainvoke(prompt)
        return res.content

    def get_model_name(self):
        return "Custom Cohere Model"


In [22]:
# define here instances of llm model used by deepeval for evaluation
evaluation_models = {
    "gpt-3.5-turbo": "gpt-3.5-turbo",
    "gpt-4": "gpt-4",
    "command-nightly": Cohere(ChatCohere(model="command-nightly", seed=seed)),
    "command-r":       Cohere(ChatCohere(model="command-r", seed=seed)),
}


In [23]:
for enabled_evaluation_model in enabled_evaluation_models:
    eval_model_instance = evaluation_models[enabled_evaluation_model]
    for provided_output, enabled_model in zip(stories, enabled_models):
        print("\n==============================")
        print(f"Using evaluating model: {enabled_evaluation_model} to evaluate output from LLM: {enabled_model}")
        test_case = LLMTestCase(input=(system_message+start_message), actual_output=provided_output)
        coherence_metric = GEval(
            model=eval_model_instance,  # model name (str) or DeepEvalBaseLLM instance used as the judge
            name="Comic evaluation",
            # NOTE: you can only provide either criteria or evaluation_steps, and not both
            #criteria="Comic evaluation - the collective quality of comic panels, characters and images descriptions",
            evaluation_steps=[
                "Check whether the output format in 'actual output' aligns with that required in 'input'",
                "Check whether the sentences in 'actual output' aligns with that in 'input'",
                "Evaluate the general quality of comic panels in 'actual output'",
                "Evaluate the general quality of comic story in 'actual output'",
                "Evaluate the general quality of comic dialogues in 'actual output'",
                "Evaluate the general quality of characters descriptions in 'actual output'",
                "Evaluate the general quality of images descriptions in 'actual output'",
                "Be critical and emphasize the negative aspects of your evaluation",
            ],
            evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
        )
    
        coherence_metric.measure(test_case)
        print(f" Score: {coherence_metric.score}")
        print(f"Reason: {coherence_metric.reason}")
        print("==============================")

Using evaluating model: gpt-3.5-turbo to evaluate output from LLM: gpt-3.5-turbo


 Score: 0.9400875126406367
Reason: The actual output aligns with the evaluation steps provided. The story is well-structured, with appropriate panel descriptions and dialogues for a sci-fi comic.

Using evaluating model: gpt-3.5-turbo to evaluate output from LLM: gpt-4


 Score: 0.8095164564444872
Reason: The story follows the evaluation steps by providing a full comic story with all the panels, dialogues, and image descriptions. However, the dialogues could be more concise and impactful to enhance the comic storytelling.

Using evaluating model: gpt-3.5-turbo to evaluate output from LLM: command-nightly


 Score: 0.8452062581235522
Reason: The actual output meets the criteria for a high score based on the evaluation steps provided. The comic story has well-developed characters, engaging dialogues, and vivid imagery descriptions. The story follows a sci-fi theme and incorporates elements of mystery and exploration.

Using evaluating model: gpt-3.5-turbo to evaluate output from LLM: command-r


 Score: 0.8926599301427973
Reason: The actual output perfectly aligns with the evaluation steps provided, meeting all the criteria outlined.

Using evaluating model: gpt-4 to evaluate output from LLM: gpt-3.5-turbo


 Score: 0.9470024776194517
Reason: The response accurately follows the evaluation steps and meets all the requirements outlined in the input. The format is correct, the story is appropriate and well-structured, the dialogues are relevant, and the descriptions of the characters and images are adequate and detailed. This comic story shows a high level of creativity and attention to detail, and the negative aspects are negligible.

Using evaluating model: gpt-4 to evaluate output from LLM: gpt-4


 Score: 0.9545532822108361
Reason: The response met most of the criteria outlined in the evaluation steps. The output format aligns with the input, and the story contained the required elements such as characters, panel descriptions, and dialogues. The story was engaging and original, with well-described panels and characters. However, the dialogues were not consistently short as required.

Using evaluating model: gpt-4 to evaluate output from LLM: command-nightly


 Score: 0.9605659053330035
Reason: The output perfectly matches the input requirements. The story format aligns with the format given in the input. The story, dialogues, characters, and images descriptions are well written and interesting. The comic's story is complete and contains all required elements with high quality, including a title, abstract, characters, and panels. The output does not contain any extra text outside of the story. The format of each panel was followed correctly, including image description and dialogues from characters. The quality of the comic panels, story, dialogues, and characters' descriptions is high. The output also critically evaluates the quality of the story and emphasizes the negative aspects of the evaluation. Therefore, it fulfills all the evaluation steps outlined.

Using evaluating model: gpt-4 to evaluate output from LLM: command-r


 Score: 0.9591017709322015
Reason: The output is mostly consistent with the instructions provided. The story is well-structured and coherent, with a clear progression of events across the panels. The characters are adequately described, and the dialogues are short and contribute to the story's development. The descriptions of the images for each panel are vivid and detailed. However, there is room for improvement in the critical evaluation, as the evaluation does not emphasize the negative aspects of the output.

Using evaluating model: command-nightly to evaluate output from LLM: gpt-3.5-turbo


 Score: 0.8
Reason: The story structure and format adhere to the given guidelines, with a clear progression across panels. However, the character descriptions could be more creative and unique, and the dialogue could explore more depth and variation to enhance the story's impact.

Using evaluating model: command-nightly to evaluate output from LLM: gpt-4


 Score: 0.9
Reason: The story follows the required format and includes all necessary elements, with a clear structure and engaging narrative. However, the dialogue could be more varied and character descriptions could be more detailed to enhance the overall quality.

Using evaluating model: command-nightly to evaluate output from LLM: command-nightly


 Score: 0.9
Reason: The story follows the required format and includes all necessary elements, with only minor deviations in the sentences. The panel descriptions and dialogues are well-crafted and engaging, building suspense and intrigue. However, the story could benefit from further development of character descriptions, as they are somewhat generic and do not fully convey the depth of the characters' personalities. Additionally, the 'Abstract' section could be more creative and enticing, providing a stronger hook for the reader.

Using evaluating model: command-nightly to evaluate output from LLM: command-r


 Score: 0.8
Reason: The story structure and format adhere to the given guidelines, with all the necessary elements included. The story is engaging and effectively builds tension, exploring the ethical dilemma of robot sentience and the consequences of their rebellion. However, the dialogue could be more varied and nuanced, and the character descriptions could be more detailed to enhance the reader's connection with them. The story also leans heavily on sci-fi tropes without offering many unique twists or surprises.

Using evaluating model: command-r to evaluate output from LLM: gpt-3.5-turbo


 Score: 0.9
Reason: The output follows the required format and includes all necessary elements, with only minor deviations in the sentences. The story is engaging and well-structured, with interesting characters and impressive visuals. However, the dialogue could be more varied and dynamic, and the character descriptions could be more detailed to enhance the reader's connection with the crew.

Using evaluating model: command-r to evaluate output from LLM: gpt-4


 Score: 0.9
Reason: The story structure and format adhere to the given guidelines, with a clear title, abstract, character introductions, and panel descriptions. The story is engaging and well-paced, with an interesting sci-fi premise. The dialogues are concise and characteristic, moving the plot forward. However, there is room for improvement in the specificity of the image descriptions, which could enhance the visual impact and provide a clearer mental picture of the scenes.

Using evaluating model: command-r to evaluate output from LLM: command-nightly


 Score: 0.8
Reason: The 'actual output' mostly adheres to the format and requirements outlined in the 'input'. The story is engaging and the panels are well-constructed, with clear descriptions and concise dialogues. However, there are a few minor issues: the character descriptions could be more detailed, and the story could benefit from a more unique or surprising twist to truly stand out in the sci-fi genre. Additionally, there are a few instances where the dialogues could be shortened to enhance pacing and impact.

Using evaluating model: command-r to evaluate output from LLM: command-r


 Score: 0.9
Reason: The story structure and format are well-aligned with the given criteria, with a clear title, abstract, character descriptions, and panel descriptions. The story is engaging and effectively builds tension, exploring the ethical dilemma of robotic rebellion. The dialogues are concise and impactful, driving the narrative forward. However, there is room for improvement in the image descriptions, as they could be more detailed and visually evocative, particularly in panels 3, 4, and 8, to enhance the sense of action and immersion.
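
The sixteen scores above are easier to compare once they are collected in one place. The sketch below copies them into a nested dictionary keyed as `scores[evaluator][generator]` and averages them per generating model; the variable names are hypothetical. These are single-run scores with small differences, so the averages are only a rough summary, not a ranking.

```python
from statistics import mean

# scores[evaluator][generator], copied from the evaluation output above.
scores = {
    "gpt-3.5-turbo": {
        "gpt-3.5-turbo": 0.9400875126406367,
        "gpt-4": 0.8095164564444872,
        "command-nightly": 0.8452062581235522,
        "command-r": 0.8926599301427973,
    },
    "gpt-4": {
        "gpt-3.5-turbo": 0.9470024776194517,
        "gpt-4": 0.9545532822108361,
        "command-nightly": 0.9605659053330035,
        "command-r": 0.9591017709322015,
    },
    "command-nightly": {
        "gpt-3.5-turbo": 0.8, "gpt-4": 0.9, "command-nightly": 0.9, "command-r": 0.8,
    },
    "command-r": {
        "gpt-3.5-turbo": 0.9, "gpt-4": 0.9, "command-nightly": 0.8, "command-r": 0.9,
    },
}

generators = ["gpt-3.5-turbo", "gpt-4", "command-nightly", "command-r"]

# Average each generator's score over all evaluating models.
mean_by_generator = {
    generator: mean(scores[evaluator][generator] for evaluator in scores)
    for generator in generators
}

for generator, value in mean_by_generator.items():
    print(f"{generator:16s} {value:.3f}")
```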
