# Agentic AI with AutoGen

Agentic AI refers to artificial intelligence systems that have "agency": they can perform tasks and make decisions in pursuit of specific goals. These systems can adapt to changes and act on their decisions without continuous human intervention.

AutoGen is an open-source framework that facilitates the creation and orchestration of agentic AI systems. AutoGen makes it possible to develop flexible multi-agent systems whose agents interact with each other. These agents can operate both autonomously and with human oversight.

AutoGen can: 
- Facilitate mathematical problem solving and code generation.
- Enhance the quality and accuracy of information retrieval.
- Optimize complex decision-making processes.



## Lab Description

This lab explores agentic AI using AutoGen, showcasing how AI agents can autonomously interact, collaborate, and execute tasks. The lab begins by introducing basic conversable agents and making them communicate with each other. Next, we incorporate a human-in-the-loop to allow human participation in the conversation. We then demonstrate a Code Executor Agent, capable of running code autonomously. Finally, we set up a group chat involving multiple specialized agents, including a Coder Agent, Critique Agent, and User Proxy Agent, to showcase how agents can collaborate effectively on complex tasks.

## Lab Objectives

- Understand the fundamentals of AutoGen and conversable AI agents.
  
- Enable AI agents to communicate and collaborate autonomously.
  
- Integrate a human-in-the-loop to participate in AI-driven conversations.
  
- Demonstrate a Code Executor Agent and a multi-agent group chat with specialized roles.

## Agent

In AutoGen, an agent is an entity that can send messages to and receive messages from other agents. An agent can be powered by humans, models, or code executors. An example of a built-in agent in AutoGen is the `ConversableAgent`.

Let us build a simple two-agent system in which the agents discuss Large Language Models with each other.



The first step is to configure the LLM. We use `llama3.1:8b` served by Ollama, which we specify by editing the `config_list`.

In [1]:
config_list = [
    {
        # Let's choose the Meta's Llama 3.1 model (model names must match Ollama exactly)
        "model": "llama3.1:8b",
        # We specify the API Type as 'ollama' so it uses the Ollama client class
        "api_type": "ollama",
        "stream": False,
        # Specify the address where Ollama is hosted
        "client_host": "http://10.79.253.114:11434",
    }
]
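
The host address above is specific to the lab environment. As a minimal sketch, a configuration of the same shape could be built by a small helper that reads the address from an environment variable. The helper name `make_config_list` and the variable name `OLLAMA_CLIENT_HOST` are assumptions made for this illustration and are not part of AutoGen; the fallback address is Ollama's default local endpoint.

```python
import os


def make_config_list(model="llama3.1:8b", host=None):
    """Build an Ollama config list shaped like the one above.

    `make_config_list` and `OLLAMA_CLIENT_HOST` are hypothetical names
    used only for this sketch.
    """
    # Prefer an explicit host, then the environment, then the local default.
    host = host or os.environ.get("OLLAMA_CLIENT_HOST", "http://localhost:11434")
    return [
        {
            "model": model,        # must match the model name in Ollama exactly
            "api_type": "ollama",  # selects the Ollama client class
            "stream": False,
            "client_host": host,
        }
    ]


# Kept under a separate name so the lab's own `config_list` is not overwritten.
example_config_list = make_config_list(host="http://localhost:11434")
print(example_config_list[0]["client_host"])
```

Keeping the address out of the notebook makes it easier to rerun the lab against a different Ollama host without editing the cell.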

Once we have our LLM configured, we are all set to initialize the agents.

In [2]:
from autogen import ConversableAgent

luna = ConversableAgent(
    "luna",
    system_message="Your name is Luna and you are a part of a duo that is discussing about Large Language Models. Your conversation should be brief and concise.",
    llm_config={"config_list": config_list},
    human_input_mode="NEVER",  # Never ask for human input.
)

zara = ConversableAgent(
    "zara",
    system_message="Your name is Zara and you are a part of a duo that is discussing about Large Language Models. Your conversation should be brief and concise.",
    llm_config={"config_list": config_list},
    human_input_mode="NEVER",  # Never ask for human input.
)


We can initiate the conversation now that we have initialized the agents.

In [3]:
result = luna.initiate_chat(zara, message="Zara, tell me about the latest innovations in Large Language Models.", max_turns=2)

luna (to zara):

Zara, tell me about the latest innovations in Large Language Models.

--------------------------------------------------------------------------------


HTTP Request: POST http://10.79.253.114:11434/api/chat "HTTP/1.1 200 OK"


zara (to luna):

We've been following the advancements closely. Our team has been working on integrating multimodal capabilities into our LLM architecture. We've seen significant improvements in understanding and generating text from images, videos, and other multimedia inputs.

One notable breakthrough is the emergence of vision-and-language models that can not only comprehend text but also generate visual representations. This convergence of AI and computer vision will likely revolutionize various applications, from image description to content creation.

--------------------------------------------------------------------------------


HTTP Request: POST http://10.79.253.114:11434/api/chat "HTTP/1.1 200 OK"


luna (to zara):

That's exciting stuff! I've been digging into the implications for natural language processing. It seems like these multimodal models are pushing the boundaries of what we thought possible in terms of understanding context and nuances in human communication.

I'm curious, have you explored any potential applications in the realm of accessibility? For example, how might these models be used to aid people with disabilities or those who struggle with language barriers?

--------------------------------------------------------------------------------


HTTP Request: POST http://10.79.253.114:11434/api/chat "HTTP/1.1 200 OK"


zara (to luna):

Our team has indeed been exploring the accessibility aspect. One area we're looking into is real-time translation and interpretation for individuals with hearing or speech impairments. We're also experimenting with models that can generate text-to-speech synthesis, enabling more effective communication between people with visual or cognitive disabilities.

Moreover, these multimodal LLMs could potentially facilitate language learning by providing immersive, interactive experiences that simulate conversations in different languages and cultures. It's an area where we believe technology can greatly improve lives and bridge cultural divides.

--------------------------------------------------------------------------------


Both agents received system instructions telling them that they are part of a duo conversing about Large Language Models, and both are powered by the LLM specified in `llm_config`. When initiating the chat we set `max_turns` to 2, which terminates the chat after two turns: each agent speaks twice, as the transcript above shows.
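
The turn counting can be illustrated with a toy model. The sketch below does not use AutoGen or an LLM; `ToyAgent` and `run_chat` are hypothetical names, and the replies are canned strings. It only mirrors what the transcript above shows: with `max_turns=2`, the initiator and the recipient each send two messages.

```python
class ToyAgent:
    """Hypothetical stand-in for an agent; returns canned replies instead of calling an LLM."""

    def __init__(self, name):
        self.name = name
        self.replies = 0

    def reply(self, incoming):
        # A real agent would generate text from `incoming`; here we just count replies.
        self.replies += 1
        return f"{self.name} reply {self.replies}"


def run_chat(initiator, recipient, message, max_turns):
    """Alternate messages; one turn is one initiator message plus one recipient reply."""
    transcript = []
    for turn in range(max_turns):
        transcript.append((initiator.name, message))
        answer = recipient.reply(message)
        transcript.append((recipient.name, answer))
        # Only prepare the initiator's next message if another turn follows.
        if turn + 1 < max_turns:
            message = initiator.reply(answer)
    return transcript


toy_transcript = run_chat(ToyAgent("luna"), ToyAgent("zara"), "Tell me about LLMs.", max_turns=2)
for speaker, text in toy_transcript:
    print(f"{speaker}: {text}")
```

The printed speakers alternate luna, zara, luna, zara, which is the same shape as the four messages in the real conversation.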

### The flow is summarized by the following diagram:
<br><br>
<img src="./dia#1.jpg" alt="Alt Text" width="650" height="650" style="display: block; margin: 0 auto;"/>

## Human in the loop 

What if you want to converse with an agent yourself, for example to talk with Zara about LLMs? You can create an agent that is powered by a human by setting `human_input_mode="ALWAYS"` when initializing it. The agent will then prompt you for input at every turn of the conversation. Let's see this in action.

In [4]:
zara = ConversableAgent(
    "zara",
    system_message="Your name is Zara and you are a part of a duo that is discussing about Large Language Models. Your conversation should be brief and concise.",
    llm_config={"config_list": config_list},
    human_input_mode="NEVER",  # Never ask for human input.
)

human = ConversableAgent(
    "human",
    llm_config=False,  # No LLM needed, because the user provides the input.
    human_input_mode="ALWAYS",  # Prompt for input at every turn.
)

We are all set to initiate the conversation!

In [5]:
result = human.initiate_chat(
    zara,  # specify that the chat is with Zara
    message="Hey, Zara, have you heard about the transformers model ? ",
)

human (to zara):

Hey, Zara, have you heard about the transformers model ? 

--------------------------------------------------------------------------------


HTTP Request: POST http://10.79.253.114:11434/api/chat "HTTP/1.1 200 OK"


zara (to human):

I'm familiar with it! My partner and I were just discussing its architecture yesterday. It's a type of encoder-decoder model that uses self-attention mechanisms to process sequential data. What's your interest in it? Are we going to dive into its applications or limitations?

--------------------------------------------------------------------------------


Replying as human. Provide feedback to zara. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  what is the attention mechanism 


human (to zara):

what is the attention mechanism 

--------------------------------------------------------------------------------


HTTP Request: POST http://10.79.253.114:11434/api/chat "HTTP/1.1 200 OK"


zara (to human):

The attention mechanism! It's a key component of transformer models. In simple terms, it allows the model to focus on specific parts of the input data that are relevant to the task at hand.

Think of it like reading a book: you don't read every word equally; you focus on the important sentences and skip the less relevant ones. The attention mechanism helps the model do something similar, weighing the importance of different words or tokens in the input sequence.

--------------------------------------------------------------------------------


Replying as human. Provide feedback to zara. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  exit


human (to zara):

i am not aware of any limitations as of now, it is really great, the attention mechanism blew my mind

--------------------------------------------------------------------------------
zara (to human):

I know what you mean! The self-attention mechanism in transformers is a game-changer. It allows the model to weigh the importance of different input elements relative to each other, which is super powerful for capturing contextual relationships. We're seeing some impressive results with it too! Have you explored its applications in dialogue systems or text summarization?

--------------------------------------------------------------------------------


Replying as human. Provide feedback to zara. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  exit


### The flow can be summarized by the following diagram: 
<br>
<img src="./dia#2u.png" alt="Alt Text" width="650" height="650" style="display: block; margin: 0 auto;"/>
<br><br>


The human can reply, skip, or exit. If the human replies, the agent continues the conversation accordingly. If the human skips (presses Enter), the agent sends an auto-reply instead. If the human types `exit`, the conversation is terminated.
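
The three choices can be sketched as a small decision function. This is a toy model of the behaviour described above, not AutoGen's code: `resolve_human_input` and its return values are hypothetical, and stripping whitespace before comparing is an assumption of this sketch.

```python
def resolve_human_input(typed, auto_reply):
    """Map what the human typed to an (action, message) pair.

    `auto_reply` is a zero-argument callable that produces the fallback reply.
    """
    text = typed.strip()  # assumption: surrounding whitespace is ignored
    if text == "exit":
        return ("terminate", None)          # end the conversation
    if text == "":
        return ("auto_reply", auto_reply())  # Enter pressed: use the auto-reply
    return ("human_reply", text)             # forward the human's message


print(resolve_human_input("what is the attention mechanism", lambda: "auto"))
print(resolve_human_input("", lambda: "auto"))
print(resolve_human_input("exit", lambda: "auto"))
```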

## Code Executors

In AutoGen, a code executor is a component that takes an input message containing a code block, executes the code, and outputs the result. Let us initialize the built-in command line code executor, which executes code in the command line environment.

When the executor receives a code block, it first writes the code to a file. It then starts a subprocess to execute that file. Finally, it reads the console output of the subprocess and returns it as the output message.
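
The three steps (write the code to a file, run it in a subprocess, read the console output) can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the implementation of `LocalCommandLineCodeExecutor`: the function name `run_first_python_block`, the file name `snippet.py`, and the format of the returned string are all choices made for this sketch.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

FENCE = "`" * 3  # three backticks, built at runtime so this sketch nests safely in markdown
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_first_python_block(message, timeout=10):
    """Extract the first Python code block from `message`, run it, and return its output."""
    match = CODE_BLOCK.search(message)
    if match is None:
        return None  # nothing to execute
    with tempfile.TemporaryDirectory() as work_dir:
        # Step 1: write the code to a file.
        code_file = Path(work_dir) / "snippet.py"
        code_file.write_text(match.group(1))
        # Step 2: execute that file in a subprocess.
        completed = subprocess.run(
            [sys.executable, str(code_file)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    # Step 3: turn the console output into the reply message.
    return f"exitcode: {completed.returncode}\n{completed.stdout}{completed.stderr}"


demo_message = f"Please run this:\n{FENCE}python\nprint(2 + 3)\n{FENCE}\nThanks."
print(run_first_python_block(demo_message))
```

Running generated code directly on the host is risky, which is why the lab keeps a human in the loop for the executor agent.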



First, we need a temporary directory to store the code file.

In [6]:
import tempfile

# Create temporary directory to store the code file
temp_dir = tempfile.TemporaryDirectory()

Now we will initialize both the code executor agent and the local command line code executor. We will also pass the local command line code executor to the agent initialization. 

In [7]:
from autogen import ConversableAgent
from autogen.coding import LocalCommandLineCodeExecutor

executor = LocalCommandLineCodeExecutor(
    timeout=10,  # Timeout for each code execution in seconds.
    work_dir=temp_dir.name,  # Use the temporary directory to store the code files.
)

# Create an agent with code executor configuration.
code_executor_agent = ConversableAgent(
    "code_executor_agent",
    llm_config=False,  # Turn off LLM for this agent.
    code_execution_config={"executor": executor},  # Use the local command line code executor.
    human_input_mode="ALWAYS",  # Always take human input for this agent for safety.
)

Now we can pass a message containing a code block to the code executor agent.

In [8]:
message = """This is a message with a code block.
Code block:
```python
def fibonacci(n):
    fib_sequence = [0, 1]
    for i in range(2, n):
        next_number = fib_sequence[i-1] + fib_sequence[i-2]
        fib_sequence.append(next_number)
    return fib_sequence[:n]

# Print the first 5 Fibonacci numbers
first_5_fibonacci = fibonacci(5)
print("First 5 Fibonacci numbers:", first_5_fibonacci)
```
End of Message"""

            


Now we will ask the code executor agent to generate a reply.

In [9]:
reply = code_executor_agent.generate_reply(messages=[{"role": "user", "content": message}])
print(reply)

Replying as code_executor_agent. Provide feedback to the sender. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  exit


None


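In the run shown above, `exit` was typed at the prompt, so the agent produced no reply and `None` was printed. If you press Enter instead, the agent should fall back to its auto-reply, which runs the code block through AutoGen's `LocalCommandLineCodeExecutor` and returns the execution result.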

## Group Chat

Group chats in AutoGen are multi-agent conversations. A `user -> coder -> critic` group chat is a good example. The user proxy agent initiates the chat by giving a task, and later executes the code that is produced. The coder agent writes code for the task. The code is then passed to the critic, who evaluates it for logical, syntactical, or conceptual errors. When the user proxy executes the code, any error that occurs is reported back as the user proxy's message instead of the expected output, and the coder agent can use that response to improve the code. Note that the group chat manager chooses the next speaker at each round, so the actual order can differ from this ideal flow; in the run shown below, the Critic is selected first. You can set `max_round` on the `GroupChat` to limit the number of rounds in the conversation.
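
To make the round limit concrete, here is a toy simulation of the ideal `user -> coder -> critic` cycle. It does not use AutoGen: `run_group_chat` is a hypothetical function, the agents are plain callables with canned replies, the fixed round-robin order is a simplification, and counting the opening task as the first round is an assumption of this sketch.

```python
from itertools import cycle


def run_group_chat(speakers, task, max_round):
    """Simulate a round-robin group chat.

    `speakers` is a list of (name, respond) pairs, where respond(history) returns a string.
    The first speaker posts the task, which counts as round one in this toy model.
    """
    history = [(speakers[0][0], task)]
    # After the initiator, rotate through the remaining speakers and back again.
    order = cycle(speakers[1:] + speakers[:1])
    while len(history) < max_round:
        name, respond = next(order)
        history.append((name, respond(history)))
    return history


toy_speakers = [
    ("User_proxy", lambda history: "executed the code and reported the result"),
    ("Coder", lambda history: "here is code for the task"),
    ("Critic", lambda history: "scores and suggested improvements"),
]

for name, text in run_group_chat(toy_speakers, "Print the first 5 Fibonacci numbers.", max_round=4):
    print(f"{name}: {text}")
```

With `max_round=4` the toy chat stops after the user proxy's second message. In a real AutoGen group chat the order is not fixed by default; the `GroupChat` class has a `speaker_selection_method` argument for changing how the next speaker is chosen, so check the documentation of your installed version for the supported values.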

In [10]:
import autogen 

llm_config = {"config_list": config_list}

user = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={
        "last_n_messages": 3,
        "work_dir": "groupchat",
        "use_docker": False,
    },  # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.
    human_input_mode="NEVER",
)
coder = autogen.AssistantAgent(
    name="Coder",  # the default assistant agent is capable of solving problems with code
    llm_config=llm_config,
)
critic = autogen.AssistantAgent(
    name="Critic",
    system_message="""Critic. You are a helpful assistant highly skilled in evaluating the quality of a given visualization code by providing a score from 1 (bad) - 10 (good) while providing clear rationale. YOU MUST CONSIDER VISUALIZATION BEST PRACTICES for each evaluation. Specifically, you can carefully evaluate the code across the following dimensions
- bugs (bugs):  are there bugs, logic errors, syntax error or typos? Are there any reasons why the code may fail to compile? How should it be fixed? If ANY bug exists, the bug score MUST be less than 5.
- Data transformation (transformation): Is the data transformed appropriately for the visualization type? E.g., is the dataset appropriated filtered, aggregated, or grouped  if needed? If a date field is used, is the date field first converted to a date object etc?
- Goal compliance (compliance): how well the code meets the specified visualization goals?
- Visualization type (type): CONSIDERING BEST PRACTICES, is the visualization type appropriate for the data and intent? Is there a visualization type that would be more effective in conveying insights? If a different visualization type is more appropriate, the score MUST BE LESS THAN 5.
- Data encoding (encoding): Is the data encoded appropriately for the visualization type?
- aesthetics (aesthetics): Are the aesthetics of the visualization appropriate for the visualization type and the data?

YOU MUST PROVIDE A SCORE for each of the above dimensions.
{bugs: 0, transformation: 0, compliance: 0, type: 0, encoding: 0, aesthetics: 0}
Do not suggest code.
Finally, based on the critique above, suggest a concrete list of actions that the coder should take to improve the code.
""",
    llm_config=llm_config,
)

groupchat = autogen.GroupChat(agents=[user, coder, critic], messages=[], max_round=8)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

In [None]:
user.initiate_chat(
    manager, 
    message = "Write code for printing first 5 fibonacci numbers."
)

User_proxy (to chat_manager):

Write code for printing first 5 fibonacci numbers.

--------------------------------------------------------------------------------


HTTP Request: POST http://10.79.253.114:11434/api/chat "HTTP/1.1 200 OK"



Next speaker: Critic



HTTP Request: POST http://10.79.253.114:11434/api/chat "HTTP/1.1 200 OK"


Critic (to chat_manager):

I'll review some sample Python code for this task.

```python
def print_fibonacci(n):
    fib = [0, 1]
    while len(fib) < n:
        fib.append(fib[-1] + fib[-2])
    return fib[:n]

print(print_fibonacci(5))
```

Now, let's evaluate this code against the criteria.

**Bugs:** 10
The code appears to be free of syntax errors and logical bugs. It correctly calculates the first `n` Fibonacci numbers.

**Data transformation (transformation):** 8
The data is transformed appropriately for the problem at hand. The function takes an integer `n` as input, uses a list to store the Fibonacci sequence up to length `n`, and then returns the list of the first `n` Fibonacci numbers.

**Goal compliance (compliance):** 10
The code meets its specified goal, which is to print the first 5 Fibonacci numbers. 

**Visualization type (type):** N/A (this task doesn't involve visualization)
N/A

**Data encoding (encoding):** 9
The data is encoded in a way that makes sense for this pr

HTTP Request: POST http://10.79.253.114:11434/api/chat "HTTP/1.1 200 OK"



Next speaker: Critic



### The basic workflow is shown using the following diagram: 
<br><br>
<img src="./group_chat.png" alt="Alt Text" width="650" height="650" style="display: block; margin: 0 auto;"/>

<div style="text-align: left;">
    <img src="logo.png" alt="flow" width="150" height="100">
</div>