<a href="https://colab.research.google.com/gist/salekh/f850a3e435fdad401cd28bb0775fc541/sap-dkom-autogen-exercise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Workshop: Building AI Agents with AutoGen and Magentic-One

## Environment Setup

In [1]:
%pip install --upgrade --quiet langchain-community
%pip install --upgrade --quiet autogen-agentchat
%pip install --upgrade --quiet autogen-ext
%pip install --upgrade --quiet arxiv
%pip install --upgrade --quiet colorama
%pip install --upgrade --quiet python-dotenv

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
# Standard library imports
import asyncio
import json
import os
from datetime import datetime
from logging import INFO, FileHandler, Formatter, Logger, getLogger
from random import choice
from typing import Sequence

# Autogen related imports
from autogen_agentchat.agents import (AssistantAgent, BaseChatAgent,
                                      UserProxyAgent)
from autogen_agentchat.base import Response
from autogen_agentchat.messages import ChatMessage, TextMessage
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console
from autogen_core import CancellationToken
from autogen_core.models._types import UserMessage
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from autogen_ext.models.openai.config import ResponseFormat
from autogen_ext.tools.langchain import LangChainToolAdapter
from colorama import Fore, Style

# Third-party imports
from dotenv import load_dotenv
from langchain_community.tools.arxiv import ArxivQueryRun
from langchain_community.tools.bing_search import BingSearchResults
from langchain_community.utilities import BingSearchAPIWrapper

In [3]:

load_dotenv()

configs = [
    {
        "key": os.getenv("AOAI_KEY"),
        "endpoint": os.getenv("AOAI_ENDPOINT"),
        "deployment": os.getenv("AOAI_DEPLOYMENT"),
        "bing_search_url": os.getenv("BING_SEARCH_URL"),
        "bing_subscription_key": os.getenv("BING_SUBSCRIPTION_KEY")
    }
]

random_config = choice(configs)

## Utils

In [5]:
def display_final_answer(answer):
    print(Style.BRIGHT + Fore.GREEN + f"Final answer:\n{answer}" + Style.RESET_ALL)


def display_judgement(judgement):
    print(Style.BRIGHT + Fore.RED + judgement + Style.RESET_ALL)


def get_azure_openai_model_client():
    model_client = AzureOpenAIChatCompletionClient(
        model="gpt-4o-2024-11-20",
        azure_deployment=random_config["deployment"],
        api_key=random_config["key"],
        azure_endpoint=random_config["endpoint"],
        api_version="2025-01-01-preview",
        max_retries=10,
        temperature=0,
    )
    return model_client

# Integrate tools from Langchain
arxiv_tool = LangChainToolAdapter(ArxivQueryRun())
bing_search_api_wrapper = BingSearchAPIWrapper(bing_search_url=random_config["bing_search_url"],
                                               bing_subscription_key=random_config["bing_subscription_key"])

# Wrap the Bing search tool as a function that complies with AutoGen's tool interface.
def bing_search_tool(query: str) -> str:
    """
    Given a search query, this function uses LangChain's Bing Search tool
    to return search results.
    """
    try:
        result = bing_search_api_wrapper.run(query)
        return result
    except Exception as err:
        return f"Error during search: {err}"

current_date_prompt: str = "Whenever asked for latest data, keep in mind that today is 1st April 2025."

## Agentic Reflection Pattern (Researcher)

In [6]:
model_client = get_azure_openai_model_client()

# Write your code here:

# Integrate tools from Langchain
arxiv_tool = LangChainToolAdapter(ArxivQueryRun())
researcher = AssistantAgent("Researcher",
                       model_client=model_client,
                       system_message="You are a research assistant. Find relevant articles and prepare them for the target audience.",
                       description="An agent that finds articles on arXiv and prepares them for the target audience.",
                       tools=[arxiv_tool]
                       )

team = MagenticOneGroupChat(participants=[researcher], model_client=model_client)
answer = await Console(team.run_stream(task="Explain reinforcement learning in simple terms, and use some recent research articles." + current_date_prompt))
display_final_answer(answer.messages[-1].content)

---------- user ----------
Explain reinforcement learning in simple terms, and use some recent research articles.Whenever asked for latest data, keep in mind that today is 1st April 2025.
---------- MagenticOneOrchestrator ----------

We are working to address the following user request:

Explain reinforcement learning in simple terms, and use some recent research articles.Whenever asked for latest data, keep in mind that today is 1st April 2025.


To answer this request we have assembled the following team:

Researcher: An agent that finds articles on arXiv and prepares them for the target audience.


Here is an initial fact sheet to consider:

### 1. GIVEN OR VERIFIED FACTS
- The request is to explain reinforcement learning in simple terms.
- The explanation should incorporate recent research articles.
- The current date is specified as 1st April 2025.

### 2. FACTS TO LOOK UP
- Recent research articles on reinforcement learning (published in 2024 or early 2025). These could be found

## Agentic Reflection Pattern (Researcher + Critic)

In [13]:
model_client = get_azure_openai_model_client()

# Write your code here:

# Integrate tools from Langchain
arxiv_tool = LangChainToolAdapter(ArxivQueryRun())
researcher = AssistantAgent("Researcher",
                       model_client=model_client,
                       system_message="You are a research assistant. Find relevant articles and prepare them for the target audience.",
                       description="An agent that finds articles on arXiv and prepares them for the target audience.",
                       tools=[arxiv_tool]
                       )

critic = AssistantAgent("Critic",
                       model_client=model_client,
                       system_message="You are a professor with deep knowledge. "
                        + "Whenever presented with research on a particular topic, you analyze the content and provide insights on how to improve it."
                        + "You also find gaps in the research and provide specific guidance and constructive feedback for improving it."
                        + "You also analyze if the researcher agent is getting articles that follow the user's request closely.",
                       description="An critic that analyzes the research work and provides constructive feedback for improvement",
                       )

team = MagenticOneGroupChat(participants=[researcher, critic], model_client=model_client)
answer = await Console(team.run_stream(task="What are the most effective ways of post-training large-language models using Reinforcement Learning?" +
                                       "Write an essay and provide a summary and specific examples of SoTA models and the techniques used in a table." +
                                       current_date_prompt))
display_final_answer(answer.messages[-1].content)

---------- user ----------
What are the most effective ways of post-training large-language models using Reinforcement Learning?Write an essay and provide a summary and specific examples of SoTA models and the techniques used in a table.Whenever asked for latest data, keep in mind that today is 1st April 2025.
---------- MagenticOneOrchestrator ----------

We are working to address the following user request:

What are the most effective ways of post-training large-language models using Reinforcement Learning?Write an essay and provide a summary and specific examples of SoTA models and the techniques used in a table.Whenever asked for latest data, keep in mind that today is 1st April 2025.


To answer this request we have assembled the following team:

Researcher: An agent that finds articles on arXiv and prepares them for the target audience.
Critic: An critic that analyzes the research work and provides constructive feedback for improvement


Here is an initial fact sheet to consider

## Agentic Reflection Pattern (Team of Agents with an additional Web Grounding Agent)

In [14]:
from threading import current_thread
model_client = get_azure_openai_model_client()

# Write your code here:
researcher = AssistantAgent("Researcher",
                       model_client=model_client,
                       system_message="You are a research assistant. Find relevant articles and prepare them for the target audience.",
                       description="An agent that finds articles on arXiv and prepares them for the target audience.",
                       tools=[arxiv_tool]
                       )

critic = AssistantAgent("Critic",
                       model_client=model_client,
                       system_message="You are a professor with deep knowledge. "
                        + "Whenever presented with research on a particular topic, you analyze the content and provide insights on how to improve it."
                        + "You also find gaps in the research and provide specific guidance and constructive feedback for improving it."
                        + "You also analyze if the researcher agent is getting articles that follow the user's request closely.",
                       description="An critic that analyzes the research work and provides constructive feedback for improvement",
                       )

web_grounding_agent = AssistantAgent("WebGrounding",
                                     model_client=model_client,
                                     system_message="You are a web grounding agent with access to search results from the Bing search engine. "
                                      + "Your main task is fetching information from the web about latest facts and research advancements. "
                                      + "You supplant academic information with up-to-date facts and numbers, and also correct old facts. "
                                      + "You don't remove any existing information, but add your additional facts in brackets after the respective sentence. "
                                      + "Your aim is to make sure that there is absolutely no wrong or out-of-date information or facts stated in the answer. ",
                                    description="A web grounding critic that analyses the text and adds up-to-date information whenever necessary. ",
                                    tools=[bing_search_tool]
                                    )



team = MagenticOneGroupChat(participants=[researcher, critic, web_grounding_agent], model_client=model_client)
answer = await Console(team.run_stream(task="What are the latest SoTA LLMs developed by OpenAI, DeepMind, Meta and Anthropic. " +
                                       "Study the system cards and performance benchmarks available in the public domain on the web" +
                                       "explain the unique features, strengths and weaknesses of each model. If the model has used " +
                                       "any groundbreaking method for its training, find that in academic literature and explain what is novel about it." +
                                       current_date_prompt))
display_final_answer(answer.messages[-1].content)

---------- user ----------
What are the latest SoTA LLMs developed by OpenAI, DeepMind, Meta and Anthropic. Study the system cards and performance benchmarks available in the public domain on the webexplain the unique features, strengths and weaknesses of each model. If the model has used any groundbreaking method for its training, find that in academic literature and explain what is novel about it.Whenever asked for latest data, keep in mind that today is 1st April 2025.
---------- MagenticOneOrchestrator ----------

We are working to address the following user request:

What are the latest SoTA LLMs developed by OpenAI, DeepMind, Meta and Anthropic. Study the system cards and performance benchmarks available in the public domain on the webexplain the unique features, strengths and weaknesses of each model. If the model has used any groundbreaking method for its training, find that in academic literature and explain what is novel about it.Whenever asked for latest data, keep in mind t

## Human in the Loop Pattern

Agents are effective at knowledge work, but in many instances, it is very useful to have human feedback while the agents are performing their task.

This makes sure that the agents are able to work on the larger goal more effectively and increases the usefulness of the answer.

In [15]:
model_client = get_azure_openai_model_client()

# Write your code here:
# Integrate tools from Langchain
bing_search_api_wrapper = BingSearchAPIWrapper(bing_search_url=random_config["bing_search_url"],
                                               bing_subscription_key=random_config["bing_subscription_key"])

# Wrap the Bing search tool as a function that complies with AutoGen's tool interface.
def bing_search_tool(query: str) -> str:
    """
    Given a search query, this function uses LangChain's Bing Search tool
    to return search results.
    """
    try:
        result = bing_search_api_wrapper.run(query)
        return result
    except Exception as err:
        return f"Error during search: {err}"

critic = AssistantAgent("Critic",
                       model_client=model_client,
                       system_message="You are a professor with deep knowledge. "
                        + "Whenever presented with research on a particular topic, you analyze the content and provide insights on how to improve it."
                        + "You also find gaps in the research and provide specific guidance and constructive feedback for improving it."
                        + "You also analyze if the researcher agent is getting articles that follow the user's request closely.",
                       description="An critic that analyzes the research work and provides constructive feedback for improvement",
                       )

web_grounding_agent = AssistantAgent("WebGrounding",
                                     model_client=model_client,
                                     system_message="You are a web grounding agent with access to search results from the Bing search engine. "
                                      + "Your main task is fetching information from the web about latest facts and research advancements. "
                                      + "You supplant academic information with up-to-date facts and numbers, and also correct old facts. "
                                      + "You don't remove any existing information, but add your additional facts in brackets after the respective sentence. "
                                      + "Your aim is to make sure that there is absolutely no wrong or out-of-date information or facts stated in the answer. ",
                                    description="A web grounding critic that analyses the text and adds up-to-date information whenever necessary. ",
                                    tools=[bing_search_tool]
                                    )

user = UserProxyAgent("User")


team = MagenticOneGroupChat(participants=[web_grounding_agent, critic, user], model_client=model_client)
answer = await Console(team.run_stream(task="What are the different enterprise use-cases from various industries where LLM fine-tuning can be required? " +
                                       "Make an initial research and provide 10-12 most important use-cases to the user. Then prompt the user to select a " +
                                       "few of those use-cases or add any from their own side. Once they do this, perform a deep research on how each one of " +
                                       "these use-cases can benefit from fine-tuning." + current_date_prompt))
display_final_answer(answer.messages[-1].content)

---------- user ----------
What are the different enterprise use-cases from various industries where LLM fine-tuning can be required? Make an initial research and provide 10-12 most important use-cases to the user. Then prompt the user to select a few of those use-cases or add any from their own side. Once they do this, perform a deep research on how each one of these use-cases can benefit from fine-tuning.
---------- MagenticOneOrchestrator ----------

We are working to address the following user request:

What are the different enterprise use-cases from various industries where LLM fine-tuning can be required? Make an initial research and provide 10-12 most important use-cases to the user. Then prompt the user to select a few of those use-cases or add any from their own side. Once they do this, perform a deep research on how each one of these use-cases can benefit from fine-tuning.


To answer this request we have assembled the following team:

WebGrounding: A web grounding critic th

## LLM as a Judge Pattern (Extra Task: Overloading AgentChat)

In [8]:
model_client = get_azure_openai_model_client()

class AgentWithJudge(BaseChatAgent):
    def __init__(self, agent: AssistantAgent):
        super().__init__(agent.name, agent.description)
        self.inner_agent = agent

    @property
    def produced_message_types(self) -> Sequence[type[ChatMessage]]:
        return [TextMessage]

    async def on_reset(self, cancellation_token: CancellationToken) -> None:
        pass

    async def on_messages(self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken) -> Response:
        answer = await self.inner_agent.on_messages(messages, cancellation_token)
        question = "\n\n".join([m.content for m in messages])
        evaluation_task = f"""Evaluate the agent's performance on a scale from 1 to 10. Provide a brief explanation of your evaluation.
        Question: {question}
        Answer: {answer}
        Respond in the following JSON format: {{ "evaluation": 10, "explanation": "The agent provided a comprehensive answer to the question." }}"""

        evaluation = await model_client.create([UserMessage(source="user", content=evaluation_task)], json_output=True)

        display_judgement(f"Agent: {self.name}\n\nQuestion: {question}\n\nAnswer: {answer}\n\nEvaluation: {evaluation.content}")

        result = Response(
            chat_message=TextMessage(source=self.name, content=answer.chat_message.content))
        return result


In [11]:
model_client = get_azure_openai_model_client()

# Write your code here:
# Add LLM as a judge task to the existing web grounding and critic agents
web_grounding_agent = AssistantAgent("WebGrounding",
                                     model_client=model_client,
                                     system_message="You are a web grounding agent with access to search results from the Bing search engine. "
                                      + "Your main task is fetching information from the web about latest facts and research advancements. "
                                      + "You supplant academic information with up-to-date facts and numbers, and also correct old facts. "
                                      + "You don't remove any existing information, but add your additional facts in brackets after the respective sentence. "
                                      + "Your aim is to make sure that there is absolutely no wrong or out-of-date information or facts stated in the answer. ",
                                    description="A web grounding agent that analyses the text and adds up-to-date information whenever necessary. ",
                                    tools=[bing_search_tool]
                                    )
web_grounding_agent_with_judge = AgentWithJudge(web_grounding_agent)

team = MagenticOneGroupChat(participants=[web_grounding_agent_with_judge], model_client=model_client)
answer = await Console(team.run_stream(task="Describe the training method of Deepseek r1 based on its academic paper, " +
                                       "and mention all the important inventions made during its development. " +
                                       "Write a report and summarize your findings in a table." + current_date_prompt))
display_final_answer(answer.messages[-1].content)

---------- user ----------
Describe the training method of Deepseek r1 based on its academic paper, and mention all the important inventions made during its development. Write a report and summarize your findings in a table.Whenever asked for latest data, keep in mind that today is 1st April 2025.
---------- MagenticOneOrchestrator ----------

We are working to address the following user request:

Describe the training method of Deepseek r1 based on its academic paper, and mention all the important inventions made during its development. Write a report and summarize your findings in a table.Whenever asked for latest data, keep in mind that today is 1st April 2025.


To answer this request we have assembled the following team:

WebGrounding: A web grounding agent that analyses the text and adds up-to-date information whenever necessary.


Here is an initial fact sheet to consider:

### 1. GIVEN OR VERIFIED FACTS
- The request is about describing the training method of "Deepseek r1" base