<a href="https://colab.research.google.com/github/microsoft/autogen/blob/main/notebook/agentchat_video_transcript_translate_with_whisper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a id="toc"></a>
# Auto Generated Agent Chat: Extracting Entities from Call Transcripts with GPT-4 and Function Calls
In this notebook, we demonstrate how to use GPT-4 with `AssistantAgent`, `UserProxyAgent`, and a custom `GroupChat` to extract structured entities from a customer-support call transcript,
detect required fields whose values are missing, and ask a human to supply them. It is based on [agentchat_function_call.ipynb](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_function_call.ipynb).


## Requirements
AutoGen requires `Python>=3.8`. To run this notebook example, please install `openai`, `pyautogen`, `langchain`, `langchain-openai`, `pydantic`, and `python-dotenv`:
```bash
pip install openai
pip install pyautogen
pip install langchain langchain-openai
pip install pydantic python-dotenv
```

In [None]:
%%capture --no-stderr
# %pip install langchain langchain-openai
# %pip install pydantic python-dotenv
# %pip install openai~=1.3.5
# %pip install "pyautogen>=0.2.3"

## Set your API Endpoint
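It is recommended to store your OpenAI API key in an environment variable, for example `OPENAI_API_KEY`. The next cell loads it from a local `.env` file with `python-dotenv` and builds the `config_list` that is passed to the agents.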
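
A minimal sketch of that setup, assuming the key is read from an environment variable: it builds a `config_list` and masks the key before printing. `mask_key` and `build_config_list` are hypothetical helper names, and `DEMO_OPENAI_API_KEY` with its placeholder value exists only so the snippet runs without a real key.

```python
import os
from typing import Dict, List


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


def build_config_list(model: str, env_var: str = "OPENAI_API_KEY") -> List[Dict[str, str]]:
    """Read the API key from the environment and fail early if it is missing."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return [{"model": model, "api_key": api_key}]


# Demo only: a placeholder key stored under a hypothetical variable name.
os.environ["DEMO_OPENAI_API_KEY"] = "sk-placeholder-0000"
demo_config = build_config_list("gpt-4-1106-preview", env_var="DEMO_OPENAI_API_KEY")

# Print a masked copy so the secret never appears in the cell output.
print([{**c, "api_key": mask_key(c["api_key"])} for c in demo_config])
```

Failing early on a missing key tends to give a clearer error than letting the first model call fail later in the chat.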

In [8]:
import os
import io
import json
import logging
import sys

from dataclasses import dataclass

from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser, JsonKeyOutputFunctionsParser
from langchain_openai import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate
from langchain.utils.openai_functions import convert_pydantic_to_openai_function

from typing_extensions import Annotated

import autogen
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager, ConversableAgent

from autogen.cache import Cache

from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv()) # read local .env file

config_list = [
    {
        'model': 'gpt-4-1106-preview',
        'api_key': os.getenv("OPENAI_API_KEY"),
    },
]

# Print the configuration with the API key masked so the secret never lands in the cell output.
print(json.dumps([{**c, "api_key": "***"} for c in config_list]))


[{"model": "gpt-4-1106-preview", "api_key": "***"}]


In [9]:
# generated by datamodel-codegen:
#   filename:  schema_v3.json
#   timestamp: 2024-01-15T16:43:18+00:00

from __future__ import annotations

from enum import Enum
from typing import Optional, List

from pydantic import BaseModel, Field


class ContactType(Enum):
    Phone = 'Phone'
    Text = 'Text'


class CustomerNeed(Enum):
    Complete_purchase_support = 'Complete purchase support'
    Technical_support = 'Technical support'
    Membership_management = 'Membership management'
    Other = 'Other'


class EmployeeResponse(Enum):
    Added_protection_plan_to_product = 'Added protection plan to product'
    Engaged_manager_or_support = 'Engaged manager or support'
    Fixed_in_the_moment = 'Fixed in the moment'
    Gave_recommendation_or_advice = 'Gave recommendation or advice'
    Started_remote_support_session = 'Started remote support session'
    Unable_to_resolve_to_satisfaction = 'Unable to resolve to satisfaction'
    Canceled_membership = 'Canceled membership'
    Explained_membership_benefits = 'Explained membership benefits'
    Referred_to_other_team___unable_to_help = 'Referred to other team - unable to help'
    Renewed_membership = 'Renewed membership'
    Other = 'Other'


class CustomerSentimentGoingIn(Enum):
    Positive = 'Positive'
    Negative = 'Negative'
    Neutral = 'Neutral'


class CustomerSentimentGoingOut(Enum):
    Positive = 'Positive'
    Negative = 'Negative'
    Neutral = 'Neutral'

class GiftCard(BaseModel):
    giftCardOffered: bool = Field(..., title="Was a gift card offered to the customer?")
    giftCardAmount: str = Field(..., title="Dollar amount of the gift card offered")
    giftCardAccepted: bool = Field(..., title="If gift card was offered, did the customer accept the gift card?")


class CaseSchema(BaseModel):
    """ Information about the Customer's Case Conversation """
    summary: str = Field(...,
                         title='A detailed summary of the entire call transcript. Be sure to include why the customer was calling and, if the customer\'s issue was resolved, provide those details in your summary. Do not use the customer\'s or agent\'s actual name in the summary; refer to them only as "Agent" or "Customer"!')
    requiredElements: Optional[List[str]] = Field(None,
                                title="Any elements that are explicitly called out in the transcript as being required")
    orderNumber: str = Field(..., title="The customer's order number")
    productSKU: str = Field(..., title="The SKU associated to the product the customer might be calling about. Do not include product brand, make or model in this value.")
    driver: Optional[str] = Field(None,
                        title="A name or description that could be used to identify the individual delivering and/or doing service at the customer's house")
    photos: bool = Field(...,
                         title="Were photos captured at the customer's house showing damage or product condition?")
    agentCall: bool = Field(..., title="Did the agent or delivery person make a call while at the customer's house?")
    contactType: ContactType = Field(..., title='Contact Channel')
    productSafetyFlag: bool = Field(..., title='Is there a product safety concern?')
    customerNeed: CustomerNeed = Field(..., title='Customer Need')
    employeeResponse: EmployeeResponse = Field(..., title='Employee Response')
    customerSentimentGoingIn: CustomerSentimentGoingIn = Field(...,
                                                               title="The customer's sentiment going into the conversation with agent")
    customerSentimentGoingOut: CustomerSentimentGoingOut = Field(...,
                                                                 title="The customer's sentiment at the end of the conversation with agent")
    agentName: Optional[str] = Field(None, title='The name of the agent')
    customerName: Optional[str] = Field(None, title='The name of the customer')
    giftCards: GiftCard = Field(..., title="Were gift cards offered and if so, were they accepted by the customer?")


## Example and Output
Below is an example that extracts entities from a short customer-support call transcript. The transcript is defined inline as `input_json1`, so no external file is needed.
Note that `extract_entities` always reads this inline transcript, regardless of the `call_transcript_id` it receives.

In [10]:
input_json1 = {
  "transcript": [{
        "source": "agent",
        "timestamp": "2024-01-17T18:26:17.671956",
        "message": "Hi Thanks for calling geek squad. My name is Valerie before we start. May I please have your full name and your phone number."
    }, {
        "source": "customer",
        "timestamp": "2024-01-17T18:26:27.609811",
        "message": "No. Yes, I just I'm sorry I think I was in the wrong department I need to talk to somebody about printer I bought, the order number is 12345."
    }, {
        "source": "agent",
        "timestamp": "2024-01-17T18:26:33.117753",
        "message": "Uh Yes, I am from computing services, but what do you need? I'm sorry I cannot hear you properly."
    }, {
        "source": "customer",
        "timestamp": "2024-01-17T18:26:37.875478",
        "message": "I need to know if you sell a specific type of printer in."
    }, {
        "source": "agent",
        "timestamp": "2024-01-17T18:26:41.890700",
        "message": "Okay, I am going to provide you the phone number of the right department. Okay?"
    }, {
        "source": "customer",
        "timestamp": "2024-01-17T18:26:48.074268",
        "message": "No cause I already called and was on hold for 12 minutes and I got disconnected. So can you connect me."
    }, {
        "source": "agent",
        "timestamp": "2024-01-17T18:26:52.816488",
        "message": "Uh, yes, for sure. Thank you so much for calling geek squad and have a wonderful day."
    }, {
        "source": "customer",
        "timestamp": "2024-01-17T18:26:54.107702",
        "message": "Thank you."
    }, {
        "source": "agent",
        "timestamp": "2024-01-17T18:26:52.816488",
        "message": "For your trouble, I would like to offer you a $20 gift card, would that be acceptable?"
    }, {
        "source": "customer",
        "timestamp": "2024-01-17T18:26:54.107702",
        "message": "No, that is ok.."
    }, {
        "source": "agent",
        "timestamp": "2024-01-17T18:27:39.057198",
        "message": "All agents are busy at this time. Please hold."
    }, {
        "source": "system",
        "message": "Based on this call, the following elements are required:  ['orderNumber', 'productSKU']"
    }]
}


missing_value_dict = {}
missing_value_dict['orderNumber'] = "Please provide the customer's order number"
missing_value_dict['productSKU'] = "Please provide the SKU for the customer's product"


@dataclass
class ExecutorGroupchat(GroupChat):
    dedicated_executor: autogen.UserProxyAgent = None

    def select_speaker(
            self, last_speaker: autogen.ConversableAgent, selector: autogen.ConversableAgent
    ):
        """Select the next speaker."""

        # Route any pending function call straight to the dedicated executor.
        if self.messages and "function_call" in self.messages[-1]:
            return self.dedicated_executor

        selector.update_system_message(self.select_speaker_msg())
        final, name = selector.generate_oai_reply(
            self.messages
            + [
                {
                    "role": "system",
                    "content": f"Read the above conversation. Then select the next role from {self.agent_names} to play. Only return the role.",
                }
            ]
        )
        if not final:
            # i = self._random.randint(0, len(self._agent_names) - 1)  # randomly pick an id
            return self.next_agent(last_speaker)
        try:
            return self.agent_by_name(name)
        except ValueError:
            return self.next_agent(last_speaker)


# @user_proxy.register_for_execution()
# @chatbot.register_for_llm(description="Call transcript entity extractor.")
def extract_entities(call_transcript_id: Annotated[str, "The id of the call transcript to be processed"], ) -> str:
    try:
        print(f"Transcript ID: {call_transcript_id}")
        # model = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-1106")
        model = ChatOpenAI(temperature=0, model="gpt-4-1106-preview")

        prompt = ChatPromptTemplate.from_messages([
            ("system",
             """
             Extract the relevant information, if not explicitly provided do not guess. Extract partial info.
             There is one entity (requiredElements) that you need to pay special attention to.
            Do not make up what you think is required nor use the data model provided to you to determine
            what is required. Only use the agent's responses in the call transcript to determine what is required.
            For example, you might find a response from the agent such as "productSKU is required", if you find
            phrases like that, then only those elements should be considered required
             """),
            ("human", "{input}")
        ])

        case_extraction_functions = [convert_pydantic_to_openai_function(CaseSchema)]
        extraction_model = model.bind(
            functions=case_extraction_functions,
            function_call={"name": "CaseSchema"}
        )
        extraction_chain = prompt | extraction_model | JsonOutputFunctionsParser()
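        # Run the chain. Note: this demo always sends the in-memory `input_json1` transcript;
        # `call_transcript_id` is only logged and is not used to look anything up.
        json_output = extraction_chain.invoke({"input": input_json1})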
        return json.dumps(json_output)

    except Exception as e:
        return f"An unexpected error occurred: {str(e)}"


# @user_proxy.register_for_execution()
# @chatbot.register_for_llm(description="Format json to look pretty")
def style_output(input_json: Annotated[str, "The JSON string to be formatted"], ) -> str:
    try:
        print(f"input_json: {input_json}")
        # Parse the JSON string first; dumping the raw string would only re-quote it.
        json_dict = json.loads(input_json)
        json_pretty = json.dumps(json_dict, indent=4)
        print(json_pretty)
        return json_pretty
    except Exception as e:
        return f"An unexpected error occurred: {str(e)}"


def find_missing_values(extract_entities_output: Annotated[str, "The JSON object to look in for missing values"], ) -> \
        List[str]:
    result_list = []

    try:
        # Parse the JSON data
        data = json.loads(extract_entities_output)

        # Check that the "requiredElements" key is present and holds a list
        if isinstance(data.get("requiredElements"), list):
            required_keys = data["requiredElements"]

            # Loop through the elements in the "requiredElements" array
            for element in required_keys:
                value = data.get(element)
                # A required value is missing if the key is absent, null, or a blank string
                if value is None or (isinstance(value, str) and not value.strip()):
                    # Fall back to a generic prompt for fields without a predefined message
                    result_list.append(missing_value_dict.get(element, f"Please provide a value for {element}"))
        else:
            print('"requiredElements" key not found in the JSON data.')
    except json.JSONDecodeError as e:
        print(f'Error decoding JSON: {e}')

    return result_list


llm_config = {
    "config_list": config_list,
    "seed": 41,
    "temperature": 0,
    "timeout": 60,
    "functions": [
        {
            "description": "Function to be called when text based entities need to be extracted from Call Transcripts.",
            "name": "extract_entities",
            "parameters": {
                "type": "object",
                "properties": {
                    "call_transcript_id": {
                        "type": "string",
                        "description": "The id of the call transcript to be processed"
                    }
                },
                "required": [
                    "call_transcript_id"
                ]
            }
        },
        {
            "description": "Function to be called to determine if there are missing values from the response returned by the extract_entities function. ",
            "name": "find_missing_values",
            "parameters": {
                "type": "object",
                "properties": {
                    "extract_entities_output": {
                        "type": "string",
                        "description": "The JSON output from the extract_entities function"
                    }
                },
                "required": [
                    "extract_entities_output"
                ]
            }
        },
        # {
        #     "description": "Function to be called when content needs to be formatted to look nice.",
        #     "name": "style_output",
        #     "parameters": {
        #         "type": "object",
        #         "properties": {
        #             "input_json": {
        #                 "type": "string",
        #                 "description": "The JSON string to be formatted"
        #             }
        #         },
        #         "required": [
        #             "input_json"
        #         ]
        #     }
        # }
    ]
}

llm_config_short = {
    "config_list": config_list,
    "seed": 41,
    "temperature": 0,
    "timeout": 60
}

def print_messages(recipient, messages, sender, config):
    if "callback" in config and config["callback"] is not None:
        callback = config["callback"]
        callback(sender, recipient, messages[-1])
    print(f"Messages sent to: {recipient.name} from {sender.name}| message: {messages[-1]}")
    return False, None  # required to ensure the agent communication flow continues

# entity_extractor_agent_prompt = '''
#         You are a helpful assistant that can extract text based entities from call transcripts. Do not ask the User for feedback. Once you are done,
#         the next agent to be called is the software_engineer_agent.
#         '''

entity_extractor_agent_prompt = '''
    You are a helpful assistant that has strong Python skills and knows how to use Python to extract text based entities from call transcripts.
    Do not ask any questions. 
    '''

entity_extractor_agent = AssistantAgent(
    name="entity_extractor_agent",
    system_message=entity_extractor_agent_prompt,
    llm_config=llm_config,
    description='''
     This agent has strong Python skills and is responsible for extracting entities from call transcripts. 
    '''
)


entity_extractor_agent.register_reply(
    [autogen.Agent, None],
    reply_func=print_messages,
    config={"callback": None},
)

# software_engineer_agent_prompt = '''
#     You are responsible for parsing the output from the entity_extractor_agent and making sure that all the required keys and values
#     are there.
#     If you find that there are no missing values, then the next agent to be called is the styling_agent,
#     otherwise you should pass the list of missing values to the user_proxy so that the
#     user_proxy can follow up with the human to get the values for those required keys.
#     '''

software_engineer_agent_prompt = '''
    You are a helpful assistant that has strong Python skills and is primarily responsible for analyzing the output from the entity_extractor_agent 
    to determine if there are any required fields that are missing values in the output.
    Do not ask any questions and specifically do not ask if further assistance is required. Just do your work and move on!
    '''

software_engineer_agent = AssistantAgent(
    name="software_engineer_agent",
    system_message=software_engineer_agent_prompt,
    llm_config=llm_config,
    description="""
    This agent is responsible for a couple different things. Firstly, once the entity_extractor_agent is finished, 
    this agent is responsible for determining if there are any required fields that are missing values. If there are
    missing fields, then those missing fields should be given to the user_proxy for human feedback. Once this feedback
    is provided, this agent is then responsible for updating the output with values provided by the human.
    Once all the required values are provided, the chat_manager needs to work with the styling_agent to generate
    the final output.
    """
)

software_engineer_agent.register_reply(
    [autogen.Agent, None],
    reply_func=print_messages,
    config={"callback": None},
)


styling_agent_prompt = '''
    You are a helpful assistant who is an expert in using Python to take in a JSON object and then generate 
    output that is both pleasing to the eye and useful for the end user.
    Create both a JSON and markdown version of the output.
    Once you have finished assisting the user, output TERMINATE.
    '''

styling_agent = AssistantAgent(
    name="styling_agent",
    system_message=styling_agent_prompt,
    llm_config=llm_config,
    description="""
        This agent is responsible for working with the user_proxy at the very end to generate the final output that 
        will be presented to the end user.
    """
)

styling_agent.register_reply(
    [autogen.Agent, None],
    reply_func=print_messages,
    config={"callback": None},
)

# description = """
#            This agent is responsible for formatting the final output returned from this group chat. This agent will always be the last agent to run and should only run only run under the following scenarios:
#            1.) after the software_engineer_agent has finished and it was determine that there were 0 missing values
#            2.) after the software_engineer has finished and if it was determined that there were missing values, then this agent will need to wait until
#            after the human has provided the missing values and the entity_extractor_agent has had a chance to reprocess the call transcript using the missing values
#            provided by the human. Only then can this agent do its thing.
#        """

user_proxy = UserProxyAgent(
    name="user_proxy",
    system_message="""
        A human admin that can execute function calls and report back the execution results, as well as gather feedback from the human as necessary.
        """,
    function_map={
        "extract_entities": extract_entities,
        "find_missing_values": find_missing_values,
        # "style_output": style_output,
    },
    # Function-call messages can carry `content=None`, so guard before the substring test.
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    code_execution_config={"work_dir": "groupchat", "use_docker": False},
    human_input_mode="ALWAYS",

)

user_proxy.register_reply(
    [autogen.Agent, None],
    reply_func=print_messages,
    config={"callback": None},
)


groupchat = ExecutorGroupchat(
    agents=[user_proxy, entity_extractor_agent, software_engineer_agent, styling_agent],
    messages=[],
    max_round=20,
    dedicated_executor=user_proxy)

manager = autogen.GroupChatManager(groupchat=groupchat, code_execution_config=False, llm_config=llm_config_short)
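
To make the missing-value check concrete without calling an LLM, here is a minimal standalone sketch of the same idea, applied to a trimmed version of the extraction output that appears in the chat log later in this notebook. `PROMPTS` mirrors `missing_value_dict`; `missing_prompts` is a hypothetical name, and the generic fallback message for unknown fields is an assumption rather than the notebook's behavior.

```python
import json
from typing import Dict, List

# Follow-up questions keyed by field name (mirrors `missing_value_dict`).
PROMPTS: Dict[str, str] = {
    "orderNumber": "Please provide the customer's order number",
    "productSKU": "Please provide the SKU for the customer's product",
}


def missing_prompts(extracted_json: str) -> List[str]:
    """Return one follow-up question per required field that is absent, null, or blank."""
    data = json.loads(extracted_json)
    required = data.get("requiredElements") or []
    prompts = []
    for field in required:
        value = data.get(field)
        is_blank = value is None or (isinstance(value, str) and not value.strip())
        if is_blank:
            # Assumed fallback for fields that have no predefined question.
            prompts.append(PROMPTS.get(field, f"Please provide a value for {field}"))
    return prompts


# Hand-trimmed sample: the order number is present, the SKU is an empty string.
sample = json.dumps({
    "requiredElements": ["orderNumber", "productSKU"],
    "orderNumber": "12345",
    "productSKU": "",
})
print(missing_prompts(sample))
```

Only `productSKU` is flagged here, because `orderNumber` already has a non-blank value.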






In [None]:
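# `chatbot` is not defined in this notebook; inspect the function specs registered on an existing agent instead.
entity_extractor_agent.llm_config["functions"]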

In [11]:


# user_proxy.initiate_chat(
#     chatbot,
#     message="For the call transcript with id 1, please extract all the entities in the corresponding call transcript",
#     llm_config=llm_config,
# )

#with Cache.disk():
    # start the conversation
user_proxy.initiate_chat(
    manager,
    message="""
       For the call transcript with id 1, please extract all the entities in the corresponding call transcript.
       """,
)
# print(user_proxy.chat_messages[entity_extractor_agent])
# print(user_proxy.chat_messages[software_engineer_agent])
# print(user_proxy.chat_messages[manager])

user_proxy (to chat_manager):


       For the call transcript with id 1, please extract all the entities in the corresponding call transcript.
       

--------------------------------------------------------------------------------
Messages sent to: entity_extractor_agent from chat_manager| message: {'content': '\n       For the call transcript with id 1, please extract all the entities in the corresponding call transcript.\n       ', 'name': 'user_proxy', 'role': 'user'}
entity_extractor_agent (to chat_manager):

***** Suggested function Call: extract_entities *****
Arguments: 
{"call_transcript_id":"1"}
*****************************************************

--------------------------------------------------------------------------------
Messages sent to: user_proxy from chat_manager| message: {'content': '', 'function_call': {'arguments': '{"call_transcript_id":"1"}', 'name': 'extract_entities'}, 'name': 'entity_extractor_agent', 'role': 'assista

Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  


>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING FUNCTION extract_entities...
Transcript ID: 1
user_proxy (to chat_manager):

***** Response from calling function "extract_entities" *****
{"summary": "Customer called to inquire about a specific type of printer but was in the wrong department. The agent offered to provide the phone number for the correct department, but the customer requested a direct connection due to a previous long hold time and disconnection. The agent agreed to connect the customer and offered a $20 gift card for the trouble, which the customer declined.", "requiredElements": ["orderNumber", "productSKU"], "orderNumber": "12345", "productSKU": "", "photos": false, "agentCall": true, "contactType": "Phone", "productSafetyFlag": false, "customerNeed": "Complete purchase support", "employeeResponse": "Referred to other team - unable to help", "customerSentimentGoingIn": "Neutral", "custom

Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  


>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING FUNCTION find_missing_values...
user_proxy (to chat_manager):

***** Response from calling function "find_missing_values" *****
["Please provide the SKU for the customer's product"]
****************************************************************

--------------------------------------------------------------------------------
Messages sent to: user_proxy from chat_manager| message: {'content': '["Please provide the SKU for the customer\'s product"]', 'name': 'find_missing_values', 'role': 'function'}


Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  product SKU value is 12345


user_proxy (to chat_manager):

product SKU value is 12345

--------------------------------------------------------------------------------
Messages sent to: software_engineer_agent from chat_manager| message: {'content': 'product SKU value is 12345', 'name': 'user_proxy', 'role': 'user'}
software_engineer_agent (to chat_manager):

The product SKU value cannot be "12345" as that is already provided as the order number. The SKU is a unique identifier for the product itself, not the order. Please provide the correct product SKU.

--------------------------------------------------------------------------------
Messages sent to: user_proxy from chat_manager| message: {'content': 'The product SKU value cannot be "12345" as that is already provided as the order number. The SKU is a unique identifier for the product itself, not the order. Please provide the correct product SKU.', 'name': 'software_engineer_agent', 'role': 'user'}


Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  product sku value is 191919191


user_proxy (to chat_manager):

product sku value is 191919191

--------------------------------------------------------------------------------
Messages sent to: software_engineer_agent from chat_manager| message: {'content': 'product sku value is 191919191', 'name': 'user_proxy', 'role': 'user'}
software_engineer_agent (to chat_manager):

The product SKU has been updated to "191919191". Thank you for providing the correct information.

--------------------------------------------------------------------------------
Messages sent to: styling_agent from chat_manager| message: {'content': 'The product SKU has been updated to "191919191". Thank you for providing the correct information.', 'name': 'software_engineer_agent', 'role': 'user'}
styling_agent (to chat_manager):

### Extracted Entities (JSON Format)

```json
{
  "summary": "Customer called to inquire about a specific type of printer but was in the wrong department. The agent offered to provide the phone

Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  exit




ChatResult(chat_history=[{'content': '\n       For the call transcript with id 1, please extract all the entities in the corresponding call transcript.\n       ', 'role': 'assistant'}, {'content': '', 'function_call': {'arguments': '{"call_transcript_id":"1"}', 'name': 'extract_entities'}, 'name': 'entity_extractor_agent', 'role': 'assistant'}, {'content': '{"summary": "Customer called to inquire about a specific type of printer but was in the wrong department. The agent offered to provide the phone number for the correct department, but the customer requested a direct connection due to a previous long hold time and disconnection. The agent agreed to connect the customer and offered a $20 gift card for the trouble, which the customer declined.", "requiredElements": ["orderNumber", "productSKU"], "orderNumber": "12345", "productSKU": "", "photos": false, "agentCall": true, "contactType": "Phone", "productSafetyFlag": false, "customerNeed": "Complete purchase support", "employeeResponse"

In [None]:
print(user_proxy.chat_messages[manager])
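
The printed history is a list of message dictionaries. As a sketch of how a function result could be recovered from it programmatically, the helper below returns the content of the last `function` message with a given name, parsed as JSON when possible. `last_function_result` is a hypothetical helper, and the sample `history` is a hand-written, shortened stand-in for the messages shown in the log above, not real output.

```python
import json
from typing import Any, Dict, List, Optional


def last_function_result(history: List[Dict[str, Any]], name: str) -> Optional[Any]:
    """Return the most recent result of function `name`, or None if it never ran."""
    for message in reversed(history):
        if message.get("role") == "function" and message.get("name") == name:
            content = message.get("content")
            try:
                return json.loads(content)
            except (TypeError, json.JSONDecodeError):
                # Not every function result is valid JSON; return it unchanged.
                return content
    return None


# Shortened, hand-written stand-in for a real chat history.
history = [
    {"content": "Extract the entities for transcript 1.", "role": "user", "name": "user_proxy"},
    {"content": "", "role": "assistant", "name": "entity_extractor_agent",
     "function_call": {"name": "extract_entities", "arguments": '{"call_transcript_id":"1"}'}},
    {"content": '{"orderNumber": "12345", "productSKU": ""}', "role": "function", "name": "extract_entities"},
    {"content": '["Please provide the SKU for the customer\'s product"]', "role": "function",
     "name": "find_missing_values"},
]

entities = last_function_result(history, "extract_entities")
print(entities["orderNumber"])
```

With a real run, the list to pass in would be `user_proxy.chat_messages[manager]`.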