# Set up the environment

## LLM installation and setup

### OpenAI path

In [None]:
# The cells below use the pre-1.0 openai interface (openai.ChatCompletion), so pin the version
!pip install "openai<1" tiktoken

To use OpenAI's GPT models, you first need an API key. You can generate one after signing up for a free trial on [OpenAI's website](https://beta.openai.com/signup/). After registering, open the 'View API Keys' section in your 'Personal' menu to get your OpenAI API key. Paste it into the next cell, replacing `"Your-Api-Key"` in the statement `openai.api_key = "Your-Api-Key"` with your actual key.

If you're interested in an open-source alternative, you can follow the Hugging Face Transformers path instead, which uses the Falcon-7B-Instruct model. This choice may lead to lower answer quality and longer wait times than OpenAI's GPT models.

In [8]:
import openai
openai.api_key = "Your-Api-Key"
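
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The `load_api_key` helper is hypothetical, and `OPENAI_API_KEY` is assumed to be a variable you set before starting the notebook.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With this helper, the assignment above would become `openai.api_key = load_api_key()`.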

Test configuration

In [None]:
openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello world"}])

### Hugging Face's Transformers path

If following this path, it is recommended to change the execution environment to one that utilizes Graphics Processing Units (GPUs). To do this, navigate to the "Runtime" menu, click on "Change runtime type", then select "GPU" from the dropdown menu under "Hardware accelerator". Remember to click "Save" to apply the changes. This will allow you to take advantage of Colab's free GPU resources and reduce the execution times.

In [None]:
!nvidia-smi
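
If you would rather check for a GPU programmatically than read the `nvidia-smi` table, a rough sketch follows. The `gpu_available` helper is hypothetical; it only checks that `nvidia-smi` is on the PATH and lists at least one device.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    # `nvidia-smi -L` prints one line per detected GPU
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU detected" if gpu_available() else "No GPU detected, check the runtime type")
```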

In [None]:
%pip install transformers accelerate einops langchain

In [None]:
from transformers import AutoTokenizer

model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)

In [None]:
import torch
import transformers

PIPELINE = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=1000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

# Let's prepare our Agent entity

An AIAgent represents a conversational AI, designed to manage a dialogue flow. This abstraction eases interactions with underlying AI models, with the goal of providing a more human-like interaction with AI, thus making it an effective dialogue partner.

Agents can be equipped with "actions" (specific tasks triggered by prompts or responses) and can be used through different "integrations" (conversational interfaces). Together, these let an agent handle diverse interactions and respond dynamically across contexts. Presenting the agent as a human-like entity also makes it easier to build modular systems around it, which is what makes the Agent abstraction a useful tool for system development.

The agents used in this workshop are very simple. In the following sections we focus on creating the class that manages the dialogue flow. Each agent has a few attributes: a name, a personality, a context (representing its knowledge), and a goal (its objective in each conversation). An additional attribute determines which underlying AI model generates its responses.


## Declare LLM serving function

We will build a dict of serving functions to abstract the choice between the OpenAI GPT models and the Hugging Face Falcon model. If you are following just one path, execute only the cells for that path. This abstraction lets you switch between models easily.

In [6]:
LLM_SERVING_FN = {}
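
To make the dispatch idea concrete, here is a small self-contained sketch. It uses a separate, hypothetical registry (`DEMO_SERVING_FN`) and a dummy `echo_backend`, so it does not touch `LLM_SERVING_FN`. The only assumption is that every serving function takes `(conversation_info, message)` and returns a string.

```python
DEMO_SERVING_FN = {}


def echo_backend(conversation_info, message):
    """Dummy backend: repeats the user message instead of calling a model."""
    return f"echo: {message}"


DEMO_SERVING_FN["echo"] = echo_backend


def serve(codename, conversation_info, message):
    """Look up the backend by codename and delegate the call to it."""
    if codename not in DEMO_SERVING_FN:
        raise KeyError(f"Unknown model codename: {codename!r}")
    return DEMO_SERVING_FN[codename](conversation_info, message)


print(serve("echo", {"system_prompt": "", "messages": []}, "Hello"))  # prints "echo: Hello"
```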

### OpenAI path

In [None]:
import tiktoken

openai_model = "gpt-3.5-turbo"

def count_tokens(text, model):
    """
    This function counts the number of tokens in a text.
    Parameters:
        text: The text to count the tokens of.
        model: The model to use for tokenization.
    Returns:
        The number of tokens in the text."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def generate_answer(conversation_info, message):
    """
    This function will continue the conversation.
    Parameters:
        conversation_info: The information of the conversation
        message: The latest user message. AIAgent.continue_conversation has
            already appended it to conversation_info["messages"], so it is
            not appended again here.
    Returns:
        The assistant's reply as a string.
    """
    print("Generating answer")
    messages = [{"role": "system", "content": conversation_info["system_prompt"]}]
    for msg in conversation_info["messages"]:
        messages.append({"role": msg["role"], "content": msg["content"]})

    # Limit the conversation history to a certain number of tokens
    max_context_length = 3500  # Adjust this value based on your desired context length
    current_context_length = sum(count_tokens(msg["content"], openai_model) for msg in messages)

    # Always keep the system prompt and the latest message, even if they exceed the limit
    while current_context_length > max_context_length and len(messages) > 2:
        removed_message = messages.pop(1)  # Remove the earliest non-system message
        current_context_length -= count_tokens(removed_message["content"], openai_model)

    full_prompt = {
        "model": openai_model,
        "messages": messages
    }
    response = openai.ChatCompletion.create(**full_prompt)
    return response["choices"][0]["message"]["content"]

LLM_SERVING_FN["openai"] = generate_answer
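
The history-trimming step can be tried without calling the API. The sketch below isolates it into a hypothetical `trim_history` helper that takes the token counter as a parameter. The whitespace-based counter in the example is only a stand-in for `count_tokens` and does not match real token counts.

```python
def trim_history(messages, max_tokens, count_fn):
    """Drop the oldest non-system messages until the total fits max_tokens.

    messages[0] is assumed to be the system prompt and is never removed;
    the most recent message is also always kept.
    """
    trimmed = list(messages)  # work on a copy, leave the caller's list intact
    total = sum(count_fn(m["content"]) for m in trimmed)
    while total > max_tokens and len(trimmed) > 2:
        removed = trimmed.pop(1)  # earliest non-system message
        total -= count_fn(removed["content"])
    return trimmed


def word_count(text):
    return len(text.split())  # crude stand-in for a real tokenizer


history = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "first question with several words"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "second question"},
]
print([m["content"] for m in trim_history(history, 8, word_count)])
```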

### Falcon Instruct 7B Path

In [7]:
from langchain import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=PIPELINE)

def generate_answer_falcon(conversation_info, message):
    """
    This function will continue the conversation.
    Parameters:
        conversation_info: Dict object containing the information of the conversation
        message: The latest user message. AIAgent.continue_conversation has
            already appended it to conversation_info["messages"], so it is
            not appended again here.
    Returns:
        The assistant's reply as a string.
    """

    # Format messages to be compatible with the model
    prompt = conversation_info["system_prompt"] + "\n"
    for msg in conversation_info["messages"]:
        prompt += msg["role"].capitalize() + ": " + msg["content"] + "\n"

    # Generate response
    response = llm(prompt + "Assistant:")
    # Keep only the assistant's turn: drop anything after a generated "User:" turn
    return response.split("User:")[0].strip()

LLM_SERVING_FN["falcon"] = generate_answer_falcon
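
Because the pipeline takes a single text prompt, the function flattens the history into a plain-text transcript. Here is a self-contained sketch of that formatting and of the reply clean-up, using the hypothetical helper names `build_prompt` and `extract_reply`:

```python
def build_prompt(system_prompt, messages):
    """Flatten the system prompt and message history into one transcript."""
    lines = [system_prompt]
    for msg in messages:
        lines.append(f"{msg['role'].capitalize()}: {msg['content']}")
    lines.append("Assistant:")  # cue the model to write the assistant turn
    return "\n".join(lines)


def extract_reply(generated):
    """Keep only the assistant turn, dropping any invented 'User:' turn."""
    return generated.split("User:")[0].strip()


prompt = build_prompt("Be brief.", [{"role": "user", "content": "Hi"}])
print(prompt)
print(extract_reply(" Hello there!\nUser: and then?"))  # prints "Hello there!"
```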

## Declare Agent Class

In [8]:
SYSTEM_PROMPT = """You're an AI assistant called {name}.
This is your personality:
{personality}.

You are aware of the following context:
{context}.
When you don't know something, you say so. Your knowledge is focused on your context.

Your goal in this conversation is:
{goal}.
You must focus on your goal. If the user tries to change the conversation topic, you should dismiss it. You never write on behalf of the user. You only write Assistant messages.
"""

In [9]:
class AIAgent:

    def __init__(self, name, personality, model_codename, context, goal):
        self.name = name
        self.personality = personality
        self.model_codename = model_codename
        self.context = context
        self.goal = goal
        self.conversation = []

    def _create_conversation(self):
        system_prompt = SYSTEM_PROMPT.format(
            context=self.context,
            personality=self.personality,
            name=self.name,
            goal=self.goal
        )
        conversation_info = {
            "system_prompt": system_prompt,
            "model_codename": self.model_codename,
            "messages": []
        }
        self.conversation=conversation_info

    def continue_conversation(self, message):
        # append user message to the conversation
        self.conversation["messages"].append({
            "role": "user",
            "content": message
        })
        # generate agent response
        agent_response = LLM_SERVING_FN[self.conversation["model_codename"]](
            self.conversation, message)
        # append agent response to the conversation
        self.conversation["messages"].append({
            "role": "assistant",
            "content": agent_response
        })
        return agent_response

    def create_conversation(self):
        # start a new conversation
        self._create_conversation()

        while True:
          # get user input
          user_message = input("User: ")

          # check if user wants to exit the conversation
          if user_message.lower() == "exit":
            break

          # let the agent respond to the user message
          agent_response = self.continue_conversation(user_message)

          # print the agent's response
          print("Agent: ", agent_response)

## Build your own Agent

In [10]:
NAME = "james"
PERSONALITY = """
Sarcastic and permanently bothered
"""
MODEL_CODENAME = "falcon" # Substitute this with "openai" if following the OpenAI path
CONTEXT = """
You don't know anything, but you act as if you know everything and always mock the user
"""
GOAL = """
Creating a fun conversation with permanent sarcasm and condescension
"""

In [11]:
agent = AIAgent(NAME, PERSONALITY, MODEL_CODENAME, CONTEXT, GOAL)

## And have a little warmup conversation

In [None]:
agent.create_conversation()

# OK, now it's our AudioIntegration's turn

As we progress, remember that an "integration" represents the way an AI agent interfaces with users. In this case, our AudioIntegration enables voice interaction, expanding the environments where our agents can operate and enhancing the interaction experience. Let's gear up!

## Install Whisper and Speech T5 requirements

In [None]:
# install openai's whisper
!pip install -U openai-whisper

# refresh the package index and install ffmpeg (whisper needs it to decode audio)
!sudo apt update && sudo apt install -y ffmpeg

In [None]:
!pip install datasets sentencepiece transformers

## Speech T5 Setup

In [None]:
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf

# Load the processor, model, and vocoder
PROCESSOR = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
TTS_MODEL = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Load xvector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

## Audio management setup

Use JavaScript to record audio in the browser, and declare the `record()` and `play(audio)` helper functions.

In [None]:
# all imports
from io import BytesIO
from base64 import b64decode
from google.colab import output
from IPython.display import Javascript
import IPython.display as ipd

RECORD = """
const sleep  = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.start()
  await sleep(time)
  recorder.onstop = async ()=>{
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

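def record(sec=3):
    """Record audio in the browser and return it as raw bytes."""
    input("Press Enter to start recording...")
    print("Speak Now...")
    display(Javascript(RECORD))
    sec += 1  # record one extra second as a margin
    # record() resolves to a base64 data URL ("data:<mime>;base64,<payload>")
    s = output.eval_js('record(%d)' % (sec*1000))
    print("Done Recording!")
    # keep only the payload after the comma and decode it
    b = b64decode(s.split(',')[1].strip())
    return b  # byte stream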

def play(audio):
  ipd.display(ipd.Audio(audio))
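
The JavaScript side returns the recording as a data URL, which is why `record` splits on the comma before decoding. Below is a minimal sketch of that decoding step in isolation. `decode_data_url` is a hypothetical helper, and the example payload is made up.

```python
from base64 import b64decode, b64encode


def decode_data_url(data_url):
    """Split 'data:<mime>;base64,<payload>' and return (header, raw bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("Expected a base64 data URL")
    return header, b64decode(payload.strip())


example = "data:audio/webm;base64," + b64encode(b"fake audio").decode("ascii")
header, raw = decode_data_url(example)
print(header, raw)
```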


## Declare AudioIntegration

The `AudioIntegration` class is a standard integration, showcasing how an agent can operate in different interfaces. Integrations should have functions to receive user inputs and relay agent responses. Here, `get_text` transcribes user speech to text, and `give_response` transforms agent text to speech, enabling voice interaction.

In [None]:
import IPython.display as ipd
import whisper

USER_TIME_TO_TALK = 5 # This controls how much time to record

class AudioIntegration:

    def get_text(self, sec=USER_TIME_TO_TALK):
        # get user input as audio
        user_audio = record(sec=sec)
        # Save audio to a file
        with open('audio.wav', 'wb') as f:
            f.write(user_audio)
        # transcribe user audio to text
        model = whisper.load_model("base")
        result = model.transcribe("audio.wav")
        return result["text"].strip()

    def give_response(self, message):
        # Convert the message to speech
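        # Split the message into chunks of at most max_length characters.
        # Every chunk after the first starts `overlap` characters early,
        # so consecutive chunks overlap slightly.
        max_length = 512
        overlap = 12
        chunks = [message[:max_length]]
        chunks.extend(message[i - overlap:i + max_length]
                      for i in range(max_length, len(message), max_length))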
        for chunk in chunks:
          inputs = PROCESSOR(text=chunk, return_tensors="pt")
          speech = TTS_MODEL.generate_speech(inputs["input_ids"],
                                             speaker_embeddings,
                                             vocoder=vocoder)

          # Save the speech to a .wav file
          sf.write("agent_speech.wav", speech.numpy(), samplerate=16000)

          # Display the agent's response
          play('agent_speech.wav')
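
The fixed-offset slicing in `give_response` can cut a word in half at a chunk boundary. As an alternative (not what the class above does), the standard library's `textwrap.wrap` breaks at whitespace. `split_for_tts` is a hypothetical helper, and `max_length=512` simply mirrors the value used above.

```python
import textwrap


def split_for_tts(message, max_length=512):
    """Split text into chunks of at most max_length characters at word boundaries."""
    # textwrap.wrap turns newlines and tabs into spaces and never exceeds
    # `width` (words longer than `width` are broken if needed)
    return textwrap.wrap(message, width=max_length)


chunks = split_for_tts("llama " * 200, max_length=100)
print(len(chunks), max(len(c) for c in chunks))
```

Note that, unlike the original slicing, this variant produces no overlap between chunks.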

## Test the integration

In [None]:
integration = AudioIntegration()
integration.get_text()

In [None]:
integration.give_response("I was walking down the street with two llamas, and one llama said to the other llama: `Tu como te llamas?`. Turned out they were Spaniards")

# We will now redefine our AI Agent class

Let's modify the `AIAgent` class to seamlessly accommodate different integrations. By doing so, we'll be able to equip it with the `AudioIntegration` we have built, thereby enriching its interaction capabilities.

In [None]:
class AIAgent:

    class BasicIntegration:
        # We still want to be able to communicate with our agents in the
        # simplest possible way by default
        def get_text(self):
            return input("User: ")
        def give_response(self, message):
            print("Agent: ", message)

    def __init__(self, name, personality, model_codename, context, goal):
        self.name = name
        self.personality = personality
        self.model_codename = model_codename
        self.context = context
        self.goal = goal
        self.conversation = []

    def _create_conversation(self):
        system_prompt = SYSTEM_PROMPT.format(
            context=self.context,
            personality=self.personality,
            name=self.name,
            goal=self.goal
        )
        conversation_info = {
            "system_prompt": system_prompt,
            "model_codename": self.model_codename,
            "messages": []
        }
        self.conversation=conversation_info

    def continue_conversation(self, message):
        # append user message to the conversation
        self.conversation["messages"].append({
            "role": "user",
            "content": message
        })
        # generate agent response
        agent_response = LLM_SERVING_FN[self.conversation["model_codename"]](
            self.conversation, message)
        # append agent response to the conversation
        self.conversation["messages"].append({
            "role": "assistant",
            "content": agent_response
        })
        return agent_response

    def create_conversation(self, integration=BasicIntegration()):
        # start a new conversation
        self._create_conversation()

        while True:
          # get user input
          user_message = integration.get_text()

          # check if user wants to exit the conversation
          if user_message.lower() in ["exit", "exit."]:
            break

          # let the agent respond to the user message
          agent_response = self.continue_conversation(user_message)

          # print the agent's response
          integration.give_response(agent_response)
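
Because `create_conversation` only relies on `get_text` and `give_response`, any object with those two methods can act as an integration. As an illustration, here is a hypothetical `ScriptedIntegration` that replays canned inputs and collects the replies, which can be handy for testing without a keyboard or microphone. The `run_loop` function is a simplified stand-in for `create_conversation`, and `respond` stands in for the agent.

```python
class ScriptedIntegration:
    """Replays a fixed list of user messages and records agent replies."""

    def __init__(self, user_messages):
        self._pending = list(user_messages)
        self.replies = []

    def get_text(self):
        # Say "exit" once the script runs out so the loop terminates
        return self._pending.pop(0) if self._pending else "exit"

    def give_response(self, message):
        self.replies.append(message)


def run_loop(integration, respond):
    """Simplified stand-in for the conversation loop."""
    while True:
        user_message = integration.get_text()
        if user_message.lower() in ["exit", "exit."]:
            break
        integration.give_response(respond(user_message))


scripted = ScriptedIntegration(["Hello", "How are you?"])
run_loop(scripted, respond=lambda text: text.upper())
print(scripted.replies)  # prints ['HELLO', 'HOW ARE YOU?']
```

The same object could be passed to the agent as `agent.create_conversation(integration=ScriptedIntegration(["Hello"]))`.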

## And let's hear our agent!

In [None]:
# Rebuild it
agent = AIAgent(NAME, PERSONALITY, MODEL_CODENAME, CONTEXT, GOAL)

In [None]:
# Create conversation
agent.create_conversation(integration=AudioIntegration())