# Empowering Chatbots with Retrieval Augmented Generation (RAG)!

Imagine supercharging your chatbot with the ability to swiftly comb through the vast resources of the internet or specific databases, enabling it to provide more up-to-date and insightful answers. This is where the magic of RAG comes into play!

Here's the scoop: When a user poses a question to your chatbot, instead of relying solely on its internal knowledge, the chatbot sends the query to an API, such as an internet search service or a vector database (VectorDB). It then uses the results to construct a more comprehensive and informed response. Think of it as your chatbot conducting quick online research before delivering an answer!

## What Makes RAG So Remarkable?

- Cost-Effective: Unlike the resource-intensive process of fine-tuning, RAG is more lightweight and easier to manage.
- Human Analogy: Consider your information needs. Would you attempt to absorb the entirety of a library's content, or would you locate the precise book and extract what you require? RAG operates on the latter principle. 

For businesses, instead of laboriously sifting through heaps of documents to train your chatbot, you can simply store them in a VectorDB. When a query arises, your chatbot can retrieve the pertinent information and present it in a user-friendly format.

## RAG vs. Fine-Tuning

- Choose Fine-Tuning when you need to refine your chatbot's behavior.
- Opt for RAG when you want your chatbot to tap into external knowledge bases. Bonus: RAG can work well even with smaller 7B-parameter models!

Our journey begins with a beginner's example, addresses challenges, and delves into advanced use cases in the world of RAG. Let's get started!

In [None]:
%pip install google-cloud-aiplatform
%pip install requests
%pip install duckduckgo_search
%pip install gptrim

In [None]:
import vertexai
from vertexai.language_models import ChatModel, InputOutputTextPair
import os
import IPython

vertexai.init() ## initialise the Vertex AI SDK

chat_model = ChatModel.from_pretrained("chat-bison")

parameters = {
    "temperature": 0.7,  # Temperature controls the degree of randomness in token selection.
}

As you can see, with zero-shot prompting the response is rooted in the LLM's training data, which only covers information available up to its training cutoff date. The example has been amended to account for the Vertex model's cutoff date.

In [None]:
memory = "\n".join([  ### Vertex doesn't take a list of messages the way OpenAI does; instead, the list is joined together into one string separated by newlines (\n)

])

chat = chat_model.start_chat(  ### We start our chat with the context
    context=memory
)

response = chat.send_message( ## Send a new message along with parameters
    message="What happened to Matthew Perry?", **parameters)

IPython.display.Markdown(response.text)

Now, let's introduce a function that enables us to leverage DuckDuckGo and adapt the prompt in a way that allows us to present both the query and the search results concurrently...

In [None]:
# RAG - DuckDuckGo
from duckduckgo_search import DDGS
from gptrim import trim

def search_internet(query):
    """
    Search DuckDuckGo and return a prompt that combines the results with the query.
    """
    
    count = 5
    result_text=''
    with DDGS() as ddgs:
        search_results = [r for r in ddgs.text(query, max_results=count)]
        for result in search_results:
            title = result.get('title', '')
            snippet = result.get('body', '')
            url = result.get('href', '')
            result_text += f'Title: {title}\nSnippet: {snippet}\nURL: {url}\n\n'

        search_prompt = f"""
        Based on the internet search results provided in <>, provide an answer to the query [] if it is relevant along with a source URL. \
        If there are no internet search results or if they are not relevant then say \"Please try again.\"\
        
        context:<{trim(result_text)}>
        query:[{query}]
        """

        return search_prompt



With our function at our disposal, let's approach the task by first sending our query to the DuckDuckGo function and then supplying the adapted prompt to the LLM.

In [None]:

# STEP 1 - ASK THE QUESTION
question = 'What happened to Matthew Perry?'


# STEP 2 - SUBMIT THE QUESTION TO AN API TO GET RESULTS AND CREATE A NEW PROMPT FOR THE LLM

prompt = search_internet(question)


# STEP 3 - SEND THE MODIFIED PROMPT TO THE LLM

memory =[]

memory.append(prompt)

chat = chat_model.start_chat(  ### We start our chat with the context
    context="\n".join(memory)
)

response = chat.send_message( ## Send a new message along with parameters
    message=question, **parameters)

IPython.display.Markdown(response.text)

Now, let's delve into the challenges at hand:

### Challenge 1 - Simplifying Complex Queries
One prominent issue is the large amount of text needed to extract a concise answer. This not only hampers comprehension but also consumes more tokens, potentially incurring additional costs. You can use modules like "gptrim" to condense the text by approximately 50% by removing spaces, stop words, and the like. Remarkably, LLMs, particularly GPT-3 and GPT-4, remain adept at understanding and responding to such condensed input. However, you might get mixed results from VertexAI's chat-bison and other models.
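
To make the idea concrete, here is a minimal sketch of text condensing. It is a rough stand-in and not gptrim's actual algorithm: the `condense` helper and its tiny stop-word list are assumptions made up for this illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "to", "in", "and", "that", "it"}


def condense(text: str) -> str:
    """Drop stop words and collapse whitespace (a stand-in, not gptrim's algorithm)."""
    words = re.findall(r"\S+", text)
    # Compare each word without trailing punctuation, but keep the original token.
    kept = [w for w in words if w.lower().strip(".,!?;:") not in STOP_WORDS]
    return " ".join(kept)


snippet = "The weather in London is expected to be cold and wet in the morning."
condensed = condense(snippet)
saved = 1 - len(condensed) / len(snippet)

print(condensed)
print(f"Characters saved: {saved:.0%}")
```

Fewer characters generally means fewer tokens, but how well a given model copes with condensed input is worth checking on your own prompts.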

### Challenge 2 - Transforming User Queries into Search Queries
Accepting the user's query as-is doesn't always lead to optimal results. To address this, we need a mechanism to turn the query into an effective search query. Here, the LLM can help by generating appropriate search queries through the function calling API. While several frameworks like LangChain and Semantic Kernel are available, I have found that function calling offers a straightforward and practical approach.

### Challenge 3 - Efficient Decision-Making
Consider that this setup operates within a chatbot application, across platforms such as websites, Microsoft Teams, Slack, Alexa, or robotic interfaces. In such scenarios, we don't want to trigger internet searches unnecessarily. To optimize this, we can introduce functions with distinct names and descriptions to the LLM's function calling API. The LLM then possesses the discretion to determine whether a function call is warranted, allowing for context-appropriate responses.

### Challenge 4 - Accessing Company Documents
Users often want to query company documents directly. While you can explore this with the Confluence, Stack Overflow, and SharePoint APIs, note that these solutions rely on keyword-based search, so a query must exactly match keywords found in document titles for anything to be retrieved. Alternatively, a vector database enables semantic search: you import all your documents or text snippets and query by the concept or idea most closely aligned with the content you seek. We'll delve into a practical example of this in a separate notebook.
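
As a rough illustration of the retrieval step, the sketch below ranks stored snippets by cosine similarity to the query. Everything here is an assumption for demonstration purposes: the documents are invented, and `embed` is a toy word-count vector. A real vector database would use an embedding model instead, which is what makes the search semantic rather than keyword-based.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical company snippets standing in for imported documents.
documents = [
    "Annual leave requests must be submitted two weeks in advance.",
    "The VPN client is required when working from outside the office.",
    "Expense claims are reimbursed at the end of each month.",
]
index = [(doc, embed(doc)) for doc in documents]


def search(query: str, top_k: int = 1) -> list:
    """Return the top_k stored snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


results = search("How do I submit a leave request?")
print(results[0])
```

The retrieved snippet would then be placed into the prompt in the same way the DuckDuckGo results were earlier.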

# Retrieval Augmented Generation with Function Calling

In this approach, we entrust the LLM to decide when a function (often referred to as a "skill") is necessary and to generate the appropriate search query; the results are then fed back to the LLM to obtain the final answer.

Here's the breakdown of this process:

1. **Function Development:** We create functions that act as additional "eyes" (read) and "hands" (actions/write) for the LLM.

2. **Function Definitions:** We define these functions in a list, specifying the exact function name and a description. The LLM utilizes this information to select the relevant function. We also define the properties required for each function.

3. **Query Submission:** Your query is submitted to the LLM, along with the function definitions.

4. **Function Requirement:** If the LLM determines that a function is required, you will receive a response that includes the necessary function and its associated properties.

5. **Function Execution:** You utilize the information provided by the LLM to internally trigger and execute the required function. This could involve searching the internet, performing specific actions, or any other defined tasks.

6. **Final Answer:** After executing the function, you return to the LLM with the required information, and the LLM generates the final response.

This approach streamlines the interaction between the LLM and the external functions, enhancing the model's ability to deliver precise and context-aware responses.
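
The six steps above can be sketched end to end without calling a real model. In this sketch, `fake_llm` is a hypothetical stand-in that returns canned replies, and `search_internet` is stubbed so the example runs offline; only the control flow is meant to carry over to a real setup.

```python
import json


# Step 1: a function the model may ask for (stubbed; a real one would call a search API).
def search_internet(query: str) -> str:
    return f"(stub) top results for: {query}"


# Step 2: definitions the model reads to choose a function.
skill_definitions = [{
    "name": "search_internet",
    "description": "Searches the internet to provide context to a query if required",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model call, returning canned replies."""
    if "toolbox:" in prompt:
        return json.dumps({
            "tool_required": True,
            "func_name": "search_internet",
            "func_arguments": {"query": "current weather London"},
        })
    return "Final answer based on: " + prompt


def answer(question: str) -> str:
    # Step 3: submit the query together with the function definitions.
    reply = fake_llm(f"toolbox: {json.dumps(skill_definitions)}\nquery: {question}")
    decision = json.loads(reply)
    # Step 4: did the model ask for a function?
    if not decision.get("tool_required"):
        return decision.get("message_no_tool", "")
    # Step 5: run the requested function with the supplied arguments.
    available = {"search_internet": search_internet}
    result = available[decision["func_name"]](**decision["func_arguments"])
    # Step 6: hand the result back to the model for the final response.
    return fake_llm(result)


print(answer("What is the weather like in London?"))
```

Note that the model rewrote the user's question into a shorter search query in step 4, which is the behaviour Challenge 2 asks for.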


## VertexAI's chat-bison doesn't have such functionality, as this behaviour is fine-tuned into the latest OpenAI models. Therefore, I will compensate for this by providing my own behavioural prompts...

In [None]:
# Make sure to run the search_internet function before this!
import json

## Requires a function 
QUESTION = "Who is Twitter's CEO in 2023?"
## Doesn't require a function
#QUESTION = "How can i make a cup of tea?"


# Let us define the search_internet function for the LLM...
skill_definitions = [
      {
        "name": "search_internet",
        "description": "Searches the internet to provide context to a query if required",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The query to search for"
                }
            },
            "required": ["query"]
        }
    }
]



tool_memory = ['You only answer in JSON format with nothing else.']  ## Set the behaviour so we can extract JSON.
message_memory = []

chat = chat_model.start_chat(  ### We start our chat with the context
    context="\n".join(tool_memory)
    ### Examples can be used to demonstrate how tools would be selected if required...
)


special_prompt = f"""
You have access to a JSON toolbox of functions, each with a name, description and required arguments, as shown within [].
Please look at the query submitted in <> to establish whether a tool is required to answer the question, or else act as a helpful assistant.

Please provide a JSON response using the following keys:
tool_required (true/false), func_name, func_arguments, message_no_tool

toolbox: {json.dumps(skill_definitions)}
query: <{QUESTION}>

"""



first_response = chat.send_message( ## Send a new message along with parameters
    message=special_prompt, **parameters)


response_json = json.loads(first_response.text) ### Get the response and load as python dict.

print (response_json)

### Do we need a skill? True
if response_json['tool_required']:

    # tie response with our functions
    available_functions = {
        'search_internet': search_internet
    }


    # extract function name and arguments from the LLM
    function_name = response_json['func_name']
    function_args = response_json['func_arguments']

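    function2call = available_functions.get(function_name)

    # guard against the LLM proposing a function that does not exist
    if function2call is None:
        raise ValueError(f"Unknown function requested by the LLM: {function_name}")

    # call it appropriately
    function_response = function2call(**function_args)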


    # send the message history to the LLM again without functions.
    chat = chat_model.start_chat(  ### We start our chat with the context
        context="\n".join(message_memory)
        
    )

    final_response = chat.send_message( ## Send a new message along with parameters
        message=function_response, **parameters)
    
    # Result

    print (str(final_response.text))
    



else: ## No skill required - False then output the message normally
    print(response_json['message_no_tool']) 






As evident, the LLM has determined the necessity of a tool and has furnished the essential arguments for its execution. Notably, the query was also appropriately modified to align with this requirement.

While this approach is effective, it's crucial to be vigilant, as the LLM may occasionally propose function names that do not exist, or not act according to your instructions at all! Implementing conditional checks to ensure that the returned function name matches one of the predefined ones is advisable to maintain control and accuracy.
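
One way such checks might look is sketched below. The `parse_tool_decision` helper is a hypothetical name introduced for this example; it assumes the model replies with the JSON keys used earlier (`tool_required`, `func_name`, `func_arguments`) and that some replies may arrive wrapped in Markdown code fences.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that some models wrap JSON in


def parse_tool_decision(raw: str, available_functions: dict):
    """Return (function, arguments), or None when the reply cannot be trusted."""
    text = raw.strip()
    # Remove a surrounding code fence and an optional "json" language tag.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        decision = json.loads(text)
    except json.JSONDecodeError:
        return None  # the reply was not JSON at all
    if not isinstance(decision, dict) or not decision.get("tool_required"):
        return None  # no tool requested
    func = available_functions.get(decision.get("func_name"))
    args = decision.get("func_arguments")
    if func is None or not isinstance(args, dict):
        return None  # unknown function name or malformed arguments
    return func, args


def search_internet(query):
    """Stub used only to make this example runnable offline."""
    return f"results for {query}"


registry = {"search_internet": search_internet}
good = '{"tool_required": true, "func_name": "search_internet", "func_arguments": {"query": "x"}}'
bad = '{"tool_required": true, "func_name": "delete_everything", "func_arguments": {}}'

print(parse_tool_decision(good, registry) is not None)
print(parse_tool_decision(bad, registry))
```

Returning `None` for anything unexpected lets the calling code fall back to a plain answer or a retry instead of raising in the middle of a conversation.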

With this approach, you have the flexibility to integrate as many "hands" and "eyes" as necessary into your chatbot, enabling it to adapt and extend its capabilities as needed.

# Vector & RAG using VertexAI
Unfortunately, due to VertexAI API limitations, I cannot provide examples involving a VectorDB.