# AI Sales Assistant
Develop a sales assistant with LangChain that advises salespeople while taking internal guidelines into account.

---

## Introduction

This notebook walks through the challenges faced and the solutions found while building this project. You'll learn about two approaches that didn't work and how these failures paved the way for an effective solution.

First, the author tried to rely solely on the LLM, but ran into inconsistent responses and slow response times with GPT-4. Second, he naively split the custom knowledge base into fixed-size chunks, but this approach led to context misalignment and less useful results.

After these unsuccessful attempts, **a smarter approach was adopted: splitting the knowledge base according to its structure**. This change greatly improved response quality and grounded the LLM's answers in the relevant context. The process is explained step by step in this notebook to help you navigate similar challenges in your own AI projects.

By the end of this lesson, you'll know how to use LLMs, how to split your knowledge base intelligently, and how to integrate it with a vector database such as Deep Lake.
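To make the contrast between fixed-size and structure-based splitting concrete, here is a small sketch on a made-up three-entry knowledge base. The entries, the 60-character chunk size, and the helper names `fixed_size_chunks` and `structural_chunks` are assumptions for illustration, not the project's real data or code. The fixed-size split cuts the first objection's response in half and glues its tail onto the next objection, while the structural split keeps one objection per chunk.

```python
import re

# Hypothetical knowledge base in a numbered "N. Objection ... Response ..." format.
KB = (
    "1. Objection: It's too expensive. Response: Reframe the price around ROI. "
    "2. Objection: We already have a vendor. Response: Ask what they would improve. "
    "3. Objection: Send me some information. Response: Confirm a follow-up time. "
)


def fixed_size_chunks(text, size):
    """Naive splitting: cut every `size` characters, ignoring the text's structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def structural_chunks(text):
    """Split right before each numbered entry so that one chunk = one objection."""
    parts = re.split(r"(?=\d+\. )", text)
    return [p for p in parts if p.strip()]  # drop the empty leading piece


for chunk in fixed_size_chunks(KB, 60):
    print(repr(chunk))
print("---")
for chunk in structural_chunks(KB):
    print(repr(chunk))
```

With the fixed-size split, the second chunk starts mid-word in objection 1's response and also contains the start of objection 2, which is the kind of context misalignment described above.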

## Workflow
This project builds an assistant that uses GPT-4 to search for answers within documents. The workflow is shown in the following diagram:
<br/>
<img src="../../images/sales-assistant-workflow.png" alt="Sales Assistant Workflow" style="width: 70%; height: auto;"/>

## Setup

In [4]:
import openai
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())
openai.api_type = os.environ.get("OPENAI_API_TYPE")
openai.api_base = os.environ.get("OPENAI_API_BASE")
openai.api_version = os.environ.get("OPENAI_API_VERSION")
openai.api_key = os.environ.get("OPENAI_API_KEY")
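`os.environ.get` silently returns `None` when a variable is missing, so a misconfigured `.env` file usually surfaces later as an authentication or connection error. A quick check like the one below, run right after the setup cell, reports the problem earlier. `REQUIRED_VARS` and `missing_env_vars` are hypothetical names; the list simply mirrors the four variables read above.

```python
import os

# The four variables the setup cell reads (hypothetical constant mirroring that cell).
REQUIRED_VARS = (
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "OPENAI_API_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.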

## Building the assistant

### Creating, Loading, and Querying Our Database for AI
For our knowledge base, rather than splitting the text by size, we split it according to its structure. Each chunk should begin with an objection and end just before the next numbered objection. Because every entry in our data starts with a number followed by a period and a space (e.g., `1. `), we can split on that pattern with a regular expression.

For example, if the content string is `1. First sentence. 2. Second sentence.`, then `re.split(r"(?=\d+\. )", content)` returns `['', '1. First sentence. ', '2. Second sentence.']`. The lookahead `(?=...)` matches a zero-width position just before each number, so every marker stays attached to its own chunk; the leading empty string comes from the match at position 0 and is dropped in `split_data` below.
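The split can be checked in isolation. The sketch below reproduces the example above and also shows a subtlety that only appears once entries reach two digits: `\d+` can start matching at the second digit of `10. `, which splits off a stray `1`. Adding a negative lookbehind `(?<!\d)` is one possible fix (our suggestion, not part of the original notebook). Splitting on zero-width matches requires Python 3.7 or newer.

```python
import re

content = "1. First sentence. 2. Second sentence."

# Zero-width lookahead: split *before* each "N. " marker, keeping the marker in its chunk.
parts = re.split(r"(?=\d+\. )", content)
print(parts)  # ['', '1. First sentence. ', '2. Second sentence.']

# With multi-digit numbers, the lookahead also matches between "1" and "0" in "10. ".
naive = re.split(r"(?=\d+\. )", "9. Nine. 10. Ten.")
print(naive)  # ['', '9. Nine. ', '1', '0. Ten.']

# A negative lookbehind refuses a match that is immediately preceded by a digit.
fixed = re.split(r"(?<!\d)(?=\d+\. )", "9. Nine. 10. Ten.")
print(fixed)  # ['', '9. Nine. ', '10. Ten.']
```

In the loader below, the stray one-character fragment would be filtered out by the minimum-length check, but the following entry would still lose the first digit of its number.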

In [5]:
import os
import re
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import DeepLake


class DeepLakeLoader:
    def __init__(self, source_data_path):
        self.source_data_path = source_data_path
        self.file_name = os.path.basename(
            source_data_path
        )  # What we'll name our database
        self.data = self.split_data()
        if self.check_if_db_exists():
            self.db = self.load_db()
        else:
            self.db = self.create_db()

    def check_if_db_exists(self):
        """
        Check if the database already exists.
        Returns:
            bool: True if the database exists, False otherwise.
        """
        return os.path.exists(f"deeplake/{self.file_name}")

    def split_data(self):
        """
        Preprocess the data by splitting it into passages.

        If using a different data source, this function will need to be modified.

        Returns:
            split_data (list): List of passages.
        """
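        with open(self.source_data_path, "r") as f:
            content = f.read()
        # Split right before each numbered entry ("1. ", "12. ", ...). The
        # negative lookbehind keeps a multi-digit number such as "12. " from
        # also matching at its second digit.
        split_data = re.split(r"(?<!\d)(?=\d+\. )", content)
        # A match at position 0 yields a leading empty string; drop it.
        if split_data and split_data[0] == "":
            split_data.pop(0)
        # Discard fragments shorter than 30 characters (stray text, not a real entry).
        split_data = [entry for entry in split_data if len(entry) >= 30]
        return split_data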

    def load_db(self):
        """
        Load the database if it already exists.

        Returns:
            DeepLake: DeepLake object.
        """
        return DeepLake(
            dataset_path=f"deeplake/{self.file_name}",
            embedding_function=HuggingFaceEmbeddings(),
            read_only=True,
        )

    def create_db(self):
        """
        Create the database if it does not already exist.

        Databases are stored in the deeplake directory.

        Returns:
            DeepLake: DeepLake object.
        """
        return DeepLake.from_texts(
            self.data,
            HuggingFaceEmbeddings(),
            dataset_path=f"deeplake/{self.file_name}",
        )

    def query_db(self, query):
        """
        Query the database for passages that are similar to the query.

        Args:
            query (str): Query string.

        Returns:
            content (list): List of passages that are similar to the query.
        """
        results = self.db.similarity_search(query, k=3)
        content = []
        for result in results:
            content.append(result.page_content)
        return content
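`query_db` hands the work to Deep Lake's `similarity_search` with `k=3`. Conceptually, this ranks the stored passages by how close their embeddings are to the query's embedding and keeps the top `k`. The sketch below illustrates that idea with cosine similarity over tiny hand-made 2-D vectors standing in for the 768-dimensional embeddings; it is not Deep Lake's implementation, which may use a different distance metric and indexing. `cosine`, `top_k`, and the sample passages are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, passages, k=3):
    """Return the texts of the k passages whose vectors are most similar to query_vec."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy (text, embedding) pairs; real embeddings come from HuggingFaceEmbeddings.
passages = [
    ("price objection", [1.0, 0.0]),
    ("timing objection", [0.0, 1.0]),
    ("budget objection", [0.9, 0.1]),
    ("competitor objection", [-1.0, 0.0]),
]
print(top_k([1.0, 0.0], passages, k=2))
```

A query pointing in the "price" direction retrieves the price and budget passages first, which is why a budget objection can pull in closely related guidelines.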

In [6]:
db = DeepLakeLoader("../../data/salestesting.txt")



Dataset(path='deeplake/salestesting.txt', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
 embedding  embedding  (44, 768)  float32   None   
    id        text      (44, 1)     str     None   
 metadata     json      (44, 1)     str     None   
   text       text      (44, 1)     str     None   




### Function to query the model

In [13]:
from langchain.prompts import PromptTemplate
from langchain.chat_models import AzureChatOpenAI


def query_model(query: str) -> str:
    # Retrieve the guideline passages most similar to the objection
    retrieved_chunks = db.query_db(query)

    # Prompt template
    template = """
    You are SalesCopilot. You will be provided with a customer objection, and a selection \
    of guidelines on how to respond to certain objections. \
    Using the provided content, write out the objection and the actionable course of action you recommend \
    in the following format:
    {delimiter}
    It seems like the customer is <explain their objection>.
    I recommend you <course of action>.
    {delimiter}

    Objection: {delimiter} {objection} {delimiter}
    Guidelines: {delimiter} {guidelines} {delimiter}
    """

    prompt = PromptTemplate(
        input_variables=["objection", "guidelines", "delimiter"], template=template
    )

    # Format the prompt; join the passages so the model sees plain text
    # instead of the repr of a Python list
    prompt_formatted = prompt.format(
        objection=query, guidelines="\n".join(retrieved_chunks), delimiter="----"
    )

    # Generate the answer (temperature=0 for more consistent output)
    llm = AzureChatOpenAI(deployment_name="gpt4", temperature=0)
    answer = llm.predict(prompt_formatted)
    return answer
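Before spending API calls, it can help to look at exactly what the model receives. The sketch below fills a flush-left copy of the same template with Python's `str.format`, which should match what `PromptTemplate`'s default f-string formatting produces for this template. `render_prompt` and the sample passages are hypothetical. Joining the passages with newlines avoids handing the model a Python list repr such as `['1. ...', '2. ...']`, which is what appears when a list is passed straight into the template.

```python
# Same wording as the SalesCopilot prompt, written flush-left for readability.
TEMPLATE = """You are SalesCopilot. You will be provided with a customer objection, and a selection \
of guidelines on how to respond to certain objections. \
Using the provided content, write out the objection and the actionable course of action you recommend \
in the following format:
{delimiter}
It seems like the customer is <explain their objection>.
I recommend you <course of action>.
{delimiter}

Objection: {delimiter} {objection} {delimiter}
Guidelines: {delimiter} {guidelines} {delimiter}
"""


def render_prompt(objection, guidelines, delimiter="----"):
    """Fill the template with newline-joined passages, without calling the model."""
    return TEMPLATE.format(
        objection=objection,
        guidelines="\n".join(guidelines),  # one passage per line, no list brackets
        delimiter=delimiter,
    )


# Made-up passages standing in for db.query_db(...) results.
sample_passages = [
    "1. Objection: It's too expensive. Response: Reframe the price around ROI.",
    "2. Objection: No budget right now. Response: Share relevant case studies.",
]
print(render_prompt("we have no budget this quarter", sample_passages))
```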

### Call the function with user query

In [14]:
# user question
query = "we have planned to use our budget on something more important"

answer = query_model(query)
print(answer)

It seems like the customer is planning to use their budget on something more important.
I recommend you share case studies of similar companies that have saved money, increased efficiency, or had a massive ROI with your product/service, making it a priority that deserves budget allocation now. You can say, "We had a customer with a similar issue, but by purchasing [product] they were actually able to increase their ROI and assign some of their new revenue to other parts of the budget."
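Because the prompt pins the reply to a fixed two-sentence shape, the two parts can be pulled out with a regular expression, for example to display the diagnosis and the recommendation separately in a UI. The helper below is a hypothetical sketch, not part of the original project; models do not always follow a requested format, so it returns `None` when the pattern is absent and strips a trailing delimiter if the model echoes one.

```python
import re

# Matches the two sentences the prompt asks for; DOTALL lets ".*" span newlines.
ANSWER_PATTERN = re.compile(
    r"It seems like the customer is (.*?)\s*I recommend you (.*)", re.DOTALL
)


def parse_answer(answer, delimiter="----"):
    """Return (objection, recommendation), or None if the reply is off-format."""
    match = ANSWER_PATTERN.search(answer)
    if match is None:
        return None
    objection = match.group(1).strip()
    recommendation = match.group(2).strip()
    # The prompt asks for delimiter-wrapped output; drop a trailing delimiter if present.
    if recommendation.endswith(delimiter):
        recommendation = recommendation[: -len(delimiter)].rstrip()
    return objection, recommendation


# A shortened version of the reply above.
sample = (
    "It seems like the customer is planning to use their budget on something more important.\n"
    "I recommend you share case studies of similar companies that have saved money."
)
print(parse_answer(sample))
```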
