In [1]:
#|default_exp app

Let's start by loading the environment variables we need to use.

In [9]:
#|export
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# The YouTube video we'll use: "State of competitive intelligence 2023".
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=GTZAoRiZnpQ&t=96s"

## Setting up the model
Let's define the LLM model that we'll use as part of the workflow.

In [10]:
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")

We can test the model by asking a simple question.

In [11]:
model.invoke("What MLB team won the World Series during the COVID-19 pandemic?")

AIMessage(content='The Los Angeles Dodgers won the World Series during the COVID-19 pandemic, defeating the Tampa Bay Rays in the 2020 World Series.')

## String Parsing

In [12]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser
chain.invoke("What is the capital of India?")

'The capital of India is New Delhi.'

## Introducing prompt templates

We want to provide the model with some context and the question. [Prompt templates](https://python.langchain.com/docs/modules/model_io/prompts/quick_start) are a simple way to define and reuse prompts.

In [13]:
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt.format(context="Klue refers to a competitive intelligence software", question="What is Klue?")

'Human: \nAnswer the question based on the context below. If you can\'t \nanswer the question, reply "I don\'t know".\n\nContext: Klue refers to a competitive intelligence software\n\nQuestion: What is Klue?\n'

## Creating a chain

In [14]:
chain = prompt | model | parser
chain.invoke({
    "context": "Klue refers to a competitive intelligence software, crayon is their biggest competitor",
    "question": "What is Klue?"
})

'Klue is a competitive intelligence software.'

## Transcribing the YouTube Video

The context we want to send the model comes from a YouTube video. Let's download the video and transcribe it using [OpenAI's Whisper](https://openai.com/research/whisper).

In [10]:
YOUTUBE_VIDEO

'https://www.youtube.com/watch?v=GTZAoRiZnpQ&t=96s'

Whisper needs `ffmpeg` to decode the audio. If it isn't installed yet, run `conda install -c conda-forge ffmpeg`.

### Transcription is needed only once

In [12]:
import tempfile
import whisper
from pytube import YouTube


# Let's do this only if we haven't created the transcription file yet.
if not os.path.exists("transcription.txt"):
    youtube = YouTube(YOUTUBE_VIDEO)
    audio = youtube.streams.filter(only_audio=True).first()

    # Let's load the base model. This is not the most accurate
    # model but it's fast.
    whisper_model = whisper.load_model("base")

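    with tempfile.TemporaryDirectory() as tmpdir:
        # Download the audio stream, then transcribe it on the CPU (fp16=False).
        audio_path = audio.download(output_path=tmpdir)
        transcription = whisper_model.transcribe(audio_path, fp16=False)["text"].strip()

        # Cache the result so later runs skip the download and transcription.
        with open("transcription.txt", "w") as f:
            f.write(transcription)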

Let's read the transcription and display the first few characters to ensure everything works as expected.

In [13]:
with open("transcription.txt") as file:
    transcription = file.read()

transcription[:100]

'Welcome everyone. Thank you for for joining us today for the State of competitive intelligence in 20'

In [14]:
len(transcription)

59810

## Using the entire transcription as context

If we try to invoke the chain using the transcription as context, the model will return an error because the context is too long.

Large language models support limited context sizes. The transcription of this video is too long for the model to handle, so we need a different approach.

In [16]:
try:
    chain.invoke({
        "context": transcription,
        "question": "What is the best tip for a company trying to enable their sales orgs around compete?"
    })
except Exception as e:
    print(e)

## Splitting the transcription

In [15]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("transcription.txt")
text_documents = loader.load()
text_documents

[Document(page_content="Welcome everyone. Thank you for for joining us today for the State of competitive intelligence in 2023. My name is Connor. I'm on the content team here at crayon and I am joined today. I'm very happy today to be joined by Mimi in August who run competitive intelligence at Akamai and Deltech respectively. I am also joined today by Sheila Leihar who is our senior director of content here at crayon. She will be monitoring the the Q&A section. So periodically throughout today's session, I will I'll throw it over to Sheila and she will show chime in with some questions that we're getting from all of you. So please throughout this session, any questions that come around please put them in that Q&A section and you know if it aligns with what we're talking about, we will we'll discuss it in real time and then we also have some time set aside the end for questions that we aren't able to get to. We also have a poll question that we will push lives in just a minute. But fi

There are many different ways to split a document. For this example, we'll use a simple splitter that splits the document into chunks of a fixed size. Check [Text Splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) for more information about different approaches to splitting documents.

For illustration purposes, let's split the transcription into chunks of 1,000 characters with an overlap of 20 characters and display the first few chunks:

In [16]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(text_documents)
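
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified, library-free sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter tries to break on separators such as paragraphs and spaces before falling back to raw characters). The helper name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters, where each window
    shares chunk_overlap characters with the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # made-up text, not the transcription
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears, at least partly, in both neighbouring chunks.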

In [17]:
documents

[Document(page_content="Welcome everyone. Thank you for for joining us today for the State of competitive intelligence in 2023. My name is Connor. I'm on the content team here at crayon and I am joined today. I'm very happy today to be joined by Mimi in August who run competitive intelligence at Akamai and Deltech respectively. I am also joined today by Sheila Leihar who is our senior director of content here at crayon. She will be monitoring the the Q&A section. So periodically throughout today's session, I will I'll throw it over to Sheila and she will show chime in with some questions that we're getting from all of you. So please throughout this session, any questions that come around please put them in that Q&A section and you know if it aligns with what we're talking about, we will we'll discuss it in real time and then we also have some time set aside the end for questions that we aren't able to get to. We also have a poll question that we will push lives in just a minute. But fi

## Finding the relevant chunks

Let's generate embeddings for an arbitrary query:

In [18]:
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
embedded_query = embeddings.embed_query("What is competitive intelligence?")

print(f"Embedding length: {len(embedded_query)}")
print(embedded_query[:10])

Embedding length: 1536
[-0.02676220181819881, -0.0055145550288614635, 0.01305554314016356, -0.0170751880981857, -0.00973352210576431, 0.024915158223655663, -0.025260649089236004, 0.005285335593320399, -0.018882367362084963, -0.05511897313686283]


To illustrate how embeddings work, let's first generate the embeddings for two different sentences:

In [19]:
sentence1 = embeddings.embed_query("Its very hard to grow a start up")
sentence2 = embeddings.embed_query("Sales enablement is one of the primary use cases of competitive enablement")

We can now compute the similarity between the query and each of the two sentences. The closer the embeddings are, the more similar the sentences will be.

We can use [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to calculate the similarity between the query and each of the sentences:

In [26]:
from sklearn.metrics.pairwise import cosine_similarity

query_sentence1_similarity = cosine_similarity([embedded_query], [sentence1])[0][0]
query_sentence2_similarity = cosine_similarity([embedded_query], [sentence2])[0][0]

query_sentence1_similarity, query_sentence2_similarity

(0.741941282185766, 0.83047198279227)
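
For intuition about what the score means, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand on tiny made-up vectors; the function name `cosine` and the example vectors are illustrative and are not taken from the embeddings above.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score close to 1, perpendicular ones score 0.
same_direction = cosine([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because the score depends only on direction, not length, two texts can be similar even if their embedding vectors differ in magnitude.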

## Setting up a Vector Store

We need an efficient way to store document chunks and their embeddings, and to run similarity searches over them at scale. To do this, we'll use a **vector store**.

A vector store is a database of embeddings that specializes in fast similarity searches. 

<img src='images/system4.png' width="1200">

To understand how a vector store works, let's create one in memory and add a few embeddings to it:
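
As a minimal, library-free sketch of the idea: a vector store keeps each text next to its embedding and answers a query by ranking the stored vectors by similarity. Everything here is hypothetical: `toy_embed` is a crude stand-in for a real embedding model (it only counts a few letters), `TinyVectorStore` is not a LangChain class, and the sample strings are made up.

```python
import math


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts of a few letters."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeiost"]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class TinyVectorStore:
    """Stores (text, vector) pairs and ranks them against a query."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []

    def add_texts(self, texts):
        for text in texts:
            self.rows.append((text, self.embed(text)))

    def similarity_search(self, query, k=2):
        query_vector = self.embed(query)
        ranked = sorted(
            self.rows,
            key=lambda row: cosine(query_vector, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore(toy_embed)
store.add_texts(["battle cards", "win loss analysis", "quarterly readout"])
top = store.similarity_search("battle cards", k=1)
print(top)
```

A production vector store follows the same contract (add texts, search by similarity) but uses real model embeddings and indexes built for fast lookup instead of a full sort.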

Our prompt expects two parameters, "context" and "question." We can use the retriever to find the chunks we'll use as the context to answer the question.

We can create a map with the two inputs by using the [`RunnableParallel`](https://python.langchain.com/docs/expression_language/how_to/map) and [`RunnablePassthrough`](https://python.langchain.com/docs/expression_language/how_to/passthrough) classes. This will allow us to pass the context and question to the prompt as a map with the keys "context" and "question."

## Loading transcription into the vector store

We initialized the vector store with a few random strings. Let's create a new vector store using the chunks from the video transcription.

In [35]:
len(documents)

62

In [48]:
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

Let's set up a new chain using the correct vector store. This time we use a different but equivalent syntax to specify the [`RunnableParallel`](https://python.langchain.com/docs/expression_language/how_to/map) portion of the chain:

## Setting up Pinecone

So far we've used an in-memory vector store. In practice, we need a vector store that can handle large amounts of data and perform similarity searches at scale. For this example, we'll use [Pinecone](https://www.pinecone.io/).

The first step is to create a Pinecone account, set up an index, get an API key, and set it as an environment variable `PINECONE_API_KEY`.

Then, we can load the transcription documents into Pinecone:

In [46]:
from langchain_pinecone import PineconeVectorStore

index_name = "competitive-intelligence-index"

pinecone = PineconeVectorStore.from_documents(
    documents, embeddings, index_name=index_name
)

Let's now run a similarity search on Pinecone to make sure everything works:

In [75]:
pinecone.similarity_search("What is the best way for a company to get started with competitive intelligence?")[:3]

[Document(page_content="16% of CI leaders said yes and this year 36% of CI leaders said yes that is a 125% increase you know since 2018 so folks are getting better and better at measuring the impact of competitive intelligence so maybe we'll we'll start with you if that's okay I'm curious for folks on the phone who do not have KPIs for their CI program any advice you have for those folks to to get started yeah I'm start simple and our our KPIs frankly are very very simple too at least we're about two years into the journey here at Alka Mai so a couple of KPIs some are hard numbers and others are a little bit more aspirational the hard numbers is number of battle cards right we started with technically zero there were existing battle cards that predated my my arrival that we converted but we wanted to have a good baseline again bait using the tears we wanted the tier one competitors covered and over time we will build more right to address all of the others so that's one KPI that you kn

In [76]:
retriever = pinecone.as_retriever()

Let's set up the new chain using Pinecone as the vector store:

In [82]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='\nAnswer the question based on the context below. If you can\'t \nanswer the question, reply "I don\'t know".\n\nContext: {context}\n\nQuestion: {question}\n'))])

In [87]:
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter

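chain = (
    {
        # Both branches read the "question" key from the input dictionary.
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)

# The chain expects a dictionary with a "question" key, not a bare string.
# chain.invoke({"question": "What is the best way for a company to get started with competitive intelligence?"})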

## Evals

## Create a Knowledge Base
Let's start by loading the document chunks into a pandas DataFrame.

In [51]:
import pandas as pd

df = pd.DataFrame([d.page_content for d in documents], columns=["text"])
df.head(10)

Unnamed: 0,text
0,Welcome everyone. Thank you for for joining us...
1,"before I get to it today's agenda, Mimi and Au..."
2,"in in 2018. So if you're familiar with, you kn..."
3,be spent walking through the top five insights...
4,"who are on this call, they made it really clea..."
5,"competitive. And as you can see, for the avera..."
6,it's awareness of a particular resource that t...
7,they have a need or requirement of particular ...
8,specific thing that they're looking for. I thi...
9,Do you have any thoughts there August? I like ...


In [61]:
! pip install -U pydantic

Collecting pydantic
  Downloading pydantic-2.6.4-py3-none-any.whl.metadata (85 kB)
Collecting pydantic-core==2.16.3 (from pydantic)
  Using cached pydantic_core-2.16.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.5 kB)
Downloading pydantic-2.6.4-py3-none-any.whl (394 kB)
Using cached pydantic_core-2.16.3-cp310-cp310-macosx_11_0_arm64.whl (1.7 MB)
Installing collected packages: pydantic-core, pydantic
  Attempting uninstall: pydantic-core
    Found existing installation: pydantic_core 2.3.0
    Uninstalling pydantic_core-2.3.0:
      Successfully uninstalled pydantic_core-2.3.0
  Attempting uninstall: pydantic
    Found existing installation: pydantic 2.0.3
    Uninstalling pydantic-2.0.3:
      Successfull

In [62]:
from pydantic import ValidationError, validate_call

In [64]:
from giskard.rag import KnowledgeBase

knowledge_base = KnowledgeBase(df)

## Don't run the cells below again: they call the OpenAI API with your API key

In [65]:
from giskard.rag import generate_testset

testset = generate_testset(
    knowledge_base,
    num_questions=60,
    agent_description="A chatbot answering questions about competitive intelligence",
)

2024-03-29 20:35:06,080 pid:19391 MainThread giskard.rag  INFO     Finding topics in the knowledge base.
2024-03-29 20:35:06,081 pid:19391 MainThread giskard.rag  INFO     Computing Knowledge Base embeddings.


OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.


2024-03-29 20:35:12,521 pid:19391 MainThread giskard.rag  INFO     Found 3 topics in the knowledge base.


Generating questions:   0%|          | 0/60 [00:00<?, ?it/s]

In [67]:
test_set_df = testset.to_pandas()

# iterrows() yields (id, row) pairs; we only need the row here.
for number, (_, row) in enumerate(test_set_df.head(3).iterrows(), start=1):
    print(f"Question {number}: {row['question']}")
    print(f"Reference answer: {row['reference_answer']}")
    print("Reference context:")
    print(row['reference_context'])
    print("******************", end="\n\n")

Question 1: What is one of the criteria used to evaluate ad hoc project requests?
Reference answer: One of the criteria used to evaluate ad hoc project requests is alignment with strategic objectives.
Reference context:
Document 55: those projects and alignment with strategic objectives as well you know so taking it back to to mean these really excellent point about prioritization one of the hurdles or one of the you know linds is through which we look at ad hoc project requests is alignment with strategic objectives as well and so we are evaluating ourselves based on how well we're supporting each of those and i would add you know i i am for my team love this but i kind of want to track all of the requests that we get our way even the ones that we decline right because that proves again the value of the of the team and um i'm sure everyone feels the pain at a loose yeah teams are very very small and so um i can actually then you know quarterly or annual summary um show my leadership t

In [68]:
test_set_df.to_csv('generated_test_set.csv', index = None)

In [69]:
testset.save("test-set.jsonl")

## Evaluating the Model on the Test Set
We need to create a function that invokes the chain with a specific question and returns the answer.

In [88]:
def answer_fn(question, history=None):
    return chain.invoke({"question": question})

## Evaluate

In [89]:
from giskard.rag import evaluate

report = evaluate(answer_fn, testset=testset, knowledge_base=knowledge_base)

Asking questions to the agent:   0%|          | 0/60 [00:00<?, ?it/s]

Correctness evaluation:   0%|          | 0/60 [00:00<?, ?it/s]

In [90]:
display(report)

0,1,2
GENERATOR,76.0% The Generator is the LLM inside the RAG to generate the answers.,76.0%
RETRIEVER,75.0% The Retriever fetches relevant documents from the knowledge base according to a user query.,75.0%
REWRITER,53.33% The Rewriter modifies the user query to match a predefined format or to include the context from the chat history.,53.33%
ROUTING,90.0% The Router filters the query of the user based on his intentions (intentions detection).,90.0%
KNOWLEDGE_BASE,63.46% The knowledge base is the set of documents given to the RAG to generate the answers. Its scores is computed differently than the other components: it is the difference between the maximum and minimum correctness score across all the topics of the knowledge base.,63.46%


In [91]:
report.to_html("report.html")

In [92]:
report.correctness_by_question_type()


Unnamed: 0_level_0,correctness
question_type,Unnamed: 1_level_1
complex,0.7
conversational,0.1
distracting element,0.6
double,0.9
simple,0.9
situational,0.7


In [93]:
report.get_failures()

Unnamed: 0_level_0,question,reference_answer,reference_context,conversation_history,metadata,agent_answer,correctness,correctness_reason
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
9ed6691c-59a4-4ece-a601-843e323eca53,What are the recommended steps for someone who...,The recommended steps are to make themselves m...,Document 8: specific thing that they're lookin...,[],"{'question_type': 'simple', 'seed_document_id'...",The recommended steps for someone struggling t...,False,The agent's answer does not match the ground t...
3092b274-1fdd-4912-bc1b-181ab0eafa68,"In the context of competitive intelligence, wh...",The behavior that translates into more wins is...,Document 52: that we've tried to follow is to ...,[],"{'question_type': 'complex', 'seed_document_id...",Reinforcing and repeating the knowledge and aw...,False,The agent's answer does not match the ground t...
49d8012a-0f24-4da3-8678-d78b27272415,What specific actions are taken when an except...,When a particularly valuable insight comes in ...,Document 34: a little bit as well so you know ...,[],"{'question_type': 'complex', 'seed_document_id...",When an exceptionally valuable piece of inform...,False,The agent's answer does not match the ground t...
d8fb0e5e-717c-46d3-8c18-f625112b4627,Can you identify the final sentence or phrase ...,see you next time thanks everyone thanks so mu...,Document 61: see you next time thanks everyone...,[],"{'question_type': 'complex', 'seed_document_id...",I don't know.,False,The agent was unable to provide the final sent...
4085369e-0d40-4c00-b4c1-a9ec0d48d892,Given the increasing importance of measuring t...,CI professionals sometimes need to act like ma...,Document 48: will I be working towards my goal...,[],"{'question_type': 'distracting element', 'seed...",The two roles that CI professionals might need...,False,The agent's answer does not match the ground t...
b096d1e6-8fe6-449d-8aff-d3071273cb6e,"Considering the dispersed nature of the team, ...",The team uses WebEx for ad hoc communications ...,Document 15: there's so many meetings right fo...,[],"{'question_type': 'distracting element', 'seed...","Based on the context provided, the communicati...",False,The agent's answer does not match the ground t...
808a728d-10cf-40e2-97de-e548e873a1f6,Considering the importance of understanding st...,One of the keys to earning the trust of your s...,Document 41: requests and folks have to yes or...,[],"{'question_type': 'distracting element', 'seed...",One key factor in earning stakeholders' trust ...,False,The agent's answer focuses on demonstrating sk...
d0f36fa6-b2fb-4645-bcf6-1d6563ff591a,Given the process of building a business case ...,The team provides quarterly readouts to the bo...,Document 58: looped in on kind of the regular ...,[],"{'question_type': 'distracting element', 'seed...",The team presents their findings to the board ...,False,The agent incorrectly stated that the team pre...
d00b11fb-20e6-42d1-be27-d7f50d710b89,As an intelligence analyst trying to understan...,Competitive intelligence (CI) professionals ne...,Document 48: will I be working towards my goal...,[],"{'question_type': 'situational', 'seed_documen...","In this context, the role of competitive intel...",False,The agent's answer focuses on the general role...
1724044f-77c3-4847-be4d-83d99074b064,As a competitive intelligence analyst at Akama...,Akamai uses Salesforce data for win loss analy...,Document 18: customers and so my team has kind...,[],"{'question_type': 'situational', 'seed_documen...","Based on the context provided, Akamai currentl...",False,The agent's answer is partially correct but it...


In [106]:
df_failures = report.get_failures()

In [112]:
df_failures.question[0:1]

id
9ed6691c-59a4-4ece-a601-843e323eca53    What are the recommended steps for someone who...
Name: question, dtype: object

In [109]:
df_failures.to_csv('failures.csv', index=None)

## Creating a Test Suite
We can create a test suite and use it to compare different models.

Load the test set from disk.

In [94]:
from giskard.rag import QATestset

testset = QATestset.load("test-set.jsonl")

In [95]:
test_suite = testset.to_test_suite("Competitive Intelligence Test Suite")

We need a function that takes a DataFrame of questions, invokes the chain with each question, and returns the answers.



In [96]:
import giskard


def batch_prediction_fn(df: pd.DataFrame):
    return chain.batch([{"question": q} for q in df["question"].values])

In [97]:
giskard_model = giskard.Model(
    model=batch_prediction_fn,
    model_type="text_generation",
    name="Competitive Intelligence Question and Answer Model",
    description="This model answers questions about Competitive Intelligence.",
    feature_names=["question"], 
)

2024-03-29 21:24:24,581 pid:19391 MainThread giskard.models.automodel INFO     Your 'prediction_function' is successfully wrapped by Giskard's 'PredictionFunctionModel' wrapper class.


In [98]:
test_suite_results = test_suite.run(model=giskard_model)

2024-03-29 21:25:00,899 pid:19391 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'question': 'object'} to {'question': 'object'}
2024-03-29 21:25:12,395 pid:19391 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (60, 5) executed in 0:00:11.503563
Executed 'TestsetCorrectnessTest' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x16cd068c0>, 'dataset': <giskard.datasets.base.Dataset object at 0x16c6ed5a0>}: 
               Test succeeded
               Metric: 0.65
               
               
2024-03-29 21:26:32,808 pid:19391 MainThread giskard.core.suite INFO     Executed test suite 'Competitive Intelligence Test Suite'
2024-03-29 21:26:32,808 pid:19391 MainThread giskard.core.suite INFO     result: success
2024-03-29 21:26:32,809 pid:19391 MainThread giskard.core.suite INFO     TestsetCorrectnessTest ({'model': <giskard.models.function.PredictionFunctionModel object at 0x16cd068c0>, 'dataset

In [99]:
display(test_suite_results)

## Pytest

In [104]:
import ipytest
ipytest.autoconfig()


In [105]:
%%ipytest

import pytest
from giskard.rag import QATestset
from giskard.testing.tests.llm import test_llm_correctness


@pytest.fixture
def dataset():
    testset = QATestset.load("test-set.jsonl")
    return testset.to_dataset()


@pytest.fixture
def model():
    return giskard_model


def test_chain(dataset, model):
    test_llm_correctness(model=model, dataset=dataset, threshold=0.5).assert_()

.2024-03-29 21:30:21,150 pid:19391 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'question': 'object'} to {'question': 'object'}
2024-03-29 21:30:21,151 pid:19391 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (60, 5) executed in 0:00:00.004510
.                                                                                           [100%]
../../../../../../../opt/anaconda3/envs/dev-env/lib/python3.10/site-packages/_pytest/config/__init__.py:1276
    self._mark_plugins_for_rewrite(hook)

t_8bde076c95f64482b61e67facd696275.py::test_llm_correctness
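
To read the `threshold=0.5` argument and the `Metric: 0.65` reported by the test suite, it helps to think of the metric as the fraction of questions answered correctly. The sketch below rests on that assumption and on the further assumption that a run passes when the fraction meets or exceeds the threshold; the exact comparison Giskard applies is not shown in this notebook. The helper names and the list of flags are made up, chosen only to reproduce a ratio of 0.65.

```python
def correctness_ratio(flags):
    """Fraction of answers judged correct."""
    return sum(flags) / len(flags)


def passes(flags, threshold=0.5):
    # Assumption: the run passes when the ratio meets or exceeds the threshold.
    return correctness_ratio(flags) >= threshold


# Made-up judgements: 13 correct answers out of 20.
flags = [True] * 13 + [False] * 7
ratio = correctness_ratio(flags)
print(ratio, passes(flags, threshold=0.5), passes(flags, threshold=0.7))
```

Raising the threshold turns the same results into a failing test, which makes the threshold a convenient quality gate when comparing models.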

