### TruLens

Based on:
- [TruLens and Pinecone](https://docs.pinecone.io/integrations/trulens)
- [TruLens LangChain quickstart](https://www.trulens.org/trulens_eval/getting_started/quickstarts/langchain_quickstart/#import-from-langchain-and-trulens)
- [TruLens + Google Cloud Vertex AI Tutorial: Improve the customers support](https://lablab.ai/t/trulens-google-vertex-ai-tutorial-improve-the-customers-support)

In [147]:
import bs4
import google.auth  # google.auth.default() below needs the submodule imported explicitly
import os
import numpy as np

from dotenv import load_dotenv

In [148]:
def trace(toggle):
    """Turn LangSmith tracing on or off through environment variables."""
    if toggle:
        os.environ['LANGCHAIN_TRACING_V2'] = 'true'
        os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
        # placeholder: substitute your own key and never commit a real one
        os.environ['LANGCHAIN_API_KEY'] = '<LANGCHAIN_API_KEY>'
        os.environ['LANGCHAIN_PROJECT'] = 'ragas-rag-eval'
    else:
        # pop() avoids a KeyError when tracing was never enabled
        os.environ.pop('LANGCHAIN_TRACING_V2', None)
        os.environ.pop('LANGCHAIN_ENDPOINT', None)
        os.environ.pop('LANGCHAIN_API_KEY', None)
        os.environ.pop('LANGCHAIN_PROJECT', None)

In [149]:
trace(True)

### Auth

In [150]:
PROJECT_ID='langgraph-graded-rag'
REGION_ID='us-central1'
CHAT_MODEL='gemini-1.0-pro'
EMB_MODEL='textembedding-gecko'

In [151]:
config = {
    'project_id': PROJECT_ID,
    'chat_model_id': CHAT_MODEL,
    'embedding_model_id': EMB_MODEL
}

# authenticate to GCP
creds, _ = google.auth.default(quota_project_id=config["project_id"])
print(creds)

<google.oauth2.credentials.Credentials object at 0x2f08a07d0>


In [152]:
load_dotenv()

# load_dotenv() already exports the keys from .env; fail early if one is missing
for key in ("HUGGINGFACE_API_KEY", "OPENAI_API_KEY"):  # OpenAI key is for gpt-3.5
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set")

### Models

In [153]:
import vertexai
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings

vertexai.init(project=PROJECT_ID, location=REGION_ID)

llm = ChatVertexAI(
    credentials=creds,
    model_name=config["chat_model_id"],
)

embeddings = VertexAIEmbeddings(
    credentials=creds, model_name=config["embedding_model_id"]
)

### Data

In [154]:
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader

def medicare_docs():
    # uses rapidocr-onnxruntime to extract text from images
    loader = PyPDFLoader("https://www.medicare.gov/Pubs/pdf/10050-medicare-and-you.pdf", extract_images=True)
    return loader.load()

def llm_agent_docs():
    loader = WebBaseLoader(
        web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
        bs_kwargs=dict(
            parse_only=bs4.SoupStrainer(
                class_=("post-content", "post-title", "post-header")
            )
        ),
    )
    return loader.load()

In [155]:
docs = medicare_docs()
docs[:1]

[Document(page_content='2024Medicare\n& YouThe official U.S. government \nMedicare handbook \n', metadata={'source': 'https://www.medicare.gov/Pubs/pdf/10050-medicare-and-you.pdf', 'page': 0})]

### Vector DB

In [156]:
from langchain_community.vectorstores import Chroma 
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)

db = Chroma.from_documents(documents, embeddings)

In [157]:
db.similarity_search('Medicare Overview')

[Document(page_content='3\nContents\nWhat’s new & important? ......................................................................................................... 2\nIndex of topics .............................................................................................................................. 4\nWhat are the parts of Medicare? ........................................................................................... 9\nYour Medicare options ............................................................................................................. 10\nAt a glance: Original Medicare vs. Medicare Advantage ........................................... 11\nGet started with Medicare ...................................................................................................... 13\nGet help finding the right coverage for you ................................................................... 14\nSection 1: Signing up for Medicare ...................................

### RAG

In [158]:
from langchain import hub

#### Prompt

In [159]:
prompt = hub.pull("rlm/rag-prompt")
prompt.format(question='<QUESTION>', context='<CONTEXT>')

"Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: <QUESTION> \nContext: <CONTEXT> \nAnswer:"

In [175]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# only 1 doc
retriever = db.as_retriever(search_kwargs={'k': 1})

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
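To make the `context` branch of the chain concrete, here is a minimal sketch of what `format_docs` hands to the prompt. `FakeDoc` is a hypothetical stand-in for a LangChain `Document` (only `page_content` is used), and the chunk texts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def format_docs(docs):
    # Same logic as the notebook cell: join chunk texts with a blank line between them.
    return "\n\n".join(doc.page_content for doc in docs)


# Made-up chunks; the real ones come from the retriever.
chunks = [
    FakeDoc("Part A covers inpatient care."),
    FakeDoc("Part B covers outpatient care."),
]
context_text = format_docs(chunks)
print(context_text)
```

With `k=1` the retriever returns a single chunk, so in this notebook the joined string is just that chunk's text.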

In [176]:
rag_chain.invoke("What is Medicare part A?")

'Medicare Part A is a hospital insurance plan that helps cover the costs of inpatient care in hospitals, skilled nursing facility care, hospice care, and home health care. \nPart A is typically free for those aged 65 and over who have worked and paid Medicare taxes for at least 10 years. '

### Metrics

- TruLens ships with a set of [stock feedback functions](https://www.trulens.org/trulens_eval/evaluation/feedback_implementations/stock/)
- Feedback functions are run by a feedback provider. The [current list of feedback providers is here](https://github.com/truera/trulens/tree/main/trulens_eval/trulens_eval/feedback/provider)
- For this notebook, we're using Huggingface as the main feedback provider ([hugs.py](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/provider/hugs.py)), and OpenAI for the relevance metrics

In [162]:
# Main tools for eval (Huggingface is imported once, from its provider module)
from trulens_eval import Feedback, Select, Tru, TruChain, feedback
from trulens_eval.feedback.provider.hugs import Huggingface

In [163]:
tru = Tru()
# Huggingface as feedback provider
hf_feedback_provider = Huggingface()

# OpenAI as feedback provider
openai_feedback_provider = feedback.OpenAI()

# standard definitions 
query = Select.Record.app.retriever._get_relevant_documents.args.query  
context = Select.Record.app.retriever.get_relevant_documents.rets[:].page_content

In [179]:
openai_feedback_provider.relevance(
    prompt="Where is Germany?",
    response="Poland is in Europe."
)

RuntimeError: Endpoint openai request failed 4 time(s): 
	Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
	Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
	Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
	Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

In [181]:
hf_feedback_provider.context_relevance(
    prompt="You are a Medicare expert",
    context="2024 has a lot of cool new things about Medicare"
)

RuntimeError: Rate limit reached. You reached free usage limit (reset hourly). Please subscribe to a plan at https://huggingface.co/pricing to use the API at this rate

#### Language match

`f_lang_match = Feedback(feedback_provider.language_match).on_input_output()`

- This feedback function uses the feedback provider to check that the chatbot's output is in the same language as the user's input. Language consistency keeps the conversation coherent.
- For this notebook, we're using Huggingface as the feedback provider ([language_match](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/provider/hugs.py#L134)) which uses [this model](https://huggingface.co/papluca/xlm-roberta-base-language-detection) trained on [this dataset](https://huggingface.co/datasets/papluca/language-identification#additional-information):
  - _"This dataset was built during The Hugging Face Course Community Event, which took place in November 2021, with the goal of collecting a dataset with enough samples for each language to train a robust language detection model."_
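As an illustration of how a language-match score could be derived, the sketch below assumes the detection model returns a probability per language for each text and compares the two distributions. The function name, the half-L1-distance formula and the probabilities are assumptions made for this example; the actual TruLens implementation may differ.

```python
def language_match_score(scores1, scores2):
    """Return 1 minus half the L1 distance between two language distributions.

    Identical distributions give 1.0; completely disjoint ones give 0.0.
    """
    langs = set(scores1) | set(scores2)
    l1 = sum(abs(scores1.get(lang, 0.0) - scores2.get(lang, 0.0)) for lang in langs)
    return 1.0 - l1 / 2.0


# Hypothetical detector outputs for a question and two candidate answers.
question_langs = {"en": 0.9, "de": 0.1}
answer_langs = {"en": 0.8, "de": 0.2}
german_answer_langs = {"en": 0.0, "de": 1.0}

same = language_match_score(question_langs, answer_langs)
different = language_match_score(question_langs, german_answer_langs)
print(same, different)
```

A score close to 1 means the input and output are most likely in the same language, and a score close to 0 means they are not.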

In [164]:
f_lang_match = Feedback(hf_feedback_provider.language_match).on_input_output()

✅ In language_match, input text1 will be set to __record__.main_input or `Select.RecordInput` .
✅ In language_match, input text2 will be set to __record__.main_output or `Select.RecordOutput` .


#### Toxic content

`f_toxic = Feedback(feedback_provider.toxic).on_output()`

- This feedback function scores the chatbot's responses for toxic content. Maintaining a safe and respectful communication environment is crucial, especially in customer support scenarios.
- For this notebook, we're using Huggingface as the feedback provider ([toxic](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/provider/hugs.py#L331)) which uses [this model](https://huggingface.co/martin-ha/toxic-comment-model) trained on 10% of [this dataset](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data)

In [165]:
f_toxic = Feedback(hf_feedback_provider.toxic).on_output()

✅ In toxic, input text will be set to __record__.main_output or `Select.RecordOutput` .


#### PII detection

`f_pii_detection = Feedback(feedback_provider.pii_detection).on_input()`

- This feedback mechanism detects any Personally Identifiable Information (PII) in the user's input, helping to prevent the chatbot from inadvertently storing or processing sensitive data.
- For this notebook, we're using Huggingface as the feedback provider ([pii detection](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/provider/hugs.py#L419)) which uses [this model](https://huggingface.co/bigcode/starpii) trained on [this dataset](https://huggingface.co/bigcode/starpii#dataset).
- The model classifies text into 6 target classes: Names, Emails, Keys, Passwords, IP addresses and Usernames.

In [166]:
f_pii_detection = Feedback(hf_feedback_provider.pii_detection).on_input()

✅ In pii_detection, input text will be set to __record__.main_input or `Select.RecordInput` .


#### Sentiment

`f_positive_sentiment = Feedback(feedback_provider.positive_sentiment).on_output()`

- This scores the chatbot's output for positive sentiment, which helps keep interactions with users friendly.
- For this notebook, we're using Huggingface as the feedback provider ([positive_sentiment](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/provider/hugs.py#L293)) which uses [this model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) trained on [this dataset](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment#twitter-roberta-base-for-sentiment-analysis):
  - ~58M tweets
  - finetuned for sentiment analysis with the TweetEval benchmark
  - model is suitable for English (for a similar multilingual model, see [XLM-T](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)).

In [167]:
f_positive_sentiment = Feedback(hf_feedback_provider.positive_sentiment).on_output()

✅ In positive_sentiment, input text will be set to __record__.main_output or `Select.RecordOutput` .


#### Question / Answer Relevance

`f_qa_relevance = Feedback(feedback_provider.relevance).on_input_output()`

- This checks how relevant the final answer is to the original question
- The implementation of [relevance](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/provider/base.py#L369) in the base feedback provider class uses 2 prompts to compute a score:
  - [ANSWER_RELEVANCE_SYSTEM](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/prompts.py#L37) is a [system_prompt](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/v2/feedback.py#L248) with instructions to the LLM to perform as a relevance grader and grade the relevance of the supplied `response` against the `prompt` on a grade scale of 0 to 10.
  - [ANSWER_RELEVANCE_USER](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/prompts.py#L38) is a [user_prompt](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval/feedback/v2/feedback.py#L279) which plugs in the `prompt` and `response`
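The two-prompt flow can be sketched as follows. This is an illustration only: `USER_TEMPLATE`, `build_user_prompt` and `parse_grade` are hypothetical names, the template text is not the real `ANSWER_RELEVANCE_USER` prompt, and scaling the 0 to 10 grade down to a 0 to 1 score is an assumption about how the grade is reported.

```python
import re

# Hypothetical template; the real ANSWER_RELEVANCE_USER text lives in the TruLens source.
USER_TEMPLATE = "PROMPT: {prompt}\n\nRESPONSE: {response}\n\nRELEVANCE: "


def build_user_prompt(prompt, response):
    """Plug the question and the answer into the user prompt."""
    return USER_TEMPLATE.format(prompt=prompt, response=response)


def parse_grade(llm_reply, max_grade=10):
    """Take the first integer in 0..max_grade from the grader's reply and scale it to 0..1."""
    for match in re.finditer(r"\d+", llm_reply):
        grade = int(match.group())
        if 0 <= grade <= max_grade:
            return grade / max_grade
    raise ValueError(f"no grade between 0 and {max_grade} found in {llm_reply!r}")


user_prompt = build_user_prompt("Where is Germany?", "Poland is in Europe.")
# A grader LLM would receive the system prompt plus user_prompt; its reply is faked here.
score = parse_grade("RELEVANCE: 2")
print(score)
```

A low grade is what we would expect for the Germany/Poland pair used in the smoke test above, since the response does not answer the question.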

In [168]:
# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(openai_feedback_provider.relevance).on_input_output()

✅ In relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .


#### Question / Statement Relevance

`f_qs_relevance = Feedback(feedback_provider.qs_relevance).on_input().on(Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content).aggregate(np.mean)`

- This checks how relevant each retrieved context chunk (the RAG "statements") is to the question
- The implementation needs to evaluate the context chunks, which are an intermediate step of the LLM app
  - `.on(Select...)` selects those chunks by their path in the LangChain app's call chain. The cell below passes the `context` selector defined earlier instead of the path shown above.
  - `.aggregate(np.mean)` specifies how the feedback outputs are combined into a single score. It only applies when the argument specification names more than one value for an input or output.
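The aggregation step can be sketched in plain Python: score every retrieved chunk against the question, then reduce the list of scores to one number. `aggregate_feedback` and `toy_overlap` are hypothetical helpers written for this example, `statistics.fmean` stands in for `np.mean`, and the word-overlap scorer is only a placeholder for the LLM-based relevance call.

```python
from statistics import fmean


def aggregate_feedback(question, chunks, score_fn, agg=fmean):
    """Score each retrieved chunk against the question, then reduce to a single number."""
    scores = [score_fn(question, chunk) for chunk in chunks]
    return agg(scores)


def toy_overlap(question, chunk):
    # Placeholder scorer: fraction of question words that also appear in the chunk.
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words)


# Made-up chunks standing in for the retriever output.
chunks = ["medicare part a covers hospital stays", "index of topics"]
mean_score = aggregate_feedback("what is medicare part a", chunks, toy_overlap)
best_score = aggregate_feedback("what is medicare part a", chunks, toy_overlap, agg=max)
print(mean_score, best_score)
```

Averaging penalizes a retriever that mixes relevant and irrelevant chunks, while `max` would only ask whether at least one chunk is relevant. With `k=1` there is a single chunk, so the choice of aggregator does not change the result in this notebook.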

In [169]:
f_qs_relevance = Feedback(openai_feedback_provider.qs_relevance).on_input().on(context).aggregate(np.mean)

✅ In qs_relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In qs_relevance, input context will be set to __record__.app.retriever.get_relevant_documents.rets[:].page_content .


### Eval

In [173]:
# wrap with TruLens
chain_recorder = TruChain(
    rag_chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[
        # f_lang_match,
        # f_toxic,
        # f_pii_detection,
        # f_positive_sentiment,
        f_qa_relevance,
        # f_qs_relevance
    ]
)

In [178]:
try:
    with chain_recorder as recording:
        response = rag_chain.invoke("Tell me about Medicare part A?")
        print(response)
except Exception as e:
    print(f"An exception occurred: '{e}'")

## Medicare Part A: Hospital Insurance

Medicare Part A, also known as Hospital Insurance, helps cover the costs of inpatient care in hospitals, skilled nursing facilities, hospice care, and home health care. This means it can help pay for services like:

* **Hospital stays**: If you need to be admitted to a hospital, Part A can help cover the costs of your room, meals, nursing care, and other hospital services.
* **Skilled nursing facility care**: If you need additional care after a hospital stay, Part A can help cover the costs of a skilled nursing facility. 
* **Hospice care**: If you have a terminal illness, Part A can help cover the costs of hospice care, which provides comfort and support to patients and their families.
* **Home health care**: If you need skilled nursing care or physical therapy at home, Part A can help cover the costs of home health care.

It's important to note that Part A has limitations, such as the number of days it will cover for each type of care. It's als

### Report

In [172]:
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
Dashboard already running at path:   Network URL: http://192.168.86.25:8501



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>