<a href="https://colab.research.google.com/github/nupur-khare/TruLens-LiteLLM/blob/main/LiteLLM_and_TruLens.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LiteLLM with TruLens

**TruLens** is a software tool that helps you objectively measure the quality and effectiveness of LLM-based applications using feedback functions.

**LiteLLM** provides a unified interface for calling 100+ LLMs with the same input/output format, including models from OpenAI, Hugging Face, Anthropic, vLLM, and Cohere.

In [55]:
!pip install litellm



In [56]:
! pip install trulens_eval chromadb openai



In [57]:
import os
# Never commit a real key. Replace the placeholder below or export OPENAI_API_KEY in your environment.
os.environ["OPENAI_API_KEY"] = "sk-..."
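
A minimal sketch of a safer pattern: read the key from the environment and fail early with a clear message if it is missing. The helper name `require_api_key` and its `env` parameter are hypothetical and are not part of LiteLLM or TruLens.

```python
import os
from typing import Mapping, Optional


def require_api_key(name: str = "OPENAI_API_KEY",
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key stored under `name`, or raise if it is missing or blank."""
    # Fall back to the process environment when no mapping is supplied.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the notebook.")
    return key


# Usage with an explicit mapping, so the example does not depend on the real environment.
key = require_api_key(env={"OPENAI_API_KEY": "sk-example"})
```

In the notebook itself, calling `require_api_key()` with no arguments would check `os.environ` before any model call is made.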

In [58]:
content = """
Mount Everest, standing tall at an elevation of 8,848.86 meters (29,031.7 feet), is the highest peak in the world. Located in the Himalayas on the border between Nepal and China, it has long captured the imagination of adventurers and mountaineers from around the globe. Known as "Chomolungma" in Tibetan and "Sagarmatha" in Nepali, meaning "Goddess Mother of the Earth" and "Forehead in the Sky," respectively, Mount Everest holds profound spiritual significance for the local Sherpa people. Since Sir Edmund Hillary and Tenzing Norgay's historic ascent in 1953, summiting Everest has become the ultimate challenge for climbers, drawing thousands each year to test their skills and endurance against its unforgiving slopes and treacherous conditions. However, climbing Everest is not without its dangers, with extreme weather, altitude sickness, avalanches, and crevasses posing significant risks to those attempting the ascent. Despite these challenges, the allure of conquering the world's highest peak continues to attract adventurers seeking to push the limits of human achievement and experience the awe-inspiring beauty of the Himalayas from its summit.
"""

# Create embeddings using LiteLLM

In [59]:
import litellm
from litellm import embedding
from chromadb import Documents, EmbeddingFunction, Embeddings

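class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Embed the whole batch in a single LiteLLM call.
        response = embedding(
            model="text-embedding-ada-002",
            input=input
        )
        # Chroma expects one vector per input document, in the same order.
        return [item['embedding'] for item in response.get('data')]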
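
To illustrate the contract an embedding function has to satisfy (a list of documents in, one vector per document out), here is a minimal sketch that uses a toy hash-based embedder. `toy_embed` is a hypothetical stand-in for a real model: its vectors carry no semantic meaning and only show the expected shape of the result.

```python
import hashlib
from typing import List


def toy_embed(documents: List[str], dim: int = 8) -> List[List[float]]:
    """Return one deterministic `dim`-sized vector per input document."""
    vectors = []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).digest()
        # Scale the first `dim` bytes of the digest into the range [0, 1].
        vectors.append([byte / 255 for byte in digest[:dim]])
    return vectors


batch = ["Mount Everest is in the Himalayas.", "Sagarmatha is the Nepali name."]
vectors = toy_embed(batch)
```

Whatever model produces the vectors, the output list should have the same length and order as the input list.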


# Create a vector collection to store embeddings



*   The embeddings are stored in a Chroma (`chromadb`) collection.





In [62]:
import chromadb
chroma_client = chromadb.Client()
vector_store = chroma_client.get_or_create_collection(name="mount_everest",
                                                      embedding_function=MyEmbeddingFunction())

Add the content to the embedding database.

In [63]:
vector_store.add(ids=["mount_everest"], documents=[content])
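
Conceptually, querying a vector store means ranking the stored vectors by their similarity to the query vector. The sketch below does this with cosine similarity over toy two-dimensional vectors. It is an assumption-laden illustration, not Chroma's implementation: the helper names are hypothetical, and Chroma's actual index and distance metric may differ.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: List[Sequence[float]],
          k: int = 2) -> List[int]:
    """Return the indices of the k stored vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
nearest = top_k([1.0, 0.1], doc_vecs, k=2)  # indices of the two closest vectors
```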

In [64]:
# No separate install is needed: Tru is imported from trulens_eval, which was installed above.



# RAG Implementation


* The RAG pipeline below uses TruLens custom instrumentation: each method decorated with `@instrument` is recorded.




In [65]:
from trulens_eval import Tru
from trulens_eval.tru_custom_app import instrument
tru = Tru()

In [66]:
class RAG_from_scratch:
    @instrument
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(
            query_texts=query,
            n_results=2
        )
        return results['documents'][0]

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        messages = [
            {"role": "user",
            "content":
            f"We have provided context information below. \n"
            f"---------------------\n"
            f"{context_str}"
            f"\n---------------------\n"
            f"Given this information, please answer the question: {query}"
            }
        ]
        response = litellm.completion(model="gpt-3.5-turbo",
                                      messages=messages,
                                      temperature=0.0, max_tokens=3000, top_p=0.0, n=1,
                                      stream=False, stop=None, presence_penalty=0.0, frequency_penalty=0.0,
                                      logit_bias={}).choices[0].message.content
        return response

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve(query)
        completion = self.generate_completion(query, context_str)
        return completion

rag = RAG_from_scratch()
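
Because `retrieve` returns a list of chunks, interpolating it directly into the prompt inserts the list's Python representation, brackets and quotes included. A minimal sketch of one alternative is to join the chunks into plain text before building the message. `build_messages` is a hypothetical helper, not part of LiteLLM or TruLens.

```python
from typing import Dict, List


def build_messages(query: str, context_chunks: List[str]) -> List[Dict[str, str]]:
    """Assemble a single user message from the question and the retrieved chunks."""
    # Join chunks as plain text so no list brackets or quotes leak into the prompt.
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    prompt = (
        "We have provided context information below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        f"Given this information, please answer the question: {query}"
    )
    return [{"role": "user", "content": prompt}]


messages = build_messages(
    "What is Mount Everest called in Nepali?",
    ["Known as Sagarmatha in Nepali.", "Located in the Himalayas."],
)
```

The resulting list has the same shape as the `messages` argument passed to `litellm.completion` in the class above.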

# Feedback functions


*   Groundedness, answer relevance, and context relevance are used together to detect hallucination.

In [67]:
from trulens_eval import Feedback, Select
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI as fOpenAI

import numpy as np

# Initialize provider class
fopenai = fOpenAI()

grounded = Groundedness(groundedness_provider=fopenai)

# Define a groundedness feedback function
f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons, name = "Groundedness")
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)

# Question/answer relevance between overall question and answer.
f_qa_relevance = (
    Feedback(fopenai.relevance_with_cot_reasons, name = "Answer Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on_output()
)

# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(fopenai.qs_relevance_with_cot_reasons, name = "Context Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on(Select.RecordCalls.retrieve.rets.collect())
    .aggregate(np.mean)
)

✅ In Groundedness, input source will be set to __record__.app.retrieve.rets.collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Answer Relevance, input prompt will be set to __record__.app.retrieve.args.query .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.app.retrieve.args.query .
✅ In Context Relevance, input statement will be set to __record__.app.retrieve.rets.collect() .
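
The context relevance feedback scores each retrieved chunk against the question and then aggregates the per-chunk scores with `np.mean`. A minimal sketch of that aggregation step follows, using the standard library and made-up scores. The numbers are hypothetical and were not produced by any provider.

```python
from statistics import mean
from typing import Sequence


def aggregate_context_relevance(chunk_scores: Sequence[float]) -> float:
    """Average per-chunk relevance scores into a single record-level score."""
    if not chunk_scores:
        raise ValueError("at least one chunk score is required")
    return mean(chunk_scores)


# Hypothetical scores: one highly relevant chunk and one partially relevant chunk.
score = aggregate_context_relevance([1.0, 0.5])
```

With a mean, one irrelevant chunk lowers the record's score even when the other chunk answers the question, which makes noisy retrieval visible.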


# App for evaluations


*   The feedback functions defined above are passed to the app wrapper for evaluation.



In [68]:
from trulens_eval import TruCustomApp
tru_rag = TruCustomApp(rag,
    app_id = 'RAG v2',
    feedbacks = [f_groundedness, f_qa_relevance, f_context_relevance])



In [69]:
with tru_rag as recording:
    rag.query("What is Mount Everest called in Nepali?")



In [70]:
# The app_id must match the one given to TruCustomApp ('RAG v2'); a mismatched ID returns an empty leaderboard.
tru.get_leaderboard(app_ids=["RAG v2"])


In [72]:
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
Dashboard already running at path:   Submit this IP Address: 35.223.76.99



<Popen: returncode: 0 args: ['streamlit', 'run', '--server.headless=True', '...>