# LiteLLM Quickstart

In this quickstart you will learn how to use LiteLLM as a feedback function provider.

[LiteLLM](https://github.com/BerriAI/litellm) provides a consistent interface to 100+ LLMs, such as those from OpenAI, HuggingFace, Anthropic, and Cohere. Using LiteLLM greatly expands the set of models available for feedback functions. Be cautious about trusting evaluation results from models that have not yet been tested as feedback providers.

Specifically, this example uses TogetherAI, but the LiteLLM provider can run feedback functions with any LiteLLM-supported model. We'll also use Mistral, again accessed through LiteLLM, for the embedding and completion models. TruLens tracks token usage and cost metrics for models called through LiteLLM as well.

Note: LiteLLM costs are tracked for models included in this [litellm community-maintained list](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json).
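
To make the cost note concrete, here is a minimal sketch of per-token cost accounting. It assumes a price table keyed by model name with `input_cost_per_token` and `output_cost_per_token` fields, as in LiteLLM's JSON file, but the prices below are made-up placeholders and `estimate_cost` is a hypothetical helper, not a LiteLLM or TruLens function.

```python
# Hypothetical price table. The field names follow LiteLLM's
# model_prices_and_context_window.json; the numbers are placeholders.
PRICES = {
    "mistral/mistral-small": {
        "input_cost_per_token": 2e-06,
        "output_cost_per_token": 6e-06,
    },
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one call; unlisted models contribute 0.0."""
    entry = PRICES.get(model)
    if entry is None:
        # Simplification: a model missing from the table is treated as untracked.
        return 0.0
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )


print(estimate_cost("mistral/mistral-small", prompt_tokens=200, completion_tokens=50))
```

With these placeholder prices, 200 prompt tokens and 50 completion tokens come to about 0.0007.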

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/expositional/models/litellm_quickstart.ipynb)



```python
# ! pip install trulens_eval chromadb mistralai litellm
```

```python
import os

# Replace "..." with your own API keys.
os.environ["TOGETHERAI_API_KEY"] = "..."
os.environ["MISTRAL_API_KEY"] = "..."
```
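
The two keys line up with the provider prefixes in the model strings used later in this notebook: LiteLLM selects a provider from the text before the first `/` (`together_ai`, `mistral`) and, roughly speaking, passes the remainder on as that provider's model name. The sketch below only illustrates this naming convention; `split_model_string` and `KNOWN_PROVIDERS` are hypothetical and are not LiteLLM's routing code.

```python
# Hypothetical subset of provider prefixes used in this notebook.
KNOWN_PROVIDERS = {"together_ai", "mistral"}


def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' on the first '/' only (hypothetical helper)."""
    provider, sep, name = model.partition("/")
    if not sep or provider not in KNOWN_PROVIDERS:
        raise ValueError(f"no known provider prefix in {model!r}")
    return provider, name


for model in [
    "together_ai/togethercomputer/llama-2-70b-chat",
    "mistral/mistral-embed",
    "mistral/mistral-small",
]:
    print(split_model_string(model))
```

The Together model name itself contains a `/`, which is why the split happens on the first `/` only.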

## Get Data

In this case, we'll just initialize some simple text in the notebook.

```python
university_info = """
The University of Washington, founded in 1861 in Seattle, is a public research university
with over 45,000 students across three campuses in Seattle, Tacoma, and Bothell.
As the flagship institution of the six public universities in Washington state,
UW encompasses over 500 buildings and 20 million square feet of space,
including one of the largest library systems in the world.
"""
```

## Create Vector Store

Embed the text with Mistral's embedding model via LiteLLM, then create an in-memory ChromaDB vector store.

```python
from litellm import embedding

# Embed the document with Mistral's embedding model, called through LiteLLM.
embedding_response = embedding(
    model="mistral/mistral-embed",
    input=university_info,
)
```

```python
# Inspect the embedding vector returned for the first (and only) input.
embedding_response.data[0]['embedding']
```

```
[-0.0302734375,
 0.01617431640625,
 0.028350830078125,
 -0.017974853515625,
 0.05322265625,
 -0.01155853271484375,
 0.053466796875,
 0.0017957687377929688,
 -0.00824737548828125,
 0.0037555694580078125,
 -0.037750244140625,
 0.0171966552734375,
 0.0099029541015625,
 0.0010271072387695312,
 -0.06402587890625,
 0.023681640625,
 -0.0029296875,
 0.0113677978515625,
 0.04144287109375,
 0.01119232177734375,
 -0.031890869140625,
 -0.03778076171875,
 -0.0233917236328125,
 0.0240020751953125,
 -0.01018524169921875,
 -0.0157623291015625,
 -0.021636962890625,
 -0.0692138671875,
 -0.04681396484375,
 -0.00518035888671875,
 0.0244140625,
 -0.0034770965576171875,
 0.0118560791015625,
 0.0124969482421875,
 -0.003833770751953125,
 -0.0194244384765625,
 -0.00225830078125,
 -0.04669189453125,
 0.0265350341796875,
 -0.0079803466796875,
 -0.02178955078125,
 -0.0103302001953125,
 -0.0426025390625,
 -0.034881591796875,
 0.0002834796905517578,
 -0.037384033203125,
 -0.0142364501953125,
 -0.036956787109375,
 ...
```

```python
import chromadb

# In-memory Chroma client; its data is lost when the process exits.
chroma_client = chromadb.Client()
vector_store = chroma_client.get_or_create_collection(name="Universities")
```

Add `university_info` and its embedding to the vector store.

```python
# Store the document under the ID "uni_info" together with its precomputed embedding.
vector_store.add(
    ids="uni_info",
    documents=university_info,
    embeddings=embedding_response.data[0]['embedding'],
)
```
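
Conceptually, the collection now holds an ID, the document, and its embedding, and a query returns the documents whose embeddings are nearest to the query embedding. A toy, pure-Python sketch of that idea follows. `ToyVectorStore` is hypothetical (not Chroma's API) and uses brute-force squared Euclidean distance, which we believe matches Chroma's default `l2` space, although Chroma searches an approximate HNSW index rather than scanning every vector.

```python
class ToyVectorStore:
    """Hypothetical in-memory vector store using brute-force search."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.documents: list[str] = []
        self.embeddings: list[list[float]] = []

    def add(self, doc_id: str, document: str, embedding: list[float]) -> None:
        """Store one document together with its precomputed embedding."""
        self.ids.append(doc_id)
        self.documents.append(document)
        self.embeddings.append(embedding)

    def query(self, query_embedding: list[float], n_results: int) -> list[str]:
        """Return up to n_results documents, nearest first by squared L2 distance."""

        def sq_l2(vec: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(vec, query_embedding))

        order = sorted(range(len(self.documents)), key=lambda i: sq_l2(self.embeddings[i]))
        return [self.documents[i] for i in order[:n_results]]


store = ToyVectorStore()
store.add("uni_info", "UW facts", [0.9, 0.1])
store.add("other", "Unrelated text", [-0.5, 0.8])
hits = store.query([1.0, 0.0], n_results=5)  # more than stored: returns both
print(hits)
```

In this toy, asking for more results than there are stored documents simply returns every stored document; the `retrieve` method in the next section likewise requests `n_results=2` while only one document is stored.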

## Build RAG from scratch

Build a simple RAG app from scratch and add TruLens custom instrumentation with the `@instrument` decorator, so that each method's inputs and outputs are recorded.

```python
from trulens_eval import Tru
from trulens_eval.tru_custom_app import instrument

tru = Tru()
tru.reset_database()  # start from an empty database
```

```
🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of Tru` to prevent this.
```

```python
import litellm
from litellm import embedding


class RAG_from_scratch:
    @instrument
    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = vector_store.query(
            # Embed the query with the same model used for the documents.
            query_embeddings=embedding(
                model="mistral/mistral-embed",
                input=query,
            ).data[0]['embedding'],
            n_results=2,
        )
        return results['documents'][0]

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        completion = litellm.completion(
            model="mistral/mistral-small",
            temperature=0,
            messages=[
                {
                    "role": "user",
                    "content": (
                        f"We have provided context information below. \n"
                        f"---------------------\n"
                        f"{context_str}"
                        f"\n---------------------\n"
                        f"Given this information, please answer the question: {query}"
                    ),
                }
            ],
        ).choices[0].message.content
        return completion

    @instrument
    def query(self, query: str) -> str:
        # Retrieve context, then answer the question from it.
        context_str = self.retrieve(query)
        completion = self.generate_completion(query, context_str)
        return completion


rag = RAG_from_scratch()
```
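
Decorating a method with `@instrument` is what lets TruLens record its inputs and outputs, and those recorded values are what selectors such as `Select.RecordCalls.retrieve.args.query` point to in the next section. The toy decorator below is a hypothetical illustration of that idea (`record_calls`, `CALL_LOG`, and `ToyRAG` are made up), not TruLens's implementation: it binds each call's arguments to their parameter names and logs them together with the return value.

```python
import functools
import inspect

# Hypothetical in-memory log of instrumented calls.
CALL_LOG: list[dict] = []


def record_calls(func):
    """Hypothetical stand-in for @instrument: log named args and return value."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Drop `self` so only the method's own arguments are recorded.
        named = {k: v for k, v in bound.arguments.items() if k != "self"}
        rets = func(*args, **kwargs)
        CALL_LOG.append({"method": func.__name__, "args": named, "rets": rets})
        return rets

    return wrapper


class ToyRAG:
    @record_calls
    def retrieve(self, query: str) -> list:
        return [f"doc about {query}"]

    @record_calls
    def query(self, query: str) -> str:
        context = self.retrieve(query)
        return f"answer using {len(context)} chunk(s)"


ToyRAG().query("UW history")
for call in CALL_LOG:
    print(call["method"], call["args"], call["rets"])
```

Because each call is logged when it returns, the inner `retrieve` call appears before the outer `query` call.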

## Set up feedback functions

Here we'll use groundedness, answer relevance, and context relevance to detect hallucination, plus a coherence check on the final answer.

```python
from trulens_eval import Feedback, Select
from trulens_eval.feedback import Groundedness
from trulens_eval import LiteLLM

import numpy as np

# Initialize LiteLLM-based feedback function collection class:
provider = LiteLLM(model_engine="together_ai/togethercomputer/llama-2-70b-chat")

grounded = Groundedness(groundedness_provider=provider)

# Define a groundedness feedback function
f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)

# Question/answer relevance between overall question and answer.
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on_output()
)

# Context relevance between the question and the retrieved context chunks.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on(Select.RecordCalls.retrieve.rets.collect())
    .aggregate(np.mean)
)

# Coherence of the final answer.
f_coherence = (
    Feedback(provider.coherence_with_cot_reasons, name="coherence")
    .on_output()
)
```

```
✅ In Groundedness, input source will be set to __record__.app.retrieve.rets.collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Answer Relevance, input prompt will be set to __record__.app.retrieve.args.query .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.app.retrieve.args.query .
✅ In Context Relevance, input context will be set to __record__.app.retrieve.rets.collect() .
✅ In coherence, input text will be set to __record__.main_output or `Select.RecordOutput` .

[nltk_data] Downloading package punkt to /Users/jreini/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
```
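
The selectors above read values out of each recorded call; the log lines show, for example, that the context relevance `question` comes from `__record__.app.retrieve.args.query`. The toy sketch below imitates that idea with a plain nested dict, a dotted-path `select` helper, and a word-overlap scorer standing in for the LLM judge. The record layout, both helpers, and the per-chunk scoring are assumptions for illustration, not TruLens's record schema or feedback logic. It ends by reducing several scores to one with an arithmetic mean, the same reduction `.aggregate(np.mean)` applies.

```python
from statistics import mean

# Hypothetical, simplified record of one app call (not TruLens's record schema).
record = {
    "app": {
        "retrieve": {
            "args": {"query": "history of UW"},
            "rets": ["A history of the UW campus", "Parking rates for UW"],
        },
    },
    "main_output": "UW was founded in 1861.",  # what .on_output() would select
}


def select(rec: dict, path: str):
    """Resolve a dotted path such as 'app.retrieve.args.query' in nested dicts."""
    value = rec
    for key in path.split("."):
        value = value[key]
    return value


def toy_relevance(question: str, context: str) -> float:
    """Hypothetical stand-in for the LLM judge: share of question words in context."""
    q_words = set(question.lower().split())
    c_words = set(context.lower().split())
    return len(q_words & c_words) / len(q_words)


question = select(record, "app.retrieve.args.query")
chunks = select(record, "app.retrieve.rets")
scores = [toy_relevance(question, chunk) for chunk in chunks]
print(scores, mean(scores))
```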


Before wiring up the app, sanity-check the groundedness feedback directly on a known source and statement.

```python
grounded.groundedness_measure_with_cot_reasons(
    university_info,  # source text
    "The University of Washington was founded in 1861. "
    "It is the flagship institution of the state of Washington.",  # statements to check
)
```


```
({'statement_0': 1.0, 'statement_1': 0.8},
 {'reasons': '\nSTATEMENT 0:\n  Statement Sentence: The University of Washington was founded in 1861.\nSupporting Evidence: The University of Washington, founded in 1861 in Seattle, is a public research university.\nScore: 10\n\n\nSTATEMENT 1:\n  Statement Sentence: It is the flagship institution of the state of Washington.\nSupporting Evidence: As the flagship institution of the six public universities in Washington state,\nScore: 8\n\n'})
```
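
The reasons string above reports each statement's score on a 0 to 10 scale (`Score: 10`, `Score: 8`), while the returned dict holds 1.0 and 0.8, which is consistent with dividing by 10. A minimal sketch of that parse-and-normalize step follows, assuming the `Score: N` format seen above; `scores_from_reasons` is a hypothetical helper, not the TruLens parser.

```python
import re

# Abbreviated from the reasons string above; only the score lines matter here.
reasons = (
    "STATEMENT 0:\n  Statement Sentence: The University of Washington was founded in 1861.\n"
    "Score: 10\n\n"
    "STATEMENT 1:\n  Statement Sentence: It is the flagship institution of the state of Washington.\n"
    "Score: 8\n\n"
)


def scores_from_reasons(text: str, max_score: int = 10) -> dict[str, float]:
    """Map the i-th 'Score: N' line to statement_i -> N / max_score (hypothetical)."""
    raw = [int(s) for s in re.findall(r"Score:\s*(\d+)", text)]
    return {f"statement_{i}": s / max_score for i, s in enumerate(raw)}


per_statement = scores_from_reasons(reasons)
print(per_statement)  # {'statement_0': 1.0, 'statement_1': 0.8}
```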

## Construct the app
Wrap the custom RAG with `TruCustomApp` and attach the list of feedback functions to evaluate.

```python
from trulens_eval import TruCustomApp

tru_rag = TruCustomApp(
    rag,
    app_id="RAG v1",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance, f_coherence],
)
```

## Run the app
Use `tru_rag` as a context manager so that calls to the custom RAG-from-scratch app made inside the `with` block are recorded and evaluated.

```python
with tru_rag as recording:
    rag.query("Give me a long history of U Dub")
```


```python
tru.get_leaderboard(app_ids=["RAG v1"])
```

| app_id | Answer Relevance | Context Relevance | Groundedness | coherence | latency | total_cost |
|---|---|---|---|---|---|---|
| RAG v1 | 0.8 | 0.8 | 0.866667 | 0.8 | 4.0 | 0.001942 |
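
To our understanding, the leaderboard averages each feedback score (and latency and cost) over all records of an app; with a single recorded query, the row above is just that record's values. A pure-Python sketch of that per-app averaging follows; the `records` data, including the `RAG v2` app, and the `leaderboard` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-record results; the leaderboard above came from one record.
records = [
    {"app_id": "RAG v1", "Groundedness": 1.0, "latency": 4.0},
    {"app_id": "RAG v1", "Groundedness": 0.5, "latency": 2.0},
    {"app_id": "RAG v2", "Groundedness": 1.0, "latency": 3.0},
]


def leaderboard(rows: list[dict], metrics: list[str]) -> dict[str, dict[str, float]]:
    """Average each metric over all records that share an app_id."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["app_id"]].append(row)
    return {
        app: {m: mean(r[m] for r in app_rows) for m in metrics}
        for app, app_rows in grouped.items()
    }


board = leaderboard(records, ["Groundedness", "latency"])
print(board)
```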


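Finally, launch the TruLens dashboard to explore the recorded queries and their feedback results.

```python
tru.run_dashboard()
```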