# Getting Started with LLMOps using Snowflake Cortex and TruLens

By completing this guide, you'll get started with LLMOps by building a RAG by combining [Cortex LLM Functions](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions) and [Cortex Search](https://github.com/Snowflake-Labs/cortex-search?tab=readme-ov-file), and then using [TruLens](https://www.trulens.org/) to add observability and guardrails.

Along the way, you will also learn how to run TruLens feedback functions with Snowflake Cortex as the [feedback provider](https://www.trulens.org/trulens_eval/api/provider/), and how to [log TruLens traces and evaluation metrics to a Snowflake table](https://www.trulens.org/trulens_eval/tracking/logging/where_to_log/log_in_snowflake/#logging-in-snowflake). Last, we'll show how to use [TruLens guardrails](https://www.trulens.org/trulens_eval/guardrails/) for filtering retrieved context and reducing hallucination.

Here is a summary of what you will be able to learn in each step by following this quickstart:

- **Setup Environment**: Create a session to use Snowflake Cortex capabilities.
- **Cortex Complete**: Use Cortex `Complete()` to call Mistral Large.
- **Add Data**: Load and preprocess raw documentation from GitHub, and load to Cortex Search.
- **Search**: Search over the data loaded to Cortex Search.
- **Create a RAG**: Create a RAG with Cortex Search and Complete and add TruLens instrumentation.
- **Feedback Functions**: Add context relevance, groundedness and answer relevance evaluations to the RAG.
- **Application Testing**: Understand the performance of your RAG across a test set.
- **Guardrails**: Add context filter guardrails to reduce hallucinations.
- **Measure Improvement**: See the improved evaluation results after adding guardrails.

### What are Cortex LLM Functions?

Snowflake Cortex gives you instant access to industry-leading large language models (LLMs) trained by researchers at companies like Mistral, Reka, Meta, and Google, including Snowflake Arctic, an open enterprise-grade model developed by Snowflake.

### What is Cortex Search?

Cortex Search enables low-latency, high-quality search over your Snowflake data. Cortex Search powers a broad array of search experiences for Snowflake users including Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs).

### What is TruLens?

[TruLens](https://www.trulens.org/) is a library for tracking and evaluating Generative AI applications. It provides an extensive set of feedback functions to systematically measure the quality of your LLM-based applications. It also traces the internal steps of your application, and allows you to run feedback functions on any internal step. Feedback function results can be examined in a TruLens dashboard, or used at runtime as guardrails.

### What You Will Learn
- How to build a RAG with Cortex Search and Cortex LLM Functions.
- How to use TruLens Feedback Functions and Tracing.
- How to log TruLens Evaluation Results and Traces to Snowflake.
- How to use TruLens Feedback Functions as Guardrails to reduce hallucination.

### What You Will Build
- A retrieval-augmented generation (RAG) app
- An LLMOps pipeline
- Context filter guardrails

### Prerequisites
- A Snowflake account with Cortex LLM Functions and Cortex Search enabled. If you do not have a Snowflake account, you can register for a [free trial account](https://signup.snowflake.com/?utm_cta=quickstarts_&_fsi=yYZEVo4S&_fsi=yYZEVo4S).
- A Snowflake account login with the ACCOUNTADMIN role. If you have this role in your environment, you may choose to use it. If not, you will need to 1) register for a free trial, 2) use a different role that can create databases, schemas, tables, stages, tasks, user-defined functions, and stored procedures, or 3) use an existing database and schema in which you can create those objects.

## Setup

In [2]:
!pip install python-dotenv


Once we have an environment with the right packages installed, we can load our credentials and set up our Snowflake connection in a Jupyter notebook.

In [1]:
from dotenv import load_dotenv
from snowflake.snowpark.session import Session
import os


load_dotenv()

connection_params = {
  "account":  os.environ.get("SNOWFLAKE_ACCOUNT"),
  "user": os.environ.get("SNOWFLAKE_USER"),
  "password": os.environ.get("SNOWFLAKE_USER_PASSWORD"),
  "role": os.environ.get("SNOWFLAKE_ROLE"),
  "database": os.environ.get("SNOWFLAKE_DATABASE"),
  "schema": os.environ.get("SNOWFLAKE_SCHEMA"),
  "warehouse": os.environ.get("SNOWFLAKE_WAREHOUSE")
}

snowpark_session = Session.builder.configs(connection_params).create()
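Each `os.environ.get` call above silently returns `None` when a variable is missing from `.env`, so a typo only surfaces later as a confusing connection error. A minimal fail-fast check, assuming the same variable names as `connection_params` (the `missing_env_vars` helper is hypothetical, not part of Snowpark):

```python
import os
from typing import List, Mapping

# Variable names used to build connection_params above
REQUIRED_VARS = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_USER_PASSWORD",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_WAREHOUSE",
]


def missing_env_vars(required: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing Snowflake settings: {', '.join(missing)}")
```

Running this before `Session.builder.configs(...).create()` turns a vague connection failure into a list of the exact settings to fix.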

## Using Cortex Complete

With the session set, we have what we need to call a Snowflake Cortex LLM:

In [2]:
from snowflake.cortex import Complete

print(Complete("mistral-large2", "What is streamlint"))

 As of my last update in October 2023, there is no widely recognized tool or software specifically named "streamlint." It's possible that "streamlint" could refer to a specialized tool or a typo. If you meant something else, such as a linting tool for streaming data or a specific software, please provide more context or clarify your question.

If you're looking for linting tools in general, there are several well-known ones for different programming languages and environments, such as:

- **ESLint** for JavaScript
- **Pylint** for Python
- **RuboCop** for Ruby
- **Flake8** for Python
- **TSLint** (now deprecated in favor of ESLint) for TypeScript

These tools help developers maintain code quality by checking for syntax errors, style issues, and potential bugs. If you have a specific context or need in mind, please provide more details so I can assist you better.


In [32]:
!pip install llama-index-llms-openai
!pip install llama-index-embeddings-openai
!pip install llama-index-readers-github
!pip install llama-index nest_asyncio


## Cortex Search

Next, we'll turn to the retrieval component of our RAG and set up Cortex Search.

This requires three steps:

1. Read and preprocess unstructured documents.
2. Split the cleaned documents into chunks, using Arctic Embed embeddings to find semantic boundaries.
3. Load the chunks into Cortex Search and call the search service.

### Read and preprocess unstructured documents

For this example, we want to load Cortex Search with documentation from GitHub about a popular open-source library, Streamlit. To do so, we'll use a GitHub data loader available from LlamaHub.

Here we'll also spend some effort cleaning up the text so we get better search results.

In [3]:
import requests

# Reads your token from the GITHUB_TOKEN environment variable (set it in your .env file)
headers = {"Authorization": f"Bearer {os.environ.get('GITHUB_TOKEN')}"}
response = requests.get(
    "https://api.github.com/repos/streamlit/docs/branches/main", headers=headers
)

if response.status_code == 200:
    print(f"Branch data: {response.json()}")
else:
    print(f"Error: {response.status_code}, {response.json()}")


Branch data: {'name': 'main', 'commit': {'sha': 'e5a5d8699d8510d0cf1d45fb5701ae16c3749a16', ...}}

In [4]:
github_token = os.environ.get("GITHUB_TOKEN")
# Confirm the token is set without printing the secret itself
print(github_token is not None)


In [5]:
import os
import re
import nest_asyncio
nest_asyncio.apply()

from llama_index.readers.github import GithubRepositoryReader, GithubClient

# Read the token from the environment rather than hard-coding it in the notebook
github_token = os.environ.get("GITHUB_TOKEN")
github_client = GithubClient(github_token=github_token, verbose=False)

reader = GithubRepositoryReader(
  github_client=github_client,
  owner="streamlit",
  repo="docs",
  use_parser=False,
  verbose=True,
  filter_directories=(
    ["content"],
    GithubRepositoryReader.FilterType.INCLUDE,
  ),
  filter_file_extensions=(
    [".md"],
    GithubRepositoryReader.FilterType.INCLUDE,
  )
)

print(f"Reader is {reader}")
documents = reader.load_data(branch="main")


def clean_up_text(content: str) -> str:
    """
    Remove unwanted characters and patterns in text input.

    :param content: Text input.

    :return: Cleaned version of original text input.
    """

    # Fix hyphenated words broken by newline
    content = re.sub(r"(\w+)-\n(\w+)", r"\1\2", content)

    unwanted_patterns = ["---\nvisible: false", "---", "#", "slug:"]
    for pattern in unwanted_patterns:
        content = re.sub(pattern, "", content)

    # Remove all slugs starting with a \ and stopping at the first space
    content = re.sub(r"\\slug: [^\s]*", "", content)

    # normalize whitespace
    content = re.sub(r"\s+", " ", content)
    return content


cleaned_documents = []

for d in documents:
    cleaned_text = clean_up_text(d.text)
    d.text = cleaned_text
    cleaned_documents.append(d)


Reader is <llama_index.readers.github.repository.base.GithubRepositoryReader object at 0x1351c6190>


KeyError: 'commit'
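To see what `clean_up_text` does to a typical docs page, here is a self-contained sketch that applies the same steps to a small, made-up Markdown sample. It leaves out the last regex, the one meant to remove backslash-prefixed slugs, because by that point the loop has already deleted every `slug:` string, so it never matches.

```python
import re


def clean_up_text(content: str) -> str:
    # Join words hyphenated across a line break ("Stream-\nlit" -> "Streamlit")
    content = re.sub(r"(\w+)-\n(\w+)", r"\1\2", content)
    # Strip front-matter markers, heading hashes, and "slug:" labels
    for pattern in ["---\nvisible: false", "---", "#", "slug:"]:
        content = re.sub(pattern, "", content)
    # Collapse all runs of whitespace (including newlines) into single spaces
    return re.sub(r"\s+", " ", content)


sample = (
    "---\ntitle: Run your app\nslug: /develop/run\n---\n\n"
    "# Run your Streamlit app\nStream-\nlit is simple."
)
print(repr(clean_up_text(sample)))
```

The leading space and the flattened front matter (`title: ...` followed by the slug path) match the shape of the chunks Cortex Search returns later in this guide.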

In [9]:
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface



### Process the documents with Semantic Splitting

We'll use Snowflake's Arctic Embed model, available from Hugging Face, together with LlamaIndex's `SemanticSplitterNodeParser` to split the documents into semantically coherent chunks.

In [8]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SemanticSplitterNodeParser

embed_model = HuggingFaceEmbedding("Snowflake/snowflake-arctic-embed-m")

splitter = SemanticSplitterNodeParser(
  buffer_size=1, breakpoint_percentile_threshold=85, embed_model=embed_model
) 
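Roughly speaking, `SemanticSplitterNodeParser` places chunk boundaries where consecutive sentences are semantically far apart: it embeds the sentences, measures the cosine distance between adjacent embeddings, and breaks wherever that distance exceeds the `breakpoint_percentile_threshold`-th percentile of all distances. A toy, dependency-free sketch of that idea, assuming a hypothetical bag-of-words `embed` in place of Arctic Embed and leaving out the neighbor grouping that `buffer_size=1` adds:

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Hypothetical stand-in for Arctic Embed: a bag-of-words count vector
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1 - dot / (norm_a * norm_b)


def percentile(values: List[float], pct: float) -> float:
    # Linear interpolation between closest ranks (numpy's default method)
    ordered = sorted(values)
    k = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(k), math.ceil(k)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def semantic_split(sentences: List[str], threshold_pct: float = 85) -> List[str]:
    if len(sentences) < 2:
        return list(sentences)
    vectors = [embed(s) for s in sentences]
    # distances[i] compares sentence i with sentence i + 1
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > cutoff:  # large semantic jump: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "streamlit run app.py starts the app",
    "streamlit run app.py opens the app in a browser",
    "caching speeds up slow data loading",
    "caching stores data loading results",
]
for chunk in semantic_split(sentences):
    print(chunk)
```

With a threshold of 85, only the largest ~15% of sentence-to-sentence jumps become boundaries, so raising the threshold yields fewer, larger chunks.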

With the embed model and splitter defined, we can run them in an ingestion pipeline:

In [None]:
from llama_index.core.ingestion import IngestionPipeline

cortex_search_pipeline = IngestionPipeline(
  transformations=[
    splitter,
  ],
)

results = cortex_search_pipeline.run(show_progress=True, documents=cleaned_documents)
#print(results)

### Load data to Cortex Search

Now that we've split our documents into chunks, we're ready to load them into Cortex Search.

Here we can use the same connection details we set up for Cortex Complete.

In [10]:
import os
import snowflake.connector
from tqdm.auto import tqdm

snowflake_connector = snowflake.connector.connect(**connection_params)

cursor = snowflake_connector.cursor()

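# Create (or reset) the table that the Cortex Search service will index
cursor.execute("CREATE OR REPLACE TABLE streamlit_docs(doc_text VARCHAR)")
for curr in tqdm(results):
    # Bind parameters are passed as a sequence, one value per %s placeholder
    cursor.execute("INSERT INTO streamlit_docs VALUES (%s)", (curr.text,))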

  0%|          | 0/1691 [00:00<?, ?it/s]
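Issuing one `INSERT` per chunk costs one round trip per row, 1,691 here. DB-API cursors, including the Snowflake connector's, also provide `executemany`, which sends rows in batches. A minimal sketch, using the standard-library `sqlite3` module as a stand-in so it runs locally; the `insert_chunks` helper and the `batch_size` value are illustrative assumptions, and with `snowflake.connector` the placeholder would be `%s` rather than `?`:

```python
import sqlite3
from typing import List


def insert_chunks(conn, chunks: List[str], batch_size: int = 500) -> None:
    """Insert text chunks into streamlit_docs in batches via executemany."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS streamlit_docs(doc_text VARCHAR)")
    for start in range(0, len(chunks), batch_size):
        # Each row is a 1-tuple because the table has a single column
        rows = [(text,) for text in chunks[start:start + batch_size]]
        cur.executemany("INSERT INTO streamlit_docs VALUES (?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
insert_chunks(conn, [f"chunk {i}" for i in range(1200)])
print(conn.execute("SELECT COUNT(*) FROM streamlit_docs").fetchone()[0])
```

In the notebook, the equivalent call would be along the lines of `cursor.executemany("INSERT INTO streamlit_docs VALUES (%s)", [(n.text,) for n in results])`.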

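Next, create a Cortex Search service over the `streamlit_docs` table. The database, schema, and warehouse names below (`LLMOPS_DB`, `LLMOPS_SCHEMA`, `LLMOPS_WH_M`) are the ones used in this guide; replace them with your own.

-- Run in SnowSQL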
CREATE OR REPLACE CORTEX SEARCH SERVICE LLMOPS_DB.LLMOPS_SCHEMA.LLMOPS_CORTEX_SEARCH_SERVICE
  ON doc_text
  WAREHOUSE = LLMOPS_WH_M
  TARGET_LAG = '1 hour'
AS (
  SELECT
      doc_text
  FROM LLMOPS_DB.LLMOPS_SCHEMA.streamlit_docs
);

### Call the Cortex Search Service

Next, we can go back to our Python notebook and create a `CortexSearchRetriever` class that connects to our Cortex Search service, with a `retrieve` method for querying it. The class reads the service name from the `SNOWFLAKE_CORTEX_SEARCH_SERVICE` environment variable.

In [11]:
import os
from typing import List

from snowflake.core import Root


class CortexSearchRetriever:

    def __init__(self, session: Session, limit_to_retrieve: int = 4):
        self._session = session
        self._limit_to_retrieve = limit_to_retrieve

    def retrieve(self, query: str) -> List[str]:
        root = Root(self._session)
        # Look up the search service by database, schema, and service name
        cortex_search_service = (
            root
            .databases[os.environ.get("SNOWFLAKE_DATABASE")]
            .schemas[os.environ.get("SNOWFLAKE_SCHEMA")]
            .cortex_search_services[os.environ["SNOWFLAKE_CORTEX_SEARCH_SERVICE"]]
        )
        resp = cortex_search_service.search(
            query=query,
            columns=["doc_text"],
            limit=self._limit_to_retrieve,
        )

        if resp.results:
            return [curr["doc_text"] for curr in resp.results]
        return []
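Downstream code only relies on the `retrieve(query) -> List[str]` method, so a hypothetical in-memory stand-in with the same interface can be handy for testing prompts and feedback functions without calling the search service. The keyword-overlap ranking below is an illustrative assumption and has nothing to do with how Cortex Search ranks results:

```python
from typing import List


class InMemoryRetriever:
    """Hypothetical offline stand-in exposing the same retrieve() method as CortexSearchRetriever."""

    def __init__(self, docs: List[str], limit_to_retrieve: int = 4):
        self._docs = docs
        self._limit_to_retrieve = limit_to_retrieve

    def retrieve(self, query: str) -> List[str]:
        terms = set(query.lower().split())
        # Score each document by how many query terms it contains
        scored = [(len(terms & set(doc.lower().split())), doc) for doc in self._docs]
        # Stable sort: ties keep their original order; documents with no overlap are dropped
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in ranked if score > 0][: self._limit_to_retrieve]


docs = [
    "run a streamlit app with streamlit run",
    "cache data with st.cache_data",
    "deploy an app to community cloud",
]
print(InMemoryRetriever(docs, limit_to_retrieve=2).retrieve("run a streamlit app"))
```

Passing an `InMemoryRetriever` wherever the notebook builds a `CortexSearchRetriever` would let the rest of the pipeline run offline.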

Once the retriever is created, we can test it out. Now that we have grounded access to the Streamlit docs, we can ask questions about using Streamlit, like "How do I launch a streamlit app".

In [12]:
retriever = CortexSearchRetriever(session=snowpark_session, limit_to_retrieve=4)

retrieved_context = retriever.retrieve(query="How do I launch a streamlit app?")

len(retrieved_context)



4

In [13]:
retriever.retrieve(query="How do I launch a streamlit app?")

[' title: Run your Streamlit app /develop/concepts/architecture/run-your-app Run your Streamlit app Working with Streamlit is simple. First you sprinkle a few Streamlit commands into a normal Python script, and then you run it. ',
 "The first step is to create a new Python script. Let's call it `uber_pickups.py`. 2. Open `uber_pickups.py` in your favorite IDE or text editor, then add these lines: ```python import streamlit as st import pandas as pd import numpy as np ``` 3. Every good app has a title, so let's add one: ```python st.title('Uber pickups in NYC') ``` 4. Now it's time to run Streamlit from the command line: ```bash streamlit run uber_pickups.py ``` Running a Streamlit app is no different than any other Python script. Whenever you need to view the app, you can use this command. <Tip> Did you know you can also pass a URL to `streamlit run`? This is great when combined with GitHub Gists. For example: ```bash streamlit run https://raw.githubusercontent.com/streamlit/demo-uber-

## Create a RAG with built-in observability

Now that we've set up the components we need from Snowflake Cortex, we can build our RAG.

We'll do this by creating a custom Python class with each of the methods we need. We'll also add TruLens instrumentation to the app with the `@instrument` decorator.

The first thing we need to do, however, is to set the database connection where we'll log the traces and evaluation results from our application. This way we have a stored record that we can use to understand the app's performance. This is done when initializing a `TruSession` with a `SnowflakeConnector`.

In [67]:
!pip install trulens
!pip install trulens-connectors-snowflake



In [14]:
from trulens.core import TruSession
from trulens.connectors.snowflake import SnowflakeConnector

tru_snowflake_connector = SnowflakeConnector(snowpark_session=snowpark_session)

tru_session = TruSession(connector=tru_snowflake_connector)

Running the TruLens dashboard requires providing a `password` to the `SnowflakeConnector`.


🦑 Initialized with db url snowflake://<user>:***@<account>/%22LLMOPS_DB%22/%22LLMOPS_SCHEMA%22?role=%22ACCOUNTADMIN%22&warehouse=%22LLMOPS_WH_M%22 .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `TruSession` to prevent this.
Set TruLens workspace version tag: [('Statement executed successfully.',)]


Now we can construct the RAG.

In [15]:
from trulens.apps.custom import instrument


class RAG_from_scratch:

    def __init__(self):
        self.retriever = CortexSearchRetriever(session=snowpark_session, limit_to_retrieve=4)

    @instrument
    def retrieve_context(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        return self.retriever.retrieve(query)

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        prompt = f"""
          You are an expert assistant extracting information from context provided.
          Answer the question based on the context. Be concise and do not hallucinate.
          If you don't have the information, just say so.
          Context: {context_str}
          Question:
          {query}
          Answer:
        """
        return Complete("mistral-large2", prompt)

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve_context(query)
        return self.generate_completion(query, context_str)


rag = RAG_from_scratch()
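Conceptually, `@instrument` marks a method so that TruLens records its inputs and outputs, nested under the call that invoked it, whenever the app runs inside a recording context. The toy decorator below only illustrates that idea; it is a hypothetical sketch and not TruLens's implementation, which also tracks details such as latency and cost and writes records to the configured database:

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical in-memory trace store
TRACE: List[Dict[str, Any]] = []
_depth = 0


def toy_instrument(fn: Callable) -> Callable:
    """Record the name, nesting depth, arguments, and return value of each call."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global _depth
        # args[0] is `self` for methods, so it is left out of the record
        entry = {"method": fn.__name__, "depth": _depth, "args": args[1:], "kwargs": kwargs}
        TRACE.append(entry)
        _depth += 1
        try:
            entry["ret"] = fn(*args, **kwargs)
        finally:
            _depth -= 1
        return entry["ret"]

    return wrapper


class ToyRAG:
    @toy_instrument
    def retrieve_context(self, query: str) -> list:
        return ["Run apps with `streamlit run`."]

    @toy_instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        return f"Answer based on {len(context_str)} chunk(s)"

    @toy_instrument
    def query(self, query: str) -> str:
        return self.generate_completion(query, self.retrieve_context(query))


ToyRAG().query("How do I launch a streamlit app?")
for entry in TRACE:
    print("  " * entry["depth"] + entry["method"], "->", entry["ret"])
```

The nested records are what make it possible to point feedback functions at an internal step, such as the return value of `retrieve_context`, rather than only at the app's final input and output.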

In [16]:
rag.query("How do I launch a streamlit app?")

' To launch a Streamlit app, you can use the command line. First, create a Python script (e.g., `uber_pickups.py`). Then, run the following command in your terminal:\n\n```bash\nstreamlit run uber_pickups.py\n```\n\nAlternatively, you can also pass a URL to `streamlit run`, which is useful with GitHub Gists:\n\n```bash\nstreamlit run https://raw.githubusercontent.com/streamlit/demo-uber-nyc-pickups/master/streamlit_app.py\n```\n\nThe app should automatically open in a new tab in your browser.'
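In `generate_completion`, `context_str` is a Python list, so the f-string inserts its `repr` (brackets, quotes, escaped characters) into the prompt. One design choice, sketched below as an assumption rather than the guide's method, is to join the chunks into numbered blocks before formatting; the `build_prompt` helper is hypothetical:

```python
from typing import List


def build_prompt(query: str, contexts: List[str]) -> str:
    """Format retrieved chunks as numbered blocks inside the RAG prompt."""
    context_block = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(contexts, start=1)
    )
    return (
        "You are an expert assistant extracting information from context provided.\n"
        "Answer the question based on the context. Be concise and do not hallucinate.\n"
        "If you don't have the information, just say so.\n"
        f"Context:\n{context_block}\n"
        f"Question:\n{query}\n"
        "Answer:"
    )


print(build_prompt(
    "How do I launch a streamlit app?",
    [" Run it with `streamlit run app.py`. ", "Apps open in a new browser tab."],
))
```

Inside `generate_completion`, this would amount to calling `Complete("mistral-large2", build_prompt(query, context_str))`; numbered chunks also make it easier to ask the model to cite which chunk supports its answer.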

In [19]:
!pip install trulens-providers-cortex



After constructing the RAG, we can set up the feedback functions we want to use to evaluate the RAG.

Here, we'll use the [RAG Triad](https://www.trulens.org/trulens_eval/getting_started/core_concepts/rag_triad/). The RAG triad is made up of three evaluations, one along each edge of the RAG architecture: context relevance, groundedness, and answer relevance.

Satisfactory scores on all three give us confidence that our LLM app is not hallucinating.

We will also use [LLM-as-a-Judge](https://arxiv.org/abs/2306.05685) evaluations, using Mistral Large on [Snowflake Cortex](https://www.trulens.org/trulens_eval/api/provider/cortex/) as the LLM.

In [17]:
from trulens.providers.cortex.provider import Cortex
from trulens.core import Feedback
from trulens.core import Select
import numpy as np

#provider = Cortex(snowpark_session.connection, "llama3.1-8b")
provider = Cortex(snowpark_session, "mistral-large2")

f_groundedness = (
    Feedback(
    provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(Select.RecordCalls.retrieve_context.rets[:].collect())
    .on_output()
)

f_context_relevance = (
    Feedback(
    provider.context_relevance,
    name="Context Relevance")
    .on_input()
    .on(Select.RecordCalls.retrieve_context.rets[:])
    .aggregate(np.mean)
)

f_answer_relevance = (
    Feedback(
    provider.relevance,
    name="Answer Relevance")
    .on_input()
    .on_output()
    .aggregate(np.mean)
)

feedbacks = [f_context_relevance,
            f_answer_relevance,
            f_groundedness,
        ]

✅ In Groundedness, input source will be set to __record__.app.retrieve_context.rets[:].collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input context will be set to __record__.app.retrieve_context.rets[:] .
✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
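The two context selectors above differ on purpose. `Select.RecordCalls.retrieve_context.rets[:]` makes TruLens evaluate `context_relevance` once per retrieved chunk and then combine the scores with `.aggregate(np.mean)`, while adding `.collect()` (as in the groundedness feedback) passes all chunks together as a single source. A plain-Python sketch of that difference, with a hypothetical word-overlap `overlap` function standing in for the Cortex LLM judge and `statistics.mean` standing in for `np.mean`:

```python
from statistics import mean
from typing import Callable, List

Score = Callable[[str, str], float]


def overlap(reference: str, text: str) -> float:
    """Hypothetical judge: share of words in `reference` that also appear in `text`."""
    ref_words = set(reference.lower().split())
    return len(ref_words & set(text.lower().split())) / len(ref_words)


def context_relevance_per_chunk(question: str, contexts: List[str], score: Score) -> float:
    # rets[:] without .collect(): one score per chunk, then aggregated with a mean
    return mean(score(question, chunk) for chunk in contexts)


def groundedness_collected(contexts: List[str], answer: str, score: Score) -> float:
    # rets[:].collect(): every chunk is passed together as one source
    return score(answer, " ".join(contexts))


contexts = ["run a streamlit app with streamlit run", "cache data between reruns"]
print(context_relevance_per_chunk("launch streamlit app", contexts, overlap))
print(groundedness_collected(contexts, "streamlit run app", overlap))
```

One irrelevant chunk pulls the averaged context relevance down, while groundedness only asks whether the answer is supported somewhere in the combined context.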


After defining the feedback functions, we can attach them to the application and give it a name.

In [28]:
from trulens.apps.custom import TruCustomApp

tru_rag = TruCustomApp(
    rag,
    app_name="RAG v1",
    app_version="base",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)


In [29]:
prompts = [
    "How do I launch a streamlit app?",
]

Now that the application is ready, we can run it on a test set of questions about streamlit to measure its performance.

In [30]:
with tru_rag as recording:
    for prompt in prompts:
        rag.query(prompt)

In [31]:
tru_session.get_leaderboard()



| app_name | app_version | Answer Relevance | Context Relevance | Groundedness | latency | total_cost |
|---|---|---|---|---|---|---|
| RAG with filters | base | 1.0 | 1.0 | 0.19284 | 26.348381 | 0.47953 |
| RAG v1 | base | 0.708995 | 0.557796 | 0.817405 | 2.921739 | 0.201454 |


## Use Guardrails

In addition to informing iteration, we can also use feedback results directly as guardrails at inference time. In particular, here we show how to use the context relevance score as a guardrail to filter out irrelevant context before it gets passed to the LLM. This both reduces hallucination and shortens the prompt sent to the LLM.

To do so, we'll rebuild our RAG using the `@context_filter` decorator on the method we want to filter, passing in the feedback function and the threshold to use for guardrailing.
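Conceptually, the `@context_filter` decorator wraps a method that returns a list of chunks, scores each chunk against the prompt with the given feedback function, and returns only the chunks that pass the threshold. A simplified, hypothetical sketch of that behavior in plain Python (the `toy_context_filter` name, the word-overlap `overlap` score standing in for the Cortex judge, and the `>=` comparison are all assumptions, not TruLens internals):

```python
import functools
from typing import Callable, List


def toy_context_filter(score: Callable[[str, str], float], threshold: float, keyword_for_prompt: str):
    """Hypothetical guardrail: keep only returned chunks whose score reaches the threshold."""

    def decorator(fn: Callable[..., List[str]]) -> Callable[..., List[str]]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            prompt = kwargs[keyword_for_prompt]  # the prompt must be passed by keyword
            contexts = fn(*args, **kwargs)
            return [c for c in contexts if score(prompt, c) >= threshold]

        return wrapper

    return decorator


def overlap(question: str, context: str) -> float:
    # Hypothetical judge: share of question words found in the context
    q = set(question.lower().split())
    return len(q & set(context.lower().split())) / len(q)


class ToyRetriever:
    @toy_context_filter(overlap, 0.75, keyword_for_prompt="query")
    def retrieve_context(self, query: str) -> List[str]:
        return ["launch a streamlit app with streamlit run", "streamlit caching", "theming options"]


print(ToyRetriever().retrieve_context(query="launch streamlit app"))
```

This sketch reads the prompt from the keyword argument named by `keyword_for_prompt`, which matches how `filtered_RAG_from_scratch.query` calls `self.retrieve_context(query=query)`.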

In [22]:
from trulens.core.guardrails.base import context_filter

# note: feedback function used for guardrail must only return a score, not also reasons.
# The guardrail applies this feedback to the prompt and each retrieved chunk itself,
# so no selectors are attached here.
f_context_relevance_score = Feedback(provider.context_relevance, name="Context Relevance")

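class filtered_RAG_from_scratch:

    def __init__(self):
        self.retriever = CortexSearchRetriever(session=snowpark_session, limit_to_retrieve=4)

    @instrument
    @context_filter(f_context_relevance_score, 0.75, keyword_for_prompt="query")
    def retrieve_context(self, query: str) -> list:
        """
        Retrieve relevant text from vector store, keeping only chunks
        whose context relevance score passes the 0.75 threshold.
        """
        results = self.retriever.retrieve(query)
        return results

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from the filtered context.
        """
        # Use the same grounded prompt as RAG_from_scratch so the filtered context is actually passed to the LLM
        prompt = f"""
          You are an expert assistant extracting information from context provided.
          Answer the question based on the context. Be concise and do not hallucinate.
          If you don't have the information, just say so.
          Context: {context_str}
          Question:
          {query}
          Answer:
        """
        return Complete("mistral-large2", prompt)

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve_context(query=query)
        completion = self.generate_completion(query=query, context_str=context_str)
        return completion


filtered_rag = filtered_RAG_from_scratch()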



We can combine the new version of our app with the feedback functions we already defined:

In [23]:
from trulens.apps.custom import TruCustomApp

filtered_tru_rag = TruCustomApp(
    filtered_rag,
    app_name="RAG with filters",
    app_version="base",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)


Then we run it on the same test set of questions about Streamlit to measure its performance.

In [24]:
with filtered_tru_rag as recording:
    for prompt in prompts:
        filtered_rag.query(prompt)



In [26]:
tru_session.get_leaderboard()

| app_name | app_version | Answer Relevance | Context Relevance | Groundedness | latency | total_cost |
|---|---|---|---|---|---|---|
| RAG with filters | base | 1.0 | 1.0 | 0.19284 | 26.348381 | 0.47953 |
| RAG v1 | base | 0.704301 | 0.557796 | 0.817405 | 2.897306 | 0.200206 |


## Conclusion And Resources

Congratulations! You've successfully built a RAG by combining Cortex Search and LLM Functions, and added TruLens Feedback Functions for observability. You also set up logging for TruLens to Snowflake, and added TruLens Guardrails to reduce hallucination.

### What You Learned

- How to build a RAG with Cortex Search and Cortex LLM Functions.
- How to use TruLens Feedback Functions and Tracing.
- How to log TruLens Evaluation Results and Traces to Snowflake.
- How to use TruLens Feedback Functions as Guardrails to reduce hallucination.

### Related Resources

- [Snowflake Cortex Documentation](https://docs.snowflake.com/en/guides-overview-ai-features)
- [TruLens Documentation](https://trulens.org/)
- [TruLens GitHub Repository](https://github.com/truera/trulens)