# Getting Started with LLMOps using Snowflake Cortex and TruLens

In this guide, you'll get started with LLMOps by building a retrieval-augmented generation (RAG) app that combines [Cortex LLM Functions](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions) and [Cortex Search](https://github.com/Snowflake-Labs/cortex-search?tab=readme-ov-file), and then using [TruLens](https://www.trulens.org/) to add observability and guardrails.

Along the way, you will also learn how to run TruLens feedback functions with Snowflake Cortex as the [feedback provider](https://www.trulens.org/trulens_eval/api/provider/), and how to [log TruLens traces and evaluation metrics to a Snowflake table](https://www.trulens.org/trulens_eval/tracking/logging/where_to_log/log_in_snowflake/#logging-in-snowflake). Last, we'll show how to use [TruLens guardrails](https://www.trulens.org/trulens_eval/guardrails/) for filtering retrieved context and reducing hallucination.

Here is a summary of what you will learn in each step of this quickstart:

- **Setup Environment**: Create a session to use Snowflake Cortex capabilities.
- **Cortex Complete**: Use Cortex `Complete()` to call Mistral Large.
- **Add Data**: Load and preprocess raw documentation from GitHub, and load to Cortex Search.
- **Search**: Search over the data loaded to Cortex Search.
- **Create a RAG**: Create a RAG with Cortex Search and Complete and add TruLens instrumentation.
- **Feedback Functions**: Add context relevance, groundedness and answer relevance evaluations to the RAG.
- **Application Testing**: Understand the performance of your RAG across a test set.
- **Guardrails**: Add context filter guardrails to reduce hallucinations.
- **Measure Improvement**: See the improved evaluation results after adding guardrails.

### What are Cortex LLM Functions?

Snowflake Cortex gives you instant access to industry-leading large language models (LLMs) trained by researchers at companies like Mistral, Reka, Meta, and Google, including Snowflake Arctic, an open enterprise-grade model developed by Snowflake.

### What is Cortex Search?

Cortex Search enables low-latency, high-quality search over your Snowflake data. Cortex Search powers a broad array of search experiences for Snowflake users including Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs).

### What is TruLens?

[TruLens](https://www.trulens.org/) is a library for tracking and evaluating Generative AI applications. It provides an extensive set of feedback functions to systematically measure the quality of your LLM-based applications. It also traces the internal steps of your application, and allows you to run feedback functions on any internal step. Feedback function results can be examined in a TruLens dashboard, or used at runtime as guardrails.

### What You Will Learn
- How to build a RAG with Cortex Search and Cortex LLM Functions.
- How to use TruLens Feedback Functions and Tracing.
- How to log TruLens Evaluation Results and Traces to Snowflake.
- How to use TruLens Feedback Functions as Guardrails to reduce hallucination.

### What You Will Build
- A retrieval-augmented generation (RAG) app
- An LLMOps pipeline
- Context filter guardrails

### Prerequisites
- A Snowflake account with Cortex LLM Functions and Cortex Search enabled. If you do not have a Snowflake account, you can register for a [free trial account](https://signup.snowflake.com/?utm_cta=quickstarts_&_fsi=yYZEVo4S&_fsi=yYZEVo4S).
- A Snowflake login with the ACCOUNTADMIN role. If you do not have this role in your environment, you can 1) register for a free trial, 2) use a different role that can create databases, schemas, tables, stages, tasks, user-defined functions, and stored procedures, or 3) use an existing database and schema in which you can create those objects.

## Setup

In [1]:
!pip install python-dotenv



In [None]:
!pip install snowflake-snowpark-python
!pip install pandas

Once we have an environment with the right packages installed, we can load our credentials and set up our Snowflake connection in a Jupyter notebook.

In [1]:
from dotenv import load_dotenv
from snowflake.snowpark.session import Session
import os


load_dotenv()

connection_params = {
  "account":  os.environ.get("SNOWFLAKE_ACCOUNT"),
  "user": os.environ.get("SNOWFLAKE_USER"),
  "password": os.environ.get("SNOWFLAKE_USER_PASSWORD"),
  "role": os.environ.get("SNOWFLAKE_ROLE"),
  "database": os.environ.get("SNOWFLAKE_DATABASE"),
  "schema": os.environ.get("SNOWFLAKE_SCHEMA"),
  "warehouse": os.environ.get("SNOWFLAKE_WAREHOUSE")
}

snowpark_session = Session.builder.configs(connection_params).create()
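
One thing to watch in the cell above: `os.environ.get` returns `None` for any variable that is not set, and the connection error that follows can be hard to read. Below is a minimal sketch of an up-front check, assuming the same environment variable names as above. `REQUIRED_VARS` and `build_connection_params` are hypothetical helpers introduced here for illustration; they are not part of Snowpark.

```python
import os
from typing import Dict, Mapping

# Connection key -> environment variable name, mirroring the cell above.
REQUIRED_VARS = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_USER_PASSWORD",
    "role": "SNOWFLAKE_ROLE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
}


def build_connection_params(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return connection parameters, failing early if a variable is unset or empty."""
    missing = sorted(name for name in REQUIRED_VARS.values() if not env.get(name))
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in REQUIRED_VARS.items()}
```

With a helper like this, `connection_params = build_connection_params()` could stand in for the dictionary literal before calling `Session.builder.configs(...)`.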


In [None]:
!pip install snowflake.core
!pip install snowflake-connector-python

In [None]:
!pip install snowflake
!pip install snowflake-connector-python==2.8.3

In [2]:
from snowflake.cortex import Complete, ExtractAnswer, Sentiment, Summarize, Translate, ClassifyText

text = """
    The Snowflake company was co-founded by Thierry Cruanes, Marcin Zukowski,
    and Benoit Dageville in 2012 and is headquartered in Bozeman, Montana.
"""

print(Complete("mistral-large2", "how do snowflakes get their unique patterns?"))
print(ExtractAnswer(text, "When was snowflake founded?"))
print(Sentiment("I really enjoyed this restaurant. Fantastic service!"))
print(Summarize(text))
print(Translate(text, "en", "fr"))
print(ClassifyText("France", ["Europe", "Asia"]))

 Snowflakes get their unique patterns through a complex process of crystallization that occurs as water vapor freezes in the atmosphere. Here's a simplified explanation:

1. **Nucleation**: The process begins with a tiny particle in the atmosphere, such as a speck of dust or pollen, which acts as a nucleus. Water vapor condenses onto this nucleus and starts to freeze.

2. **Crystal Growth**: As more water vapor freezes onto the initial ice crystal, it forms a hexagonal prism shape. This is because the water molecules arrange themselves in a hexagonal pattern due to their molecular structure.

3. **Branching**: As the ice crystal grows, it can develop branches, or "arms." The exact pattern of these branches is influenced by the temperature and humidity conditions in the atmosphere. For example, at around -15°C (5°F), the arms grow long and slender, while at around -5°C (23°F), the arms grow more complex, with side branches forming.

4. **Individuality**: The unique pattern of each snowf

## Using Cortex Complete

With the session set, we have what we need to call a Snowflake Cortex LLM:

In [4]:
from snowflake.cortex import Complete

try:
    result = Complete("mistral-large2", "What is streamlit app?")
    print(result)
except Exception as e:
    print(f"Error occurred: {e}")


 Streamlit is an open-source Python library that makes it easy to create and share custom web apps for machine learning and data science. It allows you to turn data scripts into shareable web apps in just a few lines of code. Here are some key features of Streamlit:

1. **Easy to Use**: Streamlit is designed to be user-friendly, especially for those who are already familiar with Python. You can create interactive web apps using simple Python scripts.

2. **Interactive Widgets**: Streamlit provides a variety of interactive widgets like sliders, text inputs, buttons, and more, which can be used to create dynamic and interactive applications.

3. **Real-time Updates**: Any changes you make to your Python script are reflected in real-time in the web app, making it easy to iterate and develop quickly.

4. **Deployment**: Streamlit apps can be easily deployed and shared with others. You can host your app on Streamlit's own cloud platform, Streamlit Cloud, or on other platforms like Heroku, A

In [None]:
#ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US';

In [4]:
!pip install llama-index-llms-openai
!pip install llama-index-embeddings-openai
!pip install llama-index-readers-github
!pip install llama-index nest_asyncio



## Cortex Search

Next, we'll turn to the retrieval component of our RAG and set up Cortex Search.

This requires three steps:

1. Read and preprocess unstructured documents.
2. Embed the cleaned documents with Arctic Embed.
3. Call the Cortex search service.

### Read and preprocess unstructured documents

For this example, we want to load Cortex Search with documentation from GitHub about a popular open-source library, Streamlit. To do so, we'll use a GitHub data loader available from LlamaHub.

Here we'll also expend some effort to clean up the text so we can get better search results.

In [3]:
import requests
import os

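# The token is read from the GITHUB_TOKEN environment variable and sent
# with the "Bearer" scheme that the GitHub REST API documents
headers = {"Authorization": f"Bearer {os.environ.get('GITHUB_TOKEN')}"}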
response = requests.get(
    "https://api.github.com/repos/praveen-prog/docs/branches/main", headers=headers
)

if response.status_code == 200:
    print(f"Branch data: {response.json()}")
else:
    print(f"Error: {response.status_code}, {response.json()}")


Branch data: {'name': 'main', 'commit': {'sha': 'd921ad84319e7f3ad5497094fa755e80ad4a8067', 'node_id': 'C_kwDONcSh7doAKGQ5MjFhZDg0MzE5ZTdmM2FkNTQ5NzA5NGZhNzU1ZTgwYWQ0YTgwNjc', 'commit': {'author': {'name': 'Praveen Kumar Chandran', 'email': 'praveenhome@Praveens-MacBook-Pro.local', 'date': '2024-12-11T22:19:29Z'}, 'committer': {'name': 'Praveen Kumar Chandran', 'email': 'praveenhome@Praveens-MacBook-Pro.local', 'date': '2024-12-11T22:19:29Z'}, 'message': 'test data addeed', 'tree': {'sha': 'a0533c2d0222e13978295389e17f6fbd598acea6', 'url': 'https://api.github.com/repos/praveen-prog/docs/git/trees/a0533c2d0222e13978295389e17f6fbd598acea6'}, 'url': 'https://api.github.com/repos/praveen-prog/docs/git/commits/d921ad84319e7f3ad5497094fa755e80ad4a8067', 'comment_count': 0, 'verification': {'verified': False, 'reason': 'unsigned', 'signature': None, 'payload': None, 'verified_at': None}}, 'url': 'https://api.github.com/repos/praveen-prog/docs/commits/d921ad84319e7f3ad5497094fa755e80ad4a8067',

In [4]:
# Confirm the token is loaded without printing the secret itself
github_token = os.environ.get("GITHUB_TOKEN")
print("GITHUB_TOKEN is set" if github_token else "GITHUB_TOKEN is missing")


In [5]:
import os
import re
import nest_asyncio
nest_asyncio.apply()
load_dotenv()

from llama_index.readers.github import GithubRepositoryReader, GithubClient

github_token = os.environ.get("GITHUB_TOKEN")
github_client = GithubClient(github_token=github_token, verbose=False)

reader = GithubRepositoryReader(
  github_client=github_client,
  owner="praveen-prog",
  repo="docs",
  use_parser=False,
  verbose=True,
  filter_directories=(
    ["content"],
    GithubRepositoryReader.FilterType.INCLUDE,
  ),
  filter_file_extensions=(
    [".md"],
    GithubRepositoryReader.FilterType.INCLUDE,
  )
)

print(f"Reader is {reader}")
documents = reader.load_data(branch="main")
print(f"Type is {type(documents)}")


def clean_up_text(content: str) -> str:
    """
    Remove unwanted characters and patterns in text input.

    :param content: Text input.

    :return: Cleaned version of original text input.
    """

    # Fix hyphenated words broken by newline
    content = re.sub(r"(\w+)-\n(\w+)", r"\1\2", content)

    unwanted_patterns = ["---\nvisible: false", "---", "#", "slug:"]
    for pattern in unwanted_patterns:
        content = re.sub(pattern, "", content)

    # Remove all slugs starting with a \ and stopping at the first space
    content = re.sub(r"\\slug: [^\s]*", "", content)

    # normalize whitespace
    content = re.sub(r"\s+", " ", content)
    return content

cleaned_documents = []

for d in documents:
    cleaned_text = clean_up_text(d.text)
    d.text = cleaned_text
    cleaned_documents.append(d)



    


Reader is <llama_index.readers.github.repository.base.GithubRepositoryReader object at 0x1694a2c50>


KeyError: 'commit'

In [11]:
cleaned_documents[0]

NameError: name 'cleaned_documents' is not defined
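
Because the cleanup step is plain Python, it can be checked on a small string without any GitHub access. The sketch below applies the same regex steps as `clean_up_text` to a hypothetical snippet of Markdown front matter. Two details are assumptions added here: the literal patterns are wrapped in `re.escape`, and the result is stripped of leading and trailing whitespace.

```python
import re


def clean_up_text(content: str) -> str:
    """Condensed variant of the cleanup function above, for a quick local check."""
    # Rejoin words that were hyphenated across a line break
    content = re.sub(r"(\w+)-\n(\w+)", r"\1\2", content)

    # Drop front-matter markers and heading symbols (treated as literal text)
    for pattern in ["---\nvisible: false", "---", "#", "slug:"]:
        content = re.sub(re.escape(pattern), "", content)

    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", content).strip()


# Hypothetical sample input, not taken from the real docs repository
sample = "---\nvisible: false\n---\n# Install Stream-\nlit\n\nRun   `pip install streamlit`."
cleaned = clean_up_text(sample)
print(cleaned)
```

Running a check like this first makes it easier to tell a cleanup bug apart from a data-loading failure such as the `KeyError` above.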

In [9]:
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface



### Process the documents with Semantic Splitting

We'll use Snowflake's Arctic Embed model, available from Hugging Face, to embed the documents. We'll also use LlamaIndex's `SemanticSplitterNodeParser` for processing.

In [12]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SemanticSplitterNodeParser

embed_model = HuggingFaceEmbedding("Snowflake/snowflake-arctic-embed-m")

splitter = SemanticSplitterNodeParser(
  buffer_size=1, breakpoint_percentile_threshold=85, embed_model=embed_model
) 
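
To build intuition for `breakpoint_percentile_threshold`, here is a simplified sketch of percentile-based semantic splitting: measure the cosine distance between neighbouring sentence embeddings and start a new chunk wherever the distance exceeds the chosen percentile. This is a toy illustration under stated assumptions (hand-written 2-D vectors, no sentence buffering), not LlamaIndex's actual implementation; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values: Sequence[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def split_on_breakpoints(
    sentences: List[str], embeddings: List[Sequence[float]], threshold_pct: float = 85
) -> List[str]:
    if len(sentences) < 2:
        return list(sentences)
    # Distance between each sentence and the one that follows it
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > cutoff:  # a large semantic jump starts a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy data: two sentences about one topic, then two about another
sentences = ["Install Streamlit.", "Run your app.", "Snow is cold.", "Ice is frozen water."]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
chunks = split_on_breakpoints(sentences, embeddings, threshold_pct=85)
print(chunks)
```

A higher threshold means fewer, larger chunks, since only the most pronounced topic shifts count as breakpoints.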

With the embed model and splitter, we can execute them in an ingestion pipeline:

In [13]:
from llama_index.core.ingestion import IngestionPipeline

cortex_search_pipeline = IngestionPipeline(
  transformations=[
    splitter,
  ],
)

results = cortex_search_pipeline.run(show_progress=False, documents=cleaned_documents)
print(type(results))

NameError: name 'cleaned_documents' is not defined

### Load data to Cortex Search

Now that we've embedded our documents, we're ready to load them to Cortex Search.

Here we can use the same connection details as we set up for Cortex Complete.

In [11]:
import os
import snowflake.connector
from tqdm.auto import tqdm

snowflake_connector = snowflake.connector.connect(**connection_params)

cursor = snowflake_connector.cursor()

cursor.execute("CREATE OR REPLACE TABLE streamlit_docs(doc_text VARCHAR)")
for curr in tqdm(results):
    # Bind the text as a one-element tuple so it is passed as a single parameter
    cursor.execute("INSERT INTO streamlit_docs VALUES (%s)", (curr.text,))

  0%|          | 0/1 [00:00<?, ?it/s]

#Run in snowsql
CREATE OR REPLACE CORTEX SEARCH SERVICE LLMOPS_DB.LLMOPS_SCHEMA.LLMOPS_CORTEX_SEARCH_SERVICE
  ON doc_text
  WAREHOUSE = LLMOPS_WH_M
  TARGET_LAG = '1 hour'
AS (
  SELECT
      doc_text
  FROM LLMOPS_DB.LLMOPS_SCHEMA.streamlit_docs
);

### Call the Cortex Search Service

Next, we can go back to our Python notebook and create a `CortexSearchRetriever` class that connects to our Cortex Search service, with a `retrieve` method we can use to call it.

In [12]:
import os
from snowflake.core import Root
from typing import List

class CortexSearchRetriever:

    def __init__(self, session: Session, limit_to_retrieve: int = 4):
        self._session = session
        self._limit_to_retrieve = limit_to_retrieve

    def retrieve(self, query: str) -> List[str]:
        root = Root(self._session)
        cortex_search_service = (
            root
            .databases[os.environ.get("SNOWFLAKE_DATABASE")]
            .schemas[os.environ.get("SNOWFLAKE_SCHEMA")]
            .cortex_search_services[os.environ["SNOWFLAKE_CORTEX_SEARCH_SERVICE"]]
        )
        resp = cortex_search_service.search(
            query=query,
            columns=["doc_text"],
            limit=self._limit_to_retrieve,
        )

        if resp.results:
            return [curr["doc_text"] for curr in resp.results]
        else:
            return []

Once the retriever is created, we can test it out. Now that we have grounded access to the Streamlit docs, we can ask questions about using Streamlit, like "How do I launch a streamlit app".

In [13]:
retriever = CortexSearchRetriever(session=snowpark_session, limit_to_retrieve=4)

retrieved_context = retriever.retrieve(query="How do I launch a streamlit app?")

len(retrieved_context)



1

In [13]:
retriever.retrieve(query="How do I launch a streamlit app?")

['This hackathon is an opportunity to riff with cutting-edge AI technology. Join us and get comfortable with a setlist for learning AI with Cortex Search for retrieval, Mistral LLM (mistral-large2) on Snowflake Cortex for generation, and Streamlit Community Cloud for the front end']
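
When experimenting with the rest of the pipeline, it can be handy to have a stand-in that exposes the same `retrieve(query) -> List[str]` shape but needs no Snowflake connection. The sketch below is one such stand-in, ranking documents by naive word overlap. `KeywordRetriever` is a hypothetical class for local testing only and says nothing about how Cortex Search ranks results.

```python
from typing import List


class KeywordRetriever:
    """Offline stand-in with the same interface as CortexSearchRetriever."""

    def __init__(self, docs: List[str], limit_to_retrieve: int = 4):
        self._docs = docs
        self._limit_to_retrieve = limit_to_retrieve

    def retrieve(self, query: str) -> List[str]:
        terms = set(query.lower().split())
        # Score each document by how many query words it shares
        scored = [(len(terms & set(doc.lower().split())), doc) for doc in self._docs]
        ranked = sorted(
            (pair for pair in scored if pair[0] > 0),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return [doc for _, doc in ranked[: self._limit_to_retrieve]]


# Hypothetical documents for a quick check
docs = [
    "Run streamlit run app.py to launch a streamlit app",
    "Snowflakes form in clouds",
]
offline_retriever = KeywordRetriever(docs, limit_to_retrieve=4)
hits = offline_retriever.retrieve("How do I launch a streamlit app?")
print(hits)
```

Because the interface matches, an object like this can be swapped in wherever the RAG class expects a retriever during local tests.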

## Create a RAG with built-in observability

Now that we've set up the components we need from Snowflake Cortex, we can build our RAG.

We'll do this by creating a custom Python class with each of the methods we need. We'll also add TruLens instrumentation to our app with the `@instrument` decorator.

First, however, we need to set the database connection where we'll log the traces and evaluation results from our application. This gives us a stored record that we can use to understand the app's performance. The connection is set when initializing `TruSession`.

In [None]:
!pip install trulens
!pip install trulens-connectors-snowflake

In [14]:
from trulens.core import TruSession
from trulens.connectors.snowflake import SnowflakeConnector

tru_snowflake_connector = SnowflakeConnector(snowpark_session=snowpark_session)

tru_session = TruSession(connector=tru_snowflake_connector)

Running the TruLens dashboard requires providing a `password` to the `SnowflakeConnector`.


🦑 Initialized with db url snowflake://%22praveensnowflake541%22:***@"JCYRVUG-LR36187"/%22LLMOPS_DB%22/%22LLMOPS_SCHEMA%22?role=%22ACCOUNTADMIN%22&warehouse=%22LLMOPS_WH_M%22 .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `TruSession` to prevent this.
Set TruLens workspace version tag: [('Statement executed successfully.',)]


Now we can construct the RAG.

In [15]:
from trulens.apps.custom import instrument
from snowflake.cortex import Complete


class RAG_from_scratch:

    def __init__(self):
        self.retriever = CortexSearchRetriever(session=snowpark_session, limit_to_retrieve=4)

    @instrument
    def retrieve_context(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        return self.retriever.retrieve(query)

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        prompt = f"""
          You are an expert assistant extracting information from context provided.
          Answer the question based on the context. Be concise and do not hallucinate.
          If you don't have the information just say so.
          Context: {context_str}
          Question:
          {query}
          Answer:
        """
        return Complete("mistral-large2", prompt)

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve_context(query)
        return self.generate_completion(query, context_str)


rag = RAG_from_scratch()
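
To get a feel for what an instrumentation decorator does, the toy sketch below records the name, result, and latency of every decorated call in a list. It is only a conceptual illustration: `traced`, `TRACE`, and `ToyRAG` are hypothetical names, and TruLens's `@instrument` does considerably more (for example, it ties calls to records that feedback functions can select from).

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory trace log; a real tracer would persist this somewhere
TRACE: List[Dict[str, Any]] = []


def traced(func: Callable) -> Callable:
    """Record each call's name, return value, and latency in TRACE."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACE.append(
            {
                "name": func.__name__,
                "result": result,
                "latency_s": time.perf_counter() - start,
            }
        )
        return result

    return wrapper


class ToyRAG:
    @traced
    def retrieve_context(self, query: str) -> list:
        return [f"doc about {query}"]

    @traced
    def query(self, query: str) -> str:
        context = self.retrieve_context(query)
        return f"answer from {len(context)} chunk(s)"


ToyRAG().query("streamlit")
for entry in TRACE:
    print(entry["name"], entry["result"])
```

Inner calls finish first, so `retrieve_context` is logged before `query`; this per-step record is what makes it possible to evaluate intermediate results such as retrieved context.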

In [17]:
rag.query(" streamlit app?")

' Yes, the context mentions Streamlit Community Cloud for the front end.'

In [None]:
!pip install trulens-providers-cortex

After constructing the RAG, we can set up the feedback functions we want to use to evaluate the RAG.

Here, we'll use the [RAG Triad](https://www.trulens.org/trulens_eval/getting_started/core_concepts/rag_triad/). The RAG triad is made up of 3 evaluations along each edge of the RAG architecture: context relevance, groundedness and answer relevance.

Satisfactory scores on all three give us confidence that our LLM app is free from hallucination.

We will also use [LLM-as-a-Judge](https://arxiv.org/abs/2306.05685) evaluations, using Mistral Large on [Snowflake Cortex](https://www.trulens.org/trulens_eval/api/provider/cortex/) as the LLM.
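
Before wiring this up with TruLens, it may help to see which pieces of a RAG call each evaluation compares. The sketch below uses a trivial word-overlap scorer as a stand-in for the LLM judge, so the numbers are not meaningful quality scores; `overlap_score` and `rag_triad` are hypothetical names, and only the input pairings and the mean aggregation over chunks mirror the setup described above.

```python
from statistics import mean
from typing import Callable, Dict, List


def overlap_score(reference: str, candidate: str) -> float:
    """Stand-in judge: fraction of candidate words that also occur in reference."""
    ref_words = set(reference.lower().split())
    cand_words = candidate.lower().split()
    if not cand_words:
        return 0.0
    return sum(word in ref_words for word in cand_words) / len(cand_words)


def rag_triad(
    question: str,
    contexts: List[str],
    answer: str,
    judge: Callable[[str, str], float] = overlap_score,
) -> Dict[str, float]:
    return {
        # Question vs. each retrieved chunk, averaged over chunks
        "context_relevance": mean(judge(c, question) for c in contexts) if contexts else 0.0,
        # Answer vs. all retrieved context taken together
        "groundedness": judge(" ".join(contexts), answer),
        # Question vs. final answer
        "answer_relevance": judge(answer, question),
    }


# Hypothetical example inputs
scores = rag_triad(
    question="what is streamlit",
    contexts=["streamlit is a python library", "snowflakes are ice crystals"],
    answer="streamlit is a python library",
)
print(scores)
```

In the real setup, the judge is an LLM call made through the Cortex provider rather than a word-overlap heuristic.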

In [25]:
from trulens.providers.cortex.provider import Cortex
from trulens.core import Feedback
from trulens.core import Select

from trulens.core.guardrails.base import context_filter
import numpy as np

#provider = Cortex(snowpark_session.connection, "llama3.1-8b")
provider = Cortex(snowpark_session, "mistral-large2")

f_groundedness = (
    Feedback(
    provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(Select.RecordCalls.retrieve_context.rets[:].collect())
    .on_output()
)

f_context_relevance = (
    Feedback(
    provider.context_relevance,
    name="Context Relevance")
    .on_input()
    .on(Select.RecordCalls.retrieve_context.rets[:])
    .aggregate(np.mean)
)

f_answer_relevance = (
    Feedback(
    provider.relevance,
    name="Answer Relevance")
    .on_input()
    .on_output()
    .aggregate(np.mean)
)

feedbacks = [f_context_relevance,
            f_answer_relevance,
            f_groundedness,
        ]

f_context_relevance_score = (
    Feedback(provider.context_relevance, name = "Context Relevance")
    .on_input()
    .on(Select.RecordCalls.retrieve.rets)
)


✅ In Groundedness, input source will be set to __record__.app.retrieve_context.rets[:].collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input context will be set to __record__.app.retrieve_context.rets[:] .
✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input context will be set to __record__.app.retrieve.rets .




After defining the feedback functions to use, we can just add them to the application along with giving the application an ID.

In [21]:
from trulens.apps.custom import TruCustomApp
tru_rag = TruCustomApp(rag,
    app_id = 'RAG v1',
    feedbacks = [f_groundedness, f_answer_relevance, f_context_relevance])

In [22]:
prompts = ["What is streamlit?"]



Now that the application is ready, we can run it on a test set of questions about Streamlit to measure its performance.

In [23]:
with tru_rag as recording:
    for prompt in prompts:
        result = rag.query(prompt)
        print(result)

Could not find an instance of DummyEndpoint. trulens will create an endpoint for cost tracking.


 Streamlit is a tool for the front end.


In [24]:
tru_session.get_leaderboard()

| app_name | app_version | Answer Relevance | Context Relevance | Groundedness | latency | total_cost |
|---|---|---|---|---|---|---|
| RAG with filters | base | 1.0 | | 0.117647 | 10.122098 | 0.188017 |
| RAG v1 | base | 0.395833 | 0.761905 | 1.0 | 1.12402 | 0.004255 |


In [30]:
from trulens.apps.custom import instrument
from snowflake.cortex import Complete


class RAG_from_scratch:

    def __init__(self):
        self.retriever = CortexSearchRetriever(session=snowpark_session, limit_to_retrieve=4)

    @instrument
    @context_filter(f_context_relevance_score, 0.20, keyword_for_prompt="query")
    def retrieve_context(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        return self.retriever.retrieve(query)

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        prompt = f"""
          You are an expert assistant extracting information from context provided.
          Answer the question based on the context. Be concise and do not hallucinate.
          If you don't have the information just say so.
          Context: {context_str}
          Question:
          {query}
          Answer:
        """
        return Complete("mistral-large2", prompt)

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve_context(query)
        return self.generate_completion(query, context_str)


rag = RAG_from_scratch()

In [31]:
with tru_rag as recording:
    for prompt in prompts:
        result = rag.query(prompt)
        print(result)

 Streamlit is a tool for the front end.


## Use Guardrails

In addition to informing iteration, feedback results can be used directly as guardrails at inference time. In particular, here we show how to use the context relevance score as a guardrail to filter out irrelevant context before it gets passed to the LLM. This both reduces hallucination and improves efficiency.

To do so, we'll rebuild our RAG using the `@context_filter` decorator on the method we want to filter, and pass in the feedback function and threshold to use for guardrailing.

In [23]:
from trulens.core.guardrails.base import context_filter

# note: feedback function used for guardrail must only return a score, not also reasons
f_context_relevance_score = (
    Feedback(provider.context_relevance, name = "Context Relevance")
    .on_input()
    .on(Select.RecordCalls.retrieve.rets)
)

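class filtered_RAG_from_scratch:

    def __init__(self):
        self.retriever = CortexSearchRetriever(session=snowpark_session, limit_to_retrieve=4)

    @instrument
    @context_filter(f_context_relevance_score, 0.75, keyword_for_prompt="query")
    def retrieve_context(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        results = self.retriever.retrieve(query)
        return results

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        # Include the filtered context in the prompt so the answer stays grounded
        prompt = f"""
          You are an expert assistant extracting information from context provided.
          Answer the question based on the context. Be concise and do not hallucinate.
          If you don't have the information just say so.
          Context: {context_str}
          Question:
          {query}
          Answer:
        """
        completion = Complete("mistral-large2", prompt)
        return completion

    @instrument
    def query(self, query: str) -> str:
        # The guardrail reads the query by keyword, so pass it as query=...
        context_str = self.retrieve_context(query=query)
        completion = self.generate_completion(query=query, context_str=context_str)
        return completion

filtered_rag = filtered_RAG_from_scratch()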


✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input context will be set to __record__.app.retrieve.rets .
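
Conceptually, a context filter wraps the retrieval method, scores each returned chunk against the query, and keeps only the chunks that clear the threshold. The sketch below shows that idea with a word-overlap scorer in place of the LLM-based feedback function. `simple_context_filter` and `word_overlap` are hypothetical names, and details such as the strict `>` comparison are assumptions of this sketch rather than a description of TruLens's `context_filter`.

```python
import functools
from typing import Callable, List


def simple_context_filter(
    score_fn: Callable[[str, str], float],
    threshold: float,
    keyword_for_prompt: str = "query",
) -> Callable:
    """Toy guardrail: drop retrieved chunks whose score is not above threshold."""

    def decorator(func: Callable[..., List[str]]) -> Callable[..., List[str]]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> List[str]:
            # The query must be passed by keyword so the filter can find it
            query = kwargs[keyword_for_prompt]
            contexts = func(*args, **kwargs)
            return [c for c in contexts if score_fn(query, c) > threshold]

        return wrapper

    return decorator


def word_overlap(query: str, context: str) -> float:
    """Stand-in scorer: share of query words that appear in the context."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(context.lower().split())) / len(query_words)


@simple_context_filter(word_overlap, threshold=0.5)
def retrieve(query: str) -> List[str]:
    # Hypothetical retrieval results: one relevant chunk, one irrelevant
    return ["launch a streamlit app with streamlit run", "snowflakes are ice crystals"]


kept = retrieve(query="launch streamlit app")
print(kept)
```

Raising the threshold trades recall for precision: fewer chunks reach the LLM, which lowers cost and the chance of answering from irrelevant text, but a threshold that is too strict can leave the model with no context at all.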




We can combine the new version of our app with the feedback functions we already defined:

In [24]:
from trulens.apps.custom import TruCustomApp

filtered_tru_rag = TruCustomApp(filtered_rag,
    app_id = 'RAG with filters',
    feedbacks = [f_groundedness, f_answer_relevance, f_context_relevance])

Function <function RAG_from_scratch.query at 0x32aed9080> was not found during instrumentation walk. Make sure it is accessible by traversing app <__main__.filtered_RAG_from_scratch object at 0x32ae03c50> or provide a bound method for it as TruCustomApp constructor argument `methods_to_instrument`.
Function <function RAG_from_scratch.generate_completion at 0x32aed8fe0> was not found during instrumentation walk. Make sure it is accessible by traversing app <__main__.filtered_RAG_from_scratch object at 0x32ae03c50> or provide a bound method for it as TruCustomApp constructor argument `methods_to_instrument`.
Function <function RAG_from_scratch.retrieve_context at 0x116f89940> was not found during instrumentation walk. Make sure it is accessible by traversing app <__main__.filtered_RAG_from_scratch object at 0x32ae03c50> or provide a bound method for it as TruCustomApp constructor argument `methods_to_instrument`.


Then we run it on a test set of questions about Streamlit to measure its performance.

In [25]:
with filtered_tru_rag as recording:
    for prompt in prompts:
        filtered_rag.query(prompt)





In [26]:
tru_session.get_leaderboard()

| app_name | app_version | Answer Relevance | Context Relevance | Groundedness | latency | total_cost |
|---|---|---|---|---|---|---|
| RAG with filters | base | 1.0 | | 0.117647 | 10.122098 | 0.188017 |
| RAG v1 | base | 0.333333 | 0.333333 | 1.0 | 1.012737 | 0.003175 |


## Conclusion And Resources

Congratulations! You've built a RAG by combining Cortex Search and Cortex LLM Functions, and added TruLens feedback functions for observability. You also set up TruLens logging to Snowflake and added TruLens guardrails to reduce hallucination.

### What You Learned

- How to build a RAG with Cortex Search and Cortex LLM Functions.
- How to use TruLens Feedback Functions and Tracing.
- How to log TruLens Evaluation Results and Traces to Snowflake.
- How to use TruLens Feedback Functions as Guardrails to reduce hallucination.

### Related Resources

- [Snowflake Cortex Documentation](https://docs.snowflake.com/en/guides-overview-ai-features)
- [TruLens Documentation](https://trulens.org/)
- [TruLens GitHub Repository](https://github.com/truera/trulens)