# ❄️ SKO RAG HOP - Snowflake Cortex with Anthropic and LLM Observability ❄️

This notebook demonstrates how to create a Retrieval-Augmented Generation (RAG) workflow in Snowflake using Cortex Search Services, integrate Anthropic LLMs like Claude 3.5, and evaluate responses with new LLM Observability features. Below is an overview of the flow and its key components.

### Step 1: Parse and Chunk Text from PDFs (BUILD)
### Step 2: Create Cortex Search Service (RETRIEVE)
### Step 3: Test Search Results with Experimental Configurations (AUGMENT)
### Step 4: Pass Retrieved Content to LLMs (GENERATE)
### Step 5: Create RAG Application Class (SERVE)
### Step 6: Observe and Evaluate LLM Performance with AI Observability (EVALUATE)

In [None]:
# Import necessary functions
import streamlit as st
from snowflake.snowpark.context import get_active_session
session = get_active_session()

# Define image in a stage and read the file
image=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/Flow.jpg", decompress=False).read() 

# Display the image
st.image(image, width=800)

In [None]:
import snowflake.snowpark as snowpark

from snowflake.snowpark.context import get_active_session
session = get_active_session()

In [None]:
-- List files in the stage to identify PDFs
LS @SKO_SKORAGHOP_LIVE_PROD.HOP.RAG;

## Step 1: Parse and Chunk Text from PDFs
We begin by parsing the content of uploaded PDFs and chunking the text using Snowflake's [PARSE_DOCUMENT](https://docs.snowflake.com/sql-reference/functions/parse_document-snowflake-cortex) and [SPLIT_TEXT_RECURSIVE_CHARACTER](https://docs.snowflake.com/sql-reference/functions/split_text_recursive_character-snowflake-cortex) functions. These steps structure the text into manageable segments optimized for retrieval. To verify that the PDFs were parsed and chunked correctly, we then query the parsed and chunked tables and inspect their contents.

Objective: **Transform unstructured content into indexed chunks for efficient search and retrieval.**

Key Outputs:
- SKO.HOP.PARSED_TEXT: Table containing the raw text.
- SKO.HOP.CORTEX_CHUNK: Chunked, searchable content.
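
To build intuition for the chunk size and overlap settings used in this step, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is deliberately simpler than `SPLIT_TEXT_RECURSIVE_CHARACTER`, which also honors separators and splits recursively; the helper name `chunk_text` is hypothetical and is not part of any Snowflake API.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 100) -> List[str]:
    """Slide a window of `chunk_size` characters over `text`.

    Consecutive chunks share `overlap` characters, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Tiny illustration with a 10-character string
example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(example_chunks)
```

A larger overlap makes retrieval more robust to boundary effects at the cost of storing and indexing more duplicated text.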

In [1]:
-- Create a table to hold the extracted text from the PDF files loaded in the SKO_SKORAGHOP_LIVE_PROD.HOP.RAG stage

-- Complete the missing code (???) to create a table called PARSED_TEXT

CREATE OR REPLACE TABLE SKO_SKORAGHOP_LIVE_PROD.HOP.PARSED_TEXT (relative_path VARCHAR(500), raw_text VARIANT);


In [None]:
-- Use Snowflake's new PARSE_DOCUMENT feature to extract the text from the PDFs loaded in the @SKO_SKORAGHOP_LIVE_PROD.HOP.RAG stage
-- Cortex PARSE_DOCUMENT documentation link is https://docs.snowflake.com/sql-reference/functions/parse_document-snowflake-cortex

-- Complete the missing code (???) to:
---- Insert into your newly created PARSED_TEXT table
---- Use Cortex PARSE_DOCUMENT feature and layout mode

INSERT INTO SKO_SKORAGHOP_LIVE_PROD.HOP.PARSED_TEXT (relative_path, raw_text)
WITH pdf_files AS (
    SELECT DISTINCT
        METADATA$FILENAME AS relative_path
    FROM @SKO_SKORAGHOP_LIVE_PROD.HOP.RAG
    WHERE METADATA$FILENAME ILIKE '%.pdf'
      -- Exclude files that have already been parsed
      AND METADATA$FILENAME NOT IN (SELECT relative_path FROM SKO_SKORAGHOP_LIVE_PROD.HOP.PARSED_TEXT)
)
SELECT 
    relative_path,
    SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
        '@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG',  -- Your stage name
        relative_path,  -- File path
        {'mode': 'layout'}  -- Adjust mode as needed ('layout', 'ocr')
    ) AS raw_text
FROM pdf_files;

In [None]:
-- check the RAW_TEXT to ensure the PDF was parsed as expected
-- Complete the missing code (???) to check the RAW_TEXT to ensure the PDF was parsed as expected

SELECT *, SNOWFLAKE.CORTEX.COUNT_TOKENS('mistral-7b', RAW_TEXT) as token_count
FROM SKO_SKORAGHOP_LIVE_PROD.HOP.PARSED_TEXT;

In [None]:
-- Use Snowflake's new SPLIT_TEXT_RECURSIVE_CHARACTER feature to chunk parsed text from the PDFs loaded in @SKO_SKORAGHOP_LIVE_PROD.HOP.RAG stage
-- Cortex SPLIT_TEXT_RECURSIVE_CHARACTER documentation link is https://docs.snowflake.com/sql-reference/functions/split_text_recursive_character-snowflake-cortex

-- Complete the missing code (???) to:
---- Create a new table called CORTEX_CHUNK to hold the chunked text from your PDF documents
---- Use Cortex SPLIT_TEXT_RECURSIVE_CHARACTER feature with a 2000 chunk size and 100 overlap size

CREATE OR REPLACE TABLE SKO_SKORAGHOP_LIVE_PROD.HOP.CORTEX_CHUNK AS
WITH text_chunks AS (
    SELECT
        relative_path,
        SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
            raw_text:content::STRING,  -- Extract the 'content' field from the JSON
            'markdown', -- Format of the input text: 'markdown' or 'none'
            2000,       -- Adjust chunk size
            100,        -- Adjust overlap size
            ['\n\n']    -- Adjust separators
        ) AS chunks
    FROM SKO_SKORAGHOP_LIVE_PROD.HOP.PARSED_TEXT
)
SELECT
    relative_path,
    c.value AS chunk  -- Extract each chunk of the parsed text
FROM text_chunks,
LATERAL FLATTEN(INPUT => chunks) c;

In [None]:
-- check the CORTEX_CHUNK to ensure the PDF was chunked as expected
-- Complete the missing code (???) to check the CORTEX_CHUNK to ensure the PDF was chunked as expected for the PDF called "RAGWithoutAugmentation.pdf"

SELECT *, SNOWFLAKE.CORTEX.COUNT_TOKENS('mistral-7b', CHUNK) as token_count
FROM SKO_SKORAGHOP_LIVE_PROD.HOP.CORTEX_CHUNK 
WHERE RELATIVE_PATH ILIKE 'RAGWithoutAugmentation.pdf';

## Step 2: Create Cortex Search Service
Next, we create a [Cortex Search Service](https://docs.snowflake.com/LIMITEDACCESS/cortex-search/cortex-search-overview#overview) that enables retrieval of relevant text chunks for any query. This service uses the CHUNK column from the chunked table as the indexed content.

Purpose: **Index and search chunked content to support the RAG pipeline.**

Command:
```sql
CREATE OR REPLACE CORTEX SEARCH SERVICE SKO.HOP.RAG_SEARCH_SERVICE ON SEARCH_COL WAREHOUSE = COMPUTE_WH TARGET_LAG = '1 day' AS SELECT  ...;
```

In [None]:
# Define image in a stage and read the file
image=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/CortexSearch.jpg", decompress=False).read() 
st.image(image, width=800)

In [None]:
-- Create a search service over your new chunked PDF table that has one searchable text column
-- Cortex Search Service documentation link is https://docs.snowflake.com/LIMITEDACCESS/cortex-search/cortex-search-overview#overview

-- Complete the missing code (???) to:
---- Create a search service called SKO_SKORAGHOP_LIVE_PROD.HOP.RAG_SEARCH_SERVICE to run over your new chunked pdf table
---- Queries to the service will search on a new column called SEARCH_COL 
---- Use an x-small warehouse
---- Use a target_lag of 365 days
---- SEARCH_COL is the name of the concatenation of RELATIVE_PATH and CHUNK from the CORTEX_CHUNK table

CREATE OR REPLACE CORTEX SEARCH SERVICE SKO_SKORAGHOP_LIVE_PROD.HOP.RAG_SEARCH_SERVICE
    ON SEARCH_COL
    WAREHOUSE = EBOTWICK
    TARGET_LAG = '365 days'
    AS SELECT
        RELATIVE_PATH,
        CHUNK,
        (RELATIVE_PATH || ' ' || CHUNK) AS SEARCH_COL
    FROM SKO_SKORAGHOP_LIVE_PROD.HOP.CORTEX_CHUNK;

## Step 3: Test Search Results with Experimental Configurations
We will now evaluate [Snowflake Cortex Experimental Knobs](https://docs.google.com/document/d/1HkHtDiY3CmzpSewCe_s9fpMNE5spOUvNSwr6CxFerqE/edit?usp=sharing) to fine-tune the retrieval service and analyze confidence scores and result rankings across configurations. These tests focus on boosting, recency, headers, and reranking to optimize search relevance.

**Configurations Tested:**
- **Boosted vs. Unboosted:** Compare the impact of keyword emphasis on rankings and scores.
- **Time-Based Decays:** Test how prioritizing recent documents affects relevance.
- **Header Boosts:** Evaluate the influence of structured headers (e.g., Markdown) on ranking.
- **Reranked vs. Non-Reranked:** Analyze trade-offs between query latency and search quality.

**Key Metrics:**
- **Confidence Scores:** Global relevance scores (0–3) for each result.
- **Result Rankings:** Position changes reveal the effectiveness of configurations.
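
As a rough illustration of how position changes can be read, the sketch below compares two ranked result lists and reports how far each document moved. The function name `rank_changes` and the sample document labels are hypothetical; the notebook itself compares results side by side in SQL.

```python
from typing import Dict, List, Optional


def rank_changes(baseline: List[str], variant: List[str]) -> Dict[str, Optional[int]]:
    """For each document in `variant`, report its movement versus `baseline`.

    Positive values mean the document moved up (closer to rank 1),
    negative values mean it moved down, and None means it did not
    appear in the baseline results at all.
    """
    baseline_position = {doc: index for index, doc in enumerate(baseline)}
    changes: Dict[str, Optional[int]] = {}
    for index, doc in enumerate(variant):
        if doc in baseline_position:
            changes[doc] = baseline_position[doc] - index
        else:
            changes[doc] = None
    return changes


# Hypothetical result lists: unboosted (baseline) versus boosted (variant)
unboosted_order = ["doc_a", "doc_b", "doc_c"]
boosted_order = ["doc_b", "doc_a", "doc_d"]
print(rank_changes(unboosted_order, boosted_order))
```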

By testing these configurations, we aim to enhance Cortex Search Service performance for specific use cases.
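
The request payloads for these configurations can also be assembled programmatically instead of being written as JSON string literals. The sketch below uses only the keys that appear in this notebook's SQL (`query`, `limit`, `experimental`, `softBoosts`, `headerBoost`, `reranker`, `returnConfidenceScores`); the helper `build_search_payload` is hypothetical, and the experimental options may change between Snowflake releases.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_payload(
    query: str,
    limit: int = 3,
    soft_boost_phrases: Optional[List[str]] = None,
    header_boost: Optional[Dict[str, Any]] = None,
    reranker: Optional[str] = None,
) -> str:
    """Return the JSON string passed as the second argument of SEARCH_PREVIEW."""
    experimental: Dict[str, Any] = {"returnConfidenceScores": True}
    if soft_boost_phrases:
        experimental["softBoosts"] = [{"phrase": p} for p in soft_boost_phrases]
    if header_boost:
        experimental["headerBoost"] = header_boost
    if reranker is not None:
        experimental["reranker"] = reranker
    return json.dumps({"query": query, "limit": limit, "experimental": experimental})


QUERY = "How can I augment my llm prompts with relevant context in snowpark?"

boosted = build_search_payload(QUERY, soft_boost_phrases=["Augment", "RAG"], reranker="none")
header_boosted = build_search_payload(
    QUERY, header_boost={"multiplier": 2, "skipStopWords": True}, reranker="none"
)
unboosted = build_search_payload(QUERY)

print(boosted)
```

Generating the payloads from one function keeps the query text and limit identical across configurations, so any difference in scores comes from the experimental settings alone.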

In [None]:
# Define image in a stage and read the file
image=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/CortexSearchEnhancements.jpg", decompress=False).read() 
st.image(image, width=800)

In [None]:
-- This query compares Cortex Search Service results across multiple experimental settings: 
---- boosted (using softBoosts), header boosted, and unboosted (default settings).

-- The results are presented side by side to analyze the impact of each configuration on confidence scores and document ranking for matching search columns.
-- This analysis helps evaluate how effectively soft boosts and header boosts improve search relevance.

-- Missing code (???) has been completed to:
---- Call the SKO_SKORAGHOP_LIVE_PROD.HOP.RAG_SEARCH_SERVICE to test experimental configurations.
---- Use the query: "How can I augment my LLM prompts with relevant context in Snowpark?"
---- For the boosted_results section, apply softBoosts using the phrases "Augment" and "RAG."
---- Set returnConfidenceScores to true for all configurations.

WITH boosted_results AS (
    SELECT DISTINCT
        VALUE:"SEARCH_COL"::STRING AS SearchColumn,
        VALUE:"@CONFIDENCE_SCORE"::STRING AS ConfidenceScore
    FROM (
        SELECT PARSE_JSON(
            SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
                'SKO_SKORAGHOP_LIVE_PROD.HOP.RAG_SEARCH_SERVICE',
                '{
                    "query": "How can I augment my llm prompts with relevant context in snowpark?",
                    "limit": 3,
                    "experimental": {
                        "softBoosts": [
                            { "phrase": "Augment" },
                            { "phrase": "RAG" }
                        ],
                        "reranker": "none",
                        "returnConfidenceScores": true
                    }
                }'
            )
        ) AS boosted_json
    ),
    LATERAL FLATTEN(input => boosted_json:"results")
),
header_boosted_results AS (
    SELECT DISTINCT
        VALUE:"SEARCH_COL"::STRING AS SearchColumn,
        VALUE:"@CONFIDENCE_SCORE"::STRING AS ConfidenceScore
    FROM (
        SELECT PARSE_JSON(
            SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
                'SKO_SKORAGHOP_LIVE_PROD.HOP.RAG_SEARCH_SERVICE',
                '{
                    "query": "How can I augment my llm prompts with relevant context in snowpark?",
                    "limit": 3,
                    "experimental": {
                        "headerBoost": {
                            "multiplier": 2,
                            "skipStopWords": true
                        },
                        "reranker": "none",
                        "returnConfidenceScores": true
                    }
                }'
            )
        ) AS header_boosted_json
    ),
    LATERAL FLATTEN(input => header_boosted_json:"results")
),
unboosted_results AS (
    SELECT DISTINCT
        VALUE:"SEARCH_COL"::STRING AS SearchColumn,
        VALUE:"@CONFIDENCE_SCORE"::STRING AS ConfidenceScore
    FROM (
        SELECT PARSE_JSON(
            SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
                'SKO_SKORAGHOP_LIVE_PROD.HOP.RAG_SEARCH_SERVICE',
                '{
                    "query": "How can I augment my llm prompts with relevant context in snowpark?",
                    "limit": 3,
                    "experimental": {
                        "returnConfidenceScores": true
                    }
                }'
            )
        ) AS unboosted_json
    ),
    LATERAL FLATTEN(input => unboosted_json:"results")
)
SELECT 
    COALESCE(b.SearchColumn, hb.SearchColumn, u.SearchColumn) AS SearchColumn,
    b.ConfidenceScore AS BoostedConfidenceScore,
    hb.ConfidenceScore AS HeaderBoostedConfidenceScore,
    u.ConfidenceScore AS UnboostedConfidenceScore
FROM
    boosted_results b
FULL OUTER JOIN header_boosted_results hb
    ON b.SearchColumn = hb.SearchColumn
FULL OUTER JOIN unboosted_results u
    ON COALESCE(b.SearchColumn, hb.SearchColumn) = u.SearchColumn
ORDER BY 
    CASE WHEN BoostedConfidenceScore IS NULL THEN 1 ELSE 0 END, 
    BoostedConfidenceScore DESC;

## Step 4: Pass Retrieved Content to LLMs
This step demonstrates how to pass retrieved contextual content to various LLMs using the Snowflake Cortex [`COMPLETE`](https://docs.snowflake.com/en/sql-reference/functions/complete-snowflake-cortex) function. The process includes:

- **Retrieving Contextual Information**: Context is fetched from the search service.
- **Generating Structured Prompts**: The retrieved context is injected into prompts for LLMs.
- **LLM Interaction**: Prompts are passed to models like `mistral-7b`, `mistral-large2`, and `Anthropic Claude 3.5` for response generation.
- **Comparative Analysis**: Model outputs are compared for quality, relevance, and coherence.

Example Query:
```sql
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'claude-3-5-sonnet',
    CONCAT('Your context: ', (SELECT LISTAGG(SEARCH_COL, ' ') FROM searchresults))
) AS RESPONSE;
```
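
The same retrieve, inject, and compare flow can be sketched in plain Python. This is an illustrative outline, not the notebook's implementation: `build_prompt`, `compare_models`, and `fake_complete` are hypothetical names, and `fake_complete` only stands in for a real completion function so the example runs anywhere. In a Snowflake notebook, a function with the same `(model, prompt)` call shape, such as `snowflake.cortex.complete`, could be passed instead.

```python
from typing import Callable, Dict, Sequence


def build_prompt(question: str, chunks: Sequence[str]) -> str:
    """Inject retrieved chunks into a prompt, one numbered block per chunk."""
    context = "\n".join(
        f"Context document {i}: {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        f"{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def compare_models(
    prompt: str,
    models: Sequence[str],
    complete_fn: Callable[[str, str], str],
) -> Dict[str, str]:
    """Send the same prompt to every model and collect the responses."""
    return {model: complete_fn(model, prompt) for model in models}


def fake_complete(model: str, prompt: str) -> str:
    """Offline stand-in for an LLM call (assumption: not a real model)."""
    return f"[{model}] received {len(prompt)} characters"


prompt = build_prompt(
    "What service retrieves context in Snowflake?",
    ["Cortex Search retrieves chunks.", "COMPLETE generates text."],
)
responses = compare_models(
    prompt, ["mistral-7b", "mistral-large2", "claude-3-5-sonnet"], fake_complete
)
for model, response in responses.items():
    print(model, "->", response)
```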

In [None]:
# Define image in a stage and read the file
image=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/CortexSearch_Complete.jpg", decompress=False).read() 
st.image(image, width=800)

**Queries to test the capabilities of the LLMs based on the PDF content:**
- What is the difference between semantic and lexical searches? Does a hybrid system exist?
- How can we optimize context retrieval in retrieval-augmented generation for an LLM system?
- Can I use SQL in Snowflake to retrieve relevant context for my GPT prompt?
- What service runs fuzzy-search to retrieve context in Snowflake?

In [None]:
# Query your Snowflake Cortex Search Service using the Snowpark Python API to retrieve and process search results.

# Complete the missing code (???) to:
## Specify your database 'SKO_SKORAGHOP_LIVE_PROD', your schema 'HOP', and your Cortex Search Service named 'RAG_SEARCH_SERVICE'
## Specify your SEARCH_COL as the column of interest

from snowflake.snowpark import Session
from snowflake.core import Root
root = Root(session)

transcript_search_service = (root
  .databases['SKO_SKORAGHOP_LIVE_PROD']
  .schemas['HOP']
  .cortex_search_services['RAG_SEARCH_SERVICE']
)

resp = transcript_search_service.search(
  query="""How does Snowflake simplify the deployment of retrieval-augmented generation (RAG) workflows?""",
  columns=['SEARCH_COL'],
  limit=3
)
results = resp.results

context_str = ""
for i, r in enumerate(results):
    context_str += f"Context document {i+1}: {r['SEARCH_COL']}\n****************\n"

print(context_str)
df = session.create_dataframe(resp.results)
df.create_or_replace_temp_view("searchresults")

In [None]:
# Define image in a stage and read the file
image=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/Claude35Sonnet.jpg", decompress=False).read() 

# Display the image
st.image(image, width=800)

In [None]:
-- Create a temporary table with the LLM responses

-- Complete the missing code (???) to:
---- Create a TEMPORARY table called LLMResults
---- Use mistral-7b for the MISTRAL_7B (first model)
---- Use mistral-large2 for the MISTRAL_LARGE2 (second model)
---- Use the Anthropic model (claude-3-5-sonnet) for the CLAUDE_35 (third model)

CREATE OR REPLACE TEMPORARY TABLE LLMResults AS
WITH PROMPT_TEXT AS (
  SELECT CONCAT(
    'You are a helpful AI assistant specialized in assisting Sales Engineers...',
    (SELECT LISTAGG(SEARCH_COL, ' ') FROM searchresults),
    ' Focus on key points and avoid unnecessary details.'
  ) AS P
)
SELECT 
   SNOWFLAKE.CORTEX.COMPLETE('mistral-7b', (SELECT P FROM PROMPT_TEXT)) AS MISTRAL_7B,
   SNOWFLAKE.CORTEX.COMPLETE('mistral-large2', (SELECT P FROM PROMPT_TEXT)) AS MISTRAL_LARGE2,
   SNOWFLAKE.CORTEX.COMPLETE('claude-3-5-sonnet', (SELECT P FROM PROMPT_TEXT)) AS CLAUDE_35;

In [None]:
df = session.sql("SELECT * FROM LLMResults").to_pandas()
st.subheader("Output for Mistral-7b LLM")
mistral_7b_value = df.iloc[0]["MISTRAL_7B"]
st.code(mistral_7b_value, language="text")

In [None]:
st.subheader("Output for Mistral-Large2 LLM")
mistral_large2_value = df.iloc[0]["MISTRAL_LARGE2"]
st.code(mistral_large2_value, language="text")

In [None]:
st.subheader("Output for Anthropic Claude 3.5 Sonnet LLM")
claude_35_value = df.iloc[0]["CLAUDE_35"]
st.code(claude_35_value, language="text")

## Step 5: Create RAG Application Classes

In this step, we will create two Python classes to build a Retrieval-Augmented Generation (RAG) pipeline:

1. **`CortexSearchRetriever`**:
   - This class interacts with the Cortex Search Service to retrieve relevant contextual information based on a user query.
   - It connects to the Cortex Search Service using Snowflake's `Root` object and performs a search with the specified query and result limit.
   - The retrieved context (a list of relevant chunks) will be used to generate prompts for LLMs.

2. **`RAGWithObservability`**:
   - This class integrates the retrieval functionality with a specified Large Language Model (LLM) to complete the RAG pipeline.
   - It uses the retriever to fetch context, creates a structured prompt by combining the context with the user query, and generates a response using the Snowflake Cortex `COMPLETE` function.
   - The class allows testing of different LLMs (e.g., `llama3.1-8b`, `mistral-7b`, `claude-3-5-sonnet`) by specifying the desired model.
   - In the cells below, you'll notice an **@instrument** decorator above each method of our `RAGWithObservability` class.
   - This tells TruLens which stages of our application we want to track so we can understand how data flows through it.
   - For example, if our query takes 10s to run, what portion of that 10s was spent on retrieval? On prompt augmentation? On completion generation?
     - This becomes increasingly important for complex GenAI applications (e.g. multi-agent apps).
   - After we pass prompts through our TruLens recorder, we will inspect these traces and spans!

### Workflow Summary:
1. The `CortexSearchRetriever` retrieves relevant context from the Cortex Search Service.
2. The `RAGWithObservability` uses this context to create prompts and generate responses with the specified LLM.

These two classes work together to streamline the RAG pipeline, enabling efficient retrieval and response generation for various use cases.
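
To make the idea of per-stage tracing concrete, here is a self-contained sketch of the same retrieve, augment, generate flow with a hand-rolled timing decorator. It is not TruLens and does not use its API: `traced`, `SPANS`, `StubRetriever`, and `TinyRAG` are hypothetical names, and the retriever and LLM are stubs so the example runs without a Snowflake session.

```python
import time
from functools import wraps
from typing import Callable, Dict, List

SPANS: List[Dict[str, object]] = []  # collected (stage, seconds) records


def traced(stage: str):
    """Record how long the decorated method takes under the given stage name."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                SPANS.append({"stage": stage, "seconds": time.perf_counter() - start})
        return wrapper
    return decorator


class StubRetriever:
    """Keyword-overlap retriever standing in for a search service."""
    def __init__(self, documents: List[str]):
        self._documents = documents

    def retrieve(self, query: str) -> List[str]:
        words = set(query.lower().split())
        return [d for d in self._documents if words & set(d.lower().split())]


class TinyRAG:
    def __init__(self, retriever: StubRetriever, llm: Callable[[str], str]):
        self.retriever = retriever
        self.llm = llm

    @traced("retrieval")
    def retrieve_context(self, query: str) -> List[str]:
        return self.retriever.retrieve(query)

    @traced("augmentation")
    def augment_prompt(self, query: str, contexts: List[str]) -> str:
        return f"Context: {' '.join(contexts)}\nQuestion: {query}\nAnswer:"

    @traced("generation")
    def generate_completion(self, prompt: str) -> str:
        return self.llm(prompt)

    @traced("record_root")
    def query_app(self, query: str) -> str:
        contexts = self.retrieve_context(query)
        prompt = self.augment_prompt(query, contexts)
        return self.generate_completion(prompt)


rag = TinyRAG(
    StubRetriever(["Cortex Search retrieves chunks", "Unrelated text"]),
    llm=lambda prompt: f"(stub answer based on {len(prompt)} characters of prompt)",
)
answer = rag.query_app("what is cortex search")
for span in SPANS:
    print(span["stage"], f"{span['seconds']:.6f}s")
```

The inner stages finish before the root span, so they are recorded first; comparing their durations with the root duration shows where the time of a single query is spent.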

In [None]:
# Define the retriever class for interacting with the Cortex Search Service

# Complete the missing code (???) to:
## Specify your database 'SKO_SKORAGHOP_LIVE_PROD', your schema 'HOP', and your Cortex Search Service named 'RAG_SEARCH_SERVICE'
## Specify your SEARCH_COL as the column of interest
## Initialize retriever with your CortexSearchRetriever class
## Use "What are some components of the Snowflake Cortex offering? How do they work?" for the test_query

from typing import List
from snowflake.snowpark import Session
from snowflake.core import Root

# CortexSearchRetriever
class CortexSearchRetriever:
    def __init__(self, session: Session, limit_to_retrieve: int = 4):
        self._session = session
        self._limit_to_retrieve = limit_to_retrieve
        

    def retrieve(self, query: str) -> List[str]:
        root = Root(self._session)  # use the session passed to the constructor, not the global
        cortex_search_service = (
            root
            .databases["SKO_SKORAGHOP_LIVE_PROD"]
            .schemas["HOP"]
            .cortex_search_services["RAG_SEARCH_SERVICE"]
        )
        resp = cortex_search_service.search(
            query=query,
            columns=["SEARCH_COL"],
            limit=self._limit_to_retrieve,
        )
        return [row["SEARCH_COL"] for row in resp.results] if resp.results else []

# Initialize the retriever
retriever = CortexSearchRetriever(session=session, limit_to_retrieve=3)
test_query = "What are some components of the Snowflake Cortex offering? How do they work?"
retrieved_context = retriever.retrieve(query=test_query)
print(retrieved_context)

In [None]:
import os
os.environ["TRULENS_OTEL_TRACING"] = "1"

In [None]:
# Create the RAGWithObservability class to structure the RAG pipeline
from snowflake.cortex import complete
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes


class RAGWithObservability():
    def __init__(self, llm_model, retriever):
        self.llm_model = llm_model
        self.retriever = retriever
        
    # Here we're using the @instrument decorator to trace the various stages of our RAG application
    @instrument (
        span_type=SpanAttributes.SpanType.RETRIEVAL, 
        attributes={
            SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
            SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
        })  
    def retrieve_context(self, query: str) -> List[str]:
        return self.retriever.retrieve(query)

    @instrument()
    def augment_prompt(self, query: str, contexts: list) -> str:
     
        prompt = f"""
        You are an expert assistant extracting information from context provided.
        Answer the question based on the context. Be concise and do not hallucinate.
        If you don't have the information, just say so.
        Context: {' '.join(contexts)}
        Question: {query}
        Answer:
        """
        return prompt


    @instrument (span_type=SpanAttributes.SpanType.GENERATION)    
    def generate_completion(self, query: str):
        
        df_response = complete(self.llm_model, query)
        return df_response


    @instrument (
        span_type=SpanAttributes.SpanType.RECORD_ROOT, 
        attributes={
            SpanAttributes.RECORD_ROOT.INPUT: "query",
            SpanAttributes.RECORD_ROOT.OUTPUT: "return",
        })
    def query_app(self, query: str) -> str:
        contexts = self.retrieve_context(query)
        prompt = self.augment_prompt(query, contexts)
        final_response = self.generate_completion(prompt)
        return final_response

In [None]:
import streamlit as st

# Create one RAG application instance per LLM
llama_rag = RAGWithObservability('llama3.1-8b', retriever)
mistral7b_rag = RAGWithObservability('mistral-7b', retriever)
claude_rag = RAGWithObservability('claude-3-5-sonnet', retriever)

#print Query
print(f"Query: {test_query}")

#Get and print responses
llama_response = llama_rag.query_app(test_query)
st.write(f"**Llama response** -  {llama_response} \n")

mistral_response = mistral7b_rag.query_app(test_query)
st.write(f"**Mistral-7b response** - {mistral_response} \n")

claude_response = claude_rag.query_app(test_query)
st.write(f"**Claude response** -  {claude_response} \n")

## Step 6: Observe and Evaluate LLM Performance with AI Observability (powered by TruLens)

**Adding Observability and Evaluation to our RAG application**

Here, we enhance the Retrieval-Augmented Generation (RAG) process by introducing observability. Observability ensures that LLM responses can be measured and evaluated based on various feedback metrics, providing insights into the model's performance and areas for improvement.

**How This Works**

We will use a feature called AI Observability to register our recently created applications in Snowflake. This will allow users to pass in prompts to these applications, and trace each step the application takes to Retrieve appropriate context, Augment a system prompt with additional context and Generate a complete answer for the given prompt. 

From there we will use LLM-as-a-Judge based evaluations to measure LLM performance based on **feedback metrics** including:
- **Answer Relevance** - Evaluates how directly the LLM's response addresses the user's prompt.
- **Context Relevance** - Assesses the relevance of the retrieved context to the user's prompt.
- **Groundedness**  - Measures how well the LLM's response is anchored in the retrieved context.
- **Coherence** - Evaluates how logically structured and easy to follow the LLM's response is.
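
For intuition about what an LLM-as-a-Judge evaluation does, the sketch below builds a judging prompt for one metric and normalizes the judge's numeric reply. Everything in it is an assumption made for illustration: the prompt wording, the 0 to 3 scale, and the names `JUDGE_TEMPLATE`, `parse_score`, and `judge_groundedness` are hypothetical and are not the prompts or API that AI Observability or TruLens use.

```python
import re
from typing import Callable, Optional

# Hypothetical judging prompt; real evaluators use their own wording.
JUDGE_TEMPLATE = (
    "You are grading an answer for groundedness.\n"
    "Context: {context}\n"
    "Answer: {answer}\n"
    "Reply with a single integer from 0 (unsupported) to {max_score} (fully supported)."
)


def parse_score(judge_output: str, max_score: int = 3) -> Optional[float]:
    """Extract the first integer from the judge's reply and scale it to [0, 1]."""
    match = re.search(r"\b(\d+)\b", judge_output)
    if match is None:
        return None  # the judge did not return a number
    score = int(match.group(1))
    if score > max_score:
        return None  # outside the requested scale
    return score / max_score


def judge_groundedness(
    context: str,
    answer: str,
    judge: Callable[[str], str],
    max_score: int = 3,
) -> Optional[float]:
    """Ask the judge model to grade `answer` against `context`."""
    prompt = JUDGE_TEMPLATE.format(context=context, answer=answer, max_score=max_score)
    return parse_score(judge(prompt), max_score=max_score)


# Offline stand-in for a judge model so the sketch runs anywhere
stub_judge = lambda prompt: "Score: 2"
print(judge_groundedness("Cortex Search retrieves chunks.", "It retrieves chunks.", stub_judge))
```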

In [None]:
# Define image in a stage and read the file
image=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/AIObservability.jpg", decompress=False).read() 

# Display the image
st.image(image, width=800)

In [None]:
# from trulens.core import TruSession
from trulens.apps.app import TruApp
from trulens.connectors.snowflake import SnowflakeConnector

tru_snowflake_connector = SnowflakeConnector(snowpark_session=session)

app_name = "test_sko_app_update"
version_num = 'v0'

tru_rag_mistral = TruApp(
    mistral7b_rag,
    app_name=app_name,
    app_version=f"mistral_test_{version_num}",
    connector=tru_snowflake_connector
)

tru_rag_llama = TruApp(
    llama_rag,
    app_name=app_name,
    app_version=f"llama_test_{version_num}",
    connector=tru_snowflake_connector
)

tru_rag_claude = TruApp(
    claude_rag,
    app_name=app_name,
    app_version=f"claude_test_{version_num}",
    connector=tru_snowflake_connector
)

In [None]:
import pandas as pd

prompts = [
    "What are some metrics to measure the quality of a retrieval system?",
    "Can I have a back-and-forth conversation with Cortex?",
    "Does Snowflake support text-to-sql? What services would support this?",
    "What year was the war of 1812?",
    "Tell me a story about Snowflake Cortex"
]


batch_data = pd.DataFrame({'QUERY': prompts})
batch_data

In [None]:
from trulens.core.run import Run
from trulens.core.run import RunConfig

mistral_run_config = RunConfig(
    run_name=f"mistral_exp_{version_num}",
    description="questions about Snowflake AI capabilities",
    dataset_name="SNOW_RAG_DF1",
    source_type="DATAFRAME",
    label="MISTRAL",
    llm_judge_name = "llama3.1-70b",
    dataset_spec={
        "RECORD_ROOT.INPUT": "QUERY",
    },
)



llama_run_config = RunConfig(
    run_name=f"llama_exp_{version_num}",
    description="questions about Snowflake AI capabilities",
    dataset_name="SNOW_RAG_DF1",
    source_type="DATAFRAME",
    label="LLAMA",
    dataset_spec={
        "RECORD_ROOT.INPUT": "QUERY",
    },
    
)


claude_run_config = RunConfig(
    run_name=f"claude_exp_{version_num}",
    description="questions about Snowflake AI capabilities",
    dataset_name="SNOW_RAG_DF1",
    source_type="DATAFRAME",
    label="CLAUDE",
    dataset_spec={
        "RECORD_ROOT.INPUT": "QUERY",
    },
    
)

In [None]:
mistral_run = tru_rag_mistral.add_run(run_config=mistral_run_config)

llama_run = tru_rag_llama.add_run(run_config=llama_run_config)

claude_run = tru_rag_claude.add_run(run_config=claude_run_config)

In [None]:
mistral_run.start(input_df=batch_data)
print("Finished mistral run")

In [None]:
llama_run.start(input_df=batch_data)
print("Finished Llama run")

In [None]:
claude_run.start(input_df=batch_data)
print("Finished Claude run")

In [None]:
print(f"Mistral: {mistral_run.get_status()}")
print(f"Llama: {llama_run.get_status()}")
print(f"Claude: {claude_run.get_status()}")

In [None]:
#The following code kicks off LLM-as-a-Judge evals for several metrics

mistral_run.compute_metrics([
    "coherence",
    "answer_relevance",
    "context_relevance",
    "groundedness",
])

In [None]:
#The following code kicks off LLM-as-a-Judge evals for several metrics

llama_run.compute_metrics([
    "coherence",
    "answer_relevance",
    "context_relevance",
    "groundedness",
])

In [None]:
#The following code kicks off LLM-as-a-Judge evals for several metrics

claude_run.compute_metrics([
    "coherence",
    "answer_relevance",
    "context_relevance",
    "groundedness",
])

In [None]:
print(f"Mistral: {mistral_run.get_status()}")
print(f"Llama: {llama_run.get_status()}")
print(f"Claude: {claude_run.get_status()}")

In [None]:
import streamlit as st

org_name = session.sql('SELECT CURRENT_ORGANIZATION_NAME()').collect()[0][0]
account_name = session.sql('SELECT CURRENT_ACCOUNT_NAME()').collect()[0][0]
db_name = session.sql('SELECT CURRENT_DATABASE()').collect()[0][0]
schema_name = session.sql('SELECT CURRENT_SCHEMA()').collect()[0][0]

st.write(f'https://app.snowflake.com/{org_name}/{account_name}/#/ai-evaluations/databases/{db_name}/schemas/{schema_name}/applications/{app_name.upper()}')

In [None]:
# Define image in a stage and read the file
image=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/AIObsApp.jpg", decompress=False).read() 

# Display the image
st.image(image, width=800)

In [None]:
image1=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/Anthropic.jpg", decompress=False).read() 
st.image(image1, width=800)
image2=session.file.get_stream("@SKO_SKORAGHOP_LIVE_PROD.HOP.RAG/Summary2.jpg", decompress=False).read() 
st.image(image2, width=800)