## RAG with Snowflake Cortex
This is a very simple demo that highlights the Cortex capabilities for building a RAG application with a Streamlit front end.
This demo takes a PDF file, chunks it down, and uses it for RAG.
It requires PyPDF2 to read the PDF before chunking. I had to install pypdf2-2.10.5 into my local environment.
The [notebook](https://github.com/sfc-gh-tchristian/snowflake-rag) comes from Tom Christian; also check out the full [article](https://medium.com/@thechosentom/rag-made-simple-with-snowflake-cortex-74d1df5143fd).

Another RAG demo, which uses 10-Ks, comes from Jeremy; check out the [article](https://medium.com/@jeremyjgriffith/retrieval-augmented-generation-rag-application-using-snowflake-cortex-and-streamlit-9cb261e81c2e) and the [GitHub repo](https://github.com/jeremyjgriffith/cortex-rag-application).

If you are looking to do RAG in Snowflake, consider our [Cortex Search](https://docs.snowflake.com/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) instead of building it from parts; it provides managed search for RAG.

In [1]:
# Import python packages & establish session
import pandas as pd
from PyPDF2 import PdfFileReader
from snowflake.snowpark.files import SnowflakeFile
from io import BytesIO
from snowflake.snowpark.types import StringType, StructField, StructType
from langchain.text_splitter import RecursiveCharacterTextSplitter
import json

In [2]:
from snowflake.snowpark.session import Session
from snowflake import connector
from snowflake.ml.utils import connection_params

In [3]:
with open('../creds.json') as f:
    data = json.load(f)
    USERNAME = data['user']
    PASSWORD = data['password']
    SF_ACCOUNT = data['account']
    SF_WH = data['warehouse']

CONNECTION_PARAMETERS = {
   "account": SF_ACCOUNT,
   "user": USERNAME,
   "password": PASSWORD,
   "warehouse": SF_WH,  # loaded from creds.json above; queries need an active warehouse
}

session = Session.builder.configs(CONNECTION_PARAMETERS).create()

#from snowflake.snowpark.context import get_active_session
#session = get_active_session()


RAG Made Easy w/ Snowflake Cortex
========

Creating an end-to-end Retrieval Augmented Generation process (or RAG) directly in Snowflake.
1) Extract full text from PDF files using Snowpark.
2) Chunk those documents using Langchain in Snowpark.
3) Use Cortex to create embeddings of those chunks.
4) Use Vector Similarity to show the most similar chunk when prompting an LLM.

Create a stage for the PDFs. You will need to replace the stage name I used with your own.

In [5]:
#Create a Snowpark based function to extract text from PDFs

def readpdf(file_path):
    # Open the staged file through Snowflake and buffer it in memory for PyPDF2
    with SnowflakeFile.open(file_path, 'rb') as file:
        f = BytesIO(file.readall())
        pdf_reader = PdfFileReader(f)
        whole_text = ""
        for page in pdf_reader.pages:
            # Fall back to an empty string if a page yields no text
            whole_text += page.extract_text() or ""
    return whole_text

In [6]:
#Register the UDF. 
#Optional : Convert the cell to markdown to prevent rerunning later.
session.udf.register(
    func = readpdf
  , return_type = StringType()
  , input_types = [StringType()]
  , is_permanent = True
  , name = 'SNOWPARK_PDF'
  , replace = True
  , packages=['snowflake-snowpark-python','pypdf2']
  , stage_location = 'RAJIV.PUBLIC.RAJ_UDFS'  ###set this to your stage
)

<snowflake.snowpark.udf.UserDefinedFunction at 0x169794610>

In [42]:
##Make sure your stage has the directory table enabled - one click in the UI
query = """CREATE OR REPLACE TABLE RAW_TEXT AS
SELECT
    relative_path
    , file_url
    , snowpark_pdf(build_scoped_file_url(@RAJ_UDFS, relative_path)) as raw_text
from directory(@RAJ_UDFS);"""

In [43]:
session.sql(query).collect()

[Row(status='Table RAW_TEXT successfully created.')]

In [None]:
query = """ --Optional : This will fail due to tokens exceeding limit, which means we need to chunk!
SELECT
SNOWFLAKE.ML.COMPLETE('llama2-7b-chat',CONCAT('summarise the following text',raw_text)) 
FROM
RAW_TEXT
LIMIT 1;"""

session.sql(query).collect()

##This should fail if you are using a large document
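Before sending a whole document to the model, a rough size check can show whether it is likely to exceed the limit. The sketch below is only a heuristic: the 4 characters per token ratio is a common rule of thumb for English text, the 4,096-token window is the Llama 2 context length, and `estimate_tokens`, `fits_context`, and the 512-token answer reserve are hypothetical names and values chosen for illustration. The limits Cortex actually enforces may differ.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; assumes about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(text: str, context_window: int = 4096, reserved_for_answer: int = 512) -> bool:
    """Check whether the text likely fits, leaving room for the model's answer."""
    return estimate_tokens(text) <= context_window - reserved_for_answer


short_doc = "a" * 4000    # about 1,000 estimated tokens
long_doc = "a" * 40000    # about 10,000 estimated tokens

print(fits_context(short_doc), fits_context(long_doc))
```

A document that fails this check is a candidate for chunking rather than being passed in one call.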

A note on chunking
-----
Chunking is the process of splitting a large body of text into smaller 'chunks' whilst attempting to keep as much relevant information as possible. Make the chunks too small and you run the risk of removing key information that the model requires to answer the question. Make them too large and it may be harder to retrieve the correct body of text from the vector search, or you may spend tokens excessively.
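To make the size and overlap trade-off concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplified stand-in, not the LangChain splitter used later in this notebook (which also splits on separators); `chunk_text` and its defaults are hypothetical and only mirror the `chunk_size` and `chunk_overlap` values used below.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 50) -> list:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(repr(piece))
```

Each chunk starts with the last 4 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is long enough.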

There are many chunking strategies. For example: pass only the most relevant chunk, pass the top n relevant chunks, or pass the most relevant chunk plus the chunk on either side of it. Play around and see what works for your use case!
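The "most relevant chunk plus its neighbours" strategy can be sketched in a few lines. This assumes the chunks are kept in document order and that the index of the best match is already known; `with_neighbors` and the sample chunks are hypothetical and not part of the notebook.

```python
def with_neighbors(chunks: list, best_index: int, window: int = 1) -> str:
    """Join the best chunk with `window` chunks on either side, clipped at the ends."""
    start = max(0, best_index - window)
    end = min(len(chunks), best_index + window + 1)
    return "\n".join(chunks[start:end])


chunks = ["intro", "revenue grew", "Cortex launch", "outlook", "Q&A"]
context = with_neighbors(chunks, best_index=2)
print(context)
```

A wider `window` gives the model more surrounding context at the cost of more tokens per call.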


In [45]:
#A class for chunking text and returning a table via UDTF
class text_chunker:

    def process(self,text):        
        text_raw=[]
        text_raw.append(text) 
        
        text_splitter = RecursiveCharacterTextSplitter(
            separators = ["\n"], # Define an appropriate separator. New line is good typically!
            chunk_size = 1000, #Adjust this as you see fit
            chunk_overlap  = 50, #This lets chunks overlap. Useful for keeping chunks contextual
            length_function = len,
            add_start_index = True #Optional but useful if you'd like to feed the chunk before/after
        )
    
        chunks = text_splitter.create_documents(text_raw)
        df = pd.DataFrame(chunks, columns=['chunks','meta'])
        
        yield from df.itertuples(index=False, name=None)


In [46]:
#Register the UDTF - set the stage location

schema = StructType([
    StructField("chunk", StringType()),
    StructField("meta", StringType()),
])

session.udtf.register( 
    handler = text_chunker,
    output_schema= schema, 
    input_types = [StringType()] , 
    is_permanent = True , 
    name = 'CHUNK_TEXT' , 
    replace = True , 
    packages=['pandas','langchain'], stage_location = 'RAJIV.PUBLIC.RAJ_UDFS' )

<snowflake.snowpark.udtf.UserDefinedTableFunction at 0x149926d90>

In [47]:
query = """

--Create the chunked version of the table
CREATE OR REPLACE TABLE CHUNK_TEXT AS
SELECT
        relative_path,
        func.*
    FROM raw_text AS raw,
         TABLE(chunk_text(raw_text)) as func;

"""

session.sql(query).collect()

[Row(status='Table CHUNK_TEXT successfully created.')]

In [48]:
query = """
--Convert your chunks to embeddings
CREATE OR REPLACE TABLE VECTOR_STORE AS
SELECT
RELATIVE_PATH as EPISODE_NAME,
CHUNK AS CHUNK,
snowflake.ml.embed_text('e5-base-v2', chunk) as chunk_embedding
FROM CHUNK_TEXT;
"""

session.sql(query).collect()

[Row(status='Table VECTOR_STORE successfully created.')]

In [49]:
query = """
--Vector distance allows us to find the most similar chunk to a question
SELECT EPISODE_NAME, CHUNK from RAJIV.PUBLIC.VECTOR_STORE 
            ORDER BY VECTOR_L2_DISTANCE(
            snowflake.ml.embed_text('e5-base-v2', 
            'What is Snowflake doing for AI/ML?'
            ), CHUNK_EMBEDDING
            ) limit 1
        ;
"""

session.sql(query).collect()

[Row(EPISODE_NAME='TRANSCRIPT_-Snowflake-Inc-SNOW-US-Q3-2024-Earnings-Call-29-November-2023-5_00-PM-ET.pdf', CHUNK='(\'page_content\', "Meanwhile, Snowflake has announced and showcased a plethora of new technologies that let customers mobilize \\nAI. We\'ve introduced Snowflake Cortex to leverage AI and machine learning on Snowflake. Cortex is a manage d \\nservice for inferencing large language models. This opens up direct access to models and specialized operations \\nlike translation, sentiment and vector functions. Business analysts and data engineers can now use AI \\nfunctionality without the fractured high ly technical challenges of the AI landscape.  \\n \\nLast summer, we introduced Snowpark Container Services, which also serves as the second pillar of our AI \\nenablement strategy. Developers can access any language, any library and flexible hardware inside the \\ngovernance boundary of Snowflake. More than 70 customers are already using Container Services in preview, \\nwith
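The `ORDER BY VECTOR_L2_DISTANCE(...) LIMIT 1` pattern above amounts to sorting rows by Euclidean distance to the question embedding and keeping the closest one. The sketch below reproduces that idea in plain Python. The 3-dimensional vectors and chunk labels are made up for illustration (real `e5-base-v2` embeddings have 768 dimensions), and `l2_distance` and `most_similar` are hypothetical helper names.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(question_vec, store, limit=1):
    """Return the `limit` chunks whose embeddings are closest to the question."""
    ranked = sorted(store, key=lambda row: l2_distance(question_vec, row[1]))
    return [chunk for chunk, _ in ranked[:limit]]


# Toy (chunk, embedding) rows standing in for the VECTOR_STORE table
store = [
    ("chunk about revenue", [1.0, 0.0, 0.0]),
    ("chunk about Cortex", [0.0, 1.0, 0.0]),
    ("chunk about hiring", [0.0, 0.0, 1.0]),
]
question_vec = [0.1, 0.9, 0.0]

top = most_similar(question_vec, store)
print(top)
```

Raising `limit` corresponds to changing `LIMIT 1` in the SQL to return the top n chunks.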

In [53]:
query = """
--Pass the chunk we need along with the prompt to get a better structured answer from the LLM
SELECT snowflake.ml.complete(
    'llama2-7b-chat', 
     CONCAT( 
        'Answer the question based on the context. Be concise.','Context: ',
        (
            SELECT chunk FROM vector_Store 
            ORDER BY vector_l2_distance(
            snowflake.ml.embed_text('e5-base-v2', 
            'What is Snowflake doing for AI/ML?'
            ), chunk_embedding
            ) LIMIT 1
        ),
        'Question: What is Snowflake doing for AI/ML?',
        'Answer: '
    )
) as response;
"""

response = session.sql(query).collect()

In [54]:
response

[Row(RESPONSE=' Based on the context, Snowflake is providing several new technologies to enable AI and machine learning (ML) for its customers. These include:\n\n1. Snowflake Cortex: A managed service for inferencing large language models, providing direct access to models and specialized operations like translation, sentiment, and vector functions.\n2. Snowpark Container Services: A platform that allows developers to access any language, any library, and flexible hardware within the governance boundary of Snowflake, making it easier for business analysts and data engineers to use AI functionality without technical challenges.')]
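The `CONCAT` call in the query assembles the prompt from four parts: an instruction, the retrieved context, the question, and an answer cue. A small Python sketch of the same structure is shown below. `build_prompt` is a hypothetical helper, and it joins the parts with newlines, which is a choice made here for readability; the SQL above concatenates them with no separator.

```python
def build_prompt(context: str, question: str) -> str:
    """Assemble a RAG prompt: instruction, retrieved context, question, answer cue."""
    parts = [
        "Answer the question based on the context. Be concise.",
        f"Context: {context}",
        f"Question: {question}",
        "Answer: ",
    ]
    return "\n".join(parts)


prompt_text = build_prompt(
    context="Cortex is a managed service for inferencing large language models.",
    question="What is Snowflake doing for AI/ML?",
)
print(prompt_text)
```

Keeping the prompt in one helper makes it easier to experiment with wording or with passing several chunks as context.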

Streamlit app to use inside Snowflake

In [None]:
## Streamlit Code

import streamlit as st # Import python packages
from snowflake.snowpark.context import get_active_session
session = get_active_session() # Get the current credentials

st.title("Ask Your Data Anything :snowflake:")
st.write("""Built using end-to-end RAG in Snowflake with Cortex functions.""")

model = st.selectbox('Select your model:',('llama2-70b-chat','mistral-7b'))

prompt = st.text_input("Enter prompt", placeholder="What makes time perceived to be slower?", label_visibility="collapsed")

# Escape backslashes and single quotes so user input cannot break out of the SQL string literal
safe_prompt = prompt.replace("\\", "\\\\").replace("'", "''")

quest_q = f'''
select snowflake.ml.complete(
    '{model}', 
    concat( 
        'Answer the question based on the context. Be concise.','Context: ',
        (
            select chunk from LLM_DEMO.PODCASTS.VECTOR_STORE 
            order by vector_l2_distance(
            snowflake.ml.embed_text('e5-base-v2', 
            '{safe_prompt}'
            ), chunk_embedding
            ) limit 1
        ),
        'Question: ', 
        '{safe_prompt}',
        'Answer: '
    )
) as response;
'''

if prompt:
    df_query = session.sql(quest_q).to_pandas()
    st.write(df_query['RESPONSE'][0])