## Task-specific [Snowflake Cortex](https://www.snowflake.com/en/data-cloud/cortex/) LLM SQL Functions & [Cortex Search](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview#overview) Demo
## Find the documentation for all task-specific LLM SQL functions [here](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#task-specific-functions)
## You can learn and test the following functions in this Snowflake Notebook.  
- ### [CLASSIFY_TEXT](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#classify-text)
- ### [COMPLETE](https://docs.snowflake.com/en/sql-reference/functions/complete-snowflake-cortex)
- ### [EXTRACT_ANSWER](https://docs.snowflake.com/en/sql-reference/functions/extract_answer-snowflake-cortex)
- ### [SENTIMENT](https://docs.snowflake.com/en/sql-reference/functions/sentiment-snowflake-cortex)
- ### [SUMMARIZE](https://docs.snowflake.com/en/sql-reference/functions/summarize-snowflake-cortex)
- ### [CORTEX SEARCH SERVICE](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/overview-tutorials)

# Learning Goals
- #### Understand the end-to-end workflow to process and gain insights from unstructured text data. 
- #### We will analyze press releases from various companies using Snowflake Cortex LLM functions and Cortex Search.
- #### Learn how to use the Snowflake Cortex LLM SQL functions.
- #### Quickly analyze data by bringing AI to your data rather than moving data to an external AI service. 
- #### [Snowflake Cortex](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions) gives you instant access to industry-leading large language models (LLMs) trained by researchers at companies like Anthropic, Mistral, Reka, Meta, and Google, including Snowflake Arctic, an open enterprise-grade model developed by Snowflake.
- #### These LLMs are fully hosted and managed by Snowflake, so using them requires no setup. 
- #### Your data stays within Snowflake, giving you the performance, scalability, and governance you expect.
- #### Get answers to questions about companies by searching press releases using Cortex Search.

# Text Classification
## [CLASSIFY_TEXT](https://docs.snowflake.com/en/sql-reference/functions/classify_text-snowflake-cortex): Given a piece of text, classifies it into one of the categories that you define.

## We will create a [transient table](https://docs.snowflake.com/en/user-guide/tables-temp-transient#transient-tables) in Snowflake with some data we wish to classify under the categories we define. 

In [None]:
CREATE OR REPLACE TRANSIENT TABLE text_classification_table AS
SELECT 'France' AS input, ['North America', 'Europe', 'Asia'] AS classes
UNION ALL
SELECT 'Singapore', ['North America', 'Europe', 'Asia']
UNION ALL
SELECT 'one day I will see the world', ['travel', 'cooking', 'dancing']
UNION ALL
SELECT 'my lobster bisque is second to none', ['travel', 'cooking', 'dancing'];

In [None]:
SELECT input,
       classes,
       SNOWFLAKE.CORTEX.CLASSIFY_TEXT(input, classes)['label'] as classification
FROM text_classification_table;

- All other examples in this Notebook use the text from the press releases. 

# Company Press Release Analysis Using [Snowflake Cortex LLM SQL](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions) Functions. 

### To test the other Snowflake LLM SQL functions, we will use press releases of publicly-listed U.S. companies.
- #### We will load the press release PDF files into a stage in Snowflake.  
- #### We will parse each press release and load the text contents into a table. 
- #### Since we are dealing with long, free-form text that may run to over a dozen pages, we will chunk the text using an LLM SQL function in Snowflake. 
- #### We will apply the LLM SQL functions (SUMMARIZE, EXTRACT_ANSWER, SENTIMENT) to the full or chunked text.  

# Why do we need to chunk text for use by an LLM?
### The Merriam-Webster dictionary defines [CHUNK](https://www.merriam-webster.com/dictionary/chunk) as follows:
- #### "a large noteworthy quantity or part" 
- #### You can use the Snowflake Cortex COMPLETE function to understand why LLMs work better with chunked text. 
- #### I used Anthropic's Claude 3.5 Sonnet LLM with the COMPLETE function to answer this question. 
- #### Snowflake has a [partnership with Anthropic](https://www.snowflake.com/en/news/press-releases/snowflake-and-anthropic-team-up-to-bring-claude-models-directly-to-the-ai-data-cloud/) to bring their models into Snowflake. 
- #### Customers have access to a wide range of LLMs within their Snowflake environment. 
- #### These LLMs are [fully hosted and managed by Snowflake](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions), so using them requires no setup. Your data stays within Snowflake, giving you the performance, scalability, and governance you expect.
- #### You can find the complete list of LLMs available in Snowflake [here](https://docs.snowflake.com/en/sql-reference/functions/complete-snowflake-cortex).
- #### The COMPLETE function lets you choose which LLM to use. 

In [None]:
-- Using claude-3-5-sonnet LLM with Snowflake Cortex COMPLETE.
SELECT SNOWFLAKE.CORTEX.COMPLETE('claude-3-5-sonnet','Why is text chunking necessary for LLM?');

## Why is text chunking necessary for LLM?
### Here's the answer provided by the Snowflake Cortex COMPLETE  function using the Claude-3.5 Sonnet LLM:

Text chunking is necessary for LLMs for several important reasons:

1. #### Context Window Limitations:
- LLMs have a maximum context window size (token limit)
- Long documents need to be broken into smaller pieces to fit within these limits

2. #### Memory Management:
- Processing large texts all at once requires significant computational resources
- Chunking helps manage memory usage more efficiently
- Allows for parallel processing of different chunks

3. #### Information Retrieval:
- Enables better semantic search and retrieval
- Easier to find relevant sections of text
- Helps in building knowledge bases and RAG (Retrieval Augmented Generation) systems

4. #### Better Processing:
- Smaller chunks are easier to process and understand
- Reduces noise and improves focus on relevant information
- Helps maintain coherence in responses

5. #### Cost Efficiency:
- Many LLM APIs charge by token count
- Chunking helps optimize token usage
- Allows processing only relevant portions of text

6. #### Quality of Responses:
- More focused context leads to more accurate responses
- Reduces hallucinations and irrelevant information
- Improves overall output quality

#### Best practices for chunking include:
- Maintaining semantic coherence
- Overlapping chunks to preserve context
- Using appropriate chunk sizes for the task
- Considering document structure

In [None]:
CREATE STAGE COMPANY_EARNINGS_PRESS_RELEASES_STG 
	DIRECTORY = ( ENABLE = true ) 
	ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' );

## Instructions to upload the Earnings Press Release PDF files into the stage.

- #### You can download the press release PDF files, stored as a zip archive, from [GitHub](https://github.com/rrprasan/Finance/tree/main/Snowflake/Notebooks/Company_Financials/Press_Release_Analysis).
- #### Unzip the files on your local drive. 
- #### Copy the PDF files into the Snowflake internal stage - COMPANY_EARNINGS_PRESS_RELEASES_STG.

- #### Follow [these steps](https://docs.snowflake.com/en/user-guide/data-load-local-file-system-stage-ui#upload-files-onto-a-named-internal-stage) in Snowsight to upload the files into the stage.  
#### 1. Sign in to Snowsight.
#### 2. Select Data » Add Data.
#### 3. On the Add Data page, select Load files into a Stage.
#### 4. In the Upload Your Files dialog that appears, select the files that you want to upload. You can upload multiple files at the same time.
#### 5. Select the database schema in which you created the stage, then select the stage.
#### 6. Optionally, select or create a path where you want to save your files within the stage.
#### 7. Select Upload.

#### The following press release PDF files are in the zipped archive:

- ADBE_Q4_FY_2024_Earnings_Press_Release.pdf
- AMZN_Q3_FY_2024_Earnings_Press_Release.pdf
- CRM_Q3_FY_2025_Earnings_Press_Release.pdf
- CSCO_Q1_FY_2025_Earnings_Press_Release.pdf
- GOOGL_Q3_FY_2024_Earnings_Press_Release.pdf
- MSFT_Q1_FY_2025_Earnings_Press_Release.pdf
- NOW_Q3_FY_2024_Earnings_Press_Release.pdf
- ORCL_Q2_FY_2025_Earnings_Press_Release.pdf
- PANW_Q1_FY_2025_Earnings_Press_Release.pdf
- PG_Q1_FY_2025_Earnings_Press_Release.pdf
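
The file names above follow a regular pattern (ticker, quarter, fiscal year), so the metadata for each press release could be derived from the name rather than typed by hand. The sketch below is one possible approach; the helper `parse_press_release_filename` and the dictionary keys are hypothetical and are not part of the Notebook.

```python
import re

# Pattern inferred from the file names listed above, e.g. AMZN_Q3_FY_2024_Earnings_Press_Release.pdf
FILENAME_PATTERN = re.compile(
    r"^(?P<ticker>[A-Z]+)_(?P<quarter>Q[1-4])_FY_(?P<year>\d{4})_Earnings_Press_Release\.pdf$"
)


def parse_press_release_filename(filename: str) -> dict:
    """Extract the ticker symbol and fiscal period from a press release file name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename}")
    return {
        "ticker_symbol": match["ticker"],
        "press_release_type": "EARNINGS",
        # Same "Q1 FY 2025" layout the Notebook uses for fiscal periods.
        "fiscal_period": f"{match['quarter']} FY {match['year']}",
    }


info = parse_press_release_filename("AMZN_Q3_FY_2024_Earnings_Press_Release.pdf")
print(info)
```

The company name and the press release date are not encoded in the file name, so they would still need to be supplied separately.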


In [None]:
LIST @COMPANY_EARNINGS_PRESS_RELEASES_STG;

In [None]:
--****************
-- CREATE A SEQUENCE FOR COMPANY_PRESS_RELEASES_TBL. 
-- We will use the sequence as an ID in the COMPANY_PRESS_RELEASES_TBL.
-- This ID is not essential. 
-- You can remove it from the table definition below. 
--****************
CREATE OR REPLACE SEQUENCE PRESS_RELEASE_ID_SEQ;
SELECT PRESS_RELEASE_ID_SEQ.NEXTVAL;

In [None]:
CREATE OR REPLACE TRANSIENT TABLE COMPANY_PRESS_RELEASES_TBL (
    PRESS_RELEASE_ID NUMBER DEFAULT PRESS_RELEASE_ID_SEQ.NEXTVAL,
    COMPANY_NAME VARCHAR,
    TICKER_SYMBOL VARCHAR(10),
    PRESS_RELEASE_TYPE VARCHAR COMMENT 'PRESS RELEASE TYPE EARNINGS, NEWS',
    PRESS_RELEASE_FISCAL_PERIOD VARCHAR COMMENT 'SHOULD FOLLOW THIS FORMAT: Q1 FY 2025 OR FOR ANNUAL EARNINGS FULL-YEAR FY 2025',
    PRESS_RELEASE_DATE DATE,
    PRESS_RELEASE_CONTENT VARCHAR
);

# PARSE DOCUMENT
#### [PARSE_DOCUMENT](https://docs.snowflake.com/en/user-guide/snowflake-cortex/parse-document)

The PARSE_DOCUMENT function is a Cortex AI task-specific function that gives you the ability to extract text or layout from documents stored in an internal or external stage. PARSE_DOCUMENT combines powerful Optical Character Recognition (OCR) capabilities with machine learning models to identify text content, information stored in tables, and the structural elements of PDF documents. You can use the PARSE_DOCUMENT function to extract text and document layout to build information retrieval systems on large archives of business documents, and to load the extracted information into structured Snowflake tables for use by your applications.

### Parse Amazon's Press Release PDF Stored in the Stage. 

In [None]:
--
-- Parse Amazon's Earnings Press Release 
-- This is a test of the PARSE_DOCUMENT Function. 
-- 
SELECT TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','AMZN_Q3_FY_2024_Earnings_Press_Release.pdf', {'mode': 'LAYOUT'}):content);

### Parse each PDF File and load the contents into the PRESS_RELEASE_CONTENT column.  
### We have the other columns with the metadata information about each PRESS_RELEASE:
- COMPANY_NAME
- TICKER_SYMBOL
- PRESS_RELEASE_TYPE
- PRESS_RELEASE_FISCAL_PERIOD
- PRESS_RELEASE_DATE
### Process Amazon's Press Release. 

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'AMAZON.COM, INC.', 
        'AMZN', 
        'EARNINGS',
        'Q3 FY 2024', 
        '2024-10-31',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','AMZN_Q3_FY_2024_Earnings_Press_Release.pdf', {'mode': 'LAYOUT'}):content);

### Process Adobe's Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'ADOBE INC.', 
        'ADBE', 
        'EARNINGS',
        'Q4 FY 2024', 
        '2024-12-11',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','ADBE_Q4_FY_2024_Earnings_Press_Release.pdf', {'mode': 'LAYOUT'}):content);

### Process Salesforce's Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'Salesforce, Inc.', 
        'CRM', 
        'EARNINGS',
        'Q3 FY 2025', 
        '2024-12-03',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','CRM_Q3_FY_2025_Earnings_Press_Release.pdf', {'mode': 'LAYOUT'}):content);

### Process Cisco Systems' Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'CISCO SYSTEMS, INC.', 
        'CSCO', 
        'EARNINGS',
        'Q1 FY 2025', 
        '2024-11-13',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','CSCO_Q1_FY_2025_Earnings_Press_Release.pdf', {'mode': 'LAYOUT'}):content);

### Process Google's (Alphabet) Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'Alphabet, Inc.', 
        'GOOGL', 
        'EARNINGS',
        'Q3 FY 2024', 
        '2024-10-29',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','GOOGL_Q3_FY_2024_Earnings_Press_Release.pdf', {'mode': 'LAYOUT'}):content);

### Process Microsoft Corporation's Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'Microsoft Corporation', 
        'MSFT', 
        'EARNINGS',
        'Q1 FY 2025', 
        '2024-10-30',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','MSFT_Q1_FY_2025_Earnings_Press_Release.pdf', {'mode': 'LAYOUT'}):content);

### Process ServiceNow's Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'SERVICENOW, INC.', 
        'NOW', 
        'EARNINGS',
        'Q3 FY 2024', 
        '2024-10-23',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','NOW_Q3_FY_2024_Earnings_Press_Release.pdf',{'mode': 'LAYOUT'}):content);

### Process Oracle Corporation's Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'Oracle Corporation', 
        'ORCL', 
        'EARNINGS',
        'Q2 FY 2025', 
        '2024-12-09',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','ORCL_Q2_FY_2025_Earnings_Press_Release.pdf',{'mode': 'LAYOUT'}):content);

### Process Palo Alto Networks' Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'PALO ALTO NETWORKS, INC', 
        'PANW', 
        'EARNINGS',
        'Q1 FY 2025', 
        '2024-11-20',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','PANW_Q1_FY_2025_Earnings_Press_Release.pdf',{'mode': 'LAYOUT'}):content);

### Process Procter & Gamble's Press Release

In [None]:
INSERT INTO COMPANY_PRESS_RELEASES_TBL 
    (COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT)
SELECT 'Procter & Gamble', 
        'PG', 
        'EARNINGS',
        'Q1 FY 2025', 
        '2024-10-18',       
TO_VARCHAR(SNOWFLAKE.CORTEX.PARSE_DOCUMENT('@COMPANY_EARNINGS_PRESS_RELEASES_STG','PG_Q1_FY_2025_Earnings_Press_Release.pdf', {'mode': 'LAYOUT'}):content);

- ### Use the Snowflake Cortex [SPLIT_TEXT_RECURSIVE_CHARACTER](https://docs.snowflake.com/en/sql-reference/functions/split_text_recursive_character-snowflake-cortex) to create chunks from each press release.  
- ### Store the chunks, along with the metadata about each press release, in a new "chunks" table: COMPANY_PRESS_RELEASES_CHUNKS_TBL

In [None]:
CREATE OR REPLACE TRANSIENT TABLE COMPANY_PRESS_RELEASES_CHUNKS_TBL 
AS
SELECT
     COMPANY_NAME, 
     TICKER_SYMBOL, 
     PRESS_RELEASE_TYPE, 
     PRESS_RELEASE_FISCAL_PERIOD, 
     PRESS_RELEASE_DATE, 
     PRESS_RELEASE_CONTENT,
     TO_VARCHAR(CHUNK_TXT.VALUE) AS PRESS_RELEASE_CHUNK
FROM
   COMPANY_PRESS_RELEASES_TBL,
   LATERAL FLATTEN( input => SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER (
      PRESS_RELEASE_CONTENT,
      'none',
      300,
      75
   )) CHUNK_TXT;

Summarize each Press Release Using the Snowflake Cortex COMPLETE function

In [None]:
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
        CONCAT('Summarize this press release in four bullet points or less: <press_release>', PRESS_RELEASE_CONTENT,        
                '</press_release>')) AS Earnings_Summary 
FROM 
    COMPANY_PRESS_RELEASES_TBL 
LIMIT 3;

- EXTRACT_ANSWER extracts an answer to your question from the PRESS_RELEASE_CONTENT column in the COMPANY_PRESS_RELEASES_TBL.
- It answers your question for each press release and provides a confidence score between 0 and 1. A score closer to 1 indicates higher confidence that the extracted answer is accurate given the provided text and question.  

In [None]:
SELECT  COMPANY_NAME, 
        SNOWFLAKE.CORTEX.EXTRACT_ANSWER(PRESS_RELEASE_CONTENT,
        'What are the topics covered in the press release?')
FROM 
    COMPANY_PRESS_RELEASES_TBL 
LIMIT 6;
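
If you read the EXTRACT_ANSWER output in Python, the confidence score can be used to discard weak answers. The sketch below assumes the result arrives as a JSON array of objects with `answer` and `score` keys; that shape, the helper name `best_answer`, the threshold, and the sample string are all illustrative assumptions rather than real output from this Notebook.

```python
import json


def best_answer(raw_result: str, min_score: float = 0.5):
    """Return the highest-scoring answer, or None if nothing reaches min_score."""
    candidates = json.loads(raw_result)
    if not candidates:
        return None
    top = max(candidates, key=lambda candidate: candidate["score"])
    return top["answer"] if top["score"] >= min_score else None


# Made-up result used only to exercise the helper.
sample_result = '[{"answer": "third quarter results", "score": 0.82}]'
print(best_answer(sample_result))
print(best_answer(sample_result, min_score=0.9))
```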

- Snowflake Cortex SENTIMENT returns a floating-point number from -1 to 1 (inclusive) indicating the level of negative or positive sentiment in the text. 
- Values around 0 indicate neutral sentiment.

In [None]:
--****************************************************************************
-- Snowflake Cortex SENTIMENT
-- A floating-point number from -1 to 1 (inclusive) 
-- indicating the level of negative or positive sentiment in the text. 
-- Values around 0 indicate neutral sentiment.
--****************************************************************************
SELECT 
    COMPANY_NAME,
    PRESS_RELEASE_CHUNK, 
    SNOWFLAKE.CORTEX.SENTIMENT(PRESS_RELEASE_CHUNK) ASSESS_SENTIMENT
FROM
    COMPANY_PRESS_RELEASES_CHUNKS_TBL;
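
Because the query above scores every chunk separately, one way to get a company-level view is to average the chunk scores and map the average to a label. The following is a minimal sketch with made-up company names and scores; the `neutral_band` threshold of 0.2 is an arbitrary assumption, not a value defined by Snowflake.

```python
from collections import defaultdict
from statistics import mean


def label_sentiment(score: float, neutral_band: float = 0.2) -> str:
    """Map a score in [-1, 1] to a coarse label; scores near 0 count as neutral."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def average_by_company(rows):
    """Average chunk-level sentiment scores per company from (company, score) pairs."""
    scores = defaultdict(list)
    for company, score in rows:
        scores[company].append(score)
    return {company: mean(values) for company, values in scores.items()}


# Placeholder rows standing in for (COMPANY_NAME, ASSESS_SENTIMENT) query results.
rows = [("ACME", 0.5), ("ACME", 0.25), ("GLOBEX", -0.5)]
averages = average_by_company(rows)
labels = {company: label_sentiment(score) for company, score in averages.items()}
print(averages, labels)
```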

- [Snowflake Cortex SUMMARIZE](https://docs.snowflake.com/en/sql-reference/functions/summarize-snowflake-cortex)
- Returns a string containing a summary of the input text. 

In [None]:
SELECT 
    COMPANY_NAME,
    PRESS_RELEASE_CONTENT, 
    SNOWFLAKE.CORTEX.SUMMARIZE(PRESS_RELEASE_CONTENT) PRESS_RELEASE_SUMMARY
FROM
    COMPANY_PRESS_RELEASES_TBL
LIMIT 3;

## Cortex Search Use Case:
- ### Search the press release PDF files to answer specific questions about the companies.
- ### We will use Cortex Search Service to power a Retrieval Augmented Generation application to ask questions and get answers.  
- ### The search results returned by Cortex Search are sent to the Snowflake Cortex COMPLETE function to answer the question.  

# What is Cortex Search? 
- [Cortex Search](https://docs.snowflake.com/user-guide/snowflake-cortex/cortex-search/cortex-search-overview?utm_cta=website-resources-vhol#overview) gets you up and running with a hybrid (vector and keyword) search engine on your text data in minutes, without having to worry about embedding, infrastructure maintenance, search quality parameter tuning, or ongoing index refreshes. 
- This means you can spend less time on infrastructure and search quality tuning, and more time developing high-quality chat and search experiences using your data.
- Use the CREATE CORTEX SEARCH SERVICE SQL command to create the vectors and index that power a search application. 
- A Cortex Search service is created on a table. 
- In this case, we create the service on the COMPANY_PRESS_RELEASES_CHUNKS_TBL table. 
- Once the search service is created, it can stay updated with any new data coming into the table.
- CHANGE_TRACKING needs to be turned on for the table if you want your Cortex Search service to be updated when new rows are added to the underlying table. 

In [None]:
ALTER TABLE COMPANY_PRESS_RELEASES_CHUNKS_TBL SET CHANGE_TRACKING = TRUE;

## Create Cortex Search Service 

In [None]:
CREATE CORTEX SEARCH SERVICE COMPANY_PRESS_RELEASES_SEARCH_SVC
    ON PRESS_RELEASE_CHUNK
    ATTRIBUTES COMPANY_NAME, TICKER_SYMBOL, PRESS_RELEASE_TYPE
    WAREHOUSE = COMPUTE_WH
    TARGET_LAG = '1 hour'
    AS (
        SELECT 
        *
        FROM COMPANY_PRESS_RELEASES_CHUNKS_TBL
    );

### Test the Cortex Search Service by using the Snowflake Cortex [SEARCH_PREVIEW](https://docs.snowflake.com/en/sql-reference/functions/search_preview-snowflake-cortex) SQL function.  

In [None]:
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'DEMODB.EQUITY_RESEARCH.COMPANY_PRESS_RELEASES_SEARCH_SVC',
      '{
        "query": "Summarize earnings in five bullet points or less",
        "columns":[
            "PRESS_RELEASE_CHUNK"
        ],
        "filter": {"@eq": {"TICKER_SYMBOL": "NOW"} },
        "limit":5
      }'
  )
)['results'] as results;
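
The second argument to SEARCH_PREVIEW is a JSON string, and writing it by hand makes quoting mistakes easy. As a sketch, the same payload can be assembled in Python with `json.dumps`; the helper name `build_search_payload` is hypothetical, and the keys simply mirror the ones used in the query above.

```python
import json


def build_search_payload(query, columns, ticker=None, limit=5) -> str:
    """Build the JSON request string with query, columns, optional filter, and limit."""
    payload = {"query": query, "columns": list(columns), "limit": limit}
    if ticker is not None:
        # Same equality filter on TICKER_SYMBOL as in the SQL example.
        payload["filter"] = {"@eq": {"TICKER_SYMBOL": ticker}}
    return json.dumps(payload)


payload = build_search_payload(
    "Summarize earnings in five bullet points or less",
    ["PRESS_RELEASE_CHUNK"],
    ticker="NOW",
)
print(payload)
```

Leaving out `ticker` omits the filter entirely, which searches across all chunks.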

## A Streamlit app to get Answers to your questions from the company press releases.  

In [None]:
# Import python packages
import streamlit as st
from snowflake.core import Root
from snowflake.snowpark.context import get_active_session

### The LLMs available for you to use with the Snowflake Cortex COMPLETE SQL function. 
### Select the LLM from the dropdown box. 

In [None]:
llm_option = st.selectbox(
    'Select an LLM:',
    ('claude-3-5-sonnet', 'llama3.2-1b', 'llama3.2-3b',
     'llama3.1-8b', 'llama3.1-70b', 'llama3.1-405b',
     'snowflake-arctic', 'reka-core', 'reka-flash',
     'mistral-large2', 'mixtral-8x7b', 'mistral-7b',
     'jamba-instruct', 'jamba-1.5-mini', 'jamba-1.5-large',
     'gemma-7b'))
str_selected_llm = f"""You selected the {llm_option} model. This model will be used in the SNOWFLAKE.CORTEX.COMPLETE function."""
st.write(str_selected_llm)

In [None]:
question_option = st.selectbox(
    'Select a question for which you seek an answer:',
    ('Summarize the earnings', 
     'Tell me about the revenue and earnings guidance', 
     'Tell me about remaining performance obligations'))
str_selected_question = f"""You Selected \"{question_option}\" question. 
                       This question will be used in the SNOWFLAKE.CORTEX.COMPLETE 
                       function as a prompt."""
st.write(str_selected_question)

- ### Select the companies for which press releases are available in the database. 
- ### We use the result set to populate a select box. 

In [None]:
session = get_active_session()

root = Root(session)
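# collect() returns Row objects, so extract the plain company-name strings for the select box.
company_names = [row["COMPANY_NAME"] for row in
                 session.sql("SELECT DISTINCT COMPANY_NAME FROM COMPANY_PRESS_RELEASES_TBL").collect()]
selected_company = st.selectbox('Select a Company:', company_names)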

st.write(selected_company)

### Here's an example of how you could query the Cortex Search service.  

In [None]:
prompt_str = f"""{question_option} for {selected_company} in four 
                 bullet points or less"""
st.write("""<h2>The Prompt is:</h2>""", unsafe_allow_html=True) 
st.write(prompt_str)

- #### Provide the prompt, the column name used in the search service, and the filter attribute.  
- #### The filter attribute is optional. 
- #### You can remove the filter and query across all the chunks in the table.  
- #### The "limit" controls the number of results you want returned.  

## Set the Database, Schema, and Cortex Search Service Name Before Proceeding. 

In [None]:
database_name = 'DEMODB'
schema_name = 'EQUITY_RESEARCH'
cortex_search_service_name = 'COMPANY_PRESS_RELEASES_SEARCH_SVC'

In [None]:
cortex_search_service = (
        root.databases[database_name]
        .schemas[schema_name]
        .cortex_search_services[cortex_search_service_name]
    )
context_documents = cortex_search_service.search(
         prompt_str, 
         columns=['PRESS_RELEASE_CHUNK'], 
         filter={"@eq": {"COMPANY_NAME": selected_company} },
         limit=12
    )
results = context_documents.results
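
The `filter` argument is a plain dictionary, so more selective filters can be composed from small helpers. The sketch below assumes the `@and` and `@or` operators from the Cortex Search filter syntax in addition to the `@eq` operator used above; the helper names `eq`, `any_of`, and `all_of` are hypothetical.

```python
def eq(column: str, value: str) -> dict:
    """Equality clause on one of the service's ATTRIBUTES columns."""
    return {"@eq": {column: value}}


def any_of(*clauses: dict) -> dict:
    """Match rows that satisfy at least one clause."""
    return {"@or": list(clauses)}


def all_of(*clauses: dict) -> dict:
    """Match rows that satisfy every clause."""
    return {"@and": list(clauses)}


# Earnings press releases for either of two ticker symbols.
combined = all_of(
    eq("PRESS_RELEASE_TYPE", "EARNINGS"),
    any_of(eq("TICKER_SYMBOL", "NOW"), eq("TICKER_SYMBOL", "CRM")),
)
print(combined)
```

Only columns listed under ATTRIBUTES when the service was created can be used in a filter.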

In [None]:
# Define the HTML code for the sub-heading
html_code_llm_response = """
                        <h2>Here are the results returned by Cortex Search:</h2>
                        """

# Display the HTML code using st.write()
st.write(html_code_llm_response, unsafe_allow_html=True) 
st.write(results)

#### Concatenate the search results before passing to the Snowflake Cortex COMPLETE function. 

In [None]:
# Join the retrieved chunks into a single context string.
context_str = ""
for r in results:
    context_str += f"{r['PRESS_RELEASE_CHUNK']} \n"

# Wrap the chunks in <press_release> tags and put the question first.
# context_str is reused as the full prompt passed to COMPLETE in the next cell.
press_release_str = f"<press_release>{context_str}</press_release>"
context_str = prompt_str + " " + press_release_str

# Define the HTML code for the sub-heading
html_code_heading = """
            <h2>The Prompt to Analyze the Cortex Search Results</h2>
            """

# Display the HTML code using st.write()
st.write(html_code_heading, unsafe_allow_html=True)
st.write(prompt_str)
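
The concatenation step can also be written as a small, testable helper that takes the question and the search results and returns the final prompt. This is a sketch under the assumption that each result is a dictionary keyed by column name; `build_rag_prompt` and the sample chunks are made up for illustration.

```python
def build_rag_prompt(question: str, results: list[dict],
                     column: str = "PRESS_RELEASE_CHUNK") -> str:
    """Place the question first, then the retrieved chunks inside <press_release> tags."""
    context = "\n".join(result[column].strip() for result in results)
    return f"{question}\n<press_release>\n{context}\n</press_release>"


# Placeholder chunks standing in for Cortex Search results.
sample_results = [
    {"PRESS_RELEASE_CHUNK": "Revenue grew 10%. "},
    {"PRESS_RELEASE_CHUNK": "Guidance was raised."},
]
prompt = build_rag_prompt("Summarize the earnings", sample_results)
print(prompt)
```

Keeping the question outside the tags makes it clear to the model which part is the instruction and which part is source material.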

In [None]:
results = session.sql("SELECT snowflake.cortex.complete(?,?)", (llm_option, context_str)).collect()[0][0]


# Define the HTML code for the sub-heading
html_code_llm_response = """
                        <h2>Here's a Summary of the Company's Earnings Press Release:</h2>
                        """

# Display the HTML code using st.write()
st.write(html_code_llm_response, unsafe_allow_html=True) 
st.write(results)

## Drop the tables and other objects you created for this Notebook.  
### If you wish, you can delete the objects this Notebook created.  

In [None]:
DROP TABLE text_classification_table;

In [None]:
DROP STAGE COMPANY_EARNINGS_PRESS_RELEASES_STG; 

In [None]:
DROP SEQUENCE PRESS_RELEASE_ID_SEQ;

In [None]:
DROP TABLE COMPANY_PRESS_RELEASES_TBL;

In [None]:
DROP TABLE COMPANY_PRESS_RELEASES_CHUNKS_TBL;

In [None]:
DROP CORTEX SEARCH SERVICE COMPANY_PRESS_RELEASES_SEARCH_SVC;