# **SNOWFLAKE CORTEX COMPLETE FINANCIAL SERVICES DEMO**

## Authors: John Heisler, Garrett Frere

In this demo, using Snowflake Cortex (https://www.snowflake.com/en/data-cloud/cortex/), we will build an AI-infused Data Pipeline with Cortex Complete.

### AI Pipeline Overview

We'll learn how to extract raw text from a PDF, perform prompt engineering, and pass custom prompts and data to a large language model of our choosing, all without leaving Snowflake.

Specifically, we will take on the role of an AI Engineer working closely with a portfolio team at an asset manager. The portfolio team would like to speed up their ingestion and comprehension of statements by the Federal Open Market Committee (FOMC), which sets the direction of monetary policy by directing open market operations. Ultimately, they would like a signal as to whether interest rates will increase, remain the same, or decrease (hawkish, neutral, or dovish, respectively).

I refer to this as an AI pipeline because we can imbue this type of signal generation with AI much further up the data delivery value chain. In this way, we maximize the value of our work by building the signal into a common dataset. End users will not need to invoke any additional logic; good design is invisible!

### Next Steps

 * To industrialize this demo with continuous ingestion and scoring, please check out the `FSI_Cortex_AI_Pipeline_Industrialization.ipynb` notebook in this repository
 * Check out the companion demo in this repository: `FSI_Cortex_Search.ipynb`

---
# **BEFORE YOU START**

Complete the following step to create the dependent database objects **before** running this notebook:
1. Run the `1_SQL_SETUP_FOMC.sql` script

In [None]:
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.types import *
from snowflake.snowpark.functions import col
import streamlit as st

session = get_active_session() 

session.use_database('GEN_AI_FSI')
session.use_schema('ASSET_MANAGEMENT')

## AI Pipeline: Step 1 - Create Primitives

To carry out our batch analysis, we create a stored procedure and a function: 
* **`get_fed_pdfs` (stored proc):** Downloads the PDFs from the Federal Reserve website to our stage. We put that logic in a Python stored procedure, which we will automate with a task.
    * Created in the `1_SQL_SETUP_FOMC.sql` script
    
* **`generate_sentiment_prompt` (function):** As we load data into our system, we want to automatically generate a signal. To do so, we call an LLM and pass it a prompt; this function builds that prompt from the document text.
    * **Function Registration:** The function is defined in Python and then registered as a permanent UDF for later reuse within our pipeline.
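
Before registering the function, it can help to sanity-check the prompt template locally. The sketch below uses a shortened, hypothetical template (`build_prompt_sketch` is not part of this demo) to illustrate two details the real function relies on: the document text is interpolated through the f-string, and doubled braces (`{{` and `}}`) come out as literal JSON braces.

```python
import json

def build_prompt_sketch(document_text: str) -> str:
    """Hypothetical, shortened stand-in for generate_sentiment_prompt."""
    return f"""# Input Context
{document_text}

# JSON output Example
{{
    "Signal": "Hawkish",
    "Signal_Summary": "short rationale goes here"
}}"""

prompt = build_prompt_sketch("The Committee decided to maintain the target range.")

# The document text is embedded verbatim.
assert "maintain the target range" in prompt

# Doubled braces in the f-string are emitted as single literal braces,
# so the example at the end of the prompt is itself valid JSON.
example = json.loads(prompt[prompt.index("{"):])
print(example["Signal"])
```

Running a check like this locally is cheaper than discovering a malformed prompt after the UDF has been registered and called on every document.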

In [None]:
def generate_sentiment_prompt(document_text):
    prompt = f"""# Role
You are an experienced Senior Economist deeply knowledgeable on Federal Reserve guidance including FOMC or Federal Open Market Committee meeting minutes and communications.
You are an expert in interpreting Hawkish and Dovish signals from the Fed or Federal Reserve. Such signals are derived from guidance conveyed in FOMC meeting notes and communications.

As an analyst, you excel at discerning macroeconomic trends for each FOMC meeting notes and communications published by the Federal Reserve.
The  signal or trends are either Hawkish or Dovish based on the growth outlook and inflation outlook of the Fed. The Federal Reserve has a long 
term objective of keeping inflation around 2%, and low unemployment. Hawkish sentiment could imply 
the Federal Reserve intends to raise interest rates to increase the cost of borrowing and slow economic activity. 
The Fed typically increases interest rates when inflation is high or rising, or when the unemployment 
rate is low or falling. Conversely, dovish sentiment could imply the Federal Reserve intends to lower interest 
rates to allow easier access to borrowing and lower the cost of money to stimulate economic activity. The Fed 
typically decreases interest rates when inflation is low or falling, or when the unemployment rate is high or rising.

## Signal categories known as Economic Policy Stances:
### Hawkish stance or attitude for economic policy
* characterized by a focus on combating inflation and often involves advocating for higher interest rates and tolerant to higher levels of unemployment.
* concerned about rising inflation. Hawkish stance believes higher interest rates can help keep inflation in check, even if it slows down economic growth or increases unemployment.

### Dovish stance or attitude for economic policy
* characterized by a focus on prioritizing stimulating economic growth, reducing unemployment, and tolerant to higher levels of inflation.
* concerned with boosting economic activity, reducing unemployment and, for this reason, lower interest rates are preferred to create economic growth and employment.

### Neutral stance or attitude for economic policy
* characterized by a focus on balance between combating inflation and supporting economic growth, with no strong inclination toward either side.
* concerned with maintaining a steady economic environment without significant deviations. They seek to neither overly stimulate the economy nor excessively tighten it.

# Data
You are provided the text of a Federal Reserve Guidance or FOMC meeting notes as context. These generally are released before the Federal Reserve takes action on economic policy. 

# Input Context: FOMC_meeting_notes 
{document_text}

# Task
Follow these instructions:
1) Review the provided FOMC communication or meeting notes text. Then,
2) Consider the FOMC members or Committee Members tone and sentiment around economic conditions. Then,
3) Consider specific guidance and stated conditions that validate the tone and signal FOMC members make concerning current macro economic conditions. Then,
4) Based on this sentiment classify if the FOMC communication text indicates Hawkish, Dovish, or Neutral outlook for the economy. Be critical and do not categorize sentiment as "Neutral" unless necessary. This will be output as [Signal].
5) Summarize a concise and accurate rationale for classifying the sentiment Hawkish, Neutral, or Dovish sentiment. This will be output as [Signal_Summary].

# Output
Produce valid JSON. Absolutely do not include any additional text before or following the JSON. Output should use following JSON_format

# JSON output Example
{{
    "Signal": "Hawkish",
    "Signal_Summary": "The FOMC meeting notes indicate a Hawkish sentiment. The Committee acknowledged elevated inflation, largely driven by transitory factors, but also noted that inflation pressures could last longer than previously expected. They agreed to begin reducing the pace of asset purchases, signaling a move towards tightening monetary policy. The Committee also expressed concerns about supply chain disruptions and labor shortages, which could further exacerbate inflation. The overall tone suggests a focus on combating inflation and a willingness to adjust policy if inflation continues to run high."
}}"""
    return prompt

session.add_packages("snowflake-snowpark-python", "snowflake-ml-python", "snowflake")

session.udf.register(
  func = generate_sentiment_prompt,
  return_type = StringType(),
  input_types = [ StringType()],
  is_permanent = True,
  name = 'generate_sentiment_prompt',
  replace = True,
  stage_location = '@gen_ai_fsi.asset_management.fed_logic')

## AI Pipeline: Step 2 - Ingest Text and Determine Signal

Leveraging the function and stored procedure created above, we'll download all the PDFs, load their text with a simple insert statement, and then score the sentiment with an update statement.
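
The insert statement that follows is written to be re-runnable: the `LEFT JOIN ... WHERE ft.id IS NULL` pattern keeps only stage files that are not yet in `pdf_full_text`, and `file_date` is parsed from the first eight-digit run in the file name. As a rough illustration of that logic outside of SQL, here is a plain-Python sketch. The helper names and the sample file lists are made up for the example.

```python
import re
from datetime import date, datetime
from typing import Iterable, List, Optional

def parse_file_date(relative_path: str) -> Optional[date]:
    """Mimic TRY_TO_DATE(REGEXP_SUBSTR(path, '\\d{8}'), 'YYYYMMDD'): None on failure."""
    match = re.search(r"\d{8}", relative_path)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), "%Y%m%d").date()
    except ValueError:  # eight digits that are not a real calendar date
        return None

def files_to_load(staged: Iterable[str], already_loaded: Iterable[str]) -> List[str]:
    """Anti-join: keep staged paths with no matching row in the target table."""
    loaded = set(already_loaded)
    return [path for path in staged if path not in loaded]

# Illustrative inputs: two files on the stage, one already in the table.
staged = ["fomcminutes20250319.pdf", "fomcminutes20250507.pdf"]
loaded = ["fomcminutes20250319.pdf"]

for path in files_to_load(staged, loaded):
    print(path, parse_file_date(path))
```

Because only unseen files pass the filter, running the load twice in a row should insert nothing the second time.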

In [None]:
query = 'CALL get_fed_pdfs()'
output = session.sql(query).collect()
st.write('# Download Output\n\n ', output[0][0])

In [None]:
ALTER STAGE gen_ai_fsi.asset_management.FED_PDF REFRESH;

INSERT INTO gen_ai_fsi.asset_management.pdf_full_text (id, relative_path, size, last_modified, md5, etag, file_url, file_text, file_date, sentiment)

SELECT
    gen_ai_fsi.asset_management.fed_pdf_full_text_sequence.nextval AS id,
    st.relative_path AS relative_path,
    st.size AS size,
    st.last_modified AS last_modified,
    st.md5 AS md5,
    st.etag AS etag,
    st.file_url AS file_url,
    REPLACE(SNOWFLAKE.CORTEX.PARSE_DOCUMENT(@gen_ai_fsi.asset_management.fed_pdf, st.relative_path,{'mode': 'LAYOUT'}):content, '''', '') AS FILE_TEXT,
    TRY_TO_DATE(REGEXP_SUBSTR(st.relative_path, '\\d{8}'), 'YYYYMMDD') AS file_date,
    NULL as SENTIMENT -- we will fill sentiment in the next SQL statement
FROM 
    DIRECTORY(@gen_ai_fsi.asset_management.FED_PDF)st
LEFT JOIN 
    gen_ai_fsi.asset_management.pdf_full_text ft 
    ON st.relative_path = ft.relative_path
WHERE -- only insert if the stage file doesn't exist in the pdf_full_text table
    ft.id IS NULL;

UPDATE gen_ai_fsi.asset_management.pdf_full_text t1
  --SET SENTIMENT = snowflake.cortex.try_complete('mistral-large2', gen_ai_fsi.asset_management.generate_sentiment_prompt(t2.file_text))
    SET sentiment =     snowflake.cortex.try_complete('mistral-large2',
        [
            {'role': 'user', 'content': gen_ai_fsi.asset_management.generate_sentiment_prompt(t2.file_text)}
        ],
            {
                'temperature': 0.0, -- Adjust for less randomness (more deterministic output)
                'response_format': {
                    'type': 'json',
                    'schema': {
                        'type': 'object',
                        'properties': {
                            'sentiment': {
                                'type': 'string',
                                'enum': ['hawkish', 'dovish', 'neutral'],
                                'description': 'The overall sentiment (hawkish, dovish, or neutral) based on the text.'
                            },
                            'signal_reasoning': {
                                'type': 'string',
                                'description': 'A concise reason explaining why the sentiment was classified as such.'
                            }
                        },
                        'required': ['sentiment', 'signal_reasoning']
                    }
                }
            }
        )::VARIANT 
  FROM 
    gen_ai_fsi.asset_management.pdf_full_text t2
  WHERE 
    t1.id = t2.id and 
    t1.SENTIMENT IS NULL

### AI Pipeline: Step 3.1 - Check out the result

Select from our PDF table to view our signal and the reasoning behind it.
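
The `sentiment` column stores the full response object rather than just the label. Judging by the path used later in this notebook (`structured_output[0].raw_message`), the label and reasoning sit inside a nested structure. The sketch below shows the same extraction in plain Python against a hand-written sample payload. The sample values are invented, `extract_signal` is a hypothetical helper, and the real response may carry additional fields.

```python
import json
from typing import Optional, Tuple

def extract_signal(response_json: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (sentiment, reasoning), or (None, None) if the shape is unexpected."""
    if not response_json:  # TRY_COMPLETE returns NULL when the call fails
        return None, None
    try:
        message = json.loads(response_json)["structured_output"][0]["raw_message"]
        return message.get("sentiment"), message.get("signal_reasoning")
    except (ValueError, KeyError, IndexError, TypeError, AttributeError):
        return None, None

# Invented sample payload, shaped after the path used in this notebook.
sample = json.dumps({
    "structured_output": [
        {"raw_message": {"sentiment": "hawkish",
                         "signal_reasoning": "Inflation remains elevated."}}
    ]
})

sentiment, reasoning = extract_signal(sample)
print(sentiment, "-", reasoning)
```

Returning `None` for missing or malformed responses mirrors how the SQL path expression yields NULL, so rows that failed to score can be spotted and retried.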

In [None]:
select * from gen_ai_fsi.asset_management.PDF_FULL_TEXT order by file_date desc;

-- select 
--     relative_path as file, 
--     file_date,
--     file_text,
--     --sentiment sentiment_object, 
--     sentiment:structured_output[0].raw_message.sentiment::STRING as sentiment,
--     sentiment:structured_output[0].raw_message.signal_reasoning::STRING as reasoning
-- from 
--     GEN_AI_FSI.asset_management.PDF_FULL_TEXT order by file_date desc;

In [None]:
# Page settings, stored as plain variables; nothing below reads them
page_title = "FRB Minutes Sentiment Analysis"
initial_sidebar_state = "auto"


# --- Application Title and Introduction ---
st.title("Federal Reserve Meeting Minutes Sentiment Analysis")
st.markdown("""
Welcome to the Federal Reserve Meeting Minutes Sentiment Analyzer. 
This tool provides a quick overview of the sentiment expressed in past FOMC meeting minutes, 
along with the detailed reasoning and the full text of the document.
""")
st.markdown("---")

# Define the SQL query to fetch sentiment data
query = """
select
    relative_path as file,
    file_date,
    file_text,
    sentiment sentiment_object,
    sentiment:structured_output[0].raw_message.sentiment::STRING as sentiment,
    sentiment:structured_output[0].raw_message.signal_reasoning::STRING as reasoning
from
    GEN_AI_FSI.asset_management.PDF_FULL_TEXT
order by file_date desc
"""

# Build a lazy Snowpark DataFrame; rows are only fetched by the collect() calls below
sentiment_data = session.sql(query).order_by("FILE_DATE", ascending=False)

# get available file dates for the dropdown
file_column = sentiment_data.select(col("FILE_DATE")).collect()
available_files = [row.FILE_DATE for row in file_column]

# select file
st.subheader("Select Meeting Minutes Date")
selected_file = st.selectbox(
    "Choose a file date to view its sentiment analysis:",
    available_files,
    index=0, # Default to the most recent file
    help="Use this dropdown to select a specific Federal Reserve meeting minutes file by its date."
)

if st.button("Get Analysis", help="Click to view the sentiment analysis for the selected meeting minutes."):
    # filter based on selection
    sentiment_data_filtered = sentiment_data.filter(col("FILE_DATE") == selected_file)
    
    # Extract the relevant data points for the selected file
    # Fetch the selected row once instead of running four separate queries
    selected_row = sentiment_data_filtered.select(
        "FILE_TEXT", "SENTIMENT", "REASONING", "SENTIMENT_OBJECT"
    ).collect()[0]
    file_text, sentiment, reasoning, sentiment_object = (
        selected_row[0], selected_row[1], selected_row[2], selected_row[3]
    )
    
    st.markdown("---")
    
    # --- Display Selected File Information ---
    # Use a prominent header for the selected file
    st.header(f"Analysis for: **{selected_file}**")
    
    # --- Sentiment and Reasoning Display in Columns ---
    # Use columns to align sentiment and reasoning side-by-side for better readability.
    # Adjust column ratios to give more space to reasoning.
    col1, col2 = st.columns([1, 3]) # Column 1 for Sentiment (1 unit), Column 2 for Reasoning (3 units)
    
    with col1:
        st.markdown("### Overall Sentiment")
        # Pick the text color based on the sentiment value
        sentiment_lower = sentiment.lower()
        if sentiment_lower == "dovish":
            st.markdown(f"## :green[{sentiment.title()}]")
        elif sentiment_lower == "hawkish":
            st.markdown(f"## :red[{sentiment.title()}]")
        else: # Neutral or any other unexpected value
            st.markdown(f"## :gray[{sentiment.title()}]")
    
    with col2:
        st.markdown("### Reasoning Behind Sentiment")
        # Show the reasoning text returned by the model
        st.write(reasoning)
    
    st.markdown("---")
    
    # --- Detailed Information and Full File Text in Expander ---
    # Use an expander to keep the interface clean, allowing users to delve into details when needed.
    with st.expander("See Detailed JSON Response and Full File Text"):
        st.title("Detailed Snowflake Sentiment Response (JSON)")
        # Display the raw JSON response object
        st.json(sentiment_object)
    
        st.markdown("---") # Visual separator
    
        st.title("Full Meeting Minutes Text")
        # Display the complete text of the selected meeting minutes
        st.markdown(file_text)

In [None]:
--REMOVE @gen_ai_fsi.asset_management.FED_PDF pattern='.*';
REMOVE @gen_ai_fsi.asset_management.FED_PDF/fomcminutes20250319.pdf;
REMOVE @gen_ai_fsi.asset_management.FED_PDF/fomcminutes20250507.pdf;

--TRUNCATE TABLE GEN_AI_FSI.asset_management.PDF_FULL_TEXT;
DELETE FROM GEN_AI_FSI.asset_management.PDF_FULL_TEXT WHERE RELATIVE_PATH IN('fomcminutes20250507.pdf', 'fomcminutes20250319.pdf');

ALTER STAGE gen_ai_fsi.asset_management.FED_PDF REFRESH;

In [None]:
-- INSERT INTO gen_ai_fsi.asset_management.pdf_full_text (id, relative_path, size, last_modified, md5, etag, file_url, file_text, file_date, sentiment)

-- WITH cte AS (
--     SELECT TOP 1
--         gen_ai_fsi.asset_management.fed_pdf_full_text_sequence.nextval AS id,
--         st.relative_path AS relative_path,
--         st.size AS size,
--         st.last_modified AS last_modified,
--         st.md5 AS md5,
--         st.etag AS etag,
--         st.file_url AS file_url,
--         REPLACE(REPLACE(SNOWFLAKE.CORTEX.PARSE_DOCUMENT(@GEN_AI_FSI.asset_management.fed_pdf, st.relative_path,{'mode': 'LAYOUT'}):content, '''', ''),'"','') AS FILE_TEXT,
--         TRY_TO_DATE(REGEXP_SUBSTR(st.relative_path, '\\d{8}'), 'YYYYMMDD') AS file_date
--     FROM DIRECTORY(@gen_ai_fsi.asset_management.FED_PDF)st
--     LEFT JOIN gen_ai_fsi.asset_management.pdf_full_text ft 
--     ON st.relative_path = ft.relative_path
--     WHERE ft.id IS NULL -- only if the stage file doesnt exist
--  )

-- SELECT TOP 1 id,
--        relative_path,
--        size,
--        last_modified,
--        md5,
--        etag,
--        file_url,
--        file_text,
--        file_date,
--        gen_ai_fsi.asset_management.generate_sentiment_prompt(file_text), --FOR TESTING
--        --snowflake.cortex.try_complete('mistral-large2', gen_ai_fsi.asset_management.generate_sentiment_prompt(file_text)) AS signal_mis
-- FROM cte;

-----------------------------------------------------------