# Demo of Snowflake Cortex AISQL
## Cash Flow Analysis Using Comments Made by Company Executives

# The Challenge
- ### Analyzing large volumes of unstructured text, like executive transcripts, is a significant challenge. 
- ### Traditional SQL queries can't easily identify and summarize nuanced themes like "cash flow discussions." 
- ### To understand how a company's narrative around cash flow has changed over time, a financial analyst would typically have to:

    - #### Locate the transcripts for each company and year.
    - #### Manually read through each transcript to find relevant comments on cash flow.
    - #### Synthesize the information to identify similarities and differences.
- ### This manual process is incredibly time-consuming, prone to human error, and difficult to scale across multiple companies.

# The Solution and Why It's Awesome
This query solves the challenge by using Snowflake's Cortex AISQL functions, specifically ```AI_AGG``` and ```AI_FILTER```, to perform a natural language processing (NLP) task directly within the database.

```AI_FILTER``` for Precision: The ```AI_FILTER``` function, combined with the ```PROMPT``` function, acts as a smart filter. It uses a large language model (LLM) to evaluate each speaker segment of a transcript and keep only the segments that discuss cash flow. This is a big improvement over simple keyword searches (like ```LIKE '%cash flow%'```), which can miss comments that describe cash flow in other words and can match passing mentions that are not relevant.

```AI_AGG``` for Synthesis: The ```AI_AGG``` function is the most powerful part of the solution. After ```AI_FILTER``` has isolated the relevant comments for each company, ```AI_AGG``` aggregates this text and sends it to the LLM with a specific prompt. This prompt instructs the model to analyze all the cash flow-related comments for a single company across multiple years and synthesize them into a concise, structured summary, highlighting both similarities and differences. The prompt is also designed to format the output with markdown headings and bullet points, making the final result easy to read and use.

- ### This solution is **awesome** because it brings the power of generative AI directly into the data warehouse. 
- ### It transforms a complex, manual task into a simple, scalable SQL query. 
- ### Instead of spending hours reading transcripts, an analyst can now get a comprehensive, well-structured summary for multiple companies with a single query, **providing actionable insights in a fraction of the time**. 
- ### It shows how AI can be a powerful tool for analyzing unstructured data at scale.

## Prerequisites
- ### Create a Snowflake Database Called ```DEMODB``` 
- ### Create a Snowflake Schema Called ```EQUITY_RESEARCH```
- ### Create a Snowflake Internal Stage Called ```COMPANY_EVENT_TRANSCRIPT_INT_STG```
- ### Download the Transcript Zip File from GitHub.  
- ### Unzip the Transcript Zip File.  
- ### Upload the ```.csv.gz``` Files (72 Files) to the Internal Stage.
- ### Create a Table - ```COMPANY_EVENT_TRANSCRIPT_TTBL``` - to Load the Comments from Various Company Executives.  
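
The first two prerequisites can be sketched in SQL as shown below. This is a minimal sketch that assumes your role is allowed to create databases and schemas. The ```USE SCHEMA``` statement is an addition so that the unqualified stage and table names used later resolve to ```DEMODB.EQUITY_RESEARCH```.

```sql
-- Create the database and schema named in the prerequisites, if they do not exist yet.
CREATE DATABASE IF NOT EXISTS DEMODB;
CREATE SCHEMA IF NOT EXISTS DEMODB.EQUITY_RESEARCH;

-- Make the schema the session context so unqualified object names resolve there.
USE SCHEMA DEMODB.EQUITY_RESEARCH;
```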

## Create a Snowflake Internal Stage

In [None]:
-- Create an Internal Stage in Snowflake.  
CREATE STAGE COMPANY_EVENT_TRANSCRIPT_INT_STG 
	DIRECTORY = ( ENABLE = true ) 
	ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' ) 
	COMMENT = 'Store the SEC Transcripts Released by the Company ';
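
One way to upload the unzipped files is the ```PUT``` command, which runs from SnowSQL or another client that supports it rather than from a worksheet. The sketch below assumes the files were unzipped into ```/tmp/transcripts```, a hypothetical local folder that you should replace with your own path.

```sql
-- Run from SnowSQL or another client that supports PUT.
-- /tmp/transcripts is a hypothetical local folder holding the unzipped files.
PUT file:///tmp/transcripts/*.csv.gz @COMPANY_EVENT_TRANSCRIPT_INT_STG
    AUTO_COMPRESS = FALSE;  -- the files are already gzip-compressed

-- Confirm the upload; the prerequisites mention 72 files.
LIST @COMPANY_EVENT_TRANSCRIPT_INT_STG;
```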

## Create a table to store event transcripts.  

In [None]:
create or replace TRANSIENT TABLE DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL (
	COMPANY_ID VARCHAR(16777216),
	CIK VARCHAR(16777216),
	COMPANY_NAME VARCHAR(16777216),
	PRIMARY_TICKER VARCHAR(16777216),
	EVENT_TIMESTAMP TIMESTAMP_NTZ(9),
	FISCAL_PERIOD VARCHAR(16777216),
	FISCAL_YEAR VARCHAR(16777216),
	EVENT_TYPE VARCHAR(16777216),
	TRANSCRIPT VARIANT
);

## Copy the transcript data into the table. 
### Please ensure you have uploaded the ```csv.gz``` files into the Snowflake internal stage before executing the ```COPY INTO``` command.

In [None]:
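-- Load every staged file into the transcript table.
-- No FILE_FORMAT is specified, so Snowflake's default CSV settings apply;
-- add a FILE_FORMAT clause if the files need different options.
COPY INTO DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL
FROM @COMPANY_EVENT_TRANSCRIPT_INT_STG/;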
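
After the load, a quick sanity check can confirm that rows arrived and show which tickers and fiscal years are covered. This is a sketch, not part of the original notebook. The column aliases are illustrative, and because ```FISCAL_YEAR``` is a ```VARCHAR```, ```MIN``` and ```MAX``` compare it as text.

```sql
-- Summarize what was loaded: number of events, tickers, and the fiscal-year range.
SELECT
    COUNT(*)                       AS EVENT_COUNT,
    COUNT(DISTINCT PRIMARY_TICKER) AS TICKER_COUNT,
    MIN(FISCAL_YEAR)               AS FIRST_FISCAL_YEAR,  -- text comparison
    MAX(FISCAL_YEAR)               AS LAST_FISCAL_YEAR    -- text comparison
FROM DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL;
```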

## Preview the data in the transcript table.
### Each transcript is stored as a JSON array of objects, one object per speaker segment.  
```JSON
[
  {
    "speaker": 0,
    "text": "Good day, and welcome to the Lee Enterprises Third Quarter 2021 Webcast and Conference Call. The call is being recorded and will be available for replay beginning later this morning at investors. Lee.net. At the close of the planned remarks, there will be an opportunity for questions. Participants accessing this call by webcast may submit written questions through the website and they will be answered during the call as time permits. Otherwise, you will receive a response later. A link to the live webcast can be found at investors. Lee.net. Now, I will turn the call over to your host, Josh Reinholtz, Vice President of Finance.",
    "time_end": 39.96,
    "time_start": 1.68
  }
]
```

In [None]:
SELECT * FROM COMPANY_EVENT_TRANSCRIPT_TTBL LIMIT 10;
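
Because each transcript is a JSON array, it can help to look at the individual speaker segments before applying any AI function. The sketch below expands the array with ```LATERAL FLATTEN``` and extracts the fields shown in the sample JSON above. The output column aliases are illustrative, and the query assumes every array element has the same keys as that sample.

```sql
-- Expand each transcript array into one row per speaker segment.
SELECT
    TA.PRIMARY_TICKER,
    TA.FISCAL_YEAR,
    T2.INDEX                     AS SEGMENT_INDEX,  -- position within the array
    T2.VALUE:"speaker"::INT      AS SPEAKER,
    T2.VALUE:"time_start"::FLOAT AS TIME_START,
    T2.VALUE:"text"::VARCHAR     AS SPEAKER_TEXT
FROM
    DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
    LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
ORDER BY TA.PRIMARY_TICKER, TA.EVENT_TIMESTAMP, T2.INDEX
LIMIT 10;
```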

## Preview the Raw Results Returned By ```AI_FILTER``` function.
## Use the ```LATERAL``` Keyword with the ```FLATTEN``` function to expand the JSON array into individual rows. 
## The ```AI_FILTER``` function will look for comments which discuss cash flow and related topics.   
## The Prompt: 
- ```AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT_WITH_FISCAL_YEAR))```

In [None]:
-- Preview the transcript segments that AI_FILTER classifies as discussing cash flow.
-- Define the CTE first
WITH TranscriptData AS (
    SELECT
        PRIMARY_TICKER,
        FISCAL_YEAR,
        CONCAT('TICKER: ', PRIMARY_TICKER, ', ', 'FISCAL_YEAR: ', FISCAL_YEAR, ', TRANSCRIPT: ', T2.VALUE:"text"::VARCHAR) AS SPEAKER_TEXT_WITH_FISCAL_YEAR
    FROM
        DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
        LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
)
-- Now select from the CTE where the alias is recognized
SELECT
    PRIMARY_TICKER,
    FISCAL_YEAR,
    SPEAKER_TEXT_WITH_FISCAL_YEAR
FROM
    TranscriptData
WHERE
    PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND 
    AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT_WITH_FISCAL_YEAR))
ORDER BY PRIMARY_TICKER, FISCAL_YEAR ASC
LIMIT 20;
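
For comparison, the keyword search mentioned earlier can be run against the same flattened data. This is a sketch of that baseline, not part of the original notebook. It uses ```ILIKE``` for a case-insensitive match and returns only segments that contain the literal phrase, so comparing its output with the ```AI_FILTER``` preview shows what a keyword search misses or over-matches.

```sql
-- Keyword baseline: keep only segments containing the literal phrase "cash flow".
WITH TranscriptData AS (
    SELECT
        PRIMARY_TICKER,
        FISCAL_YEAR,
        T2.VALUE:"text"::VARCHAR AS SPEAKER_TEXT
    FROM
        DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
        LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
)
SELECT
    PRIMARY_TICKER,
    FISCAL_YEAR,
    SPEAKER_TEXT
FROM
    TranscriptData
WHERE
    PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND SPEAKER_TEXT ILIKE '%cash flow%'
ORDER BY PRIMARY_TICKER, FISCAL_YEAR ASC
LIMIT 20;
```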

## Analyze the Similarities & Differences in the Cash Flow Discussion by Executives Across Fiscal Years.  
## Applies the ```AI_AGG``` function on the results from ```AI_FILTER```
## The Prompt:
- ```AI_AGG(SPEAKER_TEXT_WITH_FISCAL_YEAR, 'DESCRIBE THE SIMILARITIES AND DIFFERENCES IN THE COMPANY\'S DISCUSSION ON CASHFLOWS ACROSS FISCAL YEARS. PROVIDE THE SIMILARITIES AND DIFFERENCES IN SEPARATE TITLED SECTIONS WITH PROPER MARKDOWN HEADINGS AND BULLET POINTS. INCLUDE THE TICKER, FOLLOWED BY THE RANGE OF FISCAL YEARS USED TO GENERATE THIS ANALYSIS IN THE TITLE. FOR EXAMPLE: TICKER: CPB, FISCAL YEARS: 2015-2020')```

In [None]:
-- Analysis of cash flow discussion by executives across fiscal years.
-- Highlight the similarities & differences in cash flow discussions by executives across fiscal years.
-- Define the CTE first
WITH TranscriptData AS (
    SELECT
        PRIMARY_TICKER,
        FISCAL_YEAR,
        CONCAT('TICKER: ', PRIMARY_TICKER, ', ', 'FISCAL_YEAR: ', FISCAL_YEAR, ',  Transcript: ',T2.VALUE:"text"::VARCHAR) AS SPEAKER_TEXT_WITH_FISCAL_YEAR
    FROM
        DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
        LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
)
-- Now select from the CTE where the alias is recognized
SELECT
    PRIMARY_TICKER,
    AI_AGG(SPEAKER_TEXT_WITH_FISCAL_YEAR, 'DESCRIBE THE SIMILARITIES AND DIFFERENCES IN THE COMPANY\'S DISCUSSION ON CASHFLOWS ACROSS FISCAL YEARS. PROVIDE THE SIMILARITIES AND DIFFERENCES IN SEPARATE TITLED SECTIONS WITH PROPER MARKDOWN HEADINGS AND BULLET POINTS. INCLUDE THE TICKER, FOLLOWED BY THE RANGE OF FISCAL YEARS USED TO GENERATE THIS ANALYSIS IN THE TITLE. FOR EXAMPLE: TICKER: CPB, FISCAL YEARS: 2015-2020') AS CASH_FLOW_DISCUSSION
FROM
    TranscriptData
WHERE
    PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND 
    AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT_WITH_FISCAL_YEAR))
GROUP BY PRIMARY_TICKER
ORDER BY PRIMARY_TICKER ASC;

## Cash Flow Discussion Similarities & Differences for Hershey (HSY) for Fiscal Years 2018-2024

# HSY, FISCAL YEARS: 2018-2024
## Similarities
* The company consistently emphasizes the importance of strong cash flow and a healthy balance sheet in managing through crises and investing in the business across different fiscal years.
* The company prioritizes investing in the business, returning cash to shareholders, and maintaining a strong balance sheet across different fiscal years.
* The company's discussion on cash flows often highlights the impact of pricing actions, supply chain costs, and inflation on its financial performance.
* In both fiscal years 2021 and 2022, the company experienced strong cash generation, with operating cash flow exceeding \$2,000,000,000 in 2021 and \$656,000,000 in the first quarter of 2022.
* The company invested in its brands, capabilities, and people to drive differentiated growth in both years.
* Price realization was a primary driver of growth in both years, with mid-to-high single-digit price increases announced in 2021 and net price realization contributing 6.9 points of growth in the first quarter of 2022.
* The company faced supply chain disruptions and inflation in both years, which impacted gross margins.
* The company maintained a focus on operating with excellence and investing in its business for long-term growth in both years.

## Differences
* The company's cash flow and capital allocation priorities have evolved over time, with a greater emphasis on investing in core confection capacity, snacking scale and optimization, and supply chain resiliency in recent years.
* The company's discussion on cash flows has become more nuanced, with a greater focus on the impact of external factors such as COVID-19, inflation, and supply chain disruptions on its financial performance.
* The company's outlook for cash flows and capital allocation has become more cautious in recent years, with a greater emphasis on managing through uncertainty and volatility.
* Net sales growth was 10.1% in 2021, while it was 16.1% in the first quarter of 2022, with an expected full-year growth of 10% to 12%.
* Gross margin declined 40 basis points in the fourth quarter of 2021, while it was flat in the first quarter of 2022, with an expected full-year contraction of 120 to 140 basis points.
* Advertising and related consumer marketing expenses decreased by 6.9% in the fourth quarter of 2021, while they decreased by about 1% in the first quarter of 2022.
* The company expected adjusted EPS growth of 9% to 11% in 2022, while it achieved adjusted EPS growth of nearly 32% in the first quarter of 2022, with an expected full-year growth of 10% to 12%.
* Capital expenditures were approximately \$500,000,000 in 2021, while they were expected to be around \$600,000,000 in 2022, with a focus on core confection capacity, snacking scale, and optimization, and supply chain resiliency.

## Yearly Cash Flow Analysis for Each Ticker

In [None]:
-- Query the transient table created earlier.
-- Yearly look at the cash flow discussion for each ticker.
-- Define the CTE first
WITH TranscriptData AS (
    SELECT
        PRIMARY_TICKER,
        FISCAL_YEAR,
        CONCAT('FISCAL YEAR: ', FISCAL_YEAR, ' TICKER: ', PRIMARY_TICKER, ' TRANSCRIPT: ', T2.VALUE:"text"::VARCHAR) AS SPEAKER_TEXT
    FROM
        DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
        LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
)
-- Now select from the CTE where the alias is recognized
SELECT
    PRIMARY_TICKER,
    FISCAL_YEAR,
    AI_AGG(SPEAKER_TEXT, 'SUMMARIZE THE COMPANY\'S DISCUSSION ON CASHFLOWS. Provide the output in a clearly formatted markdown with bullet points. Start the summary with a title - \'Cash Flow Discussion Summary: \' - marked down as heading level 3. Include the Ticker and the Fiscal Year as Part of the title.') AS CASH_FLOW_DISCUSSION
FROM
    TranscriptData
WHERE
    -- EVENT_TIMESTAMP >= TO_DATE('01-01-2025', 'MM-DD-YYYY')
    -- AND 
    PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT))
-- GROUP BY ALL groups by every non-aggregate column: PRIMARY_TICKER and FISCAL_YEAR.
GROUP BY ALL
ORDER BY PRIMARY_TICKER, FISCAL_YEAR ASC;

## Cash Flow Analysis for PepsiCo (PEP) for Fiscal Year 2023

### Cash Flow Discussion Summary: PEP - Fiscal Year 2023
* The company has been discussing its cash flow profile and capital investments for several years, with a focus on returning to higher levels of free cash flow conversion.
* The new CFO sees opportunities to speed up the process of cash flow generation, but acknowledges that it may take time.
* The company has been intentional about its capital investments, including catching up on capacity and investing in IT and digitalization.
* The level of capital expenditures (CapEx) as a percent of sales is expected to trend down over time, which will help improve cash flow conversion.
* Cash flow remains a priority for the company, with a focus on improving conversion rates.
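
The queries above each re-run the same ```AI_FILTER``` call over the flattened transcripts. One possible refinement, sketched here under assumptions, is to store the filtered segments once and run the aggregations against that result. The table name ```CASH_FLOW_COMMENTS_TMP``` is hypothetical, and the shortened ```AI_AGG``` instruction is only a placeholder for the full prompts used earlier.

```sql
-- Materialize the AI_FILTER output once (CASH_FLOW_COMMENTS_TMP is a hypothetical name).
CREATE OR REPLACE TEMPORARY TABLE DEMODB.EQUITY_RESEARCH.CASH_FLOW_COMMENTS_TMP AS
WITH TranscriptData AS (
    SELECT
        PRIMARY_TICKER,
        FISCAL_YEAR,
        CONCAT('FISCAL YEAR: ', FISCAL_YEAR, ' TICKER: ', PRIMARY_TICKER, ' TRANSCRIPT: ', T2.VALUE:"text"::VARCHAR) AS SPEAKER_TEXT
    FROM
        DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
        LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
)
SELECT
    PRIMARY_TICKER,
    FISCAL_YEAR,
    SPEAKER_TEXT
FROM
    TranscriptData
WHERE
    PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT));

-- Aggregate the stored segments; the filter is not evaluated again here.
SELECT
    PRIMARY_TICKER,
    FISCAL_YEAR,
    AI_AGG(SPEAKER_TEXT, 'SUMMARIZE THE COMPANY\'S DISCUSSION ON CASHFLOWS.') AS CASH_FLOW_DISCUSSION
FROM
    DEMODB.EQUITY_RESEARCH.CASH_FLOW_COMMENTS_TMP
GROUP BY PRIMARY_TICKER, FISCAL_YEAR
ORDER BY PRIMARY_TICKER, FISCAL_YEAR ASC;
```

A temporary table lasts only for the current session, which suits a demo; a transient or permanent table would be the alternative if the filtered segments need to be reused later.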