# Cash Flow Analysis Using Company Event Transcripts
## Demo of Snowflake Cortex AISQL - AI_FILTER & AI_AGG

# The Challenge
- ### Analyzing large volumes of unstructured text, like executive transcripts, is a significant challenge. 
- ### Traditional SQL queries can't easily identify and summarize nuanced themes like "cash flow discussions." 
- ### To understand how a company's narrative around cash flow has changed over time, a financial analyst would typically have to:

    - #### Locate the transcripts for each company and year.
    - #### Manually read through each transcript to find relevant comments on cash flow.
    - #### Synthesize the information to identify similarities and differences.
- ### This process is time-consuming, prone to human error, and difficult to scale across multiple companies.

# The Solution and Why It's Awesome
This notebook solves the challenge with two Snowflake Cortex AISQL functions, ```AI_FILTER``` and ```AI_AGG```, which perform the natural language processing (NLP) task directly within the database.

```AI_FILTER``` for Precision: The ```AI_FILTER``` function, combined with the ```PROMPT``` function, acts as a smart filter. It uses a large language model (LLM) to evaluate each speaker turn in a transcript and keeps only the turns that discuss cash flow. This is a substantial improvement over simple keyword searches (such as ```LIKE '%cash flow%'```), which can miss relevant comments that use different wording and can match irrelevant ones.
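To make the contrast concrete, here is a minimal sketch that runs a keyword match and ```AI_FILTER``` side by side on three made-up comments. The sample sentences, the ```samples``` alias, and the output column names are illustrative assumptions, not data from the transcripts, and the LLM column is not guaranteed to return the same answer on every run.

```sql
-- Illustrative sample rows (assumed, not taken from the transcript data).
SELECT
    comment_text,
    -- Plain keyword search: true only when the literal phrase appears.
    comment_text ILIKE '%cash flow%' AS keyword_match,
    -- LLM-based check: the model judges whether the comment is about cash flows.
    AI_FILTER(PROMPT('Does the following comment discuss cash flows? {0}', comment_text)) AS llm_match
FROM (
    VALUES
        ('Free cash generation funded the dividend and share buybacks.'),
        ('Operating cash flow improved on lower inventory.'),
        ('We opened three new distribution centers.')
) AS samples (comment_text);
```

The first comment never uses the literal phrase "cash flow", so ```keyword_match``` is false for it even though the comment is about cash generation. That is the kind of case the LLM check is meant to catch.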

```AI_AGG``` for Synthesis: The ```AI_AGG``` function is the most powerful part of the solution. After ```AI_FILTER``` has isolated the relevant comments for each company, ```AI_AGG``` aggregates this text and sends it to the LLM with a specific prompt. This prompt instructs the model to analyze all the cash flow-related comments for a single company across multiple years and synthesize them into a concise, structured summary, highlighting both similarities and differences. The prompt is also designed to format the output with markdown headings and bullet points, making the final result easy to read and use.

- ### This solution is **awesome** because it brings the power of generative AI directly into the data warehouse. 
- ### It transforms a complex, manual task into a simple, scalable SQL query. 
- ### Instead of spending hours reading transcripts, an analyst can get a comprehensive, well-structured summary for multiple companies with a single query, **providing actionable insights in minutes**. 
- ### It shows how AI can be a powerful tool for analyzing unstructured data at scale.

## Prerequisites
- ### Create a Snowflake Database Called ```DEMODB``` 
- ### Create a Snowflake Schema Called ```EQUITY_RESEARCH```
- ### Create a Snowflake Internal Stage Called ```COMPANY_EVENT_TRANSCRIPT_INT_STG```
- ### Download the Transcript Zip File from GitHub.  
- ### Unzip the Transcript Zip File.  
- ### Upload the Compressed ```.csv.gz``` Files (72 Files) to the Internal Stage
- ### Create a Table - ```COMPANY_EVENT_TRANSCRIPT_TTBL``` - to Load the Comments from Various Company Executives.  
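The first two prerequisites can be sketched as follows. This assumes the current role is allowed to create databases and schemas. The ```USE SCHEMA``` statement is an addition that makes the unqualified stage and table names used later in this notebook resolve to ```DEMODB.EQUITY_RESEARCH```.

```sql
-- Create the database and schema named in the prerequisites, if they do not exist yet.
CREATE DATABASE IF NOT EXISTS DEMODB;
CREATE SCHEMA IF NOT EXISTS DEMODB.EQUITY_RESEARCH;

-- Make them the session defaults so unqualified object names resolve here.
USE SCHEMA DEMODB.EQUITY_RESEARCH;
```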

## Create a Snowflake Internal Stage

In [None]:
-- Create an internal stage in the current database and schema.
CREATE STAGE COMPANY_EVENT_TRANSCRIPT_INT_STG
    DIRECTORY = ( ENABLE = TRUE )
    ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' )
    COMMENT = 'Store the SEC Transcripts Released by the Company';
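Once the stage exists, the files can be uploaded from a client such as SnowSQL, because ```PUT``` does not run in a Snowsight worksheet. The sketch below assumes the unzipped files sit in a local folder; the path ```/tmp/transcripts``` is a hypothetical placeholder to replace with your own.

```sql
-- Upload the already-compressed files as-is (hypothetical local path).
PUT file:///tmp/transcripts/*.csv.gz @COMPANY_EVENT_TRANSCRIPT_INT_STG
    AUTO_COMPRESS = FALSE;

-- List the staged files; the prerequisites expect 72 of them.
LIST @COMPANY_EVENT_TRANSCRIPT_INT_STG;
```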

## Create a table to store event transcripts.  

In [None]:
CREATE OR REPLACE TRANSIENT TABLE DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL (
	COMPANY_ID VARCHAR(16777216),
	CIK VARCHAR(16777216),
	COMPANY_NAME VARCHAR(16777216),
	PRIMARY_TICKER VARCHAR(16777216),
	EVENT_TIMESTAMP TIMESTAMP_NTZ(9),
	FISCAL_PERIOD VARCHAR(16777216),
	FISCAL_YEAR VARCHAR(16777216),
	EVENT_TYPE VARCHAR(16777216),
	TRANSCRIPT VARIANT
);

## Copy the transcript data into the table. 
### Ensure you have uploaded the ```.csv.gz``` files to the Snowflake internal stage before executing the ```COPY INTO``` command.

In [None]:
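-- Load every staged file into the table. No FILE_FORMAT is set here or on the stage,
-- so Snowflake applies its default CSV file format.
COPY INTO COMPANY_EVENT_TRANSCRIPT_TTBL
FROM @COMPANY_EVENT_TRANSCRIPT_INT_STG/;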
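After the load, a quick check like the one below can confirm that the tickers used later in this notebook are present. It is only a sketch: the output column aliases are assumptions, and ```FISCAL_YEAR``` is compared as text because the table stores it as ```VARCHAR```.

```sql
-- Count loaded events per ticker and show the span of fiscal years.
SELECT
    PRIMARY_TICKER,
    MIN(FISCAL_YEAR) AS FIRST_FISCAL_YEAR,
    MAX(FISCAL_YEAR) AS LAST_FISCAL_YEAR,
    COUNT(*)         AS EVENT_COUNT
FROM DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL
GROUP BY PRIMARY_TICKER
ORDER BY PRIMARY_TICKER;
```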

## Preview the data in the transcript table.
### Each transcript is stored as a JSON array of objects, one object per speaker turn.  
```JSON
[
  {
    "speaker": 1,
    "text": "Thank you. Good morning, everyone. Thank you for joining us today for The Hershey Company's Q3 2020 earnings Q and A session. I hope everyone has had the chance to read our press release and listen to our pre recorded management presentation, both of which are available on our website. In addition, we have posted a transcript of the pre recorded remarks. At the conclusion of today's live Q and A session, we will also post a transcript and audio replay of this call. Please note that during today's Q and A session, we may make forward looking statements that are subject to various risks and uncertainties. These statements include expectations and assumptions regarding the company's future operations and financial performance, including expectations and assumptions related to the impact of the COVID-nineteen pandemic. Actual results could differ materially from those projected as a result of the COVID-nineteen pandemic as well as other factors. The company undertakes no obligation to update these statements based on subsequent events. A detailed listing of such risks and uncertainties can be found in today's press release and the company's SEC filings. Finally, please note that we may refer to certain non GAAP financial measures that we believe will provide useful information for investors. The presentation of this information is not intended to be considered in isolation or as a substitute for the financial information presented in accordance with GAAP. Reconciliations to the GAAP results are included in this morning's press release. Joining me today are Hershey's Chairman and CEO, Michelle Buck and Hershey's Senior Vice President and CFO, Steve Voskal. With that, I will turn it over to the operator for the first question.",
    "time_end": 98.88931,
    "time_start": 20.284
  }
]
```

In [None]:
SELECT * FROM COMPANY_EVENT_TRANSCRIPT_TTBL LIMIT 10;
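Because ```SELECT *``` shows the whole array in one cell, a narrower preview can be easier to read. The sketch below assumes ```TRANSCRIPT``` was loaded as a parsed array, as in the sample above; the output column aliases are illustrative.

```sql
-- Show how many speaker turns each transcript has and the text of the first turn.
SELECT
    PRIMARY_TICKER,
    FISCAL_YEAR,
    ARRAY_SIZE(TRANSCRIPT)           AS SPEAKER_TURNS,
    TRANSCRIPT[0]:"text"::VARCHAR    AS FIRST_TURN_TEXT
FROM DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL
LIMIT 5;
```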

## Preview the Raw Results Returned by the ```AI_FILTER``` Function.
## Use the ```LATERAL``` keyword with the ```FLATTEN``` function to expand the JSON array into one row per speaker turn. 
## The ```AI_FILTER``` function keeps only the comments that discuss cash flow and related topics.   
## The Prompt: 
- ```AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT_WITH_FISCAL_YEAR))```

In [None]:
-- Preview the speaker comments that AI_FILTER flags as cash flow discussion.
-- Define the CTE first: one row per speaker turn, prefixed with ticker and fiscal year.
WITH TranscriptData AS (
    SELECT
        PRIMARY_TICKER,
        FISCAL_YEAR,
        CONCAT('TICKER: ', PRIMARY_TICKER, ', ', 'FISCAL_YEAR: ', FISCAL_YEAR, '-',T2.VALUE:"text"::VARCHAR) AS SPEAKER_TEXT_WITH_FISCAL_YEAR
    FROM
        DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
        LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
)
-- Now select from the CTE where the alias is recognized
SELECT
    PRIMARY_TICKER,
    FISCAL_YEAR,
    SPEAKER_TEXT_WITH_FISCAL_YEAR
FROM
    TranscriptData
WHERE
    PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND 
    AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT_WITH_FISCAL_YEAR))
ORDER BY PRIMARY_TICKER, FISCAL_YEAR ASC
LIMIT 20;
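The ```LATERAL FLATTEN``` step used in the query above can be tried on its own with an inline JSON literal. This is a minimal sketch; the two-turn array and the aliases ```T``` and ```F``` are made up for illustration.

```sql
-- Expand a small JSON array into one row per element.
SELECT
    F.INDEX                     AS TURN_INDEX,   -- position of the element in the array
    F.VALUE:"speaker"::INT      AS SPEAKER,
    F.VALUE:"text"::VARCHAR     AS SPEAKER_TEXT
FROM
    (SELECT PARSE_JSON('[{"speaker": 1, "text": "Good morning."},
                         {"speaker": 2, "text": "Thank you."}]') AS TRANSCRIPT) AS T,
    LATERAL FLATTEN(INPUT => T.TRANSCRIPT) AS F;
```

The query returns one row per array element, which is the same shape the ```TranscriptData``` CTE produces for every transcript in the table.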

## Analyze the Similarities & Differences in the Cash Flow Discussion by Executives Across Fiscal Years.  
## Applies the ```AI_AGG``` function on the results from ```AI_FILTER```
## The Prompt:
- ```AI_AGG(SPEAKER_TEXT_WITH_FISCAL_YEAR, 'DESCRIBE THE SIMILARITIES AND DIFFERENCES IN THE COMPANY\'S DISCUSSION ON CASHFLOWS ACROSS FISCAL YEARS. PROVIDE THE SIMILARITIES AND DIFFERENCES IN SEPARATE TITLED SECTIONS WITH PROPER MARKDOWN HEADINGS AND BULLET POINTS. INCLUDE THE TICKER, FOLLOWED BY THE RANGE OF FISCAL YEARS USED TO GENERATE THIS ANALYSIS IN THE TITLE. FOR EXAMPLE: TICKER: CPB, FISCAL YEARS: 2015-2020')```
- The query can take 3-6 minutes to complete.  

In [None]:
-- Analysis of cash flow discussion by executives across fiscal years.
-- Highlight the similarities and differences in cash flow discussions by executives across fiscal years.
-- Define the CTE first
WITH TranscriptData AS (
    SELECT
        PRIMARY_TICKER,
        FISCAL_YEAR,
        CONCAT('TICKER: ', PRIMARY_TICKER, ', ', 'FISCAL_YEAR: ', FISCAL_YEAR, ',  Transcript: ',T2.VALUE:"text"::VARCHAR) AS SPEAKER_TEXT_WITH_FISCAL_YEAR
    FROM
        DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
        LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
)
-- Now select from the CTE where the alias is recognized
SELECT
    PRIMARY_TICKER,
    AI_AGG(SPEAKER_TEXT_WITH_FISCAL_YEAR, 'DESCRIBE THE SIMILARITIES AND DIFFERENCES IN THE COMPANY\'S DISCUSSION ON CASHFLOWS ACROSS FISCAL YEARS. PROVIDE THE SIMILARITIES AND DIFFERENCES IN SEPARATE TITLED SECTIONS WITH PROPER MARKDOWN HEADINGS AND BULLET POINTS. INCLUDE THE TICKER, FOLLOWED BY THE RANGE OF FISCAL YEARS USED TO GENERATE THIS ANALYSIS IN THE TITLE. FOR EXAMPLE: TICKER: CPB, FISCAL YEARS: 2015-2020') AS CASH_FLOW_DISCUSSION
FROM
    TranscriptData
WHERE
    PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND 
    AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT_WITH_FISCAL_YEAR))
GROUP BY PRIMARY_TICKER
ORDER BY PRIMARY_TICKER ASC;
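Each query in this notebook calls ```AI_FILTER``` on the same rows again. One possible way to avoid repeating that work is to save the filtered speaker turns once and aggregate from the saved table. This is a sketch, not part of the original demo: the table name ```CASH_FLOW_COMMENTS_TTBL``` is hypothetical, the filter here sees the raw comment text without the ticker prefix, and the aggregation prompt is shortened for illustration.

```sql
-- Hypothetical table; holds only the speaker turns that AI_FILTER keeps.
CREATE OR REPLACE TRANSIENT TABLE DEMODB.EQUITY_RESEARCH.CASH_FLOW_COMMENTS_TTBL AS
SELECT
    TA.PRIMARY_TICKER,
    TA.FISCAL_YEAR,
    T2.VALUE:"text"::VARCHAR AS SPEAKER_TEXT
FROM
    DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
    LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
WHERE
    TA.PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', T2.VALUE:"text"::VARCHAR));

-- Later aggregations read the saved rows instead of calling AI_FILTER again.
SELECT
    PRIMARY_TICKER,
    AI_AGG(
        CONCAT('TICKER: ', PRIMARY_TICKER, ', FISCAL_YEAR: ', FISCAL_YEAR, ', Transcript: ', SPEAKER_TEXT),
        'Summarize the company\'s discussion of cash flows across fiscal years.'
    ) AS CASH_FLOW_DISCUSSION
FROM DEMODB.EQUITY_RESEARCH.CASH_FLOW_COMMENTS_TTBL
GROUP BY PRIMARY_TICKER
ORDER BY PRIMARY_TICKER;
```

The trade-off is that the saved table goes stale when new transcripts are loaded, so it would need to be rebuilt after each load.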

## Cash Flow Discussion Similarities & Differences for Hershey for Fiscal Years 2018-2025

# HSY: HERSHEY CO, FISCAL YEARS: 2018-2025
## Similarities
* The company consistently prioritizes investing in its brands, capabilities, and people to drive growth and maintain its competitive advantage across fiscal years.
* Hershey's has a strong track record of delivering balanced top and bottom line growth, with a focus on increasing net sales, gross profit, and adjusted earnings per share.
* The company has a history of returning cash to shareholders through dividends and share repurchases, with a commitment to maintaining a healthy dividend payout ratio.
* Strong cash flow generation is a common theme across fiscal years, with the company emphasizing its ability to generate significant cash flow to reinvest in the business and return to shareholders.
* The importance of price realization and productivity savings in maintaining profitability is highlighted across fiscal years.

## Differences
* The impact of COVID-19 on the company's operations and financial performance is a significant difference between fiscal years, with the company experiencing challenges in 2020 due to the pandemic, but not in later years.
* The company's financial performance has been impacted by various factors, including supply chain disruptions, changes in consumer behavior, and fluctuations in gross margin.
* The company has made acquisitions, such as the purchase of Dots and Pretzels, to expand its portfolio and drive growth in the snacking category in some years.
* The company's capital spending outlook has changed, with a revised outlook of $400,000,000 to $450,000,000 in 2020, compared to $800,000,000 to $900,000,000 in 2023.
* The company's approach to investing in the business has evolved, with a greater emphasis on digital infrastructure and capabilities in later years, compared to a focus on capacity expansion and ERP implementations in earlier years.

## Yearly Cash Flow Analysis for Each Ticker

In [None]:
-- Yearly look at the cash flow discussion for each ticker.
-- The queries run against the transient table created earlier.
-- Define the CTE first: one row per speaker turn, prefixed with fiscal year and ticker.
WITH TranscriptData AS (
    SELECT
        PRIMARY_TICKER,
        FISCAL_YEAR,
        CONCAT('FISCAL YEAR: ', FISCAL_YEAR, ' TICKER: ', PRIMARY_TICKER, ' TRANSCRIPT: ', T2.VALUE:"text"::VARCHAR) AS SPEAKER_TEXT
    FROM
        DEMODB.EQUITY_RESEARCH.COMPANY_EVENT_TRANSCRIPT_TTBL AS TA,
        LATERAL FLATTEN(INPUT => TA.TRANSCRIPT) AS T2
)
-- Now select from the CTE where the alias is recognized
SELECT
    PRIMARY_TICKER,
    FISCAL_YEAR,
    AI_AGG(SPEAKER_TEXT, 'SUMMARIZE THE COMPANY\'S DISCUSSION ON CASHFLOWS. Provide the output in a clearly formatted markdown with bullet points. Start the summary with a title - \'Cash Flow Discussion Summary: \' - marked down as heading level 3. Include the Ticker and the Fiscal Year as Part of the title.') AS CASH_FLOW_DISCUSSION
FROM
    TranscriptData
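WHERE
    -- Optional date filter. To use it, also select EVENT_TIMESTAMP in the CTE above.
    -- EVENT_TIMESTAMP >= TO_DATE('01-01-2025', 'MM-DD-YYYY')
    -- AND 
    PRIMARY_TICKER IN ('CPB', 'GIS', 'HRL', 'HSY', 'KO', 'PEP', 'TSN', 'KMB')
    AND AI_FILTER(PROMPT('Does the transcript, \'{0}\', discuss cash flows?', SPEAKER_TEXT))
-- GROUP BY ALL groups by every non-aggregated column in the SELECT list (PRIMARY_TICKER, FISCAL_YEAR).
GROUP BY ALL
ORDER BY PRIMARY_TICKER, FISCAL_YEAR ASC;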

## Cash Flow Analysis of Comments Made by Pepsi's Executives for Fiscal Year 2023

### Cash Flow Discussion Summary: PEP - Fiscal Year 2023
* The company has been discussing its cash flow profile and capital investments for several years, with a focus on returning to higher levels of free cash flow conversion.
* The new CFO sees opportunities to speed up the process of cash flow generation, but acknowledges that it may take time.
* The company has been intentional about its capital investments, including catching up on capacity and investing in IT and digitalization.
* The level of capital expenditures (CapEx) as a percent of sales is expected to trend down over time, which will help improve cash flow conversion.
* Cash flow remains a priority for the company, with a focus on improving conversion rates.