# Implementing the Sahm Rule Recession Indicator in Snowflake 

# What's the Sahm rule? 
1. [Claudia Sahm](https://en.wikipedia.org/wiki/Claudia_Sahm) is an American economist who developed the rule that bears her name while working for the United States [Federal Reserve Board of Governors](https://www.federalreserve.gov/).
2. The Sahm rule [recession indicator](https://en.wikipedia.org/wiki/Sahm_rule) is a heuristic for determining when an economy has entered a recession.
3. You can also read about a more recent application of the Sahm rule in [this post](https://live.barcap.com/publiccp/RSR/nyfipubs/barcap-email-mkting/html/eagle-eye-newsletter/2024-10-03-eagle-eye-website.html#section1-chart1) by [Barclays Investment Bank](https://www.ib.barclays/).

# Components of the Sahm Rule
### 1. In its simplest form, the Sahm rule indicator is triggered when the unemployment rate rises *half a percentage point or more above its 12-month low*.
### 2. The commonly published version applies the same test to the three-month moving average of the unemployment rate.
### 3. The Sahm rule works with a lag: it signals that a recession has already started rather than predicting one.
### 4. It has correctly identified all recessions since 1970, performing better than other indicators.
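
As a rough illustration of the two variants listed above, the following Python sketch flags months in which the rate, or its three-month moving average, sits at least 0.5 percentage points above the low of the preceding 12 months. The function names and the sample rates are made up for illustration; this is a sketch under those assumptions, not an official implementation.

```python
from typing import List, Optional

THRESHOLD = 0.5  # percentage points above the low
WINDOW = 12      # number of preceding months used to find the low


def moving_average(rates: List[float], n: int = 3) -> List[Optional[float]]:
    """Trailing n-month average; None until n observations are available."""
    out: List[Optional[float]] = []
    for i in range(len(rates)):
        if i + 1 < n:
            out.append(None)
        else:
            out.append(sum(rates[i + 1 - n:i + 1]) / n)
    return out


def gap_over_low(series: List[Optional[float]]) -> List[Optional[float]]:
    """Current value minus the lowest value of the preceding WINDOW months."""
    gaps: List[Optional[float]] = []
    for i, current in enumerate(series):
        prior = [v for v in series[max(0, i - WINDOW):i] if v is not None]
        if current is None or len(prior) < WINDOW:
            gaps.append(None)  # not enough history yet
        else:
            # Round to two decimals to avoid floating-point noise at the threshold.
            gaps.append(round(current - min(prior), 2))
    return gaps


def sahm_flags(rates: List[float], smooth: bool = False) -> List[bool]:
    """True where the gap over the 12-month low reaches the threshold."""
    series = moving_average(rates) if smooth else list(rates)
    return [g is not None and g >= THRESHOLD for g in gap_over_low(series)]


# Made-up monthly rates: a flat year followed by a steady rise.
sample_rates = [3.5] * 12 + [3.6, 3.8, 4.0, 4.2]
raw_flags = sahm_flags(sample_rates)
smoothed_flags = sahm_flags(sample_rates, smooth=True)
```

With this sample, the smoothed variant triggers one month later than the raw variant, which is the kind of lag the list above mentions.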

# Create a stage in Snowflake to hold the data.
## The data was downloaded from the United States Bureau of Labor Statistics (BLS).
### You can use the U.S. BLS API to download the data.  

In [None]:
CREATE STAGE US_UNEMPLOYMENT_RAW_DATA_STAGE 
	DIRECTORY = ( ENABLE = true );
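
If you would rather fetch the data yourself than use prepared files, a request to the BLS API could look roughly like the Python sketch below. The endpoint (the public v1 API) and the series ID `LNS14000000` (seasonally adjusted unemployment rate) are assumptions to verify against the BLS documentation, and the helper names are hypothetical. The response layout may differ from the JSON files this notebook loads, so the saved output may need reshaping before the later queries work against it.

```python
import json
import urllib.request

# Assumptions: public v1 endpoint and the series ID for the seasonally
# adjusted U.S. unemployment rate. Check both against the BLS documentation.
BLS_URL = "https://api.bls.gov/publicAPI/v1/timeseries/data/"
SERIES_ID = "LNS14000000"


def build_payload(start_year: int, end_year: int) -> bytes:
    """JSON request body asking for one series over a range of years."""
    body = {
        "seriesid": [SERIES_ID],
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    return json.dumps(body).encode("utf-8")


def output_name(start_year: int, end_year: int) -> str:
    """File name following the pattern of the files loaded in this notebook."""
    return f"US_Unemployment_Rate_{start_year}_{end_year}.json"


def fetch_years(start_year: int, end_year: int) -> dict:
    """POST the request and return the decoded JSON response (network call)."""
    request = urllib.request.Request(
        BLS_URL,
        data=build_payload(start_year, end_year),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example (not executed here because it needs network access):
# result = fetch_years(1970, 1979)
# with open(output_name(1970, 1979), "w") as handle:
#     json.dump(result, handle)
```

Requesting one decade at a time matches how the data files in this notebook are split.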

# Instructions to upload the unemployment rate data files into the stage.
- Copy the unemployment rate JSON data files into the Snowflake internal stage US_UNEMPLOYMENT_RAW_DATA_STAGE.
- Follow [these steps](https://docs.snowflake.com/en/user-guide/data-load-local-file-system-stage-ui#upload-files-onto-a-named-internal-stage) in Snowsight:
### 1. Sign in to Snowsight.
### 2. Select Data » Add Data.
### 3. On the Add Data page, select Load files into a Stage.
### 4. In the Upload Your Files dialog that appears, select the files that you want to upload. You can upload multiple files at the same time.
### 5. Select the database schema in which you created the stage, then select the stage.
### 6. Optionally, select or create a path where you want to save your files within the stage.
### 7. Select Upload.

### You can download all the U.S. unemployment data JSON files from [GitHub here](https://github.com/rrprasan/Finance/tree/main/Snowflake/Notebooks/Economics).

In [None]:
LIST @US_UNEMPLOYMENT_RAW_DATA_STAGE;

# Create a table with one column of data type VARIANT.
### 1. The JSON document in each file will be stored in the VARIANT column. 
### 2. After loading, the table will have six rows, one for each JSON file.

In [None]:
CREATE OR REPLACE TABLE US_UNEMPLOYMENT_RATE_DATA_TBL 
(
    UNEMPLOYMENT_RATE_JSON VARIANT
);

# Load data into the US_UNEMPLOYMENT_RATE_DATA_TBL table. 
### 1. We will use Snowflake's [COPY INTO](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table) command.

In [None]:
COPY INTO US_UNEMPLOYMENT_RATE_DATA_TBL 
FROM 
    @US_UNEMPLOYMENT_RAW_DATA_STAGE 
FILES = (
    'US_Unemployment_Rate_1970_1979.json',
    'US_Unemployment_Rate_1980_1989.json', 
    'US_Unemployment_Rate_1990_1999.json', 
    'US_Unemployment_Rate_2000_2009.json', 
    'US_Unemployment_Rate_2010_2019.json', 
    'US_Unemployment_Rate_2020_2024.json')
FILE_FORMAT = (TYPE = JSON);

## Example: Query the JSON Data in the US_UNEMPLOYMENT_RATE_DATA_TBL Table
### 1. We use a [LATERAL](https://community.snowflake.com/s/article/Dynamically-extract-multi-level-JSON-object-using-lateral-flatten) join with the [FLATTEN](https://docs.snowflake.com/en/sql-reference/functions/flatten) table function.
### 2. [FLATTEN](https://docs.snowflake.com/en/sql-reference/functions/flatten) can be used to convert semi-structured data to a relational representation.

In [None]:
SELECT
    ft.VALUE:"periodName"::STRING UNEMPLOYMENT_DATA_MONTH,
    ft.VALUE:"year"::NUMBER UNEMPLOYMENT_DATA_YEAR,
    ft.VALUE:"value"::NUMBER(5,2) UNEMPLOYMENT_RATE
FROM 
    US_UNEMPLOYMENT_RATE_DATA_TBL UNER,
    LATERAL FLATTEN (input => parse_json(UNER.UNEMPLOYMENT_RATE_JSON:"data")) ft
ORDER BY UNEMPLOYMENT_DATA_YEAR ASC;
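
To make the effect of flattening concrete, here is a small Python sketch of the same idea: one JSON document with a top-level `data` array becomes one row per array element. The document below is a hypothetical stand-in with made-up values; only the keys `data`, `periodName`, `year`, and `value` are taken from the query above.

```python
import json

# Hypothetical document shaped like the one the query reads:
# a top-level "data" array holding one object per month.
raw_document = """
{
  "data": [
    {"year": "1970", "periodName": "January", "value": "3.9"},
    {"year": "1970", "periodName": "February", "value": "4.2"}
  ]
}
"""


def flatten_rows(document: str):
    """Return one (month, year, rate) tuple per element of the "data" array."""
    parsed = json.loads(document)
    return [
        (item["periodName"], int(item["year"]), float(item["value"]))
        for item in parsed["data"]
    ]


rows = flatten_rows(raw_document)
for month, year, rate in rows:
    print(month, year, rate)
```

The casts to `int` and `float` play the role of `::NUMBER` and `::NUMBER(5,2)` in the SQL, although Snowflake's `NUMBER(5,2)` is an exact decimal type rather than a float.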

# US RECESSIONS 1970 - 2024
## In the US, the [Business Cycle Dating Committee](https://www.nber.org/research/business-cycle-dating) of the National Bureau of Economic Research ([NBER](https://www.nber.org/)) is responsible for dating recessions.
## These are the official recession dates. 
## We will check the performance of our query against these dates. 
### 1. December 1969 – November 1970 (11 months)
### 2. November 1973 – March 1975    (1 year 4 months)
### 3. January  1980 – July 1980     (6 months)
### 4. July     1981 – November 1982 (1 year 4 months)
### 5. July     1990 – March 1991    (8 months)
### 6. March    2001 – November 2001 (8 months)
### 7. December 2007 – June 2009     (1 year 6 months)
### 8. February 2020 – April 2020    (2 months)

> Source: [Wikipedia](https://en.wikipedia.org/wiki/List_of_recessions_in_the_United_States)
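
One way to compare flagged months against the dates above is sketched below in Python. The recession periods are copied from the list; treating both the first and the last month as part of the recession is an assumption of this sketch, and the helper names are hypothetical.

```python
from datetime import date

# Recession periods from the list above, as (first month, last month).
# Assumption: both end points count as recession months.
RECESSIONS = [
    (date(1969, 12, 1), date(1970, 11, 1)),
    (date(1973, 11, 1), date(1975, 3, 1)),
    (date(1980, 1, 1), date(1980, 7, 1)),
    (date(1981, 7, 1), date(1982, 11, 1)),
    (date(1990, 7, 1), date(1991, 3, 1)),
    (date(2001, 3, 1), date(2001, 11, 1)),
    (date(2007, 12, 1), date(2009, 6, 1)),
    (date(2020, 2, 1), date(2020, 4, 1)),
]


def in_recession(day: date) -> bool:
    """True if the month containing `day` falls inside a listed recession."""
    month_start = day.replace(day=1)
    return any(start <= month_start <= end for start, end in RECESSIONS)


def share_inside_recessions(flagged_months) -> float:
    """Fraction of flagged months that fall inside a listed recession."""
    flagged = list(flagged_months)
    if not flagged:
        return 0.0
    return sum(in_recession(m) for m in flagged) / len(flagged)


# Made-up example: one flagged month inside a recession, one outside.
example_share = share_inside_recessions([date(2008, 9, 1), date(2015, 1, 1)])
```

A low share would point to false signals, while recessions with no flagged month at all would point to misses.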


# Implement the Sahm Rule in Snowflake Using a SQL Query.


# Explaining the Query Using [Snowflake Copilot](https://docs.snowflake.com/en/user-guide/snowflake-copilot)
### Although I wrote this query, explaining how it works can be time-consuming. So, naturally, I used AI to help me with this task.
### I used Snowflake Copilot, which is built into Snowsight, to explain this query. Snowflake Copilot's explanation was detailed and thorough.
### Here's what [Snowflake Copilot](https://docs.snowflake.com/en/user-guide/snowflake-copilot) said about the query:
This query is designed to analyze unemployment rate data and determine if a recession is imminent. Here's a breakdown of how it works:

- The query first creates a subquery that extracts unemployment data from a JSON object stored in the US_UNEMPLOYMENT_RATE_DATA_TBL table. The subquery uses the FLATTEN function to parse the JSON object and extract the relevant data, including the month and year of the unemployment data, the unemployment rate, and a formatted date that combines the month and year.
- The outer query then selects the extracted data from the subquery and calculates the minimum and maximum unemployment rates for the current row and the 11 preceding rows (i.e., the past year). It also calculates the difference between the maximum and minimum unemployment rates and checks if this difference is greater than or equal to 0.5. If it is, and the current unemployment rate is greater than the unemployment rate from 11 rows ago (i.e., a year ago), the query returns the string 'RECESSION IMMINENT'. Otherwise, it returns 'NO RECESSION'.

In summary, this query uses window functions and conditional logic to analyze unemployment rate data and determine if a recession is imminent based on certain criteria.

### This is a great example of where AI can make you more productive.
### AI does not absolve us from knowing and understanding the technical details. Nor does it absolve us from mastering the business domain of our choice. It's still extremely important for humans to put pen to paper and practice writing code or learn a concept by doing.

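## This implementation approximates the version of the Sahm rule that is triggered when the unemployment rate goes *half a percentage point above its 12-month low*: it flags a month when the highest and lowest rates of the trailing 12 months differ by at least half a percentage point and the rate is higher than it was 11 months earlier.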

### Although the query results agree with the recessions, the starting and ending periods differ by a few months.
### Yet, this is a good tool in any economist's toolkit. 
### You can take this data and the query and improve on it.  
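
The condition described in the explanation above can be mirrored in a few lines of Python. This is only a sketch of the query's logic on a month-ordered list of rates; the function name and the sample values are made up.

```python
def recession_flags(rates, window=12, threshold=0.5):
    """Apply the query's IFF condition to a list of rates ordered by month."""
    flags = []
    for i, rate in enumerate(rates):
        # Equivalent of ROWS BETWEEN 11 PRECEDING AND CURRENT ROW.
        frame = rates[max(0, i - (window - 1)):i + 1]
        # Round to two decimals to avoid floating-point noise; the SQL
        # NUMBER(5,2) type is exact and needs no rounding.
        spread = round(max(frame) - min(frame), 2)
        # Equivalent of LAG(rate, 11): None when there is no such row.
        lagged = rates[i - (window - 1)] if i >= window - 1 else None
        rising = lagged is not None and rate > lagged
        if spread >= threshold and rising:
            flags.append("RECESSION IMMINENT")
        else:
            flags.append("NO RECESSION")
    return flags


# Made-up rates: eleven flat months, then a jump and a partial pullback.
sample = [5.0] * 11 + [5.2, 5.6, 5.4]
sample_flags = recession_flags(sample)
```

In this sample the final month (5.4) is flagged even though it is only 0.4 points above the 12-month low, because the condition compares the highest and lowest values in the window rather than the current value and the low. That is one place where the query could be tightened.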

# Definitions of Various SQL Functions Used. 
## Snowflake Copilot generated these definitions.

Here are the definitions of the SQL functions used in the query:

- MIN: Returns the minimum value of a specified column.
- MAX: Returns the maximum value of a specified column.
- LAG: Returns the value of a specified column from a previous row.
- IFF: Returns one of two values depending on whether a specified condition is true or false.
- TO_DATE: Converts a string to a date data type.
- FLATTEN: Extracts data from a semi-structured or nested data structure, such as a JSON object, and returns it as a set of rows.

For more information on window functions in Snowflake, you can refer to the [Window function syntax and usage](https://docs.snowflake.com/en/sql-reference/functions-window-syntax) documentation.

In [None]:
SELECT
    UNEMPLOYMENT_DATA_MONTH,
    UNEMPLOYMENT_DATA_YEAR,
    -- Windows are ordered by MONTH_YEAR (not just the year) so that each frame
    -- covers 12 consecutive months in a deterministic order.
    IFF(
        (MAX(UNEMPLOYMENT_RATE) OVER (ORDER BY MONTH_YEAR ASC ROWS BETWEEN 11 PRECEDING AND CURRENT ROW))
        - (MIN(UNEMPLOYMENT_RATE) OVER (ORDER BY MONTH_YEAR ASC ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)) >= 0.5
        AND (UNEMPLOYMENT_RATE - LAG(UNEMPLOYMENT_RATE, 11, NULL) OVER (ORDER BY MONTH_YEAR ASC)) > 0,
        'RECESSION IMMINENT',
        'NO RECESSION'
    ) IS_RECESSION,
    MONTH_YEAR,
    UNEMPLOYMENT_RATE,
    MIN(UNEMPLOYMENT_RATE) OVER (ORDER BY MONTH_YEAR ASC ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) MIN_UNEMPLOYMENT_RATE,
    MAX(UNEMPLOYMENT_RATE) OVER (ORDER BY MONTH_YEAR ASC ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) MAX_UNEMPLOYMENT_RATE
FROM
(SELECT
    ft.VALUE:"periodName"::STRING UNEMPLOYMENT_DATA_MONTH,
    ft.VALUE:"year"::NUMBER UNEMPLOYMENT_DATA_YEAR,
    ft.VALUE:"value"::NUMBER(5,2) UNEMPLOYMENT_RATE,
    TO_DATE(ft.VALUE:"periodName"::STRING || '-' || ft.VALUE:"year"::STRING, 'MON-YYYY') MONTH_YEAR
FROM 
    US_UNEMPLOYMENT_RATE_DATA_TBL UNER,
    LATERAL FLATTEN (input => parse_json(UNER.UNEMPLOYMENT_RATE_JSON:"data")) ft)
ORDER BY MONTH_YEAR ASC;

# Find the maximum unemployment rate in each year flagged by the query.

In [None]:
SELECT
    UNEMPLOYMENT_DATA_YEAR,
    MAX(UNEMPLOYMENT_RATE) MAX_UNEMPLOYMENT_RATE
FROM
(SELECT
    UNEMPLOYMENT_DATA_MONTH,
    UNEMPLOYMENT_DATA_YEAR,
    MONTH_YEAR,
    UNEMPLOYMENT_RATE,
    -- Windows are ordered by MONTH_YEAR so that each frame covers 12 consecutive months.
    MIN(UNEMPLOYMENT_RATE) OVER (ORDER BY MONTH_YEAR ASC ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) MIN_UNEMPLOYMENT_RATE,
    MAX(UNEMPLOYMENT_RATE) OVER (ORDER BY MONTH_YEAR ASC ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) MAX_UNEMPLOYMENT_RATE,
    IFF((MAX(UNEMPLOYMENT_RATE) OVER (ORDER BY MONTH_YEAR ASC ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)) - (MIN(UNEMPLOYMENT_RATE) OVER (ORDER BY MONTH_YEAR ASC ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)) >= 0.5 AND (UNEMPLOYMENT_RATE - LAG(UNEMPLOYMENT_RATE, 11, NULL) OVER (ORDER BY MONTH_YEAR ASC)) > 0, 'RECESSION IMMINENT', 'NO RECESSION' ) IS_RECESSION
FROM
(SELECT
    ft.VALUE:"periodName"::STRING UNEMPLOYMENT_DATA_MONTH,
    ft.VALUE:"year"::NUMBER UNEMPLOYMENT_DATA_YEAR,
    ft.VALUE:"value"::NUMBER(5,2) UNEMPLOYMENT_RATE,
    TO_DATE(ft.VALUE:"periodName"::STRING || '-' || ft.VALUE:"year"::STRING, 'MON-YYYY') MONTH_YEAR
FROM 
    US_UNEMPLOYMENT_RATE_DATA_TBL UNER,
    LATERAL FLATTEN (input => parse_json(UNER.UNEMPLOYMENT_RATE_JSON:"data")) ft
ORDER BY UNEMPLOYMENT_DATA_YEAR ASC))
WHERE
    IS_RECESSION = 'RECESSION IMMINENT'
GROUP BY UNEMPLOYMENT_DATA_YEAR
ORDER BY UNEMPLOYMENT_DATA_YEAR ASC;
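
The filter-then-group step of the query above can be pictured with this small Python sketch: keep only the flagged rows, then take the highest rate per calendar year. The function name and the input rows are made up for illustration.

```python
def max_rate_by_flagged_year(rows):
    """rows: iterable of (year, rate, flag) tuples; returns {year: peak rate}."""
    peaks = {}
    for year, rate, flag in rows:
        if flag != "RECESSION IMMINENT":
            continue  # same role as the WHERE clause
        # Same role as GROUP BY year with MAX(rate).
        peaks[year] = max(rate, peaks.get(year, rate))
    return dict(sorted(peaks.items()))  # same role as ORDER BY year


# Made-up rows for two years.
example_rows = [
    (2008, 5.0, "NO RECESSION"),
    (2008, 6.1, "RECESSION IMMINENT"),
    (2008, 7.3, "RECESSION IMMINENT"),
    (2009, 9.5, "RECESSION IMMINENT"),
    (2009, 10.0, "NO RECESSION"),
]
peaks_by_year = max_rate_by_flagged_year(example_rows)
```

Because the grouping is by calendar year, a recession that spans two years produces two rows, and a month that is not flagged never contributes to the maximum even if its rate is higher.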

### If you prefer, reset your database.
- Drop the stage.
- Drop the table.

In [None]:
DROP STAGE US_UNEMPLOYMENT_RAW_DATA_STAGE;

In [None]:
DROP TABLE US_UNEMPLOYMENT_RATE_DATA_TBL;