# ECON 326 Group Final Report
#### Group 14 - Bhavya Dubey, Kashie Ugoji, Ruiquin Wang, Zhuoying Sun

## <u>Introduction</u>

Financial markets have a large influence on our society. Around 63% of young adults (ages 18-34) **(remember to cite)** believe in building their wealth through the stock exchange, which shows how much of a focal point this part of the economy is for many people. More than ever, the ability to understand how the markets work, and which components might affect them, is becoming an invaluable skill. With so much information readily available, it can feel as though there is an overwhelming amount to learn before forming even a basic understanding of expected stock market performance. However, one of the first things people realize is that the collective sentiment of consumers and investors has a major influence on how markets perform.

Financial experts often advise people not to "panic" during economic uncertainty, as panic may itself become the origin of an economic downturn. But how true is this claim? What if panic and general sentiments of fear are simply reactions to existing economic instability caused by other factors, rather than indicators of a future downturn in themselves? In this research study, we explore this very topic: the predictive power that overall sentiment towards the economy can have on the stock market. More specifically, we want to see if **people's feelings of fear or uncertainty towards the economy can accurately predict stock market performance**.

## <u>Data Description</u>

To conduct this analysis, we need data that quantifies both the sentiment people have towards the economy and how stocks are performing.

### Fear Index Wrangling

We will quantify the **sentiment of "fear"** that people have towards the economy through **search-term data from Google Trends**. Google Trends reports relative popularity scores ranging from 0 to 100 that quantify how often a term was searched on Google platforms. A **popularity score of 100** represents the peak search volume for a term relative to its own history, and a **popularity score of 0** represents the lowest relative search volume. We will use the monthly popularity scores of the search terms **"recession"**, **"layoffs"** and **"stock crash"** from **January 2004** (the earliest date available in Google Trends) to **April 2025** as markers of fear. We chose these terms because they are common searches for people who are skeptical or worried about the economy. Using the `tidyverse` library, we will then build our own **"fear index"**: the **mean** of the three popularity scores in each month, which serves as our explanatory variable for the sentiment of fear towards the economy.
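
Written out, the index is a simple average of the three monthly popularity scores. The symbols $F_t$, $S_t$, $R_t$ and $L_t$ are our own shorthand (not Google Trends notation) for the fear index and the scores of "stock crash", "recession" and "layoffs" in month $t$:

$$F_t = \frac{S_t + R_t + L_t}{3}$$

Because each score lies between 0 and 100, $F_t$ also lies between 0 and 100. For example, scores of 24, 3 and 6 would give $F_t = (24 + 3 + 6)/3 = 11$.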

We first load the `tidyverse` library.

In [1]:
library(tidyverse)

“package ‘lubridate’ was built under R version 4.4.2”
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Now we can import our data from **Google Trends** using the files `layoffs.csv`, `recession.csv` and `stock_crash.csv`.

In [2]:
#Raw data for search-term "layoffs" (Jan 2004- April 2025)
layoffs_untidy <- read_csv("/home/jupyter/ECON-326-Final-Project/layoffs.csv", 
                    skip = 2)
#Raw data for search-term "recession" (Jan 2004- April 2025)
recession_untidy <- read_csv("/home/jupyter/ECON-326-Final-Project/recession.csv", 
                      skip = 2) 
#Raw data for search-term "stock crash" (Jan 2004- April 2025)
stock_crash_untidy <- read_csv("/home/jupyter/ECON-326-Final-Project/stock_crash.csv", 
                        skip = 2)

#Cleaning search-term data so that it is in a suitable format for future modelling 
stock_crash_tidy <- stock_crash_untidy |>
  mutate(term_date1 = ym(Month)) |>
  select(term_date1, `stock crash: (Canada)`) 

recession_tidy <- recession_untidy |>
  mutate(term_date2 = ym(Month)) |>
  select(term_date2, `recession: (Canada)`)

layoffs_tidy <- layoffs_untidy |>
  mutate(term_date3 = ym(Month)) |>
  select(term_date3, `layoffs: (Canada)`) 


Rows: 256 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Month
dbl (1): layoffs: (Canada)

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 256 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Month
dbl (1): recession: (Canada)

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 256 Columns: 2
── Column specification ───────────────────────────────

With the cleaned data, we can build our **fear index**, stored in the variable `fear_value`.

In [3]:
#Creating fear index
fear_index <- cbind(stock_crash_tidy, recession_tidy, layoffs_tidy) |>
  mutate(term_date = term_date1) |>
  select(term_date, `stock crash: (Canada)`, 
         `recession: (Canada)`,
         `layoffs: (Canada)`) |>
  mutate(fear_value = (`stock crash: (Canada)` + `recession: (Canada)` +
                         `layoffs: (Canada)`) / 3) 
head(fear_index)

| | term_date | stock crash: (Canada) | recession: (Canada) | layoffs: (Canada) | fear_value |
|---|---|---|---|---|---|
| | `<date>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 2004-01-01 | 24 | 3 | 6 | 11.0 |
| 2 | 2004-02-01 | 14 | 0 | 8 | 7.333333 |
| 3 | 2004-03-01 | 23 | 0 | 8 | 10.333333 |
| 4 | 2004-04-01 | 36 | 4 | 8 | 16.0 |
| 5 | 2004-05-01 | 28 | 3 | 5 | 12.0 |
| 6 | 2004-06-01 | 18 | 0 | 0 | 6.0 |
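
`cbind()` places the three tables side by side and therefore relies on all of them having the same rows in the same order. As a hedged alternative, the sketch below matches rows on the date column with `inner_join()` instead. The toy data frames and the column names `term_date`, `stock_crash`, `recession` and `layoffs` are hypothetical stand-ins for illustration, not the report's actual objects.

```r
library(dplyr)

# Hypothetical toy data standing in for the three tidied Google Trends tables
stock_crash_toy <- data.frame(
  term_date   = as.Date(c("2004-01-01", "2004-02-01")),
  stock_crash = c(24, 14)
)
recession_toy <- data.frame(
  term_date = as.Date(c("2004-01-01", "2004-02-01")),
  recession = c(3, 0)
)
# Rows deliberately stored in a different order to show that the join
# still matches each score to the correct month
layoffs_toy <- data.frame(
  term_date = as.Date(c("2004-02-01", "2004-01-01")),
  layoffs   = c(8, 6)
)

# Match rows on the shared date column, then average the three scores
fear_index_toy <- stock_crash_toy |>
  inner_join(recession_toy, by = "term_date") |>
  inner_join(layoffs_toy, by = "term_date") |>
  mutate(fear_value = (stock_crash + recession + layoffs) / 3)

fear_index_toy
```

Joining on the date keeps each score attached to its own month even if one file is sorted differently or is missing a row, at the cost of requiring a common date column name across the tables.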


To interpret the values of `fear_value`, we use `quantile` to look at the percentiles of the data.

In [4]:
fear_metrics <- quantile(fear_index$fear_value, 
                         probs = c(0.25, 0.5, 0.75, 0.90, 0.95, 0.99)) 
fear_metrics

Based on the data, we interpret the **fear index** as follows: 

- **0-20**  is a low fear value
- **20-40** is a moderate fear value
- **40-50** is a high fear value
- **50+**   is an extremely high fear value
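
The bands above can be attached to each observation as a categorical label. The following is a minimal sketch using base R's `cut()`; the input vector is hypothetical, and treating each band as including its lower bound and excluding its upper bound (so that exactly 20 counts as moderate) is our assumption, since the list above does not say which band a boundary value belongs to.

```r
# Hypothetical fear values, one from each band plus an extra low value
fear_values <- c(6, 11, 25, 45, 62)

# right = FALSE makes the intervals [0, 20), [20, 40), [40, 50), [50, Inf)
fear_level <- cut(
  fear_values,
  breaks = c(0, 20, 40, 50, Inf),
  labels = c("low", "moderate", "high", "extremely high"),
  right  = FALSE
)

data.frame(fear_values, fear_level)
```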


### Stock Performance Wrangling

We will quantify **stock performance** by looking at the **growth rate** of the overall stock market. In this study we use the **S&P/TSX Composite Index** as our benchmark for the overall stock market, tracking its **monthly** growth to see how the market is performing. We obtain the data from **Yahoo Finance** through the `quantmod` library, which downloads time-series price data for a given ticker symbol. We then use the `tidyverse` library to further wrangle and clean the data.

First let's load the `quantmod` library

In [5]:
library(quantmod)

Loading required package: xts

Loading required package: zoo


Attaching package: ‘zoo’


The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric



#                                                                             #
# The dplyr lag() function breaks how base R's lag() function is supposed to  #
# work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
# source() into this session won't work correctly.                            #
#                                                                             #
# Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
# conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
# dplyr from breaking base R's lag() function.                                #
#                                                                             #
# Code in packages is not affected. It's protected by R's namespace mechanism #
#                      

Now we import our stock performance data, keeping it in **monthly** increments and using **closing** values

In [6]:
#Raw data
getSymbols("^GSPTSE", src = "yahoo", from = "2004-01-01", to = "2025-04-02")
stock_monthly_untidy <- to.monthly(GSPTSE, indexAt = "firstof", OHLC = TRUE) 

#Cleaning data 
stock_monthly_tidy <- fortify.zoo(stock_monthly_untidy) |>
  mutate(Date = Index) |>
  select(Date, GSPTSE.Close)
head(stock_monthly_tidy)

“^GSPTSE contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.”


“missing values removed from data”


| | Date | GSPTSE.Close |
|---|---|---|
| | `<date>` | `<dbl>` |
| 1 | 2004-01-01 | 8521.4 |
| 2 | 2004-02-01 | 8788.5 |
| 3 | 2004-03-01 | 8585.9 |
| 4 | 2004-04-01 | 8244.0 |
| 5 | 2004-05-01 | 8417.3 |
| 6 | 2004-06-01 | 8545.6 |


We convert these price values into **growth rates** so we can see how much the stock price is changing each month
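
The conversion is the standard month-over-month percentage change. Using our own shorthand, $P_t$ for the closing value in month $t$ and $g_t$ for the growth rate:

$$g_t = \frac{P_t - P_{t-1}}{P_{t-1}} \times 100$$

For example, with a January 2004 close of 8521.4 and a February 2004 close of 8788.5, $g_t = (8788.5 - 8521.4)/8521.4 \times 100 \approx 3.13$. The first month in the sample has no previous close, so its growth rate is undefined.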

In [7]:
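#Growth rates in percentages
#dplyr::lag() is called explicitly because loading quantmod warns about lag() conflicts
stock_monthly_growth <- stock_monthly_tidy |>
  mutate(stock_growth = 
           ((GSPTSE.Close - dplyr::lag(GSPTSE.Close)) / dplyr::lag(GSPTSE.Close)) * 100)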
head(stock_monthly_growth) 

| | Date | GSPTSE.Close | stock_growth |
|---|---|---|---|
| | `<date>` | `<dbl>` | `<dbl>` |
| 1 | 2004-01-01 | 8521.4 | NA |
| 2 | 2004-02-01 | 8788.5 | 3.134457 |
| 3 | 2004-03-01 | 8585.9 | -2.305281 |
| 4 | 2004-04-01 | 8244.0 | -3.982115 |
| 5 | 2004-05-01 | 8417.3 | 2.102133 |
| 6 | 2004-06-01 | 8545.6 | 1.524239 |


### Combining The Two Variables 

Now that we have data for both variables, we combine them in a single dataset, **lagging** `fear_value` by **1 month**, so that each month's stock growth is paired with the previous month's fear index. This is intended to reduce the risk of **reverse causality**, so that we can see whether `fear_value` is **leading** changes in stock market performance rather than the other way around. 

In [8]:
#Combine fear index with stock performance data, lag the values by 1 month
stock_fear_data_untidy <- cbind(stock_monthly_growth, fear_index) |>
  select(Date, stock_growth, fear_value) |>
  filter(!is.na(stock_growth)) |>
  mutate(lagged_fear_value = dplyr::lag(fear_value, n = 1))  

#Clean the Data By Removing NAs 
stock_fear_data_tidy <- stock_fear_data_untidy |>
  filter(!is.na(lagged_fear_value)) |>
  select(Date, stock_growth, lagged_fear_value) 
head(stock_fear_data_tidy) 

| | Date | stock_growth | lagged_fear_value |
|---|---|---|---|
| | `<date>` | `<dbl>` | `<dbl>` |
| 1 | 2004-03-01 | -2.3052809 | 7.333333 |
| 2 | 2004-04-01 | -3.9821146 | 10.333333 |
| 3 | 2004-05-01 | 2.1021325 | 16.0 |
| 4 | 2004-06-01 | 1.5242395 | 12.0 |
| 5 | 2004-07-01 | -1.0239188 | 6.0 |
| 6 | 2004-08-01 | -0.9588396 | 0.0 |
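
To make the one-month lag concrete, here is a small sketch on a toy data frame. The values are hypothetical and `toy` is not one of the report's objects. It shows how `dplyr::lag()` shifts a column down by one row, leaving the first row without a lagged value.

```r
library(dplyr)

# Hypothetical three-month sample
toy <- data.frame(
  month      = as.Date(c("2004-02-01", "2004-03-01", "2004-04-01")),
  fear_value = c(7.33, 10.33, 16)
)

# Each row receives the fear value of the row before it;
# the first row has no predecessor, so its lagged value is NA
toy_lagged <- toy |>
  mutate(lagged_fear_value = dplyr::lag(fear_value, n = 1))

toy_lagged
```

In this toy example the March row carries February's fear value and the April row carries March's, which is the alignment needed to ask whether last month's fear predicts this month's stock growth.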


## <u>Summary Statistics</u>

## <u>Model Specification</u>

## <u>Table of Results</u>

## <u>Discussion</u>

## <u>Specification Check</u>

## <u>Robustness Analysis</u>

## <u>Conclusion</u>

## <u>References</u>