# **Predicting Stock Price Movements using VIX, Sentiment, and ARIMA-GARCH Predictions**

## **1. Objective**
The goal of this experiment is to predict **extreme stock price movements (drops/surges)** using a combination of:
- **VIX (Volatility Index)**
- **Stock-Specific Sentiment Analysis**
- **Historical Stock Prices**

We aim to develop a **7-day-ahead prediction model** that can be used to assess short-term risk in individual S&P 500 stocks.

---

## **2. Data Features & Structure**

### **2.1 Training Dataset (Using Observed VIX Data)**
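For each date **t** and ticker, we include the following (VIX is a market-wide index, so every ticker on a given date shares the same VIX values):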
- **VIX Information:**
  - `VIX (t)`: Current VIX value
  - `VIX (t - 7)`: VIX value 7 days ago
  - `VIX (t + 7) (Observed)`: Actual VIX value 7 days in the future (only used in training)

- **Sentiment Data:**
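  - `Positive / Negative / Neutral Score (t)`: Score for each sentiment class on date t (each between 0 and 1; they sum to 1 in the examples below)
  - `Highest Sentiment Label (t)`: The class with the largest score (Positive, Neutral, or Negative)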

- **Stock Price Data:**
  - `Lagged Price (t - 7)`: Closing price 7 days ago
  - `Percent Change (t - 7 to t)`: Percentage price change over the last 7 days, (Price(t) - Price(t - 7)) / Price(t - 7)

- **Target Variable:**
  - `Target (t + 7)`: 1 if the price drops or surges by more than a defined threshold between t and t + 7, 0 otherwise
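The feature and target definitions above can be sketched as a small function that walks one ticker's aligned price and VIX series. This is a minimal sketch under stated assumptions, not the project's pipeline: it assumes one observation per day with `closes[i]` and `vix[i]` on the same day, treats "7 days" as 7 observations, and uses a hypothetical 5% `threshold` because the document does not fix its value. The function and field names are illustrative.

```python
def build_rows(closes, vix, horizon=7, threshold=0.05):
    """Build Section 2.1-style feature rows for one ticker.

    Assumptions (not specified in this document): closes[i] and vix[i]
    belong to the same day, "7 days" means 7 observations, and `threshold`
    is a hypothetical stand-in for the defined threshold.
    """
    rows = []
    # Each t needs `horizon` observations of history and of future.
    for t in range(horizon, len(closes) - horizon):
        lagged_price = closes[t - horizon]
        pct_change = (closes[t] - lagged_price) / lagged_price
        future_change = (closes[t + horizon] - closes[t]) / closes[t]
        rows.append({
            "vix_t": vix[t],
            "vix_t_minus_7": vix[t - horizon],
            "vix_t_plus_7_observed": vix[t + horizon],  # training rows only
            "lagged_price_t_minus_7": lagged_price,
            "pct_change_t_minus_7_to_t": pct_change,
            # 1 for a drop or surge beyond the threshold; a drop-only
            # target would instead test future_change <= -threshold.
            "target_t_plus_7": int(abs(future_change) >= threshold),
        })
    return rows
```

Because `vix_t_plus_7_observed` is read from the future, rows built this way are only valid for training; at test time that field must come from the stored forecast instead.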

| Date (t) | Ticker | VIX (t) | VIX (t - 7) | **VIX (t + 7) (Observed)** | Positive Score | Negative Score | Neutral Score | Highest Sentiment Label | Lagged Price (t - 7) | % Change (t - 7 to t) | Target (t + 7) |
|----------|--------|---------|-------------|----------------------------|----------------|----------------|---------------|-------------------------|----------------------|-----------------------|----------------|
| 01/01/22 | AAPL | 20 | 22 | 25 | 0.2 | 0.5 | 0.3 | Negative | $150 | -2% | 1 (drop) |
| 01/01/22 | TSLA | 20 | 22 | 25 | 0.5 | 0.3 | 0.2 | Positive | $1,000 | +3% | 0 (no drop) |
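In the tables, the Highest Sentiment Label is the class with the largest of the three scores. A minimal sketch of that rule follows; the function name is hypothetical, and the tie-breaking order (Positive, then Negative, then Neutral) is an arbitrary assumption since the document does not say how ties are handled.

```python
def highest_sentiment_label(positive, negative, neutral):
    """Return the label whose score is largest.

    Ties go to the first label in insertion order (Positive, Negative,
    Neutral); this tie rule is an assumption, not part of the spec.
    """
    scores = {"Positive": positive, "Negative": negative, "Neutral": neutral}
    # max() returns the first key with the maximal score.
    return max(scores, key=scores.get)
```

For the AAPL row above, `highest_sentiment_label(0.2, 0.5, 0.3)` returns `"Negative"`.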

---

### **2.2 Testing Dataset (Using Predicted VIX from ARIMA-GARCH)**
For future predictions, we **replace the observed VIX(t+7) with a value predicted by the ARIMA-GARCH model**, because the observed value is not yet available when the prediction is made.

| Date (t) | Ticker | VIX (t) | VIX (t - 7) | **VIX (t + 7) (Predicted)** | Positive Score | Negative Score | Neutral Score | Highest Sentiment Label | Lagged Price (t - 7) | % Change (t - 7 to t) | Target (t + 7) |
|----------|--------|---------|-------------|-----------------------------|----------------|----------------|---------------|-------------------------|----------------------|-----------------------|----------------|
| 01/02/25 | AAPL | 18 | 20 | **23** | 0.2 | 0.2 | 0.6 | Neutral | $150 | -2% | 0 (no drop) |
| 01/02/25 | TSLA | 18 | 20 | **23** | 0.1 | 0.8 | 0.1 | Negative | $950 | +3% | 1 (drop) |
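Turning a training-style row into a test row amounts to dropping the realized VIX(t+7) and looking up the stored forecast for that date. The sketch below assumes, hypothetically, that forecasts are available as a dict keyed by ISO date string (in the workflow they would be read from DuckDB) and that each row carries a `"date"` field; failing loudly on a missing forecast avoids silently falling back to observed data.

```python
def to_test_row(row, vix_predictions):
    """Return a copy of `row` with VIX(t+7) taken from stored predictions.

    `vix_predictions` is a hypothetical mapping from a date string to the
    ARIMA-GARCH forecast of VIX seven days after that date.
    """
    test_row = dict(row)  # leave the caller's row untouched
    # Never let the realized future VIX leak into a test row.
    test_row.pop("vix_t_plus_7_observed", None)
    date = row["date"]
    if date not in vix_predictions:
        raise KeyError(f"no VIX(t+7) prediction stored for {date}")
    test_row["vix_t_plus_7_predicted"] = vix_predictions[date]
    return test_row
```

Since VIX is market-wide, one forecast per date serves every ticker on that date.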

---

## **3. Data Splitting Strategy**
To evaluate the model without look-ahead bias, we **split the dataset chronologically** into:
1. **Training Set (2015 - 2022)**
   - Uses **observed VIX(t+7)** for model learning.
   
2. **Validation Set (2023)**
   - Used for hyperparameter tuning.
   
3. **Testing Set (2024 - 2025)**
   - **Uses precomputed VIX predictions for t+7**, not observed values.
   - This mirrors **live use**, where VIX(t+7) is not yet known when the prediction is made.
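The split above is by calendar year, with no shuffling. A minimal sketch, assuming each row carries a hypothetical ISO date string under `"date"`:

```python
from datetime import date


def split_by_year(rows):
    """Assign rows to the Section 3 splits by the calendar year of row["date"].

    Rows are assumed (hypothetically) to carry ISO date strings such as
    "2023-06-01". The split is purely chronological; nothing is shuffled.
    """
    splits = {"train": [], "validation": [], "test": []}
    for row in rows:
        year = date.fromisoformat(row["date"]).year
        if 2015 <= year <= 2022:
            splits["train"].append(row)
        elif year == 2023:
            splits["validation"].append(row)
        elif 2024 <= year <= 2025:
            splits["test"].append(row)
        # Rows outside 2015-2025 are left out of every split.
    return splits
```

One design point to consider: a row dated late in December 2022 has a target that depends on prices from early January 2023, so its label overlaps the validation period. Dropping the last 7 days before each boundary (an "embargo") is one common way to remove that overlap.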

---

## **4. Model Workflow**
1. **Train Initial ARIMA-GARCH Model (R)**
   - Fit ARIMA-GARCH using **VIX data from 01/01/2022 - 08/01/2024**.
   - Predict **VIX for 08/08/2024** and store it in **DuckDB**.

2. **Run Rolling Retraining for Future Predictions**
   - Each day, **refit the ARIMA-GARCH model on the data available up to that day**.
   - Predict **VIX for t+7**.
   - Store **VIX predictions** in **DuckDB**.

3. **Prepare Data in DuckDB**
   - Store **training and testing datasets** in DuckDB for easy retrieval.
   
4. **Train a Machine Learning Model**
   - Input: VIX, sentiment, and price features.
   - Output: Binary classification (drop/no drop).
   
5. **Evaluate Performance**
   - Compare classification performance when the model uses **observed vs. predicted VIX(t+7)**, looking beyond accuracy because extreme moves are rare.
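Steps 1 and 2 each end with a 7-step-ahead VIX forecast. The fitting itself happens in R (for example with the rugarch package's `ugarchfit` and `ugarchforecast`); the sketch below only shows what the forecast recursion looks like once parameters are known. It assumes, hypothetically, an AR(1) mean with GARCH(1,1) errors on the VIX level; the actual model orders are not stated in this document, and an ARIMA mean with differencing would need a different mean recursion.

```python
def ar1_garch11_forecast(x_t, eps_t, sigma2_t, mu, phi, omega, alpha, beta, horizon=7):
    """h-step-ahead forecast from an AR(1)-GARCH(1,1) model with known parameters.

    Mean:      x_{t+1} = mu + phi * (x_t - mu) + eps_{t+1}
    Variance:  sigma2_{t+1} = omega + alpha * eps_t**2 + beta * sigma2_t
    Returns (point forecast of x_{t+h}, forecast-error variance of x_{t+h}).
    """
    # Conditional variance path sigma2_{t+1|t}, ..., sigma2_{t+h|t}.
    sigma2_path = [omega + alpha * eps_t**2 + beta * sigma2_t]
    for _ in range(horizon - 1):
        # Beyond one step, E[eps^2] is replaced by its own forecast.
        sigma2_path.append(omega + (alpha + beta) * sigma2_path[-1])
    mean = mu + phi**horizon * (x_t - mu)
    # x_{t+h} - mean = sum_j phi^(h-j) * eps_{t+j}, with uncorrelated shocks.
    err_var = sum(
        phi ** (2 * (horizon - j)) * s2
        for j, s2 in enumerate(sigma2_path, start=1)
    )
    return mean, err_var
```

The horizon counts observations. If the VIX series contains only trading days, the 08/01/2024 to 08/08/2024 example spans 5 steps rather than 7, so the horizon should match how the series is indexed.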

---

## **5. Key Questions for Validation**
- Does **VIX (t-7)** provide additional predictive power?
- How does **sentiment influence stock-specific movements**?
- Does the model perform well when using **predicted VIX instead of observed VIX**?
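The last question can be answered by scoring the same classifier twice on the same labels: once fed observed VIX(t+7) and once fed the predicted value. Because extreme moves are rare, a model that always predicts "no move" can still score high accuracy, so the sketch below also reports precision, recall, and F1. The function names are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = extreme move)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def compare_vix_sources(y_true, pred_with_observed_vix, pred_with_predicted_vix):
    """Score one classifier fed observed vs. ARIMA-GARCH-predicted VIX(t+7)."""
    return {
        "observed_vix": binary_metrics(y_true, pred_with_observed_vix),
        "predicted_vix": binary_metrics(y_true, pred_with_predicted_vix),
    }
```

The gap between the two result sets estimates how much performance is lost by relying on the VIX forecast instead of the realized value.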

---

## **6. Next Steps**
- Store **training dataset in DuckDB**.
- Train & test a **classification model (Logistic Regression, XGBoost, or LSTM)**.
- Validate **performance against unseen data**.
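Of the candidate classifiers, logistic regression is the simplest baseline. The sketch below fits it with plain batch gradient descent on the mean log-loss, using only the standard library, to show what the model computes; the learning rate and epoch count are hypothetical. In practice a library implementation such as scikit-learn's `LogisticRegression` (whose `class_weight="balanced"` option helps with rare positive labels) would be the usual choice, with features standardized first.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, cutoff=0.5):
    """Return 1 (extreme move) when the predicted probability is >= cutoff."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= cutoff)
        for xi in X
    ]
```

Lowering `cutoff` trades precision for recall, which may matter more when missing an extreme move is costlier than a false alarm.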

---

### **End of Documentation**