# A Sector-Conditioned Relative Potential Strategy  
## Canadian Basic Materials — A Literate Jupyter Notebook

---

## Overview

**Objective:**  
Identify Canadian Basic Materials stocks with unrealized upside **conditional on the sector being expected to rise**.

**Strategy logic:**

1. Forecast next-day prices with an **LSTM**
2. Infer **sector direction** from the cross-sectional average of the forecast returns
3. Select **lagging stocks** whose forecast return is below the sector average, on days when that average is positive

This notebook interleaves explanation with runnable code.

---

## 1. Imports and Environment Setup


In [1]:
import numpy as np
import pandas as pd
import yfinance as yf
from yfinance import EquityQuery

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader


## 2. Universe Definition: Canadian Basic Materials

We define the investment universe using Yahoo Finance’s screener:
- Region: Canada
- Sector: Basic Materials
- Exchanges: TSX, TSX-V, NEO, CSE


In [2]:
def get_ca_basic_materials_symbols(
    exchanges=("TOR","VAN","CNQ","NEO"),
    page_size=250,
    max_pages=50
):
    q = EquityQuery("and", [
        EquityQuery("eq", ["region", "ca"]),
        EquityQuery("eq", ["sector", "Basic Materials"]),
        EquityQuery("is-in", ["exchange", *exchanges]),
    ])

    symbols, offset = [], 0
    for _ in range(max_pages):
        resp = yf.screen(q, size=page_size, offset=offset)
        quotes = resp.get("quotes", []) or resp.get("finance", {}).get("result", [{}])[0].get("quotes", [])
        if not quotes:
            break
        symbols.extend([row["symbol"] for row in quotes if row.get("symbol")])
        offset += page_size

    return list(dict.fromkeys(symbols))

symbols = get_ca_basic_materials_symbols()
print(f"Universe size: {len(symbols)}")
symbols[:10]


Universe size: 1385


['ZON.V',
 'ZNX.V',
 'ZNG.V',
 'ZLTO.V',
 'ZIGY.CN',
 'ZFR.V',
 'ZEUS.CN',
 'ZCC-H.V',
 'ZBNI.V',
 'ZAU.V']
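
The paging loop above advances `offset` by `page_size` until a page comes back empty, then removes duplicates while keeping the original order. A minimal offline sketch of that pattern follows. `fake_screen` and its data are hypothetical stand-ins for a paged screener, not part of yfinance.

```python
def fake_screen(offset, size):
    """Hypothetical paged screener: returns up to `size` rows from `offset`."""
    data = ["AAA.V", "BBB.V", "AAA.V", "CCC.CN", "DDD.TO"]  # note the duplicate
    return [{"symbol": s} for s in data[offset:offset + size]]


def collect_symbols(page_size=2, max_pages=50):
    symbols, offset = [], 0
    for _ in range(max_pages):
        rows = fake_screen(offset, page_size)
        if not rows:  # an empty page means we ran past the last result
            break
        symbols.extend(row["symbol"] for row in rows if row.get("symbol"))
        offset += page_size
    # dict.fromkeys keeps the first occurrence of each symbol, in order
    return list(dict.fromkeys(symbols))


toy_symbols = collect_symbols()
print(toy_symbols)
```

The `max_pages` cap bounds the loop even if the data source never returns an empty page.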

## 3. Download One Year of Price Data

We download adjusted close prices (`auto_adjust=True`) for the past year and reshape them into
a long-format DataFrame with one row per date and ticker.


In [3]:
start = (pd.Timestamp.today() - pd.DateOffset(months=12)).strftime("%Y-%m-%d")
end   = pd.Timestamp.today().strftime("%Y-%m-%d")

raw = yf.download(
    tickers=symbols,
    start=start,
    end=end,
    auto_adjust=True,
    group_by="column",
    progress=False
)


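def to_prices_long(raw_df):
    """Reshape yfinance output into long format: one row per (Date, ticker)."""
    if isinstance(raw_df.columns, pd.MultiIndex):
        # Multi-ticker download: columns are (field, ticker) pairs.
        close = raw_df["Close"]
    else:
        # Flat columns (single ticker): the melted "ticker" value will be "Close".
        close = raw_df[["Close"]]

    return (
        close.rename_axis(index="Date")  # name the index without mutating raw_df
             .reset_index()
             .melt(id_vars="Date", var_name="ticker", value_name="Close")
             .dropna()
    )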

prices_long = to_prices_long(raw)
prices_long.head()


| | Date | ticker | Close |
|---|---|---|---|
| 242 | 2026-01-20 | AAG.V | 1.22 |
| 243 | 2026-01-21 | AAG.V | 1.20 |
| 244 | 2026-01-22 | AAG.V | 1.29 |
| 245 | 2026-01-23 | AAG.V | 1.31 |
| 246 | 2026-01-26 | AAG.V | 1.24 |
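
To make the wide-to-long reshape concrete, here is a small sketch on made-up data. The two-level column layout (field first, ticker second) is an assumption about what a multi-ticker download looks like. The tickers and prices are invented.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2025-01-01", periods=3, freq="D", name="Date")
cols = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA.V", "BBB.V"]])
toy_raw = pd.DataFrame(
    [[1.0, 10.0, 100, 200],
     [1.1, np.nan, 110, 210],   # BBB.V has no close on this date
     [1.2, 10.4, 120, 220]],
    index=dates, columns=cols,
)

toy_long = (
    toy_raw["Close"]            # keep only the Close field: one column per ticker
    .reset_index()              # move Date from the index into a column
    .melt(id_vars="Date", var_name="ticker", value_name="Close")
    .dropna()                   # drop dates where a ticker has no price
)
print(toy_long)
```

Six wide cells become five long rows because the missing BBB.V price is dropped.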


## 4. Modeling Philosophy

We forecast **prices**, not returns, using an LSTM.  
Cross-sectional logic is applied **after** forecasting.

The model learns temporal structure; relative value logic handles selection.

## 5. LSTM Architecture

An LSTM processes sequential price data and maintains an internal memory,
which in principle lets it pick up momentum, regime, and mean-reversion patterns.


In [4]:
class LSTM1D(nn.Module):
    def __init__(self, hidden=64, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=1,
            hidden_size=hidden,
            num_layers=layers,
            batch_first=True,
            dropout=0.2
        )
        self.fc = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1]).squeeze(-1)
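
A quick shape check helps confirm what the network consumes and returns. The sketch below uses a smaller, hypothetical `TinyLSTM` with the same layout and feeds it random numbers in place of scaled prices.

```python
import torch
import torch.nn as nn


class TinyLSTM(nn.Module):
    """Same layout as the model above: stacked LSTM plus a linear head."""

    def __init__(self, hidden=8, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=layers, batch_first=True, dropout=0.2)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)                    # out: (batch, seq_len, hidden)
        return self.fc(out[:, -1]).squeeze(-1)   # last time step -> (batch,)


torch.manual_seed(0)
toy_model = TinyLSTM().eval()     # eval() switches off the inter-layer dropout
x = torch.randn(4, 30, 1)         # 4 windows, 30 steps, 1 feature
with torch.no_grad():
    y_a = toy_model(x)
    y_b = toy_model(x)
print(y_a.shape)
```

In eval mode the two forward passes agree exactly. In training mode dropout would make them differ.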


## 6. Sequence Construction

Each observation uses the **previous 30 trading days** to predict the next day.


In [7]:
SEQ_LEN = 30
HORIZON = 1

def make_sequences(df):
    X, y, dates, tickers = [], [], [], []

    for tkr, g in df.groupby("ticker"):
        g = g.sort_values("Date")
        s = g["Close_s"].values
        d = g["Date"].values

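        if len(s) < SEQ_LEN + HORIZON:
            continue  # not enough history for a single window

        # The target sits HORIZON steps after the window ends (HORIZON=1: next day).
        for i in range(len(s) - SEQ_LEN - HORIZON + 1):
            j = i + SEQ_LEN + HORIZON - 1
            X.append(s[i:i+SEQ_LEN])
            y.append(s[j])
            dates.append(d[j])
            tickers.append(tkr)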

    return (
        np.array(X)[..., None].astype(np.float32),
        np.array(y).astype(np.float32),
        np.array(dates),
        np.array(tickers),
    )
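
The windowing is easiest to see on a short series. The helper below, `sliding_windows`, is a hypothetical standalone version of the same idea, applied to the integers 0 to 9 with a window of 3.

```python
import numpy as np


def sliding_windows(series, seq_len, horizon=1):
    """Return (X, y): each row of X holds `seq_len` consecutive values and
    y is the value `horizon` steps after that window ends."""
    series = np.asarray(series, dtype=np.float32)
    n = len(series) - seq_len - horizon + 1
    if n <= 0:
        return np.empty((0, seq_len, 1), np.float32), np.empty(0, np.float32)
    X = np.stack([series[i:i + seq_len] for i in range(n)])
    first_target = seq_len + horizon - 1
    y = series[first_target:first_target + n]
    return X[..., None], y      # trailing axis = one feature per time step


X_toy, y_toy = sliding_windows(np.arange(10), seq_len=3)
print(X_toy.shape, y_toy.shape)
print(X_toy[0].ravel(), y_toy[0])
```

Ten observations with a window of 3 yield 7 samples. The first window `[0, 1, 2]` is paired with target `3`.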


## 7. Time-Based Train / Test Split

We train on ~11 months and evaluate on the most recent month.


In [8]:
prices_long["Date"] = pd.to_datetime(prices_long["Date"])
cutoff = prices_long["Date"].max() - pd.DateOffset(months=1)

train_df = prices_long[prices_long["Date"] <= cutoff]
test_df  = prices_long[prices_long["Date"] >  cutoff]
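
As a sanity check on the split logic, the sketch below applies the same cutoff rule to a made-up calendar-year series and confirms that the two sets do not overlap in time.

```python
import pandas as pd

toy = pd.DataFrame({"Date": pd.date_range("2025-01-01", "2025-12-31", freq="D")})
toy["Close"] = range(len(toy))   # placeholder prices

toy_cutoff = toy["Date"].max() - pd.DateOffset(months=1)
toy_train = toy[toy["Date"] <= toy_cutoff]
toy_test = toy[toy["Date"] > toy_cutoff]

print(toy_cutoff.date(), len(toy_train), len(toy_test))
```

Every training date precedes every test date, so the test month is never seen during fitting.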


## 8. Scaling (Per Ticker, Train Only)

Each ticker is standardized (z-scored) with the mean and standard deviation of its **training data only**, so no test-period information leaks into the inputs.


In [9]:
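scalers = train_df.groupby("ticker")["Close"].agg(["mean", "std"])
# Guard against division by zero for flat series; leave the mean untouched.
scalers["std"] = scalers["std"].replace(0, 1)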

def scale(df):
    df = df.join(scalers, on="ticker")
    df["Close_s"] = (df["Close"] - df["mean"]) / df["std"]
    return df.dropna()

train_s = scale(train_df)
test_s  = scale(test_df)
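
The sketch below walks through train-only scaling on made-up numbers, including two edge cases: a flat series with zero standard deviation and a ticker that appears only in the test set. The names `stats` and `zscore` are illustrative.

```python
import pandas as pd

toy_train = pd.DataFrame({
    "ticker": ["A", "A", "A", "B", "B", "B"],
    "Close": [1.0, 2.0, 3.0, 5.0, 5.0, 5.0],    # B is flat: std == 0
})
toy_test = pd.DataFrame({"ticker": ["A", "B", "C"], "Close": [4.0, 6.0, 9.0]})

stats = toy_train.groupby("ticker")["Close"].agg(["mean", "std"])
stats["std"] = stats["std"].replace(0, 1)       # avoid division by zero


def zscore(df, stats):
    out = df.join(stats, on="ticker")
    out["Close_s"] = (out["Close"] - out["mean"]) / out["std"]
    return out.dropna()                         # drops tickers unseen in training


scaled = zscore(toy_test, stats)
# Inverting the transform recovers the original prices.
restored = scaled["Close_s"] * scaled["std"] + scaled["mean"]
print(scaled[["ticker", "Close_s"]])
```

Ticker C is dropped because it has no training statistics. The same inverse transform is what turns scaled model outputs back into prices.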


## 9. Model Training


In [10]:
X_train, y_train, *_ = make_sequences(train_s)

model = LSTM1D()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

loader = DataLoader(
    list(zip(torch.tensor(X_train), torch.tensor(y_train))),
    batch_size=64,
    shuffle=True
)

for epoch in range(10):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
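
The loop above does not report progress. One possible variant, sketched here on a synthetic noisy sine wave rather than real prices, records the average training loss per epoch. `ToyLSTM`, the data, and the hyperparameters are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for scaled price windows: a noisy sine wave.
t = torch.linspace(0, 20, 260)
series = torch.sin(t) + 0.05 * torch.randn_like(t)
X_toy = torch.stack([series[i:i + 30] for i in range(len(series) - 30)]).unsqueeze(-1)
y_toy = series[30:]              # value right after each 30-step window


class ToyLSTM(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1]).squeeze(-1)


toy_model = ToyLSTM()
opt = torch.optim.Adam(toy_model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
loader = DataLoader(TensorDataset(X_toy, y_toy), batch_size=32, shuffle=True)

epoch_losses = []
for epoch in range(5):
    toy_model.train()
    total, n = 0.0, 0
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(toy_model(xb), yb)
        loss.backward()
        opt.step()
        total += loss.item() * len(xb)   # weight by batch size
        n += len(xb)
    epoch_losses.append(total / n)
    print(f"epoch {epoch}: train MSE {epoch_losses[-1]:.4f}")
```

`TensorDataset` pairs inputs and targets without building a Python list of tuples, which matters more as the number of windows grows.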


## 10. Generate Predictions


In [11]:
X_test, y_test, d_test, t_test = make_sequences(
    pd.concat([
        train_s.groupby("ticker").tail(SEQ_LEN),
        test_s
    ])
)

model.eval()  # disable dropout for inference
with torch.no_grad():
    yhat = model(torch.tensor(X_test)).numpy()

mu = scalers.loc[t_test, "mean"].values
sd = scalers.loc[t_test, "std"].values

y_pred = yhat * sd + mu
y_true = y_test * sd + mu


## 11. Sector Direction Signal

We convert each price forecast into a predicted return relative to the previous actual close, then take the **cross-sectional average** per date as the sector signal.


In [12]:
preds = pd.DataFrame({
    "Date": pd.to_datetime(d_test),
    "ticker": t_test,
    "y_pred": y_pred,
    "y_true": y_true
}).sort_values(["ticker", "Date"])

preds["prev_close"] = preds.groupby("ticker")["y_true"].shift(1)
preds["pred_ret"] = (preds["y_pred"] - preds["prev_close"]) / preds["prev_close"]
preds = preds.dropna()

preds["sector_avg_pred_ret"] = preds.groupby("Date")["pred_ret"].transform("mean")
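
A toy example with three made-up tickers over two dates shows how the sector signal behaves. The prices are invented so that the first date is sector-up and the second is sector-down.

```python
import pandas as pd

toy_preds = pd.DataFrame({
    "Date": pd.to_datetime(["2025-01-02"] * 3 + ["2025-01-03"] * 3),
    "ticker": ["A", "B", "C"] * 2,
    "prev_close": [10.0, 20.0, 5.0, 10.5, 19.0, 5.0],
    "y_pred": [10.5, 19.0, 5.1, 10.5, 18.05, 4.9],
})

# Predicted return relative to the previous close.
toy_preds["pred_ret"] = toy_preds["y_pred"] / toy_preds["prev_close"] - 1
# transform("mean") broadcasts each date's average back onto every row.
toy_preds["sector_avg_pred_ret"] = (
    toy_preds.groupby("Date")["pred_ret"].transform("mean")
)
sector_up = toy_preds.groupby("Date")["sector_avg_pred_ret"].first() > 0
print(toy_preds.round(4))
print(sector_up)
```

Because the average is equal-weighted, a few extreme forecasts on thinly traded names can flip the sign of the sector signal.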


## 12. Relative Potential Signal

A stock’s **relative gap** is its predicted return minus the sector average; a negative gap means the stock is forecast to lag the sector.


In [13]:
preds["rel_gap"] = preds["pred_ret"] - preds["sector_avg_pred_ret"]
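
In symbols, the quantities computed so far can be written as follows. The notation is introduced here for illustration and does not appear in the code: $\hat P_{i,t}$ is `y_pred` for ticker $i$ on date $t$, $P_{i,t-1}$ is `prev_close`, and $N_t$ is the number of tickers with a prediction on date $t$.

$$
\hat r_{i,t} = \frac{\hat P_{i,t} - P_{i,t-1}}{P_{i,t-1}}, \qquad
\bar r_t = \frac{1}{N_t} \sum_{i=1}^{N_t} \hat r_{i,t}, \qquad
g_{i,t} = \hat r_{i,t} - \bar r_t
$$

Here $\hat r_{i,t}$ is `pred_ret`, $\bar r_t$ is `sector_avg_pred_ret`, and $g_{i,t}$ is `rel_gap`. Since $\sum_{i} g_{i,t} = 0$ on every date, the gaps are purely relative: whenever any stock sits above the sector average, at least one other sits below it.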


## 13. Surface Laggards When Sector Is Up


In [17]:
laggards = (
    preds[preds["sector_avg_pred_ret"] > 0]
        .sort_values("rel_gap")
        .groupby("Date")
        .head(10)
)

laggards[["Date", "ticker", "pred_ret", "rel_gap"]].head(10)


| | Date | ticker | pred_ret | rel_gap |
|---|---|---|---|---|
| 21399 | 2026-01-02 | RCT.V | -0.465654 | -0.467644 |
| 21247 | 2026-01-02 | RAK.V | -0.247814 | -0.249804 |
| 1124 | 2026-01-02 | ALTN.V | -0.182411 | -0.184401 |
| 27950 | 2026-01-02 | WGF.V | -0.146763 | -0.148753 |
| 16285 | 2026-01-02 | MMN.V | -0.141804 | -0.143794 |
| 6163 | 2026-01-02 | CRVC-X.CN | -0.137938 | -0.139928 |
| 16703 | 2026-01-02 | MSC.V | -0.136518 | -0.138509 |
| 24046 | 2026-01-02 | SOI.V | -0.135344 | -0.137335 |
| 18684 | 2026-01-02 | NWI.CN | -0.131133 | -0.133123 |
| 22511 | 2026-01-02 | RXM.CN | -0.127418 | -0.129408 |
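
The selection step combines a date filter with a per-date ranking. The sketch below reproduces it on made-up data with a hypothetical `TOP_K = 2` (the cell above uses 10), so the effect of each step is visible.

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2025-01-02"] * 4 + ["2025-01-03"] * 4),
    "ticker": ["A", "B", "C", "D"] * 2,
    "pred_ret": [0.04, 0.01, -0.02, 0.02, -0.03, -0.01, 0.00, -0.04],
})
toy["sector_avg_pred_ret"] = toy.groupby("Date")["pred_ret"].transform("mean")
toy["rel_gap"] = toy["pred_ret"] - toy["sector_avg_pred_ret"]

TOP_K = 2  # hypothetical value for this toy example
toy_laggards = (
    toy[toy["sector_avg_pred_ret"] > 0]   # keep sector-up dates only
    .sort_values("rel_gap")               # most negative gap first
    .groupby("Date")
    .head(TOP_K)                          # first TOP_K rows per date
)
print(toy_laggards[["Date", "ticker", "rel_gap"]])
```

The second date is dropped entirely because its sector average is negative. On the first date, C and B are the two names furthest below the average.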


In [19]:
if "laggards" not in globals() or laggards.empty:
    raise RuntimeError("No laggards found. Ensure the previous cell ran successfully.")

# --- Per-day comparison ---
laggards_vs_sector_daily = (
    laggards
    .groupby("Date")
    .agg(
        laggards_avg_pred_ret=("pred_ret", "mean"),
        sector_avg_pred_ret=("sector_avg_pred_ret", "mean"),
        n_laggards=("ticker", "count"),
    )
    .reset_index()
)

laggards_vs_sector_daily["gap_laggards_vs_sector"] = (
    laggards_vs_sector_daily["laggards_avg_pred_ret"]
    - laggards_vs_sector_daily["sector_avg_pred_ret"]
)

print("Per-day comparison (laggards vs sector average):")
display(laggards_vs_sector_daily.head(10))


# --- Aggregate summary across the test period ---
summary = laggards_vs_sector_daily.agg({
    "laggards_avg_pred_ret": ["mean", "median"],
    "sector_avg_pred_ret": ["mean", "median"],
    "gap_laggards_vs_sector": ["mean", "median"],
    "n_laggards": "mean",
})

# Rows of `summary` are indexed by statistic name ("mean", "median").

print("\nAggregate summary across test period:")
display(summary)


# --- Sanity check: how often are laggards actually below the sector average? ---
sanity = (
    laggards
    .assign(below_sector = laggards["pred_ret"] < laggards["sector_avg_pred_ret"])
    .groupby("Date")["below_sector"]
    .mean()
    .rename("pct_laggards_below_sector")
    .reset_index()
)

print("\nSanity check: share of selected laggards below the sector average:")
display(sanity.head(10))


# --- Final scalar you can quote ---
avg_gap = laggards_vs_sector_daily["gap_laggards_vs_sector"].mean()

# avg_gap is negative when laggards trail the sector, so report its negation.
print(
    f"\nOn average, selected laggards' predicted return is "
    f"{-avg_gap:.4%} below the sector's predicted return "
    f"on days when the sector trend is positive."
)


Per-day comparison (laggards vs sector average):


| | Date | laggards_avg_pred_ret | sector_avg_pred_ret | n_laggards | gap_laggards_vs_sector |
|---|---|---|---|---|---|
| 0 | 2026-01-02 | -0.18528 | 0.00199 | 10 | -0.18727 |



Aggregate summary across test period:


| | laggards_avg_pred_ret | sector_avg_pred_ret | gap_laggards_vs_sector | n_laggards |
|---|---|---|---|---|
| mean | -0.18528 | 0.00199 | -0.18727 | 10.0 |
| median | -0.18528 | 0.00199 | -0.18727 | NaN |



Sanity check: share of selected laggards below the sector average:


| | Date | pct_laggards_below_sector |
|---|---|---|
| 0 | 2026-01-02 | 1.0 |



On average, selected laggards' predicted return is 18.7270% below the sector's predicted return on days when the sector trend is positive.


## 14. Interpretation

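- Sector average > 0 → **risk-on regime** (the model expects the sector to rise)
- Negative relative gap → the stock's forecast return trails the sector, which the strategy reads as **has not caught up**
- The strategy bets on **cross-sectional convergence**, not on accurate point forecasts

For this to work, the model only needs to rank stocks consistently within the group; the absolute level of each forecast matters less.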
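
One way to probe the convergence idea is to compare what the selected laggards actually returned with what the sector returned. The sketch below does this on made-up numbers for a single date, using the realized close on the forecast date. That choice of horizon is an assumption; a later horizon may suit the thesis better.

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2025-01-02"] * 4),
    "ticker": ["A", "B", "C", "D"],
    "prev_close": [10.0, 10.0, 10.0, 10.0],
    "y_pred": [10.4, 10.1, 9.8, 10.2],
    "y_true": [10.1, 10.0, 10.3, 10.0],    # made-up realized closes
})
toy["pred_ret"] = toy["y_pred"] / toy["prev_close"] - 1
toy["real_ret"] = toy["y_true"] / toy["prev_close"] - 1
toy["rel_gap"] = toy["pred_ret"] - toy.groupby("Date")["pred_ret"].transform("mean")

# Pick the single most negative gap per date (hypothetical top-1 screen).
toy_laggards = toy.sort_values("rel_gap").groupby("Date").head(1)

check = pd.DataFrame({
    "laggard_real_ret": toy_laggards.groupby("Date")["real_ret"].mean(),
    "sector_real_ret": toy.groupby("Date")["real_ret"].mean(),
})
check["excess"] = check["laggard_real_ret"] - check["sector_real_ret"]
print(check.round(4))
```

In this invented case the laggard (C) beats the sector by two percentage points. On real data the sign and size of `excess` across many dates is what would support or undermine the interpretation.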

## 15. Summary

This notebook demonstrates a complete pipeline:
- LSTM-based price forecasting
- Sector-level regime detection
- Relative-value stock selection

> The key insight: *When the sector is expected to rise, laggards are potential opportunities.*

Extensions include:
- volatility normalization
- transaction cost modeling
- long–short portfolios
- rolling backtests
