# MSc Artificial Intelligence — Machine Learning  
## Coursework Part 1 (2025/26): Stock Modelling & Prediction

**Release:** Thu 25 Sep 2025 (Week 2) 17:00 (Europe/London)  
**Due:** Mon 3 Nov 2025 (Week 8) 09:00  
**Feedback by:** Mon 24 Nov 2025  
**Weighting:** **50%** of the module mark

---

## Task
Develop a machine-learning pipeline for stock prediction and package it as a Python class `Student`. The target is the next-_H_-day cumulative log return, $y_t = \log(C_{t+H}/C_t)$, with a default horizon of _H_=1. The staff tester will evaluate your model using a leakage-safe, walk-forward procedure. **The tester will test your `Student` class on a set of tickers by calling your `fit` and `predict` methods a few times.**
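
For your own development experiments, the target definition above can be reproduced with a one-line shift. This is a minimal sketch, not the tester's code: the helper name `make_target` and the toy prices are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def make_target(close: pd.Series, horizon: int = 1) -> pd.Series:
    """Next-H-day cumulative log return: y_t = log(C_{t+H} / C_t)."""
    # shift(-horizon) aligns C_{t+H} with date t, so the last `horizon` rows are NaN.
    y = np.log(close.shift(-horizon) / close)
    return y.rename("y")

# Toy close prices (made-up numbers, for illustration only).
close = pd.Series(
    [100.0, 101.0, 99.0, 102.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
y = make_target(close, horizon=2)

# The cumulative log return equals the sum of the daily log returns it spans.
daily = np.log(close / close.shift(1))
print(y.iloc[0], daily.iloc[1] + daily.iloc[2])
```

The trailing `horizon` rows have no target yet, so they should be dropped before fitting rather than filled.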

More specifically, you will:

1. Design a reproducible ML pipeline covering data engineering, feature engineering and model/parameter selection. **You may compute features, such as past log-returns or technical indicators, inside your class**. Please do not redefine the target.

   *Repro notes:* use fixed random seeds; include a `requirements.txt` cell.

2. Evaluate the pipeline across a development ticker universe (XLK, XLP, XLV, XLF, XLE, XLI) via time-aware validation (walk-forward / expanding window). **Justify your decisions with experimental evidence or arguments**. Report Directional Accuracy, MAE, RMSE.  
   *No leakage:* use only information available up to time *t* to predict *t*+*H* (the tester enforces walk-forward fitting).

3. Iterate steps (1)–(2) to select your best pipeline.

4. Implement your best pipeline as `Student` exposing `fit(X_train, y_train, meta)` and `predict(X, meta)` (API below).

5. Create one Jupyter Notebook that documents your exploration and evidence, and **one Python file that contains your final `Student` class**. Staff will test your class with a tool: `mltester` (ML metrics).
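
The feature step in (1) can be sketched with backward-looking windows only. This is an illustrative assumption about what "causal" features might look like, not the contents of the provided baseline: the function name `make_features`, the window lengths and the synthetic prices are all hypothetical.

```python
import numpy as np
import pandas as pd

def make_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    """Causal features: row t depends only on prices up to and including t."""
    close = ohlcv["Close"]
    log_ret = np.log(close / close.shift(1))          # past 1-day log return
    feats = pd.DataFrame(index=ohlcv.index)
    feats["ret_1"] = log_ret
    feats["ret_5"] = np.log(close / close.shift(5))   # past 5-day cumulative log return
    feats["vol_10"] = log_ret.rolling(10).std()       # trailing 10-day volatility
    feats["sma_ratio_10"] = close / close.rolling(10).mean() - 1.0
    return feats.dropna()                             # drop warm-up rows

# Synthetic prices with a fixed seed so the example is reproducible.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2024-01-01", periods=60)
prices = pd.DataFrame(
    {"Close": 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 60)))},
    index=dates,
)
feats = make_features(prices)

# Leakage check: features computed on a truncated history must match
# the features computed on the full history for the same dates.
truncated = make_features(prices.iloc[:40])
same = np.allclose(feats.loc[truncated.index].to_numpy(), truncated.to_numpy())
print(len(feats), same)
```

The truncation check is a cheap way to catch accidental use of future rows, for example a centred rolling window or a negative shift.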

## Provided
- `mltester.py` — an evaluator that:
  - loads per-ticker OHLCV from a long file, `prices.csv`,
  - constructs the target $y_t$ for a chosen horizon `H`,
  - runs expanding-window walk-forward (re-fit every `step` test days),
  - reports Directional Accuracy (DirAcc), MAE, RMSE, and saves per-ticker CSVs + a summary.
- Example `student.py` — minimal baseline showing the required API and simple, causal features.
- Data — you will prototype with a development universe of tickers and download data from Yahoo Finance; final grading uses a held-out universe.
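
The expanding-window procedure and the three metrics listed above can be sketched as follows for use in your own notebook. This is not the code of `mltester.py`: the function names, the `min_train` parameter, the conservative label cut-off and the treatment of zero returns in Directional Accuracy are assumptions made for illustration.

```python
import numpy as np

def walk_forward_splits(n, min_train, step, horizon=1):
    """Yield (train_idx, test_idx) for an expanding window re-fitted every `step` rows."""
    start = min_train
    while start < n:
        stop = min(start + step, n)
        # y_t needs C_{t+H}. Keep only labels fully realised before the first
        # test row, i.e. t + horizon <= start - 1 (a conservative choice).
        yield np.arange(0, start - horizon), np.arange(start, stop)
        start = stop

def directional_accuracy(y_true, y_pred):
    # Assumption: a prediction is correct when the signs agree (zeros match zeros).
    return float(np.mean(np.sign(y_true) == np.sign(y_pred)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic targets and a naive baseline that predicts the training mean.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 0.01, size=100)
splits = list(walk_forward_splits(len(y), min_train=50, step=10, horizon=5))

preds, truth = [], []
for train_idx, test_idx in splits:
    preds.append(np.full(len(test_idx), y[train_idx].mean()))
    truth.append(y[test_idx])
y_pred, y_true = np.concatenate(preds), np.concatenate(truth)

print(directional_accuracy(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```

A training-mean baseline like this one gives a useful floor: a pipeline that cannot beat it on MAE and RMSE is not extracting signal.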

## Quick sanity run (**please see comments below**)

In [None]:

# --model: path to your Python file and the name of the class.
# --tickers: example tickers only; the actual tickers may be any set similar to the development tickers.
# --data-file: cached CSV used to get around Yahoo Finance rate limits; if yfinance fails, data is read from this file.
# --start/--end: example time window (the actual window may differ); models are trained with your 'fit' method and tested with your 'predict' method on data in this window.
# --horizon/--step: walk forward through the window, e.g. predict with horizon 5 and advance by step 10 between fits; fit/predict are called a few times.
# --out-dir: directory where the test results are saved.
python mltester.py \
  --model ./student.py:Student \
  --tickers SPY TLT GLD \
  --data-file data/prices.csv \
  --start 2015-01-01 --end 2019-12-31 \
  --horizon 5 --step 10 \
  --out-dir outputs


## Data file (prices.csv) used by the tool  
A single long CSV file with columns:

In [None]:
date, ticker, close[, open, high, low, volume, adj_close]
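
A long file with this layout can be split into per-ticker frames with pandas. The sketch below is an assumption about how such a file might be read, not the loader inside `mltester.py`: the helper name `load_long_prices` and the sample rows are made up, and only two of the optional columns are shown.

```python
import io
import pandas as pd

# Tiny in-memory sample in the long format (values are invented).
SAMPLE = """date, ticker, close, volume
2024-01-03, XLK, 188.7, 1100
2024-01-02, XLK, 190.1, 1000
2024-01-02, XLF, 37.5, 2000
2024-01-03, XLF, 37.9, 2100
"""

def load_long_prices(source) -> dict:
    """Read a long CSV and return {ticker: DataFrame indexed by date}."""
    # skipinitialspace handles the spaces after commas in the header and rows.
    df = pd.read_csv(source, skipinitialspace=True, parse_dates=["date"])
    frames = {}
    for ticker, grp in df.groupby("ticker"):
        frames[ticker] = grp.drop(columns="ticker").set_index("date").sort_index()
    return frames

frames = load_long_prices(io.StringIO(SAMPLE))
print(sorted(frames), frames["XLK"]["close"].tolist())
```

Passing a file path such as `data/prices.csv` instead of the `StringIO` object works the same way. Sorting by date per ticker matters because every time-aware split assumes chronological order.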

## `Student` API (must match)

In [None]:
import pandas as pd

class Student:
    def __init__(self, config: dict | None = None, random_state: int = 42):
        """Store hyperparameters, initialise pipeline objects, set seeds."""

    def fit(self,
            X_train: pd.DataFrame,   # Per-ticker OHLCV indexed by date; includes at least 'Close'
            y_train: pd.Series,      # Provided by tester: next-H-day cumulative log return
            meta: dict | None = None # e.g., {"ticker": "...", "horizon": H}
           ):
        """Train using only information available up to each training date. Return self."""
        return self

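    def predict(self,
                X: pd.DataFrame,      # Per-ticker OHLCV up to and including prediction dates
                meta: dict | None = None
               ) -> pd.Series:
        """Return a numeric Series named 'y_pred' on the dates where features exist."""
        # Placeholder so the skeleton runs: replace the zeros with your model's predictions.
        y_pred = pd.Series(0.0, index=X.index, name="y_pred")
        return y_pred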
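
To make the contract concrete, here is one way the API could be filled in using only NumPy and pandas. It is a hedged sketch, not the provided `student.py` baseline: the lagged-return features, the `lags` config key and the least-squares model are illustrative assumptions, chosen only because they are short and causal.

```python
from typing import Optional

import numpy as np
import pandas as pd

class Student:
    def __init__(self, config: Optional[dict] = None, random_state: int = 42):
        self.config = config or {}
        self.random_state = random_state              # unused here: the model is deterministic
        self.lags = int(self.config.get("lags", 3))   # hypothetical hyperparameter
        self.coef_ = None

    def _features(self, X: pd.DataFrame) -> pd.DataFrame:
        # Lagged daily log returns; row t uses closes up to and including t only.
        log_ret = np.log(X["Close"] / X["Close"].shift(1))
        feats = pd.DataFrame(
            {f"lag_{k}": log_ret.shift(k) for k in range(self.lags)}, index=X.index
        )
        return feats.dropna()

    def fit(self, X_train, y_train, meta=None):
        feats = self._features(X_train)
        data = feats.join(y_train.rename("y"), how="inner").dropna()
        A = np.column_stack([np.ones(len(data)), data[feats.columns].to_numpy()])
        # Ordinary least squares with an intercept.
        self.coef_, *_ = np.linalg.lstsq(A, data["y"].to_numpy(), rcond=None)
        return self

    def predict(self, X, meta=None) -> pd.Series:
        feats = self._features(X)
        A = np.column_stack([np.ones(len(feats)), feats.to_numpy()])
        return pd.Series(A @ self.coef_, index=feats.index, name="y_pred")

# Synthetic single-ticker data with a fixed seed.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2024-01-01", periods=100)
X = pd.DataFrame({"Close": 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 100)))}, index=dates)
y = np.log(X["Close"].shift(-1) / X["Close"])         # horizon H = 1

# Fit on the first 80 rows, then predict with the history up to the prediction dates.
model = Student({"lags": 3}).fit(X.iloc[:80], y.iloc[:80], meta={"ticker": "DEMO", "horizon": 1})
y_pred = model.predict(X, meta={"ticker": "DEMO", "horizon": 1}).iloc[-20:]
print(y_pred.head())
```

Note that `predict` returns fewer rows than it receives: the first `lags` dates have no complete feature row, which matches the "dates where features exist" wording in the docstring.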


## How staff will test
`mltester.py` runs a walk-forward evaluation that calls your `fit` and `predict` methods multiple times, and reports Directional Accuracy, MAE and RMSE. For marking, the tool runs on a held-out universe of tickers and dates.

## Assessment rubric (100 marks total)
### A. Test results by `mltester` (50)
*Assessed on the held-out universe using staff tools.*  
**Excellent** (41–50): strong/consistent across tickers; stable.  
**Good** (31–40): generally solid; minor instability.  
**Adequate** (21–30): modest improvement; fragile/inconsistent.  
**Poor** (0–20): little/no signal; erratic or broken outputs.
### B. Evaluation depth & ML use (40)
**Overall experimental design** (10) — time-aware splits; windows documented; re-fit cadence justified.  
**Breadth of exploration** (10) — multiple algorithms/parameters/feature sets; rationale; comparisons to simple baselines.  
**Leakage control & validation discipline** (10) — correct fit/transform separation; no peeking; proper alignment.  
**Metrics & analysis** (10) — appropriate ML metrics; clear tables/plots; basic robustness (sensitivity to windows/params/seeds).  
*Per-criterion bands (out of 10): Excellent (9–10) thorough & well-justified; Good (7–8) solid, minor gaps; Adequate (5–6) basic; Poor (0–4) superficial/incorrect.*  
### C. Notebook quality (10)
**Reproducibility & clarity** (10) — fixed seeds; requirements.txt cell; how-to-run notes; clear structure/comments; runs cleanly.  
*Excellent (9–10) polished/easy to run; Good (7–8) minor rough edges; Adequate (5–6) runs but messy; Poor (0–4) hard to follow or fails to run.*
## Penalties & guidance
~~Modifying tester rules: runability fail + heavy penalties.~~  
Obvious leakage or irreproducible results: up to −30 marks.  
Notebook that does not run end-to-end: up to −50 for Section A and up to −4 for Section C.  

## Submission
~~Submit one file: ECS8051_CW1_\<StudentID\>.ipynb.  
No external files required (we will import Student directly from your notebook).  
Late penalties as per School policy.~~

Submit one Jupyter Notebook file, ECS8051_CW1_\<StudentID\>.ipynb, documenting your development journey, and one Python file, `student.py`, containing your `Student` class.  
**Your documentation may be presented as comments in multiple markdown cells alongside code cells, or as a report in a single markdown cell at the end of your notebook. It should include the following:**
- **Justification of your decisions**
- **Explanation of what you did**
- **Presentation of your findings and insights**

**You may also include other things that you would like to highlight.**


## Note on Dataset Choice
In this project we use sector ETFs (e.g. Technology, Healthcare, Energy). Each ETF represents a whole industry rather than a single company. This makes the time series:

- Cleaner and more stable than individual stocks (less dominated by one-off firm events).
- Comparable across sectors, so you can evaluate how your ML pipeline performs under different market behaviours (e.g. cyclical vs defensive sectors).

This decision is made for pedagogical purposes: it ensures that you focus on building sound machine learning pipelines and interpreting results across different sectors, rather than being distracted by the excessive noise of individual stocks or the complexity of mixing unrelated asset classes. During marking, your pipelines will be evaluated on a different but parallel set of ETFs to test generalisation.
