# Notebook 01: VFV Data Pull + Target Construction

## Introduction

Financial markets are often believed to contain exploitable patterns that can be identified using technical indicators such as moving averages, momentum, and volatility measures. These indicators are widely used by retail traders and frequently presented as tools that can improve short-term return prediction.

However, financial time series are noisy, highly stochastic, and influenced by a wide range of unpredictable factors. As a result, it is unclear whether common technical indicators provide meaningful predictive information beyond simple baseline assumptions.

This project investigates whether widely used technical indicators add predictive value for next-day returns of VFV (Vanguard S&P 500 Index ETF), compared to naive return-based baselines. Rather than attempting to construct a trading strategy, the goal is to evaluate signal versus noise using a disciplined, out-of-sample modeling approach.

By focusing on proper target construction, time-aware train/test splits, and honest performance evaluation, this analysis aims to assess whether these indicators offer real informational value or whether their apparent usefulness is largely illusory.
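The evaluation protocol described above can be sketched on synthetic data. The 80/20 split fraction, the helper names `chronological_split` and `rmse`, and the three baselines (zero, training mean, persistence) are illustrative assumptions, not choices stated in this notebook.

```python
import numpy as np
import pandas as pd

def chronological_split(df, test_fraction=0.2):
    """Split rows in order: the earliest rows train, the latest rows test."""
    cut = int(len(df) * (1 - test_fraction))
    return df.iloc[:cut], df.iloc[cut:]

def rmse(y_true, y_pred):
    """Root mean squared error between two aligned arrays."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-in for daily returns (NOT real VFV data).
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0005, 0.01, size=1000), name="return")
demo = pd.DataFrame({"return": returns, "target": returns.shift(-1)}).dropna()

# No shuffling: every test row comes after every training row.
train, test = chronological_split(demo, test_fraction=0.2)

# Naive return-based baselines, each using only information available at time t.
baselines = {
    "zero": np.zeros(len(test)),
    "train_mean": np.full(len(test), train["target"].mean()),
    "persistence": test["return"].to_numpy(),
}
scores = {name: rmse(test["target"], pred) for name, pred in baselines.items()}
print(scores)
```

Any indicator-based model would need to beat these scores on the same held-out period to count as adding information. On independent noise like this synthetic series, the persistence forecast typically scores worse than predicting zero.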

## Data Source

The dataset used in this analysis consists of daily adjusted closing prices for VFV (Vanguard S&P 500 Index ETF), obtained from Yahoo Finance using the `yfinance` Python library. VFV tracks the performance of the S&P 500 index and provides a liquid, diversified proxy for the overall U.S. equity market.

Daily data from January 2013 to the present is used in order to capture multiple market regimes, including periods of low volatility, market stress, and recovery.

Adjusted closing prices are used to ensure that returns properly reflect dividends and corporate actions.
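As a worked illustration of why the adjustment matters, consider a hypothetical ex-dividend day. The prices, the dividend, and the backward multiplicative adjustment factor below are assumptions chosen for illustration. They are not VFV data, and the exact adjustment applied by the data provider may differ.

```python
# Hypothetical numbers (not VFV data).
prev_close = 100.0   # close on the day before the ex-dividend date
dividend = 1.0       # cash distribution per unit
ex_close = 99.0      # close on the ex-dividend date

# The raw price return treats the ex-dividend drop as a loss.
raw_return = ex_close / prev_close - 1

# Assumed backward adjustment: scale earlier prices by (1 - dividend / prev_close).
adj_factor = 1 - dividend / prev_close
adj_prev_close = prev_close * adj_factor
adjusted_return = ex_close / adj_prev_close - 1

print(f"raw return:      {raw_return:.4f}")
print(f"adjusted return: {adjusted_return:.4f}")
```

In this toy case the raw series shows a 1% loss even though the holder received the difference in cash, while the adjusted series shows a return of approximately zero.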

## Target Variable Construction

The prediction target is defined as the next-day return of VFV. Daily returns are computed as the percentage change in adjusted closing price from day *t−1* to day *t*. The target variable is then created by shifting the return series backward by one day so that the value at time *t* represents the return observed at time *t+1*.

This construction ensures that all features used for prediction are based solely on information available at time *t*, preventing the use of future information and avoiding data leakage.
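The alignment can be checked on a tiny example. The four prices and dates below are hypothetical and serve only to show which row each value lands on.

```python
import pandas as pd

# Hypothetical prices for illustration only (not VFV data).
toy = pd.DataFrame(
    {"Adj Close": [100.0, 101.0, 99.0, 102.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# Return realized on day t: change from day t-1 to day t.
toy["return"] = toy["Adj Close"].pct_change()

# Target on day t: the return realized on day t+1.
toy["target"] = toy["return"].shift(-1)

# The first row has no return and the last row has no target.
toy_clean = toy.dropna()
print(toy_clean)
```

Each row's `target` equals the next row's `return`, and `dropna()` removes exactly two rows: the first (no prior price) and the last (no following day).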

In [18]:
!pip -q install yfinance

import pandas as pd
import yfinance as yf
import altair as alt

In [19]:
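ticker = "VFV.TO"  # Yahoo Finance symbol used for VFV

# auto_adjust=False keeps a separate "Adj Close" column alongside the raw prices.
vfv = yf.download(
    ticker,
    start="2013-01-01",
    auto_adjust=False,
    progress=False
)
vfv = vfv.reset_index()
vfv = vfv[["Date", "Adj Close"]].copy()

# Return realized on day t (change from day t-1 to day t).
vfv["return"] = vfv["Adj Close"].pct_change()
# Target on day t is the return realized on day t+1.
vfv["target"] = vfv["return"].shift(-1)

# Drop the first row (no return) and the last row (no target).
vfv = vfv.dropna().reset_index(drop=True)
vfv.head()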


|   | Date | Adj Close (VFV.TO) | return | target |
|---|------------|-----------|-----------|-----------|
| 0 | 2013-01-03 | 21.237484 | 0.005516 | 0.002351 |
| 1 | 2013-01-04 | 21.287411 | 0.002351 | -0.001564 |
| 2 | 2013-01-07 | 21.254116 | -0.001564 | -0.003523 |
| 3 | 2013-01-08 | 21.179230 | -0.003523 | 0.004322 |
| 4 | 2013-01-09 | 21.270761 | 0.004322 | 0.004695 |


In [20]:
import os

os.makedirs("../data", exist_ok=True)
vfv.to_csv("../data/vfv_clean.csv", index=False)

In [21]:
vfv[["return", "target"]].describe()

|       | return | target |
|-------|-----------|-----------|
| count | 3287.0 | 3287.0 |
| mean  | 0.000681 | 0.000678 |
| std   | 0.009938 | 0.009938 |
| min   | -0.106917 | -0.106917 |
| 25%   | -0.003820 | -0.003820 |
| 50%   | 0.000905 | 0.000895 |
| 75%   | 0.005633 | 0.005633 |
| max   | 0.090656 | 0.090656 |


In [22]:
plot_df = pd.DataFrame({"return": vfv["return"].to_numpy()})

alt.Chart(plot_df).mark_bar().encode(
    alt.X("return:Q", bin=alt.Bin(maxbins=100), title="Daily Return"),
    alt.Y("count()", title="Count")
).properties(
    title="Distribution of Daily VFV Returns"
)

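The distribution of daily VFV returns is centered near zero with heavy tails. In the summary statistics above, the mean daily return (about 0.07%) is small relative to the standard deviation (about 0.99%), and the extremes of -10.7% and +9.1% lie roughly eleven and nine standard deviations from the mean. This highlights the high level of noise and volatility in financial time series and indicates how weak any next-day signal is likely to be relative to day-to-day variation.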
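One way to put a number on tail heaviness is excess kurtosis, which is close to zero for normally distributed data and positive for heavier tails. The sketch below uses synthetic samples, not VFV data, and the helper `tail_summary` is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic samples for illustration only (NOT VFV data).
normal_sample = pd.Series(rng.normal(size=5000))
heavy_sample = pd.Series(rng.standard_t(df=3, size=5000))  # Student-t, heavy tails

def tail_summary(returns):
    """Excess kurtosis and the largest absolute z-score of a return series."""
    z = (returns - returns.mean()) / returns.std()
    return {
        "excess_kurtosis": float(returns.kurt()),  # pandas reports excess kurtosis
        "max_abs_z": float(z.abs().max()),
    }

print("normal:", tail_summary(normal_sample))
print("heavy: ", tail_summary(heavy_sample))
```

Applied to this notebook's data, the same helper could be called as `tail_summary(vfv["return"])`, assuming that column selects a single Series.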