# Financial Trading with Python, 2nd Edition
Cordell L. Tanny, CFA, FRM, FDP

## Chapter 3: Data Retrieval
### Notebook 3.3: Use Case - Downloading Our Strategy Data

Version: 1

Date of last revision: January 18, 2026

This notebook downloads and prepares the data for our volatility tail hedge strategy. We will produce a clean, aligned parquet file that will be used in all subsequent chapters.

---

## Setup

In [None]:
# Install packages if needed (uncomment if running in Colab)
# !pip install yfinance --quiet

In [1]:
import yfinance as yf
import pandas as pd
import numpy as np
import warnings

# Display settings
pd.set_option('display.max_rows', 20)
pd.set_option('display.float_format', '{:.2f}'.format)
warnings.filterwarnings('ignore')

print(f"yfinance version: {yf.__version__}")
print(f"pandas version: {pd.__version__}")

yfinance version: 0.2.66
pandas version: 2.2.2


---

## Section 1: The Strategy Overview

Before we download data, let's understand what we are building and why we need these specific instruments.

### 1.1 What We Are Building

Throughout this book, we will develop a **volatility tail hedge strategy**. The core idea is simple:

- Maintain **80% exposure to equities** (SPY) at all times
- Allocate the remaining **20% defensively** between cash (BIL) and volatility protection (VIXY)
- When market volatility spikes, shift the defensive allocation toward VIXY to profit from the turbulence

This strategy attempts to capture equity market returns while providing downside protection during market stress. The key insight is that volatility instruments like VIXY tend to spike precisely when equity markets are falling.
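
The allocation rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `target_weights`, the `vix_threshold` value of 25, and the all-or-nothing switch between BIL and VIXY are hypothetical placeholders, not the signal logic developed later in the book.

```python
def target_weights(vix_level, vix_threshold=25.0, equity_weight=0.80):
    """Return illustrative portfolio weights for SPY, BIL, and VIXY.

    vix_threshold is a hypothetical placeholder, not a value from the book.
    """
    defensive = 1.0 - equity_weight          # the remaining 20%
    hedge_on = vix_level >= vix_threshold    # assumed all-or-nothing switch
    return {
        "SPY": equity_weight,
        "BIL": 0.0 if hedge_on else defensive,
        "VIXY": defensive if hedge_on else 0.0,
    }

calm = target_weights(14.0)      # VIX below the placeholder threshold
stressed = target_weights(35.0)  # VIX above the placeholder threshold
print("Calm:    ", calm)
print("Stressed:", stressed)
```

In both cases the SPY weight stays at 80%; only the split of the defensive 20% changes.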

### 1.2 The Four Tickers

Our strategy requires four data series:

| Ticker | Name | Role in Strategy |
|--------|------|------------------|
| SPY | S&P 500 ETF | Core equity exposure (80%) |
| BIL | 1-3 Month T-Bill ETF | Cash parking spot (defensive) |
| VIXY | VIX Short-Term Futures ETF | Tail hedge instrument (defensive) |
| ^VIX | CBOE Volatility Index | Signal generation only (not traded) |

Note that ^VIX is an index, not a tradeable instrument. We use it to generate signals, but we cannot actually buy or sell it. When we want volatility exposure, we trade VIXY instead.

### 1.3 Why These Specific Instruments

**SPY** is the most liquid equity ETF in the world. It tracks the S&P 500 index and represents broad US market exposure. For any equity strategy, SPY is the natural starting point.

**BIL** holds short-term Treasury bills and has near-zero volatility. When our strategy is not signaling danger, we park the defensive allocation here. It earns a small yield while preserving capital.

**VIXY** tracks short-term VIX futures. It tends to spike during market selloffs, which is exactly when we need protection. However, VIXY has a structural drag due to the cost of rolling futures contracts (contango). This means it loses money over time in calm markets. We only want to hold it when we expect volatility.

**^VIX** is the "fear gauge" of the market. It measures expected volatility derived from S&P 500 options prices. We use it to determine when to shift from BIL to VIXY.
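
To get a feel for why the roll cost described for VIXY matters, consider a purely hypothetical drag of 5% per month. The figure is an assumption chosen for round arithmetic, not a measured property of VIXY, but it shows how quickly a steady monthly cost compounds.

```python
# Hypothetical monthly drag from rolling futures in contango (assumed value)
monthly_roll_cost = 0.05
months = 12

# Fraction of the starting value left after compounding the drag
remaining = (1 - monthly_roll_cost) ** months
print(f"Value remaining after {months} months: {remaining:.1%}")
```

Under this assumption, roughly half of the position's value is gone after a year of calm markets, which is why we only want to hold VIXY when we expect volatility.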

---

## Section 2: Downloading the Data

### 2.1 Configuration

We define our tickers and date range. We request data from 2000 to see the different inception dates for each ticker. The alignment step will trim the data to the common range where all four tickers have valid prices.

In [9]:
# Define the tickers for our strategy
tickers = ['SPY', 'BIL', 'VIXY', '^VIX']

# Date range
# VIXY inception: January 2011 - this is our binding constraint
start_date = '2000-01-01'
end_date = '2025-12-31'

print(f"Tickers: {tickers}")
print(f"Date range: {start_date} to {end_date}")

Tickers: ['SPY', 'BIL', 'VIXY', '^VIX']
Date range: 2000-01-01 to 2025-12-31
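
One detail worth knowing about this configuration: yfinance documents the `end` argument of `download` as exclusive, so requesting `end='2025-12-31'` returns data through 2025-12-30, which matches the outputs later in this notebook. If the last day itself is needed, one option is to request the following calendar day. The helper name `exclusive_end` below is hypothetical.

```python
import pandas as pd

def exclusive_end(last_day_wanted):
    """Return the `end` string to request so that last_day_wanted is included."""
    next_day = pd.Timestamp(last_day_wanted) + pd.Timedelta(days=1)
    return next_day.strftime("%Y-%m-%d")

end_for_request = exclusive_end("2025-12-31")
print(end_for_request)
```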


### 2.2 Download All Tickers

We download all four tickers in a single call. Setting `auto_adjust=False` ensures we get both `Close` and `Adj Close` columns, which is important for understanding how dividends and splits affect prices.

In [10]:
# Download all tickers
df_raw = yf.download(
    tickers=tickers,
    start=start_date,
    end=end_date,
    auto_adjust=False,
    progress=True
)

[*********************100%***********************]  4 of 4 completed


### 2.3 Inspecting the Raw Download

Let's examine what yfinance returned. With multiple tickers, we get a MultiIndex column structure.

In [11]:
# Check the shape and structure
print(f"Shape: {df_raw.shape}")
print(f"\nColumn levels: {df_raw.columns.nlevels}")
print(f"\nLevel 0 (Price types): {df_raw.columns.get_level_values(0).unique().tolist()}")
print(f"Level 1 (Tickers): {df_raw.columns.get_level_values(1).unique().tolist()}")

Shape: (6538, 24)

Column levels: 2

Level 0 (Price types): ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
Level 1 (Tickers): ['BIL', 'SPY', 'VIXY', '^VIX']


The columns have two levels: the price type (Adj Close, Close, High, Low, Open, Volume) and the ticker symbol. This MultiIndex structure is how yfinance organizes data when you request multiple tickers.
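
The sketch below shows three common ways to select from a two-level column index like this one. It uses a small synthetic DataFrame rather than the real download, and it assumes the level names `Price` and `Ticker` that appear in the output in this section.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the yfinance result: two price types x two tickers
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], ["SPY", "^VIX"]], names=["Price", "Ticker"]
)
index = pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-06"])
demo = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=index, columns=columns
)

adj_close = demo["Adj Close"]                      # all tickers, one price type
spy_all = demo.xs("SPY", axis=1, level="Ticker")   # all price types, one ticker
spy_close = demo[("Close", "SPY")]                 # a single Series

print(adj_close.columns.tolist())
print(spy_all.columns.tolist())
print(type(spy_close).__name__)
```

Selecting on the first level with `demo["Adj Close"]` is the pattern this notebook relies on in the alignment step.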

In [12]:
# Look at the first few rows
df_raw.head()

Price,Adj Close,Adj Close,Adj Close,Adj Close,Close,Close,Close,Close,High,High,...,Low,Low,Open,Open,Open,Open,Volume,Volume,Volume,Volume
Ticker,BIL,SPY,VIXY,^VIX,BIL,SPY,VIXY,^VIX,BIL,SPY,...,VIXY,^VIX,BIL,SPY,VIXY,^VIX,BIL,SPY,VIXY,^VIX
Date
2000-01-03,,91.62,,24.21,,145.44,,24.21,,148.25,...,,23.98,,148.25,,24.36,,8164300,,0
2000-01-04,,88.03,,27.01,,139.75,,27.01,,144.06,...,,24.8,,143.53,,24.94,,8089800,,0
2000-01-05,,88.19,,26.41,,140.0,,26.41,,141.53,...,,25.85,,139.94,,27.98,,12177900,,0
2000-01-06,,86.77,,25.73,,137.75,,25.73,,141.5,...,,24.7,,139.62,,26.68,,6227200,,0
2000-01-07,,91.81,,21.72,,145.75,,21.72,,145.75,...,,21.72,,140.31,,25.14,,8066500,,0


In [13]:
# Look at the last few rows
df_raw.tail()

Price,Adj Close,Adj Close,Adj Close,Adj Close,Close,Close,Close,Close,High,High,...,Low,Low,Open,Open,Open,Open,Volume,Volume,Volume,Volume
Ticker,BIL,SPY,VIXY,^VIX,BIL,SPY,VIXY,^VIX,BIL,SPY,...,VIXY,^VIX,BIL,SPY,VIXY,^VIX,BIL,SPY,VIXY,^VIX
Date
2025-12-23,91.3,687.96,26.22,14.0,91.3,687.96,26.22,14.0,91.31,688.2,...,26.08,13.64,91.31,683.92,26.35,14.09,7111200.0,64840000,1528600.0,0
2025-12-24,91.33,690.38,26.19,13.47,91.33,690.38,26.19,13.47,91.33,690.83,...,26.04,13.38,91.33,687.95,26.16,14.09,5073500.0,39445600,882800.0,0
2025-12-26,91.36,690.31,26.14,13.6,91.36,690.31,26.14,13.6,91.36,691.66,...,26.04,13.52,91.35,690.64,26.05,14.12,5588900.0,41613300,1456400.0,0
2025-12-29,91.36,687.85,25.86,14.2,91.36,687.85,25.86,14.2,91.37,689.2,...,25.63,13.99,91.36,687.54,26.41,14.69,7988900.0,62559500,1921600.0,0
2025-12-30,91.37,687.01,25.47,14.33,91.37,687.01,25.47,14.33,91.38,688.56,...,25.37,14.04,91.37,687.45,25.75,14.43,7852900.0,47160700,1363200.0,0
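
The `head()` and `tail()` outputs above also give a quick way to see how `Adj Close` relates to `Close`. The sketch below hard-codes the SPY values printed in those tables (rounded to two decimals by our display settings) and computes the ratio of the two columns.

```python
# SPY values copied from the head() and tail() outputs above
spy_rows = {
    "2000-01-03": {"close": 145.44, "adj_close": 91.62},
    "2025-12-30": {"close": 687.01, "adj_close": 687.01},
}

# Ratio of adjusted to unadjusted close for each date
ratios = {
    date: row["adj_close"] / row["close"] for date, row in spy_rows.items()
}
for date, ratio in ratios.items():
    print(f"{date}: Adj Close / Close = {ratio:.3f}")
```

The ratio is about 0.63 on the first row and exactly 1 on the last. Adjustments are applied backward from the most recent price, so the gap in 2000 most likely reflects the distributions SPY has paid since then.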


---

## Section 3: Data Alignment

Our four tickers have different inception dates. We need to find the common date range where all have valid data.

### 3.1 The Date Range Problem

Let's check when each ticker's data actually begins. We will look at the Adj Close column for each ticker and find the first non-NaN value.

In [14]:
# Check first valid date for each ticker
print("First valid date for each ticker:")
print("-" * 40)

for ticker in tickers:
    first_valid = df_raw['Adj Close'][ticker].first_valid_index()
    print(f"{ticker:6} : {first_valid.strftime('%Y-%m-%d')}")

First valid date for each ticker:
----------------------------------------
SPY    : 2000-01-03
BIL    : 2007-05-30
VIXY   : 2011-01-04
^VIX   : 2000-01-03


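These are the first dates with valid data in our download, which are not always true inception dates. SPY and ^VIX both have history going back to the 1990s, so their first valid date is simply the first trading day of the range we requested (2000-01-03). BIL started in 2007, and VIXY is the newest with its January 2011 launch. VIXY is our binding constraint: any date before VIXY began trading has a NaN in that column.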
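
The same check can be done without a loop, and the binding constraint can be found programmatically. The sketch below uses a small synthetic DataFrame with made-up tickers `OLD` and `NEW`; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

index = pd.to_datetime(["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06"])
demo = pd.DataFrame(
    {
        "OLD": [10.0, 10.1, 10.2, 10.3],       # valid from the first row
        "NEW": [np.nan, np.nan, 50.0, 51.0],   # starts two rows later
    },
    index=index,
)

# First valid date per column, as a Series indexed by ticker
first_valid = demo.apply(lambda s: s.first_valid_index())

binding_ticker = first_valid.idxmax()   # ticker with the latest start
common_start = first_valid.max()        # earliest date where all have data

print(first_valid)
print(f"Binding constraint: {binding_ticker} ({common_start.date()})")
```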


### 3.2 Finding the Common Date Range

We extract just the Adjusted Close prices and drop any rows where any ticker has a missing value. This gives us a clean, aligned dataset.

In [15]:
# Extract Adjusted Close prices for all tickers
df_adj_close = df_raw['Adj Close'].copy()

print(f"Before alignment: {len(df_adj_close)} rows")
print(f"NaN counts per ticker:")
print(df_adj_close.isna().sum())

Before alignment: 6538 rows
NaN counts per ticker:
Ticker
BIL     1860
SPY        0
VIXY    2768
^VIX       0
dtype: int64


In [16]:
# Drop rows where ANY ticker has NaN
df_aligned = df_adj_close.dropna()

print(f"After alignment: {len(df_aligned)} rows")
print(f"\nDate range: {df_aligned.index[0].strftime('%Y-%m-%d')} to {df_aligned.index[-1].strftime('%Y-%m-%d')}")

After alignment: 3770 rows

Date range: 2011-01-04 to 2025-12-30


We now have a dataset where every row has valid prices for all four tickers. The date range starts when VIXY data became available.
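
One caveat: `dropna()` removes every row with a missing value, including any gaps in the middle of the history, not just the rows before the newest ticker began. In our data, 6538 rows minus VIXY's 2768 NaN rows equals the 3770 aligned rows, so every dropped row is one where VIXY was missing. A more general check, sketched here on synthetic data with hypothetical names, compares the aligned row count with the number of rows from the common start date onward.

```python
import numpy as np
import pandas as pd

index = pd.to_datetime(
    ["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06", "2011-01-07"]
)
demo = pd.DataFrame(
    {
        "A": [1.0, 1.1, 1.2, np.nan, 1.4],   # interior gap on 2011-01-06
        "B": [np.nan, 5.0, 5.1, 5.2, 5.3],   # starts one day late
    },
    index=index,
)

aligned = demo.dropna()

# Rows we would expect if the only missing data were before the common start
common_start = demo.apply(lambda s: s.first_valid_index()).max()
expected_rows = len(demo.loc[common_start:])

interior_rows_dropped = expected_rows - len(aligned)
print(f"Rows dropped after the common start: {interior_rows_dropped}")
```

A nonzero result would mean that some ticker has a hole inside the common range, which deserves a closer look before the data is used.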

### 3.3 Data Quality Check

Before saving the data, we run a series of quality checks. These checks catch common data problems that could corrupt our analysis downstream:

- NaN values: Missing data that slipped through our alignment step. Should be zero.
- Zero values: Prices should never be exactly zero. A zero usually indicates a data error or a placeholder for missing data.
- Negative values: Prices cannot be negative. If present, the data source has a problem.
- Trading day counts: A typical year has approximately 252 trading days. Significantly fewer days in any year would indicate gaps in our data.

These are basic sanity checks. More sophisticated data validation (outlier detection, corporate action verification) belongs in the data preparation phase covered in later chapters.

In [18]:
# Verify no NaN values remain
nan_count = df_aligned.isna().sum().sum()
print(f"Total NaN values: {nan_count}")

# Check for zeros (which would indicate bad data)
zero_count = (df_aligned == 0).sum().sum()
print(f"Zero values: {zero_count}")

# Check for negative values (prices should always be positive)
negative_count = (df_aligned < 0).sum().sum()
print(f"Negative values: {negative_count}")

Total NaN values: 0
Zero values: 0
Negative values: 0
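
If these checks will be rerun whenever the data is refreshed, it can help to bundle them into a function that fails loudly. The following is a minimal sketch; the names `basic_price_checks` and `assert_clean` are hypothetical, and the demo DataFrames are synthetic.

```python
import numpy as np
import pandas as pd

def basic_price_checks(prices):
    """Count NaN, zero, and negative entries in a price DataFrame."""
    return {
        "nan": int(prices.isna().sum().sum()),
        "zero": int((prices == 0).sum().sum()),
        "negative": int((prices < 0).sum().sum()),
    }

def assert_clean(prices):
    """Raise ValueError if any of the basic checks finds a problem."""
    problems = {k: v for k, v in basic_price_checks(prices).items() if v > 0}
    if problems:
        raise ValueError(f"Price data failed sanity checks: {problems}")

clean = pd.DataFrame({"SPY": [97.14, 97.64], "BIL": [74.97, 74.97]})
dirty = pd.DataFrame({"SPY": [97.14, 0.0], "BIL": [np.nan, -1.0]})

clean_report = basic_price_checks(clean)
dirty_report = basic_price_checks(dirty)
print("clean:", clean_report)
print("dirty:", dirty_report)

assert_clean(clean)  # passes silently
```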


In [19]:
# Sanity check: count trading days per year
print("Trading days per year:")
print(df_aligned.groupby(df_aligned.index.year).size())

Trading days per year:
Date
2011    251
2012    250
2013    252
2014    252
2015    252
2016    252
2017    251
2018    251
2019    252
2020    253
2021    252
2022    251
2023    250
2024    252
2025    249
dtype: int64


The counts above all fall between 249 and 253, close to the typical 252. Both 2011 and 2025 are slightly short because our aligned data starts on January 4, 2011 and ends on December 30, 2025. If any year showed significantly fewer days, it would indicate missing data.
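
Yearly counts can hide a short gap, so a complementary check is to look at the largest number of calendar days between consecutive rows. Gaps of three or four days are normal around weekends and holidays; anything much longer is worth investigating. The sketch below uses a synthetic business-day index with one week removed, and the helper name `largest_calendar_gap` is hypothetical.

```python
import pandas as pd

def largest_calendar_gap(index):
    """Return (gap in calendar days, date on which the gap ends)."""
    gaps = index.to_series().diff().dt.days
    return int(gaps.max()), gaps.idxmax()

# Synthetic business-day index with one week (Jan 15-19, 2024) removed
full = pd.bdate_range("2024-01-01", "2024-01-31")
with_hole = full[(full < "2024-01-15") | (full > "2024-01-19")]

gap_days, gap_end = largest_calendar_gap(with_hole)
print(f"Largest gap: {gap_days} calendar days, ending {gap_end.date()}")
```

On the real data, the same function could be called as `largest_calendar_gap(df_aligned.index)`.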

---

## Section 4: Building and Saving the Price DataFrame

### 4.1 The Final DataFrame Structure

Our aligned DataFrame already has the structure we need: a DatetimeIndex with one column per ticker holding its Adjusted Close price. Let's confirm the structure matches what our strategy code expects.

In [21]:
# Work on a copy under a clearer name
df_prices = df_aligned.copy()

# Ensure column names are plain strings
# (this also drops the 'Ticker' label left over from the MultiIndex)
df_prices.columns = [str(col) for col in df_prices.columns]

print(f"Columns: {df_prices.columns.tolist()}")
print(f"Index type: {type(df_prices.index).__name__}")

Columns: ['BIL', 'SPY', 'VIXY', '^VIX']
Index type: DatetimeIndex


The DataFrame has exactly the structure our strategy requires: columns named SPY, BIL, VIXY, and ^VIX, indexed by date.

### 4.2 Final Inspection

Let's do a final review of our prepared data.

In [22]:
# DataFrame info
df_prices.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3770 entries, 2011-01-04 to 2025-12-30
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   BIL     3770 non-null   float64
 1   SPY     3770 non-null   float64
 2   VIXY    3770 non-null   float64
 3   ^VIX    3770 non-null   float64
dtypes: float64(4)
memory usage: 147.3 KB


In [23]:
# Summary statistics
df_prices.describe()

,BIL,SPY,VIXY,^VIX
count,3770.0,3770.0,3770.0,3770.0
mean,77.95,280.93,67160.99,18.14
std,4.33,153.48,159385.91,6.88
min,74.67,85.32,25.47,9.14
25%,74.88,162.16,347.4,13.59
50%,75.84,241.03,2410.8,16.35
75%,78.47,393.63,32184.0,20.58
max,91.37,690.38,975600.0,82.69


A few observations from the summary statistics:

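- **SPY** ranges from a minimum of 85.32 to a maximum of 690.38, consistent with the long-term upward trend of the equity market since 2011.
- **BIL** stays within a narrow range (74.67 to 91.37) with a low standard deviation, confirming its role as a stable cash proxy.
- **VIXY** shows an enormous range, from 25.47 to 975,600. The six-figure values are not prices anyone actually traded at: adjusted prices are restated for VIXY's reverse splits, which scales the early history up. The pattern still reflects the fund's tendency to decay over time with occasional spikes, but the levels are only meaningful once converted to returns.
- **^VIX** ranges from 9.14 to 82.69, the wide gap between calm and stressed markets that is typical of a volatility index.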

In [24]:
# First few rows
print("First 5 rows:")
df_prices.head()

First 5 rows:


Date,BIL,SPY,VIXY,^VIX
2011-01-04,74.97,97.14,633840.0,17.38
2011-01-05,74.97,97.64,620400.0,17.02
2011-01-06,74.97,97.45,623040.0,17.4
2011-01-07,74.96,97.26,624320.0,17.14
2011-01-10,74.97,97.14,623040.0,17.54


In [25]:
# Last few rows
print("Last 5 rows:")
df_prices.tail()

Last 5 rows:


Date,BIL,SPY,VIXY,^VIX
2025-12-23,91.3,687.96,26.22,14.0
2025-12-24,91.33,690.38,26.19,13.47
2025-12-26,91.36,690.31,26.14,13.6
2025-12-29,91.36,687.85,25.86,14.2
2025-12-30,91.37,687.01,25.47,14.33
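
Because adjusted price levels are hard to compare across instruments, it is often more informative to look at returns. As a rough sketch, the code below hard-codes the first and last rows shown above (values rounded to two decimals by our display settings) and computes the total return of each traded instrument over the sample. ^VIX is left out because it is not tradeable.

```python
# First (2011-01-04) and last (2025-12-30) adjusted closes from the outputs above
first = {"BIL": 74.97, "SPY": 97.14, "VIXY": 633840.0}
last = {"BIL": 91.37, "SPY": 687.01, "VIXY": 25.47}

# Total return over the full sample for each instrument
total_return = {ticker: last[ticker] / first[ticker] - 1 for ticker in first}

for ticker, ret in total_return.items():
    print(f"{ticker:5} {ret:+.2%}")
```

The result makes the roles of the three instruments concrete: SPY compounds strongly, BIL grows slowly, and a buy-and-hold position in VIXY loses nearly all of its value.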


### 4.3 Save to Parquet

We save our prepared data to a parquet file. This file will be loaded in subsequent chapters for exploratory analysis, signal generation, and backtesting.

In [26]:
# Save to parquet
output_file = 'case_study_prices.parquet'
df_prices.to_parquet(output_file)

print(f"Saved to {output_file}")

Saved to case_study_prices.parquet


If you are running in Colab, download the parquet file before the session ends so that you can load it in future notebooks.

In [27]:
# Verify the file loads correctly
df_verify = pd.read_parquet(output_file)

print(f"Loaded shape: {df_verify.shape}")
print(f"Columns: {df_verify.columns.tolist()}")
print(f"Index type: {type(df_verify.index).__name__}")
print(f"\nData matches original: {df_verify.equals(df_prices)}")

Loaded shape: (3770, 4)
Columns: ['BIL', 'SPY', 'VIXY', '^VIX']
Index type: DatetimeIndex

Data matches original: True


The parquet file preserves our DataFrame exactly as we saved it, including the DatetimeIndex and all column data types.
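
In later notebooks, it can be useful to validate the file at load time rather than assume it is intact. The following is a minimal sketch, not the loader used in later chapters: the function name `load_prices` is hypothetical, the demo data is a two-row synthetic frame written to a temporary directory, and the `try`/`except` only guards against environments with no parquet engine installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_prices(path, expected_columns):
    """Load a price file and fail loudly if it is not what we expect."""
    df = pd.read_parquet(path)
    missing = set(expected_columns) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("Expected a DatetimeIndex")
    return df

demo = pd.DataFrame(
    {"SPY": [97.14, 97.64], "^VIX": [17.38, 17.02]},
    index=pd.DatetimeIndex(["2011-01-04", "2011-01-05"], name="Date"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_prices.parquet"
    try:
        demo.to_parquet(path)
        loaded = load_prices(path, ["SPY", "^VIX"])
        roundtrip_ok = loaded.equals(demo)
    except ImportError:
        roundtrip_ok = None  # no parquet engine (pyarrow or fastparquet) found

print(f"Round trip OK: {roundtrip_ok}")
```

For the real file, the call would look like `load_prices('case_study_prices.parquet', ['SPY', 'BIL', 'VIXY', '^VIX'])`.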

---

## Section 5: Summary and Next Steps

### 5.1 What We Produced

In this notebook, we:

1. Downloaded price data for our four strategy components: SPY, BIL, VIXY, and ^VIX
2. Identified the binding constraint on our date range (VIXY inception in January 2011)
3. Aligned all series to a common date range with no missing values
4. Verified data quality (no NaNs, zeros, or negative values)
5. Saved the clean data to `case_study_prices.parquet`

The output file contains daily Adjusted Close prices for all four tickers, ready for analysis.

In [28]:
# Final summary
print("=" * 50)
print("CASE STUDY DATA SUMMARY")
print("=" * 50)
print(f"Output file: {output_file}")
print(f"Date range: {df_prices.index[0].strftime('%Y-%m-%d')} to {df_prices.index[-1].strftime('%Y-%m-%d')}")
print(f"Total trading days: {len(df_prices)}")
print(f"Tickers: {df_prices.columns.tolist()}")
print("=" * 50)

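==================================================
CASE STUDY DATA SUMMARY
==================================================
Output file: case_study_prices.parquet
Date range: 2011-01-04 to 2025-12-30
Total trading days: 3770
Tickers: ['BIL', 'SPY', 'VIXY', '^VIX']
==================================================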


### 5.2 What Comes Next

In the next chapter (Exploratory Data Analysis), we will:

- Visualize the price histories of all four instruments
- Examine return distributions and their characteristics
- Analyze correlations between the assets
- Explore the behavior patterns that our strategy will exploit

The `case_study_prices.parquet` file we created here will be the starting point for that analysis.