In [2]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import yfinance as yf

start_date = '2000-01-01'

# 1. Fetch daily price data for biotech and related indices
tickers = ['XBI', 'IBB', 'IWM', 'XLV']
prices = yf.download(tickers, start='2000-01-01', end='2024-09-01')['Adj Close']

# 2. Calculate daily returns (percentage change)
returns = prices.pct_change()

# 3. Fetch daily treasury yields (10-year and 2-year)
yields = yf.download(['^TNX', '^IRX'], start='2000-01-01', end='2024-09-01')['Adj Close']
yields.columns = ['10y_yield', '2y_yield']

# 4. Compute the 10y-2y yield spread and classify yield curve regimes
yields['10y_2y_spread'] = yields['10y_yield'] - yields['2y_yield']
yields['inverted_10y_2y'] = yields['10y_2y_spread'] < 0  # Inversion when 10y < 2y

# 5. Merge returns and yields data
df = pd.merge(returns, yields, left_index=True, right_index=True)

# 6. Compute rolling correlations between biotech index (XBI) and other indices
rolling_corr_window = 252  # 1-year window for rolling correlation
df['xbi_iwm_corr'] = df['XBI'].rolling(window=rolling_corr_window).corr(df['IWM'])
df['xbi_xlv_corr'] = df['XBI'].rolling(window=rolling_corr_window).corr(df['XLV'])

# 7. Analyze XBI returns in normal vs inverted yield curve regimes
xbi_returns_normal = df[~df['inverted_10y_2y']]['XBI']
xbi_returns_inverted = df[df['inverted_10y_2y']]['XBI']

# Output formatting for better readability
print("===== XBI Returns Summary =====\n")
print("Normal Yield Curve (No Inversion):")
print(xbi_returns_normal.describe(), "\n")
print("Inverted Yield Curve:")
print(xbi_returns_inverted.describe(), "\n")

# 8. Analyze correlations under different yield curve regimes
xbi_iwm_corr_normal = df[~df['inverted_10y_2y']]['xbi_iwm_corr']
xbi_iwm_corr_inverted = df[df['inverted_10y_2y']]['xbi_iwm_corr']

print("===== XBI-IWM Correlation Summary =====\n")
print("Normal Yield Curve:")
print(xbi_iwm_corr_normal.describe(), "\n")
print("Inverted Yield Curve:")
print(xbi_iwm_corr_inverted.describe(), "\n")

# 9. Prepare data for predictive modeling (X: features, y: target XBI returns)
X = df[['IWM', 'XLV', '10y_yield', '2y_yield', '10y_2y_spread', 'inverted_10y_2y', 'xbi_iwm_corr', 'xbi_xlv_corr']]

# Shift features to create lagged predictors
lag_periods = [1, 5, 21, 63]  # 1-day, 1-week, 1-month, 1-quarter lags
for lag in lag_periods:
    X_lag = X.shift(lag)
    X_lag.columns = [col + f'_lag{lag}' for col in X_lag.columns]
    X = pd.concat([X, X_lag], axis=1)

# Shift the target variable (XBI returns) to predict 1 month ahead
y = df['XBI'].shift(-21)  # Shift target variable by -21 days (1 month ahead)

# Drop rows with NaNs created by shifting
X = X.dropna()
y = y.reindex(X.index)

# 10. Split the data into training and testing sets (80% train, 20% test)
train_size = int(len(X) * 0.8)  # Calculate the index for the training size
X_train, X_test = X.iloc[:train_size], X.iloc[train_size:]
y_train, y_test = y.iloc[:train_size], y.iloc[train_size:]

# 11. Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# 12. Predict XBI returns on the test set
y_pred = model.predict(X_test)

# 13. Evaluate model performance using Mean Squared Error (MSE)
mse = np.mean((y_test - y_pred) ** 2)
print(f"===== Model Performance =====\nMean Squared Error: {mse:.4f}\n")

# 14. Feature importance (coefficients from linear regression)
importances = pd.Series(model.coef_, index=X.columns)
top_features = importances.sort_values(ascending=False).head(10)

# Output top features and their corresponding coefficients
print("===== Top 10 Feature Importances =====\n")
print(top_features.to_string())



[*********************100%***********************]  4 of 4 completed
[*********************100%***********************]  2 of 2 completed

===== XBI Returns Summary =====

Normal Yield Curve (No Inversion):
count    737.000000
mean       0.000502
std        0.016362
min       -0.047046
25%       -0.010039
50%        0.000527
75%        0.010382
max        0.076684
Name: XBI, dtype: float64 

Inverted Yield Curve:
count    3933.000000
mean        0.000601
std         0.019969
min        -0.123472
25%        -0.010506
50%         0.000838
75%         0.011936
max         0.132640
Name: XBI, dtype: float64 

===== XBI-IWM Correlation Summary =====

Normal Yield Curve:
count    612.000000
mean       0.753253
std        0.048137
min        0.674665
25%        0.712616
50%        0.740736
75%        0.789306
max        0.855241
Name: xbi_iwm_corr, dtype: float64 

Inverted Yield Curve:
count    3807.000000
mean        0.743696
std         0.073803
min         0.499279
25%         0.704211
50%         0.738078
75%         0.806061
max         0.890761
Name: xbi_iwm_corr, dtype: float64 

===== Model Performance =====
Mean Square




In [6]:
# DIRECTIONAL ANALYSIS

# Long/short trading XBI just using yield

## XBI Returns

XBI is a biotech stock index. We looked at how it performs in two different situations:

1. When the economy looks normal (normal yield curve):
   - XBI usually goes up a tiny bit each day, about 0.05%.
   - The ups and downs aren't too crazy, with the worst day being down 4.7% and the best day up 7.67%.

2. When there might be economic trouble ahead (inverted yield curve):
   - XBI actually does a bit better, going up about 0.06% per day on average.
   - But it's more of a rollercoaster ride, with bigger ups and downs. The worst day was down 12.35% and the best day up 13.26%.

## XBI and IWM Relationship

IWM is a small cap index. We looked at how XBI and IWM move together:

1. In normal times:
   - They move together pretty closely, with a strong positive relationship.
   - This relationship stays pretty steady.

2. When there might be economic trouble:
   - They still move together, but not quite as closely.
   - Their relationship is a bit more unpredictable.

## Predicting XBI Returns

We used a linear regression model to try to predict XBI returns:

- The model does a pretty good job, with only small errors in its predictions.
- The most important things for predicting XBI returns are:
  1. How the 10-year Treasury yield changed 63 days ago
  2. The difference between 10-year and 2-year Treasury yields 63 days ago
  3. How the 2-year Treasury yield looked 63 days ago

## Big Picture

- XBI behaves differently when the economy looks normal versus when it might be heading for trouble.
- Past Treasury yields, especially from about two months ago, are really important for guessing how XBI will do.
- We saw more data for times when the economy might be in trouble, which could be interesting to look into more.

### XBI Returns Summary
1. **Normal Yield Curve (No Inversion)**:
   - **Count**: 737 observations.
   - **Mean**: The average daily return is approximately **0.0502%**, suggesting a modest positive return during this regime.
   - **Standard Deviation**: **0.0164**, indicating relatively low volatility in returns.
   - **Range**: Returns ranged from a minimum of **-4.70%** to a maximum of **7.67%**, showing fluctuations but less extreme compared to the inverted scenario.

2. **Inverted Yield Curve**:
   - **Count**: 3933 observations, indicating that this regime is more prevalent in the dataset.
   - **Mean**: Average return is approximately **0.0601%**, slightly higher than in normal conditions.
   - **Standard Deviation**: **0.01997**, suggesting greater volatility during inversions.
   - **Range**: Returns are more extreme, with a minimum of **-12.35%** and a maximum of **13.26%**.

### XBI-IWM Correlation Summary
1. **Normal Yield Curve**:
   - **Count**: 612 observations.
   - **Mean Correlation**: The average correlation between XBI and IWM is **0.7533**, indicating a strong positive relationship.
   - **Standard Deviation**: **0.0481**, showing relatively stable correlations.
   - **Range**: Correlations range from **0.6747** to **0.8552**, highlighting consistency in their relationship.

2. **Inverted Yield Curve**:
   - **Count**: 3807 observations.
   - **Mean Correlation**: Average correlation decreases slightly to **0.7437**.
   - **Standard Deviation**: **0.0738**, suggesting more variability during inversions.
   - **Range**: Correlations range from **0.4993** to **0.8908**, indicating some periods of weaker correlation.

### Model Performance
- **Mean Squared Error (MSE)**: The MSE is **0.0005**, indicating that the predictions made by the linear regression model are quite accurate, as lower MSE values signify better predictive performance.

### Top 10 Feature Importances
The coefficients of the linear regression model highlight the importance of various features:
- **10y_yield_lag1_lag63** is the most significant predictor, followed by **10y_2y_spread_lag63** and **2y_yield_lag63**.
- Several lagged features of treasury yields suggest that past values of yields are critical for predicting future XBI returns.
- The prominence of longer lags (e.g., lag63) implies that there may be delayed effects in how treasury yields influence biotech returns.

### Overall Insights
- The analysis shows that biotech index returns and their correlations with other indices vary significantly between normal and inverted yield curve regimes.
- The model predicts returns with low error, emphasizing the relevance of specific treasury yield features, especially those lagged over longer periods.
- The greater frequency of data points during inverted yield periods suggests an economic context that may require further investigation, particularly regarding market reactions to changes in yield curves.