<div style="background-color:#000;"><img src="pqn.png"></img></div>

## Imports and setup

We use matplotlib for plotting charts, numpy for numeric operations, statsmodels for statistical analysis and regression, yfinance for downloading historic price data, and scikit-learn for finding patterns in the data with principal component analysis (PCA).

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
import yfinance as yf
from sklearn.decomposition import PCA
from statsmodels import regression

This block selects a group of sector and asset tickers, pulls one year of daily historical closing prices using yfinance, and then calculates daily returns for this group.

In [None]:
tickers = ["SPY", "XLE", "XLY", "XLP", "XLI", "XLU", "XLK", "XBI", "XLB", "XLF", "GLD"]
price_data = yf.download(tickers, period="1y").Close
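# Daily simple returns; the first row has no prior price and is dropped.
returns = price_data.pct_change().dropna()
# Safeguard only: dropna() already removed every row containing a NaN.
returns = returns.fillna(0)

We set up a list of ticker symbols, fetch one year of closing prices, and convert them into a table of daily returns with one column per symbol. `dropna()` removes every row that contains a missing value, including the first row, which has no prior price to compare against. The `fillna(0)` call that follows is therefore only a safeguard and has nothing left to fill.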
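To make the cleaning step concrete, here is a minimal sketch on a tiny made-up price table. The tickers `AAA` and `BBB` and their prices are hypothetical, not downloaded data. It shows that `pct_change()` leaves the first row as NaN and that `dropna()` removes any row containing a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two made-up tickers over four days.
toy_prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0, 103.02], "BBB": [50.0, 50.5, 50.5, 49.49]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Daily simple return: price_t / price_{t-1} - 1. The first row has no
# previous price, so it comes out as NaN.
toy_returns_raw = toy_prices.pct_change()

# dropna() removes every row that contains at least one NaN.
toy_returns = toy_returns_raw.dropna()

print(toy_returns.round(4))
```

After `dropna()` the table has three rows and no missing values, so a later `fillna(0)` would change nothing.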

## Analyze and visualize components

This block fits principal component analysis (PCA) on the returns and keeps the smallest number of components needed to explain 90% of the variance. It then stores that number, the component directions, and the share of total variance each component explains.

In [None]:
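# A float n_components keeps enough components to explain that share of variance.
pca = PCA(n_components=0.9, svd_solver="full")
# fit() returns the fitted estimator itself, so there is no need to store the result.
pca.fit(returns)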

In [None]:
n_components_90 = pca.n_components_
components_90 = pca.components_
explained_variance = pca.explained_variance_ratio_

Here we use PCA to break the returns down into a few uncorrelated directions that explain most of the variation. Setting `n_components=0.9` keeps only as many components as are needed to explain 90% of the total variance. We save the number of components kept, the directions they point in (one weight per asset), and the fraction of total variance each one explains.
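The 90% threshold can be checked by hand. The sketch below uses synthetic data (all values are made up) and assumes scikit-learn's documented behavior for a float `n_components`: keep the smallest number of components whose cumulative explained variance ratio exceeds that fraction. It compares a manual count against what `PCA` reports.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic "returns": 250 days, 6 assets driven by 2 common factors plus
# a small amount of asset-specific noise (all values are made up).
common = rng.normal(size=(250, 2))
loadings = rng.normal(size=(2, 6))
toy_returns = common @ loadings + 0.1 * rng.normal(size=(250, 6))

# Full PCA: keep every component so we can inspect the cumulative ratio.
full_pca = PCA(svd_solver="full").fit(toy_returns)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance ratio exceeds 90%.
k_manual = int(np.searchsorted(cumulative, 0.9, side="right") + 1)

# Same threshold expressed through n_components=0.9.
pca_90 = PCA(n_components=0.9, svd_solver="full").fit(toy_returns)

print("cumulative ratio:", cumulative.round(3))
print("manual k:", k_manual, "| sklearn k:", pca_90.n_components_)
```

Looking at the cumulative ratio alongside `n_components_` is a quick way to see how close the kept components come to the threshold.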

This block creates a bar chart of the fraction of variance in the returns that each retained component explains.

In [None]:
plt.figure(figsize=(10, 6))
plt.bar(range(1, n_components_90 + 1), explained_variance, alpha=0.7)
plt.xlabel("Principal Component")
plt.ylabel("Explained Variance Ratio")
plt.title(f"Components needed to explain 90% of variance: {n_components_90}")
plt.tight_layout()
plt.show()

The chart shows, for each component PCA kept, the share of total variance it explains. It makes it easy to see which components dominate and how many are needed to describe most of the data's behavior.

## Visualize relationships between two assets

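This block scales each return series by its standard deviation, takes the first two columns of the returns table, and runs PCA on that pair. It then plots the pair as a scatter with the two component directions overlaid. yfinance may order the columns differently from the `tickers` list, so check `returns.columns[:2]` to see which two assets are plotted.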

In [None]:
r = returns / returns.std()
r1_s, r2_s = r.iloc[:, 0], r.iloc[:, 1]

In [None]:
# Fit a separate two-component PCA so both directions are always available.
pca_2 = PCA(n_components=2)
pca_2.fit(np.vstack((r1_s, r2_s)).T)
components_s = pca_2.components_
evr_s = pca_2.explained_variance_ratio_

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(r1_s, r2_s, alpha=0.5, s=10)

# Overlay the two-asset component directions, scaled by explained variance.
xs = np.linspace(r1_s.min(), r1_s.max(), 100)
plt.plot(xs * components_s[0, 0] * evr_s[0], xs * components_s[0, 1] * evr_s[0], "r")
plt.plot(xs * components_s[1, 0] * evr_s[1], xs * components_s[1, 1] * evr_s[1], "g")
plt.show()

We put the returns on a common scale, pick out the first two columns, and refit PCA on that pair. The scatter plot shows how the two return series move together. The red line marks the first component, the direction along which the pair varies most, and the green line marks the second component, which is perpendicular to it.
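For two series scaled to unit variance with correlation $\rho > 0$, the covariance matrix is $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. Its eigenvectors are $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$, with eigenvalues $1 + \rho$ and $1 - \rho$, so the first component explains $(1 + \rho)/2$ of the variance. The sketch below checks this on made-up data. It uses a plain NumPy eigen-decomposition rather than scikit-learn, and the correlation of 0.6 is a hypothetical value chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.6  # hypothetical correlation between the two toy series

# Draw two correlated standard-normal series (made-up data).
cov = np.array([[1.0, rho], [rho, 1.0]])
pair = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)

# Scale each column to unit standard deviation, as in the notebook.
pair = pair / pair.std(axis=0, ddof=1)

# PCA via eigen-decomposition of the sample covariance matrix.
sample_cov = np.cov(pair, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(sample_cov)  # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

first_direction = eigvecs[:, 0]
evr = eigvals / eigvals.sum()

print("first direction:", first_direction.round(3))
print("explained variance ratio:", evr.round(3))
```

The first direction lies along the 45-degree diagonal (up to sign), which is why the dominant line in a scatter of two positively correlated, standardized return series runs from lower left to upper right.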

## Model a single asset using components

This block builds one return series per principal component by weighting each day's asset returns with that component's loadings. It then uses multiple linear regression to see how well these series explain the first column of the returns table. The fitted coefficients show how strongly that asset responds to each component.

In [None]:
factor_returns = np.array(
    [(components_90[i] * returns).T.sum() for i in range(n_components_90)]
)

In [None]:
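# Use the column label from the data: yfinance may not preserve the order of `tickers`.
target = returns.columns[0]
mlr = regression.linear_model.OLS(
    returns[target], sm.add_constant(factor_returns.T)
).fit()
print(f"Regression coefficients for {target}:\n{mlr.params.round(4)}")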

We turn each principal component into a daily series by taking the loading-weighted sum of that day's asset returns. We then regress a single asset's returns on these series. The printed coefficients show how strongly that asset moves with each component.
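The same two steps can be sketched with NumPy alone on made-up data. Here the component directions come from an SVD of the centered returns instead of scikit-learn, the regression uses `np.linalg.lstsq` instead of statsmodels, and `k = 2` is a hypothetical choice. The sketch also prints the first asset's loadings next to the fitted slopes for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up returns: 300 days, 5 assets with a shared common driver.
market = rng.normal(scale=0.01, size=(300, 1))
toy_returns = market @ np.ones((1, 5)) + rng.normal(scale=0.005, size=(300, 5))

# PCA directions from the SVD of the centered data (rows of vt).
centered = toy_returns - toy_returns.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
k = 2  # hypothetical number of components to keep
components = vt[:k]  # shape (k, n_assets)

# Factor return series: project each day's returns onto each direction.
factor_returns = toy_returns @ components.T  # shape (n_days, k)

# Regress the first asset on a constant plus the factor series.
design = np.column_stack([np.ones(len(factor_returns)), factor_returns])
coefs, *_ = np.linalg.lstsq(design, toy_returns[:, 0], rcond=None)

print("intercept:", coefs[0].round(6))
print("slopes:   ", coefs[1:].round(4))
print("loadings: ", components[:, 0].round(4))
```

In this setup the slopes match the asset's loadings in each component, because the projected series are uncorrelated with one another once the constant absorbs their means. Comparing the two is a useful sanity check on the regression output.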

This line displays how many components we kept after PCA, that is, how many were needed to explain 90% of the total variance.

In [None]:
n_components_90

This shows how many components it takes to summarize 90% of the variance in our returns. The fewer components needed, the more the assets move together and the simpler the group's risk structure.
