<div style="background-color:#000;"><img src="pqn.png"></img></div>

### Import libraries and set up

We start by importing necessary libraries for data analysis, visualization, and machine learning.

In [None]:
%load_ext cuml.accel
import yfinance as yf
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

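These libraries provide tools for financial data retrieval, data manipulation, numerical computation, principal component analysis, and data visualization. The `%load_ext cuml.accel` line loads the RAPIDS cuML accelerator, which is intended to run supported scikit-learn estimators such as `PCA` on an NVIDIA GPU. It has to be loaded before scikit-learn is imported, and the line can be removed if cuML is not installed. We'll use these tools to analyze S&P 500 stock data and perform dimensionality reduction.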

### Download S&P 500 data

We fetch the list of S&P 500 companies and download their historical stock data.

In [None]:
snp_symbols = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0].Symbol.tolist()
symbols = [sym.replace(".", "-") for sym in snp_symbols]

In [None]:
data = yf.download(symbols, start="2020-01-01", end="2024-12-31")
portfolio_returns = data['Close'].pct_change().dropna()

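We retrieve the list of S&P 500 company symbols from Wikipedia and replace `.` with `-` so they match Yahoo Finance tickers (for example, `BRK.B` becomes `BRK-B`). Then, we download historical stock data for these companies from 2020 to 2024 and calculate each stock's daily percentage return from closing prices. Note that `dropna()` removes every date on which any symbol lacks a return, so the sample only starts once all downloaded symbols have price history.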
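The two preparation steps can be checked on a small synthetic table before running them on the full index. The sketch below uses made-up tickers and prices (the `NEW` column and all values are hypothetical) to show the symbol conversion and how `pct_change().dropna()` shortens the sample when one column has a shorter history. Dropping incomplete columns first is shown only as one possible alternative, not as the approach this notebook uses.

```python
import numpy as np
import pandas as pd

# Hypothetical symbols in Wikipedia's dotted style
raw_symbols = ["BRK.B", "AAPL", "BF.B"]
symbols = [sym.replace(".", "-") for sym in raw_symbols]

# Made-up closing prices; "NEW" has no price on the first two days
dates = pd.date_range("2024-01-01", periods=5, freq="D")
closes = pd.DataFrame(
    {
        "AAPL": [100.0, 110.0, 99.0, 99.0, 108.9],
        "NEW": [np.nan, np.nan, 50.0, 55.0, 55.0],
    },
    index=dates,
)

# Same pattern as the notebook: any row containing a NaN is dropped
returns_all = closes.pct_change().dropna()

# Alternative (an assumption, not the notebook's method):
# keep only columns with a full price history, then compute returns
returns_full_history = closes.dropna(axis=1).pct_change().dropna()

print(symbols)
print(len(returns_all), "rows when every column must have a return")
print(len(returns_full_history), "rows when incomplete columns are dropped first")
```

Which trade-off is preferable depends on the analysis: the first keeps every symbol but fewer dates, the second keeps every date but fewer symbols.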

### Perform PCA analysis

We apply Principal Component Analysis to reduce the dimensionality of our portfolio returns data.

In [None]:
pca = PCA(n_components=5)
pca.fit(portfolio_returns)

In [None]:
pct = pca.explained_variance_ratio_
pca_components = pca.components_

We use PCA to identify the main statistical factors driving our portfolio returns. We set the number of components to 5, which keeps the five directions that explain the most variance. The explained variance ratio tells us what fraction of the total variance each component accounts for.
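To make the explained variance ratio concrete, here is a minimal NumPy-only sketch on synthetic data. It assumes the usual PCA construction (mean-center the data, take the singular value decomposition, divide the squared singular values by `n - 1`), which is how we understand scikit-learn's values to be computed. The data, sizes, and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "returns": 250 days x 6 assets, one common factor plus noise
common = rng.normal(scale=0.02, size=(250, 1))
noise = rng.normal(scale=0.005, size=(250, 6))
X = common + noise

# PCA via SVD of the mean-centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Variance along each principal direction, and its share of the total
explained_variance = S**2 / (len(X) - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

print(np.round(explained_variance_ratio, 3))
```

Because every asset here loads on the same common factor, the first ratio dominates. Market-wide moves tend to play a similar role in real equity returns.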

### Visualize PCA results

We create bar and line plots to visualize the contribution of each principal component.

In [None]:
cum_pct = np.cumsum(pct)
x = np.arange(1, len(pct) + 1)

In [None]:
plt.subplot(1, 2, 1)
plt.bar(x, pct * 100, align="center")
plt.title('Contribution (%)')
plt.xlabel('Component')
plt.xticks(x)
plt.xlim([0, 6])
plt.ylim([0, 100])

In [None]:
plt.subplot(1, 2, 2)
plt.plot(x, cum_pct * 100, 'ro-')
plt.title('Cumulative contribution (%)')
plt.xlabel('Component')
plt.xticks(x)
plt.xlim([0, 6])
plt.ylim([0, 100])

We create two subplots: a bar chart showing the individual contribution of each component, and a line plot showing the cumulative contribution. This helps us visualize how much variance each component explains and how many components we need to explain most of the variance in our data.
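The cumulative series can also be read off numerically. The sketch below finds how many components are needed to reach a target share of variance. The ratios and the 75% target are made-up values for illustration, not results from the S&P 500 data.

```python
import numpy as np

# Made-up explained variance ratios, for illustration only
pct = np.array([0.40, 0.25, 0.15, 0.10, 0.05])
cum_pct = np.cumsum(pct)

target = 0.75  # hypothetical threshold

# Index of the first cumulative value >= target, converted to a count.
# If the target exceeds cum_pct[-1], the result is len(pct) + 1,
# meaning the fitted components are not enough to reach it.
n_needed = int(np.searchsorted(cum_pct, target) + 1)

print(n_needed)
```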

### Calculate factor returns

We use the PCA components to calculate factor returns for our portfolio.

In [None]:
X = np.asarray(portfolio_returns)

In [None]:
factor_returns = X.dot(pca_components.T)

In [None]:
factor_returns = pd.DataFrame(
    columns=["f1", "f2", "f3", "f4", "f5"], 
    index=portfolio_returns.index,
    data=factor_returns
)

In [None]:
factor_returns.head()

We transform our portfolio returns using the PCA components to get factor returns. These factor returns represent the behavior of our portfolio in terms of the five most significant factors we identified. We store these in a DataFrame for easy analysis.
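One detail worth knowing: multiplying the raw returns by the component matrix is not quite the same as projecting mean-centered returns, which is what we understand scikit-learn's `PCA.transform` to do. The two differ only by a constant offset per factor. The NumPy-only sketch below illustrates this on synthetic data, with components obtained from an SVD rather than from the fitted `pca` object.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=0.001, scale=0.01, size=(100, 4))  # synthetic returns

# Principal directions from the SVD of the mean-centered data
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]  # first two directions, shape (2, 4)

# Projection of the raw returns, as in this notebook
raw_scores = X.dot(components.T)

# Projection of the mean-centered returns
centered_scores = (X - mean).dot(components.T)

# The difference is the same row repeated: mean.dot(components.T)
offset = raw_scores - centered_scores

print(offset[0])
```

For daily returns the mean is usually small, so the offset is minor, but it matters if the factor series are compared directly with the output of `pca.transform`.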

### Visualize factor returns

We create a 3D scatter plot to visualize the relationships between the first three factors.

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

In [None]:
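# Plot the first three factor return series; color encodes f3
sc = ax.scatter(
    factor_returns["f1"],
    factor_returns["f2"],
    factor_returns["f3"],
    c=factor_returns["f3"],
    cmap='viridis'
)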

In [None]:
ax.set_xlabel('f1')
ax.set_ylabel('f2')
ax.set_zlabel('f3')

In [None]:
plt.colorbar(sc, ax=ax, label='f3')
plt.show()

We create a 3D scatter plot to visualize how the first three factors relate to each other. Each point represents a day in our dataset, plotted according to its values for the first three factors. The color of each point is based on the value of the third factor, which repeats the z-axis value and makes depth easier to read in the 3D projection.

<a href="https://pyquantnews.com/">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href="https://gettingstartedwithpythonforquantfinance.com/">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk.