
## Imports and setup

We use matplotlib and seaborn for charting, numpy for math, pandas for working with data tables, yfinance for downloading stock prices, and statsmodels for testing relationships between price series.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import yfinance as yf
from statsmodels.tsa.stattools import coint

## Download and clean stock data

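This block uses yfinance to download daily closing prices for a group of clean energy stocks and ETFs. Because `auto_adjust=False`, the `Close` column is not dividend-adjusted (the adjusted series is in `Adj Close`).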

In [None]:
tickers = [
    "NEE",
    "FSLR",
    "ENPH",
    "PLUG",
    "BEP",
    "AQN",
    "PBW",
    "FAN",
    "ICLN",
]
start_date = "2014-01-01"
end_date = "2015-01-01"
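prices = yf.download(tickers, start=start_date, end=end_date, auto_adjust=False)
# Reorder the columns to match `tickers`, so that df.iloc[:, i] is always
# tickers[i] no matter what column order the download returns.
df = prices["Close"][tickers]
df = df.dropna()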

We’re collecting a full year of prices for nine renewable energy tickers, then dropping any date on which at least one of them has no price. This gives us a clean set of price histories aligned by date, with one column per ticker, ready for analysis.
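To make the effect of `dropna()` concrete, here is a minimal sketch on a made-up two-column table. The tickers `A` and `B` and their prices are invented for illustration and are not part of the downloaded data.

```python
import numpy as np
import pandas as pd

# Toy price table: hypothetical ticker "B" has no price on the second date.
dates = pd.date_range("2014-01-02", periods=4, freq="B")
toy = pd.DataFrame(
    {"A": [10.0, 10.5, 10.2, 10.8], "B": [20.0, np.nan, 20.4, 20.1]},
    index=dates,
)

# dropna() removes every row where at least one column is missing,
# so both columns end up covering exactly the same dates.
clean = toy.dropna()
print(clean)
```

One date is lost for both tickers even though only one of them had a gap, which is the price of keeping the series aligned.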

## Test for cointegration between stock pairs

This block tests each possible pair of tickers in our set for cointegration, meaning that some linear combination of the two price series is stationary, so the prices share a long-run relationship even if each one wanders on its own.

In [None]:
n = len(tickers)
score_matrix = np.zeros((n, n))
pvalue_matrix = np.ones((n, n))
pairs = []
for i in range(n):
    for j in range(i + 1, n):
        score, pval, _ = coint(df.iloc[:, i], df.iloc[:, j])
        score_matrix[i, j] = score
        pvalue_matrix[i, j] = pval
        if pval < 0.05:
            print(f"{tickers[i]} x {tickers[j]}, p={pvalue_matrix[i, j]}")
            pairs.append((tickers[i], tickers[j]))

We run the Engle-Granger cointegration test from statsmodels (`coint`) on every unique pair of tickers. For each pair, we store the test statistic and the p-value in two matrices. The null hypothesis is that the pair is not cointegrated, so whenever the p-value falls below 0.05 we treat the relationship as statistically significant, print the ticker names and p-value, and add the pair to a list. This helps us spot which pairs tend to stay tied together over time.
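The nested loop visits each unordered pair exactly once. As a rough sketch of how many tests that implies, the same pairs can be enumerated with `itertools.combinations` (an alternative to the index loop, not what the notebook itself uses):

```python
from itertools import combinations

tickers = ["NEE", "FSLR", "ENPH", "PLUG", "BEP", "AQN", "PBW", "FAN", "ICLN"]

# Each unordered pair appears once, matching the i < j nested loop.
unique_pairs = list(combinations(tickers, 2))
print(len(unique_pairs))  # 9 * 8 / 2 pairs
```

With 36 tests and a 0.05 threshold, about 36 × 0.05 ≈ 1.8 pairs would be expected to pass by chance alone if no pair were truly cointegrated (assuming the test is well calibrated), so a handful of significant results deserves some caution.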

This block finds the pair with the lowest p-value (the strongest evidence of cointegration), then computes the difference between the two price series and standardizes it with a z-score over a rolling 21-trading-day window.

In [None]:
mask = np.triu(np.ones_like(pvalue_matrix, dtype=bool), k=1)
upper_vals = pvalue_matrix[mask]
min_idx_flat = np.argmin(upper_vals)
min_p = upper_vals[min_idx_flat]
idx_pairs = np.column_stack(np.where(mask))
i, j = idx_pairs[min_idx_flat]
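The index bookkeeping in the cell above is easier to follow on a small example. This sketch applies the same mask-and-argmin steps to a made-up 3×3 p-value matrix; the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical p-values; only the cells above the diagonal were "tested".
toy_p = np.array([
    [1.00, 0.30, 0.02],
    [1.00, 1.00, 0.50],
    [1.00, 1.00, 1.00],
])

# True strictly above the diagonal, False elsewhere.
upper = np.triu(np.ones_like(toy_p, dtype=bool), k=1)

# Boolean indexing flattens the selected cells in row-major order,
# and np.where(upper) lists their (row, col) positions in the same order.
flat_best = np.argmin(toy_p[upper])
row, col = np.column_stack(np.where(upper))[flat_best]
print(row, col, toy_p[row, col])
```

Because both arrays follow the same row-major order, the flat position of the minimum can be mapped back to its row and column.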

In [None]:
S1, S2 = df.iloc[:, i], df.iloc[:, j]

In [None]:
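# Re-run the test on the selected pair and label it by the actual column names.
score, pvalue, _ = coint(S1, S2)
print(f"tickers with lowest p-value: {S1.name} x {S2.name}, p={pvalue}")
# Raw price difference (no hedge ratio), standardized over a 21-day window.
spread = S1 - S2
rolling = spread.rolling(21, min_periods=21)
zscore = (spread - rolling.mean()) / rolling.std()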

We scan the upper triangle of the p-value matrix and pull out the pair with the lowest value, which is the strongest evidence of cointegration in our sample. We then build a daily spread as the simple price difference between the two series (no hedge ratio is applied) and measure how extreme it is with a 21-day rolling z-score. Standardizing the spread makes it easier to see when the gap is unusually wide or narrow relative to its recent history, which is the typical setup for mean-reversion strategies such as pairs trading.
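A small worked example may help show what the rolling z-score does. This sketch uses a made-up five-point series and a window of 3 instead of 21 so the numbers are easy to check by hand.

```python
import pandas as pd

# Hypothetical spread values; the last one jumps away from the rest.
toy_spread = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
window = 3  # shortened from 21 purely for illustration

rolling = toy_spread.rolling(window, min_periods=window)
toy_z = (toy_spread - rolling.mean()) / rolling.std()
print(toy_z)
```

The first `window - 1` entries are NaN because `min_periods` requires a full window. At the third point the window is [1, 2, 3], with mean 2 and sample standard deviation 1, so the z-score is 1. Note that each observation is part of its own window, so a large jump also inflates the standard deviation it is measured against.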

## Visualize cointegration results

This block creates a heatmap so we can clearly see which pairs of tickers have close statistical relationships, and which do not.

In [None]:
mask = np.tril(np.ones_like(pvalue_matrix, dtype=bool)) | (pvalue_matrix >= 0.95)
plt.figure(figsize=(8, 6))
sns.heatmap(
    pvalue_matrix,
    mask=mask,
    xticklabels=tickers,
    yticklabels=tickers,
    cmap="RdYlGn_r",
    annot=True,
    fmt=".2f",
    cbar=True,
    vmin=0,
    vmax=1,
)
plt.title("Cointegration Test p-value Matrix")
plt.show()

We use seaborn to chart the p-values for all pairs, hiding cells that carry no information: the diagonal and lower triangle (never tested, so left at the initial value of 1) and any p-value of 0.95 or higher. Each visible cell is annotated with its p-value rounded to two decimals. With the reversed red-yellow-green colormap, low p-values appear green and high ones red, so the pairs with the strongest evidence of cointegration stand out.
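To see which cells the heatmap mask hides, here is a minimal sketch that builds the same kind of mask on a made-up 3×3 matrix. In `sns.heatmap`, cells where the mask is `True` are not drawn.

```python
import numpy as np

# Hypothetical p-values; the lower triangle and diagonal keep the default 1.
toy_p = np.array([
    [1.00, 0.03, 0.97],
    [1.00, 1.00, 0.40],
    [1.00, 1.00, 1.00],
])

# Hide the diagonal and lower triangle, plus any p-value of 0.95 or more.
hide = np.tril(np.ones_like(toy_p, dtype=bool)) | (toy_p >= 0.95)
visible = toy_p[~hide]
print(hide)
print(visible)
```

Only the two upper-triangle cells below the 0.95 cutoff remain visible.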

This block displays the spread and z-score for the pair of tickers with the tightest relationship, highlighting typical and outlying values.

In [None]:
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
axes[0].plot(spread.index, spread, label="Spread")
axes[0].axhline(spread.mean(), color="black", linestyle="--", lw=1)
axes[0].axhline(spread.mean() + spread.std(), color="red", linestyle="--", lw=1)
axes[0].axhline(spread.mean() - spread.std(), color="green", linestyle="--", lw=1)
axes[0].set_ylabel("Spread")
axes[0].legend()
axes[1].plot(zscore.index, zscore, label="Z-score")
axes[1].axhline(0, color="black", linestyle="--", lw=1)
axes[1].axhline(1, color="red", linestyle="--", lw=1)
axes[1].axhline(-1, color="green", linestyle="--", lw=1)
axes[1].set_ylabel("Z-score")
axes[1].legend()
plt.xlabel("Date")
plt.suptitle(f"Spread and Rolling Z-score between {S1.name} and {S2.name}")
plt.tight_layout()
plt.show()

We create two charts that share a date axis. The top shows the raw price gap between the selected pair, with dashed lines at its full-sample mean and one standard deviation above and below. The bottom tracks the 21-day rolling z-score, with reference lines at 0 and ±1. Together they show when the gap stretches far from its recent norm, which could prompt a trade idea or a risk alert.
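As an illustration of how the ±1 reference lines could be turned into flags, here is a hedged sketch on made-up z-score values. The threshold of 1.0 and the +1/-1/0 encoding are assumptions chosen to mirror the chart, not a trading rule taken from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical z-scores; the leading NaN mimics the rolling warm-up period.
toy_z = pd.Series([np.nan, 0.2, 1.4, 0.8, -1.3, -0.1])
threshold = 1.0  # assumed cutoff, matching the dashed reference lines

# +1 where the spread is stretched high, -1 where stretched low, 0 otherwise.
# NaN fails the comparison, so warm-up days are flagged 0.
flags = np.sign(toy_z).where(toy_z.abs() > threshold, 0)
print(flags.tolist())
```

Whether a flag should lead to any action depends on costs, risk limits, and out-of-sample testing, none of which this sketch covers.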

<a href="https://pyquantnews.com/">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. For educational purposes. Not investment advice. Use at your own risk.