In [1]:
import pandas as pd

DATA = '/kaggle/input/jpmorgan-chase-stock-data-2025/JPM_1940-01-01_2025-03-04.csv'
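df = pd.read_csv(filepath_or_buffer=DATA)
# Keep only the calendar date, dropping the time and UTC offset.
df['Date'] = df['date'].apply(func=lambda x: x.split()[0])
df['Date'] = pd.to_datetime(df['Date'])
# Numeric year, used below as a continuous color scale.
df['year'] = df['Date'].dt.year.astype(float)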
df.head()

Unnamed: 0,date,open,high,low,close,adj_close,volume,Date,year
0,1980-03-17 00:00:00-05:00,0.0,5.12963,5.018519,5.037037,1.059499,62775,1980-03-17,1980.0
1,1980-03-18 00:00:00-05:00,0.0,5.111111,5.037037,5.074074,1.067288,64125,1980-03-18,1980.0
2,1980-03-19 00:00:00-05:00,0.0,5.166667,5.111111,5.148148,1.08287,40500,1980-03-19,1980.0
3,1980-03-20 00:00:00-05:00,0.0,5.148148,5.092593,5.111111,1.075079,18900,1980-03-20,1980.0
4,1980-03-21 00:00:00-05:00,0.0,5.222222,5.111111,5.222222,1.09845,97200,1980-03-21,1980.0


First let's have a look at the price/volume correlations.

In [2]:
df[['open', 'high', 'low', 'close', 'adj_close', 'volume']].corr()

Unnamed: 0,open,high,low,close,adj_close,volume
open,1.0,0.998834,0.998758,0.998704,0.989419,0.152231
high,0.998834,1.0,0.999845,0.999902,0.992053,0.147328
low,0.998758,0.999845,1.0,0.999903,0.992343,0.140009
close,0.998704,0.999902,0.999903,1.0,0.992303,0.143744
adj_close,0.989419,0.992053,0.992343,0.992303,1.0,0.129419
volume,0.152231,0.147328,0.140009,0.143744,0.129419,1.0


What do we see? No two price columns are perfectly correlated, so none is an exact duplicate of another, although high, low, and close come very close (pairwise correlations above 0.9998). Price and volume are only weakly positively correlated (roughly 0.13 to 0.15).
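Reading a correlation table by eye gets tedious as the number of columns grows. Here is a minimal sketch of a helper that lists the column pairs above a chosen threshold. The function name `near_duplicate_pairs`, the 0.999 threshold, and the toy data are all assumptions made for illustration; nothing here comes from the JPM file.

```python
import numpy as np
import pandas as pd


def near_duplicate_pairs(frame, threshold=0.999):
    """Return (col_a, col_b, corr) for each pair with |corr| >= threshold."""
    corr = frame.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if abs(corr.loc[a, b]) >= threshold:
                pairs.append((a, b, float(corr.loc[a, b])))
    return pairs


# Toy data (hypothetical): 'high' is 'close' plus a tiny offset,
# 'volume' is drawn independently.
rng = np.random.default_rng(0)
base = 100.0 + np.cumsum(rng.normal(size=500))
toy = pd.DataFrame({
    'close': base,
    'high': base + rng.uniform(0.0, 0.05, size=500),
    'volume': rng.lognormal(mean=10.0, sigma=1.0, size=500),
})
pairs = near_duplicate_pairs(toy)
```

On the real data the same call would be `near_duplicate_pairs(df[['open', 'high', 'low', 'close', 'adj_close', 'volume']])`.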

In [3]:
from plotly import express
from plotly import io

io.renderers.default = 'iframe'
express.scatter(data_frame=df, x='Date', y='adj_close', color='year', log_y=False)

Because the stock has appreciated so much, on a linear scale the early years look like they have no price variability at all, which doesn't make much sense. Let's use a log scale on the price axis.
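To see why the log scale helps, here is a small sketch with made-up prices (not taken from the data): a 10% move at a price of 1 and a 10% move at a price of 100 differ by a factor of 100 on a linear axis, but cover the same distance on a log axis.

```python
import numpy as np

# Hypothetical prices: the same 10% move at two very different price levels.
early = np.array([1.0, 1.1])
late = np.array([100.0, 110.0])

# Distance covered on a linear axis.
linear_moves = (early[1] - early[0], late[1] - late[0])

# Distance covered on a log10 axis.
log_moves = (
    np.log10(early[1]) - np.log10(early[0]),
    np.log10(late[1]) - np.log10(late[0]),
)
```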

In [4]:
express.scatter(data_frame=df, x='Date', y='adj_close', color='year', log_y=True)

What do we see? On a log scale the price rises fairly steadily, which is what roughly exponential growth looks like, but with occasional substantial reversals.
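If growth is roughly exponential, the log of the price is roughly linear in time, and the slope of that line is the continuous growth rate. A minimal sketch of the fit follows, using synthetic data with an assumed 8% rate rather than the JPM series; the variable names are illustrative only.

```python
import numpy as np

# Synthetic data (an assumption, not the JPM series): 8% continuous
# annual growth with multiplicative noise.
rng = np.random.default_rng(1)
years = np.linspace(0.0, 45.0, 1000)
price = np.exp(0.08 * years + rng.normal(scale=0.1, size=years.size))

# Fit log(price) = slope * years + intercept by least squares.
slope, intercept = np.polyfit(years, np.log(price), deg=1)

# Convert the continuous rate to a simple annual growth rate.
annual_growth = np.exp(slope) - 1.0
```

To try this on the notebook's data, one could pass the elapsed time in years since the first row as `years` and `df['adj_close']` as `price`.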

Next let's look at volume over time. Volume is not generally serially correlated the way prices are, so we use a log scale on the volume axis to keep outliers from dominating the plot.
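Serial correlation can be measured directly with the lag-1 autocorrelation from `Series.autocorr`. The sketch below uses toy series that are built to differ: a random walk standing in for price and independent draws standing in for volume. It shows how the check works, not what the JPM data would give.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Toy series (hypothetical): a random walk and independent lognormal draws.
toy_price = pd.Series(100.0 + np.cumsum(rng.normal(size=1000)))
toy_volume = pd.Series(rng.lognormal(mean=10.0, sigma=1.0, size=1000))

# Correlation of each series with itself shifted by one step.
price_ac = toy_price.autocorr(lag=1)
volume_ac = toy_volume.autocorr(lag=1)
```

On the real data the equivalent calls would be `df['adj_close'].autocorr(lag=1)` and `df['volume'].autocorr(lag=1)`.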

In [5]:
express.scatter(data_frame=df, x='Date', y='volume', color='year', log_y=True)

What do we see? We see the log of the volume growing steadily until some time around 2008, then declining. Let's try plotting price and volume together.

In [6]:
express.scatter(data_frame=df, x='adj_close', y='volume', color='year', log_x=True, log_y=True)

What do we see? The log of the price and the log of the volume are only slightly correlated overall, but using color to represent the passage of time shows that in some stretches the two grow together from year to year.
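The visual impression can be backed with numbers by correlating the logs directly, both overall and within each year. This is a sketch under assumptions: the helper `log_log_corr` is hypothetical, and the toy frame only mimics the notebook's column names.

```python
import numpy as np
import pandas as pd


def log_log_corr(frame, price='adj_close', volume='volume'):
    """Pearson correlation of log(price) and log(volume), skipping non-positive rows."""
    positive = frame[(frame[price] > 0) & (frame[volume] > 0)]
    return float(np.log(positive[price]).corr(np.log(positive[volume])))


# Toy data (hypothetical) with the same column names as the notebook.
rng = np.random.default_rng(2)
n = 600
toy = pd.DataFrame({
    'year': np.repeat([2001.0, 2002.0, 2003.0], n // 3),
    'adj_close': np.exp(np.linspace(0.0, 1.0, n) + rng.normal(scale=0.05, size=n)),
    'volume': np.exp(10.0 + rng.normal(scale=0.5, size=n)),
})

overall = log_log_corr(toy)
by_year = {year: log_log_corr(group) for year, group in toy.groupby('year')}
```

Calling `log_log_corr(df)` and the same dictionary comprehension over `df.groupby('year')` would give the corresponding figures for the real data.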