In [1]:
import pandas as pd

USA = '/kaggle/input/united-states-stocks/USA.csv'
df = pd.read_csv(filepath_or_buffer=USA, thousands=',')
# we want to know where the last price is in the day range
df['diffLow'] = df['Last'] - df['Low']
df['diffHigh'] = df['High'] - df['Last']
df['absChange'] = df['Chg.'].abs()
# change as a fraction of the last price (a ratio, not multiplied by 100)
df['pctChange'] = df['Chg.'] / df['Last']
df.head()


Unnamed: 0,Name,Last,High,Low,Chg.,Chg...,Vol,diffLow,diffHigh,absChange,pctChange
0,Boeing,209.16,211.41,207.91,-0.06,-0.03,4310000.0,1.25,2.25,0.06,-0.000287
1,General Motors,38.56,38.97,38.45,-0.09,-0.23,11720000.0,0.11,0.41,0.09,-0.002334
2,Chevron,151.02,155.31,150.99,-3.04,-1.97,8040000.0,0.03,4.29,3.04,-0.02013
3,Citigroup,53.98,54.44,53.53,-0.31,-0.57,11750000.0,0.45,0.46,0.31,-0.005743
4,Bank of America,33.05,33.25,32.83,-0.07,-0.2,30560000.0,0.22,0.2,0.07,-0.002118
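
The two differences above can also be folded into a single number: the fraction of the day's range at which the last price sits (0 at the low, 1 at the high). The sketch below is not part of the original notebook. It assumes a hypothetical column name `rangePos`, and it uses only the five rows printed above as a stand-in for the full file. It also assumes High is strictly greater than Low; a row where they are equal would divide by zero.

```python
import pandas as pd

# Five rows copied from the head() output above, standing in for USA.csv
sample = pd.DataFrame({
    'Name': ['Boeing', 'General Motors', 'Chevron', 'Citigroup', 'Bank of America'],
    'Last': [209.16, 38.56, 151.02, 53.98, 33.05],
    'High': [211.41, 38.97, 155.31, 54.44, 33.25],
    'Low': [207.91, 38.45, 150.99, 53.53, 32.83],
})

# width of the day's trading range for each issue
day_range = sample['High'] - sample['Low']
# hypothetical column: 0 means Last == Low, 1 means Last == High
sample['rangePos'] = (sample['Last'] - sample['Low']) / day_range
print(sample[['Name', 'rangePos']])
```

On the full DataFrame the same two lines should work with `df` in place of `sample`.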


We expect either volume and price, or volume and price change, to be correlated, so let's look at the overall correlations.

In [2]:
from plotly.express import imshow
imshow(img=df.corr(numeric_only=True))

What we see is that the prices are highly correlated with each other, which is not surprising, but volume is not correlated with anything else. We suspect, but don't know, that volume is driven primarily by programmatic trading, so maybe it isn't surprising that prices and price changes are not correlated with volume.
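
To read the volume row of the correlation matrix as numbers rather than colors, something like the following sketch may help. It is an illustration, not the notebook's own code: it runs on the five rows shown in the `head()` output instead of the full file, so the values it prints say nothing about the whole dataset. The name `vol_corr` is an assumption.

```python
import pandas as pd

# Five rows copied from the head() output, standing in for USA.csv
sample = pd.DataFrame({
    'Name': ['Boeing', 'General Motors', 'Chevron', 'Citigroup', 'Bank of America'],
    'Last': [209.16, 38.56, 151.02, 53.98, 33.05],
    'High': [211.41, 38.97, 155.31, 54.44, 33.25],
    'Low': [207.91, 38.45, 150.99, 53.53, 32.83],
    'Chg.': [-0.06, -0.09, -3.04, -0.31, -0.07],
    'Vol': [4310000.0, 11720000.0, 8040000.0, 11750000.0, 30560000.0],
})

# Pearson correlation of every numeric column with Vol, excluding Vol itself
vol_corr = sample.corr(numeric_only=True)['Vol'].drop('Vol').sort_values()
print(vol_corr)
```

Replacing `sample` with `df` gives the same view for the full data, including the derived columns.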

In [3]:
from plotly.express import scatter
scatter(data_frame=df, x='Last', error_x='diffHigh', error_x_minus='diffLow', color='Chg.', y='Vol', hover_name='Name', log_x=True, log_y=True, height=900)

We can squint and see a negative correlation between the log of the price and the log of the volume; in this plot we can also see how the last price relates to the high and low prices for each issue. This dataset has a handful of outliers in the day change, and they make all the other day changes essentially disappear from the color scale. Let's look again without our change outliers.
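
The squint test for the log-log relationship can be backed by a number. A minimal sketch, assuming a plain Pearson correlation on base-10 logs is an acceptable measure; it uses the five rows from the `head()` output as a stand-in for the full file, so the result is only illustrative. The names `log_price`, `log_volume`, and `r` are assumptions.

```python
import numpy as np
import pandas as pd

# Five rows copied from the head() output, standing in for USA.csv
sample = pd.DataFrame({
    'Name': ['Boeing', 'General Motors', 'Chevron', 'Citigroup', 'Bank of America'],
    'Last': [209.16, 38.56, 151.02, 53.98, 33.05],
    'Vol': [4310000.0, 11720000.0, 8040000.0, 11750000.0, 30560000.0],
})

# take logs first, matching the log_x/log_y axes of the scatter plot
log_price = np.log10(sample['Last'])
log_volume = np.log10(sample['Vol'])

# Pearson correlation between log price and log volume
r = log_price.corr(log_volume)
print(r)
```

On the full DataFrame, rows with a zero or missing price or volume would need to be dropped before taking logs.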

In [4]:
from plotly.express import histogram
histogram(data_frame=df, x='Chg.')

In [5]:
scatter(data_frame=df[df['absChange'] < 10], x='Last', error_x='diffHigh', error_x_minus='diffLow', color='pctChange', y='Vol', hover_name='Name', log_x=True, log_y=True, height=900)

In [6]:
scatter(data_frame=df[df['absChange'] < 10], x='Last', y='pctChange', hover_name='Name', color='Chg.')

In [7]:
scatter(data_frame=df, x='Last', y='pctChange', log_x=True)