In [1]:
import pandas as pd

TWTR = '/kaggle/input/twitter-stock-market/TWTR.csv'
df = pd.read_csv(filepath_or_buffer=TWTR, parse_dates=['Date'])
df['year'] = df['Date'].dt.year
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,year
0,2013-11-07,45.099998,50.09,44.0,44.900002,44.900002,117701670.0,2013
1,2013-11-08,45.93,46.939999,40.685001,41.650002,41.650002,27925307.0,2013
2,2013-11-11,40.5,43.0,39.400002,42.900002,42.900002,16113941.0,2013
3,2013-11-12,43.66,43.779999,41.830002,41.900002,41.900002,6316755.0,2013
4,2013-11-13,41.029999,42.869999,40.759998,42.599998,42.599998,8688325.0,2013


Let's start by looking at the price/volume correlations.

In [2]:
df[['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']].corr()

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume
Open,1.0,0.99886,0.998588,0.997414,0.997414,-0.015718
High,0.99886,1.0,0.998265,0.998731,0.998731,-0.001709
Low,0.998588,0.998265,1.0,0.998887,0.998887,-0.031352
Close,0.997414,0.998731,0.998887,1.0,1.0,-0.019179
Adj Close,0.997414,0.998731,0.998887,1.0,1.0,-0.019179
Volume,-0.015718,-0.001709,-0.031352,-0.019179,-0.019179,1.0


What do we see?
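* Our close and adjusted close prices are perfectly correlated, and they are identical in the rows shown above. That is consistent with no dividend adjustments, so we can ignore the adjusted close prices.
* Prices and volumes are essentially uncorrelated: every price/volume coefficient falls between about -0.03 and 0.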
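A correlation of 1.0 only tells us the two columns move together linearly; an adjusted close that was, say, always half the close would also score 1.0. A quick way to confirm the two columns really are the same is to compare them row by row. This is a minimal sketch: `adj_close_matches` is a hypothetical helper name, and the sample frame only holds the five rows printed by `df.head()`, so in the notebook you would pass the full `df` instead.

```python
import pandas as pd

# Close / Adj Close values copied from the five df.head() rows shown earlier.
# In the notebook, pass the full df instead of this small sample.
sample = pd.DataFrame({
    'Close': [44.900002, 41.650002, 42.900002, 41.900002, 42.599998],
    'Adj Close': [44.900002, 41.650002, 42.900002, 41.900002, 42.599998],
})


def adj_close_matches(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True when Close equals Adj Close on every row."""
    return bool((frame['Close'] == frame['Adj Close']).all())


print(adj_close_matches(sample))
```

If this returns `True` on the full data, dropping `Adj Close` loses nothing.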

How much data do we have?

In [3]:
df['year'].value_counts().to_frame().sort_index().T

year,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
count,37,252,252,252,251,251,252,253,252,212


We have eight complete years and two partial years.

Let's make a time series of closing prices.

In [4]:
from plotly import express
from plotly import io

io.renderers.default = 'iframe'
express.scatter(data_frame=df, x='Date', y='Close', color='year')

What do we see? We see a stock that went up and down, but mostly traded in a range. If you had bought and held this stock, you would probably have been better off buying an index fund.
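To put a number on the buy-and-hold claim, we can compare the last close to the first. The sketch below assumes a hypothetical helper, `buy_and_hold_return`, and runs it on just the five closes from `df.head()`; in the notebook you would pass `df['Close']`. The index fund side of the comparison needs benchmark prices that are not in this dataset, so it is left out.

```python
import pandas as pd

# Closing prices copied from the five df.head() rows shown earlier.
# In the notebook, pass df['Close'] to cover the full history.
closes = pd.Series([44.900002, 41.650002, 42.900002, 41.900002, 42.599998])


def buy_and_hold_return(close: pd.Series) -> float:
    """Hypothetical helper: simple return from the first close to the last."""
    return float(close.iloc[-1] / close.iloc[0] - 1.0)


sample_return = buy_and_hold_return(closes)
print(f'{sample_return:.2%}')
```

Since close and adjusted close agree, this price return needs no dividend correction.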

Next let's look at the volume over time.

In [5]:
express.scatter(data_frame=df, x='Date', y='Volume', color='year', log_y=True)

Unlike prices, volume shows little visible serial correlation here, so the plot looks roughly uniform except where there are outliers. Plotting the log of the volume gives us a better sense of how volumes have behaved over time. Do we see a pattern? No, volume moves up and down and has shocks on news, but it mostly looks pretty random.
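Eyeballing a scatter plot is a weak test of serial correlation, so it is worth measuring it. One option is the lag-1 autocorrelation of log volume, using pandas' `Series.autocorr`. This is only a sketch: `log_volume_autocorr` is a hypothetical helper, and the five sample volumes from `df.head()` are far too few to conclude anything, so run it on `df['Volume']` in the notebook.

```python
import numpy as np
import pandas as pd

# Volumes copied from the five df.head() rows shown earlier.
# In the notebook, pass df['Volume'] instead.
volumes = pd.Series([117701670.0, 27925307.0, 16113941.0, 6316755.0, 8688325.0])


def log_volume_autocorr(volume: pd.Series, lag: int = 1) -> float:
    """Hypothetical helper: autocorrelation of log volume at the given lag."""
    return float(np.log(volume).autocorr(lag=lag))


sample_autocorr = log_volume_autocorr(volumes)
print(sample_autocorr)
```

A value near zero on the full series would support the "pretty random" reading; a value well above zero would mean busy days tend to follow busy days.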

What do we see if we plot prices and volume together?

In [6]:
express.scatter(data_frame=df, x='Close', y='Volume', color='year', log_y=True)

We can sort of see the passage of time because we're using the year to color our plot, but we mostly see an uncorrelated mess. That is not least because, within any given year, both prices and volumes tend to move in a range.
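Since each year forms its own cluster, we can also ask whether prices and volume are correlated within a year, not just across the whole history. A minimal sketch, assuming a hypothetical helper `close_volume_corr_by_year`; the sample frame holds only the five 2013 rows from `df.head()`, so the single number it prints is illustrative at best. In the notebook you would pass the full `df`.

```python
import pandas as pd

# The five df.head() rows shown earlier (all from 2013).
# In the notebook, pass the full df instead.
sample = pd.DataFrame({
    'year': [2013, 2013, 2013, 2013, 2013],
    'Close': [44.900002, 41.650002, 42.900002, 41.900002, 42.599998],
    'Volume': [117701670.0, 27925307.0, 16113941.0, 6316755.0, 8688325.0],
})


def close_volume_corr_by_year(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: Pearson correlation of Close and Volume per year."""
    return pd.Series(
        {year: group['Close'].corr(group['Volume'])
         for year, group in frame.groupby('year')},
        name='close_volume_corr',
    )


by_year = close_volume_corr_by_year(sample)
print(by_year)
```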