# Volumes, Regimes and Liquidity

This and many of our modules contain data analysis sections. These sections make intensive use of various statistical and data analysis packages in Python. While this course presents a lot of code in these sections, the aim of these modules is not to teach one to code or to use these particular tools, as this largely forms part of one's assumed knowledge.

This section, in particular, will make use of base packages such as `os` and `requests`, together with Pandas, NumPy, HoloViews and hvPlot for data analysis and plotting. While many of you should be familiar with Pandas and NumPy from previous exposure to Python, there are many resources online to help develop these skills. Two packages that may be less familiar, even to those who have made plots in Python before, are HoloViews and hvPlot. For more resources on these packages, we recommend the [Pyviz Tutorials](http://pyviz.org/tutorial/index.html). For those who are a bit rusty, we recommend keeping a [cheat sheet](https://www.datacamp.com/community/data-science-cheatsheets) handy.
  

The main reason for integrating the code into these sections is so that students can reproduce it, explore it with other parameters, or apply it to other datasets. It also helps guide students through some of the math and supports them when implementing their own code in assignments and future research.

For the most part, this course will make use of publicly available datasets. In some cases, we will download them programmatically in the code or integrate several data sources in order to produce our analysis. As much of this course looks at very old historical data, this data can often be difficult to come by and can require a large amount of cleaning. Being able to source, clean and integrate data, and to develop it into hypotheses and findings, is a crucial skill for analysts.

Some of these notes may include small black cells with comments like this:

In [9]:
### Fill in some code here to print to console "Financial Engineering"



These cells give students the opportunity to follow the code more closely and extract their own findings. This requires students to download these files and run them in the correct environment on their own machines, but it should provide valuable additional insight into both the code being run and the data being analysed. A completed cell looks like this:

In [10]:
### Fill in some code here to print to console "Financial Engineering"
print("Financial Engineering")

Financial Engineering


We will start by importing a number of packages. If your environment is set up correctly according to the setup notes, this should execute without any issues.

In [11]:
# Import Libraries
import os
import requests

import pandas as pd
import numpy as np

import holoviews as hv
import hvplot.pandas

In [12]:
# Import Plotting Backend
hv.extension('bokeh')

The data used for these notes is included in the Data folder for students who would like to run this code themselves and analyse the output. Code to download the data directly from the NYSE website has also been included (commented out below); however, running it should not be necessary. For long-term historical data, it is often challenging to find consistent price data; volumes, however, are readily recorded. In order to gain insight into the effects of the crash and the history of these markets, we will observe this data point over time to get a peek into evolving market regimes.

In [13]:
date_ranges = [[1970, 1979, 'dat'],
               [1960, 1969, 'dat'],
               [1950, 1959, 'dat'],
               [1940, 1949, 'dat'],
               [1930, 1939, 'dat'],
               [1920, 1929, 'prn'],
               [1900, 1919, 'dat'],
               [1888, 1899, 'dat']][::-1]

In [14]:
# # Download Data

# def get_decade(start = 1920, end = 1929, extension='prn'):
#     "Specify the starting and ending year of the range, e.g. 1920 and 1929"
#     try:
#         link = requests.get(f'https://www.nyse.com/publicdocs/nyse/data/Daily_Share_Volume_{start}-{end}.{extension}')
#         file = os.path.join("..","Data",f"Daily_Share_Volume_{start}-{end}.{extension}")
        
#         if link.status_code == 404:
#             raise
#         else:
#             with open(file, 'w') as temp_file:
#                 temp_file.write(str(link.content.decode("utf-8")))

#             print(f"Successfully downloaded {start}-{end}")

#     except:
#         print("There was an issue with the download. \n\
# You may need a different date range or file extension. \n\
# Check out https://www.nyse.com/data/transactions-statistics-data-library")

# download_history = [get_decade(decade[0], decade[1], decade[2]) for decade in date_ranges]

In order to start exploring this data, we are going to import it into a Pandas DataFrame. We can then pass this DataFrame to HoloViews in order to track specific data points over time and interact with them as needed.

In [19]:
# Read and format the data
def load_data(start = 1920, end = 1929, extension='prn'):
    path = os.path.join("Data",f"Daily_Share_Volume_{start}-{end}.{extension}")
    
    if extension=='prn':
        data = pd.read_csv(path , sep='   ', parse_dates=['Date'], engine='python').iloc[2:,0:2]
        data.loc[:,"  Stock U.S Gov't"] = pd.to_numeric(data.loc[:,"  Stock U.S Gov't"], errors='coerce')
        data.Date = pd.to_datetime(data.Date, format='%Y%m%d', errors='coerce')
        data.columns = ['Date','Volume']
        return data
    else:
        data = pd.read_csv(path)
        data.iloc[:,0] = data.iloc[:,0].apply(lambda x: str(x).strip(' '))
        data = data.iloc[:,0].str.split(' ', n=1, expand=True)  # n passed by keyword; newer pandas rejects it positionally
        data.columns = ['Date','Volume']
        data.loc[:,"Volume"] = pd.to_numeric(data.loc[:,"Volume"], errors='coerce')
        data.Date = pd.to_datetime(data.Date, format='%Y%m%d', errors='coerce')
        return data

In [20]:
data = pd.concat([load_data(decade[0], decade[1], decade[2]) for decade in date_ranges], axis=0)
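
The parsing step inside `load_data` can be easier to follow on a tiny example. The sketch below is only illustrative: the three input lines are made up, and the `YYYYMMDD volume` layout is an assumption inferred from the `.dat` branch above rather than a documented file format. It splits each line once on whitespace and coerces anything unparseable to a missing value.

```python
import pandas as pd

# Made-up lines in the assumed '.dat' layout: 'YYYYMMDD <whitespace> volume'
lines = pd.Series([
    "19291028  1000",
    " 19291029 2500 ",
    "19291030  n/a",   # unparseable volume, should become NaN
])

# Split each line once on whitespace into a date part and a volume part
parts = lines.str.strip().str.split(n=1, expand=True)
parts.columns = ["Date", "Volume"]

# Coerce both columns; values that cannot be parsed become NaT / NaN
parts["Date"] = pd.to_datetime(parts["Date"], format="%Y%m%d", errors="coerce")
parts["Volume"] = pd.to_numeric(parts["Volume"], errors="coerce")

print(parts.dtypes)
print(parts)
```

Using `errors="coerce"` means bad rows show up as missing values, which can then be counted, dropped or interpolated explicitly, rather than stopping the load.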

Markets are complex and dynamic systems made up of many agents who respond not only to external information, but also to the market itself. These agents learn over time and develop complex behaviour through their interactions. As these markets evolve, their characteristics can change, requiring new strategies in order to keep up with market trends. Markets are dynamic and can be made up of a number of states. They can often respond and behave dramatically differently during times of crisis than they do in either bull or bear markets.

While price is a significant concern for investor performance, so too is liquidity. In venture capital, a key question concerns an investment's exit strategy, and for market investors, the ability to rapidly liquidate investments can be the difference between bankruptcy and success. As market information changes, we can often observe the market forces of supply and demand push and pull, as investors rapidly move to buy and sell off holdings based on their own investment strategies and fast-changing market information. While liquidity, as a concept, is difficult to quantify directly, for many investors volume can provide an interesting insight over time into changes in market information, demand and supply, and liquidity. When volumes are lower than normal, this can often signal little change in market information; when volumes are high, information can be changing dramatically, forcing investors to alter their portfolios and investment strategies.
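
One way to make "lower or higher than normal" concrete is to compare each day's volume with a trailing baseline. The sketch below is a hedged illustration on synthetic data, not the method used in these notes: the 60-day window, the rolling median as the baseline, and the 2x and 0.5x thresholds are all arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic daily volumes (illustrative only, not NYSE data)
rng = np.random.default_rng(0)
dates = pd.bdate_range("1929-01-01", periods=250)
volume = pd.Series(rng.lognormal(mean=15, sigma=0.1, size=len(dates)), index=dates)
volume.iloc[200] *= 5  # inject one artificial surge in activity

# Baseline: median of the previous 60 days (shifted so today is excluded)
baseline = volume.rolling(window=60).median().shift(1)
relative_volume = volume / baseline

# Flag days trading at more than double or less than half the recent norm
unusual = relative_volume[(relative_volume > 2) | (relative_volume < 0.5)]
print(unusual)
```

A median baseline is used here because a single extreme day barely moves it, so one surge does not mask the days that follow.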

In the diagram below, we plot daily NYSE volume from 1888 to 1979. It is clear that volumes have increased dramatically over time, with increasing volatility and kurtosis. While we may speculate about the effect of increased market size, computerized trading and even high-frequency trading, it is interesting to note the dramatic changes markets experience during crisis situations.

We see that over a period of time, both before and after Black Tuesday, volumes become increasingly volatile as traders seek to price in the drama of new information. The feature of leverage, new to this market crash, forced many traders to alter their positions in the market in the hope of settling margin accounts and holding onto trades.

In [21]:
# Create plotting object
plot_data = hv.Dataset(data, kdims=['Date'], vdims=['Volume'])

# Create scatter plot

black_tuesday = pd.to_datetime('1929-10-29')

vline = hv.VLine(black_tuesday).options(color='#FF7E47')

m = hv.Scatter(plot_data).options(width=700, height=400).redim.label(Volume='NYSE Share Trading Volume').hist() * vline * \
    hv.Text(black_tuesday + pd.DateOffset(months=10), 4e7, "Black Tuesday", halign='left').options(color='#FF7E47')
m

In [22]:
# Create plotting object
plot_data_zoom = hv.Dataset(data.loc[((data.Date >= pd.to_datetime("1920-01-01"))&(data.Date <= pd.to_datetime("1940-01-01"))),:], kdims=['Date'], vdims=['Volume'])

# Create scatter plot

black_tuesday = pd.to_datetime('1929-10-29')

vline = hv.VLine(black_tuesday).options(color='#FF7E47')

m = hv.Scatter(plot_data_zoom).options(width=700, height=400).redim.label(Volume='NYSE Share Trading Volume').hist() * vline * \
    hv.Text(black_tuesday + pd.DateOffset(months=10), 4e7, "Black Tuesday", halign='left').options(color='#FF7E47')
m

Using the slider below, you can adjust the window used for the moving-average smoothing and the rolling variance applied to this data, in order to better understand how market properties change over time.

In [11]:
%%opts Scatter [width=400 height=200]

data['Quarter'] = data.Date.dt.quarter

def second_order(days_window):
    # Work on a copy so that interpolation does not modify the global data
    data_imputed = data.copy()
    data_imputed.Volume = data_imputed.Volume.interpolate()

    # Rolling mean (trend) and rolling variance of volume over the chosen window
    trend = pd.concat([data_imputed.Date, data_imputed.Volume.rolling(days_window).mean()],
                      axis=1).dropna()
    variance = pd.concat([data_imputed.Date, data_imputed.Volume.rolling(days_window).var()],
                         axis=1).dropna()

    return hv.Scatter(trend).redim(Volume='Mean Trend') + \
        hv.Scatter(variance).redim(Volume='Volume Variance').options(color='#FF7E47')

hv.DynamicMap(second_order,kdims=['days_window']).redim.range(days_window=(7,1000))
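
For readers without the interactive backend, the same two quantities can be computed for a single fixed window. This is a minimal sketch on a synthetic series; the 30-day window and the column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic volume series with a few gaps (illustrative only)
rng = np.random.default_rng(1)
volume = pd.Series(rng.lognormal(mean=15, sigma=0.3, size=500))
volume.iloc[[50, 51, 300]] = np.nan

# Fill gaps by linear interpolation before computing rolling statistics
volume = volume.interpolate()

window = 30
trend = volume.rolling(window).mean()    # moving-average smoothing
variance = volume.rolling(window).var()  # rolling sample variance

# The first window - 1 rows have no full window and are dropped
summary = pd.DataFrame({"Volume Trend": trend, "Volume Variance": variance}).dropna()
print(summary.head())
```

Larger windows give a smoother trend but respond more slowly to regime changes, which is the trade-off the slider lets you explore.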

In [12]:
%%opts Bars [width=400 height=300]
from statsmodels.tsa.stattools import acf, pacf

def auto_correlations(start_year, window_years):
    start_year  = pd.to_datetime(f'{start_year}-01-01')
    window_years = pd.DateOffset(years=window_years)
    
    data_window = data
    data_window = data_window.loc[((data_window.Date>=start_year)
                                   &(data_window.Date<=(start_year+window_years))),:]
    
    return hv.Bars(acf(data_window.Volume.interpolate().dropna()))\
                .redim(y='Autocorrelation', x='Lags') +\
            hv.Bars(pacf(data_window.Volume.interpolate().dropna()))\
                .redim(y='Partial Autocorrelation', x='Lags').options(color='#FF7E47')

hv.DynamicMap(auto_correlations,kdims=['start_year', 'window_years']
             ).redim.range(start_year=(data.Date.min().year,data.Date.max().year), window_years=(1,25))

We can model this data in a rudimentary fashion by looking at the auto-correlation and partial auto-correlation present in it. These properties can vary dramatically over time and provide insight into the variance, efficiency and responsiveness of the market. Many markets in developing economies can feature low levels of liquidity, even for large stocks. With large public investment companies and retail investors, changes in investment strategy can consume the liquidity in the market as large volumes of trades look to be executed. In these markets, such trades force the price to increase over many days and may result in increases in one- or two-day auto-correlation, depending on the characteristics of market liquidity.

These characteristics of momentum can also form part of investor strategy, or describe some element of market microstructure. It is interesting to note from the plots above how, in more recent years, the auto-correlation of volumes has departed radically from historical norms. Generally, we can observe some inkling of these properties in the Partial Auto-correlation plot, which displays a regular 2-day correlation indicative of characteristics of momentum and liquidity.
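
To make the two quantities concrete, the sketch below computes the sample auto-correlation by hand and derives the lag-2 partial auto-correlation from it. It uses a synthetic AR(1) series rather than the NYSE data, and the helper name `sample_autocorr` is hypothetical. For such a series, the auto-correlation decays geometrically while the partial auto-correlation beyond lag 1 should be close to zero.

```python
import numpy as np
import pandas as pd

def sample_autocorr(series, lag):
    """Lag-k sample autocorrelation (demeaned, normalised by the lag-0 sum of squares)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    if lag == 0:
        return 1.0
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic AR(1) series: x_t = phi * x_{t-1} + noise (illustrative only)
rng = np.random.default_rng(2)
phi, n = 0.7, 2000
noise = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]
volume_like = pd.Series(x)

r1 = sample_autocorr(volume_like, 1)
r2 = sample_autocorr(volume_like, 2)

# Lag-2 partial autocorrelation: correlation at lag 2 after removing
# the part already explained by lag 1
pacf2 = (r2 - r1 ** 2) / (1 - r1 ** 2)

print(f"acf(1)={r1:.3f}, acf(2)={r2:.3f}, pacf(2)={pacf2:.3f}")
```

A clearly non-zero value at lag 2 in the partial auto-correlation, as described above for the volume data, therefore indicates dependence on the day before yesterday that is not simply inherited through yesterday's value.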

In [13]:
# Try filtering the data and computing 
# the skewness and kurtosis over different time periods
# using the .kurtosis() and .skew() functions
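
As a starting point for the exercise above, here is a hedged sketch of one possible approach on synthetic data. The two "regimes" are generated artificially and say nothing about the real decades; the period labels and the column names are illustrative assumptions. Note that pandas reports excess kurtosis, so a normal distribution scores roughly zero.

```python
import numpy as np
import pandas as pd

# Synthetic data: a symmetric regime followed by a heavy-tailed regime
rng = np.random.default_rng(3)
dates = pd.bdate_range("1920-01-01", "1939-12-31")
calm = dates < pd.Timestamp("1930-01-01")
volume = np.where(calm,
                  rng.normal(loc=1e6, scale=1e5, size=len(dates)),
                  rng.lognormal(mean=14, sigma=1.0, size=len(dates)))
frame = pd.DataFrame({"Date": dates, "Volume": volume})

# Filter each period and compute skewness and (excess) kurtosis
periods = {
    "1920s": (pd.Timestamp("1920-01-01"), pd.Timestamp("1929-12-31")),
    "1930s": (pd.Timestamp("1930-01-01"), pd.Timestamp("1939-12-31")),
}
rows = {}
for label, (start, end) in periods.items():
    window = frame.loc[(frame.Date >= start) & (frame.Date <= end), "Volume"]
    rows[label] = {"skew": window.skew(), "kurtosis": window.kurtosis()}

stats = pd.DataFrame.from_dict(rows, orient="index")
print(stats)
```

Applied to the real `data` DataFrame, the same filtering pattern can be used with any start and end dates of interest.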




## References
_Rappoport, P. and White, E. N. (1993) ‘Was There a Bubble in the 1929 Stock Market?’, The Journal of Economic History, 53(3), pp. 549–574. Available at: http://www.jstor.org/stable/2122405_
  
