Scala OrderBook Reconstructor for high-frequency order-flow data

phelps-sg/scobre

SCala OrderBook REconstructor (SCOBRE)

This software allows the user to reconstruct the state of the limit order-book from low-level tick-data provided by the London Stock Exchange (LSE). The tick-data can be hosted in either MySQL or Apache HBase, and tools are provided for loading the data into either of these back-ends from the compressed raw files provided by the LSE. Once the data has been loaded, events corresponding to a particular asset and a particular date-range can be replayed through an order-book simulator in order to reconstruct the state of the book. Variables such as the mid-price can then be recorded as a time-series in CSV format. Alternatively, the simulator can be run directly from a Python client using an Apache Thrift API.

The software is written in Scala and Java, along with various Unix shell scripts which automate the import process.
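To make the idea of replaying events and recording the mid-price concrete, here is a minimal Python sketch. It is not SCOBRE's implementation: the class ToyOrderBook, the function replay, and the event tuple layout (timestamp, action, side, price, volume) are all hypothetical, and the book tracks only aggregated price levels rather than individual orders.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical event layout: (timestamp, action, side, price, volume)
Event = Tuple[int, str, str, float, int]


@dataclass
class ToyOrderBook:
    """Aggregated price levels only: price -> total resting volume."""
    bids: Dict[float, int] = field(default_factory=dict)
    asks: Dict[float, int] = field(default_factory=dict)

    def _levels(self, side: str) -> Dict[float, int]:
        return self.bids if side == "buy" else self.asks

    def add(self, side: str, price: float, volume: int) -> None:
        levels = self._levels(side)
        levels[price] = levels.get(price, 0) + volume

    def remove(self, side: str, price: float, volume: int) -> None:
        levels = self._levels(side)
        remaining = levels.get(price, 0) - volume
        if remaining > 0:
            levels[price] = remaining
        else:
            # Level is exhausted (or was never present): drop it.
            levels.pop(price, None)

    def mid_price(self) -> Optional[float]:
        # The mid-price is undefined until both sides have a quote.
        if not self.bids or not self.asks:
            return None
        return (max(self.bids) + min(self.asks)) / 2.0


def replay(events: Iterable[Event]) -> List[Tuple[int, Optional[float]]]:
    """Apply events in order and record the mid-price after each one."""
    book = ToyOrderBook()
    series = []
    for timestamp, action, side, price, volume in events:
        if action == "add":
            book.add(side, price, volume)
        elif action == "remove":
            book.remove(side, price, volume)
        series.append((timestamp, book.mid_price()))
    return series
```

The real reconstructor has to deal with the full LSE event vocabulary (for example order modifications and trades), which this sketch deliberately leaves out.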

Pre-requisites

  • Oracle Java JVM 1.7.0 or higher. Note that the default JVM installed on MacOS or Linux needs to be replaced by the Oracle version in order for the software to work correctly.

  • If running on Windows you will need to install Cygwin in order to execute the shell scripts.

  • (Optional) In order to build the software from source, you will need the Scala build tool (sbt); see the sbt documentation.

  • (Optional) In order to host the data, you will need to install Apache HBase version 1.1.2. The software can optionally connect to an existing server which already hosts the data.

  • (Optional) The best Integrated Development Environment (IDE) to use for working on the project is IntelliJ IDEA with the Scala plugin installed.

Installation

1. Configure the HBase host

Open the file hbase-site.xml in the directory etc/ using a text-editor and check that the hbase.master and hbase.zookeeper.quorum properties point to the machine running Apache HBase. For example, the configuration below can be used to connect to the machine with hostname cseesp1.essex.ac.uk. Alternatively, to connect to your own laptop running HBase in stand-alone mode, replace cseesp1.essex.ac.uk with localhost.

<configuration>
	<property>
		<name>hbase.master</name>
		<value>cseesp1.essex.ac.uk</value>
	</property>
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>cseesp1.essex.ac.uk</value>
	</property>
</configuration>
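As a quick sanity check of the configuration, the two properties can be read back with Python's standard XML parser. This is only a sketch: the helper names read_hbase_properties and missing_properties are hypothetical and are not part of SCOBRE.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Sequence

# The two properties the installation step asks you to check.
REQUIRED = ("hbase.master", "hbase.zookeeper.quorum")


def read_hbase_properties(xml_text: str) -> Dict[str, str]:
    """Return a name -> value mapping for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_properties(props: Dict[str, str],
                       required: Sequence[str] = REQUIRED) -> List[str]:
    """List required properties that are absent or have an empty value."""
    return [name for name in required if not props.get(name)]
```

For example, passing the contents of etc/hbase-site.xml (read with open(...).read()) to read_hbase_properties, and the result to missing_properties, should yield an empty list when both properties are set.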

2. Compile the code

To compile the source-code to separate .class files, execute the following command:

sbt compile

To create jar files and the script files:

sbt pack

3. Install the shell scripts

Execute the following commands in the shell to install the scripts into the directory ~/local/bin:

cd target/pack/bin
make install

If ~/local/bin is not already in your PATH environment variable, add a command similar to the following to the file ~/.profile:

export PATH=$PATH:~/local/bin

Running the reconstructor from the shell

The script replay-orders can then be used to retrieve a univariate time-series of prices.

The following example will replay all recorded events for the asset with given ISIN and provide a GUI visualisation of the order-book.

replay-orders -t GB0009252882 --with-gui

The following will replay a subset of events over a given date-range:

replay-orders -t GB0009252882 --with-gui \
		--start-date 5/6/2007 --end-date 6/6/2007

The following command will log the mid-price to a CSV file called hf.csv, but will not provide a GUI:

replay-orders -t GB0009252882 --property midPrice \
		--start-date 5/6/2007 --end-date 6/6/2007 -o hf.csv

The following command will log transaction prices to a CSV file called hf.csv:

replay-orders -t GB0009252882 --property lastTransactionPrice -o hf.csv

To get the full list of options use the built-in help:

replay-orders --help
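When the same command has to be run for many assets or date-ranges, it can be convenient to assemble the argument list programmatically. The sketch below uses only the options shown in the examples above. The helper names build_replay_command and run_replay are hypothetical, and dates are passed through as strings exactly as they would be typed in the shell.

```python
import subprocess
from typing import List, Optional


def build_replay_command(isin: str,
                         prop: Optional[str] = None,
                         start_date: Optional[str] = None,
                         end_date: Optional[str] = None,
                         output: Optional[str] = None,
                         with_gui: bool = False) -> List[str]:
    """Build an argument list for the replay-orders script."""
    cmd = ["replay-orders", "-t", isin]
    if with_gui:
        cmd.append("--with-gui")
    if prop:
        cmd += ["--property", prop]
    if start_date:
        cmd += ["--start-date", start_date]
    if end_date:
        cmd += ["--end-date", end_date]
    if output:
        cmd += ["-o", output]
    return cmd


def run_replay(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    # Assumes replay-orders is on PATH, as set up during installation.
    subprocess.run(cmd, check=True)
```

For instance, build_replay_command("GB0009252882", prop="midPrice", start_date="5/6/2007", end_date="6/6/2007", output="hf.csv") reproduces the mid-price example above as a list that can be passed to run_replay.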

Accessing the simulator from a Python client

The simulator provides an Apache Thrift API which allows clients written in non-JVM languages to call the reconstructor. To start the server, run the following script:

order-replay-service

By default the server will listen on TCP port 9090. To see the configuration options, run:

start-replay-server.sh --help

For an example of using the API from Python, see the script tickdata.py.
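Before pointing a Thrift client at the server, it can help to confirm that something is actually listening on the port. The following sketch uses only the Python standard library and does not speak the Thrift protocol. The function name is_listening, the default host, and the timeout are assumptions; the port 9090 is the default stated above.

```python
import socket


def is_listening(host: str = "localhost", port: int = 9090,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution errors.
        return False
```

A True result only shows that the port is open, not that the process behind it is the replay server.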

Documentation

  • The data description provided by the LSE
  • The API documentation

Working on the project using an IDE

To import the project as an IntelliJ IDEA project, first install the Scala plugin, and then directly import the build.sbt file as a new project.

Importing the raw data into Apache HBase

  1. Install Apache HBase 1.1.2 in standalone mode.

  2. Modify the file base-config.xml in the etc/ directory of the folder where you unpacked the lse-data distribution as follows:

     <configuration>
     	<property>
     		<name>hbase.master</name>
     		<value>localhost</value>
     	</property>
     	<property>
     		<name>hbase.zookeeper.quorum</name>
     		<value>localhost</value>
     	</property>
     </configuration>
    
  3. Create an empty table called events with column family data using the HBase shell:

     cd /opt/hbase/bin
     ./hbase shell
     create 'events', 'data'

  4. Run the shell script hbase-import.sh specifying the raw files to import:

     cd ./scripts
     ./import-data-lse.sh ../data/lse/*.CSV.gz

Contact

(C) Steve Phelps 2016