Research project on Financial Industry Regulatory Authority (FINRA) Trade Reporting and Compliance Engine (TRACE) academic version
Latest commit 5ffaf15 Sep 18, 2019

FINRA TRACE Data Research

This is a research project on the academic version of the Financial Industry Regulatory Authority (FINRA) Trade Reporting and Compliance Engine (TRACE) data, conducted under the supervision of Dr. Louiqa Raschid.

The purpose of the research is to study interaction and trading behavior among dealers in the over-the-counter (OTC) corporate bond market. We use topic modeling techniques, mostly latent Dirichlet allocation (LDA), to analyze the bonds traded by each dealer on each day. Our preliminary results show that LDA has the flexibility to analyze trading interaction in multiple dimensions.
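To make the "bonds traded by each dealer on each day" framing concrete, here is a minimal sketch of turning trade records into a bag-of-words corpus, where each document is one dealer-day and each word is a bond. The record layout, the sample values, and the `build_corpus` helper are assumptions for illustration only; the repository's actual preprocessing may differ. The resulting list of `(id, count)` pairs per document has the same shape as the bag-of-words corpus that Gensim's LDA accepts.

```python
from collections import Counter, defaultdict

# Hypothetical trade records: (dealer_id, trade_date, bond_id).
# Real TRACE records have many more fields; these values are made up.
trades = [
    ("D1", "2015-01-02", "BOND_A"),
    ("D1", "2015-01-02", "BOND_A"),
    ("D1", "2015-01-02", "BOND_B"),
    ("D2", "2015-01-02", "BOND_B"),
    ("D2", "2015-01-05", "BOND_C"),
]


def build_corpus(trades):
    """Group trades into dealer-day documents and count bonds per document."""
    counts = defaultdict(Counter)
    for dealer, day, bond in trades:
        counts[(dealer, day)][bond] += 1

    # Assign an integer id to each distinct bond (sorted for reproducibility).
    vocab = sorted({bond for _, _, bond in trades})
    bond2id = {bond: i for i, bond in enumerate(vocab)}

    # One document per (dealer, day), stored as sorted (bond_id, count) pairs.
    doc_keys = sorted(counts)
    corpus = [
        sorted((bond2id[bond], n) for bond, n in counts[key].items())
        for key in doc_keys
    ]
    return doc_keys, bond2id, corpus


doc_keys, bond2id, corpus = build_corpus(trades)
for key, doc in zip(doc_keys, corpus):
    print(key, doc)
```

Counting trades is only one possible weighting; a document could equally be weighted by trade volume, which is a modeling choice this sketch does not settle.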

The visualization can be found here.



  1. FINRA TRACE academic version
  2. Python 3.7, Gensim, pandas, NumPy, scikit-learn, Matplotlib, and Plotly


  1. A cluster with SLURM, which can help you speed up computation. The shell scripts were designed for it.


The recommended environment for running all the bash scripts without modification must meet these requirements:

  1. Install Miniconda at the default path ~/miniconda3/ so that you can activate the base conda environment with the command below, which is what all the bash scripts contain
source ~/miniconda3/bin/activate
  2. Install all the Python dependencies mentioned above in the conda environment from the previous step. That is
conda install ......
  3. Finally, submit SLURM jobs using



Directories Explanation
FINRA_TRACE/ base directory
FINRA_TRACE/Notebook/ folder for experiment notebooks (not used anymore)
FINRA_TRACE/Data/ data directory containing the following directories
FINRA_TRACE/Data/Pickle/ folder to place data in pandas pickle format you get from the (I know the naming is confusing)
FINRA_TRACE/Data/Dataset/ folder to place the Mergent FISD .csv files you download from WRDS
FINRA_TRACE/Data/id2word/ folder to save Gensim id2word output from
FINRA_TRACE/Data/Corpus/ folder to save Gensim corpus output from
FINRA_TRACE/Result/ auto-generated folder to save the pyLDAvis .html output and the document-topic probability weighting matrix .csv from
FINRA_TRACE/LDAModel/ auto-generated folder to save the Gensim LDA model file output by
FINRA_TRACE/LDAModel/logs/ auto-generated folder to save the Gensim LDA log files output by (this is where you look for perplexity)
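The layout above could be created up front with a few lines of standard-library Python. This is only a sketch: the `ensure_layout` helper and the `SUBDIRS` constant are hypothetical names, not the repository's own path module, and the example writes into a temporary directory rather than a real checkout.

```python
import tempfile
from pathlib import Path

# Subdirectories taken from the list above, relative to the base directory.
SUBDIRS = [
    "Data/Pickle",
    "Data/Dataset",
    "Data/id2word",
    "Data/Corpus",
    "Result",
    "LDAModel/logs",
]


def ensure_layout(base):
    """Create every expected subdirectory under `base` and return their paths."""
    base = Path(base)
    paths = {}
    for sub in SUBDIRS:
        path = base / sub
        # parents=True also creates Data/ and LDAModel/; exist_ok makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        paths[sub] = path
    return paths


# Demonstrate in a throwaway directory instead of a real FINRA_TRACE/ checkout.
with tempfile.TemporaryDirectory() as tmp:
    layout = ensure_layout(Path(tmp) / "FINRA_TRACE")
    created = sorted(p.relative_to(tmp).as_posix() for p in layout.values())
    all_exist = all(p.is_dir() for p in layout.values())

print(created)
```

Building paths from one base with `pathlib` keeps the scripts independent of the directory they are launched from, which matters when jobs start from a scheduler's working directory.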


Files Explanation
./TopicModeling/ creates and resolves the path of each directory as relative links
./TopicModeling/ loads the pandas pickle files and computes the Gensim LDA model
./TopicModeling/ runs on the cluster, with each node given specified command-line arguments (kind of transformation, small/large caps, number of topics ...)
./TopicModeling/ analysis of the LDA results, including document_topic_distribution, save_pyldavis2html, topicXtime_matplotlib, plot_sankey
./TopicModeling/ sbatch script to run the LDA analysis on the cluster
./TopicModeling/ plots the topicXtime visualization
./TopicModeling/ plots the topicXtime visualization on the cluster as an sbatch job
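As an illustration of the per-node command-line arguments mentioned above (transformation, small/large caps, number of topics), the following sketch uses `argparse`. The flag names, choices, and defaults are assumptions made for this example; the actual scripts may use different names and options.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical LDA run configuration."""
    parser = argparse.ArgumentParser(description="Hypothetical LDA run configuration")
    # Hypothetical flag: which bag-of-words transformation to apply.
    parser.add_argument("--transform", default="bow",
                        help="name of the corpus transformation (assumed values)")
    # Hypothetical flag: restrict the run to small-cap or large-cap bonds.
    parser.add_argument("--cap", choices=["small", "large"], default="large",
                        help="market-cap subset to model")
    # Hypothetical flag: number of LDA topics to fit.
    parser.add_argument("--num-topics", type=int, default=50,
                        help="number of topics")
    return parser


# Parse an explicit argument list, as one cluster node might receive it.
args = build_parser().parse_args(["--cap", "small", "--num-topics", "75"])
print(args.transform, args.cap, args.num_topics)
```

Giving each node its own argument list lets a single script cover a whole grid of configurations, with the scheduler handling the fan-out.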

