##### Purpose:

This notebook describes the TAQ20100104 dataset, which was provided by Sida, Mao's PhD student.

The TAQ20100104 dataset contains one day of trades and quotes from different stock exchanges, timestamped to the second. For more background, see the following Box note: https://uofi.app.box.com/notes/387487335526


In [1]:
import pandas as pd
from IPython.display import display


##### Reading Quote dataset

In the code cell below I read in the quote dataset. This file shows the bid and ask prices throughout the day. 

In [2]:
quote = pd.read_fwf('/Users/jarvis/Downloads/TAQ20100104/taqquote', skiprows=[0,1,2,4])

In [3]:
display(quote.sample(15))

Unnamed: 0,SYMBOL,DATE,TIME,BID,OFR,BIDSIZ,OFRSIZ,MODE,EX,MMID
3061504,CSX,20100104,0:04:04,48.75,48.81,4.0,28.0,12.0,P,
6130898,MCD,20100104,5:11:01,62.92,62.9,29.0,10.0,12.0,I,
8126925,RRI,20100104,9:32:08,5.79,,,,,,
8135498,RRI,20100104,0:03:50,5.91,0.93,16.0,4.0,2.0,,
5075093,HRS,20100104,5:25:52,48.35,48.36,14.0,5.0,12.0,N,
993559,ARM,20100104,9:59:30,8.0,1.0,,,,,
2647098,BBT,20100104,5:38:53,25.77,25.78,2.0,70.0,12.0,N,
152427,AGL,20100104,9:55:21,36.66,36.0,,,1.0,,
2386751,BAC,20100104,5:46:17,15.67,15.68,654.0,1102.0,12.0,P,
7457727,NPK,20100104,5:26:11,112.8,113.0,,,,,
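
To illustrate what `skiprows=[0,1,2,4]` does in the read call above, here is a minimal sketch on a small in-memory file. The layout used here (three preamble lines, a header line, then a dashed separator line) is an assumption made for illustration only; the actual layout of `taqquote` is not shown in this notebook, and the names `sample_text` and `sample` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical fixed-width text: lines 0-2 are preamble, line 3 is the header,
# line 4 is a dashed separator, and the data starts on line 5.
sample_text = (
    "preamble line 0\n"
    "preamble line 1\n"
    "preamble line 2\n"
    "SYMBOL DATE     BID\n"
    "------ -------- -----\n"
    "CSX    20100104 48.75\n"
    "BAC    20100104 15.67\n"
)

# Skipping lines 0, 1, 2 and 4 leaves line 3 as the header row,
# and read_fwf infers the column boundaries from the remaining lines.
sample = pd.read_fwf(io.StringIO(sample_text), skiprows=[0, 1, 2, 4])
print(sample)
```

If the real file has a different number of preamble lines, the `skiprows` list would need to change accordingly.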


##### Explaining the Quote Dataset

In the cell above I displayed 15 randomly sampled rows from the quote dataset.

In this cell I describe the features of the quote dataset:

- SYMBOL: Ticker symbol of the stock being quoted
- DATE: Date of the quote, which is "20100104" throughout
- TIME: Time of the quote, to the second
- BID: Bid price
- OFR: Offer (ask) price
- BIDSIZ: Bid size in number of round lots
- OFRSIZ: Offer size in number of round lots
- MODE: Quote condition
    - ‘A’ = Slow on the Ask Side
    - ‘B’ = Slow on the Bid Side
    - ‘C’ = Closing
    - ‘D’ = News Dissemination
    - ‘E’ = Slow on the Bid due to LRP or GAP Quote
    - ‘F’ = Slow on the Ask due to LRP or GAP Quote
    - ‘G’ = Trading Range Indication
    - ‘H’ = Slow on the Bid and Ask side
    - ‘I’ = Order Imbalance
    - ‘J’ = Due to a Related Security - News Dissemination
    - ‘K’ = Due to a Related Security - News Pending
    - ‘L’ = Closed Market Maker (NASD)
    - ‘M’ = Additional Information
    - ‘N’ = Non-firm quote
    - ‘O’ = Opening Quote
    - ‘P’ = News Pending
    - ‘Q’ = Additional Information - Due to Related Security
    - ‘R’ = Regular, two-sided open quote
    - ‘S’ = Due to Related Security
    - ‘T’ = Resume
    - ‘U’ = Slow on the Bid and Ask due to LRP or GAP Quote
    - ‘V’ = In View of Common
    - ‘W’ = Slow Quote due to a Set Slow list on both the bid and offer sides
    - ‘X’ = Equipment Changeover
    - ‘Y’ = Regular - One Sided Quote (NASDAQ)
    - ‘Z’ = No open/no resume
- EX: Exchange on which the quote occurred
    - ‘A’ = American Stock Exchange
    - ‘B’ = Boston Stock Exchange
    - ‘C’ = National (Cincinnati) Stock Exchange
    - ‘D’ = National Association of Securities Dealers (ADF)
    - ‘E’ = Market Independent (SIP - Generated)
    - ‘I’ = International Stock Exchange
    - ‘M’ = Chicago Stock Exchange
    - ‘N’ = NYSE
    - ‘P’ = NYSE Arca
    - ‘T/Q’ = NASDAQ Stock Exchange
    - ‘S’ = Consolidated Tape System
    - ‘X’ = Philadelphia
- MMID: NASDAQ market maker for each NASD quote
    - ??? Don't know what this means yet
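
As a small illustration of how these fields can be used, the sketch below decodes `EX` into an exchange name and computes the quoted spread (`OFR` minus `BID`). The three rows are copied from the sample output above; the names `quotes`, `EXCHANGE_NAMES`, `EXCHANGE`, and `SPREAD` are introduced here for illustration and are not part of the dataset.

```python
import pandas as pd

# A few rows copied from the sample output above (only the columns needed here).
quotes = pd.DataFrame(
    {
        "SYMBOL": ["CSX", "HRS", "BAC"],
        "BID": [48.75, 48.35, 15.67],
        "OFR": [48.81, 48.36, 15.68],
        "EX": ["P", "N", "P"],
    }
)

# Exchange codes taken from the EX list above ('T' and 'Q' both mean NASDAQ).
EXCHANGE_NAMES = {
    "A": "American Stock Exchange",
    "B": "Boston Stock Exchange",
    "C": "National (Cincinnati) Stock Exchange",
    "D": "National Association of Securities Dealers (ADF)",
    "E": "Market Independent (SIP - Generated)",
    "I": "International Stock Exchange",
    "M": "Chicago Stock Exchange",
    "N": "NYSE",
    "P": "NYSE Arca",
    "T": "NASDAQ Stock Exchange",
    "Q": "NASDAQ Stock Exchange",
    "S": "Consolidated Tape System",
    "X": "Philadelphia",
}

# Codes missing from the dictionary (including blank EX values) become NaN.
quotes["EXCHANGE"] = quotes["EX"].map(EXCHANGE_NAMES)

# Quoted spread: offer price minus bid price, rounded to whole cents.
quotes["SPREAD"] = (quotes["OFR"] - quotes["BID"]).round(2)

print(quotes)
```

The same two lines could be applied to the full `quote` DataFrame, assuming its `EX`, `BID`, and `OFR` columns were parsed as shown in the sample output.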


##### Reading Trade dataset
In the code cell below I read in the trade dataset. This file records the trades that actually took place.

In [4]:
trade = pd.read_fwf('/Users/jarvis/Downloads/TAQ20100104/9081a209a9b0c747.txt', skiprows=[0,1,2, 4])

In [5]:
display(trade.sample(15))

Unnamed: 0,SYMBOL,DATE,TIME,PRICE,G127,CORR,COND,EX,SIZE
131875,BAC,20100104,9:32:25,15.22,0,0,F,B,300
139352,BAC,20100104,9:38:45,15.265,0,0,@,D,100
964748,USB,20100104,5:15:29,22.87,0,0,F,B,300
140051,BAC,20100104,9:39:17,15.27,0,0,@,B,100
334232,BAC,20100104,4:57:42,15.6501,0,0,@,D,1200
581368,HL,20100104,5:24:05,6.4298,0,0,@,D,206
805519,PCP,20100104,2:05:37,111.92,0,0,F,N,100
183609,BAC,20100104,0:28:44,15.44,0,0,F,Z,100
169764,BAC,20100104,0:08:05,15.39,0,0,F,P,1600
26977,AIG,20100104,9:36:45,30.2799,0,0,@,D,200


##### Explaining the Trade Dataset

In the cell above I displayed 15 randomly sampled rows from the trade dataset.

In this cell I describe the features of the trade dataset:

- SYMBOL: Ticker symbol of the stock being traded
- DATE: Date of the trade, which is "20100104" throughout
- TIME: Time of the trade, to the second
- PRICE: Price at which the shares were traded
- G127: Combined "G" Rule 127 & Stopped Stock Trade Indicator
    - ?? No idea what this means yet
- CORR: Correction Indicator
    - ‘00’ = Regular trade which was not corrected, changed or signified as cancel or error
    - ‘01’ = Original trade which was later corrected (This record contains the original time - HHMM and the corrected data for the trade)
    - ‘07’ = Original trade which was later signified as error
    - ‘08’ = Original trade which was later signified as cancel
    - ‘10’ = Cancel record (This record follows '08' records)
    - ‘11’ = Error record (This record follows '07' records)
    - ‘12’ = Correction record (This record follows '01' records and contains the correction time and the original "incorrect" data)
- COND: Sale Condition
    - Blank or ‘@’ = Regular Sale (no condition)
    - ‘A’ = Cash (only) Market
    - ‘B’ = Average Price Trade
    - ‘C’ = Cash Trade (same day clearing)
    - ‘D’ = Next Day (only) Market
    - ‘E’ = Automatic Execution
    - ‘F’ = Burst Basket Execution
    - ‘G’ = Opening/Reopening Trade Detail
    - ‘H’ = Intraday Trade Detail
    - ‘I’ = Basket Index on Close Transaction
    - ‘J’ = Rule 127 trade (NYSE only)
    - ‘K’ = Rule 155 trade (AMEX only)
    - ‘L’ = Sold Last (late reporting)
    - ‘N’ = Next Day Trade (next day clearing)
    - ‘O’ = Opened (late report of opened trade)
    - ‘R’ = Seller
    - ‘S’ = Reserved
    - ‘T’ = Pre/Post Market Trade
    - ‘Z’ = Sold (out of sequence)
- EX: Exchange on which the trade occurred
    - ‘A’ = American Stock Exchange
    - ‘B’ = Boston Stock Exchange
    - ‘C’ = National (Cincinnati) Stock Exchange
    - ‘D’ = National Association of Securities Dealers (ADF)
    - ‘E’ = Market Independent (SIP - Generated)
    - ‘I’ = International Stock Exchange
    - ‘M’ = Chicago Stock Exchange
    - ‘N’ = NYSE
    - ‘P’ = NYSE Arca
    - ‘T/Q’ = NASDAQ Stock Exchange
    - ‘S’ = Consolidated Tape System
    - ‘X’ = Philadelphia
- SIZE: Number of Shares Traded
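
As one possible use of the fields above, the sketch below keeps only uncorrected trades and computes a volume-weighted average price (VWAP) per symbol. The first four rows are copied from the sample output above, where `CORR` appears as the integer `0` rather than the string '00'. The last row is a made-up cancelled trade, added only to show the filter at work. The names `trades`, `regular`, `notional`, `volume`, and `vwap` are introduced for illustration.

```python
import pandas as pd

trades = pd.DataFrame(
    {
        "SYMBOL": ["BAC", "BAC", "USB", "AIG", "BAC"],
        "PRICE": [15.22, 15.265, 22.87, 30.2799, 15.30],
        # The final CORR value (8) is hypothetical, not taken from the dataset.
        "CORR": [0, 0, 0, 0, 8],
        "COND": ["F", "@", "F", "@", "@"],
        "SIZE": [300, 100, 300, 200, 500],
    }
)

# Keep regular trades only: CORR == 0 means never corrected, cancelled, or in error.
regular = trades[trades["CORR"] == 0]

# VWAP per symbol = sum(price * size) / sum(size).
notional = (regular["PRICE"] * regular["SIZE"]).groupby(regular["SYMBOL"]).sum()
volume = regular.groupby("SYMBOL")["SIZE"].sum()
vwap = notional / volume

print(vwap)
```

Whether trades with particular `COND` codes should also be excluded depends on the analysis; this sketch filters on `CORR` only.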