##### Purpose:

This notebook describes the TAQ20100104 dataset, which was provided by Mao's PhD student Sida.

The TAQ20100104 dataset contains one day of trades and quotes from different stock exchanges, timestamped to the second. To read more about it, see the following Box file: https://uofi.app.box.com/notes/387487335526


In [6]:
import pandas as pd
from IPython.display import display


##### Reading Quote dataset

In the code cell below I read in the quote dataset. This file shows the bid and ask prices throughout the day. 

In [9]:
quote = pd.read_fwf('/Users/jarvis/Downloads/TAQ20100104/taqquote', skiprows=[0,1,2, 4])
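
The `skiprows=[0,1,2, 4]` argument above presumably drops three preamble lines and the separator line under the header, so that line 3 is used as the column header. The sketch below illustrates that behaviour on an invented fixed-width layout; the preamble text, column widths, and the `fixed` helper are assumptions for illustration, not the real TAQ file format.

```python
import io
import pandas as pd

def fixed(*cells):
    """Pad each cell to a hypothetical column width to mimic a fixed-width file."""
    widths = (8, 10, 10, 7, 5)
    return "".join(str(c).ljust(w) for c, w in zip(cells, widths))

# Invented stand-in for the raw file: 3 preamble lines, header, separator, data.
lines = [
    "preamble line one",                                      # line 0 (skipped)
    "preamble line two",                                      # line 1 (skipped)
    "preamble line three",                                    # line 2 (skipped)
    fixed("SYMBOL", "DATE", "TIME", "BID", "OFR"),            # line 3 (header)
    fixed("------", "--------", "--------", "-----", "-----"),  # line 4 (skipped)
    fixed("AIG", 20100104, "10:51:47", 29.69, 29.72),
    fixed("BAC", 20100104, "11:06:30", 15.48, 15.49),
]
raw = "\n".join(lines) + "\n"

# Same call pattern as the notebook, but reading from an in-memory buffer.
demo = pd.read_fwf(io.StringIO(raw), skiprows=[0, 1, 2, 4])
print(demo)
```

Because `read_fwf` infers column boundaries from the text itself, it is worth checking a few parsed rows against the raw file to confirm that no field was split or truncated.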

In [10]:
display(quote.sample(15))

Unnamed: 0,SYMBOL,DATE,TIME,BID,OFR,BIDSIZ,OFRSIZ,MODE,EX,MMID
3560619,ELY,20100104,9:37:08,7.7,,,,,,
3673290,EMR,20100104,1:24:37,43.25,43.26,2.0,1.0,12.0,B,
1687480,BAC,20100104,1:06:30,15.48,15.49,42.0,654.0,12.0,P,
7233745,NOV,20100104,4:59:22,45.85,45.86,14.0,1.0,12.0,N,
3672825,EMR,20100104,1:22:56,43.23,43.25,5.0,12.0,12.0,N,
3934532,FMR,20100104,5:05:33,13.72,1.0,,,,,
402890,AIG,20100104,1:26:17,30.38,30.43,2.0,3.0,12.0,N,
364554,AIG,20100104,0:51:47,29.69,29.72,2.0,6.0,12.0,Z,
183738,AGP,20100104,9:34:37,27.35,27.0,,,1.0,,
6212504,MCY,20100104,2:56:51,39.91,39.93,8.0,2.0,12.0,T,


##### Explaining the Quote Dataset

In the cell above I displayed 15 random samples from the quote dataset. 

In this cell I describe the features of the quote dataset:

- SYMBOL: Ticker symbol of the security being quoted
- DATE: Date of the quote, which is always "20100104" in this file
- TIME: Time of the quote, to the second
- BID: Bid price
- OFR: Offer price
- BIDSIZ: Bid size
- OFRSIZ: Offer size in number of round lots
- MODE: quote condition
    - ‘A’ = Slow on the Ask Side
    - ‘B’ = Slow on the Bid Side
    - ‘C’ = Closing
    - ‘D’ = News Dissemination
    - ‘E’ = Slow on the Bid due to LRP or GAP Quote
    - ‘F’ = Slow on the Ask due to LRP or GAP Quote
    - ‘G’ = Trading Range Indication
    - ‘H’ = Slow on the Bid and Ask side
    - ‘I’ = Order Imbalance
    - ‘J’ = Due to a Related Security - News Dissemination
    - ‘K’ = Due to a Related Security - News Pending
    - ‘L’ = Closed Market Maker (NASD)
    - ‘M’ = Additional Information
    - ‘N’ = Non-firm quote
    - ‘O’ = Opening Quote
    - ‘P’ = News Pending
    - ‘Q’ = Additional Information - Due to Related Security
    - ‘R’ = Regular, two-sided open quote
    - ‘S’ = Due to Related Security
    - ‘T’ = Resume
    - ‘U’ = Slow on the Bid and Ask due to LRP or GAP Quote
    - ‘V’ = In View of Common
    - ‘W’ = Slow Quote due to a Set Slow list on both the bid and offer sides
    - ‘X’ = Equipment Changeover
    - ‘Y’ = Regular - One Sided Quote (NASDAQ)
    - ‘Z’ = No open/no resume
- EX: Exchange on which the quote occurred
    - ‘A’ = American Stock Exchange
    - ‘B’ = Boston Stock Exchange
    - ‘C’ = National (Cincinnati) Stock Exchange
    - ‘D’ = National Association of Securities Dealers (ADF)
    - ‘E’ = Market Independent (SIP - Generated)
    - ‘I’ = International Stock Exchange
    - ‘M’ = Chicago Stock Exchange
    - ‘N’ = NYSE
    - ‘P’ = NYSE Arca
    - ‘T/Q’ = NASDAQ Stock Exchange
    - ‘S’ = Consolidated Tape System
    - ‘X’ = Philadelphia
- MMID: NASDAQ market maker for each NASD quote
    - ??? Don't know what this means yet
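
As a small illustration of how the BID, OFR, and EX columns described above might be used, the sketch below computes a bid-ask spread and midpoint and maps exchange codes to names. The rows in `quote_demo` are invented values shaped like the sample output, the `EXCHANGES` dictionary is my transcription of the list above (with ‘T/Q’ read as two separate codes), and the derived column names are hypothetical.

```python
import pandas as pd

# Invented quote rows with the same column names as the quote dataset.
quote_demo = pd.DataFrame({
    "SYMBOL": ["AIG", "BAC", "EMR"],
    "BID": [29.69, 15.48, 43.25],
    "OFR": [29.72, 15.49, 43.26],
    "EX": ["Z", "P", "B"],
})

# Exchange codes transcribed from the list above; "T" and "Q" are assumed
# to both mean NASDAQ, based on the 'T/Q' entry.
EXCHANGES = {
    "A": "American Stock Exchange",
    "B": "Boston Stock Exchange",
    "C": "National (Cincinnati) Stock Exchange",
    "D": "National Association of Securities Dealers (ADF)",
    "E": "Market Independent (SIP - Generated)",
    "I": "International Stock Exchange",
    "M": "Chicago Stock Exchange",
    "N": "NYSE",
    "P": "NYSE Arca",
    "T": "NASDAQ Stock Exchange",
    "Q": "NASDAQ Stock Exchange",
    "S": "Consolidated Tape System",
    "X": "Philadelphia",
}

# Spread and midpoint of each quote.
quote_demo["SPREAD"] = quote_demo["OFR"] - quote_demo["BID"]
quote_demo["MID"] = (quote_demo["BID"] + quote_demo["OFR"]) / 2

# Codes missing from the dictionary (such as "Z") map to NaN.
quote_demo["EX_NAME"] = quote_demo["EX"].map(EXCHANGES)
print(quote_demo)
```

Codes that appear in the data but not in the list, such as ‘Z’ in the sample output, come back as NaN in `EX_NAME`, which makes gaps in the code list easy to spot.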


##### Reading Trade dataset
In the code cell below I read in the trade dataset. This file records the trades that actually took place.

In [7]:
trade = pd.read_fwf('/Users/jarvis/Downloads/TAQ20100104/9081a209a9b0c747.txt', skiprows=[0,1,2, 4])

In [8]:
display(trade.sample(15))

Unnamed: 0,SYMBOL,DATE,TIME,PRICE,G127,CORR,COND,EX,SIZE
650400,JWN,20100104,2:54:21,38.03,0,0,F,P,200
638939,JBL,20100104,5:26:17,17.67,0,0,F,T,100
935612,USB,20100104,9:35:40,22.57,0,0,@,P,10
35717,AIG,20100104,0:35:11,29.59,0,0,F,Z,100
397055,BBT,20100104,4:54:00,25.75,0,0,@,B,100
356146,BAC,20100104,5:36:46,15.68,0,0,@,D,1000
501337,EMR,20100104,3:11:20,43.2521,0,0,@,D,800
778257,NOV,20100104,3:14:48,45.67,0,0,F,T,100
347809,BAC,20100104,5:24:39,15.65,0,0,@,Z,100
410987,BYI,20100104,5:29:17,43.35,0,0,@,P,100


##### Explaining the Trade Dataset

In the cell above I displayed 15 random samples from the trade dataset. 

In this cell I describe the features of the trade dataset:

- SYMBOL: Ticker symbol of the stock being traded
- DATE: Date on which the trade took place: "20100104"
- TIME: Time of the trade, to the second
- PRICE: Price at which the shares were traded
- G127: Combined "G" Rule 127 & Stopped Stock Trade Indicator
    - ?? No idea what this means yet
- CORR:  Correction Indicator
    - ‘00’ = Regular trade which was not corrected, changed or signified as cancel or error
    - ‘01’ = Original trade which was later corrected (This record contains the original time - HHMM and the corrected data for the trade)
    - ‘07’ = Original trade which was later signified as error
    - ‘08’ = Original trade which was later signified as cancel
    - ‘10’ = Cancel record (This record follows '08' records)
    - ‘11’ = Error record (This record follows '07' records)
    - ‘12’ = Correction record (This record follows '01' records and contains the correction time and the original "incorrect" data)
- COND: Sale Condition
    - Blank or ‘@’ = Regular Sale (no condition)
    - ‘A’ = Cash (only) Market
    - ‘B’ = Average Price Trade
    - ‘C’ = Cash Trade (same day clearing)
    - ‘D’ = Next Day (only) Market
    - ‘E’ = Automatic Execution
    - ‘F’ = Burst Basket Execution
    - ‘G’ = Opening/Reopening Trade Detail
    - ‘H’ = Intraday Trade Detail
    - ‘I’ = Basket Index on Close Transaction
    - ‘J’ = Rule 127 trade (NYSE only)
    - ‘K’ = Rule 155 trade (AMEX only)
    - ‘L’ = Sold Last (late reporting)
    - ‘N’ = Next Day Trade (next day clearing)
    - ‘O’ = Opened (late report of opened trade)
    - ‘R’ = Seller
    - ‘S’ = Reserved
    - ‘T’ = Pre/Post Market Trade
    - ‘Z’ = Sold (out of sequence)
- EX: Exchange on which the trade occurred
    - ‘A’ = American Stock Exchange
    - ‘B’ = Boston Stock Exchange
    - ‘C’ = National (Cincinnati) Stock Exchange
    - ‘D’ = National Association of Securities Dealers (ADF)
    - ‘E’ = Market Independent (SIP - Generated)
    - ‘I’ = International Stock Exchange
    - ‘M’ = Chicago Stock Exchange
    - ‘N’ = NYSE
    - ‘P’ = NYSE Arca
    - ‘T/Q’ = NASDAQ Stock Exchange
    - ‘S’ = Consolidated Tape System
    - ‘X’ = Philadelphia
- SIZE: Number of Shares Traded
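
The CORR and COND codes above suggest one way to keep only regular, uncorrected trades. The sketch below does this on invented rows in `trade_demo`. It assumes CORR is stored as an integer (the sample output shows `0` rather than ‘00’) and that a blank COND is read in as a missing value; both assumptions should be checked against the real file.

```python
import pandas as pd

# Invented trade rows with the same column names as the trade dataset.
trade_demo = pd.DataFrame({
    "SYMBOL": ["BAC", "BAC", "EMR", "JWN", "USB"],
    "PRICE": [15.68, 15.65, 43.2521, 38.03, 22.57],
    "CORR": [0, 0, 8, 0, 0],              # 8 = original trade later cancelled
    "COND": ["@", "Z", "@", "F", None],   # None stands in for a blank condition
    "SIZE": [1000, 100, 800, 200, 10],
})

# CORR of 0 means the trade was never corrected, cancelled, or marked as an error.
uncorrected = trade_demo["CORR"] == 0

# Blank or '@' means a regular sale; treat missing values as blank.
regular_sale = trade_demo["COND"].fillna("@").isin(["@"])

regular = trade_demo[uncorrected & regular_sale]
print(regular)
```

Here the out-of-sequence trade (‘Z’), the cancelled trade (CORR 8), and the ‘F’ trade are dropped, leaving the two regular sales.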