# Cross Market Efficiency Consistency

Below is the code for Table 12.

To run this notebook, you will first need to run both of the Trader Analysis notebooks.

In [1]:
import os
import pickle  # used by load_pickle below
import sys
import re
from collections import defaultdict
from functools import lru_cache

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series, DataFrame  # Panel was removed in pandas 1.0 and is not used here

idx = pd.IndexSlice

from research_tools import storage

pd.options.display.float_format = lambda x: '{:,.4f}'.format(x) if abs(x) < 1 else '{:,.2f}'.format(x)

# Load Data

First load the data we saved at the end of the Trader Analysis notebooks.

In [2]:
def load_pickle(filename):
    with open(os.path.join('data', filename), 'rb') as f:
        return pickle.load(f)

In [3]:
os.chdir('..')

dem_trader_classifications = load_pickle('dem.trader_classifications.p')
dem_trader_stats_summary = load_pickle('dem.trader_stats_summary.p')

rep_trader_classifications = load_pickle('gop.trader_classifications.p')
rep_trader_stats_summary = load_pickle('gop.trader_stats_summary.p')

In [4]:
dem_trader_classifications.head()

user_guid,category,efficiency,size,activity
0022AC92-4A31-3308-BCB9-D94C6F507A31,Efficient Small Inactive,Efficient,Small,Inactive
00318BA5-01FC-34A4-A4A1-3523BF5485C6,Inefficient Small Inactive,Inefficient,Small,Inactive
0034C80D-C854-3C60-8F01-64B48B565AA5,Efficient Small Inactive,Efficient,Small,Inactive
005E1296-C898-3911-A4C1-0B33FAB05A29,Inefficient Large Active,Inefficient,Large,Active
005E56D2-76B6-39DA-9199-366D761FE63D,Inefficient Large Inactive,Inefficient,Large,Inactive


In [5]:
dem_trader_stats_summary.head()

user_guid,orders_sent,quantity,notional,spread_profit,bias_profit,position_profit,gross_pnl,fee,pnl_net_fee,take_pct,longshot_pct,antilongshot_pct
C4D5B846-1FD3-31DB-BA4C-9223A8633708,6,6038,4718.51,10.47,-22.06,1331.08,1319.49,131.95,1187.54,0.2136,0.0,0.0
9D326C8B-8DE3-3A23-8D20-E2B1A332FAB9,10,6034,4716.1,36.94,-25.52,1306.48,1317.9,131.79,1186.11,0.0603,0.0,0.0
A5D275E7-E1CA-3003-ABB8-00947B491947,47,7392,6014.55,58.17,3.79,1252.99,1314.95,131.49,1183.45,0.2024,0.0,0.447
738E8484-0BDA-3B96-84DC-A110B70DF314,9,5948,4673.58,62.61,-24.13,1235.94,1274.42,127.44,1146.98,0.0121,0.0,0.0
34A305EF-1ED6-3B2D-9F7D-0A90C7B7A787,75,6708,4880.55,-5.9,38.32,1154.81,1187.23,120.95,1066.28,0.7508,0.0,0.0198


There are 2042 traders who traded in both markets.

In [6]:
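# Index.intersection replaces the `&` operator, which no longer performs a set intersection on an Index in recent pandas
common_traders = set(rep_trader_stats_summary.index.intersection(dem_trader_stats_summary.index))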

common_trader_count = len(common_traders)

common_trader_count

2042

This means that about half of the traders in each market also traded in the other: roughly 46% of REP traders and 54% of DEM traders.

In [7]:
len(common_traders) / len(rep_trader_stats_summary.index)

0.458670260557053

In [8]:
len(common_traders) / len(dem_trader_stats_summary.index)

0.5445333333333333

# Compare Classifications

For the traders who were in both markets, how many had the same classifications?

We can merge the two dataframes together and have a look.

In [9]:
joint_classifications = dem_trader_classifications.merge(rep_trader_classifications,
                                                         how='inner',
                                                         left_index=True,
                                                         right_index=True,
                                                         suffixes = ('_dem', '_rep'))
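
To make the merge concrete, here is a minimal sketch on made-up data. The trader IDs and labels are hypothetical, not taken from the real data set. An inner join on the index keeps only traders present in both frames, and `suffixes` disambiguates the shared column names.

```python
import pandas as pd

# Hypothetical toy classifications; IDs and labels are illustrative only.
dem = pd.DataFrame(
    {'efficiency': ['Efficient', 'Inefficient', 'Efficient'],
     'size': ['Small', 'Large', 'Small']},
    index=pd.Index(['t1', 't2', 't3'], name='user_guid'))
rep = pd.DataFrame(
    {'efficiency': ['Efficient', 'Efficient', 'Inefficient'],
     'size': ['Small', 'Large', 'Large']},
    index=pd.Index(['t2', 't3', 't4'], name='user_guid'))

# Inner join on the index: only t2 and t3 appear in both frames.
joint = dem.merge(rep, how='inner', left_index=True, right_index=True,
                  suffixes=('_dem', '_rep'))

# Share of common traders with the same efficiency label in both markets.
same_efficiency = (joint['efficiency_dem'] == joint['efficiency_rep']).mean()
```

Taking the mean of the boolean comparison is equivalent to dividing its sum by the number of common traders, which is how the notebook computes it.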

For the 2042 traders who were in both markets, about two thirds received the same efficiency classification in both (efficient in both or inefficient in both).

In [10]:
val = (joint_classifications['efficiency_dem'] == joint_classifications['efficiency_rep']).sum() / common_trader_count

val

0.65621939275220376

That is lower than the agreement rates for size and activity computed further down, suggesting that the efficiency classification is less consistent across markets than the other two.

Here is the crosstab table (Table 12):

In [11]:
out = pd.crosstab(joint_classifications.efficiency_dem, joint_classifications.efficiency_rep)

out

efficiency_dem \ efficiency_rep,Efficient,Inefficient
Efficient,242,437
Inefficient,265,1098
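
The 65.6% agreement rate computed earlier can be read straight off this table: it is the sum of the diagonal divided by the total. A small check using the counts from Table 12 (the variable names are illustrative):

```python
import numpy as np

# Counts from Table 12: rows are DEM, columns are REP,
# each ordered (Efficient, Inefficient).
table = np.array([[242, 437],
                  [265, 1098]])

total = table.sum()                         # number of common traders
agreement = np.trace(table) / total         # same label in both markets
dem_efficient = table[0].sum() / total      # share efficient in DEM
rep_efficient = table[:, 0].sum() / total   # share efficient in REP
```

The row and column sums also recover the share of efficient traders in each market for this subset.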


In [12]:
print(out.to_latex())

\begin{tabular}{lrr}
\toprule
efficiency\_rep &  Efficient &  Inefficient \\
efficiency\_dem &            &              \\
\midrule
Efficient      &        242 &          437 \\
Inefficient    &        265 &         1098 \\
\bottomrule
\end{tabular}



The size classification (Large / Small) was the same in both markets 80% of the time.

In [13]:
(joint_classifications['size_dem'] == joint_classifications['size_rep']).sum() / common_trader_count

0.80019588638589623

The activity classification (Active / Inactive) was the same in both markets about 76% of the time.

In [14]:
(joint_classifications['activity_dem'] == joint_classifications['activity_rep']).sum() / common_trader_count

0.75710088148873655

So, do traders end up with the same efficiency classification in both markets by chance, or are we measuring something real? Is that 65.6% value statistically significant?

In other words, if the "efficient" designation had been assigned to traders at random in each market, how unusual would a 65.6% agreement rate be?

We know that in the DEM market 1/3 of the traders are efficient and in the REP market 1/4 of the traders are efficient. These proportions also hold, approximately, for the subset of traders that are in both markets.

In [15]:
joint_classifications.efficiency_dem.value_counts() / common_trader_count

Inefficient   0.6675
Efficient     0.3325
Name: efficiency_dem, dtype: float64

In [16]:
joint_classifications.efficiency_rep.value_counts() / common_trader_count

Inefficient   0.7517
Efficient     0.2483
Name: efficiency_rep, dtype: float64

To test the statistical significance of this, we can run a Monte Carlo simulation to see what the distribution of the agreement rate would be if the classifications were random.

For each trader, independently draw a classification (1 = efficient, 2 = inefficient) for each of the two markets. In the first the probability of getting a 1 is 1/3; in the second it is 1/4.

We see that the mean agreement rate is 58.3%, with a standard deviation of about 1.1 percentage points (0.011).

In [17]:
n = 10000
a = np.random.choice([1, 2, 2], size=(common_trader_count, n))
b = np.random.choice([1, 2, 2, 2], size=(common_trader_count, n))

values = (a == b).sum(axis=0) / common_trader_count

m, s = values.mean(), values.std()

m, s

(0.58325763956904997, 0.010926909245103899)

This gives us a z-score of 6.7: the observed 65.6% is about 6.7 standard deviations above the random baseline.

In [18]:
(val - m) / s

6.6772544318372828
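
As a sanity check on the simulation, the same baseline can be derived in closed form, assuming the two labels are drawn independently with probabilities 1/3 and 1/4. The agreement count is then binomial, which gives the mean and standard deviation directly. This is a sketch of that calculation, not part of the original analysis; the observed agreement count of 1340 is the diagonal of Table 12.

```python
import math

n_traders = 2042
p_dem, p_rep = 1 / 3, 1 / 4  # share of efficient traders in each market

# Two independent labels agree if both are efficient or both inefficient.
p_agree = p_dem * p_rep + (1 - p_dem) * (1 - p_rep)  # equals 7/12

# Standard deviation of the agreement rate over n independent traders.
sd = math.sqrt(p_agree * (1 - p_agree) / n_traders)

observed = 1340 / n_traders  # (242 + 1098) / 2042 from Table 12
z = (observed - p_agree) / sd
```

This gives a mean of about 0.583, a standard deviation of about 0.011, and a z-score of about 6.7, in line with the simulated values.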

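A z-score this large is clearly statistically significant, so the cross-market consistency in efficiency classification is very unlikely to be due to chance.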