# Cross Market Efficiency Consistency

Below is the code for Table 12.

To run this notebook, you will first need to run both of the Trader Analysis notebooks.

In [None]:
import os
import sys
import re
import pickle
from collections import defaultdict
from functools import lru_cache

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series, DataFrame  # pandas.Panel was removed in pandas 1.0

idx = pd.IndexSlice

from research_tools import storage

pd.options.display.float_format = lambda x: '{:,.4f}'.format(x) if abs(x) < 1 else '{:,.2f}'.format(x)

# Load Data

First load the data we saved at the end of the Trader Analysis notebooks.

In [None]:
def load_pickle(filename):
    with open(os.path.join('data', filename), 'rb') as f:
        return pickle.load(f)
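For reference, here is a minimal sketch of the save/load round trip that `load_pickle` relies on. It assumes the Trader Analysis notebooks wrote their results with `pickle.dump` into a `data` directory. `save_pickle` and the `directory` parameter are hypothetical additions for illustration, and the defaults keep calls like `load_pickle('dem.trader_stats_summary.p')` working unchanged.

```python
import os
import pickle
import tempfile


def save_pickle(obj, filename, directory='data'):
    # Hypothetical counterpart to load_pickle; the Trader Analysis notebooks may save differently.
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, filename), 'wb') as f:
        pickle.dump(obj, f)


def load_pickle(filename, directory='data'):
    # Same behavior as the notebook's load_pickle when directory is left at 'data'.
    with open(os.path.join(directory, filename), 'rb') as f:
        return pickle.load(f)


# Round trip in a throwaway directory so no real data files are touched.
with tempfile.TemporaryDirectory() as tmp:
    save_pickle({'trader': 'example'}, 'example.p', directory=tmp)
    restored = load_pickle('example.p', directory=tmp)
```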

In [None]:
# Move up one directory so the relative 'data' path used by load_pickle resolves.
os.chdir('..')

dem_trader_classifications = load_pickle('dem.trader_classifications.p')
dem_trader_stats_summary = load_pickle('dem.trader_stats_summary.p')

rep_trader_classifications = load_pickle('gop.trader_classifications.p')
rep_trader_stats_summary = load_pickle('gop.trader_stats_summary.p')

In [None]:
dem_trader_classifications.head()

In [None]:
dem_trader_stats_summary.head()

There are 2042 traders who traded in both markets.

In [None]:
common_traders = set(rep_trader_stats_summary.index.intersection(dem_trader_stats_summary.index))

common_trader_count = len(common_traders)

common_trader_count

This means that about half of the traders in each of the DEM and REP markets also traded in the other market.

In [None]:
len(common_traders) / len(rep_trader_stats_summary.index)

In [None]:
len(common_traders) / len(dem_trader_stats_summary.index)
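The overlap calculation can be illustrated on toy data. This sketch uses made-up trader IDs (not the real data) and `Index.intersection`, which is the set-style way to find the traders present in both summaries; in recent pandas versions `&` between two `Index` objects is no longer a set operation.

```python
import pandas as pd

# Toy trader-level summaries indexed by trader ID (made-up values, not the real data).
dem = pd.DataFrame({'trades': [5, 3, 8, 1]}, index=['t1', 't2', 't3', 't4'])
rep = pd.DataFrame({'trades': [2, 7, 4]}, index=['t2', 't3', 't5'])

# Traders who appear in both markets.
common = dem.index.intersection(rep.index)

# Fraction of each market's traders who also traded in the other market.
share_of_dem = len(common) / len(dem.index)
share_of_rep = len(common) / len(rep.index)
```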

# Compare Classifications

For the traders who were in both markets, how many had the same classifications?

We can merge the two dataframes together and have a look.

In [None]:
joint_classifications = dem_trader_classifications.merge(rep_trader_classifications,
                                                         how='inner',
                                                         left_index=True,
                                                         right_index=True,
                                                         suffixes = ('_dem', '_rep'))
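The merge joins the two classification tables on trader ID, and the agreement rate is the share of rows where the two labels match. That share counts traders who are efficient in both markets and traders who are inefficient in both. The toy sketch below assumes labels `'efficient'` / `'inefficient'`; the real column values may be encoded differently.

```python
import pandas as pd

# Toy classifications indexed by trader ID (labels assumed for illustration).
dem = pd.DataFrame({'efficiency': ['efficient', 'inefficient', 'inefficient', 'efficient']},
                   index=['t1', 't2', 't3', 't4'])
rep = pd.DataFrame({'efficiency': ['efficient', 'efficient', 'inefficient']},
                   index=['t1', 't2', 't3'])

# Inner join on the index keeps only traders present in both markets.
joint = dem.merge(rep, how='inner', left_index=True, right_index=True,
                  suffixes=('_dem', '_rep'))

# Same label in both markets (efficient/efficient or inefficient/inefficient).
agreement = (joint['efficiency_dem'] == joint['efficiency_rep']).mean()

# Efficient in both markets only, a narrower quantity than agreement.
both_efficient = ((joint['efficiency_dem'] == 'efficient') &
                  (joint['efficiency_rep'] == 'efficient')).mean()
```

Because the merge is an inner join, `.mean()` on the boolean comparison is equivalent to `.sum() / common_trader_count`.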

For the 2042 traders who were in both markets, about two thirds (65.6%) received the same efficiency classification in both markets.

In [None]:
val = (joint_classifications['efficiency_dem'] == joint_classifications['efficiency_rep']).sum() / common_trader_count

val

This is lower than the agreement rates for size and activity shown below, which suggests that the efficiency classification is less consistent across markets than the other metrics.

Here is the crosstab table (Table 12):

In [None]:
out = pd.crosstab(joint_classifications.efficiency_dem, joint_classifications.efficiency_rep)

out

In [None]:
print(out.to_latex())
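`pd.crosstab` can also add row and column totals (`margins=True`) or report shares instead of counts (`normalize='all'`). With shares, the agreement rate is the sum of the diagonal cells. The sketch below uses made-up labels rather than the real classifications.

```python
import pandas as pd

# Toy labels for five traders (made up for illustration).
dem_eff = pd.Series(['efficient', 'inefficient', 'inefficient', 'efficient', 'inefficient'],
                    name='efficiency_dem')
rep_eff = pd.Series(['efficient', 'efficient', 'inefficient', 'inefficient', 'inefficient'],
                    name='efficiency_rep')

# Counts with an 'All' row and column holding the totals.
counts = pd.crosstab(dem_eff, rep_eff, margins=True)

# Each cell as a share of all traders.
shares = pd.crosstab(dem_eff, rep_eff, normalize='all')

# Diagonal cells are the traders whose label matches in both markets.
agreement = sum(shares.loc[label, label] for label in shares.index)
```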

Large / Small traders had the same classification 80% of the time.

In [None]:
(joint_classifications['size_dem'] == joint_classifications['size_rep']).sum() / common_trader_count

Active / Inactive traders had the same classification about 76% of the time.

In [None]:
(joint_classifications['activity_dem'] == joint_classifications['activity_rep']).sum() / common_trader_count
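The three agreement rates follow the same pattern, so they can be computed in one pass. `agreement_rates` below is a hypothetical helper, and it assumes the merged frame has `<metric>_dem` / `<metric>_rep` column pairs as produced by the merge above. The toy frame is made up.

```python
import pandas as pd


def agreement_rates(joint, metrics, suffixes=('_dem', '_rep')):
    """Share of traders with the same label in both markets, per metric (hypothetical helper)."""
    left, right = suffixes
    return pd.Series({m: (joint[m + left] == joint[m + right]).mean() for m in metrics})


# Toy merged frame with one row per common trader.
joint = pd.DataFrame({
    'size_dem': ['large', 'small', 'small', 'large'],
    'size_rep': ['large', 'small', 'large', 'large'],
    'activity_dem': ['active', 'inactive', 'active', 'active'],
    'activity_rep': ['active', 'active', 'inactive', 'active'],
})

rates = agreement_rates(joint, ['size', 'activity'])
```

On the real data this would be called as `agreement_rates(joint_classifications, ['efficiency', 'size', 'activity'])`.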

So, are traders efficient for random reasons, or are we measuring something real? Is the 65.6% agreement statistically significant?

In other words, if we had randomly assigned traders the designation "efficient", how likely would we be to see agreement as high as 65.6%?

We know that about 1/3 of the traders in the DEM market and about 1/4 of the traders in the REP market are efficient. The same proportions hold for the subset of traders who are in both markets.

In [None]:
joint_classifications.efficiency_dem.value_counts() / common_trader_count

In [None]:
joint_classifications.efficiency_rep.value_counts() / common_trader_count
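If the two classifications were independent, with a share $p$ of efficient traders in DEM and $q$ in REP, the expected agreement would be $pq + (1-p)(1-q)$: the chance that a trader is efficient in both markets plus the chance that the trader is inefficient in both. With $p = 1/3$ and $q = 1/4$ this is exactly $7/12 \approx 0.583$, the baseline the simulation below should reproduce.

```python
from fractions import Fraction


def expected_agreement(p_dem, p_rep):
    # Under independence: both efficient, or both inefficient.
    return p_dem * p_rep + (1 - p_dem) * (1 - p_rep)


baseline = expected_agreement(Fraction(1, 3), Fraction(1, 4))
```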

To test the statistical significance of this agreement, we can run a Monte Carlo simulation and see what the distribution of agreement rates would look like if the classifications were random.

For each of the common traders, randomly draw two classifications (1 or 2, where 1 stands for "efficient"). In the first draw (DEM) the probability of getting a 1 is 1/3; in the second (REP) it is 1/4.

Repeating this 10,000 times gives a mean agreement of 58.3% with a standard deviation of about 0.011.

In [None]:
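n = 10000  # number of Monte Carlo replications

# One row per common trader, one column per replication; 1 stands for "efficient".
# DEM draw: P(1) = 1/3. REP draw: P(1) = 1/4.
a = np.random.choice([1, 2, 2], size=(common_trader_count, n))
b = np.random.choice([1, 2, 2, 2], size=(common_trader_count, n))

# Share of traders whose two random labels match, per replication.
values = (a == b).sum(axis=0) / common_trader_count

m, s = values.mean(), values.std()

m, s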
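The simulation can be checked analytically. Each trader's two random labels match independently with probability $p = 7/12$, so the number of matches is binomial and the agreement rate has mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. The sketch plugs in $n = 2042$ from the text.

```python
import math

n_traders = 2042   # traders in both markets, as reported above
p_match = 7 / 12   # expected agreement with 1/3 and 1/4 of traders efficient

mean_agreement = p_match
sd_agreement = math.sqrt(p_match * (1 - p_match) / n_traders)
```

These values should be close to the simulated `m` and `s`.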

Using the simulated mean and standard deviation, the observed 65.6% agreement has a z-score of about 6.7.

In [None]:
(val - m) / s
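With 10,000 replications, the simulation alone cannot resolve a tail probability this small, so one option is a normal approximation to the simulated distribution. The sketch below is that approximation, not part of the original analysis: it converts the z-score into a one-sided p-value using $P(Z \ge z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$.

```python
import math


def one_sided_p_value(z):
    # Upper-tail probability of a standard normal variable.
    return 0.5 * math.erfc(z / math.sqrt(2))


z = 6.7  # z-score reported above
p_value = one_sided_p_value(z)
```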

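A z-score this large is far beyond any conventional significance threshold, so the agreement between the DEM and REP efficiency classifications is very unlikely to arise from random assignment alone.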
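A related check, swapped in here and not used in the original analysis, is a permutation test. Instead of drawing labels with fixed probabilities 1/3 and 1/4, it shuffles the REP labels across traders. This keeps the exact observed number of efficient traders in each market and breaks any link between a trader's two labels. `permutation_agreement` is a hypothetical helper.

```python
import numpy as np


def permutation_agreement(dem_labels, rep_labels, n_perm=10_000, seed=0):
    """Observed agreement, its null distribution under shuffling, and a one-sided p-value."""
    rng = np.random.default_rng(seed)
    dem = np.asarray(dem_labels)
    rep = np.asarray(rep_labels)
    observed = (dem == rep).mean()
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling REP labels preserves their counts but pairs them with random traders.
        null[i] = (dem == rng.permutation(rep)).mean()
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null, p_value


# Toy usage: 30 "e" and 70 "i" labels that agree perfectly.
labels = ['e'] * 30 + ['i'] * 70
observed, null, p_value = permutation_agreement(labels, labels, n_perm=1000)
```

On the real data the call would be `permutation_agreement(joint_classifications['efficiency_dem'], joint_classifications['efficiency_rep'])`.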