## Analysis

This notebook contains the code for the analysis of the final dataset. The nature of the dataset limits how much rigorous econometric analysis is possible, but some basic tests of market efficiency can still show how the machine learning traders affected the market.

The tests look at several factors: mean-reversion tendencies, runs tests, and whether the different types of traders outperformed simple benchmark strategies such as filter rules and buy-and-hold.
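
As a rough illustration of the runs test mentioned above, the sketch below applies a Wald-Wolfowitz runs test to the signs of successive price changes. The function name `runs_test` and the toy price series are hypothetical and not part of this notebook. Dropping periods with no price change is one common convention, assumed here for simplicity.

```python
import math

def runs_test(prices):
    """Wald-Wolfowitz runs test on the signs of successive price changes.

    Returns (runs, expected_runs, z), where z is approximately standard
    normal under the null hypothesis that the signs are in random order.
    """
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])]
    signs = [c > 0 for c in changes if c != 0]  # assumption: drop unchanged periods
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0 or n < 3:
        raise ValueError("need rises and falls, and at least three non-zero changes")
    # A new run starts every time the sign flips.
    runs = 1 + sum(1 for a, b in zip(signs[:-1], signs[1:]) if a != b)
    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical toy series that alternates every period.
toy_prices = [100, 101, 100, 101, 100, 101, 100, 101, 100]
runs, expected, z = runs_test(toy_prices)
print(runs, expected, round(z, 3))
```

A clearly positive `z` means more runs than a random sequence would produce (prices reverse often), while a clearly negative `z` means fewer runs (prices trend).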

In [21]:
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import csv
import glob

In [46]:
import os

def unpack_data(filepath): #Returns a dictionary with one DataFrame per simulation run, keyed by the run's file name; call once per generation
    all_files = glob.glob(filepath + "/*.csv")
    datadict = {}
    for f in all_files:
        #Use the file name without its ".csv" extension as the key (e.g. "gen0sim35").
        #The original slice f[89:-4] only worked for one specific path length.
        name = os.path.splitext(os.path.basename(f))[0]
        datadict[name] = pd.read_csv(f, header=0)
#    for k, v in datadict.items():
#        v.drop(columns=["Unnamed: 0", "volume", "spread", "10_MA", "50_MA"], inplace=True)
    return datadict

In [47]:
gen0 = unpack_data("/Users/karangarg/Documents/Year 3 Modules/EC331/Code/rae_repo/simulations/gen0_sims/data") #Load gen0 data

In [48]:
def compute_mean_squared_diff(datadict): #Returns (avg_msd, msd_dict). Each run's MSD is the mean over periods of the squared gap between trading price and true price; avg_msd is the average of those values across simulations
    msd_dict = {}
    for k, v in datadict.items():
        #Vectorised over the whole run instead of looping over rows with .iloc
        squared_diff = (v["trading_price"] - v["true_price"])**2
        msd_dict[k] = squared_diff.mean()
    avg_msd = sum(msd_dict.values())/len(msd_dict)
    return avg_msd, msd_dict
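
In symbols, the quantities returned by `compute_mean_squared_diff` can be written as follows. The notation is introduced here for illustration only: run $k$ has $T_k$ periods, $p_{k,t}$ is the trading price, $p^{*}_{k,t}$ is the true price, and $K$ is the number of runs.

```latex
\mathrm{MSD}_k = \frac{1}{T_k} \sum_{t=1}^{T_k} \left( p_{k,t} - p^{*}_{k,t} \right)^2,
\qquad
\overline{\mathrm{MSD}} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{MSD}_k
```

Each run carries equal weight in the overall average, regardless of how many periods it contains.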

In [18]:
#gen0_msd, gen0_msd_dict = compute_mean_squared_diff(gen0)

In [52]:
def reversion_rate(datadict, dev=1): #For any given run of the market, how many times does price deviate by more than dev and how long does it take to return to the true price
    final_avg_return_list = []
    final_avg_no_return_list = []
    return_run_length_list = []
    no_return_run_length_list = []
    for k, v in datadict.items():
        dev_return_list = []
        dev_no_return_list = []
        deviated = ((v["trading_price"] - v["true_price"]).abs() > dev).to_numpy() #Compute the deviation flags once instead of calling .iloc repeatedly
        for i in range(len(v)):
            if deviated[i] and (i == 0 or not deviated[i-1]): #A deviation starts here; i == 0 is checked explicitly because index i-1 would otherwise wrap around to the last row
                for t in range(i, len(v)):
                    if not deviated[t]: #Price is back within dev of the true price
                        dev_return_list.append(t-i)
                        break
                    elif t == len(v)-1: #Run ended while price was still deviating
                        dev_no_return_list.append(t-i)
                        break
        print(f"Returned runs for {k}: {dev_return_list}")
        print(f"Unreturned runs for {k}: {dev_no_return_list}")
        return_run_length_list.append(len(dev_return_list))
        no_return_run_length_list.append(len(dev_no_return_list))
        if dev_return_list: #Skip empty lists to avoid dividing by zero
            final_avg_return_list.append(sum(dev_return_list)/len(dev_return_list))
        if dev_no_return_list:
            final_avg_no_return_list.append(sum(dev_no_return_list)/len(dev_no_return_list))
    final_avg_return = sum(final_avg_return_list)/len(final_avg_return_list) if final_avg_return_list else float("nan") #Length of avg deviation run (that returned)
    final_avg_no_return = sum(final_avg_no_return_list)/len(final_avg_no_return_list) if final_avg_no_return_list else float("nan") #Length of avg deviation run (that didn't return)
    avg_no_returns = sum(return_run_length_list)/len(return_run_length_list) #Avg number of deviations per run that returned
    avg_no_no_returns = sum(no_return_run_length_list)/len(no_return_run_length_list) #Avg number of deviations per run that didn't return
    return final_avg_return, final_avg_no_return, avg_no_returns, avg_no_no_returns
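
To make the episode bookkeeping concrete, here is a minimal standalone sketch that extracts deviation episodes from a single toy run in one pass. The helper `deviation_episodes` and the toy prices are hypothetical; only the column names `trading_price` and `true_price` come from the notebook. It follows the same length convention as `reversion_rate`: a returned episode's length is the number of periods until price is back within `dev`, and an unreturned episode's length is measured up to the final period.

```python
import pandas as pd

def deviation_episodes(run, dev=1):
    """Split one run into deviation episodes.

    Returns (returned, unreturned): lists of episode lengths in periods.
    """
    deviated = ((run["trading_price"] - run["true_price"]).abs() > dev).tolist()
    returned, unreturned = [], []
    start = None  # index where the current episode began, if one is open
    for i, flag in enumerate(deviated):
        if flag and start is None:
            start = i                      # a new episode opens
        elif not flag and start is not None:
            returned.append(i - start)     # price came back within dev
            start = None
    if start is not None:
        # The run ended while price was still deviating.
        unreturned.append(len(deviated) - 1 - start)
    return returned, unreturned

# Hypothetical toy run with a constant true price of 100.
toy_run = pd.DataFrame({
    "true_price":    [100, 100, 100, 100, 100, 100, 100, 100],
    "trading_price": [100, 102, 103, 100, 100,  98, 100, 105],
})
returned, unreturned = deviation_episodes(toy_run)
print(returned, unreturned)
```

The single pass avoids rescanning the series for every episode, which matters for long runs with many short deviations.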

In [51]:
reversion_rate(gen0)

Returned runs for gen0sim35: []
Unreturned runs for gen0sim35: [355]
Returned runs for gen0sim21: [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2]
Unreturned runs for gen0sim21: [0]
Returned runs for gen0sim20: [3, 3, 1, 3, 2, 4, 2, 1, 1, 7, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1]
Unreturned runs for gen0sim20: []
Returned runs for gen0sim34: [2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 5, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 4, 1, 1, 2, 2, 1]
Unreturned runs for gen0sim34: []
Returned runs for gen0sim22: [1, 2, 2, 1, 7, 2, 1, 1, 14, 5, 1, 1, 1, 4, 1, 3, 5, 1, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Unreturned runs for gen0sim22: []
Returned runs for gen0sim36: [8, 26, 2, 54, 5, 1, 4]
Unreturned runs for gen0sim36: []
Returned runs for gen0sim37: [1, 2, 1, 1, 2, 1, 1, 2, 2, 22, 7, 4, 1, 1, 2, 2, 1, 2, 2, 2, 3, 1, 1, 1, 137, 1, 164, 1, 9, 1, 1, 1, 1]
Unreturned runs for gen0sim37: []
Returned runs for gen0sim23: [1, 5, 1

Returned runs for gen0sim77: [3, 1, 2, 1, 2, 20, 1, 1, 1, 3, 2, 1, 7, 6, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1]
Unreturned runs for gen0sim77: [0]
Returned runs for gen0sim63: [74, 2]
Unreturned runs for gen0sim63: [227]
Returned runs for gen0sim88: [2, 103, 7, 3, 3, 1, 3, 1, 1, 3, 1, 1, 1, 1, 3, 3, 1]
Unreturned runs for gen0sim88: []
Returned runs for gen0sim89: [6]
Unreturned runs for gen0sim89: [161]
Returned runs for gen0sim62: []
Unreturned runs for gen0sim62: [374]
Returned runs for gen0sim76: [6, 19, 72, 3, 1, 3, 3, 1, 2, 4, 1, 1, 2, 1, 6, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 1, 3, 1, 1, 1, 1]
Unreturned runs for gen0sim76: []
Returned runs for gen0sim48: [2, 2, 2, 5, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 2]
Unreturned runs for gen0sim48: []
Returned runs for gen0sim60: [1, 2, 2, 5, 88, 6, 1, 1, 1, 1, 1, 1, 1]
Unreturned runs for gen0sim60: []
Returned runs for gen0sim74: []
Unreturned runs for gen0sim74


In [30]:
def inefficient_proportion(datadict, dev=1): #Computes the average proportion of periods in which the trading price deviates from the true price by more than dev price units (an absolute threshold, so dev=1 equals 1% only when the true price is 100)
    prop_list = []
    for k, v in datadict.items():
        deviated = (v["trading_price"] - v["true_price"]).abs() > dev
        prop_list.append(float(deviated.mean())) #The mean of a boolean Series is the share of True values
    prop = sum(prop_list)/len(prop_list)
    return prop

In [31]:
inefficient_proportion(gen0)

0.5242133546690094
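
The function above compares the price gap against an absolute threshold of 1 price unit, while its comment refers to 1%. If a threshold relative to the true price is what is intended, a variant might look like the sketch below. The name `relative_inefficient_proportion`, the `tol` parameter and the toy data are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd

def relative_inefficient_proportion(datadict, tol=0.01):
    """Average share of periods where |trading - true| / |true| exceeds tol."""
    props = []
    for run in datadict.values():
        gap = (run["trading_price"] - run["true_price"]).abs()
        rel_gap = gap / run["true_price"].abs()
        props.append(float((rel_gap > tol).mean()))
    return sum(props) / len(props)

# Hypothetical toy runs with different price levels.
toy = {
    "run_a": pd.DataFrame({"true_price":    [50.0, 50.0, 50.0, 50.0],
                           "trading_price": [50.0, 50.4, 50.6, 49.0]}),
    "run_b": pd.DataFrame({"true_price":    [200.0, 200.0],
                           "trading_price": [201.0, 203.0]}),
}
share = relative_inefficient_proportion(toy)
print(share)
```

With a relative threshold, the same absolute gap counts as a larger deviation when the true price is low than when it is high.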

