First, let's read in our data.

In [2]:
import pandas as pd

df = pd.read_csv("data.csv")

df.head()

Unnamed: 0,Date,a/b Open,a/b Close,b/c open,b/c close,a/c open,a/c close
0,1/1/10,11.035,11.035,13.66,13.635,12.9,12.885
1,1/4/10,11.115,11.05,13.66,13.685,12.95,13.0
2,1/5/10,11.04,10.59,13.69,13.32,13.0,12.65
3,1/6/10,10.595,10.5425,13.38,12.9025,12.715,12.495
4,1/7/10,10.545,10.6375,12.95,13.1,12.58,12.27


Here's the result of `df.describe()` on our data.

In [3]:
df.describe()

Unnamed: 0,a/b Open,a/b Close,b/c open,b/c close,a/c open,a/c close
count,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0
mean,9.916023,9.897449,9.935126,9.926391,11.852176,11.852465
std,2.686881,2.694516,2.271425,2.276009,2.953024,2.952725
min,4.1575,4.165,4.45,4.45,4.3475,4.315
25%,7.875,7.82,8.35,8.3125,10.005,9.99
50%,9.755,9.745,9.96,9.945,11.5925,11.575
75%,11.93,11.915,11.44,11.425,13.8125,13.8
max,17.83,17.735,17.43,17.43,24.155,23.8875


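Next, let's define a function that uses Monte Carlo simulation to approximate the distribution of the combined return of two independent, normally distributed log-returns. Each draw samples one log-return from each distribution, compounds the two, and converts the result to a simple return, so the output is log-normal (shifted by 1) rather than normal. This is what we will use to predict the direction of a/c close.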

In [4]:
import numpy as np

def random_return(mu_1, sigma_1, mu_2, sigma_2):
    """Draw one simple return from two independent normal log-returns."""
    rand_1 = np.random.normal(mu_1, sigma_1)
    rand_2 = np.random.normal(mu_2, sigma_2)
    # Compound the two log-returns, then convert to a simple return.
    return np.exp(rand_1) * np.exp(rand_2) - 1


def monte_carlo_distribution(mu_1, sigma_1, mu_2, sigma_2, iterations=1000):
    """Return `iterations` simulated returns as a one-column DataFrame (column 0)."""
    dist = [random_return(mu_1, sigma_1, mu_2, sigma_2) for _ in range(iterations)]
    return pd.DataFrame(dist)
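
The per-draw Python loop above gets slow at a million iterations. A minimal vectorized sketch of the same sampling scheme follows; the function name and the `seed` parameter are hypothetical additions, and it uses NumPy's `Generator` API rather than the global `np.random` state, so its draws will not match the loop version number for number.

```python
import numpy as np
import pandas as pd


def monte_carlo_distribution_vectorized(mu_1, sigma_1, mu_2, sigma_2,
                                        iterations=1000, seed=None):
    """Vectorized sketch: draw all samples for each leg in one NumPy call."""
    rng = np.random.default_rng(seed)  # seed is a hypothetical extra parameter
    log_ret_1 = rng.normal(mu_1, sigma_1, size=iterations)
    log_ret_2 = rng.normal(mu_2, sigma_2, size=iterations)
    # exp(x) * exp(y) - 1 equals exp(x + y) - 1; expm1 computes it directly.
    return pd.DataFrame(np.expm1(log_ret_1 + log_ret_2))
```

The result has the same shape as before (one column named 0), so it could stand in for `monte_carlo_distribution` in the cells that follow.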

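Let's make sure the result is reasonable by running the simulation on the first row of data, using a/b open and b/c open (divided by 100) as the two standard deviations and zero for both means.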

In [5]:
row = df.iloc[0]
ab_open = row['a/b Open']
bc_open = row['b/c open']
ac_open = row['a/c open']

print(ab_open, bc_open, ac_open)

dist = monte_carlo_distribution(0, ab_open / 100, 0, bc_open / 100, 1000000)
dist.head()

11.035 13.66 12.9


Unnamed: 0,0
0,0.214899
1,-0.042035
2,-0.240342
3,-0.024661
4,0.076881


In [6]:
dist.describe()

Unnamed: 0,0
count,1000000.0
mean,0.015608
std,0.179908
min,-0.580306
25%,-0.111949
50%,-6.7e-05
75%,0.126008
max,1.359966


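The median should be close to zero. The mean is slightly positive (about 0.016 here) because compounding log-returns makes the result log-normal, which is skewed to the right.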
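
As a cross-check on the simulation, the moments of exp(X + Y) - 1 have closed forms when X and Y are independent normals. The sketch below assumes that independence (as `random_return` does); `lognormal_return_moments` is a hypothetical helper, not part of the original notebook.

```python
import math


def lognormal_return_moments(sigma_1, sigma_2, mu_1=0.0, mu_2=0.0):
    """Closed-form mean, median, and std of exp(X + Y) - 1.

    Assumes X ~ N(mu_1, sigma_1^2) and Y ~ N(mu_2, sigma_2^2) are independent,
    so X + Y ~ N(mu_1 + mu_2, sigma_1^2 + sigma_2^2).
    """
    mu = mu_1 + mu_2
    var = sigma_1 ** 2 + sigma_2 ** 2
    mean = math.exp(mu + var / 2) - 1
    median = math.exp(mu) - 1
    std = math.sqrt((math.exp(var) - 1) * math.exp(2 * mu + var))
    return mean, median, std


# First row of the data: a/b open = 11.035, b/c open = 13.66 (divided by 100).
mean, median, std = lognormal_return_moments(0.11035, 0.1366)
print(mean, median, std)
```

For the first row this gives a mean near 0.0155, a median of 0, and a std near 0.180, in line with the `dist.describe()` output above.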

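Now, let's predict the direction of a/c close. For each row we estimate an expected a/c open from a/b open and b/c open, signal a buy when that estimate is above the actual a/c open, and count the call as a success if a/c close then moves in the signaled direction.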

In [7]:
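def predict_ac_open(ab_open, bc_open):
    """Estimate a/c open as the std of the simulated combined return, in percent."""
    dist = monte_carlo_distribution(0, ab_open / 100, 0, bc_open / 100, 1000)
    # dist has a single column named 0; select it so std() returns a scalar
    # (calling float() on a one-element Series is deprecated in recent pandas).
    return float(dist[0].std() * 100)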

def predict_row(row):
    ab_open = row['a/b Open']
    bc_open = row['b/c open']
    ac_open = row['a/c open']
    ac_close = row['a/c close']
    expected_open = predict_ac_open(ab_open, bc_open)
    diff = expected_open - ac_open
    should_buy = expected_open > ac_open
    close_open_diff = ac_close - ac_open
    
    # A call succeeds when the close moved in the direction we signaled.
    success = (
        (close_open_diff >= 0 and should_buy)
        or (close_open_diff < 0 and not should_buy)
    )
        
    return {
        'expected_open': expected_open,
        'actual_open': ac_open,
        'actual_close': ac_close,
        'buy': 1 if should_buy else 0,
        'success': 1 if success else 0,
        'diff': diff,
        'close_open_diff': close_open_diff,
    }
    

results = pd.DataFrame([predict_row(row) for index, row in df.iterrows()])
results.head()

Unnamed: 0,actual_close,actual_open,buy,close_open_diff,diff,expected_open,success
0,12.885,12.9,1,-0.015,4.575988,17.475988,0
1,13.0,12.95,1,0.05,5.366725,18.316725,1
2,12.65,13.0,1,-0.35,4.288322,17.288322,0
3,12.495,12.715,1,-0.22,3.785016,16.500016,0
4,12.27,12.58,1,-0.31,4.843862,17.423862,0


In [11]:
results.describe()

Unnamed: 0,actual_close,actual_open,buy,close_open_diff,diff,expected_open,success
count,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0,1781.0
mean,11.852465,11.852176,0.970803,0.000289,2.576439,14.428615,0.474453
std,2.952725,2.953024,0.168406,0.55856,1.482577,2.984227,0.499487
min,4.315,4.3475,0.0,-3.3925,-1.739264,6.141226,0.0
25%,9.99,10.005,1.0,-0.275,1.426519,12.592947,0.0
50%,11.575,11.5925,1.0,-0.0225,2.532438,14.687039,0.0
75%,13.8,13.8125,1.0,0.225,3.690406,16.336962,1.0
max,23.8875,24.155,1.0,3.9475,7.461948,24.801953,1.0


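This is inconclusive, and in fact slightly worse than a coin flip: the success rate is about 47%. The model also signals a buy on about 97% of rows, because `expected_open` exceeds the actual a/c open by about 2.58 on average.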
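
To put the success rate in context, it helps to compare it with trivial baselines such as always buying or always selling. The sketch below assumes a `results` frame with the `success`, `buy`, and `close_open_diff` columns produced above; `hit_rates` is a hypothetical helper, and the toy frame simply reuses the five rows shown in `results.head()`.

```python
import pandas as pd


def hit_rates(results):
    """Compare the strategy's hit rate with always-buy and always-sell baselines."""
    # Same success convention as predict_row: a flat close counts as "up".
    went_up = results['close_open_diff'] >= 0
    return {
        'strategy': float(results['success'].mean()),
        'always_buy': float(went_up.mean()),
        'always_sell': float((~went_up).mean()),
        'buy_fraction': float(results['buy'].mean()),
    }


# Toy input: the five rows shown in results.head().
toy = pd.DataFrame({
    'close_open_diff': [-0.015, 0.05, -0.35, -0.22, -0.31],
    'buy': [1, 1, 1, 1, 1],
    'success': [0, 1, 0, 0, 0],
})
print(hit_rates(toy))
```

When `buy_fraction` is close to 1, as it is here, the strategy is nearly identical to the always-buy baseline, so its success rate mostly reflects how often a/c closes at or above its open.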