**Training a neural network to read price action like a floor trader**

First of all, thank you for uploading the trading data and making it available for the following analysis and discussion.

**1. Introduction**

The data is presented in OHLC format at 1-minute resolution, with additional data points for volume traded (VolB) and value traded (VolC). From these, a volume weighted average price (WgtPx) is derived. (For completeness, WgtPx is included in the supplied data.)
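
As a minimal sketch of that derivation, assuming VolB is the volume traded and VolC the value traded within each snapshot (the numbers below are made up for illustration and are not taken from the data set):

```python
import pandas as pd

# Toy snapshots (made-up numbers): VolB = volume traded, VolC = value traded.
toy = pd.DataFrame({
    "VolB": [2.0, 0.5, 4.0],
    "VolC": [800.0, 201.0, 1610.0],
})

# Volume weighted average price = value traded / volume traded.
toy["WgtPx"] = toy["VolC"] / toy["VolB"]
print(toy)
```

A snapshot with no trades (VolB equal to zero) would give NaN or inf here and would need separate handling.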

Traditional time series analysis relies on a relationship of the form Y[t] = f(Y[t-1], Y[t-2], …), where Y[t] is the price at time t, together with a laundry list of over-arching assumptions. Technical indicators such as SMA(15) are constructed from data in that format. The following discussion deviates from such norms and focuses on the price action of WgtPx, i.e. given the relative positions of O, H, L and C in each snapshot, what is the relative movement of WgtPx in the next snapshot?

It should be noted that the definition of WgtPx implies a ‘market clearing’ benchmark over the time horizon in question, and therefore the relative movements of this benchmark present a view of the underlying valuation (V) of the financial instrument/asset. This is the discussion’s hypothesis.

The above approach also assumes that the change in price is a ‘not-so-random’ walk (with drift?) and therefore that the underlying price action is a leading indicator for V. Price action is said to be most readily observed in markets where liquidity and price volatility are highest, although anything that is bought or sold freely in a market will per se demonstrate price action (Source: Wikipedia). The Bitcoin market fits the former description.

**2. Translating the above into Python**

O: Price at the start of the snapshot, which is fixed

C: Price at the close of the snapshot, which may equal O

H: Highest price recorded during the snapshot, which may equal C and/or O

L: Lowest price recorded during the snapshot, which may equal C and/or O

WgtPx, W: A derived price based on the ratio of value traded to volume traded

Hence:

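Change in WgtPx, i.e. V[t+1] - V[t] = f(HO[t], LO[t], CO[t], WO[t]),

where HO, LO, CO and WO are the relative distances of H, L, C and W from the fixed datum O[t]. The left-hand side is the move into the next snapshot, which is what the column `V+1` holds in the code below; note that the code's column `V` stores the change in WgtPx itself rather than its level.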

```
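import os
import pickle

import numpy as np
import pandas as pd

# sub_dir_in and filename_1 are assumed to be defined earlier.
# filename_1 is a cleaned-up pickle file based on
# 'coinbaseUSD_1-min_data_2014-12-01_to_2017-05-31.csv'
INPUT_FILE_1 = os.path.join(os.getcwd(), sub_dir_in, filename_1)

with open(INPUT_FILE_1, 'rb') as file:
    df = pickle.load(file)

df = df[['O', 'H', 'L', 'C', 'VolB', 'VolC', 'WgtPx']].copy()
# '1min' is the same frequency as '1T'; newer pandas deprecates the 'T' alias
df.index = pd.date_range('2015-02-28 23:59:00', periods=len(df), freq='1min')
df.index.name = 'time_UTC'

# Resample data (not applied here; the data stays at 1-minute resolution)

# V: change in WgtPx into the current snapshot; V+1: change into the next one
df['V'] = df.WgtPx - df.WgtPx.shift(1)
df['V+1'] = df['V'].shift(-1)
df['WC'] = df.WgtPx - df.C

# Features: distances of H, L, C and WgtPx from the open O
df['HO'] = df.H - df.O
df['LO'] = df.L - df.O
df['CO'] = df.C - df.O
df['WO'] = df.WgtPx - df.O
df_Xt = df.iloc[:, -4:]  # last four columns: HO, LO, CO, WO
train_Xt_array = df_Xt.values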
```

*Making this into a supervised learning problem*
```
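# UP: WgtPx rises into the next snapshot and both C and WgtPx sit above O
UP = np.logical_and(df['V+1'].round(decimals=2) > 0,
                    np.logical_and(df['CO'] > 0, df['WO'] > 0)).astype(int)
# DN: WgtPx falls into the next snapshot and both C and WgtPx sit below O
DN = np.logical_and(df['V+1'].round(decimals=2) < 0,
                    np.logical_and(df['CO'] < 0, df['WO'] < 0)).astype(int)
# FLAT: every snapshot that is neither UP nor DN
FLAT = np.logical_and(UP == 0, DN == 0).astype(int)
df_Yt = pd.concat([UP, DN, FLAT], join='outer', axis=1)
df_Yt.columns = ['UP', 'DN', 'FLAT']
train_y_array = df_Yt.values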
```

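As a sanity check, every row of df_Yt should contain exactly one 1, so `(df_Yt.sum(axis=1) == 1).all()` should be True; equivalently, `df_Yt.mean().values.sum()` should equal 1 up to floating-point rounding.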
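
The labelling rule and the check can be tried on a small toy frame. This is only a sketch: the numbers are made up, and the `&` operator is used in place of `np.logical_and` purely for brevity.

```python
import numpy as np
import pandas as pd

# Toy inputs (made-up numbers) standing in for df['V+1'], df['CO'], df['WO'].
toy = pd.DataFrame({
    "V+1": [0.50, -0.30, 0.00, 0.20, np.nan],
    "CO":  [0.10, -0.20, 0.05, -0.10, 0.30],
    "WO":  [0.05, -0.10, 0.02, 0.04, 0.10],
})

up = ((toy["V+1"].round(2) > 0) & (toy["CO"] > 0) & (toy["WO"] > 0)).astype(int)
dn = ((toy["V+1"].round(2) < 0) & (toy["CO"] < 0) & (toy["WO"] < 0)).astype(int)
flat = ((up == 0) & (dn == 0)).astype(int)
labels = pd.concat([up, dn, flat], axis=1)
labels.columns = ["UP", "DN", "FLAT"]

# Each row must be one-hot: exactly one of UP / DN / FLAT is set.
assert (labels.sum(axis=1) == 1).all()
# Aggregate version of the same check, with a tolerance instead of ==.
assert np.isclose(labels.mean().sum(), 1.0)
print(labels)
```

Rows where the next move and the current bar disagree in sign, and rows where `V+1` is NaN (the last snapshot), fall into FLAT because every comparison against NaN is False.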

**3. Neural Network setup**

The proposed neural network is a plain vanilla ANN, set up as below. This also assumes an efficient market hypothesis in which prices fully reflect all available information. The model is structured as a classification model simply because its effectiveness (accuracy, recall, f1-score) is easier to relate to than that of a regression variant (RMS error).

```
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Assumed definition: one input per feature column (HO, LO, CO, WO)
features = train_Xt_array.shape[1]

model = Sequential()
model.add(Dense(32, activation='tanh', input_dim=features))
model.add(Dropout(0.2))
model.add(Dense(32, activation='tanh'))
model.add(Dropout(0.1))
model.add(Dense(32, activation='tanh'))
model.add(Dropout(0.1))
# Output width matches df_Yt.shape[1] (UP, DN, FLAT)
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
```

A plausible alternative is an LSTM (not discussed here), which would assume that there is some memory in the price action.

Lastly, the structure (number of hidden layers and neurons) of the NN is somewhat arbitrary and chosen solely for the purpose of this discussion.
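
To get a feel for how small this architecture is, the number of trainable parameters can be worked out by hand. The sketch below assumes four input features (HO, LO, CO, WO); `dense_param_count` is a hypothetical helper written for this illustration, not a Keras function.

```python
# Layer widths of the network above: 4 inputs (HO, LO, CO, WO),
# three hidden layers of 32 units, 3 softmax outputs.
layer_sizes = [4, 32, 32, 32, 3]


def dense_param_count(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


total = dense_param_count(layer_sizes)
print(total)
```

Dropout layers add no parameters, so with four input features `model.summary()` would be expected to report the same total.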

*Fitting the Neural Network*

The model is trained on a cloud GPU with the following hyperparameters:
```
batch_size = 60*24 # Total 'blocks/snapshot' in a day
epochs = 1000
```
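
As a rough sense of scale, the sketch below works out how many weight updates these settings imply. It assumes the 527,040 training snapshots listed as total support in the training report in section 4. Note that Keras shuffles samples by default, so a batch is day-sized but does not correspond to a calendar day.

```python
import math

batch_size = 60 * 24   # one day's worth of 1-minute snapshots
n_samples = 527040     # training rows, taken from the report's total support
epochs = 1000

# Number of gradient updates per pass over the data, and over the full run.
batches_per_epoch = math.ceil(n_samples / batch_size)
total_updates = batches_per_epoch * epochs
print(batches_per_epoch, total_updates)
```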
The model's predictions are passed through scikit-learn's classification report:
```
from sklearn.metrics import classification_report

# Convert softmax outputs and one-hot labels to class indices (0=UP, 1=DN, 2=FLAT)
Series_pred = np.argmax(model.predict(train_Xt_array,
                                      batch_size=batch_size,
                                      verbose=0), axis=1)
Series_actual = np.argmax(train_y_array, axis=1)
classreport = classification_report(Series_actual, Series_pred,
                                    target_names=df_Yt.columns,
                                    digits=4)
print(classreport)
```

**4. Reviewing the trained models**

The model is trained using trading data for calendar year 2016. 

Note: Consider lowering epochs to 100 if training the model on a local machine with basic specs. The following results are based on 100 epochs.

```
             precision    recall  f1-score   support

         UP     0.6139    0.8120    0.6992    106311
         DN     0.6245    0.4227    0.5042     70077
       FLAT     0.8217    0.7943    0.8078    350652

avg / total     0.7536    0.7485    0.7455    527040

```
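It is worth noting that Keras' accuracy metric is not computed the same way as the figures above: Keras reports the plain fraction of correctly classified snapshots, whilst the `avg / total` row from scikit-learn averages the per-class scores weighted by support. Only the weighted recall coincides with overall accuracy; the weighted precision and f1-score will generally differ from it.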
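
The support-weighted averaging can be checked by hand from the per-class rows of the training report. The sketch below simply copies those printed figures; `weighted_avg` is a hypothetical helper for this illustration.

```python
# Per-class (precision, recall, f1-score, support) copied from the training report.
report = {
    "UP":   (0.6139, 0.8120, 0.6992, 106311),
    "DN":   (0.6245, 0.4227, 0.5042, 70077),
    "FLAT": (0.8217, 0.7943, 0.8078, 350652),
}

total_support = sum(row[3] for row in report.values())


def weighted_avg(col):
    """Support-weighted average of one column (0=precision, 1=recall, 2=f1)."""
    return sum(row[col] * row[3] for row in report.values()) / total_support


for name, col in [("precision", 0), ("recall", 1), ("f1-score", 2)]:
    print(f"{name:>9}: {weighted_avg(col):.4f}")
```

Rounded to four decimals, this reproduces the `avg / total` row (0.7536, 0.7485, 0.7455).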

*Walking forward the model*

The model is then validated on trading data from 1 Jan 2017 to 30 May 2017:

```
             precision    recall  f1-score   support

         UP     0.6756    0.8454    0.7510     50578
         DN     0.6520    0.5271    0.5829     33741
       FLAT     0.8104    0.7720    0.7907    131681

avg / total     0.7541    0.7509    0.7490    216000
```
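
One way to put these accuracy figures in context is to compare them with a naive rule that always predicts FLAT, the largest class. The sketch below uses only the support counts and weighted recall printed in the two reports; the always-FLAT baseline is an illustrative assumption and not part of the original analysis.

```python
# (FLAT support, total support) copied from the two reports above.
supports = {
    "train 2016": (350652, 527040),
    "walk-forward 2017": (131681, 216000),
}
# Weighted-average recall from each report, which equals overall accuracy.
model_accuracy = {"train 2016": 0.7485, "walk-forward 2017": 0.7509}

# Accuracy of a rule that labels every snapshot FLAT.
baseline = {name: flat / total for name, (flat, total) in supports.items()}

for name in supports:
    print(f"{name}: always-FLAT {baseline[name]:.4f} vs model {model_accuracy[name]:.4f}")
```

On both periods the model's accuracy sits above this naive baseline, though the margin is smaller than the headline accuracy alone might suggest.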

**5. Developing the model further**

A potential next step is to resample the data into coarser snapshots (downsampling, in pandas terms). Since 60 is divisible by 2, 3, 4, 5 and 10, a function can be written to aggregate the 1-minute OHLC data into bars of those lengths. Aggregating can also smooth out the 'noise' present at 1-minute resolution and possibly allow the underlying pattern to come through.
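
A minimal sketch of such a function is shown below, assuming the column names used earlier (O, H, L, C, VolB, VolC) and a DatetimeIndex. `resample_ohlc` is a hypothetical helper, and the toy data is made up purely to exercise it.

```python
import numpy as np
import pandas as pd


def resample_ohlc(df, minutes):
    """Aggregate 1-minute bars into `minutes`-minute bars (hypothetical helper)."""
    agg = {"O": "first", "H": "max", "L": "min", "C": "last",
           "VolB": "sum", "VolC": "sum"}
    out = df.resample(f"{minutes}min").agg(agg)
    # Recompute the volume weighted price from the aggregated totals;
    # bars with no volume come out as NaN/inf and would need handling.
    out["WgtPx"] = out["VolC"] / out["VolB"]
    return out


# Toy 1-minute data (made-up numbers) to exercise the helper.
idx = pd.date_range("2016-01-01 00:00", periods=10, freq="min")
px = np.arange(100.0, 110.0)
toy = pd.DataFrame({"O": px, "H": px + 1.0, "L": px - 1.0, "C": px + 0.5,
                    "VolB": 1.0, "VolC": px}, index=idx)
toy.index.name = "time_UTC"

bars_5min = resample_ohlc(toy, 5)
print(bars_5min)
```

WgtPx is recomputed from the summed value and volume rather than averaged across the 1-minute bars, so that it remains a volume weighted price at the coarser resolution.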

It is also worth noting that price action can reasonably be assumed to be the principal driver of valuation within a short time frame. Such an assumption may be tenuous for snapshots covering longer periods.