Reinforcement Learning for Intraday Trading
===========================================

In this project, our aim is to implement a Reinforcement Learning (RL)
strategy for trading stocks. Adopting a learning-based approach, in
particular using RL, entails several potential benefits over current
approaches. Firstly, several ML methods allow learning-based
pre-processing steps, such as convolutional layers which enable
automatic feature extraction and detection, and may be used to focus the
computation on the most relevant features. Secondly, constructing an
end-to-end learning-based pipeline makes the prediction step implicit,
and potentially reduces the problem complexity to predicting only
certain aspects or features of the time series which are necessary for
the control strategy, as opposed to attempting to predict the exact time
series values. Thirdly, an end-to-end learning-based approach removes
the constraints that the step-wise modularization of a human-designed
pipeline would entail, and allows the learning algorithm to
automatically deduce the best way to use any feature signal in order to
execute the most efficient control strategy.

The main idea behind RL algorithms is to learn by trial-and-error how to
act optimally. An agent gathers experience by iteratively interacting
with an environment. Starting in state $S_t$, the agent takes an action
$A_t$ and receives a reward $R_{t+1}$ as it moves to state $S_{t+1}$, as
seen below
([source](https://upload.wikimedia.org/wikipedia/commons/d/da/Markov_diagram_v2.svg)).
Using this experience, RL algorithms can learn either a value function
or a policy directly. We learn the former, which can then be used to
compute optimal actions by choosing the action that maximizes the action
value, Q. Specifically, we use the DQN (Deep Q-Network) algorithm to
train an agent which trades Brent Crude Oil (BCOUSD) stocks, in order to
maximize profit.

<img src="https://upload.wikimedia.org/wikipedia/commons/d/da/Markov_diagram_v2.svg" width=600>
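
As a minimal sketch of the last step, assume the action values for the current state are already available as an array. The numbers, the name `q_values` and the ordering of the two actions are made up for illustration. The greedy action is then simply the index of the largest entry:

```python
import numpy as np

ACTIONS = ["LONG", "SHORT"]  # assumed action labels, indexed by position

def greedy_action(q_values):
    """Return the index of the action with the highest estimated value."""
    return int(np.argmax(q_values))

# Hypothetical action values for one state: Q(s, LONG), Q(s, SHORT)
q_values = np.array([0.12, -0.05])
best = greedy_action(q_values)
print(ACTIONS[best])
```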

Group members:
--------------

-   Fabian Sinzinger
-   Karl Bäckström
-   Rita Laezza

In [None]:
// Scala imports
import org.lamastex.spark.trendcalculus._
import spark.implicits._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import java.sql.Timestamp
import org.apache.spark.sql.expressions._

  

>     import org.lamastex.spark.trendcalculus._
>     import spark.implicits._
>     import org.apache.spark.sql._
>     import org.apache.spark.sql.functions._
>     import java.sql.Timestamp
>     import org.apache.spark.sql.expressions._

  

Brent Crude Oil Dataset
-----------------------

The dataset consists of historical data starting from the *14th of
November 2010* to the *21st of June 2019*. Since the data in the first
and last day is incomplete, we remove it from the dataset. The BCOUSD
data is sampled approximately every minute, each sample carrying a
timestamp and a price in US dollars.

To read the BCOUSD dataset, we use the parsers provided by the
[TrendCalculus](https://github.com/lamastex/spark-trend-calculus)
library. This allows us to load the FX data into a Spark Dataset. After
loading the data with the **fx1m** function, we convert it to
**TickerPoint** objects with fields **ticker**, **x** and **y**, where
**x** and **y** hold the **time** and **close** values respectively. The
first is the name of the stock, the second is the timestamp of the data
point and the last is the value of the stock at the end of each 1-minute
bin.

Finally, we add the **index** column to facilitate retrieving values from
the table, since there are gaps in the data, meaning that not all minutes
have an entry. Further, a **diff_close** column is added, which consists
of the difference between the **close** value at the current and the
previous **time**. Note that since **ticker** is always the same, we
remove that column.
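
The construction of the two extra columns can be illustrated on a tiny array. The sketch below uses NumPy rather than Spark, and the prices are made up. Like the Scala cell, it substitutes 0 for the missing predecessor of the first entry, so the first difference equals the price itself rather than a price change.

```python
import numpy as np

# Hypothetical close prices for five consecutive entries
close = np.array([86.61, 86.60, 86.60, 86.63, 86.61])

# Difference to the previous entry; the first entry has no predecessor,
# so 0 is used in its place and the first difference equals the price itself
diff_close = np.diff(close, prepend=0.0)

# 1-based running index, independent of any gaps in the timestamps
index = np.arange(1, len(close) + 1)

print(diff_close[1:])
```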

In [None]:
// Load dataset
val oilDS = spark.read.fx1m("dbfs:/FileStore/shared_uploads/fabiansi@kth.se/*csv.gz").toDF.withColumn("ticker", lit("BCOUSD")).select($"ticker", $"time" as "x", $"close" as "y").as[TickerPoint].orderBy("time")

// Add column with difference from previous close value (expected 'x', 'y' column names)
val windowSpec = Window.orderBy("x")
val oilDS1 = oilDS 
.withColumn("diff_close", $"y" - when((lag("y", 1).over(windowSpec)).isNull, 0).otherwise(lag("y", 1).over(windowSpec)))

// Rename variables
val oilDS2 = oilDS1.withColumnRenamed("x","time").withColumnRenamed("y","close")

// Remove incomplete data from first day (2010-11-14) and last day (2019-06-21)
val oilDS3 = oilDS2.filter(to_date(oilDS2("time")) >= lit("2010-11-15") && to_date(oilDS2("time")) <= lit("2019-06-20"))

// Add index column
val windowSpec1 = Window.orderBy("time")
val oilDS4 = oilDS3
.withColumn("index", row_number().over(windowSpec1))

// Drop ticker column
val oilDS5 = oilDS4.drop("ticker")

// Store loaded data as temp view, to be accessible in Python
oilDS5.createOrReplaceTempView("temp")

  

>     oilDS: org.apache.spark.sql.Dataset[org.lamastex.spark.trendcalculus.TickerPoint] = [ticker: string, x: timestamp ... 1 more field]
>     windowSpec: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@347f9beb
>     oilDS1: org.apache.spark.sql.DataFrame = [ticker: string, x: timestamp ... 2 more fields]
>     oilDS2: org.apache.spark.sql.DataFrame = [ticker: string, time: timestamp ... 2 more fields]
>     oilDS3: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [ticker: string, time: timestamp ... 2 more fields]
>     windowSpec1: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@3c3c818a
>     oilDS4: org.apache.spark.sql.DataFrame = [ticker: string, time: timestamp ... 3 more fields]
>     oilDS5: org.apache.spark.sql.DataFrame = [time: timestamp, close: double ... 2 more fields]

  

### Preparing the data in Python

Because the
[TrendCalculus](https://github.com/lamastex/spark-trend-calculus)
library we use is implemented in Scala and we want to do our
implementation in Python, we have to make sure that the data loaded in
Scala is correctly read in Python, before moving on. To that end, we
select the first 10 data points and show them in a table.

We can see that there are roughly **2.5 million data points** in the
BCOUSD dataset.

In [None]:
#Python imports
import datetime
import gym
import math
import random
import json
import collections
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers.core import Dense
from keras.layers import Conv1D, MaxPool1D, Flatten, BatchNormalization
from keras import optimizers

In [None]:
# Create Dataframe from temp data
oilDF_py = spark.table("temp")

# Select the 10 first Rows of data and print them
ten_oilDF_py = oilDF_py.limit(10)
ten_oilDF_py.show()

# Check number of data points
last_index = oilDF_py.count()
print("Number of data points: {}".format(last_index))

# Select the date of the last data point
print("Last data point: {}".format(np.array(oilDF_py.where(oilDF_py.index == last_index).select('time').collect()).item()))

  

>     +-------------------+-----+--------------------+-----+
>     |               time|close|          diff_close|index|
>     +-------------------+-----+--------------------+-----+
>     |2010-11-15 00:00:00| 86.6|-0.01000000000000...|    1|
>     |2010-11-15 00:01:00| 86.6|                 0.0|    2|
>     |2010-11-15 00:02:00|86.63|0.030000000000001137|    3|
>     |2010-11-15 00:03:00|86.61|-0.01999999999999602|    4|
>     |2010-11-15 00:05:00|86.61|                 0.0|    5|
>     |2010-11-15 00:07:00| 86.6|-0.01000000000000...|    6|
>     |2010-11-15 00:08:00|86.58|-0.01999999999999602|    7|
>     |2010-11-15 00:09:00|86.58|                 0.0|    8|
>     |2010-11-15 00:10:00|86.58|                 0.0|    9|
>     |2010-11-15 00:12:00|86.57|-0.01000000000000...|   10|
>     +-------------------+-----+--------------------+-----+
>
>     Number of data points: 2523078
>     Last data point: 2019-06-20 23:59:00

  

RL Environment
--------------

In order to train RL agents, we first need to create the environment
with which the agent will interact to gather experience. In our case,
that consists of a stock market simulation which plays out historical
data from the BCOUSD dataset. This is valid under the assumption that
the trading on the part of our agent has no effect on the stock market.
An RL problem can be formally defined by a Markov Decision Process
(MDP).

For our application, we have the following MDP:

-   State, s: a window of **diff_close** values for a given **scope**,
    i.e. the current value and history leading up to it.
-   Action, a: either **LONG** for buying stock, or **SHORT** for
    selling stock. Note that **PASS** is not required, since if stock is
    already owned, buying means holding and if stock is not owned then
    shorting means pass.
-   Reward, r: if $a_t$ = **LONG**, $r_t = s_{t+1}$ = **diff_close**; if
    $a_t$ = **SHORT**, $r_t = -s_{t+1}$ = -**diff_close**.

Essentially, the reward is negative if we sell and the stock goes up or
if we buy and the stock goes down in the next timestep. Conversely, the
reward is positive if we buy and the stock goes up or if we sell and the
stock goes down in the next timestep.
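
The reward rule can be written as a one-line function. This is a minimal sketch: the integer encoding (0 for LONG, 1 for SHORT) is an assumption for illustration, and `reward` is a hypothetical helper name.

```python
LONG, SHORT = 0, 1  # assumed integer encoding of the two actions

def reward(action, next_diff_close):
    """Reward for taking `action` when the next price change is `next_diff_close`."""
    # (2 * action - 1) is -1 for LONG and +1 for SHORT, so the sign flips for SHORT
    return float(-(2 * action - 1) * next_diff_close)

# Price goes up by 0.03: buying is rewarded, selling is penalized
print(reward(LONG, 0.03), reward(SHORT, 0.03))
```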

This environment is very simplified, with only binary actions. An
alternative could be to use continuous actions to determine how much
stock to buy or sell. However, since we aim to compare to TrendCalculus
results, which only predict reversals, these actions are more adequate.
For the implementation, we used OpenAI Gym's formalism, which includes a
**done** variable to indicate the end of an episode. In **MarketEnv**,
by setting the **start_date** and **end_date** attributes, we can select
the part of the dataset we wish to use. Finally, the **episode_size**
parameter determines the episode size. An episode's starting point can
be sampled at random or not, which is defined when calling **reset**.
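
The two pieces of bookkeeping described here, the state window and the end-of-episode check, can be sketched independently of Gym. The helper names `state_window` and `episode_done` are hypothetical, and the stand-in data is made up.

```python
import numpy as np

def state_window(diff_close, time_index, scope):
    """Return the `scope` most recent values up to, but excluding, `time_index`."""
    return diff_close[time_index - scope:time_index]

def episode_done(time_index, episode_init_time, episode_size, n_ticks, scope):
    """An episode ends after more than `episode_size` steps or near the end of the data."""
    too_long = time_index - episode_init_time > episode_size
    out_of_data = time_index > n_ticks - scope - 1
    return too_long or out_of_data

diff_close = np.arange(10, dtype=float)  # stand-in data: 0.0, 1.0, ..., 9.0
print(state_window(diff_close, time_index=5, scope=3))
print(episode_done(5, 3, episode_size=2, n_ticks=10, scope=3))
```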

In [None]:
# Adapted from: https://github.com/kh-kim/stock_market_reinforcement_learning/blob/master/market_env.py


class MarketEnv(gym.Env):
    def __init__(self, full_data, start_date, end_date, episode_size=30*24*60, scope=60):
        self.episode_size = episode_size
        self.actions = ["LONG", "SHORT"] 
        self.action_space = gym.spaces.Discrete(len(self.actions))
        self.state_space = gym.spaces.Box(np.ones(scope) * -1, np.ones(scope))

        self.diff_close = np.array(full_data.filter(full_data["time"] > start_date).filter(full_data["time"] <= end_date).select('diff_close').collect())
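        # NOTE: the values are multiplied by the maximum, not divided by it, so they are
        # scaled by that factor rather than normalized to the [-1, 1] bounds of state_space
        max_diff_close = np.max(self.diff_close)
        self.diff_close = self.diff_close*max_diff_close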
        self.close = np.array(full_data.filter(full_data["time"] > start_date).filter(full_data["time"] <= end_date).select('close').collect())
        self.num_ticks_train = np.shape(self.diff_close)[0]

        self.scope = scope # N values to be included in a state vector
        self.time_index = self.scope  # start N steps in, to ensure that we have enough past values for history 
        self.episode_init_time = self.time_index  # initial time index of the episode


    def step(self, action):
        info = {'index': int(self.time_index), 'close': float(self.close[self.time_index])}
        self.time_index += 1
        self.state = self.diff_close[self.time_index - self.scope:self.time_index]
        self.reward = float( - (2 * action - 1) * self.state[-1] )
        
        # Check if done
        if self.time_index - self.episode_init_time > self.episode_size:
            self.done = True
        if self.time_index > self.diff_close.shape[0] - self.scope -1:
            self.done = True

        return self.state, self.reward, self.done, info

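    def reset(self, random_starttime=True):
        self.done = False
        self.reward = 0.
        self.time_index = self.scope

        if random_starttime:
            # Bound the offset so that at least one step fits before the end of the data
            max_offset = max(0, self.num_ticks_train - 2 * self.scope - 1)
            self.time_index += random.randint(0, max_offset)

        # Build the initial state at the (possibly random) starting index
        self.state = self.diff_close[self.time_index - self.scope:self.time_index]
        self.episode_init_time = self.time_index

        return self.state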

    def seed(self):
        pass

In [None]:
states = []
actions = []
rewards = []
reward_sum = 0.

# Verify environment for 1 hour
start = datetime.datetime(2010, 11, 15, 0, 0)
end = datetime.datetime(2010, 11, 15, 1, 0)

env = MarketEnv(oilDF_py, start, end, episode_size=np.inf, scope=1)
state = env.reset(random_starttime=False)
done = False
while not done:
    states.append(state[-1])
    # Take random actions
    action = env.action_space.sample()
    actions.append(action)
    state, reward, done, info = env.step(action)
    rewards.append(reward)
    reward_sum += reward
print("Return = {}".format(reward_sum))

  

>     Return = 0.005

In [None]:
# Plot samples
timesteps = np.linspace(1,len(states),len(states))
longs = np.argwhere(np.asarray(actions) ==  0)
shorts = np.argwhere(np.asarray(actions) ==  1)
states = np.asarray(states)
fig, ax = plt.subplots(2, 1, figsize=(16, 8))
ax[0].grid(True)
ax[0].plot(timesteps, states, label='diff_close')
ax[0].plot(timesteps[longs], states[longs].flatten(), '*g', markersize=12, label='long')
ax[0].plot(timesteps[shorts], states[shorts].flatten(), '*r', markersize=12, label='short')
ax[0].set_ylabel("(s,a)")
ax[0].set_xlabel("Timestep")
ax[0].set_xlim(1,len(states))
ax[0].set_xticks(np.arange(1, len(states), 1.0))
ax[0].legend()
ax[1].grid(True)
ax[1].plot(timesteps, rewards, 'o-r')
ax[1].set_ylabel("r")
ax[1].set_xlabel("Timestep")
ax[1].set_xlim(1,len(states))
ax[1].set_xticks(np.arange(1, len(states), 1.0))
plt.tight_layout()
display(fig)

  

DQN Algorithm
-------------

Since we have discrete actions, we can use Q-learning to train our
agent. Specifically, we use the DQN algorithm with Experience Replay,
which was first described in DeepMind's [Playing Atari with Deep
Reinforcement Learning](https://arxiv.org/pdf/1312.5602.pdf). The
algorithm is described below, where equation \[3\] refers to the
gradient:

<img src="https://imgur.com/eGhNC9m.png" width=650>

<img src="https://imgur.com/mvopoh8.png" width=800>
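
Two ingredients of the algorithm are easy to isolate: epsilon-greedy action selection and the Q-learning target. The sketch below is not the notebook's implementation. The helper names `epsilon_greedy` and `td_target` and all numbers are assumptions for illustration.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def td_target(reward, q_next, discount, done):
    """Q-learning target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + discount * float(np.max(q_next))

rng = np.random.default_rng(0)
q_next = np.array([0.2, 0.5])  # hypothetical Q(s', LONG), Q(s', SHORT)
print(td_target(0.1, q_next, discount=0.9, done=False))
print(epsilon_greedy(np.array([0.3, -0.1]), epsilon=0.0, rng=rng))
```

With `epsilon=0.0` the selection is purely greedy; with `epsilon=1.0` it is purely random.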

In [None]:
# Adapted from: https://dbc-635ca498-e5f1.cloud.databricks.com/?o=445287446643905#notebook/4201196137758409/command/4201196137758410

class ExperienceReplay:
    def __init__(self, max_memory=100, discount=.9):
        self.max_memory = max_memory
        self.memory = list()
        self.discount = discount

    def remember(self, states, done):
        self.memory.append([states, done])
        if len(self.memory) > self.max_memory:
            del self.memory[0]

    def get_batch(self, model, batch_size=10):
        len_memory = len(self.memory)
        num_actions = model.output_shape[-1]

        env_dim = self.memory[0][0][0].shape[1]
        inputs = np.zeros((min(len_memory, batch_size), env_dim, 1))
        targets = np.zeros((inputs.shape[0], num_actions))
        for i, idx in enumerate(np.random.randint(0, len_memory, size=inputs.shape[0])):
            state_t, action_t, reward_t, state_tp1 = self.memory[idx][0]
            done = self.memory[idx][1]

            inputs[i:i + 1] = state_t
            # There should be no target values for actions not taken.
            targets[i] = model.predict(state_t)[0]
            Q_sa = np.max(model.predict(state_tp1)[0])
            if done: # if done is True
                targets[i, action_t] = reward_t
            else:
                # reward_t + gamma * max_a' Q(s', a')
                targets[i, action_t] = reward_t + self.discount * Q_sa
        return inputs, targets

  

  

Training RL agent
-----------------

In order to train the RL agent, we use the data from November 2010
through 2018, leaving the data from 2019 for testing. RL implementations
are quite difficult to train, due to the large number of hyperparameters
which need to be tuned. We have spent little time searching for better
hyperparameters, as this was beyond the scope of the course. We have
picked parameters based on a similar implementation of RL for trading;
however, we have designed a new Q-network, since the state is different
in our implementation. Since we are dealing with sequential data, we
could have opted for an RNN; however, 1-dimensional CNNs are also a
common choice and are less computationally heavy.
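
To get a feel for the size of such a network, the sequence length after each unpadded layer can be worked out by hand. The sketch below assumes an input of 60 values and 32 filters. It assumes two convolutions with kernel size 5 (strides 2 and 1), each followed by max-pooling of size 2 with stride 1. These numbers are illustrative assumptions, and `conv_out_len` is a hypothetical helper.

```python
def conv_out_len(length, kernel, stride):
    """Output length of a 1-D convolution or pooling layer without padding."""
    return (length - kernel) // stride + 1

length = 60                                        # assumed input sequence length
length = conv_out_len(length, kernel=5, stride=2)  # first convolution
length = conv_out_len(length, kernel=2, stride=1)  # first max-pooling
length = conv_out_len(length, kernel=5, stride=1)  # second convolution
length = conv_out_len(length, kernel=2, stride=1)  # second max-pooling
flat_features = length * 32                        # 32 filters assumed
print(length, flat_features)
```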

In [None]:
# Adapted from: https://dbc-635ca498-e5f1.cloud.databricks.com/?o=445287446643905#notebook/4201196137758409/command/4201196137758410

# RL parameters
epsilon = .5  # exploration
min_epsilon = 0.1
max_memory = 5000
batch_size = 512
discount = 0.8

# Environment parameters
num_actions = 2  # [long, short]
episodes = 500 # 100000
episode_size = 1 * 1 * 60  # roughly an hour worth of data in each training episode

# Define state sequence scope (approx. 1 hour)
sequence_scope = 60
input_shape = (batch_size, sequence_scope, 1)

# Create Q Network
hidden_size = 128
model = Sequential()
model.add(Conv1D(32, (5), strides=2, input_shape=input_shape[1:], activation='relu'))
model.add(MaxPool1D(pool_size=2, strides=1))
model.add(BatchNormalization())
model.add(Conv1D(32, (5), strides=1, activation='relu'))
model.add(MaxPool1D(pool_size=2, strides=1))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(hidden_size, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(num_actions))
opt = optimizers.Adam(lr=0.01)
model.compile(loss='mse', optimizer=opt)

# Define training interval
start = datetime.datetime(2010, 11, 15, 0, 0)
end = datetime.datetime(2018, 12, 31, 23, 59)

# Initialize Environment
env = MarketEnv(oilDF_py, start, end, episode_size=episode_size, scope=sequence_scope)

# Initialize experience replay object
exp_replay = ExperienceReplay(max_memory=max_memory, discount=discount)

# Train
returns = []
for e in range(1, episodes + 1):  # run exactly `episodes` episodes
    loss = 0.
    counter = 0
    reward_sum = 0.
    done = False
    
    state = env.reset()
    input_t = state.reshape(1, sequence_scope, 1) 
    
    while not done:     
        counter += 1
        input_tm1 = input_t
        # get next action
        if np.random.rand() <= epsilon:
            action = np.random.randint(0, num_actions)  # single integer action index
        else:
            q = model.predict(input_tm1)
            action = np.argmax(q[0])

        # apply action, get rewards and new state
        state, reward, done, info = env.step(action)
        reward_sum += reward
        input_t = state.reshape(1, sequence_scope, 1)         

        # store experience
        exp_replay.remember([input_tm1, action, reward, input_t], done)

        # adapt model
        inputs, targets = exp_replay.get_batch(model, batch_size=batch_size)
        loss += model.train_on_batch(inputs, targets)
    
    
    print("Episode {:03d}/{:d} | Average Loss {:.4f} | Cumulative Reward {:.4f}".format(e, episodes, loss / counter, reward_sum))
    epsilon = max(min_epsilon, epsilon * 0.99)
    returns.append(reward_sum)

  

>     Episode 001/500 | Average Loss 0.9243 | Cumulative Reward -0.0492
>     Episode 002/500 | Average Loss 0.0431 | Cumulative Reward -0.2952
>     Episode 003/500 | Average Loss 0.0102 | Cumulative Reward 0.0246
>     Episode 004/500 | Average Loss 0.0332 | Cumulative Reward -0.3444
>     Episode 005/500 | Average Loss 0.0366 | Cumulative Reward 0.4920
>     Episode 006/500 | Average Loss 44.9978 | Cumulative Reward -0.0984
>     Episode 007/500 | Average Loss 13892.3869 | Cumulative Reward -0.7626
>     Episode 008/500 | Average Loss 695661.5894 | Cumulative Reward 1.9434
>     Episode 009/500 | Average Loss 12746050.9732 | Cumulative Reward -0.2952
>     Episode 010/500 | Average Loss 31002932.3811 | Cumulative Reward 0.0492
>     Episode 011/500 | Average Loss 272621435.5164 | Cumulative Reward 0.4182
>     Episode 012/500 | Average Loss 171599336.8525 | Cumulative Reward 0.4674
>     Episode 013/500 | Average Loss 27726520.4590 | Cumulative Reward 0.1968
>     Episode 014/500 | Average Loss 20788127.3443 | Cumulative Reward 0.0984
>     Episode 015/500 | Average Loss 10797102.9508 | Cumulative Reward -0.0738
>     Episode 016/500 | Average Loss 4081633.1680 | Cumulative Reward -0.0984
>     Episode 017/500 | Average Loss 3617864.5246 | Cumulative Reward 0.5166
>     Episode 018/500 | Average Loss 2102176.5451 | Cumulative Reward -0.9348
>     Episode 019/500 | Average Loss 1819763.1424 | Cumulative Reward -0.9348
>     Episode 020/500 | Average Loss 525313.2244 | Cumulative Reward 0.3198
>     Episode 021/500 | Average Loss 590327.7751 | Cumulative Reward -0.5412
>     Episode 022/500 | Average Loss 643538.9570 | Cumulative Reward -1.2792
>     Episode 023/500 | Average Loss 506215.3632 | Cumulative Reward 0.2214
>     Episode 024/500 | Average Loss 229747.6570 | Cumulative Reward 0.3690
>     Episode 025/500 | Average Loss 118323.1301 | Cumulative Reward -0.1968
>     Episode 026/500 | Average Loss 90964.7006 | Cumulative Reward 0.1476
>     Episode 027/500 | Average Loss 47487.1318 | Cumulative Reward -0.4428
>     Episode 028/500 | Average Loss 90195246.1188 | Cumulative Reward -3.2718
>     Episode 029/500 | Average Loss 32692468.4262 | Cumulative Reward 0.1230
>     Episode 030/500 | Average Loss 2332925.2500 | Cumulative Reward -0.3690
>     Episode 031/500 | Average Loss 834450.7592 | Cumulative Reward -0.1968
>     Episode 032/500 | Average Loss 283082.5282 | Cumulative Reward -0.3198
>     Episode 033/500 | Average Loss 283707.8343 | Cumulative Reward 0.0000
>     Episode 034/500 | Average Loss 231921.2454 | Cumulative Reward -0.1230
>     Episode 035/500 | Average Loss 147320.1317 | Cumulative Reward 0.9594
>     Episode 036/500 | Average Loss 128691.0975 | Cumulative Reward -0.5904
>     Episode 037/500 | Average Loss 103319.1212 | Cumulative Reward 0.2952
>     Episode 038/500 | Average Loss 89891.4415 | Cumulative Reward 2.8782
>     Episode 039/500 | Average Loss 328345656.4238 | Cumulative Reward -0.6150
>     Episode 040/500 | Average Loss 10943518558.4262 | Cumulative Reward -0.6888
>     Episode 041/500 | Average Loss 162841831.4754 | Cumulative Reward 0.1476
>     Episode 042/500 | Average Loss 37689431.5000 | Cumulative Reward 0.4182
>     Episode 043/500 | Average Loss 72136614.4344 | Cumulative Reward 0.0738
>     Episode 044/500 | Average Loss 56635724.9180 | Cumulative Reward -0.2706
>     Episode 045/500 | Average Loss 10563238.0512 | Cumulative Reward 0.0492
>     Episode 046/500 | Average Loss 132264149.6066 | Cumulative Reward 1.0824
>     Episode 047/500 | Average Loss 41677930.5574 | Cumulative Reward 0.0246
>     Episode 048/500 | Average Loss 9066351.1148 | Cumulative Reward -0.1230
>     Episode 049/500 | Average Loss 2358803.1926 | Cumulative Reward 0.0246
>     Episode 050/500 | Average Loss 1814690.9365 | Cumulative Reward -4.8708
>     Episode 051/500 | Average Loss 1223300511.6383 | Cumulative Reward -1.4514
>     Episode 052/500 | Average Loss 1202551068.6885 | Cumulative Reward -0.6888
>     Episode 053/500 | Average Loss 48371521.6721 | Cumulative Reward -0.0738
>     Episode 054/500 | Average Loss 3760147.6393 | Cumulative Reward -1.7712
>     Episode 055/500 | Average Loss 1791821.9693 | Cumulative Reward -0.0246
>     Episode 056/500 | Average Loss 2677952.7643 | Cumulative Reward -0.1230
>     Episode 057/500 | Average Loss 4241063.5451 | Cumulative Reward 0.4674
>     Episode 058/500 | Average Loss 3664478.7951 | Cumulative Reward -0.3198
>     Episode 059/500 | Average Loss 2111481.0102 | Cumulative Reward 0.0492
>     Episode 060/500 | Average Loss 1312407.7039 | Cumulative Reward -0.3690
>     Episode 061/500 | Average Loss 1215967.5225 | Cumulative Reward -1.6974
>     Episode 062/500 | Average Loss 1206593.6096 | Cumulative Reward -0.8118
>     Episode 063/500 | Average Loss 1242360.7818 | Cumulative Reward 0.9348
>     Episode 064/500 | Average Loss 911372.9959 | Cumulative Reward 0.8856
>     Episode 065/500 | Average Loss 983090.6424 | Cumulative Reward 0.2460
>     Episode 066/500 | Average Loss 56758439.2228 | Cumulative Reward -0.5166
>     Episode 067/500 | Average Loss 1265408.1035 | Cumulative Reward -0.0246
>     Episode 068/500 | Average Loss 5961294.6898 | Cumulative Reward -0.1722
>     Episode 069/500 | Average Loss 142104936.0943 | Cumulative Reward -1.8450
>     Episode 070/500 | Average Loss 604659424.3279 | Cumulative Reward -1.7466
>     Episode 071/500 | Average Loss 2669601345.0492 | Cumulative Reward -0.7872
>     Episode 072/500 | Average Loss 6518834110.9508 | Cumulative Reward -0.3198
>     Episode 073/500 | Average Loss 660690072.6557 | Cumulative Reward 0.3936
>     Episode 074/500 | Average Loss 49482324.3770 | Cumulative Reward 0.2706
>     Episode 075/500 | Average Loss 12702479.1721 | Cumulative Reward 2.2140
>     Episode 076/500 | Average Loss 31520307.4426 | Cumulative Reward -0.6642
>     Episode 077/500 | Average Loss 30264551.5410 | Cumulative Reward 0.1230
>     Episode 078/500 | Average Loss 23348717.7705 | Cumulative Reward -0.8610
>     Episode 079/500 | Average Loss 10719680.3934 | Cumulative Reward -1.5006
>     Episode 080/500 | Average Loss 5350704.6926 | Cumulative Reward 0.2952
>     Episode 081/500 | Average Loss 4198182.6107 | Cumulative Reward 0.5412
>     Episode 082/500 | Average Loss 3379803.6189 | Cumulative Reward 3.6162
>     Episode 083/500 | Average Loss 129381317.8607 | Cumulative Reward -0.0246
>     Episode 084/500 | Average Loss 220882119.9344 | Cumulative Reward -0.0000
>     Episode 085/500 | Average Loss 66604575.5410 | Cumulative Reward -0.0738
>     Episode 086/500 | Average Loss 274207500.3934 | Cumulative Reward -1.1808
>     Episode 087/500 | Average Loss 256016848.9549 | Cumulative Reward -0.5904
>     Episode 088/500 | Average Loss 150811378.3607 | Cumulative Reward -0.0984
>     Episode 089/500 | Average Loss 64320583.2131 | Cumulative Reward 0.1476
>     Episode 090/500 | Average Loss 30557816.7213 | Cumulative Reward 0.3690
>     Episode 091/500 | Average Loss 14250694.0328 | Cumulative Reward -0.2214
>     Episode 092/500 | Average Loss 7108390.2541 | Cumulative Reward 0.4428
>     Episode 093/500 | Average Loss 3342842.4262 | Cumulative Reward 0.8610
>     Episode 094/500 | Average Loss 1480352.2623 | Cumulative Reward -0.1230
>     Episode 095/500 | Average Loss 665809.5815 | Cumulative Reward -0.1968
>     Episode 096/500 | Average Loss 357767.4959 | Cumulative Reward 1.1562
>     Episode 097/500 | Average Loss 239387.9193 | Cumulative Reward 0.1968
>     Episode 098/500 | Average Loss 204709.5968 | Cumulative Reward -0.7380
>     Episode 099/500 | Average Loss 166877.2141 | Cumulative Reward 0.5658
>     Episode 100/500 | Average Loss 155712.9769 | Cumulative Reward 0.9102
>     Episode 101/500 | Average Loss 144984.7732 | Cumulative Reward 1.2300
>     Episode 102/500 | Average Loss 138251.9153 | Cumulative Reward 0.2214
>     Episode 103/500 | Average Loss 127553.8490 | Cumulative Reward 0.3444
>     Episode 104/500 | Average Loss 124246.1187 | Cumulative Reward 0.9348
>     Episode 105/500 | Average Loss 118589.2683 | Cumulative Reward 1.5498
>     Episode 106/500 | Average Loss 125549.7988 | Cumulative Reward 0.6888
>     Episode 107/500 | Average Loss 134375.3892 | Cumulative Reward 0.1722
>     Episode 108/500 | Average Loss 148787.7741 | Cumulative Reward -0.1230
>     Episode 109/500 | Average Loss 164604.2537 | Cumulative Reward 1.2054
>     Episode 110/500 | Average Loss 199937.5569 | Cumulative Reward 0.0000
>     Episode 111/500 | Average Loss 191672.2460 | Cumulative Reward -0.3444
>     Episode 112/500 | Average Loss 258762.1554 | Cumulative Reward 0.1230
>     Episode 113/500 | Average Loss 289838.2514 | Cumulative Reward -0.5904
>     Episode 114/500 | Average Loss 465530.3043 | Cumulative Reward -1.7220
>     Episode 115/500 | Average Loss 415042.3491 | Cumulative Reward 0.0246
>     Episode 116/500 | Average Loss 309605.8963 | Cumulative Reward -0.3444
>     Episode 117/500 | Average Loss 262915.5269 | Cumulative Reward -0.0246
>     Episode 118/500 | Average Loss 276877.2044 | Cumulative Reward 0.1722
>     Episode 119/500 | Average Loss 297945.2974 | Cumulative Reward -0.8610
>     Episode 120/500 | Average Loss 252190.8712 | Cumulative Reward 0.3198
>     Episode 121/500 | Average Loss 209873.6153 | Cumulative Reward -0.0984
>     Episode 122/500 | Average Loss 200975.3696 | Cumulative Reward -0.4182
>     Episode 123/500 | Average Loss 224392.9705 | Cumulative Reward 0.6396
>     Episode 124/500 | Average Loss 175272.5402 | Cumulative Reward -0.1230
>     Episode 125/500 | Average Loss 134914.6336 | Cumulative Reward 0.0738
>     Episode 126/500 | Average Loss 118408.8128 | Cumulative Reward 0.9102
>     Episode 127/500 | Average Loss 119856.1210 | Cumulative Reward -0.1476
>     Episode 128/500 | Average Loss 128668.1934 | Cumulative Reward 0.4182
>     Episode 129/500 | Average Loss 121379.1356 | Cumulative Reward 0.3444
>     Episode 130/500 | Average Loss 122105.9212 | Cumulative Reward 0.8364
>     Episode 131/500 | Average Loss 126979.4586 | Cumulative Reward 0.4428
>     Episode 132/500 | Average Loss 159263.2796 | Cumulative Reward 1.4268
>     Episode 133/500 | Average Loss 249200.5343 | Cumulative Reward -0.3690
>     Episode 134/500 | Average Loss 351498.9006 | Cumulative Reward 0.2460
>     Episode 135/500 | Average Loss 344710.3376 | Cumulative Reward 0.4920
>     Episode 136/500 | Average Loss 290292.0177 | Cumulative Reward 0.3444
>     Episode 137/500 | Average Loss 203908.2672 | Cumulative Reward -1.1808
>     Episode 138/500 | Average Loss 152643.9161 | Cumulative Reward -0.3690
>     Episode 139/500 | Average Loss 103234.8929 | Cumulative Reward 0.3690
>     Episode 140/500 | Average Loss 81400.9171 | Cumulative Reward -0.8364
>     Episode 141/500 | Average Loss 70417.8615 | Cumulative Reward -0.0246
>     Episode 142/500 | Average Loss 62871.1321 | Cumulative Reward -0.1968
>     Episode 143/500 | Average Loss 66203.8933 | Cumulative Reward 1.3284
>     Episode 144/500 | Average Loss 80965.3102 | Cumulative Reward -0.7380
>     Episode 145/500 | Average Loss 85374.2427 | Cumulative Reward 0.0738
>     Episode 146/500 | Average Loss 74547.1055 | Cumulative Reward -0.2952
>     Episode 147/500 | Average Loss 61201.4695 | Cumulative Reward -0.0000
>     Episode 148/500 | Average Loss 58691.6379 | Cumulative Reward -0.7872
>     Episode 149/500 | Average Loss 53623.2068 | Cumulative Reward -0.1968
>     Episode 150/500 | Average Loss 52295.3436 | Cumulative Reward 0.5658
>     Episode 151/500 | Average Loss 54563.6969 | Cumulative Reward -0.7528
>     Episode 152/500 | Average Loss 56892.9392 | Cumulative Reward -1.5990
>     Episode 153/500 | Average Loss 56162.3126 | Cumulative Reward 0.1968
>     Episode 154/500 | Average Loss 57015.3386 | Cumulative Reward -0.1722
>     Episode 155/500 | Average Loss 61344.5501 | Cumulative Reward 0.2214
>     Episode 156/500 | Average Loss 63970.4605 | Cumulative Reward 0.1968
>     Episode 157/500 | Average Loss 64938.6926 | Cumulative Reward 1.3776
>     Episode 158/500 | Average Loss 71837.3588 | Cumulative Reward -0.4428
>     Episode 159/500 | Average Loss 69606.6803 | Cumulative Reward 1.1316
>     Episode 160/500 | Average Loss 51512.9089 | Cumulative Reward -0.4428
>     Episode 161/500 | Average Loss 42763.9575 | Cumulative Reward -0.6642
>     Episode 162/500 | Average Loss 34133.1615 | Cumulative Reward 0.0492
>     Episode 163/500 | Average Loss 26055.7404 | Cumulative Reward -0.7380
>     Episode 164/500 | Average Loss 17645.5418 | Cumulative Reward 0.3198
>     Episode 165/500 | Average Loss 23771.6887 | Cumulative Reward -0.6888
>     Episode 166/500 | Average Loss 33822.2709 | Cumulative Reward -1.4760
>     Episode 167/500 | Average Loss 36789.7859 | Cumulative Reward -0.3444
>     Episode 168/500 | Average Loss 30012.2132 | Cumulative Reward 0.3198
>     Episode 169/500 | Average Loss 25915.5318 | Cumulative Reward 0.0246
>     Episode 170/500 | Average Loss 21197.9396 | Cumulative Reward 0.4182
>     Episode 171/500 | Average Loss 18828.6276 | Cumulative Reward 0.6642
>     Episode 172/500 | Average Loss 12931.5156 | Cumulative Reward 0.1230
>     Episode 173/500 | Average Loss 9384.6666 | Cumulative Reward -0.5412
>     Episode 174/500 | Average Loss 9953.6688 | Cumulative Reward 0.0246
>     Episode 175/500 | Average Loss 14217.0748 | Cumulative Reward 0.0738
>     Episode 176/500 | Average Loss 18767.0269 | Cumulative Reward -0.2706
>     Episode 177/500 | Average Loss 17302.4695 | Cumulative Reward 0.4674
>     Episode 178/500 | Average Loss 12386.1551 | Cumulative Reward 0.0984
>     Episode 179/500 | Average Loss 8794.3302 | Cumulative Reward 0.9348
>     Episode 180/500 | Average Loss 8130.0289 | Cumulative Reward -0.2214
>     Episode 181/500 | Average Loss 8009.3913 | Cumulative Reward -0.3198
>     Episode 182/500 | Average Loss 6421.1121 | Cumulative Reward -0.0492
>     Episode 183/500 | Average Loss 5636.7044 | Cumulative Reward 0.1968
>     Episode 184/500 | Average Loss 8854.0772 | Cumulative Reward 0.4920
>     Episode 185/500 | Average Loss 10231.7754 | Cumulative Reward 0.9840
>     Episode 186/500 | Average Loss 8491.7384 | Cumulative Reward -0.1722
>     Episode 187/500 | Average Loss 5335.5632 | Cumulative Reward -0.8856
>     Episode 188/500 | Average Loss 4152.4838 | Cumulative Reward 1.1316
>     Episode 189/500 | Average Loss 3644.6625 | Cumulative Reward 1.3284
>     Episode 190/500 | Average Loss 4318.2997 | Cumulative Reward -0.2460
>     Episode 191/500 | Average Loss 4694.9497 | Cumulative Reward -0.0492
>     Episode 192/500 | Average Loss 3490.6077 | Cumulative Reward 0.4674
>     Episode 193/500 | Average Loss 3043.5791 | Cumulative Reward 0.2903
>     Episode 194/500 | Average Loss 2338.0377 | Cumulative Reward 0.4182
>     Episode 195/500 | Average Loss 2115.3294 | Cumulative Reward 1.5990
>     Episode 196/500 | Average Loss 3127.7692 | Cumulative Reward -1.0578
>     Episode 197/500 | Average Loss 4421.3908 | Cumulative Reward -0.2214
>     Episode 198/500 | Average Loss 2560.3734 | Cumulative Reward 0.2706
>     Episode 199/500 | Average Loss 2147.2506 | Cumulative Reward -1.5006
>     Episode 200/500 | Average Loss 2207.9893 | Cumulative Reward -0.5658
>     Episode 201/500 | Average Loss 3174.7853 | Cumulative Reward -1.4760
>     Episode 202/500 | Average Loss 11331.6762 | Cumulative Reward 0.3198
>     Episode 203/500 | Average Loss 37899.0565 | Cumulative Reward -0.6150
>     Episode 204/500 | Average Loss 36401.8491 | Cumulative Reward -0.1230
>     Episode 205/500 | Average Loss 12925.8751 | Cumulative Reward -0.2706
>     Episode 206/500 | Average Loss 12854.6668 | Cumulative Reward -0.3936
>     Episode 207/500 | Average Loss 11591.4458 | Cumulative Reward -0.1968
>     Episode 208/500 | Average Loss 21179.2468 | Cumulative Reward 0.4920
>     Episode 209/500 | Average Loss 20376.6122 | Cumulative Reward 0.1968
>     Episode 210/500 | Average Loss 18849.6060 | Cumulative Reward 0.0000
>     Episode 211/500 | Average Loss 13982.4748 | Cumulative Reward -0.2214
>     Episode 212/500 | Average Loss 87311.4121 | Cumulative Reward -0.8856
>     Episode 213/500 | Average Loss 654473.7298 | Cumulative Reward 0.0492
>     Episode 214/500 | Average Loss 201913.9874 | Cumulative Reward -0.2214
>     Episode 215/500 | Average Loss 44836.9004 | Cumulative Reward 0.1968
>     Episode 216/500 | Average Loss 35657.7584 | Cumulative Reward 0.1968
>     Episode 217/500 | Average Loss 32079.6191 | Cumulative Reward -0.0246
>     Episode 218/500 | Average Loss 30304.5018 | Cumulative Reward -0.0492
>     Episode 219/500 | Average Loss 30373.2290 | Cumulative Reward -0.1968
>     Episode 220/500 | Average Loss 29203.3995 | Cumulative Reward 0.0492
>     Episode 221/500 | Average Loss 31887.5387 | Cumulative Reward 0.4428
>     Episode 222/500 | Average Loss 27344.9023 | Cumulative Reward -0.2706
>     Episode 223/500 | Average Loss 26080.0415 | Cumulative Reward 1.2546
>     Episode 224/500 | Average Loss 27963.8084 | Cumulative Reward -0.2952
>     Episode 225/500 | Average Loss 22244.3293 | Cumulative Reward 0.4674
>     Episode 226/500 | Average Loss 19426.3250 | Cumulative Reward 0.1476
>     Episode 227/500 | Average Loss 19027.4235 | Cumulative Reward 3.0504
>     Episode 228/500 | Average Loss 22579.4337 | Cumulative Reward 1.3284
>     Episode 229/500 | Average Loss 35449.1802 | Cumulative Reward -0.0000
>     Episode 230/500 | Average Loss 36867.8054 | Cumulative Reward -0.3936
>     Episode 231/500 | Average Loss 49915.4290 | Cumulative Reward -0.3936
>     Episode 232/500 | Average Loss 32619.6804 | Cumulative Reward -0.6396
>     Episode 233/500 | Average Loss 12634.1564 | Cumulative Reward 1.6728
>     Episode 234/500 | Average Loss 11710.0142 | Cumulative Reward -0.2952
>     Episode 235/500 | Average Loss 11534.9517 | Cumulative Reward 0.0246
>     Episode 236/500 | Average Loss 21333.6566 | Cumulative Reward -0.1476
>     Episode 237/500 | Average Loss 14613.8575 | Cumulative Reward -1.2792
>     Episode 238/500 | Average Loss 18945.5421 | Cumulative Reward 0.1968
>     Episode 239/500 | Average Loss 11825.2040 | Cumulative Reward 1.4268
>     Episode 240/500 | Average Loss 7433.5705 | Cumulative Reward -0.9594
>     Episode 241/500 | Average Loss 4902.9148 | Cumulative Reward -0.4674
>     Episode 242/500 | Average Loss 3179.8172 | Cumulative Reward 0.6888
>     Episode 243/500 | Average Loss 3503.6598 | Cumulative Reward -0.6150
>     Episode 244/500 | Average Loss 3431.9749 | Cumulative Reward -0.7134
>     Episode 245/500 | Average Loss 2877.5648 | Cumulative Reward -0.0246
>     Episode 246/500 | Average Loss 2608.7916 | Cumulative Reward -0.3690
>     Episode 247/500 | Average Loss 2049.1479 | Cumulative Reward -0.1476
>     Episode 248/500 | Average Loss 1290.3437 | Cumulative Reward -0.0246
>     Episode 249/500 | Average Loss 925.7207 | Cumulative Reward 0.5412
>     Episode 250/500 | Average Loss 823.1600 | Cumulative Reward 0.3444
>     Episode 251/500 | Average Loss 1006.1303 | Cumulative Reward 0.0246
>     Episode 252/500 | Average Loss 1000.9536 | Cumulative Reward 0.1968
>     Episode 253/500 | Average Loss 910.4431 | Cumulative Reward 0.0492
>     Episode 254/500 | Average Loss 793.4383 | Cumulative Reward -0.0738
>     Episode 255/500 | Average Loss 960.7262 | Cumulative Reward -0.0492
>     Episode 256/500 | Average Loss 1210.5615 | Cumulative Reward 0.7626
>     Episode 257/500 | Average Loss 1456.9758 | Cumulative Reward -0.0984
>     Episode 258/500 | Average Loss 1492.6155 | Cumulative Reward -0.2706
>     Episode 259/500 | Average Loss 800.3773 | Cumulative Reward -0.1968
>     Episode 260/500 | Average Loss 663.3038 | Cumulative Reward -0.5166
>     Episode 261/500 | Average Loss 570.7030 | Cumulative Reward 0.5412
>     Episode 262/500 | Average Loss 515.7673 | Cumulative Reward -0.0246
>     Episode 263/500 | Average Loss 914.1635 | Cumulative Reward 0.3936
>     Episode 264/500 | Average Loss 635.1886 | Cumulative Reward 0.5166
>     Episode 265/500 | Average Loss 503.3200 | Cumulative Reward 0.3936
>     Episode 266/500 | Average Loss 442.9404 | Cumulative Reward 0.1722
>     Episode 267/500 | Average Loss 372.8969 | Cumulative Reward -0.1722
>     Episode 268/500 | Average Loss 342.8540 | Cumulative Reward 0.8610
>     Episode 269/500 | Average Loss 336.0440 | Cumulative Reward 1.1562
>     Episode 270/500 | Average Loss 325.4765 | Cumulative Reward 1.4022
>     Episode 271/500 | Average Loss 639.1140 | Cumulative Reward 0.0738
>     Episode 272/500 | Average Loss 467.9056 | Cumulative Reward -1.3530
>     Episode 273/500 | Average Loss 691.2283 | Cumulative Reward -0.8856
>     Episode 274/500 | Average Loss 442.4127 | Cumulative Reward -0.0246
>     Episode 275/500 | Average Loss 672.4676 | Cumulative Reward -0.0000
>     Episode 276/500 | Average Loss 373.9056 | Cumulative Reward 0.7626
>     Episode 277/500 | Average Loss 510.2552 | Cumulative Reward 0.0492
>     Episode 278/500 | Average Loss 476.9959 | Cumulative Reward -0.0246
>     Episode 279/500 | Average Loss 374.8663 | Cumulative Reward 0.4182
>     Episode 280/500 | Average Loss 285.5737 | Cumulative Reward 0.2214
>     Episode 281/500 | Average Loss 359.3125 | Cumulative Reward 0.0738
>     Episode 282/500 | Average Loss 509.2474 | Cumulative Reward 0.0738
>     Episode 283/500 | Average Loss 509.7669 | Cumulative Reward -0.9840
>     Episode 284/500 | Average Loss 848.4750 | Cumulative Reward -0.1968
>     Episode 285/500 | Average Loss 746.0073 | Cumulative Reward 0.6396
>     Episode 286/500 | Average Loss 354.4148 | Cumulative Reward 0.4920
>     Episode 287/500 | Average Loss 744.5271 | Cumulative Reward -1.2546
>     Episode 288/500 | Average Loss 1177.0523 | Cumulative Reward -0.0246
>     Episode 289/500 | Average Loss 912.7724 | Cumulative Reward -0.0000
>     Episode 290/500 | Average Loss 493.3918 | Cumulative Reward 0.7872
>     Episode 291/500 | Average Loss 666.1050 | Cumulative Reward -0.4428
>     Episode 292/500 | Average Loss 855.3613 | Cumulative Reward -2.3124
>     Episode 293/500 | Average Loss 558.7653 | Cumulative Reward -0.6396
>     Episode 294/500 | Average Loss 418.6678 | Cumulative Reward -0.6396
>     Episode 295/500 | Average Loss 534.7206 | Cumulative Reward 0.1722
>     Episode 296/500 | Average Loss 264.4277 | Cumulative Reward 0.1230
>     Episode 297/500 | Average Loss 389.4987 | Cumulative Reward -0.2214
>     Episode 298/500 | Average Loss 728.0504 | Cumulative Reward 0.7134
>     Episode 299/500 | Average Loss 632.7627 | Cumulative Reward -0.3198
>     Episode 300/500 | Average Loss 637.5207 | Cumulative Reward -0.9348
>     Episode 301/500 | Average Loss 1028.3571 | Cumulative Reward -0.2952
>     Episode 302/500 | Average Loss 1685.8779 | Cumulative Reward 0.0246
>     Episode 303/500 | Average Loss 1392.4933 | Cumulative Reward 0.6150
>     Episode 304/500 | Average Loss 788.6780 | Cumulative Reward -0.2952
>     Episode 305/500 | Average Loss 1226.8964 | Cumulative Reward 0.7134
>     Episode 306/500 | Average Loss 771.0202 | Cumulative Reward -0.1968
>     Episode 307/500 | Average Loss 1038.6742 | Cumulative Reward 1.4514
>     Episode 308/500 | Average Loss 2320.5486 | Cumulative Reward -0.5658
>     Episode 309/500 | Average Loss 1688.7067 | Cumulative Reward -0.8364
>     Episode 310/500 | Average Loss 1235.5575 | Cumulative Reward 1.0824
>     Episode 311/500 | Average Loss 314.3540 | Cumulative Reward 0.4920
>     Episode 312/500 | Average Loss 221.4066 | Cumulative Reward -0.4674
>     Episode 313/500 | Average Loss 149.2911 | Cumulative Reward -1.4022
>     Episode 314/500 | Average Loss 251.1208 | Cumulative Reward 0.2952
>     Episode 315/500 | Average Loss 295.4088 | Cumulative Reward 0.0984
>     Episode 316/500 | Average Loss 353.4387 | Cumulative Reward 0.2214
>     Episode 317/500 | Average Loss 471.9230 | Cumulative Reward -0.1968
>     Episode 318/500 | Average Loss 481.8251 | Cumulative Reward -0.0246
>     Episode 319/500 | Average Loss 277.8028 | Cumulative Reward -0.2706
>     Episode 320/500 | Average Loss 458.1981 | Cumulative Reward 0.7134
>     Episode 321/500 | Average Loss 554.8020 | Cumulative Reward 0.0984
>     Episode 322/500 | Average Loss 712.4146 | Cumulative Reward -0.1722
>     Episode 323/500 | Average Loss 533.7365 | Cumulative Reward 1.5252
>     Episode 324/500 | Average Loss 361.0181 | Cumulative Reward 0.8118
>     Episode 325/500 | Average Loss 447.6775 | Cumulative Reward 1.0332
>     Episode 326/500 | Average Loss 1355.3373 | Cumulative Reward 0.0000
>     Episode 327/500 | Average Loss 908.2948 | Cumulative Reward -0.0984
>     Episode 328/500 | Average Loss 959.7786 | Cumulative Reward -1.2054
>     Episode 329/500 | Average Loss 450.1697 | Cumulative Reward -0.7134
>     Episode 330/500 | Average Loss 1212.9733 | Cumulative Reward -0.5166
>     Episode 331/500 | Average Loss 5285.4580 | Cumulative Reward 0.0738
>     Episode 332/500 | Average Loss 3094.1890 | Cumulative Reward 0.3690
>     Episode 333/500 | Average Loss 4562.0196 | Cumulative Reward -0.4674
>     Episode 334/500 | Average Loss 8924.8034 | Cumulative Reward 1.0824
>     Episode 335/500 | Average Loss 30994.9012 | Cumulative Reward -1.1808
>     Episode 336/500 | Average Loss 534143731.8066 | Cumulative Reward -0.1476
>     Episode 337/500 | Average Loss 5921050444109.0820 | Cumulative Reward -0.4428
>     Episode 338/500 | Average Loss 4505985552.3279 | Cumulative Reward 0.1230
>     Episode 339/500 | Average Loss 1964492739.4754 | Cumulative Reward 0.1968
>     Episode 340/500 | Average Loss 1593002800.1967 | Cumulative Reward 0.5486
>     Episode 341/500 | Average Loss 1377590111.9344 | Cumulative Reward 0.0984
>     Episode 342/500 | Average Loss 1024536314.2295 | Cumulative Reward 0.3936
>     Episode 343/500 | Average Loss 714535699.4098 | Cumulative Reward -1.0824
>     Episode 344/500 | Average Loss 581171856.7541 | Cumulative Reward -0.7626
>     Episode 345/500 | Average Loss 1080491153.4754 | Cumulative Reward -0.3444
>     Episode 346/500 | Average Loss 1290483386.2951 | Cumulative Reward -0.1968
>     Episode 347/500 | Average Loss 638285799.5984 | Cumulative Reward 0.0492
>     Episode 348/500 | Average Loss 533999203.1926 | Cumulative Reward 0.1968
>     Episode 349/500 | Average Loss 1186640501.3566 | Cumulative Reward -0.4674
>     Episode 350/500 | Average Loss 836347220.6230 | Cumulative Reward 2.1402
>     Episode 351/500 | Average Loss 19564334.5430 | Cumulative Reward 0.2460
>     Episode 352/500 | Average Loss 15821917.5266 | Cumulative Reward -0.0738
>     Episode 353/500 | Average Loss 8794998.8955 | Cumulative Reward 0.7380
>     Episode 354/500 | Average Loss 16293582.7008 | Cumulative Reward -0.1722
>     Episode 355/500 | Average Loss 17792936.4262 | Cumulative Reward -0.7872
>     Episode 356/500 | Average Loss 19392693.2541 | Cumulative Reward 0.0246
>     Episode 357/500 | Average Loss 12951887.3934 | Cumulative Reward -0.3690
>     Episode 358/500 | Average Loss 21376279.0902 | Cumulative Reward -0.1968
>     Episode 359/500 | Average Loss 18399117.6311 | Cumulative Reward 0.3198
>     Episode 360/500 | Average Loss 16349954.5328 | Cumulative Reward -0.0492
>     Episode 361/500 | Average Loss 10212129.2971 | Cumulative Reward 0.0246
>     Episode 362/500 | Average Loss 9528178.3489 | Cumulative Reward 0.5166
>     Episode 363/500 | Average Loss 10891389.6680 | Cumulative Reward -0.2952
>     Episode 364/500 | Average Loss 8848364.4677 | Cumulative Reward -0.1722
>     Episode 365/500 | Average Loss 7967370.4139 | Cumulative Reward -0.7872
>     Episode 366/500 | Average Loss 10170590.0517 | Cumulative Reward 1.1808
>     Episode 367/500 | Average Loss 7691382.5763 | Cumulative Reward 0.6150
>     Episode 368/500 | Average Loss 19905810.8770 | Cumulative Reward -0.4920
>     Episode 369/500 | Average Loss 29691840.6557 | Cumulative Reward -0.1476
>     Episode 370/500 | Average Loss 26487591.1967 | Cumulative Reward 0.1230
>     Episode 371/500 | Average Loss 23137364.8689 | Cumulative Reward -0.3936
>     Episode 372/500 | Average Loss 30061004.7541 | Cumulative Reward -0.3444
>     Episode 373/500 | Average Loss 40822205.1148 | Cumulative Reward 0.2460
>     Episode 374/500 | Average Loss 31239984.0328 | Cumulative Reward 0.5166
>     Episode 375/500 | Average Loss 32901997.1639 | Cumulative Reward 0.0246
>     Episode 376/500 | Average Loss 22117612.4918 | Cumulative Reward -0.1968
>     Episode 377/500 | Average Loss 21721578.4795 | Cumulative Reward 0.3690
>     Episode 378/500 | Average Loss 19514039.5656 | Cumulative Reward -0.4428
>     Episode 379/500 | Average Loss 18671316.6721 | Cumulative Reward -0.2214
>     Episode 380/500 | Average Loss 19410190.3115 | Cumulative Reward 0.3936
>     Episode 381/500 | Average Loss 37436785.1475 | Cumulative Reward 0.6642
>     Episode 382/500 | Average Loss 12312954.6025 | Cumulative Reward 0.1476
>     Episode 383/500 | Average Loss 12721993.5220 | Cumulative Reward -1.1808
>     Episode 384/500 | Average Loss 13490703.0113 | Cumulative Reward -0.1476
>     Episode 385/500 | Average Loss 12027525.5820 | Cumulative Reward -0.0984
>     Episode 386/500 | Average Loss 8776895.7362 | Cumulative Reward 0.1722
>     Episode 387/500 | Average Loss 27030448.9508 | Cumulative Reward -1.2546
>     Episode 388/500 | Average Loss 17247655.4836 | Cumulative Reward 0.7134
>     Episode 389/500 | Average Loss 13761868.2480 | Cumulative Reward -0.8856
>     Episode 390/500 | Average Loss 15799021.5820 | Cumulative Reward -1.0332
>     Episode 391/500 | Average Loss 24067677.2961 | Cumulative Reward -2.0664
>     Episode 392/500 | Average Loss 15877324.8064 | Cumulative Reward -0.0000
>     Episode 393/500 | Average Loss 10898226.3576 | Cumulative Reward 0.2460
>     Episode 394/500 | Average Loss 11247735.4857 | Cumulative Reward 0.5904
>     Episode 395/500 | Average Loss 14609065.8217 | Cumulative Reward 0.1968
>     Episode 396/500 | Average Loss 2343010.0123 | Cumulative Reward 0.1968
>     Episode 397/500 | Average Loss 3205300.5410 | Cumulative Reward -0.4182
>     Episode 398/500 | Average Loss 8199556.5533 | Cumulative Reward 0.0492
>     Episode 399/500 | Average Loss 21762857.0984 | Cumulative Reward -0.3444
>     Episode 400/500 | Average Loss 15102527.0164 | Cumulative Reward -0.0246
>     Episode 401/500 | Average Loss 12123659.1414 | Cumulative Reward -0.1230
>     Episode 402/500 | Average Loss 2576558.2551 | Cumulative Reward 0.2706
>     Episode 403/500 | Average Loss 2949066.3484 | Cumulative Reward -0.0984
>     Episode 404/500 | Average Loss 4379467.7418 | Cumulative Reward 0.1722
>     Episode 405/500 | Average Loss 2939416.2049 | Cumulative Reward 1.7712
>     Episode 406/500 | Average Loss 10072456.5164 | Cumulative Reward -0.5412
>     Episode 407/500 | Average Loss 11368148.5574 | Cumulative Reward -1.6728
>     Episode 408/500 | Average Loss 4251800.9057 | Cumulative Reward 0.0246
>     Episode 409/500 | Average Loss 2425709.3094 | Cumulative Reward 1.5990
>     Episode 410/500 | Average Loss 1586738.5707 | Cumulative Reward -0.4920
>     Episode 411/500 | Average Loss 1070607.6132 | Cumulative Reward -1.2054
>     Episode 412/500 | Average Loss 5415716.3217 | Cumulative Reward 0.0000
>     Episode 413/500 | Average Loss 1737057.5220 | Cumulative Reward 0.5904
>     Episode 414/500 | Average Loss 1278951.9534 | Cumulative Reward 0.4182
>     Episode 415/500 | Average Loss 728374.3837 | Cumulative Reward 0.5658
>     Episode 416/500 | Average Loss 9779674.6527 | Cumulative Reward -0.8856
>     Episode 417/500 | Average Loss 39090349.0164 | Cumulative Reward -0.1230
>     Episode 418/500 | Average Loss 17068589.6311 | Cumulative Reward 3.0258
>     Episode 419/500 | Average Loss 19433508.8033 | Cumulative Reward 0.6396
>     Episode 420/500 | Average Loss 8357417.2172 | Cumulative Reward -0.3936
>     Episode 421/500 | Average Loss 11197790.3607 | Cumulative Reward 0.2706
>     Episode 422/500 | Average Loss 6756482.8443 | Cumulative Reward 1.2054
>     Episode 423/500 | Average Loss 3976738.0656 | Cumulative Reward -0.4182
>     Episode 424/500 | Average Loss 3616583.8033 | Cumulative Reward -0.2706
>     Episode 425/500 | Average Loss 2603585.6906 | Cumulative Reward 0.0246
>     Episode 426/500 | Average Loss 2082487.2746 | Cumulative Reward 0.6396
>     Episode 427/500 | Average Loss 2534304.0727 | Cumulative Reward -0.5658
>     Episode 428/500 | Average Loss 12130751.7418 | Cumulative Reward -0.2214
>     Episode 429/500 | Average Loss 69780277.8033 | Cumulative Reward 0.4182
>     Episode 430/500 | Average Loss 78495889.1803 | Cumulative Reward -0.8610
>     Episode 431/500 | Average Loss 45958012.8525 | Cumulative Reward -0.0984
>     Episode 432/500 | Average Loss 136452393.9016 | Cumulative Reward 0.1476
>     Episode 433/500 | Average Loss 38613835.7541 | Cumulative Reward -0.5166
>     Episode 434/500 | Average Loss 33883618.4098 | Cumulative Reward -1.0332
>     Episode 435/500 | Average Loss 41067984.8197 | Cumulative Reward -0.1230
>     Episode 436/500 | Average Loss 91378111.4098 | Cumulative Reward -1.4760
>     Episode 437/500 | Average Loss 94752215.4098 | Cumulative Reward -0.4674
>     Episode 438/500 | Average Loss 40647725.0492 | Cumulative Reward 0.3444
>     Episode 439/500 | Average Loss 35812338.5574 | Cumulative Reward -0.1230
>     Episode 440/500 | Average Loss 29541471.6230 | Cumulative Reward 0.3198
>     Episode 441/500 | Average Loss 60021648.1885 | Cumulative Reward -1.1070
>     Episode 442/500 | Average Loss 158118092.1967 | Cumulative Reward -0.0984
>     Episode 443/500 | Average Loss 59849613.7049 | Cumulative Reward 0.2706
>     Episode 444/500 | Average Loss 46461701.9016 | Cumulative Reward -0.2952
>     Episode 445/500 | Average Loss 48389924.3934 | Cumulative Reward 1.0086
>     Episode 446/500 | Average Loss 195580981.4754 | Cumulative Reward -0.2460
>     Episode 447/500 | Average Loss 1722251685.6393 | Cumulative Reward 0.2214
>     Episode 448/500 | Average Loss 3515392862.4262 | Cumulative Reward -0.0492
>     Episode 449/500 | Average Loss 306165022.4836 | Cumulative Reward 1.4760
>     Episode 450/500 | Average Loss 99212269.6230 | Cumulative Reward 0.0738
>     Episode 451/500 | Average Loss 86367019.5410 | Cumulative Reward 2.5338
>     Episode 452/500 | Average Loss 31133120.3689 | Cumulative Reward -0.6642
>     Episode 453/500 | Average Loss 21015204.2172 | Cumulative Reward 1.8942
>     Episode 454/500 | Average Loss 11878485.2541 | Cumulative Reward -0.9102
>     Episode 455/500 | Average Loss 14406326.1680 | Cumulative Reward 0.1968
>     Episode 456/500 | Average Loss 26474831.6639 | Cumulative Reward 0.0000
>     Episode 457/500 | Average Loss 20445096.5492 | Cumulative Reward -0.1476
>     Episode 458/500 | Average Loss 77507055.2131 | Cumulative Reward -0.9348
>     Episode 459/500 | Average Loss 34715110.0984 | Cumulative Reward -0.1722
>     Episode 460/500 | Average Loss 11601538.8443 | Cumulative Reward -0.1230
>     Episode 461/500 | Average Loss 2247047.2961 | Cumulative Reward 0.1230
>     Episode 462/500 | Average Loss 939256.9170 | Cumulative Reward -0.0246
>     Episode 463/500 | Average Loss 808798.3960 | Cumulative Reward 1.6728
>     Episode 464/500 | Average Loss 683601.6122 | Cumulative Reward -0.4182
>     Episode 465/500 | Average Loss 1048500.5922 | Cumulative Reward 1.2792
>     Episode 466/500 | Average Loss 996181.3012 | Cumulative Reward 0.9594
>     Episode 467/500 | Average Loss 958677.9529 | Cumulative Reward -0.1230
>     Episode 468/500 | Average Loss 640709.4234 | Cumulative Reward 0.2952
>     Episode 469/500 | Average Loss 561099.1286 | Cumulative Reward -0.4182
>     Episode 470/500 | Average Loss 858037.4851 | Cumulative Reward -0.0984
>     Episode 471/500 | Average Loss 781562.1660 | Cumulative Reward -0.4428
>     Episode 472/500 | Average Loss 585637.4631 | Cumulative Reward 0.2706
>     Episode 473/500 | Average Loss 1691876.3504 | Cumulative Reward -0.0984
>     Episode 474/500 | Average Loss 857916.0794 | Cumulative Reward 0.8364
>     Episode 475/500 | Average Loss 697805.0133 | Cumulative Reward 1.2300
>     Episode 476/500 | Average Loss 428738.8163 | Cumulative Reward -0.9102
>     Episode 477/500 | Average Loss 412427.7167 | Cumulative Reward -0.1722
>     Episode 478/500 | Average Loss 371060.8876 | Cumulative Reward 0.3198
>     Episode 479/500 | Average Loss 290708.3730 | Cumulative Reward 0.0984
>     Episode 480/500 | Average Loss 293991.3815 | Cumulative Reward 0.3690
>     Episode 481/500 | Average Loss 271680.2395 | Cumulative Reward 0.1968
>     Episode 482/500 | Average Loss 247069.2293 | Cumulative Reward 0.2706
>     Episode 483/500 | Average Loss 298824.8330 | Cumulative Reward -0.0246
>     Episode 484/500 | Average Loss 272823.3829 | Cumulative Reward 0.1476
>     Episode 485/500 | Average Loss 221586.0561 | Cumulative Reward 0.2706
>     Episode 486/500 | Average Loss 255791.8814 | Cumulative Reward -0.2952
>     Episode 487/500 | Average Loss 508487.5830 | Cumulative Reward 0.0984
>     Episode 488/500 | Average Loss 559980.2203 | Cumulative Reward -0.7134
>     Episode 489/500 | Average Loss 378515.2789 | Cumulative Reward -0.7626
>     Episode 490/500 | Average Loss 250586.9393 | Cumulative Reward -0.4182
>     Episode 491/500 | Average Loss 251211.5902 | Cumulative Reward -0.5904
>     Episode 492/500 | Average Loss 169529.8831 | Cumulative Reward -0.5904
>     Episode 493/500 | Average Loss 242881.4125 | Cumulative Reward -0.1968
>     Episode 494/500 | Average Loss 573974.6457 | Cumulative Reward 0.4182
>     Episode 495/500 | Average Loss 200285.0907 | Cumulative Reward 0.7626
>     Episode 496/500 | Average Loss 268848.6760 | Cumulative Reward 0.1968
>     Episode 497/500 | Average Loss 235332.8327 | Cumulative Reward 0.0246
>     Episode 498/500 | Average Loss 202253.1831 | Cumulative Reward 0.0492
>     Episode 499/500 | Average Loss 149803.7602 | Cumulative Reward 0.1476

In [None]:
# Plotting training results
fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(returns)
ax.set_ylabel("Return")
ax.set_xlabel("Episode")
display(fig)

  

We trained our model for 500 episodes; the returns are plotted above.
Note that the loss was still very high at the end of training and
fluctuated by several orders of magnitude along the way, which indicates
that the algorithm has not converged. A possible explanation is that RL
algorithms typically require significantly more steps to converge.
Further, considering the size of the training dataset, the neural
network used is very small. Besides that, DQN is known to be quite
unstable and prone to diverge, which is why several variants of the
algorithm have been proposed since it was first introduced. A very
common one is Double DQN, which uses a second, target Q-network that is
updated at a lower rate than the main Q-network. In our implementation,
the max operator uses the same network both to select and to evaluate an
action, which may lead to selecting overestimated values. In Double DQN,
the main network selects the action and the target network evaluates it;
decoupling selection from evaluation in this way helps reduce the
overestimation.
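
The decoupling of action selection from action evaluation described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the code used in this notebook: the function names, the toy Q-values and the discount factor are all assumptions made for the example.

```python
import numpy as np


def dqn_targets(rewards, dones, q_next, gamma=0.99):
    """Single-network target: the same Q-values select and evaluate the action."""
    max_q = q_next.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_q


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects, the target network evaluates."""
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated


# Toy batch of two transitions with two actions each (hypothetical values).
rewards = np.array([0.1, -0.2])
dones = np.array([0.0, 1.0])  # the second transition ends its episode
q_online_next = np.array([[5.0, 1.0], [0.3, 0.4]])  # online net overestimates action 0
q_target_next = np.array([[2.0, 1.5], [0.2, 0.1]])

single = dqn_targets(rewards, dones, q_online_next, gamma=0.9)
double = double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.9)
print(single, double)
```

In the first transition the online network's inflated estimate for action 0 goes straight into the single-network target, whereas the Double DQN target uses the target network's more moderate value for the same action.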

Testing RL agent
----------------

To test our agent, we use all the data from 1 January 2019, which was
not included in training.

In [None]:
done = False
states = []
actions = []
rewards = []
reward_sum = 0.

# Define testing interval: the whole of 1 January 2019
start = datetime.datetime(2019, 1, 1, 0, 0)
end = datetime.datetime(2019, 1, 1, 23, 59)

# Test learned model: one greedy pass over the whole test interval
env = MarketEnv(oilDF_py, start, end, episode_size=np.inf, scope=sequence_scope)
state = env.reset(random_starttime=False)
input_t = state.reshape(1, sequence_scope, 1)
while not done:
    states.append(state[-1])  # most recent value in the state window
    q = model.predict(input_t)
    action = np.argmax(q[0])  # greedy action, no exploration
    actions.append(action)
    state, reward, done, info = env.step(action)
    rewards.append(reward)
    reward_sum += reward
    input_t = state.reshape(1, sequence_scope, 1)
print("Return = {}".format(reward_sum))

  

>     Return = 0.096

In [None]:
# Plotting testing results
timesteps = np.linspace(1,len(states),len(states))
longs = np.argwhere(np.asarray(actions) ==  0)
shorts = np.argwhere(np.asarray(actions) ==  1)
states = np.asarray(states)
fig, ax = plt.subplots(2, 1, figsize=(16, 8))
ax[0].grid(True)
ax[0].plot(timesteps, states, label='diff_close')
ax[0].plot(timesteps[longs], states[longs].flatten(), '*g', markersize=12, label='long')
ax[0].plot(timesteps[shorts], states[shorts].flatten(), '*r', markersize=12, label='short')
ax[0].set_ylabel("(s,a)")
ax[0].set_xlabel("Timestep")
ax[0].set_xlim(1,len(states))
ax[0].legend()
ax[1].grid(True)
ax[1].plot(timesteps, rewards, 'o-r')
ax[1].set_ylabel("r")
ax[1].set_xlabel("Timestep")
ax[1].set_xlim(1,len(states))
plt.tight_layout()
display(fig)

  

We can see that the policy converged to always shorting, meaning that
the agent never buys any stock. While abstaining from investments in
fossil fuels may be good advice, the result is not very useful for our
intended application. Nevertheless, building a successful automatic
intraday trading bot in the short time we spent on this project would be
a high bar. After all, this is more or less the holy grail of
computational finance.
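
A collapse to a single action, like the one observed above, can also be checked numerically rather than read off the plot. The sketch below is only an illustration: `action_fractions` is a hypothetical helper and the action log is made up, with action 1 standing for short as in the plotting code.

```python
import numpy as np


def action_fractions(actions, n_actions=2):
    """Return the fraction of timesteps on which each action was chosen."""
    counts = np.bincount(np.asarray(actions, dtype=int), minlength=n_actions)
    return counts / max(counts.sum(), 1)


# Hypothetical action log in which the agent always picks action 1 (short).
example_actions = [1, 1, 1, 1, 1]
fractions = action_fractions(example_actions)

# The policy is degenerate if one action accounts for every timestep.
is_degenerate = bool(fractions.max() == 1.0)
print(fractions, is_degenerate)
```

Applied to the `actions` list collected in the test loop, the same helper would give the share of long and short decisions over the test day.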

Summary and Outlook
-------------------

In this project we trained and tested an RL agent for intraday trading
using DQN. We started by processing the data and adding a
**diff\_close** column, which contains the difference in the closing
value between two consecutive timesteps. We then implemented our own Gym
environment, **MarketEnv**, to read data from the BCOUSD dataset and
feed it to an agent, as well as to compute the reward given the agent's
action. We used a DQN implementation to train a convolutional
Q-network. Since we are using TensorFlow in the background, the training
is automatically scaled to use all available CPU cores
(see [here](https://www.xspdf.com/resolution/52582340.html)).
Finally, we tested our agent on new data and concluded that more work is
needed to make the algorithm converge.

As future work, we believe the state and reward definitions can still be
improved. For the state, we used a window of **diff\_close** values.
However, augmenting the state with longer-term trends computed by the
TrendCalculus algorithm could yield significant improvements. The
TrendCalculus algorithm provides an analytical framework for identifying
trends in historical price data, including [trend pattern
analysis](https://lamastex.github.io/spark-trend-calculus-examples/notebooks/db/01trend-calculus-showcase.html)
and [prediction of trend
changes](https://lamastex.github.io/spark-trend-calculus-examples/notebooks/db/03streamable-trend-calculus-estimators.html).
Below we present the idea behind TrendCalculus and how it relates to
**diff\_close**, as well as a few ideas on how it could be used in our
application.

### Trend Calculus

Taken from: <https://github.com/lamastex/spark-trend-calculus-examples>

Trend Calculus is an algorithm invented by Andrew Morgan that is used to
find trend changes in a time series (see
[here](https://github.com/bytesumo/TrendCalculus/blob/master/HowToStudyTrends_v1.03.pdf)).
It works by grouping the observations in the time series into windows
and defining a trend upwards as “higher highs and higher lows” compared
to the previous window. A downwards trend is similarly defined as “lower
highs and lower lows”.

<img src="https://lamastex.github.io/spark-trend-calculus-examples/notebooks/db/images/HigherHighHigherLow.png" width=300>
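
The "higher highs and higher lows" rule can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the TrendCalculus library's implementation: `window_trends` is a hypothetical helper, windows are non-overlapping, and ties or mixed cases are simply labelled 0 (no trend) rather than being resolved further.

```python
def window_trends(values, window_size):
    """Label each window after the first: +1 (up), -1 (down) or 0 (no trend)."""
    # Split the series into consecutive, non-overlapping windows.
    windows = [values[i:i + window_size] for i in range(0, len(values), window_size)]
    # Summarise every window by its (high, low) pair.
    bars = [(max(w), min(w)) for w in windows]
    trends = []
    for (prev_high, prev_low), (high, low) in zip(bars, bars[1:]):
        if high > prev_high and low > prev_low:
            trends.append(1)   # higher high and higher low
        elif high < prev_high and low < prev_low:
            trends.append(-1)  # lower high and lower low
        else:
            trends.append(0)   # mixed case: no trend detected by this rule alone
    return trends


# Made-up price series, grouped into windows of two observations.
prices = [1.0, 2.0, 3.0, 4.0, 3.5, 2.5, 2.0, 1.0, 0.5, 5.0]
trends = window_trends(prices, window_size=2)
print(trends)
```

The last window in this toy series has a higher high but a lower low than its predecessor, so the simple rule leaves it unlabelled.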

If there is a higher high and lower low (or lower high and higher low),
no trend is detected. This is solved by introducing intermediate windows
that split the non-trend into two trends, ensuring that every point can
be labeled with either an up or down trend.

<img src="https://lamastex.github.io/spark-trend-calculus-examples/notebooks/db/images/OuterInnerBars.png" width=600>

When the trends have been calculated for all windows, the points where
the trend changes sign are labeled as reversals. If the reversal is from
up to down, the previous high is the reversal point, and if the reversal
is from down to up, the previous low is the reversal point. This means
that the reversals are always the appropriate extrema (maximum for up to
down, minimum for down to up).

<img src="https://lamastex.github.io/spark-trend-calculus-examples/notebooks/db/images/trendReversals.png" width=600>
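
The reversal rule described above can be sketched as follows. Again this is a hedged illustration rather than the library's code: `find_reversals`, the `(high, low)` bar representation and the "top"/"bottom" labels are assumptions made for the example, and every window is assumed to already carry an up (+1) or down (-1) label.

```python
def find_reversals(bars, trends):
    """Find reversal points from per-window (high, low) bars and trend labels.

    trends[i] is the trend of bars[i + 1] relative to bars[i] (+1 or -1).
    Returns a list of (window_index, value, kind) tuples.
    """
    reversals = []
    for i in range(1, len(trends)):
        prev_trend, trend = trends[i - 1], trends[i]
        if prev_trend == 1 and trend == -1:
            # Up to down: the high of the window before the drop is a maximum.
            reversals.append((i, bars[i][0], "top"))
        elif prev_trend == -1 and trend == 1:
            # Down to up: the low of the window before the rise is a minimum.
            reversals.append((i, bars[i][1], "bottom"))
    return reversals


# Made-up (high, low) bars: the series rises, falls twice, then rises again.
bars = [(2.0, 1.0), (4.0, 3.0), (3.5, 2.5), (2.0, 1.0), (3.0, 1.5)]
trends = [1, -1, -1, 1]
reversals = find_reversals(bars, trends)
print(reversals)
```

The values of the returned reversals form a new, shorter series, which is what makes it possible to run the procedure again on its own output.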

The output of the algorithm is a time series consisting of all the
labelled reversal points. It is therefore possible to use this as the
input for another run of the Trend Calculus algorithm, finding
longer-term trends. The result of applying TrendCalculus to the BCOUSD
dataset is shown in column **reversal1** of the table below.

In [None]:
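import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lag, when}

val windowSize = 2
val numReversals = 1 // we look at 1 iteration of the algorithm

val dfWithReversals = new TrendCalculus2(oilDS, windowSize, spark).nReversalsJoinedWithMaxRev(numReversals)
display(dfWithReversals)

// diff_close: difference between the current and the previous close (y), ordered by time (x).
// The first row has no previous value, so 0 is used in its place.
val windowSpec = Window.orderBy("x")
val prevClose = lag("y", 1).over(windowSpec)
val dfWithReversalsDiff = dfWithReversals
  .withColumn("diff_close", $"y" - when(prevClose.isNull, 0).otherwise(prevClose))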

// Store loaded data as temp view, to be accessible in Python
dfWithReversalsDiff.createOrReplaceTempView("temp")

  

[TABLE]


In conjunction with TrendCalculus, a complete automatic trading pipeline
can be constructed, consisting of (i) trend analysis with TrendCalculus,
(ii) time series prediction, and (iii) control, i.e. buy or sell.
Implementing and evaluating such a pipeline is left as a suggestion for
future work; it would be of particular interest to compare its
performance to that of a learning-based method.

Below we show that sign(**diff_close**) is equivalent to the sign of the
output of a single iteration of TrendCalculus with window size 2, over
our **scope**. A possible improvement of our algorithm would be to use
TrendCalculus to compute long-term trends from historical data and
include them in our state definition. This way, if for example the agent
observes a long-term downward trend, it could learn to buy stock in
anticipation of a rebound.
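
For reference, the sign of **diff_close** can be reproduced outside Spark. The following is a minimal NumPy sketch; the function name `diff_close_sign` and the toy prices are illustrative assumptions. It follows the convention of the Scala cell above, where the missing predecessor of the first row is replaced by 0, so the first difference equals the first close value.

```python
import numpy as np


def diff_close_sign(close):
    """Return (diff_close, sign) for a 1-D sequence of close prices.

    Hypothetical helper for illustration. The first element is differenced
    against 0, as in the Spark cell that builds the diff_close column.
    """
    close = np.asarray(close, dtype=float)
    diff_close = np.diff(close, prepend=0.0)  # close[t] - close[t-1], with close[-1] taken as 0
    return diff_close, np.sign(diff_close)


# Toy prices, not taken from the BCOUSD data.
diff_close, sign = diff_close_sign([50.0, 50.5, 50.2, 50.2])
print(sign)  # [ 1.  1. -1.  0.]
```

Because of the zero substitution, the first entry of the sign series is always positive for positive prices and carries no information about the price movement.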

In [None]:
# Taken from: https://lamastex.github.io/spark-trend-calculus-examples/notebooks/db/01trend-calculus-showcase.html

import datetime

import matplotlib.pyplot as plt
import numpy as np

# Create Dataframe from temp data
fullDS = spark.table("temp")
fullTS = fullDS.select("x", "y", "reversal1", "diff_close").collect()

startDate = datetime.datetime(2019, 1, 1, 1, 0) # first window used as scope
endDate = datetime.datetime(2019, 1, 1, 23, 59)
TS = [row for row in fullTS if startDate <= row['x'] <= endDate]

allData = {
    'x': [row['x'] for row in TS],
    'close': [row['y'] for row in TS],
    'diff_close': [row['diff_close'] for row in TS],
    'reversal1': [row['reversal1'] for row in TS],
}

# Plot reversals
close = np.asarray(allData['close'])
diff_close = np.asarray(allData['diff_close'])
timesteps = np.arange(1, len(diff_close) + 1)
revs = np.asarray(allData['reversal1'])
pos_rev_ind = np.flatnonzero(revs == 1)   # indices of positive reversals
neg_rev_ind = np.flatnonzero(revs == -1)  # indices of negative reversals
fig, ax = plt.subplots(2, 1, figsize=(16, 8))
ax[0].grid(True)
ax[0].plot(timesteps, close, label='close')
ax[0].plot(timesteps[pos_rev_ind], close[pos_rev_ind].flatten(), '*g', markersize=12, label='+ reversal')
ax[0].plot(timesteps[neg_rev_ind], close[neg_rev_ind].flatten(), '*r', markersize=12, label='- reversal')
ax[0].set_ylabel("close")
ax[0].set_xlabel("Timestep")
ax[0].set_xlim(1,len(close))
ax[0].legend()
ax[1].grid(True)
ax[1].plot(timesteps, diff_close, label='diff_close')
ax[1].plot(timesteps[pos_rev_ind], diff_close[pos_rev_ind].flatten(), '*g', markersize=12, label='+ reversal')
ax[1].plot(timesteps[neg_rev_ind], diff_close[neg_rev_ind].flatten(), '*r', markersize=12, label='- reversal')
ax[1].set_ylabel("diff_close")
ax[1].set_xlabel("Timestep")
ax[1].set_xlim(1,len(diff_close))
ax[1].legend()
plt.tight_layout()
display(fig)