## Main Trading Bot Logic

The first algorithm we will test is DQN (Deep Q-Network), one of the most widely used baselines for single-agent reinforcement learning with discrete actions.

<img src="DQN.png" alt="drawing" width="700"/>
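
At its core, DQN regresses the online Q-network towards the one-step target $y = r + \gamma (1 - d) \max_{a'} Q_{\text{target}}(s', a')$, where $d$ is 1 for terminal transitions and the target network is a slowly updated copy of the online network. The NumPy sketch below computes these targets for a mini-batch and shows a soft (Polyak) target update with rate `tau`. The function names and the `tau` default are illustrative assumptions, not the internals of the `DQNAgent` class used later.

```python
import numpy as np


def dqn_targets(rewards, next_q_values, dones, discount=0.99):
    """One-step DQN targets for a mini-batch.

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), Q_target(s', a') for every action a'
    dones:         shape (batch,), 1.0 if s' is terminal else 0.0
    """
    best_next = next_q_values.max(axis=1)                  # max over a' of Q_target(s', a')
    return rewards + discount * (1.0 - dones) * best_next  # no bootstrapping past terminal states


def soft_update(target_weights, online_weights, tau=1e-3):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target_weights, online_weights)]


rewards = np.array([1.0, 0.0, -1.0])
next_q = np.array([[0.5, 2.0], [1.0, 0.0], [3.0, 4.0]])
dones = np.array([0.0, 0.0, 1.0])
# Expected values: [2.8, 0.9, -1.0]; the third transition is terminal, so it is just its reward.
print(dqn_targets(rewards, next_q, dones, discount=0.9))
```

A small `tau` keeps the target network nearly fixed between updates, which is what stabilises the regression target.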

Before we start on the core algorithm for the trading bot, we need to make sure we can pull the right market data and clean it where necessary. The obvious place to start is [Yahoo! Finance](https://finance.yahoo.com/).
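
Downloaded daily price data may contain duplicate dates or missing values. A minimal cleaning sketch, assuming the prices are already in a pandas DataFrame indexed by date with a `Close` column (the third-party `yfinance` package is one way to get such a frame; it is an assumption here, not something this notebook uses). `clean_prices` is a hypothetical helper, not the project's `get_data`.

```python
import numpy as np
import pandas as pd


def clean_prices(df: pd.DataFrame, column: str = "Close") -> pd.Series:
    """Hypothetical cleaning step: keep one price column, sort by date,
    drop duplicate dates, forward-fill gaps and drop leading NaNs."""
    prices = df[column].sort_index()
    prices = prices[~prices.index.duplicated(keep="first")]  # one row per date
    prices = prices.ffill()                                   # carry the last known price forward
    return prices.dropna()                                    # leading NaNs have nothing to fill from


# Synthetic frame standing in for downloaded data.
idx = pd.to_datetime(["2019-06-01", "2019-06-02", "2019-06-02", "2019-06-03", "2019-06-04"])
raw = pd.DataFrame({"Close": [np.nan, 8700.0, 8700.0, np.nan, 8750.0]}, index=idx)
print(clean_prices(raw).tolist())  # [8700.0, 8700.0, 8750.0]
```

Forward-filling only uses past values, so, unlike interpolation, it does not leak future prices into earlier rows.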

We will set this up so the algorithm can be run with a few input parameters, such as the ticker symbol of a stock or cryptocurrency, and so the cleaning and training steps are automated.
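
One way to expose those inputs is a small command-line interface built with the standard-library `argparse`. The flag names below are hypothetical and the defaults are simply copied from the cells further down; this is not an existing script in the project.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical CLI for the trading bot; flag names are illustrative."""
    parser = argparse.ArgumentParser(description="Train a DQN trading bot on one ticker.")
    parser.add_argument("--ticker", default="BTC-USD", help="Yahoo! Finance ticker symbol")
    parser.add_argument("--start-date", default="2019-06-01", help="first date (YYYY-MM-DD)")
    parser.add_argument("--end-date", default="2020-09-01", help="last date (YYYY-MM-DD)")
    parser.add_argument("--window", type=int, default=10, help="days of history per state")
    parser.add_argument("--episodes", type=int, default=50, help="number of training episodes")
    return parser.parse_args(argv)


# Passing an explicit list keeps this usable inside a notebook.
args = parse_args(["--ticker", "ETH-USD", "--episodes", "20"])
print(args.ticker, args.start_date, args.episodes)  # ETH-USD 2019-06-01 20
```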

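## Test on LunarLander

Before pointing the agent at market data, we sanity-check it on Gym's LunarLander, a standard control task with an 8-dimensional observation and 4 discrete actions.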

```python
import gym
import numpy as np
import tensorflow as tf

from SmartTradingBot import agent  # project package that provides DQNAgent
```

```python
# Uses the pre-0.26 gym API (env.seed, 4-tuple from env.step).
env = gym.make('LunarLander-v2')
env.seed(0)
print('State shape: ', env.observation_space.shape)
print('Number of actions: ', env.action_space.n)
```

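```python
# LunarLander has an 8-dimensional observation and 4 discrete actions,
# so read the dimensions from the environment instead of hardcoding them.
# TAU = 1e-3  # soft-update rate for the target network (not passed to DQNAgent below)
lunar_agent = agent.DQNAgent(
    state_dim=env.observation_space.shape[0],  # 8
    action_dim=env.action_space.n,             # 4
    hidden_layer_sizes=[64, 64],
    buffer_size=10000,
    batch_size=64,
    discount=0.99,
    learning_rate=5e-4,
    learning_freq=4
)
```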
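
The `buffer_size`, `batch_size` and `learning_freq` arguments belong to DQN's experience replay: transitions are stored in a fixed-size buffer and, every few environment steps, a random mini-batch is drawn from it to train the network. Below is a minimal standalone replay buffer, assuming `learning_freq` means "learn every N steps"; it is a sketch, not the `DQNAgent` implementation from `SmartTradingBot`.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal experience replay: a FIFO of (s, a, r, s', done) tuples."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)  # oldest transitions are evicted first
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), self.batch_size)

    def __len__(self):
        return len(self.memory)


def should_learn(step, buffer, learning_freq=4):
    """Learn every `learning_freq` steps, once the buffer holds a full batch."""
    return step % learning_freq == 0 and len(buffer) >= buffer.batch_size


buffer = ReplayBuffer(buffer_size=10000, batch_size=64)
for step in range(1, 101):
    buffer.add(state=step, action=0, reward=0.0, next_state=step + 1, done=False)
    if should_learn(step, buffer):
        batch = buffer.sample()  # a gradient step on `batch` would go here
print(len(buffer), len(batch))  # 100 64
```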

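```python
# Evaluate the untrained model for up to 200 steps in evaluation mode
state = env.reset()
total_reward = 0.0
for j in range(200):
    state = tf.reshape(state, shape=(1, -1))  # add a batch dimension
    action = lunar_agent.act(state, evaluation=True)
    # env.render()
    state, reward, done, _ = env.step(action)
    total_reward += reward
    if done:
        break
print('Steps: {}\tTotal reward: {:.2f}'.format(j + 1, total_reward))
# env.close()
```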

```python
from collections import deque

def dqn(n_episodes=100, max_t=100, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Train lunar_agent with epsilon-greedy exploration and return per-episode scores."""
    scores = []                        # list containing scores from each episode
    scores_window = deque(maxlen=100)  # last 100 scores, for the rolling average
    eps = eps_start                    # initialize epsilon

    for i_episode in range(1, n_episodes + 1):
        state = env.reset()
        state = tf.reshape(state, shape=(1, -1))

        score = 0
        for t in range(max_t):
            action = lunar_agent.act(state, eps)  # assumes act() takes epsilon as its second argument
            next_state, reward, done, _ = env.step(action)
            next_state = tf.reshape(next_state, shape=(1, -1))
            lunar_agent.step(state, action, reward, next_state, done)  # pass the transition to the agent
            state = next_state
            score += reward
            if done:
                break

        scores_window.append(score)          # save most recent score
        scores.append(score)                 # save most recent score
        eps = max(eps_end, eps_decay * eps)  # decrease epsilon
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)), end="")
        if i_episode % 100 == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
        # Stop once the average over the last 100 episodes reaches 200:
        # if np.mean(scores_window) >= 200.0:
        #     print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode - 100, np.mean(scores_window)))
        #     break
    return scores

scores = dqn()
```
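
The exploration rate in `dqn()` decays geometrically: after $n$ episodes it is $\max(\epsilon_{\text{end}}, \epsilon_{\text{start}} \cdot 0.995^n)$. A quick check of what the default arguments imply: with `n_episodes=100`, epsilon only falls to about 0.61, so the agent is still choosing a random action roughly 60% of the time when training stops, and reaching the 0.01 floor takes 919 episodes. The helper names below are illustrative.

```python
import math


def epsilon_after(n_episodes, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Epsilon after n multiplicative decays, floored at eps_end (same rule as dqn())."""
    return max(eps_end, eps_start * eps_decay ** n_episodes)


def episodes_to_floor(eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Smallest n with eps_start * eps_decay**n <= eps_end."""
    return math.ceil(math.log(eps_end / eps_start) / math.log(eps_decay))


print(round(epsilon_after(100), 3))  # 0.606
print(episodes_to_floor())           # 919
```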

## Trading Agent

```python
from SmartTradingBot import agent, utils, trainer
from SmartTradingBot.utils import get_data
```

```python
train, test = get_data(['BTC-USD'], start_date="2019-06-01", end_date="2020-09-01")
```
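
`get_data` returns separate training and test sets. For price series the split should be chronological rather than random, so the test period lies strictly after the training period and nothing from it leaks into training. A minimal sketch of such a split, assuming an 80/20 ratio; the split `get_data` actually uses is not shown in this notebook, and `chronological_split` is a hypothetical helper.

```python
import pandas as pd


def chronological_split(prices: pd.Series, train_fraction: float = 0.8):
    """Hypothetical time-ordered split: the first part trains, the remainder tests."""
    prices = prices.sort_index()
    cut = int(len(prices) * train_fraction)
    return prices.iloc[:cut], prices.iloc[cut:]


dates = pd.date_range("2019-06-01", periods=10, freq="D")
prices = pd.Series(range(10), index=dates, dtype=float)
train_part, test_part = chronological_split(prices)
print(len(train_part), len(test_part))                 # 8 2
print(train_part.index.max() < test_part.index.min())  # True
```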

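```python
import seaborn as sns

# sns.lineplot(data=train)  # plot the raw prices instead
normalised_train = utils.normalised_difference(data=train)
signorm_train = utils.sigmoid(normalised_train)
# The differenced series is one value shorter than train, hence train.index[:-1].
# Recent seaborn versions require x and y as keyword arguments.
sns.lineplot(x=train.index[:-1], y=signorm_train)
```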
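
The two preprocessing helpers come from the project's `utils` module, whose code is not shown here. One plausible reading, given that the plotted series is one element shorter than `train`, is a day-over-day relative change squashed into (0, 1) by a logistic sigmoid. The NumPy sketch below implements that interpretation; `relative_change` and `logistic` are hypothetical stand-ins, not the project's functions.

```python
import numpy as np


def relative_change(prices):
    """(p[t+1] - p[t]) / p[t]; the output is one element shorter than the input."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(prices) / prices[:-1]


def logistic(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


prices = [100.0, 110.0, 99.0, 99.0]
changes = relative_change(prices)  # approximately [0.1, -0.1, 0.0]
squashed = logistic(changes)       # above 0.5 for rises, below for falls, exactly 0.5 for no change
print(len(prices), len(changes))   # 4 3
```

Because daily relative changes are usually small, the sigmoid output stays close to 0.5 (for example, a 10% rise maps to about 0.525).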

```python
trading_agent = agent.DQNAgent(
    state_dim=10,  # one state = a window of the last 10 days of data
    action_dim=3,  # [Hold, Buy, Sell] = [0, 1, 2]
    hidden_layer_sizes=[128, 256, 256, 128],
    buffer_size=1000,
    batch_size=32,
    discount=0.99,
    learning_rate=1e-3,
    learning_freq=4
)
```
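
With `state_dim=10`, each state is a window of the last 10 preprocessed values, and the network picks one of three actions. A minimal sketch of how such a state could be cut from the series, padding the start with the first value so that early days still yield a full window. `get_state` and the padding choice are assumptions, not the `trainer` module's code.

```python
import numpy as np

ACTIONS = {0: "Hold", 1: "Buy", 2: "Sell"}  # same encoding as the action_dim comment


def get_state(series, t, window=10):
    """Return the `window` values ending at index t as a (1, window) array.

    Positions before the start of the series are filled with series[0].
    """
    series = np.asarray(series, dtype=float)
    start = t - window + 1
    if start >= 0:
        block = series[start:t + 1]
    else:
        block = np.concatenate([np.full(-start, series[0]), series[:t + 1]])
    return block.reshape(1, -1)  # batch dimension first, like tf.reshape(state, (1, -1))


series = np.arange(20, dtype=float)
print(get_state(series, 2, window=5).tolist())   # [[0.0, 0.0, 0.0, 1.0, 2.0]]
print(get_state(series, 12, window=5).tolist())  # [[8.0, 9.0, 10.0, 11.0, 12.0]]
```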

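```python
n_episodes = 50
results = []
for episode in range(1, n_episodes + 1):  # episodes 1..n_episodes inclusive
    # Assumes train_bot returns this episode's result (e.g. its profit).
    result = trainer.train_bot(agent=trading_agent, data=signorm_train, episode=episode, n_episodes=n_episodes)
    results.append(result)
```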
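
`trainer.train_bot` is not shown in this notebook, so how it turns actions into rewards is not visible here. One common scheme in simple DQN trading bots, sketched below as an assumption, keeps an inventory of buy prices: Buy records the current price, Sell closes the oldest open position and is rewarded with the realised profit, and Hold earns nothing. `run_episode` and the random policy are hypothetical placeholders for the agent.

```python
import random

HOLD, BUY, SELL = 0, 1, 2


def run_episode(prices, choose_action):
    """Simulate one pass over `prices`; returns (total_profit, rewards).

    choose_action(t) -> 0 (Hold), 1 (Buy) or 2 (Sell). Positions are closed FIFO.
    """
    inventory = []  # prices at which units were bought
    rewards = []
    total_profit = 0.0
    for t, price in enumerate(prices):
        action = choose_action(t)
        reward = 0.0
        if action == BUY:
            inventory.append(price)
        elif action == SELL and inventory:
            bought = inventory.pop(0)
            reward = price - bought  # realised profit (may be negative)
            total_profit += reward
        rewards.append(reward)       # Hold, or Sell with nothing to sell, earns 0
    return total_profit, rewards


# Deterministic check: buy on day 0, sell on day 2.
prices = [10.0, 11.0, 13.0, 12.0]
script = {0: BUY, 2: SELL}
profit, rewards = run_episode(prices, lambda t: script.get(t, HOLD))
print(profit, rewards)  # 3.0 [0.0, 0.0, 3.0, 0.0]

# A random policy as a stand-in for an untrained agent.
rng = random.Random(0)
random_profit, _ = run_episode(prices, lambda t: rng.choice([HOLD, BUY, SELL]))
```

Positions still open at the end of the episode are ignored here; a fuller backtest would value them at the final price.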

```python
results
```