# Algorithmic Trading Using Deep Reinforcement Learning

### Table of Contents

- [Introduction](#scrollTo=9SNR5Z82unXd)

- [Import Dependencies](#scrollTo=012Sf4GHumaL)

- [Data & Preprocessing](#scrollTo=fAq9RY7dtwTe)

- [Custom Trading Environment Setting](#scrollTo=z-g6RJHLpuKh)

- [Utility Functions](#scrollTo=jWElTzIctZ3E)

- [PPO Agent](#scrollTo=rz1CA85UkXPI)

  - [Agent setting](#scrollTo=bKxYFvx8nALx)

  - [Training](#scrollTo=VXm3OnAlnF51)

  - [Results and Validation](#scrollTo=3D6ypjL-nh8J)

- [DQN Agent](#scrollTo=F1SfonB9kXPR)

  - [Agent Setting](#scrollTo=hIqzHqzPn2Lv)

  - [Training](#scrollTo=6tBBq8TQn7JZ)

  - [Results and Validation](#scrollTo=SukTkoX8oExx)

- [Visualization](#scrollTo=V48fsZn7jv9Q)



## Introduction

In quantitative finance, stock trading is essentially a dynamic decision problem: deciding where, at what price, and how much to trade in a highly stochastic, dynamic, and complex stock market. Recent advances in deep reinforcement learning (DRL) make it possible to model and solve such sequential decision problems with a human-like approach.

<br>

In this project, we examine the potential and performance of deep reinforcement learning for optimizing stock trading strategies and thereby maximizing investment returns. Google stock is selected as the traded asset; its daily opening and closing prices, trading volume, and several technical indicators serve as both the training environment and the trading market.

<br>

We present two trading agents based on deep reinforcement learning, one using the Proximal Policy Optimization (PPO) algorithm and the other based on Deep Q-Learning (DQN), to autonomously make trading decisions and generate returns in dynamic financial markets. The performance of these agents is compared with that of a buy-and-hold strategy. Finally, we show that the proposed deep reinforcement learning approach outperforms the buy-and-hold benchmark in terms of both risk assessment criteria and portfolio return.
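
To make the benchmark comparison concrete, the sketch below computes three common metrics for a buy-and-hold position. Which risk criteria the comparison relies on is not stated at this point, so the choice of annualized Sharpe ratio (zero risk-free rate) and maximum drawdown is an assumption, as are the function name `buy_and_hold_metrics` and the made-up prices.

```python
import math
import statistics


def buy_and_hold_metrics(prices, periods_per_year=252):
    """Total return, annualized Sharpe ratio and maximum drawdown
    for holding a single asset over `prices` (hypothetical helper)."""
    # Simple period-over-period returns.
    returns = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

    total_return = prices[-1] / prices[0] - 1.0

    # Annualized Sharpe ratio, assuming a zero risk-free rate.
    sharpe = (statistics.mean(returns) / statistics.stdev(returns)
              * math.sqrt(periods_per_year))

    # Maximum drawdown: the largest decline from a running peak.
    peak = prices[0]
    max_drawdown = 0.0
    for price in prices:
        peak = max(peak, price)
        max_drawdown = min(max_drawdown, price / peak - 1.0)

    return total_return, sharpe, max_drawdown


# Illustrative prices only, not market data.
prices = [100.0, 110.0, 99.0, 120.0]
total_return, sharpe, max_drawdown = buy_and_hold_metrics(prices)
```

An agent's equity curve can be passed through the same function, so that both strategies are scored on identical definitions.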

---


**References:**
* Human-level control through deep reinforcement learning (Deep Q-Learning): [paper](https://www.nature.com/articles/nature14236)
* Proximal Policy Optimization (PPO): [paper](https://arxiv.org/abs/1707.06347), [blog](https://openai.com/blog/openai-baselines-ppo/), [spinning-up](https://spinningup.openai.com/en/latest/algorithms/ppo.html)

## Import Dependencies

In [None]:
%%capture
!pip install talib-binary
!pip install gym_anytrading
!pip install quantstats
!pip install stable_baselines3
!pip install pyfolio
!pip install --upgrade gym==0.25.2
!pip install stable_baselines

In [None]:
import os
import math
import talib
import numpy as np
import pandas as pd
from scipy.stats import t
from pandas_datareader import data as web
import pandas_datareader as pdr
from dateutil.relativedelta import relativedelta
from tqdm import tqdm
from gym.utils import seeding
import gym
from gym import spaces
import logging
import datetime
import pyfolio.timeseries as ts
import scipy.stats as st

from gym_anytrading.envs import TradingEnv, ForexEnv, StocksEnv, Actions, Positions 
# from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK, STOCKS_GOOGL
import matplotlib.pyplot as plt
import seaborn as sns
import quantstats as qs

from stable_baselines3 import A2C, DDPG, DQN, PPO, TD3, SAC
# from stable_baselines import TRPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise

# import torch
from typing import Callable, Dict, List, Optional, Tuple, Type, Union

import torch as th
from torch import nn

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy

from sklearn.preprocessing import scale


DECIMAL_SIGNS = 5
rnd = lambda x: round(x, DECIMAL_SIGNS)

#===========

In [None]:
from stable_baselines3.common.callbacks import EvalCallback

In [None]:
#============
logging.basicConfig()
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)
log.info('%s logger started.', __name__)
#============

INFO:__main__:__main__ logger started.


----

## Data & Preprocessing

We create a `DataSource` class that handles fetching the price data and calculating the technical indicators.

In [None]:
class DataSource(object):
    def __init__(self, data_path, start_date, end_date,
                    time_frame, tickers, window_size,
                    # train_mode=True,
                    # episode_duration=480,
                    # train_split=0.8, normalize=True
                    ):

        self.tickers = tickers

        if end_date is None:
            end_date = datetime.datetime.now()
        if start_date is None:
            start_date = end_date - relativedelta(years=2)

        self.data = pd.DataFrame()
        for ticker in self.tickers:
            csv_name = os.path.join(
                            data_path,
                            ticker+start_date.strftime('_from_%Y%m%d')+end_date.strftime('_to_%Y%m%d')+".csv"
                            )
            ticker_data = self._load_data(
                                csv_name=csv_name, time_frame=time_frame,
                                start_date=start_date, end_date=end_date, ticker=ticker
                                )
            self.data = pd.concat(
                [self.data, ticker_data],
                # join='inner'
                )

        self.date_time = self.data.index
        self.count = self.data.shape[0]
        self.window_size = window_size
        self.states = self.data.values


    @staticmethod
    def seed(seed):
        np.random.seed(seed)

    
    def _load_data(self, csv_name, time_frame, start_date, end_date,  ticker):
        log.info('loading data for {}...'.format(ticker))

        if os.path.exists(csv_name):
            df = pd.read_csv(csv_name)
        else:
            with open("./tiingo_api_key.txt") as file:
                key = file.readline().strip()  # drop the trailing newline from the key
            
            # df = web.DataReader(ticker, time_frame,
            #     start=start_date,
            #     end=end_date,
            #     api_key=key
            #     ).dropna()
            df = pdr.get_data_tiingo(ticker,
                start=start_date,
                end=end_date,
                api_key=key
                ).dropna()
            
            df.columns = [col.lower() for col in df.columns]

            # print(df.columns)

            df['ret_5'] = df.adjopen.pct_change(5)
            df['ret_10'] = df.adjopen.pct_change(10)
            df['ret_21'] = df.adjopen.pct_change(21)
            df['rsi'] = talib.STOCHRSI(df.adjopen)[1]
            df['macd'] = talib.MACD(df.adjopen)[1]
            df['atr'] = talib.ATR(df.adjhigh, df.adjlow, df.adjopen)
            df = df.replace((np.inf, -np.inf), np.nan).drop(['high', 'low','close','open','adjhigh', 'adjlow','divcash','splitfactor'], axis=1).dropna()
            df.columns = [col+'_'+ticker for col in df.columns]
            df.to_csv(csv_name,index_label="date_time")
            

        log.info('got data for {}...'.format(ticker))
        return df

    def get_start_end_index(self,a,b):
        
        start_index = np.random.randint(a, b-20)
        end_index = np.random.randint(start_index+10, b)
    
        return start_index, end_index
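
The `ret_5`, `ret_10` and `ret_21` features in `_load_data` are k-period percentage changes of the adjusted open, i.e. `x[t] / x[t - k] - 1`. The dependency-free sketch below mirrors what `pct_change(k)` computes on a plain list; the helper name and the sample prices are illustrative assumptions, and `None` stands in for the `NaN` values pandas produces for the first `k` rows.

```python
def pct_change(values, periods=1):
    """k-period percentage change: values[t] / values[t - periods] - 1."""
    # The first `periods` entries have no earlier reference value.
    out = [None] * len(values)
    for t in range(periods, len(values)):
        out[t] = values[t] / values[t - periods] - 1.0
    return out


# Made-up opening prices, for illustration only.
opens = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
ret_1 = pct_change(opens, 1)
ret_5 = pct_change(opens, 5)
```

Because the longest lookback is 21 rows, at least the first 21 rows of each ticker are lost when `_load_data` calls `dropna()`.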


In [None]:
config = {
            "data": {
                "time_frame": "tiingo",
                "data_path": os.path.join(os.getcwd(), "data"),
                "tickers": ["GOOG","IBM"],
                "episode_duration": 480
                },
            "seed": 42,
            "model": {
                "window_size": 10,
                "initial_cash": 1_000_000,
                "commission_rate":0,
                "start_date": datetime.datetime(2014,1,1),
                "end_date": datetime.datetime(2022,8,1),
                "stat_save_folder": None,
                "agent_save_folder": None
                 },
            "ddpg": {
                "buffer_size": 100000,
                "batch_size": 64,
                "gamma": 0.99,
                "tau": 0.001,
                "learning_rate_actor": 0.0001,
                "learning_rate_critic": 0.001,
                "explore": 1000000.,
                "weight_decay": 0,
                "eps":0.1,
                "eps_decay":0.001
                }
        }

In [None]:
data_config = config['data']
seed = config.get("seed", 42)

# Create the output folders for statistics and saved agents (parents included).
run_tag = "ddpg_" + datetime.datetime.now().strftime('%Y%m%d')

stat_save_path = os.path.join(os.getcwd(), "saved_stats")
stat_save_folder = os.path.join(stat_save_path, run_tag)
os.makedirs(stat_save_folder, exist_ok=True)

agent_save_path = os.path.join(os.getcwd(), "saved_agents")
agent_save_folder = os.path.join(agent_save_path, run_tag)
os.makedirs(agent_save_folder, exist_ok=True)

np.random.seed(seed)
window_size = config["model"]["window_size"]
data_path = config["data"]["data_path"] 
tickers = config["data"]["tickers"]
time_frame = config["data"]["time_frame"]
start_date, end_date = config["model"]["start_date"] , config["model"]["end_date"] 

In [None]:
data_path

'/content/data'

In [None]:
data_source_goog = DataSource(
    data_path=data_path,
    start_date=start_date,
    end_date=end_date,
    time_frame=time_frame,
    tickers=["GOOG"],
    window_size=window_size,
)


INFO:__main__:loading data for GOOG...
INFO:__main__:got data for GOOG...
INFO:__main__:loading data for SPY...
INFO:__main__:got data for SPY...


In [None]:
goog = data_source_goog.data

In [None]:
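def preprocess(df, ticker):
    # Split chronologically 80/20, then normalize each split with its own
    # 30-row rolling mean and standard deviation.
    df.set_index("date_time", inplace=True)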
    # df_train = df.iloc[:int(0.8*len(df)),[1,2,3,4,5,7,8,9]].copy()
    # df_test = df.iloc[int(0.8*len(df)):,[1,2,3,4,5,7,8,9]].copy()
    df_train = df.iloc[:int(0.8*len(df)),:].copy()
    df_test = df.iloc[int(0.8*len(df)):,:].copy()

    sig_train = df_train.rolling(30).std()
    sig_test = df_test.rolling(30).std()

    mu_train = df_train.rolling(30).mean()
    mu_test = df_test.rolling(30).mean()

    eps = np.finfo(np.float32).eps

    df_train_norm = ((df_train - mu_train.shift())/(sig_train + eps)).dropna()
    df_test_norm = ((df_test - mu_test.shift())/(sig_test + eps)).dropna()

    df_train_norm.columns = [col+"_norm" for col in df_train_norm.columns]
    df_test_norm.columns = [col+"_norm" for col in df_test_norm.columns]

    df_train_norm["adjclose_"+ticker] = df_train["adjclose_"+ticker][30:].values
    df_train_norm["adjopen_"+ticker] = df_train["adjopen_"+ticker][30:].values
    df_test_norm["adjclose_"+ticker] = df_test["adjclose_"+ticker][30:].values
    df_test_norm["adjopen_"+ticker] = df_test["adjopen_"+ticker][30:].values

    return df_train, df_test, df_train_norm, df_test_norm
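
The normalization in `preprocess` subtracts the rolling mean of the previous 30 rows (the mean is shifted by one row) and divides by the rolling standard deviation of the window ending at the current row (the standard deviation is not shifted). The sketch below reproduces that arithmetic on a plain list with a short window so it can be checked by hand; `rolling_zscore` and the sample values are illustrative assumptions, not part of the notebook.

```python
import statistics

EPS = 2.0 ** -23  # same value as np.finfo(np.float32).eps


def rolling_zscore(values, window):
    """(x[t] - mean of previous `window` values) / (std of window ending at t + EPS)."""
    # The first `window` rows have no complete lagged window and stay undefined.
    out = [None] * len(values)
    for t in range(window, len(values)):
        lagged_mean = statistics.mean(values[t - window:t])            # ends at t - 1
        current_std = statistics.stdev(values[t - window + 1:t + 1])   # ends at t
        out[t] = (values[t] - lagged_mean) / (current_std + EPS)
    return out


# Made-up series with a window of 3 instead of 30.
z = rolling_zscore([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

With a window of 30 the first 30 rows are undefined, which is why `preprocess` slices the raw price columns with `[30:]` before attaching them. Since the train and test splits are normalized separately, the first 30 rows of the test split are dropped as well.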


In [None]:
df_train, df_test, df_train_norm, df_test_norm = preprocess(goog.copy(), "GOOG")

In [None]:
df_train.head()

| date_time | adjclose_GOOG | adjopen_GOOG | adjvolume_GOOG | ret_5_GOOG | ret_10_GOOG | rsi_GOOG | macd_GOOG | atr_GOOG |
|---|---|---|---|---|---|---|---|---|
| 2014-05-14 00:00:00+00:00 | 26.3325 | 26.65 | 23770000 | 0.033366 | 0.010235 | 100.0 | -0.608101 | 0.800339 |
| 2014-05-15 00:00:00+00:00 | 25.999 | 26.285 | 33994000 | 0.033906 | -0.002675 | 90.697465 | -0.569027 | 0.798815 |
| 2014-05-16 00:00:00+00:00 | 26.0315 | 26.0695 | 29624000 | 0.020832 | -0.023175 | 57.364132 | -0.534635 | 0.778399 |
| 2014-05-19 00:00:00+00:00 | 26.443 | 25.985 | 25486000 | -0.007278 | -0.009756 | 24.030798 | -0.505111 | 0.76636 |
| 2014-05-20 00:00:00+00:00 | 26.4885 | 26.487 | 35598000 | -0.002166 | 0.008587 | 31.528101 | -0.471024 | 0.770656 |
