<a href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL/blob/master/FinRL_StockTrading_NeurIPS_2018.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading

* **PyTorch Version**



# Content

* [1. Problem Definition](#0)
* [2. Getting Started - Load Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. Check Additional Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Environment](#4)
    * [5.1. Training & Trade Data Split](#4.1)
    * [5.2. User-defined Environment](#4.2)   
    * [5.3. Initialize Environment](#4.3)    
* [6. Implement DRL Algorithms](#5)
* [7. Backtesting Performance](#6)
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)   
    * [7.3. Baseline Stats](#6.3)   
    * [7.4. Compare to Stock Market Index](#6.4)
* [RLlib Section](#7)            

<a id='0'></a>
# Part 1. Problem Definition

This problem is to design an automated trading solution for multiple stock trading. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.

The algorithm is trained using Deep Reinforcement Learning (DRL) algorithms and the components of the reinforcement learning environment are:


* Action: The action space describes the allowed actions through which the agent interacts with the environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one share. An action can also cover multiple shares: we use an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively.

* Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. It is the change in portfolio value when action a is taken at state s and the agent arrives at new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at states s′ and s, respectively.

* State: The state space describes the observations that the agent receives from the environment. Just as a human trader needs to analyze various information before executing a trade, our trading agent observes many different features to better learn in an interactive environment.

* Environment: Dow 30 constituents


The stock data used in this case study is obtained from the Yahoo Finance API. The data contains Open-High-Low-Close prices and volume.
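
The reward definition r(s, a, s′) = v′ − v can be made concrete with a small worked example. This is a minimal sketch, not FinRL's implementation: the helper names `portfolio_value` and `reward` are hypothetical, the prices and holdings are made-up numbers, and transaction costs are ignored.

```python
# Hypothetical helpers for illustration only; all numbers are made up.
def portfolio_value(cash, prices, holdings):
    """Cash plus the market value of all held shares."""
    return cash + sum(p * h for p, h in zip(prices, holdings))


def reward(cash, prices, holdings, next_cash, next_prices, next_holdings):
    """r(s, a, s') = v' - v: change in portfolio value across one step."""
    v = portfolio_value(cash, prices, holdings)
    v_next = portfolio_value(next_cash, next_prices, next_holdings)
    return v_next - v


# State s: 1,000 in cash, 10 shares priced at 50 and 5 shares priced at 20.
# Action: buy 2 more shares of the first stock at 50 (no transaction cost).
# State s': prices move to 52 and 19.
r = reward(
    cash=1000.0, prices=[50.0, 20.0], holdings=[10, 5],
    next_cash=900.0, next_prices=[52.0, 19.0], next_holdings=[12, 5],
)
print(r)  # 19.0
```

Here v = 1000 + 500 + 100 = 1600 and v′ = 900 + 624 + 95 = 1619, so the agent receives a reward of 19 for this step.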


<a id='1'></a>
# Part 2. Getting Started - Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages through the FinRL library


In [54]:
## install finrl library
#!pip install git+https://github.com/AI4Finance-Foundation/FinRL-Library.git

In [1]:
%reload_ext autoreload
%autoreload 2


<a id='1.2'></a>
## 2.2. Check that the additional packages are present; install them if not
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI gym
* stable-baselines
* tensorflow
* pyfolio

<a id='1.3'></a>
## 2.3. Import Packages

In [2]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# matplotlib.use('Agg')
import datetime
from finrl import config
from finrl import config_tickers
import os


%matplotlib inline
from finrl.finrl_meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.finrl_meta.preprocessor.preprocessors import FeatureEngineer, data_split 
from finrl.finrl_meta.preprocessor.CryptoDataReader import CryptoDataLoader
from finrl.finrl_meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
#from finrl.finrl_meta.data_processor import DataProcessor

from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline, trx_plot


from pprint import pprint

import sys
sys.path.append("../FinRL-Library")

import itertools



In [3]:
import os
import numpy as np

from sb3_contrib import RecurrentPPO
from stable_baselines3.common.evaluation import evaluate_policy



<a id='1.4'></a>
## 2.4. Create Folders

In [4]:
root_path = 'MARKETS'
from finrl import config
from finrl import config_tickers
import os
from finrl.main import check_and_make_directories

CHOOSEN_MARKET = 'Crypto_market' 

DATA_SAVE_DIR = os.path.join(root_path, CHOOSEN_MARKET, 'DATASET')
TRAINED_MODEL_DIR = os.path.join(root_path, CHOOSEN_MARKET, 'TRAINED_MODEL_DIR')
TENSORBOARD_LOG_DIR = os.path.join(root_path, CHOOSEN_MARKET, 'TENSORBOARD_LOG_DIR')
RESULTS_DIR = os.path.join(root_path, CHOOSEN_MARKET, 'RESULTS_DIR')

check_and_make_directories([DATA_SAVE_DIR, TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR])

# Part 3: Download Data



-----
class YahooDownloader:
    Provides methods for retrieving daily stock data from
    Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
    fetch_data()
        Fetches data from yahoo API


## 3.1 Use the Yahoo Finance API to get Dow Jones data

In [None]:
df = YahooDownloader(start_date = datetime.datetime(2009, 1, 1),
                     end_date = datetime.datetime.now(),
                     ticker_list = config_tickers.DOW_30_TICKER).fetch_data()

In [6]:
print(config_tickers.DOW_30_TICKER)

['AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PG', 'TRV', 'UNH', 'CRM', 'VZ', 'V', 'WBA', 'WMT', 'DIS', 'DOW']


In [7]:
df.shape

(100570, 8)

In [8]:
df.sort_values(['date','tic'],ignore_index=True).head()

Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2008-12-31,3.070357,3.133571,3.047857,2.602663,607541200,AAPL,2
1,2008-12-31,57.110001,58.220001,57.060001,43.587841,6287200,AMGN,2
2,2008-12-31,17.969999,18.75,17.91,14.85288,9625600,AXP,2
3,2008-12-31,41.59,43.049999,41.5,32.00589,5443100,BA,2
4,2008-12-31,43.700001,45.099998,43.700001,30.416967,6277400,CAT,2


In [12]:
data = df.copy()

## 3.2 Use the Finnhub API to get forex market data

#### Read all symbols' data and save it as CSV files

In [5]:
import os
forex_data_path = '/mnt/c/Users/Lenovo/financial_projects/Deep Reinforcement Learning Approaches on Stock Prediction/FinRL/forex_data'

In [None]:
import finnhub
import pandas as pd
finnhub_client = finnhub.Client(api_key='cbj22uqad3i2thcmtg80')
forex_symbols = finnhub_client.forex_symbols(exchange='oanda')
#finnhub_client.stock_candles('')

In [None]:
def read_and_save_forex_candles(symbol_name, stock_df):
    """Tag the candle DataFrame with its symbol and write it to forex_data_path/<symbol>.csv."""
    stock_df['symbol'] = symbol_name  # the scalar is broadcast to every row
    stock_df.to_csv(os.path.join(forex_data_path, symbol_name + '.csv'))


In [None]:
for i, symbol in enumerate(forex_symbols, start=1):
    print(i)
    # Only symbols after the first 120 are fetched in this run.
    if i > 120:
        holcv_symbol = finnhub_client.stock_candles(symbol = symbol['symbol'], resolution = '60', _from = 1577824200, to = 1659250244)
        holcv_symbol_df = pd.DataFrame(holcv_symbol)
        read_and_save_forex_candles(symbol['symbol'], holcv_symbol_df)


    

1
2
3
...
120
121
122


#### Read each symbol's data from its CSV file and merge them

In [None]:
forex_market_dataset = pd.DataFrame() 
for symbol in forex_symbols :
    forex_symbol_data = pd.read_csv(os.path.join(forex_data_path, symbol['symbol'] + '.csv'), index_col= [0])
    forex_market_dataset = pd.concat([forex_market_dataset, forex_symbol_data])

In [None]:
forex_market_dataset

Unnamed: 0,c,h,l,o,s,t,v,symbol
0,3419.60000,3423.70000,3412.60000,3420.60000,ok,1656655200,463,OANDA:EU50_EUR
1,3438.00000,3444.90000,3401.90000,3419.60000,ok,1656658800,1691,OANDA:EU50_EUR
2,3456.00000,3462.90000,3428.90000,3440.00000,ok,1656662400,794,OANDA:EU50_EUR
3,3463.90000,3466.00000,3450.90000,3456.90000,ok,1656666000,685,OANDA:EU50_EUR
4,3441.00000,3465.90000,3439.90000,3464.90000,ok,1656669600,468,OANDA:EU50_EUR
...,...,...,...,...,...,...,...,...
490,0.69773,0.69848,0.69680,0.69834,ok,1659110400,3036,OANDA:AUD_USD
491,0.69771,0.69784,0.69713,0.69770,ok,1659114000,1886,OANDA:AUD_USD
492,0.69951,0.69952,0.69762,0.69771,ok,1659117600,1877,OANDA:AUD_USD
493,0.69864,0.69978,0.69837,0.69950,ok,1659121200,2004,OANDA:AUD_USD


#### Clean the forex_symbols dataframe

In [None]:
forex_market_dataset.rename(columns={'c':'close','h':'high','l':'low','o':'open','t':'date','v':'volume', 'symbol':'tic'}, inplace= True)

In [None]:
import datetime

dates = []
for unixdate in forex_market_dataset['date']:
    # fromtimestamp() without a tz argument returns the machine's local time, not UTC
    date_str = datetime.datetime.fromtimestamp(unixdate).strftime('%Y-%m-%dT%H:%M:%S')
    dates.append(date_str)

In [None]:
dates

['2022-07-01T10:30:00',
 '2022-07-01T11:30:00',
 '2022-07-01T12:30:00',
 '2022-07-01T13:30:00',
 '2022-07-01T14:30:00',
 '2022-07-01T15:30:00',
 '2022-07-01T16:30:00',
 '2022-07-01T17:30:00',
 '2022-07-01T18:30:00',
 '2022-07-01T19:30:00',
 '2022-07-01T20:30:00',
 '2022-07-01T21:30:00',
 '2022-07-01T22:30:00',
 '2022-07-01T23:30:00',
 '2022-07-04T04:30:00',
 '2022-07-04T05:30:00',
 '2022-07-04T06:30:00',
 '2022-07-04T07:30:00',
 '2022-07-04T08:30:00',
 '2022-07-04T09:30:00',
 '2022-07-04T10:30:00',
 '2022-07-04T11:30:00',
 '2022-07-04T12:30:00',
 '2022-07-04T13:30:00',
 '2022-07-04T14:30:00',
 '2022-07-04T15:30:00',
 '2022-07-04T16:30:00',
 '2022-07-04T17:30:00',
 '2022-07-04T18:30:00',
 '2022-07-04T19:30:00',
 '2022-07-04T20:30:00',
 '2022-07-04T21:30:00',
 '2022-07-04T22:30:00',
 '2022-07-04T23:30:00',
 '2022-07-05T04:30:00',
 '2022-07-05T05:30:00',
 '2022-07-05T06:30:00',
 '2022-07-05T07:30:00',
 '2022-07-05T08:30:00',
 '2022-07-05T09:30:00',
 '2022-07-05T10:30:00',
 '2022-07-05T11:

In [None]:
forex_market_dataset['date'] = dates

In [None]:
forex_market_dataset

Unnamed: 0,close,high,low,open,s,date,volume,tic
0,3419.60000,3423.70000,3412.60000,3420.60000,ok,2022-07-01T10:30:00,463,OANDA:EU50_EUR
1,3438.00000,3444.90000,3401.90000,3419.60000,ok,2022-07-01T11:30:00,1691,OANDA:EU50_EUR
2,3456.00000,3462.90000,3428.90000,3440.00000,ok,2022-07-01T12:30:00,794,OANDA:EU50_EUR
3,3463.90000,3466.00000,3450.90000,3456.90000,ok,2022-07-01T13:30:00,685,OANDA:EU50_EUR
4,3441.00000,3465.90000,3439.90000,3464.90000,ok,2022-07-01T14:30:00,468,OANDA:EU50_EUR
...,...,...,...,...,...,...,...,...
490,0.69773,0.69848,0.69680,0.69834,ok,2022-07-29T20:30:00,3036,OANDA:AUD_USD
491,0.69771,0.69784,0.69713,0.69770,ok,2022-07-29T21:30:00,1886,OANDA:AUD_USD
492,0.69951,0.69952,0.69762,0.69771,ok,2022-07-29T22:30:00,1877,OANDA:AUD_USD
493,0.69864,0.69978,0.69837,0.69950,ok,2022-07-29T23:30:00,2004,OANDA:AUD_USD


In [None]:
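# sort_values returns a new DataFrame; the result is not assigned here,
# so forex_market_dataset keeps its original row order.
forex_market_dataset.sort_values(['date', 'tic'], ignore_index = True)
forex_market_dataset.drop(columns='s', inplace = True)
forex_market_dataset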

Unnamed: 0,close,high,low,open,date,volume,tic
0,3419.60000,3423.70000,3412.60000,3420.60000,2022-07-01T10:30:00,463,OANDA:EU50_EUR
1,3438.00000,3444.90000,3401.90000,3419.60000,2022-07-01T11:30:00,1691,OANDA:EU50_EUR
2,3456.00000,3462.90000,3428.90000,3440.00000,2022-07-01T12:30:00,794,OANDA:EU50_EUR
3,3463.90000,3466.00000,3450.90000,3456.90000,2022-07-01T13:30:00,685,OANDA:EU50_EUR
4,3441.00000,3465.90000,3439.90000,3464.90000,2022-07-01T14:30:00,468,OANDA:EU50_EUR
...,...,...,...,...,...,...,...
490,0.69773,0.69848,0.69680,0.69834,2022-07-29T20:30:00,3036,OANDA:AUD_USD
491,0.69771,0.69784,0.69713,0.69770,2022-07-29T21:30:00,1886,OANDA:AUD_USD
492,0.69951,0.69952,0.69762,0.69771,2022-07-29T22:30:00,1877,OANDA:AUD_USD
493,0.69864,0.69978,0.69837,0.69950,2022-07-29T23:30:00,2004,OANDA:AUD_USD


## 3.3 Use Finnhub to get crypto market data

In [5]:
api_key='cbj22uqad3i2thcmtg80'
choosen_symbols = ['BTC', 'ETH', 'USDT', 'USDC', 'BNB', 'XRP', 'ADA', 'BUSD', 'SOL', 'DOT']

In [None]:
data_loader = CryptoDataLoader(symbols_list= choosen_symbols, api_key=api_key)

In [None]:
crypto_df = data_loader.load_crypto_candles()

In [None]:
crypto_df

Unnamed: 0,c,h,l,o,s,t,v,symbol
0,0.06921,0.10000,0.01444,0.01444,ok,1586476800,3695792.1,BINANCE:SOLBNB
1,0.05773,0.07609,0.05568,0.06921,ok,1586563200,1792845.2,BINANCE:SOLBNB
2,0.06188,0.06589,0.05464,0.05756,ok,1586649600,1036908.3,BINANCE:SOLBNB
3,0.05171,0.06188,0.05160,0.06188,ok,1586736000,476803.6,BINANCE:SOLBNB
4,0.04272,0.05264,0.04040,0.05171,ok,1586822400,617685.5,BINANCE:SOLBNB
...,...,...,...,...,...,...,...,...
995,1.23700,1.33100,1.23700,1.31300,ok,1639612800,1450240.9,BINANCE:ADAUSDC
996,1.22200,1.25800,1.18300,1.23900,ok,1639699200,3451131.2,BINANCE:ADAUSDC
997,1.24300,1.26800,1.19900,1.21900,ok,1639785600,658339.6,BINANCE:ADAUSDC
998,1.24400,1.31100,1.24100,1.24200,ok,1639872000,1262579.4,BINANCE:ADAUSDC


In [None]:
crypto_symbols_df = data_loader.cleansing_holcv_dataframes(crypto_df)

In [None]:
dates = []
import datetime
for unixdate in crypto_symbols_df['date']:
    utcdate = datetime.datetime.fromtimestamp(unixdate).strftime('%Y-%m-%d')
    dates.append(utcdate)
crypto_symbols_df['date'] = dates

In [None]:
data = crypto_symbols_df.copy()
data  = data_loader.crypto_clean_data(data)

Unnamed: 0,close,high,low,open,date,volume,tic
0,0.06921,0.10000,0.01444,0.01444,2020-04-10,3695792.1,BINANCE:SOLBNB
1,0.05773,0.07609,0.05568,0.06921,2020-04-11,1792845.2,BINANCE:SOLBNB
2,0.06188,0.06589,0.05464,0.05756,2020-04-12,1036908.3,BINANCE:SOLBNB
3,0.05171,0.06188,0.05160,0.06188,2020-04-13,476803.6,BINANCE:SOLBNB
4,0.04272,0.05264,0.04040,0.05171,2020-04-14,617685.5,BINANCE:SOLBNB
...,...,...,...,...,...,...,...
995,1.23700,1.33100,1.23700,1.31300,2021-12-16,1450240.9,BINANCE:ADAUSDC
996,1.22200,1.25800,1.18300,1.23900,2021-12-17,3451131.2,BINANCE:ADAUSDC
997,1.24300,1.26800,1.19900,1.21900,2021-12-18,658339.6,BINANCE:ADAUSDC
998,1.24400,1.31100,1.24100,1.24200,2021-12-19,1262579.4,BINANCE:ADAUSDC


In [None]:
data.to_csv(os.path.join(DATA_SAVE_DIR,'Dataset.csv'))


In [6]:
data_name = 'Dataset.csv'
data  = pd.read_csv(os.path.join(DATA_SAVE_DIR,data_name))
data.drop(columns=['Unnamed: 0'], inplace=True)  # drop the index column written by to_csv

## 3.4 Use the Yahoo Finance API to get S&P 500 data

In [None]:
import requests
import bs4 as bs
import pickle

def save_sp500_tickers():
    """Scrape the S&P 500 ticker list from Wikipedia and cache it with pickle."""
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.find_all('tr')[1:]:  # skip the header row
        # the first cell holds the ticker; strip the trailing newline
        ticker = row.find_all('td')[0].text.strip()
        tickers.append(ticker)
    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)
    return tickers

In [None]:
sp_tickers = save_sp500_tickers()
data = YahooDownloader(start_date = datetime.datetime(2016, 1, 1),
                     end_date = datetime.datetime.now(),
                     ticker_list = sp_tickers).fetch_data()

In [86]:
data = pd.DataFrame(data)
data.to_csv(os.path.join(DATA_SAVE_DIR,'sp500.csv'))

In [5]:
data = pd.read_csv(os.path.join(DATA_SAVE_DIR,'sp500.csv'))
data.drop(columns=['Unnamed: 0'], inplace=True)  # drop the index column written by to_csv

# Part 4: Preprocess Data
Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and do feature engineering in order to convert the data into a model-ready state.
* Add technical indicators. In practical trading, various information needs to be taken into account, for example historical stock prices, current holdings, and technical indicators. In this article, we demonstrate two trend-following technical indicators: MACD and RSI.
* Add turbulence index. Risk aversion reflects whether an investor prefers to preserve capital. It also influences one's trading strategy when facing different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the financial turbulence index, which measures extreme asset price fluctuations.
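
To make the first indicator named above more tangible, here is a minimal sketch of the MACD line computed with pandas. The function name `macd_line` is hypothetical, the 12/26 spans are the conventional defaults rather than values taken from this notebook, and the result is not guaranteed to match what `FeatureEngineer` produces, since the smoothing details may differ.

```python
import pandas as pd


def macd_line(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """MACD line: fast EMA minus slow EMA of the closing price."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow


# Toy data: a steadily rising price series (assumption, not market data).
close = pd.Series([float(p) for p in range(100, 140)])
macd = macd_line(close)

# On the first bar both EMAs equal the price, so the MACD starts at zero;
# in an uptrend the fast EMA stays above the slow EMA, so the MACD is positive.
print(macd.iloc[0], bool((macd.iloc[1:] > 0).all()))
```

A positive MACD signals that recent prices are running above their longer-term average, which is why it is treated as a trend-following feature.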

In [7]:
fe = FeatureEngineer(
                    use_technical_indicator = True,
                    tech_indicator_list =  config.INDICATORS,
                    use_vix= True ,
                    use_turbulence=True,
                    user_defined_feature = False)

processed = fe.preprocess_data(data)


[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (487, 8)
Successfully added vix




Successfully added turbulence index
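
The turbulence index mentioned in the preprocessing notes is commonly defined as a Mahalanobis-style distance, d_t = (y_t − μ) Σ⁻¹ (y_t − μ)′, where y_t is the vector of asset returns on day t and μ and Σ are the mean and covariance of historical returns. The sketch below illustrates that formula on synthetic data; the function name `turbulence` and the random returns are assumptions, and FinRL's own windowing and filtering choices are not reproduced here.

```python
import numpy as np


def turbulence(current_returns, historical_returns):
    """Distance of today's returns from their historical distribution.

    current_returns: shape (n_assets,)
    historical_returns: shape (n_days, n_assets)
    """
    hist = np.asarray(historical_returns, dtype=float)
    y = np.asarray(current_returns, dtype=float)
    mu = hist.mean(axis=0)
    cov = np.cov(hist, rowvar=False)
    diff = y - mu
    # pinv guards against a singular covariance matrix
    return float(diff @ np.linalg.pinv(cov) @ diff)


rng = np.random.default_rng(0)
history = rng.normal(0.0, 0.01, size=(250, 3))  # synthetic daily returns
calm_day = history.mean(axis=0)                  # exactly the historical mean
shock_day = calm_day + np.array([0.08, -0.07, 0.09])

print(turbulence(calm_day, history), turbulence(shock_day, history))
```

A day whose returns sit at the historical mean scores zero, while a day with large, unusual moves scores high; an environment can use a threshold on this value to stop buying during extreme conditions.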


In [8]:
processed

Unnamed: 0,close,high,low,open,date,volume,tic,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,0.002141,0.002163,0.002121,0.002134,2019-10-21,1942025.0,BINANCE:ADABNB,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.002141,0.002141,14.00,0.00000
1,0.000005,0.000005,0.000005,0.000005,2019-10-21,77735781.0,BINANCE:ADABTC,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.000005,0.000005,14.00,0.00000
2,0.000225,0.000226,0.000221,0.000225,2019-10-21,5301873.0,BINANCE:ADAETH,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.000225,0.000225,14.00,0.00000
3,0.039250,0.039540,0.038540,0.039440,2019-10-21,1540760.8,BINANCE:ADAUSDC,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.039250,0.039250,14.00,0.00000
4,0.039040,0.039490,0.038390,0.039460,2019-10-21,86919604.0,BINANCE:ADAUSDT,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.039040,0.039040,14.00,0.00000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12170,0.000022,0.000022,0.000022,0.000022,2021-09-24,44863051.0,BINANCE:XRPBTC,-4.539272e-07,0.000026,0.000021,48.163652,-117.877633,8.744270,0.000024,0.000023,17.75,6.24007
12171,0.944700,1.003200,0.886500,1.002000,2021-09-24,57552288.0,BINANCE:XRPBUSD,-4.283220e-02,1.310501,0.847159,47.181216,-144.829265,26.002737,1.115060,1.038177,17.75,6.24007
12172,0.000322,0.000326,0.000316,0.000318,2021-09-24,5815389.0,BINANCE:XRPETH,-4.566231e-06,0.000343,0.000300,49.132923,-38.514564,4.883908,0.000328,0.000325,17.75,6.24007
12173,0.944900,1.002200,0.885100,1.000000,2021-09-24,6952389.0,BINANCE:XRPUSDC,-4.268827e-02,1.308576,0.848214,47.181044,-145.588074,25.820468,1.114743,1.038108,17.75,6.24007


In [9]:
processed.dropna(axis=1)

Unnamed: 0,close,high,low,open,date,volume,tic,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,0.002141,0.002163,0.002121,0.002134,2019-10-21,1942025.0,BINANCE:ADABNB,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.002141,0.002141,14.00,0.00000
1,0.000005,0.000005,0.000005,0.000005,2019-10-21,77735781.0,BINANCE:ADABTC,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.000005,0.000005,14.00,0.00000
2,0.000225,0.000226,0.000221,0.000225,2019-10-21,5301873.0,BINANCE:ADAETH,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.000225,0.000225,14.00,0.00000
3,0.039250,0.039540,0.038540,0.039440,2019-10-21,1540760.8,BINANCE:ADAUSDC,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.039250,0.039250,14.00,0.00000
4,0.039040,0.039490,0.038390,0.039460,2019-10-21,86919604.0,BINANCE:ADAUSDT,0.000000e+00,0.002164,0.002093,0.000000,66.666667,100.000000,0.039040,0.039040,14.00,0.00000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12170,0.000022,0.000022,0.000022,0.000022,2021-09-24,44863051.0,BINANCE:XRPBTC,-4.539272e-07,0.000026,0.000021,48.163652,-117.877633,8.744270,0.000024,0.000023,17.75,6.24007
12171,0.944700,1.003200,0.886500,1.002000,2021-09-24,57552288.0,BINANCE:XRPBUSD,-4.283220e-02,1.310501,0.847159,47.181216,-144.829265,26.002737,1.115060,1.038177,17.75,6.24007
12172,0.000322,0.000326,0.000316,0.000318,2021-09-24,5815389.0,BINANCE:XRPETH,-4.566231e-06,0.000343,0.000300,49.132923,-38.514564,4.883908,0.000328,0.000325,17.75,6.24007
12173,0.944900,1.002200,0.885100,1.000000,2021-09-24,6952389.0,BINANCE:XRPUSDC,-4.268827e-02,1.308576,0.848214,47.181044,-145.588074,25.820468,1.114743,1.038108,17.75,6.24007


In [10]:
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))

processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])

processed_full = processed_full.fillna(0)

In [11]:
processed_full.sort_values(['date','tic'],ignore_index=True).tail(29)

Unnamed: 0,date,tic,close,high,low,open,volume,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
12146,2021-09-23,BINANCE:XRPBUSD,1.002,1.0172,0.9684,1.0036,38622300.0,-0.0388847,1.329823,0.858917,49.018979,-111.684122,15.8526,1.12271,1.032827,18.629999,4.943331
12147,2021-09-23,BINANCE:XRPETH,0.000318,0.000329,0.000313,0.000326,4994810.0,-5.277261e-06,0.000343,0.0003,48.176868,-52.245069,4.883908,0.000329,0.000325,18.629999,4.943331
12148,2021-09-23,BINANCE:XRPUSDC,1.0,1.0164,0.9685,1.0034,5366165.0,-0.03874903,1.32808,0.85978,48.953416,-112.583238,15.593849,1.12235,1.032757,18.629999,4.943331
12149,2021-09-23,BINANCE:XRPUSDT,1.0016,1.017,0.9676,1.0033,409673900.0,-0.03876705,1.328918,0.859222,49.014088,-112.081677,16.695392,1.122353,1.032418,18.629999,4.943331
12150,2021-09-24,BINANCE:ADABNB,0.006411,0.006648,0.006028,0.006068,4747376.0,0.0001161221,0.006397,0.005444,60.612147,194.790314,28.730114,0.005904,0.00531,17.75,6.24007
12151,2021-09-24,BINANCE:ADABTC,5.3e-05,5.5e-05,5e-05,5.2e-05,123244400.0,-3.749818e-07,5.8e-05,4.7e-05,54.842591,-27.797538,7.973718,5.4e-05,4.8e-05,17.75,6.24007
12152,2021-09-24,BINANCE:ADAETH,0.000778,0.000817,0.000732,0.000738,11561320.0,-6.291115e-07,0.000789,0.000658,55.575027,25.537406,33.261016,0.000754,0.000686,17.75,6.24007
12153,2021-09-24,BINANCE:ADAUSDC,2.278,2.347,2.06,2.329,7531239.0,-0.06842359,2.860438,1.989262,50.847043,-94.783863,29.242234,2.56,2.201948,17.75,6.24007
12154,2021-09-24,BINANCE:ADAUSDT,2.277,2.345,2.063,2.327,382004500.0,-0.06889023,2.860247,1.989453,50.840693,-94.723161,29.788195,2.559833,2.201883,17.75,6.24007
12155,2021-09-24,BINANCE:BNBBTC,0.008294,0.008572,0.00814,0.008549,220028.1,-0.0002314059,0.009533,0.008156,43.145741,-108.303501,22.616339,0.009206,0.008907,17.75,6.24007


<a id='4'></a>
# Part 5. Design Environment
Considering the stochastic and interactive nature of automated stock trading, a financial task is modeled as a **Markov Decision Process (MDP)** problem. The training process involves observing stock price changes, taking an action, and calculating the reward so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.

Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.

The action space describes the allowed actions through which the agent interacts with the environment. Normally, an action a takes one of three values {-1, 0, 1}, where -1, 0, 1 represent selling, holding, and buying one share. An action can also cover multiple shares: we use an action space {-k, …, -1, 0, 1, …, k}, where k denotes the number of shares to buy and -k denotes the number of shares to sell. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or -10, respectively. The continuous action space needs to be normalized to [-1, 1], since the policy is defined on a Gaussian distribution, which needs to be normalized and symmetric.
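
The mapping between the normalized range [-1, 1] and share counts in {-k, …, k} can be sketched as a linear scaling. This is an illustration under stated assumptions: `scale_actions` is a hypothetical helper, the scaling factor plays the role of the `hmax` setting used in the environment configuration, and truncation toward zero is assumed for turning the result into whole shares.

```python
def scale_actions(raw_actions, hmax):
    """Map policy outputs in [-1, 1] to signed share counts in [-hmax, hmax]."""
    shares = []
    for a in raw_actions:
        a = max(-1.0, min(1.0, a))    # clip to the normalized range
        shares.append(int(a * hmax))  # truncate toward zero
    return shares


# One entry per asset: buy 10, sell 10, hold, and an out-of-range value.
print(scale_actions([0.1, -0.1, 0.0, 1.7], hmax=100))  # [10, -10, 0, 100]
```

With `hmax=100`, a policy output of 0.1 corresponds to "buy 10 shares" and -0.1 to "sell 10 shares", matching the example in the text.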

In [13]:
processed_full['date'].min()

'2019-10-21'

## Training data and Trading data split

In [14]:
# from config.py TRAIN_START_DATE is a string
#config.TRAIN_START_DATE
train_start_date = datetime.datetime(2019,10,21).strftime('%Y-%m-%d')
# from config.py TRAIN_END_DATE is a string
train_end_date = datetime.datetime(2021,4,24).strftime('%Y-%m-%d')
trade_end_date = datetime.datetime(2021,9,24).strftime('%Y-%m-%d')
train = data_split(processed_full, train_start_date ,train_end_date)
trade = data_split(processed_full, train_end_date ,trade_end_date)
print(len(train))
print(len(trade))

9500
2650
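
The split above passes `train_end_date` as both the end of the training window and the start of the trading window. The sketch below shows one way such a split can avoid overlap, using a half-open interval [start, end) and a per-date integer index. `split_by_date` is a hypothetical re-implementation written for illustration; the half-open behavior and the repeated index values are inferred from the outputs in this notebook, not taken from FinRL's source.

```python
import pandas as pd


def split_by_date(df, start, end, date_col="date"):
    """Rows with start <= date < end, sorted, with one index value per date."""
    out = df[(df[date_col] >= start) & (df[date_col] < end)]
    out = out.sort_values([date_col, "tic"], ignore_index=True)
    # All tickers on the same day share the same integer index.
    out.index = out[date_col].factorize()[0]
    return out


# Toy frame (assumption): two tickers over three days, ISO date strings.
toy = pd.DataFrame({
    "date": ["2021-04-22", "2021-04-22", "2021-04-23", "2021-04-23", "2021-04-24"],
    "tic": ["B", "A", "B", "A", "A"],
    "close": [2.0, 1.0, 2.1, 1.1, 1.2],
})
train_toy = split_by_date(toy, "2021-04-22", "2021-04-24")
trade_toy = split_by_date(toy, "2021-04-24", "2021-04-25")
print(train_toy.index.tolist(), trade_toy.index.tolist())  # [0, 0, 1, 1] [0]
```

Because the end date is excluded, the row dated 2021-04-24 lands only in the trading set, and the shared index lets an environment select every ticker for a given day with a single lookup.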


In [15]:
train.tail()

Unnamed: 0,date,tic,close,high,low,open,volume,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
379,2021-04-23,BINANCE:XRPBTC,2.3e-05,2.3e-05,1.8e-05,2.2e-05,393793000.0,3e-06,3.2e-05,1.3e-05,60.738954,38.157296,0.311254,1.8e-05,1.4e-05,17.33,27.440439
379,2021-04-23,BINANCE:XRPBUSD,1.17175,1.19752,0.8859,1.15573,189464700.0,0.145651,1.929796,0.673339,56.544387,7.484767,14.447515,1.055094,0.760606,17.33,27.440439
379,2021-04-23,BINANCE:XRPETH,0.000495,0.000508,0.000415,0.000482,25737020.0,4.5e-05,0.000813,0.000345,52.997886,-6.297251,3.230725,0.000488,0.000384,17.33,27.440439
379,2021-04-23,BINANCE:XRPUSDC,1.17511,1.196,0.8846,1.15559,17865540.0,0.14609,1.930624,0.673937,56.587112,7.362434,14.037158,1.05556,0.760798,17.33,27.440439
379,2021-04-23,BINANCE:XRPUSDT,1.17073,1.19724,0.88601,1.15489,2651342000.0,0.145328,1.928916,0.673397,56.516724,7.441843,14.111294,1.054794,0.760397,17.33,27.440439


In [16]:
trade.head()

Unnamed: 0,date,tic,close,high,low,open,volume,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2021-04-26,BINANCE:ADABNB,0.002315,0.002348,0.002151,0.002162,9251394.0,-0.000391038,0.003048,0.001932,38.958367,-75.731894,41.409773,0.002896,0.003792,17.639999,26.923526
0,2021-04-26,BINANCE:ADABTC,2.3e-05,2.3e-05,2.2e-05,2.2e-05,61939840.0,3.839654e-07,2.4e-05,2e-05,55.875077,97.441412,3.761324,2.2e-05,2.2e-05,17.639999,26.923526
0,2021-04-26,BINANCE:ADAETH,0.000489,0.000498,0.000463,0.00047,5843893.0,-3.609651e-05,0.000652,0.000459,43.633933,-161.045583,25.493666,0.000574,0.000637,17.639999,26.923526
0,2021-04-26,BINANCE:ADAUSDC,1.2373,1.2524,1.0806,1.0932,3501569.0,-0.01380237,1.487651,1.029315,53.334135,-44.771478,21.340004,1.238175,1.206592,17.639999,26.923526
0,2021-04-26,BINANCE:ADAUSDT,1.2375,1.2509,1.0793,1.091,473369900.0,-0.01342379,1.485886,1.032705,53.312587,-46.223462,20.622522,1.238797,1.206566,17.639999,26.923526


In [17]:
config.INDICATORS

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']

In [18]:
stock_dimension = len(train.tic.unique())
state_space = 1 + len(config.INDICATORS)*stock_dimension + 2*stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")


Stock Dimension: 25, State Space: 251
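
The state-space size follows from counting the entries of one observation: one cash balance, one price and one holding per asset, and one value per indicator per asset. The sketch below rebuilds that count with a hypothetical `build_state` helper; the ordering of the entries is an assumption made for illustration, and the placeholder values are not market data.

```python
def build_state(cash, prices, holdings, indicators):
    """Flatten one day's observation into a single vector.

    indicators: dict mapping indicator name -> list of per-asset values.
    """
    state = [cash] + list(prices) + list(holdings)
    for values in indicators.values():
        state.extend(values)
    return state


n_assets = 25
indicator_names = ["macd", "boll_ub", "boll_lb", "rsi_30",
                   "cci_30", "dx_30", "close_30_sma", "close_60_sma"]
state = build_state(
    cash=1_000_000,
    prices=[1.0] * n_assets,      # placeholder prices
    holdings=[0] * n_assets,      # no shares held initially
    indicators={name: [0.0] * n_assets for name in indicator_names},
)
print(len(state))  # 1 + 2*25 + 8*25 = 251
```

If the number of tickers or indicators in the data changes, the declared `state_space` has to change with it, otherwise the observation vector no longer fits the declared shape.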


In [19]:
buy_cost_list = sell_cost_list = [0.001] * stock_dimension
num_stock_shares = [0] * stock_dimension

env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4
}


e_train_gym = StockTradingEnv(df = train, **env_kwargs)

## Environment for Training



In [51]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

ValueError: could not broadcast input array from shape (291,) into shape (233,)

<a id='5'></a>
# Part 6: Implement DRL Algorithms
* The implementation of the DRL algorithms is based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines, with major structural refactoring and code cleanups.
* The FinRL library includes fine-tuned standard DRL algorithms, such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3. We also allow users to design their own DRL algorithms by adapting these DRL algorithms.

### Model Training: 5 models, A2C, DDPG, PPO, TD3, SAC


### Model 1: A2C


In [None]:
agent = DRLAgent(env = env_train)
model_a2c = agent.get_model("a2c")

{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cpu device


In [None]:
model_name = 'a2c_'
trained_a2c = agent.train_model(model=model_a2c, 
                             tb_log_name='a2c',
                             total_timesteps=50000)
trained_a2c.save(os.path.join(TRAINED_MODEL_DIR, model_name + ".pth"))

---------------------------------------
| time/                 |             |
|    fps                | 58          |
|    iterations         | 100         |
|    time_elapsed       | 8           |
|    total_timesteps    | 500         |
| train/                |             |
|    entropy_loss       | -96.5       |
|    explained_variance | -52.8       |
|    learning_rate      | 0.0007      |
|    n_updates          | 99          |
|    policy_loss        | -16.7       |
|    reward             | 0.044553936 |
|    std                | 1.02        |
|    value_loss         | 0.0917      |
---------------------------------------
----------------------------------------
| time/                 |              |
|    fps                | 69           |
|    iterations         | 200          |
|    time_elapsed       | 14           |
|    total_timesteps    | 1000         |
| train/                |              |
|    entropy_loss       | -97.4        |
|    explained_variance | -9.89 

### Model 2: DDPG

In [43]:
from stable_baselines3.common.utils import get_schedule_fn
agent = DRLAgent(env = env_train)
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50, "learning_rate": 0.1}
model_ddpg = agent.get_model("ddpg",model_kwargs= DDPG_PARAMS,  tensorboard_log = TENSORBOARD_LOG_DIR)

{'batch_size': 128, 'buffer_size': 50, 'learning_rate': 0.1}
Using cpu device


In [39]:
model_name  = 'DDPG_'
total_timesteps = 50000
trained_ddpg = agent.train_model(model=model_ddpg, 
                             tb_log_name='ddpg',
                             total_timesteps=total_timesteps)
trained_ddpg.save(os.path.join(TRAINED_MODEL_DIR, model_name + str(total_timesteps) + ".pth"))

Logging to Crypto_market/TENSORBOARD_LOG_DIR/ddpg_3
---------------------------------
| time/              |          |
|    episodes        | 4        |
|    fps             | 39       |
|    time_elapsed    | 38       |
|    total_timesteps | 1508     |
| train/             |          |
|    actor_loss      | -2.39    |
|    critic_loss     | 15.2     |
|    learning_rate   | 0.1      |
|    n_updates       | 1131     |
|    reward          | 1.401983 |
---------------------------------
day: 376, episode: 80
begin_total_asset: 1000000.00
end_total_asset: 999001.00
total_reward: -999.00
total_cost: 999.00
total_trades: 4512
Sharpe: 1.001
---------------------------------
| time/              |          |
|    episodes        | 8        |
|    fps             | 33       |
|    time_elapsed    | 89       |
|    total_timesteps | 3016     |
| train/             |          |
|    actor_loss      | -3.93    |
|    critic_loss     | 15.5     |
|    learning_rate   | 0.1      |
|    n_update

KeyboardInterrupt: 

In [23]:
TENSORBOARD_LOG_DIR

'Crypto_market/TENSORBOARD_LOG_DIR'

### Model 3: PPO

In [19]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = config.PPO_PARAMS
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS, tensorboard_log= TENSORBOARD_LOG_DIR)

{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 64}
Using cpu device


In [20]:
model_ppo.device

device(type='cpu')

In [21]:
model_name  = 'ppo_'
model_version = '50000'
trained_ppo = agent.train_model(model=model_ppo, 
                             tb_log_name='ppo',
                             total_timesteps=50000)
trained_ppo.save(os.path.join(TRAINED_MODEL_DIR, model_name + ".pth"))

Logging to MARKETS/Sp500_market/TENSORBOARD_LOG_DIR/ppo_3
-----------------------------------
| time/              |            |
|    fps             | 7          |
|    iterations      | 1          |
|    time_elapsed    | 271        |
|    total_timesteps | 2048       |
| train/             |            |
|    reward          | 0.49223363 |
-----------------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 7          |
|    iterations           | 2          |
|    time_elapsed         | 518        |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.30981565 |
|    clip_fraction        | 0.771      |
|    clip_range           | 0.2        |
|    entropy_loss         | -691       |
|    explained_variance   | -0.0256    |
|    learning_rate        | 0.00025    |
|    loss                 | -5.44      |
|    n_updates            | 10         |
|  

### Model 4: TD3

In [None]:
agent = DRLAgent(env = env_train)
TD3_PARAMS = {"batch_size": 100, 
              "buffer_size": 1000000, 
              "learning_rate": 0.001}

model_td3 = agent.get_model("td3",model_kwargs = TD3_PARAMS)

In [None]:
model_name ='td3_'
trained_td3 = agent.train_model(model=model_td3, 
                             tb_log_name='td3',
                             total_timesteps=30000)
trained_td3.save(os.path.join(TRAINED_MODEL_DIR, model_name + ".pth"))

### Model 5: SAC

In [None]:
agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 1000000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

model_sac = agent.get_model("sac",model_kwargs = SAC_PARAMS)

{'batch_size': 128, 'buffer_size': 1000000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device


In [None]:
model_name = 'sac_'
trained_sac = agent.train_model(model=model_sac, 
                             tb_log_name='sac',
                             total_timesteps=60000)
trained_sac.save(os.path.join(TRAINED_MODEL_DIR, model_name + ".pth"))

----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 23        |
|    time_elapsed    | 63        |
|    total_timesteps | 1508      |
| train/             |           |
|    actor_loss      | 1.22e+03  |
|    critic_loss     | 7.89e+03  |
|    ent_coef        | 0.115     |
|    ent_coef_loss   | 1.08e+03  |
|    learning_rate   | 0.0001    |
|    n_updates       | 1407      |
|    reward          | 4.9791036 |
----------------------------------
----------------------------------
| time/              |           |
|    episodes        | 8         |
|    fps             | 19        |
|    time_elapsed    | 156       |
|    total_timesteps | 3016      |
| train/             |           |
|    actor_loss      | 1.66e+03  |
|    critic_loss     | 8.25e+03  |
|    ent_coef        | 0.134     |
|    ent_coef_loss   | 1.01e+03  |
|    learning_rate   | 0.0001    |
|    n_updates       | 2915      |
|    reward         

KeyboardInterrupt: 

### Model 6: RecurrentPPO

In [45]:
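from sb3_contrib import RecurrentPPO
from stable_baselines3.common.evaluation import evaluate_policy

model = RecurrentPPO("MlpLstmPolicy", env=env_train, verbose=1)
model.learn(total_timesteps=50000)
model_version = '50000_iter_'
model_name = 'recurrent_ppo'

# Evaluate the trained policy on the training environment
env = model.get_env()
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20, warn=False)
print(mean_reward)

model.save(os.path.join(TRAINED_MODEL_DIR, model_version + model_name + ".pth"))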

Using cpu device
----------------------------
| time/              |     |
|    fps             | 121 |
|    iterations      | 1   |
|    time_elapsed    | 1   |
|    total_timesteps | 128 |
----------------------------
---------------------------------------
| time/                   |           |
|    fps                  | 34        |
|    iterations           | 2         |
|    time_elapsed         | 7         |
|    total_timesteps      | 256       |
| train/                  |           |
|    approx_kl            | 6.0098896 |
|    clip_fraction        | 0.429     |
|    clip_range           | 0.2       |
|    entropy_loss         | -23       |
|    explained_variance   | 0.016     |
|    learning_rate        | 0.0003    |
|    loss                 | 52.7      |
|    n_updates            | 10        |
|    policy_gradient_loss | 0.041     |
|    std                  | 1         |
|    value_loss           | 96.5      |
---------------------------------------
--------------------

# Trading
Assume that we have $1,000,000 of initial capital at the start of the trading period. We use the trained PPO model to trade the Binance crypto pairs in `trade`.

### Set turbulence threshold
Set the turbulence threshold to be greater than the maximum of the in-sample turbulence data. If the current turbulence index exceeds the threshold, we assume that the current market is volatile.

In [20]:
data_risk_indicator = processed_full[(processed_full.date<trade_end_date) & (processed_full.date> train_end_date)]
insample_risk_indicator = data_risk_indicator.drop_duplicates(subset=['date'])

In [21]:
insample_risk_indicator.vix.describe()

count    106.000000
mean      18.132075
std        2.200087
min       15.070000
25%       16.425000
50%       17.725000
75%       18.840000
max       27.590000
Name: vix, dtype: float64

In [22]:
insample_risk_indicator.vix.quantile(0.996)

26.80039970397949

In [23]:
insample_risk_indicator.turbulence.describe()

count    106.000000
mean      23.324069
std       47.602698
min        3.014002
25%        6.395789
50%       10.602426
75%       19.930865
max      306.349405
Name: turbulence, dtype: float64

In [24]:
insample_risk_indicator.turbulence.quantile(0.996)

305.9095537826083
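The quantile calls above can be wrapped in a small helper. The following is a minimal sketch, not FinRL's implementation: `pick_threshold` and `is_turbulent` are hypothetical names, and the linear interpolation is assumed to mirror the default behaviour of pandas' `Series.quantile`. The sample numbers are illustrative only.

```python
def pick_threshold(values, q=0.996):
    """Return the q-quantile of `values` using linear interpolation."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    pos = q * (len(data) - 1)          # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])


def is_turbulent(current, threshold):
    """Flag the market as volatile when the indicator exceeds the threshold."""
    return current > threshold


# Illustrative numbers only, not taken from the notebook's data.
sample = [3.0, 6.4, 10.6, 19.9, 306.3]
threshold = pick_threshold(sample, q=0.5)
print(threshold, is_turbulent(25.0, threshold))
```

Whichever column is passed as `risk_indicator_col`, the threshold should be chosen on the same scale as that column, otherwise the check may never trigger.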

### Trade

A DRL model needs to be updated periodically to take full advantage of the data; ideally we retrain the model yearly, quarterly, or monthly, and tune the parameters along the way. In this notebook I use only the in-sample data from 2009-01 to 2020-07 to tune the parameters once, so some alpha decay is expected as the trading period lengthens.

Numerous hyperparameters – e.g. the learning rate, the total number of samples to train on – influence the learning process and are usually determined by testing some variations.
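One way to organize periodic retraining is a walk-forward schedule: train on a fixed-length window, trade the following period, then slide both forward. The sketch below only generates the date windows; `add_months` and `rolling_windows` are hypothetical helpers, the dates are made up, and the actual training and trading calls are left out.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def rolling_windows(start, end, train_months, trade_months):
    """Yield (train_start, train_end, trade_end) tuples that walk forward in time.

    Each model is trained on [train_start, train_end) and then traded on
    [train_end, trade_end) before the window slides forward by `trade_months`.
    """
    train_start = start
    while True:
        train_end = add_months(train_start, train_months)
        trade_end = add_months(train_end, trade_months)
        if trade_end > end:
            break
        yield train_start, train_end, trade_end
        train_start = add_months(train_start, trade_months)


# Hypothetical dates: one year of training, then trade one quarter at a time.
for window in rolling_windows(date(2019, 1, 1), date(2020, 7, 1), 12, 3):
    print(window)
```

Each tuple could then be passed to a data-splitting step for the training and trading sets, with one model trained per window.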

In [25]:
#trade = data_split(processed_full, '2020-07-01','2021-10-31')
e_trade_gym = StockTradingEnv(df = trade, turbulence_threshold = 70, risk_indicator_col='vix', **env_kwargs)
# env_trade, obs_trade = e_trade_gym.get_sb_env()

In [26]:
trade.head()

Unnamed: 0,date,tic,close,high,low,open,volume,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2021-04-26,BINANCE:ADABNB,0.002315,0.002348,0.002151,0.002162,9251394.0,-0.000391038,0.003048,0.001932,38.958367,-75.731894,41.409773,0.002896,0.003792,17.639999,26.923526
0,2021-04-26,BINANCE:ADABTC,2.3e-05,2.3e-05,2.2e-05,2.2e-05,61939840.0,3.839654e-07,2.4e-05,2e-05,55.875077,97.441412,3.761324,2.2e-05,2.2e-05,17.639999,26.923526
0,2021-04-26,BINANCE:ADAETH,0.000489,0.000498,0.000463,0.00047,5843893.0,-3.609651e-05,0.000652,0.000459,43.633933,-161.045583,25.493666,0.000574,0.000637,17.639999,26.923526
0,2021-04-26,BINANCE:ADAUSDC,1.2373,1.2524,1.0806,1.0932,3501569.0,-0.01380237,1.487651,1.029315,53.334135,-44.771478,21.340004,1.238175,1.206592,17.639999,26.923526
0,2021-04-26,BINANCE:ADAUSDT,1.2375,1.2509,1.0793,1.091,473369900.0,-0.01342379,1.485886,1.032705,53.312587,-46.223462,20.622522,1.238797,1.206566,17.639999,26.923526


In [27]:
model_name = 'ppo_.pth'
train_model_path = os.path.join(TRAINED_MODEL_DIR, model_name)
trained_ppo = DRLAgent.DRL_load_from_file(model_name = 'ppo' ,
    cwd=train_model_path)

Successfully load model MARKETS/Crypto_market/TRAINED_MODEL_DIR/ppo_.pth


In [28]:
df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_ppo, 
    environment=e_trade_gym)

hit end!


In [82]:
#import os
#action_value_df = pd.merge(df_actions, df_account_value,on='date')
#action_value_df.to_csv(os.path.join(RESULTS_DIR,'ppo_100000_total_timesteps_actions_account_value.csv'))

### Trade with RecurrentPPO

We use RecurrentPPO as an alternative to FinRL's DRL agents because our environment is partially observable, so the agent needs memory. RecurrentPPO (PPO with an LSTM policy) carries a hidden state from one prediction to the next.
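To illustrate why memory matters, here is a toy sketch of the predict loop. `ToyRecurrentPolicy` is a hypothetical stand-in, not sb3_contrib's `RecurrentPPO`; it only imitates the pattern of passing a state in and getting an updated state back, with the state cleared at the start of an episode.

```python
class ToyRecurrentPolicy:
    """Hypothetical stand-in for a recurrent policy; NOT sb3_contrib's RecurrentPPO."""

    def __init__(self, decay=0.5):
        self.decay = decay

    def predict(self, obs, state=None, episode_start=False):
        # Forget the memory at the start of an episode.
        if state is None or episode_start:
            state = 0.0
        # The memory is an exponential moving average of past observations.
        state = self.decay * state + (1 - self.decay) * obs
        # Act on the remembered trend, not on the current observation alone.
        action = 1 if obs > state else -1 if obs < state else 0
        return action, state


policy = ToyRecurrentPolicy()
state, episode_start = None, True
actions = []
for obs in [1.0, 2.0, 1.0]:
    action, state = policy.predict(obs, state=state, episode_start=episode_start)
    episode_start = False
    actions.append(action)
print(actions)
```

The first and last observations are identical, yet the actions differ because the carried state differs. A memoryless policy could not make that distinction.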


In [29]:
#trade = data_split(processed_full, '2020-07-01','2021-10-31')
e_trade_gym = StockTradingEnv(df = trade,  turbulence_threshold = 70,risk_indicator_col='vix', **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

In [30]:
model_version = '50000_iter_'
model_name = 'recurrent_ppo'
model = RecurrentPPO.load(os.path.join(TRAINED_MODEL_DIR,  model_version + model_name + ".pth"))

In [31]:
account_memory = []
actions_memory = []
lstm_states = None  # LSTM hidden state, carried from one step to the next
num_envs = 1
episode_starts = np.ones((num_envs,), dtype=bool)  # reset the LSTM state on the first step
n_steps = len(e_trade_gym.df.index.unique())
obs_trade = env_trade.reset()
for i in range(n_steps):
    action, lstm_states = model.predict(obs_trade, state=lstm_states, episode_start=episode_starts, deterministic=True)
    obs_trade, rewards, dones, info = env_trade.step(action)
    episode_starts = dones
    env_trade.render()
    # Save the memories one step before the end, because the vectorized env resets itself once the episode is done
    if i == n_steps - 2:
        account_memory = env_trade.env_method(method_name="save_asset_memory")
        actions_memory = env_trade.env_method(method_name="save_action_memory")
    if dones[0]:
        print(i)
        print("hit end!")


105
hit end!


In [32]:
df_account_value = pd.DataFrame(account_memory[0])
df_actions_memory = pd.DataFrame(actions_memory[0])

In [33]:
df_account_value

Unnamed: 0,date,account_value
0,2021-04-26,1.000000e+06
1,2021-04-27,1.017711e+06
2,2021-04-28,1.014192e+06
3,2021-04-29,9.902239e+05
4,2021-04-30,1.067545e+06
...,...,...
101,2021-09-17,8.741583e+05
102,2021-09-20,7.944693e+05
103,2021-09-21,7.521426e+05
104,2021-09-22,8.052165e+05


date,BINANCE:ADABNB,BINANCE:ADABTC,BINANCE:ADAETH,BINANCE:ADAUSDC,BINANCE:ADAUSDT,BINANCE:BNBBTC,BINANCE:BNBBUSD,BINANCE:BNBETH,BINANCE:BNBUSDC,BINANCE:BNBUSDT,...,BINANCE:ETHBUSD,BINANCE:ETHUSDC,BINANCE:ETHUSDT,BINANCE:USDCUSDT,BINANCE:XRPBNB,BINANCE:XRPBTC,BINANCE:XRPBUSD,BINANCE:XRPETH,BINANCE:XRPUSDC,BINANCE:XRPUSDT
2021-04-26,0,0,0,0,100,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2021-04-27,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2021-04-28,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2021-04-29,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2021-04-30,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-09-16,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2021-09-17,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2021-09-20,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2021-09-21,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<a id='6'></a>
# Part 7: Backtest Our Strategy
Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and provides a set of individual plots that together give a comprehensive picture of a strategy's performance.

<a id='6.1'></a>
## 7.1 BackTestStats
Pass in `df_account_value`; this information is stored in the environment class.
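To make a few of the reported statistics concrete, the sketch below computes them by hand from a list of account values. It is not pyfolio's or FinRL's code: `simple_backtest_stats` is a hypothetical helper, and it assumes a zero risk-free rate, the sample standard deviation, and 252 periods per year. The input values are illustrative only.

```python
import math
import statistics


def simple_backtest_stats(account_values, periods_per_year=252):
    """Compute a few headline statistics from a series of account values."""
    returns = [
        later / earlier - 1.0
        for earlier, later in zip(account_values, account_values[1:])
    ]
    cumulative_return = account_values[-1] / account_values[0] - 1.0

    # Annualized Sharpe ratio with a zero risk-free rate (assumption).
    volatility = statistics.stdev(returns)
    sharpe = (
        math.sqrt(periods_per_year) * statistics.mean(returns) / volatility
        if volatility > 0
        else float("nan")
    )

    # Maximum drawdown: the largest drop from a running peak.
    peak = account_values[0]
    max_drawdown = 0.0
    for value in account_values:
        peak = max(peak, value)
        max_drawdown = min(max_drawdown, value / peak - 1.0)

    return {
        "Cumulative returns": cumulative_return,
        "Sharpe ratio": sharpe,
        "Max drawdown": max_drawdown,
    }


# Illustrative values only, not the notebook's results.
stats = simple_backtest_stats([100.0, 110.0, 99.0, 121.0])
print(stats)
```

For assets that trade every calendar day, such as crypto pairs, a different `periods_per_year` (for example 365) may be more appropriate than 252.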


In [30]:
#print("==============Get Backtest Results===========")

#perf_stats_all = backtest_stats(account_value=df_account_value)
#perf_stats_all = pd.DataFrame(perf_stats_all)
#perf_stats_all

Annual return         -0.257744
Cumulative returns    -0.120958
Annual volatility      0.836281
Sharpe ratio           0.063197
Calmar ratio          -0.523235
Stability              0.000089
Max drawdown          -0.492597
Omega ratio            1.010460
Sortino ratio          0.085508
Skew                        NaN
Kurtosis                    NaN
Tail ratio             0.936609
Daily value at risk   -0.105152
dtype: float64




In [None]:
#baseline stats
#print("==============Get Baseline Stats===========")

#stats = backtest_stats(trade[trade['tic'] == 'BINANCE:ADABTC'], value_col_name = 'close')
#stats.to_csv(os.path.join(RESULTS_DIR,'baseline_stats' + ".csv"))


Annual return           5.986155
Cumulative returns      1.318283
Annual volatility       0.919731
Sharpe ratio            2.577352
Calmar ratio           17.354357
Stability               0.456362
Max drawdown           -0.344937
Omega ratio             1.618750
Sortino ratio           5.224333
Skew                         NaN
Kurtosis                     NaN
Tail ratio              1.916473
Daily value at risk    -0.106469
dtype: float64




In [None]:
trade

Unnamed: 0,date,tic,close,high,low,open,volume,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2021-04-21,BINANCE:ADABNB,0.002212,0.002260,0.002059,0.002159,18845377.0,-4.532893e-04,0.003651,0.001904,37.452199,-105.344339,50.613529,0.003273,0.003951,17.500000,21.358722
0,2021-04-21,BINANCE:ADABTC,0.000022,0.000023,0.000022,0.000022,70292728.0,4.537259e-07,0.000024,0.000019,54.946082,93.856314,11.920094,0.000021,0.000022,17.500000,21.358722
0,2021-04-21,BINANCE:ADAETH,0.000511,0.000556,0.000510,0.000543,7790865.0,-2.257796e-05,0.000626,0.000530,44.226765,-137.574945,7.374134,0.000608,0.000650,17.500000,21.358722
0,2021-04-21,BINANCE:ADAUSDC,1.201800,1.298600,1.196130,1.267140,3892517.7,2.727909e-02,1.471924,1.072486,52.221574,-5.229173,9.576328,1.236505,1.199330,17.500000,21.358722
0,2021-04-21,BINANCE:ADAUSDT,1.203580,1.290000,1.197040,1.266650,404434984.0,2.732037e-02,1.470524,1.074456,52.278253,-7.904506,7.113809,1.236719,1.199059,17.500000,21.358722
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
108,2021-09-23,BINANCE:XRPBTC,0.000022,0.000023,0.000022,0.000023,35468753.0,-4.353549e-07,0.000026,0.000021,48.868132,-88.816861,2.111125,0.000024,0.000022,18.629999,4.943331
108,2021-09-23,BINANCE:XRPBUSD,1.002000,1.017200,0.968400,1.003600,38622295.0,-3.888470e-02,1.329823,0.858917,49.018979,-111.684122,15.852600,1.122710,1.032827,18.629999,4.943331
108,2021-09-23,BINANCE:XRPETH,0.000318,0.000329,0.000313,0.000326,4994810.0,-5.277261e-06,0.000343,0.000300,48.176868,-52.245069,4.883908,0.000329,0.000325,18.629999,4.943331
108,2021-09-23,BINANCE:XRPUSDC,1.000000,1.016400,0.968500,1.003400,5366165.0,-3.874903e-02,1.328080,0.859780,48.953416,-112.583238,15.593849,1.122350,1.032757,18.629999,4.943331


In [None]:
df_account_value.loc[0,'date']

'2021-04-21'

In [None]:
df_account_value.loc[len(df_account_value)-1,'date']

'2021-09-23'

<a id='6.2'></a>
## 7.2 BackTestPlot

In [None]:
print("==============Compare to DJIA===========")
%matplotlib inline
# S&P 500: ^GSPC
# Dow Jones Index: ^DJI
# NASDAQ 100: ^NDX
#backtest_plot(df_account_value, 
#             baseline_ticker = '^DJI', 
#             baseline_start = df_account_value.loc[0,'date'],
#             baseline_end = df_account_value.loc[len(df_account_value)-1,'date'])



AttributeError: 'DataFrame' object has no attribute 'split'

# Part 7: Manual Backtesting

In [43]:
trx_plot(trade, df_actions, list(processed_full['tic'].unique()) )


KeyError: 'transactions'

# TensorBoard


In [34]:
model_name = 'recurrent_ppo'

In [62]:
from SaveResultsByTensorboard.tflogtodf import tflog2pandas
path="/mnt/f/financial_projects/Deep Reinforcement Learning Approaches on Stock Prediction/FinRL/Crypto_market/TENSORBOARD_LOG_DIR/ppo_1/events.out.tfevents.1660815830.mohammad-pc.257.0"
df=tflog2pandas(path)



Adding training metrics to TensorBoard

In [35]:
import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter
import numpy as np
import os

In [36]:
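def add_scalar(writer, informations):
    # Iterate over rows directly: a filtered DataFrame keeps its original index,
    # so label-based access with 0..n-1 can fail.
    for _, info in informations.iterrows():
        writer.add_scalars('train_info', {info['metric']: info['value']}, int(info['step']))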

writer = SummaryWriter(os.path.join(TENSORBOARD_LOG_DIR, model_name))
for metric in df['metric'].unique() :
    train_info = df[df['metric']== metric]
    add_scalar(writer, train_info)
writer.close()






NameError: name 'df' is not defined

Adding test-time information to TensorBoard

In [37]:
writer = SummaryWriter(os.path.join(TENSORBOARD_LOG_DIR, model_name))
for i in range(len(df_account_value)) :
    date, account_value = df_account_value.loc[i] 
    writer.add_scalars(main_tag='test', tag_scalar_dict = {'account_value': account_value}, global_step = i) 
writer.close()

Adding portfolio visualization to TensorBoard

In [38]:
import matplotlib.pyplot as plt

def visualize_portfolio(total, tickers):
    """Draw a donut chart of absolute position values with a text summary on the side."""
    fig, ax = plt.subplots(figsize=(16,8))
    ax.set_facecolor('black')
    ax.figure.set_facecolor('#121212')
    ax.tick_params(axis='x', colors='white')
    ax.tick_params(axis='y', colors='white')
    ax.set_title("NEURALNINE PORTFOLIO VISUALIZER", color='#EF6C35', fontsize=20)

    # Pie slices need non-negative sizes, so sell amounts are plotted by magnitude
    patches, texts, autotexts = ax.pie([abs(total_element) for total_element in total], labels=tickers, autopct='%1.1f%%', pctdistance=0.8)
    for text in texts:
        text.set_color('white')
    # A black circle in the middle turns the pie into a donut
    my_circle = plt.Circle((0, 0), 0.55, color='black')
    ax.add_artist(my_circle)

    ax.text(-2,1, 'PORTFOLIO OVERVIEW:', fontsize=14, color="#ffe536", horizontalalignment='center', verticalalignment='center')
    ax.text(-2,0.85, f'Total USD Amount: {sum(total):.2f} $', fontsize=12, color="white", horizontalalignment='center', verticalalignment='center')
    counter = 0.15
    for ticker, amount in zip(tickers, total):
        ax.text(-2, 0.85 - counter, f'{ticker}: {amount:.2f} $', fontsize=12, color="white",
        horizontalalignment='center', verticalalignment='center')
        counter += 0.15
    return fig

In [39]:
def add_image_to_tensorboard(specific_tag ,specific_figure):
    writer = SummaryWriter(os.path.join(TENSORBOARD_LOG_DIR, model_name, 'image'))
    writer.add_figure(tag= specific_tag, figure = specific_figure)
    print('done')
    writer.close()

In [40]:
df_actions.isna().sum()

BINANCE:ADABNB      0
BINANCE:ADABTC      0
BINANCE:ADAETH      0
BINANCE:ADAUSDC     0
BINANCE:ADAUSDT     0
BINANCE:BNBBTC      0
BINANCE:BNBBUSD     0
BINANCE:BNBETH      0
BINANCE:BNBUSDC     0
BINANCE:BNBUSDT     0
BINANCE:BTCBUSD     0
BINANCE:BTCUSDC     0
BINANCE:BTCUSDT     0
BINANCE:BUSDUSDT    0
BINANCE:ETHBTC      0
BINANCE:ETHBUSD     0
BINANCE:ETHUSDC     0
BINANCE:ETHUSDT     0
BINANCE:USDCUSDT    0
BINANCE:XRPBNB      0
BINANCE:XRPBTC      0
BINANCE:XRPBUSD     0
BINANCE:XRPETH      0
BINANCE:XRPUSDC     0
BINANCE:XRPUSDT     0
dtype: int64

In [41]:
def visualize_process(total_val) :
    if total_val == 0 :
        return 0 , 'white'
    elif total_val > 0:
        return 1 , 'green'
    elif total_val < 0 :
        return -1 , 'red'



In [44]:
tickers = list(trade['tic'].unique())
dates = df_actions.index
step = 0
for date in dates :
    sell_total = []
    sell_position_tickers = []
    buy_total = []
    buy_position_tickers = []
    step += 1
    # Rows of the trade data for this date, filtered once per day
    day_trade = trade[trade['date'] == date]
    for tic in tickers :
        amount = df_actions.loc[date, tic]
        # Take the single closing price as a scalar instead of calling int() on a Series
        price = day_trade.loc[day_trade['tic'] == tic, 'close'].iloc[0]
        trade_value = int(amount * price)
        flag, color = visualize_process(trade_value)
        if flag == 1:
            buy_total.append(trade_value)
            buy_position_tickers.append(tic)
        elif flag == -1:
            sell_total.append(trade_value)
            sell_position_tickers.append(tic)

    if len(buy_total) != 0 :
        buy_fig = visualize_portfolio(total=buy_total, tickers= buy_position_tickers)
        add_image_to_tensorboard(f"buy_portfolio:{step}", buy_fig)

    if len(sell_total) != 0 :
        sell_fig = visualize_portfolio(total=sell_total, tickers= sell_position_tickers)
        add_image_to_tensorboard(f"sell_portfolio:{step}", sell_fig)

done


In [125]:
writer = SummaryWriter(os.path.join(TENSORBOARD_LOG_DIR, 'ppo', 'Graph'))
writer.add_graph(trained_ppo, e_trade_gym)

ImportError: cannot import name 'metanet_pb2' from partially initialized module 'caffe2.proto' (most likely due to a circular import) (/home/mohammad/anaconda3/envs/FinRl/lib/python3.8/site-packages/caffe2/proto/__init__.py)

In [122]:
!tensorboard --logdir=runs

TensorFlow installation not found - running with reduced feature set.

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.9.1 at http://localhost:6007/ (Press CTRL+C to quit)
^C


In [133]:
model.action_space

Box([-1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
 -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1.], (28,), float32)

In [135]:
len(processed_full['tic'].unique())

25