<a href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL/blob/master/tutorials/1-Introduction/FinRL_PortfolioAllocation_NeurIPS_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Reinforcement Learning for Stock Trading from Scratch: Portfolio Allocation

A tutorial on using deep reinforcement learning (DRL) to perform portfolio allocation in a single Jupyter notebook | Presented at NeurIPS 2020: Deep RL Workshop

* This blog is based on our paper: FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance, presented at NeurIPS 2020: Deep RL Workshop.
* Check out the Medium blog for detailed explanations: 
* Please report any issues on our GitHub: https://github.com/AI4Finance-Foundation/FinRL/issues
* **PyTorch Version**



# Content

* [1. Problem Definition](#0)
* [2. Getting Started - Load Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. Check Additional Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Environment](#4)
    * [5.1. Training & Trade Data Split](#4.1)
    * [5.2. User-defined Environment](#4.2)
    * [5.3. Initialize Environment](#4.3)
* [6. Implement DRL Algorithms](#5)
* [7. Backtesting Performance](#6)
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)
    * [7.3. Baseline Stats](#6.3)
    * [7.4. Compare to Stock Market Index](#6.4)

<a id='0'></a>
# Part 1. Problem Definition

The goal is to design an automated trading solution for portfolio allocation. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.

The agent is trained with Deep Reinforcement Learning (DRL) algorithms. The components of the reinforcement learning environment are:


* Action: The action space describes the actions through which the agent interacts with the environment. Normally, a ∈ A represents the weight of a stock in the portfolio: a ∈ (-1,1). Assuming our stock pool includes N stocks, we can use a list [a<sub>1</sub>, a<sub>2</sub>, ... , a<sub>N</sub>] to determine the weight of each stock in the portfolio, where a<sub>i</sub> ∈ (-1,1) and a<sub>1</sub>+ a<sub>2</sub>+...+a<sub>N</sub>=1. For example, "The weight of AAPL in the portfolio is 10%" is [0.1 , ...].

* Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. It is the change in portfolio value when action a is taken at state s and the agent arrives at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at states s′ and s, respectively.

* State: The state space describes the observations that the agent receives from the environment. Just as a human trader needs to analyze various information before executing a trade, our trading agent observes many different features to better learn in an interactive environment.

* Environment: a pool of 50 assets, made up of 40 stocks (Brazil, USA, Europe, and China) and 10 commodity futures
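
The action and reward definitions above can be sketched in a few lines. This is a minimal illustration, not the environment's actual code: it assumes raw agent outputs are turned into portfolio weights with a softmax (which yields long-only weights in (0, 1) that sum to 1), and the function names `softmax_weights` and `step_reward` are hypothetical.

```python
import numpy as np

def softmax_weights(raw_actions):
    """Map unconstrained action values to positive weights that sum to 1."""
    raw = np.asarray(raw_actions, dtype=float)
    exp = np.exp(raw - raw.max())  # subtract the max for numerical stability
    return exp / exp.sum()

def step_reward(portfolio_value, weights, asset_returns):
    """Return (new_value, reward), where reward = v' - v."""
    new_value = portfolio_value * (1.0 + float(np.dot(weights, asset_returns)))
    return new_value, new_value - portfolio_value

raw = np.array([0.5, -0.2, 0.1])                 # hypothetical raw agent outputs
w = softmax_weights(raw)
v_next, r = step_reward(1_000_000.0, w, np.array([0.01, -0.02, 0.005]))
print(w.round(3), round(r, 2))
```

Short positions, i.e. weights in (-1, 0), would need a different normalization than the softmax assumed here.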


The stock data for this case study is obtained from the Yahoo Finance API. The data contains open, high, low, and close prices and volume.


<a id='1'></a>
# Part 2. Getting Started - Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages through the FinRL library


In [1]:
## install finrl library
#%pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git


<a id='1.2'></a>
## 2.2. Check that the additional packages are present; if not, install them
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI gym
* stable-baselines3
* PyTorch
* pyfolio

<a id='1.3'></a>
## 2.3. Import Packages

In [1]:
import talib as ta
from utils import process_future_data, dates_intersection, add_covariance, StockPortfolioEnv, create_features
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
%matplotlib inline
import datetime

from finrl import config
from finrl import config_tickers
from finrl.finrl_meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.finrl_meta.preprocessor.preprocessors import FeatureEngineer, data_split
# Note: this import rebinds StockPortfolioEnv, shadowing the one imported from utils above.
from finrl.finrl_meta.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline,convert_daily_return_to_pyfolio_ts
from finrl.finrl_meta.data_processor import DataProcessor
from finrl.finrl_meta.data_processors.processor_yahoofinance import YahooFinanceProcessor
import sys
sys.path.append("../FinRL-Library")

  'Module "zipline.assets" not found; multipliers will not be applied'


<a id='1.4'></a>
## 2.4. Create Folders

In [2]:
import os

# Create the output folders if they do not exist yet.
for directory in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)

<a id='2'></a>
# Part 3. Download Data
Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free.
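* FinRL provides a class **YahooDownloader** to fetch data from the Yahoo Finance API; this notebook uses `YahooFinanceProcessor` for the stocks and loads the commodity futures from local CSV files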
* Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).


In [3]:
arrz = process_future_data('data/ARROZ.csv')
bg   = process_future_data('data/BOI_GORDO.csv')
cf   = process_future_data('data/CAFE.csv')
eth  = process_future_data('data/ETANOL.csv')
mil  = process_future_data('data/MILHO.csv')
mf   = process_future_data('data/MINERIO_FERRO.csv')
gold = process_future_data('data/OURO.csv')
petr = process_future_data('data/PETROLEO.csv')
soj  = process_future_data('data/SOJA.csv')
trg  = process_future_data('data/TRIGO.csv')

In [4]:
cmds_f = pd.concat([arrz, bg, cf, eth, mil, mf, gold, petr, soj, trg], axis=0)

In [5]:
cmds_f['tic'].unique().tolist()

['ZR', 'LE', 'KC', 'FL', 'ZC', 'TR', 'GC', 'CB', 'ZS', 'ZW']

In [6]:
cmds_f_l = cmds_f["tic"].unique().tolist()
stocks_br  = ["PETR4.SA", "VALE3.SA", "ITUB4.SA", "MGLU3.SA", "BBAS3.SA", "BBDC4.SA","B3SA3.SA", "PETR3.SA", "RENT3.SA", "ELET3.SA" ]
stocks_usa = ["META", "AAPL", "AMZN", "F", "T", "BAC", "GOOGL", "MSFT", "INTC", "CMCSA"]
stocks_eur = ["iSP.MI", "ENEL.MI", "SAN.MC", "INGA.AS", "ENI.MI", "BBVA.MC", "IBE.MC", "CS.PA", "STLA.MI","DTE.DE"]
stocks_chn = ["601899.SS","600010.SS","600795.SS", "603993.SS", "600157.SS", "601288.SS", "600050.SS", "601398.SS", "600537.SS","600777.SS"]

In [7]:
ativos = list(set().union(stocks_br,stocks_usa,stocks_eur,stocks_chn))

In [8]:
len(ativos)

40

In [9]:
#print(config_tickers.DOW_30_TICKER)

In [10]:
dp = YahooFinanceProcessor()
df = dp.download_data(start_date = '2004-01-01',
                     end_date = '2022-11-07',
                     ticker_list = ativos, time_interval='1D')
# Extend the asset list with the commodity-futures tickers loaded from CSV.
ativos = list(set().union(stocks_br,stocks_usa,stocks_eur,stocks_chn,cmds_f_l))

[*********************100%***********************]  1 of 1 completed

In [11]:
df = pd.concat([df,cmds_f],axis=0)
df['date']= pd.to_datetime(df['date'])
df = df.sort_values(by='date')
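
The concatenation above only works cleanly if both frames share the same long format, with one row per date and ticker. The sketch below illustrates that with toy values (the tickers and prices are made up for illustration and are not taken from the notebook's data); sorting by `tic` as a secondary key is an addition here that makes the row order deterministic within a day.

```python
import pandas as pd

# Toy long-format frames: one row per (date, tic). Values are illustrative only.
stocks_toy = pd.DataFrame({
    "date": ["2022-01-04", "2022-01-03"],
    "close": [11.0, 10.0],
    "tic": ["STOCK_A", "STOCK_A"],
})
futures_toy = pd.DataFrame({
    "date": ["2022-01-03", "2022-01-04"],
    "close": [100.0, 101.0],
    "tic": ["FUT_B", "FUT_B"],
})

combined = pd.concat([stocks_toy, futures_toy], axis=0, ignore_index=True)
combined["date"] = pd.to_datetime(combined["date"])
combined = combined.sort_values(by=["date", "tic"]).reset_index(drop=True)
print(combined)
```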

<a id='3'></a>
# Part 4. Preprocess Data
Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and do feature engineering in order to convert the data into a model-ready state.
* Add technical indicators. In practical trading, various information needs to be taken into account, for example the historical stock prices, current holding shares, technical indicators, etc. In this article, we demonstrate two trend-following technical indicators: MACD and RSI.
* Add turbulence index. Risk aversion reflects whether an investor will choose to preserve capital. It also influences one's trading strategy when facing different levels of market volatility. To control the risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the financial turbulence index, which measures extreme asset price fluctuations.
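
As a rough illustration of the two indicators named above, the sketch below computes a MACD line and an RSI with plain pandas. It is not the notebook's `create_features` implementation: the helper names, the 12/26 and 14-day windows, and the use of simple rolling averages for RSI (TA-Lib uses Wilder smoothing instead) are all assumptions made for this example, and the price series is synthetic.

```python
import numpy as np
import pandas as pd

def macd(close, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of the closing price."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow

def rsi(close, window=14):
    """RSI built from simple rolling averages of gains and losses."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 200).cumsum())
macd_line = macd(close)
rsi_14 = rsi(close)
print(macd_line.tail(3), rsi_14.tail(3))
```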

## Add covariance matrix as states

In [13]:
df = add_covariance(df)
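
`add_covariance` comes from the notebook's own `utils` module and is not shown here. As a hedged sketch of the idea, the code below computes a covariance matrix of daily returns over a rolling lookback window for each date; the function name `rolling_covariances`, the 20-day lookback, and the synthetic prices are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def rolling_covariances(prices, lookback):
    """prices: wide frame (index=date, columns=ticker). Returns {date: cov matrix}."""
    returns = prices.pct_change().dropna()
    covs = {}
    for end in range(lookback, len(returns) + 1):
        window = returns.iloc[end - lookback:end]      # the last `lookback` daily returns
        covs[returns.index[end - 1]] = window.cov().to_numpy()
    return covs

# Synthetic prices for three hypothetical tickers.
rng = np.random.default_rng(1)
dates = pd.date_range("2022-01-03", periods=30, freq="B")
prices = pd.DataFrame(
    100 * np.exp(rng.normal(0, 0.01, size=(30, 3)).cumsum(axis=0)),
    index=dates, columns=["A", "B", "C"],
)
covs = rolling_covariances(prices, lookback=20)
print(len(covs), covs[dates[-1]].shape)
```

Each matrix has one row and one column per asset, which is why the environment's state space later equals the number of assets.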

## Adding Features

In [14]:
df = create_features(df)

In [16]:
dates_f3 = dates_intersection(df)
print(df.shape)
df=df[df['date'].isin(dates_f3)]
print(df.shape)

(223396, 21)
(110050, 21)
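
The shape change printed above comes from keeping only the dates returned by `dates_intersection`, another helper from the notebook's `utils` module that is not shown. A plausible reading, sketched below under that assumption, is that it keeps the dates on which every ticker has a row, so that each trading day has a complete cross-section of assets. The helper name `common_dates` and the toy frame are hypothetical.

```python
import pandas as pd

def common_dates(df):
    """Dates on which every ticker in `df` has a row."""
    n_tickers = df["tic"].nunique()
    per_date = df.groupby("date")["tic"].nunique()
    return per_date[per_date == n_tickers].index

toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-03", "2022-01-04",
                            "2022-01-05", "2022-01-05"]),
    "tic": ["A", "B", "A", "A", "B"],
})
kept = toy[toy["date"].isin(common_dates(toy))]
print(kept)
```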


In [21]:
#df['tic'].value_counts()

In [20]:
df.shape

(110050, 21)

<a id='4'></a>
# Part 5. Design Environment
Considering the stochastic and interactive nature of automated stock trading tasks, a financial task is modeled as a **Markov Decision Process (MDP)** problem. The training process involves observing stock price changes, taking an action, and calculating the reward so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.

Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.
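
To make "time-driven simulation" concrete, the sketch below steps through a price array one day at a time in a Gym-style `reset`/`step` loop. It is a toy stand-in, not the `StockPortfolioEnv` used in this notebook: the class name `ToyPortfolioEnv` is hypothetical, it does not import `gym`, it ignores transaction costs, and it uses raw prices as the observation.

```python
import numpy as np

class ToyPortfolioEnv:
    """Steps through a (T, N) price array one day at a time."""

    def __init__(self, prices, initial_amount=1_000_000.0):
        self.prices = np.asarray(prices, dtype=float)
        self.initial_amount = initial_amount

    def reset(self):
        self.day = 0
        self.value = self.initial_amount
        return self.prices[self.day]

    def step(self, weights):
        weights = np.asarray(weights, dtype=float)
        today, tomorrow = self.prices[self.day], self.prices[self.day + 1]
        asset_returns = tomorrow / today - 1.0
        new_value = self.value * (1.0 + float(weights @ asset_returns))
        reward = new_value - self.value              # change in portfolio value
        self.value = new_value
        self.day += 1
        done = self.day >= len(self.prices) - 1      # no next-day price left
        return self.prices[self.day], reward, done

prices = np.array([[10.0, 20.0], [11.0, 20.0], [11.0, 22.0]])  # toy prices
env = ToyPortfolioEnv(prices)
obs = env.reset()
done = False
while not done:
    obs, reward, done = env.step([0.5, 0.5])         # fixed equal weights
print(env.value)
```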


In [25]:
print(df.date.head(1))
print(df.date.tail(1))

103420   2013-01-04
Name: date, dtype: datetime64[ns]
226101   2022-10-27
Name: date, dtype: datetime64[ns]


## Training data split: 2013-01-04 to 2018-01-01

In [35]:
train = data_split(df, '2013-01-04','2018-01-01')
trade = data_split(df,'2018-01-02', '2022-10-27')
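
The sketch below shows one way such a date-based split can work. It is an illustration rather than FinRL's `data_split` source: it assumes a half-open interval (start included, end excluded) and re-indexes the rows by trading-day number so that all assets on the same day share one index value. The name `split_by_date` and the toy frame are hypothetical.

```python
import pandas as pd

def split_by_date(df, start, end, date_col="date"):
    """Rows with start <= date < end, indexed by trading-day number."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = (df[date_col] >= start) & (df[date_col] < end)
    out = df[mask].sort_values([date_col, "tic"], ignore_index=True)
    out.index = out[date_col].factorize()[0]   # 0 for the first day, 1 for the second, ...
    return out

toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-03", "2022-01-04",
                            "2022-01-04", "2022-01-05", "2022-01-05"]),
    "tic": ["B", "A"] * 3,
    "close": [1.0, 2.0, 1.1, 2.1, 1.2, 2.2],
})
toy_train = split_by_date(toy, "2022-01-03", "2022-01-05")
toy_trade = split_by_date(toy, "2022-01-05", "2022-01-06")
print(toy_train, toy_trade, sep="\n")
```

Under a half-open convention, a row dated exactly on the end date is excluded from that split, which is worth checking when choosing the boundary dates.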

In [36]:
train.head(1)

| | date | open | high | low | close | adjcp | volume | tic | day | cov_list | ... | RSI | slowk | slowd | WILLR | MACD | ROC | OBV | lag_20 | lag_40 | lag_60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-01-04 | 1.975 | 1.985714 | 1.928571 | 1.942857 | 1.899315 | 214400818.0 | 600010.SS | 4 | [[0.0007838344569130899, 8.932580553651602e-05... | ... | 59.497655 | 53.365166 | 49.616944 | -23.863621 | 0.03241 | 3.03031 | 69183100000.0 | 0.182609 | -0.003663 | 0.04817 |


In [37]:
train.tail(1)

| | date | open | high | low | close | adjcp | volume | tic | day | cov_list | ... | RSI | slowk | slowd | WILLR | MACD | ROC | OBV | lag_20 | lag_40 | lag_60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1127 | 2017-12-29 | 2.794 | 2.796 | 2.754 | 2.77 | 1.723731 | 73906621.0 | iSP.MI | 4 | [[0.0002780890201951984, 4.7102876631634787e-0... | ... | 42.279556 | 34.823964 | 59.120869 | -87.368462 | -0.007944 | -3.551532 | 6916249000.0 | -0.01773 | -0.037526 | -0.055896 |


## Environment for Portfolio Allocation
##### Taken from utils.py

In [40]:
stock_dimension = len(train.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")


Stock Dimension: 50, State Space: 50


In [41]:
config.INDICATORS

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']

In [42]:
features =['RSI', 'slowk', 'slowd', 'WILLR', 'MACD','ROC', 'OBV', 'lag_20', 'lag_40', 'lag_60']

In [43]:
env_kwargs = {
    "hmax": 100, 
    "initial_amount": 1000000, 
    "transaction_cost_pct": 0.001, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": features, 
    "action_space": stock_dimension, 
    "reward_scaling": 1e-4
    
}

e_train_gym = StockPortfolioEnv(df = train, **env_kwargs)

In [44]:
state_space

50

In [45]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>


<a id='5'></a>
# Part 6. Implement DRL Algorithms
* The implementation of the DRL algorithms is based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines with a major structural refactoring and code cleanups. This notebook uses the PyTorch-based Stable-Baselines3, as the `DummyVecEnv` type printed above shows.
* The FinRL library includes fine-tuned standard DRL algorithms, such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C, and TD3. Users can also design their own DRL algorithms by adapting these algorithms.

In [46]:
# initialize
agent = DRLAgent(env = env_train)

### Model 1: **A2C**


In [47]:
agent = DRLAgent(env = env_train)

A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c",model_kwargs = A2C_PARAMS)

{'n_steps': 5, 'ent_coef': 0.005, 'learning_rate': 0.0002}
Using cpu device


In [48]:
trained_a2c = agent.train_model(model=model_a2c, 
                                tb_log_name='a2c',
                                total_timesteps=50000)

-------------------------------------
| time/                 |           |
|    fps                | 229       |
|    iterations         | 100       |
|    time_elapsed       | 2         |
|    total_timesteps    | 500       |
| train/                |           |
|    entropy_loss       | -70.8     |
|    explained_variance | 0         |
|    learning_rate      | 0.0002    |
|    n_updates          | 99        |
|    policy_loss        | 2.79e+08  |
|    reward             | 1303658.8 |
|    std                | 0.998     |
|    value_loss         | 1.8e+13   |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 236       |
|    iterations         | 200       |
|    time_elapsed       | 4         |
|    total_timesteps    | 1000      |
| train/                |           |
|    entropy_loss       | -70.8     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |

begin_total_asset:1000000
end_total_asset:2153482.8828611607
Sharpe:  1.3015280000300582
-------------------------------------
| time/                 |           |
|    fps                | 207       |
|    iterations         | 1400      |
|    time_elapsed       | 33        |
|    total_timesteps    | 7000      |
| train/                |           |
|    entropy_loss       | -70.7     |
|    explained_variance | 0         |
|    learning_rate      | 0.0002    |
|    n_updates          | 1399      |
|    policy_loss        | 2.28e+08  |
|    reward             | 1037495.6 |
|    std                | 0.995     |
|    value_loss         | 1.16e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 208       |
|    iterations         | 1500      |
|    time_elapsed       | 36        |
|    total_timesteps    | 7500      |
| train/                |           |
|    entropy_loss       | -70.7     |

-------------------------------------
| time/                 |           |
|    fps                | 201       |
|    iterations         | 2600      |
|    time_elapsed       | 64        |
|    total_timesteps    | 13000     |
| train/                |           |
|    entropy_loss       | -70.5     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 2599      |
|    policy_loss        | 2.66e+08  |
|    reward             | 1263524.9 |
|    std                | 0.992     |
|    value_loss         | 1.83e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 202       |
|    iterations         | 2700      |
|    time_elapsed       | 66        |
|    total_timesteps    | 13500     |
| train/                |           |
|    entropy_loss       | -70.5     |
|    explained_variance | 0         |
|    learning_rate      | 0.0002    |

begin_total_asset:1000000
end_total_asset:1971454.68350671
Sharpe:  1.150317405846422
-------------------------------------
| time/                 |           |
|    fps                | 196       |
|    iterations         | 3900      |
|    time_elapsed       | 99        |
|    total_timesteps    | 19500     |
| train/                |           |
|    entropy_loss       | -70.4     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 3899      |
|    policy_loss        | 2.18e+08  |
|    reward             | 1085754.4 |
|    std                | 0.989     |
|    value_loss         | 1.27e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 196       |
|    iterations         | 4000      |
|    time_elapsed       | 101       |
|    total_timesteps    | 20000     |
| train/                |           |
|    entropy_loss       | -70.4     |

-------------------------------------
| time/                 |           |
|    fps                | 205       |
|    iterations         | 5100      |
|    time_elapsed       | 124       |
|    total_timesteps    | 25500     |
| train/                |           |
|    entropy_loss       | -70.2     |
|    explained_variance | 0         |
|    learning_rate      | 0.0002    |
|    n_updates          | 5099      |
|    policy_loss        | 2.4e+08   |
|    reward             | 1058120.2 |
|    std                | 0.986     |
|    value_loss         | 1.33e+13  |
-------------------------------------
begin_total_asset:1000000
end_total_asset:1825488.1958972972
Sharpe:  1.0256685393718505
------------------------------------
| time/                 |          |
|    fps                | 206      |
|    iterations         | 5200     |
|    time_elapsed       | 126      |
|    total_timesteps    | 26000    |
| train/                |          |
|    entropy_loss       | -70.2    |

begin_total_asset:1000000
end_total_asset:1781088.034064995
Sharpe:  0.958603988257336
-------------------------------------
| time/                 |           |
|    fps                | 212       |
|    iterations         | 6400      |
|    time_elapsed       | 150       |
|    total_timesteps    | 32000     |
| train/                |           |
|    entropy_loss       | -70       |
|    explained_variance | 1.19e-07  |
|    learning_rate      | 0.0002    |
|    n_updates          | 6399      |
|    policy_loss        | 2.47e+08  |
|    reward             | 1176368.9 |
|    std                | 0.982     |
|    value_loss         | 1.5e+13   |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 213       |
|    iterations         | 6500      |
|    time_elapsed       | 152       |
|    total_timesteps    | 32500     |
| train/                |           |
|    entropy_loss       | -70       |

-------------------------------------
| time/                 |           |
|    fps                | 217       |
|    iterations         | 7600      |
|    time_elapsed       | 174       |
|    total_timesteps    | 38000     |
| train/                |           |
|    entropy_loss       | -69.9     |
|    explained_variance | 1.79e-07  |
|    learning_rate      | 0.0002    |
|    n_updates          | 7599      |
|    policy_loss        | 2.88e+08  |
|    reward             | 1323974.6 |
|    std                | 0.979     |
|    value_loss         | 1.87e+13  |
-------------------------------------
begin_total_asset:1000000
end_total_asset:1916169.881919747
Sharpe:  1.0942982800727572
-------------------------------------
| time/                 |           |
|    fps                | 218       |
|    iterations         | 7700      |
|    time_elapsed       | 176       |
|    total_timesteps    | 38500     |
| train/                |           |
|    entropy_loss       | -69.9     |


-------------------------------------
| time/                 |           |
|    fps                | 221       |
|    iterations         | 8900      |
|    time_elapsed       | 200       |
|    total_timesteps    | 44500     |
| train/                |           |
|    entropy_loss       | -69.7     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 8899      |
|    policy_loss        | 2.83e+08  |
|    reward             | 1294108.9 |
|    std                | 0.976     |
|    value_loss         | 1.69e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 222       |
|    iterations         | 9000      |
|    time_elapsed       | 202       |
|    total_timesteps    | 45000     |
| train/                |           |
|    entropy_loss       | -69.7     |
|    explained_variance | 0         |
|    learning_rate      | 0.0002    |

In [51]:
trained_a2c.save('trained_models/trained_a2c.zip')
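
The `Sharpe` values in the training log summarize each episode's risk-adjusted return. The environment's exact formula is not shown in this notebook; the sketch below assumes the common convention of an annualized ratio of the mean to the standard deviation of daily returns, with 252 trading days per year and a zero risk-free rate. The function name and the account values are illustrative.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(account_values, periods_per_year=252):
    """Annualized Sharpe ratio of a series of daily account values (risk-free rate 0)."""
    daily_returns = pd.Series(account_values, dtype=float).pct_change().dropna()
    return np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std()

values = [1_000_000, 1_010_000, 1_005_000, 1_020_000, 1_030_000]  # toy account values
sharpe = annualized_sharpe(values)
print(sharpe)
```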

### Model 2: **PPO**


In [52]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)

{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.0001, 'batch_size': 128}
Using cpu device


In [53]:
trained_ppo = agent.train_model(model=model_ppo, 
                             tb_log_name='ppo',
                             total_timesteps=80000)

begin_total_asset:1000000
end_total_asset:1717287.5065464734
Sharpe:  0.9340859152387432
----------------------------------
| time/              |           |
|    fps             | 370       |
|    iterations      | 1         |
|    time_elapsed    | 5         |
|    total_timesteps | 2048      |
| train/             |           |
|    reward          | 1579130.4 |
----------------------------------
begin_total_asset:1000000
end_total_asset:1870491.5513849333
Sharpe:  1.075526135645024
begin_total_asset:1000000
end_total_asset:1833199.0750782813
Sharpe:  1.0457958856959144
---------------------------------------
| time/                   |           |
|    fps                  | 324       |
|    iterations           | 2         |
|    time_elapsed         | 12        |
|    total_timesteps      | 4096      |
| train/                  |           |
|    approx_kl            | 0.0       |
|    clip_fraction        | 0         |
|    clip_range           | 0.2       |

begin_total_asset:1000000
end_total_asset:1781424.407846242
Sharpe:  0.9968086850565152
---------------------------------------
| time/                   |           |
|    fps                  | 301       |
|    iterations           | 9         |
|    time_elapsed         | 61        |
|    total_timesteps      | 18432     |
| train/                  |           |
|    approx_kl            | 0.0       |
|    clip_fraction        | 0         |
|    clip_range           | 0.2       |
|    entropy_loss         | -70.9     |
|    explained_variance   | 0         |
|    learning_rate        | 0.0001    |
|    loss                 | 2.37e+14  |
|    n_updates            | 80        |
|    policy_gradient_loss | -2.08e-06 |
|    reward               | 1131934.9 |
|    std                  | 1         |
|    value_loss           | 4.68e+14  |
---------------------------------------
begin_total_asset:1000000
end_total_asset:1827049.5291141437
Sharpe:  1.0309835211115124

begin_total_asset:1000000
end_total_asset:1871317.3149154296
Sharpe:  1.0763056094842542
---------------------------------------
| time/                   |           |
|    fps                  | 305       |
|    iterations           | 17        |
|    time_elapsed         | 114       |
|    total_timesteps      | 34816     |
| train/                  |           |
|    approx_kl            | 0.0       |
|    clip_fraction        | 0         |
|    clip_range           | 0.2       |
|    entropy_loss         | -70.9     |
|    explained_variance   | 1.19e-07  |
|    learning_rate        | 0.0001    |
|    loss                 | 2.26e+14  |
|    n_updates            | 160       |
|    policy_gradient_loss | -3.03e-06 |
|    reward               | 1641280.4 |
|    std                  | 1         |
|    value_loss           | 4.67e+14  |
---------------------------------------
begin_total_asset:1000000
end_total_asset:1845972.7199464387
Sharpe:  1.0510562904828071

begin_total_asset:1000000
end_total_asset:1951416.2481146294
Sharpe:  1.1405973015405837
begin_total_asset:1000000
end_total_asset:1716500.099729747
Sharpe:  0.9304470943289785
---------------------------------------
| time/                   |           |
|    fps                  | 306       |
|    iterations           | 25        |
|    time_elapsed         | 167       |
|    total_timesteps      | 51200     |
| train/                  |           |
|    approx_kl            | 0.0       |
|    clip_fraction        | 0         |
|    clip_range           | 0.2       |
|    entropy_loss         | -70.9     |
|    explained_variance   | 0         |
|    learning_rate        | 0.0001    |
|    loss                 | 2.46e+14  |
|    n_updates            | 240       |
|    policy_gradient_loss | -2.14e-06 |
|    reward               | 1157090.4 |
|    std                  | 1         |
|    value_loss           | 4.64e+14  |
---------------------------------------

begin_total_asset:1000000
end_total_asset:1852204.3123942674
Sharpe:  1.0535045268392096
---------------------------------------
| time/                   |           |
|    fps                  | 308       |
|    iterations           | 33        |
|    time_elapsed         | 219       |
|    total_timesteps      | 67584     |
| train/                  |           |
|    approx_kl            | 0.0       |
|    clip_fraction        | 0         |
|    clip_range           | 0.2       |
|    entropy_loss         | -70.9     |
|    explained_variance   | -1.19e-07 |
|    learning_rate        | 0.0001    |
|    loss                 | 2.26e+14  |
|    n_updates            | 320       |
|    policy_gradient_loss | -1.69e-06 |
|    reward               | 1644545.8 |
|    std                  | 1         |
|    value_loss           | 4.44e+14  |
---------------------------------------
begin_total_asset:1000000
end_total_asset:1751561.3109320835
Sharpe:  0.9704898275713983

In [54]:
trained_ppo.save('trained_models/trained_ppo.zip')

### Model 3: **DDPG**


In [55]:
agent = DRLAgent(env = env_train)
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50000, "learning_rate": 0.001}


model_ddpg = agent.get_model("ddpg",model_kwargs = DDPG_PARAMS)

{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device


In [56]:
trained_ddpg = agent.train_model(model=model_ddpg, 
                             tb_log_name='ddpg',
                             total_timesteps=50000)

begin_total_asset:1000000
end_total_asset:1707727.7449840084
Sharpe:  0.956321070837711
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 17        |
|    time_elapsed    | 261       |
|    total_timesteps | 4512      |
| train/             |           |
|    actor_loss      | 8.32e+07  |
|    critic_loss     | 1.18e+13  |
|    learning_rate   | 0.001     |
|    n_updates       | 3384      |
|    reward          | 1715006.9 |
----------------------------------
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912

begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
----------------------------------
| time/              |           |
|    episodes        | 36        |
|    fps             | 13        |
|    time_elapsed    | 3082      |
|    total_timesteps | 40608     |
| train/             |           |
|    actor_loss      | -8.8e+07  |
|    critic_loss     | 5.15e+12  |
|    learning_rate   | 0.001     |
|    n_updates       | 39480     |
|    reward          | 1715006.9 |
----------------------------------
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912
begin_total_asset:1000000
end_total_asset:1715006.8235203333
Sharpe:  0.9669053723288912

In [57]:
trained_ddpg.save('trained_models/trained_ddpg.zip')

### Model 4: **SAC**


In [58]:
agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0003,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

model_sac = agent.get_model("sac",model_kwargs = SAC_PARAMS)

{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0003, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device


In [59]:
trained_sac = agent.train_model(model=model_sac, 
                             tb_log_name='sac',
                             total_timesteps=50000)

begin_total_asset:1000000
end_total_asset:1687810.1175539356
Sharpe:  0.8990522625891147
begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 12        |
|    time_elapsed    | 347       |
|    total_timesteps | 4512      |
| train/             |           |
|    actor_loss      | 3.34e+08  |
|    critic_loss     | 1.84e+13  |
|    ent_coef        | 0.376     |
|    ent_coef_loss   | 808       |
|    learning_rate   | 0.0003    |
|    n_updates       | 4411      |
|    reward          | 1714105.8 |
----------------------------------
begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
begin_total_asset:1000000

begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
----------------------------------
| time/              |           |
|    episodes        | 32        |
|    fps             | 12        |
|    time_elapsed    | 2939      |
|    total_timesteps | 36096     |
| train/             |           |
|    actor_loss      | 4.37e+07  |
|    critic_loss     | 2.74e+12  |
|    ent_coef        | 4.9e+03   |
|    ent_coef_loss   | -7.02e+03 |
|    learning_rate   | 0.0003    |
|    n_updates       | 35995     |
|    reward          | 1714105.8 |
----------------------------------
begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
begin_total_asset:1000000
end_total_asset:1714105.8015329551
Sharpe:  0.9198465332896087
begin_total_asset:1000000
end_

In [60]:
trained_sac.save('trained_models/trained_sac.zip')

### Model 5: **TD3**


In [61]:
agent = DRLAgent(env = env_train)
TD3_PARAMS = {"batch_size": 100, 
              "buffer_size": 1000000, 
              "learning_rate": 0.001}

model_td3 = agent.get_model("td3",model_kwargs = TD3_PARAMS)

{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device


  "This system does not have apparently enough memory to store the complete "


In [62]:
trained_td3 = agent.train_model(model=model_td3, 
                             tb_log_name='td3',
                             total_timesteps=30000)

begin_total_asset:1000000
end_total_asset:1772199.7858085996
Sharpe:  1.0130632929593688
begin_total_asset:1000000
end_total_asset:1763421.9123593213
Sharpe:  1.0031139377505809
begin_total_asset:1000000
end_total_asset:1763421.9123593213
Sharpe:  1.0031139377505809
begin_total_asset:1000000
end_total_asset:1763421.9123593213
Sharpe:  1.0031139377505809
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 17        |
|    time_elapsed    | 254       |
|    total_timesteps | 4512      |
| train/             |           |
|    actor_loss      | 1.97e+08  |
|    critic_loss     | 1.12e+14  |
|    learning_rate   | 0.001     |
|    n_updates       | 3384      |
|    reward          | 1763421.9 |
----------------------------------
begin_total_asset:1000000
end_total_asset:1763421.9123593213
Sharpe:  1.0031139377505809
begin_total_asset:1000000
end_total_asset:1763421.9123593213
Sharpe:  1.0031139377505809
begin_total

In [63]:
trained_td3.save('trained_models/trained_td3.zip')

## Trading
Assume that we have $1,000,000 of initial capital at 2019-01-01. We run each trained model (SAC, DDPG, PPO, A2C and TD3) on the tickers in the `trade` data set.

In [64]:
df.tail()

| | date | open | high | low | close | adjcp | volume | tic | day | cov_list | ... | RSI | slowk | slowd | WILLR | MACD | ROC | OBV | lag_20 | lag_40 | lag_60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 226137 | 2022-10-27 | 32.810001 | 33.540001 | 32.669998 | 32.959999 | 28.838276 | 111008800.0 | PETR4.SA | 3 | [[0.0006241287251678888, 9.547497088755112e-05... | ... | 48.877619 | 11.663715 | 21.048627 | -83.667188 | 0.602634 | -2.887447 | 545282600.0 | 0.126068 | 0.016343 | -0.025717 |
| 226124 | 2022-10-27 | 2.325 | 2.325 | 2.325 | 2.325 | 2.325 | 0.0 | FL | 3 | [[0.0006241287251678888, 9.547497088755112e-05... | ... | 38.921043 | 4.35493 | 13.115796 | -88.571429 | -0.028812 | -3.526971 | 685807.0 | -0.00641 | -0.073705 | -0.04321 |
| 226138 | 2022-10-27 | 64.0 | 67.870003 | 63.860001 | 66.529999 | 66.529999 | 7893400.0 | RENT3.SA | 3 | [[0.0006241287251678888, 9.547497088755112e-05... | ... | 53.614327 | 24.679698 | 27.408235 | -47.492652 | 0.844189 | 0.925365 | 1792504000.0 | 0.098604 | 0.069098 | 0.158454 |
| 226141 | 2022-10-27 | 18.35 | 18.35 | 17.99 | 18.030001 | 18.030001 | 53975500.0 | T | 3 | [[0.0006241287251678888, 9.547497088755112e-05... | ... | 70.517636 | 88.999256 | 89.503388 | -8.226213 | 0.362687 | 19.246037 | 924117400.0 | 0.160979 | 0.027936 | -0.018508 |
| 226101 | 2022-10-27 | 1.64 | 1.65 | 1.62 | 1.62 | 1.62 | 250843740.0 | 600157.SS | 3 | [[0.0006241287251678888, 9.547497088755112e-05... | ... | 42.860273 | 50.0 | 43.518507 | -70.000036 | -0.014977 | -2.409636 | 30937030000.0 | -0.02994 | -0.084746 | -0.068966 |


In [65]:
e_trade_gym = StockPortfolioEnv(df = trade, **env_kwargs)

In [66]:
trade.shape

(53600, 21)

In [67]:
#trained_sac
#trained_ddpg
#trained_ppo
#trained_a2c
#trained_td3

df_daily_return, df_actions = DRLAgent.DRL_prediction(model=trained_sac,
                        environment = e_trade_gym)

df_daily_return2, df_actions2 = DRLAgent.DRL_prediction(model=trained_ddpg,
                        environment = e_trade_gym)

df_daily_return3, df_actions3 = DRLAgent.DRL_prediction(model=trained_ppo,
                        environment = e_trade_gym)

df_daily_return4, df_actions4 = DRLAgent.DRL_prediction(model=trained_a2c,
                        environment = e_trade_gym)

df_daily_return5, df_actions5 = DRLAgent.DRL_prediction(model=trained_td3,
                        environment = e_trade_gym)

begin_total_asset:1000000
end_total_asset:1389457.353315982
Sharpe:  0.5365195694411835
hit end!
begin_total_asset:1000000
end_total_asset:1527355.9604429563
Sharpe:  0.6741866138070857
hit end!
begin_total_asset:1000000
end_total_asset:1501072.662677848
Sharpe:  0.6426789058667638
hit end!
begin_total_asset:1000000
end_total_asset:1486022.1291474588
Sharpe:  0.6191157130831528
hit end!
begin_total_asset:1000000
end_total_asset:1536771.3007145352
Sharpe:  0.6492809815756594
hit end!


In [68]:
# Save the daily returns of each model to its own file
df_daily_return.to_csv('df_daily_return.csv')    # SAC
df_daily_return2.to_csv('df_daily_return2.csv')  # DDPG
df_daily_return3.to_csv('df_daily_return3.csv')  # PPO
df_daily_return4.to_csv('df_daily_return4.csv')  # A2C
df_daily_return5.to_csv('df_daily_return5.csv')  # TD3

In [69]:
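# Save the portfolio weights (actions) of each model to its own file
df_actions.to_csv('df_actions.csv')    # SAC
df_actions2.to_csv('df_actions2.csv')  # DDPG
df_actions3.to_csv('df_actions3.csv')  # PPO
df_actions4.to_csv('df_actions4.csv')  # A2C
df_actions5.to_csv('df_actions5.csv')  # TD3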

<a id='6'></a>
# Part 7: Backtest Our Strategy
Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and provides a set of individual plots that together give a comprehensive picture of a strategy's performance.
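
To make the headline backtest numbers less of a black box, here is a minimal sketch of two of them, the annualized Sharpe ratio and the maximum drawdown, computed directly from a series of daily returns. The helper names, the zero risk-free rate, and the 252-day annualization factor are assumptions made for illustration; pyfolio's own implementation may differ in details.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor


def annualized_sharpe(daily_returns: pd.Series) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return float(np.sqrt(TRADING_DAYS) * daily_returns.mean() / daily_returns.std())


def max_drawdown(daily_returns: pd.Series) -> float:
    """Largest peak-to-trough decline of the cumulative wealth curve."""
    wealth = (1.0 + daily_returns).cumprod()
    running_peak = wealth.cummax()
    return float((wealth / running_peak - 1.0).min())


# Toy returns, made up for illustration: +10%, -50%, +20%
toy_returns = pd.Series([0.10, -0.50, 0.20])
print("Sharpe:", annualized_sharpe(toy_returns))
print("Max drawdown:", max_drawdown(toy_returns))
```

The drawdown is measured against the running peak of the wealth curve, so a partial recovery after a loss does not reduce the reported maximum.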

<a id='6.1'></a>
## 7.1 BackTestStats
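Convert the daily returns returned by `DRL_prediction` into a pyfolio time series and pass it to `timeseries.perf_stats`. The cell below evaluates the TD3 results (`df_daily_return5`).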


In [70]:
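from pyfolio import timeseries

# Daily returns of the TD3 agent as a pyfolio-compatible series
DRL_strat = convert_daily_return_to_pyfolio_ts(df_daily_return5)
perf_func = timeseries.perf_stats
# Note: the strategy is passed as its own benchmark (factor_returns),
# so the reported Alpha is 0 and Beta is 1 by construction.
perf_stats_all = perf_func(returns=DRL_strat,
                           factor_returns=DRL_strat,
                           positions=None, transactions=None, turnover_denom="AGB")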

In [71]:
print("==============DRL Strategy Stats===========")
perf_stats_all



Annual return           0.106285
Cumulative returns      0.536771
Annual volatility       0.181039
Sharpe ratio            0.649281
Calmar ratio            0.320314
Stability               0.818021
Max drawdown           -0.331815
Omega ratio             1.134494
Sortino ratio           0.886354
Skew                   -1.030890
Kurtosis               19.573074
Tail ratio              1.003096
Daily value at risk    -0.022342
Alpha                   0.000000
Beta                    1.000000
dtype: float64
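
The Alpha of 0 and Beta of 1 in the table above follow from passing the strategy as its own `factor_returns`. The sketch below illustrates why, using the textbook definition of beta as covariance over benchmark variance. The `beta` helper and the toy returns are hypothetical and not part of FinRL or pyfolio.

```python
import numpy as np


def beta(strategy: np.ndarray, benchmark: np.ndarray) -> float:
    """Slope of strategy returns regressed on benchmark returns."""
    cov = np.cov(strategy, benchmark, ddof=1)
    return float(cov[0, 1] / cov[1, 1])


# Toy daily returns, made up for illustration
rets = np.array([0.01, -0.02, 0.015, 0.003])

beta_self = beta(rets, rets)           # a series measured against itself
beta_levered = beta(2.0 * rets, rets)  # a 2x levered copy of the benchmark
print(beta_self, beta_levered)
```

To obtain an informative Alpha and Beta, the benchmark passed as `factor_returns` would need to be a different series, such as the daily returns of a market index.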

In [None]:
#baseline stats
print("==============Get Baseline Stats===========")
baseline_df = get_baseline(
        ticker="^DJI", 
        start = df_daily_return.loc[0,'date'],
        end = df_daily_return.loc[len(df_daily_return)-1,'date'])

stats = backtest_stats(baseline_df, value_col_name = 'close')

In [None]:
#dp = YahooFinanceProcessor()
#df_comp = dp.download_data(start_date = '2008-01-01',
#                     end_date = '2021-10-31',
#                     ticker_list = ativos, time_interval='1D')
#
#dates1= df_comp.query('tic == "AAPL"').date.tolist()
#dates2= df_comp.query('tic == "PETR3.SA"').date.tolist()
#dates_final=list(set(dates1).intersection(dates2))
#print(len(dates1),len(dates2))
#print(len(dates_final))
#
#print(df_comp.shape)
#df_comp=df_comp[df_comp['date'].isin(dates_final)]
#print(df_comp.shape)
#df_comp = data_split(df_comp,'2020-07-01', '2021-10-31')

In [None]:
#df_comp=df_comp[['date','close','tic']]
df_comp = df[['date','close','tic']].tail(38916)
df_comp.set_index('date',inplace=True)
res = df_comp.pivot(columns='tic', values='close')

# Asset weights: equal weight per ticker, so that the weights sum to 1
n_assets = res.shape[1]
wts = np.full(n_assets, 1.0 / n_assets)


ret_data = res.pct_change()

weighted_returns = (wts * ret_data)
weighted_returns.index= pd.to_datetime(weighted_returns.index)
weighted_returns

In [None]:
ptf_return= weighted_returns.sum(axis=1,skipna=False)
ptf_return.name= 'Portfolio Returns'
ptf_return.index=DRL_strat.index
ptf_return
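
As a small worked example of the benchmark construction above, the following sketch builds an equal-weight portfolio return series from long-format prices. The tickers and prices are made up for illustration. Note that applying constant weights to each day's returns implicitly assumes daily rebalancing back to equal weight.

```python
import numpy as np
import pandas as pd

# Toy long-format prices; tickers and values are made up for illustration
prices = pd.DataFrame({
    "date": ["2022-01-03"] * 2 + ["2022-01-04"] * 2 + ["2022-01-05"] * 2,
    "tic": ["AAA", "BBB"] * 3,
    "close": [100.0, 50.0, 110.0, 50.0, 110.0, 55.0],
})

# One column per ticker, one row per date
wide = prices.pivot(index="date", columns="tic", values="close")
asset_returns = wide.pct_change().dropna()

# Equal weights that sum to 1
n_assets = wide.shape[1]
weights = np.full(n_assets, 1.0 / n_assets)

# Weighted sum of the asset returns on each date
portfolio_returns = asset_returns.dot(weights)
portfolio_returns.name = "Portfolio Returns"
print(portfolio_returns)
```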

<a id='6.2'></a>
## 7.2 BackTestPlot

In [None]:
import pyfolio
%matplotlib inline

baseline_df = get_baseline(
        ticker='^DJI', start=df_daily_return.loc[0,'date'], end='2022-11-04'
    )

baseline_returns = get_daily_return(baseline_df, value_col_name="close")

with pyfolio.plotting.plotting_context(font_scale=1.1):
        pyfolio.create_full_tear_sheet(returns = DRL_strat,
                                       benchmark_rets=baseline_returns, set_context=False)

In [None]:
with pyfolio.plotting.plotting_context(font_scale=1.1):
        pyfolio.create_full_tear_sheet(returns = DRL_strat,
                                       benchmark_rets=ptf_return, set_context=False)

## Min-Variance Portfolio Allocation

In [None]:
#%pip install PyPortfolioOpt

In [None]:
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models

In [None]:
unique_tic = trade.tic.unique()
unique_trade_date = trade.date.unique()

In [None]:
df.head()

In [None]:
#calculate_portfolio_minimum_variance
portfolio = pd.DataFrame(index = range(1), columns = unique_trade_date)
initial_capital = 1000000
portfolio.loc[0,unique_trade_date[0]] = initial_capital

for i in range(len( unique_trade_date)-1):
    df_temp = df[df.date==unique_trade_date[i]].reset_index(drop=True)
    df_temp_next = df[df.date==unique_trade_date[i+1]].reset_index(drop=True)
    #Sigma = risk_models.sample_cov(df_temp.return_list[0])
    #calculate covariance matrix
    Sigma = df_temp.return_list[0].cov()
    #portfolio allocation
    ef_min_var = EfficientFrontier(None, Sigma,weight_bounds=(0, 0.1))
    #minimum variance
    raw_weights_min_var = ef_min_var.min_volatility()
    #get weights
    cleaned_weights_min_var = ef_min_var.clean_weights()
    
    # capital available at the current trade date
    cap = portfolio.iloc[0, i]
    # cash allocated to each stock according to the min-variance weights
    current_cash = [element * cap for element in list(cleaned_weights_min_var.values())]
    # number of shares held in each stock at the current close
    current_shares = list(np.array(current_cash) / np.array(df_temp.close))
    # closing prices on the next trade date
    next_price = np.array(df_temp_next.close)
    # next account value = shares held * next-day prices
    portfolio.iloc[0, i+1] = np.dot(current_shares, next_price)
    
portfolio=portfolio.T
portfolio.columns = ['account_value']
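
For intuition about what the optimizer in the loop above is solving, here is a minimal sketch of the unconstrained minimum-variance portfolio, which has the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1). This is not what the notebook runs: the loop uses PyPortfolioOpt with per-asset bounds of (0, 0.1), whereas the closed form has no bounds and can produce negative weights. The function name and the toy covariance matrix are assumptions for illustration.

```python
import numpy as np


def min_variance_weights(sigma: np.ndarray) -> np.ndarray:
    """Unconstrained minimum-variance weights; they sum to 1 but may be negative."""
    ones = np.ones(sigma.shape[0])
    raw = np.linalg.solve(sigma, ones)  # Sigma^{-1} 1 without forming the inverse
    return raw / raw.sum()


# Toy covariance matrix: two uncorrelated assets with variances 0.04 and 0.01
sigma = np.array([[0.04, 0.00],
                  [0.00, 0.01]])

w = min_variance_weights(sigma)
portfolio_variance = float(w @ sigma @ w)
print(w, portfolio_variance)
```

With uncorrelated assets the weights are inversely proportional to each asset's variance, so the less volatile asset receives the larger share.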

In [None]:
#trained_sac
#trained_ddpg
#trained_ppo
#trained_a2c
sac_cumpod =(df_daily_return.daily_return+1).cumprod()-1
ddpg_cumpod =(df_daily_return2.daily_return+1).cumprod()-1
ppo_cumpod =(df_daily_return3.daily_return+1).cumprod()-1
a2c_cumpod =(df_daily_return4.daily_return+1).cumprod()-1


In [None]:
min_var_cumpod =(portfolio.account_value.pct_change().fillna(0)+1).cumprod()-1

In [None]:
dji_cumpod =(baseline_returns.fillna(0)+1).cumprod()-1

In [None]:
ptf_cumpod =(ptf_return.fillna(0)+1).cumprod()-1
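
The cells above all use the same compounding step, `(1 + r).cumprod() - 1`, to turn daily returns into a cumulative return curve. A minimal sketch on made-up returns shows how this relates to the account value, assuming the same $1,000,000 starting capital:

```python
import pandas as pd

# Toy daily returns, made up for illustration
daily_return = pd.Series([0.0, 0.10, -0.05, 0.02])

# Compound the daily returns into a cumulative return curve
cumulative = (1.0 + daily_return).cumprod() - 1.0

# Equivalent account value for an assumed initial capital
initial_capital = 1_000_000
account_value = initial_capital * (1.0 + cumulative)

print(cumulative.round(4).tolist())
print(account_value.round(2).tolist())
```

Compounding multiplies the growth factors rather than adding the returns, which is why the final cumulative return here differs from the plain sum of the daily returns.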

## Plotly: DRL, Min-Variance, DJIA

In [None]:
#%pip install plotly

In [None]:
from datetime import datetime as dt

import matplotlib.pyplot as plt
import plotly
import plotly.graph_objs as go

In [None]:
time_ind = pd.Series(df_daily_return.date)

In [None]:
#trained_sac
#trained_ddpg
#trained_ppo
#trained_a2c


trace0_portfolio = go.Scatter(x = time_ind, y = a2c_cumpod, mode = 'lines', name = 'A2C (Portfolio Allocation)')
trace1_portfolio = go.Scatter(x = time_ind, y = dji_cumpod, mode = 'lines', name = 'DJIA')
trace2_portfolio = go.Scatter(x = time_ind, y = min_var_cumpod, mode = 'lines', name = 'Min-Variance')
trace3_portfolio = go.Scatter(x = time_ind, y = ptf_cumpod, mode = 'lines', name = 'Portfolio Buy & Hold')
trace4_portfolio = go.Scatter(x = time_ind, y = ddpg_cumpod, mode = 'lines', name = 'DDPG')
trace5_portfolio = go.Scatter(x = time_ind, y = sac_cumpod, mode = 'lines', name = 'SAC')
trace6_portfolio = go.Scatter(x = time_ind, y = ppo_cumpod, mode = 'lines', name = 'PPO')

#trace4 = go.Scatter(x = time_ind, y = addpg_cumpod, mode = 'lines', name = 'Adaptive-DDPG')

#trace2 = go.Scatter(x = time_ind, y = portfolio_cost_minv, mode = 'lines', name = 'Min-Variance')
#trace3 = go.Scatter(x = time_ind, y = spx_value, mode = 'lines', name = 'SPX')

In [None]:
fig = go.Figure()
fig.add_trace(trace0_portfolio)
fig.add_trace(trace1_portfolio)
fig.add_trace(trace2_portfolio)
fig.add_trace(trace3_portfolio)
fig.add_trace(trace4_portfolio)
fig.add_trace(trace5_portfolio)
fig.add_trace(trace6_portfolio)




fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=15,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2
        
    ),
)
#fig.update_layout(legend_orientation="h")
fig.update_layout(title={
        #'text': "Cumulative Return using FinRL",
        'y':0.85,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
#with Transaction cost
#fig.update_layout(title =  'Quarterly Trade Date')
fig.update_layout(
    yaxis_title="Cumulative Return",
    xaxis={
        'type': 'date',
        'tick0': time_ind[0],
        'tickmode': 'linear',
        'dtick': 86400000.0 * 80,  # tick spacing in milliseconds (80 days)
    },
)
#fig.update_xaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
#fig.update_yaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
#fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show()