# Stock NeurIPS2018 Part 3. Backtest
This series reproduces the process described in the paper *Practical Deep Reinforcement Learning Approach for Stock Trading*.

This is the third and last part of the NeurIPS2018 series. It shows how to use the agents we trained to run a backtest and compare them with baselines such as Mean Variance Optimization and the DJIA index.

Other demos can be found in the [FinRL-Tutorials](https://github.com/AI4Finance-Foundation/FinRL-Tutorials) repo.

# Part 1. Install Packages

In [47]:
## install required packages
!pip install swig
!pip install wrds
!pip install pyportfolioopt
## install finrl library
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git

Collecting git+https://github.com/AI4Finance-Foundation/FinRL.git
  Cloning https://github.com/AI4Finance-Foundation/FinRL.git to /private/var/folders/ks/bjl76g8d4zxgw0m5p8z2pd9r0000gn/T/pip-req-build-euc0_t3m
  Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/FinRL.git /private/var/folders/ks/bjl76g8d4zxgw0m5p8z2pd9r0000gn/T/pip-req-build-euc0_t3m
  Resolved https://github.com/AI4Finance-Foundation/FinRL.git to commit d25d902a6de54931a329adc38a2663e8f576adc4
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git (from finrl==0.3.8)
  Cloning https://github.com/AI4Finance-Foundation/ElegantRL.git to /private/var/folders/ks/bjl76g8d4zxgw0m5p8z2pd9r0000gn/T/pip-install-l5p_xs7e/elegantrl_76802787e87a4d07b4600b77f7dd4b11
  Running command git clone --

In [1]:
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

!pip install pandas_market_calendars
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3

%matplotlib inline
from finrl.config import INDICATORS, TRAINED_MODEL_DIR



# Part 2. Backtesting

To backtest the agents, upload train_data.csv and trade_data.csv to the same directory as this notebook. For Colab users, just upload both files to the default directory.

In [2]:
train = pd.read_csv('train_data.csv')
trade = pd.read_csv('trade_data.csv')

# If you are not using the data generated in part 1 of this tutorial, make sure
# it has the columns and index in the form that the environment expects.
# In that case you can comment out and skip the following lines.
train = train.set_index(train.columns[0])
train.index.names = ['']
trade = trade.set_index(trade.columns[0])
trade.index.names = ['']

Then, upload the trained agents (the loading code below expects them under `trained_models/`) and set the corresponding variables to True.

In [3]:
if_using_a2c = True
if_using_ddpg = True
if_using_ppo = True
if_using_td3 = True
if_using_sac = True

Load the agents

In [5]:
trained_a2c = A2C.load("trained_models/agent_a2c") if if_using_a2c else None
trained_ddpg = DDPG.load("trained_models/agent_ddpg") if if_using_ddpg else None
trained_ppo = PPO.load("trained_models/agent_ppo") if if_using_ppo else None
trained_td3 = TD3.load("trained_models/agent_td3") if if_using_td3 else None
trained_sac = SAC.load("trained_models/agent_sac") if if_using_sac else None
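
Before running the loading cell above, it can help to confirm that the agent files are actually in place. The following is a minimal sketch, assuming the agents were saved by Stable-Baselines3 as `.zip` archives named `agent_<algo>.zip` (SB3 appends `.zip` when the path has no extension). The helper `missing_agent_files` is hypothetical and not part of FinRL.

```python
from pathlib import Path

def missing_agent_files(model_dir, agent_names):
    """Return the expected agent archive paths that do not exist on disk."""
    missing = []
    for name in agent_names:
        # Assumed naming scheme: trained_models/agent_a2c.zip, agent_ddpg.zip, ...
        path = Path(model_dir) / f"agent_{name}.zip"
        if not path.exists():
            missing.append(str(path))
    return missing

# A directory that does not exist reports every agent as missing.
missing = missing_agent_files("no_such_dir_xyz", ["a2c", "ppo"])
print(missing)
```

Any agent reported as missing can be skipped by setting its `if_using_*` flag to False.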

### Trading (Out-of-sample Performance)

In practice, agents are retrained periodically (e.g., quarterly, monthly, or weekly) to take full advantage of the data, and the parameters are tuned along the way. In this notebook, the parameters are tuned only once, using the in-sample data from 2009-01 to 2020-07, so some alpha decay is expected as the trading period lengthens.

Numerous hyperparameters – e.g. the learning rate, the total number of samples to train on – influence the learning process and are usually determined by testing some variations.

In [6]:
stock_dimension = len(trade.tic.unique())
state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

Stock Dimension: 8, State Space: 81
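
The state-space size printed above follows from the layout of the observation: one entry for the cash balance, a price and a share count for each stock, and one value per technical indicator per stock. A minimal sketch of that arithmetic, assuming (as the printed output implies) that `INDICATORS` holds 8 entries; the helper name `state_space_size` is hypothetical:

```python
def state_space_size(stock_dim: int, n_indicators: int) -> int:
    """Cash balance + (price, holdings) per stock + indicators per stock."""
    return 1 + 2 * stock_dim + n_indicators * stock_dim

# 8 stocks and 8 indicators (the indicator count is inferred, not read from FinRL).
size = state_space_size(stock_dim=8, n_indicators=8)
print(size)
```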


In [7]:
buy_cost_list = sell_cost_list = [0.001] * stock_dimension
num_stock_shares = [0] * stock_dimension

env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4
}

In [8]:
e_trade_gym = StockTradingEnv(df=trade, turbulence_threshold=70, risk_indicator_col='vix', **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

In [9]:
df_account_value_a2c, df_actions_a2c = DRLAgent.DRL_prediction(
    model=trained_a2c,
    environment = e_trade_gym) if if_using_a2c else (None, None)

hit end!


In [10]:
df_account_value_ddpg, df_actions_ddpg = DRLAgent.DRL_prediction(
    model=trained_ddpg,
    environment = e_trade_gym) if if_using_ddpg else (None, None)

hit end!


In [11]:
df_account_value_ppo, df_actions_ppo = DRLAgent.DRL_prediction(
    model=trained_ppo,
    environment = e_trade_gym) if if_using_ppo else (None, None)

hit end!


In [12]:
df_account_value_td3, df_actions_td3 = DRLAgent.DRL_prediction(
    model=trained_td3,
    environment = e_trade_gym) if if_using_td3 else (None, None)

hit end!


In [13]:
df_account_value_sac, df_actions_sac = DRLAgent.DRL_prediction(
    model=trained_sac,
    environment = e_trade_gym) if if_using_sac else (None, None)

hit end!


# Part 3: Mean Variance Optimization

Mean-variance optimization (MVO) is a classic portfolio-management strategy. Here, we walk through the whole MVO process and add the result as a baseline for comparison.

First, reshape the dataframe into the form needed for the MVO weight calculation.

In [14]:
def process_df_for_mvo(df):
  df = df.sort_values(['date','tic'],ignore_index=True)[['date','tic','close']]
  fst = df
  fst = fst.iloc[0:stock_dimension, :]
  tic = fst['tic'].tolist()

  mvo = pd.DataFrame()

  for k in range(len(tic)):
    mvo[tic[k]] = 0

  for i in range(df.shape[0]//stock_dimension):
    n = df
    n = n.iloc[i * stock_dimension:(i+1) * stock_dimension, :]
    date = n['date'][i*stock_dimension]
    mvo.loc[date] = n['close'].tolist()

  return mvo
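
The function above builds a date-by-ticker table of closing prices row by row, and it relies on every date having exactly `stock_dimension` rows. As a hedged alternative, the same reshaping can be sketched with `DataFrame.pivot`. The function name `pivot_close_prices` and the toy tickers are invented for illustration. Unlike the loop, a missing (date, ticker) pair shows up as NaN instead of shifting rows.

```python
import pandas as pd

def pivot_close_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Return a date x ticker table of closing prices."""
    wide = df.pivot(index="date", columns="tic", values="close")
    # Drop the axis names so the result looks like the table built by the loop.
    wide.columns.name = None
    wide.index.name = None
    return wide.sort_index()

# Toy input with two made-up tickers and two dates.
toy = pd.DataFrame({
    "date": ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"],
    "tic": ["AAA", "BBB", "AAA", "BBB"],
    "close": [10.0, 20.0, 11.0, 19.0],
})
wide = pivot_close_prices(toy)
print(wide)
```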

### Helper functions for mean returns and variance-covariance matrix

In [15]:
# Code in this section is partially adapted from Dr G A Vijayalakshmi Pai:
# https://www.kaggle.com/code/vijipai/lesson-5-mean-variance-optimization-of-portfolios/notebook

def StockReturnsComputing(StockPrice, Rows, Columns):
  import numpy as np
  StockReturn = np.zeros([Rows-1, Columns])
  for j in range(Columns):        # j: Assets
    for i in range(Rows-1):     # i: Daily Prices
      StockReturn[i,j]=((StockPrice[i+1, j]-StockPrice[i,j])/StockPrice[i,j])* 100

  return StockReturn
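
The nested loop above computes simple period-over-period returns in percent for each asset. For reference, a vectorized sketch of the same calculation using `np.diff`; the name `percent_returns` and the toy prices are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def percent_returns(prices) -> np.ndarray:
    """Simple returns in percent; rows are periods, columns are assets."""
    prices = np.asarray(prices, dtype=float)
    # (P[t+1] - P[t]) / P[t] * 100 for every asset at once.
    return np.diff(prices, axis=0) / prices[:-1] * 100

# Three days of made-up prices for two assets.
prices = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [121.0, 45.0]])
returns = percent_returns(prices)
print(returns)
```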

### Calculate the weights for mean-variance

In [16]:
StockData = process_df_for_mvo(train)
TradeData = process_df_for_mvo(trade)

TradeData.to_numpy()

array([[112.68,  91.46, 143.95, ...,  91.7 , 178.82,  45.53],
       [113.01,  91.46, 145.86, ...,  92.27, 178.02,  44.72],
       [112.92,  91.45, 147.39, ...,  92.38, 178.27,  44.64],
       ...,
       [ 97.69,  91.51, 275.2 , ...,  79.79, 228.54,  39.89],
       [ 97.91,  91.51, 285.38, ...,  84.55, 246.69,  42.49],
       [ 97.14,  91.52, 292.35, ...,  82.62, 239.6 ,  41.94]])

In [17]:
#compute asset returns
arStockPrices = np.asarray(StockData)
[Rows, Cols]=arStockPrices.shape
arReturns = StockReturnsComputing(arStockPrices, Rows, Cols)

#compute mean returns and variance covariance matrix of returns
meanReturns = np.mean(arReturns, axis = 0)
covReturns = np.cov(arReturns, rowvar=False)

#set precision for printing results
np.set_printoptions(precision=3, suppress = True)

#display mean returns and variance-covariance matrix of returns
print('Mean returns of assets in k-portfolio 1\n', meanReturns)
print('Variance-Covariance matrix of returns\n', covReturns)

Mean returns of assets in k-portfolio 1
 [ 0.004 -0.     0.031  0.031  0.035  0.026  0.034  0.017]
Variance-Covariance matrix of returns
 [[ 0.098  0.     0.062 -0.049 -0.077 -0.07  -0.061 -0.051]
 [ 0.     0.002  0.002 -0.007 -0.007 -0.005 -0.007 -0.011]
 [ 0.062  0.002  1.292  0.011  0.028  0.046  0.048  0.29 ]
 [-0.049 -0.007  0.011  1.513  1.661  1.891  1.605  1.941]
 [-0.077 -0.007  0.028  1.661  2.117  2.363  1.912  2.159]
 [-0.07  -0.005  0.046  1.891  2.363  4.175  2.182  2.509]
 [-0.061 -0.007  0.048  1.605  1.912  2.182  1.84   2.117]
 [-0.051 -0.011  0.29   1.941  2.159  2.509  2.117  3.368]]


### Use PyPortfolioOpt

In [18]:
from pypfopt.efficient_frontier import EfficientFrontier

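if stock_dimension == 1:
    # Only one stock: allocate the full 1,000,000 to it, matching the dollar scaling below
    mvo_weights = np.array([1000000.0])
else:
    # Long-only max-Sharpe portfolio with each asset capped at 50%
    ef_mean = EfficientFrontier(meanReturns, covReturns, weight_bounds=(0, 0.5))
    raw_weights_mean = ef_mean.max_sharpe()
    cleaned_weights_mean = ef_mean.clean_weights()
    # Convert fractional weights into dollar allocations of the 1,000,000 initial capital
    mvo_weights = np.array([1000000 * cleaned_weights_mean[i] for i in range(stock_dimension)])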

mvo_weights

array([397160., 214190., 198090., 123580.,  66970.,      0.,      0.,
            0.])

In [19]:
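# Reciprocal of each asset's last training-period close (despite the name, this is 1/price)
LastPrice = np.array([1/p for p in StockData.tail(1).to_numpy()[0]])
# Dollar allocation * (1/price) = number of shares held in the MVO portfolio
Initial_Portfolio = np.multiply(mvo_weights, LastPrice)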
Initial_Portfolio

array([3526.236, 2342.667, 1388.838,  384.889,  405.24 ,    0.   ,
          0.   ,    0.   ])

In [20]:
Portfolio_Assets = TradeData @ Initial_Portfolio
MVO_result = pd.DataFrame(Portfolio_Assets, columns=["Mean Var"])
MVO_result

| | Mean Var |
|---|---|
| 2020-01-02 | 1.003800e+06 |
| 2020-01-03 | 1.006426e+06 |
| 2020-01-06 | 1.008773e+06 |
| 2020-01-07 | 1.008655e+06 |
| 2020-01-08 | 1.007579e+06 |
| ... | ... |
| 2025-04-04 | 1.229365e+06 |
| 2025-04-07 | 1.215511e+06 |
| 2025-04-08 | 1.210660e+06 |
| 2025-04-09 | 1.253058e+06 |
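
The MVO baseline is a buy-and-hold portfolio: the dollar allocation is turned into share counts at the last training-period price, and the daily portfolio value is the matrix product of the trade-period prices with those share counts. A small worked sketch with made-up numbers (three assets, two trading days):

```python
import numpy as np

# Hypothetical dollar allocation across three assets (sums to 1,000,000).
dollars = np.array([500_000.0, 300_000.0, 200_000.0])
# Hypothetical last in-sample closing prices.
last_prices = np.array([100.0, 50.0, 20.0])

# Shares bought at the last in-sample price and then held unchanged.
shares = dollars / last_prices

# Out-of-sample closing prices: one row per day, one column per asset.
trade_prices = np.array([[100.0, 50.0, 20.0],
                         [110.0, 50.0, 18.0]])

# Daily portfolio value = prices @ shares.
values = trade_prices @ shares
print(shares, values)
```

Because the share counts never change, this baseline pays no transaction costs after the initial purchase, which is worth keeping in mind when comparing it with the agents.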


# Part 4: DJIA index

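Add a market index as a baseline for comparison. Note that the code below takes the `spy` rows of data.csv as the benchmark (labelled `spx` in the results) while keeping the `dji` variable names.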

In [21]:
TRAIN_START_DATE = '2007-05-30'
TRAIN_END_DATE = '2019-12-31'
TRADE_START_DATE = '2020-01-02'
TRADE_END_DATE = '2025-04-11'

In [38]:
# Load the CSV
df_raw = pd.read_csv('data.csv')

# Keep only the rows for the 'spy' ticker, which serves as the index baseline
df_raw = df_raw[df_raw['tic'] == 'spy']


# Show the last 20 rows
df_raw.tail(20)

df_dji = df_raw.copy()

In [40]:
df_dji = df_dji[['date','close']]
# Use .iloc[0]: after filtering, the label-based lookup df_dji['close'][0] can fail
fst_day = df_dji['close'].iloc[0]
dji = pd.merge(df_dji['date'], df_dji['close'].div(fst_day).mul(1000000),
               how='outer', left_index=True, right_index=True).set_index('date')
dji

| date | close |
|---|---|
| 2007-05-30 | 1.000000e+06 |
| 2007-05-31 | 9.989576e+05 |
| 2007-06-01 | 1.003909e+06 |
| 2007-06-04 | 1.004040e+06 |
| 2007-06-05 | 1.000065e+06 |
| ... | ... |
| 2025-04-07 | 3.286291e+06 |
| 2025-04-08 | 3.234819e+06 |
| 2025-04-09 | 3.574537e+06 |
| 2025-04-10 | 3.417905e+06 |
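
Note that the benchmark above is scaled to 1,000,000 on the first row of data.csv (2007-05-30), whereas the MVO portfolio starts from 1,000,000 on the first trade date (2020-01-02). One way to put the benchmark on the same footing is to restrict it to the trade window before normalizing. This is a sketch of that idea, not what the notebook does; `normalize_benchmark` and the toy prices are hypothetical.

```python
import pandas as pd

def normalize_benchmark(df, start, end, initial_amount=1_000_000):
    """Scale closes so the series is worth initial_amount on its first in-window date."""
    dates = pd.to_datetime(df["date"])
    in_window = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    close = df.loc[in_window].set_index("date")["close"]
    return close / close.iloc[0] * initial_amount

# Toy price history that straddles the start of the trade window.
toy = pd.DataFrame({
    "date": ["2019-12-30", "2019-12-31", "2020-01-02", "2020-01-03"],
    "close": [300.0, 310.0, 320.0, 336.0],
})
bench = normalize_benchmark(toy, "2020-01-02", "2025-04-11")
print(bench)
```

With the notebook's variables, the call would take `TRADE_START_DATE` and `TRADE_END_DATE` as the window bounds.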


<a id='4'></a>
# Part 5: Backtesting Results
Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and provides a range of individual plots that together give a comprehensive picture of a strategy's performance.

In [41]:
df_result_a2c = df_account_value_a2c.set_index(df_account_value_a2c.columns[0]) if if_using_a2c else None
df_result_ddpg = df_account_value_ddpg.set_index(df_account_value_ddpg.columns[0]) if if_using_ddpg else None
df_result_ppo = df_account_value_ppo.set_index(df_account_value_ppo.columns[0]) if if_using_ppo else None
df_result_td3 = df_account_value_td3.set_index(df_account_value_td3.columns[0]) if if_using_td3 else None
df_result_sac = df_account_value_sac.set_index(df_account_value_sac.columns[0]) if if_using_sac else None

result = pd.DataFrame()
if if_using_a2c: result = pd.merge(result, df_result_a2c, how='outer', left_index=True, right_index=True)
if if_using_ddpg: result = pd.merge(result, df_result_ddpg, how='outer', left_index=True, right_index=True, suffixes=('', '_drop'))
if if_using_ppo: result = pd.merge(result, df_result_ppo, how='outer', left_index=True, right_index=True, suffixes=('', '_drop'))
if if_using_td3: result = pd.merge(result, df_result_td3, how='outer', left_index=True, right_index=True, suffixes=('', '_drop'))
if if_using_sac: result = pd.merge(result, df_result_sac, how='outer', left_index=True, right_index=True, suffixes=('', '_drop'))
result = pd.merge(result, MVO_result, how='outer', left_index=True, right_index=True)
# .bfill() replaces the deprecated fillna(method='bfill') and behaves the same way
result = pd.merge(result, dji, how='outer', left_index=True, right_index=True).bfill()



In [53]:
col_name = []
if if_using_a2c: col_name.append('A2C')
if if_using_ddpg: col_name.append('DDPG')
if if_using_ppo: col_name.append('PPO')
if if_using_td3: col_name.append('TD3')
if if_using_sac: col_name.append('SAC')
col_name.append('Mean Var')
col_name.append('spx')
result.columns = col_name

In [54]:
result

| date | A2C | DDPG | PPO | TD3 | SAC | Mean Var | spx |
|---|---|---|---|---|---|---|---|
| 2007-05-30 | 1.000000e+06 | 1.000000e+06 | 1.000000e+06 | 1000000.0 | 1.000000e+06 | 1.003800e+06 | 1.000000e+06 |
| 2007-05-31 | 1.000000e+06 | 1.000000e+06 | 1.000000e+06 | 1000000.0 | 1.000000e+06 | 1.003800e+06 | 9.989576e+05 |
| 2007-06-01 | 1.000000e+06 | 1.000000e+06 | 1.000000e+06 | 1000000.0 | 1.000000e+06 | 1.003800e+06 | 1.003909e+06 |
| 2007-06-04 | 1.000000e+06 | 1.000000e+06 | 1.000000e+06 | 1000000.0 | 1.000000e+06 | 1.003800e+06 | 1.004040e+06 |
| 2007-06-05 | 1.000000e+06 | 1.000000e+06 | 1.000000e+06 | 1000000.0 | 1.000000e+06 | 1.003800e+06 | 1.000065e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-04-07 | 1.360982e+06 | 1.330565e+06 | 1.224973e+06 | 1000000.0 | 1.396838e+06 | 1.215511e+06 | 3.286291e+06 |
| 2025-04-08 | 1.336188e+06 | 1.314225e+06 | 1.205664e+06 | 1000000.0 | 1.378728e+06 | 1.210660e+06 | 3.234819e+06 |
| 2025-04-09 | 1.464537e+06 | 1.407594e+06 | 1.297445e+06 | 1000000.0 | 1.479930e+06 | 1.253058e+06 | 3.574537e+06 |
| 2025-04-10 | 1.406416e+06 | 1.375322e+06 | 1.263677e+06 | 1000000.0 | 1.446620e+06 | 1.247135e+06 | 3.417905e+06 |


Now that everything is ready, we can plot the backtest results.

In [55]:
plt.rcParams["figure.figsize"] = (15,5)
plt.figure()
result.plot()

<Axes: xlabel='date'>

In [45]:
import pandas as pd

# MACD calculation
macd_result = pd.DataFrame(index=result.index)
for col in result.columns:
    ema12 = result[col].ewm(span=12, adjust=False).mean()
    ema26 = result[col].ewm(span=26, adjust=False).mean()
    macd_result[col] = ema12 - ema26

# Sharpe Ratio calculation
daily_returns = result.pct_change().dropna()
sharpe_ratios = (daily_returns.mean() / daily_returns.std()) * (252 ** 0.5)

print("Sharpe Ratios:")
print(sharpe_ratios)


Sharpe Ratios:
A2C         0.228426
DDPG        0.262256
PPO         0.180976
TD3              NaN
SAC         0.276099
Mean Var    0.319361
spx         0.447223
dtype: float64
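
The Sharpe ratios above are annualized as mean daily return divided by its standard deviation, times the square root of 252, with an implicit risk-free rate of zero. The NaN for TD3 is consistent with its account value staying at 1000000.0 in the results table: all daily returns are zero, so the ratio is 0/0. A minimal sketch that makes this case explicit; `annualized_sharpe` and the toy series are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(values: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of a value series, assuming a zero risk-free rate."""
    # Simple returns computed directly, avoiding pct_change's fill behaviour.
    returns = (values / values.shift(1) - 1).dropna()
    std = returns.std()
    if std == 0 or np.isnan(std):
        return float("nan")  # undefined for a flat or too-short series
    return float(returns.mean() / std * np.sqrt(periods_per_year))

flat = pd.Series([1_000_000.0] * 5)                      # never trades
moving = pd.Series([100.0, 101.0, 100.0, 102.0, 103.0])  # made-up values
print(annualized_sharpe(flat), annualized_sharpe(moving))
```

Also keep in mind that `result` is back-filled to 2007, so the long flat stretch before the trade period lowers the agents' and MVO's ratios; computing the ratio on the trade window only would give a more like-for-like comparison.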


