# Stock NeurIPS2018 Part 3. Backtest
This series reproduces the process described in the paper *Practical Deep Reinforcement Learning Approach for Stock Trading*.

This is the third and last part of the NeurIPS2018 series. It shows how to backtest the agents we trained and compare them with baselines such as Mean Variance Optimization and the DJIA index.

Other demos can be found in the [FinRL-Tutorials](https://github.com/AI4Finance-Foundation/FinRL-Tutorials) repo.

# Part 1. Install Packages

In [60]:
## install required packages
!pip install swig
!pip install wrds
!pip install pyportfolioopt
## install finrl library
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
!pip install pandas_market_calendars



In [61]:
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split

# Part 2. Backtesting

In [62]:
TRAIN_START_DATE = '2016-02-02'
TRAIN_END_DATE = '2023-04-03'
TRADE_START_DATE = '2023-04-04'
TRADE_END_DATE = '2025-04-08'

In [63]:
df = pd.read_csv('data.csv')

In [64]:
day_values_per_tic = df.groupby('tic')['day'].apply(lambda x: sorted(x.unique())).reset_index()
day_values_per_tic.columns = ['tic', 'unique_days']

# Display
print(day_values_per_tic)

      tic            unique_days
0     agg        [0, 1, 2, 3, 4]
1     bil        [0, 1, 2, 3, 4]
2  btcusd  [0, 1, 2, 3, 4, 5, 6]
3     gld        [0, 1, 2, 3, 4]
4     spy        [0, 1, 2, 3, 4]
5      vb        [0, 1, 2, 3, 4]
6     vnq        [0, 1, 2, 3, 4]
7      vo        [0, 1, 2, 3, 4]
8     vwo        [0, 1, 2, 3, 4]


In [65]:
# Match 5-day and 7-day tickers using apply
tics_5day = day_values_per_tic[day_values_per_tic['unique_days'].apply(lambda x: x == list(range(5)))]['tic']
tics_7day = day_values_per_tic[day_values_per_tic['unique_days'].apply(lambda x: x == list(range(7)))]['tic']

# Filter the original df
df_5day_full = df[df['tic'].isin(tics_5day)]
df_7day_full = df[df['tic'].isin(tics_7day)]

# Results
print("5-day tickers:", tics_5day.tolist())
print("7-day tickers:", tics_7day.tolist())
print("5-day df shape:", df_5day_full.shape)
print("7-day df shape:", df_7day_full.shape)

5-day tickers: ['agg', 'bil', 'gld', 'spy', 'vb', 'vnq', 'vo', 'vwo']
7-day tickers: ['btcusd']
5-day df shape: (20520, 8)
7-day df shape: (3723, 8)


In [66]:
df.head()

| | date | close | high | low | open | volume | tic | day |
|---|---|---|---|---|---|---|---|---|
| 0 | 2015-02-01 | 228.99 | 233.79 | 210.0 | 218.67 | 7220.0 | btcusd | 6 |
| 1 | 2015-02-02 | 112.2 | 112.23 | 112.0 | 112.06 | 2792120.0 | agg | 0 |
| 2 | 2015-02-02 | 91.46 | 91.48 | 91.46 | 91.48 | 3557487.0 | bil | 0 |
| 3 | 2015-02-02 | 237.83 | 240.1 | 220.89 | 228.39 | 7421.0 | btcusd | 0 |
| 4 | 2015-02-02 | 122.42 | 123.155 | 121.82 | 121.84 | 8885189.0 | gld | 0 |


In [67]:
df = pd.concat([df_5day_full, df_7day_full], ignore_index=False)
df.index = range(len(df))  # Assign new sequential index

In [68]:
# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])

# Remove rows where the date appears only once
df = df[df.groupby('date')['date'].transform('count') > 1]

# Sort by date
df = df.sort_values('date').reset_index(drop=True)
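
The effect of the count-based filter can be seen on a toy frame. The sketch below uses made-up tickers and prices: a weekend date on which only the crypto asset has a row is dropped, while weekdays with several assets are kept. Note that the threshold is "more than one row on that date", not "every ticker present".

```python
import pandas as pd

# Toy data (hypothetical values): 'btcusd' has a Saturday row, 'spy' does not.
toy = pd.DataFrame({
    "date": ["2023-04-14", "2023-04-14", "2023-04-15", "2023-04-17", "2023-04-17"],
    "tic": ["spy", "btcusd", "btcusd", "spy", "btcusd"],
    "close": [412.0, 30400.0, 30300.0, 413.0, 29500.0],
})
toy["date"] = pd.to_datetime(toy["date"])

# For every row, the number of rows that share its date.
rows_per_date = toy.groupby("date")["date"].transform("count")

# Keep only dates on which more than one asset has a row.
filtered = toy[rows_per_date > 1].sort_values("date").reset_index(drop=True)
print(filtered)
```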


In [69]:
df.head()

| | date | close | high | low | open | volume | tic | day |
|---|---|---|---|---|---|---|---|---|
| 0 | 2015-02-02 | 112.2 | 112.23 | 112.0 | 112.06 | 2792120.0 | agg | 0 |
| 1 | 2015-02-02 | 40.65 | 40.76 | 40.28 | 40.4 | 15944366.0 | vwo | 0 |
| 2 | 2015-02-02 | 122.3 | 122.3899 | 119.94 | 121.66 | 823489.0 | vo | 0 |
| 3 | 2015-02-02 | 86.32 | 86.69 | 84.6851 | 86.53 | 7523640.0 | vnq | 0 |
| 4 | 2015-02-02 | 237.83 | 240.1 | 220.89 | 228.39 | 7421.0 | btcusd | 0 |


In [70]:
train = data_split(df, TRAIN_START_DATE, TRAIN_END_DATE).reset_index(drop=True)
trade = data_split(df, TRADE_START_DATE, TRADE_END_DATE).reset_index(drop=True)
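
For readers who want to see what a date-range split does without opening FinRL, here is a rough pandas approximation. It is a sketch, not FinRL's code: `split_by_date` is a hypothetical helper, and it assumes the end date is exclusive (a half-open interval), which should be checked against the installed FinRL version. Under that assumption, a row dated exactly on the training end date lands in neither split.

```python
import pandas as pd

def split_by_date(df, start, end, date_col="date"):
    """Hypothetical stand-in for a date split: keeps rows with start <= date < end."""
    mask = (df[date_col] >= pd.Timestamp(start)) & (df[date_col] < pd.Timestamp(end))
    return df[mask].sort_values([date_col, "tic"]).reset_index(drop=True)

# Toy data (made-up prices) around the train/trade boundary used in this notebook.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-31", "2023-03-31",
                            "2023-04-03", "2023-04-03",
                            "2023-04-04", "2023-04-04"]),
    "tic": ["spy", "agg", "spy", "agg", "spy", "agg"],
    "close": [409.0, 99.6, 410.0, 100.0, 408.0, 100.2],
})

train_toy = split_by_date(toy, "2016-02-02", "2023-04-03")
trade_toy = split_by_date(toy, "2023-04-04", "2025-04-08")
print(train_toy["date"].unique())  # only 2023-03-31 under the exclusive-end assumption
print(trade_toy["date"].unique())  # only 2023-04-04
```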


In [71]:
trade.head()

| | date | close | high | low | open | volume | tic | day |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-04-04 | 100.19 | 100.3 | 99.5 | 99.55 | 8925168.0 | agg | 1 |
| 1 | 2023-04-04 | 91.48 | 91.49 | 91.48 | 91.48 | 10265810.0 | bil | 1 |
| 2 | 2023-04-04 | 28179.64 | 28450.0 | 27670.24 | 27809.55 | 13261.95 | btcusd | 1 |
| 3 | 2023-04-04 | 187.98 | 188.23 | 184.66 | 184.7224 | 13765360.0 | gld | 1 |
| 4 | 2023-04-04 | 408.67 | 411.92 | 407.24 | 411.62 | 66601530.0 | spy | 1 |


To backtest the agents, upload trade_data.csv to the same directory as this notebook. Colab users can simply upload trade_data.csv to the default directory.

In [72]:

stock_dimension = len(trade.tic.unique())


# Part 3. Mean Variance Optimization

Mean-variance optimization (MVO) is a classic portfolio-management strategy. Here we walk through the full process and use the result as a baseline for comparison.
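
As a reminder of what is being optimized: for weights `w`, mean returns `mu` and covariance matrix `cov`, the portfolio's expected return is `w @ mu` and its variance is `w @ cov @ w`, and a max-Sharpe portfolio maximizes their ratio (excess return over volatility). The sketch below uses made-up numbers for two assets and assumes a zero risk-free rate. With only the "weights sum to one" constraint there is a closed form proportional to `inv(cov) @ mu`; the weight bounds used later in this notebook require a numerical solver instead.

```python
import numpy as np

mu = np.array([0.05, 0.02])             # hypothetical mean returns
cov = np.array([[0.04, 0.00],
                [0.00, 0.01]])          # hypothetical covariance matrix

def sharpe(weights, mu, cov, risk_free=0.0):
    """Sharpe ratio of a portfolio: excess expected return over volatility."""
    port_return = weights @ mu
    port_vol = np.sqrt(weights @ cov @ weights)
    return (port_return - risk_free) / port_vol

equal = np.array([0.5, 0.5])

# Closed-form max-Sharpe weights for risk_free = 0 and no bounds.
raw = np.linalg.solve(cov, mu)
tangency = raw / raw.sum()

print("equal-weight Sharpe:", sharpe(equal, mu, cov))
print("tangency weights   :", tangency)
print("tangency Sharpe    :", sharpe(tangency, mu, cov))
```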

First, reshape the dataframe into the form needed to compute the MVO weights.

In [73]:
def process_df_for_mvo(df):
    df = df.sort_values(['date', 'tic'], ignore_index=True)[['date', 'tic', 'close']]
    all_tickers = sorted(df['tic'].unique())
    ticker_index = {tic: idx for idx, tic in enumerate(all_tickers)}
    stock_dimension = len(all_tickers)

    mvo = pd.DataFrame(columns=all_tickers)

    grouped = df.groupby('date')
    for date, group in grouped:
        row = [np.nan] * stock_dimension
        for _, row_data in group.iterrows():
            row[ticker_index[row_data['tic']]] = row_data['close']
        if not any(pd.isna(row)):  # only include dates with all tickers
            mvo.loc[date] = row

    return mvo
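
The same wide price table (one row per date, one column per ticker, incomplete dates dropped) can probably be obtained with a pivot. This is a sketch of an alternative, not the notebook's code: `prices_wide` is a hypothetical name, and it assumes each (date, tic) pair occurs at most once, since `pivot` raises an error on duplicates.

```python
import pandas as pd

def prices_wide(df):
    """Hypothetical pivot-based variant: dates x tickers table of close prices."""
    wide = df.pivot(index="date", columns="tic", values="close")
    wide = wide.sort_index().sort_index(axis=1)   # sort dates, then ticker columns
    return wide.dropna(how="any")                 # keep only dates with all tickers

# Toy data (made-up prices): ticker 'b' is missing on the second date.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-04-04", "2023-04-04", "2023-04-05",
                            "2023-04-06", "2023-04-06"]),
    "tic": ["b", "a", "a", "a", "b"],
    "close": [20.0, 100.0, 101.0, 102.0, 21.0],
})
wide = prices_wide(toy)
print(wide)
```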


### Helper functions for mean returns and variance-covariance matrix

In [74]:
# The code in this section is partially based on work by Dr G A Vijayalakshmi Pai:
# https://www.kaggle.com/code/vijipai/lesson-5-mean-variance-optimization-of-portfolios/notebook

def StockReturnsComputing(StockPrice, Rows, Columns):
    """Return simple day-over-day returns in percent, with shape (Rows-1, Columns)."""
    StockReturn = np.zeros([Rows - 1, Columns])
    for j in range(Columns):        # j: assets
        for i in range(Rows - 1):   # i: consecutive daily prices
            StockReturn[i, j] = ((StockPrice[i + 1, j] - StockPrice[i, j]) / StockPrice[i, j]) * 100

    return StockReturn
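
The double loop computes simple percentage returns between consecutive rows. An equivalent vectorized form, shown here as a sketch with made-up prices (`pct_returns` is a hypothetical name), makes the formula easier to read. Because of the factor of 100, the mean returns are in percent and the covariance entries are in percent squared.

```python
import numpy as np

def pct_returns(prices):
    """Simple returns in percent between consecutive rows (rows: days, columns: assets)."""
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] - prices[:-1]) / prices[:-1] * 100.0

# Toy prices (hypothetical): 3 days, 2 assets.
prices = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [ 99.0, 45.0]])
returns = pct_returns(prices)
print(returns)
```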

### Calculate the weights for mean-variance

In [75]:
trade.head()

| | date | close | high | low | open | volume | tic | day |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-04-04 | 100.19 | 100.3 | 99.5 | 99.55 | 8925168.0 | agg | 1 |
| 1 | 2023-04-04 | 91.48 | 91.49 | 91.48 | 91.48 | 10265810.0 | bil | 1 |
| 2 | 2023-04-04 | 28179.64 | 28450.0 | 27670.24 | 27809.55 | 13261.95 | btcusd | 1 |
| 3 | 2023-04-04 | 187.98 | 188.23 | 184.66 | 184.7224 | 13765360.0 | gld | 1 |
| 4 | 2023-04-04 | 408.67 | 411.92 | 407.24 | 411.62 | 66601530.0 | spy | 1 |


In [76]:
StockData = process_df_for_mvo(train)

TradeData = process_df_for_mvo(trade)

TradeData.to_numpy()

array([[  100.19,    91.48, 28179.64, ...,    82.27,   208.26,    40.61],
       [  100.53,    91.53, 28175.37, ...,    81.83,   206.91,    40.25],
       [  100.44,    91.54, 28053.46, ...,    82.39,   207.1 ,    40.49],
       ...,
       [   99.37,    91.45, 83174.33, ...,    88.02,   249.18,    44.63],
       [   99.46,    91.48, 83860.16, ...,    84.2 ,   234.28,    42.13],
       [   98.2 ,    91.49, 79140.01, ...,    81.99,   232.36,    40.43]])

In [77]:
#compute asset returns
arStockPrices = np.asarray(StockData)
[Rows, Cols]=arStockPrices.shape
arReturns = StockReturnsComputing(arStockPrices, Rows, Cols)

#compute mean returns and variance covariance matrix of returns
meanReturns = np.mean(arReturns, axis = 0)
covReturns = np.cov(arReturns, rowvar=False)

#set precision for printing results
np.set_printoptions(precision=3, suppress = True)

#display mean returns and variance-covariance matrix of returns
print('Mean returns of assets in k-portfolio 1\n', meanReturns)
print('Variance-Covariance matrix of returns\n', covReturns)

Mean returns of assets in k-portfolio 1
 [-0.005  0.     0.349  0.033  0.05   0.045  0.014  0.045  0.026]
Variance-Covariance matrix of returns
 [[ 0.111  0.001  0.173  0.112  0.042  0.053  0.109  0.055  0.045]
 [ 0.001  0.001  0.    -0.    -0.    -0.     0.    -0.    -0.   ]
 [ 0.173  0.    21.569  0.452  1.287  1.505  1.163  1.378  1.233]
 [ 0.112 -0.     0.452  0.797  0.045  0.045  0.154  0.067  0.17 ]
 [ 0.042 -0.     1.287  0.045  1.429  1.539  1.267  1.478  1.171]
 [ 0.053 -0.     1.505  0.045  1.539  1.972  1.521  1.751  1.344]
 [ 0.109  0.     1.163  0.154  1.267  1.521  1.936  1.437  1.024]
 [ 0.055 -0.     1.378  0.067  1.478  1.751  1.437  1.652  1.265]
 [ 0.045 -0.     1.233  0.17   1.171  1.344  1.024  1.265  1.69 ]]


### Use PyPortfolioOpt

In [78]:
from pypfopt.efficient_frontier import EfficientFrontier



In [79]:
ef_mean = EfficientFrontier(meanReturns, covReturns, weight_bounds=(0.01, 0.25))
raw_weights_mean = ef_mean.max_sharpe()
cleaned_weights_mean = ef_mean.clean_weights()
mvo_weights = np.array([1000000 * cleaned_weights_mean[key] for key in cleaned_weights_mean.keys()])
mvo_weights

array([ 10000., 250000., 200000., 250000., 250000.,  10000.,  10000.,
        10000.,  10000.])

In [80]:
FirstTradePrice = np.array([1/p for p in TradeData.head(1).to_numpy()[0]])
Initial_Portfolio = np.multiply(mvo_weights, FirstTradePrice)


In [81]:
Portfolio_Assets = TradeData @ Initial_Portfolio
MVO_result = pd.DataFrame(Portfolio_Assets, columns=["account_value"])
MVO_result

| | account_value |
|---|---|
| 2023-04-04 | 1.000000e+06 |
| 2023-04-05 | 9.989945e+05 |
| 2023-04-06 | 9.974777e+05 |
| 2023-04-10 | 1.007469e+06 |
| 2023-04-11 | 1.013427e+06 |
| ... | ... |
| 2025-04-01 | 1.636429e+06 |
| 2025-04-02 | 1.620763e+06 |
| 2025-04-03 | 1.604361e+06 |
| 2025-04-04 | 1.578854e+06 |
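
The valuation above is a buy-and-hold calculation: the dollars allocated to each asset divided by its first-day price give share counts, and the price table multiplied by those share counts gives the account value on each day. A toy version with made-up prices and weights follows. Like the notebook's calculation, it allows fractional shares and applies no transaction costs.

```python
import numpy as np

# Toy inputs (hypothetical): 3 days, 2 assets.
prices = np.array([[100.0, 20.0],
                   [110.0, 19.0],
                   [121.0, 21.0]])
weights = np.array([0.25, 0.75])
capital = 1_000_000

dollars = capital * weights        # dollars allocated per asset
shares = dollars / prices[0]       # shares bought at the first-day price
account_value = prices @ shares    # portfolio value on each day

print(shares)
print(account_value)
```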


In [82]:
# Assuming MVO_result has datetime index and 'account_value' column
df_daily_return = MVO_result.copy()

# Compute daily returns
df_daily_return["daily_return"] = df_daily_return["account_value"].pct_change()

# Reset index to make 'date' a column
df_daily_return = df_daily_return.reset_index().rename(columns={"index": "date"})

# Replace NaN in first row with 0.0 using loc (best practice)
df_daily_return.loc[0, "daily_return"] = 0.0

# Keep only required columns
df_daily_return = df_daily_return[["date", "daily_return"]]

# Preview
df_daily_return.head()

df_daily_return.to_csv('df_daily_return_mvo.csv')
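
Since only daily returns are saved, it helps to remember that they can be compounded back into a value path. A small sketch with made-up account values (using `fillna` as a shorthand for setting the first return to zero):

```python
import pandas as pd

# Toy account values (hypothetical) indexed by date.
account = pd.Series(
    [1_000_000.0, 987_500.0, 1_090_000.0],
    index=pd.to_datetime(["2023-04-04", "2023-04-05", "2023-04-06"]),
    name="account_value",
)

# Daily simple returns; the first day has no previous value, so it is set to 0.
daily_return = account.pct_change().fillna(0.0)

# Compounding the returns recovers the value path from the starting capital.
rebuilt = account.iloc[0] * (1.0 + daily_return).cumprod()
print(daily_return)
print(rebuilt)
```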


In [98]:
from pypfopt.efficient_frontier import EfficientFrontier
import pandas as pd
import numpy as np

def compute_rolling_mvo_rebalance_63(trade_df, train_df, window_size=63, train_window=126):
    trade_df = trade_df.reset_index(drop=True)
    stock_dimension = len(trade_df.tic.unique())
    unique_dates = trade_df.date.unique()
    total_windows = len(unique_dates) // window_size

    # Initialize tracking
    portfolio_values = pd.DataFrame(columns=["account_value"])
    portfolio_dates = []
    weights_log = []
    initial_fund = 1_000_000
    buy_cost_pct = 0.000
    sell_cost_pct = 0.000

    # Initial train data
    recent_dates = train_df['date'].drop_duplicates().sort_values().tail(train_window)
    train_df_window = train_df[train_df['date'].isin(recent_dates)].copy()

    for w in range(total_windows):
        print(f"\nRebalancing window {w+1}/{total_windows}...")

        start_idx = w * window_size
        end_idx = (w + 1) * window_size
        window_dates = unique_dates[start_idx:end_idx]
        trade_df_window = trade_df[trade_df['date'].isin(window_dates)].copy()

        # Step 1: Preprocess
        train_mvo = process_df_for_mvo(train_df_window)
        trade_mvo = process_df_for_mvo(trade_df_window)
        if train_mvo.empty or len(train_mvo) < 2:
            print(f"[Warning] Skipping window {w+1}/{total_windows}: insufficient train data.")
            continue

        arStockPrices = np.asarray(train_mvo)
        Rows, Cols = arStockPrices.shape
        arReturns = StockReturnsComputing(arStockPrices, Rows, Cols)

        # Convert to pandas with proper tickers
        meanReturns = pd.Series(np.mean(arReturns, axis=0), index=train_mvo.columns)
        covReturns = pd.DataFrame(np.cov(arReturns, rowvar=False), index=train_mvo.columns, columns=train_mvo.columns)

        # Step 2: Optimize
        ef_mean = EfficientFrontier(meanReturns, covReturns, weight_bounds=(0.01, 0.25))
        raw_weights_mean = ef_mean.max_sharpe()
        cleaned_weights_mean = ef_mean.clean_weights()
        weights_log.append(cleaned_weights_mean)

        mvo_weights = np.array([
            initial_fund * (1 - buy_cost_pct) * cleaned_weights_mean[key]
            for key in cleaned_weights_mean.keys()
        ])

        # Step 3: Compute portfolio value
        FirstTradePrice = np.array([1 / p for p in trade_mvo.head(1).to_numpy()[0]])
        Initial_Portfolio = np.multiply(mvo_weights, FirstTradePrice)
        Portfolio_Assets = trade_mvo @ Initial_Portfolio
        MVO_result = pd.DataFrame(Portfolio_Assets, columns=["account_value"])

        # Step 4: Collect results
        dates_in_window = trade_df_window['date'].drop_duplicates().sort_values().tolist()
        portfolio_dates.extend(dates_in_window)
        portfolio_values = pd.concat([portfolio_values, MVO_result], ignore_index=True)

        # Step 5: Update training set and capital
        train_df_window = pd.concat([
            train_df_window.iloc[stock_dimension * window_size:],  # remove oldest
            trade_df_window
        ], ignore_index=True)

        final_value = MVO_result["account_value"].iloc[-1]
        initial_fund = final_value * (1 - sell_cost_pct)

    # Final formatting
    portfolio_values.index = pd.to_datetime(portfolio_dates)

    return portfolio_values, weights_log
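
Two details of the function above are easy to miss. The windows are non-overlapping slices of the trade dates, and because the number of windows comes from integer division, trailing dates that do not fill a whole window are never traded. In addition, each window starts with the capital the previous window ended with. A toy illustration in plain Python, with numbers chosen only for illustration:

```python
# Ten trading days, labelled 0..9, split into windows of three days.
dates = list(range(10))
window_size = 3

total_windows = len(dates) // window_size
windows = [dates[w * window_size:(w + 1) * window_size] for w in range(total_windows)]
leftover = dates[total_windows * window_size:]

print(windows)    # three full windows
print(leftover)   # dates that are never traded

# Capital carry-over: each window's growth factor applies to the previous final value.
growth_per_window = [1.10, 0.90, 1.20]   # hypothetical per-window growth factors
capital = 1_000_000.0
for growth in growth_per_window:
    capital = capital * growth
print(capital)
```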


In [99]:

rolling_result, weights_log = compute_rolling_mvo_rebalance_63(trade, train)
rolling_result.head()


Rebalancing window 1/8...

Rebalancing window 2/8...

Rebalancing window 3/8...


Rebalancing window 4/8...

Rebalancing window 5/8...

Rebalancing window 6/8...

Rebalancing window 7/8...

Rebalancing window 8/8...


| | account_value |
|---|---|
| 2023-04-04 | 1000000.0 |
| 2023-04-05 | 1000029.0 |
| 2023-04-06 | 998388.4 |
| 2023-04-10 | 1000381.0 |
| 2023-04-11 | 1004188.0 |


In [100]:
# Assuming rolling_result has a datetime index and an 'account_value' column
df_daily_return = rolling_result.copy()

# Compute daily returns
df_daily_return["daily_return"] = df_daily_return["account_value"].pct_change()

# Reset index to make 'date' a column
df_daily_return = df_daily_return.reset_index().rename(columns={"index": "date"})

# Replace NaN in first row with 0.0 using loc (best practice)
df_daily_return.loc[0, "daily_return"] = 0.0

# Keep only required columns
df_daily_return = df_daily_return[["date", "daily_return"]]

# Preview
df_daily_return.head()

| | date | daily_return |
|---|---|---|
| 0 | 2023-04-04 | 0.0 |
| 1 | 2023-04-05 | 2.9e-05 |
| 2 | 2023-04-06 | -0.00164 |
| 3 | 2023-04-10 | 0.001996 |
| 4 | 2023-04-11 | 0.003805 |


In [101]:
df_daily_return.to_csv('df_daily_return_adaptive_mvo.csv')