<a href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL-Meta/blob/master/tutorials/1-Introduction/FinRL_PortfolioAllocation_NeurIPS_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Todo
- Finish exp3 algorithm
    - need to calculate the reward without making a step, see if sep function in gym works
- Confirm turbulence?
- Backtest the trained agents
- Change the reward function
- Run the exp3 algorithm for the trained agents


# Deep Reinforcement Learning for Stock Trading from Scratch: Portfolio Allocation

Tutorials to use OpenAI DRL to perform portfolio allocation in one Jupyter Notebook | Presented at NeurIPS 2020: Deep RL Workshop

* This blog is based on our paper: FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance, presented at NeurIPS 2020: Deep RL Workshop.
* Check out the Medium blog for detailed explanations: https://towardsdatascience.com/finrl-for-quantitative-finance-tutorial-for-portfolio-allocation-9b417660c7cd
* Please report any issues to our GitHub: https://github.com/AI4Finance-Foundation/FinRL/issues
* **PyTorch Version**



# Content

* [1. Problem Definition](#0)
* [2. Getting Started - Load Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. Check Additional Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Environment](#4)
    * [5.1. Training & Trade Data Split](#4.1)
    * [5.2. User-defined Environment](#4.2)
    * [5.3. Initialize Environment](#4.3)
* [6. Implement DRL Algorithms](#5)
* [7. Backtesting Performance](#6)
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)
    * [7.3. Baseline Stats](#6.3)
    * [7.4. Compare to Stock Market Index](#6.4)

<a id='0'></a>
# Part 1. Problem Definition

The problem is to design an automated trading solution for portfolio allocation. We model the stock trading process as a Markov Decision Process (MDP) and then formulate our trading goal as a maximization problem.

The agent is trained using Deep Reinforcement Learning (DRL) algorithms, and the components of the reinforcement learning environment are:


* Action: The action space describes the allowed actions through which the agent interacts with the environment. Normally, a ∈ A represents the weight of a stock in the portfolio: a ∈ (-1,1). Assuming our stock pool includes N stocks, we can use a list [a<sub>1</sub>, a<sub>2</sub>, ... , a<sub>N</sub>] to determine the weight of each stock in the portfolio, where a<sub>i</sub> ∈ (-1,1) and a<sub>1</sub>+ a<sub>2</sub>+...+a<sub>N</sub>=1. For example, "The weight of AAPL in the portfolio is 10%." is [0.1 , ...]. In this notebook's environment, the raw actions lie in [0, 1] and are converted to weights with a softmax, so every weight is positive and the weights sum to 1.

* Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. It is the change in portfolio value when action a is taken at state s and the environment arrives at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v are the portfolio values at states s′ and s, respectively.

* State: The state space describes the observations that the agent receives from the environment. Just as a human trader needs to analyze various information before executing a trade, our trading agent observes many different features to learn better in an interactive environment. Here, the state is the covariance matrix of the stocks' daily returns stacked with a set of technical indicators.

* Environment: Dow 30 constituents


The data for the stocks used in this case study is obtained from the Yahoo Finance API. It contains Open-High-Low-Close prices and volume.
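As a worked illustration of the action and reward definitions above, the sketch below converts raw actions into weights with a softmax (as the environment in Part 5 does) and computes the one-step reward r = v′ − v. The function names `softmax` and `step_reward` and all prices and actions are made up for illustration; transaction costs are ignored, as they are in the environment's `step()`.

```python
import numpy as np

def softmax(actions: np.ndarray) -> np.ndarray:
    """Map raw actions to positive portfolio weights that sum to 1."""
    z = np.exp(actions - np.max(actions))  # subtracting the max avoids overflow
    return z / z.sum()

def step_reward(v: float, actions: np.ndarray,
                close_today: np.ndarray, close_next: np.ndarray) -> tuple:
    """Return (v_next, reward) for one rebalancing step.

    The portfolio return is the weighted sum of the individual stock returns;
    the reward is the change in portfolio value, r = v' - v.
    """
    weights = softmax(actions)
    portfolio_return = float(np.sum((close_next / close_today - 1.0) * weights))
    v_next = v * (1.0 + portfolio_return)
    return v_next, v_next - v

# Illustrative numbers: three stocks, equal raw actions -> equal weights of 1/3.
v_next, reward = step_reward(
    v=1_000_000.0,
    actions=np.array([0.5, 0.5, 0.5]),
    close_today=np.array([100.0, 50.0, 20.0]),
    close_next=np.array([101.0, 49.0, 20.0]),
)
print(round(v_next, 2), round(reward, 2))  # 996666.67 -3333.33
```

Here the stock returns are +1%, −2% and 0%, so the equally weighted portfolio loses one third of a percent and the reward is about −$3,333.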


<a id='1'></a>
# Part 2. Getting Started - Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages through FinRL library


In [1]:
## install finrl library
%pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git

Note: you may need to restart the kernel to use updated packages.



<a id='1.2'></a>
## 2.2. Check if the additional packages needed are present; if not, install them.
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI gym
* stable-baselines
* tensorflow
* pyfolio
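One way to do this check without importing every package is `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The sketch below is a hypothetical helper: the import-name to pip-name mapping is an assumption, and it checks `stable_baselines3` (which the imports below actually use) rather than the older `stable-baselines`, and omits `tensorflow` since this is the PyTorch version.

```python
import importlib.util

# Assumed mapping: import name -> pip package name.
REQUIRED = {
    "yfinance": "yfinance",              # Yahoo Finance API
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "stockstats": "stockstats",
    "gym": "gym",
    "stable_baselines3": "stable-baselines3",
    "pyfolio": "pyfolio",
}

def missing_packages(required: dict) -> list:
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All packages found.")
```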

<a id='1.3'></a>
## 2.3. Import Packages

In [1]:
import sys
# add the local library path before importing finrl so that it takes precedence
sys.path.append("../FinRL-Library")

import datetime

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline

from finrl import config
from finrl import config_tickers
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.meta.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline, convert_daily_return_to_pyfolio_ts
from finrl.meta.data_processor import DataProcessor
from finrl.meta.data_processors.processor_yahoofinance import YahooFinanceProcessor


<a id='1.4'></a>
## 2.4. Create Folders

In [2]:
import os
for folder in [config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
               config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR]:
    # exist_ok=True makes this a no-op when the folder already exists
    os.makedirs("./" + folder, exist_ok=True)

<a id='2'></a>
# Part 3. Download Data
Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free.
* FinRL provides a **YahooDownloader** class to fetch data from the Yahoo Finance API; this notebook uses the related **YahooFinanceProcessor** class, imported above.
* Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).


In [3]:
print(config_tickers.DOW_30_TICKER)

['AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PG', 'TRV', 'UNH', 'CRM', 'VZ', 'V', 'WBA', 'WMT', 'DIS', 'DOW']


In [4]:
dp = YahooFinanceProcessor()
df = dp.download_data(start_date = '2008-01-01',
                     end_date = '2021-10-31',
                     ticker_list = config_tickers.DOW_30_TICKER, time_interval='1D')

[*********************100%***********************]  1 of 1 completed

In [5]:
df.head()

Unnamed: 0,date,open,high,low,close,adjcp,volume,tic,day
0,2008-01-02,7.116786,7.152143,6.876786,6.958571,5.931609,1079178800,AAPL,2
1,2008-01-02,46.599998,47.040001,46.259998,46.599998,34.931583,7934400,AMGN,2
2,2008-01-02,52.09,52.32,50.790001,51.040001,40.173504,8053700,AXP,2
3,2008-01-02,87.57,87.839996,86.0,86.620003,63.481617,4303000,BA,2
4,2008-01-02,72.559998,72.669998,70.050003,70.629997,46.539101,6337800,CAT,2


In [6]:
df.shape

(101615, 9)

# Part 4: Preprocess Data
Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and do feature engineering to convert the data into a model-ready state.
* Add technical indicators. In practical trading, various information needs to be taken into account, for example historical stock prices, current holdings, technical indicators, etc. Here we use FinRL's default indicator list: MACD, Bollinger Bands (upper and lower), RSI, CCI, DX, and the 30- and 60-day simple moving averages, as shown in the output below.
* Add turbulence index. Risk aversion reflects whether an investor prefers to preserve capital, and it influences one's trading strategy under different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL can compute a financial turbulence index that measures extreme asset price fluctuation. It is disabled in this run (`use_turbulence=False` below).
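For intuition, the sketch below computes two of these features by hand. MACD is taken as the 12-day EMA minus the 26-day EMA of the close; the turbulence index is taken as the squared Mahalanobis distance of today's returns from their historical mean, (y<sub>t</sub> − μ) Σ<sup>−1</sup> (y<sub>t</sub> − μ)<sup>T</sup>. These are common textbook definitions, not FinRL's or stockstats' code, which may use different EMA settings (for example `adjust`) or windows, so values will not necessarily match the columns above.

```python
import numpy as np
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """MACD line: fast EMA minus slow EMA of the closing price."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow

def turbulence(hist_returns: np.ndarray, today_returns: np.ndarray) -> float:
    """Turbulence as (y_t - mu) Sigma^{-1} (y_t - mu)^T.

    hist_returns: (T, N) array of past daily returns used to estimate mu and Sigma.
    today_returns: (N,) array of the current day's returns.
    """
    mu = hist_returns.mean(axis=0)
    sigma = np.cov(hist_returns, rowvar=False)
    diff = today_returns - mu
    # pinv instead of inv in case Sigma is singular (e.g. more stocks than days)
    return float(diff @ np.linalg.pinv(sigma) @ diff)

rng = np.random.default_rng(0)
hist = rng.normal(0.0, 0.01, size=(252, 5))   # one year of toy returns, 5 stocks
print(turbulence(hist, hist.mean(axis=0)))    # ~0: today equals the historical mean
print(turbulence(hist, np.full(5, 0.05)))     # large: roughly a 5-sigma move in every stock
```

A calm day sits near the historical mean and scores close to zero, while a day where many stocks move far outside their usual joint range scores high; a threshold on this value is what `turbulence_threshold` is meant to act on.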

In [7]:
fe = FeatureEngineer(
                    use_technical_indicator=True,
                    use_turbulence=False,
                    user_defined_feature = False)

df = fe.preprocess_data(df)

Successfully added technical indicators


In [8]:
df.shape

(97524, 17)

In [9]:
df.head()

Unnamed: 0,date,open,high,low,close,adjcp,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2008-01-02,7.116786,7.152143,6.876786,6.958571,5.931609,1079178800,AAPL,2,0.0,6.964725,6.955632,100.0,-66.666667,100.0,6.958571,6.958571
3483,2008-01-02,46.599998,47.040001,46.259998,46.599998,34.931583,7934400,AMGN,2,0.0,6.964725,6.955632,100.0,-66.666667,100.0,46.599998,46.599998
6966,2008-01-02,52.09,52.32,50.790001,51.040001,40.173504,8053700,AXP,2,0.0,6.964725,6.955632,100.0,-66.666667,100.0,51.040001,51.040001
10449,2008-01-02,87.57,87.839996,86.0,86.620003,63.481617,4303000,BA,2,0.0,6.964725,6.955632,100.0,-66.666667,100.0,86.620003,86.620003
13932,2008-01-02,72.559998,72.669998,70.050003,70.629997,46.539101,6337800,CAT,2,0.0,6.964725,6.955632,100.0,-66.666667,100.0,70.629997,70.629997


## Add covariance matrix as states

In [10]:
# add covariance matrix as states
df = df.sort_values(['date', 'tic'], ignore_index=True)
df.index = df.date.factorize()[0]

cov_list = []
return_list = []

# look back is one year (252 trading days)
lookback = 252
for i in range(lookback, len(df.index.unique())):
    # loc slicing is inclusive: days i-lookback..i give `lookback` daily returns
    data_lookback = df.loc[i-lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    return_list.append(return_lookback)

    covs = return_lookback.cov().values
    cov_list.append(covs)

df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list, 'return_list': return_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date', 'tic']).reset_index(drop=True)

In [11]:
df.shape

(90468, 19)

In [12]:
df.head()

Unnamed: 0,date,open,high,low,close,adjcp,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-12-31,3.070357,3.133571,3.047857,3.048214,2.598352,607541200,AAPL,2,-0.097446,3.649552,2.895305,42.254771,-80.847207,16.129793,3.243631,3.375887,"[[0.0013489689861716533, 0.0004284126428082587...",tic AAPL AMGN AXP ...
1,2008-12-31,57.110001,58.220001,57.060001,57.75,43.289665,6287200,AMGN,2,0.216368,58.947401,56.388599,51.060614,51.895357,10.432018,56.671334,56.044333,"[[0.0013489689861716533, 0.0004284126428082587...",tic AAPL AMGN AXP ...
2,2008-12-31,17.969999,18.75,17.91,18.549999,14.796396,9625600,AXP,2,-1.191668,23.723023,16.106977,42.52117,-74.811722,25.776759,20.03,22.412,"[[0.0013489689861716533, 0.0004284126428082587...",tic AAPL AMGN AXP ...
3,2008-12-31,41.59,43.049999,41.5,42.669998,32.005878,5443100,BA,2,-0.391219,42.894634,38.486366,47.290375,157.922391,5.366299,40.432,43.3045,"[[0.0013489689861716533, 0.0004284126428082587...",tic AAPL AMGN AXP ...
4,2008-12-31,43.700001,45.099998,43.700001,44.669998,30.214798,6277400,CAT,2,0.979845,45.785565,38.404435,51.073052,98.904653,26.331746,40.266,39.918333,"[[0.0013489689861716533, 0.0004284126428082587...",tic AAPL AMGN AXP ...
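The covariance cell above can be checked on toy data. The sketch below is a hypothetical helper (not part of FinRL) that applies the same windowing to a wide price table: because `df.loc[i-lookback:i]` includes both endpoints, each window holds `lookback + 1` prices and therefore `lookback` daily returns. This also explains why the data now starts on 2008-12-31: the first 252 trading days are consumed by the first window.

```python
import numpy as np
import pandas as pd

def rolling_return_covs(prices: pd.DataFrame, lookback: int) -> dict:
    """Map each date (from position `lookback` on) to the covariance matrix of
    daily returns over the window ending at that date.

    prices: wide table, index = dates, columns = tickers, values = close prices.
    Each window holds lookback + 1 prices, i.e. `lookback` daily returns.
    """
    covs = {}
    for i in range(lookback, len(prices)):
        window = prices.iloc[i - lookback : i + 1]
        returns = window.pct_change().dropna()
        covs[prices.index[i]] = returns.cov().values
    return covs

# Toy data: 20 business days of random-walk prices for three made-up tickers.
rng = np.random.default_rng(42)
dates = pd.date_range("2021-01-01", periods=20, freq="B")
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(20, 3)), axis=0)),
    index=dates, columns=["AAA", "BBB", "CCC"],
)
covs = rolling_return_covs(prices, lookback=5)
print(len(covs), next(iter(covs.values())).shape)  # 15 (3, 3)
```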


<a id='4'></a>
# Part 5. Design Environment
Considering the stochastic and interactive nature of automated stock trading, a financial task is modeled as a **Markov Decision Process (MDP)** problem. The training process involves observing stock price changes, taking an action, and calculating the reward so that the agent can adjust its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.

Our trading environments, based on OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.


## Training data split: 2009-01-01 to 2020-07-01

In [13]:
train = data_split(df, '2009-01-01','2020-07-01')
#trade = data_split(df, '2020-01-01', config.END_DATE)

In [14]:
train.head()

Unnamed: 0,date,open,high,low,close,adjcp,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2009-01-02,3.067143,3.251429,3.041429,3.241071,2.762747,746015200,AAPL,4,-0.082758,3.6336,2.892864,45.440193,-30.508777,2.140064,3.244631,3.376833,"[[0.001366150662406762, 0.00043393819572559104...",tic AAPL AMGN AXP ...
0,2009-01-02,58.59,59.080002,57.75,58.990002,44.219189,6547900,AMGN,4,0.320448,59.14836,56.33964,52.756859,94.54963,0.814217,56.759667,56.166,"[[0.001366150662406762, 0.00043393819572559104...",tic AAPL AMGN AXP ...
0,2009-01-02,18.57,19.52,18.4,19.33,15.418563,10955700,AXP,4,-1.059847,23.489423,16.086577,43.923322,-42.018825,16.335101,20.028333,22.263333,"[[0.001366150662406762, 0.00043393819572559104...",tic AAPL AMGN AXP ...
0,2009-01-02,42.799999,45.560001,42.779999,45.25,33.941105,7010200,BA,4,-0.019566,43.926849,37.932151,50.66469,275.696308,20.494464,40.621667,43.237334,"[[0.001366150662406762, 0.00043393819572559104...",tic AAPL AMGN AXP ...
0,2009-01-02,44.91,46.98,44.709999,46.91,31.729921,7117200,CAT,4,1.248426,46.543072,38.372928,53.534743,131.675975,34.637448,40.623333,39.911333,"[[0.001366150662406762, 0.00043393819572559104...",tic AAPL AMGN AXP ...
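Note that every row in `train.head()` has index 0. To our understanding, `data_split` keeps rows with `start <= date < end`, sorts by date and ticker, and re-indexes the rows by trading-day number, so `df.loc[day, :]` in the environment returns all stocks for one day. The sketch below is a hypothetical re-implementation of that behaviour on toy data, not FinRL's own code.

```python
import pandas as pd

def split_by_date(df: pd.DataFrame, start: str, end: str,
                  date_col: str = "date") -> pd.DataFrame:
    """Keep rows with start <= date < end and index them by trading-day number."""
    out = df[(df[date_col] >= start) & (df[date_col] < end)]
    out = out.sort_values([date_col, "tic"], ignore_index=True)
    # factorize maps each distinct date to 0, 1, 2, ... in order of appearance
    out.index = out[date_col].factorize()[0]
    return out

toy = pd.DataFrame({
    "date": ["2020-06-30", "2020-06-30", "2020-07-01", "2020-07-01", "2020-07-02"],
    "tic": ["AAPL", "AMGN", "AMGN", "AAPL", "AAPL"],
    "close": [1.0, 2.0, 3.0, 4.0, 5.0],
})
part = split_by_date(toy, "2020-07-01", "2020-07-03")
print(part)  # two rows with index 0 (2020-07-01), one row with index 1 (2020-07-02)
```

ISO-formatted date strings compare correctly as strings, which is why the string bounds work here.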


## Environment for Portfolio Allocation


In [110]:
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from stable_baselines3.common.vec_env import DummyVecEnv

#NOTE: data is the 

class StockPortfolioEnv(gym.Env):
    """A single stock trading environment for OpenAI gym

    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date

    Methods
    -------
    _sell_stock()
        perform sell action based on the sign of the action
    _buy_stock()
        perform buy action based on the sign of the action
    step()
        at each step the agent will return actions, then 
        we will calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        use render to return other functions
    save_asset_memory()
        return account value at each time step
    save_action_memory()
        return actions/positions at each time step
        

    """
    metadata = {'render.modes': ['human']}

    def __init__(self, 
                df,
                stock_dim,
                hmax,
                initial_amount,
                transaction_cost_pct,
                reward_scaling,
                state_space,
                action_space,
                tech_indicator_list,
                turbulence_threshold=None,
                lookback=252,
                day = 0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback=lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct =transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low = 0, high = 1,shape = (self.action_space,)) 
        # Shape = (stock_dim + len(tech_indicator_list), stock_dim), i.e. (36, 28) for this data
        # covariance matrix + technical indicators
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape = (self.state_space+len(self.tech_indicator_list),self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day,:]
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.terminal = False     
        self.turbulence_threshold = turbulence_threshold        
        # initialize the portfolio value with the starting capital
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1/self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]]

        
    def step(self, actions):
        # print(self.day)
        self.terminal = self.day >= len(self.df.index.unique())-1
        # print(actions)

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(),'r')
            #plt.savefig('results/cumulative_reward.png')
            plt.close()
            
            plt.plot(self.portfolio_return_memory,'r')
            #plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))           
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() !=0:
              sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                       df_daily_return['daily_return'].std()
              print("Sharpe: ",sharpe)
            print("=================================")
            
            return self.state, self.reward, self.terminal,{}

        else:
            # actions are the portfolio weight
            # normalize to sum of 1
            #if (np.array(actions) - np.array(actions).min()).sum() != 0:
            #  norm_actions = (np.array(actions) - np.array(actions).min()) / (np.array(actions) - np.array(actions).min()).sum()
            #else:
            #  norm_actions = actions
            weights = self.softmax_normalization(actions) 
            #print("Normalized actions: ", weights)
            self.actions_memory.append(weights)
            last_day_memory = self.data

            #load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
            #print(self.state)
            # calculate portfolio return
            # individual stocks' return * weight
            
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            # the reward is the change in the portfolio value
            self.reward = self.portfolio_value*portfolio_return
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])            
            self.asset_memory.append(new_portfolio_value)

            
            #print("Step reward: ", self.reward)
            #self.reward = self.reward*self.reward_scaling

        return self.state, self.reward, self.terminal, {}
    
    

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        #self.trades = 0
        self.terminal = False 
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1/self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]] 
        return self.state
    
    def render(self, mode='human'):
        return self.state
        
    def softmax_normalization(self, actions):
        numerator = np.exp(actions)
        denominator = np.sum(np.exp(actions))
        softmax_output = numerator/denominator
        return softmax_output

    
    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']
        
        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs
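To make the observation layout concrete, the sketch below stacks a stand-in covariance matrix with one row per technical indicator, using the same `np.append(..., axis=0)` call as the environment. The random numbers are placeholders; the indicator names are the columns added in Part 4. With 28 stocks and 8 indicators the state is a 36 × 28 matrix, which is also the per-observation shape in the TD3 memory error further down.

```python
import numpy as np

n_stocks = 28  # stock_dimension in this notebook
indicators = ["macd", "boll_ub", "boll_lb", "rsi_30",
              "cci_30", "dx_30", "close_30_sma", "close_60_sma"]

rng = np.random.default_rng(0)
cov = np.cov(rng.normal(size=(252, n_stocks)), rowvar=False)          # (28, 28) placeholder covariance
tech_rows = [rng.normal(size=n_stocks).tolist() for _ in indicators]  # one row per indicator

# Same stacking as the environment: covariance rows first, then indicator rows.
state = np.append(cov, tech_rows, axis=0)
print(state.shape)  # (36, 28)
```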

In [16]:
stock_dimension = len(train.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")


Stock Dimension: 28, State Space: 28


In [17]:
env_kwargs = {
    "hmax": 100, 
    "initial_amount": 1000000, 
    "transaction_cost_pct": 0.001, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": config.INDICATORS, 
    "action_space": stock_dimension, 
    "reward_scaling": 1e-4
    
}

e_train_gym = StockPortfolioEnv(df = train, **env_kwargs)

In [18]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>


<a id='5'></a>
# Part 6: Implement DRL Algorithms
* The DRL algorithms are implemented with **Stable Baselines3**, a PyTorch rewrite of Stable Baselines, which in turn is a fork of **OpenAI Baselines** with a major structural refactoring and code cleanups.
* The FinRL library includes fine-tuned standard DRL algorithms, such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3. We also allow users to design their own DRL algorithms by adapting these DRL algorithms.

In [19]:
# initialize
agent = DRLAgent(env = env_train)

In [20]:
import os
os.chdir("C:\\Users\\justi\\Desktop\\Everything\\Code\\rl-trade")
#os.listdir()

In [21]:
os.getcwd()

'C:\\Users\\justi\\Desktop\\Everything\\Code\\rl-trade'

### Model 1: **A2C**


In [22]:
agent = DRLAgent(env = env_train)

A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c",model_kwargs = A2C_PARAMS)

{'n_steps': 5, 'ent_coef': 0.005, 'learning_rate': 0.0002}
Using cpu device


In [24]:
'''trained_a2c = agent.train_model(model=model_a2c, 
                                tb_log_name='a2c',
                                total_timesteps=500)'''

saved_a2c = model_a2c.load('C:\\Users\\justi\\Desktop\\Everything\\Code\\ensemble-trader\\trained_models\\trained_a2c.zip')

In [None]:
#trained_a2c.save('trained_models/trained_a2c.zip')

### Model 2: **PPO**


In [26]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)

{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.0001, 'batch_size': 128}
Using cpu device


In [27]:
'''trained_ppo = agent.train_model(model=model_ppo, 
                             tb_log_name='ppo',
                             total_timesteps=500)'''

saved_ppo = model_ppo.load('C:\\Users\\justi\\Desktop\\Everything\\Code\\ensemble-trader\\trained_models\\trained_ppo.zip')

### Model 3: **DDPG**


In [28]:
agent = DRLAgent(env = env_train)
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50000, "learning_rate": 0.001}


model_ddpg = agent.get_model("ddpg",model_kwargs = DDPG_PARAMS)

{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device


In [29]:
'''trained_ddpg = agent.train_model(model=model_ddpg, 
                             tb_log_name='ddpg',
                             total_timesteps=5)'''

saved_ddpg = model_ddpg.load('C:\\Users\\justi\\Desktop\\Everything\\Code\\ensemble-trader\\trained_models\\trained_ddpg.zip')

In [None]:
#trained_ddpg.save('trained_models/trained_ddpg.zip')

### Model 4: **SAC**


In [30]:
agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0003,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}
model_sac = agent.get_model("sac",model_kwargs = SAC_PARAMS)

{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0003, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device


In [31]:
'''trained_sac = agent.train_model(model=model_sac, 
                             tb_log_name='sac',
                             total_timesteps=5)'''
saved_sac = model_sac.load('C:\\Users\\justi\\Desktop\\Everything\\Code\\ensemble-trader\\trained_models\\trained_sac.zip')

In [37]:
#trained_sac.save('trained_models/trained_sac.zip')

### Model 5: **TD3** (NOT ENOUGH MEMORY)


In [42]:
agent = DRLAgent(env = env_train)
TD3_PARAMS = {"batch_size": 100, 
              "buffer_size": 1000000, 
              "learning_rate": 0.001}

model_td3 = agent.get_model("td3",model_kwargs = TD3_PARAMS)

{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device




In [44]:
'''trained_td3 = agent.train_model(model=model_td3, 
                             tb_log_name='td3',
                             total_timesteps=5)'''

saved_td3 = model_td3.load('trained_td3.zip')

MemoryError: Unable to allocate 3.76 GiB for an array with shape (1000000, 1, 36, 28) and data type float32
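The size in the error follows directly from the replay buffer: 1,000,000 observations × 36 × 28 values × 4 bytes (float32) ≈ 3.76 GiB. The hypothetical helper below reproduces that arithmetic for a few buffer sizes. To our understanding, Stable Baselines3's replay buffer also keeps a separate array of next observations unless `optimize_memory_usage` is enabled, so the real footprint is roughly double this figure.

```python
import numpy as np

def buffer_gib(buffer_size: int, obs_shape: tuple, dtype=np.float32) -> float:
    """Memory (GiB) of one array holding `buffer_size` observations of `obs_shape`."""
    n_bytes = buffer_size * int(np.prod(obs_shape)) * np.dtype(dtype).itemsize
    return n_bytes / 2**30

obs_shape = (1, 36, 28)  # (n_envs, stock_dim + n_indicators, stock_dim), from the error message
for size in (1_000_000, 100_000, 50_000):
    print(f"buffer_size={size:>9,}: {buffer_gib(size, obs_shape):.2f} GiB")
```

A buffer of the size used for SAC (100,000) or DDPG (50,000) above needs about a tenth or a twentieth of this, which suggests lowering TD3's `buffer_size` as a possible workaround.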

In [40]:
#trained_td3.save('trained_models/trained_td3.zip')

## Trading
Assume that we have $1,000,000 of initial capital on 2020-07-01. We use the A2C model to trade the Dow Jones 30 stocks in our dataset (28 after preprocessing).

In [91]:
trade = data_split(df,'2020-07-01', '2021-10-31')
e_trade_gym = StockPortfolioEnv(df = trade, **env_kwargs)


In [92]:
trade.shape

(9436, 19)

In [62]:
df_daily_return, df_actions = DRLAgent.DRL_prediction(model=saved_a2c,
                        environment = e_trade_gym)

begin_total_asset:1000000
end_total_asset:1364541.7476838455
Sharpe:  1.7225417284270286
hit end!


In [45]:
#df_daily_return.head()

Unnamed: 0,date,daily_return
0,2020-07-01,0.0
1,2020-07-02,0.003459
2,2020-07-06,0.016537
3,2020-07-07,-0.012888
4,2020-07-08,0.005256


In [61]:
#df_daily_return.to_csv('df_daily_return.csv')

In [46]:
#df_actions.head()

date,AAPL,AMGN,AXP,BA,CAT,CRM,CSCO,CVX,DIS,GS,...,MMM,MRK,MSFT,NKE,PG,TRV,UNH,VZ,WBA,WMT
2020-07-01,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,...,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714
2020-07-02,0.018737,0.018737,0.018737,0.050933,0.050933,0.050933,0.018737,0.050933,0.050933,0.050933,...,0.018737,0.050933,0.018737,0.018737,0.018737,0.050933,0.018737,0.050933,0.018737,0.050933
2020-07-06,0.018737,0.018737,0.018737,0.050933,0.050933,0.050933,0.018737,0.050933,0.050933,0.050933,...,0.018737,0.050933,0.018737,0.018737,0.018737,0.050933,0.018737,0.050933,0.018737,0.050933
2020-07-07,0.018737,0.018737,0.018737,0.050933,0.050933,0.050933,0.018737,0.050933,0.050933,0.050933,...,0.018737,0.050933,0.018737,0.018737,0.018737,0.050933,0.018737,0.050933,0.018737,0.050933
2020-07-08,0.018737,0.018737,0.018737,0.050933,0.050933,0.050933,0.018737,0.050933,0.050933,0.050933,...,0.018737,0.050933,0.018737,0.018737,0.018737,0.050933,0.018737,0.050933,0.018737,0.050933


In [63]:
df_actions.to_csv('df_actions.csv')

### MW and weighted average
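The cells below combine the trained agents with a multiplicative-weights (MW) scheme: each agent's proposed action is scored by the reward it would have earned, the reward is squashed to [-1, 1] with tanh, and the agents' weights are updated multiplicatively. The notebook's own update rule is cut off, so the sketch below assumes the standard form w<sub>j</sub> ← w<sub>j</sub>(1 + ε r<sub>j</sub>) followed by renormalization, plus a weighted average of the agents' raw actions; `mw_update` and `blend_actions` are hypothetical names.

```python
import numpy as np

def mw_update(agent_weights: np.ndarray, rewards, epsilon: float,
              scale: float = 20_000.0) -> np.ndarray:
    """One multiplicative-weights step.

    rewards are dollar changes in portfolio value; tanh(reward / scale) maps
    them to (-1, 1), so with 0 < epsilon < 1 every factor (1 + epsilon * r)
    stays positive and no agent weight can become negative.
    """
    squashed = np.tanh(np.asarray(rewards, dtype=float) / scale)
    new = agent_weights * (1.0 + epsilon * squashed)
    return new / new.sum()

def blend_actions(agent_weights: np.ndarray, actions: list) -> np.ndarray:
    """Weighted average of the agents' raw actions."""
    return np.tensordot(agent_weights, np.stack(actions), axes=1)

w = np.repeat(1 / 3, 3)  # equal initial weights, as in mult_weights
w = mw_update(w, rewards=[5_000, -2_000, 0], epsilon=0.1)
print(w)  # agent 0 gains weight, agent 1 loses weight
acts = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
print(blend_actions(w, acts))
```

The scale of 20,000 mirrors the divisor in `mult_weights`; as its comment notes, this squashing still needs tuning.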


In [201]:
def DRL_pred(model, environment, deterministic=True):
        """make a prediction"""
        test_env, test_obs = environment.get_sb_env()
        account_memory = []
        actions_memory = []
        #         state_memory=[] #add memory pool to store states
        test_env.reset()
        for i in range(len(environment.df.index.unique())):
            action, _states = model.predict(test_obs, deterministic=deterministic)
            # account_memory = test_env.env_method(method_name="save_asset_memory")
            # actions_memory = test_env.env_method(method_name="save_action_memory")
            print("action", action)
            test_obs, rewards, dones, info = test_env.step(action)
            if i == (len(environment.df.index.unique()) - 2):
                account_memory = test_env.env_method(method_name="save_asset_memory")
                actions_memory = test_env.env_method(method_name="save_action_memory")
            #                 state_memory=test_env.env_method(method_name="save_state_memory") # add current state to state memory
            if dones[0]:
                print("hit end!")
                break
        return account_memory[0], actions_memory[0]

In [107]:

#Softmax normalize actions to get portfolio weighting
def softmax_normalization(actions):
    numerator = np.exp(actions)
    denominator = np.sum(np.exp(actions))
    softmax_output = numerator/denominator
    return softmax_output 

def get_reward(day, actions, df, last_day_memory, cur_portfolio_val):
    '''Get the change in portfolio value due to action; if on last day, return None'''
    terminal = day >= len(df.index.unique())-1
    if terminal:
        return None
    else:
        #actions are numbers from 0-1 for each stock; they are not a weight distribution until normalized
        weights = softmax_normalization(actions) 
        #load next state
        data = df.loc[day + 1,:]
        price_changes = ((data.close.values / last_day_memory.close.values)-1)
        portfolio_return = sum(np.multiply(np.ravel(price_changes), np.ravel(weights)))
        reward = cur_portfolio_val * portfolio_return 
        return reward
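To make `get_reward` concrete, here is a minimal, self-contained sketch of the same one-day reward on made-up prices (the helpers `softmax` and `one_step_reward` are hypothetical, not part of FinRL). Like the cleaned-up `softmax_normalization`, it subtracts the max before exponentiating, which avoids overflow without changing the output.

```python
import numpy as np


def softmax(x):
    # Subtracting the max before exponentiating avoids overflow; the result is unchanged.
    z = np.exp(x - np.max(x))
    return z / z.sum()


def one_step_reward(actions, prices_today, prices_tomorrow, portfolio_value):
    """Dollar change in portfolio value over one day, following the same steps as get_reward."""
    weights = softmax(np.ravel(actions))                   # raw actions -> weights summing to 1
    price_changes = prices_tomorrow / prices_today - 1.0   # simple per-asset daily returns
    portfolio_return = float(np.dot(weights, price_changes))
    return portfolio_value * portfolio_return


# Toy example (hypothetical prices): equal raw actions give a 50/50 split,
# asset 0 gains 10% and asset 1 is flat, so the portfolio gains about 5%.
reward = one_step_reward(np.array([[0.5, 0.5]]),
                         np.array([100.0, 50.0]),
                         np.array([110.0, 50.0]),
                         1_000_000)
print(round(reward, 2))
```

The action array has shape `(1, n_stocks)`, as `model.predict` returns for a vectorized env, which is why it is flattened first.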


def mult_weights(models, environment, epsilon, deterministic=True):
    """Combine several trained agents with the multiplicative-weights (MW) update.

    Each day every agent proposes an action and its one-day reward is computed with
    get_reward (without stepping the env). The agent weights are updated with MW, and
    the env is then stepped once with the weighted-average action.
    """
    test_env, test_obs = environment.get_sb_env()
    n_days = len(environment.df.index.unique())
    agent_weights = np.repeat(1 / len(models), len(models))  # start with equal weights

    account_memory = []
    actions_memory = []
    test_env.reset()
    for i in range(n_days):
        agent_action_rewards = []  # (reward, action) for each agent
        # get each agent's action and the reward it would have earned
        for model in models:
            action, _states = model.predict(test_obs, deterministic=deterministic)
            # terminal day: step once to finalize the episode, then return
            if environment.day >= n_days - 1:
                test_env.step(action)
                return account_memory, actions_memory
            reward = get_reward(environment.day, action, environment.df,
                                environment.data, environment.portfolio_value)
            agent_action_rewards.append((reward, action))

        # MW update: tanh squashes each dollar reward into (-1, 1), so for epsilon < 1
        # the factor (1 + delta_weight) stays positive and no weight goes negative.
        # TODO: tune the 20000 scale used to squash rewards; a Gibbs distribution is an alternative
        for j, (reward, _) in enumerate(agent_action_rewards):
            delta_weight = epsilon * np.tanh(reward / 20000)
            agent_weights[j] *= (1 + delta_weight)
        agent_weights = agent_weights / agent_weights.sum()

        # weighted-average action under the updated weights
        weighted_actions = np.array([np.ravel(agent_weights[k] * agent_action)
                                     for k, (_, agent_action) in enumerate(agent_action_rewards)])
        weighted_avg_action = np.sum(weighted_actions, axis=0).reshape(1, -1)

        test_obs, rewards, dones, info = test_env.step(weighted_avg_action)

        # save the memories one step before the end: DummyVecEnv resets the env when it reports done
        if i == n_days - 2:
            account_memory = test_env.env_method(method_name="save_asset_memory")
            actions_memory = test_env.env_method(method_name="save_action_memory")
        if dones[0]:
            print("this shouldn't be triggered")
            break
    return account_memory, actions_memory
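A stripped-down sketch of the weight update inside `mult_weights`, run on synthetic rewards instead of the environment (the reward distribution, the helper `mw_update`, and the seed are assumptions for illustration). It uses the textbook MW learning rate sqrt(ln n / T) for n experts over T rounds, and the weights should drift toward the agent with the higher average reward.

```python
import numpy as np


def mw_update(weights, rewards, epsilon, scale):
    """One multiplicative-weights step as in mult_weights: squash each agent's dollar
    reward into (-1, 1) with tanh, then scale its weight by (1 + epsilon * squashed)."""
    squashed = np.tanh(np.asarray(rewards, dtype=float) / scale)
    new_weights = weights * (1.0 + epsilon * squashed)   # stays positive while epsilon < 1
    return new_weights / new_weights.sum()


n_agents, horizon = 4, 336
epsilon = np.sqrt(np.log(n_agents) / horizon)   # textbook rate sqrt(ln n / T)
weights = np.full(n_agents, 1.0 / n_agents)

rng = np.random.default_rng(0)
for _ in range(horizon):
    # Hypothetical daily dollar rewards: agent 3 earns more on average than the others.
    rewards = rng.normal(loc=[0, 0, 0, 2000], scale=5000)
    weights = mw_update(weights, rewards, epsilon, scale=20000)

print(weights.round(3))
```

Because `tanh` bounds the squashed reward, each step changes a weight by at most a factor of (1 ± epsilon); the 20000 scale decides how large a dollar reward must be before it saturates.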

In [108]:
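# epsilon = sqrt(ln(n) / T) with T = 336 trading days; four models are passed here, so ln(4) would match n
a,b = mult_weights([saved_a2c, saved_ppo, saved_ddpg, saved_sac], e_trade_gym, np.sqrt(np.log(5)/336), deterministic=True)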

begin_total_asset:1000000
end_total_asset:1371208.4641898726
Sharpe:  1.8176886617317896


### Test using the ensemble of different risk portfolios


In [96]:
agent = DRLAgent(env = env_train)
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c",model_kwargs = A2C_PARAMS)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50000, "learning_rate": 0.001}
model_ddpg = agent.get_model("ddpg",model_kwargs = DDPG_PARAMS)

model_dicts = {'a2c': model_a2c, 'ddpg': model_ddpg, 'ppo': model_ppo}

{'n_steps': 5, 'ent_coef': 0.005, 'learning_rate': 0.0002}
Using cpu device
{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.0001, 'batch_size': 128}
Using cpu device
{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device


In [97]:
risk_ensemble = []
model_types = ['a2c', 'ddpg', 'ppo']
MODEL_DIR = r'C:\Users\justi\Desktop\Everything\Code\ensemble-trader\trained_models\spec_models'
for model_type in model_types:
    type_dir = os.path.join(MODEL_DIR, model_type)
    # sort so the ensemble order (and therefore the label columns later) is reproducible
    for file in sorted(os.listdir(type_dir)):
        model = model_dicts[model_type].load(os.path.join(type_dir, file))
        risk_ensemble.append(model)

In [109]:
risk_acc_memory, risk_action_memory = mult_weights(risk_ensemble, e_trade_gym, np.sqrt(np.log(21)/336), deterministic=True)

begin_total_asset:1000000
end_total_asset:1387957.5094829695
Sharpe:  1.872485128220912


### Exponential Weights (Exp3, work in progress)

In [None]:
def exp_weights(models, environment, deterministic=True):
    """Work in progress (see Todo): an Exp3-style ensemble of trained agents.

    NOTE: stepping the env once per model advances it len(models) days per
    iteration, and agent_weights is not updated yet. get_reward can compute
    each agent's reward without stepping.
    """
    test_env, test_obs = environment.get_sb_env()
    n_days = len(environment.df.index.unique())
    agent_weights = np.repeat(1 / len(models), len(models))  # start with equal weights

    account_memory = []
    actions_memory = []
    test_env.reset()
    for i in range(n_days):
        agent_action_rewards = []  # (reward, action) for each agent
        for model in models:
            action, _states = model.predict(test_obs, deterministic=deterministic)
            # on the terminal day, save the memories first: DummyVecEnv resets the env on done
            if environment.day >= n_days - 1:
                account_memory = test_env.env_method(method_name="save_asset_memory")
                actions_memory = test_env.env_method(method_name="save_action_memory")
            test_obs, reward, dones, info = test_env.step(action)
            agent_action_rewards.append((reward, action))
            if dones[0]:
                print("hit end!")
                return account_memory[0], actions_memory[0]
    return account_memory[0], actions_memory[0]
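The Todo asks for the Exp3 algorithm. Below is a minimal sketch of standard Exp3 (Auer et al., 2002), not the notebook's implementation. Its appeal here is bandit feedback: each day only the chosen agent's reward is needed, which sidesteps stepping the environment once per agent. It assumes rewards in [0, 1], so dollar rewards would first need squashing, for example with a hypothetical mapping such as (tanh(r / scale) + 1) / 2. The agents below are simulated coin flips.

```python
import numpy as np


def exp3(reward_fn, n_arms, horizon, gamma, rng):
    """Exp3 with rewards in [0, 1].

    Bandit feedback: reward_fn(t, arm) is only called for the arm that was chosen,
    so only one agent's reward is needed per round.
    """
    log_w = np.zeros(n_arms)   # log-weights, kept in log space to avoid overflow
    total_reward = 0.0
    for t in range(horizon):
        w = np.exp(log_w - log_w.max())
        probs = (1 - gamma) * w / w.sum() + gamma / n_arms   # mix in uniform exploration
        arm = rng.choice(n_arms, p=probs)
        x = reward_fn(t, arm)                                 # must lie in [0, 1]
        total_reward += x
        x_hat = x / probs[arm]                                # unbiased importance-weighted estimate
        log_w[arm] += gamma * x_hat / n_arms
    w = np.exp(log_w - log_w.max())
    return w / w.sum(), total_reward


# Hypothetical agents whose daily "success" rates are 0.2, 0.3 and 0.8.
rng = np.random.default_rng(1)
success_rates = np.array([0.2, 0.3, 0.8])
final_weights, total = exp3(lambda t, arm: float(rng.random() < success_rates[arm]),
                            n_arms=3, horizon=3000, gamma=0.1, rng=rng)
print(final_weights.round(3), total)
```

The importance weighting (`x / probs[arm]`) is what lets Exp3 learn from one reward per round: in expectation it credits every arm with its true reward, even the ones that were not played.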

# Neural Network

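- input: the market state the environment already builds (the covariance matrix stacked with the technical indicators)
- output: one weight per ensemble model, trained against the softmax of that day's min-max-normalized rewards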

In [None]:
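# For reference, the environment builds its state like this (excerpt from the env class; `self` is not defined outside it):
# self.data = self.df.loc[self.day,:]
# self.covs = self.data['cov_list'].values[0]
# self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)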

In [161]:
# Return the state at each training day: the covariance matrix stacked on the technical indicators,
# giving an array of shape (stock_dim + n_indicators, stock_dim) per day
def return_states(train):
    num_days = len(train.index.unique())
    tech_indicator_list = config.INDICATORS
    states = []
    for i in range(0, num_days):
        data = train.loc[i,:]
        covs = data['cov_list'].values[0]
        state = np.append(np.array(covs), [data[tech].values.tolist() for tech in tech_indicator_list], axis=0)
        states.append(state)
    return np.array(states)
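A small shape check of the state built by `return_states`, using made-up sizes and values (hypothetical, not the notebook's data). The covariance matrix contributes `n_stocks` rows, each technical indicator adds one more row, and the later `reshape(len(states), -1)` flattens each day into a single feature vector for the network.

```python
import numpy as np

# Hypothetical sizes; the notebook uses its own tickers and config.INDICATORS.
n_stocks, n_indicators = 3, 2
cov = np.eye(n_stocks) * 0.01                     # stand-in for data['cov_list'].values[0]
indicator_rows = [np.arange(n_stocks, dtype=float) + k for k in range(n_indicators)]  # one row per indicator

# Same stacking as return_states: covariance rows first, then one row per indicator
state = np.append(np.array(cov), indicator_rows, axis=0)
print(state.shape)   # (n_stocks + n_indicators, n_stocks) -> (5, 3)

# X_data = states.reshape(len(states), -1) flattens each day's state into one feature vector
states = np.array([state, state])                 # two identical "days"
X = states.reshape(len(states), -1)
print(X.shape)       # (2, 15)
```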

## NOTE: the reward-to-weight normalization needs to change so that the agents' weights are more distinct

In [163]:
def return_reward_history(model, environment, deterministic=True):
    """Run one model through the environment and record its daily reward."""
    train_env, train_obs = environment.get_sb_env()
    train_env.reset()
    reward_hist = []
    for i in range(len(environment.df.index.unique())):
        action, _states = model.predict(train_obs, deterministic=deterministic)
        train_obs, rewards, dones, info = train_env.step(action)
        reward_hist.append(rewards[0])
        if dones[0]:
            print("hit end!")
            break
    return reward_hist

# Return, for each day, the softmax of the min-max-normalized rewards across the models
def return_softmax_rewards(models, environment, deterministic=True):
    # shape (n_models, n_days)
    models_reward_hists = np.array([return_reward_history(model, environment, deterministic)
                                    for model in models])
    weight_vecs = []
    for j in range(models_reward_hists.shape[1]):
        rewards = models_reward_hists[:, j]
        # min/max normalization; fall back to zeros (equal weights) if every model earned the same reward
        span = rewards.max() - rewards.min()
        norm_rewards = (rewards - rewards.min()) / span if span > 0 else np.zeros_like(rewards)
        weight_vec = softmax_normalization(norm_rewards)
        print(rewards, weight_vec)
        weight_vecs.append(weight_vec)
    return np.array(weight_vecs)
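The NOTE above has a simple explanation: after min-max normalization every reward lies in [0, 1], so the softmax can make the best agent's weight at most e ≈ 2.718 times the worst agent's. The printed weights bear this out (for example 0.08647496 / 0.03181236 ≈ 2.718). One option, sketched here as an assumption rather than the notebook's method, is to divide the normalized rewards by a temperature below 1 before the softmax; the `temperature` parameter and the `reward_weights` helper are hypothetical.

```python
import numpy as np


def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()


def reward_weights(rewards, temperature=1.0):
    """Min-max normalize one day's rewards across agents, then apply a softmax.

    temperature=1.0 matches return_softmax_rewards; smaller values (a hypothetical
    knob, not in the notebook) make the weights more distinct.
    """
    rewards = np.asarray(rewards, dtype=float)
    span = rewards.max() - rewards.min()
    if span == 0:                                   # all agents tied: equal weights
        return np.full(len(rewards), 1.0 / len(rewards))
    norm = (rewards - rewards.min()) / span         # lies in [0, 1]
    return softmax(norm / temperature)


# Four of the rewards printed above (rounded).
rewards = [-2897.70, -1372.27, -431.06, -4402.06]
w_default = reward_weights(rewards, temperature=1.0)
w_sharp = reward_weights(rewards, temperature=0.25)
print(w_default.max() / w_default.min())   # e ~ 2.718: the best/worst ratio can never exceed this
print(w_sharp.max() / w_sharp.min())       # e**4 ~ 54.6
```

In general the best/worst ratio is e^(1/temperature), so the temperature directly controls how distinct the labels are.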

In [168]:
states = return_states(train)
labels = return_softmax_rewards(risk_ensemble, e_train_gym)

begin_total_asset:1000000
end_total_asset:4419658.9897577595
Sharpe:  0.7837875404298456
hit end!
begin_total_asset:1000000
end_total_asset:4559924.658521381
Sharpe:  0.8120507584487765
hit end!
begin_total_asset:1000000
end_total_asset:5061323.239152415
Sharpe:  0.8436140387787318
hit end!
begin_total_asset:1000000
end_total_asset:5196304.11633377
Sharpe:  0.8661815937503313
hit end!
begin_total_asset:1000000
end_total_asset:4756137.448654243
Sharpe:  0.8380741467632457
hit end!
begin_total_asset:1000000
end_total_asset:4718262.078590579
Sharpe:  0.8281945715683527
hit end!
begin_total_asset:1000000
end_total_asset:4479571.420804884
Sharpe:  0.825632300211407
hit end!
begin_total_asset:1000000
end_total_asset:4486861.704987937
Sharpe:  0.8058859920785147
hit end!
begin_total_asset:1000000
end_total_asset:4313850.054718562
Sharpe:  0.800265565571301
hit end!
begin_total_asset:1000000
end_total_asset:4698743.471322863
Sharpe:  0.82230114536307
hit end!
begin_total_asset:1000000
end_tota

 -25872.436 -25815.078 -25825.854] [0.04474381 0.04079381 0.0269431  0.04777706 0.03482141 0.04822013
 0.07323893 0.04849742 0.05468946 0.04294607 0.05013028 0.05214961
 0.04062748 0.05713363 0.04834788 0.04815289 0.04826917 0.04821772
 0.0479089  0.04822545 0.04816582]
[-2897.6968  -1372.2656   -431.06314 -2440.8254   -797.8648   -884.95215
 -4402.0615  -3320.2312  -2445.863   -1562.9678   -675.1186  -3293.9443
 -1818.5643  -2280.1843  -2311.7317  -2330.0498  -2334.659   -2324.2542
 -2335.3633  -2338.1062  -2308.7258 ] [0.03826805 0.05619111 0.07122028 0.04293415 0.06493634 0.06352773
 0.02620048 0.03440529 0.04287972 0.05355637 0.06697491 0.0346338
 0.05021777 0.0447066  0.04435283 0.04414871 0.04409749 0.04421319
 0.04408967 0.04405922 0.04438642]
[14695.29  11991.805 15029.967 12833.912 13811.585 11503.054 10889.79
 13106.794 10520.877 13662.118 13898.328 12236.91  14993.757 12377.19
 12754.088 12737.615 12749.461 12760.429 12763.024 12774.91  12746.83 ] [0.06866405 0.03770009 0.07

[-13225.524   -7150.32   -10607.671   -8677.331   -8784.401   -6683.7905
  -9493.996  -10474.035   -5229.4014  -9681.766  -13503.266   -8675.085
 -11442.284   -4798.539   -7917.929   -7879.6963  -7926.4385  -7919.119
  -7954.011   -7938.8916  -7945.1475] [0.02767695 0.05561875 0.03738761 0.04666979 0.04609925 0.05868096
 0.04249041 0.03796602 0.06935205 0.04158367 0.02680781 0.04668183
 0.03396936 0.07287118 0.05092415 0.05114831 0.05087439 0.05091719
 0.0507135  0.05080166 0.05076517]
[3062.4192 1712.5748 3309.7283 2822.6045 3086.1492 1805.5282 3274.878
 2520.2632 3204.7803 3401.882  3317.9526 3175.5146 5027.987  3889.9895
 2691.291  2692.9958 2696.5603 2700.8347 2724.6243 2720.584  2697.2112] [0.04779863 0.03181236 0.05150046 0.04446328 0.04814198 0.03271689
 0.05096195 0.04058795 0.04989577 0.05295203 0.05162838 0.04945727
 0.08647496 0.06135092 0.04273665 0.04275863 0.04280462 0.04285984
 0.04316849 0.04311591 0.04281303]
[38027.492 34626.64  35414.45  42867.26  36343.117 34992.113

 19737.105 19814.531 19792.252 19772.758 19790.723 19779.674 19816.27 ] [0.05497196 0.05413173 0.07080612 0.06006784 0.04917411 0.07778347
 0.02891632 0.03258616 0.03544114 0.04708279 0.02861494 0.03693405
 0.05005648 0.04110295 0.04719681 0.04763718 0.04751005 0.04739908
 0.04750133 0.04743842 0.04764711]
[20964.705 21974.543 25338.627 23661.236 22498.807 20835.65  18595.781
 18631.322 16904.66  27231.111 19753.584 23147.398 25777.266 16843.033
 21139.611 21179.127 21201.818 21162.635 21240.723 21150.73  21150.467] [0.04405871 0.0485568  0.06712624 0.05711704 0.05107025 0.04351474
 0.03507472 0.03519493 0.0298053  0.08053999 0.03921015 0.05436053
 0.07002136 0.029629   0.04480682 0.04497759 0.04507594 0.04490624
 0.04524507 0.0448548  0.04485367]
[ -7844.208   -8387.968  -12326.144   -9932.329   -7721.6045 -10723.775
  -6381.7295  -6895.3853  -5697.0693  -9287.978   -8198.556   -6719.1514
  -7570.565   -7590.1245  -8107.1553  -8127.92    -8086.208   -8121.569
  -8125.846   -8133.6753 

 -32163.73  -32175.566 -32148.48 ] [0.0486314  0.03931808 0.02754948 0.04049769 0.07017197 0.05880206
 0.07205102 0.03907627 0.07174554 0.04657602 0.05056061 0.05385374
 0.02650609 0.05462738 0.04305776 0.04309687 0.04280552 0.04271124
 0.04278097 0.04271778 0.04286252]
[44013.574 39949.816 44317.613 39070.316 42532.52  41332.33  41451.566
 44091.617 35317.04  46180.348 43086.832 41471.855 46878.06  39389.164
 41953.293 42002.77  42067.33  42014.83  42002.87  42018.637 41958.25 ] [0.05524835 0.03887429 0.05672057 0.03602664 0.04860521 0.04381241
 0.04426662 0.05562256 0.02603936 0.06663693 0.05099244 0.04434437
 0.07078233 0.03703406 0.04623001 0.04642827 0.04668826 0.04647673
 0.04642868 0.04649204 0.04624983]
[-43908.72  -32427.67  -40884.023 -46746.914 -30672.48  -34655.66
 -30974.193 -38865.703 -30656.172 -41228.285 -30998.24  -42273.93
 -33908.9   -38507.97  -36168.312 -36292.812 -36201.605 -36300.598
 -36243.664 -36388.086 -36359.906] [0.02937119 0.05995134 0.03544534 0.02462168 

 -48240.516 -48134.85  -48147.93 ] [0.04362588 0.04711097 0.02781706 0.02606324 0.05456127 0.03633495
 0.05224631 0.04802874 0.07084724 0.03812947 0.05400341 0.04803926
 0.04861161 0.05710665 0.04978612 0.04976196 0.04943259 0.0496558
 0.04932494 0.0497848  0.04972766]
[-18285.691 -14662.903 -24062.977 -27569.992 -21496.81  -20022.904
 -25502.012 -19366.592 -18101.861 -21274.254 -17239.512 -22963.502
 -20759.713 -17967.629 -19190.94  -19139.303 -19226.748 -19141.54
 -19185.258 -19141.754 -19114.89 ] [0.05383825 0.07128361 0.03441109 0.02622377 0.04198016 0.04705846
 0.03078067 0.04951321 0.05461054 0.04271029 0.05838383 0.03747081
 0.04444734 0.05518145 0.05019164 0.05039284 0.05005259 0.05038411
 0.05021374 0.05038328 0.05048824]
[ 6499.1343  6385.863   4602.607   6793.1416  6375.838   7116.351
  8935.198   5209.3076  3679.332  10276.662   9358.628   8415.95
  5077.098   8009.251   6540.8965  6563.035   6562.624   6563.356
  6564.562   6544.887   6526.426 ] [0.04434199 0.04358717 0.03

 -34352.348 -34339.582 -34320.355] [0.02828432 0.04719323 0.02915419 0.03003247 0.07688475 0.03564855
 0.07323615 0.06780996 0.0568497  0.03674382 0.05932822 0.0506091
 0.05277787 0.0436978  0.04464813 0.04455438 0.04465065 0.04449728
 0.04438457 0.04445462 0.04456036]
[ -5965.719   -7167.139   -4295.7837 -10395.496   -8824.783   -3486.4014
 -10334.808   -8349.858  -10345.954   -5062.1357  -3801.9265  -7860.945
  -9536.251   -9901.683   -7052.602   -7010.1265  -7030.3867  -7030.0254
  -7010.869   -7046.8135  -7000.307 ] [0.05574987 0.04685162 0.07099264 0.02936265 0.03685766 0.07981598
 0.02962171 0.03948033 0.02957396 0.0635392  0.07625291 0.04237533
 0.0332511  0.03153811 0.04763479 0.04792854 0.0477882  0.04779069
 0.04792338 0.04767472 0.0479967 ]
[51814.34  56709.676 68014.34  67438.945 55657.137 61494.316 54095.93
 46139.074 51172.793 59820.098 52294.934 52129.227 63110.887 47794.805
 54968.883 54932.918 54999.32  54937.88  54970.95  54898.76  54779.473] [0.03838933 0.04801741 0.

In [181]:
X_data = states.reshape(len(states), -1)

In [289]:
random_indices = np.random.permutation(len(labels))
train_indices = random_indices[: int(len(random_indices)*0.8)]
test_indices = random_indices[int(len(random_indices)*0.8):]

X_train = X_data[train_indices]
X_test = X_data[test_indices]
labels_train = labels[train_indices]
labels_test = labels[test_indices]

In [294]:
import tensorflow as tf
model = tf.keras.models.Sequential([
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(50),
    tf.keras.layers.Dense(labels.shape[1])
])
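The Dense layers above are created without an `activation` argument, and Keras' default is linear (`activation=None`), so the three layers compose into a single affine map (BatchNormalization is also affine at inference time). A quick NumPy check of that collapse, with random stand-in weights using the same layer widths (the 8 input features are an arbitrary assumption). If a non-linear model is intended, `activation='relu'` on the hidden layers, and perhaps `activation='softmax'` on the output since each label vector sums to 1, would be one option to try.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                      # 5 samples, 8 hypothetical features
W1, b1 = rng.normal(size=(8, 100)), rng.normal(size=100)
W2, b2 = rng.normal(size=(100, 50)), rng.normal(size=50)
W3, b3 = rng.normal(size=(50, 21)), rng.normal(size=21)

# Three Dense layers with no activation: Dense(100) -> Dense(50) -> Dense(21)
deep = ((x @ W1 + b1) @ W2 + b2) @ W3 + b3

# ...are exactly one affine layer with merged parameters
W = W1 @ W2 @ W3
b = (b1 @ W2 + b2) @ W3 + b3
shallow = x @ W + b

print(np.allclose(deep, shallow))
```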

In [295]:
model.compile(optimizer="sgd", loss='mse')
model.fit(X_train, labels_train, batch_size=128, epochs=1000)

Epoch 1/1000
Epoch 2/1000
...
Epoch 946/1000
[per-epoch progress log truncated in this export]

<keras.callbacks.History at 0x1f22d2132e0>

In [269]:
results = model.evaluate(X_test, labels_test)



In [296]:
labels_test[100]

array([0.04405738, 0.02994013, 0.04283784, 0.03427943, 0.05198713,
       0.04354418, 0.04977272, 0.04061732, 0.04847561, 0.04186449,
       0.06688803, 0.03810682, 0.03659083, 0.08138571, 0.05049782,
       0.04988918, 0.05008071, 0.04987748, 0.05000827, 0.04965864,
       0.04964035], dtype=float32)

In [285]:




output = model.predict(np.expand_dims(X_train[1], axis=0))



In [287]:
np.sum((labels_train[1] - output)**2)

0.10840499

In [288]:
np.sqrt(0.10840499)

0.3292491305987003

<a id='6'></a>
# Part 7: Backtest Our Strategy
Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and produces a range of individual plots that together give a comprehensive picture of a trading strategy's performance.

<a id='6.1'></a>
## 7.1 BackTestStats
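Pass in the account-value DataFrame (`df_account_value`); this information is stored in the env class. Here it is `risk_acc_memory[0]`, the asset memory of the MW risk ensemble.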


In [89]:
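from pyfolio import timeseries
df_daily_return = risk_acc_memory[0]  # asset memory of the MW risk ensemble
DRL_strat = convert_daily_return_to_pyfolio_ts(df_daily_return)
perf_func = timeseries.perf_stats
# factor_returns is the strategy itself, so the Alpha = 0 and Beta = 1 below are trivial;
# pass the baseline's returns instead for a meaningful alpha and beta
perf_stats_all = perf_func(returns=DRL_strat,
                           factor_returns=DRL_strat,
                           positions=None, transactions=None, turnover_denom="AGB")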

In [90]:
print("==============DRL Strategy Stats===========")
perf_stats_all



Annual return          0.282896
Cumulative returns     0.395354
Annual volatility      0.130240
Sharpe ratio           1.978674
Calmar ratio           3.487496
Stability              0.922508
Max drawdown          -0.081117
Omega ratio            1.392626
Sortino ratio          3.002248
Skew                  -0.223372
Kurtosis               1.280806
Tail ratio             1.108852
Daily value at risk   -0.015386
Alpha                  0.000000
Beta                   1.000000
dtype: float64
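For intuition about two of the statistics above, here is a sketch using common textbook definitions: the annualized Sharpe ratio with a zero risk-free rate and 252 trading days per year, and the maximum drawdown of the compounded value path. pyfolio's own implementation may differ in details (for example, in how it handles missing values), so treat this as an illustration rather than a reimplementation. The last line checks that the Calmar ratio in the table is the annual return divided by the absolute max drawdown.

```python
import numpy as np


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(daily_returns, dtype=float)
    return r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Largest peak-to-trough fall of the compounded value path (returned as a negative number)."""
    # prepend the starting value 1.0 so a loss on the very first day also counts
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))))
    running_peak = np.maximum.accumulate(wealth)
    return (wealth / running_peak - 1.0).min()


daily = np.array([0.01, -0.02, 0.015, 0.005, -0.01])   # hypothetical daily returns
print(sharpe_ratio(daily), max_drawdown(daily))        # max drawdown is -2% (the second day)

# Calmar ratio = annual return / |max drawdown|, using the values in the table above
print(0.282896 / 0.081117)                             # ~3.4875
```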

In [65]:
#baseline stats
print("==============Get Baseline Stats===========")
baseline_df = get_baseline(
        ticker="^DJI", 
        start = df_daily_return.loc[0,'date'],
        end = df_daily_return.loc[len(df_daily_return)-1,'date'])

stats = backtest_stats(baseline_df, value_col_name = 'close')

[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (336, 8)
Annual return          0.279047
Cumulative returns     0.388402
Annual volatility      0.139129
Sharpe ratio           1.844560
Calmar ratio           3.124551
Stability              0.918675
Max drawdown          -0.089308
Omega ratio            1.358960
Sortino ratio          2.734872
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.052781
Daily value at risk   -0.016510
dtype: float64


<a id='6.2'></a>
## 7.2 BackTestPlot

In [82]:
import pyfolio
%matplotlib inline

baseline_df = get_baseline(
        ticker='^DJI', start=df_daily_return.loc[0,'date'], end='2021-11-01'
    )

baseline_returns = get_daily_return(baseline_df, value_col_name="close")

[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (337, 8)


In [87]:
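# Compound $100,000 over the DJI's daily returns, skipping the first entry (it has no previous day)
start = 100000
for daily_ret in baseline_returns.values[1:]:
    start *= (1 + daily_ret)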

In [88]:
start

139186.3196852229

In [None]:
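# NOTE: this recomputes the DJI baseline's returns (identical to baseline_returns);
# the ensemble's own daily returns are already in DRL_strat
ensemble_returns = get_daily_return(baseline_df, value_col_name="close")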

In [None]:
with pyfolio.plotting.plotting_context(font_scale=1.1):
        pyfolio.create_full_tear_sheet(returns = DRL_strat,
                                       benchmark_rets=baseline_returns, set_context=False)

## Min-Variance Portfolio Allocation

In [None]:
%pip install PyPortfolioOpt

In [None]:
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models

In [None]:
unique_tic = trade.tic.unique()
unique_trade_date = trade.date.unique()

In [None]:
df.head()

Unnamed: 0,date,open,high,low,close,adjcp,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-12-30,3.122143,3.144643,3.025714,3.081786,2.638855,967601600,AAPL,1,-0.095139,3.647287,2.922999,42.673739,-80.272843,16.129793,3.246952,3.3835,"[[0.001349448862375765, 0.00042834220439268546...",tic AAPL AMGN AXP ...
1,2008-12-30,57.0,57.66,56.82,57.59,44.154682,4300800,AMGN,1,0.20663,59.373659,55.651341,50.840501,37.623751,17.245628,56.616,55.998167,"[[0.001349448862375765, 0.00042834220439268546...",tic AAPL AMGN AXP ...
2,2008-12-30,17.82,18.129999,17.700001,18.0,14.507502,11777300,AXP,1,-1.263168,23.794809,16.256191,41.524298,-101.26398,33.966523,20.057333,22.604,"[[0.001349448862375765, 0.00042834220439268546...",tic AAPL AMGN AXP ...
3,2008-12-30,40.080002,41.34,39.810001,41.25,30.940779,4549700,BA,1,-0.597202,42.590111,38.59389,45.299685,38.696627,7.6935,40.382334,43.448167,"[[0.001349448862375765, 0.00042834220439268546...",tic AAPL AMGN AXP ...
4,2008-12-30,42.57,43.75,42.009998,43.66,30.079866,5060400,CAT,1,0.85086,45.671279,37.851721,49.916105,73.697581,19.456481,39.967,39.993833,"[[0.001349448862375765, 0.00042834220439268546...",tic AAPL AMGN AXP ...


In [None]:
# Min-variance portfolio, rebalanced daily with PyPortfolioOpt
portfolio = pd.DataFrame(index=range(1), columns=unique_trade_date)
initial_capital = 1000000
portfolio.loc[0, unique_trade_date[0]] = initial_capital

for i in range(len(unique_trade_date) - 1):
    df_temp = df[df.date == unique_trade_date[i]].reset_index(drop=True)
    df_temp_next = df[df.date == unique_trade_date[i + 1]].reset_index(drop=True)
    # covariance matrix of the historical returns stored in return_list
    Sigma = df_temp.return_list[0].cov()
    # long-only, at most 10% in any single stock
    ef_min_var = EfficientFrontier(None, Sigma, weight_bounds=(0, 0.1))
    # solve for the minimum-volatility weights
    ef_min_var.min_volatility()
    cleaned_weights_min_var = ef_min_var.clean_weights()

    # current capital
    cap = portfolio.iloc[0, i]
    # cash allocated to each stock
    current_cash = [element * cap for element in list(cleaned_weights_min_var.values())]
    # shares held
    current_shares = list(np.array(current_cash) / np.array(df_temp.close))
    # next period's prices
    next_price = np.array(df_temp_next.close)
    # next account value = shares held * next prices
    portfolio.iloc[0, i + 1] = np.dot(current_shares, next_price)

portfolio = portfolio.T
portfolio.columns = ['account_value']
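`EfficientFrontier.min_volatility()` solves the minimum-variance problem numerically, here with the long-only 10% cap from `weight_bounds=(0, 0.1)`. Without any bounds (only requiring the weights to sum to 1, with shorts allowed), the problem has the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1). A short NumPy sketch on a hypothetical 3-asset covariance matrix, useful as a sanity check rather than a replacement for the bounded version:

```python
import numpy as np


def min_variance_weights(cov):
    """Closed-form global minimum-variance weights: w = inv(S) 1 / (1' inv(S) 1).

    Weights only sum to 1; short positions are allowed and there are no per-asset
    bounds, unlike EfficientFrontier(..., weight_bounds=(0, 0.1)).
    """
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)   # inv(S) @ 1 without forming the inverse
    return x / x.sum()


cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])   # hypothetical 3-asset covariance matrix
w = min_variance_weights(cov)
print(w.round(4), w @ cov @ w)       # weights and the resulting portfolio variance
```

At the optimum `cov @ w` is the same for every asset, which is the first-order condition of the problem and an easy property to check.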

In [None]:
portfolio.head()

Unnamed: 0,account_value
2020-07-01,1000000.0
2020-07-02,1005234.883501
2020-07-06,1014933.780399
2020-07-07,1014238.666671
2020-07-08,1012674.038646


In [None]:
a2c_cumpod =(df_daily_return.daily_return+1).cumprod()-1

In [None]:
min_var_cumpod =(portfolio.account_value.pct_change()+1).cumprod()-1

In [None]:
dji_cumpod =(baseline_returns+1).cumprod()-1
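The three `*_cumpod` series all turn daily returns into a cumulative-return curve with `(1 + r).cumprod() - 1`, whose last value equals final value / initial value - 1. A NumPy sketch using the first rows of `portfolio.head()` above. One pandas detail: `pct_change()` leaves NaN on the first date and `cumprod` skips NaN by default, so `min_var_cumpod` starts with NaN; `.fillna(0)` on the returns would start the curve at 0 instead.

```python
import numpy as np

# First four account values from portfolio.head() above, rounded to cents
account_value = np.array([1_000_000.00, 1_005_234.88, 1_014_933.78, 1_014_238.67])

daily_return = account_value[1:] / account_value[:-1] - 1.0   # like pct_change(), without its leading NaN
cum_return = np.cumprod(1.0 + daily_return) - 1.0             # like (r + 1).cumprod() - 1

# The last cumulative return equals final value / initial value - 1
print(cum_return[-1], account_value[-1] / account_value[0] - 1.0)
```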

## Plotly: DRL, Min-Variance, DJIA

In [None]:
%pip install plotly

In [None]:
from datetime import datetime as dt

import matplotlib.pyplot as plt
import plotly
import plotly.graph_objs as go

In [None]:
time_ind = pd.Series(df_daily_return.date)

In [None]:
trace0_portfolio = go.Scatter(x = time_ind, y = a2c_cumpod, mode = 'lines', name = 'DRL (MW risk ensemble)')

trace1_portfolio = go.Scatter(x = time_ind, y = dji_cumpod, mode = 'lines', name = 'DJIA')
trace2_portfolio = go.Scatter(x = time_ind, y = min_var_cumpod, mode = 'lines', name = 'Min-Variance')
#trace3_portfolio = go.Scatter(x = time_ind, y = ddpg_cumpod, mode = 'lines', name = 'DDPG')
#trace4_portfolio = go.Scatter(x = time_ind, y = addpg_cumpod, mode = 'lines', name = 'Adaptive-DDPG')
#trace5_portfolio = go.Scatter(x = time_ind, y = min_cumpod, mode = 'lines', name = 'Min-Variance')

#trace4 = go.Scatter(x = time_ind, y = addpg_cumpod, mode = 'lines', name = 'Adaptive-DDPG')

#trace2 = go.Scatter(x = time_ind, y = portfolio_cost_minv, mode = 'lines', name = 'Min-Variance')
#trace3 = go.Scatter(x = time_ind, y = spx_value, mode = 'lines', name = 'SPX')

In [None]:
fig = go.Figure()
fig.add_trace(trace0_portfolio)

fig.add_trace(trace1_portfolio)

fig.add_trace(trace2_portfolio)



fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=15,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2
        
    ),
)
#fig.update_layout(legend_orientation="h")
fig.update_layout(title={
        #'text': "Cumulative Return using FinRL",
        'y':0.85,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
#with Transaction cost
#fig.update_layout(title =  'Quarterly Trade Date')
fig.update_layout(
    # margin=dict(l=20, r=20, t=20, b=20),
    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    # xaxis_title="Date",
    yaxis_title="Cumulative Return",
    # one x-axis tick every 80 days (dtick is given in milliseconds for date axes)
    xaxis={'type': 'date',
           'tick0': time_ind[0],
           'tickmode': 'linear',
           'dtick': 86400000.0 * 80},
)
fig.update_xaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show()