<a target="_blank" href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL-Tutorials/blob/master/2-Advance/FinRL_PortfolioAllocation_Explainable_DRL.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach

This tutorial shows how to use the FinRL library to perform explainable portfolio allocation in a single [Jupyter Notebook](https://colab.research.google.com/drive/117v2qWo-qPC7OPd7paY1wYkOUywU_DWZ?usp=sharing).

* This tutorial is based on the [portfolio allocation tutorial](https://github.com/AI4Finance-Foundation/FinRL/blob/master/FinRL_portfolio_allocation_NeurIPS_2020.ipynb) in the FinRL library.
* This blog is based on our paper: Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach.
* Please report any issues on our GitHub: https://github.com/AI4Finance-LLC/FinRL-Library/issues
* **PyTorch version**



# Content

* [1. Problem Definition](#0)
* [2. Getting Started - Load Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. Check Additional Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Environment](#4)
    * [5.1. Training & Trade Data Split](#4.1)
    * [5.2. User-defined Environment](#4.2)
    * [5.3. Initialize Environment](#4.3)
* [6. Implement DRL Algorithms](#5)
* [7. Backtesting Performance](#6)
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)
    * [7.3. Baseline Stats](#6.3)
    * [7.4. Compare to Stock Market Index](#6.4)

<a id='0'></a>
# Part 1. Problem Definition

The goal is to empirically explain the trading performance of DRL agents on the portfolio management task.

The agent is trained with Deep Reinforcement Learning (DRL) algorithms. The components of the reinforcement learning environment are:


* Action: The action space describes the allowed portfolio weights through which the agent interacts with the environment. Each element of the portfolio weights lies in [0, 1].

* Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. It is the logarithmic rate of portfolio return when action a is taken at state s and the agent arrives at the new state s′, i.e., r(s, a, s′) = ln(v′/v), where v′ and v represent the portfolio values at states s′ and s, respectively.

* State: The state space describes an agent's perception of a market. Just as a human trader needs to analyze various information before executing a trade, our trading agent observes many different features to better learn in an interactive environment.

* Environment: Dow 30 constituents

We use the Yahoo Finance API as the data source.
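As a minimal sketch of the reward defined above, the snippet below computes r = ln(v′/v) for a single step. The function name `log_return_reward` and the portfolio values are illustrative assumptions, not part of FinRL.

```python
import math


def log_return_reward(v_prev: float, v_next: float) -> float:
    """Logarithmic rate of portfolio return between two consecutive states."""
    if v_prev <= 0 or v_next <= 0:
        raise ValueError("portfolio values must be positive")
    return math.log(v_next / v_prev)


# Hypothetical example: the portfolio grows from 1,000,000 to 1,010,000
reward = log_return_reward(1_000_000, 1_010_000)
print(reward)
```

One convenient property of this definition is that log returns add up over time: the sum of the per-step rewards equals the log of the total growth factor of the portfolio.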


<a id='1'></a>
# Part 2. Getting Started - Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages through FinRL library


In [None]:
## install finrl library
!pip install plotly==4.4.1
!wget https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage -O /usr/local/bin/orca
!chmod +x /usr/local/bin/orca
!apt-get install xvfb libgtk2.0-0 libgconf-2-4
!pip install wrds
!pip install swig
!pip install -q condacolab
import condacolab

condacolab.install()
!apt-get update -y -qq && apt-get install -y -qq cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx swig
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
!pip install PyPortfolioOpt




<a id='1.2'></a>
## 2.2. Check whether the additional packages needed are present; if not, install them.
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI gym
* stable-baselines3
* PyTorch
* pyfolio

<a id='1.3'></a>
## 2.3. Import Packages

<a id='1.4'></a>
## 2.4. Create Folders

In [1]:
import os
from finrl import config
from finrl import config_tickers

# Create the working directories if they do not exist yet
for directory in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)

<a id='2'></a>
# Part 3. Download Data
Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free.
* FinRL uses a class **YahooDownloader** to fetch data from Yahoo Finance API
* Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).


In [2]:
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader

df = YahooDownloader(start_date='2008-01-01',
                     end_date='2021-09-02',
                     ticker_list=config_tickers.DOW_30_TICKER).fetch_data()

[*********************100%%**********************]  1 of 1 completed

Shape of DataFrame:  (100385, 8)


<a id='3'></a>
# Part 4. Preprocess Data
Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and perform feature engineering to convert the data into a model-ready state.
* Add technical indicators. In practical trading, various information needs to be taken into account, for example historical stock prices, current holdings, and technical indicators. In this article, we demonstrate two technical indicators: MACD (trend-following) and RSI (momentum).
* Add turbulence index. Risk aversion reflects whether an investor prefers to preserve capital. It also influences one's trading strategy under different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL can employ the financial turbulence index, which measures extreme asset price fluctuations. This notebook sets `use_turbulence=False`, so the index is not added here.
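To make the first indicator concrete, here is a minimal sketch of the MACD line, computed as a fast exponential moving average (EMA) minus a slow EMA of the closing price. The helper name `macd`, the 12/26 spans, and the price series are assumptions for illustration; FinRL obtains its indicators through stockstats, whose smoothing conventions may differ from this sketch.

```python
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """MACD line: fast EMA minus slow EMA of the closing price."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow


# Hypothetical, steadily rising prices: the fast EMA tracks the price more
# closely than the slow EMA, so the MACD line ends up positive.
prices = pd.Series([100.0 + i for i in range(60)])
macd_line = macd(prices)
print(macd_line.tail(3))
```

A positive MACD value indicates that recent prices sit above their longer-run average, which is why the indicator is read as a trend signal.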

In [3]:
from finrl.meta.preprocessor.preprocessors import FeatureEngineer

fe = FeatureEngineer(
    use_technical_indicator=True,
    use_turbulence=False,
    user_defined_feature=False)

df = fe.preprocess_data(df)

Successfully added technical indicators


## Add covariance matrix as states

In [4]:
# add covariance matrix as states
df = df.sort_values(['date', 'tic'], ignore_index=True)
df.index = df.date.factorize()[0]  # factorize maps each distinct date to an integer, e.g. the earliest date becomes 0

cov_list = []
return_list = []

# look back is one year
lookback = 252
for i in range(lookback, len(df.index.unique())):
    data_lookback = df.loc[i - lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    return_list.append(return_lookback)

    covs = return_lookback.cov().values
    cov_list.append(covs)

import pandas as pd

df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list, 'return_list': return_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date', 'tic']).reset_index(drop=True)
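The rolling-covariance idea can be checked on a small synthetic example. The sketch below uses a hypothetical three-ticker price table and a 5-day lookback (both assumptions chosen for readability) and builds one covariance matrix per date from the preceding window of returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=8, freq="D")

# Hypothetical price table: rows are dates, columns are tickers
prices = pd.DataFrame(
    100 * np.cumprod(1 + rng.normal(0, 0.01, size=(8, 3)), axis=0),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)

lookback = 5
cov_by_date = {}
for i in range(lookback, len(prices)):
    window = prices.iloc[i - lookback:i + 1]   # lookback + 1 price rows
    returns = window.pct_change().dropna()     # lookback return rows
    cov_by_date[prices.index[i]] = returns.cov().values

first_cov = next(iter(cov_by_date.values()))
print(len(cov_by_date), first_cov.shape)
```

Each date after the warm-up period gets a square matrix with one row and column per ticker, which is the block the environment later stacks with the technical indicators to form the state.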

<a id='4'></a>
# Part 5. Design Environment
Considering the stochastic and interactive nature of automated stock trading, the financial task is modeled as a **Markov Decision Process (MDP)**. Training involves observing stock price changes, taking an action, and computing a reward so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.

Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.

The action space describes the allowed portfolio weights through which the agent interacts with the environment. Each element of the portfolio weight vector is non-negative and no more than 100%, and the elements of each weight vector sum to 100%.
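One common way to turn unconstrained action scores into weights that satisfy these constraints is a softmax. The sketch below is a hedged illustration: the function name `softmax_weights` and the input scores are assumptions, and subtracting the maximum before exponentiating is a numerical-stability variation that leaves the result unchanged.

```python
import numpy as np


def softmax_weights(actions):
    """Map raw action scores to non-negative weights that sum to 1."""
    actions = np.asarray(actions, dtype=float)
    shifted = actions - actions.max()  # avoids overflow for large scores
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical raw scores for three stocks
weights = softmax_weights([0.2, 1.0, 0.5])
print(weights, weights.sum())
```

Because every exponential is positive and the weights are divided by their total, each weight lies in [0, 1] and the vector sums to 1, so no short positions or leverage can be expressed.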

## Training data split: 2009-01-01 to 2020-06-30

In [5]:
from finrl.meta.preprocessor.preprocessors import data_split

train = data_split(df, '2009-01-01', '2020-06-30')


## Environment for Portfolio Allocation


In [15]:

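import gym
import matplotlib
import numpy as np
import pandas as pd
from gym import spaces
from gym.utils import seeding

matplotlib.use('Agg')  # non-interactive backend; select it before importing pyplot
import matplotlib.pyplot as plt
from stable_baselines3.common.vec_env import DummyVecEnv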


class StockPortfolioEnv(gym.Env):
    """
    A portfolio allocation environment for OpenAI gym.

    At each step the agent outputs one raw score per stock. The scores are
    softmax-normalized into portfolio weights, and the portfolio value is
    updated with the next day's price changes under those weights.

    Attributes
    ----------
    df : pd.DataFrame
        Input data containing stock prices, technical indicators, and the
        per-date covariance matrices ('cov_list').
    stock_dim : int
        Number of unique stocks in the portfolio.
    hmax : int
        Maximum number of shares to trade per transaction (stored but not
        used by this environment).
    initial_amount : int
        Initial capital of the portfolio.
    transaction_cost_pct : float
        Transaction cost percentage per trade (stored but not used here).
    reward_scaling : float
        Scaling factor for rewards (stored but not used here).
    state_space : int
        Dimension of the covariance matrix (equal to the stock dimension).
    action_space : int
        Dimension of the action space (equal to the stock dimension); it is
        replaced by a gym `spaces.Box` in `__init__`.
    tech_indicator_list : list
        List of technical indicator column names used in the state.
    turbulence_threshold : int, optional
        Threshold for market turbulence (stored but not used here).
    day : int
        Index of the current day in the simulation.

    Methods
    -------
    step(actions)
        Applies the given actions as portfolio weights, advances one day,
        and returns the new state, reward, and terminal flag.
    reset()
        Resets the environment to the initial state.
    render()
        Returns the current state.
    save_asset_memory()
        Returns the daily portfolio return history.
    save_action_memory()
        Returns the portfolio weight history.
    softmax_normalization(actions)
        Normalizes the actions using the softmax function.
    get_sb_env()
        Wraps the environment in a DummyVecEnv for Stable-Baselines3.
    """

    metadata = {'render.modes': ['human']}

    def __init__(self,
                 df,
                 stock_dim,
                 hmax,
                 initial_amount,
                 transaction_cost_pct,
                 reward_scaling,
                 state_space,
                 action_space,
                 tech_indicator_list,
                 turbulence_threshold=None,
                 lookback=252,
                 day=0):
        """
        Initializes the StockPortfolioEnv with given parameters.

        Args:
            df (pd.DataFrame): Input data containing stock prices and other features.
            stock_dim (int): Number of unique stocks.
            hmax (int): Maximum number of shares to trade per transaction.
            initial_amount (int): Initial capital for the portfolio.
            transaction_cost_pct (float): Percentage cost per transaction.
            reward_scaling (float): Scaling factor for the reward.
            state_space (int): Dimensionality of the state space.
            action_space (int): Dimensionality of the action space.
            tech_indicator_list (list): List of technical indicators.
            turbulence_threshold (int, optional): Threshold for turbulence detection.
            lookback (int): Number of days to look back for features.
            day (int): Starting day index.
        """
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # Step 1: Initialize action space and observation space
        # Action space is normalized and shaped according to the stock dimension
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,))
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(self.state_space + len(self.tech_indicator_list), self.state_space))

        # Step 2: Load initial data from the dataframe
        self.data = self.df.loc[self.day, :]
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold

        # Step 3: Initialize portfolio values and memory
        self.portfolio_value = self.initial_amount
        self.asset_memory = [self.initial_amount]  # To store portfolio value at each step
        self.portfolio_return_memory = [0]  # To store portfolio returns at each step
        self.actions_memory = [[1 / self.stock_dim] * self.stock_dim]  # To store actions taken
        self.date_memory = [self.data.date.unique()[0]]  # To track the date at each step

    def step(self, actions):
        """
        Executes one time step within the environment, applying the given actions.

        Args:
            actions (array-like): Raw action scores from the agent, one per stock;
                they are softmax-normalized into portfolio weights.

        Returns:
            tuple: A tuple containing the new state, reward, terminal status, and additional info.
        """
        # Check if the simulation is at the terminal step
        self.terminal = self.day >= len(self.df.index.unique()) - 1

        if self.terminal:
            # Step 1: Calculate and plot cumulative returns
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(), 'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            # Step 2: Plot individual daily returns
            plt.plot(self.portfolio_return_memory, 'r')
            plt.savefig('results/rewards.png')
            plt.close()

            # Step 3: Print summary statistics
            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() != 0:
                sharpe = (252 ** 0.5) * df_daily_return['daily_return'].mean() / \
                         df_daily_return['daily_return'].std()  # Calculate Sharpe ratio
                print("Sharpe: ", sharpe)
            print("=================================")

            # Return the final state, reward, and terminal status
            return self.state, self.reward, self.terminal, {}

        else:
            # Step 4: Normalize the actions using softmax
            weights = self.softmax_normalization(actions)
            self.actions_memory.append(weights)
            last_day_memory = self.data

            # Step 5: Load the next state from the dataframe
            self.day += 1
            self.data = self.df.loc[self.day, :]
            self.covs = self.data['cov_list'].values[0]
            self.state = np.append(np.array(self.covs),
                                   [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)

            # Step 6: Calculate portfolio return
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values) - 1) * weights)
            log_portfolio_return = np.log(sum((self.data.close.values / last_day_memory.close.values) * weights))

            # Step 7: Update portfolio value
            new_portfolio_value = self.portfolio_value * (1 + portfolio_return)
            self.portfolio_value = new_portfolio_value

            # Step 8: Save the results into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

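            # Note: the reward is the updated portfolio value itself, not the log
            # return computed above; log_portfolio_return and reward_scaling are unused.
            self.reward = new_portfolio_value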

        # Return the updated state, reward, and terminal status
        return self.state, self.reward, self.terminal, {}

    def reset(self):
        """
        Resets the environment to its initial state.

        Returns:
            array: The initial state after reset.
        """
        # Step 1: Reset memory for portfolio value, actions, and dates
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day, :]

        # Step 2: Load initial state from the dataframe
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)
        self.portfolio_value = self.initial_amount
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1 / self.stock_dim] * self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

        # Return the initial state
        return self.state

    def render(self, mode='human'):
        """
        Renders the environment state (placeholder, not implemented).

        Args:
            mode (str): The mode in which to render.

        Returns:
            array: The current state.
        """
        return self.state

    def softmax_normalization(self, actions):
        """
        Normalizes the action values using the softmax function.

        Args:
            actions (array-like): Raw action values.

        Returns:
            array: Softmax-normalized action values.
        """
        numerator = np.exp(actions)
        denominator = np.sum(np.exp(actions))
        softmax_output = numerator / denominator
        return softmax_output

    def save_asset_memory(self):
        """
        Returns the daily portfolio return history (despite the method name, not the portfolio values).

        Returns:
            pd.DataFrame: Dataframe containing dates and corresponding portfolio returns.
        """
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        df_account_value = pd.DataFrame({'date': date_list, 'daily_return': portfolio_return})
        return df_account_value

    def save_action_memory(self):
        """
        Saves the action history.

        Returns:
            pd.DataFrame: Dataframe containing dates and corresponding actions taken.
        """
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']

        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        return df_actions

    def _seed(self, seed=None):
        """
        Sets the random seed for reproducibility.

        Args:
            seed (int, optional): Seed value.

        Returns:
            list: List containing the seed.
        """
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        """
        Returns the stable-baselines environment and observation.

        Returns:
            tuple: Stable-baselines environment and initial observation.
        """
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs


In [16]:
stock_dimension = len(train.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")
tech_indicator_list = ['macd', 'rsi_30', 'cci_30', 'dx_30']
feature_dimension = len(tech_indicator_list)
print(f"Feature Dimension: {feature_dimension}")

Stock Dimension: 28, State Space: 28
Feature Dimension: 4


In [17]:
env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "transaction_cost_pct": 0,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": tech_indicator_list,
    "action_space": stock_dimension,
    "reward_scaling": 1e-1

}

e_train_gym = StockPortfolioEnv(df=train, **env_kwargs)

In [18]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>




<a id='5'></a>
# Part 6. Implement DRL Algorithms
* The implementation of the DRL algorithms is based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines with a major structural refactoring and code cleanups; this notebook uses its PyTorch successor, Stable-Baselines3.
* We use two DRL algorithms from the FinRL library: A2C and PPO.
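For readers unfamiliar with PPO, the sketch below illustrates its clipped surrogate objective in plain NumPy. It is not FinRL or Stable-Baselines3 code: the function name, the probability ratios, and the advantages are assumptions made up for illustration, and the default `clip_range` of 0.2 simply mirrors the value reported in the PPO training log in this notebook.

```python
import numpy as np


def ppo_clipped_objective(ratio, advantage, clip_range=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - clip_range, 1 + clip_range) * advantage
    return np.minimum(unclipped, clipped).mean()


# Hypothetical new/old policy probability ratios and advantage estimates
ratios = np.array([0.7, 1.0, 1.5])
advantages = np.array([1.0, -0.5, 2.0])
print(ppo_clipped_objective(ratios, advantages))
```

Taking the minimum of the clipped and unclipped terms removes the incentive to push the policy ratio far outside [1 - clip_range, 1 + clip_range], which keeps each policy update small.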

### Model 1: **A2C**


In [19]:
from finrl.agents.stablebaselines3.models import DRLAgent

agent = DRLAgent(env=env_train)

A2C_PARAMS = {"n_steps": 10, "ent_coef": 0.005, "learning_rate": 0.0004}
model_a2c = agent.get_model(model_name="a2c", model_kwargs=A2C_PARAMS)

{'n_steps': 10, 'ent_coef': 0.005, 'learning_rate': 0.0004}
Using cuda device


In [20]:
trained_a2c = agent.train_model(model=model_a2c, tb_log_name='a2c',
                                total_timesteps=40000)

-------------------------------------
| time/                 |           |
|    fps                | 123       |
|    iterations         | 100       |
|    time_elapsed       | 8         |
|    total_timesteps    | 1000      |
| train/                |           |
|    entropy_loss       | -39.7     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 99        |
|    policy_loss        | 4.26e+08  |
|    reward             | 2077328.6 |
|    std                | 0.998     |
|    value_loss         | 1.5e+14   |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 130       |
|    iterations         | 200       |
|    time_elapsed       | 15        |
|    total_timesteps    | 2000      |
| train/                |           |
|    entropy_loss       | -39.7     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updat

### Model 2: **PPO**


In [21]:
agent = DRLAgent(env=env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo", model_kwargs=PPO_PARAMS)

{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.001, 'batch_size': 128}
Using cuda device


In [22]:
trained_ppo = agent.train_model(model=model_ppo,
                                tb_log_name='ppo',
                                total_timesteps=40000)

----------------------------------
| time/              |           |
|    fps             | 191       |
|    iterations      | 1         |
|    time_elapsed    | 10        |
|    total_timesteps | 2048      |
| train/             |           |
|    reward          | 4034908.8 |
----------------------------------
begin_total_asset:1000000
end_total_asset:6024685.480706188
Sharpe:  0.9500765201687519
------------------------------------------
| time/                   |              |
|    fps                  | 176          |
|    iterations           | 2            |
|    time_elapsed         | 23           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 9.720679e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -39.7        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 8.92e+14  

## Back-Testing
Assume that we have $1,000,000 of initial capital on 2020-07-01, the first day of the trading period. We use the PPO, A2C, SVM, Linear Regression, Decision Tree, and Random Forest models to trade the Dow Jones 30 constituent stocks.
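Two quantities recur throughout the back-test: the cumulative return and the annualized Sharpe ratio. The sketch below computes both from a series of daily returns, using the same compounding and sqrt(252) scaling that appear in this notebook. The helper names and the sample returns are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def cumulative_return(daily_returns: pd.Series) -> pd.Series:
    """Compound daily returns into a running cumulative return."""
    return (1 + daily_returns).cumprod() - 1


def annualized_sharpe(daily_returns: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate."""
    std = daily_returns.std()
    if std == 0:
        return float("nan")  # undefined when returns never vary
    return float(np.sqrt(periods_per_year) * daily_returns.mean() / std)


# Hypothetical daily returns over four trading days
daily = pd.Series([0.01, -0.005, 0.002, 0.007])
print(cumulative_return(daily).iloc[-1], annualized_sharpe(daily))
```

The Sharpe ratio here assumes a zero risk-free rate, so it measures mean return per unit of volatility rather than excess return over a benchmark.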


In [23]:
trade = data_split(df, '2020-07-01', '2021-09-02')
e_trade_gym = StockPortfolioEnv(df=trade, **env_kwargs)


In [24]:
import torch
%matplotlib inline


In [25]:

import pandas as pd
from pypfopt import EfficientFrontier
from pypfopt import risk_models
from pypfopt import objective_functions

unique_tic = trade.tic.unique()
unique_trade_date = trade.date.unique()

In [26]:

%matplotlib inline
from finrl.plot import backtest_stats, get_daily_return, get_baseline, convert_daily_return_to_pyfolio_ts

baseline_df = get_baseline(
    ticker="^DJI",
    start='2020-07-01',
    end='2021-09-01')

baseline_df_stats = backtest_stats(baseline_df, value_col_name='close')
baseline_returns = get_daily_return(baseline_df, value_col_name="close")

dji_cumpod = (baseline_returns + 1).cumprod() - 1

[*********************100%%**********************]  1 of 1 completed

Shape of DataFrame:  (295, 8)
Annual return          0.311845
Cumulative returns     0.374034
Annual volatility      0.140762
Sharpe ratio           2.006165
Calmar ratio           3.491806
Stability              0.950106
Max drawdown          -0.089308
Omega ratio            1.397014
Sortino ratio          2.988706
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.094883
Daily value at risk   -0.016614
dtype: float64





In [27]:
from pyfolio import timeseries

df_daily_return_a2c, df_actions_a2c = DRLAgent.DRL_prediction(model=trained_a2c,
                                                              environment=e_trade_gym)
df_daily_return_ppo, df_actions_ppo = DRLAgent.DRL_prediction(model=trained_ppo,
                                                              environment=e_trade_gym)
time_ind = pd.Series(df_daily_return_a2c.date)
a2c_cumpod = (df_daily_return_a2c.daily_return + 1).cumprod() - 1
ppo_cumpod = (df_daily_return_ppo.daily_return + 1).cumprod() - 1
DRL_strat_a2c = convert_daily_return_to_pyfolio_ts(df_daily_return_a2c)
DRL_strat_ppo = convert_daily_return_to_pyfolio_ts(df_daily_return_ppo)

perf_func = timeseries.perf_stats
perf_stats_all_a2c = perf_func(returns=DRL_strat_a2c,
                               factor_returns=DRL_strat_a2c,
                               positions=None, transactions=None, turnover_denom="AGB")
perf_stats_all_ppo = perf_func(returns=DRL_strat_ppo,
                               factor_returns=DRL_strat_ppo,
                               positions=None, transactions=None, turnover_denom="AGB")



begin_total_asset:1000000
end_total_asset:1427715.184120404
Sharpe:  2.2696188537958384
hit end!




begin_total_asset:1000000
end_total_asset:1409480.4054735305
Sharpe:  2.180729253363325
hit end!


In [28]:
def extract_weights(drl_actions_list):
    """Convert an actions DataFrame (dates x tickers) into one weight table per date."""
    weight_records = {'date': [], 'weights': []}
    tic_list = list(drl_actions_list.columns)
    for i in range(len(drl_actions_list)):
        date = drl_actions_list.index[i]
        weights_list = drl_actions_list.iloc[i].values

        weight_records['date'].append(date)
        weight_records['weights'].append(
            pd.DataFrame({'tic': tic_list, 'weight': weights_list}))

    return pd.DataFrame(weight_records)


a2c_weights = extract_weights(df_actions_a2c)
ppo_weights = extract_weights(df_actions_ppo)

## Machine Learning Models

We trained the machine learning models with technical indicators: MACD, RSI, CCI, DX
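The supervised setup pairs each stock's indicators on one date with its realized return on the next date. The sketch below shows that alignment on a tiny hypothetical table using a per-ticker `shift`; the tickers, dates, and values are made up, and the notebook itself performs the alignment differently, by merging against the stored `return_list`.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, tic)
data = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-04", "2021-01-05",
             "2021-01-05", "2021-01-06", "2021-01-06"],
    "tic": ["AAA", "BBB"] * 3,
    "macd": [0.1, -0.2, 0.3, -0.1, 0.2, 0.0],
    "close": [100.0, 50.0, 102.0, 49.0, 101.0, 50.0],
})
data = data.sort_values(["tic", "date"]).reset_index(drop=True)

# Next-day return per ticker, aligned with the features of the current day
next_close = data.groupby("tic")["close"].shift(-1)
data["next_return"] = next_close / data["close"] - 1

# The last date of each ticker has no next-day return and is dropped
samples = data.dropna(subset=["next_return"])
X = samples[["macd"]].values
y = samples["next_return"].values
print(X.shape, y.shape)
```

Aligning features at date d with returns at date d + 1 keeps the target strictly in the future relative to the inputs, which avoids look-ahead bias in the fitted models.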

In [29]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

In [30]:


def prepare_data(trainData):
    # Extract and sort the unique dates
    train_date = sorted(set(trainData.date.values))

    X = []  # One training-data piece per date

    for i in range(0, len(train_date) - 1):
        d = train_date[i]  # Current date
        d_next = train_date[i + 1]  # Next date

        # Realized returns on the next date (the prediction target)
        y = trainData.loc[trainData['date'] == d_next].return_list.iloc[0].loc[d_next].reset_index()
        y.columns = ['tic', 'return']  # Rename the columns to 'tic' and 'return'

        # Technical indicators on the current date (the features)
        x = trainData.loc[trainData['date'] == d][['tic', 'macd', 'rsi_30', 'cci_30', 'dx_30']]

        # Join the current date's indicators with the next date's returns
        train_piece = pd.merge(x, y, on='tic')

        # Record the current date on every row of the piece
        train_piece['date'] = [d] * len(train_piece)

        # Collect the piece
        X += [train_piece]

    # Concatenate the pieces from all dates
    trainDataML = pd.concat(X)

    # Technical indicators form the feature matrix X
    X = trainDataML[tech_indicator_list].values

    # Returns form the target matrix Y
    Y = trainDataML[['return']].values

    return X, Y  # Feature matrix and target matrix


train_X, train_Y = prepare_data(train)
rf_model = RandomForestRegressor(max_depth=35, min_samples_split=10, random_state=0).fit(train_X, train_Y.reshape(-1))
dt_model = DecisionTreeRegressor(random_state=0, max_depth=35, min_samples_split=10).fit(train_X, train_Y.reshape(-1))
svm_model = SVR(epsilon=0.14).fit(train_X, train_Y.reshape(-1))
lr_model = LinearRegression().fit(train_X, train_Y)

In [31]:
def output_predict(model, reference_model=False):
    # 1. Container for the per-date portfolio weights
    meta_coefficient = {"date": [], "weights": []}

    # 2. One-row table of portfolio values with one column per trade date
    portfolio = pd.DataFrame(index=range(1), columns=unique_trade_date)
    initial_capital = 1000000  # starting capital
    portfolio.loc[0, unique_trade_date[0]] = initial_capital  # value on the first trade date

    # 3. Walk through the trade dates
    for i in range(len(unique_trade_date) - 1):

        # 4. Slice the data for the current and the next date
        current_date = unique_trade_date[i]
        next_date = unique_trade_date[i + 1]
        df_current = df[df.date == current_date].reset_index(drop=True)
        tics = df_current['tic'].values  # ticker symbols
        features = df_current[tech_indicator_list].values  # technical indicators on the current date
        df_next = df[df.date == next_date].reset_index(drop=True)

        # 5. Expected returns: the model prediction, or the realized next-date returns for the reference model
        if not reference_model:
            predicted_y = model.predict(features)  # predicted next-date returns
            mu = predicted_y
            Sigma = risk_models.sample_cov(df_current.return_list[0], returns_data=True)  # return covariance matrix
        else:
            mu = df_next.return_list[0].loc[next_date].values  # realized next-date returns (perfect foresight)
            Sigma = risk_models.sample_cov(df_next.return_list[0], returns_data=True)  # return covariance matrix

        # 6. Table of expected returns per ticker
        predicted_y_df = pd.DataFrame({"tic": tics.reshape(-1, ), "predicted_y": mu.reshape(-1, )})

        # 7. Weight bounds and the EfficientFrontier optimizer
        min_weight, max_weight = 0, 1
        ef = EfficientFrontier(mu, Sigma)

        # 8. Solve for the weights that maximize the Sharpe ratio
        weights = ef.nonconvex_objective(
            objective_functions.sharpe_ratio,
            objective_args=(ef.expected_returns, ef.cov_matrix),
            weights_sum_to_one=True,
            constraints=[
                {"type": "ineq", "fun": lambda w: w - min_weight},  # weights >= min_weight
                {"type": "ineq", "fun": lambda w: max_weight - w},  # weights <= max_weight
            ],
        )

        # 9. Start the weight table and record the current date
        weight_df = {"tic": [], "weight": []}
        meta_coefficient["date"] += [current_date]

        # 10. Copy the optimized weights into the table
        for item in weights:
            weight_df['tic'] += [item]  # ticker symbol
            weight_df['weight'] += [weights[item]]  # its weight

        # 11. Join the weights with the expected returns and store them
        weight_df = pd.DataFrame(weight_df).merge(predicted_y_df, on=['tic'])
        meta_coefficient["weights"] += [weight_df]

        # 12. Allocate the current capital and mark the portfolio to the next date's prices
        cap = portfolio.iloc[0, i]  # capital on the current date
        current_cash = [element * cap for element in list(weights.values())]  # cash allocated to each stock
        current_shares = list(np.array(current_cash) / np.array(df_current.close))  # shares held of each stock
        next_price = np.array(df_next.close)  # closing prices on the next date
        portfolio.iloc[0, i + 1] = np.dot(current_shares, next_price)  # portfolio value on the next date

    # 13. Reshape the portfolio table into (date, account_value) rows
    portfolio = portfolio.T
    portfolio.columns = ['account_value']
    portfolio = portfolio.reset_index()
    portfolio.columns = ['date', 'account_value']

    # 14. Backtest statistics
    stats = backtest_stats(portfolio, value_col_name='account_value')

    # 15. Cumulative return series
    portfolio_cumprod = (portfolio.account_value.pct_change() + 1).cumprod() - 1

    # 16. Return the portfolio values, the statistics, the cumulative returns, and the per-date weights
    return portfolio, stats, portfolio_cumprod, pd.DataFrame(meta_coefficient)


# 17. Run the backtest for each model
lr_portfolio, lr_stats, lr_cumprod, lr_weights = output_predict(lr_model)  # linear regression
dt_portfolio, dt_stats, dt_cumprod, dt_weights = output_predict(dt_model)  # decision tree
svm_portfolio, svm_stats, svm_cumprod, svm_weights = output_predict(svm_model)  # support vector machine
rf_portfolio, rf_stats, rf_cumprod, rf_weights = output_predict(rf_model)  # random forest
reference_portfolio, reference_stats, reference_cumprod, reference_weights = output_predict(None, True)  # reference model



Annual return          0.175887
Cumulative returns     0.209627
Annual volatility      0.423552
Sharpe ratio           0.592017
Calmar ratio           0.759039
Stability              0.623836
Max drawdown          -0.231723
Omega ratio            1.104689
Sortino ratio          0.967516
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.352546
Daily value at risk   -0.052368
dtype: float64




Annual return          0.223698
Cumulative returns     0.267600
Annual volatility      0.323395
Sharpe ratio           0.785440
Calmar ratio           0.839110
Stability              0.633387
Max drawdown          -0.266589
Omega ratio            1.152733
Sortino ratio          1.270854
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.041322
Daily value at risk   -0.039736
dtype: float64




Annual return          0.306807
Cumulative returns     0.369312
Annual volatility      0.151135
Sharpe ratio           1.852661
Calmar ratio           5.072178
Stability              0.919234
Max drawdown          -0.060488
Omega ratio            1.364556
Sortino ratio          2.917685
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.230490
Daily value at risk   -0.017930
dtype: float64




Annual return          0.131013
Cumulative returns     0.155588
Annual volatility      0.396498
Sharpe ratio           0.506293
Calmar ratio           0.525112
Stability              0.599731
Max drawdown          -0.249495
Omega ratio            1.092146
Sortino ratio          0.825852
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.216290
Daily value at risk   -0.049157
dtype: float64
Annual return            2793.779598
Cumulative returns      11169.967538
Annual volatility           0.423453
Sharpe ratio               19.296121
Calmar ratio           160680.746376
Stability                   0.987983
Max drawdown               -0.017387
Omega ratio               159.637083
Sortino ratio             330.673417
Skew                             NaN
Kurtosis                         NaN
Tail ratio                 11.004206
Daily value at risk        -0.020925
dtype: float64




# Part 7: Explanation Method Implementation


### Integrated Gradient
>* Reference: [Integrated Gradients](https://www.tensorflow.org/tutorials/interpretability/integrated_gradients)

Implement the explanation method using integrated gradients and regression coefficients.
The formula for Integrated Gradients is as follows:

$IntegratedGradients_{i}(x) ::= (x_{i} - x'_{i})\times\int_{\alpha=0}^1\frac{\partial F(x'+\alpha \times (x - x'))}{\partial x_i}{d\alpha}$

where:

$_{i}$ = feature   
$x$ = input  
$x'$ = baseline   
$\alpha$ = interpolation constant to perturb features by


In practice, computing a definite integral is not always numerically possible and can be computationally costly, so you compute the following numerical approximation:

$IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times\sum_{k=1}^{m}\frac{\partial F(x' + \frac{k}{m}\times(x - x'))}{\partial x_{i}} \times \frac{1}{m}$

where:

$_{i}$ = feature (here, one technical indicator of one stock)  
$x$ = input (the observed feature values)  
$x'$ = baseline (in this notebook, the same input with the feature under study set to zero)  
$k$ = index of the interpolation step, so that $\frac{k}{m}$ plays the role of $\alpha$  
$m$ = number of steps in the Riemann sum approximation of the integral  
$(x_{i}-x'_{i})$ = the difference from the baseline, which scales the integrated gradients and keeps them in the units of the original input.

The path from the baseline to the input is a straight line in feature space. Because IG integrates along this straight line, the Riemann sum converges to the integral of the gradient with respect to $\alpha$ as the number of steps grows. The integral sums each feature's gradient times the change in that feature along the path. It is simpler to implement this integration as uniform steps from the baseline to the input, substituting $x := (x' + \alpha(x-x'))$. The change of variables gives $dx = (x-x')d\alpha$. The $(x-x')$ term is constant and is factored out of the integral.
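The Riemann-sum approximation above can be sketched on a toy function. This is a minimal illustration under stated assumptions, not the notebook's implementation: `toy_model`, `numerical_gradient`, and `integrated_gradients` are hypothetical names, the gradient is estimated with a forward finite difference, and the baseline is all zeros.

```python
import numpy as np


def toy_model(x):
    # Hypothetical differentiable function standing in for F.
    return float(np.sum(x ** 2) + x[0] * x[1])


def numerical_gradient(f, x, i, h=1e-5):
    # Forward finite-difference estimate of dF/dx_i at x.
    x_forward = x.copy()
    x_forward[i] += h
    return (f(x_forward) - f(x)) / h


def integrated_gradients(f, x, baseline, m=50):
    # Riemann-sum approximation with m uniform steps along the straight
    # path from the baseline x' to the input x.
    attributions = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        avg_grad = 0.0
        for k in range(1, m + 1):
            point = baseline + (k / m) * (x - baseline)
            avg_grad += numerical_gradient(f, point, i) / m
        attributions[i] = (x[i] - baseline[i]) * avg_grad
    return attributions


x = np.array([1.0, 2.0, -0.5])
baseline = np.zeros_like(x)
ig = integrated_gradients(toy_model, x, baseline)

# Completeness check: the attributions should sum to roughly F(x) - F(x').
print(ig, ig.sum(), toy_model(x) - toy_model(baseline))
```

As the number of steps `m` grows, the sum of the attributions approaches $F(x) - F(x')$, which is a convenient sanity check for any implementation.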

In [32]:
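def calculate_gradient(model, interpolated_input, actions, feature_idx, stock_idx, h=1e-1):
    """
    Estimate the gradient of the policy's value output with respect to one input feature
    using a forward finite difference.

    Args:
        model (object): trained DRL model whose policy exposes evaluate_actions.
        interpolated_input (list or ndarray): interpolated environment state
            (covariance rows followed by feature rows).
        actions (list or ndarray): actions taken for this state.
        feature_idx (int): index of the feature to differentiate.
        stock_idx (int): index of the stock whose feature is perturbed.
        h (float, optional): finite-difference step size. Defaults to 1e-1.

    Returns:
        ndarray: finite-difference estimate of the gradient (the caller reads element 0).
    """

    # 1. Build the perturbed input. Copy first so that interpolated_input stays unchanged;
    #    a plain assignment would alias both names and make the difference below zero.
    forward_input = np.array(interpolated_input, dtype=float)
    forward_input[feature_idx + stock_dimension][stock_idx] += h  # nudge one feature of one stock

    # 2. Evaluate the policy on the perturbed and on the original input.
    # stable_baselines3\common\policies.py evaluate_actions(self, obs: PyTorchObs, actions: th.Tensor) -> Tuple[th.Tensor, th.Tensor, Optional[th.Tensor]]:
    # Note: torch.cuda.FloatTensor requires a CUDA device.
    forward_Q = model.policy.evaluate_actions(
        torch.cuda.FloatTensor(forward_input).reshape(-1, stock_dimension * (stock_dimension + feature_dimension)),
        # flatten the state
        torch.cuda.FloatTensor(actions).reshape(-1, stock_dimension)  # one row of actions
    )
    interpolated_Q = model.policy.evaluate_actions(
        torch.cuda.FloatTensor(interpolated_input).reshape(-1, stock_dimension * (stock_dimension + feature_dimension)),
        # flatten the state
        torch.cuda.FloatTensor(actions).reshape(-1, stock_dimension)  # one row of actions
    )

    # 3. Take the first returned tensor (the value estimate, used here as the Q-value) and difference it
    forward_Q = forward_Q[0].detach().cpu().numpy()[0]  # value at the perturbed input
    interpolated_Q = interpolated_Q[0].detach().cpu().numpy()[0]  # value at the original input
    return (forward_Q - interpolated_Q) / h  # forward finite difference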


In [None]:
import copy

# Accumulates one integrated-gradient score per (date, feature, algorithm)
meta_Q = {"date": [], "feature": [], "Saliency Map": [], "algo": []}

# 1. Loop over the two DRL algorithms (a list keeps the order deterministic)
for algo in ["A2C", "PPO"]:
    # 2. Finite-difference step size used for this algorithm
    if algo == "A2C":
        prec_step = 1e-2
    else:
        prec_step = 1e-1

    # 3. Look up the trained model and its recorded actions by name
    model = eval("trained_" + algo.lower())
    df_actions = eval("df_actions_" + algo.lower())

    # 4. Loop over all trade dates except the last
    for i in range(len(unique_trade_date) - 1):
        date = unique_trade_date[i]  # current date
        covs = trade[trade['date'] == date].cov_list.iloc[0]  # covariance matrix for this date
        features = trade[trade['date'] == date][tech_indicator_list].values  # feature matrix, N x K
        actions = df_actions.loc[date].values  # actions taken on this date

        # 5. Loop over the technical indicators
        for feature_idx in range(len(tech_indicator_list)):

            int_grad_per_feature = 0  # integrated gradient summed over stocks
            # 6. Loop over the stocks
            for stock_idx in range(features.shape[0]):

                int_grad_per_stock = 0  # integrated gradient for this stock
                avg_interpolated_grad = 0  # running average of the gradient along the path
                # 7. Riemann sum with 50 uniform steps from the baseline to the input
                for alpha in range(1, 51):
                    scale = 1 / 50  # step width
                    baseline_features = copy.deepcopy(features)
                    baseline_noise = np.random.normal(0, 1, stock_dimension)  # Gaussian noise (not used below)
                    baseline_features[:, feature_idx] = [0] * stock_dimension  # baseline: current feature set to zero
                    # Point on the straight path between the baseline and the input
                    interpolated_features = baseline_features + scale * alpha * (features - baseline_features)  # N x K
                    interpolated_input = np.append(covs, interpolated_features.T, axis=0)  # covariance rows + feature rows
                    # Gradient at the interpolated point
                    interpolated_gradient = \
                        calculate_gradient(model, interpolated_input, actions, feature_idx, stock_idx, h=prec_step)[0]

                    avg_interpolated_grad += interpolated_gradient * scale  # accumulate the average gradient
                int_grad_per_stock = (features[stock_idx][feature_idx] - 0) * avg_interpolated_grad  # (x_i - x'_i) * average gradient
                int_grad_per_feature += int_grad_per_stock  # sum over stocks

            # 8. Store the result
            meta_Q['date'] += [date]
            meta_Q['algo'] += [algo]
            meta_Q['feature'] += [tech_indicator_list[feature_idx]]
            meta_Q['Saliency Map'] += [int_grad_per_feature]

# 9. Convert the results to a DataFrame
meta_Q = pd.DataFrame(meta_Q)




### Regression Coefficient
Implement the linear regression to measure the feature weights.
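As a rough sketch of the idea, the snippet below fits an ordinary least squares model on synthetic data and then sums each feature's fitted contribution (coefficient times feature value) across stocks, which mirrors the `(X * results.params).sum(axis=0)` step used in this notebook. The data, the seed, and `true_coef` are made up for illustration, and `numpy.linalg.lstsq` stands in for `statsmodels` to keep the example dependency-light.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_names = ["macd", "rsi_30", "cci_30", "dx_30"]
n_stocks = 30

# Synthetic indicator values and weighted returns (illustrative only).
X = rng.normal(size=(n_stocks, len(feature_names)))
true_coef = np.array([0.5, -0.2, 0.0, 0.1])
y = X @ true_coef + 0.01 * rng.normal(size=n_stocks)

# Add an intercept column and solve the least-squares problem.
X_const = np.column_stack([np.ones(n_stocks), X])
params, *_ = np.linalg.lstsq(X_const, y, rcond=None)

# Per-feature weight: coefficient times feature value, summed over stocks.
contributions = (X_const * params).sum(axis=0)
feature_weights = dict(zip(["const"] + feature_names, contributions))
print(feature_weights)
```

Summing coefficient times value, rather than reporting the raw coefficient, makes the weight depend on how large each indicator actually was on that date.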

In [24]:
import statsmodels.api as sm  # statistical modelling, used here for OLS regression

# Stores the regression-based feature weights of each algorithm
meta_score_coef = {"date": [], "coef": [], "algo": []}

# Step 1: loop over the algorithms and pick the matching portfolio weights
for algo in ["LR", "RF", "Reference Model", "SVM", "DT", "A2C", "PPO"]:
    # Select the portfolio weights by algorithm name
    if algo == "LR":
        weights = lr_weights  # linear regression
    elif algo == "RF":
        weights = rf_weights  # random forest
    elif algo == "DT":
        weights = dt_weights  # decision tree
    elif algo == "SVM":
        weights = svm_weights  # support vector machine
    elif algo == "A2C":
        weights = a2c_weights  # A2C
    elif algo == "PPO":
        weights = ppo_weights  # PPO
    else:
        weights = reference_weights  # reference model

    # Step 2: loop over the dates, assemble the data, and fit the model
    for i in range(len(unique_trade_date) - 1):
        date = unique_trade_date[i]  # current date
        next_date = unique_trade_date[i + 1]  # next date

        # Slice the data for the current and the next date
        df_temp = df[df.date == date].reset_index(drop=True)
        df_temp_next = df[df.date == next_date].reset_index(drop=True)

        # Portfolio weights chosen on the current date
        weight_piece = weights[weights.date == date].iloc[0]['weights']

        # Returns realized on the next date
        piece_return = pd.DataFrame(df_temp_next.return_list.iloc[0].loc[next_date]).reset_index()
        piece_return.columns = ['tic', 'return']  # ticker and return

        # Features on the current date
        X = df_temp[['macd', 'rsi_30', 'cci_30', 'dx_30', 'tic']]  # indicators plus ticker
        X_next = df_temp_next[['macd', 'rsi_30', 'cci_30', 'dx_30', 'tic']]  # next-date features (not used below)

        # Join weights, features, and returns per ticker
        piece = weight_piece.merge(X, on='tic').merge(piece_return, on='tic')
        piece['Y'] = piece['return'] * piece['weight']  # weighted return

        # Regressors and response for the regression
        X = piece[['macd', 'rsi_30', 'cci_30', 'dx_30']]  # regressors
        X = sm.add_constant(X)  # add the intercept term
        Y = piece[['Y']]  # response

        # Step 3: fit OLS and store the resulting feature weights
        model = sm.OLS(Y, X)
        results = model.fit()

        # Feature weight = fitted coefficient times feature value, summed over stocks
        meta_score_coef["coef"] += [(X * results.params).sum(axis=0)]
        meta_score_coef["date"] += [date]
        meta_score_coef["algo"] += [algo]

# Convert the dictionary to a DataFrame
meta_score_coef = pd.DataFrame(meta_score_coef)


### Correlation Coefficient
Calculate the single-step and multi-step correlation coefficients.
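To make the two variants concrete, here is a minimal sketch on made-up numbers. The single-step score correlates a model's feature weights on one date with the reference model's feature weights on the same date; the multi-step score correlates them with the reference weights averaged over a forward window that starts on that date. The arrays and the window length of 3 are illustrative assumptions.

```python
import numpy as np

# Rows are consecutive trade dates, columns are the four indicators.
reference = np.array([
    [0.4, -0.1, 0.2, 0.0],
    [0.3, 0.0, 0.1, -0.1],
    [0.5, -0.2, 0.0, 0.1],
    [0.2, 0.1, 0.0, -0.2],
])
model_weights = np.array([0.35, -0.05, 0.25, 0.05])  # feature weights on date 0
window = 3

# Single-step: compare with the reference weights of the same date.
single_step = np.corrcoef(model_weights, reference[0])[0, 1]

# Multi-step: compare with the reference weights averaged over the window.
multi_step = np.corrcoef(model_weights, reference[:window].mean(axis=0))[0, 1]

print(single_step, multi_step)
```

Note that `np.corrcoef` returns `nan` when one of the vectors is constant, so dates with degenerate feature weights drop out of any later average unless they are handled explicitly.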

In [25]:
# Stores the per-date score of each algorithm
performance_score = {"date": [], "algo": [], "score": []}

# Step 1: for each trade date, collect the feature weights of every algorithm
for i in range(0, len(unique_trade_date)):
    date_ = unique_trade_date[i]  # current date

    # Skip dates that have no regression feature weights
    if len(meta_score_coef[(meta_score_coef['date'] == date_)]) == 0:
        continue

    # Feature weights of linear regression on this date
    lr_coef = meta_score_coef[
        (meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'LR')
        ]['coef'].values[0][['macd', 'rsi_30', 'cci_30', 'dx_30']].values

    # Feature weights of random forest on this date
    rf_coef = meta_score_coef[
        (meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'RF')
        ]['coef'].values[0][['macd', 'rsi_30', 'cci_30', 'dx_30']].values

    # Feature weights of the reference model on this date
    reference_coef = meta_score_coef[
        (meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'Reference Model')
        ]['coef'].values[0][['macd', 'rsi_30', 'cci_30', 'dx_30']].values

    # Feature weights of the decision tree on this date
    dt_coef = meta_score_coef[
        (meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'DT')
        ]['coef'].values[0][['macd', 'rsi_30', 'cci_30', 'dx_30']].values

    # Feature weights of the support vector machine on this date
    svm_coef = meta_score_coef[
        (meta_score_coef['date'] == date_) & (meta_score_coef['algo'] == 'SVM')
        ]['coef'].values[0][['macd', 'rsi_30', 'cci_30', 'dx_30']].values

    # Integrated-gradient saliency of A2C
    saliency_coef_a2c = meta_Q[
        (meta_Q['date'] == date_) & (meta_Q['algo'] == "A2C")
        ]['Saliency Map'].values

    # Integrated-gradient saliency of PPO
    saliency_coef_ppo = meta_Q[
        (meta_Q['date'] == date_) & (meta_Q['algo'] == "PPO")
        ]['Saliency Map'].values

    # Step 2: correlate each algorithm's feature weights with the reference model's
    lr_score = np.corrcoef(lr_coef, reference_coef)[0][1]  # linear regression
    rf_score = np.corrcoef(rf_coef, reference_coef)[0][1]  # random forest
    dt_score = np.corrcoef(dt_coef, reference_coef)[0][1]  # decision tree
    svm_score = np.corrcoef(svm_coef, reference_coef)[0][1]  # support vector machine
    saliency_score_a2c = np.corrcoef(saliency_coef_a2c, reference_coef)[0][1]  # A2C
    saliency_score_ppo = np.corrcoef(saliency_coef_ppo, reference_coef)[0][1]  # PPO

    # Step 3: record the date, the algorithm name, and the score
    for algo in ["LR", "A2C", "PPO", "RF", "DT", "SVM"]:
        performance_score["date"] += [date_]
        performance_score["algo"] += [algo]

        # Pick the score that belongs to this algorithm
        if algo == "LR":
            score = lr_score
        elif algo == "RF":
            score = rf_score
        elif algo == "DT":
            score = dt_score
        elif algo == "A2C":
            score = saliency_score_a2c
        elif algo == "SVM":
            score = svm_score
        else:
            score = saliency_score_ppo  # PPO

        performance_score["score"] += [score]

# Convert the dictionary to a DataFrame for later analysis
performance_score = pd.DataFrame(performance_score)



In [26]:
# Step 1: stores the multi-step score of each algorithm
multi_performance_score = {"date": [], "algo": [], "score": []}

# Step 2: length of the forward window used to average the reference weights
window = 20

# Step 3: loop over the trade dates, leaving out the last `window` dates
for i in range(len(unique_trade_date) - window):
    date_ = unique_trade_date[i]  # current date

    # Step 4: skip dates that have no regression feature weights
    if len(meta_score_coef[(meta_score_coef['date'] == date_)]) == 0:
        continue

    # Step 5: feature weights of each algorithm on the current date
    lr_coef = meta_score_coef[(meta_score_coef['date'] == date_) & 
                              (meta_score_coef['algo'] == 'LR')]['coef'].values[0][
        ['macd', 'rsi_30', 'cci_30', 'dx_30']].values  # linear regression (LR)

    rf_coef = meta_score_coef[(meta_score_coef['date'] == date_) & 
                              (meta_score_coef['algo'] == 'RF')]['coef'].values[0][
        ['macd', 'rsi_30', 'cci_30', 'dx_30']].values  # random forest (RF)

    reference_coef = meta_score_coef[(meta_score_coef['date'] == date_) & 
                                     (meta_score_coef['algo'] == 'Reference Model')]['coef'].values[0][
        ['macd', 'rsi_30', 'cci_30', 'dx_30']].values  # reference model

    # Step 6: average the reference model's weights over the forward window
    for w in range(1, window):
        date_f = unique_trade_date[i + w]  # future date inside the window
        prx_coef = meta_score_coef[(meta_score_coef['date'] == date_f) & 
                                   (meta_score_coef['algo'] == 'Reference Model')]['coef'].values[0][
            ['macd', 'rsi_30', 'cci_30', 'dx_30']].values  # reference weights on the future date
        reference_coef += prx_coef  # accumulate

    reference_coef = reference_coef / window  # window average

    # Step 7: feature weights of the remaining algorithms on the current date
    dt_coef = meta_score_coef[(meta_score_coef['date'] == date_) & 
                              (meta_score_coef['algo'] == 'DT')]['coef'].values[0][
        ['macd', 'rsi_30', 'cci_30', 'dx_30']].values  # decision tree (DT)

    svm_coef = meta_score_coef[(meta_score_coef['date'] == date_) & 
                               (meta_score_coef['algo'] == 'SVM')]['coef'].values[0][
        ['macd', 'rsi_30', 'cci_30', 'dx_30']].values  # support vector machine (SVM)

    saliency_coef_a2c = meta_Q[(meta_Q['date'] == date_) & 
                               (meta_Q['algo'] == "A2C")]['Saliency Map'].values  # A2C saliency

    saliency_coef_ppo = meta_Q[(meta_Q['date'] == date_) & 
                               (meta_Q['algo'] == "PPO")]['Saliency Map'].values  # PPO saliency

    # Step 8: correlate each algorithm's weights with the window-averaged reference weights
    lr_score = np.corrcoef(lr_coef, reference_coef)[0][1]  # linear regression
    rf_score = np.corrcoef(rf_coef, reference_coef)[0][1]  # random forest
    dt_score = np.corrcoef(dt_coef, reference_coef)[0][1]  # decision tree
    svm_score = np.corrcoef(svm_coef, reference_coef)[0][1]  # support vector machine
    saliency_score_a2c = np.corrcoef(saliency_coef_a2c, reference_coef)[0][1]  # A2C
    saliency_score_ppo = np.corrcoef(saliency_coef_ppo, reference_coef)[0][1]  # PPO

    # Step 9: record the date, the algorithm name, and the score
    for algo in ["LR", "A2C", "RF", "PPO", "DT", "SVM"]:
        multi_performance_score["date"] += [date_]
        multi_performance_score["algo"] += [algo]

        # Pick the score that belongs to this algorithm
        if algo == "LR":
            score = lr_score
        elif algo == "RF":
            score = rf_score
        elif algo == "DT":
            score = dt_score
        elif algo == "A2C":
            score = saliency_score_a2c
        elif algo == "SVM":
            score = svm_score
        else:
            score = saliency_score_ppo

        multi_performance_score["score"] += [score]

# Step 10: convert the dictionary to a DataFrame
multi_performance_score = pd.DataFrame(multi_performance_score)



### Data Visualization

In [14]:
import matplotlib.pyplot as plt
import plotly.graph_objs as go

In [27]:

trace1_portfolio = go.Scatter(x=time_ind, y=a2c_cumpod, mode='lines', name='A2C')
trace2_portfolio = go.Scatter(x=time_ind, y=ppo_cumpod, mode='lines', name='PPO')
trace3_portfolio = go.Scatter(x=time_ind, y=dji_cumpod, mode='lines', name='DJIA')
trace4_portfolio = go.Scatter(x=time_ind, y=lr_cumprod, mode='lines', name='LR')
trace5_portfolio = go.Scatter(x=time_ind, y=rf_cumprod, mode='lines', name='RF')
trace6_portfolio = go.Scatter(x=time_ind, y=dt_cumprod, mode='lines', name='DT')
trace7_portfolio = go.Scatter(x=time_ind, y=svm_cumprod, mode='lines', name='SVM')


In [58]:
fig = go.Figure()
fig.add_trace(trace1_portfolio)
fig.add_trace(trace2_portfolio)

fig.add_trace(trace3_portfolio)

fig.add_trace(trace4_portfolio)
fig.add_trace(trace5_portfolio)
fig.add_trace(trace6_portfolio)
fig.add_trace(trace7_portfolio)

fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=15,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2

    ),
)
fig.update_layout(title={
    #'text': "Cumulative Return using FinRL",
    'y': 0.85,
    'x': 0.5,
    'xanchor': 'center',
    'yanchor': 'top'})

fig.update_layout(
    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis=dict(titlefont=dict(size=30), title="Cumulative Return"),
    font=dict(
        size=40,
    ),
)
fig.update_layout(font_size=20)
fig.update_traces(line=dict(width=2))

fig.update_xaxes(showline=True, linecolor='black', showgrid=True, gridwidth=1, gridcolor='LightSteelBlue', mirror=True)
fig.update_yaxes(showline=True, linecolor='black', showgrid=True, gridwidth=1, gridcolor='LightSteelBlue', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show("png")  # static PNG rendering requires the kaleido package: pip install -U kaleido


#### We found that A2C and PPO succeed in the portfolio management task and outperform all other algorithms and the benchmark.

In [37]:
# Map each display name to its backtest statistics
stats_by_name = {
    "LR": lr_stats,
    "A2C": perf_stats_all_a2c,
    "RF": rf_stats,
    "Reference Model": reference_stats,
    "PPO": perf_stats_all_ppo,
    "SVM": svm_stats,
    "DT": dt_stats,
    "DJI": baseline_df_stats,
}
metrics = ["Annual return", "Annual volatility", "Max drawdown", "Sharpe ratio", "Calmar ratio"]

meta_score = {"Annual return": [], "Annual volatility": [], "Max drawdown": [], "Sharpe ratio": [], "Algorithm": [],
              "Calmar ratio": []}
for name, algo_stats in stats_by_name.items():
    meta_score["Algorithm"] += [name]
    for metric in metrics:
        meta_score[metric] += [algo_stats[metric]]

meta_score = pd.DataFrame(meta_score).sort_values("Sharpe ratio")


In [38]:
# Average single-step correlation coefficient per algorithm
postiveRatio = pd.DataFrame(performance_score.groupby("algo").apply(lambda x: np.mean(x['score'])))

postiveRatio = postiveRatio.reset_index()
postiveRatio.columns = ['algo', 'avg_correlation_coefficient']
postiveRatio['Sharpe Ratio'] = [0.0] * 6  # float placeholder, so Sharpe ratios can be assigned without a dtype warning

# postiveRatio.plot.bar(x = 'algo', y = 'avg_correlation_coefficient')

# Average multi-step correlation coefficient per algorithm
postiveRatiom = pd.DataFrame(multi_performance_score.groupby("algo").apply(lambda x: np.mean(x['score'])))
postiveRatiom = postiveRatiom.reset_index()
postiveRatiom.columns = ['algo', 'avg_correlation_coefficient']
postiveRatiom['Sharpe Ratio'] = [0.0] * 6

# postiveRatiom.plot.bar(x = 'algo', y = 'avg_correlation_coefficient')


# Attach each algorithm's Sharpe ratio; each table is filtered with its own mask
for algo in ['A2C', 'PPO', 'LR', 'DT', 'RF', 'SVM']:
    postiveRatio.loc[postiveRatio['algo'] == algo, 'Sharpe Ratio'] = \
        meta_score.loc[meta_score['Algorithm'] == algo, 'Sharpe ratio'].values[0]
    postiveRatiom.loc[postiveRatiom['algo'] == algo, 'Sharpe Ratio'] = \
        meta_score.loc[meta_score['Algorithm'] == algo, 'Sharpe ratio'].values[0]

postiveRatio.sort_values("Sharpe Ratio", inplace=True)

postiveRatiom.sort_values("Sharpe Ratio", inplace=True)










In [39]:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=postiveRatiom['algo'], y=postiveRatiom['Sharpe Ratio'], name="Sharpe Ratio", marker_size=15,
               line_width=5),
    secondary_y=True,
)

fig.add_trace(
    go.Bar(x=postiveRatiom['algo'], y=postiveRatiom['avg_correlation_coefficient'],
           name="Multi-Step Average Correlation Coefficient          ", width=0.38),
    secondary_y=False,
)
fig.add_trace(
    go.Bar(x=postiveRatio['algo'], y=postiveRatio['avg_correlation_coefficient'],
           name="Single-Step Average Correlation Coefficient           ", width=0.38),
    secondary_y=False,
)

fig.update_layout(
    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
)
fig.update_layout(legend=dict(
    yanchor="top",
    y=1.5,
    xanchor="right",
    x=0.95
))
fig.update_layout(font_size=15)

# Set x-axis title
fig.update_xaxes(title_text="Model")
fig.update_xaxes(showline=True, linecolor='black', showgrid=True, gridwidth=1, gridcolor='LightSteelBlue', mirror=True)
fig.update_yaxes(showline=True, linecolor='black', showgrid=True, secondary_y=False, gridwidth=1,
                 gridcolor='LightSteelBlue', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')
# Set y-axes titles
fig.update_yaxes(title_text="Average Correlation Coefficient", secondary_y=False, range=[-0.1, 0.1])
fig.update_yaxes(title_text="Sharpe Ratio", secondary_y=True, range=[-0.5, 2.5])

fig.show("png")

ValueError: 
Image export using the "kaleido" engine requires the kaleido package,
which can be installed using pip:
    $ pip install -U kaleido
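The error above appears because `fig.show("png")` needs the kaleido package for static image export. Besides installing kaleido, one option is to fall back to the figure's default renderer when static export is unavailable. The helper below is a hypothetical sketch (`show_with_fallback` is not part of Plotly or FinRL) and relies only on the figure having a `show` method that raises `ValueError`, as seen in the output above.

```python
def show_with_fallback(fig, renderer="png"):
    """Try a static renderer first; fall back to the figure's default renderer.

    Returns the name of the renderer that was actually used.
    """
    try:
        fig.show(renderer)
        return renderer
    except ValueError:
        # Static export failed (for example, kaleido is not installed),
        # so let the figure pick its default renderer instead.
        fig.show()
        return "default"


# Usage in the notebook would replace `fig.show("png")` with:
# show_with_fallback(fig)
```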


#### The correlation coefficient represents the level of predictive power.

We found that:
>* The Sharpe ratio is consistent with both the single-step and the multi-step average correlation coefficients.
>* DRL agents are better at multi-step prediction than ML algorithms, but worse at single-step prediction.
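To make the comparison concrete, the sketch below shows one way a per-algorithm average correlation coefficient could be derived from per-step scores. It assumes a table shaped like `performance_score`, with `algo` and `score` columns, and assumes the average is a plain mean. The numbers are made up for illustration, and the notebook's own aggregation may differ.

```python
import pandas as pd

# Hypothetical per-step correlation scores (made-up values).
performance_score = pd.DataFrame({
    "algo": ["A2C", "A2C", "LR", "LR"],
    "score": [0.25, -0.75, 0.5, 0.0],
})

# Average the per-step correlation coefficients for each algorithm.
avg = (
    performance_score.groupby("algo")["score"]
    .mean()
    .rename("avg_correlation_coefficient")
    .reset_index()
)
```

A higher average suggests that the model's feature weights track the reference weights more closely over the evaluation period.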

In [40]:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=2, cols=3)

trace0 = go.Histogram(x=performance_score[performance_score['algo'] == 'A2C']['score'].values, nbinsx=25, name='A2C',
                      histnorm='probability')
trace1 = go.Histogram(x=performance_score[performance_score['algo'] == 'PPO']['score'].values, nbinsx=25, name='PPO',
                      histnorm='probability')
trace2 = go.Histogram(x=performance_score[performance_score['algo'] == 'DT']['score'].values, nbinsx=25, name='DT',
                      histnorm='probability')
trace3 = go.Histogram(x=performance_score[performance_score['algo'] == 'LR']['score'].values, nbinsx=25, name='LR',
                      histnorm='probability')
trace4 = go.Histogram(x=performance_score[performance_score['algo'] == 'SVM']['score'].values, nbinsx=25, name='SVM',
                      histnorm='probability')
trace5 = go.Histogram(x=performance_score[performance_score['algo'] == 'RF']['score'].values, nbinsx=25, name='RF',
                      histnorm='probability')

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 2)
fig.append_trace(trace2, 1, 3)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)
fig.append_trace(trace5, 2, 3)
# Update xaxis properties
fig.update_xaxes(title_text="Correlation coefficient", row=2, col=2)
fig.update_yaxes(title_text="Frequency", row=1, col=1)
fig.update_yaxes(title_text="Frequency", row=2, col=1)

fig.update_layout(

    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    font=dict(

        size=18,
    ),

)
fig.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="left",
    x=1
))

fig.update_xaxes(showline=True, linecolor='black', showgrid=True, gridwidth=1, gridcolor='LightSteelBlue', mirror=True)
fig.update_yaxes(showline=True, linecolor='black', showgrid=True, gridwidth=1, gridcolor='LightSteelBlue', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show("png")

ValueError: 
Image export using the "kaleido" engine requires the kaleido package,
which can be installed using pip:
    $ pip install -U kaleido


#### Histogram of single-step correlation coefficient
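The histograms use `histnorm='probability'`, so the "Frequency" axis shows the fraction of samples in each bin, not a raw count. A minimal NumPy sketch of that normalization follows. The scores are made up, and the bin edges are NumPy's, which need not match the bins Plotly chooses for `nbinsx=25`.

```python
import numpy as np

# Hypothetical correlation scores for one algorithm (made-up values).
scores = np.array([-0.2, -0.1, -0.1, 0.0, 0.1, 0.1, 0.1, 0.3])

# Count the samples per bin, then divide by the total so the bars sum to 1.
counts, edges = np.histogram(scores, bins=5)
probability = counts / counts.sum()
```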

In [41]:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=2, cols=3)

trace0 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'A2C']['score'].values, nbinsx=25,
                      name='A2C', histnorm='probability')
trace1 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'PPO']['score'].values, nbinsx=25,
                      name='PPO', histnorm='probability')
trace2 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'DT']['score'].values, nbinsx=25,
                      name='DT', histnorm='probability')
trace3 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'LR']['score'].values, nbinsx=25,
                      name='LR', histnorm='probability')
trace4 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'SVM']['score'].values, nbinsx=25,
                      name='SVM', histnorm='probability')
trace5 = go.Histogram(x=multi_performance_score[multi_performance_score['algo'] == 'RF']['score'].values, nbinsx=25,
                      name='RF', histnorm='probability')

fig.update_layout(yaxis1=dict(range=[0, 0.2]))
fig.update_layout(yaxis2=dict(range=[0, 0.2]))
fig.update_layout(yaxis3=dict(range=[0, 0.4]))
fig.update_layout(yaxis4=dict(range=[0, 0.4]))
fig.update_layout(yaxis5=dict(range=[0, 0.4]))
fig.update_layout(yaxis6=dict(range=[0, 0.4]))

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 2)
fig.append_trace(trace2, 1, 3)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)
fig.append_trace(trace5, 2, 3)
# Update xaxis properties
fig.update_xaxes(title_text="Correlation coefficient", row=2, col=2)
fig.update_yaxes(title_text="Frequency", row=1, col=1)
fig.update_yaxes(title_text="Frequency", row=2, col=1)

fig.update_layout(

    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    font=dict(

        size=18,
    ),

)
fig.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="left",
    x=1
))

fig.update_xaxes(showline=True, linecolor='black', showgrid=True, gridwidth=1, gridcolor='LightSteelBlue', mirror=True)
fig.update_yaxes(showline=True, linecolor='black', showgrid=True, gridwidth=1, gridcolor='LightSteelBlue', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show("png")

ValueError: 
Image export using the "kaleido" engine requires the kaleido package,
which can be installed using pip:
    $ pip install -U kaleido


#### Histogram of multi-step correlation coefficient
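Both histogram cells build six nearly identical traces by hand. As a sketch of a more compact layout step, the subplot position of each algorithm can be computed from its index. The `slots` name is hypothetical, and the commented loop only outlines how it might be used with the notebook's `fig` and score table.

```python
# Algorithms in the order they appear in the 2 x 3 grid.
algos = ["A2C", "PPO", "DT", "LR", "SVM", "RF"]
n_cols = 3

# Map each algorithm to its 1-based (row, col) position, filling row by row.
slots = {algo: (i // n_cols + 1, i % n_cols + 1) for i, algo in enumerate(algos)}

# Possible use with the notebook's objects (not executed here):
# for algo, (row, col) in slots.items():
#     x = multi_performance_score[multi_performance_score['algo'] == algo]['score'].values
#     fig.append_trace(go.Histogram(x=x, nbinsx=25, name=algo, histnorm='probability'), row, col)
```

The computed positions match the hand-written `append_trace` calls, so the resulting figure layout stays the same.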