# Can a robot trade during a financial crisis?
<img src="https://mulino58.ru/wp-content/uploads/e/9/1/e91087969be5f0ac43b7cf9860037e10.jpg" width="50%">



# Key definitions


We will use deep reinforcement learning (DRL) and process the market datasets in an environment that exposes the standard gym interface.

The state describes how the agent perceives the market situation.

After observing the state, the agent can take an action from the action set, which may vary depending on the financial task.

The reward is the mechanism that encourages the agent to learn a better policy.

The environment is the source of stock-trading data.

# 1. Problem statement

The task is to develop an automated trading solution for portfolio allocation, that is, for distributing portfolio weights across assets. We model the stock-trading process as a Markov decision process (MDP) and then formulate our trading objective as a maximization problem.

The agent is trained with deep reinforcement learning (DRL) algorithms, and the components of the reinforcement learning environment are:


* Action: the action space describes the actions through which the agent interacts with the environment. Typically a ∈ A is the weight of a stock in the portfolio, a ∈ (-1,1). If our portfolio includes N stocks, we can use the list [a<sub>1</sub>, a<sub>2</sub>, ... , a<sub>N</sub>] to define the weight of each stock, where a<sub>i</sub> ∈ (-1,1) and a<sub>1</sub>+ a<sub>2</sub>+...+a<sub>N</sub>=1. For example, "the weight of AAPL in the portfolio is 10%" corresponds to [0.1 , ...].

* Reward function: r(s, a, s') is the mechanism that encourages the agent to act more effectively. It is the change in portfolio value when action a is taken in state s and the new state s' is reached, i.e. r(s, a, s') = v' − v, where v' and v are the portfolio values in states s' and s, respectively.

* State: the state space describes the observations the agent receives from the environment. Just as a human trader needs to analyze various information before placing a trade, our trading agent observes many different features in order to learn better in an interactive environment.

* Environment: the Dow 30 constituents.
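
As a small illustration of the action and reward definitions above, the sketch below applies a weight vector to one day of price changes and computes r = v' − v. The function name `portfolio_step` and the sample prices are hypothetical and chosen only for illustration; this is not FinRL code, and it assumes the portfolio is fully rebalanced to the given weights at the previous close.

```python
def portfolio_step(value, weights, prices_prev, prices_next):
    """Return (new_value, reward) for one step, with reward r = v' - v.

    Assumes the weights sum to 1 and the portfolio is fully rebalanced
    to these weights at the previous close (an illustrative simplification).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Weighted sum of the individual simple returns
    portfolio_return = sum(
        w * (p_next / p_prev - 1.0)
        for w, p_prev, p_next in zip(weights, prices_prev, prices_next)
    )
    new_value = value * (1.0 + portfolio_return)
    return new_value, new_value - value


# Hypothetical example: two stocks, 10% in the first and 90% in the second
new_value, reward = portfolio_step(
    1_000_000, [0.1, 0.9], prices_prev=[100.0, 50.0], prices_next=[110.0, 50.0]
)
print(round(new_value, 2), round(reward, 2))
```

Only the first stock moves (+10%), so the portfolio return is the weight of that stock times its return.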


The OHLCV data for each stock is obtained from the Yahoo Finance API.

# 2. Installing the libraries

In [None]:
%%capture
# installation takes about 10 minutes
!pip install wrds
!pip install swig
!pip install box2d-py
!pip install -q condacolab
import condacolab
condacolab.install()
!apt-get update -y -qq && apt-get install -y -qq cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx swig
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
!pip install numpy==1.24.1 pandas==1.5.3

In [None]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
%matplotlib inline
import datetime

from finrl import config
from finrl import config_tickers
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.meta.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline,convert_daily_return_to_pyfolio_ts
from finrl.meta.data_processor import DataProcessor
from finrl.meta.data_processors.processor_yahoofinance import YahooFinanceProcessor
import sys
sys.path.append("../FinRL-Library")



In [None]:
import os

# Create the working directories if they do not exist yet
for directory in (
    config.DATA_SAVE_DIR,
    config.TRAINED_MODEL_DIR,
    config.TENSORBOARD_LOG_DIR,
    config.RESULTS_DIR,
):
    os.makedirs("./" + directory, exist_ok=True)

# 3. Loading the data
Yahoo Finance is a website that provides stock data, financial news, financial reports, and more. All data provided by Yahoo Finance is free.
* FinRL uses the **YahooDownloader** class to fetch data from the Yahoo Finance API.
* Call limit: with the public API (no authentication), you are limited to 2,000 requests per hour per IP address (or up to 48,000 requests per day in total).

In [None]:
df = YahooDownloader(start_date = '2008-01-01',
                     end_date = '2021-10-31',
                     ticker_list = config_tickers.DOW_30_TICKER
                     ).fetch_data()
df.sort_values(['date','tic']).tail()
# df = dp.download_data(start_date = '2008-01-01',
#                      end_date = '2021-10-31',
#                      ticker_list = config_tickers.DOW_30_TICKER, time_interval='1D')

# df['date'] = pd.to_datetime(df['timestamp'],format='%Y-%m-%d')

[*********************100%%**********************]  1 of 1 completed

Shape of DataFrame:  (101615, 8)


Unnamed: 0,date,open,high,low,close,volume,tic,day
101610,2021-10-29,454.410004,461.390015,453.059998,444.694977,2497800,UNH,4
101611,2021-10-29,209.210007,213.669998,208.539993,207.770325,14329800,V,4
101612,2021-10-29,52.5,53.049999,52.41,46.026157,17763200,VZ,4
101613,2021-10-29,46.860001,47.279999,46.77,41.057888,4999000,WBA,4
101614,2021-10-29,49.303333,50.033333,49.186668,47.942196,22022700,WMT,4


# 4. Data preprocessing

Data preprocessing is a crucial step in preparing a high-quality machine learning model. We need to check for missing data and perform feature engineering to bring the data into a model-ready state.
* Add technical indicators. In practical trading, various information has to be taken into account, for example historical stock prices, current holdings, technical indicators, and so on. The indicators used here include two trend-following ones, MACD and RSI.
* Add a turbulence index. Risk aversion reflects whether an investor prefers to preserve capital. It also affects the trading strategy at different levels of market volatility. To control risk in a worst-case scenario such as the 2007-2008 financial crisis, FinRL uses the **financial turbulence index**, which measures extreme fluctuations in asset prices. In the cell below the index is switched off (`use_turbulence=False`).
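
To make the MACD indicator mentioned above more concrete, here is a minimal sketch that computes the MACD line as the difference between a fast and a slow exponential moving average of the closing price. The helper names `ema` and `macd`, the 12/26 spans, and the sample prices are assumptions made for illustration; the library FinRL relies on may initialize or adjust the averages differently.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1),
    seeded with the first observation."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


def macd(close, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of the closing prices."""
    fast_ema = ema(close, fast)
    slow_ema = ema(close, slow)
    return [f - s for f, s in zip(fast_ema, slow_ema)]


# Hypothetical closing prices: flat for five days, then steadily rising
close = [10.0] * 5 + [10.0 + 0.5 * i for i in range(1, 11)]
line = macd(close)
print([round(v, 4) for v in line])
```

While the price is flat both averages coincide and the MACD line stays at zero; once the price starts rising, the fast average pulls ahead of the slow one and the line turns positive.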


The FeatureEngineer class:
    
    
    Provides methods for preprocessing stock price data

    Attributes
    ----------
        use_technical_indicator : boolean
            whether to use technical indicators
        tech_indicator_list : list
            a list of technical indicator names (modified from neofinrl_config.py)
        use_turbulence : boolean
            whether to use the turbulence index
        user_defined_feature : boolean
            whether to use user-defined features

    Methods
    -------
    preprocess_data()
        main method for feature engineering


In [None]:
fe = FeatureEngineer(
                    use_technical_indicator=True,
                    use_turbulence=False,
                    user_defined_feature = False)

df = fe.preprocess_data(df)

Successfully added technical indicators


In [None]:
df.shape

(97524, 16)

In [None]:
df.head()

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2008-01-02,7.116786,7.152143,6.876786,5.891123,1079178800,AAPL,2,0.0,5.896329,5.888637,100.0,-66.666667,100.0,5.891123,5.891123
3483,2008-01-02,46.599998,47.040001,46.259998,33.499531,7934400,AMGN,2,0.0,5.896329,5.888637,100.0,-66.666667,100.0,33.499531,33.499531
6966,2008-01-02,52.09,52.32,50.790001,39.460526,8053700,AXP,2,0.0,5.896329,5.888637,100.0,-66.666667,100.0,39.460526,39.460526
10449,2008-01-02,87.57,87.839996,86.0,63.481636,4303000,BA,2,0.0,5.896329,5.888637,100.0,-66.666667,100.0,63.481636,63.481636
13932,2008-01-02,72.559998,72.669998,70.050003,45.395164,6337800,CAT,2,0.0,5.896329,5.888637,100.0,-66.666667,100.0,45.395164,45.395164


## Using the covariance matrix as the environment state

In [None]:
df=df.sort_values(['date','tic'],ignore_index=True)
df.index = df.date.factorize()[0]

cov_list = []
return_list = []

# look back is one year
lookback=252
for i in range(lookback,len(df.index.unique())):
  data_lookback = df.loc[i-lookback:i,:]
  price_lookback=data_lookback.pivot_table(index = 'date',columns = 'tic', values = 'close')
  return_lookback = price_lookback.pct_change().dropna()
  return_list.append(return_lookback)

  covs = return_lookback.cov().values
  cov_list.append(covs)


df_cov = pd.DataFrame({'date':df.date.unique()[lookback:],'cov_list':cov_list,'return_list':return_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date','tic']).reset_index(drop=True)
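
The rolling-window logic of the cell above can be sketched without pandas on a toy price table. This is a dependency-free illustration under stated assumptions: the helper names, tickers, and prices are hypothetical, and the window runs from day `i - lookback` to day `i` inclusive, mirroring the label-based slice used above.

```python
def pct_change(prices):
    """Simple returns between consecutive prices."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator (the pandas default)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def rolling_cov(price_table, lookback):
    """price_table maps ticker -> list of closes, all of equal length.

    Returns one covariance matrix (nested lists, tickers in sorted order)
    for every day index i >= lookback, built from days i - lookback .. i.
    """
    tickers = sorted(price_table)
    n_days = len(price_table[tickers[0]])
    matrices = []
    for i in range(lookback, n_days):
        returns = {t: pct_change(price_table[t][i - lookback:i + 1]) for t in tickers}
        matrices.append(
            [[sample_cov(returns[a], returns[b]) for b in tickers] for a in tickers]
        )
    return matrices


# Hypothetical toy data: 2 tickers, 6 days, 3-day lookback
prices = {
    "AAA": [10.0, 10.5, 10.2, 10.8, 11.0, 10.9],
    "BBB": [20.0, 19.5, 19.9, 20.4, 20.1, 20.6],
}
covs = rolling_cov(prices, lookback=3)
print(len(covs), len(covs[0]), len(covs[0][0]))
```

Each day from index `lookback` onward gets its own N × N matrix, which is why the first `lookback` trading days drop out of the merged DataFrame.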


In [None]:
df.shape

(90468, 18)

In [None]:
df.head()

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-12-31,3.070357,3.133571,3.047857,2.580616,607541200,AAPL,2,-0.082498,3.089709,2.451164,42.25478,-80.459659,16.129793,2.746056,2.858024,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...
1,2008-12-31,57.110001,58.220001,57.060001,41.514973,6287200,AMGN,2,0.15554,42.375764,40.536302,51.060597,51.513342,10.432018,40.739556,40.288822,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...
2,2008-12-31,17.969999,18.75,17.91,14.533796,9625600,AXP,2,-0.93257,18.586823,12.619706,42.554841,-75.445678,25.776759,15.693366,17.559647,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...
3,2008-12-31,41.59,43.049999,41.5,32.005882,5443100,BA,2,-0.279799,32.174381,28.867828,47.440231,156.994666,5.366299,30.32721,32.389914,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...
4,2008-12-31,43.700001,45.099998,43.700001,29.472118,6277400,CAT,2,0.652588,30.208137,25.338257,51.205318,98.368799,26.331746,26.566469,26.301738,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...


# 5. Creating the environment

Given the stochastic nature of automated stock trading, the financial task is modeled as a Markov decision process (MDP). Training involves observing stock price changes, taking an action, and computing the reward, so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent gradually develops a trading strategy that maximizes the reward.

Our trading environments, built on the OpenAI Gym interface, simulate markets using real market data.
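
The observe-act-reward loop described above can be sketched with a toy environment that follows the same `reset()`/`step()` contract. `CoinFlipEnv` is a hypothetical class with no market logic and does not subclass `gym.Env`; the 4-tuple returned by `step()` follows the older gym API that the environment in this notebook uses.

```python
import random


class CoinFlipEnv:
    """A toy environment following the gym-style reset()/step() contract.

    Hypothetical example: it is not a gym.Env subclass and has no market logic.
    """

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.t = 0
        return self.t

    def step(self, action):
        """Apply an action; return (observation, reward, done, info)."""
        outcome = self.rng.choice([0, 1])
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, reward, done, {}


env = CoinFlipEnv()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    obs, reward, done, info = env.step(action=1)  # trivial fixed policy
    total_reward += reward
print(obs, total_reward)
```

A DRL agent replaces the fixed policy in the loop with a learned one and uses the collected rewards to update it.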


In [None]:
train = data_split(df, '2009-01-01','2020-07-01')
#trade = data_split(df, '2020-01-01', config.END_DATE)

In [None]:
train.head()

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2009-01-02,3.067143,3.251429,3.041429,2.743889,746015200,AAPL,4,-0.070063,3.076204,2.449097,45.440197,-32.225703,2.140064,2.746903,2.858825,"[[0.001366151045220885, 0.0004339394654692824,...",tic AAPL AMGN AXP ...
0,2009-01-02,58.59,59.080002,57.75,42.406384,6547900,AMGN,4,0.230361,42.520228,40.501107,52.756853,92.789047,0.814217,40.803056,40.376285,"[[0.001366151045220885, 0.0004339394654692824,...",tic AAPL AMGN AXP ...
0,2009-01-02,18.57,19.52,18.4,15.144922,10955700,AXP,4,-0.82937,18.403801,12.603722,43.957552,-42.863711,16.335101,15.69206,17.443167,"[[0.001366151045220885, 0.0004339394654692824,...",tic AAPL AMGN AXP ...
0,2009-01-02,42.799999,45.560001,42.779999,33.94109,7010200,BA,4,-0.002008,32.948623,28.452123,50.822028,272.812684,20.494464,30.469475,32.344129,"[[0.001366151045220885, 0.0004339394654692824,...",tic AAPL AMGN AXP ...
0,2009-01-02,44.91,46.98,44.709999,30.950008,7117200,CAT,4,0.829341,30.70792,25.317469,53.66125,129.50275,34.637448,26.802228,26.302317,"[[0.001366151045220885, 0.0004339394654692824,...",tic AAPL AMGN AXP ...


## Where to hold assets? (the Portfolio Allocation environment)


In [None]:
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from stable_baselines3.common.vec_env import DummyVecEnv


class StockPortfolioEnv(gym.Env):
    """A unified stock trading environment for OpenAI gym

    Attributes
    ----------
        df: DataFrame, input data
        stock_dim : int, number of unique stocks
        hmax : int, maximum number of shares to trade
        initial_amount : int, starting cash
        transaction_cost_pct: float, transaction cost percentage per trade
        reward_scaling: float, scaling factor for the reward, useful for training
        state_space: int, dimension of the state space
        action_space: int, dimension of the action space
        tech_indicator_list: list, a list of technical indicator names
        turbulence_threshold: int, threshold for controlling risk aversion
        lookback: int, length of the covariance lookback window in days
        day: int, increment number used to select the date

    Methods
    -------
    step()
        at each step the agent returns actions, then
        we compute the reward and return the next observation.
    reset()
        reset the environment to its initial state
    render()
        return the current state
    softmax_normalization()
        turn raw actions into portfolio weights that sum to 1
    save_asset_memory()
        return the daily portfolio return at each time step
    save_action_memory()
        return the actions/positions at each time step
    get_sb_env()
        wrap the environment in a DummyVecEnv for Stable Baselines3

    """
    metadata = {'render.modes': ['human']}

    def __init__(self,
                df,
                stock_dim,
                hmax,
                initial_amount,
                transaction_cost_pct,
                reward_scaling,
                state_space,
                action_space,
                tech_indicator_list,
                turbulence_threshold=None,
                lookback=252,
                day = 0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback=lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct =transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low = 0, high = 1,shape = (self.action_space,))
        # shape = (state_space + len(tech_indicator_list), state_space)
        # covariance matrix + technical indicators
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape = (self.state_space+len(self.tech_indicator_list),self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day,:]
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initalize state: inital portfolio return + individual stock return + individual weights
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1/self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]]


    def step(self, actions):
        # print(self.day)
        self.terminal = self.day >= len(self.df.index.unique())-1
        # print(actions)

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(),'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            plt.plot(self.portfolio_return_memory,'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() !=0:
              sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                       df_daily_return['daily_return'].std()
              print("Sharpe: ",sharpe)
            print("=================================")

            return self.state, self.reward, self.terminal,{}

        else:
            #print("Model actions: ",actions)
            # actions are the portfolio weight
            # normalize to sum of 1
            #if (np.array(actions) - np.array(actions).min()).sum() != 0:
            #  norm_actions = (np.array(actions) - np.array(actions).min()) / (np.array(actions) - np.array(actions).min()).sum()
            #else:
            #  norm_actions = actions
            weights = self.softmax_normalization(actions)
            #print("Normalized actions: ", weights)
            self.actions_memory.append(weights)
            last_day_memory = self.data

            #load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
            #print(self.state)
            # calcualte portfolio return
            # individual stocks' return * weight
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value or end portfolo value
            self.reward = new_portfolio_value
            #print("Step reward: ", self.reward)
            #self.reward = self.reward*self.reward_scaling

        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        #self.trades = 0
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1/self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]]
        return self.state

    def render(self, mode='human'):
        return self.state

    def softmax_normalization(self, actions):
        numerator = np.exp(actions)
        denominator = np.sum(np.exp(actions))
        softmax_output = numerator/denominator
        return softmax_output


    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']

        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs
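
The `softmax_normalization` method above turns raw actions into positive weights that sum to 1. The sketch below shows the same transformation in plain Python with one common refinement that is not part of the class above: subtracting the maximum before exponentiating, which leaves the result unchanged but avoids overflow for large inputs. With actions bounded to [0, 1], as in this environment, the refinement is optional.

```python
import math


def softmax(actions):
    """Softmax that subtracts the maximum first to avoid overflow in exp()."""
    shift = max(actions)
    exps = [math.exp(a - shift) for a in actions]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw actions for four stocks
weights = softmax([0.2, 0.9, 0.0, 1.0])
print([round(w, 4) for w in weights], round(sum(weights), 6))

# Large raw actions would overflow math.exp() without the shift
big = softmax([1000.0, 1001.0])
print([round(w, 4) for w in big])
```

Because softmax outputs are strictly positive, the resulting portfolio is long-only: every stock receives some weight, and larger raw actions receive larger weights.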

In [None]:
stock_dimension = len(train.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")


Stock Dimension: 28, State Space: 28


In [None]:
env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "transaction_cost_pct": 0.001,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4

}

e_train_gym = StockPortfolioEnv(df = train, **env_kwargs)

In [None]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>




# 6. Choosing a DRL algorithm

The DRL algorithm implementations are based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines with major structural refactoring and code cleanups.

### Model 1: A2C


In [None]:
agent = DRLAgent(env = env_train)

A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c",model_kwargs = A2C_PARAMS)

{'n_steps': 5, 'ent_coef': 0.005, 'learning_rate': 0.0002}
Using cpu device


In [None]:
trained_a2c = agent.train_model(model=model_a2c,
                                tb_log_name='a2c',
                                total_timesteps=50000)

-------------------------------------
| time/                 |           |
|    fps                | 200       |
|    iterations         | 100       |
|    time_elapsed       | 2         |
|    total_timesteps    | 500       |
| train/                |           |
|    entropy_loss       | -39.7     |
|    explained_variance | 0         |
|    learning_rate      | 0.0002    |
|    n_updates          | 99        |
|    policy_loss        | 1.77e+08  |
|    reward             | 1644446.0 |
|    std                | 0.998     |
|    value_loss         | 2.85e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 168       |
|    iterations         | 200       |
|    time_elapsed       | 5         |
|    total_timesteps    | 1000      |
| train/                |           |
|    entropy_loss       | -39.7     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updat

In [None]:
trained_a2c.save('/content/trained_models/trained_a2c.zip')

### Model 2: PPO


In [None]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)

{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.0001, 'batch_size': 128}
Using cpu device


In [None]:
trained_ppo = agent.train_model(model=model_ppo,
                             tb_log_name='ppo',
                             total_timesteps=80000)

----------------------------------
| time/              |           |
|    fps             | 286       |
|    iterations      | 1         |
|    time_elapsed    | 7         |
|    total_timesteps | 2048      |
| train/             |           |
|    reward          | 4167348.0 |
----------------------------------
begin_total_asset:1000000
end_total_asset:5781717.99460306
Sharpe:  0.9271356690774338
---------------------------------------
| time/                   |           |
|    fps                  | 237       |
|    iterations           | 2         |
|    time_elapsed         | 17        |
|    total_timesteps      | 4096      |
| train/                  |           |
|    approx_kl            | 0.0       |
|    clip_fraction        | 0         |
|    clip_range           | 0.2       |
|    entropy_loss         | -39.7     |
|    explained_variance   | 0         |
|    learning_rate        | 0.0001    |
|    loss                 | 9.22e+14  |
|    n_updates            | 10        

In [None]:
trained_ppo.save('/content/trained_models/trained_ppo.zip')

### Model 3: DDPG


In [None]:
agent = DRLAgent(env = env_train)
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50000, "learning_rate": 0.001}


model_ddpg = agent.get_model("ddpg",model_kwargs = DDPG_PARAMS)

{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device


In [None]:
trained_ddpg = agent.train_model(model=model_ddpg,
                             tb_log_name='ddpg',
                             total_timesteps=50000)

begin_total_asset:1000000
end_total_asset:5839569.0604262585
Sharpe:  0.9400166159587564
begin_total_asset:1000000
end_total_asset:5760044.838215547
Sharpe:  0.9347910867946942
begin_total_asset:1000000
end_total_asset:5760044.838215547
Sharpe:  0.9347910867946942
begin_total_asset:1000000
end_total_asset:5760044.838215547
Sharpe:  0.9347910867946942
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 15        |
|    time_elapsed    | 738       |
|    total_timesteps | 11572     |
| train/             |           |
|    actor_loss      | -1.32e+08 |
|    critic_loss     | 2.29e+12  |
|    learning_rate   | 0.001     |
|    n_updates       | 11471     |
|    reward          | 5760045.0 |
----------------------------------
begin_total_asset:1000000
end_total_asset:5760044.838215547
Sharpe:  0.9347910867946942
begin_total_asset:1000000
end_total_asset:5760044.838215547
Sharpe:  0.9347910867946942
begin_total_asse

In [None]:
trained_ddpg.save('/content/trained_models/trained_ddpg.zip')

### Model 4: SAC


In [None]:
agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0003,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

model_sac = agent.get_model("sac",model_kwargs = SAC_PARAMS)

{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0003, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device


In [None]:
trained_sac = agent.train_model(model=model_sac,
                             tb_log_name='sac',
                             total_timesteps=50000)

begin_total_asset:1000000
end_total_asset:6660690.072807977
Sharpe:  0.9733781323344579
begin_total_asset:1000000
end_total_asset:6752031.726243672
Sharpe:  0.9773814644928907
begin_total_asset:1000000
end_total_asset:6751921.453541156
Sharpe:  0.9773753985227259
begin_total_asset:1000000
end_total_asset:6752185.098115166
Sharpe:  0.9774009068067696
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 15        |
|    time_elapsed    | 759       |
|    total_timesteps | 11572     |
| train/             |           |
|    actor_loss      | -1.43e+08 |
|    critic_loss     | 2.2e+12   |
|    ent_coef        | 3.21      |
|    ent_coef_loss   | -182      |
|    learning_rate   | 0.0003    |
|    n_updates       | 11471     |
|    reward          | 6752185.0 |
----------------------------------
begin_total_asset:1000000
end_total_asset:6752155.950045996
Sharpe:  0.9774018154294765
begin_total_asset:1000000
end_total

In [None]:
trained_sac.save('/content/trained_models/trained_sac.zip')

### Model 5: TD3


In [None]:
agent = DRLAgent(env = env_train)
TD3_PARAMS = {"batch_size": 100,
              "buffer_size": 1000000,
              "learning_rate": 0.001}

model_td3 = agent.get_model("td3",model_kwargs = TD3_PARAMS)

{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device


In [None]:
trained_td3 = agent.train_model(model=model_td3,
                             tb_log_name='td3',
                             total_timesteps=30000)

begin_total_asset:1000000
end_total_asset:6175149.8098474005
Sharpe:  0.944605590968042
begin_total_asset:1000000
end_total_asset:6255897.645133298
Sharpe:  0.9446118803828547
begin_total_asset:1000000
end_total_asset:6255897.645133298
Sharpe:  0.9446118803828547
begin_total_asset:1000000
end_total_asset:6255897.645133298
Sharpe:  0.9446118803828547
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 17        |
|    time_elapsed    | 669       |
|    total_timesteps | 11572     |
| train/             |           |
|    actor_loss      | -8.27e+07 |
|    critic_loss     | 1.53e+12  |
|    learning_rate   | 0.001     |
|    n_updates       | 11471     |
|    reward          | 6255897.5 |
----------------------------------
begin_total_asset:1000000
end_total_asset:6255897.645133298
Sharpe:  0.9446118803828547
begin_total_asset:1000000
end_total_asset:6255897.645133298
Sharpe:  0.9446118803828547
begin_total_asset

In [None]:
trained_td3.save('/content/trained_models/trained_td3.zip')

Pretrained models are available [here](https://drive.google.com/file/d/10r9VNWTFKnbKZIQ2Dizv_somPHLVa0Af/view?usp=sharing); they can be loaded into the agent.

## Start trading

Suppose we have starting capital of $1,000,000 on 2020-07-01. We use the A2C model to trade the stocks of the Dow Jones 30 index.

In [None]:
trade = data_split(df,'2020-07-01', '2021-10-31')
e_trade_gym = StockPortfolioEnv(df = trade, **env_kwargs)


In [None]:
trade.shape

(9436, 18)

In [None]:
df_daily_return, df_actions = DRLAgent.DRL_prediction(model=trained_a2c,
                        environment = e_trade_gym)



begin_total_asset:1000000
end_total_asset:1435773.4652148657
Sharpe:  2.002437393597995
hit end!


In [None]:
df_daily_return.head()

Unnamed: 0,date,daily_return
0,2020-07-01,0.0
1,2020-07-02,0.005367
2,2020-07-06,0.017271
3,2020-07-07,-0.015715
4,2020-07-08,0.005764


In [None]:
df_daily_return.to_csv('df_daily_return.csv')

In [None]:
df_actions.head()

date,AAPL,AMGN,AXP,BA,CAT,CRM,CSCO,CVX,DIS,GS,...,MMM,MRK,MSFT,NKE,PG,TRV,UNH,VZ,WBA,WMT
2020-07-01,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,...,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714
2020-07-02,0.025605,0.025605,0.025605,0.049659,0.034554,0.025605,0.025605,0.034846,0.025605,0.027639,...,0.025605,0.025605,0.025605,0.069602,0.05072,0.025605,0.025605,0.025605,0.069602,0.025605
2020-07-06,0.025605,0.025605,0.025605,0.049659,0.034554,0.025605,0.025605,0.034846,0.025605,0.027639,...,0.025605,0.025605,0.025605,0.069602,0.05072,0.025605,0.025605,0.025605,0.069602,0.025605
2020-07-07,0.025605,0.025605,0.025605,0.049659,0.034554,0.025605,0.025605,0.034846,0.025605,0.027639,...,0.025605,0.025605,0.025605,0.069602,0.05072,0.025605,0.025605,0.025605,0.069602,0.025605
2020-07-08,0.025605,0.025605,0.025605,0.049659,0.034554,0.025605,0.025605,0.034846,0.025605,0.027639,...,0.025605,0.025605,0.025605,0.069602,0.05072,0.025605,0.025605,0.025605,0.069602,0.025605
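
The weights in each row of `df_actions` are non-negative and should sum to 1. A minimal sketch of how raw agent actions can be turned into such weights, assuming the environment applies a softmax normalization (this is an assumption, it is not shown in this section). The count of 28 assets is inferred from the first row, where 0.035714 ≈ 1/28, and from `trade.shape`, since 9436 = 337 × 28. The function name `softmax_weights` is hypothetical.

```python
import math


def softmax_weights(raw_actions):
    """Map raw action scores to non-negative weights that sum to 1."""
    shift = max(raw_actions)  # subtract the max for numerical stability
    exps = [math.exp(a - shift) for a in raw_actions]
    total = sum(exps)
    return [e / total for e in exps]


n_assets = 28  # assumption: inferred from 0.035714 ≈ 1/28

# Identical raw actions give an equally weighted portfolio,
# which matches the first row of the table (0.035714 per stock).
uniform = softmax_weights([0.0] * n_assets)
print(round(uniform[0], 6))

# A larger raw action for one stock tilts the portfolio towards it,
# while the weights still sum to 1.
tilted = softmax_weights([1.0] + [0.0] * (n_assets - 1))
print(round(sum(tilted), 6), tilted[0] > tilted[1])
```

With this kind of normalization the agent can never hold a negative (short) weight, which is consistent with all values in the table being positive.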


In [None]:
df_actions.to_csv('df_actions.csv')

# 7. Backtesting the trading strategy

Backtesting on historical data plays a key role in evaluating how well a trading strategy performs. An automated backtesting tool is preferable because it reduces the chance of human error. We typically use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and produces a set of individual plots that together give a comprehensive picture of a strategy's performance.
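
To make the reported statistics less of a black box, here is a minimal pure-Python sketch of how the main metrics can be derived from a series of daily returns. It is not pyfolio's implementation: the 252-day annualization factor, a zero risk-free rate, and the helper name `perf_summary` are assumptions. The sample input is the five daily returns shown in the `df_daily_return.head()` output.

```python
import math

TRADING_DAYS = 252  # assumption: conventional annualization factor


def perf_summary(daily_returns):
    """Compute basic backtest statistics from simple daily returns."""
    n = len(daily_returns)
    growth = 1.0        # value of 1 unit of capital over time
    peak = 1.0          # running maximum of `growth`
    max_drawdown = 0.0  # most negative peak-to-trough decline
    for r in daily_returns:
        growth *= 1.0 + r
        peak = max(peak, growth)
        max_drawdown = min(max_drawdown, growth / peak - 1.0)

    mean = sum(daily_returns) / n
    # Sample standard deviation (ddof = 1)
    std = math.sqrt(sum((r - mean) ** 2 for r in daily_returns) / (n - 1))

    return {
        "cumulative_return": growth - 1.0,
        "annual_return": growth ** (TRADING_DAYS / n) - 1.0,
        "annual_volatility": std * math.sqrt(TRADING_DAYS),
        # Risk-free rate assumed to be zero
        "sharpe": mean / std * math.sqrt(TRADING_DAYS),
        "max_drawdown": max_drawdown,
    }


sample = [0.0, 0.005367, 0.017271, -0.015715, 0.005764]
stats = perf_summary(sample)
for name, value in stats.items():
    print(f"{name:>18}: {value: .6f}")
```

Five observations are far too few for meaningful annualized figures; the sample only illustrates the arithmetic.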


## Backtest statistics


In [None]:
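from pyfolio import timeseries

# Convert the daily-return DataFrame into the time series format pyfolio expects
DRL_strat = convert_daily_return_to_pyfolio_ts(df_daily_return)
perf_func = timeseries.perf_stats
# NOTE: factor_returns is the strategy itself, so Alpha = 0 and Beta = 1
# in the output below are trivial and carry no information.
perf_stats_all = perf_func(returns=DRL_strat,
                           factor_returns=DRL_strat,
                           positions=None, transactions=None, turnover_denom="AGB")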



In [None]:
print("==============Backtest statistics for the DRL agent===========")
perf_stats_all



Annual return          0.310584
Cumulative returns     0.435773
Annual volatility      0.140021
Sharpe ratio           2.002437
Calmar ratio           4.092228
Stability              0.914440
Max drawdown          -0.075896
Omega ratio            1.396894
Sortino ratio          3.111224
Skew                  -0.016994
Kurtosis               1.513677
Tail ratio             1.178752
Daily value at risk   -0.016528
Alpha                  0.000000
Beta                   1.000000
dtype: float64

In [None]:
print("==============Backtest statistics for the DJIA baseline===========")
baseline_df = get_baseline(
        ticker="^DJI",
        start = df_daily_return.loc[0,'date'],
        end = df_daily_return.loc[len(df_daily_return)-1,'date'])

stats = backtest_stats(baseline_df, value_col_name = 'close')



[*********************100%%**********************]  1 of 1 completed

Shape of DataFrame:  (336, 8)
Annual return          0.279047
Cumulative returns     0.388402
Annual volatility      0.139129
Sharpe ratio           1.844560
Calmar ratio           3.124551
Stability              0.918675
Max drawdown          -0.089308
Omega ratio            1.358960
Sortino ratio          2.734872
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.052781
Daily value at risk   -0.016510
dtype: float64





In [None]:
import pyfolio
%matplotlib inline

baseline_df = get_baseline(
        ticker='^DJI', start=df_daily_return.loc[0,'date'], end='2021-11-01'
    )

baseline_returns = get_daily_return(baseline_df, value_col_name="close")
baseline_returns.head()

# with pyfolio.plotting.plotting_context(font_scale=1.1):
#         pyfolio.create_full_tear_sheet(returns = DRL_strat,
#                                        benchmark_rets=baseline_returns, set_context=True)

[*********************100%%**********************]  1 of 1 completed

Shape of DataFrame:  (337, 8)





date
2020-07-01 00:00:00+00:00         NaN
2020-07-02 00:00:00+00:00    0.003590
2020-07-06 00:00:00+00:00    0.017798
2020-07-07 00:00:00+00:00   -0.015097
2020-07-08 00:00:00+00:00    0.006840
Name: daily_return, dtype: float64

## Minimum-variance portfolio allocation

In [None]:
%pip install PyPortfolioOpt


In [None]:
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models

In [None]:
unique_tic = trade.tic.unique()
unique_trade_date = trade.date.unique()

In [None]:
df.head()

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-12-31,3.070357,3.133571,3.047857,2.580616,607541200,AAPL,2,-0.082498,3.089709,2.451164,42.25478,-80.459659,16.129793,2.746056,2.858024,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...
1,2008-12-31,57.110001,58.220001,57.060001,41.514973,6287200,AMGN,2,0.15554,42.375764,40.536302,51.060597,51.513342,10.432018,40.739556,40.288822,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...
2,2008-12-31,17.969999,18.75,17.91,14.533796,9625600,AXP,2,-0.93257,18.586823,12.619706,42.554841,-75.445678,25.776759,15.693366,17.559647,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...
3,2008-12-31,41.59,43.049999,41.5,32.005882,5443100,BA,2,-0.279799,32.174381,28.867828,47.440231,156.994666,5.366299,30.32721,32.389914,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...
4,2008-12-31,43.700001,45.099998,43.700001,29.472118,6277400,CAT,2,0.652588,30.208137,25.338257,51.205318,98.368799,26.331746,26.566469,26.301738,"[[0.0013489693666397986, 0.0004284139126318498...",tic AAPL AMGN AXP ...


In [None]:
#calculate_portfolio_minimum_variance
portfolio = pd.DataFrame(index = range(1), columns = unique_trade_date)
initial_capital = 1000000
portfolio.loc[0,unique_trade_date[0]] = initial_capital

for i in range(len( unique_trade_date)-1):
    df_temp = df[df.date==unique_trade_date[i]].reset_index(drop=True)
    df_temp_next = df[df.date==unique_trade_date[i+1]].reset_index(drop=True)
    #Sigma = risk_models.sample_cov(df_temp.return_list[0])
    #calculate covariance matrix
    Sigma = df_temp.return_list[0].cov()
    #portfolio allocation
    ef_min_var = EfficientFrontier(None, Sigma,weight_bounds=(0, 0.1))
    #minimum variance
    raw_weights_min_var = ef_min_var.min_volatility()
    #get weights
    cleaned_weights_min_var = ef_min_var.clean_weights()

    #current capital
    cap = portfolio.iloc[0, i]
    #current cash invested for each stock
    current_cash = [element * cap for element in list(cleaned_weights_min_var.values())]
    # current held shares
    current_shares = list(np.array(current_cash)
                                      / np.array(df_temp.close))
    # next time period price
    next_price = np.array(df_temp_next.close)
    ##next_price * current share to calculate next total account value
    portfolio.iloc[0, i+1] = np.dot(current_shares, next_price)

portfolio=portfolio.T
portfolio.columns = ['account_value']

In [None]:
portfolio.head()

Unnamed: 0,account_value
2020-07-01,1000000.0
2020-07-02,1005253.854911
2020-07-06,1014938.483557
2020-07-07,1014208.459331
2020-07-08,1012633.780599
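
For intuition about what the minimum-variance step computes, the two-asset case has a closed-form solution: w1 = (σ2² − σ12) / (σ1² + σ2² − 2σ12) and w2 = 1 − w1. The sketch below evaluates it for hypothetical variances and covariance. It is an unconstrained illustration only; the notebook itself solves the problem numerically with PyPortfolioOpt over all stocks and with `weight_bounds=(0, 0.1)`.

```python
def min_variance_two_assets(var1, var2, cov12):
    """Closed-form minimum-variance weights for two assets (no bounds)."""
    w1 = (var2 - cov12) / (var1 + var2 - 2.0 * cov12)
    return w1, 1.0 - w1


def portfolio_variance(w1, var1, var2, cov12):
    """Variance of a two-asset portfolio with weights (w1, 1 - w1)."""
    w2 = 1.0 - w1
    return w1 ** 2 * var1 + w2 ** 2 * var2 + 2.0 * w1 * w2 * cov12


# Hypothetical inputs: asset 1 is less volatile than asset 2
var1, var2, cov12 = 0.04, 0.09, 0.01

w1, w2 = min_variance_two_assets(var1, var2, cov12)
best = portfolio_variance(w1, var1, var2, cov12)
print(f"w1 = {w1:.4f}, w2 = {w2:.4f}, variance = {best:.6f}")

# Moving the weight in either direction increases the variance
for shifted in (w1 - 0.05, w1 + 0.05):
    print(f"w1 = {shifted:.4f} -> variance = "
          f"{portfolio_variance(shifted, var1, var2, cov12):.6f}")
```

The optimal portfolio has a lower variance than either asset held alone, which is the diversification effect the benchmark relies on.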


In [None]:
a2c_cumpod =(df_daily_return.daily_return+1).cumprod()-1

In [None]:
min_var_cumpod =(portfolio.account_value.pct_change()+1).cumprod()-1

In [None]:
dji_cumpod =(baseline_returns+1).cumprod()-1
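
The three `*_cumpod` series all use the same identity: compounding daily returns with a cumulative product gives the same result as dividing each account value by the initial one. A small pure-Python sketch of that equivalence, using the account values from the `portfolio.head()` output; the helper names `pct_change` and `cumulative_returns` are hypothetical stand-ins for the pandas methods.

```python
# Account values from the portfolio.head() output
account_values = [1000000.0, 1005253.854911, 1014938.483557,
                  1014208.459331, 1012633.780599]


def pct_change(values):
    """Simple return between consecutive values (first day has none)."""
    return [curr / prev - 1.0 for prev, curr in zip(values, values[1:])]


def cumulative_returns(daily_returns):
    """Compound daily returns: prod(1 + r) - 1 up to each day."""
    out, growth = [], 1.0
    for r in daily_returns:
        growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


compounded = cumulative_returns(pct_change(account_values))
direct = [v / account_values[0] - 1.0 for v in account_values[1:]]

for c, d in zip(compounded, direct):
    print(f"{c:.9f}  {d:.9f}")
```

In pandas the first element of `pct_change()` is `NaN`, so the corresponding cumulative series also starts with `NaN`; the sketch simply skips that first day.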

In [None]:
%pip install plotly


In [None]:
from datetime import datetime as dt

import matplotlib.pyplot as plt
import plotly
import plotly.graph_objs as go

In [None]:
time_ind = pd.Series(df_daily_return.date)

In [None]:
trace0_portfolio = go.Scatter(x = time_ind, y = a2c_cumpod, mode = 'lines', name = 'A2C agent portfolio')

trace1_portfolio = go.Scatter(x = time_ind, y = dji_cumpod, mode = 'lines', name = 'DJIA index portfolio')
trace2_portfolio = go.Scatter(x = time_ind, y = min_var_cumpod, mode = 'lines', name = 'Minimum-variance portfolio')
#trace3_portfolio = go.Scatter(x = time_ind, y = ddpg_cumpod, mode = 'lines', name = 'DDPG')
#trace4_portfolio = go.Scatter(x = time_ind, y = addpg_cumpod, mode = 'lines', name = 'Adaptive-DDPG')
#trace5_portfolio = go.Scatter(x = time_ind, y = min_cumpod, mode = 'lines', name = 'Min-Variance')

#trace4 = go.Scatter(x = time_ind, y = addpg_cumpod, mode = 'lines', name = 'Adaptive-DDPG')

#trace2 = go.Scatter(x = time_ind, y = portfolio_cost_minv, mode = 'lines', name = 'Min-Variance')
#trace3 = go.Scatter(x = time_ind, y = spx_value, mode = 'lines', name = 'SPX')

In [None]:
fig = go.Figure()
fig.add_trace(trace0_portfolio)

fig.add_trace(trace1_portfolio)

fig.add_trace(trace2_portfolio)



fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=15,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2

    ),
)
#fig.update_layout(legend_orientation="h")
fig.update_layout(title={
        #'text': "Cumulative Return using FinRL",
        'y':0.85,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
#with Transaction cost
#fig.update_layout(title =  'Quarterly Trade Date')
fig.update_layout(
#    margin=dict(l=20, r=20, t=20, b=20),

    paper_bgcolor='rgba(1,1,0,0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    #xaxis_title="Date",
    yaxis_title="Cumulative Return",
xaxis={'type': 'date',
       'tick0': time_ind[0],
        'tickmode': 'linear',
       'dtick': 86400000.0 *80}

)
fig.update_xaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(showline=True,linecolor='black',showgrid=True, gridwidth=1, gridcolor='LightSteelBlue',mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='LightSteelBlue')

fig.show()

In [None]:
!pip install ccxt backtrader pandas

In [None]:
import ccxt  # noqa: E402
import pandas as pd
import backtrader as bt
import backtrader.analyzers as btanalyzers
import backtrader.feeds as btfeeds

# Instantiate only the exchange we need rather than every exchange ccxt supports
exchange = getattr(ccxt, "bitbay")()
data = exchange.fetch_ohlcv("BTC/USDT", "3d")
header = ["Timestamp", "open", "high", "low", "close", "volume"]
df = pd.DataFrame(data, columns=header)
df.Timestamp = (df.Timestamp / 1000)
df["datetime"] = pd.to_datetime(df.Timestamp, unit="s")
df["open"] = pd.to_numeric(df["open"])
df["high"] = pd.to_numeric(df["high"])
df["low"] = pd.to_numeric(df["low"])
df["close"] = pd.to_numeric(df["close"])
df["volume"] = pd.to_numeric(df["volume"])
df["openinterest"] = 1000.0
df = df.drop(["Timestamp"], axis=1)
df = df[["datetime", "open", "high", "low", "close", "volume", "openinterest"]]
dataname = "btc-usdt.csv"
df.to_csv(dataname, header=True, index=False)
# df = pd.read_csv(dataname)
# print(df.shape)
cerebro = bt.Cerebro()
data = btfeeds.BacktraderCSVData(dataname=dataname,
                                 timeframe=bt.TimeFrame.Days
                                 )


class Sma30(bt.Indicator):
    lines = ('signal',)
    params = (('period', 30),)

    def __init__(self):
        self.lines.signal = self.data - bt.indicators.SMA(period=self.p.period)


cerebro.adddata(data)
cerebro.add_signal(bt.SIGNAL_LONGSHORT, Sma30)

# risk metrics
cerebro.addanalyzer(btanalyzers.SharpeRatio, _name='mysharpe')
cerebro.broker.setcommission(commission=2.0, margin=2000.0, mult=10.0)

thestrats = cerebro.run()
thestrat = thestrats[0]

print('Sharpe ratio:', thestrat.analyzers.mysharpe.get_analysis()['sharperatio'])

"""
>>>Sharpe ratio: -0.8729741832630236
"""
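
The `Sma30` indicator above emits the difference between the price and its 30-period simple moving average; with `bt.SIGNAL_LONGSHORT`, a positive value is a long indication and a negative value a short indication. A minimal pure-Python sketch of that rule, independent of backtrader: the prices are hypothetical, a period of 3 is used to keep the example short, and the helper names are illustrative only.

```python
def sma(values, period):
    """Simple moving average; None until `period` values are available."""
    out, window_sum = [], 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= period:
            window_sum -= values[i - period]  # drop the value leaving the window
        out.append(window_sum / period if i >= period - 1 else None)
    return out


def long_short_signal(closes, period):
    """Signal = close - SMA(period); None while the SMA is warming up."""
    return [None if m is None else c - m
            for c, m in zip(closes, sma(closes, period))]


def indication(signal_value):
    """Translate a signal value into a long/short indication."""
    if signal_value is None:
        return None
    if signal_value > 0:
        return "long"
    if signal_value < 0:
        return "short"
    return "hold"  # a zero signal gives no new indication


closes = [10.0, 11.0, 12.0, 11.0, 9.0, 8.0]  # hypothetical prices
signal = long_short_signal(closes, period=3)
positions = [indication(s) for s in signal]
print(positions)
```

A strategy of this kind is always in the market once the moving average is defined, so it pays for every crossing of the average; in a sideways market that can easily produce a negative Sharpe ratio like the one reported above.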

<video src="https://www.youtube.com/watch?v=2u007Msq1qo">