<img src="static/img/reminder.jpg">

# Capstone Project Demonstration


In this capstone we explored the use of Reinforcement Learning (RL) for portfolio construction and for enhancing quantitative investment strategies (QIS). In particular, we explored Policy Gradient Methods (PGM) because they can handle continuous action spaces. We used PGM to build a model-free agent that selects portfolio weights from a diverse set of features, which we treat as the state space. The PGM variants explored are REINFORCE, REINFORCE with baseline, Actor-Critic, Actor-Critic with eligibility traces, and Soft Actor-Critic.

Exploring model-free reinforcement learning algorithms for portfolio allocation that generalize to any type of feature lays the groundwork for integrating signal discovery into portfolio allocation.


## Stones on the Way (Problems We Faced)

We focus in particular on the problems related to the proposed models.

1. Slow convergence: REINFORCE and REINFORCE with baseline converged extremely slowly on the test set. This led us to question whether these algorithms are practical for real-world use, where the number of features and the complexity of the data require a higher-dimensional space.

2. Complex model pipelines: RL implementations require more complex pipelines than other machine learning models because several components must be built, such as the environment, the actors and the policies. The interaction between these components creates dependencies that are not simple to parallelize or port to other devices. For example, in Soft Actor-Critic the agent has four networks: one for the policy mean, one for the policy variance and two for the twin Q-function. Each of these is a neural network, and all of them must be trained in sync at every step.

3. Sampling efficiency: as with any reinforcement learning algorithm, a large amount of time is spent sampling (state, action, reward, next state) tuples from the environment.

4. Parametrizing the standard deviation of the normal policy did not bring any improvement, as we could not get the agent to learn this parameter.
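Item 4 concerns a Gaussian (normal) policy whose standard deviation is learned together with its mean. The NumPy sketch below computes the score $\nabla \log \pi(a \mid x)$ that REINFORCE-type methods multiply by the return, assuming a linear mean $\mu = \Theta x$ and a state-independent log-standard-deviation vector. The names `theta`, `log_std`, `gaussian_log_prob` and `gaussian_score` are hypothetical, and the project's policy may parametrize $\sigma$ directly rather than through its logarithm.

```python
import numpy as np


def gaussian_log_prob(a, x, theta, log_std):
    """Log-density of action a under N(theta @ x, diag(exp(log_std))**2)."""
    mu = theta @ x
    z = (a - mu) / np.exp(log_std)
    return float(np.sum(-0.5 * z**2 - log_std - 0.5 * np.log(2 * np.pi)))


def gaussian_score(a, x, theta, log_std):
    """Gradient of gaussian_log_prob w.r.t. theta (k x d) and log_std (k,)."""
    diff = a - theta @ x
    var = np.exp(2 * log_std)
    grad_theta = np.outer(diff / var, x)   # (a - mu) / sigma^2, times the features
    grad_log_std = diff**2 / var - 1.0     # z^2 - 1: zero mean under the policy
    return grad_theta, grad_log_std


# Illustrative sizes: k = 2 assets, d = 3 features
rng = np.random.default_rng(0)
theta = rng.normal(size=(2, 3))
log_std = np.log(np.array([0.1, 0.2]))
x = rng.normal(size=3)
a = theta @ x + np.exp(log_std) * rng.standard_normal(2)   # a ~ pi(.|x)
grad_theta, grad_log_std = gaussian_score(a, x, theta, log_std)
```

Working with $\log\sigma$ keeps the standard deviation positive without constraints. Because $z^2 - 1$ has zero expectation under the policy, the learning signal for $\sigma$ comes only from its correlation with the return.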


## Model Testing and Correct Model Behavior

For the control dataset, we simulated each asset with a classical geometric Brownian motion (GBM) process:

$$
dS_t=\mu S_t\,dt+\sigma S_t\,dB_t
$$

where $\mu$ is the constant drift, $\sigma$ the constant volatility and $B_t$ a standard Brownian motion. The control dataset is built to measure the performance of each model/algorithm against known solutions, given this constant drift and constant volatility.
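A minimal sketch of how such control paths can be generated, using the exact log-normal solution of the GBM equation, $S_{t+\Delta t} = S_t \exp\big((\mu - \tfrac{1}{2}\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,Z\big)$ with $\sigma$ the volatility. The function name `simulate_gbm`, the daily step $\Delta t = 1/252$ and the starting price of 1 are illustrative assumptions; the project's own simulator (invoked through `build_environment_from_simulated_assets`) may differ.

```python
import numpy as np


def simulate_gbm(mu, sigma, n_steps, dt=1 / 252, s0=1.0, seed=None):
    """Simulate one GBM price path with drift mu and volatility sigma.

    Uses the exact solution of dS = mu*S*dt + sigma*S*dB, so there is no
    discretization bias: log-returns are i.i.d. normal.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_steps)
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.concatenate([[0.0], np.cumsum(log_returns)]))


# Same drift/volatility pairs as the simulation dictionary used in the test runs
assets = {"asset_1": {"mean": 0.02, "sigma": 0.01},
          "asset_2": {"mean": 0.18, "sigma": 0.03}}
paths = {name: simulate_gbm(p["mean"], p["sigma"], n_steps=2520, seed=i)
         for i, (name, p) in enumerate(assets.items())}
```

Sampling the exact solution rather than an Euler step keeps prices strictly positive and makes the realized log-return volatility match $\sigma$ up to sampling noise.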

We evaluated each algorithm on simulated data for two assets, using two different reward functions:

1. Next-period return: at each observation, the agent receives as reward the portfolio return over the next period.
2. Negative squared return: at each observation, the agent receives as reward the negative of the squared portfolio return over the next period.

With only two assets, we expect the algorithm to converge to the asset with the higher expected return under the next-period return reward, and toward the asset with the lower volatility under the negative squared return reward.
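These expectations can be checked in closed form. The sketch below assumes independent assets, per-period (daily) moments obtained by scaling the simulation parameters by 1/252, and long-only weights $w$ on `asset_1` and $1 - w$ on `asset_2`; it grid-searches the expected value of each reward.

```python
import numpy as np

# Assumption: annual simulation parameters scaled to one trading day (1/252)
mu = np.array([0.02, 0.18]) / 252              # per-period expected returns
var = np.array([0.01, 0.03]) ** 2 / 252        # per-period variances (independent assets)

w = np.linspace(0.0, 1.0, 1001)                # weight on asset_1; asset_2 gets 1 - w
mean_p = w * mu[0] + (1 - w) * mu[1]           # E[r_p]
var_p = w**2 * var[0] + (1 - w) ** 2 * var[1]  # Var(r_p), zero correlation
exp_sq = var_p + mean_p**2                     # E[r_p^2]

w_max_return = w[np.argmax(mean_p)]            # best weight for the return reward
w_min_square = w[np.argmax(-exp_sq)]           # best weight for the -r^2 reward
print(f"return reward: w_asset_1={w_max_return:.3f}")
print(f"-r^2 reward:   w_asset_1={w_min_square:.3f}")
```

Under these assumptions the return reward is maximized by holding only the high-drift asset, while the negative squared return reward is maximized by a portfolio concentrated in, but not entirely invested in, the low-volatility asset: since $E[r^2] = \mathrm{Var}(r) + E[r]^2$, a small allocation to the second, uncorrelated asset still lowers the objective.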

## Reward Functions



$$
r_t=\text{reward at time } t
$$
$$
\Pi_{t}|a_{t-1}=\text{portfolio value at time } t\text{, given the action } a_{t-1} \text{ taken at time } t-1
$$

From now on we write $\Pi_{t}=\Pi_{t}|a_{t-1}$.


### Max Return

$$
r_t=\frac{\Pi_{t+1}}{\Pi_{t}}-1
$$

### Min Quadratic Return
$$
r_t=-(\frac{\Pi_{t+1}}{\Pi_{t}}-1)^2
$$

### Return with Quadratic Risk
$$
r_t=\left[\frac{\Pi_{t+1}}{\Pi_{t}}-1\right]-\lambda\, a_t^\top\Sigma a_t
$$

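where $\Sigma$ is the covariance matrix of asset returns, $a_t$ is the vector of portfolio weights chosen at time $t$, and $\lambda$ is the risk-aversion coefficient.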
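The three rewards translate directly into code. The sketch below assumes the portfolio values $\Pi_t$ and $\Pi_{t+1}$, the weight vector $a_t$ and the covariance $\Sigma$ are already available as floats and NumPy arrays; the function names are illustrative and not the project's reward factory.

```python
import numpy as np


def max_return_reward(pi_t, pi_next):
    """r_t = Pi_{t+1} / Pi_t - 1."""
    return pi_next / pi_t - 1.0


def min_quadratic_return_reward(pi_t, pi_next):
    """r_t = -(Pi_{t+1} / Pi_t - 1)^2."""
    return -max_return_reward(pi_t, pi_next) ** 2


def return_with_quadratic_risk_reward(pi_t, pi_next, weights, cov, risk_aversion):
    """r_t = (Pi_{t+1} / Pi_t - 1) - lambda * a_t^T Sigma a_t."""
    weights = np.asarray(weights, dtype=float)
    penalty = risk_aversion * weights @ np.asarray(cov, dtype=float) @ weights
    return max_return_reward(pi_t, pi_next) - float(penalty)


# Diagonal daily covariance built the same way as the `cov` cell in the test runs
cov = np.array([[0.01**2, 0.0], [0.0, 0.03**2]]) / 252
print(max_return_reward(100.0, 101.0))
print(min_quadratic_return_reward(100.0, 101.0))
print(return_with_quadratic_risk_reward(100.0, 101.0, [0.5, 0.5], cov, risk_aversion=1.0))
```

Setting `risk_aversion` to zero recovers the max-return reward, while larger values push the agent toward low-variance weights.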

### Test Runs

To run tests on simulated assets, the user only needs to define a dictionary with the characteristics of each asset and pass it to the class method `build_environment_from_simulated_assets` of the `DeepTradingEnvironment` class. Below we show the runs for REINFORCE (with and without baseline) and Actor-Critic.



<img src="static/img/conver_1.png">

<img src="static/img/conv2.png">


```python
import datetime

import numpy as np

from environments.e_greedy import DeepTradingEnvironment, LinearAgent

out_reward_window = datetime.timedelta(days=1)

# Parameters governing the data transformation step that runs before the algorithm
meta_parameters = {"in_bars_count": 64,
                   "out_reward_window": out_reward_window,
                   "state_type": "in_window_out_window",
                   "asset_names": ["asset_1", "asset_2"],
                   "risk_aversion": 1,
                   "include_previous_weights": False}

# Parameters related to the construction of the objective/reward function
objective_parameters = {"percent_commission": .001}

print("===Meta Parameters===")
print(meta_parameters)
print("===Objective Parameters===")
print(objective_parameters)

# Drift ("mean") and volatility ("sigma") of each simulated GBM asset
assets_simulation_details = {"asset_1": {"method": "GBM", "sigma": .01, "mean": .02},
                             "asset_2": {"method": "GBM", "sigma": .03, "mean": .18}}


def create_environment():
    """Build a fresh environment from the simulated assets."""
    return DeepTradingEnvironment.build_environment_from_simulated_assets(
        assets_simulation_details=assets_simulation_details,
        data_hash="simulation_gbm",
        meta_parameters=meta_parameters,
        objective_parameters=objective_parameters)


env = create_environment()
```



```
===Meta Parameters===
{'in_bars_count': 64, 'out_reward_window': datetime.timedelta(days=1), 'state_type': 'in_window_out_window', 'asset_names': ['asset_1', 'asset_2'], 'risk_aversion': 1, 'include_previous_weights': False}
===Objective Parameters===
{'percent_commission': 0.001}
covariance rolling estimate 128
```


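```python
# Diagonal covariance implied by the simulation parameters,
# converted to a daily horizon assuming 252 trading days per year
cov = np.array([[assets_simulation_details["asset_1"]["sigma"]**2, 0],
                [0, assets_simulation_details["asset_2"]["sigma"]**2]]) / 252
```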

```python
env = create_environment()
# env.state.reward_factory.ext_covariance = cov  # uncomment to pass the known simulation covariance
# REINFORCE without baseline on the return-with-variance-risk reward
linear_agent = LinearAgent(environment=env, out_reward_window_td=out_reward_window,
                           reward_function="return_with_variance_risk", sample_observations=32)
linear_agent.REINFORCE_fit(add_baseline=False, max_iterations=4000, plot_every=2000, verbose=True)
```


```python
env = create_environment()
# env.state.reward_factory.ext_covariance = cov  # uncomment to pass the known simulation covariance
# REINFORCE with baseline on the cumulative-return reward
linear_agent = LinearAgent(environment=env, out_reward_window_td=out_reward_window,
                           reward_function="cum_return", sample_observations=32)
linear_agent.REINFORCE_fit(add_baseline=True, max_iterations=4000, plot_every=3999, verbose=True)
```
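The two REINFORCE runs differ in the `add_baseline` flag. To illustrate why a baseline can help with the slow convergence of plain REINFORCE, the sketch below estimates the policy gradient for a toy one-step problem with a scalar Gaussian policy, with and without subtracting a baseline. The reward, the policy parameters and the choice of the exact expected reward as a constant baseline are all illustrative assumptions; the baseline used by `REINFORCE_fit(add_baseline=True)` is not shown here and may be a learned state-value estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, target, c = 0.0, 0.5, 1.0, 10.0        # illustrative values


def reward(a):
    """One-step reward with a large constant offset c."""
    return c - (a - target) ** 2


a = rng.normal(mu, sigma, size=100_000)            # actions sampled from the policy
score = (a - mu) / sigma**2                        # d/dmu log N(a; mu, sigma^2)
baseline = c - ((mu - target) ** 2 + sigma**2)    # E[reward] under the policy

g_plain = reward(a) * score                        # REINFORCE, no baseline
g_base = (reward(a) - baseline) * score            # REINFORCE with baseline
true_grad = -2 * (mu - target)                     # d E[reward] / d mu

print(f"true gradient:      {true_grad:.3f}")
print(f"mean (no baseline): {g_plain.mean():.3f}, std {g_plain.std():.2f}")
print(f"mean (baseline):    {g_base.mean():.3f}, std {g_base.std():.2f}")
```

Both estimators are unbiased because a constant baseline has zero expected contribution, but subtracting it removes the variance caused by the constant part of the reward, so fewer samples are needed for a reliable gradient step.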

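```python
env = create_environment()
# env.state.reward_factory.ext_covariance = cov  # uncomment to pass the known simulation covariance
# Actor-Critic with eligibility traces on the cumulative-return reward
linear_agent = LinearAgent(environment=env, out_reward_window_td=out_reward_window,
                           reward_function="cum_return", sample_observations=32)
linear_agent.ACTOR_CRITIC_FIT(use_traces=True, max_iterations=4000, verbose=True)
```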
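For reference, the sketch below shows one step of a textbook actor-critic with accumulating eligibility traces, using a linear critic $v(x) = w^\top x$ and a Gaussian policy with linear mean $\Theta x$ and fixed standard deviation. It uses the common discounted simplification that omits the $\gamma^t$ factor of the episodic algorithm. The function name, arguments and step sizes are assumptions, and the implementation behind `ACTOR_CRITIC_FIT(use_traces=True)` may differ.

```python
import numpy as np


def actor_critic_traces_step(x, a, r, x_next, done, w, theta, z_w, z_theta,
                             sigma=0.1, gamma=0.99, lam_w=0.9, lam_theta=0.9,
                             alpha_w=1e-2, alpha_theta=1e-3):
    """One actor-critic update with accumulating eligibility traces.

    x, x_next: feature vectors (d,); a: action (k,); w: critic weights (d,);
    theta: policy mean weights (k, d); z_w, z_theta: traces shaped like w, theta.
    """
    v = w @ x
    v_next = 0.0 if done else w @ x_next
    delta = r + gamma * v_next - v                      # TD error
    z_w = gamma * lam_w * z_w + x                       # grad of v(x) w.r.t. w is x
    score = np.outer((a - theta @ x) / sigma**2, x)     # grad log N(a; theta x, sigma^2 I)
    z_theta = gamma * lam_theta * z_theta + score
    w = w + alpha_w * delta * z_w                       # critic update
    theta = theta + alpha_theta * delta * z_theta       # actor update
    return w, theta, z_w, z_theta, delta


# Single illustrative step from zero-initialized parameters and traces
d, k = 3, 2
x, x_next = np.ones(d), np.ones(d)
a = np.array([0.05, -0.05])
w, theta = np.zeros(d), np.zeros((k, d))
z_w, z_theta = np.zeros(d), np.zeros((k, d))
w, theta, z_w, z_theta, delta = actor_critic_traces_step(
    x, a, 1.0, x_next, False, w, theta, z_w, z_theta)
```

The traces spread each TD error over recently visited features and actions, so the caller should reset `z_w` and `z_theta` to zero at the start of every episode.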


```python
import datetime

import numpy as np

from environments.e_greedy import DeepTradingEnvironment, LinearAgent, DeepAgentPytorch

out_reward_window = datetime.timedelta(days=1)

# Parameters governing the data transformation step that runs before the algorithm
meta_parameters = {"in_bars_count": 30,
                   "out_reward_window": out_reward_window,
                   "state_type": "in_window_out_window",
                   "asset_names": ["asset_1", "asset_2"],
                   "risk_aversion": .001,
                   "include_previous_weights": False}

# Parameters related to the construction of the objective/reward function
objective_parameters = {"percent_commission": .001}

print("===Meta Parameters===")
print(meta_parameters)
print("===Objective Parameters===")
print(objective_parameters)

assets_simulation_details = {"asset_1": {"method": "GBM", "sigma": .01, "mean": .02},
                             "asset_2": {"method": "GBM", "sigma": .03, "mean": .18}}


def create_environment():
    """Build a fresh environment from the simulated assets."""
    return DeepTradingEnvironment.build_environment_from_simulated_assets(
        assets_simulation_details=assets_simulation_details,
        data_hash="simulation_gbm",
        meta_parameters=meta_parameters,
        objective_parameters=objective_parameters)


env_min_vol = create_environment()
```



```python
env = create_environment()
# env.state.reward_factory.ext_covariance = cov  # uncomment to pass the known simulation covariance
# REINFORCE without baseline, low risk aversion (0.001)
linear_agent_min_vol = LinearAgent(environment=env, out_reward_window_td=out_reward_window,
                                   reward_function="return_with_variance_risk", sample_observations=32)
linear_agent_min_vol.REINFORCE_fit(add_baseline=False, max_iterations=4000, plot_every=2000, verbose=True)
```

```python
# Sample covariance of the forward returns
env.state.forward_returns.cov()
```

```python
# Exponentially weighted covariance (pandas returns one matrix per timestamp)
env.state.forward_returns.ewm(alpha=.01).cov()
```
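The two cells above compare the sample covariance of the forward returns with an exponentially weighted estimate. The sketch below reproduces that comparison on synthetic independent daily returns, which stand in for `env.state.forward_returns` (an assumption). Note that `DataFrame.ewm(...).cov()` stacks one covariance matrix per timestamp under a MultiIndex, so the most recent estimate has to be selected explicitly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=500)
# Synthetic independent daily returns with the volatilities of the simulated assets
returns = pd.DataFrame(rng.normal(0.0, [0.01, 0.03], size=(500, 2)) / np.sqrt(252),
                       index=dates, columns=["asset_1", "asset_2"])

sample_cov = returns.cov()                          # single 2x2 matrix over the whole sample
ewm_cov_all = returns.ewm(alpha=0.01).cov()         # one 2x2 matrix per date (MultiIndex rows)
ewm_cov_last = ewm_cov_all.loc[returns.index[-1]]   # most recent EWM estimate

print(sample_cov)
print(ewm_cov_last)
```

With `alpha=0.01` the effective memory is roughly 100 observations, so the EWM estimate reacts faster to regime changes than the full-sample covariance at the cost of more noise.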

```python
env = create_environment()
# env.state.reward_factory.ext_covariance = cov  # uncomment to pass the known simulation covariance
# REINFORCE with baseline on the minimum-volatility reward
linear_agent_min_vol = LinearAgent(environment=env, out_reward_window_td=out_reward_window,
                                   reward_function="min_vol", sample_observations=32)
linear_agent_min_vol.REINFORCE_fit(add_baseline=True, max_iterations=4000, plot_every=3999, verbose=True)
```

```python
env = create_environment()
# env.state.reward_factory.ext_covariance = cov  # uncomment to pass the known simulation covariance
# Actor-Critic with eligibility traces on the return-with-variance-risk reward
linear_agent_min_vol = LinearAgent(environment=env, out_reward_window_td=out_reward_window,
                                   reward_function="return_with_variance_risk", sample_observations=32)
linear_agent_min_vol.ACTOR_CRITIC_FIT(use_traces=True, max_iterations=4000, plot_every=2000, verbose=True)
```


```python
import datetime

import numpy as np

from environments.e_greedy import DeepTradingEnvironment, LinearAgent, DeepAgentPytorch

out_reward_window = datetime.timedelta(days=1)

# Parameters governing the data transformation step that runs before the algorithm
meta_parameters = {"in_bars_count": 30,
                   "out_reward_window": out_reward_window,
                   "state_type": "in_window_out_window",
                   "asset_names": ["asset_1", "asset_2"],
                   "risk_aversion": .95,
                   "include_previous_weights": False}

# Parameters related to the construction of the objective/reward function
objective_parameters = {"percent_commission": .001}

print("===Meta Parameters===")
print(meta_parameters)
print("===Objective Parameters===")
print(objective_parameters)

assets_simulation_details = {"asset_1": {"method": "GBM", "sigma": .01, "mean": .02},
                             "asset_2": {"method": "GBM", "sigma": .03, "mean": .18}}


def create_environment():
    """Build a fresh environment from the simulated assets."""
    return DeepTradingEnvironment.build_environment_from_simulated_assets(
        assets_simulation_details=assets_simulation_details,
        data_hash="simulation_gbm",
        meta_parameters=meta_parameters,
        objective_parameters=objective_parameters)


env_min_vol = create_environment()
```


```python
env = create_environment()
# env.state.reward_factory.ext_covariance = cov  # uncomment to pass the known simulation covariance
# REINFORCE without baseline, high risk aversion (0.95)
linear_agent_min_vol = LinearAgent(environment=env, out_reward_window_td=out_reward_window,
                                   reward_function="return_with_variance_risk", sample_observations=32)
linear_agent_min_vol.REINFORCE_fit(add_baseline=False, max_iterations=4000, plot_every=2000, verbose=True)
```

### Soft Actor-Critic

We run Soft Actor-Critic separately because it relies on a third-party library.
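For reference, the twin Q-functions mentioned among the problems above enter SAC through a clipped double-Q soft target, $y = r + \gamma (1 - d)\big(\min(\bar Q_1(s', \tilde a'), \bar Q_2(s', \tilde a')) - \alpha \log \pi(\tilde a' \mid s')\big)$, where $\tilde a'$ is sampled from the current policy and $\bar Q_1, \bar Q_2$ are the target networks. The NumPy sketch below computes this target for a batch; it illustrates the standard SAC target and is not the code in `algorithms.sac`. It also shows that setting `alpha` to zero, as in the first run below, removes the entropy bonus.

```python
import numpy as np


def sac_q_target(rewards, dones, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Clipped double-Q soft Bellman target used to train both SAC critics.

    q1_next, q2_next: target-network Q-values at (s', a'), with a' ~ pi(.|s');
    logp_next: log pi(a'|s'); all arrays share the batch shape.
    """
    min_q = np.minimum(q1_next, q2_next)       # pessimistic Q estimate
    soft_value = min_q - alpha * logp_next     # entropy-regularized value
    return rewards + gamma * (1.0 - dones) * soft_value


# Illustrative batch of two transitions; the second one ends the episode
rewards = np.array([0.01, -0.02])
dones = np.array([0.0, 1.0])
q1, q2 = np.array([0.50, 0.30]), np.array([0.40, 0.60])
logp = np.array([-1.0, -2.0])
print(sac_q_target(rewards, dones, q1, q2, logp, alpha=0.001))
print(sac_q_target(rewards, dones, q1, q2, logp, alpha=0.0))
```

Taking the minimum of the two critics counteracts the overestimation bias of a single Q-function, which is why SAC trains two of them alongside the policy.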

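```python
import datetime

import pandas as pd

# Environment from the open_ai module (shadows the e_greedy DeepTradingEnvironment imported earlier)
from environments.open_ai import DeepTradingEnvironment
from algorithms.sac.core import MLPActorCritic as MLPActorCriticCapstone
from algorithms.sac.sac import sac as sac_capstone
```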

```python
out_reward_window = datetime.timedelta(days=1)
meta_parameters = {"in_bars_count": 30,
                   "out_reward_window": out_reward_window,
                   "state_type": "in_window_out_window",
                   "asset_names": ["asset_1", "asset_2"],
                   "include_previous_weights": False}

objective_parameters = {"percent_commission": .001,
                        "reward_function": "min_realized_variance"}

# Persisted simulated GBM data: features, forward returns and their dates
features = pd.read_parquet("/home/jose/code/capstone/temp_persisted_data/only_features_simulation_gbm")
forward_returns_dates = pd.read_parquet("/home/jose/code/capstone/temp_persisted_data/forward_return_dates_simulation_gbm")
forward_returns = pd.read_parquet("/home/jose/code/capstone/temp_persisted_data/only_forward_returns_simulation_gbm")


def env_fun():
    """Environment factory passed to SAC."""
    return DeepTradingEnvironment(objective_parameters=objective_parameters,
                                  meta_parameters=meta_parameters,
                                  features=features,
                                  forward_returns=forward_returns,
                                  forward_returns_dates=forward_returns_dates)


# Standalone instance (not used by sac_capstone, which calls env_fun itself)
new_environment = env_fun()

# Minimum realized variance reward; entropy coefficient alpha set to zero
sac_capstone(env_fn=env_fun, actor_critic=MLPActorCriticCapstone, ac_kwargs={"hidden_sizes": (1,)},
             update_every=32, steps_per_epoch=64, epochs=400,
             start_steps=32, update_after=32 * 5, alpha=0.0, lr=1e-3,
             save_freq=10000, num_test_episodes=1)
```

```python
out_reward_window = datetime.timedelta(days=1)
meta_parameters = {"in_bars_count": 30,
                   "out_reward_window": out_reward_window,
                   "state_type": "in_window_out_window",
                   "asset_names": ["asset_1", "asset_2"],
                   "include_previous_weights": False}

objective_parameters = {"percent_commission": .001,
                        "reward_function": "cum_return"}

# Persisted simulated GBM data: features, forward returns and their dates
features = pd.read_parquet("/home/jose/code/capstone/temp_persisted_data/only_features_simulation_gbm")
forward_returns_dates = pd.read_parquet("/home/jose/code/capstone/temp_persisted_data/forward_return_dates_simulation_gbm")
forward_returns = pd.read_parquet("/home/jose/code/capstone/temp_persisted_data/only_forward_returns_simulation_gbm")


def env_fun():
    """Environment factory passed to SAC."""
    return DeepTradingEnvironment(objective_parameters=objective_parameters,
                                  meta_parameters=meta_parameters,
                                  features=features,
                                  forward_returns=forward_returns,
                                  forward_returns_dates=forward_returns_dates)


# Standalone instance (not used by sac_capstone, which calls env_fun itself)
new_environment = env_fun()

# Cumulative return reward with a small entropy coefficient
sac_capstone(env_fn=env_fun, actor_critic=MLPActorCriticCapstone, ac_kwargs={"hidden_sizes": (1,)},
             update_every=32, steps_per_epoch=64, epochs=400,
             start_steps=32, update_after=32 * 5, alpha=.001, lr=1e-3,
             save_freq=10000, num_test_episodes=1)
```


