In [1]:
import utils

# Get the default financial and AC Model parameters
financial_params, ac_params = utils.get_env_param()

In [2]:
financial_params

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Annual Volatility | 12% | Bid-Ask Spread | 0.125 |
| Daily Volatility | 0.8% | Daily Trading Volume | 5000000.0 |


In [3]:
ac_params

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Total Number of Shares to Sell | 1000000 | Fixed Cost of Selling per Share | $0.062 |
| Starting Price per Share | $50.00 | Trader's Risk Aversion | 1e-06 |
| Price Impact for Each 1% of Daily Volume Traded | $2.5e-06 | Permanent Impact Constant | 2.5e-07 |
| Number of Days to Sell All the Shares | 60 | Single Step Variance | 0.144 |
| Number of Trades | 60 | Time Interval between trades | 1.0 |
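
The derived values in the two tables above appear to follow from the primary inputs by simple arithmetic. The sketch below is an assumption, not the code in `utils`: it uses 250 trading days per year (a value not shown in the tables) and the usual Almgren-Chriss conventions for the impact constants. The constant and function names are hypothetical. Under these assumptions it reproduces the tabulated numbers.

```python
import math

# Inputs copied from the tables above.
ANNUAL_VOLATILITY = 0.12
BID_ASK_SPREAD = 0.125
DAILY_VOLUME = 5e6
STARTING_PRICE = 50.0

# Assumption: 250 trading days per year (not shown in the tables).
TRADING_DAYS = 250


def derive_ac_params():
    """Recompute the derived table entries from the primary inputs."""
    daily_vol = ANNUAL_VOLATILITY / math.sqrt(TRADING_DAYS)
    return {
        # Shown as 0.8% in the table after rounding.
        "daily_volatility": daily_vol,
        # Variance of the one-day price change in dollars squared.
        "single_step_variance": (daily_vol * STARTING_PRICE) ** 2,
        # Half the bid-ask spread, shown as $0.062 in the table.
        "fixed_cost_per_share": BID_ASK_SPREAD / 2,
        # Spread paid for every 1% of daily volume traded.
        "temporary_impact": BID_ASK_SPREAD / (0.01 * DAILY_VOLUME),
        # Spread paid for every 10% of daily volume traded.
        "permanent_impact": BID_ASK_SPREAD / (0.1 * DAILY_VOLUME),
    }


params = derive_ac_params()
for name, value in params.items():
    print(f"{name}: {value:.6g}")
```

If `utils.get_env_param()` uses different conventions, the formulas here would need to change accordingly; the match with the table is suggestive rather than conclusive.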


In [5]:
import numpy as np

import importlib
import syntheticChrissAlmgren as sca
importlib.reload(sca)
import ddpg_agent
from ddpg_agent import Agent

from collections import deque

def test_agent(use_custom_reward, sparse_reward):
    # Create simulation environment
    env = sca.MarketEnvironment()

    # Initialize Feed-forward DNNs for Actor and Critic models.
    agent = Agent(state_size=env.observation_space_dimension(), action_size=env.action_space_dimension(), random_seed=0)

    # Set the liquidation time
    lqt = 60

    # Set the number of trades
    n_trades = 60

    # Set trader's risk aversion
    tr = 1e-6

    # Set the number of episodes to run the simulation
    episodes = 1000

    shortfall_hist = np.array([])
    shortfall_deque = deque(maxlen=100)

    for episode in range(episodes):
        # Reset the environment
        cur_state = env.reset(seed=episode, liquid_time=lqt, num_trades=n_trades, lamb=tr, use_custom_reward=use_custom_reward, sparse_reward=sparse_reward)


        # Set the environment to make transactions
        env.start_transactions()

        for i in range(n_trades + 1):

            # Predict the best action for the current state.
            action = agent.act(cur_state, add_noise=True)

            # Action is performed and new state, reward, info are received.
            new_state, reward, done, info = env.step(action)

            # current state, action, reward, new state are stored in the experience replay
            agent.step(cur_state, action, reward, new_state, done)

            # roll over new state
            cur_state = new_state

            if info.done:
                shortfall_hist = np.append(shortfall_hist, info.implementation_shortfall)
                shortfall_deque.append(info.implementation_shortfall)
                break


    print(f'\nAverage Implementation Shortfall for sparse_reward={sparse_reward}: ${np.mean(shortfall_hist):,.2f} \n')


# Dense reward (normal case)
test_agent(use_custom_reward=True, sparse_reward=False)

# Sparse reward (only at end)
test_agent(use_custom_reward=True, sparse_reward=True)


Average Implementation Shortfall for sparse_reward=False: $2,483,291.89 


Average Implementation Shortfall for sparse_reward=True: $2,534,694.13 
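
The two averages above pool all 1,000 episodes, including the early ones in which the agent has had little training. Note also that `shortfall_deque` is filled in `test_agent` but never read. A minimal sketch of how a rolling window could be reported alongside the overall mean is given below; `summarize_shortfalls` is a hypothetical helper, and the sample data is made up purely for illustration.

```python
from collections import deque

import numpy as np


def summarize_shortfalls(shortfalls, window=100):
    """Return the overall mean plus statistics over the last `window` episodes."""
    recent = deque(maxlen=window)  # keeps only the most recent values
    for value in shortfalls:
        recent.append(value)
    recent_values = list(recent)
    return {
        "mean_all": float(np.mean(shortfalls)),
        "mean_recent": float(np.mean(recent_values)),
        "std_recent": float(np.std(recent_values)),
    }


# Made-up example: shortfall drops from 10.0 to 4.0 halfway through training.
example = [10.0] * 100 + [4.0] * 100
summary = summarize_shortfalls(example, window=100)
print(summary)
```

Comparing `mean_recent` between the dense and sparse runs may give a fairer picture of the trained policies than the pooled average, although a single seed per configuration still leaves the comparison noisy.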



# Todo

The above code should provide you with a starting framework for incorporating more complex dynamics into our model. Here are a few things you can try out:

- Explain why the log-returns over a window of 6 periods, together with $m_k$ and $i_k$, are a good choice for the state. Could you expand or shrink $D$, the number of past log-returns (currently $D=5$), to get better results?

- Incorporate your own reward function in the simulation environment to see if you can achieve an expected shortfall that is better (lower) than that produced by the Almgren and Chriss model.


- Experiment with rewarding the agent at every step versus giving a reward only at the end. In other words, what happens if the reward function is sparse?


- Use more realistic price dynamics, such as geometric Brownian motion (GBM). The equations used to model GBM can be found in section 3b of the paper: GBM


- Try different functions for the action. You can transform the values the agent produces with different functions, chosen according to the interpretation you give to the action. For example, you could set the action to be a **function of the trading rate**.


- Add more complex dynamics to the environment. Try incorporating trading fees, for example. This can be done by adding an extra term to the fixed cost of selling, $\epsilon$.

- Use SAC (Soft Actor-Critic) and TD3 (Twin Delayed Deep Deterministic Policy Gradient) with different hyperparameters and network structures, and compare your results to the DDPG results. Explain any differences you observe.
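
As a starting point for the GBM item in the list above, here is a minimal sketch of a price path generated with the standard exact discretization $S_{k+1} = S_k \exp\left((\mu - \tfrac{1}{2}\sigma^2)\,\Delta t + \sigma\sqrt{\Delta t}\,Z_k\right)$, where $Z_k$ is a standard normal draw. The function `simulate_gbm_path`, the zero drift, and the 250-trading-day conversion of the annual volatility are all assumptions made for illustration; the paper's section 3b may use a different parameterization, and wiring the path into `MarketEnvironment` is left open.

```python
import numpy as np


def simulate_gbm_path(s0, mu, sigma, n_steps, dt=1.0, seed=None):
    """Simulate one GBM price path with n_steps + 1 points, starting at s0."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n_steps)
    # Log-price increments are normal with this mean and standard deviation.
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * shocks
    log_path = np.log(s0) + np.concatenate(([0.0], np.cumsum(log_increments)))
    return np.exp(log_path)


# Assumed parameters: 12% annual volatility scaled to daily, zero drift,
# $50 starting price, 60 daily steps (matching the tables earlier on).
daily_sigma = 0.12 / np.sqrt(250)
prices = simulate_gbm_path(s0=50.0, mu=0.0, sigma=daily_sigma, n_steps=60, seed=0)

# Log-returns of the kind that could feed the agent's state vector.
log_returns = np.diff(np.log(prices))
print(prices[:3], log_returns[:3])
```

Unlike an arithmetic random walk, prices generated this way stay strictly positive, and the volatility scales with the price level, which is the main reason GBM is usually considered the more realistic choice.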