

Q-learning applied to trading

Here we present an approach for training an agent to behave optimally in a trading environment. We model such an environment with some simplifying assumptions:

  • There is only one opportunity to trade per day, and every trade executes at the close price provided in historical market data.
  • An agent holds either a long or a short position in one security. The exact number of shares is made irrelevant by using cumulative returns.
  • No transaction costs are taken into account.
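Under these assumptions, the return of a sequence of daily positions can be sketched as below. This is an illustrative Python sketch, not the code in trade.q: the function name `total_return`, the +1/-1 position encoding, and the treatment of a short position as the negated daily return are all assumptions made for the example.

```python
def total_return(closes, positions):
    """Cumulative return of holding one position per day.

    closes:    daily close prices.
    positions: +1 (long) or -1 (short); positions[i] is held from
               close i to close i + 1 (hypothetical encoding).
    A result of 1.0 means the agent ends with the money it started with.
    """
    total = 1.0
    for i, position in enumerate(positions):
        daily = closes[i + 1] / closes[i] - 1.0
        # Assumption: a short position earns the negated daily return.
        # No transaction costs are applied.
        total *= 1.0 + position * daily
    return total
```

For example, `total_return([100.0, 110.0, 99.0], [1, -1])` goes long over a 10% rise and short over a 10% fall, so both days contribute a factor of 1.1.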


To train an agent using the Q-learning method, we first choose a set of signals that indicate how profitable upcoming price changes are likely to be, e.g. price momentum, volatility and moving average crossings. These values are encoded into a bitfield state value.
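One way to picture the bitfield encoding is the Python sketch below. The specific signals, window lengths, and volatility threshold are hypothetical choices for illustration and are not taken from trade.q; only the idea of packing boolean signals into one integer comes from the text above.

```python
def signals(closes, short=3, long=5, vol_threshold=0.02):
    """Derive three boolean signals from the most recent close prices.

    The window lengths and threshold are illustrative assumptions.
    """
    window = closes[-long:]
    momentum_up = closes[-1] > window[0]
    ma_cross_up = sum(closes[-short:]) / short > sum(window) / long
    returns = [b / a - 1.0 for a, b in zip(window, window[1:])]
    mean = sum(returns) / len(returns)
    volatility = (sum((r - mean) ** 2 for r in returns) / len(returns)) ** 0.5
    return [momentum_up, ma_cross_up, volatility > vol_threshold]


def encode_state(flags):
    """Pack an ordered list of booleans into a single integer bitfield."""
    state = 0
    for bit, flag in enumerate(flags):
        if flag:
            state |= 1 << bit
    return state
```

With three signals the agent sees at most 2^3 = 8 distinct states, which keeps the table of learned values small.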

The Q-learning algorithm is practically agnostic to the trading environment. The abstract state values are simply numbers (encoded bitfields) which the algorithm uses to track the perceived values of buying or selling while in any one state.
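The sketch below shows a standard tabular Q-learning agent with epsilon-greedy action selection, written in Python for illustration. It is not the implementation in trade.q: the class name `QAgent`, the action labels, and the zero initial values are assumptions. Note that the agent only ever sees opaque state numbers and rewards.

```python
import random
from collections import defaultdict

ACTIONS = ("buy", "sell")  # hypothetical action labels


class QAgent:
    def __init__(self, alpha=0.5, gamma=0.5, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # q[state][action] is the perceived value of an action in a state.
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def act(self, state):
        """Explore with probability epsilon, otherwise act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return max(ACTIONS, key=values.get)

    def update(self, state, action, reward, next_state):
        """Move q[state][action] toward reward + gamma * max q[next_state]."""
        best_next = max(self.q[next_state].values())
        target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

A training loop would call `act` on the current state, observe the resulting return as the reward, and then call `update` with the next day's state.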

An agent is tested using a cross-validated approach, whereby the market data is sliced into a number of equally sized portions, e.g. 5. The agent is trained on one portion and its performance is then tested against the remaining 4. This process is repeated until each slice has acted as the training set. For each slice, training occurs over multiple "episodes". A single episode involves iterating through all records in a training set.
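The slicing scheme described above can be sketched in Python as follows. The function name `cross_validation_splits` is hypothetical, and dropping any trailing records that do not fill a whole slice is an assumption made to keep the slices equally sized.

```python
def cross_validation_splits(records, k=5):
    """Yield (train, test) pairs: train on one slice, test on the other k - 1.

    Trailing records that do not fill a whole slice are dropped
    (an assumption for this sketch).
    """
    size = len(records) // k
    slices = [records[i * size:(i + 1) * size] for i in range(k)]
    for i, train in enumerate(slices):
        test = [row for j, part in enumerate(slices) if j != i for row in part]
        yield train, test
```

Each training slice would then be replayed for a fixed number of episodes before the agent is scored on the matching test records.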


The performance is measured as the total return from applying the train and test procedure mentioned above. A total return of 1 means the agent has as much money at the end of trading as it did at the start.

Results are presented here. With Q-learning, the average return and the standard deviation were both just about 1. Compared to a randomized trading strategy, the standard deviation is significantly lower.

Q-learned returns histogram

  • alpha and gamma parameters varied in range [0.1, 0.9]
  • epsilon set to 0.1
  • 500 unique stocks used, market data spanning 2000-2012
  • Each stock trained and tested in 5-way cross-validated manner
  • Agent trained for 100 episodes for each training set
  • ~800k data points


  • Data generated by running:
    q trade.q

Random trading returns histogram

  • Long and short state chosen randomly
  • ~160k data points


  • Data generated by running:
    q trade.q

alpha & gamma

The data suggests higher returns are produced with higher values of alpha and gamma. Experiments were conducted with the cross product of alpha and gamma values in the range [0.1, 0.9]. Average returns are plotted below with the log of standard deviation added as a band.
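The parameter sweep can be pictured as a grid over both parameters, as in the Python sketch below. The step size of 0.1 is an assumption; the text only gives the range [0.1, 0.9].

```python
from itertools import product

# Assumed step of 0.1 between 0.1 and 0.9 inclusive.
values = [round(0.1 * i, 1) for i in range(1, 10)]

# Every (alpha, gamma) combination: the cross product of the two ranges.
grid = list(product(values, values))
```

With this step the grid holds 81 (alpha, gamma) pairs, each of which would get its own full train and test run.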

alpha gamma

Top 3 Runs


  • Data generated by running:
    q trade.q