# AutoTrader (Q-Learning Trader)

This project aims to build a Q-learning agent that learns to trade cryptocurrencies by analysing historical data and finding the patterns that indicate whether it should `BUY`, `SELL` or do nothing (`NADA`) at each step.
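As a rough illustration of the learning rule behind such an agent, here is a minimal sketch of tabular Q-learning over the three actions. The names `make_q_table`, `q_update` and `choose_action`, the learning rate `alpha`, the discount `gamma` and the exploration rate `epsilon` are all assumptions made for this example; the notebook does not define them.

```python
import random
from collections import defaultdict

ACTIONS = ["BUY", "SELL", "NADA"]


def make_q_table():
    # Every unseen (state, action) pair starts with a Q-value of 0.0.
    return defaultdict(float)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

States here are any hashable values (for example, a tuple of discretised percentiles); how the market is mapped to states is left open.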

### Data Source

The data for this experiment comes from the US-based cryptocurrency exchange [Poloniex](http://poloniex.com/), where dozens of digital assets are readily available for trading.

In [32]:
# Load poloniex API Wrapper
from poloniex import Poloniex
from datetime import datetime, date
import time
import pandas as pd
from market_env import DataSource

from IPython.display import display # Allows the use of display() for DataFrames

# Fix the random seed so that results are reproducible
random_state = 42

# Render matplotlib plots inline in the notebook
%matplotlib inline

# Build data into data frame
def build_dataframe(data):
  df = pd.DataFrame.from_dict(data)
  df['date'] = pd.to_datetime(df['date'], unit='s')
  df.set_index('date', inplace=True)
  # convert_objects() was removed from pandas; coerce each column to numeric instead
  df = df.apply(pd.to_numeric, errors='coerce')
  return df

def to_timestamp(dt):
  return (dt - datetime(1970, 1, 1)).total_seconds()


pol = Poloniex()
pair = 'USDT_BTC'

start_date = to_timestamp(datetime(2016, 1, 1))
#end_date = to_timestamp(datetime(2016, 12, 31))
end_date = time.time()

# Timestamp periods
SECOND = 1
MINUTE = 60 * SECOND
HOUR = MINUTE * 60
DAY = 24 * HOUR
WEEK = 7 * DAY

period = DAY # daily candles
data = pol.returnChartData(pair, period, start_date, end_date)
data = build_dataframe(data)
data['Return'] = data['close'].pct_change()

pctrank = lambda x: pd.Series(x).rank(pct=True).iloc[-1]
data['ClosePctl'] = data['close'].expanding(100).apply(pctrank)
data



| date | close | high | low | open | quoteVolume | volume | weightedAverage | Return | OtherReturn | PctTest | ClosePctl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-01-01 | 434.990000 | 435.000000 | 428.260002 | 428.260002 | 6.887257 | 2.975027e+03 | 431.961057 | NaN | NaN | NaN | NaN |
| 2016-01-02 | 436.949900 | 438.140000 | 430.500000 | 434.990000 | 2.179981 | 9.448533e+02 | 433.422678 | 0.004506 | 0.004506 | -0.995494 | NaN |
| 2016-01-03 | 428.140000 | 435.614998 | 426.453089 | 432.310000 | 1.596349 | 6.839185e+02 | 428.426673 | -0.020162 | -0.020162 | -1.020162 | NaN |
| 2016-01-04 | 432.000011 | 435.999999 | 427.291140 | 427.291141 | 1.673428 | 7.249760e+02 | 433.228179 | 0.009016 | 0.009016 | -0.990984 | NaN |
| 2016-01-05 | 430.376774 | 435.999999 | 429.569500 | 430.140211 | 0.992530 | 4.276473e+02 | 430.865842 | -0.003757 | -0.003757 | -1.003757 | NaN |
| 2016-01-06 | 427.500020 | 435.000000 | 427.290001 | 430.170473 | 1.903358 | 8.193448e+02 | 430.473321 | -0.006684 | -0.006684 | -1.006684 | NaN |
| 2016-01-07 | 458.897002 | 458.897002 | 427.500573 | 427.500573 | 24.013560 | 1.070624e+04 | 445.841533 | 0.073443 | 0.073443 | -0.926557 | NaN |
| 2016-01-08 | 453.050000 | 468.000000 | 447.000000 | 451.100011 | 12.930452 | 5.912948e+03 | 457.288610 | -0.012741 | -0.012741 | -1.012741 | NaN |
| 2016-01-09 | 447.420001 | 457.398054 | 447.000379 | 453.050000 | 4.181581 | 1.879485e+03 | 449.467648 | -0.012427 | -0.012427 | -1.012427 | NaN |
| 2016-01-10 | 450.728520 | 453.030000 | 440.020007 | 447.420001 | 20.461995 | 9.116861e+03 | 445.550928 | 0.007395 | 0.007395 | -0.992605 | NaN |


In [39]:
import gym
import pandas as pd
import numpy as np
import math
from poloniex import Poloniex

def sharpe_ratio(returns, freq=365) :
  """Given a set of returns, calculates naive (rfr=0) sharpe """
  return (np.sqrt(freq) * np.mean(returns))/np.std(returns)

def prices_to_returns(prices):
  px = pd.DataFrame(prices)
  nl = px.shift().fillna(0)
  R = ((px - nl)/nl).fillna(0).replace([np.inf, -np.inf], np.nan).dropna()
  R = np.append( R[0].values, 0)
  return R


class DataSource(object):
    '''
    Queries data from the Poloniex API, prepares it for the trading environment,
    and then acts as the data source for each step of the environment.
    '''
    MinPercentileDays = 100 
    def __init__(self, pair, period, start_date, end_date, days = 365, client = Poloniex()):
        self.pair = pair
        self.days = days
        data = client.returnChartData(pair, period, start_date, end_date)
        df = pd.DataFrame.from_dict(data)
        df['date'] = pd.to_datetime(df['date'], unit='s')
        df.set_index('date', inplace=True)
        # convert_objects() was removed from pandas; coerce each column to numeric instead
        df = df.apply(pd.to_numeric, errors='coerce')
        df['return'] = df['close'].pct_change().dropna()
        pctrank = lambda x: pd.Series(x).rank(pct=True).iloc[-1]
        df['close_pctl'] = df['close'].expanding(self.MinPercentileDays).apply(pctrank)
        df['volume_pctl'] = df['volume'].expanding(self.MinPercentileDays).apply(pctrank)
        df.dropna(axis=0,inplace=True)
        self.data = df[['close', 'volume','return', 'close_pctl', 'volume_pctl']]

    def reset(self):
        self.idx = np.random.randint( low = 0, high=len(self.data.index)-self.days )
        self.step = 0

    def _step(self):
        '''
        Return the observation for the current step, together with a flag
        indicating whether the episode has run out of data.
        '''
        # as_matrix() was removed from pandas; .values returns the same NumPy array
        obs = self.data.iloc[self.idx].values
        self.idx += 1
        self.step += 1
        done = self.step >= self.days
        return obs, done

    def get_data(self):
        return self.data

class MarketEnv(gym.Env):
    '''
    Gym environment in which an agent can simulate cryptocurrency trading.
    '''
    def __init__(self):
        pass

    def reset(self):
        pass

    def _step(self):
        pass

# get the data from `returnChartData`
src = DataSource(pair, DAY, start_date, end_date)
display(src.get_data())


src.get_data()['close'].expanding(100)



| date | close | volume | return | close_pctl | volume_pctl |
|---|---|---|---|---|---|
| 2016-04-09 | 420.500000 | 5.584545e+04 | 0.003580 | 0.660000 | 0.770000 |
| 2016-04-10 | 419.719304 | 9.035382e+04 | -0.001857 | 0.653465 | 0.940594 |
| 2016-04-11 | 422.752628 | 1.632047e+04 | 0.007227 | 0.705882 | 0.372549 |
| 2016-04-12 | 424.890000 | 6.984433e+04 | 0.005056 | 0.766990 | 0.864078 |
| 2016-04-13 | 425.000000 | 2.920716e+04 | 0.000259 | 0.769231 | 0.509615 |
| 2016-04-14 | 427.179969 | 3.522339e+04 | 0.005129 | 0.800000 | 0.571429 |
| 2016-04-15 | 430.617096 | 2.898400e+04 | 0.008046 | 0.839623 | 0.500000 |
| 2016-04-16 | 430.850000 | 2.008052e+04 | 0.000541 | 0.841121 | 0.411215 |
| 2016-04-17 | 428.000000 | 7.185477e+04 | -0.006615 | 0.796296 | 0.870370 |
| 2016-04-18 | 430.310530 | 4.677067e+04 | 0.005398 | 0.816514 | 0.688073 |


Expanding [min_periods=100,center=False,axis=0]

### Visualize data

A few public data points matter in this context:

- The current price of a given pair. This can be obtained by calling the `returnTicker` function with the trading pair (`BTC_ETH`, for example).
- The state of open orders in the market: what does the order book for this pair look like, and what volume is being traded?
- The price history of the pair and how it has evolved: what is the daily return, the Sharpe ratio, the simple moving average?
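The price-history questions in the last bullet can be sketched in a few lines of plain Python. The helper names `daily_returns`, `simple_moving_average` and `naive_sharpe` are hypothetical and introduced only for this illustration. The Sharpe calculation assumes a risk-free rate of zero and a population standard deviation, mirroring the notebook's `sharpe_ratio`. The sample closes are the first five rows of the table above.

```python
import math
import statistics


def daily_returns(prices):
    """Simple percentage change between consecutive closes."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def simple_moving_average(prices, window):
    """Mean of each run of `window` consecutive prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(prices[i - window:i]) / window
        for i in range(window, len(prices) + 1)
    ]


def naive_sharpe(returns, freq=365):
    """Annualised Sharpe ratio with a risk-free rate of 0 (assumption)."""
    spread = statistics.pstdev(returns)  # population std, like np.std
    if spread == 0:
        return float("nan")
    return math.sqrt(freq) * statistics.mean(returns) / spread


# First five daily closes from the USDT_BTC table above.
closes = [434.99, 436.9499, 428.14, 432.000011, 430.376774]
returns = daily_returns(closes)
sma3 = simple_moving_average(closes, window=3)
sharpe = naive_sharpe(returns)
```

With only a handful of observations the Sharpe figure is not meaningful; the point is the shape of the calculation, not the number.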

### Environment

To build a trading bot, we need to simulate a trading environment in which the bot can observe the current state of the market and act on it. This requires:

- A market environment in which the reinforcement learning agent can act and measure its performance.
- A reward scheme that rewards the agent according to its actions.
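To make these two requirements concrete, here is a minimal sketch of an environment with a reward scheme. It is not the project's `MarketEnv`, which is still a stub. `ToyMarketEnv` is a hypothetical stand-in that follows the usual reset/step convention without depending on `gym`. The reward rule is an assumption chosen for simplicity: `BUY` puts the agent in the market, `SELL` takes it out, `NADA` keeps the current position, and the reward is the period's return while the agent is in the market.

```python
class ToyMarketEnv:
    """Hypothetical toy environment; not the notebook's MarketEnv."""

    ACTIONS = ("BUY", "SELL", "NADA")

    def __init__(self, returns):
        # One return per period, e.g. the 'return' column of the data source.
        self.returns = list(returns)
        self.reset()

    def reset(self):
        self.t = 0          # index of the current period
        self.position = 0   # 0 = out of the market, 1 = holding the asset
        return (self.t, self.position)

    def step(self, action):
        if action not in self.ACTIONS:
            raise ValueError("unknown action: %r" % (action,))
        if action == "BUY":
            self.position = 1
        elif action == "SELL":
            self.position = 0
        # NADA leaves the position unchanged.

        # Assumed reward scheme: earn the period's return only while holding.
        reward = self.position * self.returns[self.t]
        self.t += 1
        done = self.t >= len(self.returns)
        return (self.t, self.position), reward, done
```

A real environment would also need to account for fees, position sizing and a richer observation (such as the price and volume percentiles computed earlier).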


#### Agent

To simplify the task at hand, we start with an agent that picks one of the available actions at random.
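Such a baseline could look roughly like the sketch below. `RandomAgent` and its `act` method are hypothetical names for this example, and the optional `seed` is an assumption added so that runs can be reproduced.

```python
import random

ACTIONS = ("BUY", "SELL", "NADA")


class RandomAgent:
    """Baseline agent that ignores the observation and acts uniformly at random."""

    def __init__(self, actions=ACTIONS, seed=None):
        self.actions = tuple(actions)
        # A private generator keeps the agent reproducible when a seed is given.
        self.rng = random.Random(seed)

    def act(self, observation=None):
        return self.rng.choice(self.actions)


agent = RandomAgent(seed=42)
picks = [agent.act() for _ in range(5)]
```

Any learning agent should at least beat this baseline on the same data before its results are taken seriously.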

