Updated README.md and added suitable errors against user actions

samre12 · Apr 20, 2018 · b9af98d
1 parent 5c25e52
commit b9af98d
Showing 6 changed files with 216 additions and 30 deletions.
diff --git a/README.md b/README.md
@@ -1 +1,160 @@
-# gym-cryptotrading
+# Gym CryptoTrading Environment
+
+[![license](https://img.shields.io/packagist/l/doctrine/orm.svg)](https://github.com/samre12/deep-trading-agent/blob/master/LICENSE)
+[![dep2](https://img.shields.io/badge/python-2.7-red.svg)](https://www.python.org/download/releases/2.7/)
+[![dep3](https://img.shields.io/badge/status-in%20progress-green.svg)](https://github.com/samre12/gym-cryptotrading/)
+[![dep4](https://img.shields.io/circleci/project/github/RedSparr0w/node-csgo-parser.svg)](https://github.com/samre12/gym-cryptotrading/)
+
+A Bitcoin trading simulator based on the Gym environment API, with a continuous observation space and a discrete action space. It uses real-world transactions from the **CoinBaseUSD** exchange to sample the *per-minute closing, lowest, and highest prices, along with the volume of the currency traded* in each minute interval.
+
+**Contents of this document**
+
+- [Installation](#installation)
+- [Usage](#usage)
+- [Environment](#env)
+    - [Observation Space](#obs)
+    - [Action Space](#action)
+    - [Parameters](#params)
+    - [Simulator](#simulator)
+- [Important Information](#inf)
+- [Examples](#exp)
+
+<a name="installation"></a>
+
+## Installation
+
+```bash
+git clone https://github.com/samre12/gym-cryptotrading.git
+cd gym-cryptotrading
+pip install -e .
+```
+
+<a name="usage"></a>
+
+## Usage
+
+Importing the module with `import gym_cryptotrading` registers the environment with `gym`, after which it can be used like any other gym environment.
+
+```python
+import gym
+import gym_cryptotrading
+env = gym.make('CryptoTrading-v0')
+```
+
+- Use `env.reset()` to start a new random episode.
+
+    - returns the history of observations prior to the starting point of the episode. See [Parameters](#params) for more information.
+
+    ```python
+    state = env.reset() # use state to make initial prediction
+    ```
+
+    **Note:** Reset the environment before first use; otherwise `gym.error.ResetNeeded()` will be raised.
+
+- Use `env.step(action)` to take one step in the environment.
+
+    - returns `(observation, reward, is_terminal, info)` in that order; `info` is currently always `None`
+
+    ```python
+    observation, reward, is_terminal, _ = env.step(action)
+    ```
+
+    **Note:** Calling `env.step(action)` after the terminal state is reached will raise `gym.error.ResetNeeded()`.
+
+- With the current implementation, the environment does not support `env.render()`.
+
+Setting the logging level of `gym` with `gym.logger.set_level(level)` to a value less than or equal to 10 exposes all the logs (`debug` and `info` levels) generated by the environment.<br>
+These include human-readable timestamps of the Bitcoin prices used to simulate an episode.
+
+<a name="env"></a>
+
+## Environment
+
+<a name="obs"></a>
+
+### Observation Space
+
+- Observation at a time step is `(closing, lowest, highest, volume)` of Bitcoin in the corresponding minute interval.
+
+- Since the price of Bitcoin varies from a few dollars to 15K dollars, the observation for time step i + 1 is normalized by the prices at time instant i.
+
+Each entry in the observation is the ratio of *increase (value greater than 1.0)* or *decrease (value less than 1.0)* relative to the price at the previous time instant.
+
+<a name="action"></a>
+
+### Action Space
+
+At each time step, the agent can either go **LONG** or **SHORT** in a `unit` of Bitcoin (for more information, refer to [Parameters](#params)) or stay **NEUTRAL**.<br>
+The action space is thus *discrete*, with three possible actions:
+
+- `NEUTRAL` corresponds to `0`
+
+- `LONG` corresponds to `1`
+
+- `SHORT` corresponds to `2`
+
+**Note:** Use `env.action_space.get_action(action)` to look up the action name corresponding to a given value.
+
+<a name="params"></a>
+
+### Parameters
+
+The environment is characterized by these parameters:
+
+- `history_length` is the lag in the observations that is used for the state representation of the trading agent.<br>
+
+    - every call to `env.reset()` returns a numpy array of shape `(history_length,) + shape(observation)` that corresponds to observations of length `history_length` prior to the starting point of the episode.
+
+    - trading agent can use the returned array to predict the first action
+
+    - defaults to `100`.
+
+    - supplied value must be greater than or equal to `0`
+
+- `horizon`, alternatively the **episode length**, is the number of trades that the agent makes in a single episode
+
+    - defaults to `5`.
+
+    - supplied value must be greater than `0`
+
+- `unit` is the fraction of Bitcoin that can be traded in each time step
+
+    - defaults to `5e-4`.
+
+    - supplied value must be greater than `0`
+
+**Usage**
+
+```python
+env = gym.make('CryptoTrading-v0')
+env.env.set_params(history_length, horizon, unit)
+```
+
+**Note:** Parameters can only be set before the first reset of the environment, that is, before the first call to `env.reset()`; otherwise `gym_cryptotrading.errors.EnvironmentAlreadyLoaded` will be raised.
+
+<a name="simulator"></a>
+
+### Simulator
+
+- The dataset of per-minute Bitcoin prices is neither continuous nor complete because of exchange downtime.
+
+- The current implementation does not make any assumptions about the missing values.
+
+- Instead, it finds continuous blocks with lengths greater than `history_length + horizon + 1` and uses them to simulate episodes. This avoids any discrepancies in results due to random substitution of missing values.
+
+<a name="inf"></a>
+
+## Important Information
+
+Upon first use, the environment downloads the latest transactions dataset from the exchange, which is then cached in the *temporary directory* of the operating system for future use.<br>
+
+- A user can also update the latest transactions dataset by calling `gym_cryptotrading.Generator.update_gen()` **prior** to making the environment.
+
+    - Updating the latest transactions is not reflected in environments made earlier.
+
+- If you are running the environment behind a proxy, export suitable **HTTP proxy settings** to allow the environment to download transactions from the exchange.
+
+<a name="exp"></a> 
+
+## Examples
+Coming soon.
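
The block-selection step described in the Simulator section of the README above can be sketched as follows. This is a minimal illustration, not the repository's `Generator` code: the function name `find_continuous_blocks`, the 60-second spacing between samples, and the sample timestamps are all assumptions made for the example.

```python
def find_continuous_blocks(timestamps, history_length, horizon, step=60):
    """Split sorted timestamps into gap-free runs and keep the long ones.

    A run is kept only if its length is greater than
    history_length + horizon + 1, as the README describes.
    """
    min_length = history_length + horizon + 1
    blocks, current = [], []
    for ts in timestamps:
        # A gap (a difference other than `step`) closes the current run.
        if current and ts - current[-1] != step:
            if len(current) > min_length:
                blocks.append(current)
            current = []
        current.append(ts)
    # The final run is never closed by a gap, so check it here.
    if len(current) > min_length:
        blocks.append(current)
    return blocks


# Hypothetical data: ten consecutive minutes, a gap, then three more minutes.
timestamps = [60 * i for i in range(10)] + [60 * i for i in range(20, 23)]
blocks = find_continuous_blocks(timestamps, history_length=3, horizon=2)
print([len(block) for block in blocks])  # prints [10]
```

Only the ten-minute run survives, because the three-minute run is too short to hold a history window plus an episode. Dropping short runs rather than filling the gaps matches the README's stated goal of not substituting missing values.
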
diff --git a/gym_cryptotrading/envs/basicenv.py b/gym_cryptotrading/envs/basicenv.py
@@ -3,8 +3,11 @@
 import gym
 from gym import error, logger
 
+from abc import abstractmethod
+
 from gym_cryptotrading.generator import Generator
 from gym_cryptotrading.strings import *
+from gym_cryptotrading.errors import *
 
 from gym_cryptotrading.spaces.action import ActionSpace
 from gym_cryptotrading.spaces.observation import ObservationSpace
@@ -14,23 +17,37 @@ class BaseEnv(gym.Env):
     observation_space = ObservationSpace()
     metadata = {'render.modes': []}
 
-    def __init__(self, history_length=100, horizon=5, unit=5e-4):
+    def __init__(self):
         self.episode_number = 0
-        self.timesteps = None
-        self.history_length = history_length
-        self.horizon = horizon
-        self.unit = unit #units of Bitcoin traded each time
+        self.generator = None
+
+        self.history_length = 100 
+        self.horizon = 5 
+        self.unit = 5e-4
 
-        self.timesteps = None
+    def set_params(self, history_length, horizon, unit):
+        if self.generator:
+            raise EnvironmentAlreadyLoaded()
+
+        if history_length < 0 or horizon < 1 or unit <= 0:
+            raise ValueError('history_length must be >= 0, horizon >= 1 and unit > 0')
+
+        else:
+            self.history_length = history_length
+            self.horizon = horizon
+            self.unit = unit #units of Bitcoin traded each time
 
-        self.generator = Generator(history_length, horizon)
+    def _load_gen(self):
+        if not self.generator:
+            self.generator = Generator(self.history_length, self.horizon)
 
     def _new_random_episode(self):
         '''
         TODO: In the current setting, the selection of an episode does not follow pure uniform process. 
         Need to index every episode and then generate a random index rather than going on multiple levels
         of selection.
         '''
+        self._load_gen()
         self._reset_params()
         message_list = []
         self.episode_number = self.episode_number + 1
@@ -59,13 +76,15 @@ def _new_random_episode(self):
 
         return self.historical_prices[self.current - self.history_length:self.current]
 
+    @abstractmethod
     def _reset_params(self):
         pass
 
+    @abstractmethod
     def _take_action(self, action):
-        if action not in BaseEnv.action_space.lookup.keys():
-            raise error.InvalidAction()
-        
+        pass
+
+    @abstractmethod
     def _get_reward(self):
         return 0
 
@@ -75,6 +94,10 @@ def _get_new_state(self):
     def reset(self):
         return self._new_random_episode()
 
+    @abstractmethod
     def step(self, action):
-        raise NotImplementedError()
+        state = self._get_new_state()
+        self._take_action(action)
+        reward = self._get_reward()
+        return state, reward, False, None
 
diff --git a/gym_cryptotrading/envs/cryptotrading.py b/gym_cryptotrading/envs/cryptotrading.py
@@ -6,21 +6,22 @@
 from gym_cryptotrading.envs.basicenv import BaseEnv
 
 class CryptoTradingEnv(BaseEnv):
-    def __init__(self, history_length=100, horizon=5, unit=5e-4):
-        super(CryptoTradingEnv, self).__init__(history_length, horizon, unit)
+    def __init__(self):
+        super(CryptoTradingEnv, self).__init__()
 
     def _reset_params(self):
         self.long, self.short = 0, 0
         self.timesteps = 0
 
     def _take_action(self, action):
-        super(CryptoTradingEnv, self)._take_action(action)
-
-        if BaseEnv.action_space.lookup[action] is LONG:
-            self.long = self.long + 1
-
-        elif BaseEnv.action_space.lookup[action] is SHORT:
-            self.short = self.short + 1
+        if action not in BaseEnv.action_space.lookup.keys():
+            raise error.InvalidAction()
+        else:
+            if BaseEnv.action_space.lookup[action] == LONG:
+                self.long = self.long + 1
+
+            elif BaseEnv.action_space.lookup[action] == SHORT:
+                self.short = self.short + 1
 
     def _get_reward(self):
         return (self.long - self.short) * self.unit * self.diffs[self.current]
@@ -41,7 +42,7 @@ def step(self, action):
         self.timesteps = self.timesteps + 1
         if self.timesteps is not self.horizon:
             self.current = self.current + 1
-            return state, reward, False, (self.horizon - self.timesteps)
+            return state, reward, False, None
         else:
-            return state, reward, True, (self.horizon - self.timesteps)
+            return state, reward, True, None
 
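
As a worked illustration of `_get_reward` in the hunk above, the reward is the net position (LONG actions minus SHORT actions taken so far in the episode) times `unit` times the current price difference. The sketch below is standalone: the function name `reward` and the sample numbers are assumptions, and `price_diff` stands in for `self.diffs[self.current]`, whose exact definition is not shown in this diff.

```python
def reward(long_count, short_count, unit, price_diff):
    """Net position in Bitcoin times the price change for the current step."""
    return (long_count - short_count) * unit * price_diff


# Three LONG actions and one SHORT action so far, the default unit of
# 5e-4 Bitcoin, and a hypothetical price change of +20 dollars.
r = reward(long_count=3, short_count=1, unit=5e-4, price_diff=20.0)
print(r)
```

With a net position of two units, the reward is roughly 0.02 dollars. A balanced position (equal LONG and SHORT counts) yields a reward of zero regardless of the price movement.
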
diff --git a/gym_cryptotrading/errors.py b/gym_cryptotrading/errors.py
@@ -0,0 +1,8 @@
+from gym.error import Error
+
+class EnvironmentAlreadyLoaded(Error):
+    '''
+    Raised when user tries to set the parameters of the environment that is
+    already loaded.
+    '''
+    pass
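
The way this commit uses `EnvironmentAlreadyLoaded` can be sketched without `gym` installed. In the stand-in below, `SketchEnv` and the `object()` placeholder for the generator are hypothetical, and the error derives from `Exception` rather than `gym.error.Error`. Only the control flow is meant to mirror `BaseEnv.set_params` and `BaseEnv._load_gen`, and the bounds follow the README (`history_length >= 0`, `horizon > 0`, `unit > 0`).

```python
class EnvironmentAlreadyLoaded(Exception):
    """Stand-in for gym_cryptotrading.errors.EnvironmentAlreadyLoaded."""


class SketchEnv(object):
    """Hypothetical stand-in that mirrors the lazy-loading flow of BaseEnv."""

    def __init__(self):
        self.generator = None  # created lazily on the first reset
        self.history_length, self.horizon, self.unit = 100, 5, 5e-4

    def set_params(self, history_length, horizon, unit):
        # Parameters are frozen once the generator exists.
        if self.generator is not None:
            raise EnvironmentAlreadyLoaded()
        # Bounds as documented in the README.
        if history_length < 0 or horizon < 1 or unit <= 0:
            raise ValueError('invalid environment parameters')
        self.history_length = history_length
        self.horizon = horizon
        self.unit = unit

    def reset(self):
        if self.generator is None:
            self.generator = object()  # placeholder for Generator(...)


env = SketchEnv()
env.set_params(history_length=50, horizon=10, unit=1e-3)  # allowed before reset
env.reset()
try:
    env.set_params(history_length=10, horizon=5, unit=1e-3)
    outcome = 'accepted'
except EnvironmentAlreadyLoaded:
    outcome = 'rejected'
print(outcome)  # prints "rejected"
```

Deferring generator creation to the first reset is what gives callers a window in which `set_params` is still legal after `gym.make`.
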
diff --git a/gym_cryptotrading/spaces/observation.py b/gym_cryptotrading/spaces/observation.py
@@ -1,10 +1,8 @@
 import numpy as np
 
-import gym
+from gym import Space
 
-from gym_cryptotrading.strings import BAD_OBSERVATION
-
-class ObservationSpace(gym.Space):
+class ObservationSpace(Space):
     max_ratio = 3.0
 
     def __init__(self):

diff --git a/gym_cryptotrading/strings.py b/gym_cryptotrading/strings.py
@@ -3,8 +3,5 @@
 NEUTRAL = 'neutral'
 SHORT = 'short'
 
-#errors
-BAD_OBSERVATION = 'Invalid Observation'
-
 #url
 URL = 'http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz'