From a5dca260d3b9d84e52df279a633d0a2e1b105541 Mon Sep 17 00:00:00 2001
From: ishikota
Date: Sat, 26 Nov 2016 17:42:25 +0900
Subject: [PATCH] Update README.md

---
 README.md | 160 +++---------------------------------------------------
 1 file changed, 7 insertions(+), 153 deletions(-)

diff --git a/README.md b/README.md
index 1a8564a..8a71e2c 100644
--- a/README.md
+++ b/README.md
@@ -3,169 +3,23 @@
 [![Coverage Status](https://coveralls.io/repos/github/ishikota/kyoka/badge.svg?branch=master)](https://coveralls.io/github/ishikota/kyoka?branch=master)
 [![PyPI](https://img.shields.io/pypi/v/kyoka.svg?maxAge=2592000)](https://badge.fury.io/py/kyoka)
 [![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://github.com/ishikota/kyoka/blob/master/LICENSE.md)
-## Implemented algorithms
+## Supported algorithms
 - MonteCarlo
 - Sarsa
 - QLearning
-- SarsaLambda
-- QLambda
-- deep Q-network (DQN)
+- deep QLearning (from the DQN paper)
+- MonteCarloTreeSearch
 
 **Reference**
 - [Sutton & Barto Book: Reinforcement Learning: An Introduction](https://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html)
 - [Human-level control through deep reinforcement learning](http://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html)
 
-# Getting Started
-## Motivation
-RL (Reinforcement Learning) algorithms learn which actions are good or bad through **trial-and-error**.
-So what we need to do is **express our learning task in RL format**.
-
-This library provides two template classes to put your task in RL format:
-- the `BaseDomain` class, which represents your learning task
-- the `ValueFunction` class, which the RL algorithm uses to save its trial-and-error results
-
-Let's see how to use these template classes through a simple *maze* example.
-
-## Example. Find the best policy to escape from the maze
-Here we will find the best policy to escape from the maze below by using an RL algorithm.
-```
-S: start, G: goal, X: wall
-
--------XG
---X----X-
-S-X----X-
---X------
------X---
----------
-```
-
-### Step1. Create the MazeDomain class
-The `BaseDomain` class requires you to implement 5 methods:
-- `generate_initial_state()`
-  - returns the initial state that the RL algorithm starts its simulation from.
-- `generate_possible_actions(state)`
-  - returns the valid actions in the passed state. The RL algorithm chooses its next action from these.
-- `transit_state(state, action)`
-  - returns the next state after applying the passed action to the passed state.
-- `calculate_reward(state)`
-  - returns how good the passed state is.
-- `is_terminal_state(state)`
-  - returns whether the passed state is a terminal state or not.
-
-```python
-from kyoka.domain.base_domain import BaseDomain
-
-class MazeDomain(BaseDomain):
-
-    ACTION_UP = 0
-    ACTION_DOWN = 1
-    ACTION_RIGHT = 2
-    ACTION_LEFT = 3
-
-    # we use the current position in the maze as "state",
-    # so here we return the start position S of the maze (row=2, column=0).
-    def generate_initial_state(self):
-        return (2, 0)
-
-    # the position of the goal is (row=0, column=8)
-    def is_terminal_state(self, state):
-        return (0, 8) == state
-
-    # we can always try to move in 4 directions.
-    def generate_possible_actions(self, state):
-        return [self.ACTION_UP, self.ACTION_DOWN, self.ACTION_RIGHT, self.ACTION_LEFT]
-
-    # the RL algorithm gets a reward only when it reaches the goal.
-    def calculate_reward(self, state):
-        return 1 if self.is_terminal_state(state) else 0
-
-    def transit_state(self, state, action):
-        row, col = state
-        wall_position = [(1,2), (2,2), (3,2), (4,5), (0,7), (1,7), (2,7)]
-        height, width = 6, 9
-        if action == self.ACTION_UP:
-            row = max(0, row-1)
-        elif action == self.ACTION_DOWN:
-            row = min(height-1, row+1)
-        elif action == self.ACTION_RIGHT:
-            col = min(width-1, col+1)
-        elif action == self.ACTION_LEFT:
-            col = max(0, col-1)
-        if (row, col) not in wall_position:
-            return (row, col)
-        else:
-            # if the destination is a wall the position does not change
-            # (the min/max clamping above already keeps us inside the maze)
-            return state
-```
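-
-Before moving on, you can sanity-check the domain with a quick script like the one below. (This snippet is only an illustration, not part of the library; the expected values in the comments are hand-derived from the maze layout above.)
-
-```python
-domain = MazeDomain()
-
-state = domain.generate_initial_state()
-print(state)                             # (2, 0), the S cell
-state = domain.transit_state(state, MazeDomain.ACTION_RIGHT)
-print(state)                             # (2, 1)
-state = domain.transit_state(state, MazeDomain.ACTION_RIGHT)
-print(state)                             # still (2, 1) because (2, 2) is a wall
-print(domain.calculate_reward(state))    # 0: reward is given only at the goal
-print(domain.is_terminal_state((0, 8)))  # True: (0, 8) is the goal G
-```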
-
-OK! Next is the `ValueFunction`!!
-
-### Step2. Create the MazeActionValueFunction class
-The `BaseActionValueFunction` class requires you to implement 2 methods:
-- `calculate_value(state, action)`
-  - fetches the current value of the passed state-action pair.
-- `update_function(state, action, new_value)`
-  - updates the value of the passed state-action pair with the passed value.
-
-The state-action space of this example is very small (|state| x |action| = (6 x 9) x 4 = 216 values).
-So we prepare a table (a 3-dimensional array) and save the values in it.
-
-```python
-from kyoka.value_function.base_action_value_function import BaseActionValueFunction
-
-class MazeActionValueFunction(BaseActionValueFunction):
-
-    # this method is called before learning starts
-    def setUp(self):
-        maze_height, maze_width, action_num = 6, 9, 4
-        self.table = [[[0 for k in range(action_num)] for j in range(maze_width)] for i in range(maze_height)]
-
-    # just read the value from the table
-    def calculate_value(self, state, action):
-        row, col = state
-        return self.table[row][col][action]
-
-    # just write the value into the table
-    def update_function(self, state, action, new_value):
-        row, col = state
-        self.table[row][col][action] = new_value
-```
-
-### Step3. Run the RL algorithm and see its result
-OK, let's try `QLearning` on our *maze* task.
-
-```python
-from kyoka.policy.epsilon_greedy_policy import EpsilonGreedyPolicy
-from kyoka.algorithm.td_learning.q_learning import QLearning
-
-domain = MazeDomain()
-policy = EpsilonGreedyPolicy(eps=0.1)
-value_function = MazeActionValueFunction()
-
-# You can easily swap the algorithm, e.g. rl_algo = Sarsa(alpha=0.1, gamma=0.7)
-rl_algo = QLearning(alpha=0.1, gamma=0.7)
-rl_algo.setUp(domain, policy, value_function)
-rl_algo.run_gpi(nb_iteration=50)
-```
-
-That's all!! Let's visualize the policy that QLearning learned.
-(If you are interested in the `MazeHelper` utility class, please check out the [complete code](https://github.com/ishikota/kyoka/blob/master/sample/maze/readme_sample.py).)
-```
->>> print MazeHelper.visualize_policy(domain, value_function)
-
--------XG
---X-v-vX^
-v-X-vvvX^
-vvX>>>>>^
->>>>^-^^^
-->^<^----
-```
-
-Great!! QLearning found a policy which leads us from S to the goal in 14 steps. (14 steps is the minimum number of steps to the goal!!)
+## Getting Started
+TODO enrich README
 
 ## Sample code
-In the sample directory, we have prepared more practical sample code as Jupyter notebooks and scripts.
-You can also check out another RL task example, *tick-tack-toe*.
-- [sample: Learning how to escape from maze by RL](https://github.com/ishikota/kyoka/tree/master/sample/maze)
-- [sample: Learning tick-tack-toe by RL](https://github.com/ishikota/kyoka/tree/master/sample/ticktacktoe)
+- [example1: Learning how to escape from maze by RL](./examples/maze)
+- [example2: Learning tick-tack-toe by RL](./examples/ticktacktoe)
 
 # Installation
 You can use pip like this.
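 
 A minimal sketch of that command, assuming the package is published under the name `kyoka` shown in the PyPI badge above:
 
 ```
 pip install kyoka
 ```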