OpenAI-Gym's CartPole-v0 solved using XCS-RC Reinforcement Learning


OpenAI-Gym CartPole-v0 with XCS-RC

A fully classical Reinforcement Learning solution for the CartPole environment from OpenAI-Gym (best result so far: solved in 38 episodes).

Links for XCS-RC


Code Description: CartPole-xcs-rc.py

Initialization

from time import sleep  # used for the pause after learning stops

import xcs_rc
import gym

agent = xcs_rc.Agent(maxpopsize=100, tcomb=20, predtol=20.0, prederrtol=0.0)
agent.reward_map(max=100.0, projected=80.0)
env = gym.make('CartPole-v0')
stop_learning = False
scores = []

Reward Function (judgement of input-action suitability)

def get_reward(input, action):
    reward = 0.0

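    # input is [cart velocity, pole angle]; action is 0 (push left) or 1 (push right)
    # 0.8 if the push is in the direction the pole is leaning
    if input[1] * (action - 0.5) >= 0.0:
        reward = 0.8
    # extra 0.2 if the pole is close to upright
    if abs(input[1]) < 0.05:
        reward += 0.2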

    return reward
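
To make the reward values concrete, the sketch below evaluates the same function on a few hand-picked inputs. It assumes the CartPole-v0 conventions (action 0 pushes the cart left, action 1 pushes it right) and that `input` is `[cart velocity, pole angle]`, as built in the main loop. The sample values are illustrative and not taken from a real run.

```python
def get_reward(input, action):
    reward = 0.0
    if input[1] * (action - 0.5) >= 0.0:
        reward = 0.8
    if abs(input[1]) < 0.05:
        reward += 0.2
    return reward


# Hypothetical (input, action) pairs, chosen only for illustration.
samples = [
    ([0.0, 0.10], 1),   # pole leans right, push right
    ([0.0, 0.10], 0),   # pole leans right, push left
    ([0.0, 0.01], 1),   # nearly upright, push toward the lean
    ([0.0, -0.01], 1),  # nearly upright, push away from the lean
]

rewards = [get_reward(inp, act) for inp, act in samples]
for (inp, act), r in zip(samples, rewards):
    print("angle={:+.2f} action={} reward={:.1f}".format(inp[1], act, r))
```

Only four reward levels are possible (0.0, 0.2, 0.8, 1.0). Cart velocity (`input[0]`) does not affect the reward.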

First State and Action

for i_episode in range(50):
    state = env.reset()
    input = [state[1], state[2]]
    for t in range(200):
        env.render()

        action = agent.next_action(input, 1, 1 - i_episode)

Get env Response and Assign my_reward

        state, reward, done, info = env.step(action)
        input = [state[1], state[2]]

        if not stop_learning:
            my_reward = agent.maxreward * get_reward(input, action)
            agent.apply_reward(my_reward)
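
The two steps above (reducing the observation to a two-value input and scaling the reward) can be sketched without the agent. This is a minimal illustration, assuming that `agent.maxreward` equals the `max=100.0` passed to `reward_map`; `MAX_REWARD`, `to_input`, and `scaled_reward` are hypothetical names, not part of XCS-RC.

```python
# Assumption: agent.maxreward matches reward_map(max=100.0).
MAX_REWARD = 100.0


def to_input(state):
    """Keep only cart velocity (index 1) and pole angle (index 2)."""
    return [state[1], state[2]]


def scaled_reward(fraction, max_reward=MAX_REWARD):
    """Scale a reward fraction in [0, 1] to the agent's reward range."""
    return max_reward * fraction


# Illustrative CartPole-v0 observation:
# [cart position, cart velocity, pole angle, pole angular velocity]
state = [0.02, -0.15, 0.03, 0.20]
agent_input = to_input(state)
print(agent_input)

# The four fractions get_reward can return, scaled to the reward range.
levels = [scaled_reward(f) for f in (0.0, 0.2, 0.8, 1.0)]
print(levels)
```

Under that assumption, the agent only ever receives rewards near 0, 20, 80, or 100.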

Terminating Current Episode

        if done:
            print("Episode #{} finished after {} timesteps".format(i_episode + 1, t + 1))
            scores.append(t + 1)

Store Knowledge to File, End Current Episode

            save_mode = 'w' if i_episode == 0 else 'a'
            title = "Episode: " + str(i_episode + 1)
            agent.pop.save("cartpole.csv", title, save_mode)
            break

Stop Learning if Knowledge is Solid

    if len(scores) >= 20 and not stop_learning:
        check_stop = scores[-20:]
        if float(sum(check_stop) / 20) >= 195.0:
            print("Learning stopped.")
            stop_learning = True
            sleep(10)  # short pause to prepare the screen recording ;)

Terminate if CartPole is Solved

    if len(scores) >= 100:
        check_solved = scores[-100:]
        if float(sum(check_solved) / 100) >= 195.0:
            break
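
Both checks above reduce to a moving average over the most recent scores: a 20-episode window to stop learning and a 100-episode window to declare the task solved. A minimal sketch of that idea follows. The helper names `recent_average` and `reached` are hypothetical and do not appear in the original script.

```python
def recent_average(scores, window):
    """Return the mean of the last `window` scores, or None if too few exist."""
    if len(scores) < window:
        return None
    return sum(scores[-window:]) / window


def reached(scores, window, threshold=195.0):
    """True if at least `window` scores exist and their recent mean meets the threshold."""
    avg = recent_average(scores, window)
    return avg is not None and avg >= threshold


# Illustrative score history: 10 weak episodes followed by 20 perfect ones.
scores = [150] * 10 + [200] * 20

stop_learning = reached(scores, 20)   # 20-episode window
solved = reached(scores, 100)         # 100-episode window
print(stop_learning, solved)
```

With this history the 20-episode check passes, while the 100-episode check cannot pass until at least 100 episodes have been recorded.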

Final Report and Close

agent.pop.combine()
agent.pop.print("\nFinal Population")
agent.pop.save("cartpole.csv", 'Final', 'a')
last_100 = scores[-100:]  # average only the most recent (up to 100) episodes
print("Average last 100 episodes:", sum(last_100) / max(len(last_100), 1))
env.close()

Results

Outputs:

CartPole XCS-RC

Test Code:
