# Module Five Assignment: Cartpole Problem
Review the code in this notebook and in the score_logger.py file in the *scores* folder (directory). Once you have reviewed the code, return to this notebook and select **Cell** and then **Run All** from the menu bar to run this code. The code takes several minutes to run.

In [1]:
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
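    def experience_replay(self):  
        # Wait until the buffer holds at least one full batch.
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            # Terminal transitions have no future value, so the target is the reward alone.
            q_update = reward  
            if not terminal:  
                # Bellman target: reward plus the discounted best Q-value of the next state.
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            # Overwrite only the taken action's Q-value, then fit on that single sample.
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        # Decay the exploration rate once per replay call, never below the floor.
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  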
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print(f"Run: {run}, exploration: {dqn_solver.exploration_rate}, score: {step}")
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()  



Using TensorFlow backend.
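
The target that `experience_replay` fits toward can be checked in isolation. The sketch below is a minimal pure-Python version of that calculation. The helper `q_target` and the sample Q-values are hypothetical, introduced only for illustration; they are not part of the notebook or of Keras.

```python
GAMMA = 0.95  # discount factor, same value as in the cell above


def q_target(reward, next_q_values, terminal, gamma=GAMMA):
    """Return the training target for the action that was taken.

    Hypothetical helper: mirrors the q_update logic in experience_replay,
    but takes the next state's Q-values as a plain list instead of calling a model.
    """
    if terminal:
        # No future reward once the episode has ended.
        return reward
    # Reward plus the discounted value of the best next action.
    return reward + gamma * max(next_q_values)


# Made-up Q-values for a next state with two actions.
sample_next_q = [1.0, 2.0]
non_terminal_target = q_target(1.0, sample_next_q, terminal=False)
terminal_target = q_target(-1.0, sample_next_q, terminal=True)
print(non_terminal_target, terminal_target)
```

For the non-terminal case the target is 1 + 0.95 × 2, about 2.9. For the terminal case it is the reward alone, here the -1 that `cartpole()` assigns when the pole falls.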


In [2]:
cartpole()

Run: 1, exploration: 0.9511101304657719, score: 30
Scores: (min: 30, avg: 30, max: 30)

Run: 2, exploration: 0.8866535105013078, score: 15
Scores: (min: 15, avg: 22.5, max: 30)

Run: 3, exploration: 0.851801859600347, score: 9
Scores: (min: 9, avg: 18, max: 30)

Run: 4, exploration: 0.8142285204175609, score: 10
Scores: (min: 9, avg: 16, max: 30)

Run: 5, exploration: 0.7666961448653229, score: 13
Scores: (min: 9, avg: 15.4, max: 30)

Run: 6, exploration: 0.7292124703704616, score: 11
Scores: (min: 9, avg: 14.666666666666666, max: 30)

Run: 7, exploration: 0.6900935609921609, score: 12
Scores: (min: 9, avg: 14.285714285714286, max: 30)

Run: 8, exploration: 0.6596532430440636, score: 10
Scores: (min: 9, avg: 13.75, max: 30)

Run: 9, exploration: 0.6305556603555866, score: 10
Scores: (min: 9, avg: 13.333333333333334, max: 30)

Run: 10, exploration: 0.6057704364907278, score: 9
Scores: (min: 9, avg: 12.9, max: 30)

Run: 11, exploration: 0.5732736268885887, score: 12
Scores: (min: 9, avg:

Run: 89, exploration: 0.01, score: 158
Scores: (min: 9, avg: 95.86516853932584, max: 217)

Run: 90, exploration: 0.01, score: 164
Scores: (min: 9, avg: 96.62222222222222, max: 217)

Run: 91, exploration: 0.01, score: 204
Scores: (min: 9, avg: 97.8021978021978, max: 217)

Run: 92, exploration: 0.01, score: 133
Scores: (min: 9, avg: 98.18478260869566, max: 217)

Run: 93, exploration: 0.01, score: 194
Scores: (min: 9, avg: 99.21505376344086, max: 217)

Run: 94, exploration: 0.01, score: 157
Scores: (min: 9, avg: 99.82978723404256, max: 217)

Run: 95, exploration: 0.01, score: 150
Scores: (min: 9, avg: 100.3578947368421, max: 217)

Run: 96, exploration: 0.01, score: 129
Scores: (min: 9, avg: 100.65625, max: 217)

Run: 97, exploration: 0.01, score: 72
Scores: (min: 9, avg: 100.36082474226804, max: 217)

Run: 98, exploration: 0.01, score: 109
Scores: (min: 9, avg: 100.44897959183673, max: 217)

Run: 99, exploration: 0.01, score: 141
Scores: (min: 9, avg: 100.85858585858585, max: 217)

Run: 1

Run: 189, exploration: 0.01, score: 115
Scores: (min: 19, avg: 137.17, max: 270)

Run: 190, exploration: 0.01, score: 96
Scores: (min: 19, avg: 136.49, max: 270)

Run: 191, exploration: 0.01, score: 145
Scores: (min: 19, avg: 135.9, max: 270)

Run: 192, exploration: 0.01, score: 204
Scores: (min: 19, avg: 136.61, max: 270)

Run: 193, exploration: 0.01, score: 202
Scores: (min: 19, avg: 136.69, max: 270)

Run: 194, exploration: 0.01, score: 129
Scores: (min: 19, avg: 136.41, max: 270)

Run: 195, exploration: 0.01, score: 208
Scores: (min: 19, avg: 136.99, max: 270)

Run: 196, exploration: 0.01, score: 156
Scores: (min: 19, avg: 137.26, max: 270)

Run: 197, exploration: 0.01, score: 159
Scores: (min: 19, avg: 138.13, max: 270)

Run: 198, exploration: 0.01, score: 168
Scores: (min: 19, avg: 138.72, max: 270)

Run: 199, exploration: 0.01, score: 154
Scores: (min: 19, avg: 138.85, max: 270)

Run: 200, exploration: 0.01, score: 165
Scores: (min: 19, avg: 139.3, max: 270)

Run: 201, explorati

Run: 290, exploration: 0.01, score: 94
Scores: (min: 17, avg: 166.42, max: 442)

Run: 291, exploration: 0.01, score: 29
Scores: (min: 17, avg: 165.26, max: 442)

Run: 292, exploration: 0.01, score: 123
Scores: (min: 17, avg: 164.45, max: 442)

Run: 293, exploration: 0.01, score: 84
Scores: (min: 17, avg: 163.27, max: 442)

Run: 294, exploration: 0.01, score: 153
Scores: (min: 17, avg: 163.51, max: 442)

Run: 295, exploration: 0.01, score: 316
Scores: (min: 17, avg: 164.59, max: 442)

Run: 296, exploration: 0.01, score: 141
Scores: (min: 17, avg: 164.44, max: 442)

Run: 297, exploration: 0.01, score: 240
Scores: (min: 17, avg: 165.25, max: 442)

Run: 298, exploration: 0.01, score: 159
Scores: (min: 17, avg: 165.16, max: 442)

Run: 299, exploration: 0.01, score: 158
Scores: (min: 17, avg: 165.2, max: 442)

Run: 300, exploration: 0.01, score: 337
Scores: (min: 17, avg: 166.92, max: 442)

Run: 301, exploration: 0.01, score: 277
Scores: (min: 17, avg: 168.17, max: 442)

Run: 302, exploratio

Run: 391, exploration: 0.01, score: 144
Scores: (min: 9, avg: 160.95, max: 479)

Run: 392, exploration: 0.01, score: 144
Scores: (min: 9, avg: 161.16, max: 479)

Run: 393, exploration: 0.01, score: 178
Scores: (min: 9, avg: 162.1, max: 479)

Run: 394, exploration: 0.01, score: 500
Scores: (min: 9, avg: 165.57, max: 500)

Run: 395, exploration: 0.01, score: 192
Scores: (min: 9, avg: 164.33, max: 500)

Run: 396, exploration: 0.01, score: 382
Scores: (min: 9, avg: 166.74, max: 500)

Run: 397, exploration: 0.01, score: 162
Scores: (min: 9, avg: 165.96, max: 500)

Run: 398, exploration: 0.01, score: 40
Scores: (min: 9, avg: 164.77, max: 500)

Run: 399, exploration: 0.01, score: 92
Scores: (min: 9, avg: 164.11, max: 500)

Run: 400, exploration: 0.01, score: 110
Scores: (min: 9, avg: 161.84, max: 500)

Run: 401, exploration: 0.01, score: 214
Scores: (min: 9, avg: 161.21, max: 500)

Run: 402, exploration: 0.01, score: 231
Scores: (min: 9, avg: 162.29, max: 500)

Run: 403, exploration: 0.01, sc

Run: 493, exploration: 0.01, score: 135
Scores: (min: 9, avg: 183.38, max: 500)

Run: 494, exploration: 0.01, score: 121
Scores: (min: 9, avg: 179.59, max: 500)

Run: 495, exploration: 0.01, score: 208
Scores: (min: 9, avg: 179.75, max: 500)

Run: 496, exploration: 0.01, score: 24
Scores: (min: 9, avg: 176.17, max: 500)

Run: 497, exploration: 0.01, score: 16
Scores: (min: 9, avg: 174.71, max: 500)

Run: 498, exploration: 0.01, score: 27
Scores: (min: 9, avg: 174.58, max: 500)

Run: 499, exploration: 0.01, score: 149
Scores: (min: 9, avg: 175.15, max: 500)

Run: 500, exploration: 0.01, score: 98
Scores: (min: 9, avg: 175.03, max: 500)

Run: 501, exploration: 0.01, score: 94
Scores: (min: 9, avg: 173.83, max: 500)

Run: 502, exploration: 0.01, score: 500
Scores: (min: 9, avg: 176.52, max: 500)

Run: 503, exploration: 0.01, score: 290
Scores: (min: 9, avg: 174.42, max: 500)

Run: 504, exploration: 0.01, score: 354
Scores: (min: 9, avg: 175.39, max: 500)

Run: 505, exploration: 0.01, scor

NameError: name 'exit' is not defined
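
The exploration values in the log above follow from the decay constants. `experience_replay` returns early until the memory holds `BATCH_SIZE` transitions, and `cartpole()` skips it on the terminal step. Assuming the loop runs exactly as written, Run 1 (score 30) gives 10 decaying replay calls, at steps 20 through 29, and Run 2 (score 15) adds 14 more. The sketch below uses a hypothetical helper, `exploration_after`, to reproduce the logged rates under that assumption.

```python
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.01
EXPLORATION_DECAY = 0.995


def exploration_after(replay_calls, start=EXPLORATION_MAX,
                      floor=EXPLORATION_MIN, decay=EXPLORATION_DECAY):
    """Apply the per-call decay `replay_calls` times, clamping at the floor."""
    rate = start
    for _ in range(replay_calls):
        rate = max(floor, rate * decay)
    return rate


after_run_1 = exploration_after(10)  # replay calls at steps 20..29 of Run 1
after_run_2 = exploration_after(24)  # plus 14 calls during Run 2
print(after_run_1, after_run_2)
```

The two values should agree with the logged 0.9511... and 0.8866... up to floating-point rounding. With these constants the rate is clamped to 0.01 from the 919th decaying call onward, which is consistent with every later run printing `exploration: 0.01`.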

In [5]:
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.01  # learning rate raised from 0.001 to 0.01
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print(f"Run: {run}, exploration: {dqn_solver.exploration_rate}, score: {step}")
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay() 

In [6]:
cartpole()

Run: 1, exploration: 0.946354579813443, score: 31
Scores: (min: 31, avg: 31, max: 31)

Run: 2, exploration: 0.8911090557802088, score: 13
Scores: (min: 13, avg: 22, max: 31)

Run: 3, exploration: 0.8307187014821328, score: 15
Scores: (min: 13, avg: 19.666666666666668, max: 31)

Run: 4, exploration: 0.7666961448653229, score: 17
Scores: (min: 13, avg: 19, max: 31)

Run: 5, exploration: 0.7111635524897149, score: 16
Scores: (min: 13, avg: 18.4, max: 31)

Run: 6, exploration: 0.6369088258938781, score: 23
Scores: (min: 13, avg: 19.166666666666668, max: 31)

Run: 7, exploration: 0.5907768628656763, score: 16
Scores: (min: 13, avg: 18.714285714285715, max: 31)

Run: 8, exploration: 0.5590843898207511, score: 12
Scores: (min: 12, avg: 17.875, max: 31)

Run: 9, exploration: 0.5185893309484582, score: 16
Scores: (min: 12, avg: 17.666666666666668, max: 31)

Run: 10, exploration: 0.4883155414435353, score: 13
Scores: (min: 12, avg: 17.2, max: 31)

Run: 11, exploration: 0.4598090507939749, score:

Run: 87, exploration: 0.01, score: 9
Scores: (min: 9, avg: 24.057471264367816, max: 91)

Run: 88, exploration: 0.01, score: 11
Scores: (min: 9, avg: 23.90909090909091, max: 91)

Run: 89, exploration: 0.01, score: 41
Scores: (min: 9, avg: 24.10112359550562, max: 91)

Run: 90, exploration: 0.01, score: 63
Scores: (min: 9, avg: 24.533333333333335, max: 91)

Run: 91, exploration: 0.01, score: 91
Scores: (min: 9, avg: 25.263736263736263, max: 91)

Run: 92, exploration: 0.01, score: 29
Scores: (min: 9, avg: 25.304347826086957, max: 91)

Run: 93, exploration: 0.01, score: 17
Scores: (min: 9, avg: 25.21505376344086, max: 91)

Run: 94, exploration: 0.01, score: 51
Scores: (min: 9, avg: 25.48936170212766, max: 91)

Run: 95, exploration: 0.01, score: 8
Scores: (min: 8, avg: 25.305263157894736, max: 91)

Run: 96, exploration: 0.01, score: 12
Scores: (min: 8, avg: 25.166666666666668, max: 91)

Run: 97, exploration: 0.01, score: 24
Scores: (min: 8, avg: 25.15463917525773, max: 91)

Run: 98, explorat

Run: 190, exploration: 0.01, score: 10
Scores: (min: 8, avg: 32.3, max: 164)

Run: 191, exploration: 0.01, score: 9
Scores: (min: 8, avg: 31.48, max: 164)

Run: 192, exploration: 0.01, score: 10
Scores: (min: 8, avg: 31.29, max: 164)

Run: 193, exploration: 0.01, score: 10
Scores: (min: 8, avg: 31.22, max: 164)

Run: 194, exploration: 0.01, score: 10
Scores: (min: 8, avg: 30.81, max: 164)

Run: 195, exploration: 0.01, score: 10
Scores: (min: 8, avg: 30.83, max: 164)

Run: 196, exploration: 0.01, score: 8
Scores: (min: 8, avg: 30.79, max: 164)

Run: 197, exploration: 0.01, score: 9
Scores: (min: 8, avg: 30.64, max: 164)

Run: 198, exploration: 0.01, score: 10
Scores: (min: 8, avg: 30.64, max: 164)

Run: 199, exploration: 0.01, score: 9
Scores: (min: 8, avg: 30.22, max: 164)

Run: 200, exploration: 0.01, score: 10
Scores: (min: 8, avg: 30, max: 164)

Run: 201, exploration: 0.01, score: 10
Scores: (min: 8, avg: 29.98, max: 164)

Run: 202, exploration: 0.01, score: 10
Scores: (min: 8, avg:

Run: 295, exploration: 0.01, score: 63
Scores: (min: 8, avg: 24.64, max: 129)

Run: 296, exploration: 0.01, score: 154
Scores: (min: 8, avg: 26.1, max: 154)

Run: 297, exploration: 0.01, score: 55
Scores: (min: 8, avg: 26.56, max: 154)

Run: 298, exploration: 0.01, score: 106
Scores: (min: 8, avg: 27.52, max: 154)

Run: 299, exploration: 0.01, score: 61
Scores: (min: 8, avg: 28.04, max: 154)

Run: 300, exploration: 0.01, score: 89
Scores: (min: 8, avg: 28.83, max: 154)

Run: 301, exploration: 0.01, score: 87
Scores: (min: 8, avg: 29.6, max: 154)

Run: 302, exploration: 0.01, score: 116
Scores: (min: 8, avg: 30.66, max: 154)

Run: 303, exploration: 0.01, score: 15
Scores: (min: 8, avg: 30.72, max: 154)

Run: 304, exploration: 0.01, score: 81
Scores: (min: 8, avg: 31.44, max: 154)

Run: 305, exploration: 0.01, score: 86
Scores: (min: 8, avg: 32.2, max: 154)

Run: 306, exploration: 0.01, score: 162
Scores: (min: 8, avg: 33.71, max: 162)

Run: 307, exploration: 0.01, score: 62
Scores: (min

Run: 399, exploration: 0.01, score: 8
Scores: (min: 8, avg: 41.63, max: 194)

Run: 400, exploration: 0.01, score: 9
Scores: (min: 8, avg: 40.83, max: 194)

Run: 401, exploration: 0.01, score: 10
Scores: (min: 8, avg: 40.06, max: 194)

Run: 402, exploration: 0.01, score: 10
Scores: (min: 8, avg: 39, max: 194)

Run: 403, exploration: 0.01, score: 9
Scores: (min: 8, avg: 38.94, max: 194)

Run: 404, exploration: 0.01, score: 9
Scores: (min: 8, avg: 38.22, max: 194)

Run: 405, exploration: 0.01, score: 9
Scores: (min: 8, avg: 37.45, max: 194)

Run: 406, exploration: 0.01, score: 9
Scores: (min: 8, avg: 35.92, max: 194)

Run: 407, exploration: 0.01, score: 9
Scores: (min: 8, avg: 35.39, max: 194)

Run: 408, exploration: 0.01, score: 10
Scores: (min: 8, avg: 35.34, max: 194)

Run: 409, exploration: 0.01, score: 9
Scores: (min: 8, avg: 35.07, max: 194)

Run: 410, exploration: 0.01, score: 8
Scores: (min: 8, avg: 34.68, max: 194)

Run: 411, exploration: 0.01, score: 10
Scores: (min: 8, avg: 33.

Run: 504, exploration: 0.01, score: 11
Scores: (min: 8, avg: 16.27, max: 219)

Run: 505, exploration: 0.01, score: 20
Scores: (min: 8, avg: 16.38, max: 219)

Run: 506, exploration: 0.01, score: 16
Scores: (min: 8, avg: 16.45, max: 219)

Run: 507, exploration: 0.01, score: 25
Scores: (min: 8, avg: 16.61, max: 219)

Run: 508, exploration: 0.01, score: 18
Scores: (min: 8, avg: 16.69, max: 219)

Run: 509, exploration: 0.01, score: 16
Scores: (min: 8, avg: 16.76, max: 219)

Run: 510, exploration: 0.01, score: 32
Scores: (min: 8, avg: 17, max: 219)

Run: 511, exploration: 0.01, score: 17
Scores: (min: 8, avg: 17.07, max: 219)

Run: 512, exploration: 0.01, score: 39
Scores: (min: 8, avg: 17.36, max: 219)

Run: 513, exploration: 0.01, score: 9
Scores: (min: 8, avg: 17.35, max: 219)

Run: 514, exploration: 0.01, score: 16
Scores: (min: 8, avg: 17.42, max: 219)

Run: 515, exploration: 0.01, score: 10
Scores: (min: 8, avg: 17.42, max: 219)

Run: 516, exploration: 0.01, score: 30
Scores: (min: 8, 

Run: 608, exploration: 0.01, score: 10
Scores: (min: 8, avg: 30.97, max: 136)

Run: 609, exploration: 0.01, score: 10
Scores: (min: 8, avg: 30.91, max: 136)

Run: 610, exploration: 0.01, score: 13
Scores: (min: 8, avg: 30.72, max: 136)

Run: 611, exploration: 0.01, score: 34
Scores: (min: 8, avg: 30.89, max: 136)

Run: 612, exploration: 0.01, score: 15
Scores: (min: 8, avg: 30.65, max: 136)

Run: 613, exploration: 0.01, score: 21
Scores: (min: 8, avg: 30.77, max: 136)

Run: 614, exploration: 0.01, score: 10
Scores: (min: 8, avg: 30.71, max: 136)

Run: 615, exploration: 0.01, score: 17
Scores: (min: 8, avg: 30.78, max: 136)

Run: 616, exploration: 0.01, score: 15
Scores: (min: 8, avg: 30.63, max: 136)

Run: 617, exploration: 0.01, score: 40
Scores: (min: 8, avg: 30.89, max: 136)

Run: 618, exploration: 0.01, score: 12
Scores: (min: 8, avg: 30.71, max: 136)

Run: 619, exploration: 0.01, score: 13
Scores: (min: 8, avg: 30.62, max: 136)

Run: 620, exploration: 0.01, score: 26
Scores: (min:

Run: 713, exploration: 0.01, score: 16
Scores: (min: 8, avg: 18.12, max: 56)

Run: 714, exploration: 0.01, score: 12
Scores: (min: 8, avg: 18.14, max: 56)

Run: 715, exploration: 0.01, score: 13
Scores: (min: 8, avg: 18.1, max: 56)

Run: 716, exploration: 0.01, score: 11
Scores: (min: 8, avg: 18.06, max: 56)

Run: 717, exploration: 0.01, score: 15
Scores: (min: 8, avg: 17.81, max: 56)

Run: 718, exploration: 0.01, score: 16
Scores: (min: 8, avg: 17.85, max: 56)

Run: 719, exploration: 0.01, score: 27
Scores: (min: 8, avg: 17.99, max: 56)

Run: 720, exploration: 0.01, score: 26
Scores: (min: 8, avg: 17.99, max: 56)

Run: 721, exploration: 0.01, score: 14
Scores: (min: 8, avg: 17.86, max: 56)

Run: 722, exploration: 0.01, score: 33
Scores: (min: 8, avg: 18.06, max: 56)

Run: 723, exploration: 0.01, score: 11
Scores: (min: 8, avg: 17.99, max: 56)

Run: 724, exploration: 0.01, score: 24
Scores: (min: 8, avg: 18, max: 56)

Run: 725, exploration: 0.01, score: 18
Scores: (min: 8, avg: 18.03, 

Run: 819, exploration: 0.01, score: 64
Scores: (min: 9, avg: 18.2, max: 64)

Run: 820, exploration: 0.01, score: 11
Scores: (min: 9, avg: 18.05, max: 64)

Run: 821, exploration: 0.01, score: 26
Scores: (min: 9, avg: 18.17, max: 64)

Run: 822, exploration: 0.01, score: 31
Scores: (min: 9, avg: 18.15, max: 64)

Run: 823, exploration: 0.01, score: 13
Scores: (min: 9, avg: 18.17, max: 64)

Run: 824, exploration: 0.01, score: 14
Scores: (min: 9, avg: 18.07, max: 64)

Run: 825, exploration: 0.01, score: 10
Scores: (min: 9, avg: 17.99, max: 64)

Run: 826, exploration: 0.01, score: 17
Scores: (min: 9, avg: 18.05, max: 64)

Run: 827, exploration: 0.01, score: 19
Scores: (min: 9, avg: 18.13, max: 64)

Run: 828, exploration: 0.01, score: 35
Scores: (min: 9, avg: 18.37, max: 64)

Run: 829, exploration: 0.01, score: 12
Scores: (min: 9, avg: 18.29, max: 64)

Run: 830, exploration: 0.01, score: 14
Scores: (min: 9, avg: 18.28, max: 64)

Run: 831, exploration: 0.01, score: 12
Scores: (min: 9, avg: 18.2

Run: 925, exploration: 0.01, score: 15
Scores: (min: 9, avg: 20.41, max: 56)

Run: 926, exploration: 0.01, score: 11
Scores: (min: 9, avg: 20.35, max: 56)

Run: 927, exploration: 0.01, score: 22
Scores: (min: 9, avg: 20.38, max: 56)

Run: 928, exploration: 0.01, score: 22
Scores: (min: 9, avg: 20.25, max: 56)

Run: 929, exploration: 0.01, score: 17
Scores: (min: 9, avg: 20.3, max: 56)

Run: 930, exploration: 0.01, score: 60
Scores: (min: 9, avg: 20.76, max: 60)

Run: 931, exploration: 0.01, score: 25
Scores: (min: 9, avg: 20.89, max: 60)

Run: 932, exploration: 0.01, score: 55
Scores: (min: 9, avg: 21.26, max: 60)

Run: 933, exploration: 0.01, score: 20
Scores: (min: 9, avg: 21.34, max: 60)

Run: 934, exploration: 0.01, score: 35
Scores: (min: 9, avg: 21.49, max: 60)

Run: 935, exploration: 0.01, score: 10
Scores: (min: 9, avg: 21.28, max: 60)

Run: 936, exploration: 0.01, score: 12
Scores: (min: 9, avg: 21.27, max: 60)

Run: 937, exploration: 0.01, score: 21
Scores: (min: 9, avg: 21.3

Run: 1030, exploration: 0.01, score: 15
Scores: (min: 9, avg: 19.45, max: 58)

Run: 1031, exploration: 0.01, score: 9
Scores: (min: 9, avg: 19.29, max: 58)

Run: 1032, exploration: 0.01, score: 14
Scores: (min: 9, avg: 18.88, max: 58)

Run: 1033, exploration: 0.01, score: 21
Scores: (min: 9, avg: 18.89, max: 58)

Run: 1034, exploration: 0.01, score: 14
Scores: (min: 9, avg: 18.68, max: 58)

Run: 1035, exploration: 0.01, score: 26
Scores: (min: 9, avg: 18.84, max: 58)

Run: 1036, exploration: 0.01, score: 27
Scores: (min: 9, avg: 18.99, max: 58)

Run: 1037, exploration: 0.01, score: 15
Scores: (min: 9, avg: 18.93, max: 58)

Run: 1038, exploration: 0.01, score: 16
Scores: (min: 9, avg: 18.9, max: 58)

Run: 1039, exploration: 0.01, score: 35
Scores: (min: 9, avg: 19.13, max: 58)

Run: 1040, exploration: 0.01, score: 25
Scores: (min: 9, avg: 19.26, max: 58)

Run: 1041, exploration: 0.01, score: 13
Scores: (min: 9, avg: 19.15, max: 58)

Run: 1042, exploration: 0.01, score: 11
Scores: (min: 

Run: 1134, exploration: 0.01, score: 15
Scores: (min: 9, avg: 22.32, max: 70)

Run: 1135, exploration: 0.01, score: 16
Scores: (min: 9, avg: 22.22, max: 70)

Run: 1136, exploration: 0.01, score: 10
Scores: (min: 9, avg: 22.05, max: 70)

Run: 1137, exploration: 0.01, score: 44
Scores: (min: 9, avg: 22.34, max: 70)

Run: 1138, exploration: 0.01, score: 20
Scores: (min: 9, avg: 22.38, max: 70)

Run: 1139, exploration: 0.01, score: 11
Scores: (min: 9, avg: 22.14, max: 70)

Run: 1140, exploration: 0.01, score: 16
Scores: (min: 9, avg: 22.05, max: 70)

Run: 1141, exploration: 0.01, score: 13
Scores: (min: 9, avg: 22.05, max: 70)

Run: 1142, exploration: 0.01, score: 16
Scores: (min: 9, avg: 22.1, max: 70)

Run: 1143, exploration: 0.01, score: 19
Scores: (min: 9, avg: 21.83, max: 70)

Run: 1144, exploration: 0.01, score: 17
Scores: (min: 9, avg: 21.88, max: 70)

Run: 1145, exploration: 0.01, score: 14
Scores: (min: 9, avg: 21.87, max: 70)

Run: 1146, exploration: 0.01, score: 19
Scores: (min:

Run: 1238, exploration: 0.01, score: 33
Scores: (min: 8, avg: 20.32, max: 78)

Run: 1239, exploration: 0.01, score: 13
Scores: (min: 8, avg: 20.34, max: 78)

Run: 1240, exploration: 0.01, score: 12
Scores: (min: 8, avg: 20.3, max: 78)

Run: 1241, exploration: 0.01, score: 13
Scores: (min: 8, avg: 20.3, max: 78)

Run: 1242, exploration: 0.01, score: 40
Scores: (min: 8, avg: 20.54, max: 78)

Run: 1243, exploration: 0.01, score: 18
Scores: (min: 8, avg: 20.53, max: 78)

Run: 1244, exploration: 0.01, score: 22
Scores: (min: 8, avg: 20.58, max: 78)

Run: 1245, exploration: 0.01, score: 19
Scores: (min: 8, avg: 20.63, max: 78)

Run: 1246, exploration: 0.01, score: 17
Scores: (min: 8, avg: 20.61, max: 78)

Run: 1247, exploration: 0.01, score: 12
Scores: (min: 8, avg: 20.56, max: 78)

Run: 1248, exploration: 0.01, score: 15
Scores: (min: 8, avg: 20.59, max: 78)

Run: 1249, exploration: 0.01, score: 21
Scores: (min: 8, avg: 20.65, max: 78)

Run: 1250, exploration: 0.01, score: 10
Scores: (min: 

Run: 1342, exploration: 0.01, score: 34
Scores: (min: 8, avg: 20.41, max: 83)

Run: 1343, exploration: 0.01, score: 31
Scores: (min: 8, avg: 20.54, max: 83)

Run: 1344, exploration: 0.01, score: 13
Scores: (min: 8, avg: 20.45, max: 83)

Run: 1345, exploration: 0.01, score: 34
Scores: (min: 8, avg: 20.6, max: 83)

Run: 1346, exploration: 0.01, score: 20
Scores: (min: 8, avg: 20.63, max: 83)

Run: 1347, exploration: 0.01, score: 21
Scores: (min: 8, avg: 20.72, max: 83)

Run: 1348, exploration: 0.01, score: 15
Scores: (min: 8, avg: 20.72, max: 83)

Run: 1349, exploration: 0.01, score: 45
Scores: (min: 8, avg: 20.96, max: 83)

Run: 1350, exploration: 0.01, score: 22
Scores: (min: 8, avg: 21.08, max: 83)

Run: 1351, exploration: 0.01, score: 21
Scores: (min: 8, avg: 20.78, max: 83)

Run: 1352, exploration: 0.01, score: 34
Scores: (min: 8, avg: 20.94, max: 83)

Run: 1353, exploration: 0.01, score: 37
Scores: (min: 8, avg: 21.04, max: 83)

Run: 1354, exploration: 0.01, score: 15
Scores: (min:

Run: 1446, exploration: 0.01, score: 10
Scores: (min: 8, avg: 21.97, max: 63)

Run: 1447, exploration: 0.01, score: 28
Scores: (min: 8, avg: 22.04, max: 63)

Run: 1448, exploration: 0.01, score: 14
Scores: (min: 8, avg: 22.03, max: 63)

Run: 1449, exploration: 0.01, score: 18
Scores: (min: 8, avg: 21.76, max: 63)

Run: 1450, exploration: 0.01, score: 18
Scores: (min: 8, avg: 21.72, max: 63)

Run: 1451, exploration: 0.01, score: 38
Scores: (min: 8, avg: 21.89, max: 63)

Run: 1452, exploration: 0.01, score: 19
Scores: (min: 8, avg: 21.74, max: 63)

Run: 1453, exploration: 0.01, score: 37
Scores: (min: 8, avg: 21.74, max: 63)

Run: 1454, exploration: 0.01, score: 10
Scores: (min: 8, avg: 21.69, max: 63)

Run: 1455, exploration: 0.01, score: 15
Scores: (min: 8, avg: 21.47, max: 63)

Run: 1456, exploration: 0.01, score: 42
Scores: (min: 8, avg: 21.67, max: 63)

Run: 1457, exploration: 0.01, score: 11
Scores: (min: 8, avg: 21.57, max: 63)

Run: 1458, exploration: 0.01, score: 18
Scores: (min

Run: 1550, exploration: 0.01, score: 36
Scores: (min: 9, avg: 21.2, max: 77)

Run: 1551, exploration: 0.01, score: 34
Scores: (min: 9, avg: 21.16, max: 77)

Run: 1552, exploration: 0.01, score: 13
Scores: (min: 9, avg: 21.1, max: 77)

Run: 1553, exploration: 0.01, score: 10
Scores: (min: 9, avg: 20.83, max: 77)

Run: 1554, exploration: 0.01, score: 16
Scores: (min: 9, avg: 20.89, max: 77)

Run: 1555, exploration: 0.01, score: 42
Scores: (min: 9, avg: 21.16, max: 77)

Run: 1556, exploration: 0.01, score: 14
Scores: (min: 9, avg: 20.88, max: 77)

Run: 1557, exploration: 0.01, score: 48
Scores: (min: 9, avg: 21.25, max: 77)

Run: 1558, exploration: 0.01, score: 33
Scores: (min: 9, avg: 21.4, max: 77)

Run: 1559, exploration: 0.01, score: 20
Scores: (min: 9, avg: 21.23, max: 77)

Run: 1560, exploration: 0.01, score: 15
Scores: (min: 9, avg: 21.13, max: 77)

Run: 1561, exploration: 0.01, score: 10
Scores: (min: 9, avg: 20.98, max: 77)

Run: 1562, exploration: 0.01, score: 30
Scores: (min: 9

Run: 1654, exploration: 0.01, score: 16
Scores: (min: 8, avg: 19.95, max: 61)

Run: 1655, exploration: 0.01, score: 64
Scores: (min: 8, avg: 20.17, max: 64)

Run: 1656, exploration: 0.01, score: 11
Scores: (min: 8, avg: 20.14, max: 64)

Run: 1657, exploration: 0.01, score: 15
Scores: (min: 8, avg: 19.81, max: 64)

Run: 1658, exploration: 0.01, score: 15
Scores: (min: 8, avg: 19.63, max: 64)

Run: 1659, exploration: 0.01, score: 12
Scores: (min: 8, avg: 19.55, max: 64)

Run: 1660, exploration: 0.01, score: 11
Scores: (min: 8, avg: 19.51, max: 64)

Run: 1661, exploration: 0.01, score: 10
Scores: (min: 8, avg: 19.51, max: 64)

Run: 1662, exploration: 0.01, score: 24
Scores: (min: 8, avg: 19.45, max: 64)

Run: 1663, exploration: 0.01, score: 24
Scores: (min: 8, avg: 19.55, max: 64)

Run: 1664, exploration: 0.01, score: 28
Scores: (min: 8, avg: 19.72, max: 64)

Run: 1665, exploration: 0.01, score: 12
Scores: (min: 8, avg: 19.74, max: 64)

Run: 1666, exploration: 0.01, score: 15
Scores: (min

Run: 1758, exploration: 0.01, score: 19
Scores: (min: 8, avg: 20.25, max: 55)

Run: 1759, exploration: 0.01, score: 18
Scores: (min: 8, avg: 20.31, max: 55)

Run: 1760, exploration: 0.01, score: 69
Scores: (min: 8, avg: 20.89, max: 69)

Run: 1761, exploration: 0.01, score: 20
Scores: (min: 8, avg: 20.99, max: 69)

Run: 1762, exploration: 0.01, score: 10
Scores: (min: 8, avg: 20.85, max: 69)

Run: 1763, exploration: 0.01, score: 12
Scores: (min: 8, avg: 20.73, max: 69)

Run: 1764, exploration: 0.01, score: 11
Scores: (min: 8, avg: 20.56, max: 69)

Run: 1765, exploration: 0.01, score: 13
Scores: (min: 8, avg: 20.57, max: 69)

Run: 1766, exploration: 0.01, score: 12
Scores: (min: 8, avg: 20.54, max: 69)

Run: 1767, exploration: 0.01, score: 26
Scores: (min: 8, avg: 20.63, max: 69)

Run: 1768, exploration: 0.01, score: 34
Scores: (min: 8, avg: 20.88, max: 69)

Run: 1769, exploration: 0.01, score: 15
Scores: (min: 8, avg: 20.91, max: 69)

Run: 1770, exploration: 0.01, score: 18
Scores: (min

KeyboardInterrupt: 
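
The `Scores:` lines come from `ScoreLogger`, whose source is not shown in this notebook. The logged values look consistent with a minimum, average, and maximum taken over a sliding window of recent runs. The sketch below shows that bookkeeping under two assumptions: a window length of 100 and the hypothetical class name `RollingScores`. It is not the contents of score_logger.py.

```python
from collections import deque
from statistics import mean


class RollingScores:
    """Hypothetical stand-in for the statistics printed on each `Scores:` line."""

    def __init__(self, window=100):  # window length is an assumption
        self.scores = deque(maxlen=window)

    def add_score(self, score):
        """Record one episode score and return (min, avg, max) over the window."""
        self.scores.append(score)
        return min(self.scores), mean(self.scores), max(self.scores)


tracker = RollingScores()
for score in (31, 13, 15):  # scores of Runs 1-3 from the log above
    stats = tracker.add_score(score)
print(stats)
```

After the third score the tuple should match the logged `min: 13, avg: 19.66..., max: 31`. Once the window is full, old scores drop out, which would explain how the logged minimum and maximum can change between the excerpts above.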

In [7]:
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.001
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.5  # raised from 1.0; while the rate is >= 1, act() always picks a random action
EXPLORATION_MIN = 0.1  # raised from 0.01
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print(f"Run: {run}, exploration: {dqn_solver.exploration_rate}, score: {step}")  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay() 

In [8]:
cartpole()

Run: 1, exploration: 1.3501311418098665, score: 41
Scores: (min: 41, avg: 41, max: 41)

Run: 2, exploration: 1.1159712930101071, score: 39
Scores: (min: 39, avg: 40, max: 41)

Run: 3, exploration: 1.0299646397808, score: 17
Scores: (min: 17, avg: 32.333333333333336, max: 41)

Run: 4, exploration: 0.9747117539024632, score: 12
Scores: (min: 12, avg: 27.25, max: 41)

Run: 5, exploration: 0.9132217635538613, score: 14
Scores: (min: 12, avg: 24.6, max: 41)

Run: 6, exploration: 0.7288109456044766, score: 46
Scores: (min: 12, avg: 28.166666666666668, max: 46)

Run: 7, exploration: 0.605436882358516, score: 38
Scores: (min: 12, avg: 29.571428571428573, max: 46)

Run: 8, exploration: 0.5729579664079543, score: 12
Scores: (min: 12, avg: 27.375, max: 46)

Run: 9, exploration: 0.5504367325892504, score: 9
Scores: (min: 9, avg: 25.333333333333332, max: 46)

Run: 10, exploration: 0.5261567362272512, score: 10
Scores: (min: 9, avg: 23.8, max: 46)

Run: 11, exploration: 0.5004330020385458, score: 11

Run: 89, exploration: 0.1, score: 17
Scores: (min: 8, avg: 13.404494382022472, max: 46)

Run: 90, exploration: 0.1, score: 21
Scores: (min: 8, avg: 13.488888888888889, max: 46)

Run: 91, exploration: 0.1, score: 28
Scores: (min: 8, avg: 13.648351648351648, max: 46)

Run: 92, exploration: 0.1, score: 32
Scores: (min: 8, avg: 13.847826086956522, max: 46)

Run: 93, exploration: 0.1, score: 25
Scores: (min: 8, avg: 13.96774193548387, max: 46)

Run: 94, exploration: 0.1, score: 33
Scores: (min: 8, avg: 14.170212765957446, max: 46)

Run: 95, exploration: 0.1, score: 23
Scores: (min: 8, avg: 14.263157894736842, max: 46)

Run: 96, exploration: 0.1, score: 18
Scores: (min: 8, avg: 14.302083333333334, max: 46)

Run: 97, exploration: 0.1, score: 22
Scores: (min: 8, avg: 14.381443298969073, max: 46)

Run: 98, exploration: 0.1, score: 40
Scores: (min: 8, avg: 14.642857142857142, max: 46)

Run: 99, exploration: 0.1, score: 45
Scores: (min: 8, avg: 14.94949494949495, max: 46)

Run: 100, exploration: 

Run: 192, exploration: 0.1, score: 162
Scores: (min: 13, avg: 112.8, max: 434)

Run: 193, exploration: 0.1, score: 232
Scores: (min: 13, avg: 114.87, max: 434)

Run: 194, exploration: 0.1, score: 187
Scores: (min: 13, avg: 116.41, max: 434)

Run: 195, exploration: 0.1, score: 158
Scores: (min: 13, avg: 117.76, max: 434)

Run: 196, exploration: 0.1, score: 234
Scores: (min: 13, avg: 119.92, max: 434)

Run: 197, exploration: 0.1, score: 152
Scores: (min: 13, avg: 121.22, max: 434)

Run: 198, exploration: 0.1, score: 172
Scores: (min: 13, avg: 122.54, max: 434)

Run: 199, exploration: 0.1, score: 405
Scores: (min: 13, avg: 126.14, max: 434)

Run: 200, exploration: 0.1, score: 190
Scores: (min: 13, avg: 127.75, max: 434)

Run: 201, exploration: 0.1, score: 260
Scores: (min: 13, avg: 129.74, max: 434)

Run: 202, exploration: 0.1, score: 163
Scores: (min: 13, avg: 130.91, max: 434)

Run: 203, exploration: 0.1, score: 234
Scores: (min: 13, avg: 132.87, max: 434)

Run: 204, exploration: 0.1, s

Run: 295, exploration: 0.1, score: 11
Scores: (min: 8, avg: 140.27, max: 436)

Run: 296, exploration: 0.1, score: 10
Scores: (min: 8, avg: 138.03, max: 436)

Run: 297, exploration: 0.1, score: 10
Scores: (min: 8, avg: 136.61, max: 436)

Run: 298, exploration: 0.1, score: 10
Scores: (min: 8, avg: 134.99, max: 436)

Run: 299, exploration: 0.1, score: 10
Scores: (min: 8, avg: 131.04, max: 436)

Run: 300, exploration: 0.1, score: 8
Scores: (min: 8, avg: 129.22, max: 436)

Run: 301, exploration: 0.1, score: 9
Scores: (min: 8, avg: 126.71, max: 436)

Run: 302, exploration: 0.1, score: 13
Scores: (min: 8, avg: 125.21, max: 436)

Run: 303, exploration: 0.1, score: 10
Scores: (min: 8, avg: 122.97, max: 436)

Run: 304, exploration: 0.1, score: 12
Scores: (min: 8, avg: 121.6, max: 436)

Run: 305, exploration: 0.1, score: 10
Scores: (min: 8, avg: 119.67, max: 436)

Run: 306, exploration: 0.1, score: 9
Scores: (min: 8, avg: 118.26, max: 436)

Run: 307, exploration: 0.1, score: 9
Scores: (min: 8, av

Run: 401, exploration: 0.1, score: 73
Scores: (min: 8, avg: 12.04, max: 98)

Run: 402, exploration: 0.1, score: 55
Scores: (min: 8, avg: 12.46, max: 98)

Run: 403, exploration: 0.1, score: 34
Scores: (min: 8, avg: 12.7, max: 98)

Run: 404, exploration: 0.1, score: 64
Scores: (min: 8, avg: 13.22, max: 98)

Run: 405, exploration: 0.1, score: 54
Scores: (min: 8, avg: 13.66, max: 98)

Run: 406, exploration: 0.1, score: 61
Scores: (min: 8, avg: 14.18, max: 98)

Run: 407, exploration: 0.1, score: 56
Scores: (min: 8, avg: 14.65, max: 98)

Run: 408, exploration: 0.1, score: 56
Scores: (min: 8, avg: 15.11, max: 98)

Run: 409, exploration: 0.1, score: 58
Scores: (min: 8, avg: 15.59, max: 98)

Run: 410, exploration: 0.1, score: 67
Scores: (min: 8, avg: 16.16, max: 98)

Run: 411, exploration: 0.1, score: 105
Scores: (min: 8, avg: 17.1, max: 105)

Run: 412, exploration: 0.1, score: 80
Scores: (min: 8, avg: 17.8, max: 105)

Run: 413, exploration: 0.1, score: 144
Scores: (min: 8, avg: 19.14, max: 144

Run: 505, exploration: 0.1, score: 187
Scores: (min: 23, avg: 177.98, max: 405)

Run: 506, exploration: 0.1, score: 134
Scores: (min: 23, avg: 178.71, max: 405)

Run: 507, exploration: 0.1, score: 185
Scores: (min: 23, avg: 180, max: 405)

Run: 508, exploration: 0.1, score: 207
Scores: (min: 23, avg: 181.51, max: 405)

Run: 509, exploration: 0.1, score: 186
Scores: (min: 23, avg: 182.79, max: 405)

Run: 510, exploration: 0.1, score: 182
Scores: (min: 23, avg: 183.94, max: 405)

Run: 511, exploration: 0.1, score: 137
Scores: (min: 23, avg: 184.26, max: 405)

Run: 512, exploration: 0.1, score: 202
Scores: (min: 23, avg: 185.48, max: 405)

Run: 513, exploration: 0.1, score: 135
Scores: (min: 23, avg: 185.39, max: 405)

Run: 514, exploration: 0.1, score: 125
Scores: (min: 23, avg: 184.44, max: 405)

Run: 515, exploration: 0.1, score: 205
Scores: (min: 23, avg: 184.61, max: 405)

Run: 516, exploration: 0.1, score: 243
Scores: (min: 23, avg: 182.99, max: 375)

Run: 517, exploration: 0.1, sco

NameError: name 'exit' is not defined

The goal of the agent in the CartPole problem is to balance a pole on a cart by applying force to move the cart left or right. The agent tries to keep the pole upright for as long as possible without the pole falling over or the cart leaving the predefined boundaries.
The state values for the CartPole problem are the cart position, cart velocity, pole angle, and pole angular velocity. The agent changes these values by taking actions. Two actions are available: apply force to move the cart left or apply force to move the cart right.
This example of reinforcement learning uses the Deep Q-learning algorithm, implemented in the DQNSolver class. The algorithm uses a neural network to approximate the Q-value function, which enables the agent to make decisions in a continuous state space.
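The way Q-values turn into decisions can be sketched with a small epsilon-greedy helper, similar in spirit to the `act` method in the notebook. This is a minimal sketch using only the standard library; the function name `choose_action` and the example Q-values are assumptions made for illustration.

```python
import random


def choose_action(q_values, exploration_rate, rng=random):
    """Epsilon-greedy choice over a list of Q-values (one per action)."""
    # With probability exploration_rate, explore by picking a random action.
    if rng.random() < exploration_rate:
        return rng.randrange(len(q_values))
    # Otherwise exploit: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-values for [push left, push right] in some state.
q_values = [0.8, 1.3]
greedy_action = choose_action(q_values, exploration_rate=0.0)
```

With an exploration rate of 0 the helper always returns the action with the largest Q-value; with a rate of 1 or more it always returns a random action.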
Experience replay is used to improve the stability and efficiency of training in reinforcement learning. In the CartPole problem, the agent stores its experiences in a buffer during training. Each experience contains the current state, the action taken, the reward received, the next state, and a flag marking whether the episode ended. Random batches of stored experiences are then sampled to update the network.
The discount factor determines the importance of future rewards. The higher the value (closer to 1), the more the agent prioritizes long-term rewards, which encourages pole balancing. The lower the value (closer to 0), the more the agent focuses on immediate rewards, which may result in short-term actions that are less effective for balancing the pole.
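The replay buffer and the role of the discount factor can be illustrated with a small standard-library sketch. The helper names `td_target` and `discounted_return`, the buffer size, and the placeholder transitions are assumptions for illustration; only `GAMMA = 0.95` and the batch size of 20 come from the notebook.

```python
import random
from collections import deque

GAMMA = 0.95  # discount factor used in the notebook


def td_target(reward, next_q_values, terminal, gamma=GAMMA):
    """Target for the taken action: reward plus the discounted best next Q-value."""
    if terminal:
        return reward  # no future reward after the episode ends
    return reward + gamma * max(next_q_values)


def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Fill a small buffer with placeholder (state, action, reward, next_state, done) tuples.
memory = deque(maxlen=1000)
for step in range(50):
    memory.append((step, step % 2, 1.0, step + 1, step == 49))

# Sample a random batch, as experience replay does before each update.
batch = random.sample(list(memory), 20)

target = td_target(1.0, [0.5, 2.0], terminal=False)
far_sighted = discounted_return([1.0] * 100, 0.95)
short_sighted = discounted_return([1.0] * 100, 0.5)
```

With 100 rewards of 1, a discount factor of 0.95 credits the agent with nearly 20 units of return, while 0.5 credits it with about 2, so the lower value makes long balancing runs look barely better than short ones.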
The neural network architecture for solving the CartPole problem consists of an input layer, hidden layers, and an output layer. In this notebook, the network takes the four state values as input, passes them through two hidden layers of 24 ReLU units each, and then passes the processed data to a linear output layer that produces one Q-value for each possible action.
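The flow of data through this architecture can be sketched as a plain-Python forward pass. This is only an illustration with random, untrained weights; the helper names (`dense`, `random_layer`, `q_values_for`) and the sample state are assumptions, and the notebook itself builds the network with Keras.

```python
import random


def dense(inputs, weights, biases, relu):
    """One fully connected layer: weighted sum per neuron, optional ReLU."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = bias + sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(max(0.0, total) if relu else total)
    return outputs


def random_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer (untrained, illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
# 4 state values -> 24 hidden units -> 24 hidden units -> 2 Q-values.
layers = [random_layer(4, 24, rng), random_layer(24, 24, rng), random_layer(24, 2, rng)]


def q_values_for(state):
    """Run a state through the hidden ReLU layers and the linear output layer."""
    values = state
    for index, (weights, biases) in enumerate(layers):
        is_output = index == len(layers) - 1
        values = dense(values, weights, biases, relu=not is_output)
    return values


# Hypothetical state: cart position, cart velocity, pole angle, pole angular velocity.
state = [0.02, -0.01, 0.03, 0.04]
q_values = q_values_for(state)
parameter_count = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

The output list has one entry per action, and the layer sizes give 770 trainable parameters in total.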
The neural network makes the Q-learning algorithm more efficient by approximating the Q-value function, which allows the agent to learn from its experiences and make better decisions.
A higher learning rate produces faster updates but can lead to instability. A lower learning rate trains more slowly and requires more episodes but is more stable.
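The trade-off can be seen in a toy problem that is much simpler than the DQN: gradient descent on f(w) = w², where each update multiplies w by (1 - 2 × learning rate). The function name and the three learning rates below are assumptions chosen for illustration, not values from the notebook.

```python
def minimize_quadratic(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w


slow = minimize_quadratic(0.01)     # small steps: stable but still far from 0
fast = minimize_quadratic(0.4)      # larger steps: reaches the minimum quickly
unstable = minimize_quadratic(1.1)  # too large: each step overshoots and grows
```

After 20 steps the small rate has only moved part of the way toward the minimum at 0, the moderate rate is essentially there, and the large rate has moved further away than where it started.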

PyLessons. (n.d.). https://pylessons.com/CartPole-reinforcement-learning
Chan, M. (2018, June 19). Cart-Pole Balancing with Q-Learning - Matthew Chan - Medium. Medium. https://medium.com/@tuzzer/cart-pole-balancing-with-q-learning-b54c6068d947


Please note: 
    example 1: solved in 449 runs, with 549 total runs (base code)
    example 2: (increased learning rate) ended after approximately 2000 episodes with very low scores; a very inefficient solution
    example 3: solved in 446 runs, with 546 total runs (increased exploration factor)
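To put the exploration settings of examples 1 and 3 side by side, the decay schedule can be simulated on its own. This is a minimal sketch: the helper name `decay_steps_until` is an assumption, and it counts decay steps (calls to `experience_replay` once the memory holds at least one batch), not episodes.

```python
EXPLORATION_DECAY = 0.995  # same decay factor in both cells


def decay_steps_until(start, threshold, decay=EXPLORATION_DECAY):
    """Count multiplicative decay steps until the rate is at or below threshold."""
    rate, steps = start, 0
    while rate > threshold:
        rate *= decay
        steps += 1
    return steps


# Base code: exploration decays from 1.0 down to the 0.01 floor.
base_steps = decay_steps_until(1.0, 0.01)
# Increased exploration factor: decays from 1.5 down to the 0.1 floor.
boosted_steps = decay_steps_until(1.5, 0.1)
# Decay steps before the boosted rate first drops below 1.0.
pure_random_steps = decay_steps_until(1.5, 1.0)
```

Under these assumptions, starting at 1.5 mainly adds a short phase in which every action is random, while the higher floor of 0.1 is reached in fewer decay steps than the base floor of 0.01. The logged rate of about 1.3501 at the end of run 1 is consistent with 21 decay steps from 1.5.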
    
