# Module Five Assignment: Cartpole Problem
Review the code in this notebook and in the score_logger.py file in the *scores* folder (directory). Once you have reviewed the code, return to this notebook and select **Cell** and then **Run All** from the menu bar to run this code. The code takes several minutes to run.

In [1]:
## This is the initial code imported from
## the Cartpole zipped file.
## Solved in 26 runs, with 126 total runs.
import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()  



Using TensorFlow backend.
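
The `experience_replay` method in the cell above trains the network toward a one-step Q-learning target: the reward alone for a terminal transition, otherwise the reward plus `GAMMA` times the largest predicted Q-value of the next state. The following is a minimal NumPy sketch of that target, kept separate from Keras. The helper name `q_target` and the sample Q-values are made up for illustration.

```python
import numpy as np

GAMMA = 0.95  # same discount factor as the cell above


def q_target(reward, next_q_values, terminal, gamma=GAMMA):
    """One-step Q-learning target for a single transition (hypothetical helper)."""
    if terminal:
        # No future return is added once the episode has ended.
        return float(reward)
    return float(reward + gamma * np.amax(next_q_values))


# Made-up Q-values for the two CartPole actions in the next state.
next_q = np.array([1.0, 2.0])
print(q_target(1.0, next_q, terminal=False))  # reward + GAMMA * max(next_q)
print(q_target(-1.0, next_q, terminal=True))  # terminal: only the negated reward
```

In the notebook, this value overwrites the predicted Q-value of the action that was taken, and the network is then fit on the adjusted prediction.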


In [2]:
cartpole()

Run: 1, exploration: 0.995, score: 21
Scores: (min: 21, avg: 21, max: 21)

Run: 2, exploration: 0.9137248860125932, score: 18
Scores: (min: 18, avg: 19.5, max: 21)

Run: 3, exploration: 0.7590483508202912, score: 38
Scores: (min: 18, avg: 25.666666666666668, max: 38)

Run: 4, exploration: 0.7219385759785162, score: 11
Scores: (min: 11, avg: 22, max: 38)

Run: 5, exploration: 0.6763948591909945, score: 14
Scores: (min: 11, avg: 20.4, max: 38)

Run: 6, exploration: 0.5848838636585911, score: 30
Scores: (min: 11, avg: 22, max: 38)

Run: 7, exploration: 0.5562889678716474, score: 11
Scores: (min: 11, avg: 20.428571428571427, max: 38)

Run: 8, exploration: 0.5211953074858876, score: 14
Scores: (min: 11, avg: 19.625, max: 38)

Run: 9, exploration: 0.49571413690105054, score: 11
Scores: (min: 11, avg: 18.666666666666668, max: 38)

Run: 10, exploration: 0.47147873742168567, score: 11
Scores: (min: 11, avg: 17.9, max: 38)

Run: 11, exploration: 0.43952667968844233, score: 15
Scores: (min: 11, a

Run: 91, exploration: 0.01, score: 122
Scores: (min: 9, avg: 151.8131868131868, max: 376)

Run: 92, exploration: 0.01, score: 212
Scores: (min: 9, avg: 152.4673913043478, max: 376)

Run: 93, exploration: 0.01, score: 191
Scores: (min: 9, avg: 152.88172043010752, max: 376)

Run: 94, exploration: 0.01, score: 236
Scores: (min: 9, avg: 153.7659574468085, max: 376)

Run: 95, exploration: 0.01, score: 248
Scores: (min: 9, avg: 154.7578947368421, max: 376)

Run: 96, exploration: 0.01, score: 156
Scores: (min: 9, avg: 154.77083333333334, max: 376)

Run: 97, exploration: 0.01, score: 222
Scores: (min: 9, avg: 155.46391752577318, max: 376)

Run: 98, exploration: 0.01, score: 106
Scores: (min: 9, avg: 154.9591836734694, max: 376)

Run: 99, exploration: 0.01, score: 115
Scores: (min: 9, avg: 154.55555555555554, max: 376)

Run: 100, exploration: 0.01, score: 117
Scores: (min: 9, avg: 154.18, max: 376)

Run: 101, exploration: 0.01, score: 189
Scores: (min: 9, avg: 155.86, max: 376)

Run: 102, explo

NameError: name 'exit' is not defined
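
The `Scores:` lines in the output above read like a minimum, mean, and maximum over recent runs. The sketch below shows a small logger of that kind. It is not the actual `score_logger.py`: the class name `RollingScores`, the 100-run window, and the 195 threshold are all assumptions. It returns a solved flag instead of ending the process. If the real logger stops the run by calling `exit()`, that name may be unavailable inside a notebook kernel, which would be one possible explanation for the `NameError` above.

```python
from collections import deque
from statistics import mean


class RollingScores:
    """Hypothetical stand-in for a score logger; not the real score_logger.py."""

    def __init__(self, window=100, solve_threshold=195):
        # Both defaults are assumptions chosen for illustration.
        self.scores = deque(maxlen=window)
        self.window = window
        self.solve_threshold = solve_threshold

    def add_score(self, score, run):
        """Record a score, print a summary, and report whether the task looks solved."""
        self.scores.append(score)
        avg = mean(self.scores)
        print("Scores: (min: " + str(min(self.scores)) + ", avg: " + str(avg)
              + ", max: " + str(max(self.scores)) + ")")
        # Solved only once the window is full and its mean clears the threshold.
        return len(self.scores) == self.window and avg >= self.solve_threshold


logger = RollingScores()
for run, score in enumerate([21, 18, 38], start=1):
    solved = logger.add_score(score, run)
```

With a flag like this, the training loop in `cartpole()` could `return` when the flag is true, so the cell would finish without raising.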

In [2]:
## First Modification

## I modified the exploration parameters: EXPLORATION_MAX (0.98),
## EXPLORATION_MIN (0.15), and EXPLORATION_DECAY (0.970).
## Solved in 30 runs, 130 total runs.

import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  

from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 0.98  
EXPLORATION_MIN = 0.15  
EXPLORATION_DECAY = 0.970  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()
            

Using TensorFlow backend.


In [3]:
cartpole()

Run: 1, exploration: 0.4176713010652507, score: 48
Scores: (min: 48, avg: 48, max: 48)

Run: 2, exploration: 0.30800089451711105, score: 11
Scores: (min: 11, avg: 29.5, max: 48)

Run: 3, exploration: 0.22712729072213747, score: 11
Scores: (min: 11, avg: 23.333333333333332, max: 48)

Run: 4, exploration: 0.16748914405478274, score: 11
Scores: (min: 11, avg: 20.25, max: 48)

Run: 5, exploration: 0.15, score: 16
Scores: (min: 11, avg: 19.4, max: 48)

Run: 6, exploration: 0.15, score: 10
Scores: (min: 10, avg: 17.833333333333332, max: 48)

Run: 7, exploration: 0.15, score: 9
Scores: (min: 9, avg: 16.571428571428573, max: 48)

Run: 8, exploration: 0.15, score: 11
Scores: (min: 9, avg: 15.875, max: 48)

Run: 9, exploration: 0.15, score: 11
Scores: (min: 9, avg: 15.333333333333334, max: 48)

Run: 10, exploration: 0.15, score: 17
Scores: (min: 9, avg: 15.5, max: 48)

Run: 11, exploration: 0.15, score: 13
Scores: (min: 9, avg: 15.272727272727273, max: 48)

Run: 12, exploration: 0.15, score: 10


Run: 94, exploration: 0.15, score: 340
Scores: (min: 8, avg: 121.18085106382979, max: 500)

Run: 95, exploration: 0.15, score: 125
Scores: (min: 8, avg: 121.22105263157894, max: 500)

Run: 96, exploration: 0.15, score: 179
Scores: (min: 8, avg: 121.82291666666667, max: 500)

Run: 97, exploration: 0.15, score: 308
Scores: (min: 8, avg: 123.74226804123711, max: 500)

Run: 98, exploration: 0.15, score: 500
Scores: (min: 8, avg: 127.58163265306122, max: 500)

Run: 99, exploration: 0.15, score: 194
Scores: (min: 8, avg: 128.25252525252526, max: 500)

Run: 100, exploration: 0.15, score: 198
Scores: (min: 8, avg: 128.95, max: 500)

Run: 101, exploration: 0.15, score: 148
Scores: (min: 8, avg: 129.95, max: 500)

Run: 102, exploration: 0.15, score: 500
Scores: (min: 8, avg: 134.84, max: 500)

Run: 103, exploration: 0.15, score: 116
Scores: (min: 8, avg: 135.89, max: 500)

Run: 104, exploration: 0.15, score: 208
Scores: (min: 8, avg: 137.86, max: 500)

Run: 105, exploration: 0.15, score: 500
Sco

NameError: name 'exit' is not defined
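
Because `experience_replay` multiplies the exploration rate by `EXPLORATION_DECAY` once per call and floors it at `EXPLORATION_MIN`, the number of replay calls needed to reach the floor follows from a logarithm. The sketch below compares the baseline settings with those of the first modification. `steps_to_floor` is a hypothetical helper, not part of the notebook.

```python
import math


def steps_to_floor(eps_max, eps_min, decay):
    """Smallest n with eps_max * decay**n <= eps_min (hypothetical helper)."""
    return math.ceil(math.log(eps_min / eps_max) / math.log(decay))


baseline = steps_to_floor(1.0, 0.01, 0.995)
modified = steps_to_floor(0.98, 0.15, 0.970)
print("baseline:", baseline, "replay calls")
print("first modification:", modified, "replay calls")
```

The modified settings reach the floor after far fewer replay calls than the baseline, which is consistent with the log above, where the exploration rate already reads 0.15 by run 5.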

In [None]:
## Second Modification

## I modified the discount factor, GAMMA, from 0.95 to 0.80.
## Solved in 84 runs, 184 total runs.

import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.80  
LEARNING_RATE = 0.001  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()

In [4]:
cartpole()

Run: 1, exploration: 0.7450264374814737, score: 29
Scores: (min: 29, avg: 29, max: 29)

Run: 2, exploration: 0.4863802860551433, score: 15
Scores: (min: 15, avg: 22, max: 29)

Run: 3, exploration: 0.34790850104962573, score: 12
Scores: (min: 12, avg: 18.666666666666668, max: 29)

Run: 4, exploration: 0.25655612262584343, score: 11
Scores: (min: 11, avg: 16.75, max: 29)

Run: 5, exploration: 0.16748914405478274, score: 15
Scores: (min: 11, avg: 16.4, max: 29)

Run: 6, exploration: 0.15, score: 11
Scores: (min: 11, avg: 15.5, max: 29)

Run: 7, exploration: 0.15, score: 9
Scores: (min: 9, avg: 14.571428571428571, max: 29)

Run: 8, exploration: 0.15, score: 8
Scores: (min: 8, avg: 13.75, max: 29)

Run: 9, exploration: 0.15, score: 12
Scores: (min: 8, avg: 13.555555555555555, max: 29)

Run: 10, exploration: 0.15, score: 10
Scores: (min: 8, avg: 13.2, max: 29)

Run: 11, exploration: 0.15, score: 9
Scores: (min: 8, avg: 12.818181818181818, max: 29)

Run: 12, exploration: 0.15, score: 9
Scores

Run: 94, exploration: 0.15, score: 32
Scores: (min: 8, avg: 127.69148936170212, max: 420)

Run: 95, exploration: 0.15, score: 367
Scores: (min: 8, avg: 130.21052631578948, max: 420)

Run: 96, exploration: 0.15, score: 32
Scores: (min: 8, avg: 129.1875, max: 420)

Run: 97, exploration: 0.15, score: 17
Scores: (min: 8, avg: 128.03092783505156, max: 420)

Run: 98, exploration: 0.15, score: 237
Scores: (min: 8, avg: 129.14285714285714, max: 420)

Run: 99, exploration: 0.15, score: 213
Scores: (min: 8, avg: 129.989898989899, max: 420)

Run: 100, exploration: 0.15, score: 139
Scores: (min: 8, avg: 130.08, max: 420)

Run: 101, exploration: 0.15, score: 119
Scores: (min: 8, avg: 130.98, max: 420)

Run: 102, exploration: 0.15, score: 111
Scores: (min: 8, avg: 131.94, max: 420)

Run: 103, exploration: 0.15, score: 198
Scores: (min: 8, avg: 133.8, max: 420)

Run: 104, exploration: 0.15, score: 190
Scores: (min: 8, avg: 135.59, max: 420)

Run: 105, exploration: 0.15, score: 293
Scores: (min: 8, av

NameError: name 'exit' is not defined
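
Lowering `GAMMA`, as the second modification does, shortens how far ahead the agent effectively looks. With a reward of 1 per step, the discounted return over n steps is (1 - GAMMA^n) / (1 - GAMMA), which approaches 1 / (1 - GAMMA) as n grows. The sketch below compares the two discount factors used in this notebook. The function names are illustrative.

```python
def discounted_return(gamma, n_steps, reward=1.0):
    """Sum of reward * gamma**t for t = 0 .. n_steps - 1."""
    return sum(reward * gamma ** t for t in range(n_steps))


def horizon(gamma):
    """Limit of the discounted return for a constant reward of 1."""
    return 1.0 / (1.0 - gamma)


for gamma in (0.95, 0.80):
    print(gamma, round(horizon(gamma), 2), round(discounted_return(gamma, 500), 2))
```

At 0.95 the limit is about 20 steps' worth of reward, and at 0.80 it is about 5, so rewards more than a few steps ahead contribute little to the target under the lower setting.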

In [None]:
## Third Modification

## I modified the learning rate (LEARNING_RATE), changing the value from 0.001 to 0.01.

import random  
import gym  
import numpy as np  
from collections import deque  
from keras.models import Sequential  
from keras.layers import Dense  
from keras.optimizers import Adam  
  
  
from scores.score_logger import ScoreLogger  
  
ENV_NAME = "CartPole-v1"  
  
GAMMA = 0.95  
LEARNING_RATE = 0.01  
  
MEMORY_SIZE = 1000000  
BATCH_SIZE = 20  
  
EXPLORATION_MAX = 1.0  
EXPLORATION_MIN = 0.01  
EXPLORATION_DECAY = 0.995  
  
  
class DQNSolver:  
  
    def __init__(self, observation_space, action_space):  
        self.exploration_rate = EXPLORATION_MAX  
  
        self.action_space = action_space  
        self.memory = deque(maxlen=MEMORY_SIZE)  
  
        self.model = Sequential()  
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))  
        self.model.add(Dense(24, activation="relu"))  
        self.model.add(Dense(self.action_space, activation="linear"))  
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))  
  
    def remember(self, state, action, reward, next_state, done):  
        self.memory.append((state, action, reward, next_state, done))  
  
    def act(self, state):  
        if np.random.rand() < self.exploration_rate:  
            return random.randrange(self.action_space)  
        q_values = self.model.predict(state)  
        return np.argmax(q_values[0])  
  
    def experience_replay(self):  
        if len(self.memory) < BATCH_SIZE:  
            return  
        batch = random.sample(self.memory, BATCH_SIZE)  
        for state, action, reward, state_next, terminal in batch:  
            q_update = reward  
            if not terminal:  
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))  
            q_values = self.model.predict(state)  
            q_values[0][action] = q_update  
            self.model.fit(state, q_values, verbose=0)  
        self.exploration_rate *= EXPLORATION_DECAY  
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)  
  
  
def cartpole():  
    env = gym.make(ENV_NAME)  
    score_logger = ScoreLogger(ENV_NAME)  
    observation_space = env.observation_space.shape[0]  
    action_space = env.action_space.n  
    dqn_solver = DQNSolver(observation_space, action_space)  
    run = 0  
    while True:  
        run += 1  
        state = env.reset()  
        state = np.reshape(state, [1, observation_space])  
        step = 0  
        while True:  
            step += 1  
            #env.render()  
            action = dqn_solver.act(state)  
            state_next, reward, terminal, info = env.step(action)  
            reward = reward if not terminal else -reward  
            state_next = np.reshape(state_next, [1, observation_space])  
            dqn_solver.remember(state, action, reward, state_next, terminal)  
            state = state_next  
            if terminal:  
                print ("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))  
                score_logger.add_score(step, run)  
                break  
            dqn_solver.experience_replay()

In [None]:
cartpole()

Run: 1, exploration: 0.7450264374814737, score: 29
Scores: (min: 29, avg: 29, max: 29)

Run: 2, exploration: 0.5494004701734145, score: 11
Scores: (min: 11, avg: 20, max: 29)

Run: 3, exploration: 0.35866855778311935, score: 15
Scores: (min: 11, avg: 18.333333333333332, max: 29)

Run: 4, exploration: 0.24139365577865607, score: 14
Scores: (min: 11, avg: 17.25, max: 29)

Run: 5, exploration: 0.15759053564114506, score: 15
Scores: (min: 11, avg: 16.8, max: 29)

Run: 6, exploration: 0.15, score: 9
Scores: (min: 9, avg: 15.5, max: 29)

Run: 7, exploration: 0.15, score: 9
Scores: (min: 9, avg: 14.571428571428571, max: 29)

Run: 8, exploration: 0.15, score: 11
Scores: (min: 9, avg: 14.125, max: 29)

Run: 9, exploration: 0.15, score: 14
Scores: (min: 9, avg: 14.11111111111111, max: 29)

Run: 10, exploration: 0.15, score: 12
Scores: (min: 9, avg: 13.9, max: 29)

Run: 11, exploration: 0.15, score: 11
Scores: (min: 9, avg: 13.636363636363637, max: 29)

Run: 12, exploration: 0.15, score: 8
Scores

Run: 94, exploration: 0.15, score: 211
Scores: (min: 8, avg: 130.14893617021278, max: 395)

Run: 95, exploration: 0.15, score: 157
Scores: (min: 8, avg: 130.43157894736842, max: 395)

Run: 96, exploration: 0.15, score: 20
Scores: (min: 8, avg: 129.28125, max: 395)

Run: 97, exploration: 0.15, score: 14
Scores: (min: 8, avg: 128.09278350515464, max: 395)

Run: 98, exploration: 0.15, score: 142
Scores: (min: 8, avg: 128.23469387755102, max: 395)

Run: 99, exploration: 0.15, score: 159
Scores: (min: 8, avg: 128.54545454545453, max: 395)

Run: 100, exploration: 0.15, score: 74
Scores: (min: 8, avg: 128, max: 395)

Run: 101, exploration: 0.15, score: 113
Scores: (min: 8, avg: 128.84, max: 395)

Run: 102, exploration: 0.15, score: 129
Scores: (min: 8, avg: 130.02, max: 395)

Run: 103, exploration: 0.15, score: 154
Scores: (min: 8, avg: 131.41, max: 395)

Run: 104, exploration: 0.15, score: 383
Scores: (min: 8, avg: 135.1, max: 395)

Run: 105, exploration: 0.15, score: 263
Scores: (min: 8, av

Run: 194, exploration: 0.15, score: 200
Scores: (min: 10, avg: 152.42, max: 383)

Run: 195, exploration: 0.15, score: 139
Scores: (min: 10, avg: 152.24, max: 383)

Run: 196, exploration: 0.15, score: 71
Scores: (min: 10, avg: 152.75, max: 383)

Run: 197, exploration: 0.15, score: 186
Scores: (min: 10, avg: 154.47, max: 383)

Run: 198, exploration: 0.15, score: 230
Scores: (min: 10, avg: 155.35, max: 383)

Run: 199, exploration: 0.15, score: 105
Scores: (min: 10, avg: 154.81, max: 383)

Run: 200, exploration: 0.15, score: 111
Scores: (min: 10, avg: 155.18, max: 383)

Run: 201, exploration: 0.15, score: 60
Scores: (min: 10, avg: 154.65, max: 383)

Run: 202, exploration: 0.15, score: 248
Scores: (min: 10, avg: 155.84, max: 383)

Run: 203, exploration: 0.15, score: 35
Scores: (min: 10, avg: 154.65, max: 383)

Run: 204, exploration: 0.15, score: 60
Scores: (min: 10, avg: 151.42, max: 338)

Run: 205, exploration: 0.15, score: 35
Scores: (min: 10, avg: 149.14, max: 338)

Run: 206, exploration

Run: 296, exploration: 0.15, score: 96
Scores: (min: 9, avg: 109.21, max: 443)

Run: 297, exploration: 0.15, score: 133
Scores: (min: 9, avg: 108.68, max: 443)

Run: 298, exploration: 0.15, score: 176
Scores: (min: 9, avg: 108.14, max: 443)

Run: 299, exploration: 0.15, score: 107
Scores: (min: 9, avg: 108.16, max: 443)

Run: 300, exploration: 0.15, score: 158
Scores: (min: 9, avg: 108.63, max: 443)

Run: 301, exploration: 0.15, score: 214
Scores: (min: 9, avg: 110.17, max: 443)

Run: 302, exploration: 0.15, score: 151
Scores: (min: 9, avg: 109.2, max: 443)

Run: 303, exploration: 0.15, score: 102
Scores: (min: 9, avg: 109.87, max: 443)

Run: 304, exploration: 0.15, score: 112
Scores: (min: 9, avg: 110.39, max: 443)

Run: 305, exploration: 0.15, score: 129
Scores: (min: 9, avg: 111.33, max: 443)

Run: 306, exploration: 0.15, score: 232
Scores: (min: 9, avg: 113.5, max: 443)

Run: 307, exploration: 0.15, score: 20
Scores: (min: 9, avg: 113.59, max: 443)

Run: 308, exploration: 0.15, sco

Run: 398, exploration: 0.15, score: 103
Scores: (min: 10, avg: 118.29, max: 374)

Run: 399, exploration: 0.15, score: 33
Scores: (min: 10, avg: 117.55, max: 374)

Run: 400, exploration: 0.15, score: 30
Scores: (min: 10, avg: 116.27, max: 374)

Run: 401, exploration: 0.15, score: 31
Scores: (min: 10, avg: 114.44, max: 374)

Run: 402, exploration: 0.15, score: 19
Scores: (min: 10, avg: 113.12, max: 374)

Run: 403, exploration: 0.15, score: 78
Scores: (min: 10, avg: 112.88, max: 374)

Run: 404, exploration: 0.15, score: 44
Scores: (min: 10, avg: 112.2, max: 374)

Run: 405, exploration: 0.15, score: 82
Scores: (min: 10, avg: 111.73, max: 374)

Run: 406, exploration: 0.15, score: 83
Scores: (min: 10, avg: 110.24, max: 374)

Run: 407, exploration: 0.15, score: 75
Scores: (min: 10, avg: 110.79, max: 374)

Run: 408, exploration: 0.15, score: 22
Scores: (min: 10, avg: 110.31, max: 374)

Run: 409, exploration: 0.15, score: 110
Scores: (min: 10, avg: 110.18, max: 374)

Run: 410, exploration: 0.15

Run: 499, exploration: 0.15, score: 64
Scores: (min: 10, avg: 112.29, max: 349)

Run: 500, exploration: 0.15, score: 12
Scores: (min: 10, avg: 112.11, max: 349)

Run: 501, exploration: 0.15, score: 134
Scores: (min: 10, avg: 113.14, max: 349)

Run: 502, exploration: 0.15, score: 70
Scores: (min: 10, avg: 113.65, max: 349)

Run: 503, exploration: 0.15, score: 99
Scores: (min: 10, avg: 113.86, max: 349)

Run: 504, exploration: 0.15, score: 38
Scores: (min: 10, avg: 113.8, max: 349)

Run: 505, exploration: 0.15, score: 12
Scores: (min: 10, avg: 113.1, max: 349)

Run: 506, exploration: 0.15, score: 12
Scores: (min: 10, avg: 112.39, max: 349)

Run: 507, exploration: 0.15, score: 70
Scores: (min: 10, avg: 112.34, max: 349)

Run: 508, exploration: 0.15, score: 342
Scores: (min: 10, avg: 115.54, max: 349)

Run: 509, exploration: 0.15, score: 357
Scores: (min: 10, avg: 118.01, max: 357)

Run: 510, exploration: 0.15, score: 158
Scores: (min: 10, avg: 117.93, max: 357)

Run: 511, exploration: 0.1

Run: 601, exploration: 0.15, score: 65
Scores: (min: 8, avg: 104.07, max: 500)

Run: 602, exploration: 0.15, score: 12
Scores: (min: 8, avg: 103.49, max: 500)

Run: 603, exploration: 0.15, score: 92
Scores: (min: 8, avg: 103.42, max: 500)

Run: 604, exploration: 0.15, score: 56
Scores: (min: 8, avg: 103.6, max: 500)

Run: 605, exploration: 0.15, score: 12
Scores: (min: 8, avg: 103.6, max: 500)

Run: 606, exploration: 0.15, score: 104
Scores: (min: 8, avg: 104.52, max: 500)

Run: 607, exploration: 0.15, score: 43
Scores: (min: 8, avg: 104.25, max: 500)

Run: 608, exploration: 0.15, score: 18
Scores: (min: 8, avg: 101.01, max: 500)

Run: 609, exploration: 0.15, score: 37
Scores: (min: 8, avg: 97.81, max: 500)

Run: 610, exploration: 0.15, score: 62
Scores: (min: 8, avg: 96.85, max: 500)

Run: 611, exploration: 0.15, score: 125
Scores: (min: 8, avg: 96.24, max: 500)

Run: 612, exploration: 0.15, score: 153
Scores: (min: 8, avg: 95.66, max: 500)

Run: 613, exploration: 0.15, score: 71
Scor

Run: 705, exploration: 0.15, score: 59
Scores: (min: 13, avg: 99.68, max: 346)

Run: 706, exploration: 0.15, score: 233
Scores: (min: 13, avg: 100.97, max: 346)

Run: 707, exploration: 0.15, score: 218
Scores: (min: 13, avg: 102.72, max: 346)

Run: 708, exploration: 0.15, score: 154
Scores: (min: 13, avg: 104.08, max: 346)

Run: 709, exploration: 0.15, score: 13
Scores: (min: 13, avg: 103.84, max: 346)

Run: 710, exploration: 0.15, score: 74
Scores: (min: 13, avg: 103.96, max: 346)

Run: 711, exploration: 0.15, score: 47
Scores: (min: 13, avg: 103.18, max: 346)

Run: 712, exploration: 0.15, score: 17
Scores: (min: 13, avg: 101.82, max: 346)

Run: 713, exploration: 0.15, score: 18
Scores: (min: 13, avg: 101.29, max: 346)

Run: 714, exploration: 0.15, score: 50
Scores: (min: 13, avg: 101.54, max: 346)

Run: 715, exploration: 0.15, score: 72
Scores: (min: 13, avg: 101.98, max: 346)

Run: 716, exploration: 0.15, score: 123
Scores: (min: 13, avg: 103.06, max: 346)

Run: 717, exploration: 0.

Run: 806, exploration: 0.15, score: 17
Scores: (min: 11, avg: 103.03, max: 291)

Run: 807, exploration: 0.15, score: 175
Scores: (min: 11, avg: 102.6, max: 291)

Run: 808, exploration: 0.15, score: 100
Scores: (min: 11, avg: 102.06, max: 291)

Run: 809, exploration: 0.15, score: 154
Scores: (min: 11, avg: 103.47, max: 291)

Run: 810, exploration: 0.15, score: 232
Scores: (min: 11, avg: 105.05, max: 291)

Run: 811, exploration: 0.15, score: 106
Scores: (min: 11, avg: 105.64, max: 291)

Run: 812, exploration: 0.15, score: 234
Scores: (min: 11, avg: 107.81, max: 291)

Run: 813, exploration: 0.15, score: 53
Scores: (min: 11, avg: 108.16, max: 291)

Run: 814, exploration: 0.15, score: 239
Scores: (min: 11, avg: 110.05, max: 291)

Run: 815, exploration: 0.15, score: 101
Scores: (min: 11, avg: 110.34, max: 291)

Run: 816, exploration: 0.15, score: 108
Scores: (min: 11, avg: 110.19, max: 291)

Run: 817, exploration: 0.15, score: 97
Scores: (min: 11, avg: 110.84, max: 291)

Run: 818, exploratio

Run: 907, exploration: 0.15, score: 146
Scores: (min: 13, avg: 129.87, max: 459)

Run: 908, exploration: 0.15, score: 14
Scores: (min: 13, avg: 129.01, max: 459)

Run: 909, exploration: 0.15, score: 36
Scores: (min: 13, avg: 127.83, max: 459)

Run: 910, exploration: 0.15, score: 80
Scores: (min: 13, avg: 126.31, max: 459)

Run: 911, exploration: 0.15, score: 118
Scores: (min: 13, avg: 126.43, max: 459)

Run: 912, exploration: 0.15, score: 77
Scores: (min: 13, avg: 124.86, max: 459)

Run: 913, exploration: 0.15, score: 26
Scores: (min: 13, avg: 124.59, max: 459)

Run: 914, exploration: 0.15, score: 105
Scores: (min: 13, avg: 123.25, max: 459)

Run: 915, exploration: 0.15, score: 24
Scores: (min: 13, avg: 122.48, max: 459)

Run: 916, exploration: 0.15, score: 100
Scores: (min: 13, avg: 122.4, max: 459)

Run: 917, exploration: 0.15, score: 407
Scores: (min: 13, avg: 125.5, max: 459)

Run: 918, exploration: 0.15, score: 121
Scores: (min: 13, avg: 125.62, max: 459)

Run: 919, exploration: 0

Run: 1008, exploration: 0.15, score: 104
Scores: (min: 11, avg: 101.95, max: 407)

Run: 1009, exploration: 0.15, score: 17
Scores: (min: 11, avg: 101.76, max: 407)

Run: 1010, exploration: 0.15, score: 11
Scores: (min: 11, avg: 101.07, max: 407)

Run: 1011, exploration: 0.15, score: 113
Scores: (min: 11, avg: 101.02, max: 407)

Run: 1012, exploration: 0.15, score: 15
Scores: (min: 11, avg: 100.4, max: 407)

Run: 1013, exploration: 0.15, score: 23
Scores: (min: 11, avg: 100.37, max: 407)

Run: 1014, exploration: 0.15, score: 161
Scores: (min: 11, avg: 100.93, max: 407)

Run: 1015, exploration: 0.15, score: 110
Scores: (min: 11, avg: 101.79, max: 407)

Run: 1016, exploration: 0.15, score: 147
Scores: (min: 11, avg: 102.26, max: 407)

Run: 1017, exploration: 0.15, score: 153
Scores: (min: 11, avg: 99.72, max: 407)

Run: 1018, exploration: 0.15, score: 161
Scores: (min: 11, avg: 100.12, max: 407)

Run: 1019, exploration: 0.15, score: 116
Scores: (min: 11, avg: 98.85, max: 407)

Run: 1020, 

Run: 1109, exploration: 0.15, score: 139
Scores: (min: 11, avg: 79.29, max: 311)

Run: 1110, exploration: 0.15, score: 176
Scores: (min: 11, avg: 80.94, max: 311)

Run: 1111, exploration: 0.15, score: 81
Scores: (min: 11, avg: 80.62, max: 311)

Run: 1112, exploration: 0.15, score: 75
Scores: (min: 11, avg: 81.22, max: 311)

Run: 1113, exploration: 0.15, score: 43
Scores: (min: 11, avg: 81.42, max: 311)

Run: 1114, exploration: 0.15, score: 55
Scores: (min: 11, avg: 80.36, max: 311)

Run: 1115, exploration: 0.15, score: 146
Scores: (min: 11, avg: 80.72, max: 311)

Run: 1116, exploration: 0.15, score: 18
Scores: (min: 11, avg: 79.43, max: 311)

Run: 1117, exploration: 0.15, score: 175
Scores: (min: 11, avg: 79.65, max: 311)

Run: 1118, exploration: 0.15, score: 184
Scores: (min: 11, avg: 79.88, max: 311)

Run: 1119, exploration: 0.15, score: 139
Scores: (min: 11, avg: 80.11, max: 311)

Run: 1120, exploration: 0.15, score: 355
Scores: (min: 11, avg: 82.75, max: 355)

Run: 1121, exploratio

Run: 1210, exploration: 0.15, score: 119
Scores: (min: 10, avg: 111.09, max: 478)

Run: 1211, exploration: 0.15, score: 86
Scores: (min: 10, avg: 111.14, max: 478)

Run: 1212, exploration: 0.15, score: 101
Scores: (min: 10, avg: 111.4, max: 478)

Run: 1213, exploration: 0.15, score: 188
Scores: (min: 10, avg: 112.85, max: 478)

Run: 1214, exploration: 0.15, score: 127
Scores: (min: 10, avg: 113.57, max: 478)

Run: 1215, exploration: 0.15, score: 95
Scores: (min: 10, avg: 113.06, max: 478)

Run: 1216, exploration: 0.15, score: 78
Scores: (min: 10, avg: 113.66, max: 478)

Run: 1217, exploration: 0.15, score: 110
Scores: (min: 10, avg: 113.01, max: 478)

Run: 1218, exploration: 0.15, score: 190
Scores: (min: 10, avg: 113.07, max: 478)

Run: 1219, exploration: 0.15, score: 46
Scores: (min: 10, avg: 112.14, max: 478)

Run: 1220, exploration: 0.15, score: 118
Scores: (min: 10, avg: 109.77, max: 478)

Run: 1221, exploration: 0.15, score: 67
Scores: (min: 10, avg: 109.29, max: 478)

Run: 1222,

Run: 1310, exploration: 0.15, score: 25
Scores: (min: 10, avg: 99.57, max: 500)

Run: 1311, exploration: 0.15, score: 12
Scores: (min: 10, avg: 98.83, max: 500)

Run: 1312, exploration: 0.15, score: 65
Scores: (min: 10, avg: 98.47, max: 500)

Run: 1313, exploration: 0.15, score: 52
Scores: (min: 10, avg: 97.11, max: 500)

Run: 1314, exploration: 0.15, score: 193
Scores: (min: 10, avg: 97.77, max: 500)

Run: 1315, exploration: 0.15, score: 15
Scores: (min: 10, avg: 96.97, max: 500)

Run: 1316, exploration: 0.15, score: 82
Scores: (min: 10, avg: 97.01, max: 500)

Run: 1317, exploration: 0.15, score: 32
Scores: (min: 10, avg: 96.23, max: 500)

Run: 1318, exploration: 0.15, score: 68
Scores: (min: 10, avg: 95.01, max: 500)

Run: 1319, exploration: 0.15, score: 147
Scores: (min: 10, avg: 96.02, max: 500)

Run: 1320, exploration: 0.15, score: 84
Scores: (min: 10, avg: 95.68, max: 500)

Run: 1321, exploration: 0.15, score: 14
Scores: (min: 10, avg: 95.15, max: 500)

Run: 1322, exploration: 0.

Run: 1410, exploration: 0.15, score: 91
Scores: (min: 11, avg: 118.54, max: 500)

Run: 1411, exploration: 0.15, score: 193
Scores: (min: 11, avg: 120.35, max: 500)

Run: 1412, exploration: 0.15, score: 217
Scores: (min: 11, avg: 121.87, max: 500)

Run: 1413, exploration: 0.15, score: 104
Scores: (min: 11, avg: 122.39, max: 500)

Run: 1414, exploration: 0.15, score: 138
Scores: (min: 11, avg: 121.84, max: 500)

Run: 1415, exploration: 0.15, score: 94
Scores: (min: 11, avg: 122.63, max: 500)

Run: 1416, exploration: 0.15, score: 191
Scores: (min: 11, avg: 123.72, max: 500)

Run: 1417, exploration: 0.15, score: 133
Scores: (min: 11, avg: 124.73, max: 500)

Run: 1418, exploration: 0.15, score: 11
Scores: (min: 11, avg: 124.16, max: 500)

Run: 1419, exploration: 0.15, score: 88
Scores: (min: 11, avg: 123.57, max: 500)

Run: 1420, exploration: 0.15, score: 285
Scores: (min: 11, avg: 125.58, max: 500)

Run: 1421, exploration: 0.15, score: 106
Scores: (min: 11, avg: 126.5, max: 500)

Run: 1422

Run: 1511, exploration: 0.15, score: 124
Scores: (min: 9, avg: 139.57, max: 500)

Run: 1512, exploration: 0.15, score: 48
Scores: (min: 9, avg: 137.88, max: 500)

Run: 1513, exploration: 0.15, score: 80
Scores: (min: 9, avg: 137.64, max: 500)

Run: 1514, exploration: 0.15, score: 82
Scores: (min: 9, avg: 137.08, max: 500)

Run: 1515, exploration: 0.15, score: 202
Scores: (min: 9, avg: 138.16, max: 500)

Run: 1516, exploration: 0.15, score: 97
Scores: (min: 9, avg: 137.22, max: 500)

Run: 1517, exploration: 0.15, score: 14
Scores: (min: 9, avg: 136.03, max: 500)

Run: 1518, exploration: 0.15, score: 176
Scores: (min: 9, avg: 137.68, max: 500)

Run: 1519, exploration: 0.15, score: 158
Scores: (min: 9, avg: 138.38, max: 500)

Run: 1520, exploration: 0.15, score: 88
Scores: (min: 9, avg: 136.41, max: 500)

Run: 1521, exploration: 0.15, score: 77
Scores: (min: 9, avg: 136.12, max: 500)

Run: 1522, exploration: 0.15, score: 330
Scores: (min: 9, avg: 138.24, max: 500)

Run: 1523, exploration:

Run: 1611, exploration: 0.15, score: 52
Scores: (min: 10, avg: 105.79, max: 500)

Run: 1612, exploration: 0.15, score: 55
Scores: (min: 10, avg: 105.86, max: 500)

Run: 1613, exploration: 0.15, score: 47
Scores: (min: 10, avg: 105.53, max: 500)

Run: 1614, exploration: 0.15, score: 73
Scores: (min: 10, avg: 105.44, max: 500)

Run: 1615, exploration: 0.15, score: 29
Scores: (min: 10, avg: 103.71, max: 500)

Run: 1616, exploration: 0.15, score: 37
Scores: (min: 10, avg: 103.11, max: 500)

Run: 1617, exploration: 0.15, score: 13
Scores: (min: 10, avg: 103.1, max: 500)

Run: 1618, exploration: 0.15, score: 162
Scores: (min: 10, avg: 102.96, max: 500)

Run: 1619, exploration: 0.15, score: 139
Scores: (min: 10, avg: 102.77, max: 500)

Run: 1620, exploration: 0.15, score: 88
Scores: (min: 10, avg: 102.77, max: 500)

Run: 1621, exploration: 0.15, score: 128
Scores: (min: 10, avg: 103.28, max: 500)

Run: 1622, exploration: 0.15, score: 102
Scores: (min: 10, avg: 101, max: 500)

Run: 1623, explo

Run: 1711, exploration: 0.15, score: 23
Scores: (min: 10, avg: 103.75, max: 500)

Run: 1712, exploration: 0.15, score: 121
Scores: (min: 10, avg: 104.41, max: 500)

Run: 1713, exploration: 0.15, score: 124
Scores: (min: 10, avg: 105.18, max: 500)

Run: 1714, exploration: 0.15, score: 165
Scores: (min: 10, avg: 106.1, max: 500)

Run: 1715, exploration: 0.15, score: 17
Scores: (min: 10, avg: 105.98, max: 500)

Run: 1716, exploration: 0.15, score: 108
Scores: (min: 10, avg: 106.69, max: 500)

Run: 1717, exploration: 0.15, score: 113
Scores: (min: 10, avg: 107.69, max: 500)

Run: 1718, exploration: 0.15, score: 114
Scores: (min: 10, avg: 107.21, max: 500)

Run: 1719, exploration: 0.15, score: 76
Scores: (min: 10, avg: 106.58, max: 500)

Run: 1720, exploration: 0.15, score: 55
Scores: (min: 10, avg: 106.25, max: 500)

Run: 1721, exploration: 0.15, score: 194
Scores: (min: 10, avg: 106.91, max: 500)



# Explain how reinforcement learning concepts apply to the cartpole problem.

### What is the goal of the agent in this case?

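The agent's objective is to keep the pole balanced upright for as long as possible by controlling the cart's movement. As feedback, the agent receives reward or penalty points based on how well it keeps the pole erect: in CartPole, every time step the pole stays up earns a reward, so a longer balance means a higher score.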

### What are the various state values?

The state of the cartpole is described by four values: the cart's position, the cart's velocity, the pole's angle relative to the vertical axis, and the pole's angular velocity. As the cart moves left and right along the track, these values change continuously, so the experiment passes through many different states. Each episode starts from an initial state with the pole nearly upright and ends in a final state when the pole tilts too far in one direction or the cart moves too far from the center.
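
The state description above can be sketched in a few lines of Python. This is a minimal illustration, not code from the notebook: the `CartPoleState` class and its field names are hypothetical, and the two limits are the termination thresholds documented for Gym's CartPole environments (cart position beyond 2.4 units, pole angle beyond 12 degrees).

```python
import math
from dataclasses import dataclass

# Termination limits as documented for Gym's CartPole environments.
X_LIMIT = 2.4                    # cart position limit, in track units
THETA_LIMIT = math.radians(12)   # pole angle limit, about 0.209 rad


@dataclass
class CartPoleState:
    """Hypothetical container for the four observation values."""
    cart_position: float
    cart_velocity: float
    pole_angle: float             # radians from the vertical axis
    pole_angular_velocity: float

    def is_terminal(self) -> bool:
        """True when the cart leaves the track or the pole tilts too far."""
        return (abs(self.cart_position) > X_LIMIT
                or abs(self.pole_angle) > THETA_LIMIT)


upright = CartPoleState(0.0, 0.1, 0.02, -0.05)   # pole close to vertical
fallen = CartPoleState(0.5, 1.2, 0.30, 2.0)      # pole tilted about 17 degrees
print(upright.is_terminal(), fallen.is_terminal())
```

The first state is still in play, while the second one would end the episode because the pole angle is past the limit.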

### What are the possible actions that can be performed?

There are two possible actions: push the cart to the left or push the cart to the right. The agent must choose one of them at every step to keep the pole balanced and centered; otherwise, the pole will fall or topple.
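
As a rough sketch of how one of the two actions gets picked, the snippet below mirrors the idea of the `act` method in the first code cell using only the standard library. The `choose_action` function and the `LEFT`/`RIGHT` names are hypothetical; Gym's CartPole encodes "push left" as 0 and "push right" as 1.

```python
import random

LEFT, RIGHT = 0, 1   # Gym's CartPole action encoding


def choose_action(q_values, exploration_rate, rng=random):
    """Epsilon-greedy choice between LEFT and RIGHT.

    q_values holds one estimated value per action.
    """
    if rng.random() < exploration_rate:
        return rng.randrange(len(q_values))   # explore: pick a random action
    # exploit: pick the index of the largest Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With no exploration the choice is always the greedy one.
print(choose_action([0.2, 0.7], exploration_rate=0.0))
```

With an exploration rate of 0.15, as in the log above, roughly 15% of the moves would be random and the rest would follow the Q-values.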

### What reinforcement algorithm is used for this problem?

Deep Q-learning (DQN) is the algorithm that was used to teach the agent how to maneuver the cart so that the pole stays balanced. Q-learning is a model-free reinforcement learning algorithm that learns the value of each action in the various states of the cart. Classic Q-learning stores these values in a Q-table with one entry per state-action pair. Deep Q-learning replaces the table with a neural network that estimates the Q-values, which lets the agent choose the most rewarding action even though the cartpole's state space is continuous.
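
The value that Q-learning tries to learn for a state-action pair can be written as the reward plus the discounted best value of the next state. The sketch below shows that target in isolation; `q_target` is a hypothetical helper, and the default `gamma=0.95` is taken from the `GAMMA` constant in the first code cell.

```python
def q_target(reward, next_q_values, terminal, gamma=0.95):
    """Target for the action taken: r + gamma * max over a' of Q(s', a').

    For a terminal transition there is no next state, so the target
    is just the reward.
    """
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


# Non-terminal step: 1 + 0.95 * 3
print(q_target(1.0, [2.0, 3.0], terminal=False))
# Terminal step: only the reward counts
print(q_target(-1.0, [2.0, 3.0], terminal=True))
```

In deep Q-learning the network is trained so that its output for the chosen action moves toward this target.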

# Analyze how experience replay is applied to the cartpole problem.

### How does experience replay work in this algorithm?

Experience replay is commonly used in reinforcement learning algorithms, particularly in Deep Q Networks (DQN), to improve learning efficiency and stability. The Q-values are estimates of how good each move of the cart is in a given state. The best course of action for a specific problem can be found by adjusting the Q-values across different left and right moves, as I did in the last three experiments.
Experience replay works as follows. As the agent interacts with the environment, taking actions and observing the outcomes, it collects experiences as tuples of state, action, reward, next state, and done flag. These experiences are stored in a replay buffer, a fixed-size memory that holds the most recent experiences. During learning, a random batch of experiences is sampled from the buffer and used to update the weights of the Q-network, which breaks the temporal correlations in the data and improves the reliability and efficiency of training.
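
A replay buffer of this kind can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's `DQNSolver`: the `ReplayBuffer` class is hypothetical, although it follows the same idea of a `deque` with `maxlen` and `random.sample` seen in the first code cell.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry once it is full.
        self.memory = deque(maxlen=capacity)

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a random minibatch, or an empty list until enough is stored."""
        if len(self.memory) < batch_size:
            return []
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=5)
for step in range(8):                 # store 8 transitions in a 5-slot buffer
    buffer.remember(step, 0, 1.0, step + 1, False)
batch = buffer.sample(batch_size=3)
print(len(buffer), len(batch))
```

Because the batch is drawn at random, consecutive steps of one episode rarely end up in the same update, which is what breaks the temporal correlation.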

### What is the effect of introducing a discount factor for calculating the future rewards?

The discount factor in reinforcement learning is a value between 0 and 1 that determines the present value of future rewards. It influences the agent's decision-making by balancing immediate rewards against future rewards. The original discount factor in the imported code was 0.95 (the `GAMMA` constant). During the modification experiment, I changed it to 0.80 and got different results. With a lower discount factor, short-term gains, such as keeping the pole balanced for a brief period of time, carry more weight than long-term ones. Because a reward that lies far in the future is less certain to be received, it is discounted more heavily in the agent's expected return.
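
To make the effect of the discount factor concrete, the sketch below sums the discounted rewards of a made-up episode for 0.95 (the `GAMMA` constant in the first code cell) and 0.80 (the value from the modification experiment). The `discounted_return` helper and the 200-step episode with a reward of 1 per step are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


rewards = [1.0] * 200   # hypothetical episode: +1 for each of 200 steps

for gamma in (0.95, 0.80):
    # 1 / (1 - gamma) is a rough "effective horizon" in steps
    horizon = 1.0 / (1.0 - gamma)
    print(gamma, round(discounted_return(rewards, gamma), 2), round(horizon, 1))
```

With 0.95 the return is worth about 20 steps of reward, while with 0.80 it is worth about 5, so the lower value makes the agent care mostly about the next few steps.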

# Analyze how neural networks are used in deep Q-learning.

### Explain the neural network architecture that is used in the cartpole problem.

The neural network used by the DQN for the cartpole problem consists of an input layer, hidden layers, and an output layer. In this notebook's code, the state is the input, it passes through two fully connected layers of 24 ReLU units each, and the linear output layer has one unit per action. The state is fed into the neural network, which then generates a Q-value for every possible action the agent can take in that state. The output with the largest Q-value represents the best action currently known for that state, so the agent chooses that action next unless it is exploring at random.
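
The flow from state to Q-values can be traced with a small pure-Python forward pass. This is a sketch that assumes the layer sizes of the Keras model in the first code cell (4 inputs, two layers of 24 ReLU units, 2 linear outputs); the weights are random placeholders, not trained values, and the `dense`, `random_layer`, and `q_values` helpers are hypothetical.

```python
import random


def dense(inputs, weights, biases, relu):
    """One fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    outputs = []
    for j in range(len(biases)):
        z = sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        outputs.append(max(0.0, z) if relu else z)
    return outputs


def random_layer(n_in, n_out, rng):
    """Untrained placeholder weights, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


rng = random.Random(0)
layers = [random_layer(4, 24, rng), random_layer(24, 24, rng), random_layer(24, 2, rng)]


def q_values(state):
    """Map a 4-value state to 2 Q-values (ReLU hidden layers, linear output)."""
    activations = state
    for index, (weights, biases) in enumerate(layers):
        is_hidden = index < len(layers) - 1
        activations = dense(activations, weights, biases, relu=is_hidden)
    return activations


q = q_values([0.01, -0.02, 0.03, 0.04])
best_action = max(range(len(q)), key=lambda a: q[a])
print(len(q), best_action)
```

The network always returns two numbers, one per action, and the index of the larger one is the action the agent would take when it is not exploring.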

### How does the neural network make the Q-learning algorithm more efficient?

One advantage of the neural network is that it needs far less memory than a Q-table when the state space is large or continuous. Instead of storing a value for every state-action pair, the network approximates the Q-function, so it can apply previously learned experiences to new state-action pairs it has not seen before. This lets neural networks handle problems such as cartpole, whose state space is continuous, because they are capable of detecting patterns and relationships in the data.
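
A quick count shows the memory difference. The sketch below assumes the layer sizes from the first code cell (4 inputs, 24, 24, 2 outputs) and compares them with a Q-table built on a hypothetical discretization of 100 bins per state dimension; the bin count is an assumption picked only to illustrate the scale.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Network from the first code cell: 4 -> 24 -> 24 -> 2
network_params = dense_params(4, 24) + dense_params(24, 24) + dense_params(24, 2)

BINS_PER_DIMENSION = 100   # hypothetical discretization, for illustration only
STATE_DIMENSIONS = 4
ACTIONS = 2
table_entries = (BINS_PER_DIMENSION ** STATE_DIMENSIONS) * ACTIONS

print(network_params, table_entries)   # prints: 770 200000000
```

Under these assumptions the table needs 200 million entries, while the network covers the same state space with 770 parameters.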

### What difference do you see in the algorithm performance when you increase or decrease the learning rate?

The initial learning rate in the given code was 0.001. I changed it to 0.01 and got a different output; the run took several minutes to complete. I found that increasing the learning rate led to faster convergence because the agent adapts more rapidly to new information. However, a very high learning rate can cause instability, leading to oscillations or overshooting of the optimal values, which hinders convergence.
Decreasing the learning rate makes learning more gradual and stable, potentially ensuring convergence to a better solution over time. To sum up, changing the learning rate affects both the stability and the speed of learning in the Deep Q Networks solution for the cartpole problem. It is critical to choose the learning rate carefully, because a value that is too high can cause instability and a value that is too low can slow progress toward a steady state.
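
The trade-off can be seen on a toy problem that is much simpler than the DQN. The sketch below runs plain gradient descent on f(x) = x² for a few learning rates; 0.001 and 0.01 are the values discussed above, while 0.4 and 1.1 are extra values added only to show fast convergence and divergence. The `minimize_quadratic` function is hypothetical and does not use the notebook's Adam optimizer.

```python
def minimize_quadratic(learning_rate, steps=50, start=1.0):
    """Gradient descent on f(x) = x**2 (gradient 2*x); returns the final |x|."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return abs(x)


for lr in (0.001, 0.01, 0.4, 1.1):
    # A smaller final |x| means the run got closer to the minimum at 0.
    print(lr, minimize_quadratic(lr))
```

After 50 steps the two small rates are still far from the minimum, 0.4 has essentially reached it, and 1.1 overshoots on every step and moves away from it.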

# References

Phy, V. (2021, December 12). Reinforcement Learning Concept on Cart-Pole with DQN. Medium. https://towardsdatascience.com/reinforcement-learning-concept-on-cart-pole-with-dqn-799105ca670

Surma, G. (2021, October 13). Cartpole - Introduction to Reinforcement Learning (DQN - Deep Q-Learning). Medium. https://gsurma.medium.com/cartpole-introduction-to-reinforcement-learning-ed0eb5b58288

Reinforcement Learning (DQN) Tutorial — PyTorch Tutorials 2.1.1+cu121 documentation. (n.d.). https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

What is “experience replay” and what are its benefits? (n.d.). Data Science Stack Exchange. https://datascience.stackexchange.com/questions/20535/what-is-experience-replay-and-what-are-its-benefits

Beysolow, I. T. (2019). Applied Reinforcement Learning with Python: With OpenAI Gym, TensorFlow, and Keras. Apress L. P.

Singh, S. (2022, October 7). How are neural networks used in deep Q-Learning? https://www.turing.com/kb/how-are-neural-networks-used-in-deep-q-learning

Amine, A. (2021, December 24). Deep Q-Networks: from theory to implementation - Towards Data Science. Medium. https://towardsdatascience.com/deep-q-networks-theory-and-implementation-37543f60dd67



