# Reinforcement Learning - CartPole-v1 Example

## Install the Dependencies

To run this code, you'll need to install the following primary dependencies:

In [1]:
!pip install tensorflow==2.13.1
!pip install gym  # OpenAI Gym
!pip install keras
!pip install keras-rl2


Defaulting to user installation because normal site-packages is not writeable
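Because only TensorFlow is pinned above, it can help to record which versions of the other packages were actually installed. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of the original notebook.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names as passed to pip in the install cell
for name in ("tensorflow", "gym", "keras", "keras-rl2"):
    print(name, installed_version(name))
```

A result of None for any of the four names means the corresponding install step did not take effect in the current environment.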


## Testing Random Environment with OpenAI-Gym

This code initializes the CartPole-v1 environment from OpenAI Gym and runs a fixed number of episodes with random actions. The variable states holds the size of the observation vector (4) and actions holds the number of discrete actions (2). The environment is reset at the start of each episode, and each action (0 for left, 1 for right) is chosen at random, with no learning algorithm applied. The episode number and the total score are printed after each episode.

In [2]:
import gym
import random

In [3]:
env = gym.make('CartPole-v1')
states = env.observation_space.shape[0]
actions = env.action_space.n

In [4]:
states

4

In [5]:
episodes = 10

# Run each episode with random actions (no learning)
for episode in range(1, episodes+1):
    # Reset the CartPole environment at the start of each episode
    state = env.reset()
    done = False
    score = 0

    # Take random actions until the episode ends
    while not done:
        #env.render()
        action = random.choice([0,1])  # 0 = push left, 1 = push right
        n_state, reward, done, info = env.step(action)
        score+=reward
    print('Episode:{} Score:{}'.format(episode, score))

Episode:1 Score:11.0
Episode:2 Score:11.0
Episode:3 Score:11.0
Episode:4 Score:21.0
Episode:5 Score:19.0
Episode:6 Score:22.0
Episode:7 Score:53.0
Episode:8 Score:42.0
Episode:9 Score:36.0
Episode:10 Score:15.0
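The loop above unpacks four values from env.step(), which matches the older Gym API. Gym 0.26 and later (and Gymnasium) return five values from step() and an (observation, info) pair from reset(). The sketch below shows one way the manual loop could tolerate both shapes; the helper names unpack_step and unpack_reset are hypothetical and are not part of Gym or of the original notebook.

```python
def unpack_step(result):
    """Normalize an env.step() result to (obs, reward, done, info)."""
    if len(result) == 5:
        # Newer API: (obs, reward, terminated, truncated, info)
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    # Older API: (obs, reward, done, info)
    obs, reward, done, info = result
    return obs, reward, done, info


def unpack_reset(result):
    """Return only the observation from an env.reset() result."""
    # Newer API returns (obs, info) where info is a dict
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result
```

In the random-action loop, this would be used as state = unpack_reset(env.reset()) and n_state, reward, done, info = unpack_step(env.step(action)).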


## Creating Deep Learning Model with Keras

This code defines a function build_model() that constructs a Keras neural network for the reinforcement learning agent. The architecture consists of a Flatten input layer, two hidden Dense layers of 24 units each with ReLU activations, and a Dense output layer with a linear activation that produces one value per action. The states and actions parameters set the input and output dimensions respectively. The function returns the model uncompiled; compilation happens later through the agent. The model.summary() call prints a summary of the model's architecture.

In [6]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers.legacy import Adam

2024-03-23 15:40:22.645321: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-23 15:40:22.673156: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-23 15:40:22.673836: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [7]:
def build_model(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1, states)))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

In [8]:
model= build_model(states, actions)

2024-03-23 15:40:25.773698: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


In [9]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 4)                 0         
                                                                 
 dense (Dense)               (None, 24)                120       
                                                                 
 dense_1 (Dense)             (None, 24)                600       
                                                                 
 dense_2 (Dense)             (None, 2)                 50        
                                                                 
Total params: 770 (3.01 KB)
Trainable params: 770 (3.01 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
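The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. A small sketch of that arithmetic follows; the helper name dense_params is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a Dense layer."""
    return n_inputs * n_units + n_units


# Flattened input of size 4, two hidden layers of 24 units, 2 outputs
layer_sizes = [4, 24, 24, 2]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [120, 600, 50] 770
```

The Flatten layer only reshapes its input, which is why it contributes 0 parameters.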


## Build Agent with Keras-RL

This code builds and trains a Deep Q-Network (DQN) agent using the Keras-RL library (keras-rl2). The build_agent function constructs the DQN agent with a Boltzmann exploration policy and a sequential replay memory. The agent is then compiled with the Adam optimizer and trained on the environment (env) for 50,000 steps. Finally, the trained agent is tested over 50 episodes and the average episode reward is printed. TensorFlow, Keras, and keras-rl2 must be installed before running the code.

In [10]:
import tensorflow as tf
from keras import __version__
# Workaround: rl/callbacks.py imports __version__ from tensorflow.keras rather than from keras, so set it here
tf.keras.__version__ = __version__

from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

In [11]:
states

4

In [12]:
model= build_model(states, actions)

In [13]:
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=5000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy, nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn
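To make the role of BoltzmannQPolicy more concrete, here is a minimal sketch of Boltzmann (softmax) action selection: actions are sampled with probability proportional to exp(Q / tau), so higher Q-values are favored but every action keeps some chance of being explored. This is an illustration under stated assumptions, not keras-rl2's implementation, which may differ in details such as how it guards against numerical overflow. The function names and the tau parameter are chosen for this example.

```python
import math
import random


def boltzmann_probs(q_values, tau=1.0):
    """Softmax over Q-values with temperature tau."""
    scaled = [q / tau for q in q_values]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, tau=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for the two CartPole actions
print(boltzmann_probs([1.0, 2.0]))
print(boltzmann_action([1.0, 2.0]))
```

A larger tau flattens the distribution toward uniform random choice, while a smaller tau approaches greedy selection of the highest Q-value.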

In [15]:
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)            

Training for 50000 steps ...
Interval 1 (0 steps performed)
    1/10000 [..............................] - ETA: 6:43 - reward: 1.0000

2024-03-23 15:52:04.270207: W tensorflow/c/c_api.cc:304] Operation '{name:'dense_3_2/kernel/Assign' id:738 op device:{requested: '', assigned: ''} def:{{{node dense_3_2/kernel/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](dense_3_2/kernel, dense_3_2/kernel/Initializer/stateless_random_uniform)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2024-03-23 15:52:04.383424: W tensorflow/c/c_api.cc:304] Operation '{name:'dense_5/BiasAdd' id:75 op device:{requested: '', assigned: ''} def:{{{node dense_5/BiasAdd}} = BiasAdd[T=DT_FLOAT, _has_manual_control_dependencies=true, data_format="NHWC"](dense_5/MatMul, dense_5/BiasAdd/ReadVariableOp)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the f

   12/10000 [..............................] - ETA: 4:02 - reward: 1.0000

2024-03-23 15:52:04.529316: W tensorflow/c/c_api.cc:304] Operation '{name:'loss_7/AddN' id:1050 op device:{requested: '', assigned: ''} def:{{{node loss_7/AddN}} = AddN[N=2, T=DT_FLOAT, _has_manual_control_dependencies=true](loss_7/mul, loss_7/mul_1)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2024-03-23 15:52:04.551680: W tensorflow/c/c_api.cc:304] Operation '{name:'training_2/Adam/dense_4/kernel/v/Assign' id:1250 op device:{requested: '', assigned: ''} def:{{{node training_2/Adam/dense_4/kernel/v/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](training_2/Adam/dense_4/kernel/v, training_2/Adam/dense_4/kernel/v/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the f

52 episodes - episode_reward: 190.577 [12.000, 264.000] - loss: 1.122 - mae: 29.508 - mean_q: 59.371

Interval 2 (10000 steps performed)
41 episodes - episode_reward: 239.463 [41.000, 350.000] - loss: 0.567 - mae: 36.536 - mean_q: 73.476

Interval 3 (20000 steps performed)
32 episodes - episode_reward: 320.094 [94.000, 463.000] - loss: 0.794 - mae: 42.387 - mean_q: 85.168

Interval 4 (30000 steps performed)
23 episodes - episode_reward: 432.913 [173.000, 500.000] - loss: 3.763 - mae: 47.459 - mean_q: 95.311

Interval 5 (40000 steps performed)
done, took 132.679 seconds


<keras.src.callbacks.History at 0x155551560e10>
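The value target_model_update=1e-2 passed to DQNAgent in build_agent is, to the best of my understanding of keras-rl2, treated as a soft-update rate when it is below 1: after each training step the target network's weights move a small fraction of the way toward the online network's weights. The sketch below illustrates that rule on plain lists of floats; the function name soft_update is hypothetical and the code is not taken from the library.

```python
def soft_update(target_weights, online_weights, tau=1e-2):
    """Blend online weights into target weights: (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target_weights, online_weights)]


# Toy example with two scalar "weights"
target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = soft_update(target, online)
print(target)
```

With tau = 0.01 the target network lags well behind the online network, which is the usual motivation for this kind of update: it keeps the Q-learning targets from shifting too quickly.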

In [17]:
scores = dqn.test(env, nb_episodes=50, visualize=False)
print(np.mean(scores.history['episode_reward']))

Testing for 50 episodes ...
Episode 1: reward: 361.000, steps: 361
Episode 2: reward: 326.000, steps: 326
Episode 3: reward: 361.000, steps: 361
Episode 4: reward: 339.000, steps: 339
Episode 5: reward: 362.000, steps: 362
Episode 6: reward: 362.000, steps: 362
Episode 7: reward: 338.000, steps: 338
Episode 8: reward: 373.000, steps: 373
Episode 9: reward: 326.000, steps: 326
Episode 10: reward: 364.000, steps: 364
Episode 11: reward: 340.000, steps: 340
Episode 12: reward: 339.000, steps: 339
Episode 13: reward: 345.000, steps: 345
Episode 14: reward: 356.000, steps: 356
Episode 15: reward: 347.000, steps: 347
Episode 16: reward: 329.000, steps: 329
Episode 17: reward: 380.000, steps: 380
Episode 18: reward: 364.000, steps: 364
Episode 19: reward: 370.000, steps: 370
Episode 20: reward: 333.000, steps: 333
Episode 21: reward: 330.000, steps: 330
Episode 22: reward: 359.000, steps: 359
Episode 23: reward: 324.000, steps: 324
Episode 24: reward: 364.000, steps: 364
Episode 25: reward: 3