# Custom Environment Deep Reinforcement Learning Tutorial

This notebook follows the YouTube video [Building a Custom Environment for Deep Reinforcement Learning with OpenAI Gym and Python](https://www.youtube.com/watch?v=bD6V3rcr_54&ab_channel=NicholasRenotte) by [Nicholas Renotte](https://www.youtube.com/channel/UCHXa4OpASJEwrHrLeIzw7Yg).

The video covers:
1. Building a custom environment with OpenAI Gym
2. Training a DQN agent on the custom environment
3. Testing the reinforcement learning agent on the custom environment

## Problem

### Our Scenario
We are tired of having to mess around with the temperature in the shower.

![image.png](attachment:image.png)

### Goal
Build a reinforcement learning model that adjusts the temperature automatically to bring it into the optimal range.

## Environment Description

### Optimal Temperature
Between 37 and 39 degrees

### Shower Length
60 Seconds

### Actions
Turn Down, Leave, Turn Up

### Task
Build a model that keeps us in the optimal range for as long as possible.

# 0. Install Dependencies
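
The notebook does not list its dependencies. Judging from the imports used in the later cells, it needs `numpy`, `gym`, `tensorflow`, and the `rl` module. The pip package names below, in particular `keras-rl2` as the provider of `rl`, are assumptions inferred from those imports rather than something stated in the source. A minimal sketch that reports which packages appear to be missing (the helper name `missing_packages` is illustrative):

```python
import importlib.util

# Module name -> pip package name. The pip names are assumptions inferred
# from the imports used in this notebook, not taken from the source.
REQUIRED = {
    "numpy": "numpy",
    "gym": "gym",
    "tensorflow": "tensorflow",
    "rl": "keras-rl2",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]

missing = missing_packages()
print("Missing:", " ".join(missing) if missing else "none")
```

Anything it reports can then be installed with `pip install` followed by the listed names.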

# 1. Test Random Environment with OpenAI Gym

In [1]:
from gym import Env
from gym.spaces import Discrete, Box
import numpy as np
import random

In [2]:
class ShowerEnv(Env):
    def __init__(self):
        # Actions we can take, down, stay, up
        self.action_space = Discrete(3)
        # Temperature array
        self.observation_space = Box(low=np.array([0.0]), high=np.array([100.0]), dtype=np.float32)
        # Set start temp
        self.state = 38 + random.randint(-3, 3)
        # Set shower length
        self.shower_length = 60
        
    def step(self, action):
        # Apply action
        # 0 - 1 = -1 temperature
        # 1 - 1 = 0 temperature
        # 2 - 1 = 1 temperature
        self.state += action - 1
        # Reduce shower length by 1 second
        self.shower_length -= 1
        
        # Calculate reward
        if 37 <= self.state <= 39:
            reward = 1
        else:
            reward = -1
            
        # Check if shower is done
        done = self.shower_length <= 0
            
        # Apply temperature noise
        self.state += random.randint(-1, 1)
        # Set placeholder for info, required by OpenAI gym
        info = {}
        
        # Return step information
        return self.state, reward, done, info

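    def render(self, mode='human'):
        # Visualisation is not implemented. The mode argument follows gym's
        # render convention so that calls such as render(mode='human') do not fail.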
        pass
        
    def reset(self):
        # Reset shower temperature
        self.state = 38 + random.randint(-3, 3)
        # Reset shower time
        self.shower_length = 60
        return self.state

#### Note

> __Discrete__: `Discrete(3)` gives us three discrete values: __0__, __1__, or __2__.

> __Box__: The Box space gives you a lot more flexibility and lets you pass through multiple values. We've used a single-value array, but it could hold an n-dimensional tensor and could be used to hold dataframes, images, and audio.

> __Noise__: The noise is optional but helps simulate a real environment. You can drop it and your model will likely converge faster!
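
To make the mapping from a discrete action to a temperature change concrete, here is a gym-free sketch of a single step. The function name `shower_step` and its explicit `noise` argument are illustrative assumptions; the class above draws the noise internally with `random.randint(-1, 1)`.

```python
def shower_step(state, action, noise=0):
    """One ShowerEnv-style transition without gym.

    action: 0 = turn down, 1 = leave, 2 = turn up
    noise:  the random -1/0/+1 drift, passed in explicitly so it can be tested
    """
    state += action - 1                      # map {0, 1, 2} to {-1, 0, +1}
    reward = 1 if 37 <= state <= 39 else -1  # reward is computed before the noise
    state += noise
    return state, reward

for action, name in enumerate(["down", "leave", "up"]):
    print(name, shower_step(38, action))
```

Note that the reward is based on the temperature right after the action, while the state returned to the agent already includes the noise.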

In [3]:
env = ShowerEnv()



In [6]:
env.action_space.sample()

1

In [7]:
env.observation_space.sample()

array([2.7881846], dtype=float32)

In [8]:
episodes = 10
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0
    
    while not done:
        # env.render()
        action = env.action_space.sample()
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))

Episode:1 Score:-32
Episode:2 Score:-38
Episode:3 Score:-54
Episode:4 Score:-18
Episode:5 Score:-48
Episode:6 Score:-16
Episode:7 Score:-56
Episode:8 Score:-32
Episode:9 Score:-48
Episode:10 Score:-40
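
The random agent's scores are all negative. As a point of comparison, a hand-written rule (turn up when below 38, turn down when above) can be run against the same dynamics. The sketch below re-implements the environment's update rule in plain Python so that it runs without gym; `run_episode`, `random_policy`, `thermostat_policy`, and `mean_score` are hypothetical helper names, not part of the original notebook.

```python
import random

def run_episode(policy, rng, length=60):
    """Simulate one shower with the same update rule as ShowerEnv."""
    state = 38 + rng.randint(-3, 3)
    score = 0
    for _ in range(length):
        action = policy(state, rng)
        state += action - 1                      # apply action
        score += 1 if 37 <= state <= 39 else -1  # reward
        state += rng.randint(-1, 1)              # temperature noise
    return score

def random_policy(state, rng):
    return rng.randint(0, 2)

def thermostat_policy(state, rng):
    if state < 38:
        return 2   # turn up
    if state > 38:
        return 0   # turn down
    return 1       # leave

def mean_score(policy, episodes=200, seed=0):
    rng = random.Random(seed)
    return sum(run_episode(policy, rng) for _ in range(episodes)) / episodes

print("random    :", mean_score(random_policy))
print("thermostat:", mean_score(thermostat_policy))
```

Under these dynamics the rule-based controller should score close to the maximum of 60, which gives a rough target for the trained agent to match.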


# 2. Create a Deep Learning Model with Keras

In [9]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

In [101]:
states = env.observation_space.shape
actions = env.action_space.n

In [102]:
actions, states

(3, (1,))

In [20]:
def build_model(states, actions):
    model = Sequential()
    model.add(Dense(24, activation='relu', input_shape=states))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

In [21]:
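# Drop any model left over from an earlier run; unlike a bare `del model`,
# this does not raise NameError on a fresh kernel
globals().pop('model', None)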

In [22]:
model = build_model(states, actions)

In [15]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 24)                48        
_________________________________________________________________
dense_1 (Dense)              (None, 24)                600       
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 75        
Total params: 723
Trainable params: 723
Non-trainable params: 0
_________________________________________________________________
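
The parameter counts in the summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A small check in plain Python (the helper name `dense_params` is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [1, 24, 24, 3]   # state input, two hidden layers, three actions
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))     # [48, 600, 75] 723
```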


# 3. Build Agent with Keras-RL

In [28]:
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

In [29]:
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy, 
                   nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn
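
`BoltzmannQPolicy` selects actions by sampling from a softmax over the Q-values, so actions with a higher estimated value are chosen more often while the others are still explored. The following is a plain-Python sketch of that idea, not Keras-RL's actual implementation; the function names and the temperature parameter `tau` with a value of 1.0 are assumptions made for illustration.

```python
import math
import random

def boltzmann_probs(q_values, tau=1.0):
    """Softmax over Q-values; tau is an assumed temperature parameter."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann_action(q_values, rng, tau=1.0):
    """Sample an action index in proportion to its softmax probability."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

rng = random.Random(0)
q = [0.5, 2.0, 1.0]   # made-up Q-values for down, leave, up
print(boltzmann_probs(q))
print(boltzmann_action(q, rng))
```

A larger `tau` flattens the distribution towards uniform exploration, while a smaller one makes the choice nearly greedy.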

In [30]:
dqn = build_agent(model, actions)
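
The `target_model_update=1e-2` argument is, as far as I understand Keras-RL, treated as a soft-update rate because it is below 1: after each training step the target network's weights move a small fraction of the way towards the online network's weights. A minimal sketch of that update on plain lists of numbers, with `soft_update` as a hypothetical helper name:

```python
def soft_update(target_weights, online_weights, rate=1e-2):
    """Move each target weight a fraction `rate` of the way to the online weight."""
    return [
        (1.0 - rate) * t + rate * o
        for t, o in zip(target_weights, online_weights)
    ]

target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = soft_update(target, online)
print(target)
```

With a rate of 0.01 the target lags well behind the online network, which tends to stabilise the Q-value targets during training.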

In [31]:
# mae = Mean Absolute Error
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])

In [32]:
dqn.fit(env, nb_steps=80000, visualize=False, verbose=1)

Training for 80000 steps ...
Interval 1 (0 steps performed)
166 episodes - episode_reward: -36.614 [-60.000, 16.000] - loss: 2.860 - mae: 12.996 - mean_q: -18.706

Interval 2 (10000 steps performed)
167 episodes - episode_reward: -27.904 [-60.000, 30.000] - loss: 2.520 - mae: 11.487 - mean_q: -16.447

Interval 3 (20000 steps performed)
167 episodes - episode_reward: -13.222 [-60.000, 42.000] - loss: 1.641 - mae: 8.668 - mean_q: -12.192

Interval 4 (30000 steps performed)
166 episodes - episode_reward: 28.880 [-26.000, 58.000] - loss: 1.749 - mae: 6.760 - mean_q: 6.443

Interval 5 (40000 steps performed)
167 episodes - episode_reward: 38.551 [-32.000, 60.000] - loss: 9.381 - mae: 21.473 - mean_q: 32.862

Interval 6 (50000 steps performed)
167 episodes - episode_reward: 36.838 [-4.000, 58.000] - loss: 12.716 - mae: 25.272 - mean_q: 38.487

Interval 7 (60000 steps performed)
166 episodes - episode_reward: 27.759 [-28.000, 58.000] - loss: 12.048 - mae: 25.093 - mean_q: 37.949

Interval 8 (

<tensorflow.python.keras.callbacks.History at 0x16793a23b20>
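
Each logging interval in the output above covers 10,000 steps, and every episode lasts exactly 60 steps, which is why each interval reports 166 or 167 episodes. A quick check of that arithmetic:

```python
steps_per_interval = 10_000   # read off the "Interval N (... steps performed)" lines
steps_per_episode = 60        # shower_length in ShowerEnv

full, remainder = divmod(steps_per_interval, steps_per_episode)
print(full, remainder)        # 166 40
```

The 40 leftover steps carry over into the next interval, so some intervals complete 167 episodes.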

In [97]:
dqn.fit(env, nb_steps=10000, visualize=False, verbose=1)

Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 91.730 seconds


<tensorflow.python.keras.callbacks.History at 0x16794f1f730>

#### Improving Results
The agent was trained for 80,000 steps above, followed by a further 10,000, and training for longer may produce better results. Try increasing the __nb_steps__ parameter. You could also provide more information in the state for the model to learn from.
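
One hypothetical way to provide more information in the state is to include the remaining shower time alongside the temperature. The sketch below only shows how such an observation could be assembled; `make_observation` and the scaling constants are assumptions, and using it would also require changing `observation_space` to a two-element `Box` and returning this observation from `step` and `reset`.

```python
def make_observation(temperature, seconds_left, max_temp=100.0, max_seconds=60):
    """Hypothetical two-feature state: scaled temperature and time remaining."""
    return [temperature / max_temp, seconds_left / max_seconds]

print(make_observation(38, 60))   # start of an episode
print(make_observation(38, 0))    # end of an episode
```

Scaling both features to roughly the 0 to 1 range is a common choice for neural network inputs, though whether it helps here would need to be tested.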

In [98]:
scores = dqn.test(env, nb_episodes=100, visualize=False)
print("Mean Score: ", np.mean(scores.history['episode_reward']))

Testing for 100 episodes ...
Episode 1: reward: 58.000, steps: 60
Episode 2: reward: 60.000, steps: 60
Episode 3: reward: 60.000, steps: 60
Episode 4: reward: 60.000, steps: 60
Episode 5: reward: 58.000, steps: 60
Episode 6: reward: 60.000, steps: 60
Episode 7: reward: 60.000, steps: 60
Episode 8: reward: 58.000, steps: 60
Episode 9: reward: 60.000, steps: 60
Episode 10: reward: 60.000, steps: 60
Episode 11: reward: 60.000, steps: 60
Episode 12: reward: 60.000, steps: 60
Episode 13: reward: 60.000, steps: 60
Episode 14: reward: 60.000, steps: 60
Episode 15: reward: 60.000, steps: 60
Episode 16: reward: 60.000, steps: 60
Episode 17: reward: 60.000, steps: 60
Episode 18: reward: 60.000, steps: 60
Episode 19: reward: 58.000, steps: 60
Episode 20: reward: 58.000, steps: 60
Episode 21: reward: 58.000, steps: 60
Episode 22: reward: 60.000, steps: 60
Episode 23: reward: 60.000, steps: 60
Episode 24: reward: 60.000, steps: 60
Episode 25: reward: 60.000, steps: 60
Episode 26: reward: 60.000, st

In [None]:
_ = dqn.test(env, nb_episodes=15, visualize=True)
env.close()

# 4. Reloading Agent from Memory

In [99]:
dqn.save_weights('shower_agent_dqn_weights.h5f', overwrite=True)

In [123]:
del model
del dqn
del env

In [124]:
env = ShowerEnv()
actions = env.action_space.n
states = env.observation_space.shape
model = build_model(states, actions)
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])

In [125]:
dqn.load_weights('shower_agent_dqn_weights.h5f')

In [126]:
scores = dqn.test(env, nb_episodes=5, visualize=False)
print("Mean Score: ", np.mean(scores.history['episode_reward']))

Testing for 5 episodes ...
Episode 1: reward: 56.000, steps: 60
Episode 2: reward: 58.000, steps: 60
Episode 3: reward: 58.000, steps: 60
Episode 4: reward: 60.000, steps: 60
Episode 5: reward: 60.000, steps: 60
Mean Score:  58.4
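
Because every step scores +1 inside the optimal range and -1 outside it, an episode score translates directly into time spent in range. A short worked example using the five test scores above (the helper name `seconds_in_range` is illustrative):

```python
def seconds_in_range(score, length=60):
    """Convert an episode score into the number of steps spent at 37-39 degrees.

    Each step scores +1 in range and -1 out of range, so
    in_range - out_of_range = score and in_range + out_of_range = length.
    """
    return (length + score) // 2

scores = [56, 58, 58, 60, 60]   # test episodes reported above
print([seconds_in_range(s) for s in scores])   # [58, 59, 59, 60, 60]
print(sum(scores) / len(scores))               # 58.4
```

A score of 58 therefore means the reloaded agent was out of range for a single second of the 60-second shower.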
