# Random Agents

This notebook covers basic usage of OpenAI gym, including how to display your RL agents as GIFs in Jupyter.

Let's start by importing our libraries.

In [1]:
# Insert HTML into Jupyter for it to display our GIFs
from IPython.display import HTML

# Transform gym frames to GIFs
# PIL is the pillow library
import PIL.Image

# numpy to deliver arrays to PIL
import numpy as np

# OpenAI gym
import gym

# Import the local script that defines simple linear agents
import agents

## What's cartpole?

I'm still learning this stuff, so I'll be using one of the simplest RL games: cartpole.

Cartpole is basically a pole balanced on a little cart. If the pole falls over, you lose. Your goal is to move the cart left or right to keep the pole upright.

You can view a very useful description of cartpole [here](https://github.com/openai/gym/wiki/CartPole-v1). That information is crucial for knowing what kind of inputs and outputs cartpole works with. The OpenAI website also has a page with a little visual demo [here](https://gym.openai.com/envs/CartPole-v1/).

In case you're curious, cartpole-v1 just seems to be cartpole-v0 with a higher time limit. Version 0 terminates at 200 time steps, while version 1 terminates at 500. I prefer version 1 since it's a bit more challenging.

## Basic gym

Here's the sample code given by the [gym documentation](https://gym.openai.com/docs/#environments). I've changed it a little so that each frame is converted to a Pillow (PIL) image. Pillow can then save the sequence of frames as a GIF, which is easy to work with.

If you run this code, a small window will pop up as the game is played.

Here cartpole is being played randomly. The agent randomly decides whether to move the cart left or right. The pole doesn't stand a chance.

In [2]:
env = gym.make('CartPole-v1')

# Random actor
observation = env.reset()
cum_reward = 0
frames = []
for t in range(1000):
    # Render into buffer. 
    # You will still see the window.
    frames.append(PIL.Image.fromarray(env.render(mode = 'rgb_array'), "RGB"))
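    # Pick a random action: 0 pushes the cart left, 1 pushes it right
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    # Track the total reward collected in this episode
    cum_reward += reward
    if done:
        break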
env.close()

# Save the GIF
frames[0].save('./images/random_actor.gif', format='GIF', append_images=frames[1:], save_all=True, duration=10, loop=0)

# Display the GIF in Jupyter
HTML('<img src="./images/random_actor.gif">')

## Trying the simple model randomly

Let's try something else. I've written a little `LinearAgent` class in a separate Python script. This linear agent is just a logistic regression of the form $\sigma(w_0 + w_1 a + w_2 b + w_3 c + w_4 d) = y$, where $a$, $b$, $c$, and $d$ are the four values in the cartpole observation, $w_0$ is an intercept term, and $\sigma$ is the [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function).
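
The `agents` script isn't shown in this notebook, so here is a rough sketch of what such an agent might look like. Everything in it is an assumption made for illustration: the class name `LinearAgentSketch`, the normally distributed weights, and the rule of pushing right when $y \geq 0.5$. It only uses the standard library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LinearAgentSketch:
    """Hypothetical stand-in for agents.LinearAgent (the real class may differ)."""

    def __init__(self, n_inputs=4, rng=None):
        rng = rng or random.Random()
        # w[0] is the intercept w_0; w[1:] multiply the observation values.
        # Drawing weights from a standard normal is an assumption of this sketch.
        self.w = [rng.gauss(0.0, 1.0) for _ in range(n_inputs + 1)]

    def probability(self, observation):
        # y = sigma(w_0 + w_1 a + w_2 b + w_3 c + w_4 d)
        z = self.w[0] + sum(w * x for w, x in zip(self.w[1:], observation))
        return sigmoid(z)

    def predict(self, observation):
        # CartPole has two discrete actions: 0 pushes left, 1 pushes right.
        # Thresholding y at 0.5 is assumed here, not taken from the real agent.
        return 1 if self.probability(observation) >= 0.5 else 0
```

Calling `LinearAgentSketch().predict(observation)` then returns either 0 or 1, which is the kind of value `env.step` expects for cartpole.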

The code below randomly initializes the weights of 25 agents, plays 100 games with each one, and keeps the agent with the best average score. It's just guessing at a solution and trying it. Sometimes it even solves the game!
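
Stripped of the gym details, this guess-and-keep-the-best idea is a plain random search. The sketch below shows the pattern with a toy scoring function in place of cartpole. The helper `random_search` and its parameters are hypothetical names chosen for this example, not part of gym or the `agents` script.

```python
import random


def random_search(make_agent, evaluate, n_candidates, n_episodes):
    """Sample agents at random and keep the one with the best average score."""
    best_agent, best_score = None, float("-inf")
    for _ in range(n_candidates):
        agent = make_agent()
        # Average over several episodes to smooth out lucky or unlucky runs
        scores = [evaluate(agent) for _ in range(n_episodes)]
        average = sum(scores) / len(scores)
        if average > best_score:
            best_agent, best_score = agent, average
    return best_agent, best_score


# Toy stand-in for an environment: the "agent" is a single number, and an
# episode scores higher the closer that number is to 3.
rng = random.Random(0)
best, score = random_search(
    make_agent=lambda: rng.uniform(-10.0, 10.0),
    evaluate=lambda agent: -abs(agent - 3.0),
    n_candidates=25,
    n_episodes=5,
)
print(best, score)
```

Starting `best_score` at negative infinity rather than 0 is a small design choice: it means the first candidate is always kept, even when every score is negative.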

In [3]:
env = gym.make('CartPole-v1')

best_w = None
best_score = 0
# Simple model
for m in range(25):
    actor = agents.LinearAgent()
    scores = []
    for i in range(100):
        observation = env.reset()
        score = 0
        for t in range(1000):
            action = actor.predict(observation)
            observation, reward, done, info = env.step(action)
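            if done:
                # Breaking before the update drops the final step's reward,
                # so a full 500-step episode is reported as 499
                break
            score += reward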
        scores.append(score)
    print(f"[{m+1:3}] Minimum {min(scores):5.1f} Maximum {max(scores):5.1f} Average {sum(scores)/len(scores):5.1f}")
    if sum(scores)/len(scores) > best_score:
        best_score = sum(scores)/len(scores)
        best_w = actor
env.close()
print(best_score)
print(best_w)

[  1] Minimum   7.0 Maximum  10.0 Average   8.4
[  2] Minimum   7.0 Maximum  10.0 Average   8.3
[  3] Minimum  17.0 Maximum  53.0 Average  29.1
[  4] Minimum   7.0 Maximum  10.0 Average   8.2
[  5] Minimum   7.0 Maximum  10.0 Average   8.3
[  6] Minimum   7.0 Maximum  10.0 Average   8.3
[  7] Minimum   7.0 Maximum  10.0 Average   8.3
[  8] Minimum   7.0 Maximum  10.0 Average   8.4
[  9] Minimum   7.0 Maximum  10.0 Average   8.4
[ 10] Minimum   7.0 Maximum  10.0 Average   8.3
[ 11] Minimum   7.0 Maximum  10.0 Average   8.3
[ 12] Minimum  13.0 Maximum  30.0 Average  20.4
[ 13] Minimum   7.0 Maximum   9.0 Average   8.3
[ 14] Minimum  32.0 Maximum  83.0 Average  45.0
[ 15] Minimum   7.0 Maximum  10.0 Average   8.3
[ 16] Minimum   7.0 Maximum  11.0 Average   9.0
[ 17] Minimum   7.0 Maximum  10.0 Average   8.4
[ 18] Minimum   7.0 Maximum  10.0 Average   8.3
[ 19] Minimum   7.0 Maximum  10.0 Average   8.3
[ 20] Minimum   7.0 Maximum  10.0 Average   8.2
[ 21] Minimum 499.0 Maximum 499.0 Averag

In [4]:
HTML(f"<img src='{best_w.render('random_simple.gif')}'>")

## Trying the second-order agent randomly

I also try a second-order linear agent of the form $\sigma(w_0 + w_1 a + w_2 b + w_3 c + w_4 d + w_5 e + w_6 f + w_7 g + w_8 h) = y$. This time I try 100 random agents instead of 25. Sometimes an agent solves the game, but usually none does.
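
The notebook doesn't say what the extra inputs $e$, $f$, $g$, and $h$ are. One plausible reading, and it is only an assumption, is that they are the squares of the four observation values. Under that assumption, the feature expansion could be sketched like this (`second_order_features` is a hypothetical helper, not a function from the `agents` script):

```python
def second_order_features(observation):
    """Return the inputs followed by their squares (an assumed feature map)."""
    values = list(observation)
    return values + [x * x for x in values]


features = second_order_features([0.1, -0.2, 0.03, 0.4])
print(len(features))  # 8 inputs, one for each of the weights w_1 to w_8
```

The agent would then apply the same weighted sum and sigmoid as before, just over eight inputs instead of four.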

That's about it. Try re-running the agents to see if you can hit a lucky solution.

In [5]:
env = gym.make('CartPole-v1')

best_w = None
best_score = 0
# Complex model
for m in range(100):
    actor = agents.LinearAgent(order=2)
    scores = []
    for i in range(100):
        observation = env.reset()
        score = 0
        for t in range(1000):
            action = actor.predict(observation)
            observation, reward, done, info = env.step(action)
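            if done:
                # Breaking before the update drops the final step's reward,
                # so a full 500-step episode is reported as 499
                break
            score += reward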
        scores.append(score)
    print(f"[{m+1:3}] Minimum {min(scores):5.1f} Maximum {max(scores):5.1f} Average {sum(scores)/len(scores):5.1f}")
    if sum(scores)/len(scores) > best_score:
        best_score = sum(scores)/len(scores)
        best_w = actor
env.close()
print(best_score)
print(best_w)

[  1] Minimum   7.0 Maximum  10.0 Average   8.3
[  2] Minimum   7.0 Maximum  10.0 Average   8.3
[  3] Minimum   7.0 Maximum  10.0 Average   8.4
[  4] Minimum   9.0 Maximum  17.0 Average  11.8
[  5] Minimum   7.0 Maximum  10.0 Average   8.3
[  6] Minimum   7.0 Maximum  10.0 Average   8.4
[  7] Minimum   7.0 Maximum  10.0 Average   8.3
[  8] Minimum  10.0 Maximum  17.0 Average  12.6
[  9] Minimum   8.0 Maximum  12.0 Average  10.0
[ 10] Minimum   9.0 Maximum  14.0 Average  11.9
[ 11] Minimum   7.0 Maximum  21.0 Average   9.0
[ 12] Minimum   7.0 Maximum  10.0 Average   8.3
[ 13] Minimum   7.0 Maximum  49.0 Average  10.0
[ 14] Minimum   9.0 Maximum  19.0 Average  12.8
[ 15] Minimum   7.0 Maximum  10.0 Average   8.4
[ 16] Minimum  25.0 Maximum 103.0 Average  46.0
[ 17] Minimum   7.0 Maximum  10.0 Average   8.4
[ 18] Minimum   7.0 Maximum  10.0 Average   8.4
[ 19] Minimum  30.0 Maximum 204.0 Average  83.2
[ 20] Minimum   8.0 Maximum  13.0 Average  10.1
[ 21] Minimum   7.0 Maximum  10.0 Averag

In [6]:
HTML(f"<img src='{best_w.render('random_complex.gif')}'>")