# OpenAI Gym
This notebook is a simple working example of how to interact with OpenAI Gym.

## Environment Setup
1. Install swig: https://www.dev2qa.com/how-to-install-swig-on-macos-linux-and-windows/
2. Set up a Python virtual environment (optional):

`pip3 install virtualenv`

`python3 -m virtualenv venv`

`source venv/bin/activate`

3. Install the required Python packages:
`pip3 install gym==0.17.2 box2d-py==2.3.8`
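
To confirm that the pinned versions actually ended up in the active environment, a quick check along these lines can help. This is only a sketch: `installed_version` is a helper name introduced here for illustration, and it relies on the standard-library `importlib.metadata` module (Python 3.8+).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Versions pinned in the install step.
for name, wanted in [("gym", "0.17.2"), ("box2d-py", "2.3.8")]:
    found = installed_version(name)
    status = "ok" if found == wanted else "check"
    print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

A `found None` line means the package is not visible to the interpreter running the notebook, which usually points at a venv that was not activated.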


In [1]:
import numpy as np
from numpy import pi
import gym

import matplotlib.pyplot as plt
%matplotlib inline

The constants below name the actions and state entries of the game "BipedalWalker-v3".

BipedalWalker has 2 legs, and each leg has 2 joints (a hip and a knee). An action is a vector of 4 torques, one per joint, each in the range [-1, 1].

The state of the game is a vector of 24 variables, described in more detail here: https://github.com/openai/gym/wiki/BipedalWalker-v2


In [2]:
STATE_SPACE = 24
ACTION_SPACE = 4
ENV = gym.make("BipedalWalker-v3")

# action indices (one torque per joint)
Hip_1 = 0
Knee_1 = 1
Hip_2 = 2
Knee_2 = 3

# state indices (entries 14-23 are the 10 lidar rangefinder readings)
HULL_ANGLE = 0
HULL_ANGULAR_VELOCITY = 1
VEL_X = 2
VEL_Y = 3
HIP_JOINT_1_ANGLE = 4
HIP_JOINT_1_SPEED = 5
KNEE_JOINT_1_ANGLE = 6
KNEE_JOINT_1_SPEED = 7
LEG_1_GROUND_CONTACT_FLAG = 8
HIP_JOINT_2_ANGLE = 9
HIP_JOINT_2_SPEED = 10
KNEE_JOINT_2_ANGLE = 11
KNEE_JOINT_2_SPEED = 12
LEG_2_GROUND_CONTACT_FLAG = 13
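
When inspecting a state by eye, it can be easier to look at named values than at raw indices. The sketch below mirrors the index constants above in a tuple and splits a state into the 14 named entries plus the 10 remaining ones (the lidar rangefinder readings in this environment). `STATE_NAMES` and `label_state` are names introduced here for illustration, and the example runs on a dummy state rather than a real one.

```python
# Names for the first 14 state entries, in index order (mirrors the constants above).
STATE_NAMES = (
    "hull_angle", "hull_angular_velocity", "vel_x", "vel_y",
    "hip_joint_1_angle", "hip_joint_1_speed",
    "knee_joint_1_angle", "knee_joint_1_speed", "leg_1_ground_contact",
    "hip_joint_2_angle", "hip_joint_2_speed",
    "knee_joint_2_angle", "knee_joint_2_speed", "leg_2_ground_contact",
)


def label_state(state):
    """Split a 24-element state into named readings and the remaining entries."""
    values = list(state)
    if len(values) != 24:
        raise ValueError(f"expected 24 state values, got {len(values)}")
    named = dict(zip(STATE_NAMES, values))
    rest = values[len(STATE_NAMES):]  # the 10 entries not named above
    return named, rest


# Dummy state in which every value equals its own index.
named, rest = label_state(range(24))
print(named["vel_x"], len(rest))
```

With a real environment, the same call would be `label_state(state)` on the array returned by `env.reset()` or `env.step(...)`.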




Now we'll define the basic interface with the game. The game is essentially a very fast turn-based game. Each round has 2 main steps:
1. The agent chooses an action based on the current state.
2. The game updates according to its current state and the agent's action, and returns the new state and a reward.


In [3]:
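def play_game(agent, env, score_augmentation=False, show=False, verbose=False):
    """Run `agent` in `env` for 500 steps and return the summed reward.

    `score_augmentation` is accepted but currently unused.
    """
    state = env.reset()
    cumulative_score = 0
    if show:
        env.render()
    for step in range(500):
        # 1. The agent picks an action from the current state.
        action = agent(state)
        # 2. The game advances one step and reports the outcome.
        # Note: `terminal` is not checked, so the loop keeps stepping
        # (and summing rewards) even after the episode has ended.
        state, reward, terminal, info = env.step(action)
        if verbose:
            print(state)
        if show:
            env.render()
        cumulative_score += reward
    env.close()
    return cumulative_score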

The agent gets a positive reward proportional to the distance walked over the terrain; walking all the way to the far end yields a total of 300+.
If the agent falls, it gets a reward of -100.
There is also a small negative reward proportional to the torque applied to the joints, so that the agent learns to walk smoothly with minimal torque.
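
The three reward terms can be combined into a rough per-step model. This is a simplified sketch, not the environment's actual reward code: `step_reward`, `progress_scale`, `torque_cost`, and `fall_penalty` are hypothetical names, and the numeric defaults other than the -100 fall penalty are illustrative placeholders.

```python
def step_reward(distance_gained, action, fell,
                progress_scale=1.0, torque_cost=0.01, fall_penalty=-100.0):
    """Simplified per-step reward: progress minus torque cost, or a flat penalty on a fall."""
    if fell:
        return fall_penalty
    # Torque magnitudes are capped at 1, matching the [-1, 1] action range.
    effort = sum(min(abs(a), 1.0) for a in action)
    return progress_scale * distance_gained - torque_cost * effort


no_torque = step_reward(0.5, [0.0, 0.0, 0.0, 0.0], fell=False)
with_torque = step_reward(0.5, [1.0, -1.0, 2.0, 0.0], fell=False)
after_fall = step_reward(0.5, [1.0, 1.0, 1.0, 1.0], fell=True)

print(no_torque)                # progress only
print(with_torque < no_torque)  # same progress, but torque costs reward
print(after_fall)               # the fall penalty replaces the other terms
```

The point of the model is the trade-off: an agent that covers the same distance with less torque scores higher, and a single fall wipes out many steps of progress.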

Finally, we'll define a couple of test agents to play with.

In [4]:
def random_agent(state):
    """This agent returns random actions."""
    return np.random.uniform(low=-1.0, high=1.0, size=4)


def stupid_agent(state):
    """A very simple expert system."""
    if state[LEG_1_GROUND_CONTACT_FLAG] == 1 and state[LEG_2_GROUND_CONTACT_FLAG] == 1:
        return np.random.uniform(low=-1.0, high=1.0, size=4)
    if state[KNEE_JOINT_1_SPEED] < -0.5:
        return np.array([0., 0.2, 0., 0.])
    if state[KNEE_JOINT_2_SPEED] < -0.5:
        return np.array([0., 0., 0., 0.2])
    if abs(state[HULL_ANGLE]-pi) > pi/2:
        return np.array([1., 1., 1., 1.])
    return np.array([0., 0., 0., 0.])
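
Because `random_agent` draws from NumPy's global random state, two runs generally produce different scores. When comparing agents, a repeatable baseline can be handy. The sketch below builds a random agent with its own seeded generator; `make_random_agent` is a name introduced here, and it uses the standard-library `random` module and returns a plain list (wrap the result in `np.array` if an array is needed).

```python
import random


def make_random_agent(seed, n_joints=4):
    """Build a random agent with its own seeded generator, so runs are repeatable."""
    rng = random.Random(seed)

    def agent(state):
        # The state is ignored, as in random_agent; one torque per joint in [-1, 1].
        return [rng.uniform(-1.0, 1.0) for _ in range(n_joints)]

    return agent


agent_a = make_random_agent(seed=0)
agent_b = make_random_agent(seed=0)
first_a = agent_a(None)
first_b = agent_b(None)
print(first_a == first_b)  # same seed, same first action
```

Note that a seeded agent only fixes the action sequence; the terrain generated by the environment may still vary between episodes.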

In [6]:
my_agent = random_agent
play_game(my_agent, ENV, show=True, verbose=False)

-42314.801448657105
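
A score this far below -100 is most likely a side effect of `play_game` ignoring the `terminal` flag: once the walker has fallen, each further call to `env.step` appears to return the -100 penalty again, so the remaining steps of the 500-step loop keep adding to the total. The sketch below reproduces the effect with a stand-in environment. `FallingEnv` is a hypothetical stub, not part of gym; it only mimics the `reset`/`step` interface used above and reports a fall at a fixed step. `run_episode` and `idle_agent` are likewise names introduced here.

```python
class FallingEnv:
    """Hypothetical stub: small positive reward until a fall, then -100 on every step."""

    def __init__(self, fall_at=10):
        self.fall_at = fall_at
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 24

    def step(self, action):
        self.t += 1
        fallen = self.t >= self.fall_at
        reward = -100.0 if fallen else 0.1
        return [0.0] * 24, reward, fallen, {}


def run_episode(agent, env, max_steps=500, stop_on_terminal=True):
    """Sum rewards for one episode, optionally stopping when the env reports terminal."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, terminal, _info = env.step(agent(state))
        total += reward
        if terminal and stop_on_terminal:
            break
    return total


def idle_agent(state):
    """Apply no torque at all."""
    return [0.0, 0.0, 0.0, 0.0]


stopped = run_episode(idle_agent, FallingEnv(), stop_on_terminal=True)
unstopped = run_episode(idle_agent, FallingEnv(), stop_on_terminal=False)
print(round(stopped, 1), round(unstopped, 1))
```

With the stub, stopping at the terminal step counts the fall penalty once, while the fixed-length loop counts it on every remaining step. Applying the same early stop in `play_game` should bring the random agent's score back into a range that is comparable with the reward description above.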