## Let’s Gym Together
What is OpenAI Gym? It is a Python library that provides a large collection of test environments for RL agents, all behind a shared interface, so you can write general algorithms and test them across environments. To get started, run ```pip install gym``` in a terminal. The basic install comes with a small set of environments, such as classic control, which is enough to start working on your agent. Copy the code below and run it to load your first environment.

In [None]:
import gym
import numpy as np

In [None]:
# 1. Render the environment for 300 timesteps while taking random actions
env = gym.make('Acrobot-v1')
env.reset()

for _ in range(300):
    env.render()
    env.step(env.action_space.sample())
    
env.close()

In [None]:
# 2. List every registered environment (ones whose dependencies are not installed are also shown)

envs = gym.envs.registry.all()
len(envs)

When the agent acts on the environment, the step(…) function returns four values: ```observation```, which represents the environment’s state; ```reward```, a float giving the reward earned by the previous action; ```done```, a boolean that is true when it is time to reset the environment (the goal was reached or the episode ended); and ```info```, a dict of diagnostic information for debugging. ```info``` can sometimes be useful for learning, for example if it contains the raw probabilities behind the environment’s last state change. See how it works below. Also observe how the ```observation``` differs between environments, as described by each environment’s ```observation_space```.

In [None]:
env = gym.make('MountainCarContinuous-v0') # try different environments
observation = env.reset()

for t in range(200):
    env.render()
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:
        print("Finished after {} timesteps".format(t+1))
        break

env.close()
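
The four values returned by ```step``` are easier to follow in an environment small enough to read in full. The sketch below is not part of gym: ```LineWorld``` is a hypothetical stand-in that only mimics the ```reset```/```step``` interface described above, so it runs without gym installed.

```python
import random


class LineWorld:
    """Hypothetical toy environment (not part of gym) with a gym-style interface.

    The agent starts at position 0 on a line and must reach position `goal`.
    Action 0 moves left, action 1 moves right.
    """

    def __init__(self, goal=3, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        """Apply one action and return (observation, reward, done, info)."""
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.steps += 1
        reached_goal = self.pos == self.goal
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        info = {"steps": self.steps}  # diagnostic data only
        return self.pos, reward, done, info


random.seed(0)
env = LineWorld()
observation = env.reset()
done = False
while not done:
    action = random.choice([0, 1])  # random policy, like action_space.sample()
    observation, reward, done, info = env.step(action)

print("Finished after {} timesteps, last reward {}".format(info["steps"], reward))
```

The loop has the same shape as the gym loop above: reset once, then call ```step``` until ```done``` is true.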

What is ```action_space``` in the code above? ```action_space``` and ```observation_space``` describe the valid format of actions and observations for a particular environment. Take a look at the values they return.

In [None]:
env = gym.make('CartPole-v0')
print(env.action_space) #[Output: ] Discrete(2)
print(env.observation_space) # [Output: ] Box(4,)

env = gym.make('MountainCarContinuous-v0')
print(env.action_space) #[Output: ] Box(1,)
print(env.observation_space) #[Output: ] Box(2,)

```Discrete(n)``` is a set of non-negative integers {0, 1, …, n-1}; above, the values 0 and 1 correspond to left and right movement for CartPole balancing. ```Box``` represents an n-dimensional array of bounded numbers. These spaces help in writing general code that works across environments: we can simply check the bounds with ```env.observation_space.high``` and ```env.observation_space.low``` and build them into our general algorithm.
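
As one example of using the bounds in general code, a ```Box``` observation can be mapped onto a grid of integer bins so that a table-based method can index it. The sketch below is an illustration, not a gym API: ```discretize``` and ```bins``` are hypothetical names, and the bounds are typed in by hand (position and velocity limits of the classic MountainCar environments) so the block runs without gym.

```python
import numpy as np

# Bounds typed in by hand; with gym you would read them from
# env.observation_space.low and env.observation_space.high.
low = np.array([-1.2, -0.07])   # [position, velocity]
high = np.array([0.6, 0.07])


def discretize(obs, low, high, bins=10):
    """Map a continuous observation to a tuple of integer bin indices."""
    obs = np.clip(obs, low, high)
    ratio = (obs - low) / (high - low)            # scaled to [0, 1]
    idx = np.minimum((ratio * bins).astype(int), bins - 1)
    return tuple(int(i) for i in idx)


print(discretize(np.array([-1.2, -0.07]), low, high))  # lowest bin on both axes
print(discretize(np.array([0.6, 0.07]), low, high))    # highest bin on both axes
print(discretize(np.array([-0.5, 0.01]), low, high))
```

Because the function only depends on ```low``` and ```high```, the same code would work for any environment whose observations are a bounded ```Box```.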

### An Illustration

Once you know the basics of OpenAI’s gym, I recommend installing all of its dependencies and then the complete gym package with the following commands. Here we are using Python 2.x; for Python 3.x, change the commands below accordingly.

- apt-get install 
   - python-numpy 
   - python-dev 
   - cmake 
   - zlib1g-dev 
   - libjpeg-dev 
   - xvfb 
   - libav-tools 
   - xorg-dev 
   - python-opengl 
   - libboost-all-dev 
   - libsdl2-dev 
   - swig 

- sudo pip install 'gym[all]'

Let’s start building our Q-table algorithm, which will try to solve the [FrozenLake environment](https://gym.openai.com/envs/FrozenLake8x8-v0/). In this environment the aim is to reach the goal on a frozen lake that has some holes in it. Here is how the environment depicts the surface, shown for the smaller 4x4 layout (the 8x8 version used in the code below has the same tile types on a larger grid):

```
SFFF       (S: starting point, safe)
FHFH       (F: frozen surface, safe)
FFFH       (H: hole, fall to your doom)
HFFG       (G: goal, where the frisbee is located)
```
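
The environment reports its state as a single integer rather than a grid coordinate. Assuming the row-major numbering that FrozenLake uses (state = row * number of columns + column), the 4x4 layout above can be unpacked as in this sketch. ```MAP```, ```to_state```, and ```to_row_col``` are hypothetical helper names, not gym functions.

```python
MAP = ["SFFF",
       "FHFH",
       "FFFH",
       "HFFG"]
N_COLS = len(MAP[0])


def to_state(row, col):
    """Grid coordinate -> integer state index (row-major)."""
    return row * N_COLS + col


def to_row_col(state):
    """Integer state index -> (row, col)."""
    return divmod(state, N_COLS)


# Collect the state index of every hole tile, scanning row by row.
holes = [to_state(r, c) for r, line in enumerate(MAP)
         for c, tile in enumerate(line) if tile == "H"]
goal = to_state(3, 3)

print("hole states:", holes)
print("goal state:", goal, "at", to_row_col(goal))
```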

A Q-table maps state-action pairs to estimated reward. So we will construct an array that maps each state and action to a value, updated as the algorithm runs. Its dimensions will be |states| x |actions|. Let’s write the Q-learning algorithm in code.

In [None]:
# 1. Load Environment and Q-table structure
env = gym.make('FrozenLake8x8-v0')

# env.observation_space.n and env.action_space.n give the number of states and actions in the loaded env
Q = np.zeros([env.observation_space.n,env.action_space.n])

In [None]:
# 2. Parameters of Q-learning
eta = .628  # learning rate
gma = .9  # discount factor
epis = 500  # number of training episodes
rev_list = []  # total reward collected in each episode

In [None]:
# 3. Q-learning Algorithm
for i in range(epis):
    # Reset environment
    s = env.reset()
    rAll = 0
    d = False
    j = 0
    #The Q-Table learning algorithm
    while j < 99:
        env.render()
        j+=1
        # Choose action from Q table
        a = np.argmax(Q[s,:] + np.random.randn(1,env.action_space.n)*(1./(i+1)))
        #Get new state & reward from environment
        s1,r,d,_ = env.step(a)
        #Update Q-Table with new knowledge
        Q[s,a] = Q[s,a] + eta*(r + gma*np.max(Q[s1,:]) - Q[s,a])
        rAll += r
        s = s1
        if d:
            break
    rev_list.append(rAll)
    env.render()

# Close the environment once, after all episodes have finished
env.close()
print("Average reward per episode: " + str(sum(rev_list)/epis))
print("Final Values Q-Table")
print(Q)
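
To make the update line concrete, the sketch below applies it by hand to two transitions on a small table, using the same ```eta``` and ```gma``` as above. The transitions are made up for illustration (a 16-state, 4-action table with the goal at state 15); they are not output from the environment, and ```q_update``` is a hypothetical helper name.

```python
import numpy as np

eta = .628   # learning rate, as above
gma = .9     # discount factor, as above
Q = np.zeros([16, 4])


def q_update(Q, s, a, r, s1):
    """One Q-learning step: move Q[s, a] toward r + gma * max(Q[s1, :])."""
    Q[s, a] = Q[s, a] + eta * (r + gma * np.max(Q[s1, :]) - Q[s, a])


# Illustrative transition: from state 14, action 2 reaches the goal (reward 1).
q_update(Q, s=14, a=2, r=1.0, s1=15)
# Illustrative transition: from state 13, action 2 reaches state 14 (reward 0).
q_update(Q, s=13, a=2, r=0.0, s1=14)

print(Q[14, 2])  # 0.628 * (1 + 0.9 * 0 - 0)
print(Q[13, 2])  # 0.628 * (0 + 0.9 * 0.628 - 0)
```

The second update shows how reward propagates backward: state 13 earned no reward itself, but its value rises because it leads to state 14, which already has a positive estimate.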

If you would rather watch a simulation of the agent finding its way through the environment, use the snippet below in place of the Q-learning loop above.

<p align="center">
<img src="https://miro.medium.com/max/650/1*S6CG3jyp5rGxMUGw_Bqr3Q.png"/>
<br />
    <i>Frozen Lake Environment’s Visualization & Below code is for its simulation.</i>
</p>

In [None]:
# Reset environment
s = env.reset()
d = False
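# Run a single episode, rendering every step.
# Note: `i` sets the exploration noise scale and must already be defined
# (for example, set i = 0 if the training loop above has not been run).
while not d: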
    env.render()
    # Choose action from Q table
    a = np.argmax(Q[s,:] + np.random.randn(1,env.action_space.n)*(1./(i+1)))
    #Get new state & reward from environment
    s1,r,d,_ = env.step(a)
    #Update Q-Table with new knowledge
    Q[s,a] = Q[s,a] + eta*(r + gma*np.max(Q[s1,:]) - Q[s,a])
    s = s1
# Code will stop at d == True, and render one state before it
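
The action choice in both loops adds Gaussian noise scaled by ```1/(i+1)``` to the Q-values before taking the ```argmax```, so early episodes explore and later ones act almost greedily. The sketch below estimates how often the noisy choice differs from the greedy one at a few episode indices. The Q-values in ```q_row``` are made up for illustration, and ```nongreedy_fraction``` is a hypothetical helper.

```python
import numpy as np

rng = np.random.RandomState(0)
q_row = np.array([0.10, 0.30, 0.20, 0.05])  # made-up Q-values for one state


def nongreedy_fraction(q_row, i, trials=2000):
    """Fraction of noisy choices at episode index i that differ from argmax."""
    greedy = np.argmax(q_row)
    noise = rng.randn(trials, q_row.size) * (1. / (i + 1))
    chosen = np.argmax(q_row + noise, axis=1)
    return float(np.mean(chosen != greedy))


for i in [0, 9, 99]:
    print("episode", i + 1, "non-greedy fraction:", nongreedy_fraction(q_row, i))
```

The fraction should fall as ```i``` grows, which is the intended effect: the agent relies more on what it has learned as training goes on.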

But remember that even with a common interface, code complexity differs between environments. The environment above had only 64 states and a few actions to handle, so we could easily store them in a two-dimensional array for reward mapping. Now let’s consider a more complicated case, the *Atari envs*, and look at the approach it needs.

In [None]:
env = gym.make("Breakout-v0")

# Number of discrete actions
print(env.action_space.n)

# Human-readable name of each action
print(env.unwrapped.get_action_meanings())

# Shape and bounds of the observations
print(env.observation_space)

Each observation is a 210x160x3 tensor, which makes a Q-table impractical. Also, each action is repeatedly performed for a duration of k frames, where k is uniformly sampled from {2,3,4}. With 33,600 pixels, each with RGB channels whose values range from 0–255, the environment has clearly become too complicated for the simple Q-learning approach. Deep learning with a CNN architecture is the solution for this problem, and the topic for a follow-up to this introductory article.
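
A rough count shows why a table is out of reach. The sketch below only does arithmetic on the numbers quoted above (a 210x160x3 frame with 256 levels per value); it counts every possible frame, which is an upper bound on what the game can actually display.

```python
import math

height, width, channels = 210, 160, 3
levels = 256  # each value ranges over 0-255

pixels = height * width
values_per_frame = pixels * channels

# The number of distinct frames is levels ** values_per_frame. That is too
# large to print, so report how many decimal digits it has instead.
digits = math.floor(values_per_frame * math.log10(levels)) + 1

frozen_lake_entries = 64 * 4  # |states| x |actions| for the 8x8 lake

print("pixels per frame:", pixels)
print("values per frame:", values_per_frame)
print("decimal digits in the number of possible frames:", digits)
print("FrozenLake Q-table entries:", frozen_lake_entries)
```

Compared with the few hundred entries of the FrozenLake table, a table with one row per possible frame could never be stored or filled in.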

### Conclusion

With the above tutorial you now have basic knowledge of gym and all you need to get started with it. Gym is also TensorFlow compatible, but I haven’t used it here to keep the tutorial simple. After trying out gym, take a look at [baselines](https://github.com/openai/baselines) for good implementations of RL algorithms to compare against your own. To see all the OpenAI tools, check out their [github page](https://github.com/openai). RL is an expanding field with applications in a huge number of domains, and it will play an important role in future AI breakthroughs. Thanks for reading!