# Let's "standardize" the Flower Field

From [https://gym.openai.com/](https://gym.openai.com/): 
"Gym is a toolkit for developing and comparing reinforcement learning algorithms."

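Gym has expectations about how an environment for reinforcement learning should behave. For example, the following methods must be implemented:
- `step(action)`: apply an action and return `(observation, reward, done, info)`
- `reset()`: start a new episode and return the initial observation

and the following attributes must be defined:
- `action_space`: the space of valid actions
- `observation_space`: the space of possible observations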
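To make these requirements concrete, here is a minimal sketch of an environment with this interface. It deliberately does not import gym: `DiscreteStandIn` is a hypothetical stand-in for `gym.spaces.Discrete` (covered in the Spaces section), and `ToyFlowerEnv`, its reward model and its parameters are invented for illustration. They are not the actual FlowerField implementation. The sketch follows the classic Gym API used in this notebook, where `reset()` returns only the observation and `step()` returns a 4-tuple.

```python
import random


class DiscreteStandIn:
    """Hypothetical stand-in for gym.spaces.Discrete(n): the integers 0, 1, ..., n-1."""

    def __init__(self, n, rng=None):
        self.n = n
        self.rng = rng or random.Random()

    def sample(self):
        # Uniformly random element of the space
        return self.rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class ToyFlowerEnv:
    """Hypothetical toy environment with the classic Gym interface:
    reset() -> obs and step(action) -> (obs, reward, done, info).
    A real environment would subclass gym.Env and use gym.spaces instead."""

    def __init__(self, n_flowers=3, max_steps=50, seed=None):
        self.rng = random.Random(seed)
        # Required attribute: the valid actions (which flower to visit)
        self.action_space = DiscreteStandIn(n_flowers, self.rng)
        # Required attribute: the possible observations
        # (here a single dummy state, assuming the bee observes nothing but its reward)
        self.observation_space = DiscreteStandIn(1, self.rng)
        # Assumed mean nectar per flower, invented for illustration
        self.mean_nectar = [self.rng.uniform(0.0, 20.0) for _ in range(n_flowers)]
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.steps = 0
        return 0

    def step(self, action):
        """Apply one action and return (observation, reward, done, info)."""
        if not self.action_space.contains(action):
            raise ValueError(f"invalid action: {action!r}")
        self.steps += 1
        reward = self.rng.gauss(self.mean_nectar[action], 1.0)
        done = self.steps >= self.max_steps
        return 0, reward, done, {}


# One episode with a random agent
env = ToyFlowerEnv(seed=0)
obs = env.reset()
done, steps, score = False, 0, 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    steps += 1
    score += reward
print(steps)  # 50
```

Because the episode loop only touches `reset`, `step` and `action_space`, the same loop would work unchanged with any environment that follows this interface.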

Before we adapt our FlowerField to make it compatible with Gym, let's talk about `action_space` and `observation_space`.

## Spaces

Gym expects both the actions (what the agent does to interact with the environment) and the observations (the observable state of the environment) to be described by a space, i.e. an instance of a `gym.spaces.Space` subclass.
There are different kinds of spaces. Let's play with some examples to get a feeling for them:

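```python
import numpy as np
import gym
from gym.spaces import Discrete, Box, Dict, Tuple, MultiBinary, MultiDiscrete

gym.logger.set_level(40)  # 40 = ERROR: only show gym errors, hide warnings

# Discrete(3) is the set of integers {0, 1, 2}
d = Discrete(3)
d.sample()  # draws a random element
```

Output (random, so it differs between runs):

```
2
```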

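```python
# A 2-dimensional box: first entry in [0, 41], second entry in [100, 102]
b = Box(low=np.array([0, 100]), high=np.array([41, 102]))
b.sample()  # draws a random point inside the box
```

Output:

```
array([ 27.685123, 100.66234 ], dtype=float32)
```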

There are more kinds of spaces (for example `Dict`, `Tuple`, `MultiBinary` and `MultiDiscrete`, all imported above), and they can be combined to express more complex settings.
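As a sketch of how spaces can be combined, the following describes a hypothetical bee that observes its 2-D position and the last flower it visited, and whose action picks a flower and a stay duration. These observation and action layouts are invented for illustration; only the space classes come from `gym.spaces`.

```python
import numpy as np
from gym.spaces import Box, Dict, Discrete, MultiDiscrete, Tuple

# Hypothetical observation: 2-D position in [0, 10] x [0, 10] plus the id of the last visited flower
obs_space = Dict({
    "position": Box(low=0.0, high=10.0, shape=(2,), dtype=np.float32),
    "last_flower": Discrete(3),
})

# Hypothetical action: which flower to fly to, and how long (0, 1 or 2 time units) to stay there
act_space = Tuple((Discrete(3), Discrete(3)))

# The same set of actions, represented as a single integer vector instead of a tuple
act_space_flat = MultiDiscrete([3, 3])

obs = obs_space.sample()
act = act_space.sample()
print(obs)
print(act)
print(act_space_flat.sample())
```

`Tuple((Discrete(3), Discrete(3)))` and `MultiDiscrete([3, 3])` cover the same action choices in different representations; which one to pick typically depends on what the RL library you use supports.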

## FlowerField the Gym way

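```python
from course_002_FlowerFieldEnv import FlowerFiledEnv
from stable_baselines3.common.env_checker import check_env

env = FlowerFiledEnv()
# Verify that the environment follows the Gym API:
# check_env raises an error or emits a warning if something is off
check_env(env)
```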
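The cell above produces no output, so no problem was reported. To give an idea of the kind of properties such a checker looks at, here is a heavily simplified, hypothetical `sketch_check_env`. It is not the stable_baselines3 implementation (which checks considerably more); it only tests the classic interface listed at the top of this chapter. `ToyEnv`, `BrokenEnv` and `DiscreteStandIn` are invented for the example.

```python
import random


class DiscreteStandIn:
    """Hypothetical stand-in for gym.spaces.Discrete(n)."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class ToyEnv:
    """Hypothetical env following the classic Gym interface."""

    def __init__(self, max_steps=3):
        self.action_space = DiscreteStandIn(2)
        self.observation_space = DiscreteStandIn(1)
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return 0, float(action), self.t >= self.max_steps, {}


class BrokenEnv(ToyEnv):
    """Hypothetical env that violates the interface: step() forgets the info dict."""

    def step(self, action):
        obs, reward, done, _ = super().step(action)
        return obs, reward, done


def sketch_check_env(env, n_steps=10):
    """Simplified checker in the spirit of check_env; returns a list of problems found."""
    required = ("action_space", "observation_space", "reset", "step")
    problems = [f"missing {name}" for name in required if not hasattr(env, name)]
    if problems:
        return problems
    obs = env.reset()
    if not env.observation_space.contains(obs):
        problems.append("reset() returned an observation outside observation_space")
    for _ in range(n_steps):
        result = env.step(env.action_space.sample())
        if not (isinstance(result, tuple) and len(result) == 4):
            problems.append("step() must return (obs, reward, done, info)")
            break
        obs, reward, done, info = result
        if not env.observation_space.contains(obs):
            problems.append("step() returned an observation outside observation_space")
        if not isinstance(reward, (int, float)):
            problems.append("reward must be a number")
        if not isinstance(done, bool):
            problems.append("done must be a bool")
        if not isinstance(info, dict):
            problems.append("info must be a dict")
        if done:
            obs = env.reset()  # start a new episode and keep checking
    return problems


print(sketch_check_env(ToyEnv()))     # []
print(sketch_check_env(BrokenEnv()))  # ['step() must return (obs, reward, done, info)']
```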

```python
from course_002_FlowerFieldEnv import FlowerFiledEnv
from course_001_Bumblebee import Bumblebee

env = FlowerFiledEnv()
bb = Bumblebee()  # created once, so its memory persists across episodes

n_episodes = 5
for ep in range(1, n_episodes + 1):
    obs = env.reset()  # start a new episode
    done = False
    score = 0
    stepcount = 0
    while not done:
        # env.render()
        action = bb.choose_flower()  # Bumblebee acting
        obs, reward, done, info = env.step(action)  # environment reacts to the action
        stepcount += 1
        bb.update_memory(action, reward)  # Bumblebee learning
        score += reward
    print(f'Episode: {ep}, Score: {score}, Average Reward: {score/stepcount}, Stepcount: {stepcount}')
env.close()
```

Output:

```
Episode: 1, Score: 468.808657909591, Average Reward: 9.37617315819182, Stepcount: 50
Episode: 2, Score: 480.28566290308447, Average Reward: 9.605713258061689, Stepcount: 50
Episode: 3, Score: 494.1300835925612, Average Reward: 9.882601671851225, Stepcount: 50
Episode: 4, Score: 488.28093444088734, Average Reward: 9.765618688817746, Stepcount: 50
Episode: 5, Score: 462.73839367964166, Average Reward: 9.254767873592833, Stepcount: 50
```


## Now, why was it a good idea again to use this clunky Gym style?

Let's have a look at the next chapter :)