# Agent Implementation Guide

Reinforcement learning agents interact with an environment, which gives them rewards based on the actions they take.

ORSuite environments are subclasses of OpenAI Gym environments and use Gym's action and observation spaces, which makes it relatively easy to design agents that interact correctly with any given environment.


Every environment in ORSuite has an action space and an observation space, both of which are [OpenAI Gym space](https://github.com/openai/gym/tree/master/gym/spaces) objects. The action space contains every action the agent can choose, and the observation space contains every state the environment can be in. Knowing which kinds of spaces an environment uses lets you write an agent that communicates with it correctly. Gym provides the following space types:

* `box`: an $n$-dimensional continuous space with a lower and an upper bound for each dimension
* `dict`: a dictionary of simpler spaces, each identified by a label
* `discrete`: a discrete space over the $n$ integers $\{0, 1, \ldots, n-1\}$
* `multi_binary`: an $n$-dimensional binary space, where each entry is either 0 or 1
* `multi_discrete`: a collection of discrete spaces, each of which may have a different number of values
* `tuple`: a tuple of simpler spaces
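The snippet below is a minimal sketch that builds one instance of each space type, assuming the `gym` package is installed (ORSuite builds on OpenAI Gym). The sizes, bounds, and the labels `position` and `speed` are arbitrary and chosen only for illustration. It draws a random element from each space with `sample()` and checks membership with `contains()`.

```python
import numpy as np
from gym import spaces

# One example of each space type; sizes, bounds, and labels are illustrative only.
example_spaces = {
    "box": spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32),
    "discrete": spaces.Discrete(4),                   # the integers {0, 1, 2, 3}
    "multi_binary": spaces.MultiBinary(3),            # e.g. [1, 0, 1]
    "multi_discrete": spaces.MultiDiscrete([3, 5]),   # first entry in {0,1,2}, second in {0,...,4}
    "tuple": spaces.Tuple((spaces.Discrete(2), spaces.Discrete(3))),
    "dict": spaces.Dict({
        "position": spaces.Discrete(5),
        "speed": spaces.Box(low=0.0, high=10.0, shape=(1,), dtype=np.float32),
    }),
}

for name, space in example_spaces.items():
    sample = space.sample()          # draw a random element of the space
    assert space.contains(sample)    # every sample lies inside its own space
    print(name, sample)
```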



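An agent can find out which kinds of spaces it is working with by inspecting the environment's `action_space` and `observation_space` attributes, for example with `isinstance(env.action_space, gym.spaces.Discrete)`.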
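As a sketch of that inspection, the hypothetical helper `describe_space` below (not part of ORSuite or Gym) dispatches on the Gym space class with `isinstance` and recurses into `Tuple` and `Dict` spaces. It assumes the `gym` package; in practice you would call it as `describe_space(env.action_space)`.

```python
import numpy as np
from gym import spaces

def describe_space(space):
    """Return a short, human-readable summary of a Gym space (hypothetical helper)."""
    if isinstance(space, spaces.Discrete):
        return f"Discrete with {space.n} values"
    if isinstance(space, spaces.Box):
        return f"Box with shape {space.shape}"
    if isinstance(space, spaces.MultiBinary):
        return f"MultiBinary of size {space.n}"
    if isinstance(space, spaces.MultiDiscrete):
        return f"MultiDiscrete with sizes {space.nvec.tolist()}"
    if isinstance(space, spaces.Tuple):
        # Describe each component space in order.
        return "Tuple of (" + ", ".join(describe_space(s) for s in space.spaces) + ")"
    if isinstance(space, spaces.Dict):
        # Describe each labelled component space.
        parts = (f"{key}: {describe_space(s)}" for key, s in space.spaces.items())
        return "Dict of {" + ", ".join(parts) + "}"
    raise TypeError(f"Unsupported space type: {type(space).__name__}")

print(describe_space(spaces.Tuple((spaces.Discrete(2), spaces.Discrete(3)))))
```

An agent could use the same checks in its constructor to reject environments whose spaces it does not support.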

An agent should be implemented as a class, for example `class Agent(object):`, with the following methods:

`__init__(config)`: initializes any information the agent needs (such as the episode length or the structure of the environment) from the `config` dictionary. There is currently no config dictionary; should we add one, or is the current approach sufficient?

Do we need a separate `update_config` method, or is its role already covered by `__init__`?
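One possible design, sketched below under assumed names, is to have `__init__` delegate to `update_config`, so the two methods share their setup code. The class `ConfiguredAgent`, the `DEFAULTS` dictionary, and the keys `episode_length` and `num_actions` are hypothetical and only illustrate the pattern.

```python
class ConfiguredAgent(object):
    # Hypothetical default settings used when a key is missing from the config.
    DEFAULTS = {"episode_length": 5, "num_actions": 2}

    def __init__(self, config):
        # Start from the defaults, then apply the supplied settings.
        self.config = dict(self.DEFAULTS)
        self.update_config(config)

    def update_config(self, config):
        # Overwrite only the supplied keys; other keys keep their current values.
        self.config.update(config)
        self.episode_length = self.config["episode_length"]
        self.num_actions = self.config["num_actions"]

agent = ConfiguredAgent({"episode_length": 10})   # num_actions falls back to 2
agent.update_config({"num_actions": 4})           # episode_length stays 10
```

With this layout `update_config` costs almost nothing extra, and unlike calling `__init__` again it does not reset any other state the agent has accumulated.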




`update_obs(obs, action, reward, newObs, timestep, info)`: updates any information the agent needs using the values passed in:

* `obs`: the state of the system at the previous timestep
* `action`: the most recent action chosen by the agent
* `reward`: the reward the agent received for its most recent action
* `newObs`: the state of the system at the current timestep
* `timestep`: the current timestep
* `info`: a dictionary that may contain additional information; see the specific environment for details
    
Proposed new parameter names:

* `old_state`: replaces `obs`
* `action`: unchanged
* `reward`: unchanged
* `new_state`: replaces `newObs`
* `step`: replaces `timestep`
* `info`: unchanged
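To show where each `update_obs` argument comes from, here is a minimal sketch of one episode of interaction. `CountingEnv`, `RecordingAgent`, and `run_episode` are hypothetical stand-ins, and the toy environment assumes the older Gym convention in which `reset()` returns an observation and `step()` returns `(observation, reward, done, info)`. Where `update_policy` is called is not specified here, so the loop omits it.

```python
class CountingEnv(object):
    """Hypothetical toy environment: the state is a counter that the action is added to."""

    def __init__(self, max_steps=3):
        self.max_steps = max_steps

    def reset(self):
        self.state, self.t = 0, 0
        return self.state

    def step(self, action):
        self.state += action
        self.t += 1
        reward = float(action)              # reward equals the action taken
        done = self.t >= self.max_steps
        return self.state, reward, done, {}


class RecordingAgent(object):
    """Hypothetical agent that always picks action 1 and records each transition."""

    def __init__(self, config):
        self.config = config
        self.transitions = []

    def update_obs(self, obs, action, reward, newObs, timestep, info):
        self.transitions.append((obs, action, reward, newObs, timestep))

    def update_policy(self, h):
        pass                                # this agent's policy never changes

    def pick_action(self, state, step):
        return 1


def run_episode(env, agent, ep_len):
    obs = env.reset()
    for timestep in range(ep_len):
        action = agent.pick_action(obs, timestep)
        newObs, reward, done, info = env.step(action)
        agent.update_obs(obs, action, reward, newObs, timestep, info)
        obs = newObs                        # the current state becomes the previous one
        if done:
            break


agent = RecordingAgent(config={})
run_episode(CountingEnv(max_steps=3), agent, ep_len=3)
print(agent.transitions)  # [(0, 1, 1.0, 1, 0), (1, 1, 1.0, 2, 1), (2, 1, 1.0, 3, 2)]
```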

`update_policy(h)`: updates the agent's internal policy based on the information it has recorded.

Is h intended to be the timestep?



`pick_action(state, step)`: given the current state of the environment and timestep, `pick_action` chooses and returns an action from the action space.
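Putting the methods together, the sketch below is a hypothetical epsilon-greedy agent for an environment with a `discrete` action space. It ignores the state and keeps an average reward per action, so it only suits bandit-like problems. The config keys `num_actions`, `epsilon`, and `seed` are assumptions for this example, and `h` is accepted but unused because its meaning is still open.

```python
import random


class EpsilonGreedyAgent(object):
    """Hypothetical example agent; not an ORSuite built-in."""

    def __init__(self, config):
        # Hypothetical config keys chosen for this sketch.
        self.num_actions = config["num_actions"]
        self.epsilon = config.get("epsilon", 0.1)
        self.rng = random.Random(config.get("seed"))
        self.reward_sums = [0.0] * self.num_actions
        self.counts = [0] * self.num_actions
        self.estimates = [0.0] * self.num_actions

    def update_obs(self, obs, action, reward, newObs, timestep, info):
        # Record the reward earned by the action that was just taken.
        self.reward_sums[action] += reward
        self.counts[action] += 1

    def update_policy(self, h):
        # Recompute the average reward of each action; untried actions stay at 0.
        self.estimates = [
            s / c if c > 0 else 0.0
            for s, c in zip(self.reward_sums, self.counts)
        ]

    def pick_action(self, state, step):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_actions)
        return max(range(self.num_actions), key=lambda a: self.estimates[a])


agent = EpsilonGreedyAgent({"num_actions": 3, "epsilon": 0.0})
agent.update_obs(None, 2, 5.0, None, 0, {})
agent.update_obs(None, 0, 1.0, None, 1, {})
agent.update_policy(0)
print(agent.pick_action(state=None, step=2))  # 2
```

With `epsilon` set to 0 the agent never explores untried actions, so a positive value is needed in practice; it is 0 here only to make the output deterministic.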