# OpenAI Gym

This chapter covers the OpenAI Gym API.
In this section, we implement a randomly behaving agent.

## The anatomy of the agent
Definitions:
* **Agent**: A person or a thing that takes an active role. In practice, it is a piece of code
    that implements some policy. This policy decides which action to take at every time step,
    given our observations.
* **Environment**: Some model of the world, which is external to the agent and is responsible for providing us
    with observations and giving us rewards. It changes its state based on our actions.
* **Episodes**: The agent's interaction with the environment is divided into episodes, each of which is a sequence of steps. Episodes can be finite, like a game of chess, or infinite, like the Voyager 2 mission.
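To make the idea of episodes concrete, here is a minimal sketch of an agent-environment loop that runs several finite episodes. `CountdownEnv`, its `reset()`/`step()` methods, `random_policy`, and `run_episodes` are hypothetical names chosen for illustration; each episode lasts a fixed number of steps, and every step returns a reward of 1.0.

```python
import random


class CountdownEnv:
    """Hypothetical finite-episode environment: each episode lasts `length` steps."""

    def __init__(self, length: int = 5):
        self.length = length
        self.steps_left = length

    def reset(self) -> None:
        # Start a new episode by restoring the step counter.
        self.steps_left = self.length

    def is_done(self) -> bool:
        # The episode ends once no steps are left.
        return self.steps_left == 0

    def step(self, action: int) -> float:
        # One time step: consume a step and return a reward (always 1.0 here).
        self.steps_left -= 1
        return 1.0


def random_policy(observation, actions):
    # A policy maps the current observation to an action; this one ignores it.
    return random.choice(actions)


def run_episodes(env: CountdownEnv, n_episodes: int) -> list:
    """Run several episodes and return the total reward collected in each one."""
    totals = []
    for _ in range(n_episodes):
        env.reset()  # every episode starts from a fresh state
        total = 0.0
        while not env.is_done():
            # Assumption: the observation is simply the number of steps left.
            action = random_policy(env.steps_left, [0, 1])
            total += env.step(action)
        totals.append(total)
    return totals


print(run_episodes(CountdownEnv(length=5), n_episodes=3))  # [5.0, 5.0, 5.0]
```

Because `reset()` restores the environment before each episode, the per-episode totals do not depend on what happened in earlier episodes.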

Let's implement this in Python for a simplistic situation.
We will define an environment that gives the agent random rewards for a limited number of steps, regardless of the agent's actions.
This scenario is not very useful in practice, but it lets us focus on specific methods in both the **environment** and the **agent** classes.


### Environment Class

```python
import random


class Environment:
    def __init__(self):
        # Initialize the environment's internal state. Here the state is just a
        # counter that limits the number of steps the agent may take.
        self.steps_left = 100

    def get_observation(self):
        # Return the current observation to the agent. It is usually some
        # function of the internal state; here it is always a zero vector,
        # because the environment has basically no internal state.
        return [0.0, 0.0, 0.0]

    def get_actions(self):
        # Let the agent query the set of actions it can execute. Normally this
        # set does not change over time, but some actions can become impossible
        # in certain states (e.g. not every move is legal in every TicTacToe position).
        # Here there are only two actions, encoded as the integers 0 and 1.
        return [0, 1]

    def is_done(self):
        # Signal the end of the episode: True once no steps are left.
        return self.steps_left == 0

    def action(self, action):
        # Handle the agent's action and return the reward for it.
        if self.is_done():
            raise Exception("Game is over")
        # Consume one step and return a random reward in [0, 1).
        # In this example the environment ignores the agent's action.
        self.steps_left -= 1
        return random.random()
```
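The environment above draws rewards from Python's global random generator, so every run produces a different total. One possible variant, sketched here under the assumption that reproducible runs are useful, makes the episode length a parameter and gives each instance its own `random.Random` generator. `SeededEnvironment`, `max_steps`, and `seed` are hypothetical names, not part of the original code.

```python
import random


class SeededEnvironment:
    """Hypothetical variant of Environment with a configurable episode length
    and a per-instance random generator, so runs can be reproduced."""

    def __init__(self, max_steps: int = 100, seed=None):
        self.steps_left = max_steps
        self.rng = random.Random(seed)  # independent of the global generator

    def get_observation(self):
        return [0.0, 0.0, 0.0]

    def get_actions(self):
        return [0, 1]

    def is_done(self):
        return self.steps_left == 0

    def action(self, action):
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return self.rng.random()  # reward in [0, 1), action still ignored


# Two environments with the same seed yield the same reward sequence,
# even though the actions differ (the environment ignores them).
a = SeededEnvironment(max_steps=3, seed=42)
b = SeededEnvironment(max_steps=3, seed=42)
rewards_a = [a.action(0) for _ in range(3)]
rewards_b = [b.action(1) for _ in range(3)]
print(rewards_a == rewards_b)  # True
print(a.is_done())             # True

# Acting after the episode has ended triggers the guard in action().
try:
    a.action(0)
except Exception as exc:
    print(exc)  # Game is over
```

The last call exercises the same guard as the original class: once `steps_left` reaches zero, any further action raises an exception.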
        

### Agent Class

```python
# In this example the agent ignores the observations received from the
# environment and selects actions at random instead.
class Agent:
    def __init__(self):
        self.total_reward = 0.0  # reward accumulated over the episode

    def step(self, env):
        current_obs = env.get_observation()  # request an observation (unused here)
        actions = env.get_actions()          # request the available actions
        # Pick a random action and ask the environment to execute it
        # through its action() method.
        reward = env.action(random.choice(actions))
        self.total_reward += reward          # add the reward returned for this step
```
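The `Agent` above keeps only a running total of its rewards. A natural extension is to also store every transition it experiences, for later analysis or learning. The sketch below is a hypothetical `RecordingAgent` with the same `step(env)` interface; it is exercised against a hypothetical `ConstantRewardEnv` stub that uses the same method names as `Environment` but returns a fixed reward of 1.0, so the result is deterministic.

```python
import random


class ConstantRewardEnv:
    """Hypothetical stub with Environment's method names, but a fixed
    reward of 1.0 per step so results are deterministic."""

    def __init__(self, steps: int = 10):
        self.steps_left = steps

    def get_observation(self):
        return [0.0, 0.0, 0.0]

    def get_actions(self):
        return [0, 1]

    def is_done(self):
        return self.steps_left == 0

    def action(self, action):
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return 1.0


class RecordingAgent:
    """Hypothetical extension of Agent that also stores every
    (observation, action, reward) transition."""

    def __init__(self):
        self.total_reward = 0.0
        self.history = []

    def step(self, env):
        obs = env.get_observation()
        action = random.choice(env.get_actions())  # still a random policy
        reward = env.action(action)
        self.total_reward += reward
        self.history.append((obs, action, reward))


env = ConstantRewardEnv(steps=10)
agent = RecordingAgent()
while not env.is_done():
    agent.step(env)

print(agent.total_reward)  # 10.0
print(len(agent.history))  # 10
```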


### Main body of agent + environment

```python
if __name__ == "__main__":
    env = Environment()  # create an instance of the Environment class
    agent = Agent()      # create an instance of the Agent class

    while not env.is_done():  # run steps until the episode ends
        agent.step(env)       # the agent executes the next step

    print("Total reward got: %.4f" % agent.total_reward)
```

Output (the value varies from run to run, because the rewards are random):

```
Total reward got: 49.9068
```
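The printed total changes on every run, but its typical value can be predicted. `random.random()` draws uniformly from [0.0, 1.0), so each step contributes 0.5 on average and has variance 1/12; over 100 independent steps the expected total is 100 × 0.5 = 50, with standard deviation sqrt(100/12) ≈ 2.89. A result such as 49.9068 is therefore well within the expected range. The simulation below checks this; `episode_return` is a hypothetical helper that mirrors the toy environment's reward rule without the classes.

```python
import random
import statistics


def episode_return(rng: random.Random, steps: int = 100) -> float:
    # Mirrors the toy Environment: one uniform [0, 1) reward per step,
    # independent of the action taken.
    return sum(rng.random() for _ in range(steps))


rng = random.Random(0)  # fixed seed so the estimate is reproducible
returns = [episode_return(rng) for _ in range(2000)]

mean = statistics.mean(returns)
std = statistics.stdev(returns)
print(f"mean ~ {mean:.2f}, std ~ {std:.2f}")
# Theory: mean = 100 * 0.5 = 50, std = sqrt(100 / 12) ~ 2.89
```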
