# Chapter 2: OpenAI Gym

## The anatomy of the agent

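Reinforcement learning involves two entities:
* Agent: the entity that actually takes actions.
* Environment: everything external to the agent; it gives out rewards and provides the observations the agent acts on.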

In [6]:
# start with the environment
import random
from typing import List

class Environment:
    
    def __init__(self):
        self.steps_left = 10
    
    def get_observation(self) -> List[float]:
        return [0.0, 0.0, 0.0]

    def get_actions(self) -> List[int]:
        return [0, 1]

    def is_done(self) -> bool:
        return self.steps_left == 0

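    def action(self, action: int) -> float:
        # Acting after the episode has ended is an error
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        # The reward is random and does not depend on the chosen action
        return random.random()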
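
To see the step limit in action, the environment can be driven directly, without an agent. The sketch below repeats a trimmed copy of the `Environment` class so that it runs on its own; the helper name `run_until_done` is hypothetical and exists only for this illustration.

```python
import random
from typing import List


class Environment:
    """Trimmed copy of the toy environment, so this sketch is self-contained."""

    def __init__(self):
        self.steps_left = 10

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()


def run_until_done(env: Environment) -> List[float]:
    """Hypothetical helper: always take action 0 and collect the rewards."""
    rewards = []
    while not env.is_done():
        rewards.append(env.action(0))
    return rewards


env = Environment()
rewards = run_until_done(env)
print(len(rewards))  # one reward per step, so 10

# Acting after the episode has ended raises the exception from action().
try:
    env.action(0)
except Exception as exc:
    print("Error:", exc)
```

The episode ends after exactly 10 calls to `action()`, whichever action is chosen, so the caller has to check `is_done()` before each step.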

In [7]:
# Look at the agent's part

class Agent:
    
    def __init__(self):
        self.total_reward = 0.0

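    def step(self, env: Environment):
        # Observe, list the available actions, then act at random;
        # this toy agent never actually uses the observation
        current_obs = env.get_observation()
        actions = env.get_actions()
        reward = env.action(random.choice(actions))
        self.total_reward += reward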

In [9]:
# Create the environment and the agent, then run one test episode
env = Environment()
agent = Agent()
while not env.is_done():
    agent.step(env)

print("Total reward got: %.4f" % agent.total_reward)

Total reward got: 5.6188
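
Each step returns a reward drawn uniformly from [0, 1) and an episode lasts 10 steps, so the total for a single run varies but should average about 5.0 over many runs. The sketch below checks this by repeating the episode. It restates both classes compactly so that it runs standalone; `run_episode`, the seed, and the episode count are illustrative assumptions, not part of the original notebook.

```python
import random
from typing import List


class Environment:
    """Compact copy of the toy environment."""

    def __init__(self):
        self.steps_left = 10

    def get_observation(self) -> List[float]:
        return [0.0, 0.0, 0.0]

    def get_actions(self) -> List[int]:
        return [0, 1]

    def is_done(self) -> bool:
        return self.steps_left == 0

    def action(self, action: int) -> float:
        if self.is_done():
            raise Exception("Game is over")
        self.steps_left -= 1
        return random.random()


class Agent:
    """Compact copy of the random agent."""

    def __init__(self):
        self.total_reward = 0.0

    def step(self, env: Environment) -> None:
        actions = env.get_actions()
        self.total_reward += env.action(random.choice(actions))


def run_episode() -> float:
    """Hypothetical helper: play one full episode and return its total reward."""
    env = Environment()
    agent = Agent()
    while not env.is_done():
        agent.step(env)
    return agent.total_reward


random.seed(0)  # assumed seed, only to make the run repeatable
totals = [run_episode() for _ in range(1000)]
mean_total = sum(totals) / len(totals)
print("Mean total reward over %d episodes: %.4f" % (len(totals), mean_total))
```

Since the reward ignores the action, no agent can beat this average in this environment; the example shows only how the agent and environment interact.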
