
In It's simple, we kill the Pac-Man, we learned how to define a new ghost agent. That agent was admittedly not very useful, since the only action it ever takes is going North. Now, let's create a more complex agent: one that uses a learning algorithm to select the best action for the current state.

As in the previous tutorial, we define a class in the simulation's agents.py file that implements learn and act. This time, however, instead of blindly selecting an action, we want to use the SARSA learning algorithm, which is not yet implemented in learning.py. The first step is to implement it:

# multiagentrl/learning.py
class SARSA(BaseLearningAlgorithm):
    def __init__(self):
        self.previous_state = None
        self.previous_action = None

    def learn(self, state, action, reward):
        # Incorporate learning from received state, action and reward
        pass

    def act(self, state):
        # Select an action for the given state from the current policy
        # and return it
        pass
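
The learn and act bodies above are intentionally left empty. As a reference, here is a minimal tabular SARSA sketch that fills them in. Everything beyond the original stub is an assumption: it presumes hashable states, an illustrative default action set taken from the Pac-Man simulator, illustrative learning_rate and discount_factor parameters, and that learn is called with the newly reached state, the action chosen for it, and the reward for the previous transition.

# multiagentrl/learning.py (illustrative sketch, not the final implementation)
from collections import defaultdict

class SARSA(BaseLearningAlgorithm):
    def __init__(self, actions=('North', 'South', 'East', 'West', 'Stop'),
                 learning_rate=0.1, discount_factor=0.9):
        # Default action set assumed from the Pac-Man simulator; adjust as needed
        self.actions = actions
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.q_values = defaultdict(float)  # maps (state, action) to a value
        self.previous_state = None
        self.previous_action = None

    def learn(self, state, action, reward):
        # SARSA update: Q(s, a) += alpha * [r + gamma * Q(s', a') - Q(s, a)],
        # where (s, a) is the previous state-action pair and (s', a') the
        # current one
        if self.previous_state is not None:
            previous = (self.previous_state, self.previous_action)
            current = (state, action)
            td_error = (reward
                        + self.discount_factor * self.q_values[current]
                        - self.q_values[previous])
            self.q_values[previous] += self.learning_rate * td_error

        self.previous_state = state
        self.previous_action = action

    def act(self, state):
        # Greedy action with respect to the learned Q-values; exploration is
        # handled by the agent, not here
        return max(self.actions, key=lambda a: self.q_values[(state, a)])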

Now, we need to create a learning agent in the simulation's agents.py:

# examples/mymodule/agents.py
from multiagentrl import core
from multiagentrl import exploration
from multiagentrl import learning

class SARSAAgent(core.BaseControllerAgent):
    def __init__(self, agent_id, ally_ids, enemy_ids):
        super(SARSAAgent, self).__init__(agent_id)
        self.learning = learning.SARSA()
        self.exploration = exploration.EGreedy(exploration_rate=0.1)

    def start_game(self):
        # Nothing to set up when a game starts
        pass

    def finish_game(self):
        # Nothing to clean up when a game ends
        pass

    def learn(self, state, action, reward):
        self.learning.learn(state, action, reward)

    def act(self, state, legal_actions, explore):
        action = self.learning.act(state)

        if explore:
            action = self.exploration.explore(action, legal_actions)

        return action
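
The exploration step is what keeps the agent trying actions other than the current greedy choice. EGreedy is provided by multiagentrl.exploration; the sketch below only illustrates the usual epsilon-greedy idea behind a class with that interface (keep the suggested action most of the time, otherwise pick a random legal one) and is not the framework's actual implementation.

# Illustrative epsilon-greedy exploration (not the framework's EGreedy code)
import random

class EGreedy(object):
    def __init__(self, exploration_rate=0.1):
        self.exploration_rate = exploration_rate

    def explore(self, suggested_action, legal_actions):
        # With probability exploration_rate, discard the suggested action and
        # pick a random legal action instead
        if random.random() < self.exploration_rate:
            return random.choice(legal_actions)
        return suggested_action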

Bang! You just created an agent that learns with SARSA. Now you only have to update adapter.py, as in It's simple, we kill the Pac-Man, so that it sends the SARSAAgent class when starting the simulation.
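
The exact change depends on how your adapter.py is organized, so the snippet below is purely hypothetical: it assumes the adapter picks the ghost agent class from a name-to-class mapping, in which case registering the new class is all that is needed.

# examples/mymodule/adapter.py (hypothetical sketch; adapt to your own adapter)
from agents import SARSAAgent

# If the adapter selects agent classes by name, adding an entry for the new
# class makes it available when the simulation starts
GHOST_AGENT_CLASSES = {
    'sarsa': SARSAAgent,
}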
