# Building a simple agent with TextWorld
This tutorial outlines how to build an agent that learns how to play __choice-based__ text-based games generated with TextWorld.

### Learning challenges
Training an agent to play text-based games is not trivial. Among other challenges, we have to deal with

1. a combinatorial action space (that grows w.r.t. vocabulary)
2. a really sparse reward signal.
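
To get a feel for the first challenge, the sketch below counts how many distinct word sequences an agent could emit if it had to compose commands word by word. The vocabulary sizes and the maximum command length are illustrative assumptions, not values taken from TextWorld, and `count_commands` is a hypothetical helper.

```python
def count_commands(vocab_size, max_len):
    """Number of word sequences of length 1..max_len over a vocabulary."""
    return sum(vocab_size ** n for n in range(1, max_len + 1))


# Assumed vocabulary sizes; real games may use very different ones.
for vocab_size in (10, 100, 1000):
    print(vocab_size, count_commands(vocab_size, max_len=4))
```

Most of these sequences are not valid commands, which is why restricting the agent to a list of valid choices makes learning so much easier.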

To ease the learning process, TextWorld offers control over what information is available to the agent during training.

#### Admissible commands
We can provide the agent with the list of __valid__ actions that can be performed at every game state. This is done by calling `env.activate_state_tracking()` before `env.reset()`.

_*Only available for games generated with TextWorld._ 

#### Intermediate reward
We can compute an intermediate reward for the agent. This is done by calling `env.compute_intermediate_reward()` before `env.reset()`. The intermediate reward can either be:
- __-1__: last action needs to be undone before resuming the quest
-  __0__: last action didn't affect the quest
-  __1__: last action brought us closer to the goal

_*Only available for games generated with TextWorld._ 
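
One possible way to use this signal is to mix it into the sparse game reward. The snippet below is only a sketch of that idea, not something TextWorld does for you: `shaped_reward` and its `weight` parameter are hypothetical names introduced for illustration.

```python
def shaped_reward(reward, intermediate_reward, weight=0.1):
    """Combine the sparse game reward with the denser intermediate reward.

    `weight` is an assumed hyperparameter that keeps the shaping term small
    so the actual win/lose signal still dominates.
    """
    return reward + weight * intermediate_reward


# A step that brought the agent closer to the goal without ending the game.
print(shaped_reward(reward=0.0, intermediate_reward=1))
# A winning step.
print(shaped_reward(reward=1.0, intermediate_reward=1))
```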

## Test games
We handcrafted 6 incremental games using the following world:
```
                     Bathroom
                        +
                        |
                        +
    Bedroom +-(d1)-+ Kitchen +--(d2)--+ Backyard
      (P)               +                  +
                        |                  |
                        +                  +
                   Living Room           Garden
```
* `games/baby.ulx`: __Escape__ the bedroom (5 actions).
* `games/short.ulx`: __Escape__ + __open d2__ (6 actions). 
* `games/medium.ulx`: __Escape__ + __open d2__ + __take carrot__ found in the garden (9 actions).
* `games/long.ulx`: __Escape__ + __open d2__ + __take carrot__ + __put carrot__ on the stove in the kitchen (12 actions).
* `games/last.ulx`: Same as the __long__ version but only the last step is described in the objective.
* `games/human.ulx`: Same as the __long__ version but the objective is not provided. One must search for a clue to figure out what to do. Hint: one should look for a note!

_* NB: Agent can lose the game if it eats the carrot instead of putting it on the stove._ 

## Building the random baseline
Let's start with building an agent that simply selects a valid command at random.

In [None]:
import numpy as np
import textworld


class RandomAgent(textworld.Agent):
    """ Agent that randomly selects commands from the admissible ones. """
    def __init__(self, seed=1234):
        self.seed = seed
        self.rng = np.random.RandomState(self.seed)

    def reset(self, env):
        # Activate state tracking in order to get the admissible commands.
        env.activate_state_tracking()
        env.compute_intermediate_reward()  # Needed to detect if a game is lost.

    def act(self, game_state, reward, done):
        return self.rng.choice(game_state.admissible_commands)


## Test function
Let's write a simple function to test our agent on a test game (e.g. `./games/long.ulx`).

In [None]:
def test_agent(agent, game, max_step=500, nb_episodes=10):
    env = textworld.start(game)  # Start the game.
    print(game.split("/")[-1], end="")
    
    # Collect some statistics: nb_steps, final reward.
    avg_moves, avg_scores = [], []
    for no_episode in range(nb_episodes):
        agent.reset(env)          # Tell the agent a new episode is starting.
        game_state = env.reset()  # Start new episode.

        reward = 0
        done = False
        for no_step in range(max_step):
            command = agent.act(game_state, reward, done)
            game_state, reward, done = env.step(command)

            if done:
                break
                
        # print("Done after {} steps. Score {}/1.".format(game_state.nb_moves, game_state.score))
        print(".", end="")
        avg_moves.append(game_state.nb_moves)
        avg_scores.append(game_state.score)

    env.close()
    print("  \tavg. steps: {:5.1f}; avg. score: {:4.1f} / 1.".format(np.mean(avg_moves), np.mean(avg_scores)))

#### Testing the random agent

In [None]:
test_agent(RandomAgent(), game="./games/baby.ulx")
test_agent(RandomAgent(), game="./games/short.ulx")
test_agent(RandomAgent(), game="./games/medium.ulx")
test_agent(RandomAgent(), game="./games/long.ulx")

#### Running the oracle agent on the test games

In [None]:
test_agent(textworld.agents.WalkthroughAgent(), game="./games/baby.ulx", nb_episodes=1)
test_agent(textworld.agents.WalkthroughAgent(), game="./games/short.ulx", nb_episodes=1)
test_agent(textworld.agents.WalkthroughAgent(), game="./games/medium.ulx", nb_episodes=1)
test_agent(textworld.agents.WalkthroughAgent(), game="./games/long.ulx", nb_episodes=1)
test_agent(textworld.agents.WalkthroughAgent(), game="./games/last.ulx", nb_episodes=1)
test_agent(textworld.agents.WalkthroughAgent(), game="./games/human.ulx", nb_episodes=1)

## Neural agent
TextWorld allows anyone to generate text-based games with varying complexity. This allows us to train a neural reinforcement learning agent that doesn't just memorize the steps to solving a game, but generalizes to different games.

### Curriculum learning
It would make more sense to __avoid training on the test games__ and rather train the agent on a different set of games, then report the agent's performance on the test games.

One can generate several sets of games with different levels of difficulty. Then, during training, we start by playing the easier games and gradually move toward harder ones.
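
A curriculum like this can be sketched as a generator that walks through sets of games ordered from easy to hard. This is a minimal illustration under assumptions: `curriculum` and `epochs_per_level` are hypothetical names, and the folder names in the usage example are placeholders for wherever your generated games live.

```python
import glob


def curriculum(patterns, epochs_per_level=2):
    """Yield (level, epoch, games), going from the easiest level to the hardest.

    `patterns` is a list of glob patterns ordered from easy to hard.
    """
    for level, pattern in enumerate(patterns):
        games = sorted(glob.glob(pattern))
        for epoch in range(epochs_per_level):
            yield level, epoch, games


# Placeholder folders, assumed to contain games of increasing difficulty.
patterns = ["./easy_games/*.ulx", "./medium_games/*.ulx", "./hard_games/*.ulx"]
for level, epoch, games in curriculum(patterns):
    print("level {} epoch {}: {} games".format(level, epoch, len(games)))
```

A fixed number of epochs per level is the simplest schedule; moving to the next level only once the average score passes a threshold is a common alternative.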

#### Generate games for training
To generate random games for training the agent, we can use the script `tw-make` (shipped with TextWorld). The main arguments for the script are:
* `--world-size`: number of rooms in the game.
* `--nb-objects`: number of objects in the game.
* `--quest-length`: number of steps needed to solve the game.
* `--seed`: seed controlling the game generation process.
* `--theme`: specify which text grammar to use (default `house`).
* `--output`: folder where to save generated games.

##### Example - Generating 100 training games in parallel using bash:
```bash
seq 1 100 | xargs -n1 -P4 tw-make custom --world-size 2 --nb-objects 10 --quest-length 3 --output ./obj_10_qlen_3_room_2/ --seed
```
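
If you prefer to drive the generation from Python, the sketch below builds the same kind of command line for several difficulty levels. It only uses the flags listed above; `tw_make_command` is a hypothetical helper, and every level other than the first is an assumed setting chosen for illustration. The commands are printed rather than executed.

```python
def tw_make_command(world_size, nb_objects, quest_length, seed, output):
    """Build one `tw-make custom` invocation as a list of arguments."""
    return [
        "tw-make", "custom",
        "--world-size", str(world_size),
        "--nb-objects", str(nb_objects),
        "--quest-length", str(quest_length),
        "--output", output,
        "--seed", str(seed),
    ]


# Assumed difficulty levels: (world_size, nb_objects, quest_length).
levels = [(2, 10, 3), (3, 10, 5), (5, 20, 8)]
for world_size, nb_objects, quest_length in levels:
    output = "./obj_{}_qlen_{}_room_{}/".format(nb_objects, quest_length, world_size)
    for seed in range(1, 3):  # Two seeds per level to keep the printout short.
        print(" ".join(tw_make_command(world_size, nb_objects, quest_length, seed, output)))
```

To actually generate the games, each argument list could be passed to `subprocess.run(cmd, check=True)`.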

## Build your own agent

The TextWorld environment provides a number of properties for your agent to work with.

### `GameState` properties
* `description`: Text description of the current room.
* `inventory`: Text description of the inventory.
* `feedback`: Text feedback of the parser in response to the last text command.
* `objective`: Text instructions to follow in order to win the game.
* `admissible_commands`: A list of candidate text commands.
* `reward`: 1.0 if the agent wins the game at current step; -1.0 if it loses the game; 0.0 otherwise.
* `intermediate_reward`: {-1.0, 0.0, 1.0}, the intermediate reward as described [above](#Intermediate-reward).
* `done`: True, if the agent either won or lost the game; False, otherwise.
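
As an illustration of how these properties might be combined, the sketch below concatenates the text fields into a single observation string that a model could consume. `build_observation` is a hypothetical helper, and `fake_state` is a stand-in object built by hand; a real `game_state` comes from `env.reset()` or `env.step()`.

```python
from types import SimpleNamespace


def build_observation(game_state):
    """Concatenate the non-empty text fields an agent may condition on."""
    parts = [game_state.objective, game_state.description,
             game_state.inventory, game_state.feedback]
    return "\n".join(part.strip() for part in parts if part)


# Hand-made stand-in for illustration only.
fake_state = SimpleNamespace(
    objective="Open the door.",
    description="You are in the bedroom.",
    inventory="You are carrying nothing.",
    feedback="",
    admissible_commands=["open door", "look"],
)
print(build_observation(fake_state))
```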


### Agent template

In [None]:
import textworld


class NeuralAgent(textworld.Agent):
    def __init__(self):
        # Initialize your agent.
        pass

    def reset(self, env):
        # For the purpose of this tutorial we need the two lines below.
        env.activate_state_tracking()  # Needed to get the admissible commands.
        env.compute_intermediate_reward()  # Needed to detect if a game is lost and to get intermediate reward.

    def act(self, game_state, reward, done):
        # Given the current game_state return the next text command.
        # Perform inference. *Do not update model's parameters.*
        return game_state.admissible_commands[0]
    
    def finish(self, game_state, reward, done):
        # The game has finished. If done is True, agent won/lost the game; otherwise step limit was reached.
        # This is where you should *update your model's parameters*.
        pass
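
To make the template more concrete, here is a sketch of a very small learning agent that follows the same `reset`/`act`/`finish` structure. It is an assumption-laden illustration rather than a recommended design: `TabularAgent`, `epsilon` and `lr` are hypothetical names, the agent uses a lookup table instead of a neural network, and it deliberately does not subclass `textworld.Agent` or call the `env` methods so that it runs without TextWorld installed.

```python
import random
from collections import defaultdict


class TabularAgent:
    """Epsilon-greedy agent with one value estimate per (description, command) pair."""
    def __init__(self, epsilon=0.1, lr=0.5, seed=1234):
        self.epsilon = epsilon            # Probability of picking a random command.
        self.lr = lr                      # Step size of the value update.
        self.rng = random.Random(seed)
        self.values = defaultdict(float)  # Unseen pairs start at 0.
        self.history = []

    def reset(self, env):
        # A real agent would also call the env methods shown in the template.
        self.history = []  # Forget the previous episode's trajectory.

    def act(self, game_state, reward, done):
        obs = game_state.description
        commands = game_state.admissible_commands
        if self.rng.random() < self.epsilon:
            command = self.rng.choice(commands)
        else:
            command = max(commands, key=lambda c: self.values[(obs, c)])
        self.history.append((obs, command))
        return command

    def finish(self, game_state, reward, done):
        # Crude Monte Carlo-style update: move every visited pair
        # toward the reward received at the end of the episode.
        for key in self.history:
            self.values[key] += self.lr * (reward - self.values[key])
```

Keeping all parameter updates in `finish` matches the split suggested by the template: `act` only performs inference, and learning happens once the episode is over.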

### Train function

In [None]:
import os, glob
import numpy as np
import textworld


def train_agent(agent, filenames_pattern, max_step=400, max_epoch=10):
    rng = np.random.RandomState(1234)
    try:
        msg = "# At any point you can hit Ctrl + C (stop the kernel) to break out of training early."
        print(msg)
        print("Training...")
        train_games = glob.glob(filenames_pattern)
    
        for epoch in range(max_epoch):
            rng.shuffle(train_games)
            for game in train_games:
                game_name = os.path.basename(game)
                env = textworld.start(game)
                agent.reset(env)  # tells the agent a new run is starting.
                game_state = env.reset()  # Start new run.

                total_reward = 0
                reward = 0
                done = False
                for t in range(max_step):
                    command = agent.act(game_state, reward, done)
                    game_state, reward, done = env.step(command)
                    total_reward += reward

                    if done:
                        break

                # Tell the agent the run is done.
                agent.finish(game_state, reward, done)

                msg = "#{:2d}. {}:\t {:3d} steps; score: {:4.1f}"
                msg = msg.format(epoch, game_name, game_state.nb_moves, total_reward)
                print(msg)
                env.close()

    # At any point you can hit Ctrl + C to break out of training early.
    except KeyboardInterrupt:
        print('--------------------------------------------\n')
        print('Exiting from training early\n')


In [None]:
agent = NeuralAgent()  # Instantiate the agent.

#### Train the agent on generated games for as long as you want (or use early stopping)

In [None]:
! seq 1 100 | xargs -n1 -P4 tw-make custom --world-size 2 --nb-objects 10 --quest-length 3 --output ./obj_10_qlen_3_room_2/ --seed

In [None]:
train_agent(agent, "obj_10_qlen_3_room_2/*ulx")
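
Early stopping is not built into `train_agent` as written. One way it could be added is sketched below: a small helper that tracks a moving average of episode scores and signals a stop once that average has stopped improving. `EarlyStopping`, `window` and `patience` are hypothetical names, and the default values are arbitrary assumptions.

```python
from collections import deque


class EarlyStopping:
    """Signal a stop when the moving average score has not improved for `patience` updates."""
    def __init__(self, window=20, patience=50):
        self.scores = deque(maxlen=window)  # Most recent episode scores.
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0  # Number of updates since the last improvement.

    def update(self, score):
        """Record one episode score; return True when training should stop."""
        self.scores.append(score)
        average = sum(self.scores) / len(self.scores)
        if average > self.best:
            self.best = average
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```

Inside the training loop, one could call `stopper.update(total_reward)` after each game and leave the loop when it returns `True`.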

In [None]:
test_agent(agent, game="./games/techfest2018_baby.ulx")
#test_agent(agent, game="./games/techfest2018_short.ulx")
#test_agent(agent, game="./games/techfest2018_medium.ulx")
#test_agent(agent, game="./games/techfest2018_long.ulx")


## Papers about RL applied to text-based games
* [Language Understanding for Text-based games using Deep Reinforcement Learning][narasimhan_et_al_2015]
* [Learning How Not to Act in Text-based Games][haroush_et_al_2017]
* [Deep Reinforcement Learning with a Natural Language Action Space][he_et_al_2015]
* [What can you do with a rock? Affordance extraction via word embeddings][fulda_et_al_2017]
* [Text-based adventures of the Golovin AI Agent][kostka_et_al_2017]
* [Using reinforcement learning to learn how to play text-based games][zelinka_2018]

[narasimhan_et_al_2015]: https://arxiv.org/abs/1506.08941
[haroush_et_al_2017]: https://openreview.net/pdf?id=B1-tVX1Pz
[he_et_al_2015]: https://arxiv.org/abs/1511.04636
[fulda_et_al_2017]: https://arxiv.org/abs/1703.03429
[kostka_et_al_2017]: https://arxiv.org/abs/1705.05637
[zelinka_2018]: https://arxiv.org/abs/1801.01999