# Introduction To Jupyter Notebooks

This is a notebook. It is an environment that mixes text and executable code. It is running in your browser and sends python code chunks to a backend server to execute. The backend server may be on your own computer, or on the cloud (in the case where you might be running on Google Colab).

The environment makes it easy to write code (python) in small chunks and test those chunks separately. This is in contrast to writing code in a file and running the entire file. This makes it easy to prototype because you can go back and change small chunks of code and re-run those chunks.

**1.** Notebooks are made up of *cells*. There are two types of cells. This piece of text you are reading now is a *text cell*. It is formatted using [Markdown](https://commonmark.org/), a lightweight alternative to HTML. Text cells are great ways to document your code. If you double-click this text you can edit it.

The cell below this one is a *code cell*. The code in a code cell can be run by pushing the triangle "play" button. When you push the play button, the code is sent to a server running a python interpreter. The python interpreter executes the code and sends the result to be displayed below the code cell.

Try running the code cell below here now.

In [None]:
print("hello world")

You should see ```hello world``` printed just under the cell.

**2.** In the next code cell, we will create and manipulate some variables. Once a variable is created, it is accessible by all other cells. That is because the variables are now part of the python interpreter's state.

Go ahead and run the cell just under this.

In [None]:
x = 10
y = x / 2

**3.** No output this time because variable assignments don't return values or print. How do we know it worked? Let's use another cell to check the values of ```x``` and ```y```. Run the next two cells.

In [None]:
x

In [None]:
y

In [None]:
x = x + 2

If you need to change some code, you can edit cells and run them again. Go back up to the code cell under #2 and change the first line of the cell to ```x = 20```. Then run the code cells under #2 and #3 again.

What happened? After you changed the cell under #2 and ran it, the values of ```x``` and ```y``` changed.

**4.** Since you can run cells in any order, you need to be careful. Run the cell below.

In [None]:
y = y * 10

In [None]:
y

What do you think is going to happen if we re-run the cells under #3, above?

You should see that ```y``` is now equal to ```100.0```. Even though cell #3 is in the notebook before cell #4, the python interpreter's internal state has changed and the cells respond accordingly.

You might have noticed that after you run a cell, a number appears in place of the run button. This tells you what order the cells have been run in. So now the execution number for cell #3 is larger than the execution number for cell #4. This tells you that cell #3 may have been affected by anything that ran in cell #4.
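
To make the effect concrete, here is the same sequence of cell runs collapsed into one block. It is only an illustration, and it assumes `x` was set to `20` as instructed above.

```python
x = 20
y = x / 2     # the cell under #2
print(y)      # the cell under #3, first run: y is half of x
y = y * 10    # the cell under #4
print(y)      # the cell under #3, run again: y now reflects the later cell
```

The second `print` shows a different value even though the code that prints it did not change; only the interpreter's state did.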

**6.** Let's get a bit fancier. Run the next cell.

In [None]:
def my_function(x):
  return x ** 2

**7.** You just created a function that takes any value and multiplies it by 2 (very fancy!). Like variables, this function is now part of the python interpreter's internal state. You can call the function from other cells, like the one below.

In [None]:
my_function(3)

You should have gotten a six.

But wait, I meant to make the function *square* the number. We'd better fix that. Go back up to cell #6 and change the code so that it reads:
```
def my_function(x):
  return x ** 2
```
(Note the double asterisk, `**`, which is Python's exponentiation operator.)

After you have done that, run the cell under #6 again and then run the cell under #7 again. The cell under #7 should now output 9 (which is 3^2).

A really good coding practice is to write functions in cells and then test them in other cells. This is called *unit testing*. You can work on a function until you know it is right and then you can go on to the next part of a program and use that earlier function to make something more sophisticated (and test that too).
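
One lightweight way to test a function in its own cell is with `assert` statements. The sketch below assumes the corrected (squaring) version of `my_function`; the specific test values are just examples.

```python
def my_function(x):
  return x ** 2

# Each assert passes silently when its condition holds
# and raises an AssertionError otherwise.
assert my_function(3) == 9
assert my_function(0) == 0
assert my_function(-4) == 16
print("all tests passed")
```

If an assertion fails, fix the function, re-run its cell, and then re-run the test cell.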



**8.** You can create new code cells. Depending on whether you are running JupyterLab or Google Colab, the interface will be different. Most likely there is a button on or near the menu bar with a "+".

Create a new code cell just below here and write a function called `is_odd(x)` that determines if x is odd.

In [None]:
def is_odd(x):
  return x % 2 == 1

In [None]:
is_odd(101)

If you ran your new cell, then the python interpreter should now know your function. If you ran the cell and you got an interpreter error, you can fix your python syntax and run it again until you don't get an error.

**9.** We should probably make sure your `is_odd` function is correct. Create a new code cell just below and try out some test cases like `is_odd(0)`, `is_odd(1)`, `is_odd(100)`, and `is_odd(101)`. Run each of these statements and verify that the outputs are correct.
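
Rather than running each call by hand, you can check several cases in one cell. This is a sketch that assumes the modulo-based `is_odd` shown earlier; the negative test value is an extra case added here for illustration.

```python
def is_odd(x):
  return x % 2 == 1

# Map each input to the answer we expect
test_cases = {0: False, 1: True, 100: False, 101: True, -3: True}

for value, expected in test_cases.items():
  result = is_odd(value)
  status = "ok" if result == expected else "WRONG"
  print(f"is_odd({value}) = {result} [{status}]")
```

In Python, `%` with a positive divisor never returns a negative result, so `-3 % 2` is `1` and the negative case is handled as well.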

**10.** Here are some other useful things you should know about notebooks.

If you want to get rid of all the variables and functions in the python interpreter's internal state, use the menu options at the top to "Restart Kernel" or "Restart Runtime". These are under either the "Kernel" or "Runtime" menus. This kills the python interpreter and restarts it in a fresh state.

You can run more than one cell at a time. There will be menu options such as "Run all" and "Run all above selected cell".

It is generally a good idea before you are done to make sure your notebook runs from start to finish in order. Make sure to always do one last "Run all". If you've been running cells in non-linear order, which is often the case when you are debugging something, then you might have introduced something in a later cell that changes how an earlier cell works.

If you have a cell that gets stuck in an infinite loop or is running too long, there is an option to "Interrupt kernel" or "interrupt execution". This is the equivalent of CTRL-C.

# Text Adventure Games

*Text Adventure Games* are games in which the player interacts with a rich world only through text. Text adventure games predate computers with graphics. However, in many ways they are more complex than conventional video games because they can involve complicated interactions (e.g., "build a rope bridge") that require a fair amount of imagination. Indeed, text adventure games are used as [research testbeds](https://arxiv.org/abs/1909.05398) for natural language processing agents.

The canonical text adventure game is [Zork](https://en.wikipedia.org/wiki/Zork), in which the player discovers an abandoned underworld realm full of treasure. You can find online playable versions.

A text game is made up of individual locations--also called "rooms", though they need not be indoor enclosed spaces as the term might imply. The agent can move between rooms and interact with objects by typing in short commands like "move north" and "take lamp".

# TextWorld-Express

[TextWorld-Express](https://github.com/cognitiveailab/TextWorldExpress) is a lightweight python package that emulates very simple text games.

For simplicity, TextWorld-Express worlds are laid out on grids, with the ability to move only between certain rooms. In some cases, there may be doors between rooms that need to be opened before moving.

TextWorld-Express supports several different types of games, with their own win conditions. We will focus on two games in particular:
- Coin Game: a simple game in which the agent must search for and pick up a single coin.
- Map Reader: a game in which the agent must find a key and return it to a box at the starting location.

Install the TextWorld-Express package

In [None]:
!pip install textworld-express
!pip install gymnasium

Load the package into the interpreter

In [None]:
from textworld_express import TextWorldExpressEnv
import random

Get an instance of a TextWorld-Express environment. An *environment* is an API that allows an agent to get observations and to perform actions.

In [None]:
env = TextWorldExpressEnv(envStepLimit=100)

This is an empty environment. What should be in the environment? Rooms, doors, objects, coins, other things? The following tells the TextWorld-Express package what the game world should contain, and what the "rules" of the game should be.

In this case, we will load the Coin Game rules. In this game, the agent must find a coin somewhere in the world and pick it up. We are also specifying that there should be five rooms (locations), and there should be doors between each room.

In [None]:
env.load(gameName="coin", gameParams="numLocations=5,includeDoors=1,numDistractorItems=3")

Next we reset the environment. This is always done before starting to interact with the environment. We will give it a seed, which will initialize a random number generator. Each time the seed is changed, the rooms, doors, and objects will be different.

`reset()` returns two values. The first is an "observation", which is what the agent can directly observe. In this case it is the text that describes the room that the agent is in. The second is a dictionary with additional information about the environment.

In [None]:
obs, infos = env.reset(seed=3, gameFold="train", generateGoldPath=True)
print('obs:', obs)
print('infos:', infos)

While in a text game you can type anything you want as an action, there are only a few commands that will actually do anything useful. These are called the "valid actions". You can see what they are like this:

In [None]:
validActions = infos['validActions']
print(validActions)
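
Because the valid actions are plain strings, ordinary string operations are enough to sort through them. The sketch below uses a hypothetical sample list (the real one comes from `infos['validActions']`) and a helper name, `actions_starting_with`, that is not part of TextWorld-Express.

```python
# Hypothetical sample; in the notebook, use infos['validActions'] instead
sample_valid_actions = [
  "look around", "inventory", "move north", "move east",
  "open door to east", "close door to north", "take coin",
]

def actions_starting_with(valid_actions, prefix):
  """Return the actions that begin with the given command prefix."""
  return [a for a in valid_actions if a.startswith(prefix)]

moves = actions_starting_with(sample_valid_actions, "move ")
doors_to_open = actions_starting_with(sample_valid_actions, "open door to ")
print(moves)
print(doors_to_open)
```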

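Let's choose one of these actions and execute it. The cell below uses `move south`; you could instead pick one at random with `random.choice(validActions)`.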

In [None]:
obs, reward, done, infos = env.step('move south')
print(obs)

The `step()` function performs the requested action. The function returns several values:
- `observation`: the text response. If the action is a move action, the text response will be the description of the new room. Otherwise, it will be text that describes your action and the outcome. Some actions, like closing a door that is already closed, or trying to go through a door that is not open will result in a message that that action could not be performed.

- `reward`: a numerical value that indicates whether that action is good (positive) or bad (negative). Most actions produce 0 reward because they are inherently neither good nor bad. This is helpful for agents that learn.

- `done`: a boolean indicating whether the game has terminated or not.

- `infos`: a dictionary of additional information about the environment
  - `observation`: the agent's last observation

  - `look`: what the agent would observe if it executed the 'look' action (not always the same as the observation)

  - `inventory`: what the agent has in inventory

  - `validActions`: the actions the agent can take, regardless of whether they will be successful

  - `reward`: the last reward the agent received

  - `done`: whether the game has terminated
  
  - There are other elements as well that you are unlikely to need.

In [None]:
infos['look']

When the environment is generated, the "gold path" is also generated. This is a sequence of actions that is guaranteed to complete the game successfully. The gold path is not guaranteed to be optimal. It is secret information. An AI agent should never access this information. It is helpful for debugging.

In [None]:
gold = env.getGoldActionSequence()
print(gold)

# Interactive Mode

Run this to play the game worlds yourself. Actions are:
- `look around`
- `move <north, south, east, west>`
- `open door to <north, south, east, west>`
- `take <item>`

Type `exit` to quit.

In [None]:
# Reset the environment
obs, infos = env.reset(seed=3, gameFold="train")
cmd = '' # the user's command
# Iterate until the user enters 'exit'
while cmd != 'exit':
  # print the state
  print(obs, '\n')
  # get the user's next command
  cmd = input(">> ")
  if cmd != 'exit':
    # execute the user's command
    obs, reward, done, infos = env.step(cmd)

# Test Environment

In this assignment, we will create an agent whose job it is to visit every location in the world at least once. We will build on top of CoinGame, so picking up the coin will also terminate the game.

**So if the agent picks up the coin before visiting all the locations in the world, it will not score well. Therefore, don't pick up the coin!**

## Some more imports

In [None]:
import re
import inspect
import gymnasium

## Helper Functions

Parse the name of the location out of the observation

In [None]:
#export
def get_location_from_obs(obs):
  match = re.search(r'You are in the ([a-zA-Z0-9 ]+).', obs)
  if match is not None and len(match.groups()) >= 1:
    return match.group(1)
  else:
    return None
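
A quick way to see what this helper does is to call it on a couple of strings. The observation texts below are made-up examples in the style of the game's output, not actual output from TextWorld-Express.

```python
import re

def get_location_from_obs(obs):
  match = re.search(r'You are in the ([a-zA-Z0-9 ]+).', obs)
  if match is not None and len(match.groups()) >= 1:
    return match.group(1)
  else:
    return None

# Hypothetical observation that names a room
room = get_location_from_obs("You are in the kitchen. In one part of the room you see a stove.")
# Hypothetical observation with no room name
no_room = get_location_from_obs("That door is already open.")
print(room)
print(no_room)
```

When the observation does not describe a room, the helper returns `None`, so code that calls it should check for that case.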

The agent always starts in (0, 0) and the environment is laid out in a grid-like fashion. Thus, we can compute the coordinates of the agent after actions such as 'north', 'south', 'east', or 'west'.

In [None]:
# coordinates are a tuple (x, y)
# direction is a string, 'north', 'south', 'east', 'west'
def change_coordinates(coordinates, direction):
    if direction == 'north':
      return (coordinates[0], coordinates[1]+1)
    elif direction == 'south':
      return (coordinates[0], coordinates[1]-1)
    elif direction == 'east':
      return (coordinates[0]+1, coordinates[1])
    elif direction == 'west':
      return (coordinates[0]-1, coordinates[1])
    return coordinates
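
As an illustration, the sketch below tracks the agent's position over a sequence of moves. It restates `change_coordinates` in a compact table-driven form so the block runs on its own, and it assumes every move in the list succeeds (in the game, a closed door would leave the agent where it was).

```python
def change_coordinates(coordinates, direction):
  # Same behavior as the helper above, restated so this block is self-contained
  offsets = {'north': (0, 1), 'south': (0, -1), 'east': (1, 0), 'west': (-1, 0)}
  dx, dy = offsets.get(direction, (0, 0))  # unknown directions leave the position unchanged
  return (coordinates[0] + dx, coordinates[1] + dy)

position = (0, 0)
trail = [position]
for direction in ['north', 'north', 'east', 'south']:
  position = change_coordinates(position, direction)
  trail.append(position)
print(trail)
```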

Directions of travel are always north, south, east, west. Sometimes you might want to know which way is the opposite direction of travel.

In [None]:
#export
def reverse_direction(dir):
  if dir == "north":
    return "south"
  elif dir == "south":
    return "north"
  elif dir == "west":
    return "east"
  elif dir == "east":
    return "west"
  else:
    return None
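
One use for this helper is retracing your steps: reverse the order of the moves and flip each direction. The sketch below restates `reverse_direction` with a dictionary so the block runs on its own; the example path is arbitrary.

```python
def reverse_direction(direction):
  # Same behavior as the helper above, restated so this block is self-contained
  opposites = {"north": "south", "south": "north", "west": "east", "east": "west"}
  return opposites.get(direction)  # None for anything that is not a compass direction

path = ["north", "east", "east"]
# Walk the path backwards, flipping each step
way_back = [reverse_direction(d) for d in reversed(path)]
print(way_back)
```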

## The Test Environment Class

This creates a new, special environment just for this homework. The environment keeps track of how many locations are in the world and which locations the agent has been to. The game terminates when the agent has visited all the locations.

In [None]:
class TestTextWorldExpressEnv(TextWorldExpressEnv):

  def __init__(self, serverPath=None, envStepLimit=100):
    # Call the super constructor
    super().__init__(serverPath, envStepLimit)
    # Store the locations the agent has visited in a set of unique strings
    self._visited_locations = set()
    # The number of locations in the environment
    self._num_locations = 0

  ### Override for the environment load function
  def load(self, gameName, gameParams):
    # Call the super method
    super().load(gameName, gameParams)
    # Get the number of locations requested
    match = re.search(r'numLocations\=([0-9]+)', gameParams)
    if match is not None and len(match.groups()) >= 1:
      self._num_locations = int(match.group(1))

  ### Override for the environment reset function
  def reset(self, seed=None, gameFold=None, gameName=None, gameParams=None, generateGoldPath=False):
    # Call the super method
    obs, infos = super().reset(seed, gameFold, gameName, gameParams, generateGoldPath)
    # get the beginning location
    current_location = get_location_from_obs(obs)
    # reset visited locations and initialize it with current location
    self._visited_locations = set([current_location])
    return obs, infos

  def step(self, action:str):
    obs, reward, done, infos = super().step(action)
    current_location = get_location_from_obs(obs)
    # Add current location to visited locations
    if current_location is not None:
      self._visited_locations.add(current_location)
    # If the max number of locations have been visited, then set done = True, the game is over
    if len(self._visited_locations) >= self._num_locations:
      done = True
    return obs, reward, done, infos
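
The `load` override above recovers the room count by pattern-matching the `gameParams` string. The sketch below isolates that step in a standalone function; the name `parse_num_locations` is made up for this example and is not part of the class.

```python
import re

def parse_num_locations(game_params):
  """Return the numLocations value from a gameParams string, or None if absent."""
  match = re.search(r'numLocations=([0-9]+)', game_params)
  return int(match.group(1)) if match else None

print(parse_num_locations("numLocations=5,includeDoors=1,numDistractorItems=3"))
print(parse_num_locations("includeDoors=1"))
```

Note that in the class itself, a `gameParams` string without `numLocations` leaves `_num_locations` at `0`, so the `>=` check in `step` would report the game as done on the first step. Always include `numLocations` when loading this environment.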


Register the new game type with the Gymnasium infrastructure

In [None]:
gymnasium.register(id='TextWorldExpress-TestTextWorldExpressEnv-v0',
                   entry_point='__main__:TestTextWorldExpressEnv')

Set up a test environment. It will be a CoinGame with doors and distractor items.

In [None]:
ENV = TestTextWorldExpressEnv(envStepLimit=100)
SEED = 3 # you can change this
GAME_TYPE = "coin" # do not change this
GAME_PARAMS = "numLocations=5,includeDoors=1,numDistractorItems=3" # you can change this
ENV.load(gameName=GAME_TYPE, gameParams=GAME_PARAMS)

Resetting the environment makes sure it is ready to go.

In [None]:
obs, infos = ENV.reset(seed = SEED, gameFold="train", generateGoldPath=True)
print(obs)
print('\nLocations the agent has visited:', ENV._visited_locations)

# Implementing the Agent (TODO!!)

In this section, you will implement the Agent, whose job is to explore the world.

Add your code between the markers that say:
```
### YOUR CODE BELOW HERE ###

### YOUR CODE ABOVE HERE ###
```

## Main Objective
Your agent’s goal is to visit **every location in the world**.

- The world layout and number of rooms may vary.
- The agent should work for any layout.
- **Do not pick up the coin before visiting all locations (picking it up ends the game).**

## What You Need to Do
You will complete the `MyAgent` class, focusing mainly on the `choose_action()` method:

- **`choose_action(self, infos)`**

  - possible directions: `[north, south, east, west]`

  - This method must choose a single action (e.g., `'move north'`, `'open door to north'`, etc.) and return it as a string.

  - You should use `infos['look']` and `infos['validActions']` to decide what to do.

  - This is where you’ll implement the logic to explore the world systematically.
  - There will be a `run_agent()` loop, which will call `choose_action()` every iteration. If you need to keep track of any information from iteration to iteration, you may find it useful to create new class member variables in the constructor.

## Helper Methods

- `__init__`: you may modify this method to set up additional tracking variables that you can use in your implementation.

- The `step` and `reset` methods are already implemented for you.

  - `step`: Executes the chosen action in the environment and updates the agent’s internal state (such as current location and how to return to previous rooms).

  - `reset`: Restarts the environment and reinitializes the agent’s state so it can begin exploring from the start again.


**Hints:**
- Keep track of what rooms you have been to.
- Keep track of what directions you've tried from every room.
- Keep track of which doors you've opened.
- Open doors before trying to move in a direction.
- If you've exhausted all directions from any given room, remember how you got there and go to another room that hasn't been exhausted.
- You can create whatever member variables you need, and create helper functions either inside the class or outside in separate cells.
- You cannot call `step()` from `choose_action()`.
- You should not be accessing `env._visited_locations` or `env._num_locations`. You can keep track of the former in your agent. Your solution should be written so that it does not need to know the latter.
- You should not make any changes to the `TestTextWorldExpressEnv` class.
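
The bookkeeping in these hints amounts to a depth-first search with backtracking. The sketch below shows that idea on a hypothetical toy map stored as a dictionary. It is not an agent: it never touches the environment, it ignores doors, and unlike a real agent it can look up a neighboring room before moving there. All names in it (`TOY_MAP`, `explore`, and so on) are made up for illustration.

```python
# Hypothetical toy map: room -> {direction: neighboring room}
TOY_MAP = {
  "kitchen": {"north": "pantry", "east": "hallway"},
  "pantry": {"south": "kitchen"},
  "hallway": {"west": "kitchen", "east": "bedroom"},
  "bedroom": {"west": "hallway"},
}
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}

def explore(toy_map, start):
  visited = {start}      # rooms seen so far
  tried = {start: set()} # directions already tried from each room
  way_back = {}          # room -> direction leading back to the room we came from
  moves = []             # the sequence of move commands taken
  room = start
  while True:
    untried = [d for d in toy_map[room] if d not in tried[room]]
    if untried:
      direction = untried[0]
      tried[room].add(direction)
      neighbor = toy_map[room][direction]
      if neighbor in visited:
        continue  # nothing new that way; try the next direction
      moves.append("move " + direction)
      visited.add(neighbor)
      tried[neighbor] = set()
      way_back[neighbor] = OPPOSITE[direction]
      room = neighbor
    elif room in way_back:
      # Every direction here is exhausted: backtrack one room
      direction = way_back[room]
      moves.append("move " + direction)
      room = toy_map[room][direction]
    else:
      break  # back at the start with nothing left to try
  return visited, moves

visited, moves = explore(TOY_MAP, "kitchen")
print(sorted(visited))
print(moves)
```

In the assignment, the same decisions have to be made one call to `choose_action()` at a time, using only what the agent has observed so far.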

In [None]:
#export
class MyAgent:

  ### Constructor
  def __init__(self, env):
    self.coordinates = (0, 0) # initialize the agent's coordinates to (x, y) = (0, 0)
    self.location = None # the name of the agent's location as a string e.g., 'backyard'
    self.env = env
    self.doors = {} # how many doors opened per location
    self.exploring = {} # how many directions moved per location
    self.parent_directions = {} # what direction to go to get back to parent
    self.previous_location = None

    ### (OPTIONAL) initialize additional member variables below here:
    ### YOUR CODE BELOW HERE ###
    self.visited_rooms = set()
    self.last_taken ='move east'
    ### YOUR CODE ABOVE HERE ###


  ### This function should "think" about which action to perform and return a string
  ### The string should be one of the infos['validActions'].
  ### You can do whatever you want to make this happen.
  ### Do not call self.step() from this function.
  def choose_action(self, infos):
    observation = infos['look']
    valid_actions = infos['validActions']
    action = random.choice(infos['validActions'])
    ### YOUR CODE BELOW HERE ###
    # Build a filtered copy of the valid actions: never close doors and
    # never touch the coin (picking it up ends the game early).
    true_valid_actions = [a for a in valid_actions
                          if "close door" not in a and "coin" not in a]

    # Last word of the previous action, e.g. 'east' for 'move east'
    last_direction_taken = self.last_taken.split()[-1]

    if self.last_taken in true_valid_actions:
      # Keep going with the previous action
      action = self.last_taken
    elif "open door to " + last_direction_taken in true_valid_actions:
      # Open the door in the direction we were heading
      action = "open door to " + last_direction_taken
    elif true_valid_actions:
      # Pick another action and remember it
      action = random.choice(true_valid_actions)
      self.last_taken = action
    # Otherwise fall back to the random action chosen above

    ### YOUR CODE ABOVE HERE ###
    return action

  ### Tell the agent to execute an action.
  def step(self, action):
    # Run the action in the environment
    obs, reward, done, infos = self.env.step(action)
    self.previous_location = self.location
    self.location = get_location_from_obs(infos['look'])
    if 'move' in action and self.location != self.previous_location and self.location not in self.parent_directions:
      direction_traveled = action[action.rfind(' '):].strip()
      self.parent_directions[self.location] = reverse_direction(direction_traveled)
    return obs, reward, done, infos

  ### Reset the agent and the world
  def reset(self, seed = 3):
    # Pass the seed through so that the requested world is actually generated
    obs, infos = self.env.reset(seed=seed, gameFold="train", generateGoldPath=True)
    self.location = get_location_from_obs(infos['look'])
    return obs, infos

Let's test your agent with a single action. You can run this over and over again to see what happens. If you haven't implemented your own `choose_action`, you will execute a random action each time.

In [None]:
agent = MyAgent(ENV)
observation, infos = agent.reset(SEED)
print(observation)

In [None]:
action = agent.choose_action(infos)
print("action:", action, '\n')
observation, reward, done, infos = agent.step(action)
print(observation)
print("\nVisited:", ENV._visited_locations)


This function will run a loop that asks the agent to choose an action and then executes it. It will terminate under the following conditions: (a) the agent has visited all rooms, (b) the agent has picked up the coin, (c) it has reached a maximum number of steps.

The function takes four arguments:
- `num_locations`: the number of rooms in the world.
- `seed`: controls the random generation of the world for repeatability.
- `timeout`: maximum number of steps
- `verbose`: use print statements when True

**This function returns a tuple: the first element is `True` only if the agent has visited all the rooms, and the second is the set of locations visited.**

In [None]:
def run_agent(num_locations = 5, seed = 3, timeout = 100, verbose = True):
  # Create a new environment
  env = TestTextWorldExpressEnv(envStepLimit=100)
  # load the environment with the correct number of locations
  params = "numLocations={loc},includeDoors=1,numDistractorItems=3".format(loc = num_locations)
  env.load(gameName="coin", gameParams=params)
  # Create a new agent
  agent = MyAgent(env = env)
  # Reset the agent and the world
  observation, infos = agent.reset(seed = seed)
  if verbose:
    print(observation)
  done = False # is the environment in a terminal state?
  num_steps = 0 # keep track of steps taken
  # The execution loop, until the environment is terminal or max steps reached
  while not done and num_steps < timeout:
    # Agent picks an action
    action = agent.choose_action(infos)
    if verbose:
      print("\naction:", action, '\n')
    # Agent executes the action
    observation, reward, done, infos = agent.step(action)
    if verbose:
      print(observation)
      print('\nNum locations visited:', len(env._visited_locations))
    # Update step count
    num_steps = num_steps + 1
  ## For debugging purposes:
  locations_visited = set(env._visited_locations)
  # Assert: We have visited all rooms, reached max number of steps, or picked up the coin
  # Return true only if we have visited the correct number of locations
  return len(env._visited_locations) == env._num_locations, locations_visited


Run the agent. **You can change the parameters if you want**

In [None]:
num_locations = 5 # you can change this parameter
success, visited_locations = run_agent(num_locations=num_locations, seed=3, timeout=100, verbose=False) # change verbose to True for more debugging information
print("Success:", success)
print("Locations Visited:", visited_locations)
print("Expected number of locations visited:", num_locations)

# Evaluation

The code below will run your agent a fixed number of times with an increasing number of rooms (3 through 10) and with random seeds. Your score will be determined using the scoring policy described in the section below.

For debugging purposes you may find it helpful to fix the seed temporarily for reproducibility. However, your solution should work on a variety of seeds.
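
It can help to know how the trials are distributed across world sizes. The `local_evaluation` function in this section picks the room count for trial `n` as `min(3 + (n//2), 10)`; the sketch below tabulates that schedule, assuming the 100 seeds used in this section.

```python
from collections import Counter

# Same schedule as local_evaluation: trial n uses min(3 + n // 2, 10) rooms
rooms_per_trial = [min(3 + (n // 2), 10) for n in range(100)]
counts = Counter(rooms_per_trial)

for rooms in sorted(counts):
  print(rooms, "rooms:", counts[rooms], "trials")
```

Most of the trials end up using the largest worlds, so an agent that only handles small maps will not reach a high success rate.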

In [None]:
test_seeds = list(range(0, 100, 1))

In [None]:
def local_evaluation(testing_seeds):
    successes = 0
    num_trials = len(testing_seeds)
    for n in range(len(testing_seeds)):
      seed = testing_seeds[n]
      num_locations = min(3 + (n//2), 10)
      print("trial:", n, "locations:", num_locations, "seed:", seed)

      success, visited_locations = run_agent(num_locations = num_locations,
                         seed = seed,
                         timeout = 100,
                         verbose = False,
                         )
      print("result:", success)
      successes = successes + (1 if success == True else 0)
    print("\nsuccesses:", successes)
    print("num trials:", num_trials)
    raw_score = successes / num_trials
    if raw_score >= 0.6:
        score = 100.0
    else:
        score = (raw_score / 0.6) * 100.0
    print("SCORE [0, 100]:", score)

In [None]:
local_evaluation(testing_seeds=test_seeds)

# Grading

Your agent will be evaluated by running it on a range of random seeds on a variety of environments and game variations, as in the local evaluation above.

- These test configurations will not include invalid game settings or unsolvable cases — the goal is simply to check that your solution generalizes and is not hardcoded.

- The tests will be visible in Gradescope and are the same as the local evaluation given. Future assignments will have hidden tests in Gradescope (not visible to you). Your final grade will be based on performance across both visible and hidden tests.

## Scoring Policy

- If your agent achieves a success rate of 60% or higher, you will receive full credit (100%).
- If your agent’s success rate is below 60%, then you will receive partial credit on a linear scale, calculated as:

$$
\text{Grade} = \frac{\text{Success Rate}}{0.60} \times 100
$$
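
As a worked example of the policy, the sketch below applies the formula to a few success rates. The function name `grade` and the sample rates are chosen here for illustration only.

```python
def grade(success_rate):
  """Map a success rate in [0, 1] to a grade in [0, 100] per the policy above."""
  if success_rate >= 0.60:
    return 100.0
  return success_rate / 0.60 * 100.0

for rate in (0.30, 0.45, 0.60, 0.85):
  print(f"success rate {rate:.2f} -> grade {grade(rate):.1f}")
```

For instance, a 30% success rate is half of the 60% threshold and therefore earns half credit.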




# Submission Instructions

Upload this notebook to Gradescope as a file named `submission.ipynb`. The autograder will **only** run successfully if your file is named this way. You must ensure that you have removed all print statements from **your** code, or the autograder may fail to run. Excessive print statements will also result in muddled test case outputs, which makes it more difficult to interpret your score.

We've added appropriate comments to the top of certain cells for the autograder to export (`# export`). You do NOT have to do anything (e.g. remove print statements) to cells we have provided - anything related to those have been handled for you. You are responsible for ensuring your own code has no syntax errors or unnecessary print statements. You **CANNOT** modify the export comments at the top of the cells, or the autograder will fail to run on your submission. This includes adding a line above the `# export` line.

Do not **add** any cells that your code depends on when submitting. You're welcome to add extra cells for testing, but they will not be graded. Only the provided cells will be graded. As mentioned at the top of the notebook, **any helper functions that you add should be nested within the function that uses them.**

If you encounter any issues with the autograder, please feel free to make a post on Ed Discussion. We highly recommend making a public post to clarify any questions, as it's likely that other students have the same questions as you! If you have a question that needs to be private, please make a private post.