# Outlook

In this notebook, we use BBRL to code a simple agent that writes into a simple workspace to generate a sequence of numbers inspired by the Fibonacci sequence.

To understand this code, you should first read [the BBRL documentation](https://github.com/osigaud/bbrl/docs/index.html).

## Installation and Imports

### Installation

The BBRL library is [here](https://github.com/osigaud/bbrl).

Below, we install BBRL, then import standard Python packages, PyTorch packages, and the BBRL classes we need.

In [1]:
# Installs the necessary Python and system libraries
try:
    from easypip import easyimport, easyinstall, is_notebook
except ModuleNotFoundError as e:
    get_ipython().run_line_magic("pip", "install easypip")
    from easypip import easyimport, easyinstall, is_notebook

easyinstall("bbrl>=0.2.2")

In [2]:
import os
import sys
from pathlib import Path
import math

import time
from tqdm.auto import tqdm

import copy
from abc import abstractmethod, ABC
import torch
import torch.nn as nn
import torch.nn.functional as F

In [3]:
# Imports all the necessary classes and functions from BBRL
from bbrl.agents.agent import Agent
# The workspace is the main class in BBRL, this is where all data is collected and stored
from bbrl.workspace import Workspace

# Agents(agent1, agent2, agent3, ...) executes the given agents one after the other
# TemporalAgent(agent) executes an agent over multiple time steps in the workspace,
# or until a given condition is reached
from bbrl.agents import Agents, TemporalAgent

## Definition of agents

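Our Fibonacci agent reads the current number, adds the time step index `t` to it, and writes the result at the next time step. Note that this update, `number[t+1] = number[t] + t`, differs from the standard Fibonacci recurrence, which adds the two previous numbers.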

In [4]:
class FibonacciAgent(Agent):
    """ An agent to compute the Fibonacci sequence of numbers."""
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def forward(self, t, **kwargs):
        number = self.get(("number", t))
        next_value = number + torch.Tensor([t])
        self.set(("number", t+1), next_value)
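
To see what this update produces, here is a minimal plain-Python sketch that does not use BBRL. The helper names `time_sum_sequence` and `fibonacci_sequence` are hypothetical and introduced only for illustration. The first one mirrors the update in `forward()` above, the second one applies the standard Fibonacci update for comparison.

```python
def time_sum_sequence(n_steps, first=1):
    """Apply number[t+1] = number[t] + t, mirroring the agent's forward()."""
    numbers = [first]
    for t in range(n_steps):
        numbers.append(numbers[t] + t)
    return numbers


def fibonacci_sequence(n_steps, first=1, second=1):
    """Standard Fibonacci update: each value is the sum of the two previous ones."""
    numbers = [first, second]
    for t in range(1, n_steps):
        numbers.append(numbers[t] + numbers[t - 1])
    # Keep n_steps + 1 values so both helpers return lists of the same length
    return numbers[: n_steps + 1]


print(time_sum_sequence(10))   # [1, 1, 2, 4, 7, 11, 16, 22, 29, 37, 46]
print(fibonacci_sequence(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

A BBRL agent applying the standard update would presumably need to read both `("number", t - 1)` and `("number", t)` and to have the first two time steps initialized. That variant is not tested here.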

We need a separate agent to write a 1 at the first time step.

In [9]:
class InitAgent(Agent):
    """ The agent to initialize the sequence of numbers."""
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def forward(self, t, **kwargs):
        self.set(("number", t), torch.Tensor([1]))

In [10]:
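# Creates the agents and wraps them so they can be run over time steps
agent = FibonacciAgent()
temp_agent = TemporalAgent(Agents(agent))

init_agent = TemporalAgent(Agents(InitAgent()))

# Creates a new, empty workspace
workspace = Workspace()

# Executes the first step: writes the initial value at t=0
init_agent(workspace, t=0, n_steps=1)

print("init:", workspace["number"])

# Runs the agent for 10 steps from t=0, writing "number" at t=1, ..., t=10
temp_agent(workspace, t=0, n_steps=10)

# Reads the value stored at time step 5
fib5 = workspace.get("number", 5)
print("5th Fibonacci number : ", fib5)

# Retrieves the whole sequence as a single tensor
sequence = workspace["number"]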

init: tensor([[1.]])
5th Fibonacci number :  tensor([11.])
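
The calls above can be pictured with a toy model in plain Python. This is only a mental model under simplifying assumptions, not BBRL's implementation: the workspace is a dictionary keyed by `(name, t)`, and `run_temporal` is a hypothetical stand-in for `TemporalAgent` that calls a step function for `t, t+1, ..., t+n_steps-1`.

```python
def init_step(toy_workspace, t):
    """Toy counterpart of InitAgent: write 1 at time step t."""
    toy_workspace[("number", t)] = 1.0


def add_time_step(toy_workspace, t):
    """Toy counterpart of FibonacciAgent: read at t, write at t + 1."""
    toy_workspace[("number", t + 1)] = toy_workspace[("number", t)] + t


def run_temporal(step, toy_workspace, t, n_steps):
    """Hypothetical stand-in for TemporalAgent: call `step` at each time step."""
    for current_t in range(t, t + n_steps):
        step(toy_workspace, current_t)


toy_workspace = {}
run_temporal(init_step, toy_workspace, t=0, n_steps=1)
run_temporal(add_time_step, toy_workspace, t=0, n_steps=10)

# Ten steps starting from t=0 fill time steps 1 to 10, so 11 values in total
sequence = [toy_workspace[("number", t)] for t in range(11)]
print(sequence[5])  # 11.0, the same value as the one printed above
```

The initialization must run first: `add_time_step` reads `("number", 0)` before writing `("number", 1)`, so an empty toy workspace would raise a `KeyError`.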


Let us now look at the content of the workspace.

In [7]:
for key in workspace.variables.keys():
    print(key, workspace[key])

number tensor([[5.1886e+11],
        [5.1886e+11],
        [5.1886e+11],
        [5.1886e+11],
        [5.1886e+11],
        [5.1886e+11],
        [5.1886e+11],
        [5.1886e+11],
        [5.1886e+11],
        [5.1886e+11],
        [5.1886e+11]])


### Termination

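The variables inspected in this section (`env/done`, `env/env_obs`, `env/reward` and `action`) are only present when agents interacting with an environment write them into the workspace. The workspace built above contains only `number`, so the lookups below fail, as the error for `env/done` shows.

`env/done` tells us whether an episode is finished.
With NoAutoReset, we wait until all episodes are done, and once an episode is finished its flag remains True.
Note that when an environment is done before the others, its content is copied until all environments have terminated.
This is convenient for collecting the final reward.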
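
The copying behaviour described above can be illustrated with a toy model in plain Python. This sketch calls neither BBRL nor gymnasium: `pad_episodes` is a hypothetical helper and the episode data is made up for illustration. It only shows why repeating the last entry of a finished episode makes the final reward available at the last time step.

```python
def pad_episodes(episodes):
    """Pad each episode to the longest length by repeating its last entry.

    Each episode is a list of (reward, done) pairs whose last pair has done=True.
    """
    horizon = max(len(episode) for episode in episodes)
    padded = []
    for episode in episodes:
        last_entry = episode[-1]
        padded.append(episode + [last_entry] * (horizon - len(episode)))
    return padded


# Two made-up episodes: the first one ends after 2 steps, the second after 4
episodes = [
    [(0.0, False), (1.0, True)],
    [(0.0, False), (0.0, False), (0.0, False), (2.0, True)],
]
padded = pad_episodes(episodes)

# After padding, the last time step holds the final reward of every episode
final_rewards = [episode[-1][0] for episode in padded]
done_flags = [[done for _, done in episode] for episode in padded]

print(final_rewards)  # [1.0, 2.0]
print(done_flags[0])  # [False, True, True, True]
```

With this layout, reading the last time step is enough to get every episode's final reward, whatever the individual episode lengths.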

In [8]:
workspace["env/done"].shape, workspace["env/done"][-10:]

AssertionError: [Workspace.get_full] unknown variable: env/done

The resulting tensor of observations, with the last two observations

In [None]:
workspace["env/env_obs"].shape, workspace["env/env_obs"][-2:]

The resulting tensor of rewards, with the last 8 rewards

In [None]:
workspace["env/reward"].shape, workspace["env/reward"][-8:]

The resulting tensor of actions, with the last two actions

In [None]:
workspace["action"].shape, workspace["action"][-2:]