# Tutorial 

This notebook provides a quick overview of how ns_gym is designed and how to use it.

```python
# Install the ns_gym package (editable install from the parent directory)
# %pip install -e .. --quiet
```

# Creating a Non-Stationary Gym-Like Environment

At a high level, NS-Gym lets users design non-stationary Markov decision processes (NS-MDPs) by modifying exposed parameters of a base Gymnasium environment.

NS-Gym provides wrappers for the Classic Control suite of Gymnasium environments and for three gridworld environments. At each time step, these wrappers can modify the value of an environment parameter, thereby altering the transition function of the MDP. Additionally, the wrapper controls the level of notification available to the decision-making agent when the transition function changes.

The available wrappers are as follows:

- The `NSClassicControlWrapper` is compatible with all environments in the Classic Control suite.
- The `NSCliffWalkingWrapper` augments the CliffWalking environment. 
- The `NSFrozenLakeWrapper` augments the FrozenLake environment. 
- The `NSBridgeWrapper` augments the Bridge environment.

The table below lists each supported environment and the parameters that NS-Gym can tune.

| Environment | Tunable parameters (the hidden θ) |
| --- | --- |
| Acrobot | `dt`, `LINK_LENGTH_1`, `LINK_LENGTH_2`, `LINK_MASS_1`, `LINK_MASS_2`, `LINK_COM_POS_1`, `LINK_COM_POS_2`, `LINK_MOI` |
| CartPole | `gravity`, `masscart`, `masspole`, `force_mag`, `tau`, `length` |
| Mountain Car Continuous | `power` |
| Mountain Car | `force`, `gravity` |
| Pendulum | `dt`, `g`, `l`, `m` |
| Frozen Lake | `P` (a probability table that defines a categorical distribution for each state-action pair) |
| Cliff Walking | `P` |
| Bridge | `P`, `left_side_prob`, `right_side_prob` |
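In Gymnasium's toy-text environments such as FrozenLake and CliffWalking, the transition table `P` is laid out as `P[state][action]`, a list of `(probability, next_state, reward, terminated)` tuples that together form a categorical distribution over outcomes. We assume this is the table NS-Gym's `P` parameter refers to. The sketch below hand-writes one entry and checks the two properties any update to `P` has to respect: the probabilities must still form a valid distribution, and next states must be sampleable from it. The 0.8/0.1/0.1 split and the helper functions are illustrative assumptions, not ns_gym code.

```python
import random
from typing import Dict, List, Tuple

# One outcome: (probability, next_state, reward, terminated)
Outcome = Tuple[float, int, float, bool]

# P[state][action] -> list of outcomes. Hand-written entry for the 4x4
# FrozenLake map: state 0 (top-left), action 2 (RIGHT). The 0.8/0.1/0.1 split
# is hypothetical; slippery FrozenLake uses 1/3 for each of the three moves.
P: Dict[int, Dict[int, List[Outcome]]] = {
    0: {
        2: [
            (0.8, 1, 0.0, False),  # intended move: right, to state 1
            (0.1, 4, 0.0, False),  # perpendicular move: down, to state 4
            (0.1, 0, 0.0, False),  # perpendicular move: up, blocked by the edge
        ]
    }
}


def is_valid_distribution(outcomes: List[Outcome], tol: float = 1e-9) -> bool:
    """Probabilities must be non-negative and sum to 1."""
    probs = [outcome[0] for outcome in outcomes]
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) < tol


def sample_next_state(outcomes: List[Outcome], rng: random.Random) -> int:
    """Draw one outcome according to its probability and return next_state."""
    weights = [outcome[0] for outcome in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0][1]


rng = random.Random(0)
print(is_valid_distribution(P[0][2]))  # True
print([sample_next_state(P[0][2], rng) for _ in range(10)])
```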






## Schedulers and Parameter Update Functions

In addition to a wrapper, the two other essential components of the package are schedulers and parameter update functions.

- **Schedulers:** Functions (more precisely, callables) that define **when** to update a parameter of the environment's transition function. They simply return a boolean flag that is `True` at the time steps where the parameter needs to be updated.
- **Update Functions:** Define **how** to update the value of a parameter or probability distribution. When the scheduler returns `True`, the update function modifies the parameter accordingly.


Separating the two components allows for greater flexibility in designing experiments.
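To make the division of labor concrete, here is a minimal, library-independent sketch. `EveryKSteps`, `AddConstant`, and `simulate` are hypothetical names, not part of ns_gym, and the loop only stands in for what we assume the wrapper does at each step: ask the scheduler *whether* to update, then ask the update function *how*.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EveryKSteps:
    """Hypothetical scheduler: decides WHEN (every k steps, skipping t = 0)."""
    k: int

    def __call__(self, t: int) -> bool:
        return t > 0 and t % self.k == 0


@dataclass
class AddConstant:
    """Hypothetical update function: decides HOW (add a fixed delta)."""
    delta: float

    def __call__(self, param: float) -> float:
        return param + self.delta


def simulate(initial: float,
             scheduler: Callable[[int], bool],
             update_fn: Callable[[float], float],
             horizon: int) -> List[float]:
    """Return the parameter value at each step t = 0, ..., horizon - 1."""
    value = initial
    trajectory = []
    for t in range(horizon):
        if scheduler(t):               # WHEN: ask the scheduler
            value = update_fn(value)   # HOW: ask the update function
        trajectory.append(value)
    return trajectory


# e.g. a CartPole-style gravity that grows by 0.5 every 2 steps
gravity = simulate(9.8, EveryKSteps(k=2), AddConstant(delta=0.5), horizon=6)
print([round(g, 2) for g in gravity])  # [9.8, 9.8, 10.3, 10.3, 10.8, 10.8]
```

Swapping `AddConstant` for, say, a multiplicative update changes how gravity evolves without touching when it changes, which is the flexibility the separation is meant to provide.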

Available schedulers are found in `ns_gym/schedulers.py`. Parameter update functions are found in the `ns_gym/update_functions` directory. The sch



## Notification Levels and Custom Observation and Reward Types

A key feature of the NS-Gym library is its ability to manage the interaction between the environment and the decision-making agent. Users can configure the notifications the agent receives about changes in the NS-MDP at three distinct levels:

1. **Basic Notification:** The agent receives a boolean flag indicating a change in an environment parameter.
2. **Detailed Notification:** In addition to the boolean flag, the agent is informed of the magnitude of the change.
3. **Full Environment Model:** Additionally, if the agent requires an environment model for planning purposes (such as in Monte Carlo tree search), NS-Gym can provide a stationary snapshot of the environment. This snapshot aligns with the basic or detailed notification settings configured by the user. If the user seeks a model without detailed notification, the planning environment is a stationary snapshot of the base environment. Conversely, if detailed notifications are enabled, the agent receives the most up-to-date version of the environment model (but not any future evolutions).

To handle the different levels of notification, NS-Gym has custom Observation and Reward types. The base Observation type is outlined below.

```python
from dataclasses import dataclass
from typing import Union

import numpy as np


@dataclass(frozen=True)
class Observation:
    """Observation dataclass type. This is the output of the step function in the environment.

    Attributes:
        state (Union[np.ndarray, int]): The state of the environment.
        env_change (Union[dict[str, bool], None]): A dictionary of boolean flags indicating which parameters of the environment have changed.
        delta_change (Union[dict[str, float], float, None]): The amount of change in the transition function of the environment.
        relative_time (Union[int, float]): The time of the observation relative to the start of the episode.
    """
    state: Union[np.ndarray, int]
    env_change: Union[dict[str, bool], None]
    delta_change: Union[dict[str, float], float, None]
    relative_time: Union[int, float]
```
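To illustrate how these fields relate to the notification levels, the sketch below builds observations containing what an agent would see under basic versus detailed notification. The helper `build_observation`, and the convention that hidden fields are set to `None`, are assumptions for illustration; ns_gym's wrappers may populate these fields differently. The dataclass is redefined (with `state` loosened to `Any`) so the block runs on its own.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict, Optional, Union


@dataclass(frozen=True)
class Observation:
    state: Any
    env_change: Optional[Dict[str, bool]]
    delta_change: Union[Dict[str, float], float, None]
    relative_time: Union[int, float]


def build_observation(state, changed, deltas, t,
                      change_notification: bool,
                      delta_change_notification: bool) -> Observation:
    """Hypothetical helper: mask what the agent may see (assumed: hidden -> None)."""
    return Observation(
        state=state,
        env_change=changed if change_notification else None,
        delta_change=deltas if delta_change_notification else None,
        relative_time=t,
    )


# Parameter "P" changed by 0.1 at step 4 while the agent is in state 5
basic = build_observation(5, {"P": True}, {"P": 0.1}, 4, True, False)
detailed = build_observation(5, {"P": True}, {"P": 0.1}, 4, True, True)
print(basic.env_change, basic.delta_change)        # {'P': True} None
print(detailed.env_change, detailed.delta_change)  # {'P': True} {'P': 0.1}

# frozen=True makes observations immutable
try:
    basic.state = 6
except FrozenInstanceError:
    print("Observation is read-only")
```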

# Example of how to use the package

The general blueprint for designing a non-stationary Gym-like environment is as follows:

1. Create a standard Gymnasium environment. This is the "base" environment.
2. Define which parameters of the base environment should be updated over time. Tunable parameters are listed in the table above.
3. For each parameter, choose a scheduler that dictates when the parameter is updated and an update function that dictates how it is updated.
4. Pass the parameters, their update functions, and the base environment into the corresponding wrapper.


# NS-FrozenLake

Here is an example of creating a non-stationary FrozenLake environment. Suppose we want an environment where the transition probabilities are deterministic for the first 3 time steps. At each step after the third, the probability of moving in the intended direction decreases by 0.1.
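Before writing any code against the library, it helps to tabulate what this schedule means. The sketch below assumes the categorical distribution is ordered as `[intended, perpendicular_a, perpendicular_b]` (matching the `initial_prob_dist=[1, 0, 0]` used in Step 4), that the scheduler fires at every `t >= 3` with `t` counted from 0, and that the mass removed from the intended direction is split evenly between the two perpendicular directions. That redistribution rule is an assumption; `DistributionDecrmentUpdate` may redistribute it differently.

```python
from typing import List


def intended_direction_probs(horizon: int, start: int = 3,
                             k: float = 0.1) -> List[List[float]]:
    """Hypothetical sketch of [intended, perpendicular_a, perpendicular_b]
    at each step t = 0, ..., horizon - 1.

    Assumes one decrement of k at every t >= start, with the removed mass
    split evenly between the two perpendicular directions.
    """
    probs = [1.0, 0.0, 0.0]  # deterministic start, like initial_prob_dist=[1, 0, 0]
    history = []
    for t in range(horizon):
        if t >= start:
            step = min(k, probs[0])  # never push the intended probability below 0
            probs = [probs[0] - step,
                     probs[1] + step / 2,
                     probs[2] + step / 2]
        history.append([round(p, 3) for p in probs])
    return history


for t, dist in enumerate(intended_direction_probs(horizon=6)):
    print(t, dist)
# t = 0, 1, 2 -> [1.0, 0.0, 0.0]
# t = 3       -> [0.9, 0.05, 0.05]
# t = 4       -> [0.8, 0.1, 0.1]
# t = 5       -> [0.7, 0.15, 0.15]
```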

## Step 1: Make a standard Gymnasium environment

```python
import gymnasium as gym

# Deterministic FrozenLake: every action moves in the intended direction
env = gym.make("FrozenLake-v1", is_slippery=False)
```


## Step 2: Define tunable parameters and update functions.

From `ns_gym.schedulers` we can import the `ContinuousScheduler` and from `ns_gym.update_functions` we can import the `DistributionDecrmentUpdate` function.


```python
import ns_gym
from ns_gym.schedulers import ContinuousScheduler
from ns_gym.update_functions import DistributionDecrmentUpdate

# Update P at each time step, starting from timestep 4 (start=3)
scheduler = ContinuousScheduler(start=3)
# Each time the scheduler fires, decrease the probability of moving in the intended direction by k=0.1
update_function = DistributionDecrmentUpdate(scheduler=scheduler, k=0.1)

param_name = "P"
```



## Step 3: Map update functions to parameter name

```python
params = {param_name: update_function}
```


## Step 4: Pass the parameters, update functions, and base environment into the wrapper.

We can import all wrappers from `ns_gym.wrappers`. In this case we want the `NSFrozenLakeWrapper`. A few of its parameters control the notification level of the environment: `change_notification` controls the basic notification level, and `delta_change_notification` controls the detailed notification level. If the agent needs a copy of the environment for planning, ns_gym can provide one at the appropriate notification level via `env.get_planning_env()`. If `change_notification` is set to `True`, the agent receives the most up-to-date (stationary) version of the MDP for planning; otherwise, `env.get_planning_env()` returns the initial MDP.

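```python
from ns_gym.wrappers import NSFrozenLakeWrapper

env = NSFrozenLakeWrapper(
    env,
    params,
    change_notification=True,        # basic notification: boolean change flag
    delta_change_notification=True,  # detailed notification: magnitude of the change
    initial_prob_dist=[1, 0, 0],     # start with deterministic transitions
)
```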
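The rule for `env.get_planning_env()` described in Step 4 can be modeled with a small stand-in class. `ToyNSWrapper`, its `apply_update` method, and the use of a plain parameter dictionary as the "MDP" are hypothetical simplifications, not ns_gym internals. The sketch only shows the stated semantics: a deep-copied (and therefore stationary) snapshot of the latest parameters when `change_notification=True`, and of the initial parameters otherwise.

```python
import copy
from typing import Any, Dict


class ToyNSWrapper:
    """Hypothetical stand-in for a non-stationary wrapper (not ns_gym code)."""

    def __init__(self, params: Dict[str, Any], change_notification: bool) -> None:
        self.change_notification = change_notification
        self._initial = copy.deepcopy(params)  # the initial MDP
        self._current = copy.deepcopy(params)  # evolves as updates fire

    def apply_update(self, name: str, value: Any) -> None:
        """Simulate an update function firing on parameter `name`."""
        self._current[name] = value

    def get_planning_env(self) -> Dict[str, Any]:
        # A deep copy is a frozen snapshot: later updates do not leak into it.
        source = self._current if self.change_notification else self._initial
        return copy.deepcopy(source)


informed = ToyNSWrapper({"p_intended": 1.0}, change_notification=True)
uninformed = ToyNSWrapper({"p_intended": 1.0}, change_notification=False)
for wrapper in (informed, uninformed):
    wrapper.apply_update("p_intended", 0.9)

snapshot = informed.get_planning_env()
informed.apply_update("p_intended", 0.8)  # a later change...
print(snapshot)                           # {'p_intended': 0.9}  ...is not in the snapshot
print(uninformed.get_planning_env())      # {'p_intended': 1.0}
```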


# Set up the env/agent interaction loop

ns_gym includes several decision-making agents, which can be imported from `ns_gym.benchmark_algorithms`. For illustration, we use a standard MCTS agent to act in this NS-FrozenLake environment. The environment/agent interaction loop is essentially the same as in Gymnasium, but with custom observation and reward types.



```python
from ns_gym.benchmark_algorithms import MCTS

done = False
truncated = False
obs, _ = env.reset()

episode_reward = 0
while not done and not truncated:
    # Stationary snapshot of the MDP at the configured notification level
    planning_env = env.get_planning_env()
    agent = MCTS(planning_env, obs, d=100, m=100, c=1.44, gamma=0.99)
    action, _ = agent.search()
    obs, reward, done, truncated, info = env.step(action)
    episode_reward += reward.reward  # custom Reward type: the scalar lives in .reward

print(f"Episode reward: {episode_reward}")
```

```
Episode reward: 1.0
```


## Custom Scheduler
One can easily implement a custom scheduler. The scheduler just needs to be a callable that takes the current time and returns a boolean, and it must subclass `ns_gym.base.Scheduler`. This sample custom scheduler returns `True` every three MDP steps.


```python
# A custom scheduler is a callable that takes the current time and returns a boolean.
# It must also subclass ns_gym.base.Scheduler.
import ns_gym.base as base


class MyCustomSchedulerClass(base.Scheduler):
    """Custom scheduler that fires every k steps (t = 0, k, 2k, ...)."""

    def __init__(self, k=3):
        self.k = k

    def __call__(self, t: int) -> bool:
        return t % self.k == 0


scheduler3 = MyCustomSchedulerClass(k=3)
```
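Because a scheduler is just a callable, its firing pattern can be checked by calling it over a range of time steps. To keep this sketch runnable without ns_gym installed, it uses a hypothetical stand-in `Scheduler` abstract base class in place of `ns_gym.base.Scheduler`; the scheduling logic is the same as in `MyCustomSchedulerClass`.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical stand-in for ns_gym.base.Scheduler."""

    @abstractmethod
    def __call__(self, t: int) -> bool:
        ...


class EveryKSteps(Scheduler):
    """Same logic as MyCustomSchedulerClass: fire when t is a multiple of k."""

    def __init__(self, k: int = 3) -> None:
        self.k = k

    def __call__(self, t: int) -> bool:
        return t % self.k == 0


fires_at = [t for t in range(10) if EveryKSteps(k=3)(t)]
print(fires_at)  # [0, 3, 6, 9]
```

Note that `t % k == 0` also holds at `t = 0`, so this scheduler fires on the very first step; adding a `t > 0` condition would make the first change happen at `t = k`.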
    

## Custom Update Function

We can also define a custom update function as a callable class. A parameter update function subclasses `ns_gym.base.UpdateFn` if it updates a scalar parameter, or `ns_gym.base.UpdateDistributionFn` if it modifies a probability distribution. We initialize the update function with its associated scheduler; then all we need to implement is the `update` method. The update function only "fires" when the scheduler returns `True`. If there is no update, `ns_gym.base.UpdateFn` returns the parameter unchanged.

All update functions return a three-tuple `(param, update_bool, parameter_update_amount)`:

```
    Returns:
            Any: The updated parameter
            bool: Boolean flag indicating whether the parameter was updated or not
            float: The amount of change in the parameter
```

This custom update function divides the current parameter value by 2.

```python
import ns_gym.base as base


class MyCustomUpdateFn(base.UpdateFn):
    """Custom update function that halves a scalar parameter."""

    def __init__(self, scheduler) -> None:
        super().__init__(scheduler=scheduler)

    def update(self, param, t):
        # True division: floor division (//) would truncate float parameters such as gravity
        return param / 2


updateFn3 = MyCustomUpdateFn(scheduler=scheduler3)
```
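The three-tuple contract can be illustrated with a stand-in base class that wraps `update` as described above: if the scheduler does not fire, the parameter comes back unchanged. `UpdateFnSketch` and its `__call__(param, t)` signature are assumptions made for illustration, so `ns_gym.base.UpdateFn` may be structured differently; reporting the change as a signed difference is likewise an assumption.

```python
from abc import ABC, abstractmethod
from typing import Callable, Tuple


class UpdateFnSketch(ABC):
    """Hypothetical stand-in for ns_gym.base.UpdateFn (not the library's code)."""

    def __init__(self, scheduler: Callable[[int], bool]) -> None:
        self.scheduler = scheduler

    @abstractmethod
    def update(self, param: float, t: int) -> float:
        """Subclasses define HOW the parameter changes."""

    def __call__(self, param: float, t: int) -> Tuple[float, bool, float]:
        if not self.scheduler(t):
            return param, False, 0.0  # scheduler did not fire: no change
        new_param = self.update(param, t)
        return new_param, True, new_param - param  # (param, update_bool, amount)


class Halve(UpdateFnSketch):
    def update(self, param: float, t: int) -> float:
        return param / 2


halve = Halve(scheduler=lambda t: t % 3 == 0)
print(halve(9.8, 1))  # (9.8, False, 0.0)
print(halve(9.8, 3))  # (4.9, True, -4.9)
```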