# Gymnasium Integration

[![Open in Colab](https://img.shields.io/badge/Open%20in%20Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/H-IAAC/CST-Python/blob/main/examples/Gymnasium%20Integration.ipynb) [![Open in Github](https://img.shields.io/badge/Open%20in%20Github-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/H-IAAC/CST-Python/blob/main/examples/Gymnasium%20Integration.ipynb)

[Gymnasium](https://gymnasium.farama.org/) is the library that defines the most widely used interface for creating environments for reinforcement learning problems. CST-Python provides an interface that lets a cognitive agent interact with Gymnasium environments.

Let's start by importing CST-Python and the other required modules:

In [1]:
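try:
    import cst_python as cst
    import gymnasium as gym
except ImportError:
    # Install CST-Python with the Gymnasium extra, then retry the imports
    !python3 -m pip install cst_python[gym]
    import cst_python as cst
    import gymnasium as gym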

In [2]:
import time

from gymnasium.wrappers import TransformAction, TransformObservation

from cst_python.python.gym import GymCodelet

## The GymCodelet

The GymCodelet is the main interface with environments. Before we use it, we need to create the environment and the agent's mind.

The environment we will use is the Blackjack card game. See the [environment documentation](https://gymnasium.farama.org/environments/toy_text/blackjack/) for more details about the game and the environment.

In [3]:
env = gym.make("Blackjack-v1")

In [4]:
mind = cst.Mind()

With the mind and the environment, we can create the codelet, insert it into the mind, and start the mind:

In [5]:
gym_codelet = GymCodelet(mind, env)
mind.insert_codelet(gym_codelet)

mind.start()

One important detail is that the GymCodelet always runs in the [Publisher-Subscriber](https://h-iaac.github.io/CST-Python/_build/html/_examples/Publisher-Subscriber.html) mode.
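To illustrate what Publisher-Subscriber mode implies, here is a minimal, library-independent sketch. `SketchMemory` is a hypothetical name, not a CST-Python class, and the real scheduling inside CST-Python may differ. The sketch only shows the idea that a subscriber runs when a memory it listens to is written, rather than on a fixed timer:

```python
class SketchMemory:
    """Hypothetical stand-in for a memory object (not the CST-Python class)."""

    def __init__(self, name, info=None):
        self.name = name
        self._info = info
        self._subscribers = []

    def subscribe(self, callback):
        # Register a callable to be invoked on every write.
        self._subscribers.append(callback)

    def get_info(self):
        return self._info

    def set_info(self, value):
        self._info = value
        # Notify every subscriber, passing the memory that was written.
        for callback in self._subscribers:
            callback(self)


calls = []
action = SketchMemory("action", 0)
action.subscribe(lambda memory: calls.append((memory.name, memory.get_info())))

action.set_info(1)
action.set_info(0)
print(calls)  # [('action', 1), ('action', 0)]
```

This is why, in the cells that follow, writing to a memory is enough to make the codelet act.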

It creates two important memories for starting the environment: the seed memory and the reset memory.

We will set the environment seed to 42 to show how it works, and then reset the environment:

In [6]:
gym_codelet.seed_memory.set_info(42)
gym_codelet.reset_memory.set_info(True)

-1

If we look at the observation memories, we see a single memory holding the observation provided by the environment: a tuple with the player's current sum, the value of the dealer's showing card, and whether the player holds a usable ace:

In [7]:
gym_codelet.observation_memories

{'observation': MemoryObject [idmemoryobject=0, timestamp=1732730372039, evaluation=0.0, I=(15, 2, 0), name=observation]}

In [8]:
gym_codelet.observation_memories["observation"].get_info()

(15, 2, 0)

The step count memory shows the steps since the episode start:

In [9]:
gym_codelet.step_count_memory.get_info()

0

The action memories also contain a single "action" memory:

In [10]:
gym_codelet.action_memories

{'action': MemoryObject [idmemoryobject=1, timestamp=1732730372025, evaluation=0.0, I=1, name=action]}

We will set it to `1` for a hit.

In [11]:
gym_codelet.action_memories["action"].set_info(1)

-1

When the action memory changes, the codelet executes a step in the environment. We can see that the step count and the observation change:

In [12]:
gym_codelet.step_count_memory.get_info(), gym_codelet.observation_memories["observation"].get_info()

(1, (25, 2, 0))

Since we busted (the player sum of 25 exceeds 21), the environment terminated:

In [13]:
gym_codelet.terminated_memory.get_info()

True

And the step reward is -1 as we lost:

In [14]:
gym_codelet.reward_memory.get_info()

-1.0

We will now start a new episode. Note that the codelet resets the environment each time the reset memory's timestamp changes, even if the content is the same. The first observation is the same as before, since we set the environment seed:

In [15]:
gym_codelet.reset_memory.set_info(True)
gym_codelet.observation_memories["observation"].get_info()

(15, 2, 0)
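The reset behaviour described above, reacting to a timestamp change rather than a content change, can be sketched without the library. `StampedMemory` and `ResetWatcher` are hypothetical names used only for illustration, and a simple counter stands in for the wall-clock timestamp. This is an assumption about the general mechanism, not the GymCodelet implementation:

```python
import itertools

_clock = itertools.count(1)  # deterministic stand-in for a wall-clock timestamp


class StampedMemory:
    """Hypothetical memory that records when it was last written."""

    def __init__(self, info=None):
        self.info = info
        self.timestamp = 0

    def set_info(self, value):
        self.info = value
        self.timestamp = next(_clock)


class ResetWatcher:
    """Counts a reset whenever the watched memory's timestamp advances."""

    def __init__(self, memory):
        self.memory = memory
        self.last_seen = memory.timestamp
        self.resets = 0

    def poll(self):
        if self.memory.timestamp != self.last_seen:
            self.last_seen = self.memory.timestamp
            self.resets += 1


reset_memory = StampedMemory()
watcher = ResetWatcher(reset_memory)

reset_memory.set_info(True)
watcher.poll()
reset_memory.set_info(True)  # identical content, newer timestamp
watcher.poll()
watcher.poll()               # nothing written since the last poll
print(watcher.resets)  # 2
```

Comparing timestamps instead of contents is what allows writing `True` repeatedly to trigger a fresh episode each time.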

This time, we will choose to stick:

In [16]:
gym_codelet.action_memories["action"].set_info(0)
gym_codelet.observation_memories["observation"].get_info()

(15, 2, 0)

And we won this game:

In [17]:
gym_codelet.terminated_memory.get_info(), gym_codelet.reward_memory.get_info()

(True, 1.0)

## Dict Spaces

So far, we have used the codelet to map all observations in the environment to a single memory with a generic name. However, if the environment has observation and action spaces of type Dict, the Codelet will map each observation and each action to a specific memory.

Let's see this.

In [18]:
env = gym.make("Blackjack-v1")

Unlike before, we will use TransformObservation and TransformAction to convert the original observations and actions into Dict spaces:

In [19]:
env = TransformObservation(env, 
                           lambda obs:{"player_sum":obs[0], "dealer_card":obs[1], "usable_ace":obs[2]}, 
                           gym.spaces.Dict({"player_sum":env.observation_space[0], "dealer_card":env.observation_space[1], "usable_ace":env.observation_space[2]}))

env = TransformAction(env,
                      lambda action:action["hit"],
                      gym.spaces.Dict({"hit":env.action_space}))
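The two lambdas passed to the wrappers can be read in isolation, without Gymnasium. The following sketch rewrites them as named functions (`to_dict_observation` and `from_dict_action` are names chosen here for illustration) to show the direction of each conversion: observations go from the environment's tuple to a dict, while actions go from the agent's dict back to the plain value the environment expects:

```python
def to_dict_observation(obs):
    """Turn Blackjack's (player_sum, dealer_card, usable_ace) tuple into a dict."""
    player_sum, dealer_card, usable_ace = obs
    return {
        "player_sum": player_sum,
        "dealer_card": dealer_card,
        "usable_ace": usable_ace,
    }


def from_dict_action(action):
    """Extract the discrete action value that the wrapped environment expects."""
    return action["hit"]


print(to_dict_observation((15, 2, 0)))
# {'player_sum': 15, 'dealer_card': 2, 'usable_ace': 0}
print(from_dict_action({"hit": 1}))
# 1
```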

Let's create and start the agent and environment just like before:

In [20]:
mind = cst.Mind()
gym_codelet = GymCodelet(mind, env)
mind.insert_codelet(gym_codelet)

mind.start()

gym_codelet.seed_memory.set_info(42)
gym_codelet.reset_memory.set_info(True)

-1

This time, we can see that the observation memories changed: there is now one memory for each key of the observation dict:

In [21]:
gym_codelet.observation_memories

{'dealer_card': MemoryObject [idmemoryobject=0, timestamp=1732730372367, evaluation=0.0, I=2, name=dealer_card],
 'player_sum': MemoryObject [idmemoryobject=1, timestamp=1732730372367, evaluation=0.0, I=15, name=player_sum],
 'usable_ace': MemoryObject [idmemoryobject=2, timestamp=1732730372367, evaluation=0.0, I=0, name=usable_ace]}

In [22]:
{memory_name:gym_codelet.observation_memories[memory_name].get_info() for memory_name in gym_codelet.observation_memories}

{'dealer_card': 2, 'player_sum': 15, 'usable_ace': 0}

The action memory also changed its name:

In [23]:
gym_codelet.action_memories

{'hit': MemoryObject [idmemoryobject=3, timestamp=1732730372365, evaluation=0.0, I=0, name=hit]}
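The naming rule seen in both sections can be summarised in a few lines of plain Python. `split_into_memories` is a hypothetical helper written for this explanation, not part of CST-Python. It assumes only what the outputs above show: a dict value yields one entry per key, and any other value yields a single entry under a generic name:

```python
def split_into_memories(value, default_name):
    """Return a {memory_name: content} mapping for one observation or action."""
    if isinstance(value, dict):
        # Dict space: one entry per key, named after the key.
        return dict(value)
    # Any other space: a single entry under the generic name.
    return {default_name: value}


print(split_into_memories((15, 2, 0), "observation"))
# {'observation': (15, 2, 0)}
print(split_into_memories({"player_sum": 15, "dealer_card": 2, "usable_ace": 0}, "observation"))
# {'player_sum': 15, 'dealer_card': 2, 'usable_ace': 0}
print(split_into_memories({"hit": 0}, "action"))
# {'hit': 0}
```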

Just like before, we choose to stick:

In [24]:
gym_codelet.action_memories["hit"].set_info(0)

-1

And we won:

In [25]:
gym_codelet.terminated_memory.get_info(), gym_codelet.reward_memory.get_info()

(True, 1.0)

## Next steps

The idea is not to use the codelet to interact with the environment manually, as in this example, but to create a cognitive architecture that performs the environment's task.
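As a starting point for such an architecture, the decision logic could live in its own function that reads observation values and returns an action value. The threshold rule below (hit while the sum is under 17) is an arbitrary illustrative policy, not something provided by CST-Python or Gymnasium, and `threshold_policy` is a hypothetical name:

```python
HIT, STICK = 1, 0  # action values used with Blackjack-v1 in this example


def threshold_policy(player_sum, dealer_card, usable_ace, threshold=17):
    """Hit while the hand is below `threshold`; the other inputs are unused here."""
    return HIT if player_sum < threshold else STICK


print(threshold_policy(15, 2, 0))   # 1
print(threshold_policy(20, 10, 0))  # 0
```

In an agent, a codelet could read the three observation memories, call a function like this one, and write the result to the action memory with `set_info`, replacing the manual calls used throughout this example.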

Another possibility is to combine GymCodelet with MemoryStorage to use gym environments with a remote cognitive agent or in CST-Java.