<a href="https://colab.research.google.com/github/prithwis/AGI/blob/main/TaxiV3_v5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![CC-BY-SA](https://licensebuttons.net/l/by-sa/3.0/88x31.png)<br>


![alt text](https://github.com/Praxis-QR/RDWH/raw/main/images/YantraJaalBanner.png)<br>




[Prithwis Mukerjee](http://www.linkedin.com/in/prithwis)<br>

# Multiple Brains - Sensitivity Analysis
for simulating Reinforcement Learning Applications

In [2]:
# 1. Update package list and install the NEW opengl names
!apt-get update > /dev/null 2>&1
!apt-get install -y xvfb ffmpeg freeglut3-dev python3-opengl libgl1-mesa-dev libglu1-mesa-dev mesa-utils > /dev/null 2>&1

# 2. Install the system dependencies (SWIG is the key here)
#!apt-get update
!apt-get install -y swig build-essential python3-dev > /dev/null 2>&1

# 3. Upgrade pip and setuptools to handle the build process better
!pip install --upgrade pip setuptools wheel > /dev/null 2>&1

# 4. Now install gymnasium with box2d support
!pip install "gymnasium[box2d]" > /dev/null 2>&1

# 5. Install the Python libraries
!pip install pyvirtualdisplay  pygame opencv-python > /dev/null 2>&1


In [13]:
import pygame
import cv2
import numpy as np
import os

import pandas as pd


from IPython.display import Video, display


In [4]:
# --- 1. THE RECORDER CLASS ---
class VideoRecorder:
    def __init__(self, filename='simulation.avi', width=640, height=480, fps=30):
        self.filename = filename
        self.width = width
        self.height = height
        self.fps = fps
        self.fourcc = cv2.VideoWriter_fourcc(*'XVID')
        self.video_writer = None

    def start(self):
        if os.path.exists(self.filename): os.remove(self.filename)
        self.video_writer = cv2.VideoWriter(self.filename, self.fourcc, self.fps, (self.width, self.height))


    def record_frame_with_hud(self, frame_array, reward, step):
        # 1. Convert RGB to BGR for OpenCV
        view = cv2.cvtColor(frame_array, cv2.COLOR_RGB2BGR)
        view = cv2.resize(view, (self.width, self.height))

        # 2. Add text overlay (The HUD)
        font = cv2.FONT_HERSHEY_SIMPLEX
        # Display Step and Reward
        cv2.putText(view, f"Step: {step}", (10, 30), font, 0.7, (255, 255, 255), 2)
        cv2.putText(view, f"Reward: {reward}", (10, 60), font, 0.7, (255, 255, 255), 2)
        self.video_writer.write(view)

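    def stop(self):
        import subprocess  # only needed for the ffmpeg call below
        if self.video_writer is not None:
            self.video_writer.release()
            self.video_writer = None
        # Convert to MP4 for browser compatibility
        output_mp4 = os.path.splitext(self.filename)[0] + '.mp4'
        # Pass ffmpeg its arguments as a list so unusual filenames need no shell quoting
        subprocess.run(["ffmpeg", "-y", "-i", self.filename, "-c:v", "libx264",
                        "-pix_fmt", "yuv420p", output_mp4, "-hide_banner", "-loglevel", "error"])
        return output_mp4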

In [5]:
import gymnasium as gym

# --- THE SWAP ---
# Old: world = FrogSnakeWorldV3A()


try:
    #env = gym.make("LunarLander-v3")
    env = gym.make("Taxi-v3", render_mode="rgb_array")
    print("Success! The Taxi-v3 environment is ready for AGI training.")
    #env.close()
except Exception as e:
    print(f"Error: {e}")


Success! The Taxi-v3 environment is ready for AGI training.


In [16]:
def train_taxi(alpha, gamma, epsilon, episodes=5000):
    q_table = np.zeros([env.observation_space.n, env.action_space.n])
    for i in range(episodes):
        state, info = env.reset()
        terminated = False
        while not terminated:
            # Epsilon-greedy
            if np.random.uniform(0, 1) < epsilon:
                action = env.action_space.sample()
            else:
                action = np.argmax(q_table[state])

            next_state, reward, terminated, truncated, info = env.step(action)

            # Q-learning update (derived from the Bellman optimality equation)
            old_value = q_table[state, action]
            next_max = np.max(q_table[next_state])
            new_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
            q_table[state, action] = new_value
            state = next_state
            if truncated: break
    return q_table

In [17]:
def evaluate_brain(q_table, num_trips=20):
    success_rewards = []
    truncated_count = 0

    for _ in range(num_trips):
        state, info = env.reset()
        trip_reward = 0
        terminated = truncated = False

        while not (terminated or truncated):
            action = np.argmax(q_table[state]) # Pure exploitation for testing
            state, reward, terminated, truncated, info = env.step(action)
            trip_reward += reward

        if terminated:
            success_rewards.append(trip_reward)
        else:
            truncated_count += 1

    avg_success = np.mean(success_rewards) if success_rewards else 0
    return avg_success, truncated_count

In [18]:

# --- SENSITIVITY ANALYSIS EXECUTION ---
results = []
alphas = [0.1, 0.5, 0.9] # Low, Med, High
gammas = [0.1, 0.5, 0.9] # Low, Med, High
constant_epsilon = 0.1
training_episodes = 5000

print("Starting Sensitivity Analysis...")
for a in alphas:
    for g in gammas:
        print(f"Training Brain: Alpha={a}, Gamma={g}...", end=" ")
        brain = train_taxi(a, g, constant_epsilon, episodes=training_episodes)
        avg_r, failures = evaluate_brain(brain, num_trips=20)

        results.append({
            'Alpha': a,
            'Gamma': g,
            'Avg_Success_Reward': avg_r,
            'Truncated_Count': failures,
            'Success_Rate': (20 - failures) / 20
        })
        print("Done.")

# 2. Store in Pandas
df = pd.DataFrame(results)

Starting Sensitivity Analysis...
Training Brain: Alpha=0.1, Gamma=0.1... Done.
Training Brain: Alpha=0.1, Gamma=0.5... Done.
Training Brain: Alpha=0.1, Gamma=0.9... Done.
Training Brain: Alpha=0.5, Gamma=0.1... Done.
Training Brain: Alpha=0.5, Gamma=0.5... Done.
Training Brain: Alpha=0.5, Gamma=0.9... Done.
Training Brain: Alpha=0.9, Gamma=0.1... Done.
Training Brain: Alpha=0.9, Gamma=0.5... Done.
Training Brain: Alpha=0.9, Gamma=0.9... Done.


In [19]:
# Display the raw dataframe
print("\n--- Sensitivity Results ---")
display(df)

# Pivot table for a heatmap-style view
pivot = df.pivot(index='Alpha', columns='Gamma', values='Avg_Success_Reward')
print("\n--- Average Reward Heatmap Table ---")
display(pivot)


--- Sensitivity Results ---


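| | Alpha | Gamma | Avg_Success_Reward | Truncated_Count | Success_Rate |
| --- | --- | --- | --- | --- | --- |
| 0 | 0.1 | 0.1 | 9.222222 | 11 | 0.45 |
| 1 | 0.1 | 0.5 | 9.666667 | 8 | 0.60 |
| 2 | 0.1 | 0.9 | 7.700000 | 0 | 1.00 |
| 3 | 0.5 | 0.1 | 6.526316 | 1 | 0.95 |
| 4 | 0.5 | 0.5 | 8.150000 | 0 | 1.00 |
| 5 | 0.5 | 0.9 | 7.450000 | 0 | 1.00 |
| 6 | 0.9 | 0.1 | 9.400000 | 0 | 1.00 |
| 7 | 0.9 | 0.5 | 7.250000 | 0 | 1.00 |
| 8 | 0.9 | 0.9 | 7.450000 | 0 | 1.00 |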



--- Average Reward Heatmap Table ---


| Alpha (rows) / Gamma (columns) | 0.1 | 0.5 | 0.9 |
| --- | --- | --- | --- |
| 0.1 | 9.222222 | 9.666667 | 7.70 |
| 0.5 | 6.526316 | 8.150000 | 7.45 |
| 0.9 | 9.400000 | 7.250000 | 7.45 |
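
The pivot above shows only `Avg_Success_Reward`, which is averaged over the trips that finished. A second pivot on `Success_Rate` makes the truncations visible. The sketch below rebuilds that view from the values printed in the results table, so it runs on its own; the names `rows`, `rates`, and `success_pivot` are illustrative and not part of the notebook.

```python
import pandas as pd

# (Alpha, Gamma, Success_Rate) copied from the results table printed above.
rows = [
    (0.1, 0.1, 0.45), (0.1, 0.5, 0.60), (0.1, 0.9, 1.00),
    (0.5, 0.1, 0.95), (0.5, 0.5, 1.00), (0.5, 0.9, 1.00),
    (0.9, 0.1, 1.00), (0.9, 0.5, 1.00), (0.9, 0.9, 1.00),
]
rates = pd.DataFrame(rows, columns=["Alpha", "Gamma", "Success_Rate"])

# Same pivot as the reward table, but on the fraction of completed trips.
success_pivot = rates.pivot(index="Alpha", columns="Gamma", values="Success_Rate")
print(success_pivot)
```

Inside the notebook, `df.pivot(index='Alpha', columns='Gamma', values='Success_Rate')` should give the same view directly from the live results.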


In reinforcement learning, **Alpha** and **Gamma** are the two "tuning knobs" that determine the personality of your AI. They control how the agent processes information, but they look in opposite directions: one looks at the **past**, and the other looks at the **future**.

---

### 1. Alpha ($\alpha$): The Learning Rate (The Past)

Alpha determines how much the agent values **new** information versus **old** information.

* **If $\alpha = 0$:** The agent learns nothing. It stays with its initial (often zero) knowledge.
* **If $\alpha = 1$:** The agent is "forgetful." It completely overwrites its old memory with the very last experience it had.
* **The "Sweet Spot" (e.g., 0.1):** The agent keeps 90% of its old knowledge and nudges its "belief" 10% toward the new result. It requires multiple experiences to change its mind.

> **Analogy:** Think of Alpha as a student's memory. A high Alpha is like a student who memorizes only the last page they read and forgets the rest of the book. A low Alpha is a student who builds knowledge slowly, chapter by chapter.
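
The three cases above can be checked numerically. This is a minimal sketch, assuming a fixed target value of 10 and a starting estimate of 0; `blend` and `repeated_updates` are hypothetical helpers written for illustration, using the same weighted-average form as the update line in `train_taxi`.

```python
def blend(old_value, target, alpha):
    """Move old_value a fraction alpha of the way toward target."""
    return (1 - alpha) * old_value + alpha * target


def repeated_updates(alpha, target=10.0, start=0.0, steps=5):
    """Apply the same update several times and record the estimate after each step."""
    value, history = start, []
    for _ in range(steps):
        value = blend(value, target, alpha)
        history.append(round(value, 4))
    return history


for alpha in (0.0, 0.1, 1.0):
    print(alpha, repeated_updates(alpha))
```

With `alpha=0.0` the estimate never leaves 0, with `alpha=1.0` it jumps to the target on the first update, and with `alpha=0.1` it closes 10% of the remaining gap each time.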

---

### 2. Gamma ($\gamma$): The Discount Factor (The Future)

Gamma determines the "time horizon" of the agent. It tells the agent how much it should value **future** rewards compared to **immediate** rewards.

* **If $\gamma = 0$ (Short-sighted):** The agent only cares about the reward it gets in the *very next step*. In the Taxi game, every move costs -1, so the only thing it can learn is to avoid the -10 penalty for an illegal pickup or drop-off. The +20 delivery bonus matters only when the drop-off is the very next action; from any further away it is invisible.
* **If $\gamma \approx 1$ (Visionary):** The agent values a reward 50 steps from now almost as much as a reward right now. It is willing to take a series of "painful" -1 moves to reach that "glory" +20 finish line.
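
To put numbers on the time horizon, the sketch below discounts one imagined trip at several values of Gamma. The -1 move cost and the +20 delivery bonus come from the text above; the ten-step trip and the helper `discounted_return` are assumptions made for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum rewards[t] * gamma**t over a reward sequence."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Hypothetical trip: nine -1 moves followed by the +20 drop-off on the tenth step.
trip = [-1] * 9 + [20]

for gamma in (0.0, 0.5, 0.9, 1.0):
    print(gamma, round(discounted_return(trip, gamma), 3))
```

At `gamma=0.0` only the first -1 counts, so the trip is worth -1. At `gamma=1.0` nothing is discounted and the trip is worth 11. The values in between show how quickly a bonus ten steps away shrinks as Gamma drops.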

---

### Key Differences at a Glance

| Feature | Alpha ($\alpha$) | Gamma ($\gamma$) |
| --- | --- | --- |
| **Direction** | Backward-looking (Memory) | Forward-looking (Vision) |
| **Core Question** | "How much should I trust this new result?" | "How much do I care about future goals?" |
| **High Value** | Fast learning, but unstable/erratic. | Patient, strategic, looks for long-term wins. |
| **Low Value** | Slow, stable learning. | Impulsive, only cares about the "now." |
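
Both knobs meet in the single update line used by `train_taxi`. The sketch below applies that update once to a hypothetical two-state, two-action table, so the roles of Alpha and Gamma can be traced by hand; `q_update` and the toy values are illustrative assumptions, not part of the notebook.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning update in place and return the new value."""
    old_value = q_table[state][action]
    next_max = max(q_table[next_state])      # best value reachable from next_state
    target = reward + gamma * next_max       # Gamma scales the future part
    q_table[state][action] = (1 - alpha) * old_value + alpha * target  # Alpha blends old and new
    return q_table[state][action]


# Toy table: state 0 knows nothing yet, state 1 already has estimates.
q = [[0.0, 0.0], [5.0, 2.0]]
new_value = q_update(q, state=0, action=1, reward=-1, next_state=1, alpha=0.5, gamma=0.9)
print(new_value)  # 0.5 * 0 + 0.5 * (-1 + 0.9 * 5) = 1.75
```

Setting `gamma=0.0` in the same call would leave only the immediate -1 in the target, and setting `alpha=1.0` would replace the old estimate with the target outright.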

---

### Why this matters for your Sensitivity Analysis

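In your 3x3 grid, the run above matches some of the intuitions above and contradicts others:

* **High Gamma** (0.9) gave a reliable taxi at every Alpha: all 20 evaluation trips were completed, including the **High Gamma + Low Alpha** corner.
* The taxis that got "stuck" (Truncated) were the **Low Alpha** ones, not the High Alpha ones: Alpha=0.1 was truncated on 11 of 20 trips at Gamma=0.1 and on 8 of 20 at Gamma=0.5. A plausible reason is that with a small learning rate and a heavily discounted future, 5000 episodes are not enough for the +20 delivery signal to work its way back through the Q-table.
* **Low Gamma + High Alpha** (Alpha=0.9, Gamma=0.1) did not get stuck: it completed all 20 trips. Taxi-v3 transitions are deterministic by default, so a high learning rate does little harm here, and even a heavily discounted +20 can still tip the greedy choice toward the shorter path once the Q-values have settled.

Read the reward column with care: `Avg_Success_Reward` averages only the trips that finished, and each cell comes from one training run scored on 20 random trips, so small differences between cells may be noise.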


