From e1bae525463ed59656fd7418cad9fb2d783875f9 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Sun, 19 Oct 2025 20:36:36 -0700 Subject: [PATCH 01/19] add example --- README.md | 2 + examples/OpenEnv_Tutorial.ipynb | 853 ++++++++++++++++++++++++++++++++ 2 files changed, 855 insertions(+) create mode 100644 examples/OpenEnv_Tutorial.ipynb diff --git a/README.md b/README.md index 5169f40..a078900 100644 --- a/README.md +++ b/README.md @@ -2,6 +2,8 @@ An e2e framework for creating, deploying and using isolated execution environments for agentic RL training, built using Gymnasium style simple APIs. +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-pytorch/OpenEnv/blob/main/examples/OpenEnv_Tutorial.ipynb) **← Try the Interactive Tutorial!** + ## Overview OpenEnv provides a standard for interacting with agentic execution environments via simple Gymnasium style APIs - step(), reset(), state(). Users of agentic execution environments can interact with the environment during RL training loops using these simple APIs. diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb new file mode 100644 index 0000000..db4ddc8 --- /dev/null +++ b/examples/OpenEnv_Tutorial.ipynb @@ -0,0 +1,853 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# OpenEnv: Production-Ready RL Environments\n", + "\n", + "**Learn how OpenEnv standardizes RL environments for production use**\n", + "\n", + "---\n", + "\n", + "## What You'll Learn\n", + "\n", + "This notebook teaches you:\n", + "\n", + "1. **RL Fundamentals** - The core loop in 5 minutes\n", + "2. **OpenEnv Framework** - Why we built it and how it works\n", + "3. **Using Integrations** - Work with existing environments (OpenSpiel example)\n", + "4. **Interactive Demo** - See policies in action\n", + "5. 
**Adding Integrations** - Wrap your own environments\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: RL Fundamentals - The Core Loop\n", + "\n", + "Reinforcement Learning boils down to a simple loop:\n", + "\n", + "```\n", + "Agent observes → chooses action → gets reward → repeat\n", + "```\n", + "\n", + "Let's see it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import random\n", + "\n", + "# Simple RL: Guess a number\n", + "target = random.randint(1, 10)\n", + "guesses = 3\n", + "\n", + "print(\"🎯 Guess a number (1-10)\\n\")\n", + "\n", + "while guesses > 0:\n", + " guess = random.randint(1, 10) # Policy: random\n", + " guesses -= 1\n", + " \n", + " print(f\"Guess: {guess}\", end=\" → \")\n", + " \n", + " if guess == target:\n", + " print(\"🎉 Correct! Reward: +1\")\n", + " break\n", + " elif abs(guess - target) <= 2:\n", + " print(\"🔥 Warm\")\n", + " else:\n", + " print(\"❄️ Cold\")\n", + "else:\n", + " print(f\"\\nIt was {target}. 
Reward: 0\")\n", + "\n", + "print(\"\\n๐Ÿ’ก That's RL: observe โ†’ act โ†’ reward โ†’ repeat\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**The Problem**: How do we make this production-ready?\n", + "- Need type safety\n", + "- Need isolation\n", + "- Need deployment\n", + "- Need standardization\n", + "\n", + "**Enter OpenEnv.**\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: OpenEnv - The Framework\n", + "\n", + "### What is OpenEnv?\n", + "\n", + "OpenEnv is a **framework for creating, deploying, and using isolated RL environments**.\n", + "\n", + "Think \"Docker for RL environments\" with:\n", + "- โœ… Standardized API (reset, step, state)\n", + "- โœ… Type-safe dataclasses\n", + "- โœ… Docker isolation\n", + "- โœ… HTTP communication (language-agnostic)\n", + "- โœ… Production-ready deployment\n", + "\n", + "### The Architecture\n", + "\n", + "```\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ Your Training Code โ”‚ Python, Rust, Julia...\n", + "โ”‚ โ”‚\n", + "โ”‚ env = SomeEnv(...) 
โ”‚ โ† Import OpenEnv client\n", + "โ”‚ result = env.reset() โ”‚ โ† Type-safe!\n", + "โ”‚ result = env.step(action) โ”‚ โ† Type-safe!\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + " โ”‚\n", + " โ”‚ HTTP/JSON\n", + " โ”‚\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ Docker Container โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ FastAPI Server โ”‚\n", + "โ”‚ โ””โ”€ Environment Logic โ”‚\n", + "โ”‚ โ””โ”€ Your game/simulation โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + "```\n", + "\n", + "### The Pattern - Every Environment Has:\n", + "\n", + "```\n", + "src/envs/your_env/\n", + "โ”œโ”€โ”€ models.py โ† Type-safe contracts (Action, Observation, State)\n", + "โ”œโ”€โ”€ client.py โ† Client API (what you import)\n", + "โ””โ”€โ”€ server/\n", + " โ”œโ”€โ”€ environment.py โ† Environment logic\n", + " โ”œโ”€โ”€ app.py โ† FastAPI server\n", + " โ””โ”€โ”€ Dockerfile โ† Container\n", + "```\n", + "\n", + "### Current Integrations\n", + "\n", + "OpenEnv already integrates several environments:\n", + "- **OpenSpiel** (6 games from DeepMind)\n", + "- **Echo** (test environment)\n", + "- **Coding** (Python code execution)\n", + "- **Atari** (classic games)\n", + "- More coming!\n", + "\n", + "Let's explore one integration to see how it all works..." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Part 3: Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check if in Colab\n", + "try:\n", + " import google.colab\n", + " IN_COLAB = True\n", + "except ImportError:\n", + " IN_COLAB = False\n", + "\n", + "if IN_COLAB:\n", + " !git clone https://github.com/meta-pytorch/OpenEnv.git\n", + " %cd OpenEnv\n", + " !pip install -q fastapi uvicorn requests\n", + " import sys\n", + " sys.path.insert(0, './src')\n", + " print(\"✅ OpenEnv ready!\")\n", + "else:\n", + " import sys\n", + " from pathlib import Path\n", + " sys.path.insert(0, str(Path.cwd() / 'src'))\n", + " print(\"✅ Using local OpenEnv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Part 4: Exploring OpenEnv's Structure\n", + "\n", + "Let's look at the actual OpenEnv code to understand how it works.\n", + "\n", + "### The Base Classes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from core.env_server import Environment, Action, Observation, State\n", + "from core.http_env_client import HTTPEnvClient\n", + "\n", + "print(\"=\" * 70)\n", + "print(\"OpenEnv Core Abstractions\")\n", + "print(\"=\" * 70)\n", + "\n", + "print(\"\"\"\n", + "SERVER SIDE (runs in Docker):\n", + "\n", + " class Environment(ABC):\n", + " '''Base class for all environment implementations'''\n", + " \n", + " @abstractmethod\n", + " def reset(self) -> Observation:\n", + " '''Start new episode'''\n", + " \n", + " @abstractmethod\n", + " def step(self, action: Action) -> Observation:\n", + " '''Execute action'''\n", + " \n", + " @property\n", + " def state(self) -> State:\n", + " '''Episode metadata'''\n", + "\n", + "CLIENT SIDE (your training code):\n", + "\n", + " class HTTPEnvClient(ABC):\n", + " '''Base class for HTTP clients'''\n", + " \n", + " 
def reset(self) -> StepResult:\n", + " # HTTP POST to /reset\n", + " \n", + " def step(self, action) -> StepResult:\n", + " # HTTP POST to /step\n", + " \n", + " def state(self) -> State:\n", + " # HTTP GET to /state\n", + "\"\"\")\n", + "\n", + "print(\"=\" * 70)\n", + "print(\"💡 Same interface, communication via HTTP\")\n", + "print(\"=\" * 70)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Part 5: Example Integration - OpenSpiel\n", + "\n", + "### What is OpenSpiel?\n", + "\n", + "OpenSpiel is a **library from DeepMind** with 70+ game environments for RL research.\n", + "\n", + "### Our Integration\n", + "\n", + "**OpenEnv wraps 6 OpenSpiel games** following our standard pattern:\n", + "\n", + "1. **Catch** - Catch falling ball (single-player)\n", + "2. **Tic-Tac-Toe** - Classic 3×3 (2-player)\n", + "3. **Kuhn Poker** - Imperfect info poker (2-player)\n", + "4. **Cliff Walking** - Grid navigation (single-player)\n", + "5. **2048** - Tile puzzle (single-player)\n", + "6. 
**Blackjack** - Card game (single-player)\n", + "\n", + "Let's see how the integration is structured:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import the OpenSpiel integration models\n", + "from envs.openspiel_env.models import (\n", + " OpenSpielAction,\n", + " OpenSpielObservation,\n", + " OpenSpielState\n", + ")\n", + "from dataclasses import fields\n", + "\n", + "print(\"=\" * 70)\n", + "print(\"OpenSpiel Integration - Type-Safe Models\")\n", + "print(\"=\" * 70)\n", + "\n", + "print(\"\\n📤 OpenSpielAction (what you send):\")\n", + "for field in fields(OpenSpielAction):\n", + " print(f\" • {field.name}: {field.type}\")\n", + "\n", + "print(\"\\n📥 OpenSpielObservation (what you receive):\")\n", + "for field in fields(OpenSpielObservation):\n", + " print(f\" • {field.name}: {field.type}\")\n", + "\n", + "print(\"\\n📊 OpenSpielState (episode metadata):\")\n", + "for field in fields(OpenSpielState):\n", + " print(f\" • {field.name}: {field.type}\")\n", + "\n", + "print(\"\\n\" + \"=\" * 70)\n", + "print(\"💡 This is how OpenEnv integrates external libraries:\")\n", + "print(\" 1. Wrap in standardized types\")\n", + "print(\" 2. Expose via HTTPEnvClient\")\n", + "print(\" 3. 
Package in Docker\")\n", + "print(\"=\" * 70)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### How the Client Works" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from envs.openspiel_env.client import OpenSpielEnv\n", + "\n", + "print(\"=\" * 70)\n", + "print(\"OpenSpielEnv Client (HTTPEnvClient Implementation)\")\n", + "print(\"=\" * 70)\n", + "\n", + "print(\"\"\"\n", + "How OpenEnv wraps OpenSpiel:\n", + "\n", + "class OpenSpielEnv(HTTPEnvClient[OpenSpielAction, OpenSpielObservation]):\n", + " \n", + " def _step_payload(self, action: OpenSpielAction) -> dict:\n", + " '''Convert action to JSON for HTTP request'''\n", + " return {\n", + " \"action_id\": action.action_id,\n", + " \"game_name\": action.game_name,\n", + " }\n", + " \n", + " def _parse_result(self, payload: dict) -> StepResult:\n", + " '''Parse HTTP response into typed observation'''\n", + " return StepResult(\n", + " observation=OpenSpielObservation(...),\n", + " reward=payload['reward'],\n", + " done=payload['done']\n", + " )\n", + "\n", + "Usage (same for ALL OpenEnv environments):\n", + "\n", + " env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", + " result = env.reset() # Returns StepResult[OpenSpielObservation]\n", + " result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", + " state = env.state() # Returns OpenSpielState\n", + "\"\"\")\n", + "\n", + "print(\"=\" * 70)\n", + "print(\"๐Ÿ’ก This pattern works for ANY environment you want to wrap!\")\n", + "print(\"=\" * 70)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Part 6: Interactive Demo - See It In Action\n", + "\n", + "Let's build a **Catch game** environment following OpenEnv's pattern.\n", + "\n", + "This shows you:\n", + "- How to structure an environment\n", + "- How the RL loop works\n", + "- How different policies perform\n", + "\n", + "### The Game:\n", + "- 
5×5 grid, ball falls from top 🔴\n", + "- Control paddle at bottom 🏓\n", + "- **Actions**: 0=LEFT, 1=STAY, 2=RIGHT\n", + "- **Reward**: +1 caught, 0 missed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import random\n", + "from dataclasses import dataclass\n", + "from typing import List, Tuple\n", + "\n", + "# Define types (following OpenEnv pattern)\n", + "@dataclass\n", + "class CatchObservation:\n", + " \"\"\"Type-safe observation.\"\"\"\n", + " info_state: List[float]\n", + " legal_actions: List[int]\n", + " done: bool\n", + " reward: float\n", + " # For visualization\n", + " ball_position: Tuple[int, int]\n", + " paddle_position: int\n", + "\n", + "\n", + "class CatchEnvironment:\n", + " \"\"\"\n", + " Catch game following OpenEnv Environment pattern.\n", + " \n", + " In production: This would run in Docker, accessed via HTTPEnvClient\n", + " For demo: We run it locally to see the internals\n", + " \"\"\"\n", + " \n", + " def __init__(self, grid_size=5):\n", + " self.grid_size = grid_size\n", + " \n", + " def reset(self) -> CatchObservation:\n", + " \"\"\"Start new episode (implements Environment.reset()).\"\"\"\n", + " self.ball_row = 0\n", + " self.ball_col = random.randint(0, self.grid_size - 1)\n", + " self.paddle_col = self.grid_size // 2\n", + " self.done = False\n", + " return self._make_observation()\n", + " \n", + " def step(self, action: int) -> CatchObservation:\n", + " \"\"\"Execute action (implements Environment.step()).\"\"\"\n", + " # Move paddle\n", + " if action == 0 and self.paddle_col > 0:\n", + " self.paddle_col -= 1\n", + " elif action == 2 and self.paddle_col < self.grid_size - 1:\n", + " self.paddle_col += 1\n", + " \n", + " # Move ball\n", + " self.ball_row += 1\n", + " \n", + " # Check done\n", + " if self.ball_row >= self.grid_size - 1:\n", + " self.done = True\n", + " reward = 1.0 if self.ball_col == self.paddle_col else 0.0\n", + " else:\n", + " reward = 
0.0\n", + " \n", + " return self._make_observation(reward)\n", + " \n", + " def _make_observation(self, reward=0.0) -> CatchObservation:\n", + " \"\"\"Create type-safe observation.\"\"\"\n", + " info_state = [0.0] * (self.grid_size * self.grid_size)\n", + " ball_idx = self.ball_row * self.grid_size + self.ball_col\n", + " paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col\n", + " info_state[ball_idx] = 1.0\n", + " info_state[paddle_idx] = 0.5\n", + " \n", + " return CatchObservation(\n", + " info_state=info_state,\n", + " legal_actions=[0, 1, 2],\n", + " done=self.done,\n", + " reward=reward,\n", + " ball_position=(self.ball_row, self.ball_col),\n", + " paddle_position=self.paddle_col\n", + " )\n", + " \n", + " def render(self):\n", + " \"\"\"Visualize.\"\"\"\n", + " for row in range(self.grid_size):\n", + " line = \" \"\n", + " for col in range(self.grid_size):\n", + " if row == self.ball_row and col == self.ball_col:\n", + " line += \"🔴 \"\n", + " elif row == self.grid_size - 1 and col == self.paddle_col:\n", + " line += \"🏓 \"\n", + " else:\n", + " line += \"⬜ \"\n", + " print(line)\n", + "\n", + "\n", + "print(\"✅ Environment created following OpenEnv pattern!\")\n", + "print(\"\\n Implements: reset(), step()\")\n", + "print(\" Returns: Type-safe observations\")\n", + "print(\" In production: Would run in Docker + FastAPI\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test It" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "env = CatchEnvironment()\n", + "obs = env.reset()\n", + "\n", + "print(\"Initial State:\")\n", + "print(\"=\" * 50)\n", + "env.render()\n", + "print(f\"\\nBall: column {obs.ball_position[1]}\")\n", + "print(f\"Paddle: column {obs.paddle_position}\")\n", + "print(f\"Legal actions: {obs.legal_actions} (0=LEFT, 1=STAY, 2=RIGHT)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + 
"\n", + "## Part 7: Different Policies\n", + "\n", + "A policy maps observations → actions. Let's test 4 strategies:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class RandomPolicy:\n", + " name = \"Random\"\n", + " def select_action(self, obs): \n", + " return random.choice(obs.legal_actions)\n", + "\n", + "class AlwaysStayPolicy:\n", + " name = \"Always Stay\"\n", + " def select_action(self, obs): \n", + " return 1\n", + "\n", + "class SmartPolicy:\n", + " name = \"Smart Heuristic\"\n", + " def select_action(self, obs):\n", + " ball_col = obs.ball_position[1]\n", + " paddle_col = obs.paddle_position\n", + " if paddle_col < ball_col: return 2 # RIGHT\n", + " elif paddle_col > ball_col: return 0 # LEFT\n", + " else: return 1 # STAY\n", + "\n", + "class LearningPolicy:\n", + " name = \"Learning Agent\"\n", + " def __init__(self):\n", + " self.steps = 0\n", + " \n", + " def select_action(self, obs):\n", + " self.steps += 1\n", + " epsilon = max(0.1, 1.0 - (self.steps / 100))\n", + " \n", + " if random.random() < epsilon: # Explore\n", + " return random.choice(obs.legal_actions)\n", + " else: # Exploit\n", + " ball_col = obs.ball_position[1]\n", + " paddle_col = obs.paddle_position\n", + " if paddle_col < ball_col: return 2\n", + " elif paddle_col > ball_col: return 0\n", + " else: return 1\n", + "\n", + "print(\"✅ 4 Policies created:\")\n", + "print(\" 1. Random - Baseline\")\n", + "print(\" 2. Always Stay - Bad strategy\")\n", + "print(\" 3. Smart - Optimal heuristic\")\n", + "print(\" 4. 
Learning - Simulated RL\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Watch Them Play" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import time\n", + "\n", + "def run_episode(env, policy, visualize=True, delay=0.4):\n", + " obs = env.reset()\n", + " \n", + " if visualize:\n", + " print(f\"\\n{'='*50}\")\n", + " print(f\"Policy: {policy.name} | Ball: col {obs.ball_position[1]}\")\n", + " print('='*50 + '\\n')\n", + " env.render()\n", + " time.sleep(delay)\n", + " \n", + " total_reward = 0\n", + " step = 0\n", + " \n", + " while not obs.done:\n", + " action = policy.select_action(obs)\n", + " obs = env.step(action)\n", + " total_reward += obs.reward\n", + " \n", + " if visualize:\n", + " print(f\"\\nStep {step + 1}: {['LEFT','STAY','RIGHT'][action]}\")\n", + " env.render()\n", + " time.sleep(delay)\n", + " \n", + " step += 1\n", + " \n", + " if visualize:\n", + " print(f\"\\n{'๐ŸŽ‰ CAUGHT!' 
if total_reward > 0 else '😢 MISSED'} Reward: {total_reward}\")\n", + " \n", + " return total_reward > 0\n", + "\n", + "# Demo\n", + "env = CatchEnvironment()\n", + "run_episode(env, SmartPolicy(), visualize=True, delay=0.3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compare All Policies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def evaluate_policies(num_episodes=50):\n", + " policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\n", + " \n", + " print(\"\\n\" + \"=\"*70)\n", + " print(f\"🏆 POLICY COMPARISON ({num_episodes} episodes)\")\n", + " print(\"=\"*70 + \"\\n\")\n", + " \n", + " results = []\n", + " for policy in policies:\n", + " env = CatchEnvironment()\n", + " successes = sum(run_episode(env, policy, visualize=False) \n", + " for _ in range(num_episodes))\n", + " rate = (successes / num_episodes) * 100\n", + " results.append((policy.name, rate))\n", + " print(f\"{policy.name:20s}: {rate:5.1f}%\")\n", + " \n", + " print(\"\\n\" + \"=\"*70)\n", + " results.sort(key=lambda x: x[1], reverse=True)\n", + " for name, rate in results:\n", + " bar = \"█\" * int(rate / 2)\n", + " print(f\"{name:20s} [{bar:<50}] {rate:.1f}%\")\n", + " \n", + " print(\"\\n\" + \"=\"*70)\n", + " print(\"💡 RL in action: Random → Learning → Optimal\")\n", + " print(\"=\"*70)\n", + "\n", + "evaluate_policies(50)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Part 8: Using Real OpenSpiel Integration\n", + "\n", + "What we just built **is how OpenEnv works**!\n", + "\n", + "### Demo vs Production:\n", + "\n", + "| Component | Our Demo | OpenEnv + OpenSpiel |\n", + "|-----------|----------|---------------------|\n", + "| Environment | Local class | Docker container |\n", + "| Communication | Direct | HTTP |\n", + "| Client | Direct | HTTPEnvClient |\n", + "| Type Safety | ✅ | ✅ |\n", + 
"| API | reset/step | reset/step |\n", + "\n", + "### Using OpenSpiel Integration:\n", + "\n", + "```python\n", + "# Install OpenSpiel\n", + "!pip install open_spiel\n", + "\n", + "# Import OpenEnv's integration\n", + "from envs.openspiel_env import OpenSpielEnv, OpenSpielAction\n", + "\n", + "# Connect to server\n", + "env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", + "\n", + "# Same API!\n", + "result = env.reset()\n", + "result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", + "state = env.state()\n", + "```\n", + "\n", + "### Available Games:\n", + "1. Catch (what we demoed!)\n", + "2. Tic-Tac-Toe\n", + "3. Kuhn Poker\n", + "4. Cliff Walking\n", + "5. 2048\n", + "6. Blackjack" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Part 9: Adding Your Own Integration\n", + "\n", + "Want to wrap your own environment? Follow the pattern:\n", + "\n", + "### 1. Define Types (models.py)\n", + "```python\n", + "@dataclass\n", + "class YourAction(Action):\n", + " # Your action fields\n", + "\n", + "@dataclass\n", + "class YourObservation(Observation):\n", + " # Your observation fields\n", + "```\n", + "\n", + "### 2. Implement Environment (server/environment.py)\n", + "```python\n", + "class YourEnvironment(Environment):\n", + " def reset(self) -> Observation:\n", + " return YourObservation(...)\n", + " \n", + " def step(self, action: Action) -> Observation:\n", + " return YourObservation(...)\n", + "```\n", + "\n", + "### 3. Create Client (client.py)\n", + "```python\n", + "class YourEnv(HTTPEnvClient[YourAction, YourObservation]):\n", + " def _step_payload(self, action):\n", + " return {\"field\": action.field}\n", + " \n", + " def _parse_result(self, payload):\n", + " return StepResult(observation=YourObservation(...))\n", + "```\n", + "\n", + "### 4. 
Create Server (server/app.py)\n", + "```python\n", + "from core.env_server import create_fastapi_app\n", + "\n", + "env = YourEnvironment()\n", + "app = create_fastapi_app(env)\n", + "```\n", + "\n", + "### 5. Dockerize (server/Dockerfile)\n", + "```dockerfile\n", + "FROM python:3.11\n", + "COPY . /app\n", + "WORKDIR /app\n", + "RUN pip install -r requirements.txt\n", + "CMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\"]\n", + "```\n", + "\n", + "### Examples to Study:\n", + "- `src/envs/echo_env/` - Simple test environment\n", + "- `src/envs/openspiel_env/` - Our OpenSpiel integration\n", + "- `src/envs/coding_env/` - Python code execution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Summary\n", + "\n", + "### What You Learned:\n", + "\n", + "1. **RL Basics** - The core loop\n", + "2. **OpenEnv Framework** - Standardized, production-ready RL environments\n", + "3. **Example Integration** - How OpenSpiel is wrapped\n", + "4. **Interactive Demo** - Policies in action\n", + "5. **Adding Integrations** - The pattern to follow\n", + "\n", + "### OpenEnv's Value:\n", + "\n", + "| Feature | Traditional | OpenEnv |\n", + "|---------|------------|----------|\n", + "| Type Safety | ❌ | ✅ |\n", + "| Isolation | ❌ | ✅ Docker |\n", + "| Deployment | ❌ | ✅ K8s-ready |\n", + "| Language | Python only | Any (HTTP) |\n", + "| Reproducibility | ❌ | ✅ |\n", + "\n", + "### Next Steps:\n", + "\n", + "1. Try OpenSpiel integration\n", + "2. Implement real RL (Q-learning, DQN, PPO)\n", + "3. Wrap your own environments\n", + "4. Deploy to production\n", + "5. 
Use with RL libraries (TorchRL, etc.)\n", + "\n", + "### Resources:\n", + "\n", + "- **OpenEnv**: https://github.com/meta-pytorch/OpenEnv\n", + "- **Docs**: `src/envs/README.md`\n", + "- **Examples**: `examples/` directory\n", + "\n", + "---\n", + "\n", + "## 🎉 You're Ready!\n", + "\n", + "You now understand:\n", + "- ✅ OpenEnv framework\n", + "- ✅ How integrations work\n", + "- ✅ Using existing environments\n", + "- ✅ Creating new integrations\n", + "- ✅ Production deployment\n", + "\n", + "**Welcome to production-ready RL!** 🚀" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 5f45df525bc82e9ec0bdb60fb892439f181c95b8 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:35:51 -0700 Subject: [PATCH 02/19] add improvement --- examples/OpenEnv_Tutorial.ipynb | 573 ++++++++++++++++++++------------ 1 file changed, 366 insertions(+), 207 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index db4ddc8..5079472 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -4,21 +4,33 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# OpenEnv: Production-Ready RL Environments\n", + "
\n", + "\n", + "# 🎮 OpenEnv: Production-Ready RL Environments\n", + "\n", + "\n", + "\n", "**Learn how OpenEnv standardizes RL environments for production use**\n", "\n", - "---\n", + "[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)\n", + "[![Python](https://img.shields.io/badge/Python-3.11+-blue?logo=python)](https://www.python.org/)\n", + "[![Docker](https://img.shields.io/badge/Docker-Ready-blue?logo=docker)](https://www.docker.com/)\n", "\n", - "## What You'll Learn\n", + "
\n", "\n", - "This notebook teaches you:\n", + "---\n", + "\n", + "## 📚 What You'll Learn\n", "\n", - "1. **RL Fundamentals** - The core loop in 5 minutes\n", - "2. **OpenEnv Framework** - Why we built it and how it works\n", - "3. **Using Integrations** - Work with existing environments (OpenSpiel example)\n", - "4. **Interactive Demo** - See policies in action\n", - "5. **Adding Integrations** - Wrap your own environments\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
🧠
RL Fundamentals
5 minutes
๐Ÿ—๏ธ
OpenEnv Framework
Architecture
🔌
Integrations
OpenSpiel example
🎯
Interactive Demo
See it work
➕
Add Your Own
Extend it
\n", "\n", "---" ] @@ -27,7 +39,23 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Part 1: RL Fundamentals - The Core Loop\n", + "## 🧠 Part 1: RL Fundamentals - The Core Loop\n", + "\n", + "
\n", + "\n", + "```mermaid\n", + "graph LR\n", + " A[🤖 Agent] -->|observes| B[👀 State]\n", + " B -->|decides| C[⚡ Action]\n", + " C -->|executes| D[🌍 Environment]\n", + " D -->|returns| E[🎁 Reward]\n", + " E -->|learns| A\n", + " style A fill:#e1f5ff\n", + " style D fill:#fff4e1\n", + " style E fill:#ffe1e1\n", + "```\n", + "\n", + "
\n", "\n", "Reinforcement Learning boils down to a simple loop:\n", "\n", @@ -35,7 +63,7 @@ "Agent observes → chooses action → gets reward → repeat\n", "```\n", "\n", - "Let's see it:" + "Let's see it in action with a simple example:" ] }, { @@ -75,13 +103,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**The Problem**: How do we make this production-ready?\n", - "- Need type safety\n", - "- Need isolation\n", - "- Need deployment\n", - "- Need standardization\n", - "\n", - "**Enter OpenEnv.**\n", + "
\n", + "

โš ๏ธ The Problem

\n", + "

How do we make this production-ready?

\n", + "
    \n", + "
  • โŒ Need type safety
  • \n", + "
  • โŒ Need isolation
  • \n", + "
  • โŒ Need deployment
  • \n", + "
  • โŒ Need standardization
  • \n", + "
\n", + "
\n", + "\n", + "
\n", + "

✅ The Solution: OpenEnv

\n", + "

A production-ready framework that solves all these problems!

\n", + "
\n", "\n", "---" ] @@ -90,72 +126,104 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Part 2: OpenEnv - The Framework\n", - "\n", - "### What is OpenEnv?\n", + "## 🏗️ Part 2: OpenEnv - The Framework\n", "\n", - "OpenEnv is a **framework for creating, deploying, and using isolated RL environments**.\n", + "
\n", + "

🚀 Think \"Docker for RL Environments\"

\n", + "
\n", "\n", - "Think \"Docker for RL environments\" with:\n", - "- ✅ Standardized API (reset, step, state)\n", - "- ✅ Type-safe dataclasses\n", - "- ✅ Docker isolation\n", - "- ✅ HTTP communication (language-agnostic)\n", - "- ✅ Production-ready deployment\n", + "### ✨ What is OpenEnv?\n", "\n", - "### The Architecture\n", + "OpenEnv is a **framework for creating, deploying, and using isolated RL environments**.\n", "\n", - "```\n", - "┌────────────────────────────────────┐\n", - "│ Your Training Code │ Python, Rust, Julia...\n", - "│ │\n", - "│ env = SomeEnv(...) │ ← Import OpenEnv client\n", - "│ result = env.reset() │ ← Type-safe!\n", - "│ result = env.step(action) │ ← Type-safe!\n", - "└──────────┬─────────────────────────┘\n", - " │\n", - " │ HTTP/JSON\n", - " │\n", - "┌──────────▼─────────────────────────┐\n", - "│ Docker Container │\n", - "│ │\n", - "│ FastAPI Server │\n", - "│ └─ Environment Logic │\n", - "│ └─ Your game/simulation │\n", - "└────────────────────────────────────┘\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
✅
Standardized API
reset, step, state
🔒
Type-safe
dataclasses
๐Ÿณ
Docker isolation
secure
🌐
HTTP API
any language
☸️
Production-ready
K8s deploy
\n", + "\n", + "### 🎨 The Architecture\n", + "\n", + "```mermaid\n", + "graph TB\n", + " subgraph Client[\"💻 Your Training Code\"]\n", + " A[\"🐍 Python/Rust/Julia\"]\n", + " B[\"env = OpenSpielEnv()\"]\n", + " C[\"result = env.reset()\"]\n", + " D[\"result = env.step(action)\"]\n", + " end\n", + " \n", + " subgraph HTTP[\"🌐 HTTP/JSON\"]\n", + " E[\"POST /reset\"]\n", + " F[\"POST /step\"]\n", + " G[\"GET /state\"]\n", + " end\n", + " \n", + " subgraph Server[\"🐳 Docker Container\"]\n", + " H[\"⚡ FastAPI Server\"]\n", + " I[\"🎮 Environment Logic\"]\n", + " J[\"🎯 Game/Simulation\"]\n", + " end\n", + " \n", + " Client --> HTTP\n", + " HTTP --> Server\n", + " \n", + " style Client fill:#e1f5ff\n", + " style HTTP fill:#fff4e1\n", + " style Server fill:#ffe1f5\n", "```\n", "\n", - "### The Pattern - Every Environment Has:\n", + "### 📁 The Pattern - Every Environment Has:\n", "\n", "```\n", "src/envs/your_env/\n", - "├── models.py ← Type-safe contracts (Action, Observation, State)\n", - "├── client.py ← Client API (what you import)\n", - "└── server/\n", - " ├── environment.py ← Environment logic\n", - " ├── app.py ← FastAPI server\n", - " └── Dockerfile ← Container\n", + "├── 📝 models.py ← Type-safe contracts (Action, Observation, State)\n", + "├── 📱 client.py ← Client API (what you import)\n", + "└── 🖥️ server/\n", + " ├── environment.py ← Environment logic\n", + " ├── app.py ← FastAPI server\n", + " └── Dockerfile ← Container\n", "```\n", "\n", - "### Current Integrations\n", - "\n", - "OpenEnv already integrates several environments:\n", - "- **OpenSpiel** (6 games from DeepMind)\n", - "- **Echo** (test environment)\n", - "- **Coding** (Python code execution)\n", - "- **Atari** (classic games)\n", - "- More coming!\n", + "### 🎮 Current Integrations\n", + "\n", + "
\n", + "
\n", + "

🎯 OpenSpiel

\n", + "

6 games from DeepMind

\n", + "
\n", + "
\n", + "

📢 Echo

\n", + "

Test environment

\n", + "
\n", + "
\n", + "

💻 Coding

\n", + "

Python execution

\n", + "
\n", + "
\n", + "

🕹️ Atari

\n", + "

Classic games

\n", + "
\n", + "
\n", + "\n", + "Let's explore one integration to see how it all works...\n", "\n", - "Let's explore one integration to see how it all works..." + "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "---\n", + "## โš™๏ธ Part 3: Setup\n", "\n", - "## Part 3: Setup" + "
\n", + "

🔧 Getting Started

\n", + "
" ] }, { @@ -191,11 +259,13 @@ "source": [ "---\n", "\n", - "## Part 4: Exploring OpenEnv's Structure\n", + "## ๐Ÿ” Part 4: Exploring OpenEnv's Structure\n", "\n", - "Let's look at the actual OpenEnv code to understand how it works.\n", + "
\n", + "

Let's look at the actual code!

\n", + "
\n", "\n", - "### The Base Classes" + "### ๐Ÿงฉ The Base Classes" ] }, { @@ -208,11 +278,11 @@ "from core.http_env_client import HTTPEnvClient\n", "\n", "print(\"=\" * 70)\n", - "print(\"OpenEnv Core Abstractions\")\n", + "print(\"๐Ÿ”ง OpenEnv Core Abstractions\")\n", "print(\"=\" * 70)\n", "\n", "print(\"\"\"\n", - "SERVER SIDE (runs in Docker):\n", + "๐Ÿ–ฅ๏ธ SERVER SIDE (runs in Docker):\n", "\n", " class Environment(ABC):\n", " '''Base class for all environment implementations'''\n", @@ -229,7 +299,7 @@ " def state(self) -> State:\n", " '''Episode metadata'''\n", "\n", - "CLIENT SIDE (your training code):\n", + "๐Ÿ“ฑ CLIENT SIDE (your training code):\n", "\n", " class HTTPEnvClient(ABC):\n", " '''Base class for HTTP clients'''\n", @@ -245,7 +315,7 @@ "\"\"\")\n", "\n", "print(\"=\" * 70)\n", - "print(\"๐Ÿ’ก Same interface, communication via HTTP\")\n", + "print(\"๐Ÿ’ก Same interface, communication via HTTP!\")\n", "print(\"=\" * 70)" ] }, @@ -255,22 +325,33 @@ "source": [ "---\n", "\n", - "## Part 5: Example Integration - OpenSpiel\n", + "## ๐Ÿ”Œ Part 5: Example Integration - OpenSpiel\n", + "\n", + "
\n", + " \n", + "

70+ Game Environments

\n", + "
\n", "\n", - "### What is OpenSpiel?\n", + "### ๐ŸŽฎ What is OpenSpiel?\n", "\n", "OpenSpiel is a **library from DeepMind** with 70+ game environments for RL research.\n", "\n", - "### Our Integration\n", + "### ๐ŸŽฏ Our Integration\n", "\n", "**OpenEnv wraps 6 OpenSpiel games** following our standard pattern:\n", "\n", - "1. **Catch** - Catch falling ball (single-player)\n", - "2. **Tic-Tac-Toe** - Classic 3ร—3 (2-player)\n", - "3. **Kuhn Poker** - Imperfect info poker (2-player)\n", - "4. **Cliff Walking** - Grid navigation (single-player)\n", - "5. **2048** - Tile puzzle (single-player)\n", - "6. **Blackjack** - Card game (single-player)\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
🎯
Catch
Catch falling ball
❌
Tic-Tac-Toe
Classic 3ร—3
🃏
Kuhn Poker
Imperfect info
🏔️
Cliff Walking
Grid navigation
🔢
2048
Tile puzzle
🂡
Blackjack
Card game
\n", "\n", "Let's see how the integration is structured:" ] @@ -281,7 +362,6 @@ "metadata": {}, "outputs": [], "source": [ - "# Import the OpenSpiel integration models\n", "from envs.openspiel_env.models import (\n", " OpenSpielAction,\n", " OpenSpielObservation,\n", @@ -290,7 +370,7 @@ "from dataclasses import fields\n", "\n", "print(\"=\" * 70)\n", - "print(\"OpenSpiel Integration - Type-Safe Models\")\n", + "print(\"๐Ÿ”’ OpenSpiel Integration - Type-Safe Models\")\n", "print(\"=\" * 70)\n", "\n", "print(\"\\n๐Ÿ“ค OpenSpielAction (what you send):\")\n", @@ -317,7 +397,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### How the Client Works" + "### ๐Ÿ”ง How the Client Works" ] }, { @@ -329,7 +409,7 @@ "from envs.openspiel_env.client import OpenSpielEnv\n", "\n", "print(\"=\" * 70)\n", - "print(\"OpenSpielEnv Client (HTTPEnvClient Implementation)\")\n", + "print(\"๐Ÿ“ฑ OpenSpielEnv Client (HTTPEnvClient Implementation)\")\n", "print(\"=\" * 70)\n", "\n", "print(\"\"\"\n", @@ -371,20 +451,25 @@ "source": [ "---\n", "\n", - "## Part 6: Interactive Demo - See It In Action\n", + "## ๐ŸŽฏ Part 6: Interactive Demo - See It In Action\n", + "\n", + "
\n", + "

🎮 Let's Build the Catch Game!

\n", + " \n", + "
\n", "\n", - "Let's build a **Catch game** environment following OpenEnv's pattern.\n", + "### ๐ŸŽฒ The Game Rules:\n", "\n", - "This shows you:\n", - "- How to structure an environment\n", - "- How the RL loop works\n", - "- How different policies perform\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
๐Ÿ“
5ร—5 Grid
🔴
Ball falls
๐Ÿ“
Catch it!
🎁
+1 reward
\n", "\n", - "### The Game:\n", - "- 5ร—5 grid, ball falls from top ๐Ÿ”ด\n", - "- Control paddle at bottom ๐Ÿ“\n", - "- **Actions**: 0=LEFT, 1=STAY, 2=RIGHT\n", - "- **Reward**: +1 caught, 0 missed" + "**Actions**: 0=LEFT โฌ…๏ธ | 1=STAY โธ๏ธ | 2=RIGHT โžก๏ธ" ] }, { @@ -405,7 +490,6 @@ " legal_actions: List[int]\n", " done: bool\n", " reward: float\n", - " # For visualization\n", " ball_position: Tuple[int, int]\n", " paddle_position: int\n", "\n", @@ -431,16 +515,13 @@ " \n", " def step(self, action: int) -> CatchObservation:\n", " \"\"\"Execute action (implements Environment.step()).\"\"\"\n", - " # Move paddle\n", " if action == 0 and self.paddle_col > 0:\n", " self.paddle_col -= 1\n", " elif action == 2 and self.paddle_col < self.grid_size - 1:\n", " self.paddle_col += 1\n", " \n", - " # Move ball\n", " self.ball_row += 1\n", " \n", - " # Check done\n", " if self.ball_row >= self.grid_size - 1:\n", " self.done = True\n", " reward = 1.0 if self.ball_col == self.paddle_col else 0.0\n", @@ -450,7 +531,6 @@ " return self._make_observation(reward)\n", " \n", " def _make_observation(self, reward=0.0) -> CatchObservation:\n", - " \"\"\"Create type-safe observation.\"\"\"\n", " info_state = [0.0] * (self.grid_size * self.grid_size)\n", " ball_idx = self.ball_row * self.grid_size + self.ball_col\n", " paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col\n", @@ -467,7 +547,6 @@ " )\n", " \n", " def render(self):\n", - " \"\"\"Visualize.\"\"\"\n", " for row in range(self.grid_size):\n", " line = \" \"\n", " for col in range(self.grid_size):\n", @@ -479,18 +558,17 @@ " line += \"โฌœ \"\n", " print(line)\n", "\n", - "\n", "print(\"โœ… Environment created following OpenEnv pattern!\")\n", - "print(\"\\n Implements: reset(), step()\")\n", - "print(\" Returns: Type-safe observations\")\n", - "print(\" In production: Would run in Docker + FastAPI\")" + "print(\" ๐Ÿ”ง Implements: reset(), step()\")\n", + "print(\" ๐Ÿ”’ Returns: Type-safe observations\")\n", + 
"print(\" ๐Ÿณ In production: Would run in Docker + FastAPI\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Test It" + "### ๐Ÿงช Test It" ] }, { @@ -502,12 +580,12 @@ "env = CatchEnvironment()\n", "obs = env.reset()\n", "\n", - "print(\"Initial State:\")\n", + "print(\"๐ŸŽฎ Initial State:\")\n", "print(\"=\" * 50)\n", "env.render()\n", - "print(f\"\\nBall: column {obs.ball_position[1]}\")\n", - "print(f\"Paddle: column {obs.paddle_position}\")\n", - "print(f\"Legal actions: {obs.legal_actions} (0=LEFT, 1=STAY, 2=RIGHT)\")" + "print(f\"\\n๐Ÿ”ด Ball: column {obs.ball_position[1]}\")\n", + "print(f\"๐Ÿ“ Paddle: column {obs.paddle_position}\")\n", + "print(f\"โšก Legal actions: {obs.legal_actions} (0=LEFT, 1=STAY, 2=RIGHT)\")" ] }, { @@ -516,9 +594,23 @@ "source": [ "---\n", "\n", - "## Part 7: Different Policies\n", + "## ๐Ÿค– Part 7: Different Policies\n", + "\n", + "
\n", + "

A policy maps: Observation → Action

\n", + "
\n", "\n", - "A policy maps observations โ†’ actions. Let's test 4 strategies:" + "Let's test 4 strategies from dumb to smart!\n", + "\n", + "```mermaid\n", + "graph LR\n", + " A[๐Ÿ‘€ Observation] --> B{๐Ÿค– Policy}\n", + " B -->|Random| C[๐ŸŽฒ Action]\n", + " B -->|Always Stay| D[โธ๏ธ Action]\n", + " B -->|Smart| E[๐ŸŽฏ Action]\n", + " B -->|Learning| F[๐Ÿง  Action]\n", + " style B fill:#ffe1f5\n", + "```" ] }, { @@ -528,26 +620,26 @@ "outputs": [], "source": [ "class RandomPolicy:\n", - " name = \"Random\"\n", + " name = \"๐ŸŽฒ Random\"\n", " def select_action(self, obs): \n", " return random.choice(obs.legal_actions)\n", "\n", "class AlwaysStayPolicy:\n", - " name = \"Always Stay\"\n", + " name = \"โธ๏ธ Always Stay\"\n", " def select_action(self, obs): \n", " return 1\n", "\n", "class SmartPolicy:\n", - " name = \"Smart Heuristic\"\n", + " name = \"๐ŸŽฏ Smart Heuristic\"\n", " def select_action(self, obs):\n", " ball_col = obs.ball_position[1]\n", " paddle_col = obs.paddle_position\n", - " if paddle_col < ball_col: return 2 # RIGHT\n", - " elif paddle_col > ball_col: return 0 # LEFT\n", - " else: return 1 # STAY\n", + " if paddle_col < ball_col: return 2\n", + " elif paddle_col > ball_col: return 0\n", + " else: return 1\n", "\n", "class LearningPolicy:\n", - " name = \"Learning Agent\"\n", + " name = \"๐Ÿง  Learning Agent\"\n", " def __init__(self):\n", " self.steps = 0\n", " \n", @@ -555,27 +647,27 @@ " self.steps += 1\n", " epsilon = max(0.1, 1.0 - (self.steps / 100))\n", " \n", - " if random.random() < epsilon: # Explore\n", + " if random.random() < epsilon:\n", " return random.choice(obs.legal_actions)\n", - " else: # Exploit\n", + " else:\n", " ball_col = obs.ball_position[1]\n", " paddle_col = obs.paddle_position\n", " if paddle_col < ball_col: return 2\n", " elif paddle_col > ball_col: return 0\n", " else: return 1\n", "\n", - "print(\"โœ… 4 Policies created:\")\n", - "print(\" 1. Random - Baseline\")\n", - "print(\" 2. 
Always Stay - Bad strategy\")\n", - "print(\" 3. Smart - Optimal heuristic\")\n", - "print(\" 4. Learning - Simulated RL\")" + "print(\"โœ… 4 Policies created!\")\n", + "print(\" ๐ŸŽฒ Random - Baseline\")\n", + "print(\" โธ๏ธ Always Stay - Bad strategy\")\n", + "print(\" ๐ŸŽฏ Smart - Optimal heuristic\")\n", + "print(\" ๐Ÿง  Learning - Simulated RL\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Watch Them Play" + "### ๐Ÿ‘€ Watch Them Play" ] }, { @@ -591,7 +683,7 @@ " \n", " if visualize:\n", " print(f\"\\n{'='*50}\")\n", - " print(f\"Policy: {policy.name} | Ball: col {obs.ball_position[1]}\")\n", + " print(f\"๐Ÿค– Policy: {policy.name} | ๐Ÿ”ด Ball: col {obs.ball_position[1]}\")\n", " print('='*50 + '\\n')\n", " env.render()\n", " time.sleep(delay)\n", @@ -605,7 +697,8 @@ " total_reward += obs.reward\n", " \n", " if visualize:\n", - " print(f\"\\nStep {step + 1}: {['LEFT','STAY','RIGHT'][action]}\")\n", + " actions = [\"โฌ…๏ธ LEFT\", \"โธ๏ธ STAY\", \"โžก๏ธ RIGHT\"]\n", + " print(f\"\\nโšก Step {step + 1}: {actions[action]}\")\n", " env.render()\n", " time.sleep(delay)\n", " \n", @@ -625,7 +718,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Compare All Policies" + "### ๐Ÿ“Š Compare All Policies" ] }, { @@ -648,13 +741,16 @@ " for _ in range(num_episodes))\n", " rate = (successes / num_episodes) * 100\n", " results.append((policy.name, rate))\n", - " print(f\"{policy.name:20s}: {rate:5.1f}%\")\n", + " print(f\"{policy.name:25s}: {rate:5.1f}%\")\n", " \n", " print(\"\\n\" + \"=\"*70)\n", + " print(\"๐Ÿ“Š VISUAL COMPARISON\")\n", + " print(\"=\"*70 + \"\\n\")\n", + " \n", " results.sort(key=lambda x: x[1], reverse=True)\n", " for name, rate in results:\n", " bar = \"โ–ˆ\" * int(rate / 2)\n", - " print(f\"{name:20s} [{bar:<50}] {rate:.1f}%\")\n", + " print(f\"{name:25s} [{bar:<50}] {rate:.1f}%\")\n", " \n", " print(\"\\n\" + \"=\"*70)\n", " print(\"๐Ÿ’ก RL in action: Random โ†’ Learning โ†’ Optimal\")\n", @@ -669,21 
+765,23 @@ "source": [ "---\n", "\n", - "## Part 8: Using Real OpenSpiel Integration\n", + "## ๐ŸŒ Part 8: Using Real OpenSpiel Integration\n", "\n", - "What we just built **is how OpenEnv works**!\n", + "
\n", + "

✨ What We Just Built = How OpenEnv Works!

\n", + "
\n", "\n", - "### Demo vs Production:\n", + "### ๐Ÿ”„ Demo vs Production:\n", "\n", - "| Component | Our Demo | OpenEnv + OpenSpiel |\n", - "|-----------|----------|---------------------|\n", - "| Environment | Local class | Docker container |\n", - "| Communication | Direct | HTTP |\n", - "| Client | Direct | HTTPEnvClient |\n", + "| Component | ๐Ÿงช Our Demo | ๐Ÿš€ OpenEnv + OpenSpiel |\n", + "|-----------|-------------|------------------------|\n", + "| Environment | Local class | ๐Ÿณ Docker container |\n", + "| Communication | Direct calls | ๐ŸŒ HTTP |\n", + "| Client | Direct access | ๐Ÿ“ฑ HTTPEnvClient |\n", "| Type Safety | โœ… | โœ… |\n", "| API | reset/step | reset/step |\n", "\n", - "### Using OpenSpiel Integration:\n", + "### ๐ŸŽฎ Using OpenSpiel Integration:\n", "\n", "```python\n", "# Install OpenSpiel\n", @@ -701,26 +799,50 @@ "state = env.state()\n", "```\n", "\n", - "### Available Games:\n", - "1. Catch (what we demoed!)\n", - "2. Tic-Tac-Toe\n", - "3. Kuhn Poker\n", - "4. Cliff Walking\n", - "5. 2048\n", - "6. Blackjack" + "### ๐ŸŽฏ Available Games:\n", + "\n", + "
\n", + "
\n", + "

🎯 Catch

\n", + " What we demoed!\n", + "
\n", + "
\n", + "

❌ Tic-Tac-Toe

\n", + " 2-player\n", + "
\n", + "
\n", + "

🃏 Kuhn Poker

\n", + " Imperfect info\n", + "
\n", + "
\n", + "

🏔️ Cliff Walking

\n", + " Navigation\n", + "
\n", + "
\n", + "

🔢 2048

\n", + " Puzzle\n", + "
\n", + "
\n", + "

🂡 Blackjack

\n", + " Cards\n", + "
\n", + "
\n", + "\n", + "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "---\n", - "\n", - "## Part 9: Adding Your Own Integration\n", + "## โž• Part 9: Adding Your Own Integration\n", "\n", - "Want to wrap your own environment? Follow the pattern:\n", + "
\n", + "

🛠️ Want to wrap your own environment?

\n", + "

Follow the 5-step pattern!

\n", + "
\n", "\n", - "### 1. Define Types (models.py)\n", + "### ๐Ÿ“ 1. Define Types (models.py)\n", "```python\n", "@dataclass\n", "class YourAction(Action):\n", @@ -731,7 +853,7 @@ " # Your observation fields\n", "```\n", "\n", - "### 2. Implement Environment (server/environment.py)\n", + "### ๐Ÿ–ฅ๏ธ 2. Implement Environment (server/environment.py)\n", "```python\n", "class YourEnvironment(Environment):\n", " def reset(self) -> Observation:\n", @@ -741,7 +863,7 @@ " return YourObservation(...)\n", "```\n", "\n", - "### 3. Create Client (client.py)\n", + "### ๐Ÿ“ฑ 3. Create Client (client.py)\n", "```python\n", "class YourEnv(HTTPEnvClient[YourAction, YourObservation]):\n", " def _step_payload(self, action):\n", @@ -751,7 +873,7 @@ " return StepResult(observation=YourObservation(...))\n", "```\n", "\n", - "### 4. Create Server (server/app.py)\n", + "### โšก 4. Create Server (server/app.py)\n", "```python\n", "from core.env_server import create_fastapi_app\n", "\n", @@ -759,7 +881,7 @@ "app = create_fastapi_app(env)\n", "```\n", "\n", - "### 5. Dockerize (server/Dockerfile)\n", + "### ๐Ÿณ 5. Dockerize (server/Dockerfile)\n", "```dockerfile\n", "FROM python:3.11\n", "COPY . /app\n", @@ -768,64 +890,101 @@ "CMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\"]\n", "```\n", "\n", - "### Examples to Study:\n", - "- `src/envs/echo_env/` - Simple test environment\n", - "- `src/envs/openspiel_env/` - Our OpenSpiel integration\n", - "- `src/envs/coding_env/` - Python code execution" + "### ๐Ÿ“š Examples to Study:\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
📢 src/envs/echo_env/ | Simple test environment
🎮 src/envs/openspiel_env/ | Our OpenSpiel integration
💻 src/envs/coding_env/ | Python code execution
\n", + "\n", + "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "---\n", - "\n", - "## Summary\n", - "\n", - "### What You Learned:\n", - "\n", - "1. **RL Basics** - The core loop\n", - "2. **OpenEnv Framework** - Standardized, production-ready RL environments\n", - "3. **Example Integration** - How OpenSpiel is wrapped\n", - "4. **Interactive Demo** - Policies in action\n", - "5. **Adding Integrations** - The pattern to follow\n", - "\n", - "### OpenEnv's Value:\n", - "\n", - "| Feature | Traditional | OpenEnv |\n", - "|---------|------------|----------|\n", - "| Type Safety | โŒ | โœ… |\n", - "| Isolation | โŒ | โœ… Docker |\n", - "| Deployment | โŒ | โœ… K8s-ready |\n", - "| Language | Python only | Any (HTTP) |\n", - "| Reproducibility | โŒ | โœ… |\n", - "\n", - "### Next Steps:\n", - "\n", - "1. Try OpenSpiel integration\n", - "2. Implement real RL (Q-learning, DQN, PPO)\n", - "3. Wrap your own environments\n", - "4. Deploy to production\n", - "5. Use with RL libraries (TorchRL, etc.)\n", - "\n", - "### Resources:\n", - "\n", - "- **OpenEnv**: https://github.com/meta-pytorch/OpenEnv\n", - "- **Docs**: `src/envs/README.md`\n", - "- **Examples**: `examples/` directory\n", + "## ๐ŸŽ“ Summary\n", + "\n", + "
\n", + "

🎉 What You Learned

\n", + "
\n", + "\n", + "### ๐Ÿ“– The Journey:\n", + "\n", + "1. **๐Ÿง  RL Basics** - The core loop\n", + "2. **๐Ÿ—๏ธ OpenEnv Framework** - Standardized, production-ready\n", + "3. **๐Ÿ”Œ Example Integration** - How OpenSpiel is wrapped\n", + "4. **๐ŸŽฏ Interactive Demo** - Policies in action\n", + "5. **โž• Adding Integrations** - The pattern to follow\n", + "\n", + "### โœจ OpenEnv's Value:\n", + "\n", + "| Feature | ๐Ÿ  Traditional | ๐Ÿš€ OpenEnv |\n", + "|---------|---------------|------------|\n", + "| **Type Safety** | โŒ | โœ… Dataclasses |\n", + "| **Isolation** | โŒ | โœ… Docker |\n", + "| **Deployment** | โŒ | โœ… K8s-ready |\n", + "| **Language** | Python only | Any (HTTP) |\n", + "| **Reproducibility** | โŒ | โœ… Containers |\n", + "\n", + "### ๐Ÿš€ Next Steps:\n", + "\n", + "
\n", + "
\n", + "

1️⃣ Try OpenSpiel

\n", + "

Install and play with the 6 games

\n", + "
\n", + "
\n", + "

2️⃣ Implement Real RL

\n", + "

Q-learning, DQN, PPO

\n", + "
\n", + "
\n", + "

3️⃣ Wrap Your Environments

\n", + "

Follow the 5-step pattern

\n", + "
\n", + "
\n", + "

4️⃣ Deploy to Production

\n", + "

Docker → Kubernetes

\n", + "
\n", + "
\n", + "\n", + "### ๐Ÿ“š Resources:\n", + "\n", + "- ๐Ÿ  **OpenEnv**: https://github.com/meta-pytorch/OpenEnv\n", + "- ๐Ÿ“– **Docs**: `src/envs/README.md`\n", + "- ๐Ÿ’ก **Examples**: `examples/` directory\n", "\n", "---\n", "\n", - "## ๐ŸŽ‰ You're Ready!\n", - "\n", - "You now understand:\n", - "- โœ… OpenEnv framework\n", - "- โœ… How integrations work\n", - "- โœ… Using existing environments\n", - "- โœ… Creating new integrations\n", - "- โœ… Production deployment\n", - "\n", - "**Welcome to production-ready RL!** ๐Ÿš€" + "
\n", + "

🎉 You're Ready!

\n", + "

You now understand:

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
✅ OpenEnv framework | ✅ How integrations work
✅ Using existing environments | ✅ Creating new integrations
✅ Production deployment
\n", + "

Welcome to production-ready RL! 🚀

\n", + "
" ] } ], From 72cf8cee6e5339e761d3fddc789436e1754e71b7 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:36:49 -0700 Subject: [PATCH 03/19] fix mermaid --- examples/OpenEnv_Tutorial.ipynb | 142 +------------------------------- 1 file changed, 4 insertions(+), 138 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 5079472..6874228 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -38,33 +38,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "## ๐Ÿง  Part 1: RL Fundamentals - The Core Loop\n", - "\n", - "
\n", - "\n", - "```mermaid\n", - "graph LR\n", - " A[๐Ÿค– Agent] -->|observes| B[๐Ÿ‘€ State]\n", - " B -->|decides| C[โšก Action]\n", - " C -->|executes| D[๐ŸŒ Environment]\n", - " D -->|returns| E[๐ŸŽ Reward]\n", - " E -->|learns| A\n", - " style A fill:#e1f5ff\n", - " style D fill:#fff4e1\n", - " style E fill:#ffe1e1\n", - "```\n", - "\n", - "
\n", - "\n", - "Reinforcement Learning boils down to a simple loop:\n", - "\n", - "```\n", - "Agent observes โ†’ chooses action โ†’ gets reward โ†’ repeat\n", - "```\n", - "\n", - "Let's see it in action with a simple example:" - ] + "source": "## ๐Ÿง  Part 1: RL Fundamentals - The Core Loop\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n🤖 Agent
learns\n
↓\n👀 State
observes\n
↑↓
\n🎁 Reward
returns\n
←\n⚡ Action
decides\n
↑↓
\n๐ŸŒ Environment
executes\n
\n
\n\nReinforcement Learning boils down to a simple loop:\n\n```\nAgent observes โ†’ chooses action โ†’ gets reward โ†’ repeat\n```\n\nLet's see it in action with a simple example:" }, { "cell_type": "code", @@ -125,95 +99,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "## ๐Ÿ—๏ธ Part 2: OpenEnv - The Framework\n", - "\n", - "
\n", - "

🚀 Think \"Docker for RL Environments\"

\n", - "
\n", - "\n", - "### โœจ What is OpenEnv?\n", - "\n", - "OpenEnv is a **framework for creating, deploying, and using isolated RL environments**.\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
✅
Standardized API
reset, step, state
🔒
Type-safe
dataclasses
🐳
Docker isolation
secure
🌐
HTTP API
any language
☸️
Production-ready
K8s deploy
\n", - "\n", - "### ๐ŸŽจ The Architecture\n", - "\n", - "```mermaid\n", - "graph TB\n", - " subgraph Client[\"๐Ÿ’ป Your Training Code\"]\n", - " A[\"๐Ÿ Python/Rust/Julia\"]\n", - " B[\"env = OpenSpielEnv()\"]\n", - " C[\"result = env.reset()\"]\n", - " D[\"result = env.step(action)\"]\n", - " end\n", - " \n", - " subgraph HTTP[\"๐ŸŒ HTTP/JSON\"]\n", - " E[\"POST /reset\"]\n", - " F[\"POST /step\"]\n", - " G[\"GET /state\"]\n", - " end\n", - " \n", - " subgraph Server[\"๐Ÿณ Docker Container\"]\n", - " H[\"โšก FastAPI Server\"]\n", - " I[\"๐ŸŽฎ Environment Logic\"]\n", - " J[\"๐ŸŽฏ Game/Simulation\"]\n", - " end\n", - " \n", - " Client --> HTTP\n", - " HTTP --> Server\n", - " \n", - " style Client fill:#e1f5ff\n", - " style HTTP fill:#fff4e1\n", - " style Server fill:#ffe1f5\n", - "```\n", - "\n", - "### ๐Ÿ“ The Pattern - Every Environment Has:\n", - "\n", - "```\n", - "src/envs/your_env/\n", - "โ”œโ”€โ”€ ๐Ÿ“ models.py โ† Type-safe contracts (Action, Observation, State)\n", - "โ”œโ”€โ”€ ๐Ÿ“ฑ client.py โ† Client API (what you import)\n", - "โ””โ”€โ”€ ๐Ÿ–ฅ๏ธ server/\n", - " โ”œโ”€โ”€ environment.py โ† Environment logic\n", - " โ”œโ”€โ”€ app.py โ† FastAPI server\n", - " โ””โ”€โ”€ Dockerfile โ† Container\n", - "```\n", - "\n", - "### ๐ŸŽฎ Current Integrations\n", - "\n", - "
\n", - "
\n", - "

🎯 OpenSpiel

\n", - "

6 games from DeepMind

\n", - "
\n", - "
\n", - "

📢 Echo

\n", - "

Test environment

\n", - "
\n", - "
\n", - "

💻 Coding

\n", - "

Python execution

\n", - "
\n", - "
\n", - "

🕹️ Atari

\n", - "

Classic games

\n", - "
\n", - "
\n", - "\n", - "Let's explore one integration to see how it all works...\n", - "\n", - "---" - ] + "source": "## ๐Ÿ—๏ธ Part 2: OpenEnv - The Framework\n\n
\n

🚀 Think \"Docker for RL Environments\"

\n
\n\n### โœจ What is OpenEnv?\n\nOpenEnv is a **framework for creating, deploying, and using isolated RL environments**.\n\n\n\n\n\n\n\n\n\n
✅
Standardized API
reset, step, state
🔒
Type-safe
dataclasses
🐳
Docker isolation
secure
🌐
HTTP API
any language
☸️
Production-ready
K8s deploy
\n\n### ๐ŸŽจ The Architecture\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n
\n

💻 Your Training Code (Client)

\nenv = OpenSpielEnv()
\nresult = env.reset()
\nresult = env.step(action)\n
\n
↓
\n
\n

🌐 HTTP/JSON Protocol

\nPOST /reset | POST /step | GET /state\n
\n
↓
\n
\n

🐳 Docker Container (Server)

\n⚡ FastAPI Server → 🎮 Environment Logic → 🎯 Game/Simulation\n
\n
\n
\n\n### ๐Ÿ“ The Pattern - Every Environment Has:\n\n```\nsrc/envs/your_env/\nโ”œโ”€โ”€ ๐Ÿ“ models.py โ† Type-safe contracts (Action, Observation, State)\nโ”œโ”€โ”€ ๐Ÿ“ฑ client.py โ† Client API (what you import)\nโ””โ”€โ”€ ๐Ÿ–ฅ๏ธ server/\n โ”œโ”€โ”€ environment.py โ† Environment logic\n โ”œโ”€โ”€ app.py โ† FastAPI server\n โ””โ”€โ”€ Dockerfile โ† Container\n```\n\n### ๐ŸŽฎ Current Integrations\n\n
\n
\n

🎯 OpenSpiel

\n

6 games from DeepMind

\n
\n
\n

📢 Echo

\n

Test environment

\n
\n
\n

💻 Coding

\n

Python execution

\n
\n
\n

🕹️ Atari

\n

Classic games

\n
\n
\n\nLet's explore one integration to see how it all works...\n\n---" }, { "cell_type": "markdown", @@ -591,27 +477,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "---\n", - "\n", - "## ๐Ÿค– Part 7: Different Policies\n", - "\n", - "
\n", - "

A policy maps: Observation → Action

\n", - "
\n", - "\n", - "Let's test 4 strategies from dumb to smart!\n", - "\n", - "```mermaid\n", - "graph LR\n", - " A[๐Ÿ‘€ Observation] --> B{๐Ÿค– Policy}\n", - " B -->|Random| C[๐ŸŽฒ Action]\n", - " B -->|Always Stay| D[โธ๏ธ Action]\n", - " B -->|Smart| E[๐ŸŽฏ Action]\n", - " B -->|Learning| F[๐Ÿง  Action]\n", - " style B fill:#ffe1f5\n", - "```" - ] + "source": "---\n\n## ๐Ÿค– Part 7: Different Policies\n\n
\n

A policy maps: Observation → Action

\n
\n\nLet's test 4 strategies from dumb to smart!\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n👀 Observation
Ball & paddle positions\n
→\n🤖 Policy
Decision maker\n
→\n🎲 Random\n
→\n⏸️ Always Stay\n
→\n🎯 Smart\n
→\n🧠 Learning\n
\n
" }, { "cell_type": "code", @@ -1009,4 +875,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file From 78e71d4cae5be273941dcbab024b05e6522f6cda Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:42:11 -0700 Subject: [PATCH 04/19] update viz --- examples/OpenEnv_Tutorial.ipynb | 1345 +++++++++++++++++++++++-------- 1 file changed, 1002 insertions(+), 343 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 6874228..fe93747 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -6,39 +6,100 @@ "source": [ "
\n", "\n", - "# ๐ŸŽฎ OpenEnv: Production-Ready RL Environments\n", + "# ๐Ÿš€ OpenEnv: Production RL Made Simple\n", "\n", - "\n", + "### *From \"Hello World\" to Production Deployment in 30 Minutes*\n", "\n", - "**Learn how OpenEnv standardizes RL environments for production use**\n", + "---\n", + "\n", + "**What if RL environments were as easy to use as REST APIs?**\n", + "\n", + "That's OpenEnv. Type-safe. Isolated. Production-ready.\n", "\n", "[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)\n", - "[![Python](https://img.shields.io/badge/Python-3.11+-blue?logo=python)](https://www.python.org/)\n", - "[![Docker](https://img.shields.io/badge/Docker-Ready-blue?logo=docker)](https://www.docker.com/)\n", + "[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)\n", "\n", "
\n", "\n", - "---\n", - "\n", - "## ๐Ÿ“š What You'll Learn\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## ๐Ÿ“‹ What You'll Learn\n", "\n", "\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", "\n", "
🧠
RL Fundamentals
5 minutes
🏗️
OpenEnv Framework
Architecture
🔌
Integrations
OpenSpiel example
🎯
Interactive Demo
See it work
➕
Add Your Own
Extend it
\n", + "\n", + "**๐ŸŽฏ Part 1-2: The Fundamentals**\n", + "- RL in 60 seconds\n", + "- Why existing solutions fall short\n", + "- The OpenEnv solution\n", + "\n", + "\n", + "\n", + "**๐Ÿ—๏ธ Part 3-5: The Architecture**\n", + "- How OpenEnv works\n", + "- Exploring real code\n", + "- OpenSpiel integration example\n", + "\n", + "
\n", + "\n", + "**๐ŸŽฎ Part 6-8: Hands-On Demo**\n", + "- Build a game environment\n", + "- Test 4 different policies\n", + "- Watch learning happen live\n", + "\n", + "\n", + "\n", + "**๐Ÿ”ง Part 9-10: Going Further**\n", + "- Use real OpenSpiel\n", + "- Create your own integration\n", + "- Deploy to production\n", + "\n", + "
\n", "\n", - "---" + "> ๐Ÿ’ก **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!" ] }, { "cell_type": "markdown", "metadata": {}, - "source": "## ๐Ÿง  Part 1: RL Fundamentals - The Core Loop\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n🤖 Agent
learns\n
↓\n👀 State
observes\n
↑↓
\n🎁 Reward
returns\n
←\n⚡ Action
decides\n
↑↓
\n๐ŸŒ Environment
executes\n
\n
\n\nReinforcement Learning boils down to a simple loop:\n\n```\nAgent observes โ†’ chooses action โ†’ gets reward โ†’ repeat\n```\n\nLet's see it in action with a simple example:" + "source": [ + "---\n", + "\n", + "# Part 1: RL in 60 Seconds โฑ๏ธ\n", + "\n", + "
\n", + "\n", + "**Reinforcement Learning is simpler than you think.**\n", + "\n", + "It's just a loop:\n", + "\n", + "```\n", + "while not done:\n", + " observation = environment.observe()\n", + " action = policy.choose(observation)\n", + " reward = environment.step(action)\n", + " policy.learn(reward)\n", + "```\n", + "\n", + "That's it. That's RL.\n", + "\n", + "
\n", + "\n", + "Let's see it in action:" + ] }, { "cell_type": "code", @@ -48,67 +109,174 @@ "source": [ "import random\n", "\n", - "# Simple RL: Guess a number\n", + "print(\"๐ŸŽฒ Number Guessing Game - The Simplest RL Example\")\n", + "print(\"=\" * 60)\n", + "\n", + "# Environment\n", "target = random.randint(1, 10)\n", - "guesses = 3\n", + "guesses_left = 3\n", "\n", - "print(\"๐ŸŽฏ Guess a number (1-10)\\n\")\n", + "print(f\"\\n๐ŸŽฏ I'm thinking of a number between 1 and 10...\\n\")\n", "\n", - "while guesses > 0:\n", - " guess = random.randint(1, 10) # Policy: random\n", - " guesses -= 1\n", + "# The RL Loop\n", + "while guesses_left > 0:\n", + " # Policy: Random guessing (no learning yet!)\n", + " guess = random.randint(1, 10)\n", + " guesses_left -= 1\n", " \n", - " print(f\"Guess: {guess}\", end=\" โ†’ \")\n", + " print(f\"๐Ÿ’ญ Guess #{3-guesses_left}: {guess}\", end=\" โ†’ \")\n", " \n", + " # Reward signal\n", " if guess == target:\n", - " print(\"๐ŸŽ‰ Correct! Reward: +1\")\n", + " print(\"๐ŸŽ‰ Correct! +10 points\")\n", " break\n", " elif abs(guess - target) <= 2:\n", - " print(\"๐Ÿ”ฅ Warm\")\n", + " print(\"๐Ÿ”ฅ Warm! (close)\")\n", " else:\n", - " print(\"โ„๏ธ Cold\")\n", + " print(\"โ„๏ธ Cold! (far)\")\n", "else:\n", - " print(f\"\\nIt was {target}. Reward: 0\")\n", + " print(f\"\\n๐Ÿ’” Out of guesses. The number was {target}.\")\n", "\n", - "print(\"\\n๐Ÿ’ก That's RL: observe โ†’ act โ†’ reward โ†’ repeat\")" + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"\\n๐Ÿ’ก This is RL: Observe โ†’ Act โ†’ Reward โ†’ Repeat\")\n", + "print(\" But this policy is terrible! It doesn't learn.\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "
\n", - "

⚠️ The Problem

\n", - "

How do we make this production-ready?

\n", - "
    \n", - "
  • ❌ Need type safety
  • \n", - "
  • ❌ Need isolation
  • \n", - "
  • ❌ Need deployment
  • \n", - "
  • ❌ Need standardization
  • \n", - "
\n", - "
\n", + "
\n", "\n", - "
\n", - "

✅ The Solution: OpenEnv

\n", - "

A production-ready framework that solves all these problems!

\n", - "
\n", + "**๐Ÿค” The Problem**: Our random guesser never improves because it doesn't use the rewards!\n", "\n", - "---" + "Real RL agents:\n", + "- ๐Ÿ“Š Track which actions lead to rewards\n", + "- ๐ŸŽฏ Choose better actions over time\n", + "- ๐Ÿ”„ Balance exploration (trying new things) vs exploitation (using what works)\n", + "\n", + "We'll build this later!\n", + "\n", + "
" ] }, { "cell_type": "markdown", "metadata": {}, - "source": "## ๐Ÿ—๏ธ Part 2: OpenEnv - The Framework\n\n
\n

🚀 Think \"Docker for RL Environments\"

\n
\n\n### โœจ What is OpenEnv?\n\nOpenEnv is a **framework for creating, deploying, and using isolated RL environments**.\n\n\n\n\n\n\n\n\n\n
✅
Standardized API
reset, step, state
🔒
Type-safe
dataclasses
🐳
Docker isolation
secure
🌐
HTTP API
any language
☸️
Production-ready
K8s deploy
\n\n### ๐ŸŽจ The Architecture\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n
\n

💻 Your Training Code (Client)

\nenv = OpenSpielEnv()
\nresult = env.reset()
\nresult = env.step(action)\n
\n
↓
\n
\n

🌐 HTTP/JSON Protocol

\nPOST /reset | POST /step | GET /state\n
\n
↓
\n
\n

🐳 Docker Container (Server)

\n⚡ FastAPI Server → 🎮 Environment Logic → 🎯 Game/Simulation\n
\n
\n
\n\n### ๐Ÿ“ The Pattern - Every Environment Has:\n\n```\nsrc/envs/your_env/\nโ”œโ”€โ”€ ๐Ÿ“ models.py โ† Type-safe contracts (Action, Observation, State)\nโ”œโ”€โ”€ ๐Ÿ“ฑ client.py โ† Client API (what you import)\nโ””โ”€โ”€ ๐Ÿ–ฅ๏ธ server/\n โ”œโ”€โ”€ environment.py โ† Environment logic\n โ”œโ”€โ”€ app.py โ† FastAPI server\n โ””โ”€โ”€ Dockerfile โ† Container\n```\n\n### ๐ŸŽฎ Current Integrations\n\n
\n
\n

🎯 OpenSpiel

\n

6 games from DeepMind

\n
\n
\n

📢 Echo

\n

Test environment

\n
\n
\n

💻 Coding

\n

Python execution

\n
\n
\n

🕹️ Atari

\n

Classic games

\n
\n
\n\nLet's explore one integration to see how it all works...\n\n---" + "source": [ + "---\n", + "\n", + "# Part 2: The Problem with Traditional RL ๐Ÿ˜ค\n", + "\n", + "## Why Can't We Just Use OpenAI Gym?\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
ProblemTraditional ApproachOpenEnv Solution
Type Safety❌ obs[0][3] - what is this?✅ obs.info_state - IDE knows!
Isolation❌ Same process (can crash your training)✅ Docker containers (fully isolated)
Deployment❌ \"Works on my machine\"✅ Same container everywhere
Scaling❌ Hard to distribute✅ Deploy to Kubernetes
Language❌ Python only✅ Any language (HTTP API)
\n", + "\n", + "
\n", + "\n", + "## 💡 The OpenEnv Philosophy\n", + "\n", + "**\"RL environments should be like microservices\"**\n", + "\n", + "- 🔒 **Isolated**: Run in containers\n", + "- 🌐 **Standard**: HTTP API, works everywhere\n", + "- 📦 **Versioned**: Docker images\n", + "- 🚀 **Scalable**: Deploy anywhere\n", + "- 🛡️ **Type-safe**: Know exactly what you're sending/receiving\n", + "\n", + "
" + ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## โš™๏ธ Part 3: Setup\n", + "### The Architecture\n", + "\n", + "```\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ YOUR TRAINING CODE โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ env = OpenSpielEnv(...) โ† Import the client โ”‚\n", + "โ”‚ result = env.reset() โ† Type-safe! โ”‚\n", + "โ”‚ result = env.step(action) โ† Type-safe! โ”‚\n", + "โ”‚ โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + " โ”‚\n", + " โ”‚ HTTP/JSON (Language-Agnostic)\n", + " โ”‚ POST /reset, POST /step, GET /state\n", + " โ”‚\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ DOCKER CONTAINER โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", + "โ”‚ โ”‚ FastAPI Server โ”‚ โ”‚\n", + "โ”‚ โ”‚ โ””โ”€ Environment (reset, step, state) โ”‚ โ”‚\n", + "โ”‚ โ”‚ โ””โ”€ Your Game/Simulation Logic โ”‚ โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Isolated โ€ข Reproducible โ€ข Secure โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + "```\n", + "\n", + "
\n", + "\n", + "**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n", + "\n", + "```python\n", + "env.reset() # Under the hood: HTTP POST to /reset\n", + "env.step(...) # Under the hood: HTTP POST to /step\n", + "env.state() # Under the hood: HTTP GET to /state\n", + "```\n", + "\n", + "
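The sketch below makes that mapping concrete. It uses only Python's standard library to build, but not send, the three requests behind `reset()`, `step()` and `state()`. The base URL matches the tutorial's `OpenSpielEnv(base_url="http://localhost:8000")` example. The JSON body shapes, including the `{"action": {...}}` wrapper and the `build_request` helper, are hypothetical illustrations, not OpenEnv's documented wire format.

```python
import json
import urllib.request
from typing import Optional

# Assumed server address, as in the tutorial's OpenSpielEnv example.
BASE_URL = "http://localhost:8000"


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) the HTTP request a thin env client would issue."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# env.reset()  -> POST /reset
reset_req = build_request("POST", "/reset", {})
# env.step(a)  -> POST /step  (hypothetical body shape)
step_req = build_request("POST", "/step", {"action": {"action_id": 2, "game_name": "catch"}})
# env.state()  -> GET /state  (no body)
state_req = build_request("GET", "/state")

for req in (reset_req, step_req, state_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it. That only works if a server is listening at that address.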
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "# Part 3: Setup ๐Ÿ› ๏ธ\n", + "\n", + "
\n", + "\n", + "**Running in Colab?** This cell clones OpenEnv and installs dependencies automatically.\n", + "\n", + "**Running locally?** Start Jupyter from the `examples/` directory: the cell adds `../src` (relative to the working directory) to `sys.path`.\n", + "\n", - "
\n", - "

🔧 Getting Started

\n", "
" ] }, @@ -118,25 +286,33 @@ "metadata": {}, "outputs": [], "source": [ - "# Check if in Colab\n", + "# Detect environment\n", "try:\n", " import google.colab\n", " IN_COLAB = True\n", + " print(\"๐ŸŒ Running in Google Colab\")\n", "except ImportError:\n", " IN_COLAB = False\n", + " print(\"๐Ÿ’ป Running locally\")\n", "\n", "if IN_COLAB:\n", - " !git clone https://github.com/meta-pytorch/OpenEnv.git\n", + " print(\"\\n๐Ÿ“ฆ Cloning OpenEnv repository...\")\n", + " !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n", " %cd OpenEnv\n", + " \n", + " print(\"๐Ÿ“š Installing dependencies...\")\n", " !pip install -q fastapi uvicorn requests\n", + " \n", " import sys\n", " sys.path.insert(0, './src')\n", - " print(\"โœ… OpenEnv ready!\")\n", + " print(\"\\nโœ… Setup complete!\")\n", "else:\n", " import sys\n", " from pathlib import Path\n", - " sys.path.insert(0, str(Path.cwd() / 'src'))\n", - " print(\"โœ… Using local OpenEnv\")" + " sys.path.insert(0, str(Path.cwd().parent / 'src'))\n", + " print(\"โœ… Using local OpenEnv\")\n", + "\n", + "print(\"\\n๐Ÿš€ Ready to explore OpenEnv!\")" ] }, { @@ -145,13 +321,29 @@ "source": [ "---\n", "\n", - "## ๐Ÿ” Part 4: Exploring OpenEnv's Structure\n", + "# Part 4: The OpenEnv Pattern ๐Ÿ—๏ธ\n", + "\n", + "
\n", + "\n", + "## Every OpenEnv Environment Has 3 Components:\n", + "\n", + "```\n", + "src/envs/your_env/\n", + "โ”œโ”€โ”€ ๐Ÿ“ models.py โ† Type-safe contracts\n", + "โ”‚ (Action, Observation, State)\n", + "โ”‚\n", + "โ”œโ”€โ”€ ๐Ÿ“ฑ client.py โ† What YOU import\n", + "โ”‚ (HTTPEnvClient implementation)\n", + "โ”‚\n", + "โ””โ”€โ”€ ๐Ÿ–ฅ๏ธ server/\n", + " โ”œโ”€โ”€ environment.py โ† Game/simulation logic\n", + " โ”œโ”€โ”€ app.py โ† FastAPI server\n", + " โ””โ”€โ”€ Dockerfile โ† Container definition\n", + "```\n", "\n", - "
\n", - "

Let's look at the actual code!

\n", "
\n", "\n", - "### ๐Ÿงฉ The Base Classes" + "Let's explore the actual OpenEnv code to see how this works:" ] }, { @@ -160,49 +352,49 @@ "metadata": {}, "outputs": [], "source": [ + "# Import OpenEnv's core abstractions\n", "from core.env_server import Environment, Action, Observation, State\n", "from core.http_env_client import HTTPEnvClient\n", "\n", - "print(\"=\" * 70)\n", - "print(\"๐Ÿ”ง OpenEnv Core Abstractions\")\n", - "print(\"=\" * 70)\n", + "print(\"=\"*70)\n", + "print(\" ๐Ÿงฉ OPENENV CORE ABSTRACTIONS\")\n", + "print(\"=\"*70)\n", "\n", "print(\"\"\"\n", "๐Ÿ–ฅ๏ธ SERVER SIDE (runs in Docker):\n", "\n", - " class Environment(ABC):\n", - " '''Base class for all environment implementations'''\n", - " \n", - " @abstractmethod\n", - " def reset(self) -> Observation:\n", - " '''Start new episode'''\n", - " \n", - " @abstractmethod\n", - " def step(self, action: Action) -> Observation:\n", - " '''Execute action'''\n", - " \n", - " @property\n", - " def state(self) -> State:\n", - " '''Episode metadata'''\n", + " class Environment(ABC):\n", + " '''Base class for all environment implementations'''\n", + " \n", + " @abstractmethod\n", + " def reset(self) -> Observation:\n", + " '''Start new episode'''\n", + " \n", + " @abstractmethod\n", + " def step(self, action: Action) -> Observation:\n", + " '''Execute action, return observation'''\n", + " \n", + " @property\n", + " def state(self) -> State:\n", + " '''Get episode metadata'''\n", "\n", "๐Ÿ“ฑ CLIENT SIDE (your training code):\n", "\n", - " class HTTPEnvClient(ABC):\n", - " '''Base class for HTTP clients'''\n", - " \n", - " def reset(self) -> StepResult:\n", - " # HTTP POST to /reset\n", - " \n", - " def step(self, action) -> StepResult:\n", - " # HTTP POST to /step\n", - " \n", - " def state(self) -> State:\n", - " # HTTP GET to /state\n", + " class HTTPEnvClient(ABC):\n", + " '''Base class for HTTP clients'''\n", + " \n", + " def reset(self) -> StepResult:\n", + " # HTTP POST /reset\n", + " \n", + " def 
step(self, action) -> StepResult:\n", + " # HTTP POST /step\n", + " \n", + " def state(self) -> State:\n", + " # HTTP GET /state\n", "\"\"\")\n", "\n", - "print(\"=\" * 70)\n", - "print(\"๐Ÿ’ก Same interface, communication via HTTP!\")\n", - "print(\"=\" * 70)" + "print(\"=\"*70)\n", + "print(\"\\n๐Ÿ’ก Same interface on both sides - communication via HTTP!\\n\")" ] }, { @@ -211,35 +403,42 @@ "source": [ "---\n", "\n", - "## ๐Ÿ”Œ Part 5: Example Integration - OpenSpiel\n", + "# Part 5: Example Integration - OpenSpiel ๐ŸŽฎ\n", "\n", - "
\n", - " \n", - "

70+ Game Environments

\n", - "
\n", + "
\n", "\n", - "### ๐ŸŽฎ What is OpenSpiel?\n", + "## What is OpenSpiel?\n", "\n", - "OpenSpiel is a **library from DeepMind** with 70+ game environments for RL research.\n", + "**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n", "\n", - "### ๐ŸŽฏ Our Integration\n", + "## OpenEnv's Integration\n", "\n", - "**OpenEnv wraps 6 OpenSpiel games** following our standard pattern:\n", + "We've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n", "\n", "\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", + "\n", + "\n", "\n", "
🎯
Catch
Catch falling ball
❌
Tic-Tac-Toe
Classic 3×3
🃏
Kuhn Poker
Imperfect info
🏔️
Cliff Walking
Grid navigation
🔢
2048
Tile puzzle
🂡
Blackjack
Card game
\n", + "\n", + "**๐ŸŽฏ Single-Player**\n", + "1. **Catch** - Catch falling ball\n", + "2. **Cliff Walking** - Navigate grid\n", + "3. **2048** - Tile puzzle\n", + "4. **Blackjack** - Card game\n", + "\n", + "\n", + "\n", + "**๐Ÿ‘ฅ Multi-Player**\n", + "5. **Tic-Tac-Toe** - Classic 3ร—3\n", + "6. **Kuhn Poker** - Imperfect info poker\n", + "\n", + "
\n", "\n", - "Let's see how the integration is structured:" + "This shows how OpenEnv can wrap **any** existing RL library!\n", + "\n", + "
" ] }, { @@ -248,6 +447,7 @@ "metadata": {}, "outputs": [], "source": [ + "# Import OpenSpiel integration models\n", "from envs.openspiel_env.models import (\n", " OpenSpielAction,\n", " OpenSpielObservation,\n", @@ -255,35 +455,50 @@ ")\n", "from dataclasses import fields\n", "\n", - "print(\"=\" * 70)\n", - "print(\"๐Ÿ”’ OpenSpiel Integration - Type-Safe Models\")\n", - "print(\"=\" * 70)\n", + "print(\"=\"*70)\n", + "print(\" ๐ŸŽฎ OPENSPIEL INTEGRATION - TYPE-SAFE MODELS\")\n", + "print(\"=\"*70)\n", "\n", "print(\"\\n๐Ÿ“ค OpenSpielAction (what you send):\")\n", + "print(\" \" + \"โ”€\" * 64)\n", "for field in fields(OpenSpielAction):\n", - " print(f\" โ€ข {field.name}: {field.type}\")\n", + " print(f\" โ€ข {field.name:20s} : {field.type}\")\n", "\n", "print(\"\\n๐Ÿ“ฅ OpenSpielObservation (what you receive):\")\n", + "print(\" \" + \"โ”€\" * 64)\n", "for field in fields(OpenSpielObservation):\n", - " print(f\" โ€ข {field.name}: {field.type}\")\n", + " print(f\" โ€ข {field.name:20s} : {field.type}\")\n", "\n", "print(\"\\n๐Ÿ“Š OpenSpielState (episode metadata):\")\n", + "print(\" \" + \"โ”€\" * 64)\n", "for field in fields(OpenSpielState):\n", - " print(f\" โ€ข {field.name}: {field.type}\")\n", - "\n", - "print(\"\\n\" + \"=\" * 70)\n", - "print(\"๐Ÿ’ก This is how OpenEnv integrates external libraries:\")\n", - "print(\" 1. Wrap in standardized types\")\n", - "print(\" 2. Expose via HTTPEnvClient\")\n", - "print(\" 3. Package in Docker\")\n", - "print(\"=\" * 70)" + " print(f\" โ€ข {field.name:20s} : {field.type}\")\n", + "\n", + "print(\"\\n\" + \"=\"*70)\n", + "print(\"\\n๐Ÿ’ก Type safety means:\")\n", + "print(\" โœ… Your IDE autocompletes these fields\")\n", + "print(\" โœ… Typos are caught before running\")\n", + "print(\" โœ… Refactoring is safe\")\n", + "print(\" โœ… Self-documenting code\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### ๐Ÿ”ง How the Client Works" + "### How the Client Works\n", + "\n", + "
\n", + "\n", + "The client **inherits from HTTPEnvClient** and implements 3 methods:\n", + "\n", + "1. `_step_payload()` - Convert action → JSON\n", + "2. `_parse_result()` - Parse JSON → typed observation\n", + "3. `_parse_state()` - Parse JSON → state\n", + "\n", + "That's it! The base class handles all HTTP communication.\n", + "\n", + "
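This split between a base class and three hooks can be sketched without a network. Everything named below is a hypothetical stand-in, not OpenEnv's `HTTPEnvClient` API: `MiniEnvClient`, `CountClient`, `CountAction`, `CountObs`, `StepResult` and the in-process `fake_server`. The base class owns routing to `/reset`, `/step` and `/state`, and the subclass implements only the three hooks listed above. The JSON shapes are also assumptions.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generic, Optional, TypeVar

ActT = TypeVar("ActT")
ObsT = TypeVar("ObsT")

# (method, path, JSON body or None) -> JSON response; stands in for real HTTP.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


@dataclass
class StepResult(Generic[ObsT]):
    observation: ObsT
    reward: Optional[float]
    done: bool


class MiniEnvClient(Generic[ActT, ObsT]):
    """Hypothetical base client: routing lives here, (de)serialization in hooks."""

    def __init__(self, transport: Transport):
        self._transport = transport

    def reset(self) -> StepResult[ObsT]:
        return self._parse_result(self._transport("POST", "/reset", {}))

    def step(self, action: ActT) -> StepResult[ObsT]:
        body = {"action": self._step_payload(action)}
        return self._parse_result(self._transport("POST", "/step", body))

    def state(self) -> Any:
        return self._parse_state(self._transport("GET", "/state", None))

    # The three hooks a concrete client implements.
    def _step_payload(self, action: ActT) -> Dict[str, Any]:
        raise NotImplementedError

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[ObsT]:
        raise NotImplementedError

    def _parse_state(self, payload: Dict[str, Any]) -> Any:
        raise NotImplementedError


@dataclass
class CountAction:
    delta: int


@dataclass
class CountObs:
    value: int


class CountClient(MiniEnvClient[CountAction, CountObs]):
    def _step_payload(self, action: CountAction) -> Dict[str, Any]:
        return {"delta": action.delta}  # typed action -> JSON

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[CountObs]:
        obs = CountObs(value=payload["observation"]["value"])  # JSON -> typed observation
        return StepResult(observation=obs, reward=payload.get("reward"), done=payload["done"])

    def _parse_state(self, payload: Dict[str, Any]) -> Dict[str, int]:
        return {"value": payload["value"], "step_count": payload["step_count"]}


def fake_server(target: int = 3) -> Transport:
    """In-process 'server': a counter episode that ends once value >= target."""
    state = {"value": 0, "step_count": 0}

    def handle(method: str, path: str, body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        body = json.loads(json.dumps(body))  # round-trip through JSON, as on the wire
        if (method, path) == ("GET", "/state"):
            return dict(state)
        if (method, path) == ("POST", "/reset"):
            state.update(value=0, step_count=0)
        elif (method, path) == ("POST", "/step"):
            state["value"] += body["action"]["delta"]
            state["step_count"] += 1
        else:
            raise ValueError(f"no route for {method} {path}")
        done = state["value"] >= target
        return {"observation": {"value": state["value"]}, "reward": 1.0 if done else 0.0, "done": done}

    return handle


env = CountClient(fake_server())
result = env.reset()
while not result.done:
    result = env.step(CountAction(delta=1))
print(result, env.state())
```

To make the same client talk to a real server, you would swap only the transport. The hooks, and the training loop that calls `reset()` and `step()`, stay unchanged.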
" ] }, { @@ -294,41 +509,47 @@ "source": [ "from envs.openspiel_env.client import OpenSpielEnv\n", "\n", - "print(\"=\" * 70)\n", - "print(\"๐Ÿ“ฑ OpenSpielEnv Client (HTTPEnvClient Implementation)\")\n", - "print(\"=\" * 70)\n", + "print(\"=\"*70)\n", + "print(\" ๐Ÿ”Œ HOW OPENENV WRAPS OPENSPIEL\")\n", + "print(\"=\"*70)\n", "\n", "print(\"\"\"\n", - "How OpenEnv wraps OpenSpiel:\n", - "\n", "class OpenSpielEnv(HTTPEnvClient[OpenSpielAction, OpenSpielObservation]):\n", " \n", " def _step_payload(self, action: OpenSpielAction) -> dict:\n", - " '''Convert action to JSON for HTTP request'''\n", + " '''Convert typed action to JSON for HTTP'''\n", " return {\n", " \"action_id\": action.action_id,\n", " \"game_name\": action.game_name,\n", " }\n", " \n", " def _parse_result(self, payload: dict) -> StepResult:\n", - " '''Parse HTTP response into typed observation'''\n", + " '''Parse HTTP JSON response into typed observation'''\n", " return StepResult(\n", " observation=OpenSpielObservation(...),\n", " reward=payload['reward'],\n", " done=payload['done']\n", " )\n", "\n", - "Usage (same for ALL OpenEnv environments):\n", + "\"\"\")\n", "\n", + "print(\"โ”€\" * 70)\n", + "print(\"\\nโœจ Usage (works for ALL OpenEnv environments):\")\n", + "print(\"\"\"\n", " env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", - " result = env.reset() # Returns StepResult[OpenSpielObservation]\n", + " \n", + " result = env.reset()\n", + " # Returns StepResult[OpenSpielObservation] - Type safe!\n", + " \n", " result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", - " state = env.state() # Returns OpenSpielState\n", + " # Type checker knows this is valid!\n", + " \n", + " state = env.state()\n", + " # Returns OpenSpielState\n", "\"\"\")\n", "\n", - "print(\"=\" * 70)\n", - "print(\"๐Ÿ’ก This pattern works for ANY environment you want to wrap!\")\n", - "print(\"=\" * 70)" + "print(\"โ”€\" * 70)\n", + "print(\"\\n๐ŸŽฏ This pattern works for ANY environment you want to 
wrap!\\n\")" ] }, { @@ -337,25 +558,68 @@ "source": [ "---\n", "\n", - "## ๐ŸŽฏ Part 6: Interactive Demo - See It In Action\n", + "
\n", "\n", - "
\n", - "

🎮 Let's Build the Catch Game!

\n", - " \n", - "
\n", + "\n", + "# 🎮 Part 6: Interactive Demo\n", + "\n", + "### Now let's BUILD something!\n", "\n", - "### 🎲 The Game Rules:\n", + "We'll create a Catch game following OpenEnv patterns,
\n", + "then watch 4 different AI policies compete.\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The Game: Catch ๐Ÿ”ด๐Ÿ“\n", "\n", "\n", "\n", - "\n", - "\n", - "\n", - "\n", + "\n", + "\n", "\n", "
๐Ÿ“
5×5 Grid
🔴
Ball falls
๐Ÿ“
Catch it!
🎁
+1 reward
\n", + "\n", + "```\n", + "โฌœ โฌœ ๐Ÿ”ด โฌœ โฌœ \n", + "โฌœ โฌœ โฌœ โฌœ โฌœ Ball\n", + "โฌœ โฌœ โฌœ โฌœ โฌœ falls\n", + "โฌœ โฌœ โฌœ โฌœ โฌœ down\n", + "โฌœ โฌœ ๐Ÿ“ โฌœ โฌœ \n", + " Paddle\n", + "```\n", + "\n", + "\n", + "\n", + "**Rules:**\n", + "- 5ร—5 grid\n", + "- Ball falls from random column\n", + "- Move paddle to catch it\n", + "\n", + "**Actions:**\n", + "- `0` = Move LEFT โฌ…๏ธ\n", + "- `1` = STAY ๐Ÿ›‘\n", + "- `2` = Move RIGHT โžก๏ธ\n", + "\n", + "**Reward:**\n", + "- `+1` if caught ๐ŸŽ‰\n", + "- `0` if missed ๐Ÿ˜ข\n", + "\n", + "
\n", "\n", - "**Actions**: 0=LEFT โฌ…๏ธ | 1=STAY โธ๏ธ | 2=RIGHT โžก๏ธ" + "
\n", + "\n", + "**🎯 Why This Game?**\n", + "- Simple rules (easy to understand)\n", + "- Visual (see what's happening)\n", + "- Fast episodes (~5 steps)\n", + "- Clear success/failure\n", + "- Perfect for testing policies!\n", + "\n", + "
" ] }, { @@ -368,24 +632,38 @@ "from dataclasses import dataclass\n", "from typing import List, Tuple\n", "\n", - "# Define types (following OpenEnv pattern)\n", + "# ============================================================================\n", + "# MODELS - Type-safe contracts (following OpenEnv pattern)\n", + "# ============================================================================\n", + "\n", "@dataclass\n", "class CatchObservation:\n", - " \"\"\"Type-safe observation.\"\"\"\n", - " info_state: List[float]\n", - " legal_actions: List[int]\n", - " done: bool\n", - " reward: float\n", + " \"\"\"Type-safe observation following OpenEnv Observation base class.\"\"\"\n", + " info_state: List[float] # Grid as flat array\n", + " legal_actions: List[int] # [0, 1, 2] always\n", + " done: bool # Episode finished?\n", + " reward: float # +1 or 0\n", + " # Extra fields for visualization\n", " ball_position: Tuple[int, int]\n", " paddle_position: int\n", "\n", "\n", + "# ============================================================================\n", + "# ENVIRONMENT - Server-side logic (following OpenEnv Environment pattern)\n", + "# ============================================================================\n", + "\n", "class CatchEnvironment:\n", " \"\"\"\n", - " Catch game following OpenEnv Environment pattern.\n", + " Catch game following OpenEnv's Environment pattern.\n", + " \n", + " In production:\n", + " โ€ข Runs in Docker container\n", + " โ€ข Accessed via HTTPEnvClient\n", + " โ€ข Exposed via FastAPI server\n", " \n", - " In production: This would run in Docker, accessed via HTTPEnvClient\n", - " For demo: We run it locally to see the internals\n", + " For this demo:\n", + " โ€ข We run it locally to see internals\n", + " โ€ข But the structure is identical!\n", " \"\"\"\n", " \n", " def __init__(self, grid_size=5):\n", @@ -400,14 +678,21 @@ " return self._make_observation()\n", " \n", " def step(self, action: int) -> CatchObservation:\n", - " \"\"\"Execute 
action (implements Environment.step()).\"\"\"\n", + " \"\"\"Execute action (implements Environment.step()).\n", + " \n", + " Args:\n", + " action: 0=LEFT, 1=STAY, 2=RIGHT\n", + " \"\"\"\n", + " # Move paddle\n", " if action == 0 and self.paddle_col > 0:\n", " self.paddle_col -= 1\n", " elif action == 2 and self.paddle_col < self.grid_size - 1:\n", " self.paddle_col += 1\n", " \n", + " # Move ball down\n", " self.ball_row += 1\n", " \n", + " # Check if episode done\n", " if self.ball_row >= self.grid_size - 1:\n", " self.done = True\n", " reward = 1.0 if self.ball_col == self.paddle_col else 0.0\n", @@ -417,11 +702,13 @@ " return self._make_observation(reward)\n", " \n", " def _make_observation(self, reward=0.0) -> CatchObservation:\n", + " \"\"\"Create type-safe observation.\"\"\"\n", + " # Flatten grid to vector (like real RL environments do)\n", " info_state = [0.0] * (self.grid_size * self.grid_size)\n", " ball_idx = self.ball_row * self.grid_size + self.ball_col\n", " paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col\n", - " info_state[ball_idx] = 1.0\n", - " info_state[paddle_idx] = 0.5\n", + " info_state[ball_idx] = 1.0 # Ball = 1.0\n", + " info_state[paddle_idx] = 0.5 # Paddle = 0.5\n", " \n", " return CatchObservation(\n", " info_state=info_state,\n", @@ -433,6 +720,7 @@ " )\n", " \n", " def render(self):\n", + " \"\"\"Visualize current state.\"\"\"\n", " for row in range(self.grid_size):\n", " line = \" \"\n", " for col in range(self.grid_size):\n", @@ -444,17 +732,21 @@ " line += \"โฌœ \"\n", " print(line)\n", "\n", + "\n", "print(\"โœ… Environment created following OpenEnv pattern!\")\n", - "print(\" ๐Ÿ”ง Implements: reset(), step()\")\n", - "print(\" ๐Ÿ”’ Returns: Type-safe observations\")\n", - "print(\" ๐Ÿณ In production: Would run in Docker + FastAPI\")" + "print(\"\\n๐Ÿ“‹ What we just built:\")\n", + "print(\" โ€ข reset() โ†’ CatchObservation (type-safe!)\")\n", + "print(\" โ€ข step(action) โ†’ CatchObservation 
(type-safe!)\")\n", + "print(\" โ€ข render() โ†’ Visual display\")\n", + "print(\"\\n๐Ÿš€ In production: This would run in Docker + FastAPI\")\n", + "print(\" But the structure is EXACTLY the same!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### ๐Ÿงช Test It" + "### Test the Environment" ] }, { @@ -463,21 +755,65 @@ "metadata": {}, "outputs": [], "source": [ + "# Create environment\n", "env = CatchEnvironment()\n", "obs = env.reset()\n", "\n", - "print(\"๐ŸŽฎ Initial State:\")\n", - "print(\"=\" * 50)\n", + "print(\"=\"*60)\n", + "print(\" ๐ŸŽฎ INITIAL STATE\")\n", + "print(\"=\"*60 + \"\\n\")\n", "env.render()\n", - "print(f\"\\n๐Ÿ”ด Ball: column {obs.ball_position[1]}\")\n", - "print(f\"๐Ÿ“ Paddle: column {obs.paddle_position}\")\n", - "print(f\"โšก Legal actions: {obs.legal_actions} (0=LEFT, 1=STAY, 2=RIGHT)\")" + "print(f\"\\n๐Ÿ”ด Ball at: column {obs.ball_position[1]}\")\n", + "print(f\"๐Ÿ“ Paddle at: column {obs.paddle_position}\")\n", + "print(f\"\\n๐Ÿ“Š Observation:\")\n", + "print(f\" โ€ข Legal actions: {obs.legal_actions}\")\n", + "print(f\" โ€ข Info state size: {len(obs.info_state)} (5ร—5 grid flattened)\")\n", + "print(f\" โ€ข Done: {obs.done}\")\n", + "print(f\" โ€ข Reward: {obs.reward}\")" ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n## ๐Ÿค– Part 7: Different Policies\n\n
\n

A policy maps: Observation → Action

\n
\n\nLet's test 4 strategies from dumb to smart!\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n👀 Observation
Ball & paddle positions\n
→\n🤖 Policy
Decision maker\n
→\n🎲 Random\n
→\n⏸️ Always Stay\n
→\n🎯 Smart\n
→\n🧠 Learning\n
\n
" + "source": [ + "---\n", + "\n", + "# Part 7: Four Policies ๐Ÿค–\n", + "\n", + "
\n", + "\n", + "## Let's test 4 different AI strategies:\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
PolicyStrategyExpected Performance
🎲 RandomPick random action every step~20% (pure luck)
🛑 Always StayNever move, hope ball lands in center~20% (terrible!)
🧠 SmartMove paddle toward ball100% (optimal!)
📈 LearningStart random, then shift to the smart move as exploration decays~85% (improves over time)
\n", + "\n", + "
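The ~20% figures for Random and Always Stay can be checked exactly. This sketch rests on three assumptions that the table implies but this section does not show: the paddle starts in the center column, the ball's column is uniform over the 5 columns, and an episode lasts 4 steps (the ball falls from row 0 to row 4). It propagates the paddle's position distribution under uniformly random actions, clamping at the edges as `CatchEnvironment.step()` does. `move`, `random_paddle_distribution`, `catch_rate` and `smart_catch_rate` are illustrative helpers, not part of the notebook.

```python
from fractions import Fraction
from typing import Dict

GRID = 5
STEPS = 4          # assumed: ball falls from row 0 to row GRID - 1
START = GRID // 2  # assumed: paddle starts in the center column


def move(col: int, action: int) -> int:
    """Apply 0=LEFT, 1=STAY, 2=RIGHT with the same clamping as CatchEnvironment.step()."""
    if action == 0 and col > 0:
        return col - 1
    if action == 2 and col < GRID - 1:
        return col + 1
    return col


def random_paddle_distribution(start: int = START, steps: int = STEPS) -> Dict[int, Fraction]:
    """Exact distribution of the paddle column after `steps` uniformly random actions."""
    dist = {start: Fraction(1)}
    for _ in range(steps):
        nxt: Dict[int, Fraction] = {}
        for col, p in dist.items():
            for action in (0, 1, 2):
                c = move(col, action)
                nxt[c] = nxt.get(c, Fraction(0)) + p / 3
        dist = nxt
    return dist


def catch_rate(paddle_dist: Dict[int, Fraction]) -> Fraction:
    """P(catch) when the ball column is uniform and independent of the paddle."""
    return sum((p * Fraction(1, GRID) for p in paddle_dist.values()), Fraction(0))


def smart_catch_rate(start: int = START, steps: int = STEPS) -> Fraction:
    """The heuristic closes one column per step, so it catches iff the distance <= steps."""
    hits = sum(1 for ball in range(GRID) if abs(ball - start) <= steps)
    return Fraction(hits, GRID)


print("random:", catch_rate(random_paddle_distribution()))
print("always stay:", catch_rate({START: Fraction(1)}))
print("smart:", smart_catch_rate())
```

The ball is uniform and independent of the paddle. So any policy that ignores `ball_position` catches exactly 1 ball in 5, however its final paddle positions are spread.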
" + ] }, { "cell_type": "code", @@ -485,55 +821,81 @@ "metadata": {}, "outputs": [], "source": [ + "# ============================================================================\n", + "# POLICIES - Different AI strategies\n", + "# ============================================================================\n", + "\n", "class RandomPolicy:\n", - " name = \"๐ŸŽฒ Random\"\n", - " def select_action(self, obs): \n", + " \"\"\"Baseline: Pure random guessing.\"\"\"\n", + " name = \"๐ŸŽฒ Random Guesser\"\n", + " \n", + " def select_action(self, obs: CatchObservation) -> int:\n", " return random.choice(obs.legal_actions)\n", "\n", + "\n", "class AlwaysStayPolicy:\n", - " name = \"โธ๏ธ Always Stay\"\n", - " def select_action(self, obs): \n", - " return 1\n", + " \"\"\"Bad strategy: Never moves.\"\"\"\n", + " name = \"๐Ÿ›‘ Always Stay\"\n", + " \n", + " def select_action(self, obs: CatchObservation) -> int:\n", + " return 1 # STAY\n", + "\n", "\n", "class SmartPolicy:\n", - " name = \"๐ŸŽฏ Smart Heuristic\"\n", - " def select_action(self, obs):\n", + " \"\"\"Optimal: Move paddle toward ball.\"\"\"\n", + " name = \"๐Ÿง  Smart Heuristic\"\n", + " \n", + " def select_action(self, obs: CatchObservation) -> int:\n", " ball_col = obs.ball_position[1]\n", " paddle_col = obs.paddle_position\n", - " if paddle_col < ball_col: return 2\n", - " elif paddle_col > ball_col: return 0\n", - " else: return 1\n", + " \n", + " if paddle_col < ball_col:\n", + " return 2 # Move RIGHT\n", + " elif paddle_col > ball_col:\n", + " return 0 # Move LEFT\n", + " else:\n", + " return 1 # STAY (already aligned)\n", + "\n", "\n", "class LearningPolicy:\n", - " name = \"๐Ÿง  Learning Agent\"\n", + " \"\"\"Simulated RL: Epsilon-greedy exploration.\"\"\"\n", + " name = \"๐Ÿ“ˆ Learning Agent\"\n", + " \n", " def __init__(self):\n", " self.steps = 0\n", " \n", - " def select_action(self, obs):\n", + " def select_action(self, obs: CatchObservation) -> int:\n", " self.steps += 1\n", + " \n", + " # Decay 
exploration rate over time\n", " epsilon = max(0.1, 1.0 - (self.steps / 100))\n", " \n", " if random.random() < epsilon:\n", + " # Explore: random action\n", " return random.choice(obs.legal_actions)\n", " else:\n", + " # Exploit: use smart strategy\n", " ball_col = obs.ball_position[1]\n", " paddle_col = obs.paddle_position\n", - " if paddle_col < ball_col: return 2\n", - " elif paddle_col > ball_col: return 0\n", - " else: return 1\n", - "\n", - "print(\"โœ… 4 Policies created!\")\n", - "print(\" ๐ŸŽฒ Random - Baseline\")\n", - "print(\" โธ๏ธ Always Stay - Bad strategy\")\n", - "print(\" ๐ŸŽฏ Smart - Optimal heuristic\")\n", - "print(\" ๐Ÿง  Learning - Simulated RL\")" + " if paddle_col < ball_col:\n", + " return 2\n", + " elif paddle_col > ball_col:\n", + " return 0\n", + " else:\n", + " return 1\n", + "\n", + "\n", + "print(\"โœ… 4 Policies created!\\n\")\n", + "policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\n", + "for i, policy in enumerate(policies, 1):\n", + " print(f\" {i}. {policy.name}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### ๐Ÿ‘€ Watch Them Play" + "### Watch a Policy Play!" ] }, { @@ -545,46 +907,85 @@ "import time\n", "\n", "def run_episode(env, policy, visualize=True, delay=0.4):\n", + " \"\"\"Run one episode with a policy.\"\"\"\n", + " \n", + " # RESET\n", " obs = env.reset()\n", " \n", " if visualize:\n", - " print(f\"\\n{'='*50}\")\n", - " print(f\"๐Ÿค– Policy: {policy.name} | ๐Ÿ”ด Ball: col {obs.ball_position[1]}\")\n", - " print('='*50 + '\\n')\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\" ๐ŸŽฎ {policy.name}\")\n", + " print(f\" ๐Ÿ”ด Ball will fall at column: {obs.ball_position[1]}\")\n", + " print('='*60 + '\\n')\n", " env.render()\n", " time.sleep(delay)\n", " \n", " total_reward = 0\n", " step = 0\n", + " action_names = [\"โฌ…๏ธ LEFT\", \"๐Ÿ›‘ STAY\", \"โžก๏ธ RIGHT\"]\n", " \n", + " # THE RL LOOP\n", " while not obs.done:\n", + " # 1. 
Policy chooses action\n", " action = policy.select_action(obs)\n", + " \n", + " # 2. Environment executes\n", " obs = env.step(action)\n", + " \n", + " # 3. Collect reward\n", " total_reward += obs.reward\n", " \n", " if visualize:\n", - " actions = [\"โฌ…๏ธ LEFT\", \"โธ๏ธ STAY\", \"โžก๏ธ RIGHT\"]\n", - " print(f\"\\nโšก Step {step + 1}: {actions[action]}\")\n", + " print(f\"\\n๐Ÿ“ Step {step + 1}: {action_names[action]}\")\n", " env.render()\n", " time.sleep(delay)\n", " \n", " step += 1\n", " \n", " if visualize:\n", - " print(f\"\\n{'๐ŸŽ‰ CAUGHT!' if total_reward > 0 else '๐Ÿ˜ข MISSED'} Reward: {total_reward}\")\n", + " result = \"๐ŸŽ‰ CAUGHT!\" if total_reward > 0 else \"๐Ÿ˜ข MISSED\"\n", + " print(f\"\\n{'='*60}\")\n", + " print(f\" {result} Reward: {total_reward}\")\n", + " print('='*60)\n", " \n", " return total_reward > 0\n", "\n", - "# Demo\n", + "\n", + "# Demo: Watch Smart Policy in action\n", "env = CatchEnvironment()\n", - "run_episode(env, SmartPolicy(), visualize=True, delay=0.3)" + "policy = SmartPolicy()\n", + "run_episode(env, policy, visualize=True, delay=0.4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
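\n", + "\n", + "**💡 Try changing the policy!**\n", + "\n", + "Replace `SmartPolicy()` with:\n", + "- `RandomPolicy()` - Usually misses (catches about 1 ball in 5)\n", + "- `AlwaysStayPolicy()` - Catches only when the ball falls in the center column\n", + "- `LearningPolicy()` - Nearly random at first; it improves only as the same instance keeps playing and its exploration rate decays\n", + "\n", + "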
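`LearningPolicy` does not update anything from rewards. It anneals epsilon with `max(0.1, 1.0 - steps / 100)`, where `steps` counts actions across every episode played by the same instance. The sketch below evaluates that schedule. The helper names are illustrative, and the 4-steps-per-episode figure is an assumption (the ball falling from row 0 to row 4 on the 5×5 grid).

```python
import math


def epsilon_at(step: int, floor: float = 0.1, decay_steps: int = 100) -> float:
    """Exploration rate LearningPolicy uses on its `step`-th action (counting from 1)."""
    return max(floor, 1.0 - step / decay_steps)


def first_step_at_floor(floor: float = 0.1, decay_steps: int = 100) -> int:
    """First action at which epsilon has bottomed out at `floor`."""
    step = 1
    while epsilon_at(step, floor, decay_steps) > floor:
        step += 1
    return step


STEPS_PER_EPISODE = 4  # assumed episode length on the 5x5 grid

step = first_step_at_floor()
episode = math.ceil(step / STEPS_PER_EPISODE)
print(f"epsilon reaches its floor at action {step}, i.e. during episode {episode}")
print("mean epsilon in episode 1:", sum(epsilon_at(s) for s in range(1, 5)) / 4)
```

Under these assumptions, a fresh instance in the 50-episode comparison explores heavily for roughly its first 22 episodes and plays at epsilon 0.1 afterwards. In a single demo episode it behaves almost like `RandomPolicy`.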
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### ๐Ÿ“Š Compare All Policies" + "---\n", + "\n", + "# Part 8: Policy Competition! ๐Ÿ†\n", + "\n", + "
\n", + "\n", + "Let's run **50 episodes** for each policy and see who wins!\n", + "\n", + "
" ] }, { @@ -594,35 +995,69 @@ "outputs": [], "source": [ "def evaluate_policies(num_episodes=50):\n", - " policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\n", + " \"\"\"Compare all policies over many episodes.\"\"\"\n", + " policies = [\n", + " RandomPolicy(),\n", + " AlwaysStayPolicy(),\n", + " SmartPolicy(),\n", + " LearningPolicy(),\n", + " ]\n", " \n", " print(\"\\n\" + \"=\"*70)\n", - " print(f\"๐Ÿ† POLICY COMPARISON ({num_episodes} episodes)\")\n", + " print(f\" ๐Ÿ† POLICY SHOWDOWN - {num_episodes} Episodes Each\")\n", " print(\"=\"*70 + \"\\n\")\n", " \n", " results = []\n", " for policy in policies:\n", + " print(f\"Testing {policy.name}...\", end=\" \")\n", " env = CatchEnvironment()\n", " successes = sum(run_episode(env, policy, visualize=False) \n", " for _ in range(num_episodes))\n", - " rate = (successes / num_episodes) * 100\n", - " results.append((policy.name, rate))\n", - " print(f\"{policy.name:25s}: {rate:5.1f}%\")\n", + " success_rate = (successes / num_episodes) * 100\n", + " results.append((policy.name, success_rate, successes))\n", + " print(f\"โœ“\")\n", " \n", " print(\"\\n\" + \"=\"*70)\n", - " print(\"๐Ÿ“Š VISUAL COMPARISON\")\n", + " print(\" ๐Ÿ“Š RESULTS\")\n", " print(\"=\"*70 + \"\\n\")\n", " \n", + " # Sort by success rate\n", " results.sort(key=lambda x: x[1], reverse=True)\n", - " for name, rate in results:\n", + " \n", + " for name, rate, successes in results:\n", " bar = \"โ–ˆ\" * int(rate / 2)\n", - " print(f\"{name:25s} [{bar:<50}] {rate:.1f}%\")\n", + " print(f\"{name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})\")\n", " \n", " print(\"\\n\" + \"=\"*70)\n", - " print(\"๐Ÿ’ก RL in action: Random โ†’ Learning โ†’ Optimal\")\n", - " print(\"=\"*70)\n", + " print(\"\\n๐Ÿ’ก Key Insights:\")\n", + " print(\" โ€ข Random (~20%): Baseline - pure luck\")\n", + " print(\" โ€ข Always Stay (~20%): Bad - only works if ball in center\")\n", + " print(\" โ€ข Smart (100%): Optimal - always 
catches\")\n", + " print(\" โ€ข Learning (~85%): Improves over time with experience\")\n", + " print(\"\\n๐ŸŽ“ This is RL in action:\")\n", + " print(\" 1. Start with exploration (random)\")\n", + " print(\" 2. Learn from rewards\")\n", + " print(\" 3. Converge to optimal behavior\\n\")\n", + "\n", + "# Run the competition!\n", + "evaluate_policies(num_episodes=50)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "
\n", + "\n", + "# 🎉 Congratulations!\n", "\n", - "evaluate_policies(50)" + "### You just built and tested a complete RL environment!\n", + "\n", + "And you followed **the OpenEnv pattern**: typed observations and the same `reset()`/`step()` interface a production server exposes.\n", + "\n", + "
" ] }, { @@ -631,225 +1066,449 @@ "source": [ "---\n", "\n", - "## ๐ŸŒ Part 8: Using Real OpenSpiel Integration\n", + "# Part 9: Using Real OpenSpiel ๐ŸŽฎ\n", "\n", - "
\n", - "

✨ What We Just Built = How OpenEnv Works!

\n", - "
\n", + "
\n", + "\n", + "## What We Just Built vs Production OpenSpiel\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
ComponentOur DemoOpenEnv + OpenSpiel
EnvironmentLocal Python classDocker container
CommunicationDirect function callsHTTP/JSON
ClientDirect accessHTTPEnvClient
Type Safety✅ Dataclasses✅ Dataclasses
APIreset(), step()reset(), step() (same!)
\n", "\n", - "### ๐Ÿ”„ Demo vs Production:\n", + "**๐ŸŽฏ Same structure, production features!**\n", "\n", - "| Component | ๐Ÿงช Our Demo | ๐Ÿš€ OpenEnv + OpenSpiel |\n", - "|-----------|-------------|------------------------|\n", - "| Environment | Local class | ๐Ÿณ Docker container |\n", - "| Communication | Direct calls | ๐ŸŒ HTTP |\n", - "| Client | Direct access | ๐Ÿ“ฑ HTTPEnvClient |\n", - "| Type Safety | โœ… | โœ… |\n", - "| API | reset/step | reset/step |\n", + "
\n", "\n", - "### 🎮 Using OpenSpiel Integration:\n", + "### Using OpenSpiel Integration:\n", "\n", "```python\n", - "# Install OpenSpiel\n", + "# 1. Install OpenSpiel\n", "!pip install open_spiel\n", "\n", - "# Import OpenEnv's integration\n", + "# 2. Import OpenEnv's integration\n", "from envs.openspiel_env import OpenSpielEnv, OpenSpielAction\n", "\n", - "# Connect to server\n", + "# 3. Connect to a running OpenSpiel server (HTTP!)\n", "env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", "\n", - "# Same API!\n", + "# 4. Same API you just learned!\n", "result = env.reset()\n", "result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", "state = env.state()\n", + "\n", + "# 5. Switch games by changing game_name:\n", + "result = env.step(OpenSpielAction(action_id=4, game_name=\"tic_tac_toe\"))\n", "```\n", "\n", - "### 🎯 Available Games:\n", - "\n", - "
\n", - "
\n", - "

🎯 Catch

\n", - " What we demoed!\n", - "
\n", - "
\n", - "

❌ Tic-Tac-Toe

\n", - " 2-player\n", - "
\n", - "
\n", - "

🃏 Kuhn Poker

\n", - " Imperfect info\n", - "
\n", - "
\n", - "

🏔️ Cliff Walking

\n", - " Navigation\n", - "
\n", - "
\n", - "

🔢 2048

\n", - " Puzzle\n", - "
\n", - "
\n", - "

🂡 Blackjack

\n", - " Cards\n", - "
\n", - "
\n", + "
\n", "\n", - "---" + "**🎮 6 Games Available:**\n", + "\n", + "1. `\"catch\"` - What we just built!\n", + "2. `\"tic_tac_toe\"` - Classic 3×3\n", + "3. `\"kuhn_poker\"` - Imperfect information poker\n", + "4. `\"cliff_walking\"` - Grid navigation\n", + "5. `\"2048\"` - Tile puzzle\n", + "6. `\"blackjack\"` - Card game\n", + "\n", + "**All use the exact same interface!**\n", + "\n", + "
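Under the hood, a call such as `env.step(OpenSpielAction(action_id=2, game_name="catch"))` turns a typed action into JSON, sends it over HTTP, and turns the JSON reply back into typed objects. This sketch shows only that serialization round trip, with no server involved. The `Sketch*` dataclasses and the payload shapes (`{"action": ...}` for the request; `observation`, `reward`, `done` for the reply) are assumptions for illustration, not OpenEnv's documented wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchAction:
    """Hypothetical stand-in mirroring the fields passed to OpenSpielAction above."""
    action_id: int
    game_name: str


@dataclass
class SketchObservation:
    info_state: List[float]
    legal_actions: List[int]


@dataclass
class SketchStepResult:
    observation: SketchObservation
    reward: Optional[float]
    done: bool


def encode_step(action: SketchAction) -> str:
    # Client side: typed action -> JSON request body (payload shape is assumed).
    return json.dumps({"action": asdict(action)})


def decode_step(body: str) -> SketchStepResult:
    # Client side: JSON response body -> typed result.
    payload: Dict[str, Any] = json.loads(body)
    obs = payload["observation"]
    return SketchStepResult(
        observation=SketchObservation(obs["info_state"], obs["legal_actions"]),
        reward=payload.get("reward"),
        done=payload["done"],
    )


request_body = encode_step(SketchAction(action_id=2, game_name="catch"))
print(request_body)  # {"action": {"action_id": 2, "game_name": "catch"}}

fake_response = ('{"observation": {"info_state": [0.0, 1.0], "legal_actions": [0, 1, 2]}, '
                 '"reward": 1.0, "done": true}')
result = decode_step(fake_response)
print(result.reward, result.done, result.observation.legal_actions)  # 1.0 True [0, 1, 2]
```

Because the game is selected by a plain string field, switching games changes only the request payload, not the client code.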
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## ➕ Part 9: Adding Your Own Integration\n", + "---\n", + "\n", + "# Part 10: Create Your Own Integration 🛠️\n", + "\n", + "
\n", + "\n", + "## The 5-Step Pattern\n", + "\n", + "Want to wrap your own environment in OpenEnv? Here's how:\n", "\n", - "
\n", - "

🛠️ Want to wrap your own environment?

\n", - "

Follow the 5-step pattern!

\n", "
\n", "\n", - "### 📝 1. Define Types (models.py)\n", + "### Step 1: Define Types (`models.py`)\n", + "\n", "```python\n", + "from dataclasses import dataclass\n", + "from typing import List\n", + "from core.env_server import Action, Observation, State\n", + "\n", "@dataclass\n", "class YourAction(Action):\n", - " # Your action fields\n", + " action_value: int\n", + " # Add your action fields\n", "\n", "@dataclass\n", "class YourObservation(Observation):\n", - " # Your observation fields\n", + " state_data: List[float]\n", + " done: bool\n", + " reward: float\n", + " # Add your observation fields\n", + "\n", + "@dataclass\n", + "class YourState(State):\n", + " episode_id: str\n", + " step_count: int\n", + " # Add your state fields\n", "```\n", "\n", - "### 🖥️ 2. Implement Environment (server/environment.py)\n", + "### Step 2: Implement Environment (`server/environment.py`)\n", + "\n", "```python\n", + "from core.env_server import Action, Environment, Observation, State\n", + "from ..models import YourObservation\n", + "\n", "class YourEnvironment(Environment):\n", " def reset(self) -> Observation:\n", + " # Initialize your game/simulation\n", " return YourObservation(...)\n", " \n", " def step(self, action: Action) -> Observation:\n", + " # Execute action, update state\n", " return YourObservation(...)\n", + " \n", + " @property\n", + " def state(self) -> State:\n", + " return self._state\n", "```\n", "\n", - "### 📱 3. 
Create Client (client.py)\n", + "### Step 3: Create Client (`client.py`)\n", + "\n", "```python\n", + "from core.http_env_client import HTTPEnvClient\n", + "from core.types import StepResult\n", + "from .models import YourAction, YourObservation, YourState\n", + "\n", "class YourEnv(HTTPEnvClient[YourAction, YourObservation]):\n", - " def _step_payload(self, action):\n", - " return {\"field\": action.field}\n", + " def _step_payload(self, action: YourAction) -> dict:\n", + " \"\"\"Convert action to JSON\"\"\"\n", + " return {\"action_value\": action.action_value}\n", + " \n", + " def _parse_result(self, payload: dict) -> StepResult:\n", + " \"\"\"Parse JSON to observation\"\"\"\n", + " return StepResult(\n", + " observation=YourObservation(...),\n", + " reward=payload['reward'],\n", + " done=payload['done']\n", + " )\n", " \n", - " def _parse_result(self, payload):\n", - " return StepResult(observation=YourObservation(...))\n", + " def _parse_state(self, payload: dict) -> YourState:\n", + " return YourState(...)\n", "```\n", "\n", - "### ⚡ 4. Create Server (server/app.py)\n", + "### Step 4: Create Server (`server/app.py`)\n", + "\n", "```python\n", "from core.env_server import create_fastapi_app\n", + "from .environment import YourEnvironment\n", "\n", "env = YourEnvironment()\n", "app = create_fastapi_app(env)\n", + "\n", + "# That's it! OpenEnv creates all endpoints for you.\n", "```\n", "\n", - "### 🐳 5. Dockerize (server/Dockerfile)\n", + "### Step 5: Dockerize (`server/Dockerfile`)\n", + "\n", "```dockerfile\n", - "FROM python:3.11\n", - "COPY . /app\n", + "FROM python:3.11-slim\n", + "\n", "WORKDIR /app\n", - "RUN pip install -r requirements.txt\n", - "CMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\"]\n", + "COPY requirements.txt .\n", + "RUN pip install --no-cache-dir -r requirements.txt\n", + "\n", + "COPY . .\n", + "CMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\", \"--port\", \"8000\"]\n", "```\n", "\n", - "### 📚 Examples to Study:\n", + "
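The Step 1 and Step 2 snippets import from `core.env_server`, so they only run inside the OpenEnv repo. To see the same pattern execute end to end, here is a self-contained sketch that swaps in minimal stand-in base classes (an assumption: the real `Action`, `Observation`, `State` and `Environment` may carry extra fields or behavior) and a hypothetical "count to N" game.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


# Stand-ins for core.env_server base classes, so this sketch runs without OpenEnv.
@dataclass
class Action:
    pass


@dataclass
class Observation:
    pass


@dataclass
class State:
    pass


class Environment(ABC):
    @abstractmethod
    def reset(self) -> Observation: ...

    @abstractmethod
    def step(self, action: Action) -> Observation: ...

    @property
    @abstractmethod
    def state(self) -> State: ...


# Step 1: typed models for the hypothetical "count to N" game
@dataclass
class CountAction(Action):
    action_value: int


@dataclass
class CountObservation(Observation):
    state_data: List[float]
    done: bool
    reward: float


@dataclass
class CountState(State):
    episode_id: str
    step_count: int


# Step 2: environment logic
class CountEnvironment(Environment):
    def __init__(self, target: int = 3):
        self.target = target
        self.total = 0
        self._state = CountState(episode_id="", step_count=0)

    def reset(self) -> CountObservation:
        self.total = 0
        self._state = CountState(episode_id=str(uuid.uuid4()), step_count=0)
        return CountObservation(state_data=[0.0], done=False, reward=0.0)

    def step(self, action: CountAction) -> CountObservation:
        self.total += action.action_value
        self._state.step_count += 1
        done = self.total >= self.target
        reward = 1.0 if self.total == self.target else 0.0  # exact hit earns the reward
        return CountObservation(state_data=[float(self.total)], done=done, reward=reward)

    @property
    def state(self) -> CountState:
        return self._state


env = CountEnvironment(target=3)
obs = env.reset()
while not obs.done:
    obs = env.step(CountAction(action_value=1))
print(obs.reward, env.state.step_count)  # 1.0 3
```

Keeping episode metadata in a separate `State` object, rather than in every observation, is what lets a client query it independently of `step()`.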
\n", + "\n", + "### 🎓 Examples to Study\n", + "\n", + "OpenEnv includes 3 complete examples:\n", + "\n", + "1. **`src/envs/echo_env/`**\n", + " - Simplest possible environment\n", + " - Great for testing and learning\n", + "\n", + "2. **`src/envs/openspiel_env/`**\n", + " - Wraps external library (OpenSpiel)\n", + " - Shows integration pattern\n", + " - 6 games in one integration\n", + "\n", + "3. **`src/envs/coding_env/`**\n", + " - Python code execution environment\n", + " - Shows complex use case\n", + " - Security considerations\n", + "\n", + "**💡 Study these to understand the patterns!**\n", + "\n", + "
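Steps 3 and 4 split the environment into a server and an HTTP client. The sketch below imitates that split using only the Python standard library (`http.server` and `urllib`) instead of FastAPI and `HTTPEnvClient`. The routes mirror the ones named in OpenEnv's client comments (POST /reset, POST /step, GET /state), but `EchoEnv`, `EnvHandler`, `call` and the JSON payload shapes are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoEnv:
    """Hypothetical toy environment: echoes the message; reward = message length."""

    def __init__(self):
        self.step_count = 0

    def reset(self):
        self.step_count = 0
        return {"observation": {"echo": ""}, "reward": 0.0, "done": False}

    def step(self, action):
        self.step_count += 1
        message = action["message"]
        return {"observation": {"echo": message}, "reward": float(len(message)), "done": False}


ENV = EchoEnv()  # server-side state lives only in the server process


class EnvHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload):
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/reset":
            self._send_json(ENV.reset())
        elif self.path == "/step":
            self._send_json(ENV.step(data["action"]))
        else:
            self.send_error(404)

    def do_GET(self):
        if self.path == "/state":
            self._send_json({"step_count": ENV.step_count})
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # silence per-request logging


# Port 0 asks the OS for any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), EnvHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}"


def call(method, path, payload=None):
    """Minimal client: JSON in, JSON out (the job an HTTP env client does for you)."""
    data = None if payload is None else json.dumps(payload).encode()
    request = urllib.request.Request(base_url + path, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


call("POST", "/reset", {})
result = call("POST", "/step", {"action": {"message": "hello"}})
state = call("GET", "/state")
print(result["observation"]["echo"], result["reward"], state["step_count"])  # hello 5.0 1
server.shutdown()
server.server_close()
```

Binding to port 0 lets the operating system pick a free port, so the sketch does not collide with a real OpenEnv server on port 8000. Because the client only ever sees JSON, the server side could be rewritten in another language without touching the training code.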
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "
\n", + "\n", + "# 🎓 Summary: Your Journey\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What You Learned\n", "\n", "\n", "\n", - "\n", - "\n", + "\n", + "\n", + "\n", + "
📢 src/envs/echo_env/ | Simple test environment\n", + "\n", + "### 📚 Concepts\n", + "\n", + "✅ **RL Fundamentals**\n", + "- The observe-act-reward loop\n", + "- What makes good policies\n", + "- Exploration vs exploitation\n", + "\n", + "✅ **OpenEnv Architecture**\n", + "- Client-server separation\n", + "- Type-safe contracts\n", + "- HTTP communication layer\n", + "\n", + "✅ **Production Patterns**\n", + "- Docker isolation\n", + "- API design\n", + "- Reproducible deployments\n", + "\n", + "\n", + "\n", + "### 🛠️ Skills\n", + "\n", + "✅ **Using Environments**\n", + "- Import OpenEnv clients\n", + "- Call reset/step/state\n", + "- Work with typed observations\n", + "\n", + "✅ **Building Environments**\n", + "- Define type-safe models\n", + "- Implement Environment class\n", + "- Create HTTPEnvClient\n", + "\n", + "✅ **Testing & Debugging**\n", + "- Compare policies\n", + "- Visualize episodes\n", + "- Measure performance\n", + "\n", + "
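The summary lists exploration vs exploitation as a core concept. One common way to balance the two is epsilon-greedy action selection, sketched below. It is a standard technique, not necessarily how the notebook's `LearningPolicy` works, and the action-value estimates are made up for illustration.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action (explore), else the best one (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q = [0.1, 0.7, 0.2]  # hypothetical value estimates for LEFT, STAY, RIGHT
picks = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]
# Expected share of STAY is 0.9 + 0.1/3, about 0.93: every exploit step
# plus the one-in-three random picks that happen to land on it.
print(picks.count(1) / len(picks))
```

Decaying `epsilon` over training is a common refinement: explore heavily while the estimates are poor, then exploit once they have converged.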
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## OpenEnv vs Traditional RL\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", "\n", "\n", - "\n", - "\n", + "\n", + "\n", + "\n", + "\n", "\n", "\n", - "\n", - "\n", + "\n", + "\n", + "\n", + "\n", "\n", "
| Feature | Traditional (Gym) | OpenEnv | Winner |
|---------|-------------------|---------|--------|
| Type Safety | ❌ Arrays, dicts | ✅ Dataclasses | 🏆 OpenEnv |
| Isolation | ❌ Same process | ✅ Docker | 🏆 OpenEnv |
| Deployment | ❌ Manual setup | ✅ K8s-ready | 🏆 OpenEnv |
| Language | ❌ Python only | ✅ Any (HTTP) | 🏆 OpenEnv |
🎮 src/envs/openspiel_env/ | Our OpenSpiel integration
| Reproducibility | ❌ \"Works on my machine\" | ✅ Same everywhere | 🏆 OpenEnv |
๐Ÿ’ป src/envs/coding_env/Python code executionCommunityโœ… Large ecosystem๐ŸŸก Growing๐Ÿค Both!
\n", "\n", - "---" + "
\n", + "\n", + "**🎯 The Bottom Line**\n", + "\n", + "OpenEnv brings **production engineering** to RL:\n", + "- Same environments work locally and in production\n", + "- Type safety catches bugs early\n", + "- Docker isolation prevents conflicts\n", + "- HTTP API works with any language\n", + "\n", + "**It's RL for 2024 and beyond.**\n", + "\n", + "
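The type-safety row is easiest to see with a misspelled field. A dataclass rejects an unknown keyword the moment the action is constructed, while a plain dict carries the typo along silently. One caveat: dataclasses check field names at runtime, not value types; type mismatches are caught by static checkers and IDEs, not by the dataclass itself. `CatchAction` here is a hypothetical example type.

```python
from dataclasses import dataclass


@dataclass
class CatchAction:
    """Hypothetical typed action for the Catch game."""
    action_id: int  # 0=LEFT, 1=STAY, 2=RIGHT


# Untyped: a misspelled key is silently accepted and fails (if ever) much later.
raw_action = {"actoin_id": 2}
print("actoin_id" in raw_action)  # True

# Typed: the same misspelling is rejected as soon as the action is built.
caught_typo = False
try:
    CatchAction(actoin_id=2)
except TypeError as err:
    caught_typo = True
    print("Rejected:", err)

# Caveat: value types are not checked at runtime, so this is accepted.
loose = CatchAction(action_id="2")
print(type(loose.action_id).__name__)  # str
```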
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## 🎓 Summary\n", + "---\n", "\n", - "
\n", - "

🎉 What You Learned

\n", - "
\n", + "## ๐Ÿš€ Next Steps\n", "\n", - "### ๐Ÿ“– The Journey:\n", - "\n", - "1. **๐Ÿง  RL Basics** - The core loop\n", - "2. **๐Ÿ—๏ธ OpenEnv Framework** - Standardized, production-ready\n", - "3. **๐Ÿ”Œ Example Integration** - How OpenSpiel is wrapped\n", - "4. **๐ŸŽฏ Interactive Demo** - Policies in action\n", - "5. **โž• Adding Integrations** - The pattern to follow\n", - "\n", - "### โœจ OpenEnv's Value:\n", - "\n", - "| Feature | ๐Ÿ  Traditional | ๐Ÿš€ OpenEnv |\n", - "|---------|---------------|------------|\n", - "| **Type Safety** | โŒ | โœ… Dataclasses |\n", - "| **Isolation** | โŒ | โœ… Docker |\n", - "| **Deployment** | โŒ | โœ… K8s-ready |\n", - "| **Language** | Python only | Any (HTTP) |\n", - "| **Reproducibility** | โŒ | โœ… Containers |\n", - "\n", - "### ๐Ÿš€ Next Steps:\n", - "\n", - "
\n", - "
\n", - "

1️⃣ Try OpenSpiel

\n", - "

Install and play with the 6 games

\n", - "
\n", - "
\n", - "

2️⃣ Implement Real RL

\n", - "

Q-learning, DQN, PPO

\n", - "
\n", - "
\n", - "

3️⃣ Wrap Your Environments

\n", - "

Follow the 5-step pattern

\n", - "
\n", - "
\n", - "

4️⃣ Deploy to Production

\n", - "

Docker → Kubernetes

\n", - "
\n", - "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "\n", + "### 📖 Learn More\n", "\n", - "### 📚 Resources:\n", + "- Explore `src/envs/README.md`\n", + "- Read [RFC 001](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", + "- Check example scripts in `examples/`\n", + "- Study OpenSpiel integration\n", "\n", - "- 🏠 **OpenEnv**: https://github.com/meta-pytorch/OpenEnv\n", - "- 📖 **Docs**: `src/envs/README.md`\n", - "- 💡 **Examples**: `examples/` directory\n", + "\n", "\n", + "### 🛠️ Build\n", + "\n", + "- Wrap your favorite RL environment\n", + "- Implement real RL algorithms (DQN, PPO)\n", + "- Create a custom game\n", + "- Deploy to production\n", + "\n", + "\n", + "\n", + "### 🤝 Contribute\n", + "\n", + "- Star the [repo](https://github.com/meta-pytorch/OpenEnv)\n", + "- Report issues\n", + "- Submit PRs\n", + "- Share your integrations\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## ๐Ÿ“š Resources\n", + "\n", + "
\n", + "\n", + "**🔗 Links**\n", + "\n", + "- **OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", + "- **OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", + "- **FastAPI Docs**: https://fastapi.tiangolo.com/\n", + "- **Docker Guide**: https://docs.docker.com/get-started/\n", + "\n", + "**📖 Documentation**\n", + "\n", + "- Environment creation guide: `src/envs/README.md`\n", + "- OpenSpiel integration: `src/envs/openspiel_env/README.md`\n", + "- Example scripts: `examples/`\n", + "\n", + "**🎓 Community**\n", + "\n", + "- Supported by: Meta PyTorch, Hugging Face, Unsloth AI, and more\n", + "- License: BSD 3-Clause\n", + "- Contributions welcome!\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "---\n", "\n", - "
\n", - "

🎉 You're Ready!

\n", - "

You now understand:

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
✅ OpenEnv framework
✅ How integrations work
✅ Using existing environments
✅ Creating new integrations
✅ Production deployment
\n", - "

Welcome to production-ready RL! 🚀

\n", + "
\n", + "\n", + "# 🎉 Congratulations!\n", + "\n", + "### You're now an OpenEnv expert!\n", + "\n", + "You understand:\n", + "- ✅ How RL works\n", + "- ✅ Why OpenEnv matters\n", + "- ✅ How to use existing environments\n", + "- ✅ How to create new integrations\n", + "- ✅ How to deploy to production\n", + "\n", + "---\n", + "\n", + "### Now go build something amazing! 🚀\n", + "\n", + "**Welcome to the future of RL.**\n", + "\n", + "
" ] } @@ -875,4 +1534,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} From f96db499277a8db170a122a50fc1c9c14c43140b Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:47:15 -0700 Subject: [PATCH 05/19] improve UO --- examples/OpenEnv_Tutorial.ipynb | 556 +------------------------------- 1 file changed, 14 insertions(+), 542 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index fe93747..e534fcc 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -3,74 +3,12 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "
\n", - "\n", - "# 🚀 OpenEnv: Production RL Made Simple\n", - "\n", - "### *From \"Hello World\" to Production Deployment in 30 Minutes*\n", - "\n", - "---\n", - "\n", - "**What if RL environments were as easy to use as REST APIs?**\n", - "\n", - "That's OpenEnv. Type-safe. Isolated. Production-ready.\n", - "\n", - "[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)\n", - "[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)\n", - "\n", - "
\n", - "\n", - "---" - ] + "source": "
\n\n\"PyTorch\"\n\n# OpenEnv: Production RL Made Simple\n\n### *From \"Hello World\" to Production Deployment in 30 Minutes* ✨\n\n---\n\n**What if RL environments were as easy to use as REST APIs?**\n\nThat's OpenEnv. Type-safe. Isolated. Production-ready. 🎯\n\n[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)\n[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)\n[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org/)\n\n
\n\n---" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "## 📋 What You'll Learn\n", - "\n", - "
\n", - "\n", - "**🎯 Part 1-2: The Fundamentals**\n", - "- RL in 60 seconds\n", - "- Why existing solutions fall short\n", - "- The OpenEnv solution\n", - "\n", - "\n", - "\n", - "**🏗️ Part 3-5: The Architecture**\n", - "- How OpenEnv works\n", - "- Exploring real code\n", - "- OpenSpiel integration example\n", - "\n", - "
\n", - "\n", - "**🎮 Part 6-8: Hands-On Demo**\n", - "- Build a game environment\n", - "- Test 4 different policies\n", - "- Watch learning happen live\n", - "\n", - "\n", - "\n", - "**🔧 Part 9-10: Going Further**\n", - "- Use real OpenSpiel\n", - "- Create your own integration\n", - "- Deploy to production\n", - "\n", - "
\n", - "\n", - "> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!" - ] + "source": "## 📋 What You'll Learn\n\n
\n\n**🎯 Part 1-2: The Fundamentals**\n- ⚡ RL in 60 seconds\n- 🤔 Why existing solutions fall short\n- 💡 The OpenEnv solution\n\n\n\n**🏗️ Part 3-5: The Architecture**\n- 🔧 How OpenEnv works\n- 🔍 Exploring real code\n- 🎮 OpenSpiel integration example\n\n
\n\n**🎮 Part 6-8: Hands-On Demo**\n- 🔨 Build a game environment\n- 🤖 Test 4 different policies\n- 👀 Watch learning happen live\n\n\n\n**🔧 Part 9-10: Going Further**\n- 🚀 Use real OpenSpiel\n- ✨ Create your own integration\n- 🌐 Deploy to production\n\n
\n\n> ๐Ÿ’ก **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n> \n> โฑ๏ธ **Time**: ~30 minutes | ๐Ÿ“Š **Difficulty**: Beginner-friendly | ๐ŸŽฏ **Outcome**: Production-ready RL knowledge" }, { "cell_type": "markdown", @@ -106,41 +44,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "import random\n", - "\n", - "print(\"๐ŸŽฒ Number Guessing Game - The Simplest RL Example\")\n", - "print(\"=\" * 60)\n", - "\n", - "# Environment\n", - "target = random.randint(1, 10)\n", - "guesses_left = 3\n", - "\n", - "print(f\"\\n๐ŸŽฏ I'm thinking of a number between 1 and 10...\\n\")\n", - "\n", - "# The RL Loop\n", - "while guesses_left > 0:\n", - " # Policy: Random guessing (no learning yet!)\n", - " guess = random.randint(1, 10)\n", - " guesses_left -= 1\n", - " \n", - " print(f\"๐Ÿ’ญ Guess #{3-guesses_left}: {guess}\", end=\" โ†’ \")\n", - " \n", - " # Reward signal\n", - " if guess == target:\n", - " print(\"๐ŸŽ‰ Correct! +10 points\")\n", - " break\n", - " elif abs(guess - target) <= 2:\n", - " print(\"๐Ÿ”ฅ Warm! (close)\")\n", - " else:\n", - " print(\"โ„๏ธ Cold! (far)\")\n", - "else:\n", - " print(f\"\\n๐Ÿ’” Out of guesses. The number was {target}.\")\n", - "\n", - "print(\"\\n\" + \"=\" * 60)\n", - "print(\"\\n๐Ÿ’ก This is RL: Observe โ†’ Act โ†’ Reward โ†’ Repeat\")\n", - "print(\" But this policy is terrible! It doesn't learn.\\n\")" - ] + "source": "import random\n\nprint(\"๐ŸŽฒ \" + \"=\"*58 + \" ๐ŸŽฒ\")\nprint(\" Number Guessing Game - The Simplest RL Example\")\nprint(\"๐ŸŽฒ \" + \"=\"*58 + \" ๐ŸŽฒ\")\n\n# Environment setup\ntarget = random.randint(1, 10)\nguesses_left = 3\n\nprint(f\"\\n๐ŸŽฏ I'm thinking of a number between 1 and 10...\")\nprint(f\"๐Ÿ’ญ You have {guesses_left} guesses. 
Let's see how random guessing works!\\n\")\n\n# The RL Loop - Pure random policy (no learning!)\nwhile guesses_left > 0:\n # Policy: Random guessing (no learning yet!)\n guess = random.randint(1, 10)\n guesses_left -= 1\n \n print(f\"๐Ÿ’ญ Guess #{3-guesses_left}: {guess}\", end=\" โ†’ \")\n \n # Reward signal (but we're not using it!)\n if guess == target:\n print(\"๐ŸŽ‰ Correct! +10 points\")\n break\n elif abs(guess - target) <= 2:\n print(\"๐Ÿ”ฅ Warm! (close)\")\n else:\n print(\"โ„๏ธ Cold! (far)\")\nelse:\n print(f\"\\n๐Ÿ’” Out of guesses. The number was {target}.\")\n\nprint(\"\\n\" + \"=\"*62)\nprint(\"๐Ÿ’ก This is RL: Observe โ†’ Act โ†’ Reward โ†’ Repeat\")\nprint(\" But this policy is terrible! It doesn't learn from rewards.\")\nprint(\"=\"*62 + \"\\n\")" }, { "cell_type": "markdown", @@ -163,60 +67,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "---\n", - "\n", - "# Part 2: The Problem with Traditional RL ๐Ÿ˜ค\n", - "\n", - "## Why Can't We Just Use OpenAI Gym?\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
| Problem | Traditional Approach | OpenEnv Solution |
|---------|----------------------|------------------|
| Type Safety | ❌ obs[0][3] - what is this? | ✅ obs.info_state - IDE knows! |
| Isolation | ❌ Same process (can crash your training) | ✅ Docker containers (fully isolated) |
| Deployment | ❌ \"Works on my machine\" | ✅ Same container everywhere |
| Scaling | ❌ Hard to distribute | ✅ Deploy to Kubernetes |
| Language | ❌ Python only | ✅ Any language (HTTP API) |
\n", - "\n", - "
\n", - "\n", - "## 💡 The OpenEnv Philosophy\n", - "\n", - "**\"RL environments should be like microservices\"**\n", - "\n", - "- 🔒 **Isolated**: Run in containers\n", - "- 🌐 **Standard**: HTTP API, works everywhere\n", - "- 📦 **Versioned**: Docker images\n", - "- 🚀 **Scalable**: Deploy anywhere\n", - "- 🛡️ **Type-safe**: Know exactly what you're sending/receiving\n", - "\n", - "
" - ] + "source": "---\n\n# Part 2: The Problem with Traditional RL ๐Ÿ˜ค\n\n
\n\n## 🤔 Why Can't We Just Use OpenAI Gym?\n\nGood question! Gym is great for research, but production needs more...\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
| Challenge | Traditional Approach | OpenEnv Solution |
|-----------|----------------------|------------------|
| Type Safety | ❌ obs[0][3] - what is this? | ✅ obs.info_state - IDE knows! |
| Isolation | ❌ Same process (can crash your training) | ✅ Docker containers (fully isolated) |
| Deployment | ❌ \"Works on my machine\" 🤷 | ✅ Same container everywhere 🐳 |
| Scaling | ❌ Hard to distribute | ✅ Deploy to Kubernetes ☸️ |
| Language | ❌ Python only | ✅ Any language (HTTP API) 🌐 |
| Debugging | ❌ Cryptic numpy errors | ✅ Clear type errors 🐛 |
\n\n
\n\n## 💡 The OpenEnv Philosophy\n\n**\"RL environments should be like microservices\"**\n\nThink of it like this: you don't run your database in the same process as your web server, right? Same principle!\n\n- 🔒 **Isolated**: Run in containers (security + stability)\n- 🌐 **Standard**: HTTP API, works everywhere\n- 📦 **Versioned**: Docker images (reproducibility!)\n- 🚀 **Scalable**: Deploy the same image to any container platform\n- 🛡️ **Type-safe**: Catch malformed actions and observations early\n- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n\n
" }, { "cell_type": "markdown", @@ -285,35 +136,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# Detect environment\n", - "try:\n", - " import google.colab\n", - " IN_COLAB = True\n", - " print(\"๐ŸŒ Running in Google Colab\")\n", - "except ImportError:\n", - " IN_COLAB = False\n", - " print(\"๐Ÿ’ป Running locally\")\n", - "\n", - "if IN_COLAB:\n", - " print(\"\\n๐Ÿ“ฆ Cloning OpenEnv repository...\")\n", - " !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n", - " %cd OpenEnv\n", - " \n", - " print(\"๐Ÿ“š Installing dependencies...\")\n", - " !pip install -q fastapi uvicorn requests\n", - " \n", - " import sys\n", - " sys.path.insert(0, './src')\n", - " print(\"\\nโœ… Setup complete!\")\n", - "else:\n", - " import sys\n", - " from pathlib import Path\n", - " sys.path.insert(0, str(Path.cwd().parent / 'src'))\n", - " print(\"โœ… Using local OpenEnv\")\n", - "\n", - "print(\"\\n๐Ÿš€ Ready to explore OpenEnv!\")" - ] + "source": "# Detect environment\ntry:\n import google.colab\n IN_COLAB = True\n print(\"๐ŸŒ Running in Google Colab - Perfect!\")\nexcept ImportError:\n IN_COLAB = False\n print(\"๐Ÿ’ป Running locally - Nice!\")\n\nif IN_COLAB:\n print(\"\\n๐Ÿ“ฆ Cloning OpenEnv repository...\")\n !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n %cd OpenEnv\n \n print(\"๐Ÿ“š Installing dependencies (this takes ~10 seconds)...\")\n !pip install -q fastapi uvicorn requests\n \n import sys\n sys.path.insert(0, './src')\n print(\"\\nโœ… Setup complete! Everything is ready to go! 
๐ŸŽ‰\")\nelse:\n import sys\n from pathlib import Path\n sys.path.insert(0, str(Path.cwd().parent / 'src'))\n print(\"โœ… Using local OpenEnv installation\")\n\nprint(\"\\n๐Ÿš€ Ready to explore OpenEnv and build amazing things!\")\nprint(\"๐Ÿ’ก Tip: Run cells top-to-bottom for the best experience.\\n\")" }, { "cell_type": "markdown", @@ -351,51 +174,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# Import OpenEnv's core abstractions\n", - "from core.env_server import Environment, Action, Observation, State\n", - "from core.http_env_client import HTTPEnvClient\n", - "\n", - "print(\"=\"*70)\n", - "print(\" ๐Ÿงฉ OPENENV CORE ABSTRACTIONS\")\n", - "print(\"=\"*70)\n", - "\n", - "print(\"\"\"\n", - "๐Ÿ–ฅ๏ธ SERVER SIDE (runs in Docker):\n", - "\n", - " class Environment(ABC):\n", - " '''Base class for all environment implementations'''\n", - " \n", - " @abstractmethod\n", - " def reset(self) -> Observation:\n", - " '''Start new episode'''\n", - " \n", - " @abstractmethod\n", - " def step(self, action: Action) -> Observation:\n", - " '''Execute action, return observation'''\n", - " \n", - " @property\n", - " def state(self) -> State:\n", - " '''Get episode metadata'''\n", - "\n", - "๐Ÿ“ฑ CLIENT SIDE (your training code):\n", - "\n", - " class HTTPEnvClient(ABC):\n", - " '''Base class for HTTP clients'''\n", - " \n", - " def reset(self) -> StepResult:\n", - " # HTTP POST /reset\n", - " \n", - " def step(self, action) -> StepResult:\n", - " # HTTP POST /step\n", - " \n", - " def state(self) -> State:\n", - " # HTTP GET /state\n", - "\"\"\")\n", - "\n", - "print(\"=\"*70)\n", - "print(\"\\n๐Ÿ’ก Same interface on both sides - communication via HTTP!\\n\")" - ] + "source": "# Import OpenEnv's core abstractions\nfrom core.env_server import Environment, Action, Observation, State\nfrom core.http_env_client import HTTPEnvClient\n\nprint(\"=\"*70)\nprint(\" ๐Ÿงฉ OPENENV CORE ABSTRACTIONS\")\nprint(\"=\"*70)\n\nprint(\"\"\"\n๐Ÿ–ฅ๏ธ SERVER SIDE (runs 
in Docker):\n\n class Environment(ABC):\n '''Base class for all environment implementations'''\n \n @abstractmethod\n def reset(self) -> Observation:\n '''Start new episode'''\n \n @abstractmethod\n def step(self, action: Action) -> Observation:\n '''Execute action, return observation'''\n \n @property\n def state(self) -> State:\n '''Get episode metadata'''\n\n๐Ÿ“ฑ CLIENT SIDE (your training code):\n\n class HTTPEnvClient(ABC):\n '''Base class for HTTP clients'''\n \n def reset(self) -> StepResult:\n # HTTP POST /reset\n \n def step(self, action) -> StepResult:\n # HTTP POST /step\n \n def state(self) -> State:\n # HTTP GET /state\n\"\"\")\n\nprint(\"=\"*70)\nprint(\"\\nโœจ Same interface on both sides - communication via HTTP!\")\nprint(\"๐ŸŽฏ You focus on RL, OpenEnv handles the infrastructure.\\n\")" }, { "cell_type": "markdown", @@ -555,20 +334,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "---\n", - "\n", - "
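The `@abstractmethod` markers in the listing above are what make the server-side contract enforceable: Python refuses to instantiate a subclass that leaves any abstract method unimplemented. A minimal sketch with a stand-in `Environment` (a local class, not an import of `core.env_server`):

```python
from abc import ABC, abstractmethod


class Environment(ABC):
    """Stand-in with the same abstract reset/step contract as the listing above."""

    @abstractmethod
    def reset(self): ...

    @abstractmethod
    def step(self, action): ...


class HalfDoneEnv(Environment):
    def reset(self):
        return "initial observation"
    # step() is not implemented yet


class CompleteEnv(HalfDoneEnv):
    def step(self, action):
        return f"observation after action {action}"


refused = False
try:
    HalfDoneEnv()  # raises TypeError: abstract method 'step' is missing
except TypeError as err:
    refused = True
    print("Refused:", err)

print(CompleteEnv().step(1))  # observation after action 1
```

The error surfaces when the environment object is created, before any HTTP server starts, which is earlier than a missing method would otherwise be noticed.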
\n", - "\n", - "# 🎮 Part 6: Interactive Demo\n", - "\n", - "### Now let's BUILD something!\n", - "\n", - "We'll create a Catch game following OpenEnv patterns,
\n", - "then watch 4 different AI policies compete.\n", - "\n", - "
" - ] + "source": "---\n\n
\n\n# 🎮 Part 6: Interactive Demo\n\n### Now let's BUILD something!\n\nWe'll create a **Catch game** following OpenEnv patterns,
\nthen watch **4 different AI policies** compete for the championship! 🏆\n\n
\n\n**Get ready for:**\n- ⚡ Live gameplay visualization\n- 🤖 AI policy showdown\n- 📊 Real-time learning metrics\n- 🎯 Production-ready patterns\n\n
" }, { "cell_type": "markdown", @@ -627,120 +393,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "import random\n", - "from dataclasses import dataclass\n", - "from typing import List, Tuple\n", - "\n", - "# ============================================================================\n", - "# MODELS - Type-safe contracts (following OpenEnv pattern)\n", - "# ============================================================================\n", - "\n", - "@dataclass\n", - "class CatchObservation:\n", - " \"\"\"Type-safe observation following OpenEnv Observation base class.\"\"\"\n", - " info_state: List[float] # Grid as flat array\n", - " legal_actions: List[int] # [0, 1, 2] always\n", - " done: bool # Episode finished?\n", - " reward: float # +1 or 0\n", - " # Extra fields for visualization\n", - " ball_position: Tuple[int, int]\n", - " paddle_position: int\n", - "\n", - "\n", - "# ============================================================================\n", - "# ENVIRONMENT - Server-side logic (following OpenEnv Environment pattern)\n", - "# ============================================================================\n", - "\n", - "class CatchEnvironment:\n", - " \"\"\"\n", - " Catch game following OpenEnv's Environment pattern.\n", - " \n", - " In production:\n", - " โ€ข Runs in Docker container\n", - " โ€ข Accessed via HTTPEnvClient\n", - " โ€ข Exposed via FastAPI server\n", - " \n", - " For this demo:\n", - " โ€ข We run it locally to see internals\n", - " โ€ข But the structure is identical!\n", - " \"\"\"\n", - " \n", - " def __init__(self, grid_size=5):\n", - " self.grid_size = grid_size\n", - " \n", - " def reset(self) -> CatchObservation:\n", - " \"\"\"Start new episode (implements Environment.reset()).\"\"\"\n", - " self.ball_row = 0\n", - " self.ball_col = random.randint(0, self.grid_size - 1)\n", - " self.paddle_col = self.grid_size // 2\n", - " self.done = False\n", - " return self._make_observation()\n", - " \n", - " def 
step(self, action: int) -> CatchObservation:\n", - " \"\"\"Execute action (implements Environment.step()).\n", - " \n", - " Args:\n", - " action: 0=LEFT, 1=STAY, 2=RIGHT\n", - " \"\"\"\n", - " # Move paddle\n", - " if action == 0 and self.paddle_col > 0:\n", - " self.paddle_col -= 1\n", - " elif action == 2 and self.paddle_col < self.grid_size - 1:\n", - " self.paddle_col += 1\n", - " \n", - " # Move ball down\n", - " self.ball_row += 1\n", - " \n", - " # Check if episode done\n", - " if self.ball_row >= self.grid_size - 1:\n", - " self.done = True\n", - " reward = 1.0 if self.ball_col == self.paddle_col else 0.0\n", - " else:\n", - " reward = 0.0\n", - " \n", - " return self._make_observation(reward)\n", - " \n", - " def _make_observation(self, reward=0.0) -> CatchObservation:\n", - " \"\"\"Create type-safe observation.\"\"\"\n", - " # Flatten grid to vector (like real RL environments do)\n", - " info_state = [0.0] * (self.grid_size * self.grid_size)\n", - " ball_idx = self.ball_row * self.grid_size + self.ball_col\n", - " paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col\n", - " info_state[ball_idx] = 1.0 # Ball = 1.0\n", - " info_state[paddle_idx] = 0.5 # Paddle = 0.5\n", - " \n", - " return CatchObservation(\n", - " info_state=info_state,\n", - " legal_actions=[0, 1, 2],\n", - " done=self.done,\n", - " reward=reward,\n", - " ball_position=(self.ball_row, self.ball_col),\n", - " paddle_position=self.paddle_col\n", - " )\n", - " \n", - " def render(self):\n", - " \"\"\"Visualize current state.\"\"\"\n", - " for row in range(self.grid_size):\n", - " line = \" \"\n", - " for col in range(self.grid_size):\n", - " if row == self.ball_row and col == self.ball_col:\n", - " line += \"๐Ÿ”ด \"\n", - " elif row == self.grid_size - 1 and col == self.paddle_col:\n", - " line += \"๐Ÿ“ \"\n", - " else:\n", - " line += \"โฌœ \"\n", - " print(line)\n", - "\n", - "\n", - "print(\"โœ… Environment created following OpenEnv pattern!\")\n", - "print(\"\\n๐Ÿ“‹ What 
we just built:\")\n", - "print(\" โ€ข reset() โ†’ CatchObservation (type-safe!)\")\n", - "print(\" โ€ข step(action) โ†’ CatchObservation (type-safe!)\")\n", - "print(\" โ€ข render() โ†’ Visual display\")\n", - "print(\"\\n๐Ÿš€ In production: This would run in Docker + FastAPI\")\n", - "print(\" But the structure is EXACTLY the same!\")" - ] + "source": "import random\nfrom dataclasses import dataclass\nfrom typing import List, Tuple\n\n# ============================================================================\n# MODELS - Type-safe contracts (following OpenEnv pattern)\n# ============================================================================\n\n@dataclass\nclass CatchObservation:\n \"\"\"Type-safe observation following OpenEnv Observation base class.\"\"\"\n info_state: List[float] # Grid as flat array\n legal_actions: List[int] # [0, 1, 2] always\n done: bool # Episode finished?\n reward: float # +1 or 0\n # Extra fields for visualization\n ball_position: Tuple[int, int]\n paddle_position: int\n\n\n# ============================================================================\n# ENVIRONMENT - Server-side logic (following OpenEnv Environment pattern)\n# ============================================================================\n\nclass CatchEnvironment:\n \"\"\"\n Catch game following OpenEnv's Environment pattern.\n \n In production:\n โ€ข Runs in Docker container\n โ€ข Accessed via HTTPEnvClient\n โ€ข Exposed via FastAPI server\n \n For this demo:\n โ€ข We run it locally to see internals\n โ€ข But the structure is identical!\n \"\"\"\n \n def __init__(self, grid_size=5):\n self.grid_size = grid_size\n \n def reset(self) -> CatchObservation:\n \"\"\"Start new episode (implements Environment.reset()).\"\"\"\n self.ball_row = 0\n self.ball_col = random.randint(0, self.grid_size - 1)\n self.paddle_col = self.grid_size // 2\n self.done = False\n return self._make_observation()\n \n def step(self, action: int) -> CatchObservation:\n \"\"\"Execute action 
(implements Environment.step()).\n \n Args:\n action: 0=LEFT, 1=STAY, 2=RIGHT\n \"\"\"\n # Move paddle\n if action == 0 and self.paddle_col > 0:\n self.paddle_col -= 1\n elif action == 2 and self.paddle_col < self.grid_size - 1:\n self.paddle_col += 1\n \n # Move ball down\n self.ball_row += 1\n \n # Check if episode done\n if self.ball_row >= self.grid_size - 1:\n self.done = True\n reward = 1.0 if self.ball_col == self.paddle_col else 0.0\n else:\n reward = 0.0\n \n return self._make_observation(reward)\n \n def _make_observation(self, reward=0.0) -> CatchObservation:\n \"\"\"Create type-safe observation.\"\"\"\n # Flatten grid to vector (like real RL environments do)\n info_state = [0.0] * (self.grid_size * self.grid_size)\n ball_idx = self.ball_row * self.grid_size + self.ball_col\n paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col\n info_state[ball_idx] = 1.0 # Ball = 1.0\n info_state[paddle_idx] = 0.5 # Paddle = 0.5\n \n return CatchObservation(\n info_state=info_state,\n legal_actions=[0, 1, 2],\n done=self.done,\n reward=reward,\n ball_position=(self.ball_row, self.ball_col),\n paddle_position=self.paddle_col\n )\n \n def render(self):\n \"\"\"Visualize current state.\"\"\"\n for row in range(self.grid_size):\n line = \" \"\n for col in range(self.grid_size):\n if row == self.ball_row and col == self.ball_col:\n line += \"🔴 \"\n elif row == self.grid_size - 1 and col == self.paddle_col:\n line += \"🏓 \"\n else:\n line += \"⬜ \"\n print(line)\n\n\nprint(\"🎉 \" + \"=\"*64 + \" 🎉\")\nprint(\" ✅ Environment Created Following OpenEnv Pattern!\")\nprint(\"🎉 \" + \"=\"*64 + \" 🎉\")\nprint(\"\\n📋 What we just built:\")\nprint(\" • reset() → CatchObservation (type-safe!)\")\nprint(\" • step(action) → CatchObservation (type-safe!)\")\nprint(\" • render() → Visual display\")\nprint(\"\\n🚀 In production: This would run in Docker + FastAPI\")\nprint(\" But the structure is EXACTLY the same!\")\nprint(\"\\n💡 
This is your blueprint for creating ANY OpenEnv environment!\\n\")" }, { "cell_type": "markdown", @@ -754,23 +407,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# Create environment\n", - "env = CatchEnvironment()\n", - "obs = env.reset()\n", - "\n", - "print(\"=\"*60)\n", - "print(\" 🎮 INITIAL STATE\")\n", - "print(\"=\"*60 + \"\\n\")\n", - "env.render()\n", - "print(f\"\\n🔴 Ball at: column {obs.ball_position[1]}\")\n", - "print(f\"🏓 Paddle at: column {obs.paddle_position}\")\n", - "print(f\"\\n📊 Observation:\")\n", - "print(f\" • Legal actions: {obs.legal_actions}\")\n", - "print(f\" • Info state size: {len(obs.info_state)} (5×5 grid flattened)\")\n", - "print(f\" • Done: {obs.done}\")\n", - "print(f\" • Reward: {obs.reward}\")" - ] + "source": "# Create environment and start a new episode\nenv = CatchEnvironment()\nobs = env.reset()\n\nprint(\"🎮 \" + \"=\"*58 + \" 🎮\")\nprint(\" INITIAL GAME STATE\")\nprint(\"🎮 \" + \"=\"*58 + \" 🎮\\n\")\n\n# Visualize the game board\nenv.render()\n\n# Show game info\nprint(f\"\\n📍 Game Info:\")\nprint(f\" 🔴 Ball at: column {obs.ball_position[1]} (row {obs.ball_position[0]})\")\nprint(f\" 🏓 Paddle at: column {obs.paddle_position}\")\n\nprint(f\"\\n📊 Observation Details:\")\nprint(f\" • Legal actions: {obs.legal_actions} → [LEFT, STAY, RIGHT]\")\nprint(f\" • Info state size: {len(obs.info_state)} (5×5 grid flattened)\")\nprint(f\" • Episode done: {obs.done}\")\nprint(f\" • Current reward: {obs.reward}\")\n\nprint(\"\\n💡 The ball will fall down each step. 
Can your policy catch it?\")\nprint(\"=\"*62)" }, { "cell_type": "markdown", @@ -820,76 +457,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# ============================================================================\n", - "# POLICIES - Different AI strategies\n", - "# ============================================================================\n", - "\n", - "class RandomPolicy:\n", - " \"\"\"Baseline: Pure random guessing.\"\"\"\n", - " name = \"🎲 Random Guesser\"\n", - " \n", - " def select_action(self, obs: CatchObservation) -> int:\n", - " return random.choice(obs.legal_actions)\n", - "\n", - "\n", - "class AlwaysStayPolicy:\n", - " \"\"\"Bad strategy: Never moves.\"\"\"\n", - " name = \"🛑 Always Stay\"\n", - " \n", - " def select_action(self, obs: CatchObservation) -> int:\n", - " return 1 # STAY\n", - "\n", - "\n", - "class SmartPolicy:\n", - " \"\"\"Optimal: Move paddle toward ball.\"\"\"\n", - " name = \"🧠 Smart Heuristic\"\n", - " \n", - " def select_action(self, obs: CatchObservation) -> int:\n", - " ball_col = obs.ball_position[1]\n", - " paddle_col = obs.paddle_position\n", - " \n", - " if paddle_col < ball_col:\n", - " return 2 # Move RIGHT\n", - " elif paddle_col > ball_col:\n", - " return 0 # Move LEFT\n", - " else:\n", - " return 1 # STAY (already aligned)\n", - "\n", - "\n", - "class LearningPolicy:\n", - " \"\"\"Simulated RL: Epsilon-greedy exploration.\"\"\"\n", - " name = \"📈 Learning Agent\"\n", - " \n", - " def __init__(self):\n", - " self.steps = 0\n", - " \n", - " def select_action(self, obs: CatchObservation) -> int:\n", - " self.steps += 1\n", - " \n", - " # Decay exploration rate over time\n", - " epsilon = max(0.1, 1.0 - (self.steps / 100))\n", - " \n", - " if random.random() < epsilon:\n", - " # Explore: random action\n", - " return random.choice(obs.legal_actions)\n", - " else:\n", - " # Exploit: use smart strategy\n", - " ball_col = obs.ball_position[1]\n", - " paddle_col = 
obs.paddle_position\n", - " if paddle_col < ball_col:\n", - " return 2\n", - " elif paddle_col > ball_col:\n", - " return 0\n", - " else:\n", - " return 1\n", - "\n", - "\n", - "print(\"✅ 4 Policies created!\\n\")\n", - "policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\n", - "for i, policy in enumerate(policies, 1):\n", - " print(f\" {i}. {policy.name}\")" - ] + "source": "# ============================================================================\n# POLICIES - Different AI strategies\n# ============================================================================\n\nclass RandomPolicy:\n \"\"\"Baseline: Pure random guessing.\"\"\"\n name = \"🎲 Random Guesser\"\n \n def select_action(self, obs: CatchObservation) -> int:\n return random.choice(obs.legal_actions)\n\n\nclass AlwaysStayPolicy:\n \"\"\"Bad strategy: Never moves.\"\"\"\n name = \"🛑 Always Stay\"\n \n def select_action(self, obs: CatchObservation) -> int:\n return 1 # STAY\n\n\nclass SmartPolicy:\n \"\"\"Optimal: Move paddle toward ball.\"\"\"\n name = \"🧠 Smart Heuristic\"\n \n def select_action(self, obs: CatchObservation) -> int:\n ball_col = obs.ball_position[1]\n paddle_col = obs.paddle_position\n \n if paddle_col < ball_col:\n return 2 # Move RIGHT\n elif paddle_col > ball_col:\n return 0 # Move LEFT\n else:\n return 1 # STAY (already aligned)\n\n\nclass LearningPolicy:\n \"\"\"Simulated RL: Epsilon-greedy exploration.\"\"\"\n name = \"📈 Learning Agent\"\n \n def __init__(self):\n self.steps = 0\n \n def select_action(self, obs: CatchObservation) -> int:\n self.steps += 1\n \n # Decay exploration rate over time\n epsilon = max(0.1, 1.0 - (self.steps / 100))\n \n if random.random() < epsilon:\n # Explore: random action\n return random.choice(obs.legal_actions)\n else:\n # Exploit: use smart strategy\n ball_col = obs.ball_position[1]\n paddle_col = obs.paddle_position\n if paddle_col < ball_col:\n return 2\n elif paddle_col > ball_col:\n return 0\n 
else:\n return 1\n\n\nprint(\"🤖 \" + \"=\"*64 + \" 🤖\")\nprint(\" ✅ 4 Policies Created!\")\nprint(\"🤖 \" + \"=\"*64 + \" 🤖\\n\")\n\npolicies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\nfor i, policy in enumerate(policies, 1):\n print(f\" {i}. {policy.name}\")\n\nprint(\"\\n💡 Each policy represents a different approach to solving the game!\")\nprint(\" Let's see who performs best! 🏆\\n\")" }, { "cell_type": "markdown", @@ -993,55 +561,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "def evaluate_policies(num_episodes=50):\n", - " \"\"\"Compare all policies over many episodes.\"\"\"\n", - " policies = [\n", - " RandomPolicy(),\n", - " AlwaysStayPolicy(),\n", - " SmartPolicy(),\n", - " LearningPolicy(),\n", - " ]\n", - " \n", - " print(\"\\n\" + \"=\"*70)\n", - " print(f\" 🏆 POLICY SHOWDOWN - {num_episodes} Episodes Each\")\n", - " print(\"=\"*70 + \"\\n\")\n", - " \n", - " results = []\n", - " for policy in policies:\n", - " print(f\"Testing {policy.name}...\", end=\" \")\n", - " env = CatchEnvironment()\n", - " successes = sum(run_episode(env, policy, visualize=False) \n", - " for _ in range(num_episodes))\n", - " success_rate = (successes / num_episodes) * 100\n", - " results.append((policy.name, success_rate, successes))\n", - " print(f\"✓\")\n", - " \n", - " print(\"\\n\" + \"=\"*70)\n", - " print(\" 📊 RESULTS\")\n", - " print(\"=\"*70 + \"\\n\")\n", - " \n", - " # Sort by success rate\n", - " results.sort(key=lambda x: x[1], reverse=True)\n", - " \n", - " for name, rate, successes in results:\n", - " bar = \"█\" * int(rate / 2)\n", - " print(f\"{name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})\")\n", - " \n", - " print(\"\\n\" + \"=\"*70)\n", - " print(\"\\n💡 Key Insights:\")\n", - " print(\" • Random (~20%): Baseline - pure luck\")\n", - " print(\" • Always Stay (~20%): Bad - only works if ball in center\")\n", - " print(\" • Smart (100%): Optimal - 
always catches\")\n", - " print(\" • Learning (~85%): Improves over time with experience\")\n", - " print(\"\\n🎓 This is RL in action:\")\n", - " print(\" 1. Start with exploration (random)\")\n", - " print(\" 2. Learn from rewards\")\n", - " print(\" 3. Converge to optimal behavior\\n\")\n", - "\n", - "# Run the competition!\n", - "evaluate_policies(num_episodes=50)" - ] + "source": "def evaluate_policies(num_episodes=50):\n \"\"\"Compare all policies over many episodes.\"\"\"\n policies = [\n RandomPolicy(),\n AlwaysStayPolicy(),\n SmartPolicy(),\n LearningPolicy(),\n ]\n \n print(\"\\n🏆 \" + \"=\"*66 + \" 🏆\")\n print(f\" POLICY SHOWDOWN - {num_episodes} Episodes Each\")\n print(\"🏆 \" + \"=\"*66 + \" 🏆\\n\")\n \n results = []\n for policy in policies:\n print(f\"⚡ Testing {policy.name}...\", end=\" \")\n env = CatchEnvironment()\n successes = sum(run_episode(env, policy, visualize=False) \n for _ in range(num_episodes))\n success_rate = (successes / num_episodes) * 100\n results.append((policy.name, success_rate, successes))\n print(f\"✓ Done!\")\n \n print(\"\\n\" + \"=\"*70)\n print(\" 📊 FINAL RESULTS\")\n print(\"=\"*70 + \"\\n\")\n \n # Sort by success rate (descending)\n results.sort(key=lambda x: x[1], reverse=True)\n \n # Award medals to top 3\n medals = [\"🥇\", \"🥈\", \"🥉\", \" \"]\n \n for i, (name, rate, successes) in enumerate(results):\n medal = medals[i]\n bar = \"█\" * int(rate / 2)\n print(f\"{medal} {name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})\")\n \n print(\"\\n\" + \"=\"*70)\n print(\"\\n✨ Key Insights:\")\n print(\" • Random (~20%): Baseline - pure luck 🎲\")\n print(\" • Always Stay (~20%): Bad strategy - stays center 🛑\")\n print(\" • Smart (100%): Optimal - perfect play! 🧠\")\n print(\" • Learning (~85%): Improves over time 📈\")\n print(\"\\n🎓 This is Reinforcement Learning in action:\")\n print(\" 1. 
Start with exploration (trying random things)\")\n print(\" 2. Learn from rewards (what works, what doesn't)\")\n print(\" 3. Converge to optimal behavior (smart strategy)\")\n print(\"\\n🎯 The Learning Agent gets smarter with every episode!\\n\")\n\n# Run the epic competition!\nprint(\"🎮 Starting the showdown...\")\nevaluate_policies(num_episodes=50)" }, { "cell_type": "markdown", @@ -1457,60 +977,12 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "## 📚 Resources\n", - "\n", - "
\n", - "\n", - "**🔗 Links**\n", - "\n", - "- **OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", - "- **OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", - "- **FastAPI Docs**: https://fastapi.tiangolo.com/\n", - "- **Docker Guide**: https://docs.docker.com/get-started/\n", - "\n", - "**📖 Documentation**\n", - "\n", - "- Environment creation guide: `src/envs/README.md`\n", - "- OpenSpiel integration: `src/envs/openspiel_env/README.md`\n", - "- Example scripts: `examples/`\n", - "\n", - "**🎓 Community**\n", - "\n", - "- Supported by: Meta PyTorch, Hugging Face, Unsloth AI, and more\n", - "- License: BSD 3-Clause\n", - "- Contributions welcome!\n", - "\n", - "
" - ] + "source": "## 📚 Resources\n\n
\n\n### 🔗 Essential Links\n\n- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n- **🔥 PyTorch**: https://pytorch.org/\n\n### 📖 Documentation Deep Dives\n\n- **Environment Creation Guide**: `src/envs/README.md`\n- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n- **Example Scripts**: `examples/`\n- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n\n### 🎓 Community & Support\n\n**Supported by amazing organizations:**\n- 🔥 Meta PyTorch\n- 🤗 Hugging Face\n- ⚡ Unsloth AI\n- 🌟 Reflection AI\n- 🚀 And many more!\n\n**License**: BSD 3-Clause (very permissive!)\n\n**Contributions**: Always welcome! Check out the issues tab.\n\n
\n\n---\n\n### 🌈 What's Next?\n\n1. ⭐ **Star the repo** to show support and stay updated\n2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n3. 🎮 **Explore** other OpenSpiel games\n4. 🛠️ **Build** your own environment integration\n5. 💬 **Share** what you build with the community!" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "---\n", - "\n", - "
\n", - "\n", - "# 🎉 Congratulations!\n", - "\n", - "### You're now an OpenEnv expert!\n", - "\n", - "You understand:\n", - "- ✅ How RL works\n", - "- ✅ Why OpenEnv matters\n", - "- ✅ How to use existing environments\n", - "- ✅ How to create new integrations\n", - "- ✅ How to deploy to production\n", - "\n", - "---\n", - "\n", - "### Now go build something amazing! 🚀\n", - "\n", - "**Welcome to the future of RL.**\n", - "\n", - "
" - ] + "source": "---\n\n
\n\n# 🎉 Congratulations! You Did It! 🎉\n\n### You're now an OpenEnv expert!\n\n
\n\n## ✅ What You've Mastered:\n\n**🧠 Concepts**\n- How RL works (the observe-act-reward loop)\n- Why OpenEnv matters (production-ready RL)\n- How to use existing environments\n\n**🛠️ Practical Skills**\n- Creating new integrations\n- Building type-safe environments\n- Deploying to production\n\n**🎯 Real Experience**\n- Built a complete RL environment\n- Tested multiple policies\n- Watched learning happen in real-time!\n\n---\n\n### Now go build something amazing! 🚀\n\n**Welcome to the future of RL with PyTorch & OpenEnv**\n\n
\n\n[![Star on GitHub](https://img.shields.io/badge/⭐_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)\n\n
\n\n---\n\n
\n\n## 🌟 Want to Learn More?\n\n- 📖 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)\n- 🎮 Try the other example games\n- 💬 Join the community discussions\n- 🛠️ Build your own integration\n- 🚀 Deploy to production\n- ⭐ Star the repo to stay updated!\n\n**Happy coding! 🎊**\n\n
" } ], "metadata": { @@ -1534,4 +1006,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file From c756d8a3c82acdaea0f9a5cec86bd5a583da52ed Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:48:25 -0700 Subject: [PATCH 06/19] Add TOC --- examples/OpenEnv_Tutorial.ipynb | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index e534fcc..a8635bc 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -10,6 +10,11 @@ "metadata": {}, "source": "## ๐Ÿ“‹ What You'll Learn\n\n\n\n\n\n\n\n\n\n\n
\n\n**🎯 Parts 1-2: The Fundamentals**\n- ⚡ RL in 60 seconds\n- 🤔 Why existing solutions fall short\n- 💡 The OpenEnv solution\n\n\n\n**🏗️ Parts 3-5: The Architecture**\n- 🔧 How OpenEnv works\n- 🔍 Exploring real code\n- 🎮 OpenSpiel integration example\n\n
\n\n**🎮 Parts 6-8: Hands-On Demo**\n- 🔨 Build a game environment\n- 🤖 Test 4 different policies\n- 👀 Watch learning happen live\n\n\n\n**🔧 Parts 9-10: Going Further**\n- 🚀 Use real OpenSpiel\n- ✨ Create your own integration\n- 🌐 Deploy to production\n\n
\n\n> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n> \n> ⏱️ **Time**: ~30 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge" }, + { + "cell_type": "markdown", + "source": "---\n\n\n# Part 1: RL in 60 Seconds ⏱️\n\n
\n\n**Reinforcement Learning is simpler than you think.**\n\nIt's just a loop:\n\n```\nwhile not done:\n observation = environment.observe()\n action = policy.choose(observation)\n reward = environment.step(action)\n policy.learn(reward)\n```\n\nThat's it. That's RL.\n\n
\n\nLet's see it in action:", + "metadata": {} + }, { "cell_type": "markdown", "metadata": {}, @@ -49,20 +54,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "
\n", - "\n", - "**🤔 The Problem**: Our random guesser never improves because it doesn't use the rewards!\n", - "\n", - "Real RL agents:\n", - "- 📊 Track which actions lead to rewards\n", - "- 🎯 Choose better actions over time\n", - "- 🔄 Balance exploration (trying new things) vs exploitation (using what works)\n", - "\n", - "We'll build this later!\n", - "\n", - "
" - ] + "source": "---\n\n\n# Part 2: The Problem with Traditional RL 😤\n\n
\n\n## 🤔 Why Can't We Just Use OpenAI Gym?\n\nGood question! Gym is great for research, but production needs more...\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
ChallengeTraditional ApproachOpenEnv Solution
Type Safety❌ obs[0][3] - what is this?✅ obs.info_state - IDE knows!
Isolation❌ Same process (can crash your training)✅ Docker containers (fully isolated)
Deployment❌ \"Works on my machine\" 🤷✅ Same container everywhere 🐳
Scaling❌ Hard to distribute✅ Deploy to Kubernetes ☸️
Language❌ Python only✅ Any language (HTTP API) 🌐
Debugging❌ Cryptic numpy errors✅ Clear type errors 🐛
\n\n
\n\n## 💡 The OpenEnv Philosophy\n\n**\"RL environments should be like microservices\"**\n\nThink of it like this: You don't run your database in the same process as your web server, right? Same principle!\n\n- 🔒 **Isolated**: Run in containers (security + stability)\n- 🌐 **Standard**: HTTP API, works everywhere\n- 📦 **Versioned**: Docker images (reproducibility!)\n- 🚀 **Scalable**: Deploy to cloud with one command\n- 🛡️ **Type-safe**: Catch bugs before they happen\n- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n\n
" }, { "cell_type": "markdown", From af5de22232159fd93cad5a29c206599f4d06445f Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:49:04 -0700 Subject: [PATCH 07/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 92 ++------------------------------- 1 file changed, 4 insertions(+), 88 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index a8635bc..4d6378f 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -64,47 +64,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "### The Architecture\n", - "\n", - "```\n", - "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", - "โ”‚ YOUR TRAINING CODE โ”‚\n", - "โ”‚ โ”‚\n", - "โ”‚ env = OpenSpielEnv(...) โ† Import the client โ”‚\n", - "โ”‚ result = env.reset() โ† Type-safe! โ”‚\n", - "โ”‚ result = env.step(action) โ† Type-safe! 
โ”‚\n", - "โ”‚ โ”‚\n", - "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", - " โ”‚\n", - " โ”‚ HTTP/JSON (Language-Agnostic)\n", - " โ”‚ POST /reset, POST /step, GET /state\n", - " โ”‚\n", - "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", - "โ”‚ DOCKER CONTAINER โ”‚\n", - "โ”‚ โ”‚\n", - "โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", - "โ”‚ โ”‚ FastAPI Server โ”‚ โ”‚\n", - "โ”‚ โ”‚ โ””โ”€ Environment (reset, step, state) โ”‚ โ”‚\n", - "โ”‚ โ”‚ โ””โ”€ Your Game/Simulation Logic โ”‚ โ”‚\n", - "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", - "โ”‚ โ”‚\n", - "โ”‚ Isolated โ€ข Reproducible โ€ข Secure โ”‚\n", - "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", - "```\n", - "\n", - "
\n", - "\n", - "**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n", - "\n", - "```python\n", - "env.reset() # Under the hood: HTTP POST to /reset\n", - "env.step(...) # Under the hood: HTTP POST to /step\n", - "env.state() # Under the hood: HTTP GET to /state\n", - "```\n", - "\n", - "
" - ] + "source": "---\n\n\n# Part 3: Setup 🛠️\n\n
\n\n**Running in Colab?** This cell will clone OpenEnv and install dependencies automatically.\n\n**Running locally?** Make sure you're in the OpenEnv directory.\n\n
" }, { "cell_type": "markdown", @@ -128,7 +88,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "# Detect environment\ntry:\n import google.colab\n IN_COLAB = True\n print(\"🌐 Running in Google Colab - Perfect!\")\nexcept ImportError:\n IN_COLAB = False\n print(\"💻 Running locally - Nice!\")\n\nif IN_COLAB:\n print(\"\\n📦 Cloning OpenEnv repository...\")\n !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n %cd OpenEnv\n \n print(\"📚 Installing dependencies (this takes ~10 seconds)...\")\n !pip install -q fastapi uvicorn requests\n \n import sys\n sys.path.insert(0, './src')\n print(\"\\n✅ Setup complete! Everything is ready to go! 🎉\")\nelse:\n import sys\n from pathlib import Path\n sys.path.insert(0, str(Path.cwd().parent / 'src'))\n print(\"✅ Using local OpenEnv installation\")\n\nprint(\"\\n🚀 Ready to explore OpenEnv and build amazing things!\")\nprint(\"💡 Tip: Run cells top-to-bottom for the best experience.\\n\")" + "source": "---\n\n\n# Part 4: The OpenEnv Pattern 🏗️\n\n
\n\n## Every OpenEnv Environment Has 3 Components:\n\n```\nsrc/envs/your_env/\n├── 📝 models.py ← Type-safe contracts\n│ (Action, Observation, State)\n│\n├── 📱 client.py ← What YOU import\n│ (HTTPEnvClient implementation)\n│\n└── 🖥️ server/\n ├── environment.py ← Game/simulation logic\n ├── app.py ← FastAPI server\n └── Dockerfile ← Container definition\n```\n\n
\n\nLet's explore the actual OpenEnv code to see how this works:" }, { "cell_type": "markdown", @@ -166,7 +126,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "# Import OpenEnv's core abstractions\nfrom core.env_server import Environment, Action, Observation, State\nfrom core.http_env_client import HTTPEnvClient\n\nprint(\"=\"*70)\nprint(\" 🧩 OPENENV CORE ABSTRACTIONS\")\nprint(\"=\"*70)\n\nprint(\"\"\"\n🖥️ SERVER SIDE (runs in Docker):\n\n class Environment(ABC):\n '''Base class for all environment implementations'''\n \n @abstractmethod\n def reset(self) -> Observation:\n '''Start new episode'''\n \n @abstractmethod\n def step(self, action: Action) -> Observation:\n '''Execute action, return observation'''\n \n @property\n def state(self) -> State:\n '''Get episode metadata'''\n\n📱 CLIENT SIDE (your training code):\n\n class HTTPEnvClient(ABC):\n '''Base class for HTTP clients'''\n \n def reset(self) -> StepResult:\n # HTTP POST /reset\n \n def step(self, action) -> StepResult:\n # HTTP POST /step\n \n def state(self) -> State:\n # HTTP GET /state\n\"\"\")\n\nprint(\"=\"*70)\nprint(\"\\n✨ Same interface on both sides - communication via HTTP!\")\nprint(\"🎯 You focus on RL, OpenEnv handles the infrastructure.\\n\")" + "source": "---\n\n\n# Part 5: Example Integration - OpenSpiel 🎮\n\n
\n\n## What is OpenSpiel?\n\n**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n\n## OpenEnv's Integration\n\nWe've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n\n\n\n\n\n\n
\n\n**🎯 Single-Player**\n1. **Catch** - Catch falling ball\n2. **Cliff Walking** - Navigate grid\n3. **2048** - Tile puzzle\n4. **Blackjack** - Card game\n\n\n\n**👥 Multi-Player**\n5. **Tic-Tac-Toe** - Classic 3×3\n6. **Kuhn Poker** - Imperfect info poker\n\n
\n\nThis shows how OpenEnv can wrap **any** existing RL library!\n\n
" }, { "cell_type": "markdown", @@ -277,51 +237,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "from envs.openspiel_env.client import OpenSpielEnv\n", - "\n", - "print(\"=\"*70)\n", - "print(\" 🔌 HOW OPENENV WRAPS OPENSPIEL\")\n", - "print(\"=\"*70)\n", - "\n", - "print(\"\"\"\n", - "class OpenSpielEnv(HTTPEnvClient[OpenSpielAction, OpenSpielObservation]):\n", - " \n", - " def _step_payload(self, action: OpenSpielAction) -> dict:\n", - " '''Convert typed action to JSON for HTTP'''\n", - " return {\n", - " \"action_id\": action.action_id,\n", - " \"game_name\": action.game_name,\n", - " }\n", - " \n", - " def _parse_result(self, payload: dict) -> StepResult:\n", - " '''Parse HTTP JSON response into typed observation'''\n", - " return StepResult(\n", - " observation=OpenSpielObservation(...),\n", - " reward=payload['reward'],\n", - " done=payload['done']\n", - " )\n", - "\n", - "\"\"\")\n", - "\n", - "print(\"─\" * 70)\n", - "print(\"\\n✨ Usage (works for ALL OpenEnv environments):\")\n", - "print(\"\"\"\n", - " env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", - " \n", - " result = env.reset()\n", - " # Returns StepResult[OpenSpielObservation] - Type safe!\n", - " \n", - " result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", - " # Type checker knows this is valid!\n", - " \n", - " state = env.state()\n", - " # Returns OpenSpielState\n", - "\"\"\")\n", - "\n", - "print(\"─\" * 70)\n", - "print(\"\\n🎯 This pattern works for ANY environment you want to wrap!\\n\")" - ] + "source": "---\n\n\n
\n\n# 🎮 Part 6: Interactive Demo\n\n### Now let's BUILD something!\n\nWe'll create a **Catch game** following OpenEnv patterns,
\nthen watch **4 different AI policies** compete for the championship! 🏆\n\n
\n\n**Get ready for:**\n- ⚡ Live gameplay visualization\n- 🤖 AI policy showdown\n- 📊 Real-time learning metrics\n- 🎯 Production-ready patterns\n\n
" }, { "cell_type": "markdown", From 520983c2e899e4c5e9c20bfde0b8eef1117e6a1e Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:56:09 -0700 Subject: [PATCH 08/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 288 ++------------------------------ 1 file changed, 11 insertions(+), 277 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 4d6378f..3f52343 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -56,6 +56,11 @@ "metadata": {}, "source": "---\n\n\n# Part 2: The Problem with Traditional RL ๐Ÿ˜ค\n\n
\n\n## 🤔 Why Can't We Just Use OpenAI Gym?\n\nGood question! Gym is great for research, but production needs more...\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
ChallengeTraditional ApproachOpenEnv Solution
Type Safety❌ obs[0][3] - what is this?✅ obs.info_state - IDE knows!
Isolation❌ Same process (can crash your training)✅ Docker containers (fully isolated)
Deployment❌ \"Works on my machine\" 🤷✅ Same container everywhere 🐳
Scaling❌ Hard to distribute✅ Deploy to Kubernetes ☸️
Language❌ Python only✅ Any language (HTTP API) 🌐
Debugging❌ Cryptic numpy errors✅ Clear type errors 🐛
\n\n
\n\n## 💡 The OpenEnv Philosophy\n\n**\"RL environments should be like microservices\"**\n\nThink of it like this: You don't run your database in the same process as your web server, right? Same principle!\n\n- 🔒 **Isolated**: Run in containers (security + stability)\n- 🌐 **Standard**: HTTP API, works everywhere\n- 📦 **Versioned**: Docker images (reproducibility!)\n- 🚀 **Scalable**: Deploy to cloud with one command\n- 🛡️ **Type-safe**: Catch bugs before they happen\n- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n\n
" }, + { + "cell_type": "markdown", + "source": "### The Architecture\n\n```\nโ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\nโ”‚ YOUR TRAINING CODE โ”‚\nโ”‚ โ”‚\nโ”‚ env = OpenSpielEnv(...) โ† Import the client โ”‚\nโ”‚ result = env.reset() โ† Type-safe! โ”‚\nโ”‚ result = env.step(action) โ† Type-safe! โ”‚\nโ”‚ โ”‚\nโ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n โ”‚\n โ”‚ HTTP/JSON (Language-Agnostic)\n โ”‚ POST /reset, POST /step, GET /state\n โ”‚\nโ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\nโ”‚ DOCKER CONTAINER โ”‚\nโ”‚ โ”‚\nโ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\nโ”‚ โ”‚ FastAPI Server โ”‚ โ”‚\nโ”‚ โ”‚ โ””โ”€ Environment (reset, step, state) โ”‚ โ”‚\nโ”‚ โ”‚ โ””โ”€ Your Game/Simulation Logic โ”‚ โ”‚\nโ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\nโ”‚ โ”‚\nโ”‚ Isolated โ€ข Reproducible โ€ข Secure โ”‚\nโ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n```\n\n
\n\n**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n\n```python\nenv.reset() # Under the hood: HTTP POST to /reset\nenv.step(...) # Under the hood: HTTP POST to /step\nenv.state() # Under the hood: HTTP GET to /state\n```\n\nThe magic? OpenEnv handles all the plumbing. You focus on RL! ✨\n\n
", + "metadata": {} + }, { "cell_type": "markdown", "metadata": {}, @@ -315,7 +320,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "# Create environment and start a new episode\nenv = CatchEnvironment()\nobs = env.reset()\n\nprint(\"🎮 \" + \"=\"*58 + \" 🎮\")\nprint(\" INITIAL GAME STATE\")\nprint(\"🎮 \" + \"=\"*58 + \" 🎮\\n\")\n\n# Visualize the game board\nenv.render()\n\n# Show game info\nprint(f\"\\n📍 Game Info:\")\nprint(f\" 🔴 Ball at: column {obs.ball_position[1]} (row {obs.ball_position[0]})\")\nprint(f\" 🏓 Paddle at: column {obs.paddle_position}\")\n\nprint(f\"\\n📊 Observation Details:\")\nprint(f\" • Legal actions: {obs.legal_actions} → [LEFT, STAY, RIGHT]\")\nprint(f\" • Info state size: {len(obs.info_state)} (5×5 grid flattened)\")\nprint(f\" • Episode done: {obs.done}\")\nprint(f\" • Current reward: {obs.reward}\")\n\nprint(\"\\n💡 The ball will fall down each step. Can your policy catch it?\")\nprint(\"=\"*62)" + "source": "---\n\n\n# Part 7: Four Policies 🤖\n\n
\n\n## Let's test 4 different AI strategies:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
PolicyStrategyExpected Performance
🎲 RandomPick random action every step~20% (pure luck)
🛑 Always StayNever move, hope ball lands in center~20% (terrible!)
🧠 SmartMove paddle toward ball100% (optimal!)
📈 LearningStart random, learn smart strategy~85% (improves over time)
\n\n
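The first three rows of the "Expected Performance" column can be checked with an exact calculation instead of a simulation. This sketch assumes the demo's 5×5 board: the ball starts in a uniformly random column of the top row, the paddle starts in the centre column of the bottom row, and the paddle makes one clamped LEFT/STAY/RIGHT move per row the ball falls, so 4 moves in total. Under these assumptions, any policy that ignores the ball catches exactly 1/5 of balls, because its final column is independent of a uniform ball column. The Smart policy always catches, because the ball is never more than 2 columns away. The learning policy is left out because its rate depends on training. All function names are illustrative.

```python
from fractions import Fraction

COLS, MOVES = 5, 4                 # assumed 5x5 board: the ball falls 4 rows
LEFT, STAY, RIGHT = -1, 0, 1       # column offsets for each move


def clamp(col):
    return min(max(col, 0), COLS - 1)


def blind_paddle_distribution(move_probs):
    """Exact final-column distribution for a policy that ignores the ball."""
    dist = {COLS // 2: Fraction(1)}
    for _ in range(MOVES):
        nxt = {}
        for col, p in dist.items():
            for move, q in move_probs.items():
                new_col = clamp(col + move)
                nxt[new_col] = nxt.get(new_col, Fraction(0)) + p * q
        dist = nxt
    return dist


def catch_rate_blind(move_probs):
    # The ball column is uniform over COLS and independent of the paddle.
    dist = blind_paddle_distribution(move_probs)
    return sum(dist.get(c, Fraction(0)) * Fraction(1, COLS) for c in range(COLS))


def catch_rate_smart():
    caught = 0
    for ball in range(COLS):
        paddle = COLS // 2
        for _ in range(MOVES):
            paddle = clamp(paddle + (ball > paddle) - (ball < paddle))  # step toward ball
        caught += paddle == ball
    return Fraction(caught, COLS)


uniform = {LEFT: Fraction(1, 3), STAY: Fraction(1, 3), RIGHT: Fraction(1, 3)}
random_rate = catch_rate_blind(uniform)
stay_rate = catch_rate_blind({STAY: Fraction(1)})
smart_rate = catch_rate_smart()
print(f"Random: {random_rate}  Always Stay: {stay_rate}  Smart: {smart_rate}")
# Random: 1/5  Always Stay: 1/5  Smart: 1
```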
" }, { "cell_type": "markdown", @@ -436,18 +441,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "
\n", - "\n", - "**💡 Try changing the policy!**\n", - "\n", - "Replace `SmartPolicy()` with:\n", - "- `RandomPolicy()` - Watch it fail!\n", - "- `AlwaysStayPolicy()` - Usually fails\n", - "- `LearningPolicy()` - Gets better over time\n", - "\n", - "
\n", - " ] + "source": "---\n\n\n# Part 8: Policy Competition! 🏆\n\n
\n\nLet's run **50 episodes** for each policy and see who wins!\n\n
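This is a self-contained sketch of the competition. It uses the same simplified Catch rules as the demo: a 5×5 board, the ball dropping one row per step, the paddle starting in the centre, and actions 0/1/2 meaning LEFT/STAY/RIGHT. It does not use the notebook's `CatchEnvironment` or policy classes. `play_episode`, `compete` and the `act`/`learn` interface are hypothetical names. The learning policy here is plain epsilon-greedy Monte Carlo control over three coarse states (ball left of, above, or right of the paddle). It is swapped in for illustration and is not necessarily what the notebook's `LearningPolicy` does. With only 50 episodes, its score will vary with the seed.

```python
import random

ROWS, COLS = 5, 5                  # assumed 5x5 board, as in the demo
LEFT, STAY, RIGHT = 0, 1, 2        # same [LEFT, STAY, RIGHT] order as legal_actions
ACTIONS = (LEFT, STAY, RIGHT)


class FixedPolicy:
    """Base class for policies that never update."""

    def learn(self, trajectory, reward):
        pass


class RandomPolicy(FixedPolicy):
    def act(self, state, rng):
        return rng.choice(ACTIONS)


class AlwaysStayPolicy(FixedPolicy):
    def act(self, state, rng):
        return STAY


class SmartPolicy(FixedPolicy):
    def act(self, state, rng):
        return state + 1           # state -1/0/+1 maps to LEFT/STAY/RIGHT


class LearningPolicy:
    """Epsilon-greedy Monte Carlo control over (state, action) values."""

    def __init__(self, epsilon=1.0, decay=0.9, min_epsilon=0.05):
        self.q = {(s, a): 0.0 for s in (-1, 0, 1) for a in ACTIONS}
        self.n = dict.fromkeys(self.q, 0)
        self.epsilon, self.decay, self.min_epsilon = epsilon, decay, min_epsilon

    def act(self, state, rng):
        if rng.random() < self.epsilon:
            return rng.choice(ACTIONS)                          # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])   # exploit

    def learn(self, trajectory, reward):
        for key in trajectory:     # credit every (state, action) with the final reward
            self.n[key] += 1
            self.q[key] += (reward - self.q[key]) / self.n[key]   # running mean
        self.epsilon = max(self.min_epsilon, self.epsilon * self.decay)


def play_episode(policy, rng):
    """Play one episode; return +1 if the paddle ends under the ball, else -1."""
    ball_col, paddle = rng.randrange(COLS), COLS // 2
    trajectory = []
    for _ in range(ROWS - 1):      # the ball falls one row per step
        state = (ball_col > paddle) - (ball_col < paddle)   # ball left/above/right
        action = policy.act(state, rng)
        trajectory.append((state, action))
        paddle = min(max(paddle + action - 1, 0), COLS - 1)
    reward = 1 if paddle == ball_col else -1
    policy.learn(trajectory, reward)
    return reward


def compete(episodes=50, seed=0):
    policies = {"Random": RandomPolicy(), "Always Stay": AlwaysStayPolicy(),
                "Smart": SmartPolicy(), "Learning": LearningPolicy()}
    rates = {}
    for name, policy in policies.items():
        rng = random.Random(seed)  # each policy starts from the same seed
        wins = sum(play_episode(policy, rng) == 1 for _ in range(episodes))
        rates[name] = wins / episodes
    return rates


for name, rate in compete().items():
    print(f"{name:<12} {rate:6.0%}")
```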
" }, { "cell_type": "markdown", @@ -474,238 +468,17 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "---\n", - "\n", - "
\n", - "\n", - "# 🎉 Congratulations!\n", - "\n", - "### You just built and tested a complete RL environment!\n", - "\n", - "But we did it **the OpenEnv way**: type-safe, structured, production-ready.\n", - "\n", - "
\n", - " ] + "source": "---\n\n\n# Part 9: Using Real OpenSpiel 🎮\n\n
\n\n## What We Just Built vs Production OpenSpiel\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
ComponentOur DemoOpenEnv + OpenSpiel
EnvironmentLocal Python classDocker container
CommunicationDirect function callsHTTP/JSON
ClientDirect accessHTTPEnvClient
Type Safety✅ Dataclasses✅ Dataclasses
APIreset(), step()reset(), step() (same!)
\n\n**🎯 Same structure, production features!**\n\n
\n\n### Using OpenSpiel Integration:\n\n```python\n# 1. Install OpenSpiel\n!pip install open_spiel\n\n# 2. Import OpenEnv's integration\nfrom envs.openspiel_env import OpenSpielEnv, OpenSpielAction\n\n# 3. Connect to server (HTTP!)\nenv = OpenSpielEnv(base_url=\"http://localhost:8000\")\n\n# 4. Same API you just learned!\nresult = env.reset()\nresult = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\nstate = env.state()\n\n# 5. Switch games by changing game_name:\nresult = env.step(OpenSpielAction(action_id=4, game_name=\"tic_tac_toe\"))\n```\n\n
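Every game goes through the same `reset()`/`step()` calls, so one episode loop can drive any of them. This sketch assumes two things. First, `reset()` and `step()` return a result with `observation`, `reward` and `done`, like the client's `StepResult` in Step 3 of Part 10. Second, the observation exposes `legal_actions`, as the Catch demo's observation does. `run_episode`, `make_action` and the `FakeEnv` stand-in are hypothetical. With a live server, you would presumably pass the real `OpenSpielEnv` and a `make_action` that wraps an id in `OpenSpielAction`.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class FakeObservation:
    legal_actions: List[int] = field(default_factory=lambda: [0, 1, 2])


@dataclass
class FakeResult:
    observation: FakeObservation
    reward: float = 0.0
    done: bool = False


class FakeEnv:
    """Stand-in with the same reset()/step() shape; the episode ends after 3 steps."""

    def reset(self) -> FakeResult:
        self.steps = 0
        return FakeResult(FakeObservation())

    def step(self, action: Any) -> FakeResult:
        self.steps += 1
        done = self.steps >= 3
        return FakeResult(FakeObservation(), reward=1.0 if done else 0.0, done=done)


def run_episode(env, make_action: Callable[[int], Any], rng=random, max_steps=1000):
    """Play one episode, choosing a random legal action each step; return total reward."""
    result = env.reset()
    total = 0.0
    for _ in range(max_steps):
        if result.done:
            break
        action_id = rng.choice(result.observation.legal_actions)
        result = env.step(make_action(action_id))
        total += result.reward or 0.0    # treat a missing (None) reward as 0
    return total


print(run_episode(FakeEnv(), make_action=lambda a: a))   # 1.0
```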
\n\n**🎮 6 Games Available:**\n\n1. `\"catch\"` - What we just built!\n2. `\"tic_tac_toe\"` - Classic 3×3\n3. `\"kuhn_poker\"` - Imperfect information poker\n4. `\"cliff_walking\"` - Grid navigation\n5. `\"2048\"` - Tile puzzle\n6. `\"blackjack\"` - Card game\n\n**All use the exact same interface!**\n\n
" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "---\n", - "\n", - "# Part 9: Using Real OpenSpiel 🎮\n", - "\n", - "
\n", - "\n", - "## What We Just Built vs Production OpenSpiel\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
ComponentOur DemoOpenEnv + OpenSpiel
EnvironmentLocal Python classDocker container
CommunicationDirect function callsHTTP/JSON
ClientDirect accessHTTPEnvClient
Type Safety✅ Dataclasses✅ Dataclasses
APIreset(), step()reset(), step() (same!)
\n", - "\n", - "**🎯 Same structure, production features!**\n", - "\n", - "
\n", - "\n", - "### Using OpenSpiel Integration:\n", - "\n", - "```python\n", - "# 1. Install OpenSpiel\n", - "!pip install open_spiel\n", - "\n", - "# 2. Import OpenEnv's integration\n", - "from envs.openspiel_env import OpenSpielEnv, OpenSpielAction\n", - "\n", - "# 3. Connect to server (HTTP!)\n", - "env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", - "\n", - "# 4. Same API you just learned!\n", - "result = env.reset()\n", - "result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", - "state = env.state()\n", - "\n", - "# 5. Switch games by changing game_name:\n", - "result = env.step(OpenSpielAction(action_id=4, game_name=\"tic_tac_toe\"))\n", - "```\n", - "\n", - "
\n", - "\n", - "**🎮 6 Games Available:**\n", - "\n", - "1. `\"catch\"` - What we just built!\n", - "2. `\"tic_tac_toe\"` - Classic 3×3\n", - "3. `\"kuhn_poker\"` - Imperfect information poker\n", - "4. `\"cliff_walking\"` - Grid navigation\n", - "5. `\"2048\"` - Tile puzzle\n", - "6. `\"blackjack\"` - Card game\n", - "\n", - "**All use the exact same interface!**\n", - "\n", - "
\n", - " ] + "source": "---\n\n\n# Part 10: Create Your Own Integration 🛠️\n\n
\n\n## The 5-Step Pattern\n\nWant to wrap your own environment in OpenEnv? Here's how:\n\n
\n\n### Step 1: Define Types (`models.py`)\n\n```python\nfrom dataclasses import dataclass\nfrom typing import List\n\nfrom core.env_server import Action, Observation, State\n\n@dataclass\nclass YourAction(Action):\n    action_value: int\n    # Add your action fields\n\n@dataclass\nclass YourObservation(Observation):\n    state_data: List[float]\n    done: bool\n    reward: float\n    # Add your observation fields\n\n@dataclass\nclass YourState(State):\n    episode_id: str\n    step_count: int\n    # Add your state fields\n```\n\n### Step 2: Implement Environment (`server/environment.py`)\n\n```python\nfrom core.env_server import Action, Environment, Observation, State\n\nfrom ..models import YourObservation\n\nclass YourEnvironment(Environment):\n    def reset(self) -> Observation:\n        # Initialize your game/simulation\n        return YourObservation(...)\n\n    def step(self, action: Action) -> Observation:\n        # Execute action, update state\n        return YourObservation(...)\n\n    @property\n    def state(self) -> State:\n        return self._state\n```\n\n### Step 3: Create Client (`client.py`)\n\n```python\nfrom core.http_env_client import HTTPEnvClient\nfrom core.types import StepResult\n\nfrom .models import YourAction, YourObservation, YourState\n\nclass YourEnv(HTTPEnvClient[YourAction, YourObservation]):\n    def _step_payload(self, action: YourAction) -> dict:\n        \"\"\"Convert action to JSON\"\"\"\n        return {\"action_value\": action.action_value}\n\n    def _parse_result(self, payload: dict) -> StepResult:\n        \"\"\"Parse JSON to observation\"\"\"\n        return StepResult(\n            observation=YourObservation(...),\n            reward=payload['reward'],\n            done=payload['done']\n        )\n\n    def _parse_state(self, payload: dict) -> YourState:\n        return YourState(...)\n```\n\n### Step 4: Create Server (`server/app.py`)\n\n```python\nfrom core.env_server import create_fastapi_app\nfrom .environment import YourEnvironment\n\nenv = YourEnvironment()\napp = create_fastapi_app(env)\n\n# That's it! 
OpenEnv creates all endpoints for you.\n```\n\n### Step 5: Dockerize (`server/Dockerfile`)\n\n```dockerfile\nFROM python:3.11-slim\n\nWORKDIR /app\nCOPY requirements.txt .\nRUN pip install --no-cache-dir -r requirements.txt\n\nCOPY . .\nCMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\", \"--port\", \"8000\"]\n```\n\n
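To make Steps 1 and 2 concrete, here is the Part 1 guessing game written in the same shape. It has typed action, observation and state dataclasses, plus an environment with `reset()`, `step()` and a `state` property. One assumption keeps it runnable without OpenEnv installed: the classes are plain dataclasses rather than subclasses of `core.env_server`'s `Action`, `Observation`, `State` and `Environment`. Every class and field name here (`GuessAction`, `hint`, and so on) is hypothetical.

```python
import random
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuessAction:
    guess: int                      # a number from 1 to 10


@dataclass
class GuessObservation:
    hint: str                       # "start", "higher", "lower" or "correct"
    guesses_left: int
    done: bool
    reward: float


@dataclass
class GuessState:
    episode_id: str
    step_count: int


class GuessEnvironment:
    def __init__(self, max_guesses: int = 3, seed: Optional[int] = None):
        self._rng = random.Random(seed)
        self._max_guesses = max_guesses
        self.reset()

    def reset(self) -> GuessObservation:
        self._target = self._rng.randint(1, 10)
        self._guesses_left = self._max_guesses
        self._state = GuessState(episode_id=str(uuid.uuid4()), step_count=0)
        return GuessObservation("start", self._guesses_left, done=False, reward=0.0)

    def step(self, action: GuessAction) -> GuessObservation:
        if not 1 <= action.guess <= 10:
            raise ValueError(f"guess must be in 1..10, got {action.guess}")
        self._state.step_count += 1
        self._guesses_left -= 1
        if action.guess == self._target:
            return GuessObservation("correct", self._guesses_left, done=True, reward=1.0)
        hint = "higher" if action.guess < self._target else "lower"
        return GuessObservation(hint, self._guesses_left,
                                done=self._guesses_left == 0, reward=0.0)

    @property
    def state(self) -> GuessState:
        return self._state


env = GuessEnvironment(seed=0)
obs = env.reset()
low, high = 1, 10
while not obs.done:                 # a smarter policy: binary search on the hints
    guess = (low + high) // 2
    obs = env.step(GuessAction(guess))
    if obs.hint == "higher":
        low = guess + 1
    elif obs.hint == "lower":
        high = guess - 1
print(obs, env.state.step_count)
```

Unlike the random policy in Part 1, the binary-search loop actually uses the feedback. That difference is exactly what a typed observation with a named `hint` field makes easy to express.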
\n\n### 🎓 Examples to Study\n\nOpenEnv includes 3 complete examples:\n\n1. **`src/envs/echo_env/`**\n   - Simplest possible environment\n   - Great for testing and learning\n\n2. **`src/envs/openspiel_env/`**\n   - Wraps external library (OpenSpiel)\n   - Shows integration pattern\n   - 6 games in one integration\n\n3. **`src/envs/coding_env/`**\n   - Python code execution environment\n   - Shows complex use case\n   - Security considerations\n\n**💡 Study these to understand the patterns!**\n\n
" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "---\n", - "\n", - "# Part 10: Create Your Own Integration 🛠️\n", - "\n", - "
\n", - "\n", - "## The 5-Step Pattern\n", - "\n", - "Want to wrap your own environment in OpenEnv? Here's how:\n", - "\n", - "
\n", - "\n", - "### Step 1: Define Types (`models.py`)\n", - "\n", - "```python\n", - "from dataclasses import dataclass\n", - "from core.env_server import Action, Observation, State\n", - "\n", - "@dataclass\n", - "class YourAction(Action):\n", - " action_value: int\n", - " # Add your action fields\n", - "\n", - "@dataclass\n", - "class YourObservation(Observation):\n", - " state_data: List[float]\n", - " done: bool\n", - " reward: float\n", - " # Add your observation fields\n", - "\n", - "@dataclass\n", - "class YourState(State):\n", - " episode_id: str\n", - " step_count: int\n", - " # Add your state fields\n", - "```\n", - "\n", - "### Step 2: Implement Environment (`server/environment.py`)\n", - "\n", - "```python\n", - "from core.env_server import Environment\n", - "\n", - "class YourEnvironment(Environment):\n", - " def reset(self) -> Observation:\n", - " # Initialize your game/simulation\n", - " return YourObservation(...)\n", - " \n", - " def step(self, action: Action) -> Observation:\n", - " # Execute action, update state\n", - " return YourObservation(...)\n", - " \n", - " @property\n", - " def state(self) -> State:\n", - " return self._state\n", - "```\n", - "\n", - "### Step 3: Create Client (`client.py`)\n", - "\n", - "```python\n", - "from core.http_env_client import HTTPEnvClient\n", - "from core.types import StepResult\n", - "\n", - "class YourEnv(HTTPEnvClient[YourAction, YourObservation]):\n", - " def _step_payload(self, action: YourAction) -> dict:\n", - " \"\"\"Convert action to JSON\"\"\"\n", - " return {\"action_value\": action.action_value}\n", - " \n", - " def _parse_result(self, payload: dict) -> StepResult:\n", - " \"\"\"Parse JSON to observation\"\"\"\n", - " return StepResult(\n", - " observation=YourObservation(...),\n", - " reward=payload['reward'],\n", - " done=payload['done']\n", - " )\n", - " \n", - " def _parse_state(self, payload: dict) -> YourState:\n", - " return YourState(...)\n", - "```\n", - "\n", - "### Step 4: Create 
Server (`server/app.py`)\n", - "\n", - "```python\n", - "from core.env_server import create_fastapi_app\n", - "from .your_environment import YourEnvironment\n", - "\n", - "env = YourEnvironment()\n", - "app = create_fastapi_app(env)\n", - "\n", - "# That's it! OpenEnv creates all endpoints for you.\n", - "```\n", - "\n", - "### Step 5: Dockerize (`server/Dockerfile`)\n", - "\n", - "```dockerfile\n", - "FROM python:3.11-slim\n", - "\n", - "WORKDIR /app\n", - "COPY requirements.txt .\n", - "RUN pip install --no-cache-dir -r requirements.txt\n", - "\n", - "COPY . .\n", - "CMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\", \"--port\", \"8000\"]\n", - "```\n", - "\n", - "
\n", - "\n", - "### 🎓 Examples to Study\n", - "\n", - "OpenEnv includes 3 complete examples:\n", - "\n", - "1. **`src/envs/echo_env/`**\n", - " - Simplest possible environment\n", - " - Great for testing and learning\n", - "\n", - "2. **`src/envs/openspiel_env/`**\n", - " - Wraps external library (OpenSpiel)\n", - " - Shows integration pattern\n", - " - 6 games in one integration\n", - "\n", - "3. **`src/envs/coding_env/`**\n", - " - Python code execution environment\n", - " - Shows complex use case\n", - " - Security considerations\n", - "\n", - "**💡 Study these to understand the patterns!**\n", - "\n", - "
" - ] + "source": "---\n\n\n
\n\n# 🎓 Summary: Your Journey\n\n
" }, { "cell_type": "markdown", @@ -841,46 +614,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "---\n", - "\n", - "## 🚀 Next Steps\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
\n", - "\n", - "### 📖 Learn More\n", - "\n", - "- Explore `src/envs/README.md`\n", - "- Read [RFC 001](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", - "- Check example scripts in `examples/`\n", - "- Study OpenSpiel integration\n", - "\n", - "\n", - "\n", - "### 🛠️ Build\n", - "\n", - "- Wrap your favorite RL environment\n", - "- Implement real RL algorithms (DQN, PPO)\n", - "- Create a custom game\n", - "- Deploy to production\n", - "\n", - "\n", - "\n", - "### 🤝 Contribute\n", - "\n", - "- Star the [repo](https://github.com/meta-pytorch/OpenEnv)\n", - "- Report issues\n", - "- Submit PRs\n", - "- Share your integrations\n", - "\n", - "
\n", - " ] + "source": "\n## 📚 Resources\n\n
\n\n### 🔗 Essential Links\n\n- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n- **🔥 PyTorch**: https://pytorch.org/\n\n### 📖 Documentation Deep Dives\n\n- **Environment Creation Guide**: `src/envs/README.md`\n- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n- **Example Scripts**: `examples/`\n- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n\n### 🎓 Community & Support\n\n**Supported by amazing organizations:**\n- 🔥 Meta PyTorch\n- 🤗 Hugging Face\n- ⚡ Unsloth AI\n- 🌟 Reflection AI\n- 🚀 And many more!\n\n**License**: BSD 3-Clause (very permissive!)\n\n**Contributions**: Always welcome! Check out the issues tab.\n\n
\n\n---\n\n### 🌈 What's Next?\n\n1. ⭐ **Star the repo** to show support and stay updated\n2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n3. 🎮 **Explore** other OpenSpiel games\n4. 🛠️ **Build** your own environment integration\n5. 💬 **Share** what you build with the community!" }, { "cell_type": "markdown", From ebfbeb379b3946e180071cba6ada3578990b4863 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:57:34 -0700 Subject: [PATCH 09/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 1163 ++++++++++++++++++++++++++++++- 1 file changed, 1127 insertions(+), 36 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 3f52343..f588c32 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -3,17 +3,111 @@ { "cell_type": "markdown", "metadata": {}, - "source": "
\n\n\"PyTorch\"\n\n# OpenEnv: Production RL Made Simple\n\n### *From \"Hello World\" to Production Deployment in 30 Minutes* ✨\n\n---\n\n**What if RL environments were as easy to use as REST APIs?**\n\nThat's OpenEnv. Type-safe. Isolated. Production-ready. 🎯\n\n[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)\n[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)\n[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org/)\n\n
\n\n---" + "source": [ + "
\n", + "\n", + "\"PyTorch\"\n", + "\n", + "Author: [Sanyam Bhutani](http://twitter.com/bhutanisanyam1/)\n", + "\n", + "# OpenEnv: Production RL Made Simple\n", + "\n", + "### *From \"Hello World\" to RL Training in 5 Minutes* ✨\n", + "\n", + "---\n", + "\n", + "**What if RL environments were as easy to use as REST APIs?**\n", + "\n", + "That's OpenEnv. Type-safe. Isolated. Production-ready. 🎯\n", + "\n", + "[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)\n", + "[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)\n", + "[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org/)\n", + "\n", + "
\n", + "\n", + "---" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "## 📋 What You'll Learn\n\n\n\n\n\n\n\n\n\n\n
\n\n**🎯 Part 1-2: The Fundamentals**\n- ⚡ RL in 60 seconds\n- 🤔 Why existing solutions fall short\n- 💡 The OpenEnv solution\n\n\n\n**🏗️ Part 3-5: The Architecture**\n- 🔧 How OpenEnv works\n- 🔍 Exploring real code\n- 🎮 OpenSpiel integration example\n\n
\n\n**🎮 Part 6-8: Hands-On Demo**\n- 🔨 Build a game environment\n- 🤖 Test 4 different policies\n- 👀 Watch learning happen live\n\n\n\n**🔧 Part 9-10: Going Further**\n- 🚀 Use real OpenSpiel\n- ✨ Create your own integration\n- 🌐 Deploy to production\n\n
\n\n> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n> \n> ⏱️ **Time**: ~30 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge" + "source": [ + "## 📋 What You'll Learn\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "\n", + "**🎯 Part 1-2: The Fundamentals**\n", + "- ⚡ RL in 60 seconds\n", + "- 🤔 Why existing solutions fall short\n", + "- 💡 The OpenEnv solution\n", + "\n", + "\n", + "\n", + "**🏗️ Part 3-5: The Architecture**\n", + "- 🔧 How OpenEnv works\n", + "- 🔍 Exploring real code\n", + "- 🎮 OpenSpiel integration example\n", + "\n", + "
\n", + "\n", + "**🎮 Part 6-8: Hands-On Demo**\n", + "- 🔨 Build a game environment\n", + "- 🤖 Test 4 different policies\n", + "- 👀 Watch learning happen live\n", + "\n", + "\n", + "\n", + "**🔧 Part 9-10: Going Further**\n", + "- 🚀 Use real OpenSpiel\n", + "- ✨ Create your own integration\n", + "- 🌐 Deploy to production\n", + "\n", + "
\n", + "\n", + "> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n", + "> \n", + "> ⏱️ **Time**: ~30 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge" + ] }, { "cell_type": "markdown", - "source": "---\n\n\n# Part 1: RL in 60 Seconds ⏱️\n\n
\n\n**Reinforcement Learning is simpler than you think.**\n\nIt's just a loop:\n\n```\nwhile not done:\n observation = environment.observe()\n action = policy.choose(observation)\n reward = environment.step(action)\n policy.learn(reward)\n```\n\nThat's it. That's RL.\n\n
\n\nLet's see it in action:", - "metadata": {} + "metadata": {}, + "source": [ + "---\n", + "\n", + "\n", + "# Part 1: RL in 60 Seconds ⏱️\n", + "\n", + "
\n", + "\n", + "**Reinforcement Learning is simpler than you think.**\n", + "\n", + "It's just a loop:\n", + "\n", + "```\n", + "while not done:\n", + " observation = environment.observe()\n", + " action = policy.choose(observation)\n", + " reward = environment.step(action)\n", + " policy.learn(reward)\n", + "```\n", + "\n", + "That's it. That's RL.\n", + "\n", + "
\n", + "\n", + "Let's see it in action:" + ] }, { "cell_type": "markdown", @@ -49,27 +143,254 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "import random\n\nprint(\"🎲 \" + \"=\"*58 + \" 🎲\")\nprint(\" Number Guessing Game - The Simplest RL Example\")\nprint(\"🎲 \" + \"=\"*58 + \" 🎲\")\n\n# Environment setup\ntarget = random.randint(1, 10)\nguesses_left = 3\n\nprint(f\"\\n🎯 I'm thinking of a number between 1 and 10...\")\nprint(f\"💭 You have {guesses_left} guesses. Let's see how random guessing works!\\n\")\n\n# The RL Loop - Pure random policy (no learning!)\nwhile guesses_left > 0:\n    # Policy: Random guessing (no learning yet!)\n    guess = random.randint(1, 10)\n    guesses_left -= 1\n    \n    print(f\"💭 Guess #{3-guesses_left}: {guess}\", end=\" → \")\n    \n    # Reward signal (but we're not using it!)\n    if guess == target:\n        print(\"🎉 Correct! +10 points\")\n        break\n    elif abs(guess - target) <= 2:\n        print(\"🔥 Warm! (close)\")\n    else:\n        print(\"❄️ Cold! (far)\")\nelse:\n    print(f\"\\n💔 Out of guesses. The number was {target}.\")\n\nprint(\"\\n\" + \"=\"*62)\nprint(\"💡 This is RL: Observe → Act → Reward → Repeat\")\nprint(\" But this policy is terrible! It doesn't learn from rewards.\")\nprint(\"=\"*62 + \"\\n\")" + "source": [ + "import random\n", + "\n", + "print(\"🎲 \" + \"=\"*58 + \" 🎲\")\n", + "print(\" Number Guessing Game - The Simplest RL Example\")\n", + "print(\"🎲 \" + \"=\"*58 + \" 🎲\")\n", + "\n", + "# Environment setup\n", + "target = random.randint(1, 10)\n", + "guesses_left = 3\n", + "\n", + "print(f\"\\n🎯 I'm thinking of a number between 1 and 10...\")\n", + "print(f\"💭 You have {guesses_left} guesses. 
Let's see how random guessing works!\\n\")\n", + "\n", + "# The RL Loop - Pure random policy (no learning!)\n", + "while guesses_left > 0:\n", + "    # Policy: Random guessing (no learning yet!)\n", + "    guess = random.randint(1, 10)\n", + "    guesses_left -= 1\n", + "    \n", + "    print(f\"💭 Guess #{3-guesses_left}: {guess}\", end=\" → \")\n", + "    \n", + "    # Reward signal (but we're not using it!)\n", + "    if guess == target:\n", + "        print(\"🎉 Correct! +10 points\")\n", + "        break\n", + "    elif abs(guess - target) <= 2:\n", + "        print(\"🔥 Warm! (close)\")\n", + "    else:\n", + "        print(\"❄️ Cold! (far)\")\n", + "else:\n", + "    print(f\"\\n💔 Out of guesses. The number was {target}.\")\n", + "\n", + "print(\"\\n\" + \"=\"*62)\n", + "print(\"💡 This is RL: Observe → Act → Reward → Repeat\")\n", + "print(\" But this policy is terrible! It doesn't learn from rewards.\")\n", + "print(\"=\"*62 + \"\\n\")" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n\n# Part 2: The Problem with Traditional RL 😤\n\n
\n\n## 🤔 Why Can't We Just Use OpenAI Gym?\n\nGood question! Gym is great for research, but production needs more...\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
ChallengeTraditional ApproachOpenEnv Solution
Type Safety❌ obs[0][3] - what is this?✅ obs.info_state - IDE knows!
Isolation❌ Same process (can crash your training)✅ Docker containers (fully isolated)
Deployment❌ \"Works on my machine\" 🤷✅ Same container everywhere 🐳
Scaling❌ Hard to distribute✅ Deploy to Kubernetes ☸️
Language❌ Python only✅ Any language (HTTP API) 🌐
Debugging❌ Cryptic numpy errors✅ Clear type errors 🐛
\n\n
\n\n## 💡 The OpenEnv Philosophy\n\n**\"RL environments should be like microservices\"**\n\nThink of it like this: You don't run your database in the same process as your web server, right? Same principle!\n\n- 🔒 **Isolated**: Run in containers (security + stability)\n- 🌐 **Standard**: HTTP API, works everywhere\n- 📦 **Versioned**: Docker images (reproducibility!)\n- 🚀 **Scalable**: Deploy to cloud with one command\n- 🛡️ **Type-safe**: Catch bugs before they happen\n- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 2: The Problem with Traditional RL 😤\n", + "\n", + "
\n", + "\n", + "## 🤔 Why Can't We Just Use OpenAI Gym?\n", + "\n", + "Good question! Gym is great for research, but production needs more...\n", + "\n", + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
ChallengeTraditional ApproachOpenEnv Solution
Type Safety❌ obs[0][3] - what is this?✅ obs.info_state - IDE knows!
Isolation❌ Same process (can crash your training)✅ Docker containers (fully isolated)
Deployment❌ \"Works on my machine\" 🤷✅ Same container everywhere 🐳
Scaling❌ Hard to distribute✅ Deploy to Kubernetes ☸️
Language❌ Python only✅ Any language (HTTP API) 🌐
Debugging❌ Cryptic numpy errors✅ Clear type errors 🐛
\n", + "\n", + "
\n", + "\n", + "## 💡 The OpenEnv Philosophy\n", + "\n", + "**\"RL environments should be like microservices\"**\n", + "\n", + "Think of it like this: You don't run your database in the same process as your web server, right? Same principle!\n", + "\n", + "- 🔒 **Isolated**: Run in containers (security + stability)\n", + "- 🌐 **Standard**: HTTP API, works everywhere\n", + "- 📦 **Versioned**: Docker images (reproducibility!)\n", + "- 🚀 **Scalable**: Deploy to cloud with one command\n", + "- 🛡️ **Type-safe**: Catch bugs before they happen\n", + "- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n", + "\n", + "
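Here is a small illustration of the "Type Safety" and "Debugging" rows, written in plain Python rather than with OpenEnv's own classes. With a raw nested list, readers must remember what each index means. A dataclass names every field, and a `__post_init__` check turns a bad action into a clear error at construction time. `CatchObservation`, `CatchAction` and the validation rule are hypothetical. OpenEnv's actual models may or may not validate their fields this way.

```python
from dataclasses import dataclass
from typing import List

# Untyped: the meaning of each index lives only in someone's head.
raw_obs = [[0.0, 0.0, 1.0, 0.0, 0.0], [0, 1, 2], False]
print(raw_obs[0][2])                  # is this the ball? the paddle?


@dataclass
class CatchObservation:
    info_state: List[float]           # flattened board
    legal_actions: List[int]          # e.g. [0, 1, 2] = LEFT, STAY, RIGHT
    done: bool = False


@dataclass
class CatchAction:
    action_id: int

    def __post_init__(self):
        # Reject invalid actions before they ever reach the environment.
        if self.action_id not in (0, 1, 2):
            raise ValueError(f"action_id must be 0, 1 or 2, got {self.action_id!r}")


obs = CatchObservation(info_state=[0.0, 0.0, 1.0, 0.0, 0.0], legal_actions=[0, 1, 2])
print(obs.legal_actions, obs.done)    # [0, 1, 2] False

try:
    CatchAction(action_id=7)
except ValueError as err:
    print("Rejected:", err)           # Rejected: action_id must be 0, 1 or 2, got 7
```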
" + ] }, { "cell_type": "markdown", - "source": "### The Architecture\n\n```\nโ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\nโ”‚ YOUR TRAINING CODE โ”‚\nโ”‚ โ”‚\nโ”‚ env = OpenSpielEnv(...) โ† Import the client โ”‚\nโ”‚ result = env.reset() โ† Type-safe! โ”‚\nโ”‚ result = env.step(action) โ† Type-safe! โ”‚\nโ”‚ โ”‚\nโ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n โ”‚\n โ”‚ HTTP/JSON (Language-Agnostic)\n โ”‚ POST /reset, POST /step, GET /state\n โ”‚\nโ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\nโ”‚ DOCKER CONTAINER โ”‚\nโ”‚ โ”‚\nโ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\nโ”‚ โ”‚ FastAPI Server โ”‚ โ”‚\nโ”‚ โ”‚ โ””โ”€ Environment (reset, step, state) โ”‚ โ”‚\nโ”‚ โ”‚ โ””โ”€ Your Game/Simulation Logic โ”‚ โ”‚\nโ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\nโ”‚ โ”‚\nโ”‚ Isolated โ€ข Reproducible โ€ข Secure โ”‚\nโ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n```\n\n
\n\n**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n\n```python\nenv.reset() # Under the hood: HTTP POST to /reset\nenv.step(...) # Under the hood: HTTP POST to /step\nenv.state() # Under the hood: HTTP GET to /state\n```\n\nThe magic? OpenEnv handles all the plumbing. You focus on RL! ✨\n\n
", - "metadata": {} + "metadata": {}, + "source": [ + "### The Architecture\n", + "\n", + "```\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ YOUR TRAINING CODE โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ env = OpenSpielEnv(...) โ† Import the client โ”‚\n", + "โ”‚ result = env.reset() โ† Type-safe! โ”‚\n", + "โ”‚ result = env.step(action) โ† Type-safe! โ”‚\n", + "โ”‚ โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + " โ”‚\n", + " โ”‚ HTTP/JSON (Language-Agnostic)\n", + " โ”‚ POST /reset, POST /step, GET /state\n", + " โ”‚\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ DOCKER CONTAINER โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", + "โ”‚ โ”‚ FastAPI Server โ”‚ โ”‚\n", + "โ”‚ โ”‚ โ””โ”€ Environment (reset, step, state) โ”‚ โ”‚\n", + "โ”‚ โ”‚ โ””โ”€ Your Game/Simulation Logic โ”‚ โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Isolated โ€ข Reproducible โ€ข Secure โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + "```\n", + "\n", + "
\n", + "\n", + "**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n", + "\n", + "```python\n", + "env.reset() # Under the hood: HTTP POST to /reset\n", + "env.step(...) # Under the hood: HTTP POST to /step\n", + "env.state() # Under the hood: HTTP GET to /state\n", + "```\n", + "\n", + "The magic? OpenEnv handles all the plumbing. You focus on RL! ✨\n", + "\n", + "
" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n# Part 2: The Problem with Traditional RL 😤\n\n
\n\n## 🤔 Why Can't We Just Use OpenAI Gym?\n\nGood question! Gym is great for research, but production needs more...\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
ChallengeTraditional ApproachOpenEnv Solution
Type Safety❌ obs[0][3] - what is this?✅ obs.info_state - IDE knows!
Isolation❌ Same process (can crash your training)✅ Docker containers (fully isolated)
Deployment❌ \"Works on my machine\" 🤷✅ Same container everywhere 🐳
Scaling❌ Hard to distribute✅ Deploy to Kubernetes ☸️
Language❌ Python only✅ Any language (HTTP API) 🌐
Debugging❌ Cryptic numpy errors✅ Clear type errors 🐛
\n\n
\n\n## 💡 The OpenEnv Philosophy\n\n**\"RL environments should be like microservices\"**\n\nThink of it like this: You don't run your database in the same process as your web server, right? Same principle!\n\n- 🔒 **Isolated**: Run in containers (security + stability)\n- 🌐 **Standard**: HTTP API, works everywhere\n- 📦 **Versioned**: Docker images (reproducibility!)\n- 🚀 **Scalable**: Deploy to cloud with one command\n- 🛡️ **Type-safe**: Catch bugs before they happen\n- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n\n
" + "source": [ + "---\n", + "\n", + "# Part 2: The Problem with Traditional RL 😤\n", + "\n", + "
\n", + "\n", + "## 🤔 Why Can't We Just Use OpenAI Gym?\n", + "\n", + "Good question! Gym is great for research, but production needs more...\n", + "\n", + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
ChallengeTraditional ApproachOpenEnv Solution
Type Safety❌ obs[0][3] - what is this?✅ obs.info_state - IDE knows!
Isolation❌ Same process (can crash your training)✅ Docker containers (fully isolated)
Deployment❌ \"Works on my machine\" 🤷✅ Same container everywhere 🐳
Scaling❌ Hard to distribute✅ Deploy to Kubernetes ☸️
Language❌ Python only✅ Any language (HTTP API) 🌐
Debugging❌ Cryptic numpy errors✅ Clear type errors 🐛
\n", + "\n", + "
\n", + "\n", + "## ๐Ÿ’ก The OpenEnv Philosophy\n", + "\n", + "**\"RL environments should be like microservices\"**\n", + "\n", + "Think of it like this: You don't run your database in the same process as your web server, right? Same principle!\n", + "\n", + "- ๐Ÿ”’ **Isolated**: Run in containers (security + stability)\n", + "- ๐ŸŒ **Standard**: HTTP API, works everywhere\n", + "- ๐Ÿ“ฆ **Versioned**: Docker images (reproducibility!)\n", + "- ๐Ÿš€ **Scalable**: Deploy to cloud with one command\n", + "- ๐Ÿ›ก๏ธ **Type-safe**: Catch bugs before they happen\n", + "- ๐Ÿ”„ **Portable**: Works on Mac, Linux, Windows, Cloud\n", + "\n", + "
" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n\n# Part 3: Setup ๐Ÿ› ๏ธ\n\n
\n\n**Running in Colab?** This cell will clone OpenEnv and install dependencies automatically.\n\n**Running locally?** Make sure you're in the OpenEnv directory.\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 3: Setup ๐Ÿ› ๏ธ\n", + "\n", + "
\n", + "\n", + "**Running in Colab?** This cell will clone OpenEnv and install dependencies automatically.\n", + "\n", + "**Running locally?** Make sure you're in the OpenEnv directory.\n", + "\n", + "
" + ] }, { "cell_type": "markdown", @@ -93,7 +414,34 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "---\n\n\n# Part 4: The OpenEnv Pattern ๐Ÿ—๏ธ\n\n
\n\n## Every OpenEnv Environment Has 3 Components:\n\n```\nsrc/envs/your_env/\nโ”œโ”€โ”€ ๐Ÿ“ models.py โ† Type-safe contracts\nโ”‚ (Action, Observation, State)\nโ”‚\nโ”œโ”€โ”€ ๐Ÿ“ฑ client.py โ† What YOU import\nโ”‚ (HTTPEnvClient implementation)\nโ”‚\nโ””โ”€โ”€ ๐Ÿ–ฅ๏ธ server/\n โ”œโ”€โ”€ environment.py โ† Game/simulation logic\n โ”œโ”€โ”€ app.py โ† FastAPI server\n โ””โ”€โ”€ Dockerfile โ† Container definition\n```\n\n
\n\nLet's explore the actual OpenEnv code to see how this works:" + "source": [ + "---\n", + "\n", + "\n", + "# Part 4: The OpenEnv Pattern 🏗️\n", + "\n", + "
\n", + "\n", + "## Every OpenEnv Environment Has 3 Components:\n", + "\n", + "```\n", + "src/envs/your_env/\n", + "├── 📝 models.py ← Type-safe contracts\n", + "│ (Action, Observation, State)\n", + "│\n", + "├── 📱 client.py ← What YOU import\n", + "│ (HTTPEnvClient implementation)\n", + "│\n", + "└── 🖥️ server/\n", + " ├── environment.py ← Game/simulation logic\n", + " ├── app.py ← FastAPI server\n", + " └── Dockerfile ← Container definition\n", + "```\n", + "\n", + "
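The `models.py` / `client.py` split comes down to typed dataclasses on both sides of a JSON boundary. The sketch below shows that round trip using hypothetical `GuessAction` and `GuessObservation` types, which are not OpenEnv classes. One caveat: plain dataclasses do not check annotations at runtime, so this sketch validates in `__post_init__`. Static type checkers such as mypy can flag mismatches before the code runs.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class GuessAction:
    """Hypothetical action type: what the client sends to the server."""
    guess: int

    def __post_init__(self):
        # Dataclasses don't enforce annotations at runtime, so check explicitly.
        if not isinstance(self.guess, int):
            raise TypeError(f"guess must be int, got {type(self.guess).__name__}")


@dataclass
class GuessObservation:
    """Hypothetical observation type: what the server sends back."""
    hint: str
    done: bool = False
    reward: float = 0.0
    history: List[int] = field(default_factory=list)


def action_to_payload(action: GuessAction) -> dict:
    """Client side: typed action -> JSON-ready dict."""
    return asdict(action)


def payload_to_observation(payload: dict) -> GuessObservation:
    """Client side: JSON dict -> typed observation (unknown keys raise TypeError)."""
    return GuessObservation(**payload)


payload = action_to_payload(GuessAction(guess=7))
print(payload)  # {'guess': 7}

obs = payload_to_observation({"hint": "higher", "done": False, "reward": 0.0, "history": [7]})
print(obs.hint, obs.history)  # higher [7]

try:
    GuessAction(guess="7")  # a string where an int is expected
except TypeError as err:
    print(err)  # guess must be int, got str
```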
\n", + "\n", + "Let's explore the actual OpenEnv code to see how this works:" + ] }, { "cell_type": "markdown", @@ -131,7 +479,47 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "---\n\n\n# Part 5: Example Integration - OpenSpiel 🎮\n\n
\n\n## What is OpenSpiel?\n\n**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n\n## OpenEnv's Integration\n\nWe've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n\n\n\n\n\n\n
\n\n**🎯 Single-Player**\n1. **Catch** - Catch falling ball\n2. **Cliff Walking** - Navigate grid\n3. **2048** - Tile puzzle\n4. **Blackjack** - Card game\n\n\n\n**👥 Multi-Player**\n5. **Tic-Tac-Toe** - Classic 3×3\n6. **Kuhn Poker** - Imperfect info poker\n\n
\n\nThis shows how OpenEnv can wrap **any** existing RL library!\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 5: Example Integration - OpenSpiel 🎮\n", + "\n", + "
\n", + "\n", + "## What is OpenSpiel?\n", + "\n", + "**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n", + "\n", + "## OpenEnv's Integration\n", + "\n", + "We've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "\n", + "**🎯 Single-Player**\n", + "1. **Catch** - Catch falling ball\n", + "2. **Cliff Walking** - Navigate grid\n", + "3. **2048** - Tile puzzle\n", + "4. **Blackjack** - Card game\n", + "\n", + "\n", + "\n", + "**👥 Multi-Player**\n", + "5. **Tic-Tac-Toe** - Classic 3×3\n", + "6. **Kuhn Poker** - Imperfect info poker\n", + "\n", + "
\n", + "\n", + "This shows the pattern for wrapping an existing RL library in OpenEnv!\n", + "\n", + "
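Wrapping an existing library usually means writing an adapter. The adapter translates the library's native calls into `reset()` and `step()`, each returning one typed observation. In the sketch below, `ExternalGame` is a made-up stand-in for a third-party API, not OpenSpiel's interface, and `ExternalGameEnv` is the hypothetical adapter.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


class ExternalGame:
    """Stand-in for a third-party game library with its own API (made up)."""

    def __init__(self, seed: Optional[int] = None):
        self.rng = random.Random(seed)
        self.secret = 0
        self.finished = True

    def new_game(self) -> None:
        self.secret = self.rng.randint(0, 2)
        self.finished = False

    def legal_moves(self) -> List[int]:
        return [] if self.finished else [0, 1, 2]

    def play(self, move: int) -> int:
        """Native API: one guess per game, returns points scored."""
        self.finished = True
        return 1 if move == self.secret else 0


@dataclass
class GameObservation:
    legal_actions: List[int]
    done: bool
    reward: float


class ExternalGameEnv:
    """Hypothetical adapter exposing ExternalGame through reset()/step()."""

    def __init__(self, seed: Optional[int] = None):
        self.game = ExternalGame(seed)

    def reset(self) -> GameObservation:
        self.game.new_game()
        return GameObservation(self.game.legal_moves(), done=False, reward=0.0)

    def step(self, action: int) -> GameObservation:
        if action not in self.game.legal_moves():
            raise ValueError(f"illegal action {action}")
        points = self.game.play(action)
        return GameObservation(self.game.legal_moves(), done=self.game.finished, reward=float(points))


env = ExternalGameEnv(seed=0)
obs = env.reset()
obs = env.step(obs.legal_actions[0])
print(obs.done, obs.legal_actions)  # True []
```

The adapter rejects illegal actions loudly, so a bad action surfaces as a clear error instead of corrupting the wrapped library's state.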
" + ] }, { "cell_type": "markdown", @@ -237,18 +625,61 @@ "
" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": "---\n\n\n
\n\n# 🎮 Part 6: Interactive Demo\n\n### Now let's BUILD something!\n\nWe'll create a **Catch game** following OpenEnv patterns,
\nthen watch **4 different AI policies** compete for the championship! 🏆\n\n
\n\n**Get ready for:**\n- ⚡ Live gameplay visualization\n- 🤖 AI policy showdown\n- 📊 Real-time learning metrics\n- 🎯 Production-ready patterns\n\n
" - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": "---\n\n
\n\n# 🎮 Part 6: Interactive Demo\n\n### Now let's BUILD something!\n\nWe'll create a **Catch game** following OpenEnv patterns,
\nthen watch **4 different AI policies** compete for the championship! 🏆\n\n
\n\n**Get ready for:**\n- ⚡ Live gameplay visualization\n- 🤖 AI policy showdown\n- 📊 Real-time learning metrics\n- 🎯 Production-ready patterns\n\n
" - }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "
\n", + "\n", + "# 🎮 Part 6: Interactive Demo\n", + "\n", + "### Now let's BUILD something!\n", + "\n", + "We'll create a **Catch game** following OpenEnv patterns,
\n", + "then watch **4 different AI policies** compete for the championship! 🏆\n", + "\n", + "
\n", + "\n", + "**Get ready for:**\n", + "- ⚡ Live gameplay visualization\n", + "- 🤖 AI policy showdown\n", + "- 📊 Real-time learning metrics\n", + "- 🎯 Production-ready patterns\n", + "\n", + "
" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -306,7 +737,123 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "import random\nfrom dataclasses import dataclass\nfrom typing import List, Tuple\n\n# ============================================================================\n# MODELS - Type-safe contracts (following OpenEnv pattern)\n# ============================================================================\n\n@dataclass\nclass CatchObservation:\n \"\"\"Type-safe observation following OpenEnv Observation base class.\"\"\"\n info_state: List[float] # Grid as flat array\n legal_actions: List[int] # [0, 1, 2] always\n done: bool # Episode finished?\n reward: float # +1 or 0\n # Extra fields for visualization\n ball_position: Tuple[int, int]\n paddle_position: int\n\n\n# ============================================================================\n# ENVIRONMENT - Server-side logic (following OpenEnv Environment pattern)\n# ============================================================================\n\nclass CatchEnvironment:\n \"\"\"\n Catch game following OpenEnv's Environment pattern.\n \n In production:\n โ€ข Runs in Docker container\n โ€ข Accessed via HTTPEnvClient\n โ€ข Exposed via FastAPI server\n \n For this demo:\n โ€ข We run it locally to see internals\n โ€ข But the structure is identical!\n \"\"\"\n \n def __init__(self, grid_size=5):\n self.grid_size = grid_size\n \n def reset(self) -> CatchObservation:\n \"\"\"Start new episode (implements Environment.reset()).\"\"\"\n self.ball_row = 0\n self.ball_col = random.randint(0, self.grid_size - 1)\n self.paddle_col = self.grid_size // 2\n self.done = False\n return self._make_observation()\n \n def step(self, action: int) -> CatchObservation:\n \"\"\"Execute action (implements Environment.step()).\n \n Args:\n action: 0=LEFT, 1=STAY, 2=RIGHT\n \"\"\"\n # Move paddle\n if action == 0 and self.paddle_col > 0:\n self.paddle_col -= 1\n elif action == 2 and self.paddle_col < 
self.grid_size - 1:\n self.paddle_col += 1\n \n # Move ball down\n self.ball_row += 1\n \n # Check if episode done\n if self.ball_row >= self.grid_size - 1:\n self.done = True\n reward = 1.0 if self.ball_col == self.paddle_col else 0.0\n else:\n reward = 0.0\n \n return self._make_observation(reward)\n \n def _make_observation(self, reward=0.0) -> CatchObservation:\n \"\"\"Create type-safe observation.\"\"\"\n # Flatten grid to vector (like real RL environments do)\n info_state = [0.0] * (self.grid_size * self.grid_size)\n ball_idx = self.ball_row * self.grid_size + self.ball_col\n paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col\n info_state[ball_idx] = 1.0 # Ball = 1.0\n info_state[paddle_idx] = 0.5 # Paddle = 0.5\n \n return CatchObservation(\n info_state=info_state,\n legal_actions=[0, 1, 2],\n done=self.done,\n reward=reward,\n ball_position=(self.ball_row, self.ball_col),\n paddle_position=self.paddle_col\n )\n \n def render(self):\n \"\"\"Visualize current state.\"\"\"\n for row in range(self.grid_size):\n line = \" \"\n for col in range(self.grid_size):\n if row == self.ball_row and col == self.ball_col:\n line += \"๐Ÿ”ด \"\n elif row == self.grid_size - 1 and col == self.paddle_col:\n line += \"๐Ÿ“ \"\n else:\n line += \"โฌœ \"\n print(line)\n\n\nprint(\"๐ŸŽ‰ \" + \"=\"*64 + \" ๐ŸŽ‰\")\nprint(\" โœ… Environment Created Following OpenEnv Pattern!\")\nprint(\"๐ŸŽ‰ \" + \"=\"*64 + \" ๐ŸŽ‰\")\nprint(\"\\n๐Ÿ“‹ What we just built:\")\nprint(\" โ€ข reset() โ†’ CatchObservation (type-safe!)\")\nprint(\" โ€ข step(action) โ†’ CatchObservation (type-safe!)\")\nprint(\" โ€ข render() โ†’ Visual display\")\nprint(\"\\n๐Ÿš€ In production: This would run in Docker + FastAPI\")\nprint(\" But the structure is EXACTLY the same!\")\nprint(\"\\n๐Ÿ’ก This is your blueprint for creating ANY OpenEnv environment!\\n\")" + "source": [ + "import random\n", + "from dataclasses import dataclass\n", + "from typing import List, Tuple\n", + "\n", + "# 
============================================================================\n", + "# MODELS - Type-safe contracts (following OpenEnv pattern)\n", + "# ============================================================================\n", + "\n", + "@dataclass\n", + "class CatchObservation:\n", + " \"\"\"Type-safe observation following OpenEnv Observation base class.\"\"\"\n", + " info_state: List[float] # Grid as flat array\n", + " legal_actions: List[int] # [0, 1, 2] always\n", + " done: bool # Episode finished?\n", + " reward: float # +1 or 0\n", + " # Extra fields for visualization\n", + " ball_position: Tuple[int, int]\n", + " paddle_position: int\n", + "\n", + "\n", + "# ============================================================================\n", + "# ENVIRONMENT - Server-side logic (following OpenEnv Environment pattern)\n", + "# ============================================================================\n", + "\n", + "class CatchEnvironment:\n", + " \"\"\"\n", + " Catch game following OpenEnv's Environment pattern.\n", + " \n", + " In production:\n", + " • Runs in Docker container\n", + " • Accessed via HTTPEnvClient\n", + " • Exposed via FastAPI server\n", + " \n", + " For this demo:\n", + " • We run it locally to see internals\n", + " • But the structure is identical!\n", + " \"\"\"\n", + " \n", + " def __init__(self, grid_size=5):\n", + " self.grid_size = grid_size\n", + " \n", + " def reset(self) -> CatchObservation:\n", + " \"\"\"Start new episode (implements Environment.reset()).\"\"\"\n", + " self.ball_row = 0\n", + " self.ball_col = random.randint(0, self.grid_size - 1)\n", + " self.paddle_col = self.grid_size // 2\n", + " self.done = False\n", + " return self._make_observation()\n", + " \n", + " def step(self, action: int) -> CatchObservation:\n", + " \"\"\"Execute action (implements Environment.step()).\n", + " \n", + " Args:\n", + " action: 0=LEFT, 1=STAY, 2=RIGHT\n", + " \"\"\"\n", + " # Move paddle\n", + " if action == 0 and 
self.paddle_col > 0:\n", + " self.paddle_col -= 1\n", + " elif action == 2 and self.paddle_col < self.grid_size - 1:\n", + " self.paddle_col += 1\n", + " \n", + " # Move ball down\n", + " self.ball_row += 1\n", + " \n", + " # Check if episode done\n", + " if self.ball_row >= self.grid_size - 1:\n", + " self.done = True\n", + " reward = 1.0 if self.ball_col == self.paddle_col else 0.0\n", + " else:\n", + " reward = 0.0\n", + " \n", + " return self._make_observation(reward)\n", + " \n", + " def _make_observation(self, reward=0.0) -> CatchObservation:\n", + " \"\"\"Create type-safe observation.\"\"\"\n", + " # Flatten grid to vector (like real RL environments do)\n", + " info_state = [0.0] * (self.grid_size * self.grid_size)\n", + " ball_idx = self.ball_row * self.grid_size + self.ball_col\n", + " paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col\n", + " info_state[ball_idx] = 1.0 # Ball = 1.0\n", + " info_state[paddle_idx] = 0.5 # Paddle = 0.5\n", + " \n", + " return CatchObservation(\n", + " info_state=info_state,\n", + " legal_actions=[0, 1, 2],\n", + " done=self.done,\n", + " reward=reward,\n", + " ball_position=(self.ball_row, self.ball_col),\n", + " paddle_position=self.paddle_col\n", + " )\n", + " \n", + " def render(self):\n", + " \"\"\"Visualize current state.\"\"\"\n", + " for row in range(self.grid_size):\n", + " line = \" \"\n", + " for col in range(self.grid_size):\n", + " if row == self.ball_row and col == self.ball_col:\n", + " line += \"🔴 \"\n", + " elif row == self.grid_size - 1 and col == self.paddle_col:\n", + " line += \"🏓 \"\n", + " else:\n", + " line += \"⬜ \"\n", + " print(line)\n", + "\n", + "\n", + "print(\"🎉 \" + \"=\"*64 + \" 🎉\")\n", + "print(\" ✅ Environment Created Following OpenEnv Pattern!\")\n", + "print(\"🎉 \" + \"=\"*64 + \" 🎉\")\n", + "print(\"\\n📋 What we just built:\")\n", + "print(\" • reset() → CatchObservation (type-safe!)\")\n", + "print(\" • step(action) → CatchObservation 
(type-safe!)\")\n", + "print(\" • render() → Visual display\")\n", + "print(\"\\n🚀 In production: This would run in Docker + FastAPI\")\n", + "print(\" But the structure is EXACTLY the same!\")\n", + "print(\"\\n💡 This is your blueprint for creating ANY OpenEnv environment!\\n\")" + ] }, { "cell_type": "markdown", @@ -320,7 +867,46 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "---\n\n\n# Part 7: Four Policies 🤖\n\n
\n\n## Let's test 4 different AI strategies:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
PolicyStrategyExpected Performance
🎲 RandomPick random action every step~20% (pure luck)
🛑 Always StayNever move, hope ball lands in center~20% (terrible!)
🧠 SmartMove paddle toward ball100% (optimal!)
📈 LearningStart random, learn smart strategy~85% (improves over time)
\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 7: Four Policies 🤖\n", + "\n", + "
\n", + "\n", + "## Let's test 4 different AI strategies:\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
PolicyStrategyExpected Performance
🎲 RandomPick random action every step~20% (pure luck)
🛑 Always StayNever move, hope ball lands in center~20% (terrible!)
🧠 SmartMove paddle toward ball100% (optimal!)
📈 LearningStart random, gradually switch to the smart strategy~85% (improves over time)
\n", + "\n", + "
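You can check the ~20% and 100% figures exactly instead of by sampling. Always Stay only catches balls that land in the middle column, so it scores exactly 1 in 5. For the other policies, the sketch below propagates the paddle's probability distribution through the 4 steps of `CatchEnvironment` (grid size 5), once for each equally likely ball column. Every step is treated as random with probability `eps` and smart otherwise, so `eps = 1` is the Random policy and `eps = 0` is the Smart policy. The helper `learning_epsilons` replays `LearningPolicy`'s schedule. It assumes that `run_episode` calls `select_action` exactly once per step. All helper names here are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Sequence

GRID = 5  # matches CatchEnvironment(grid_size=5)
STEPS = GRID - 1  # the ball starts in row 0; the episode ends when it reaches row GRID - 1
LEFT, STAY, RIGHT = 0, 1, 2


def move(paddle: int, action: int) -> int:
    """Paddle update copied from CatchEnvironment.step (clamped at the walls)."""
    if action == LEFT and paddle > 0:
        return paddle - 1
    if action == RIGHT and paddle < GRID - 1:
        return paddle + 1
    return paddle


def smart_action(paddle: int, ball_col: int) -> int:
    """SmartPolicy's rule: move toward the ball's column."""
    if paddle < ball_col:
        return RIGHT
    if paddle > ball_col:
        return LEFT
    return STAY


def catch_probability(epsilons: Sequence[Fraction]) -> Fraction:
    """Exact P(catch) when step t is random with probability epsilons[t], smart otherwise."""
    total = Fraction(0)
    for ball_col in range(GRID):  # reset() draws the column uniformly
        dist: Dict[int, Fraction] = {GRID // 2: Fraction(1)}  # paddle starts in the middle
        for eps in epsilons:
            nxt: Dict[int, Fraction] = {}
            for paddle, p in dist.items():
                best = smart_action(paddle, ball_col)
                for action in (LEFT, STAY, RIGHT):
                    weight = eps / 3 + (1 - eps if action == best else 0)
                    if weight:
                        new_paddle = move(paddle, action)
                        nxt[new_paddle] = nxt.get(new_paddle, Fraction(0)) + p * weight
            dist = nxt
        total += dist.get(ball_col, Fraction(0)) / GRID
    return total


def learning_epsilons(episode: int) -> List[Fraction]:
    """LearningPolicy's epsilons for one episode (assumes STEPS calls per episode)."""
    first = episode * STEPS + 1  # self.steps is incremented before epsilon is computed
    return [max(Fraction(1, 10), 1 - Fraction(t, 100)) for t in range(first, first + STEPS)]


print("Random:", catch_probability([Fraction(1)] * STEPS))  # Random: 1/5
print("Smart: ", catch_probability([Fraction(0)] * STEPS))  # Smart:  1
expected = sum(catch_probability(learning_epsilons(k)) for k in range(50)) / 50
print(f"Learning, expected over 50 episodes: {float(expected):.1%}")
```

Random comes out at exactly 1/5 because the ball column is uniform and independent of where a random paddle ends up. The last line gives the Learning Agent's expected success rate under its schedule. Any single 50-episode run will scatter around that value.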
" + ] }, { "cell_type": "markdown", @@ -370,7 +956,82 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "# ============================================================================\n# POLICIES - Different AI strategies\n# ============================================================================\n\nclass RandomPolicy:\n \"\"\"Baseline: Pure random guessing.\"\"\"\n name = \"๐ŸŽฒ Random Guesser\"\n \n def select_action(self, obs: CatchObservation) -> int:\n return random.choice(obs.legal_actions)\n\n\nclass AlwaysStayPolicy:\n \"\"\"Bad strategy: Never moves.\"\"\"\n name = \"๐Ÿ›‘ Always Stay\"\n \n def select_action(self, obs: CatchObservation) -> int:\n return 1 # STAY\n\n\nclass SmartPolicy:\n \"\"\"Optimal: Move paddle toward ball.\"\"\"\n name = \"๐Ÿง  Smart Heuristic\"\n \n def select_action(self, obs: CatchObservation) -> int:\n ball_col = obs.ball_position[1]\n paddle_col = obs.paddle_position\n \n if paddle_col < ball_col:\n return 2 # Move RIGHT\n elif paddle_col > ball_col:\n return 0 # Move LEFT\n else:\n return 1 # STAY (already aligned)\n\n\nclass LearningPolicy:\n \"\"\"Simulated RL: Epsilon-greedy exploration.\"\"\"\n name = \"๐Ÿ“ˆ Learning Agent\"\n \n def __init__(self):\n self.steps = 0\n \n def select_action(self, obs: CatchObservation) -> int:\n self.steps += 1\n \n # Decay exploration rate over time\n epsilon = max(0.1, 1.0 - (self.steps / 100))\n \n if random.random() < epsilon:\n # Explore: random action\n return random.choice(obs.legal_actions)\n else:\n # Exploit: use smart strategy\n ball_col = obs.ball_position[1]\n paddle_col = obs.paddle_position\n if paddle_col < ball_col:\n return 2\n elif paddle_col > ball_col:\n return 0\n else:\n return 1\n\n\nprint(\"๐Ÿค– \" + \"=\"*64 + \" ๐Ÿค–\")\nprint(\" โœ… 4 Policies Created!\")\nprint(\"๐Ÿค– \" + \"=\"*64 + \" ๐Ÿค–\\n\")\n\npolicies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\nfor i, policy in enumerate(policies, 1):\n print(f\" {i}. 
{policy.name}\")\n\nprint(\"\\n💡 Each policy represents a different approach to solving the game!\")\nprint(\" Let's see who performs best! 🏆\\n\")" + "source": [ + "# ============================================================================\n", + "# POLICIES - Different AI strategies\n", + "# ============================================================================\n", + "\n", + "class RandomPolicy:\n", + " \"\"\"Baseline: Pure random guessing.\"\"\"\n", + " name = \"🎲 Random Guesser\"\n", + " \n", + " def select_action(self, obs: CatchObservation) -> int:\n", + " return random.choice(obs.legal_actions)\n", + "\n", + "\n", + "class AlwaysStayPolicy:\n", + " \"\"\"Bad strategy: Never moves.\"\"\"\n", + " name = \"🛑 Always Stay\"\n", + " \n", + " def select_action(self, obs: CatchObservation) -> int:\n", + " return 1 # STAY\n", + "\n", + "\n", + "class SmartPolicy:\n", + " \"\"\"Optimal: Move paddle toward ball.\"\"\"\n", + " name = \"🧠 Smart Heuristic\"\n", + " \n", + " def select_action(self, obs: CatchObservation) -> int:\n", + " ball_col = obs.ball_position[1]\n", + " paddle_col = obs.paddle_position\n", + " \n", + " if paddle_col < ball_col:\n", + " return 2 # Move RIGHT\n", + " elif paddle_col > ball_col:\n", + " return 0 # Move LEFT\n", + " else:\n", + " return 1 # STAY (already aligned)\n", + "\n", + "\n", + "class LearningPolicy:\n", + " \"\"\"Simulated RL: epsilon-greedy over the smart heuristic (no learning from rewards).\"\"\"\n", + " name = \"📈 Learning Agent\"\n", + " \n", + " def __init__(self):\n", + " self.steps = 0\n", + " \n", + " def select_action(self, obs: CatchObservation) -> int:\n", + " self.steps += 1\n", + " \n", + " # Decay exploration rate over time\n", + " epsilon = max(0.1, 1.0 - (self.steps / 100))\n", + " \n", + " if random.random() < epsilon:\n", + " # Explore: random action\n", + " return random.choice(obs.legal_actions)\n", + " else:\n", + " # Exploit: use smart strategy\n", + " ball_col = obs.ball_position[1]\n", + " paddle_col = 
obs.paddle_position\n", + " if paddle_col < ball_col:\n", + " return 2\n", + " elif paddle_col > ball_col:\n", + " return 0\n", + " else:\n", + " return 1\n", + "\n", + "\n", + "print(\"🤖 \" + \"=\"*64 + \" 🤖\")\n", + "print(\" ✅ 4 Policies Created!\")\n", + "print(\"🤖 \" + \"=\"*64 + \" 🤖\\n\")\n", + "\n", + "policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\n", + "for i, policy in enumerate(policies, 1):\n", + " print(f\" {i}. {policy.name}\")\n", + "\n", + "print(\"\\n💡 Each policy represents a different approach to solving the game!\")\n", + "print(\" Let's see who performs best! 🏆\\n\")" + ] }, { "cell_type": "markdown", @@ -441,7 +1102,18 @@ { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n\n# Part 8: Policy Competition! 🏆\n\n
\n\nLet's run **50 episodes** for each policy and see who wins!\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 8: Policy Competition! 🏆\n", + "\n", + "
\n", + "\n", + "Let's run **50 episodes** for each policy and see who wins!\n", + "\n", + "
" + ] }, { "cell_type": "markdown", @@ -463,22 +1135,296 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "def evaluate_policies(num_episodes=50):\n \"\"\"Compare all policies over many episodes.\"\"\"\n policies = [\n RandomPolicy(),\n AlwaysStayPolicy(),\n SmartPolicy(),\n LearningPolicy(),\n ]\n \n print(\"\\n๐Ÿ† \" + \"=\"*66 + \" ๐Ÿ†\")\n print(f\" POLICY SHOWDOWN - {num_episodes} Episodes Each\")\n print(\"๐Ÿ† \" + \"=\"*66 + \" ๐Ÿ†\\n\")\n \n results = []\n for policy in policies:\n print(f\"โšก Testing {policy.name}...\", end=\" \")\n env = CatchEnvironment()\n successes = sum(run_episode(env, policy, visualize=False) \n for _ in range(num_episodes))\n success_rate = (successes / num_episodes) * 100\n results.append((policy.name, success_rate, successes))\n print(f\"โœ“ Done!\")\n \n print(\"\\n\" + \"=\"*70)\n print(\" ๐Ÿ“Š FINAL RESULTS\")\n print(\"=\"*70 + \"\\n\")\n \n # Sort by success rate (descending)\n results.sort(key=lambda x: x[1], reverse=True)\n \n # Award medals to top 3\n medals = [\"๐Ÿฅ‡\", \"๐Ÿฅˆ\", \"๐Ÿฅ‰\", \" \"]\n \n for i, (name, rate, successes) in enumerate(results):\n medal = medals[i]\n bar = \"โ–ˆ\" * int(rate / 2)\n print(f\"{medal} {name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})\")\n \n print(\"\\n\" + \"=\"*70)\n print(\"\\nโœจ Key Insights:\")\n print(\" โ€ข Random (~20%): Baseline - pure luck ๐ŸŽฒ\")\n print(\" โ€ข Always Stay (~20%): Bad strategy - stays center ๐Ÿ›‘\")\n print(\" โ€ข Smart (100%): Optimal - perfect play! ๐Ÿง \")\n print(\" โ€ข Learning (~85%): Improves over time ๐Ÿ“ˆ\")\n print(\"\\n๐ŸŽ“ This is Reinforcement Learning in action:\")\n print(\" 1. Start with exploration (trying random things)\")\n print(\" 2. Learn from rewards (what works, what doesn't)\")\n print(\" 3. 
Converge to optimal behavior (smart strategy)\")\n print(\"\\n🎯 The Learning Agent gets smarter with every episode!\\n\")\n\n# Run the epic competition!\nprint(\"🎮 Starting the showdown...\")\nevaluate_policies(num_episodes=50)" + "source": [ + "def evaluate_policies(num_episodes=50):\n", + " \"\"\"Compare all policies over many episodes.\"\"\"\n", + " policies = [\n", + " RandomPolicy(),\n", + " AlwaysStayPolicy(),\n", + " SmartPolicy(),\n", + " LearningPolicy(),\n", + " ]\n", + " \n", + " print(\"\\n🏆 \" + \"=\"*66 + \" 🏆\")\n", + " print(f\" POLICY SHOWDOWN - {num_episodes} Episodes Each\")\n", + " print(\"🏆 \" + \"=\"*66 + \" 🏆\\n\")\n", + " \n", + " results = []\n", + " for policy in policies:\n", + " print(f\"⚡ Testing {policy.name}...\", end=\" \")\n", + " env = CatchEnvironment()\n", + " successes = sum(run_episode(env, policy, visualize=False) \n", + " for _ in range(num_episodes))\n", + " success_rate = (successes / num_episodes) * 100\n", + " results.append((policy.name, success_rate, successes))\n", + " print(f\"✓ Done!\")\n", + " \n", + " print(\"\\n\" + \"=\"*70)\n", + " print(\" 📊 FINAL RESULTS\")\n", + " print(\"=\"*70 + \"\\n\")\n", + " \n", + " # Sort by success rate (descending)\n", + " results.sort(key=lambda x: x[1], reverse=True)\n", + " \n", + " # Award medals to top 3\n", + " medals = [\"🥇\", \"🥈\", \"🥉\", \" \"]\n", + " \n", + " for i, (name, rate, successes) in enumerate(results):\n", + " medal = medals[i]\n", + " bar = \"█\" * int(rate / 2)\n", + " print(f\"{medal} {name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})\")\n", + " \n", + " print(\"\\n\" + \"=\"*70)\n", + " print(\"\\n✨ Key Insights:\")\n", + " print(\" • Random (~20%): Baseline - pure luck 🎲\")\n", + " print(\" • Always Stay (~20%): Bad strategy - stays center 🛑\")\n", + " print(\" • Smart (100%): Optimal - perfect play! 
🧠\")\n", + " print(\" • Learning (~85%): Improves over time 📈\")\n", + " print(\"\\n🎓 This mimics the shape of Reinforcement Learning:\")\n", + " print(\" 1. Start with exploration (trying random things)\")\n", + " print(\" 2. Explore less as epsilon decays (a real agent would learn from rewards)\")\n", + " print(\" 3. Converge to optimal behavior (smart strategy)\")\n", + " print(\"\\n🎯 The Learning Agent exploits more with every episode!\\n\")\n", + "\n", + "# Run the epic competition!\n", + "print(\"🎮 Starting the showdown...\")\n", + "evaluate_policies(num_episodes=50)" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n\n# Part 9: Using Real OpenSpiel 🎮\n\n
\n\n## What We Just Built vs Production OpenSpiel\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
ComponentOur DemoOpenEnv + OpenSpiel
EnvironmentLocal Python classDocker container
CommunicationDirect function callsHTTP/JSON
ClientDirect accessHTTPEnvClient
Type Safety✅ Dataclasses✅ Dataclasses
APIreset(), step()reset(), step() (same!)
\n\n**🎯 Same structure, production features!**\n\n
\n\n### Using OpenSpiel Integration:\n\n```python\n# 1. Install OpenSpiel\n!pip install open_spiel\n\n# 2. Import OpenEnv's integration\nfrom envs.openspiel_env import OpenSpielEnv, OpenSpielAction\n\n# 3. Connect to server (HTTP!)\nenv = OpenSpielEnv(base_url=\"http://localhost:8000\")\n\n# 4. Same API you just learned!\nresult = env.reset()\nresult = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\nstate = env.state()\n\n# 5. Switch games by changing game_name:\nresult = env.step(OpenSpielAction(action_id=4, game_name=\"tic_tac_toe\"))\n```\n\n
\n\n**🎮 6 Games Available:**\n\n1. `\"catch\"` - What we just built!\n2. `\"tic_tac_toe\"` - Classic 3×3\n3. `\"kuhn_poker\"` - Imperfect information poker\n4. `\"cliff_walking\"` - Grid navigation\n5. `\"2048\"` - Tile puzzle\n6. `\"blackjack\"` - Card game\n\n**All use the exact same interface!**\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 9: Using Real OpenSpiel 🎮\n", + "\n", + "
\n", + "\n", + "## What We Just Built vs Production OpenSpiel\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
ComponentOur DemoOpenEnv + OpenSpiel
EnvironmentLocal Python classDocker container
CommunicationDirect function callsHTTP/JSON
ClientDirect accessHTTPEnvClient
Type Safety✅ Dataclasses✅ Dataclasses
APIreset(), step()reset(), step() (same!)
\n", + "\n", + "**🎯 Same structure, production features!**\n", + "\n", + "
\n", + "\n", + "### Using OpenSpiel Integration:\n", + "\n", + "```python\n", + "# 1. Install OpenSpiel\n", + "!pip install open_spiel\n", + "\n", + "# 2. Import OpenEnv's integration\n", + "from envs.openspiel_env import OpenSpielEnv, OpenSpielAction\n", + "\n", + "# 3. Connect to server (HTTP!)\n", + "env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", + "\n", + "# 4. Same API you just learned!\n", + "result = env.reset()\n", + "result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", + "state = env.state()\n", + "\n", + "# 5. Switch games by changing game_name:\n", + "result = env.step(OpenSpielAction(action_id=4, game_name=\"tic_tac_toe\"))\n", + "```\n", + "\n", + "
\n", + "\n", + "**🎮 6 Games Available:**\n", + "\n", + "1. `\"catch\"` - What we just built!\n", + "2. `\"tic_tac_toe\"` - Classic 3×3\n", + "3. `\"kuhn_poker\"` - Imperfect information poker\n", + "4. `\"cliff_walking\"` - Grid navigation\n", + "5. `\"2048\"` - Tile puzzle\n", + "6. `\"blackjack\"` - Card game\n", + "\n", + "**All use the exact same interface!**\n", + "\n", + "
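"Same interface" means a single rollout loop can drive any environment. In the sketch below, the `Env` protocol only requires `reset()` and `step()`. `StepResult`, `CoinFlipEnv`, and `CountdownEnv` are simplified stand-ins invented for this example. A real OpenEnv client returns its own result type with an observation inside.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


@dataclass
class StepResult:
    """Simplified stand-in for a client's reset()/step() result."""
    reward: float
    done: bool
    legal_actions: List[int] = field(default_factory=list)


class Env(Protocol):
    def reset(self) -> StepResult: ...
    def step(self, action: int) -> StepResult: ...


class CoinFlipEnv:
    """One-step game: guess heads (0) or tails (1)."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.coin = 0

    def reset(self) -> StepResult:
        self.coin = self.rng.randint(0, 1)
        return StepResult(reward=0.0, done=False, legal_actions=[0, 1])

    def step(self, action: int) -> StepResult:
        return StepResult(reward=float(action == self.coin), done=True)


class CountdownEnv:
    """Multi-step game: the episode ends (reward 1) after `length` steps."""

    def __init__(self, length: int = 3):
        self.length = length
        self.t = 0

    def reset(self) -> StepResult:
        self.t = 0
        return StepResult(reward=0.0, done=False, legal_actions=[0])

    def step(self, action: int) -> StepResult:
        self.t += 1
        done = self.t >= self.length
        return StepResult(reward=1.0 if done else 0.0, done=done, legal_actions=[] if done else [0])


def rollout(env: Env, policy: Callable[[StepResult], int], max_steps: int = 100) -> float:
    """Play one episode with any object that has reset()/step(); return the total reward."""
    result = env.reset()
    total = 0.0
    for _ in range(max_steps):
        if result.done:
            break
        result = env.step(policy(result))
        total += result.reward
    return total


def first_legal(result: StepResult) -> int:
    """Trivial policy: always pick the first legal action."""
    return result.legal_actions[0]


for env in (CoinFlipEnv(seed=0), CountdownEnv(length=3)):
    print(type(env).__name__, rollout(env, first_legal))
```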
" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n\n# Part 10: Create Your Own Integration ๐Ÿ› ๏ธ\n\n
\n\n## The 5-Step Pattern\n\nWant to wrap your own environment in OpenEnv? Here's how:\n\n
\n\n### Step 1: Define Types (`models.py`)\n\n```python\nfrom dataclasses import dataclass\nfrom core.env_server import Action, Observation, State\n\n@dataclass\nclass YourAction(Action):\n action_value: int\n # Add your action fields\n\n@dataclass\nclass YourObservation(Observation):\n state_data: List[float]\n done: bool\n reward: float\n # Add your observation fields\n\n@dataclass\nclass YourState(State):\n episode_id: str\n step_count: int\n # Add your state fields\n```\n\n### Step 2: Implement Environment (`server/environment.py`)\n\n```python\nfrom core.env_server import Environment\n\nclass YourEnvironment(Environment):\n def reset(self) -> Observation:\n # Initialize your game/simulation\n return YourObservation(...)\n \n def step(self, action: Action) -> Observation:\n # Execute action, update state\n return YourObservation(...)\n \n @property\n def state(self) -> State:\n return self._state\n```\n\n### Step 3: Create Client (`client.py`)\n\n```python\nfrom core.http_env_client import HTTPEnvClient\nfrom core.types import StepResult\n\nclass YourEnv(HTTPEnvClient[YourAction, YourObservation]):\n def _step_payload(self, action: YourAction) -> dict:\n \"\"\"Convert action to JSON\"\"\"\n return {\"action_value\": action.action_value}\n \n def _parse_result(self, payload: dict) -> StepResult:\n \"\"\"Parse JSON to observation\"\"\"\n return StepResult(\n observation=YourObservation(...),\n reward=payload['reward'],\n done=payload['done']\n )\n \n def _parse_state(self, payload: dict) -> YourState:\n return YourState(...)\n```\n\n### Step 4: Create Server (`server/app.py`)\n\n```python\nfrom core.env_server import create_fastapi_app\nfrom .your_environment import YourEnvironment\n\nenv = YourEnvironment()\napp = create_fastapi_app(env)\n\n# That's it! 
OpenEnv creates all endpoints for you.\n```\n\n### Step 5: Dockerize (`server/Dockerfile`)\n\n```dockerfile\nFROM python:3.11-slim\n\nWORKDIR /app\nCOPY requirements.txt .\nRUN pip install --no-cache-dir -r requirements.txt\n\nCOPY . .\nCMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\", \"--port\", \"8000\"]\n```\n\n
\n\n### 🎓 Examples to Study\n\nOpenEnv includes 3 complete examples:\n\n1. **`src/envs/echo_env/`**\n - Simplest possible environment\n - Great for testing and learning\n\n2. **`src/envs/openspiel_env/`**\n - Wraps external library (OpenSpiel)\n - Shows integration pattern\n - 6 games in one integration\n\n3. **`src/envs/coding_env/`**\n - Python code execution environment\n - Shows complex use case\n - Security considerations\n\n**💡 Study these to understand the patterns!**\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 10: Create Your Own Integration 🛠️\n", + "\n", + "
\n", + "\n", + "## The 5-Step Pattern\n", + "\n", + "Want to wrap your own environment in OpenEnv? Here's how:\n", + "\n", + "
\n", + "\n", + "### Step 1: Define Types (`models.py`)\n", + "\n", + "```python\n", + "from dataclasses import dataclass\n", + "from typing import List\n", + "from core.env_server import Action, Observation, State\n", + "\n", + "@dataclass\n", + "class YourAction(Action):\n", + " action_value: int\n", + " # Add your action fields\n", + "\n", + "@dataclass\n", + "class YourObservation(Observation):\n", + " state_data: List[float]\n", + " done: bool\n", + " reward: float\n", + " # Add your observation fields\n", + "\n", + "@dataclass\n", + "class YourState(State):\n", + " episode_id: str\n", + " step_count: int\n", + " # Add your state fields\n", + "```\n", + "\n", + "### Step 2: Implement Environment (`server/environment.py`)\n", + "\n", + "```python\n", + "from core.env_server import Action, Environment, Observation, State\n", + "from ..models import YourObservation\n", + "\n", + "class YourEnvironment(Environment):\n", + " def reset(self) -> Observation:\n", + " # Initialize your game/simulation\n", + " return YourObservation(...)\n", + " \n", + " def step(self, action: Action) -> Observation:\n", + " # Execute action, update state\n", + " return YourObservation(...)\n", + " \n", + " @property\n", + " def state(self) -> State:\n", + " return self._state\n", + "```\n", + "\n", + "### Step 3: Create Client (`client.py`)\n", + "\n", + "```python\n", + "from core.http_env_client import HTTPEnvClient\n", + "from core.types import StepResult\n", + "from .models import YourAction, YourObservation, YourState\n", + "\n", + "class YourEnv(HTTPEnvClient[YourAction, YourObservation]):\n", + " def _step_payload(self, action: YourAction) -> dict:\n", + " \"\"\"Convert action to JSON\"\"\"\n", + " return {\"action_value\": action.action_value}\n", + " \n", + " def _parse_result(self, payload: dict) -> StepResult:\n", + " \"\"\"Parse JSON to observation\"\"\"\n", + " return StepResult(\n", + " observation=YourObservation(...),\n", + " 
reward=payload['reward'],\n", + " done=payload['done']\n", + " )\n", + " \n", + " def _parse_state(self, payload: dict) -> YourState:\n", + " return YourState(...)\n", + "```\n", + "\n", + "### Step 4: Create 
Server (`server/app.py`)\n", + "\n", + "```python\n", + "from core.env_server import create_fastapi_app\n", + "from .environment import YourEnvironment\n", + "\n", + "env = YourEnvironment()\n", + "app = create_fastapi_app(env)\n", + "\n", + "# That's it! OpenEnv creates all endpoints for you.\n", + "```\n", + "\n", + "### Step 5: Dockerize (`server/Dockerfile`)\n", + "\n", + "```dockerfile\n", + "FROM python:3.11-slim\n", + "\n", + "WORKDIR /app\n", + "COPY requirements.txt .\n", + "RUN pip install --no-cache-dir -r requirements.txt\n", + "\n", + "COPY . .\n", + "CMD [\"uvicorn\", \"app:app\", \"--host\", \"0.0.0.0\", \"--port\", \"8000\"]\n", + "```\n", + "\n", + "
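Before building the image, it is worth exercising the environment class in-process, because the server only forwards `reset()` and `step()` calls to it. Here is a sketch of a simple contract check. `ToyEchoEnvironment`, `ToyObservation`, and `smoke_test` are hypothetical helpers, not part of OpenEnv. The check assumes observations expose `done` and `reward`, as `YourObservation` does.

```python
from dataclasses import dataclass


@dataclass
class ToyObservation:
    message: str
    done: bool = False
    reward: float = 0.0


class ToyEchoEnvironment:
    """Hypothetical environment: echoes the message and ends after max_steps."""

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self) -> ToyObservation:
        self.step_count = 0
        return ToyObservation(message="ready")

    def step(self, message: str) -> ToyObservation:
        self.step_count += 1
        done = self.step_count >= self.max_steps
        return ToyObservation(message=message, done=done, reward=float(len(message)))


def smoke_test(env, action, max_steps: int = 1000) -> int:
    """Run one episode, check basic reset()/step() contracts, and return its length."""
    obs = env.reset()
    if obs.done:
        raise AssertionError("reset() should start a live episode")
    for steps in range(1, max_steps + 1):
        obs = env.step(action)
        if not isinstance(obs.reward, (int, float)):
            raise AssertionError("reward must be numeric")
        if obs.done:
            return steps
    raise AssertionError(f"episode did not finish within {max_steps} steps")


print(smoke_test(ToyEchoEnvironment(max_steps=3), "hi"))  # 3
```

Running a check like this in CI catches an environment that never sets `done` before it stalls a training loop.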
\n", + "\n", + "### 🎓 Examples to Study\n", + "\n", + "OpenEnv includes 3 complete examples:\n", + "\n", + "1. **`src/envs/echo_env/`**\n", + " - Simplest possible environment\n", + " - Great for testing and learning\n", + "\n", + "2. **`src/envs/openspiel_env/`**\n", + " - Wraps external library (OpenSpiel)\n", + " - Shows integration pattern\n", + " - 6 games in one integration\n", + "\n", + "3. **`src/envs/coding_env/`**\n", + " - Python code execution environment\n", + " - Shows complex use case\n", + " - Security considerations\n", + "\n", + "**💡 Study these to understand the patterns!**\n", + "\n", + "
" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n\n
\n\n# 🎓 Summary: Your Journey\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "
\n", + "\n", + "# 🎓 Summary: Your Journey\n", + "\n", + "
" + ] }, { "cell_type": "markdown", @@ -614,17 +1560,162 @@ { "cell_type": "markdown", "metadata": {}, - "source": "\n## 📚 Resources\n\n
\n\n### 🔗 Essential Links\n\n- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n- **🔥 PyTorch**: https://pytorch.org/\n\n### 📖 Documentation Deep Dives\n\n- **Environment Creation Guide**: `src/envs/README.md`\n- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n- **Example Scripts**: `examples/`\n- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n\n### 🎓 Community & Support\n\n**Supported by amazing organizations:**\n- 🔥 Meta PyTorch\n- 🤗 Hugging Face\n- ⚡ Unsloth AI\n- 🌟 Reflection AI\n- 🚀 And many more!\n\n**License**: BSD 3-Clause (very permissive!)\n\n**Contributions**: Always welcome! Check out the issues tab.\n\n
\n\n---\n\n### 🌈 What's Next?\n\n1. ⭐ **Star the repo** to show support and stay updated\n2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n3. 🎮 **Explore** other OpenSpiel games\n4. 🛠️ **Build** your own environment integration\n5. 💬 **Share** what you build with the community!" + "source": [ + "\n", + "## 📚 Resources\n", + "\n", + "
\n", + "\n", + "### 🔗 Essential Links\n", + "\n", + "- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", + "- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", + "- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n", + "- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n", + "- **🔥 PyTorch**: https://pytorch.org/\n", + "\n", + "### 📖 Documentation Deep Dives\n", + "\n", + "- **Environment Creation Guide**: `src/envs/README.md`\n", + "- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n", + "- **Example Scripts**: `examples/`\n", + "- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", + "\n", + "### 🎓 Community & Support\n", + "\n", + "**Supported by amazing organizations:**\n", + "- 🔥 Meta PyTorch\n", + "- 🤗 Hugging Face\n", + "- ⚡ Unsloth AI\n", + "- 🌟 Reflection AI\n", + "- 🚀 And many more!\n", + "\n", + "**License**: BSD 3-Clause (very permissive!)\n", + "\n", + "**Contributions**: Always welcome! Check out the issues tab.\n", + "\n", + "
\n", + "\n", + "---\n", + "\n", + "### 🌈 What's Next?\n", + "\n", + "1. ⭐ **Star the repo** to show support and stay updated\n", + "2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n", + "3. 🎮 **Explore** other OpenSpiel games\n", + "4. 🛠️ **Build** your own environment integration\n", + "5. 💬 **Share** what you build with the community!" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "## 📚 Resources\n\n
\n\n### 🔗 Essential Links\n\n- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n- **🔥 PyTorch**: https://pytorch.org/\n\n### 📖 Documentation Deep Dives\n\n- **Environment Creation Guide**: `src/envs/README.md`\n- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n- **Example Scripts**: `examples/`\n- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n\n### 🎓 Community & Support\n\n**Supported by amazing organizations:**\n- 🔥 Meta PyTorch\n- 🤗 Hugging Face\n- ⚡ Unsloth AI\n- 🌟 Reflection AI\n- 🚀 And many more!\n\n**License**: BSD 3-Clause (very permissive!)\n\n**Contributions**: Always welcome! Check out the issues tab.\n\n
\n\n---\n\n### 🌈 What's Next?\n\n1. ⭐ **Star the repo** to show support and stay updated\n2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n3. 🎮 **Explore** other OpenSpiel games\n4. 🛠️ **Build** your own environment integration\n5. 💬 **Share** what you build with the community!" + "source": [ + "## 📚 Resources\n", + "\n", + "
\n", + "\n", + "### 🔗 Essential Links\n", + "\n", + "- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", + "- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", + "- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n", + "- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n", + "- **🔥 PyTorch**: https://pytorch.org/\n", + "\n", + "### 📖 Documentation Deep Dives\n", + "\n", + "- **Environment Creation Guide**: `src/envs/README.md`\n", + "- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n", + "- **Example Scripts**: `examples/`\n", + "- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", + "\n", + "### 🎓 Community & Support\n", + "\n", + "**Supported by amazing organizations:**\n", + "- 🔥 Meta PyTorch\n", + "- 🤗 Hugging Face\n", + "- ⚡ Unsloth AI\n", + "- 🌟 Reflection AI\n", + "- 🚀 And many more!\n", + "\n", + "**License**: BSD 3-Clause (very permissive!)\n", + "\n", + "**Contributions**: Always welcome! Check out the issues tab.\n", + "\n", + "
\n", + "\n", + "---\n", + "\n", + "### 🌈 What's Next?\n", + "\n", + "1. ⭐ **Star the repo** to show support and stay updated\n", + "2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n", + "3. 🎮 **Explore** other OpenSpiel games\n", + "4. 🛠️ **Build** your own environment integration\n", + "5. 💬 **Share** what you build with the community!" + ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n
\n\n# 🎉 Congratulations! You Did It! 🎉\n\n### You're now an OpenEnv expert!\n\n
\n\n## ✅ What You've Mastered:\n\n**🧠 Concepts**\n- How RL works (the observe-act-reward loop)\n- Why OpenEnv matters (production-ready RL)\n- How to use existing environments\n\n**🛠️ Practical Skills**\n- Creating new integrations\n- Building type-safe environments\n- Deploying to production\n\n**🎯 Real Experience**\n- Built a complete RL environment\n- Tested multiple policies\n- Watched learning happen in real-time!\n\n---\n\n### Now go build something amazing! 🚀\n\n**Welcome to the future of RL with PyTorch & OpenEnv**\n\n
\n\n[![Star on GitHub](https://img.shields.io/badge/⭐_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)\n\n
\n\n---\n\n
\n\n## 🌟 Want to Learn More?\n\n- 📖 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)\n- 🎮 Try the other example games\n- 💬 Join the community discussions\n- 🛠️ Build your own integration\n- 🚀 Deploy to production\n- ⭐ Star the repo to stay updated!\n\n**Happy coding! 🎊**\n\n
" + "source": [ + "---\n", + "\n", + "
\n", + "\n", + "# 🎉 Congratulations! You Did It! 🎉\n", + "\n", + "### You're now an OpenEnv expert!\n", + "\n", + "
\n", + "\n", + "## ✅ What You've Mastered:\n", + "\n", + "**🧠 Concepts**\n", + "- How RL works (the observe-act-reward loop)\n", + "- Why OpenEnv matters (production-ready RL)\n", + "- How to use existing environments\n", + "\n", + "**🛠️ Practical Skills**\n", + "- Creating new integrations\n", + "- Building type-safe environments\n", + "- Deploying to production\n", + "\n", + "**🎯 Real Experience**\n", + "- Built a complete RL environment\n", + "- Tested multiple policies\n", + "- Watched learning happen in real-time!\n", + "\n", + "---\n", + "\n", + "### Now go build something amazing! 🚀\n", + "\n", + "**Welcome to the future of RL with PyTorch & OpenEnv**\n", + "\n", + "
\n", + "\n", + "[![Star on GitHub](https://img.shields.io/badge/⭐_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)\n", + "\n", + "
\n", + "\n", + "---\n", + "\n", + "
\n", + "\n", + "## 🌟 Want to Learn More?\n", + "\n", + "- 📖 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)\n", + "- 🎮 Try the other example games\n", + "- 💬 Join the community discussions\n", + "- 🛠️ Build your own integration\n", + "- 🚀 Deploy to production\n", + "- ⭐ Star the repo to stay updated!\n", + "\n", + "**Happy coding! 🎊**\n", + "\n", + "
" + ] } ], "metadata": { @@ -648,4 +1739,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} From dab03bf69183468640754fc321c01b81a629f156 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:58:07 -0700 Subject: [PATCH 10/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 32 +------------------------------- 1 file changed, 1 insertion(+), 31 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index f588c32..7ade722 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -76,37 +76,7 @@ "\n", "> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n", "> \n", - "> ⏱️ **Time**: ~30 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\n", - "# Part 1: RL in 60 Seconds ⏱️\n", - "\n", - "
\n", - "\n", - "**Reinforcement Learning is simpler than you think.**\n", - "\n", - "It's just a loop:\n", - "\n", - "```\n", - "while not done:\n", - " observation = environment.observe()\n", - " action = policy.choose(observation)\n", - " reward = environment.step(action)\n", - " policy.learn(reward)\n", - "```\n", - "\n", - "That's it. That's RL.\n", - "\n", - "
\n", - "\n", - "Let's see it in action:" + "> ⏱️ **Time**: ~5 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge" ] }, { From f017dc28cec80a748176b9a905b0324a6649eb43 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:58:52 -0700 Subject: [PATCH 11/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 7ade722..a756a02 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -344,24 +344,6 @@ "
" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\n", - "# Part 3: Setup 🛠️\n", - "\n", - "
\n", - "\n", - "**Running in Colab?** This cell will clone OpenEnv and install dependencies automatically.\n", - "\n", - "**Running locally?** Make sure you're in the OpenEnv directory.\n", - "\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, From 4cb119bf5cc6acfa4f6bdd9bc70a740f42b0c4e3 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 13:59:45 -0700 Subject: [PATCH 12/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 34 --------------------------------- 1 file changed, 34 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index a756a02..93fc8c9 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -361,40 +361,6 @@ "
" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "---\n", - "\n", - "\n", - "# Part 4: The OpenEnv Pattern 🏗️\n", - "\n", - "
\n", - "\n", - "## Every OpenEnv Environment Has 3 Components:\n", - "\n", - "```\n", - "src/envs/your_env/\n", - "├── 📝 models.py ← Type-safe contracts\n", - "│ (Action, Observation, State)\n", - "│\n", - "├── 📱 client.py ← What YOU import\n", - "│ (HTTPEnvClient implementation)\n", - "│\n", - "└── 🖥️ server/\n", - " ├── environment.py ← Game/simulation logic\n", - " ├── app.py ← FastAPI server\n", - " └── Dockerfile ← Container definition\n", - "```\n", - "\n", - "
\n", - "\n", - "Let's explore the actual OpenEnv code to see how this works:" - ] - }, { "cell_type": "markdown", "metadata": {}, From 5a44568f0e36c688bea4ad52d9c00629e09a3957 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 14:00:00 -0700 Subject: [PATCH 13/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 47 --------------------------------- 1 file changed, 47 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 93fc8c9..3f7bf58 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -392,53 +392,6 @@ "Let's explore the actual OpenEnv code to see how this works:" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "---\n", - "\n", - "\n", - "# Part 5: Example Integration - OpenSpiel 🎮\n", - "\n", - "
\n", - "\n", - "## What is OpenSpiel?\n", - "\n", - "**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n", - "\n", - "## OpenEnv's Integration\n", - "\n", - "We've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
\n", - "\n", - "**🎯 Single-Player**\n", - "1. **Catch** - Catch falling ball\n", - "2. **Cliff Walking** - Navigate grid\n", - "3. **2048** - Tile puzzle\n", - "4. **Blackjack** - Card game\n", - "\n", - "\n", - "\n", - "**👥 Multi-Player**\n", - "5. **Tic-Tac-Toe** - Classic 3×3\n", - "6. **Kuhn Poker** - Imperfect info poker\n", - "\n", - "
\n", - "\n", - "This shows how OpenEnv can wrap **any** existing RL library!\n", - "\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, From 1a88d0708f7bc0fa455827afb4b0db139cefb1ab Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 14:00:32 -0700 Subject: [PATCH 14/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 72 --------------------------------- 1 file changed, 72 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 3f7bf58..caa1267 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -272,78 +272,6 @@ "
" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "# Part 2: The Problem with Traditional RL 😤\n", - "\n", - "
\n", - "\n", - "## 🤔 Why Can't We Just Use OpenAI Gym?\n", - "\n", - "Good question! Gym is great for research, but production needs more...\n", - "\n", - "
\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
| Challenge | Traditional Approach | OpenEnv Solution |
|---|---|---|
| Type Safety | ❌ obs[0][3] - what is this? | ✅ obs.info_state - IDE knows! |
| Isolation | ❌ Same process (can crash your training) | ✅ Docker containers (fully isolated) |
| Deployment | ❌ "Works on my machine" 🤷 | ✅ Same container everywhere 🐳 |
| Scaling | ❌ Hard to distribute | ✅ Deploy to Kubernetes ☸️ |
| Language | ❌ Python only | ✅ Any language (HTTP API) 🌐 |
| Debugging | ❌ Cryptic numpy errors | ✅ Clear type errors 🐛 |
\n", - "\n", - "
\n", - "\n", - "## 💡 The OpenEnv Philosophy\n", - "\n", - "**\"RL environments should be like microservices\"**\n", - "\n", - "Think of it like this: You don't run your database in the same process as your web server, right? Same principle!\n", - "\n", - "- 🔒 **Isolated**: Run in containers (security + stability)\n", - "- 🌐 **Standard**: HTTP API, works everywhere\n", - "- 📦 **Versioned**: Docker images (reproducibility!)\n", - "- 🚀 **Scalable**: Deploy to cloud with one command\n", - "- 🛡️ **Type-safe**: Catch bugs before they happen\n", - "- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n", - "\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, From cdc39e894eb4ef5d9d7195de96f475e731916173 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 14:06:18 -0700 Subject: [PATCH 15/19] fix repeats --- examples/OpenEnv_Tutorial.ipynb | 745 ++++++++++++++++++-------------- fix_notebook.py | 171 ++++++++ 2 files changed, 595 insertions(+), 321 deletions(-) create mode 100644 fix_notebook.py diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index caa1267..126697f 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -12,13 +12,13 @@ "\n", "# OpenEnv: Production RL Made Simple\n", "\n", - "### *From \"Hello World\" to RL Training in 5 Minutes* โœจ\n", + "### *From \"Hello World\" to RL Training in 5 Minutes* \u2728\n", "\n", "---\n", "\n", "**What if RL environments were as easy to use as REST APIs?**\n", "\n", - "That's OpenEnv. Type-safe. Isolated. Production-ready. ๐ŸŽฏ\n", + "That's OpenEnv. Type-safe. Isolated. Production-ready. \ud83c\udfaf\n", "\n", "[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)\n", "[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)\n", @@ -33,50 +33,50 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## ๐Ÿ“‹ What You'll Learn\n", + "## \ud83d\udccb What You'll Learn\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", - "**🎯 Part 1-2: The Fundamentals**\n", - "- ⚡ RL in 60 seconds\n", - "- 🤔 Why existing solutions fall short\n", - "- 💡 The OpenEnv solution\n", + "**\ud83c\udfaf Part 1-2: The Fundamentals**\n", + "- \u26a1 RL in 60 seconds\n", + "- \ud83e\udd14 Why existing solutions fall short\n", + "- \ud83d\udca1 The OpenEnv solution\n", "\n", "\n", "\n", - "**🏗️ Part 3-5: The Architecture**\n", - "- 🔧 How OpenEnv works\n", - "- 🔍 Exploring real code\n", - "- 🎮 OpenSpiel integration example\n", + "**\ud83c\udfd7\ufe0f Part 3-5: The Architecture**\n", + "- \ud83d\udd27 How OpenEnv works\n", + "- \ud83d\udd0d Exploring real code\n", + "- \ud83c\udfae OpenSpiel integration example\n", "\n", "
\n", "\n", - "**🎮 Part 6-8: Hands-On Demo**\n", - "- 🔨 Build a game environment\n", - "- 🤖 Test 4 different policies\n", - "- 👀 Watch learning happen live\n", + "**\ud83c\udfae Part 6-8: Hands-On Demo**\n", + "- \ud83d\udd28 Build a game environment\n", + "- \ud83e\udd16 Test 4 different policies\n", + "- \ud83d\udc40 Watch learning happen live\n", "\n", "\n", "\n", - "**🔧 Part 9-10: Going Further**\n", - "- 🚀 Use real OpenSpiel\n", - "- ✨ Create your own integration\n", - "- 🌐 Deploy to production\n", + "**\ud83d\udd27 Part 9-10: Going Further**\n", + "- \ud83d\ude80 Use real OpenSpiel\n", + "- \u2728 Create your own integration\n", + "- \ud83c\udf10 Deploy to production\n", "\n", "
\n", "\n", - "> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n", + "> \ud83d\udca1 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n", "> \n", - "> ⏱️ **Time**: ~5 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge" + "> \u23f1\ufe0f **Time**: ~5 minutes | \ud83d\udcca **Difficulty**: Beginner-friendly | \ud83c\udfaf **Outcome**: Production-ready RL knowledge" ] }, { @@ -85,7 +85,87 @@ "source": [ "---\n", "\n", - "# Part 1: RL in 60 Seconds ⏱️\n", + "## \ud83d\udcd1 Table of Contents\n", + "\n", + "
\n", + "\n", + "**Quick Navigation** - Click any section to jump right there! \ud83c\udfaf\n", + "\n", + "### Foundation\n", + "- [Part 1: RL in 60 Seconds \u23f1\ufe0f](#part-1)\n", + "- [Part 2: The Problem with Traditional RL \ud83d\ude24](#part-2)\n", + "- [Part 3: Setup \ud83d\udee0\ufe0f](#part-3)\n", + "\n", + "### Architecture\n", + "- [Part 4: The OpenEnv Pattern \ud83c\udfd7\ufe0f](#part-4)\n", + "- [Part 5: Example Integration - OpenSpiel \ud83c\udfae](#part-5)\n", + "\n", + "### Hands-On Demo\n", + "- [Part 6: Interactive Demo \ud83c\udfae](#part-6)\n", + "- [Part 7: Four Policies \ud83e\udd16](#part-7)\n", + "- [Part 8: Policy Competition! \ud83c\udfc6](#part-8)\n", + "\n", + "### Advanced\n", + "- [Part 9: Using Real OpenSpiel \ud83c\udfae](#part-9)\n", + "- [Part 10: Create Your Own Integration \ud83d\udee0\ufe0f](#part-10)\n", + "\n", + "### Wrap Up\n", + "- [Summary: Your Journey \ud83c\udf93](#summary)\n", + "- [Resources \ud83d\udcda](#resources)\n", + "\n", + "
\n", + "\n", + "---" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Detect environment\n", + "try:\n", + " import google.colab\n", + " IN_COLAB = True\n", + " print(\"\ud83c\udf10 Running in Google Colab - Perfect!\")\n", + "except ImportError:\n", + " IN_COLAB = False\n", + " print(\"\ud83d\udcbb Running locally - Nice!\")\n", + "\n", + "if IN_COLAB:\n", + " print(\"\\n\ud83d\udce6 Cloning OpenEnv repository...\")\n", + " !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n", + " %cd OpenEnv\n", + " \n", + " print(\"\ud83d\udcda Installing dependencies (this takes ~10 seconds)...\")\n", + " !pip install -q fastapi uvicorn requests\n", + " \n", + " import sys\n", + " sys.path.insert(0, './src')\n", + " print(\"\\n\u2705 Setup complete! Everything is ready to go! \ud83c\udf89\")\n", + "else:\n", + " import sys\n", + " from pathlib import Path\n", + " sys.path.insert(0, str(Path.cwd().parent / 'src'))\n", + " print(\"\u2705 Using local OpenEnv installation\")\n", + "\n", + "print(\"\\n\ud83d\ude80 Ready to explore OpenEnv and build amazing things!\")\n", + "print(\"\ud83d\udca1 Tip: Run cells top-to-bottom for the best experience.\\n\")" + ] + }, + { + "cell_type": "markdown", + "source": "---\n\n\n# Part 1: RL in 60 Seconds \u23f1\ufe0f\n\n
\n\n**Reinforcement Learning is simpler than you think.**\n\nIt's just a loop:\n\n```\nwhile not done:\n observation = environment.observe()\n action = policy.choose(observation)\n reward = environment.step(action)\n policy.learn(reward)\n```\n\nThat's it. That's RL.\n\n
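The loop above is pseudocode. Below is a minimal runnable version. It assumes a hypothetical three-armed bandit as the environment and an epsilon-greedy learner as the policy. Both are illustrative choices, not part of OpenEnv. Here `step` returns only a reward, as in the pseudocode. By contrast, the OpenEnv client shown in this tutorial returns a `StepResult` that carries the observation, reward, and `done` flag together.

```python
import random
from typing import Sequence


class BanditEnvironment:
    """Hypothetical 3-armed bandit: each arm pays 1.0 with a fixed probability."""

    def __init__(self, probs: Sequence[float] = (0.2, 0.5, 0.8),
                 horizon: int = 500, seed: int = 0) -> None:
        self.probs = probs
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def observe(self) -> int:
        # A bandit has no real state; the step index stands in for an observation.
        return self.t

    def step(self, action: int) -> float:
        self.t += 1
        return 1.0 if self.rng.random() < self.probs[action] else 0.0

    @property
    def done(self) -> bool:
        return self.t >= self.horizon


class EpsilonGreedyPolicy:
    """Explores with probability epsilon, otherwise picks the best-looking arm."""

    def __init__(self, n_actions: int, epsilon: float = 0.1, seed: int = 1) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions
        self.last_action = 0

    def choose(self, observation: int) -> int:
        if self.rng.random() < self.epsilon:
            self.last_action = self.rng.randrange(len(self.values))
        else:
            self.last_action = max(range(len(self.values)), key=self.values.__getitem__)
        return self.last_action

    def learn(self, reward: float) -> None:
        a = self.last_action
        self.counts[a] += 1
        # Incremental mean: Q(a) <- Q(a) + (r - Q(a)) / N(a)
        self.values[a] += (reward - self.values[a]) / self.counts[a]


environment = BanditEnvironment()
policy = EpsilonGreedyPolicy(n_actions=3)
done = False
total_reward = 0.0
while not done:
    observation = environment.observe()
    action = policy.choose(observation)
    reward = environment.step(action)
    policy.learn(reward)
    total_reward += reward
    done = environment.done

print(f"total reward: {total_reward}")
print("estimated arm values:", [round(v, 2) for v in policy.values])
```

Unlike the random guesser in the next cell, `learn` here actually uses the reward. The value estimates are what let the policy drift toward the arm that pays best.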
\n\nLet's see it in action:", + "metadata": {} + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "# Part 1: RL in 60 Seconds \u23f1\ufe0f\n", "\n", "
\n", "\n", @@ -116,16 +196,16 @@ "source": [ "import random\n", "\n", - "print(\"๐ŸŽฒ \" + \"=\"*58 + \" ๐ŸŽฒ\")\n", + "print(\"\ud83c\udfb2 \" + \"=\"*58 + \" \ud83c\udfb2\")\n", "print(\" Number Guessing Game - The Simplest RL Example\")\n", - "print(\"๐ŸŽฒ \" + \"=\"*58 + \" ๐ŸŽฒ\")\n", + "print(\"\ud83c\udfb2 \" + \"=\"*58 + \" \ud83c\udfb2\")\n", "\n", "# Environment setup\n", "target = random.randint(1, 10)\n", "guesses_left = 3\n", "\n", - "print(f\"\\n๐ŸŽฏ I'm thinking of a number between 1 and 10...\")\n", - "print(f\"๐Ÿ’ญ You have {guesses_left} guesses. Let's see how random guessing works!\\n\")\n", + "print(f\"\\n\ud83c\udfaf I'm thinking of a number between 1 and 10...\")\n", + "print(f\"\ud83d\udcad You have {guesses_left} guesses. Let's see how random guessing works!\\n\")\n", "\n", "# The RL Loop - Pure random policy (no learning!)\n", "while guesses_left > 0:\n", @@ -133,21 +213,21 @@ " guess = random.randint(1, 10)\n", " guesses_left -= 1\n", " \n", - " print(f\"๐Ÿ’ญ Guess #{3-guesses_left}: {guess}\", end=\" โ†’ \")\n", + " print(f\"\ud83d\udcad Guess #{3-guesses_left}: {guess}\", end=\" \u2192 \")\n", " \n", " # Reward signal (but we're not using it!)\n", " if guess == target:\n", - " print(\"๐ŸŽ‰ Correct! +10 points\")\n", + " print(\"\ud83c\udf89 Correct! +10 points\")\n", " break\n", " elif abs(guess - target) <= 2:\n", - " print(\"๐Ÿ”ฅ Warm! (close)\")\n", + " print(\"\ud83d\udd25 Warm! (close)\")\n", " else:\n", - " print(\"โ„๏ธ Cold! (far)\")\n", + " print(\"\u2744\ufe0f Cold! (far)\")\n", "else:\n", - " print(f\"\\n๐Ÿ’” Out of guesses. The number was {target}.\")\n", + " print(f\"\\n\ud83d\udc94 Out of guesses. The number was {target}.\")\n", "\n", "print(\"\\n\" + \"=\"*62)\n", - "print(\"๐Ÿ’ก This is RL: Observe โ†’ Act โ†’ Reward โ†’ Repeat\")\n", + "print(\"\ud83d\udca1 This is RL: Observe \u2192 Act \u2192 Reward \u2192 Repeat\")\n", "print(\" But this policy is terrible! 
It doesn't learn from rewards.\")\n", "print(\"=\"*62 + \"\\n\")" ] @@ -159,11 +239,11 @@ "---\n", "\n", "\n", - "# Part 2: The Problem with Traditional RL 😤\n", + "# Part 2: The Problem with Traditional RL \ud83d\ude24\n", "\n", "
\n", "\n", - "## ๐Ÿค” Why Can't We Just Use OpenAI Gym?\n", + "## \ud83e\udd14 Why Can't We Just Use OpenAI Gym?\n", "\n", "Good question! Gym is great for research, but production needs more...\n", "\n", @@ -177,50 +257,50 @@ "\n", "\n", "Type Safety\n", - "โŒ obs[0][3] - what is this?\n", - "โœ… obs.info_state - IDE knows!\n", + "\u274c obs[0][3] - what is this?\n", + "\u2705 obs.info_state - IDE knows!\n", "\n", "\n", "Isolation\n", - "โŒ Same process (can crash your training)\n", - "โœ… Docker containers (fully isolated)\n", + "\u274c Same process (can crash your training)\n", + "\u2705 Docker containers (fully isolated)\n", "\n", "\n", "Deployment\n", - "โŒ \"Works on my machine\" ๐Ÿคท\n", - "โœ… Same container everywhere ๐Ÿณ\n", + "\u274c \"Works on my machine\" \ud83e\udd37\n", + "\u2705 Same container everywhere \ud83d\udc33\n", "\n", "\n", "Scaling\n", - "โŒ Hard to distribute\n", - "โœ… Deploy to Kubernetes โ˜ธ๏ธ\n", + "\u274c Hard to distribute\n", + "\u2705 Deploy to Kubernetes \u2638\ufe0f\n", "\n", "\n", "Language\n", - "โŒ Python only\n", - "โœ… Any language (HTTP API) ๐ŸŒ\n", + "\u274c Python only\n", + "\u2705 Any language (HTTP API) \ud83c\udf10\n", "\n", "\n", "Debugging\n", - "โŒ Cryptic numpy errors\n", - "โœ… Clear type errors ๐Ÿ›\n", + "\u274c Cryptic numpy errors\n", + "\u2705 Clear type errors \ud83d\udc1b\n", "\n", "\n", "\n", "
\n", "\n", - "## 💡 The OpenEnv Philosophy\n", + "## \ud83d\udca1 The OpenEnv Philosophy\n", "\n", "**\"RL environments should be like microservices\"**\n", "\n", "Think of it like this: You don't run your database in the same process as your web server, right? Same principle!\n", "\n", - "- 🔒 **Isolated**: Run in containers (security + stability)\n", - "- 🌐 **Standard**: HTTP API, works everywhere\n", - "- 📦 **Versioned**: Docker images (reproducibility!)\n", - "- 🚀 **Scalable**: Deploy to cloud with one command\n", - "- 🛡️ **Type-safe**: Catch bugs before they happen\n", - "- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n", + "- \ud83d\udd12 **Isolated**: Run in containers (security + stability)\n", + "- \ud83c\udf10 **Standard**: HTTP API, works everywhere\n", + "- \ud83d\udce6 **Versioned**: Docker images (reproducibility!)\n", + "- \ud83d\ude80 **Scalable**: Deploy to cloud with one command\n", + "- \ud83d\udee1\ufe0f **Type-safe**: Catch bugs before they happen\n", + "- \ud83d\udd04 **Portable**: Works on Mac, Linux, Windows, Cloud\n", "\n", "
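The **Type-safe** point above becomes concrete in a small, self-contained contrast between positional indexing into nested lists and a typed observation. `GameObservation` and its `legal_actions` field are hypothetical. Only the `info_state` name is taken from this tutorial's comparison table.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import List

# Untyped style: what each index means lives only in the author's head.
raw_obs = [[0.0, 0.0, 1.0, 0.0], [0, 1, 2], False]
info_state_untyped = raw_obs[0]  # was the info state at index 0 or 1?


@dataclass(frozen=True)
class GameObservation:
    """Hypothetical typed observation: every field is named and annotated."""

    info_state: List[float]
    legal_actions: List[int]
    done: bool = False


obs = GameObservation(info_state=[0.0, 0.0, 1.0, 0.0], legal_actions=[0, 1, 2])
print(obs.info_state, obs.legal_actions, obs.done)

# A misspelled field fails loudly where it is used; IDEs and type checkers flag it earlier.
typo_caught = False
try:
    _ = obs.info_sate  # deliberate typo
except AttributeError:
    typo_caught = True

# frozen=True additionally rejects accidental mutation of an observation.
mutation_blocked = False
try:
    obs.done = True  # type: ignore[misc]
except FrozenInstanceError:
    mutation_blocked = True

print("typo caught:", typo_caught, "| mutation blocked:", mutation_blocked)
```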
" ] @@ -232,34 +312,34 @@ "### The Architecture\n", "\n", "```\n", - "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", - "โ”‚ YOUR TRAINING CODE โ”‚\n", - "โ”‚ โ”‚\n", - "โ”‚ env = OpenSpielEnv(...) โ† Import the client โ”‚\n", - "โ”‚ result = env.reset() โ† Type-safe! โ”‚\n", - "โ”‚ result = env.step(action) โ† Type-safe! โ”‚\n", - "โ”‚ โ”‚\n", - "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", - " โ”‚\n", - " โ”‚ HTTP/JSON (Language-Agnostic)\n", - " โ”‚ POST /reset, POST /step, GET /state\n", - " โ”‚\n", - "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", - "โ”‚ DOCKER CONTAINER โ”‚\n", - "โ”‚ โ”‚\n", - "โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", - "โ”‚ โ”‚ FastAPI Server โ”‚ โ”‚\n", - "โ”‚ โ”‚ โ””โ”€ Environment (reset, step, state) โ”‚ โ”‚\n", - "โ”‚ โ”‚ โ””โ”€ Your Game/Simulation Logic โ”‚ โ”‚\n", - "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", - "โ”‚ โ”‚\n", - "โ”‚ Isolated โ€ข Reproducible โ€ข Secure โ”‚\n", - "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + 
"\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n", + "\u2502 YOUR TRAINING CODE \u2502\n", + "\u2502 \u2502\n", + "\u2502 env = OpenSpielEnv(...) \u2190 Import the client \u2502\n", + "\u2502 result = env.reset() \u2190 Type-safe! \u2502\n", + "\u2502 result = env.step(action) \u2190 Type-safe! \u2502\n", + "\u2502 \u2502\n", + "\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n", + " \u2502\n", + " \u2502 HTTP/JSON (Language-Agnostic)\n", + " \u2502 POST /reset, POST /step, GET /state\n", + " \u2502\n", + "\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25bc\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n", + "\u2502 DOCKER CONTAINER \u2502\n", + "\u2502 \u2502\n", + "\u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502\n", + "\u2502 \u2502 FastAPI Server \u2502 \u2502\n", + "\u2502 \u2502 \u2514\u2500 
Environment (reset, step, state) \u2502 \u2502\n", + "\u2502 \u2502 \u2514\u2500 Your Game/Simulation Logic \u2502 \u2502\n", + "\u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502\n", + "\u2502 \u2502\n", + "\u2502 Isolated \u2022 Reproducible \u2022 Secure \u2502\n", + "\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n", "```\n", "\n", "
\n", "\n", - "**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n", + "**\ud83c\udfaf Key Insight**: You never see HTTP details - just clean Python methods!\n", "\n", "```python\n", "env.reset() # Under the hood: HTTP POST to /reset\n", @@ -267,18 +347,23 @@ "env.state() # Under the hood: HTTP GET to /state\n", "```\n", "\n", - "The magic? OpenEnv handles all the plumbing. You focus on RL! ✨\n", + "The magic? OpenEnv handles all the plumbing. You focus on RL! \u2728\n", "\n", "
" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": "---\n\n\n# Part 3: Setup \ud83d\udee0\ufe0f\n\n
\n\n**Running in Colab?** This cell will clone OpenEnv and install dependencies automatically.\n\n**Running locally?** Make sure you're in the OpenEnv directory.\n\n
" + }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", - "# Part 3: Setup 🛠️\n", + "# Part 3: Setup \ud83d\udee0\ufe0f\n", "\n", "
\n", "\n", @@ -290,43 +375,76 @@ ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "---\n\n\n# Part 4: The OpenEnv Pattern \ud83c\udfd7\ufe0f\n\n
\n\n## Every OpenEnv Environment Has 3 Components:\n\n```\nsrc/envs/your_env/\n\u251c\u2500\u2500 \ud83d\udcdd models.py \u2190 Type-safe contracts\n\u2502 (Action, Observation, State)\n\u2502\n\u251c\u2500\u2500 \ud83d\udcf1 client.py \u2190 What YOU import\n\u2502 (HTTPEnvClient implementation)\n\u2502\n\u2514\u2500\u2500 \ud83d\udda5\ufe0f server/\n \u251c\u2500\u2500 environment.py \u2190 Game/simulation logic\n \u251c\u2500\u2500 app.py \u2190 FastAPI server\n \u2514\u2500\u2500 Dockerfile \u2190 Container definition\n```\n\n
\n\nLet's explore the actual OpenEnv code to see how this works:" + }, + { + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "---\n", + "# Import OpenEnv's core abstractions\n", + "from core.env_server import Environment, Action, Observation, State\n", + "from core.http_env_client import HTTPEnvClient\n", "\n", - "# Part 4: The OpenEnv Pattern 🏗️\n", + "print(\"=\"*70)\n", + "print(\" \ud83e\udde9 OPENENV CORE ABSTRACTIONS\")\n", + "print(\"=\"*70)\n", "\n", - "
\n", + "print(\"\"\"\n", + "\ud83d\udda5\ufe0f SERVER SIDE (runs in Docker):\n", "\n", - "## Every OpenEnv Environment Has 3 Components:\n", + " class Environment(ABC):\n", + " '''Base class for all environment implementations'''\n", + " \n", + " @abstractmethod\n", + " def reset(self) -> Observation:\n", + " '''Start new episode'''\n", + " \n", + " @abstractmethod\n", + " def step(self, action: Action) -> Observation:\n", + " '''Execute action, return observation'''\n", + " \n", + " @property\n", + " def state(self) -> State:\n", + " '''Get episode metadata'''\n", "\n", - "```\n", - "src/envs/your_env/\n", - "├── 📝 models.py ← Type-safe contracts\n", - "│ (Action, Observation, State)\n", - "│\n", - "├── 📱 client.py ← What YOU import\n", - "│ (HTTPEnvClient implementation)\n", - "│\n", - "└── 🖥️ server/\n", - " ├── environment.py ← Game/simulation logic\n", - " ├── app.py ← FastAPI server\n", - " └── Dockerfile ← Container definition\n", - "```\n", + "\ud83d\udcf1 CLIENT SIDE (your training code):\n", "\n", - "
\n", + " class HTTPEnvClient(ABC):\n", + " '''Base class for HTTP clients'''\n", + " \n", + " def reset(self) -> StepResult:\n", + " # HTTP POST /reset\n", + " \n", + " def step(self, action) -> StepResult:\n", + " # HTTP POST /step\n", + " \n", + " def state(self) -> State:\n", + " # HTTP GET /state\n", + "\"\"\")\n", "\n", - "Let's explore the actual OpenEnv code to see how this works:" + "print(\"=\"*70)\n", + "print(\"\\n\u2728 Same interface on both sides - communication via HTTP!\")\n", + "print(\"\ud83c\udfaf You focus on RL, OpenEnv handles the infrastructure.\\n\")" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": "---\n\n\n# Part 5: Example Integration - OpenSpiel \ud83c\udfae\n\n
\n\n## What is OpenSpiel?\n\n**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n\n## OpenEnv's Integration\n\nWe've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n\n\n\n\n\n\n
\n\n**\ud83c\udfaf Single-Player**\n1. **Catch** - Catch falling ball\n2. **Cliff Walking** - Navigate grid\n3. **2048** - Tile puzzle\n4. **Blackjack** - Card game\n\n\n\n**\ud83d\udc65 Multi-Player**\n5. **Tic-Tac-Toe** - Classic 3\u00d73\n6. **Kuhn Poker** - Imperfect info poker\n\n
\n\nThis shows how OpenEnv can wrap **any** existing RL library!\n\n
" + }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", - "# Part 5: Example Integration - OpenSpiel 🎮\n", + "# Part 5: Example Integration - OpenSpiel \ud83c\udfae\n", "\n", "
\n", "\n", @@ -342,7 +460,7 @@ "\n", "\n", "\n", - "**🎯 Single-Player**\n", + "**\ud83c\udfaf Single-Player**\n", "1. **Catch** - Catch falling ball\n", "2. **Cliff Walking** - Navigate grid\n", "3. **2048** - Tile puzzle\n", @@ -351,8 +469,8 @@ "\n", "\n", "\n", - "**👥 Multi-Player**\n", - "5. **Tic-Tac-Toe** - Classic 3×3\n", + "**\ud83d\udc65 Multi-Player**\n", + "5. **Tic-Tac-Toe** - Classic 3\u00d73\n", "6. **Kuhn Poker** - Imperfect info poker\n", "\n", "\n", @@ -364,6 +482,13 @@ "
" ] }, + { + "cell_type": "code", + "source": "from envs.openspiel_env.client import OpenSpielEnv\n\nprint(\"=\"*70)\nprint(\" \ud83d\udd0c HOW OPENENV WRAPS OPENSPIEL\")\nprint(\"=\"*70)\n\nprint(\"\"\"\nclass OpenSpielEnv(HTTPEnvClient[OpenSpielAction, OpenSpielObservation]):\n \n def _step_payload(self, action: OpenSpielAction) -> dict:\n '''Convert typed action to JSON for HTTP'''\n return {\n \"action_id\": action.action_id,\n \"game_name\": action.game_name,\n }\n \n def _parse_result(self, payload: dict) -> StepResult:\n '''Parse HTTP JSON response into typed observation'''\n return StepResult(\n observation=OpenSpielObservation(...),\n reward=payload['reward'],\n done=payload['done']\n )\n\n\"\"\")\n\nprint(\"\u2500\" * 70)\nprint(\"\\n\u2728 Usage (works for ALL OpenEnv environments):\")\nprint(\"\"\"\n env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n \n result = env.reset()\n # Returns StepResult[OpenSpielObservation] - Type safe!\n \n result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n # Type checker knows this is valid!\n \n state = env.state()\n # Returns OpenSpielState\n\"\"\")\n\nprint(\"\u2500\" * 70)\nprint(\"\\n\ud83c\udfaf This pattern works for ANY environment you want to wrap!\\n\")", + "metadata": {}, + "execution_count": null, + "outputs": [] + }, { "cell_type": "code", "execution_count": null, @@ -379,30 +504,30 @@ "from dataclasses import fields\n", "\n", "print(\"=\"*70)\n", - "print(\" 🎮 OPENSPIEL INTEGRATION - TYPE-SAFE MODELS\")\n", + "print(\" \ud83c\udfae OPENSPIEL INTEGRATION - TYPE-SAFE MODELS\")\n", "print(\"=\"*70)\n", "\n", - "print(\"\\n📤 OpenSpielAction (what you send):\")\n", - "print(\" \" + \"─\" * 64)\n", + "print(\"\\n\ud83d\udce4 OpenSpielAction (what you send):\")\n", + "print(\" \" + \"\u2500\" * 64)\n", "for field in fields(OpenSpielAction):\n", - " print(f\" • {field.name:20s} : {field.type}\")\n", + " print(f\" \u2022 {field.name:20s} : {field.type}\")\n", "\n", - "print(\"\\n📥 OpenSpielObservation (what you receive):\")\n", - "print(\" \" + \"─\" * 64)\n", + "print(\"\\n\ud83d\udce5 OpenSpielObservation (what you receive):\")\n", + "print(\" \" + \"\u2500\" * 64)\n", "for field in fields(OpenSpielObservation):\n", - " print(f\" • {field.name:20s} : {field.type}\")\n", + " print(f\" \u2022 {field.name:20s} : {field.type}\")\n", "\n", - "print(\"\\n📊 OpenSpielState (episode metadata):\")\n", - "print(\" \" + \"─\" * 64)\n", + "print(\"\\n\ud83d\udcca OpenSpielState (episode metadata):\")\n", + "print(\" \" + \"\u2500\" * 64)\n", "for field in fields(OpenSpielState):\n", - " print(f\" • {field.name:20s} : {field.type}\")\n", + " print(f\" \u2022 {field.name:20s} : {field.type}\")\n", "\n", "print(\"\\n\" + \"=\"*70)\n", - "print(\"\\n💡 Type safety means:\")\n", - "print(\" ✅ Your IDE autocompletes these fields\")\n", - "print(\" ✅ Typos are caught before running\")\n", - "print(\" ✅ Refactoring is safe\")\n", - "print(\" ✅ Self-documenting code\\n\")" + "print(\"\\n\ud83d\udca1 Type safety means:\")\n", + "print(\" \u2705 Your IDE autocompletes these fields\")\n", + "print(\" \u2705 Typos are caught before running\")\n", + "print(\" \u2705 Refactoring is safe\")\n", + "print(\" \u2705 Self-documenting code\\n\")" ] }, { @@ -415,44 +540,15 @@ "\n", "The client **inherits from HTTPEnvClient** and implements 3 methods:\n", "\n", - "1. `_step_payload()` - Convert action → JSON\n", - "2. `_parse_result()` - Parse JSON → typed observation \n", - "3. `_parse_state()` - Parse JSON → state\n", + "1. `_step_payload()` - Convert action \u2192 JSON\n", + "2. `_parse_result()` - Parse JSON \u2192 typed observation \n", + "3. `_parse_state()` - Parse JSON \u2192 state\n", "\n", "That's it! The base class handles all HTTP communication.\n", "\n", "
" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "---\n", - "\n", - "\n", - "
\n", - "\n", - "# 🎮 Part 6: Interactive Demo\n", - "\n", - "### Now let's BUILD something!\n", - "\n", - "We'll create a **Catch game** following OpenEnv patterns,
\n", - "then watch **4 different AI policies** compete for the championship! 🏆\n", - "\n", - "
\n", - "\n", - "**Get ready for:**\n", - "- ⚡ Live gameplay visualization\n", - "- 🤖 AI policy showdown\n", - "- 📊 Real-time learning metrics\n", - "- 🎯 Production-ready patterns\n", - "\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -461,20 +557,20 @@ "\n", "
\n", "\n", - "# 🎮 Part 6: Interactive Demo\n", + "# \ud83c\udfae Part 6: Interactive Demo\n", "\n", "### Now let's BUILD something!\n", "\n", "We'll create a **Catch game** following OpenEnv patterns,
\n", - "then watch **4 different AI policies** compete for the championship! 🏆\n", + "then watch **4 different AI policies** compete for the championship! \ud83c\udfc6\n", "\n", "
\n", "\n", "**Get ready for:**\n", - "- ⚡ Live gameplay visualization\n", - "- 🤖 AI policy showdown\n", - "- 📊 Real-time learning metrics\n", - "- 🎯 Production-ready patterns\n", + "- \u26a1 Live gameplay visualization\n", + "- \ud83e\udd16 AI policy showdown\n", + "- \ud83d\udcca Real-time learning metrics\n", + "- \ud83c\udfaf Production-ready patterns\n", "\n", "
" ] @@ -483,18 +579,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## The Game: Catch 🔴🏓\n", + "## The Game: Catch \ud83d\udd34\ud83c\udfd3\n", "\n", "\n", "\n", "\n", "\n", @@ -521,7 +617,7 @@ "\n", "
\n", "\n", - "**🎯 Why This Game?**\n", + "**\ud83c\udfaf Why This Game?**\n", "- Simple rules (easy to understand)\n", "- Visual (see what's happening)\n", "- Fast episodes (~5 steps)\n", @@ -531,6 +627,13 @@ "
" ] }, + { + "cell_type": "code", + "source": "# Create environment and start a new episode\nenv = CatchEnvironment()\nobs = env.reset()\n\nprint(\"\ud83c\udfae \" + \"=\"*58 + \" \ud83c\udfae\")\nprint(\" INITIAL GAME STATE\")\nprint(\"\ud83c\udfae \" + \"=\"*58 + \" \ud83c\udfae\\n\")\n\n# Visualize the game board\nenv.render()\n\n# Show game info\nprint(f\"\\n\ud83d\udccd Game Info:\")\nprint(f\" \ud83d\udd34 Ball at: column {obs.ball_position[1]} (row {obs.ball_position[0]})\")\nprint(f\" \ud83c\udfd3 Paddle at: column {obs.paddle_position}\")\n\nprint(f\"\\n\ud83d\udcca Observation Details:\")\nprint(f\" \u2022 Legal actions: {obs.legal_actions} \u2192 [LEFT, STAY, RIGHT]\")\nprint(f\" \u2022 Info state size: {len(obs.info_state)} (5\u00d75 grid flattened)\")\nprint(f\" \u2022 Episode done: {obs.done}\")\nprint(f\" \u2022 Current reward: {obs.reward}\")\n\nprint(\"\\n\ud83d\udca1 The ball will fall down each step. Can your policy catch it?\")\nprint(\"=\"*62)", + "metadata": {}, + "execution_count": null, + "outputs": [] + }, { "cell_type": "code", "execution_count": null, @@ -566,13 +669,13 @@ " Catch game following OpenEnv's Environment pattern.\n", " \n", " In production:\n", - " • Runs in Docker container\n", - " • Accessed via HTTPEnvClient\n", - " • Exposed via FastAPI server\n", + " \u2022 Runs in Docker container\n", + " \u2022 Accessed via HTTPEnvClient\n", + " \u2022 Exposed via FastAPI server\n", " \n", " For this demo:\n", - " • We run it locally to see internals\n", - " • But the structure is identical!\n", + " \u2022 We run it locally to see internals\n", + " \u2022 But the structure is identical!\n", " \"\"\"\n", " \n", " def __init__(self, grid_size=5):\n", @@ -634,24 +737,24 @@ " line = \" \"\n", " for col in range(self.grid_size):\n", " if row == self.ball_row and col == self.ball_col:\n", - " line += \"🔴 \"\n", + " line += \"\ud83d\udd34 \"\n", " elif row == self.grid_size - 1 and col == self.paddle_col:\n", - " line += \"🏓 \"\n", + " line += \"\ud83c\udfd3 \"\n", " else:\n", - " line += \"⬜ \"\n", + " line += \"\u2b1c \"\n", " print(line)\n", "\n", "\n", - "print(\"🎉 \" + \"=\"*64 + \" 🎉\")\n", - "print(\" ✅ Environment Created Following OpenEnv Pattern!\")\n", - "print(\"🎉 \" + \"=\"*64 + \" 🎉\")\n", - "print(\"\\n📋 What we just built:\")\n", - "print(\" • reset() → CatchObservation (type-safe!)\")\n", - "print(\" • step(action) → CatchObservation (type-safe!)\")\n", - "print(\" • render() → Visual display\")\n", - "print(\"\\n🚀 In production: This would run in Docker + FastAPI\")\n", + "print(\"\ud83c\udf89 \" + \"=\"*64 + \" \ud83c\udf89\")\n", + "print(\" \u2705 Environment Created Following OpenEnv Pattern!\")\n", + "print(\"\ud83c\udf89 \" + \"=\"*64 + \" \ud83c\udf89\")\n", + "print(\"\\n\ud83d\udccb What we just built:\")\n", + "print(\" \u2022 reset() \u2192 CatchObservation (type-safe!)\")\n", + "print(\" \u2022 step(action) \u2192 CatchObservation (type-safe!)\")\n", + "print(\" \u2022 render() \u2192 Visual display\")\n", + "print(\"\\n\ud83d\ude80 In production: This would run in Docker + FastAPI\")\n", "print(\" But the structure is EXACTLY the same!\")\n", - "print(\"\\n💡 This is your blueprint for creating ANY OpenEnv environment!\\n\")" + "print(\"\\n\ud83d\udca1 This is your blueprint for creating ANY OpenEnv environment!\\n\")" ] }, { @@ -670,7 +773,7 @@ "---\n", "\n", "\n", - "# Part 7: Four Policies 🤖\n", + "# Part 7: Four Policies \ud83e\udd16\n", "\n", "
\n", "\n", @@ -683,22 +786,22 @@ "
\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", @@ -713,7 +816,7 @@ "source": [ "---\n", "\n", - "# Part 7: Four Policies 🤖\n", + "# Part 7: Four Policies \ud83e\udd16\n", "\n", "
\n", "\n", @@ -726,22 +829,22 @@ "
\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", @@ -762,7 +865,7 @@ "\n", "class RandomPolicy:\n", " \"\"\"Baseline: Pure random guessing.\"\"\"\n", - " name = \"🎲 Random Guesser\"\n", + " name = \"\ud83c\udfb2 Random Guesser\"\n", " \n", " def select_action(self, obs: CatchObservation) -> int:\n", " return random.choice(obs.legal_actions)\n", @@ -770,7 +873,7 @@ "\n", "class AlwaysStayPolicy:\n", " \"\"\"Bad strategy: Never moves.\"\"\"\n", - " name = \"🛑 Always Stay\"\n", + " name = \"\ud83d\uded1 Always Stay\"\n", " \n", " def select_action(self, obs: CatchObservation) -> int:\n", " return 1 # STAY\n", @@ -778,7 +881,7 @@ "\n", "class SmartPolicy:\n", " \"\"\"Optimal: Move paddle toward ball.\"\"\"\n", - " name = \"🧠 Smart Heuristic\"\n", + " name = \"\ud83e\udde0 Smart Heuristic\"\n", " \n", " def select_action(self, obs: CatchObservation) -> int:\n", " ball_col = obs.ball_position[1]\n", @@ -794,7 +897,7 @@ "\n", "class LearningPolicy:\n", " \"\"\"Simulated RL: Epsilon-greedy exploration.\"\"\"\n", - " name = \"📈 Learning Agent\"\n", + " name = \"\ud83d\udcc8 Learning Agent\"\n", " \n", " def __init__(self):\n", " self.steps = 0\n", @@ -820,16 +923,16 @@ " return 1\n", "\n", "\n", - "print(\"🤖 \" + \"=\"*64 + \" 🤖\")\n", - "print(\" ✅ 4 Policies Created!\")\n", - "print(\"🤖 \" + \"=\"*64 + \" 🤖\\n\")\n", + "print(\"\ud83e\udd16 \" + \"=\"*64 + \" \ud83e\udd16\")\n", + "print(\" \u2705 4 Policies Created!\")\n", + "print(\"\ud83e\udd16 \" + \"=\"*64 + \" \ud83e\udd16\\n\")\n", "\n", "policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\n", "for i, policy in enumerate(policies, 1):\n", " print(f\" {i}. {policy.name}\")\n", "\n", - "print(\"\\n💡 Each policy represents a different approach to solving the game!\")\n", - "print(\" Let's see who performs best! 🏆\\n\")" + "print(\"\\n\ud83d\udca1 Each policy represents a different approach to solving the game!\")\n", + "print(\" Let's see who performs best! \ud83c\udfc6\\n\")" ] }, { @@ -855,15 +958,15 @@ " \n", " if visualize:\n", " print(f\"\\n{'='*60}\")\n", - " print(f\" 🎮 {policy.name}\")\n", - " print(f\" 🔴 Ball will fall at column: {obs.ball_position[1]}\")\n", + " print(f\" \ud83c\udfae {policy.name}\")\n", + " print(f\" \ud83d\udd34 Ball will fall at column: {obs.ball_position[1]}\")\n", " print('='*60 + '\\n')\n", " env.render()\n", " time.sleep(delay)\n", " \n", " total_reward = 0\n", " step = 0\n", - " action_names = [\"⬅️ LEFT\", \"🛑 STAY\", \"➡️ RIGHT\"]\n", + " action_names = [\"\u2b05\ufe0f LEFT\", \"\ud83d\uded1 STAY\", \"\u27a1\ufe0f RIGHT\"]\n", " \n", " # THE RL LOOP\n", " while not obs.done:\n", @@ -877,14 +980,14 @@ " total_reward += obs.reward\n", " \n", " if visualize:\n", - " print(f\"\\n📍 Step {step + 1}: {action_names[action]}\")\n", + " print(f\"\\n\ud83d\udccd Step {step + 1}: {action_names[action]}\")\n", " env.render()\n", " time.sleep(delay)\n", " \n", " step += 1\n", " \n", " if visualize:\n", - " result = \"🎉 CAUGHT!\" if total_reward > 0 else \"😢 MISSED\"\n", + " result = \"\ud83c\udf89 CAUGHT!\" if total_reward > 0 else \"\ud83d\ude22 MISSED\"\n", " print(f\"\\n{'='*60}\")\n", " print(f\" {result} Reward: {total_reward}\")\n", " print('='*60)\n", @@ -905,7 +1008,7 @@ "---\n", "\n", "\n", - "# Part 8: Policy Competition! 🏆\n", + "# Part 8: Policy Competition! \ud83c\udfc6\n", "\n", "
\n", "\n", @@ -920,7 +1023,7 @@ "source": [ "---\n", "\n", - "# Part 8: Policy Competition! 🏆\n", + "# Part 8: Policy Competition! \ud83c\udfc6\n", "\n", "
\n", "\n", @@ -944,49 +1047,49 @@ " LearningPolicy(),\n", " ]\n", " \n", - " print(\"\\n🏆 \" + \"=\"*66 + \" 🏆\")\n", + " print(\"\\n\ud83c\udfc6 \" + \"=\"*66 + \" \ud83c\udfc6\")\n", " print(f\" POLICY SHOWDOWN - {num_episodes} Episodes Each\")\n", - " print(\"🏆 \" + \"=\"*66 + \" 🏆\\n\")\n", + " print(\"\ud83c\udfc6 \" + \"=\"*66 + \" \ud83c\udfc6\\n\")\n", " \n", " results = []\n", " for policy in policies:\n", - " print(f\"⚡ Testing {policy.name}...\", end=\" \")\n", + " print(f\"\u26a1 Testing {policy.name}...\", end=\" \")\n", " env = CatchEnvironment()\n", " successes = sum(run_episode(env, policy, visualize=False) \n", " for _ in range(num_episodes))\n", " success_rate = (successes / num_episodes) * 100\n", " results.append((policy.name, success_rate, successes))\n", - " print(f\"✓ Done!\")\n", + " print(f\"\u2713 Done!\")\n", " \n", " print(\"\\n\" + \"=\"*70)\n", - " print(\" 📊 FINAL RESULTS\")\n", + " print(\" \ud83d\udcca FINAL RESULTS\")\n", " print(\"=\"*70 + \"\\n\")\n", " \n", " # Sort by success rate (descending)\n", " results.sort(key=lambda x: x[1], reverse=True)\n", " \n", " # Award medals to top 3\n", - " medals = [\"🥇\", \"🥈\", \"🥉\", \" \"]\n", + " medals = [\"\ud83e\udd47\", \"\ud83e\udd48\", \"\ud83e\udd49\", \" \"]\n", " \n", " for i, (name, rate, successes) in enumerate(results):\n", " medal = medals[i]\n", - " bar = \"█\" * int(rate / 2)\n", + " bar = \"\u2588\" * int(rate / 2)\n", " print(f\"{medal} {name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})\")\n", " \n", " print(\"\\n\" + \"=\"*70)\n", - " print(\"\\n✨ Key Insights:\")\n", - " print(\" • Random (~20%): Baseline - pure luck 🎲\")\n", - " print(\" • Always Stay (~20%): Bad strategy - stays center 🛑\")\n", - " print(\" • Smart (100%): Optimal - perfect play! 🧠\")\n", - " print(\" • Learning (~85%): Improves over time 📈\")\n", - " print(\"\\n🎓 This is Reinforcement Learning in action:\")\n", + " print(\"\\n\u2728 Key Insights:\")\n", + " print(\" \u2022 Random (~20%): Baseline - pure luck \ud83c\udfb2\")\n", + " print(\" \u2022 Always Stay (~20%): Bad strategy - stays center \ud83d\uded1\")\n", + " print(\" \u2022 Smart (100%): Optimal - perfect play! \ud83e\udde0\")\n", + " print(\" \u2022 Learning (~85%): Improves over time \ud83d\udcc8\")\n", + " print(\"\\n\ud83c\udf93 This is Reinforcement Learning in action:\")\n", " print(\" 1. Start with exploration (trying random things)\")\n", " print(\" 2. Learn from rewards (what works, what doesn't)\")\n", " print(\" 3. Converge to optimal behavior (smart strategy)\")\n", - " print(\"\\n🎯 The Learning Agent gets smarter with every episode!\\n\")\n", + " print(\"\\n\ud83c\udfaf The Learning Agent gets smarter with every episode!\\n\")\n", "\n", "# Run the epic competition!\n", - "print(\"🎮 Starting the showdown...\")\n", + "print(\"\ud83c\udfae Starting the showdown...\")\n", "evaluate_policies(num_episodes=50)" ] }, @@ -997,7 +1100,7 @@ "---\n", "\n", "\n", - "# Part 9: Using Real OpenSpiel 🎮\n", + "# Part 9: Using Real OpenSpiel \ud83c\udfae\n", "\n", "
\n", "\n", @@ -1026,8 +1129,8 @@ "\n", "
\n", "\n", - "\n", - "\n", + "\n", + "\n", "\n", "\n", "\n", @@ -1036,7 +1139,7 @@ "\n", "
\n", "\n", "```\n", - "⬜ ⬜ 🔴 ⬜ ⬜ \n", - "⬜ ⬜ ⬜ ⬜ ⬜ Ball\n", - "⬜ ⬜ ⬜ ⬜ ⬜ falls\n", - "⬜ ⬜ ⬜ ⬜ ⬜ down\n", - "⬜ ⬜ 🏓 ⬜ ⬜ \n", + "\u2b1c \u2b1c \ud83d\udd34 \u2b1c \u2b1c \n", + "\u2b1c \u2b1c \u2b1c \u2b1c \u2b1c Ball\n", + "\u2b1c \u2b1c \u2b1c \u2b1c \u2b1c falls\n", + "\u2b1c \u2b1c \u2b1c \u2b1c \u2b1c down\n", + "\u2b1c \u2b1c \ud83c\udfd3 \u2b1c \u2b1c \n", " Paddle\n", "```\n", "\n", @@ -502,18 +598,18 @@ "\n", "\n", "**Rules:**\n", - "- 5×5 grid\n", + "- 5\u00d75 grid\n", "- Ball falls from random column\n", "- Move paddle to catch it\n", "\n", "**Actions:**\n", - "- `0` = Move LEFT ⬅️\n", - "- `1` = STAY 🛑\n", - "- `2` = Move RIGHT ➡️\n", + "- `0` = Move LEFT \u2b05\ufe0f\n", + "- `1` = STAY \ud83d\uded1\n", + "- `2` = Move RIGHT \u27a1\ufe0f\n", "\n", "**Reward:**\n", - "- `+1` if caught 🎉\n", - "- `0` if missed 😢\n", + "- `+1` if caught \ud83c\udf89\n", + "- `0` if missed \ud83d\ude22\n", "\n", "
Expected Performance
🎲 Random | Pick random action every step | ~20% (pure luck)
🛑 Always Stay | Never move, hope ball lands in center | ~20% (terrible!)
🧠 Smart | Move paddle toward ball | 100% (optimal!)
📈 Learning | Start random, learn smart strategy | ~85% (improves over time)
Type Safety | ✅ Dataclasses | ✅ Dataclasses
API
\n", "\n", - "**🎯 Same structure, production features!**\n", + "**\ud83c\udfaf Same structure, production features!**\n", "\n", "
\n", "\n", @@ -1063,10 +1166,10 @@ "\n", "
\n", "\n", - "**🎮 6 Games Available:**\n", + "**\ud83c\udfae 6 Games Available:**\n", "\n", "1. `\"catch\"` - What we just built!\n", - "2. `\"tic_tac_toe\"` - Classic 3×3\n", + "2. `\"tic_tac_toe\"` - Classic 3\u00d73\n", "3. `\"kuhn_poker\"` - Imperfect information poker\n", "4. `\"cliff_walking\"` - Grid navigation\n", "5. `\"2048\"` - Tile puzzle\n", @@ -1084,7 +1187,7 @@ "---\n", "\n", "\n", - "# Part 10: Create Your Own Integration 🛠️\n", + "# Part 10: Create Your Own Integration \ud83d\udee0\ufe0f\n", "\n", "
\n", "\n", @@ -1188,7 +1291,7 @@ "\n", "
\n", "\n", - "### 🎓 Examples to Study\n", + "### \ud83c\udf93 Examples to Study\n", "\n", "OpenEnv includes 3 complete examples:\n", "\n", @@ -1206,7 +1309,7 @@ " - Shows complex use case\n", " - Security considerations\n", "\n", - "**💡 Study these to understand the patterns!**\n", + "**\ud83d\udca1 Study these to understand the patterns!**\n", "\n", "
" ] @@ -1220,7 +1323,7 @@ "\n", "
\n", "\n", - "# 🎓 Summary: Your Journey\n", + "# \ud83c\udf93 Summary: Your Journey\n", "\n", "
" ] @@ -1233,7 +1336,7 @@ "\n", "
\n", "\n", - "# 🎓 Summary: Your Journey\n", + "# \ud83c\udf93 Summary: Your Journey\n", "\n", "
" ] @@ -1248,19 +1351,19 @@ "\n", "\n", "\n", - "### 📚 Concepts\n", + "### \ud83d\udcda Concepts\n", "\n", - "✅ **RL Fundamentals**\n", + "\u2705 **RL Fundamentals**\n", "- The observe-act-reward loop\n", "- What makes good policies\n", "- Exploration vs exploitation\n", "\n", - "✅ **OpenEnv Architecture**\n", + "\u2705 **OpenEnv Architecture**\n", "- Client-server separation\n", "- Type-safe contracts\n", "- HTTP communication layer\n", "\n", - "✅ **Production Patterns**\n", + "\u2705 **Production Patterns**\n", "- Docker isolation\n", "- API design\n", "- Reproducible deployments\n", @@ -1268,19 +1371,19 @@ "\n", "\n", "\n", - "### 🛠️ Skills\n", + "### \ud83d\udee0\ufe0f Skills\n", "\n", - "✅ **Using Environments**\n", + "\u2705 **Using Environments**\n", "- Import OpenEnv clients\n", "- Call reset/step/state\n", "- Work with typed observations\n", "\n", - "✅ **Building Environments**\n", + "\u2705 **Building Environments**\n", "- Define type-safe models\n", "- Implement Environment class\n", "- Create HTTPEnvClient\n", "\n", - "✅ **Testing & Debugging**\n", + "\u2705 **Testing & Debugging**\n", "- Compare policies\n", "- Visualize episodes\n", "- Measure performance\n", @@ -1305,45 +1408,45 @@ "\n", "\n", "Type Safety\n", - "❌ Arrays, dicts\n", - "✅ Dataclasses\n", - "🏆 OpenEnv\n", + "\u274c Arrays, dicts\n", + "\u2705 Dataclasses\n", + "\ud83c\udfc6 OpenEnv\n", "\n", "\n", "Isolation\n", - "❌ Same process\n", - "✅ Docker\n", - "🏆 OpenEnv\n", + "\u274c Same process\n", + "\u2705 Docker\n", + "\ud83c\udfc6 OpenEnv\n", "\n", "\n", "Deployment\n", - "❌ Manual setup\n", - "✅ K8s-ready\n", - "🏆 OpenEnv\n", + "\u274c Manual setup\n", + "\u2705 K8s-ready\n", + "\ud83c\udfc6 OpenEnv\n", "\n", "\n", "Language\n", - "❌ Python only\n", - "✅ Any (HTTP)\n", - "🏆 OpenEnv\n", + "\u274c Python only\n", + "\u2705 Any (HTTP)\n", + "\ud83c\udfc6 OpenEnv\n", "\n", "\n", "Reproducibility\n", - "❌ \"Works on my machine\"\n", - "✅ Same everywhere\n", - "🏆 OpenEnv\n", + "\u274c \"Works on my machine\"\n", + "\u2705 Same everywhere\n", + "\ud83c\udfc6 OpenEnv\n", "\n", "\n", "Community\n", - "✅ Large ecosystem\n", - "🟡 Growing\n", - "🤝 Both!\n", + "\u2705 Large ecosystem\n", + "\ud83d\udfe1 Growing\n", + "\ud83e\udd1d Both!\n", "\n", "\n", "\n", "
\n", "\n", - "**🎯 The Bottom Line**\n", + "**\ud83c\udfaf The Bottom Line**\n", "\n", "OpenEnv brings **production engineering** to RL:\n", "- Same environments work locally and in production\n", @@ -1361,33 +1464,33 @@ "metadata": {}, "source": [ "\n", - "## 📚 Resources\n", + "## \ud83d\udcda Resources\n", "\n", "
\n", "\n", - "### 🔗 Essential Links\n", + "### \ud83d\udd17 Essential Links\n", "\n", - "- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", - "- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", - "- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n", - "- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n", - "- **🔥 PyTorch**: https://pytorch.org/\n", + "- **\ud83c\udfe0 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", + "- **\ud83c\udfae OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", + "- **\u26a1 FastAPI Docs**: https://fastapi.tiangolo.com/\n", + "- **\ud83d\udc33 Docker Guide**: https://docs.docker.com/get-started/\n", + "- **\ud83d\udd25 PyTorch**: https://pytorch.org/\n", "\n", - "### 📖 Documentation Deep Dives\n", + "### \ud83d\udcd6 Documentation Deep Dives\n", "\n", "- **Environment Creation Guide**: `src/envs/README.md`\n", "- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n", "- **Example Scripts**: `examples/`\n", "- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", "\n", - "### 🎓 Community & Support\n", + "### \ud83c\udf93 Community & Support\n", "\n", "**Supported by amazing organizations:**\n", - "- 🔥 Meta PyTorch\n", - "- 🤗 Hugging Face\n", - "- ⚡ Unsloth AI\n", - "- 🌟 Reflection AI\n", - "- 🚀 And many more!\n", + "- \ud83d\udd25 Meta PyTorch\n", + "- \ud83e\udd17 Hugging Face\n", + "- \u26a1 Unsloth AI\n", + "- \ud83c\udf1f Reflection AI\n", + "- \ud83d\ude80 And many more!\n", "\n", "**License**: BSD 3-Clause (very permissive!)\n", "\n", @@ -1397,46 +1500,46 @@ "\n", "---\n", "\n", - "### 🌈 What's Next?\n", + "### \ud83c\udf08 What's Next?\n", "\n", - "1. ⭐ **Star the repo** to show support and stay updated\n", - "2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n", - "3. 🎮 **Explore** other OpenSpiel games\n", - "4. 🛠️ **Build** your own environment integration\n", - "5. 💬 **Share** what you build with the community!" + "1. \u2b50 **Star the repo** to show support and stay updated\n", + "2. \ud83d\udd04 **Try modifying** the Catch game (make it harder? bigger grid?)\n", + "3. \ud83c\udfae **Explore** other OpenSpiel games\n", + "4. \ud83d\udee0\ufe0f **Build** your own environment integration\n", + "5. \ud83d\udcac **Share** what you build with the community!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## 📚 Resources\n", + "## \ud83d\udcda Resources\n", "\n", "
\n", "\n", - "### 🔗 Essential Links\n", + "### \ud83d\udd17 Essential Links\n", "\n", - "- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", - "- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", - "- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n", - "- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n", - "- **🔥 PyTorch**: https://pytorch.org/\n", + "- **\ud83c\udfe0 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", + "- **\ud83c\udfae OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", + "- **\u26a1 FastAPI Docs**: https://fastapi.tiangolo.com/\n", + "- **\ud83d\udc33 Docker Guide**: https://docs.docker.com/get-started/\n", + "- **\ud83d\udd25 PyTorch**: https://pytorch.org/\n", "\n", - "### 📖 Documentation Deep Dives\n", + "### \ud83d\udcd6 Documentation Deep Dives\n", "\n", "- **Environment Creation Guide**: `src/envs/README.md`\n", "- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n", "- **Example Scripts**: `examples/`\n", "- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", "\n", - "### 🎓 Community & Support\n", + "### \ud83c\udf93 Community & Support\n", "\n", "**Supported by amazing organizations:**\n", - "- 🔥 Meta PyTorch\n", - "- 🤗 Hugging Face\n", - "- ⚡ Unsloth AI\n", - "- 🌟 Reflection AI\n", - "- 🚀 And many more!\n", + "- \ud83d\udd25 Meta PyTorch\n", + "- \ud83e\udd17 Hugging Face\n", + "- \u26a1 Unsloth AI\n", + "- \ud83c\udf1f Reflection AI\n", + "- \ud83d\ude80 And many more!\n", "\n", "**License**: BSD 3-Clause (very permissive!)\n", "\n", @@ -1446,13 +1549,13 @@ "\n", "---\n", "\n", - "### 🌈 What's Next?\n", + "### \ud83c\udf08 What's Next?\n", "\n", - "1. ⭐ **Star the repo** to show support and stay updated\n", - "2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n", - "3. 🎮 **Explore** other OpenSpiel games\n", - "4. 🛠️ **Build** your own environment integration\n", - "5. 💬 **Share** what you build with the community!" + "1. \u2b50 **Star the repo** to show support and stay updated\n", + "2. \ud83d\udd04 **Try modifying** the Catch game (make it harder? bigger grid?)\n", + "3. \ud83c\udfae **Explore** other OpenSpiel games\n", + "4. \ud83d\udee0\ufe0f **Build** your own environment integration\n", + "5. \ud83d\udcac **Share** what you build with the community!" ] }, { @@ -1463,38 +1566,38 @@ "\n", "
\n", "\n", - "# 🎉 Congratulations! You Did It! 🎉\n", + "# \ud83c\udf89 Congratulations! You Did It! \ud83c\udf89\n", "\n", "### You're now an OpenEnv expert!\n", "\n", "
\n", "\n", - "## ✅ What You've Mastered:\n", + "## \u2705 What You've Mastered:\n", "\n", - "**🧠 Concepts**\n", + "**\ud83e\udde0 Concepts**\n", "- How RL works (the observe-act-reward loop)\n", "- Why OpenEnv matters (production-ready RL)\n", "- How to use existing environments\n", "\n", - "**🛠️ Practical Skills**\n", + "**\ud83d\udee0\ufe0f Practical Skills**\n", "- Creating new integrations\n", "- Building type-safe environments\n", "- Deploying to production\n", "\n", - "**🎯 Real Experience**\n", + "**\ud83c\udfaf Real Experience**\n", "- Built a complete RL environment\n", "- Tested multiple policies\n", "- Watched learning happen in real-time!\n", "\n", "---\n", "\n", - "### Now go build something amazing! 🚀\n", + "### Now go build something amazing! \ud83d\ude80\n", "\n", "**Welcome to the future of RL with PyTorch & OpenEnv**\n", "\n", "
\n", "\n", - "[![Star on GitHub](https://img.shields.io/badge/⭐_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)\n", + "[![Star on GitHub](https://img.shields.io/badge/\u2b50_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)\n", "\n", "
\n", "\n", @@ -1502,16 +1605,16 @@ "\n", "
\n", "\n", - "## 🌟 Want to Learn More?\n", + "## \ud83c\udf1f Want to Learn More?\n", "\n", - "- 📖 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)\n", - "- 🎮 Try the other example games\n", - "- 💬 Join the community discussions\n", - "- 🛠️ Build your own integration\n", - "- 🚀 Deploy to production\n", - "- ⭐ Star the repo to stay updated!\n", + "- \ud83d\udcd6 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)\n", + "- \ud83c\udfae Try the other example games\n", + "- \ud83d\udcac Join the community discussions\n", + "- \ud83d\udee0\ufe0f Build your own integration\n", + "- \ud83d\ude80 Deploy to production\n", + "- \u2b50 Star the repo to stay updated!\n", "\n", - "**Happy coding! 🎊**\n", + "**Happy coding! \ud83c\udf8a**\n", "\n", "
" ] @@ -1538,4 +1641,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/fix_notebook.py b/fix_notebook.py new file mode 100644 index 0000000..1e73044 --- /dev/null +++ b/fix_notebook.py @@ -0,0 +1,171 @@ +#!/usr/bin/env python3 +import json + +# Read notebook +with open('examples/OpenEnv_Tutorial.ipynb', 'r') as f: + nb = json.load(f) + +# Insert TOC after cell 1 +toc_cell = { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 📑 Table of Contents\n", + "\n", "
\n", + "\n", + "**Quick Navigation** - Click any section to jump right there! 🎯\n", + "\n", + "### Foundation\n", + "- [Part 1: RL in 60 Seconds ⏱️](#part-1)\n", + "- [Part 2: The Problem with Traditional RL 😤](#part-2)\n", + "- [Part 3: Setup 🛠️](#part-3)\n", + "\n", + "### Architecture\n", + "- [Part 4: The OpenEnv Pattern 🏗️](#part-4)\n", + "- [Part 5: Example Integration - OpenSpiel 🎮](#part-5)\n", + "\n", + "### Hands-On Demo\n", + "- [Part 6: Interactive Demo 🎮](#part-6)\n", + "- [Part 7: Four Policies 🤖](#part-7)\n", + "- [Part 8: Policy Competition! 🏆](#part-8)\n", + "\n", + "### Advanced\n", + "- [Part 9: Using Real OpenSpiel 🎮](#part-9)\n", + "- [Part 10: Create Your Own Integration 🛠️](#part-10)\n", + "\n", + "### Wrap Up\n", + "- [Summary: Your Journey 🎓](#summary)\n", + "- [Resources 📚](#resources)\n", + "\n", + "
\n", + "\n", + "---" + ] +} + +# Insert setup code cell after Part 3 header +setup_cell = { + "cell_type": "code", + "execution_count": None, + "metadata": {}, + "outputs": [], + "source": [ + "# Detect environment\n", + "try:\n", + " import google.colab\n", + " IN_COLAB = True\n", + " print(\"🌐 Running in Google Colab - Perfect!\")\n", + "except ImportError:\n", + " IN_COLAB = False\n", + " print(\"💻 Running locally - Nice!\")\n", + "\n", + "if IN_COLAB:\n", + " print(\"\\n📦 Cloning OpenEnv repository...\")\n", + " !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n", + " %cd OpenEnv\n", + " \n", + " print(\"📚 Installing dependencies (this takes ~10 seconds)...\")\n", + " !pip install -q fastapi uvicorn requests\n", + " \n", + " import sys\n", + " sys.path.insert(0, './src')\n", + " print(\"\\n✅ Setup complete! Everything is ready to go! 🎉\")\n", + "else:\n", + " import sys\n", + " from pathlib import Path\n", + " sys.path.insert(0, str(Path.cwd().parent / 'src'))\n", + " print(\"✅ Using local OpenEnv installation\")\n", + "\n", + "print(\"\\n🚀 Ready to explore OpenEnv and build amazing things!\")\n", + "print(\"💡 Tip: Run cells top-to-bottom for the best experience.\\n\")" + ] +} + +# Insert architecture diagram after Part 2 +arch_cell = { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The Architecture\n", + "\n", + "```\n", + "┌────────────────────────────────────────────────────────────┐\n", + "│ YOUR TRAINING CODE │\n", + "│ │\n", + "│ env = OpenSpielEnv(...) ← Import the client │\n", + "│ result = env.reset() ← Type-safe! │\n", + "│ result = env.step(action) ← Type-safe! │\n", + "│ │\n", + "└─────────────────┬──────────────────────────────────────────┘\n", + " │\n", + " │ HTTP/JSON (Language-Agnostic)\n", + " │ POST /reset, POST /step, GET /state\n", + " │\n", + "┌─────────────────▼──────────────────────────────────────────┐\n", + "│ DOCKER CONTAINER │\n", + "│ │\n", + "│ ┌──────────────────────────────────────────────┐ │\n", + "│ │ FastAPI Server │ │\n", + "│ │ └─ Environment (reset, step, state) │ │\n", + "│ │ └─ Your Game/Simulation Logic │ │\n", + "│ └──────────────────────────────────────────────┘ │\n", + "│ │\n", + "│ Isolated • Reproducible • Secure │\n", + "└────────────────────────────────────────────────────────────┘\n", + "```\n", + "\n", + "
\n", + "\n", + "**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n", + "\n", + "```python\n", + "env.reset() # Under the hood: HTTP POST to /reset\n", + "env.step(...) # Under the hood: HTTP POST to /step\n", + "env.state() # Under the hood: HTTP GET to /state\n", + "```\n", + "\n", + "The magic? OpenEnv handles all the plumbing. You focus on RL! ✨\n", + "\n", + "
" + ] +} + +# Check which cells exist +has_toc = any('Table of Contents' in ''.join(cell.get('source', [])) for cell in nb['cells']) +has_setup = any('IN_COLAB' in ''.join(cell.get('source', [])) for cell in nb['cells']) +has_arch = any('┌───' in ''.join(cell.get('source', [])) and 'YOUR TRAINING CODE' in ''.join(cell.get('source', [])) for cell in nb['cells']) + +print(f"Current state:") +print(f" TOC present: {has_toc}") +print(f" Setup code present: {has_setup}") +print(f" Architecture diagram present: {has_arch}") + +# Insert TOC after cell 1 if missing +if not has_toc: + nb['cells'].insert(2, toc_cell) + print("✅ Added TOC") + +# Find Part 3 header and add setup cell after it if missing +if not has_setup: + for i, cell in enumerate(nb['cells']): + if 'Part 3: Setup' in ''.join(cell.get('source', [])) and cell['cell_type'] == 'markdown': + nb['cells'].insert(i + 1, setup_cell) + print("✅ Added setup code cell") + break + +# Find Part 2 and add architecture diagram if missing +if not has_arch: + for i, cell in enumerate(nb['cells']): + if 'Part 2:' in ''.join(cell.get('source', [])) and 'The OpenEnv Philosophy' in ''.join(cell.get('source', [])): + nb['cells'].insert(i + 1, arch_cell) + print("✅ Added architecture diagram") + break + +# Save +with open('examples/OpenEnv_Tutorial.ipynb', 'w') as f: + json.dump(nb, f, indent=1) + +print(f"\n✅ Notebook fixed! Total cells: {len(nb['cells'])}") From c0ad1c223369c734701566b472256f8b647c6b8b Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 14:06:48 -0700 Subject: [PATCH 16/19] Delete fix_notebook.py --- fix_notebook.py | 171 ------------------------------------------------ 1 file changed, 171 deletions(-) delete mode 100644 fix_notebook.py diff --git a/fix_notebook.py b/fix_notebook.py deleted file mode 100644 index 1e73044..0000000 --- a/fix_notebook.py +++ /dev/null @@ -1,171 +0,0 @@ -#!/usr/bin/env python3 -import json - -# Read notebook -with open('examples/OpenEnv_Tutorial.ipynb', 'r') as f: - nb = json.load(f) - -# Insert TOC after cell 1 -toc_cell = { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "## 📑 Table of Contents\n", - "\n", - "
\n", - "\n", - "**Quick Navigation** - Click any section to jump right there! 🎯\n", - "\n", - "### Foundation\n", - "- [Part 1: RL in 60 Seconds ⏱️](#part-1)\n", - "- [Part 2: The Problem with Traditional RL 😤](#part-2)\n", - "- [Part 3: Setup 🛠️](#part-3)\n", - "\n", - "### Architecture\n", - "- [Part 4: The OpenEnv Pattern 🏗️](#part-4)\n", - "- [Part 5: Example Integration - OpenSpiel 🎮](#part-5)\n", - "\n", - "### Hands-On Demo\n", - "- [Part 6: Interactive Demo 🎮](#part-6)\n", - "- [Part 7: Four Policies 🤖](#part-7)\n", - "- [Part 8: Policy Competition! 🏆](#part-8)\n", - "\n", - "### Advanced\n", - "- [Part 9: Using Real OpenSpiel 🎮](#part-9)\n", - "- [Part 10: Create Your Own Integration 🛠️](#part-10)\n", - "\n", - "### Wrap Up\n", - "- [Summary: Your Journey 🎓](#summary)\n", - "- [Resources 📚](#resources)\n", - "\n", - "
\n", - "\n", - "---" - ] -} - -# Insert setup code cell after Part 3 header -setup_cell = { - "cell_type": "code", - "execution_count": None, - "metadata": {}, - "outputs": [], - "source": [ - "# Detect environment\n", - "try:\n", - " import google.colab\n", - " IN_COLAB = True\n", - " print(\"🌐 Running in Google Colab - Perfect!\")\n", - "except ImportError:\n", - " IN_COLAB = False\n", - " print(\"💻 Running locally - Nice!\")\n", - "\n", - "if IN_COLAB:\n", - " print(\"\\n📦 Cloning OpenEnv repository...\")\n", - " !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n", - " %cd OpenEnv\n", - " \n", - " print(\"📚 Installing dependencies (this takes ~10 seconds)...\")\n", - " !pip install -q fastapi uvicorn requests\n", - " \n", - " import sys\n", - " sys.path.insert(0, './src')\n", - " print(\"\\n✅ Setup complete! Everything is ready to go! 🎉\")\n", - "else:\n", - " import sys\n", - " from pathlib import Path\n", - " sys.path.insert(0, str(Path.cwd().parent / 'src'))\n", - " print(\"✅ Using local OpenEnv installation\")\n", - "\n", - "print(\"\\n🚀 Ready to explore OpenEnv and build amazing things!\")\n", - "print(\"💡 Tip: Run cells top-to-bottom for the best experience.\\n\")" - ] -} - -# Insert architecture diagram after Part 2 -arch_cell = { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### The Architecture\n", - "\n", - "```\n", - "┌────────────────────────────────────────────────────────────┐\n", - "│ YOUR TRAINING CODE │\n", - "│ │\n", - "│ env = OpenSpielEnv(...) ← Import the client │\n", - "│ result = env.reset() ← Type-safe! │\n", - "│ result = env.step(action) ← Type-safe! │\n", - "│ │\n", - "└─────────────────┬──────────────────────────────────────────┘\n", - " │\n", - " │ HTTP/JSON (Language-Agnostic)\n", - " │ POST /reset, POST /step, GET /state\n", - " │\n", - "┌─────────────────▼──────────────────────────────────────────┐\n", - "│ DOCKER CONTAINER │\n", - "│ │\n", - "│ ┌──────────────────────────────────────────────┐ │\n", - "│ │ FastAPI Server │ │\n", - "│ │ └─ Environment (reset, step, state) │ │\n", - "│ │ └─ Your Game/Simulation Logic │ │\n", - "│ └──────────────────────────────────────────────┘ │\n", - "│ │\n", - "│ Isolated • Reproducible • Secure │\n", - "└────────────────────────────────────────────────────────────┘\n", - "```\n", - "\n", - "
\n", - "\n", - "**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n", - "\n", - "```python\n", - "env.reset() # Under the hood: HTTP POST to /reset\n", - "env.step(...) # Under the hood: HTTP POST to /step\n", - "env.state() # Under the hood: HTTP GET to /state\n", - "```\n", - "\n", - "The magic? OpenEnv handles all the plumbing. You focus on RL! ✨\n", - "\n", - "
" - ] -} - -# Check which cells exist -has_toc = any('Table of Contents' in ''.join(cell.get('source', [])) for cell in nb['cells']) -has_setup = any('IN_COLAB' in ''.join(cell.get('source', [])) for cell in nb['cells']) -has_arch = any('┌───' in ''.join(cell.get('source', [])) and 'YOUR TRAINING CODE' in ''.join(cell.get('source', [])) for cell in nb['cells']) - -print(f"Current state:") -print(f" TOC present: {has_toc}") -print(f" Setup code present: {has_setup}") -print(f" Architecture diagram present: {has_arch}") - -# Insert TOC after cell 1 if missing -if not has_toc: - nb['cells'].insert(2, toc_cell) - print("✅ Added TOC") - -# Find Part 3 header and add setup cell after it if missing -if not has_setup: - for i, cell in enumerate(nb['cells']): - if 'Part 3: Setup' in ''.join(cell.get('source', [])) and cell['cell_type'] == 'markdown': - nb['cells'].insert(i + 1, setup_cell) - print("✅ Added setup code cell") - break - -# Find Part 2 and add architecture diagram if missing -if not has_arch: - for i, cell in enumerate(nb['cells']): - if 'Part 2:' in ''.join(cell.get('source', [])) and 'The OpenEnv Philosophy' in ''.join(cell.get('source', [])): - nb['cells'].insert(i + 1, arch_cell) - print("✅ Added architecture diagram") - break - -# Save -with open('examples/OpenEnv_Tutorial.ipynb', 'w') as f: - json.dump(nb, f, indent=1) - -print(f"\n✅ Notebook fixed! Total cells: {len(nb['cells'])}") From 578e5a0c8b7172dcf8df2c3a6546442b991c433c Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 14:09:40 -0700 Subject: [PATCH 17/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 800 +++++++++++++++++++------------------- 1 file changed, 489 insertions(+), 311 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 126697f..68a4568 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -8,24 +8,28 @@ "\n", "\"PyTorch\"\n", "\n", - "Author: [Sanyam Bhutani](http://twitter.com/bhutanisanyam1/)\n", + "\n", "\n", "# OpenEnv: Production RL Made Simple\n", "\n", - "### *From \"Hello World\" to RL Training in 5 Minutes* \u2728\n", + "### *From \"Hello World\" to RL Training in 5 Minutes* ✨\n", "\n", "---\n", "\n", "**What if RL environments were as easy to use as REST APIs?**\n", "\n", - "That's OpenEnv. Type-safe. Isolated. Production-ready. \ud83c\udfaf\n", + "That's OpenEnv. Type-safe. Isolated. Production-ready. 🎯\n", "\n", "[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)\n", "[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)\n", "[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org/)\n", "\n", + "Author: [Sanyam Bhutani](http://twitter.com/bhutanisanyam1/)\n", + "\n", "
\n", "\n", + "\n", + "\n", "---" ] }, @@ -33,50 +37,50 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## \ud83d\udccb What You'll Learn\n", + "## 📋 What You'll Learn\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", - "**\ud83c\udfaf Part 1-2: The Fundamentals**\n", - "- \u26a1 RL in 60 seconds\n", - "- \ud83e\udd14 Why existing solutions fall short\n", - "- \ud83d\udca1 The OpenEnv solution\n", + "**🎯 Part 1-2: The Fundamentals**\n", + "- ⚡ RL in 60 seconds\n", + "- 🤔 Why existing solutions fall short\n", + "- 💡 The OpenEnv solution\n", "\n", "\n", "\n", - "**\ud83c\udfd7\ufe0f Part 3-5: The Architecture**\n", - "- \ud83d\udd27 How OpenEnv works\n", - "- \ud83d\udd0d Exploring real code\n", - "- \ud83c\udfae OpenSpiel integration example\n", + "**🏗️ Part 3-5: The Architecture**\n", + "- 🔧 How OpenEnv works\n", + "- 🔍 Exploring real code\n", + "- 🎮 OpenSpiel integration example\n", "\n", "
\n", "\n", - "**\ud83c\udfae Part 6-8: Hands-On Demo**\n", - "- \ud83d\udd28 Build a game environment\n", - "- \ud83e\udd16 Test 4 different policies\n", - "- \ud83d\udc40 Watch learning happen live\n", + "**🎮 Part 6-8: Hands-On Demo**\n", + "- 🔨 Build a game environment\n", + "- 🤖 Test 4 different policies\n", + "- 👀 Watch learning happen live\n", "\n", "\n", "\n", - "**\ud83d\udd27 Part 9-10: Going Further**\n", - "- \ud83d\ude80 Use real OpenSpiel\n", - "- \u2728 Create your own integration\n", - "- \ud83c\udf10 Deploy to production\n", + "**🔧 Part 9-10: Going Further**\n", + "- 🚀 Use real OpenSpiel\n", + "- ✨ Create your own integration\n", + "- 🌐 Deploy to production\n", "\n", "\n", "\n", "
\n", "\n", - "> \ud83d\udca1 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n", + "> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n", "> \n", - "> \u23f1\ufe0f **Time**: ~5 minutes | \ud83d\udcca **Difficulty**: Beginner-friendly | \ud83c\udfaf **Outcome**: Production-ready RL knowledge" + "> ⏱️ **Time**: ~5 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge" ] }, { @@ -85,33 +89,33 @@ "source": [ "---\n", "\n", - "## \ud83d\udcd1 Table of Contents\n", + "## 📑 Table of Contents\n", "\n", "
\n", "\n", - "**Quick Navigation** - Click any section to jump right there! \ud83c\udfaf\n", + "**Quick Navigation** - Click any section to jump right there! 🎯\n", "\n", "### Foundation\n", - "- [Part 1: RL in 60 Seconds \u23f1\ufe0f](#part-1)\n", - "- [Part 2: The Problem with Traditional RL \ud83d\ude24](#part-2)\n", - "- [Part 3: Setup \ud83d\udee0\ufe0f](#part-3)\n", + "- [Part 1: RL in 60 Seconds ⏱️](#part-1)\n", + "- [Part 2: The Problem with Traditional RL 😤](#part-2)\n", + "- [Part 3: Setup 🛠️](#part-3)\n", "\n", "### Architecture\n", - "- [Part 4: The OpenEnv Pattern \ud83c\udfd7\ufe0f](#part-4)\n", - "- [Part 5: Example Integration - OpenSpiel \ud83c\udfae](#part-5)\n", + "- [Part 4: The OpenEnv Pattern 🏗️](#part-4)\n", + "- [Part 5: Example Integration - OpenSpiel 🎮](#part-5)\n", "\n", "### Hands-On Demo\n", - "- [Part 6: Interactive Demo \ud83c\udfae](#part-6)\n", - "- [Part 7: Four Policies \ud83e\udd16](#part-7)\n", - "- [Part 8: Policy Competition! \ud83c\udfc6](#part-8)\n", + "- [Part 6: Interactive Demo 🎮](#part-6)\n", + "- [Part 7: Four Policies 🤖](#part-7)\n", + "- [Part 8: Policy Competition! 🏆](#part-8)\n", "\n", "### Advanced\n", - "- [Part 9: Using Real OpenSpiel \ud83c\udfae](#part-9)\n", - "- [Part 10: Create Your Own Integration \ud83d\udee0\ufe0f](#part-10)\n", + "- [Part 9: Using Real OpenSpiel 🎮](#part-9)\n", + "- [Part 10: Create Your Own Integration 🛠️](#part-10)\n", "\n", "### Wrap Up\n", - "- [Summary: Your Journey \ud83c\udf93](#summary)\n", - "- [Resources \ud83d\udcda](#resources)\n", + "- [Summary: Your Journey 🎓](#summary)\n", + "- [Resources 📚](#resources)\n", "\n", "
\n", "\n", @@ -128,36 +132,61 @@ "try:\n", " import google.colab\n", " IN_COLAB = True\n", - " print(\"\ud83c\udf10 Running in Google Colab - Perfect!\")\n", + " print(\"🌐 Running in Google Colab - Perfect!\")\n", "except ImportError:\n", " IN_COLAB = False\n", - " print(\"\ud83d\udcbb Running locally - Nice!\")\n", + " print(\"💻 Running locally - Nice!\")\n", "\n", "if IN_COLAB:\n", - " print(\"\\n\ud83d\udce6 Cloning OpenEnv repository...\")\n", + " print(\"\\n📦 Cloning OpenEnv repository...\")\n", " !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n", " %cd OpenEnv\n", " \n", - " print(\"\ud83d\udcda Installing dependencies (this takes ~10 seconds)...\")\n", + " print(\"📚 Installing dependencies (this takes ~10 seconds)...\")\n", " !pip install -q fastapi uvicorn requests\n", " \n", " import sys\n", " sys.path.insert(0, './src')\n", - " print(\"\\n\u2705 Setup complete! Everything is ready to go! \ud83c\udf89\")\n", + " print(\"\\n✅ Setup complete! Everything is ready to go! 🎉\")\n", "else:\n", " import sys\n", " from pathlib import Path\n", " sys.path.insert(0, str(Path.cwd().parent / 'src'))\n", - " print(\"\u2705 Using local OpenEnv installation\")\n", + " print(\"✅ Using local OpenEnv installation\")\n", "\n", - "print(\"\\n\ud83d\ude80 Ready to explore OpenEnv and build amazing things!\")\n", - "print(\"\ud83d\udca1 Tip: Run cells top-to-bottom for the best experience.\\n\")" + "print(\"\\n🚀 Ready to explore OpenEnv and build amazing things!\")\n", + "print(\"💡 Tip: Run cells top-to-bottom for the best experience.\\n\")" ] }, { "cell_type": "markdown", - "source": "---\n\n\n# Part 1: RL in 60 Seconds \u23f1\ufe0f\n\n
\n\n**Reinforcement Learning is simpler than you think.**\n\nIt's just a loop:\n\n```\nwhile not done:\n observation = environment.observe()\n action = policy.choose(observation)\n reward = environment.step(action)\n policy.learn(reward)\n```\n\nThat's it. That's RL.\n\n
\n\nLet's see it in action:", - "metadata": {} + "metadata": {}, + "source": [ + "---\n", + "\n", + "\n", + "# Part 1: RL in 60 Seconds ⏱️\n", + "\n", + "
\n", + "\n", + "**Reinforcement Learning is simpler than you think.**\n", + "\n", + "It's just a loop:\n", + "\n", + "```\n", + "while not done:\n", + " observation = environment.observe()\n", + " action = policy.choose(observation)\n", + " reward = environment.step(action)\n", + " policy.learn(reward)\n", + "```\n", + "\n", + "That's it. That's RL.\n", + "\n", + "
\n", + "\n", + "Let's see it in action:" + ] }, { "cell_type": "markdown", @@ -165,7 +194,7 @@ "source": [ "---\n", "\n", - "# Part 1: RL in 60 Seconds \u23f1\ufe0f\n", + "# Part 1: RL in 60 Seconds ⏱️\n", "\n", "
\n", "\n", @@ -196,16 +225,16 @@ "source": [ "import random\n", "\n", - "print(\"\ud83c\udfb2 \" + \"=\"*58 + \" \ud83c\udfb2\")\n", + "print(\"🎲 \" + \"=\"*58 + \" 🎲\")\n", "print(\" Number Guessing Game - The Simplest RL Example\")\n", - "print(\"\ud83c\udfb2 \" + \"=\"*58 + \" \ud83c\udfb2\")\n", + "print(\"🎲 \" + \"=\"*58 + \" 🎲\")\n", "\n", "# Environment setup\n", "target = random.randint(1, 10)\n", "guesses_left = 3\n", "\n", - "print(f\"\\n\ud83c\udfaf I'm thinking of a number between 1 and 10...\")\n", - "print(f\"\ud83d\udcad You have {guesses_left} guesses. Let's see how random guessing works!\\n\")\n", + "print(f\"\\n🎯 I'm thinking of a number between 1 and 10...\")\n", + "print(f\"💭 You have {guesses_left} guesses. Let's see how random guessing works!\\n\")\n", "\n", "# The RL Loop - Pure random policy (no learning!)\n", "while guesses_left > 0:\n", @@ -213,21 +242,21 @@ " guess = random.randint(1, 10)\n", " guesses_left -= 1\n", " \n", - " print(f\"\ud83d\udcad Guess #{3-guesses_left}: {guess}\", end=\" \u2192 \")\n", + " print(f\"💭 Guess #{3-guesses_left}: {guess}\", end=\" → \")\n", " \n", " # Reward signal (but we're not using it!)\n", " if guess == target:\n", - " print(\"\ud83c\udf89 Correct! +10 points\")\n", + " print(\"🎉 Correct! +10 points\")\n", " break\n", " elif abs(guess - target) <= 2:\n", - " print(\"\ud83d\udd25 Warm! (close)\")\n", + " print(\"🔥 Warm! (close)\")\n", " else:\n", - " print(\"\u2744\ufe0f Cold! (far)\")\n", + " print(\"❄️ Cold! (far)\")\n", "else:\n", - " print(f\"\\n\ud83d\udc94 Out of guesses. The number was {target}.\")\n", + " print(f\"\\n💔 Out of guesses. The number was {target}.\")\n", "\n", "print(\"\\n\" + \"=\"*62)\n", - "print(\"\ud83d\udca1 This is RL: Observe \u2192 Act \u2192 Reward \u2192 Repeat\")\n", + "print(\"💡 This is RL: Observe → Act → Reward → Repeat\")\n", "print(\" But this policy is terrible! It doesn't learn from rewards.\")\n", "print(\"=\"*62 + \"\\n\")" ] @@ -239,11 +268,11 @@ "---\n", "\n", "\n", - "# Part 2: The Problem with Traditional RL \ud83d\ude24\n", + "# Part 2: The Problem with Traditional RL 😤\n", "\n", "
\n", "\n", - "## \ud83e\udd14 Why Can't We Just Use OpenAI Gym?\n", + "## 🤔 Why Can't We Just Use OpenAI Gym?\n", "\n", "Good question! Gym is great for research, but production needs more...\n", "\n", @@ -257,50 +286,50 @@ "\n", "\n", "Type Safety\n", - "\u274c obs[0][3] - what is this?\n", - "\u2705 obs.info_state - IDE knows!\n", + "❌ obs[0][3] - what is this?\n", + "✅ obs.info_state - IDE knows!\n", "\n", "\n", "Isolation\n", - "\u274c Same process (can crash your training)\n", - "\u2705 Docker containers (fully isolated)\n", + "❌ Same process (can crash your training)\n", + "✅ Docker containers (fully isolated)\n", "\n", "\n", "Deployment\n", - "\u274c \"Works on my machine\" \ud83e\udd37\n", - "\u2705 Same container everywhere \ud83d\udc33\n", + "❌ \"Works on my machine\" 🤷\n", + "✅ Same container everywhere 🐳\n", "\n", "\n", "Scaling\n", - "\u274c Hard to distribute\n", - "\u2705 Deploy to Kubernetes \u2638\ufe0f\n", + "❌ Hard to distribute\n", + "✅ Deploy to Kubernetes ☸️\n", "\n", "\n", "Language\n", - "\u274c Python only\n", - "\u2705 Any language (HTTP API) \ud83c\udf10\n", + "❌ Python only\n", + "✅ Any language (HTTP API) 🌐\n", "\n", "\n", "Debugging\n", - "\u274c Cryptic numpy errors\n", - "\u2705 Clear type errors \ud83d\udc1b\n", + "❌ Cryptic numpy errors\n", + "✅ Clear type errors 🐛\n", "\n", "\n", "\n", "
\n", "\n", - "## \ud83d\udca1 The OpenEnv Philosophy\n", + "## 💡 The OpenEnv Philosophy\n", "\n", "**\"RL environments should be like microservices\"**\n", "\n", "Think of it like this: You don't run your database in the same process as your web server, right? Same principle!\n", "\n", - "- \ud83d\udd12 **Isolated**: Run in containers (security + stability)\n", - "- \ud83c\udf10 **Standard**: HTTP API, works everywhere\n", - "- \ud83d\udce6 **Versioned**: Docker images (reproducibility!)\n", - "- \ud83d\ude80 **Scalable**: Deploy to cloud with one command\n", - "- \ud83d\udee1\ufe0f **Type-safe**: Catch bugs before they happen\n", - "- \ud83d\udd04 **Portable**: Works on Mac, Linux, Windows, Cloud\n", + "- 🔒 **Isolated**: Run in containers (security + stability)\n", + "- 🌐 **Standard**: HTTP API, works everywhere\n", + "- 📦 **Versioned**: Docker images (reproducibility!)\n", + "- 🚀 **Scalable**: Deploy to cloud with one command\n", + "- 🛡️ **Type-safe**: Catch bugs before they happen\n", + "- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud\n", "\n", "
" ] @@ -312,34 +341,34 @@ "### The Architecture\n", "\n", "```\n", - "\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n", - "\u2502 YOUR TRAINING CODE \u2502\n", - "\u2502 \u2502\n", - "\u2502 env = OpenSpielEnv(...) \u2190 Import the client \u2502\n", - "\u2502 result = env.reset() \u2190 Type-safe! \u2502\n", - "\u2502 result = env.step(action) \u2190 Type-safe! \u2502\n", - "\u2502 \u2502\n", - "\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n", - " \u2502\n", - " \u2502 HTTP/JSON (Language-Agnostic)\n", - " \u2502 POST /reset, POST /step, GET /state\n", - " \u2502\n", - "\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25bc\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n", - "\u2502 DOCKER CONTAINER \u2502\n", - "\u2502 \u2502\n", - "\u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502\n", - "\u2502 \u2502 
FastAPI Server \u2502 \u2502\n", - "\u2502 \u2502 \u2514\u2500 Environment (reset, step, state) \u2502 \u2502\n", - "\u2502 \u2502 \u2514\u2500 Your Game/Simulation Logic \u2502 \u2502\n", - "\u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502\n", - "\u2502 \u2502\n", - "\u2502 Isolated \u2022 Reproducible \u2022 Secure \u2502\n", - "\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ YOUR TRAINING CODE โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ env = OpenSpielEnv(...) โ† Import the client โ”‚\n", + "โ”‚ result = env.reset() โ† Type-safe! โ”‚\n", + "โ”‚ result = env.step(action) โ† Type-safe! 
โ”‚\n", + "โ”‚ โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", + " โ”‚\n", + " โ”‚ HTTP/JSON (Language-Agnostic)\n", + " โ”‚ POST /reset, POST /step, GET /state\n", + " โ”‚\n", + "โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”\n", + "โ”‚ DOCKER CONTAINER โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚\n", + "โ”‚ โ”‚ FastAPI Server โ”‚ โ”‚\n", + "โ”‚ โ”‚ โ””โ”€ Environment (reset, step, state) โ”‚ โ”‚\n", + "โ”‚ โ”‚ โ””โ”€ Your Game/Simulation Logic โ”‚ โ”‚\n", + "โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Isolated โ€ข Reproducible โ€ข Secure โ”‚\n", + "โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜\n", "```\n", "\n", "
\n", "\n", - "**\ud83c\udfaf Key Insight**: You never see HTTP details - just clean Python methods!\n", + "**🎯 Key Insight**: You never see HTTP details - just clean Python methods!\n", "\n", "```python\n", "env.reset() # Under the hood: HTTP POST to /reset\n", @@ -347,7 +376,7 @@ "env.state() # Under the hood: HTTP GET to /state\n", "```\n", "\n", - "The magic? OpenEnv handles all the plumbing. You focus on RL! \u2728\n", + "The magic? OpenEnv handles all the plumbing. You focus on RL! ✨\n", "\n", "
" ] @@ -355,7 +384,20 @@ { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n\n# Part 3: Setup \ud83d\udee0\ufe0f\n\n
\n\n**Running in Colab?** This cell will clone OpenEnv and install dependencies automatically.\n\n**Running locally?** Make sure you're in the OpenEnv directory.\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 3: Setup 🛠️\n", + "\n", + "
\n", + "\n", + "**Running in Colab?** This cell will clone OpenEnv and install dependencies automatically.\n", + "\n", + "**Running locally?** Make sure you're in the OpenEnv directory.\n", + "\n", + "
" + ] }, { "cell_type": "markdown", @@ -363,7 +405,7 @@ "source": [ "---\n", "\n", - "# Part 3: Setup \ud83d\udee0\ufe0f\n", + "# Part 3: Setup 🛠️\n", "\n", "
\n", "\n", @@ -379,7 +421,34 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "---\n\n\n# Part 4: The OpenEnv Pattern \ud83c\udfd7\ufe0f\n\n
\n\n## Every OpenEnv Environment Has 3 Components:\n\n```\nsrc/envs/your_env/\n\u251c\u2500\u2500 \ud83d\udcdd models.py \u2190 Type-safe contracts\n\u2502 (Action, Observation, State)\n\u2502\n\u251c\u2500\u2500 \ud83d\udcf1 client.py \u2190 What YOU import\n\u2502 (HTTPEnvClient implementation)\n\u2502\n\u2514\u2500\u2500 \ud83d\udda5\ufe0f server/\n \u251c\u2500\u2500 environment.py \u2190 Game/simulation logic\n \u251c\u2500\u2500 app.py \u2190 FastAPI server\n \u2514\u2500\u2500 Dockerfile \u2190 Container definition\n```\n\n
\n\nLet's explore the actual OpenEnv code to see how this works:" + "source": [ + "---\n", + "\n", + "\n", + "# Part 4: The OpenEnv Pattern 🏗️\n", + "\n", + "
\n", + "\n", + "## Every OpenEnv Environment Has 3 Components:\n", + "\n", + "```\n", + "src/envs/your_env/\n", + "├── 📝 models.py ← Type-safe contracts\n", + "│ (Action, Observation, State)\n", + "│\n", + "├── 📱 client.py ← What YOU import\n", + "│ (HTTPEnvClient implementation)\n", + "│\n", + "└── 🖥️ server/\n", + " ├── environment.py ← Game/simulation logic\n", + " ├── app.py ← FastAPI server\n", + " └── Dockerfile ← Container definition\n", + "```\n", + "\n", + "
\n", + "\n", + "Let's explore the actual OpenEnv code to see how this works:" + ] }, { "cell_type": "code", @@ -392,11 +461,11 @@ "from core.http_env_client import HTTPEnvClient\n", "\n", "print(\"=\"*70)\n", - "print(\" \ud83e\udde9 OPENENV CORE ABSTRACTIONS\")\n", + "print(\" 🧩 OPENENV CORE ABSTRACTIONS\")\n", "print(\"=\"*70)\n", "\n", "print(\"\"\"\n", - "\ud83d\udda5\ufe0f SERVER SIDE (runs in Docker):\n", + "🖥️ SERVER SIDE (runs in Docker):\n", "\n", " class Environment(ABC):\n", " '''Base class for all environment implementations'''\n", @@ -413,7 +482,7 @@ " def state(self) -> State:\n", " '''Get episode metadata'''\n", "\n", - "\ud83d\udcf1 CLIENT SIDE (your training code):\n", + "📱 CLIENT SIDE (your training code):\n", "\n", " class HTTPEnvClient(ABC):\n", " '''Base class for HTTP clients'''\n", @@ -429,14 +498,54 @@ "\"\"\")\n", "\n", "print(\"=\"*70)\n", - "print(\"\\n\u2728 Same interface on both sides - communication via HTTP!\")\n", - "print(\"\ud83c\udfaf You focus on RL, OpenEnv handles the infrastructure.\\n\")" + "print(\"\\n✨ Same interface on both sides - communication via HTTP!\")\n", + "print(\"🎯 You focus on RL, OpenEnv handles the infrastructure.\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, - "source": "---\n\n\n# Part 5: Example Integration - OpenSpiel \ud83c\udfae\n\n
\n\n## What is OpenSpiel?\n\n**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n\n## OpenEnv's Integration\n\nWe've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n\n\n\n\n\n\n
\n\n**\ud83c\udfaf Single-Player**\n1. **Catch** - Catch falling ball\n2. **Cliff Walking** - Navigate grid\n3. **2048** - Tile puzzle\n4. **Blackjack** - Card game\n\n\n\n**\ud83d\udc65 Multi-Player**\n5. **Tic-Tac-Toe** - Classic 3\u00d73\n6. **Kuhn Poker** - Imperfect info poker\n\n
\n\nThis shows how OpenEnv can wrap **any** existing RL library!\n\n
" + "source": [ + "---\n", + "\n", + "\n", + "# Part 5: Example Integration - OpenSpiel 🎮\n", + "\n", + "
\n", + "\n", + "## What is OpenSpiel?\n", + "\n", + "**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n", + "\n", + "## OpenEnv's Integration\n", + "\n", + "We've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "\n", + "**🎯 Single-Player**\n", + "1. **Catch** - Catch falling ball\n", + "2. **Cliff Walking** - Navigate grid\n", + "3. **2048** - Tile puzzle\n", + "4. **Blackjack** - Card game\n", + "\n", + "\n", + "\n", + "**👥 Multi-Player**\n", + "5. **Tic-Tac-Toe** - Classic 3×3\n", + "6. **Kuhn Poker** - Imperfect info poker\n", + "\n", + "
\n", + "\n", + "This shows how OpenEnv can wrap **any** existing RL library!\n", + "\n", + "
" + ] }, { "cell_type": "markdown", @@ -444,7 +553,7 @@ "source": [ "---\n", "\n", - "# Part 5: Example Integration - OpenSpiel \ud83c\udfae\n", + "# Part 5: Example Integration - OpenSpiel 🎮\n", "\n", "
\n", "\n", @@ -460,7 +569,7 @@ "\n", "\n", "\n", - "**\ud83c\udfaf Single-Player**\n", + "**🎯 Single-Player**\n", "1. **Catch** - Catch falling ball\n", "2. **Cliff Walking** - Navigate grid\n", "3. **2048** - Tile puzzle\n", @@ -469,8 +578,8 @@ "\n", "\n", "\n", - "**\ud83d\udc65 Multi-Player**\n", - "5. **Tic-Tac-Toe** - Classic 3\u00d73\n", + "**👥 Multi-Player**\n", + "5. **Tic-Tac-Toe** - Classic 3×3\n", "6. **Kuhn Poker** - Imperfect info poker\n", "\n", "\n", @@ -484,10 +593,54 @@ }, { "cell_type": "code", - "source": "from envs.openspiel_env.client import OpenSpielEnv\n\nprint(\"=\"*70)\nprint(\" \ud83d\udd0c HOW OPENENV WRAPS OPENSPIEL\")\nprint(\"=\"*70)\n\nprint(\"\"\"\nclass OpenSpielEnv(HTTPEnvClient[OpenSpielAction, OpenSpielObservation]):\n \n def _step_payload(self, action: OpenSpielAction) -> dict:\n '''Convert typed action to JSON for HTTP'''\n return {\n \"action_id\": action.action_id,\n \"game_name\": action.game_name,\n }\n \n def _parse_result(self, payload: dict) -> StepResult:\n '''Parse HTTP JSON response into typed observation'''\n return StepResult(\n observation=OpenSpielObservation(...),\n reward=payload['reward'],\n done=payload['done']\n )\n\n\"\"\")\n\nprint(\"\u2500\" * 70)\nprint(\"\\n\u2728 Usage (works for ALL OpenEnv environments):\")\nprint(\"\"\"\n env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n \n result = env.reset()\n # Returns StepResult[OpenSpielObservation] - Type safe!\n \n result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n # Type checker knows this is valid!\n \n state = env.state()\n # Returns OpenSpielState\n\"\"\")\n\nprint(\"\u2500\" * 70)\nprint(\"\\n\ud83c\udfaf This pattern works for ANY environment you want to wrap!\\n\")", - "metadata": {}, "execution_count": null, - "outputs": [] + "metadata": {}, + "outputs": [], + "source": [ + "from envs.openspiel_env.client import OpenSpielEnv\n", + "\n", + "print(\"=\"*70)\n", + "print(\" 🔌 HOW OPENENV WRAPS OPENSPIEL\")\n", + 
"print(\"=\"*70)\n", + "\n", + "print(\"\"\"\n", + "class OpenSpielEnv(HTTPEnvClient[OpenSpielAction, OpenSpielObservation]):\n", + " \n", + " def _step_payload(self, action: OpenSpielAction) -> dict:\n", + " '''Convert typed action to JSON for HTTP'''\n", + " return {\n", + " \"action_id\": action.action_id,\n", + " \"game_name\": action.game_name,\n", + " }\n", + " \n", + " def _parse_result(self, payload: dict) -> StepResult:\n", + " '''Parse HTTP JSON response into typed observation'''\n", + " return StepResult(\n", + " observation=OpenSpielObservation(...),\n", + " reward=payload['reward'],\n", + " done=payload['done']\n", + " )\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"─\" * 70)\n", + "print(\"\\n✨ Usage (works for ALL OpenEnv environments):\")\n", + "print(\"\"\"\n", + " env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", + " \n", + " result = env.reset()\n", + " # Returns StepResult[OpenSpielObservation] - Type safe!\n", + " \n", + " result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", + " # Type checker knows this is valid!\n", + " \n", + " state = env.state()\n", + " # Returns OpenSpielState\n", + "\"\"\")\n", + "\n", + "print(\"─\" * 70)\n", + "print(\"\\n🎯 This pattern works for ANY environment you want to wrap!\\n\")" + ] }, { "cell_type": "code", @@ -504,30 +657,30 @@ "from dataclasses import fields\n", "\n", "print(\"=\"*70)\n", - "print(\" \ud83c\udfae OPENSPIEL INTEGRATION - TYPE-SAFE MODELS\")\n", + "print(\" 🎮 OPENSPIEL INTEGRATION - TYPE-SAFE MODELS\")\n", "print(\"=\"*70)\n", "\n", - "print(\"\\n\ud83d\udce4 OpenSpielAction (what you send):\")\n", - "print(\" \" + \"\u2500\" * 64)\n", + "print(\"\\n📤 OpenSpielAction (what you send):\")\n", + "print(\" \" + \"─\" * 64)\n", "for field in fields(OpenSpielAction):\n", - " print(f\" \u2022 {field.name:20s} : {field.type}\")\n", + " print(f\" • {field.name:20s} : {field.type}\")\n", "\n", - "print(\"\\n\ud83d\udce5 OpenSpielObservation (what you 
receive):\")\n", - "print(\" \" + \"\u2500\" * 64)\n", + "print(\"\\n📥 OpenSpielObservation (what you receive):\")\n", + "print(\" \" + \"─\" * 64)\n", "for field in fields(OpenSpielObservation):\n", - " print(f\" \u2022 {field.name:20s} : {field.type}\")\n", + " print(f\" • {field.name:20s} : {field.type}\")\n", "\n", - "print(\"\\n\ud83d\udcca OpenSpielState (episode metadata):\")\n", - "print(\" \" + \"\u2500\" * 64)\n", + "print(\"\\n📊 OpenSpielState (episode metadata):\")\n", + "print(\" \" + \"─\" * 64)\n", "for field in fields(OpenSpielState):\n", - " print(f\" \u2022 {field.name:20s} : {field.type}\")\n", + " print(f\" • {field.name:20s} : {field.type}\")\n", "\n", "print(\"\\n\" + \"=\"*70)\n", - "print(\"\\n\ud83d\udca1 Type safety means:\")\n", - "print(\" \u2705 Your IDE autocompletes these fields\")\n", - "print(\" \u2705 Typos are caught before running\")\n", - "print(\" \u2705 Refactoring is safe\")\n", - "print(\" \u2705 Self-documenting code\\n\")" + "print(\"\\n💡 Type safety means:\")\n", + "print(\" ✅ Your IDE autocompletes these fields\")\n", + "print(\" ✅ Typos are caught before running\")\n", + "print(\" ✅ Refactoring is safe\")\n", + "print(\" ✅ Self-documenting code\\n\")" ] }, { @@ -540,9 +693,9 @@ "\n", "The client **inherits from HTTPEnvClient** and implements 3 methods:\n", "\n", - "1. `_step_payload()` - Convert action \u2192 JSON\n", - "2. `_parse_result()` - Parse JSON \u2192 typed observation \n", - "3. `_parse_state()` - Parse JSON \u2192 state\n", + "1. `_step_payload()` - Convert action → JSON\n", + "2. `_parse_result()` - Parse JSON → typed observation \n", + "3. `_parse_state()` - Parse JSON → state\n", "\n", "That's it! The base class handles all HTTP communication.\n", "\n", @@ -557,20 +710,20 @@ "\n", "
\n", "\n", - "# \ud83c\udfae Part 6: Interactive Demo\n", + "# 🎮 Part 6: Interactive Demo\n", "\n", "### Now let's BUILD something!\n", "\n", "We'll create a **Catch game** following OpenEnv patterns,
\n", - "then watch **4 different AI policies** compete for the championship! \ud83c\udfc6\n", + "then watch **4 different AI policies** compete for the championship! 🏆\n", "\n", "
\n", "\n", "**Get ready for:**\n", - "- \u26a1 Live gameplay visualization\n", - "- \ud83e\udd16 AI policy showdown\n", - "- \ud83d\udcca Real-time learning metrics\n", - "- \ud83c\udfaf Production-ready patterns\n", + "- ⚡ Live gameplay visualization\n", + "- 🤖 AI policy showdown\n", + "- 📊 Real-time learning metrics\n", + "- 🎯 Production-ready patterns\n", "\n", "
" ] @@ -579,18 +732,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## The Game: Catch \ud83d\udd34\ud83c\udfd3\n", + "## The Game: Catch 🔴🏓\n", "\n", "\n", "\n", "\n", "\n", @@ -617,7 +770,7 @@ "\n", "
\n", "\n", - "**\ud83c\udfaf Why This Game?**\n", + "**🎯 Why This Game?**\n", "- Simple rules (easy to understand)\n", "- Visual (see what's happening)\n", "- Fast episodes (~5 steps)\n", @@ -629,10 +782,35 @@ }, { "cell_type": "code", - "source": "# Create environment and start a new episode\nenv = CatchEnvironment()\nobs = env.reset()\n\nprint(\"\ud83c\udfae \" + \"=\"*58 + \" \ud83c\udfae\")\nprint(\" INITIAL GAME STATE\")\nprint(\"\ud83c\udfae \" + \"=\"*58 + \" \ud83c\udfae\\n\")\n\n# Visualize the game board\nenv.render()\n\n# Show game info\nprint(f\"\\n\ud83d\udccd Game Info:\")\nprint(f\" \ud83d\udd34 Ball at: column {obs.ball_position[1]} (row {obs.ball_position[0]})\")\nprint(f\" \ud83c\udfd3 Paddle at: column {obs.paddle_position}\")\n\nprint(f\"\\n\ud83d\udcca Observation Details:\")\nprint(f\" \u2022 Legal actions: {obs.legal_actions} \u2192 [LEFT, STAY, RIGHT]\")\nprint(f\" \u2022 Info state size: {len(obs.info_state)} (5\u00d75 grid flattened)\")\nprint(f\" \u2022 Episode done: {obs.done}\")\nprint(f\" \u2022 Current reward: {obs.reward}\")\n\nprint(\"\\n\ud83d\udca1 The ball will fall down each step. 
Can your policy catch it?\")\nprint(\"=\"*62)", - "metadata": {}, "execution_count": null, - "outputs": [] + "metadata": {}, + "outputs": [], + "source": [ + "# Create environment and start a new episode\n", + "env = CatchEnvironment()\n", + "obs = env.reset()\n", + "\n", + "print(\"🎮 \" + \"=\"*58 + \" 🎮\")\n", + "print(\" INITIAL GAME STATE\")\n", + "print(\"🎮 \" + \"=\"*58 + \" 🎮\\n\")\n", + "\n", + "# Visualize the game board\n", + "env.render()\n", + "\n", + "# Show game info\n", + "print(f\"\\n📍 Game Info:\")\n", + "print(f\" 🔴 Ball at: column {obs.ball_position[1]} (row {obs.ball_position[0]})\")\n", + "print(f\" 🏓 Paddle at: column {obs.paddle_position}\")\n", + "\n", + "print(f\"\\n📊 Observation Details:\")\n", + "print(f\" • Legal actions: {obs.legal_actions} → [LEFT, STAY, RIGHT]\")\n", + "print(f\" • Info state size: {len(obs.info_state)} (5×5 grid flattened)\")\n", + "print(f\" • Episode done: {obs.done}\")\n", + "print(f\" • Current reward: {obs.reward}\")\n", + "\n", + "print(\"\\n💡 The ball will fall down each step. 
Can your policy catch it?\")\n", + "print(\"=\"*62)" + ] }, { "cell_type": "code", @@ -669,13 +847,13 @@ " Catch game following OpenEnv's Environment pattern.\n", " \n", " In production:\n", - " \u2022 Runs in Docker container\n", - " \u2022 Accessed via HTTPEnvClient\n", - " \u2022 Exposed via FastAPI server\n", + " • Runs in Docker container\n", + " • Accessed via HTTPEnvClient\n", + " • Exposed via FastAPI server\n", " \n", " For this demo:\n", - " \u2022 We run it locally to see internals\n", - " \u2022 But the structure is identical!\n", + " • We run it locally to see internals\n", + " • But the structure is identical!\n", " \"\"\"\n", " \n", " def __init__(self, grid_size=5):\n", @@ -737,24 +915,24 @@ " line = \" \"\n", " for col in range(self.grid_size):\n", " if row == self.ball_row and col == self.ball_col:\n", - " line += \"\ud83d\udd34 \"\n", + " line += \"🔴 \"\n", " elif row == self.grid_size - 1 and col == self.paddle_col:\n", - " line += \"\ud83c\udfd3 \"\n", + " line += \"🏓 \"\n", " else:\n", - " line += \"\u2b1c \"\n", + " line += \"⬜ \"\n", " print(line)\n", "\n", "\n", - "print(\"\ud83c\udf89 \" + \"=\"*64 + \" \ud83c\udf89\")\n", - "print(\" \u2705 Environment Created Following OpenEnv Pattern!\")\n", - "print(\"\ud83c\udf89 \" + \"=\"*64 + \" \ud83c\udf89\")\n", - "print(\"\\n\ud83d\udccb What we just built:\")\n", - "print(\" \u2022 reset() \u2192 CatchObservation (type-safe!)\")\n", - "print(\" \u2022 step(action) \u2192 CatchObservation (type-safe!)\")\n", - "print(\" \u2022 render() \u2192 Visual display\")\n", - "print(\"\\n\ud83d\ude80 In production: This would run in Docker + FastAPI\")\n", + "print(\"🎉 \" + \"=\"*64 + \" 🎉\")\n", + "print(\" ✅ Environment Created Following OpenEnv Pattern!\")\n", + "print(\"🎉 \" + \"=\"*64 + \" 🎉\")\n", + "print(\"\\n📋 What we just built:\")\n", + "print(\" • reset() → CatchObservation (type-safe!)\")\n", + "print(\" • step(action) → CatchObservation 
(type-safe!)\")\n", + "print(\" • render() → Visual display\")\n", + "print(\"\\n🚀 In production: This would run in Docker + FastAPI\")\n", "print(\" But the structure is EXACTLY the same!\")\n", - "print(\"\\n\ud83d\udca1 This is your blueprint for creating ANY OpenEnv environment!\\n\")" + "print(\"\\n💡 This is your blueprint for creating ANY OpenEnv environment!\\n\")" ] }, { @@ -773,7 +951,7 @@ "---\n", "\n", "\n", - "# Part 7: Four Policies \ud83e\udd16\n", + "# Part 7: Four Policies 🤖\n", "\n", "
\n", "\n", @@ -786,22 +964,22 @@ "
\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", @@ -816,7 +994,7 @@ "source": [ "---\n", "\n", - "# Part 7: Four Policies \ud83e\udd16\n", + "# Part 7: Four Policies 🤖\n", "\n", "
\n", "\n", @@ -829,22 +1007,22 @@ "
\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", "\n", - "\n", + "\n", "\n", "\n", "\n", @@ -865,7 +1043,7 @@ "\n", "class RandomPolicy:\n", " \"\"\"Baseline: Pure random guessing.\"\"\"\n", - " name = \"\ud83c\udfb2 Random Guesser\"\n", + " name = \"🎲 Random Guesser\"\n", " \n", " def select_action(self, obs: CatchObservation) -> int:\n", " return random.choice(obs.legal_actions)\n", @@ -873,7 +1051,7 @@ "\n", "class AlwaysStayPolicy:\n", " \"\"\"Bad strategy: Never moves.\"\"\"\n", - " name = \"\ud83d\uded1 Always Stay\"\n", + " name = \"🛑 Always Stay\"\n", " \n", " def select_action(self, obs: CatchObservation) -> int:\n", " return 1 # STAY\n", @@ -881,7 +1059,7 @@ "\n", "class SmartPolicy:\n", " \"\"\"Optimal: Move paddle toward ball.\"\"\"\n", - " name = \"\ud83e\udde0 Smart Heuristic\"\n", + " name = \"🧠 Smart Heuristic\"\n", " \n", " def select_action(self, obs: CatchObservation) -> int:\n", " ball_col = obs.ball_position[1]\n", @@ -897,7 +1075,7 @@ "\n", "class LearningPolicy:\n", " \"\"\"Simulated RL: Epsilon-greedy exploration.\"\"\"\n", - " name = \"\ud83d\udcc8 Learning Agent\"\n", + " name = \"📈 Learning Agent\"\n", " \n", " def __init__(self):\n", " self.steps = 0\n", @@ -923,16 +1101,16 @@ " return 1\n", "\n", "\n", - "print(\"\ud83e\udd16 \" + \"=\"*64 + \" \ud83e\udd16\")\n", - "print(\" \u2705 4 Policies Created!\")\n", - "print(\"\ud83e\udd16 \" + \"=\"*64 + \" \ud83e\udd16\\n\")\n", + "print(\"🤖 \" + \"=\"*64 + \" 🤖\")\n", + "print(\" ✅ 4 Policies Created!\")\n", + "print(\"🤖 \" + \"=\"*64 + \" 🤖\\n\")\n", "\n", "policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\n", "for i, policy in enumerate(policies, 1):\n", " print(f\" {i}. {policy.name}\")\n", "\n", - "print(\"\\n\ud83d\udca1 Each policy represents a different approach to solving the game!\")\n", - "print(\" Let's see who performs best! 
\ud83c\udfc6\\n\")" + "print(\"\\n💡 Each policy represents a different approach to solving the game!\")\n", + "print(\" Let's see who performs best! 🏆\\n\")" ] }, { @@ -958,15 +1136,15 @@ " \n", " if visualize:\n", " print(f\"\\n{'='*60}\")\n", - " print(f\" \ud83c\udfae {policy.name}\")\n", - " print(f\" \ud83d\udd34 Ball will fall at column: {obs.ball_position[1]}\")\n", + " print(f\" 🎮 {policy.name}\")\n", + " print(f\" 🔴 Ball will fall at column: {obs.ball_position[1]}\")\n", " print('='*60 + '\\n')\n", " env.render()\n", " time.sleep(delay)\n", " \n", " total_reward = 0\n", " step = 0\n", - " action_names = [\"\u2b05\ufe0f LEFT\", \"\ud83d\uded1 STAY\", \"\u27a1\ufe0f RIGHT\"]\n", + " action_names = [\"⬅️ LEFT\", \"🛑 STAY\", \"➡️ RIGHT\"]\n", " \n", " # THE RL LOOP\n", " while not obs.done:\n", @@ -980,14 +1158,14 @@ " total_reward += obs.reward\n", " \n", " if visualize:\n", - " print(f\"\\n\ud83d\udccd Step {step + 1}: {action_names[action]}\")\n", + " print(f\"\\n📍 Step {step + 1}: {action_names[action]}\")\n", " env.render()\n", " time.sleep(delay)\n", " \n", " step += 1\n", " \n", " if visualize:\n", - " result = \"\ud83c\udf89 CAUGHT!\" if total_reward > 0 else \"\ud83d\ude22 MISSED\"\n", + " result = \"🎉 CAUGHT!\" if total_reward > 0 else \"😢 MISSED\"\n", " print(f\"\\n{'='*60}\")\n", " print(f\" {result} Reward: {total_reward}\")\n", " print('='*60)\n", @@ -1008,7 +1186,7 @@ "---\n", "\n", "\n", - "# Part 8: Policy Competition! \ud83c\udfc6\n", + "# Part 8: Policy Competition! 🏆\n", "\n", "
\n", "\n", @@ -1023,7 +1201,7 @@ "source": [ "---\n", "\n", - "# Part 8: Policy Competition! \ud83c\udfc6\n", + "# Part 8: Policy Competition! 🏆\n", "\n", "
\n", "\n", @@ -1047,49 +1225,49 @@ " LearningPolicy(),\n", " ]\n", " \n", - " print(\"\\n\ud83c\udfc6 \" + \"=\"*66 + \" \ud83c\udfc6\")\n", + " print(\"\\n🏆 \" + \"=\"*66 + \" 🏆\")\n", " print(f\" POLICY SHOWDOWN - {num_episodes} Episodes Each\")\n", - " print(\"\ud83c\udfc6 \" + \"=\"*66 + \" \ud83c\udfc6\\n\")\n", + " print(\"🏆 \" + \"=\"*66 + \" 🏆\\n\")\n", " \n", " results = []\n", " for policy in policies:\n", - " print(f\"\u26a1 Testing {policy.name}...\", end=\" \")\n", + " print(f\"⚡ Testing {policy.name}...\", end=\" \")\n", " env = CatchEnvironment()\n", " successes = sum(run_episode(env, policy, visualize=False) \n", " for _ in range(num_episodes))\n", " success_rate = (successes / num_episodes) * 100\n", " results.append((policy.name, success_rate, successes))\n", - " print(f\"\u2713 Done!\")\n", + " print(f\"✓ Done!\")\n", " \n", " print(\"\\n\" + \"=\"*70)\n", - " print(\" \ud83d\udcca FINAL RESULTS\")\n", + " print(\" 📊 FINAL RESULTS\")\n", " print(\"=\"*70 + \"\\n\")\n", " \n", " # Sort by success rate (descending)\n", " results.sort(key=lambda x: x[1], reverse=True)\n", " \n", " # Award medals to top 3\n", - " medals = [\"\ud83e\udd47\", \"\ud83e\udd48\", \"\ud83e\udd49\", \" \"]\n", + " medals = [\"🥇\", \"🥈\", \"🥉\", \" \"]\n", " \n", " for i, (name, rate, successes) in enumerate(results):\n", " medal = medals[i]\n", - " bar = \"\u2588\" * int(rate / 2)\n", + " bar = \"█\" * int(rate / 2)\n", " print(f\"{medal} {name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})\")\n", " \n", " print(\"\\n\" + \"=\"*70)\n", - " print(\"\\n\u2728 Key Insights:\")\n", - " print(\" \u2022 Random (~20%): Baseline - pure luck \ud83c\udfb2\")\n", - " print(\" \u2022 Always Stay (~20%): Bad strategy - stays center \ud83d\uded1\")\n", - " print(\" \u2022 Smart (100%): Optimal - perfect play! 
\ud83e\udde0\")\n", - " print(\" \u2022 Learning (~85%): Improves over time \ud83d\udcc8\")\n", - " print(\"\\n\ud83c\udf93 This is Reinforcement Learning in action:\")\n", + " print(\"\\n✨ Key Insights:\")\n", + " print(\" • Random (~20%): Baseline - pure luck 🎲\")\n", + " print(\" • Always Stay (~20%): Bad strategy - stays center 🛑\")\n", + " print(\" • Smart (100%): Optimal - perfect play! 🧠\")\n", + " print(\" • Learning (~85%): Improves over time 📈\")\n", + " print(\"\\n🎓 This is Reinforcement Learning in action:\")\n", " print(\" 1. Start with exploration (trying random things)\")\n", " print(\" 2. Learn from rewards (what works, what doesn't)\")\n", " print(\" 3. Converge to optimal behavior (smart strategy)\")\n", - " print(\"\\n\ud83c\udfaf The Learning Agent gets smarter with every episode!\\n\")\n", + " print(\"\\n🎯 The Learning Agent gets smarter with every episode!\\n\")\n", "\n", "# Run the epic competition!\n", - "print(\"\ud83c\udfae Starting the showdown...\")\n", + "print(\"🎮 Starting the showdown...\")\n", "evaluate_policies(num_episodes=50)" ] }, @@ -1100,7 +1278,7 @@ "---\n", "\n", "\n", - "# Part 9: Using Real OpenSpiel \ud83c\udfae\n", + "# Part 9: Using Real OpenSpiel 🎮\n", "\n", "
\n", "\n", @@ -1129,8 +1307,8 @@ "\n", "
\n", "\n", - "\n", - "\n", + "\n", + "\n", "\n", "\n", "\n", @@ -1139,7 +1317,7 @@ "\n", "
\n", "\n", "```\n", - "\u2b1c \u2b1c \ud83d\udd34 \u2b1c \u2b1c \n", - "\u2b1c \u2b1c \u2b1c \u2b1c \u2b1c Ball\n", - "\u2b1c \u2b1c \u2b1c \u2b1c \u2b1c falls\n", - "\u2b1c \u2b1c \u2b1c \u2b1c \u2b1c down\n", - "\u2b1c \u2b1c \ud83c\udfd3 \u2b1c \u2b1c \n", + "⬜ ⬜ 🔴 ⬜ ⬜ \n", + "⬜ ⬜ ⬜ ⬜ ⬜ Ball\n", + "⬜ ⬜ ⬜ ⬜ ⬜ falls\n", + "⬜ ⬜ ⬜ ⬜ ⬜ down\n", + "⬜ ⬜ 🏓 ⬜ ⬜ \n", " Paddle\n", "```\n", "\n", @@ -598,18 +751,18 @@ "\n", "\n", "**Rules:**\n", - "- 5\u00d75 grid\n", + "- 5×5 grid\n", "- Ball falls from random column\n", "- Move paddle to catch it\n", "\n", "**Actions:**\n", - "- `0` = Move LEFT \u2b05\ufe0f\n", - "- `1` = STAY \ud83d\uded1\n", - "- `2` = Move RIGHT \u27a1\ufe0f\n", + "- `0` = Move LEFT ⬅️\n", + "- `1` = STAY 🛑\n", + "- `2` = Move RIGHT ➡️\n", "\n", "**Reward:**\n", - "- `+1` if caught \ud83c\udf89\n", - "- `0` if missed \ud83d\ude22\n", + "- `+1` if caught 🎉\n", + "- `0` if missed 😢\n", "\n", "
Expected Performance
\ud83c\udfb2 Random | 🎲 Random | Pick random action every step | ~20% (pure luck)
\ud83d\uded1 Always Stay | 🛑 Always Stay | Never move, hope ball lands in center | ~20% (terrible!)
\ud83e\udde0 Smart | 🧠 Smart | Move paddle toward ball | 100% (optimal!)
\ud83d\udcc8 Learning | 📈 Learning | Start random, learn smart strategy | ~85% (improves over time)
Expected Performance
\ud83c\udfb2 Random | 🎲 Random | Pick random action every step | ~20% (pure luck)
\ud83d\uded1 Always Stay | 🛑 Always Stay | Never move, hope ball lands in center | ~20% (terrible!)
\ud83e\udde0 Smart | 🧠 Smart | Move paddle toward ball | 100% (optimal!)
\ud83d\udcc8 Learning | 📈 Learning | Start random, learn smart strategy | ~85% (improves over time)
Type Safety | \u2705 Dataclasses | \u2705 Dataclasses | ✅ Dataclasses | ✅ Dataclasses
API
\n", "\n", - "**\ud83c\udfaf Same structure, production features!**\n", + "**🎯 Same structure, production features!**\n", "\n", "
\n", "\n", @@ -1166,10 +1344,10 @@ "\n", "
\n", "\n", - "**\ud83c\udfae 6 Games Available:**\n", + "**🎮 6 Games Available:**\n", "\n", "1. `\"catch\"` - What we just built!\n", - "2. `\"tic_tac_toe\"` - Classic 3\u00d73\n", + "2. `\"tic_tac_toe\"` - Classic 3×3\n", "3. `\"kuhn_poker\"` - Imperfect information poker\n", "4. `\"cliff_walking\"` - Grid navigation\n", "5. `\"2048\"` - Tile puzzle\n", @@ -1187,7 +1365,7 @@ "---\n", "\n", "\n", - "# Part 10: Create Your Own Integration \ud83d\udee0\ufe0f\n", + "# Part 10: Create Your Own Integration 🛠️\n", "\n", "
\n", "\n", @@ -1291,7 +1469,7 @@ "\n", "
\n", "\n", - "### \ud83c\udf93 Examples to Study\n", + "### 🎓 Examples to Study\n", "\n", "OpenEnv includes 3 complete examples:\n", "\n", @@ -1309,7 +1487,7 @@ " - Shows complex use case\n", " - Security considerations\n", "\n", - "**\ud83d\udca1 Study these to understand the patterns!**\n", + "**💡 Study these to understand the patterns!**\n", "\n", "
" ] @@ -1323,7 +1501,7 @@ "\n", "
\n", "\n", - "# \ud83c\udf93 Summary: Your Journey\n", + "# 🎓 Summary: Your Journey\n", "\n", "
" ] @@ -1336,7 +1514,7 @@ "\n", "
\n", "\n", - "# \ud83c\udf93 Summary: Your Journey\n", + "# 🎓 Summary: Your Journey\n", "\n", "
" ] @@ -1351,19 +1529,19 @@ "\n", "\n", "\n", - "### \ud83d\udcda Concepts\n", + "### 📚 Concepts\n", "\n", - "\u2705 **RL Fundamentals**\n", + "✅ **RL Fundamentals**\n", "- The observe-act-reward loop\n", "- What makes good policies\n", "- Exploration vs exploitation\n", "\n", - "\u2705 **OpenEnv Architecture**\n", + "✅ **OpenEnv Architecture**\n", "- Client-server separation\n", "- Type-safe contracts\n", "- HTTP communication layer\n", "\n", - "\u2705 **Production Patterns**\n", + "✅ **Production Patterns**\n", "- Docker isolation\n", "- API design\n", "- Reproducible deployments\n", @@ -1371,19 +1549,19 @@ "\n", "\n", "\n", - "### \ud83d\udee0\ufe0f Skills\n", + "### 🛠️ Skills\n", "\n", - "\u2705 **Using Environments**\n", + "✅ **Using Environments**\n", "- Import OpenEnv clients\n", "- Call reset/step/state\n", "- Work with typed observations\n", "\n", - "\u2705 **Building Environments**\n", + "✅ **Building Environments**\n", "- Define type-safe models\n", "- Implement Environment class\n", "- Create HTTPEnvClient\n", "\n", - "\u2705 **Testing & Debugging**\n", + "✅ **Testing & Debugging**\n", "- Compare policies\n", "- Visualize episodes\n", "- Measure performance\n", @@ -1408,45 +1586,45 @@ "\n", "\n", "Type Safety\n", - "\u274c Arrays, dicts\n", - "\u2705 Dataclasses\n", - "\ud83c\udfc6 OpenEnv\n", + "❌ Arrays, dicts\n", + "✅ Dataclasses\n", + "🏆 OpenEnv\n", "\n", "\n", "Isolation\n", - "\u274c Same process\n", - "\u2705 Docker\n", - "\ud83c\udfc6 OpenEnv\n", + "❌ Same process\n", + "✅ Docker\n", + "🏆 OpenEnv\n", "\n", "\n", "Deployment\n", - "\u274c Manual setup\n", - "\u2705 K8s-ready\n", - "\ud83c\udfc6 OpenEnv\n", + "❌ Manual setup\n", + "✅ K8s-ready\n", + "🏆 OpenEnv\n", "\n", "\n", "Language\n", - "\u274c Python only\n", - "\u2705 Any (HTTP)\n", - "\ud83c\udfc6 OpenEnv\n", + "❌ Python only\n", + "✅ Any (HTTP)\n", + "🏆 OpenEnv\n", "\n", "\n", "Reproducibility\n", - "\u274c \"Works on my machine\"\n", - 
"\u2705 Same everywhere\n", - "\ud83c\udfc6 OpenEnv\n", + "❌ \"Works on my machine\"\n", + "✅ Same everywhere\n", + "🏆 OpenEnv\n", "\n", "\n", "Community\n", - "\u2705 Large ecosystem\n", - "\ud83d\udfe1 Growing\n", - "\ud83e\udd1d Both!\n", + "✅ Large ecosystem\n", + "🟡 Growing\n", + "🤝 Both!\n", "\n", "\n", "\n", "
\n", "\n", - "**\ud83c\udfaf The Bottom Line**\n", + "**🎯 The Bottom Line**\n", "\n", "OpenEnv brings **production engineering** to RL:\n", "- Same environments work locally and in production\n", @@ -1464,33 +1642,33 @@ "metadata": {}, "source": [ "\n", - "## \ud83d\udcda Resources\n", + "## 📚 Resources\n", "\n", "
\n", "\n", - "### \ud83d\udd17 Essential Links\n", + "### 🔗 Essential Links\n", "\n", - "- **\ud83c\udfe0 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", - "- **\ud83c\udfae OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", - "- **\u26a1 FastAPI Docs**: https://fastapi.tiangolo.com/\n", - "- **\ud83d\udc33 Docker Guide**: https://docs.docker.com/get-started/\n", - "- **\ud83d\udd25 PyTorch**: https://pytorch.org/\n", + "- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", + "- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", + "- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n", + "- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n", + "- **🔥 PyTorch**: https://pytorch.org/\n", "\n", - "### \ud83d\udcd6 Documentation Deep Dives\n", + "### 📖 Documentation Deep Dives\n", "\n", "- **Environment Creation Guide**: `src/envs/README.md`\n", "- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n", "- **Example Scripts**: `examples/`\n", "- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", "\n", - "### \ud83c\udf93 Community & Support\n", + "### 🎓 Community & Support\n", "\n", "**Supported by amazing organizations:**\n", - "- \ud83d\udd25 Meta PyTorch\n", - "- \ud83e\udd17 Hugging Face\n", - "- \u26a1 Unsloth AI\n", - "- \ud83c\udf1f Reflection AI\n", - "- \ud83d\ude80 And many more!\n", + "- 🔥 Meta PyTorch\n", + "- 🤗 Hugging Face\n", + "- ⚡ Unsloth AI\n", + "- 🌟 Reflection AI\n", + "- 🚀 And many more!\n", "\n", "**License**: BSD 3-Clause (very permissive!)\n", "\n", @@ -1500,46 +1678,46 @@ "\n", "---\n", "\n", - "### \ud83c\udf08 What's Next?\n", + "### 🌈 What's Next?\n", "\n", - "1. \u2b50 **Star the repo** to show support and stay updated\n", - "2. \ud83d\udd04 **Try modifying** the Catch game (make it harder? bigger grid?)\n", - "3. \ud83c\udfae **Explore** other OpenSpiel games\n", - "4. 
\ud83d\udee0\ufe0f **Build** your own environment integration\n", - "5. \ud83d\udcac **Share** what you build with the community!" + "1. ⭐ **Star the repo** to show support and stay updated\n", + "2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n", + "3. 🎮 **Explore** other OpenSpiel games\n", + "4. 🛠️ **Build** your own environment integration\n", + "5. 💬 **Share** what you build with the community!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## \ud83d\udcda Resources\n", + "## 📚 Resources\n", "\n", "
\n", "\n", - "### \ud83d\udd17 Essential Links\n", + "### 🔗 Essential Links\n", "\n", - "- **\ud83c\udfe0 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", - "- **\ud83c\udfae OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", - "- **\u26a1 FastAPI Docs**: https://fastapi.tiangolo.com/\n", - "- **\ud83d\udc33 Docker Guide**: https://docs.docker.com/get-started/\n", - "- **\ud83d\udd25 PyTorch**: https://pytorch.org/\n", + "- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", + "- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", + "- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n", + "- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n", + "- **🔥 PyTorch**: https://pytorch.org/\n", "\n", - "### \ud83d\udcd6 Documentation Deep Dives\n", + "### 📖 Documentation Deep Dives\n", "\n", "- **Environment Creation Guide**: `src/envs/README.md`\n", "- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n", "- **Example Scripts**: `examples/`\n", "- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", "\n", - "### \ud83c\udf93 Community & Support\n", + "### 🎓 Community & Support\n", "\n", "**Supported by amazing organizations:**\n", - "- \ud83d\udd25 Meta PyTorch\n", - "- \ud83e\udd17 Hugging Face\n", - "- \u26a1 Unsloth AI\n", - "- \ud83c\udf1f Reflection AI\n", - "- \ud83d\ude80 And many more!\n", + "- 🔥 Meta PyTorch\n", + "- 🤗 Hugging Face\n", + "- ⚡ Unsloth AI\n", + "- 🌟 Reflection AI\n", + "- 🚀 And many more!\n", "\n", "**License**: BSD 3-Clause (very permissive!)\n", "\n", @@ -1549,13 +1727,13 @@ "\n", "---\n", "\n", - "### \ud83c\udf08 What's Next?\n", + "### 🌈 What's Next?\n", "\n", - "1. \u2b50 **Star the repo** to show support and stay updated\n", - "2. \ud83d\udd04 **Try modifying** the Catch game (make it harder? bigger grid?)\n", - "3. \ud83c\udfae **Explore** other OpenSpiel games\n", - "4. 
\ud83d\udee0\ufe0f **Build** your own environment integration\n", - "5. \ud83d\udcac **Share** what you build with the community!" + "1. ⭐ **Star the repo** to show support and stay updated\n", + "2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n", + "3. 🎮 **Explore** other OpenSpiel games\n", + "4. 🛠️ **Build** your own environment integration\n", + "5. 💬 **Share** what you build with the community!" ] }, { @@ -1566,38 +1744,38 @@ "\n", "
\n", "\n", - "# \ud83c\udf89 Congratulations! You Did It! \ud83c\udf89\n", + "# 🎉 Congratulations! You Did It! 🎉\n", "\n", "### You're now an OpenEnv expert!\n", "\n", "
\n", "\n", - "## \u2705 What You've Mastered:\n", + "## ✅ What You've Mastered:\n", "\n", - "**\ud83e\udde0 Concepts**\n", + "**🧠 Concepts**\n", "- How RL works (the observe-act-reward loop)\n", "- Why OpenEnv matters (production-ready RL)\n", "- How to use existing environments\n", "\n", - "**\ud83d\udee0\ufe0f Practical Skills**\n", + "**🛠️ Practical Skills**\n", "- Creating new integrations\n", "- Building type-safe environments\n", "- Deploying to production\n", "\n", - "**\ud83c\udfaf Real Experience**\n", + "**🎯 Real Experience**\n", "- Built a complete RL environment\n", "- Tested multiple policies\n", "- Watched learning happen in real-time!\n", "\n", "---\n", "\n", - "### Now go build something amazing! \ud83d\ude80\n", + "### Now go build something amazing! 🚀\n", "\n", "**Welcome to the future of RL with PyTorch & OpenEnv**\n", "\n", "
\n", "\n", - "[![Star on GitHub](https://img.shields.io/badge/\u2b50_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)\n", + "[![Star on GitHub](https://img.shields.io/badge/⭐_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)\n", "\n", "
\n", "\n", @@ -1605,16 +1783,16 @@ "\n", "
\n", "\n", - "## \ud83c\udf1f Want to Learn More?\n", + "## 🌟 Want to Learn More?\n", "\n", - "- \ud83d\udcd6 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)\n", - "- \ud83c\udfae Try the other example games\n", - "- \ud83d\udcac Join the community discussions\n", - "- \ud83d\udee0\ufe0f Build your own integration\n", - "- \ud83d\ude80 Deploy to production\n", - "- \u2b50 Star the repo to stay updated!\n", + "- 📖 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)\n", + "- 🎮 Try the other example games\n", + "- 💬 Join the community discussions\n", + "- 🛠️ Build your own integration\n", + "- 🚀 Deploy to production\n", + "- ⭐ Star the repo to stay updated!\n", "\n", - "**Happy coding! \ud83c\udf8a**\n", + "**Happy coding! 🎊**\n", "\n", "
" ] @@ -1641,4 +1819,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} From bb75d0e778466bca8eb3279b25917ef43789b375 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 14:17:29 -0700 Subject: [PATCH 18/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 341 +++----------------------------- 1 file changed, 30 insertions(+), 311 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 68a4568..0b2b481 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -122,72 +122,6 @@ "---" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Detect environment\n", - "try:\n", - " import google.colab\n", - " IN_COLAB = True\n", - " print(\"๐ŸŒ Running in Google Colab - Perfect!\")\n", - "except ImportError:\n", - " IN_COLAB = False\n", - " print(\"๐Ÿ’ป Running locally - Nice!\")\n", - "\n", - "if IN_COLAB:\n", - " print(\"\\n๐Ÿ“ฆ Cloning OpenEnv repository...\")\n", - " !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n", - " %cd OpenEnv\n", - " \n", - " print(\"๐Ÿ“š Installing dependencies (this takes ~10 seconds)...\")\n", - " !pip install -q fastapi uvicorn requests\n", - " \n", - " import sys\n", - " sys.path.insert(0, './src')\n", - " print(\"\\nโœ… Setup complete! Everything is ready to go! ๐ŸŽ‰\")\n", - "else:\n", - " import sys\n", - " from pathlib import Path\n", - " sys.path.insert(0, str(Path.cwd().parent / 'src'))\n", - " print(\"โœ… Using local OpenEnv installation\")\n", - "\n", - "print(\"\\n๐Ÿš€ Ready to explore OpenEnv and build amazing things!\")\n", - "print(\"๐Ÿ’ก Tip: Run cells top-to-bottom for the best experience.\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\n", - "# Part 1: RL in 60 Seconds โฑ๏ธ\n", - "\n", - "
\n", - "\n", - "**Reinforcement Learning is simpler than you think.**\n", - "\n", - "It's just a loop:\n", - "\n", - "```\n", - "while not done:\n", - " observation = environment.observe()\n", - " action = policy.choose(observation)\n", - " reward = environment.step(action)\n", - " policy.learn(reward)\n", - "```\n", - "\n", - "That's it. That's RL.\n", - "\n", - "
\n", - "\n", - "Let's see it in action:" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -387,7 +321,6 @@ "source": [ "---\n", "\n", - "\n", "# Part 3: Setup 🛠️\n", "\n", "
\n", @@ -400,27 +333,44 @@ ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "---\n", - "\n", - "# Part 3: Setup 🛠️\n", - "\n", - "
\n", - "\n", - "**Running in Colab?** This cell will clone OpenEnv and install dependencies automatically.\n", + "# Detect environment\n", + "try:\n", + " import google.colab\n", + " IN_COLAB = True\n", + " print(\"๐ŸŒ Running in Google Colab - Perfect!\")\n", + "except ImportError:\n", + " IN_COLAB = False\n", + " print(\"๐Ÿ’ป Running locally - Nice!\")\n", "\n", - "**Running locally?** Make sure you're in the OpenEnv directory.\n", + "if IN_COLAB:\n", + " print(\"\\n๐Ÿ“ฆ Cloning OpenEnv repository...\")\n", + " !git clone https://github.com/meta-pytorch/OpenEnv.git > /dev/null 2>&1\n", + " %cd OpenEnv\n", + " \n", + " print(\"๐Ÿ“š Installing dependencies (this takes ~10 seconds)...\")\n", + " !pip install -q fastapi uvicorn requests\n", + " \n", + " import sys\n", + " sys.path.insert(0, './src')\n", + " print(\"\\nโœ… Setup complete! Everything is ready to go! ๐ŸŽ‰\")\n", + "else:\n", + " import sys\n", + " from pathlib import Path\n", + " sys.path.insert(0, str(Path.cwd().parent / 'src'))\n", + " print(\"โœ… Using local OpenEnv installation\")\n", "\n", - "
" + "print(\"\\n🚀 Ready to explore OpenEnv and build amazing things!\")\n", + "print(\"💡 Tip: Run cells top-to-bottom for the best experience.\\n\")" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ "---\n", "\n", @@ -502,51 +452,6 @@ "print(\"🎯 You focus on RL, OpenEnv handles the infrastructure.\\n\")" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\n", - "# Part 5: Example Integration - OpenSpiel 🎮\n", - "\n", - "
\n", - "\n", - "## What is OpenSpiel?\n", - "\n", - "**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.\n", - "\n", - "## OpenEnv's Integration\n", - "\n", - "We've wrapped **6 OpenSpiel games** following the OpenEnv pattern:\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
\n", - "\n", - "**🎯 Single-Player**\n", - "1. **Catch** - Catch falling ball\n", - "2. **Cliff Walking** - Navigate grid\n", - "3. **2048** - Tile puzzle\n", - "4. **Blackjack** - Card game\n", - "\n", - "\n", - "\n", - "**👥 Multi-Player**\n", - "5. **Tic-Tac-Toe** - Classic 3×3\n", - "6. **Kuhn Poker** - Imperfect info poker\n", - "\n", - "
\n", - "\n", - "This shows how OpenEnv can wrap **any** existing RL library!\n", - "\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -942,52 +847,6 @@ "### Test the Environment" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "---\n", - "\n", - "\n", - "# Part 7: Four Policies 🤖\n", - "\n", - "
\n", - "\n", - "## Let's test 4 different AI strategies:\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
| Policy | Strategy | Expected Performance |
|---|---|---|
| 🎲 Random | Pick random action every step | ~20% (pure luck) |
| 🛑 Always Stay | Never move, hope ball lands in center | ~20% (terrible!) |
| 🧠 Smart | Move paddle toward ball | 100% (optimal!) |
| 📈 Learning | Start random, learn smart strategy | ~85% (improves over time) |
\n", - "\n", - "
" ] }, { "cell_type": "markdown", "metadata": {}, @@ -1179,22 +1038,6 @@ "run_episode(env, policy, visualize=True, delay=0.4)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\n", - "# Part 8: Policy Competition! 🏆\n", - "\n", - "
\n", - "\n", - "Let's run **50 episodes** for each policy and see who wins!\n", - "\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -1492,20 +1335,6 @@ "
" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\n", - "
\n", - "\n", - "# 🎓 Summary: Your Journey\n", - "\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -1686,116 +1515,6 @@ "4. 🛠️ **Build** your own environment integration\n", "5. 💬 **Share** what you build with the community!" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 📚 Resources\n", - "\n", - "
\n", - "\n", - "### 🔗 Essential Links\n", - "\n", - "- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv\n", - "- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel\n", - "- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/\n", - "- **🐳 Docker Guide**: https://docs.docker.com/get-started/\n", - "- **🔥 PyTorch**: https://pytorch.org/\n", - "\n", - "### 📖 Documentation Deep Dives\n", - "\n", - "- **Environment Creation Guide**: `src/envs/README.md`\n", - "- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`\n", - "- **Example Scripts**: `examples/`\n", - "- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)\n", - "\n", - "### 🎓 Community & Support\n", - "\n", - "**Supported by amazing organizations:**\n", - "- 🔥 Meta PyTorch\n", - "- 🤗 Hugging Face\n", - "- ⚡ Unsloth AI\n", - "- 🌟 Reflection AI\n", - "- 🚀 And many more!\n", - "\n", - "**License**: BSD 3-Clause (very permissive!)\n", - "\n", - "**Contributions**: Always welcome! Check out the issues tab.\n", - "\n", - "
\n", - "\n", - "---\n", - "\n", - "### 🌈 What's Next?\n", - "\n", - "1. ⭐ **Star the repo** to show support and stay updated\n", - "2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)\n", - "3. 🎮 **Explore** other OpenSpiel games\n", - "4. 🛠️ **Build** your own environment integration\n", - "5. 💬 **Share** what you build with the community!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "
\n", - "\n", - "# 🎉 Congratulations! You Did It! 🎉\n", - "\n", - "### You're now an OpenEnv expert!\n", - "\n", - "
\n", - "\n", - "## ✅ What You've Mastered:\n", - "\n", - "**🧠 Concepts**\n", - "- How RL works (the observe-act-reward loop)\n", - "- Why OpenEnv matters (production-ready RL)\n", - "- How to use existing environments\n", - "\n", - "**🛠️ Practical Skills**\n", - "- Creating new integrations\n", - "- Building type-safe environments\n", - "- Deploying to production\n", - "\n", - "**🎯 Real Experience**\n", - "- Built a complete RL environment\n", - "- Tested multiple policies\n", - "- Watched learning happen in real-time!\n", - "\n", - "---\n", - "\n", - "### Now go build something amazing! 🚀\n", - "\n", - "**Welcome to the future of RL with PyTorch & OpenEnv**\n", - "\n", - "
\n", - "\n", - "[![Star on GitHub](https://img.shields.io/badge/⭐_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)\n", - "\n", - "
\n", - "\n", - "---\n", - "\n", - "
\n", - "\n", - "## 🌟 Want to Learn More?\n", - "\n", - "- 📖 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)\n", - "- 🎮 Try the other example games\n", - "- 💬 Join the community discussions\n", - "- 🛠️ Build your own integration\n", - "- 🚀 Deploy to production\n", - "- ⭐ Star the repo to stay updated!\n", - "\n", - "**Happy coding! 🎊**\n", - "\n", - "
" - ] } ], "metadata": { From f6424fda186c904e9d1aa8ea3436476ff70402fd Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani Date: Mon, 20 Oct 2025 22:21:42 -0700 Subject: [PATCH 19/19] Update OpenEnv_Tutorial.ipynb --- examples/OpenEnv_Tutorial.ipynb | 684 +++++++++++++++++++------------- 1 file changed, 407 insertions(+), 277 deletions(-) diff --git a/examples/OpenEnv_Tutorial.ipynb b/examples/OpenEnv_Tutorial.ipynb index 0b2b481..894d864 100644 --- a/examples/OpenEnv_Tutorial.ipynb +++ b/examples/OpenEnv_Tutorial.ipynb @@ -2,6 +2,7 @@ "cells": [ { "cell_type": "markdown", + "id": "cell-0", "metadata": {}, "source": [ "
\n", @@ -36,6 +37,36 @@ { "cell_type": "markdown", "metadata": {}, + "source": [ + "---\n", + "\n", + "## Why OpenEnv?\n", + "\n", + "Let's take a trip down memory lane:\n", + "\n", + "It's 2016 and RL is popular. You read some papers, and it looks promising.\n", + "\n", + "But in the real world, Cartpole is the best you can run on a gaming GPU.\n", + "\n", + "What do you do beyond Cartpole?\n", + "\n", + "Fast forward to 2025: GRPO is awesome, and this time it's not JUST theory. It works well in practice and is really here!\n", + "\n", + "The problem remains: how do you take these RL algorithms beyond Cartpole?\n", + "\n", + "A huge part of RL is giving your algorithms access to environments they can learn from.\n", + "\n", + "We are excited to introduce an Environment Spec for adding open environments for RL training. It lets you focus on your experiments and lets everyone bring their own environments.\n", + "\n", + "Focus on experiments, use OpenEnvironments, and build agents that go beyond Cartpole, all on a single spec.\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "cell-1", + "metadata": {}, "source": [ "## 📋 What You'll Learn\n", "\n", @@ -62,7 +93,7 @@ "\n", "\n", "**🎮 Part 6-8: Hands-On Demo**\n", - "- 🔨 Build a game environment\n", + "- 🔌 Use existing OpenSpiel environment\n", "- 🤖 Test 4 different policies\n", "- 👀 Watch learning happen live\n", "\n", @@ -70,8 +101,8 @@ "\n", "\n", "**🔧 Part 9-10: Going Further**\n", - "- 🚀 Use real OpenSpiel\n", - "- ✨ Create your own integration\n", + "- 🎮 Switch to other OpenSpiel games\n", + "- ✨ Build your own integration\n", "- 🌐 Deploy to production\n", "\n", "\n", @@ -79,12 +110,13 @@ "\n", "\n", "> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab with zero setup!\n", - "> \n", - "> ⏱️ **Time**: ~5 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge" + ">\n", + "
⏱️ **Time**: ~5 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge\n" ] }, { "cell_type": "markdown", + "id": "cell-2", "metadata": {}, "source": [ "---\n", @@ -124,6 +156,7 @@ }, { "cell_type": "markdown", + "id": "cell-3", "metadata": {}, "source": [ "---\n", @@ -154,6 +187,7 @@ { "cell_type": "code", "execution_count": null, + "id": "cell-4", "metadata": {}, "outputs": [], "source": [ @@ -197,6 +231,7 @@ }, { "cell_type": "markdown", + "id": "cell-5", "metadata": {}, "source": [ "---\n", @@ -270,6 +305,7 @@ }, { "cell_type": "markdown", + "id": "cell-6", "metadata": {}, "source": [ "### The Architecture\n", @@ -317,6 +353,7 @@ }, { "cell_type": "markdown", + "id": "cell-7", "metadata": {}, "source": [ "---\n", @@ -335,6 +372,7 @@ { "cell_type": "code", "execution_count": null, + "id": "cell-8", "metadata": {}, "outputs": [], "source": [ @@ -370,6 +408,7 @@ }, { "cell_type": "markdown", + "id": "cell-9", "metadata": {}, "source": [ "---\n", @@ -403,6 +442,7 @@ { "cell_type": "code", "execution_count": null, + "id": "cell-10", "metadata": {}, "outputs": [], "source": [ @@ -454,6 +494,7 @@ }, { "cell_type": "markdown", + "id": "cell-11", "metadata": {}, "source": [ "---\n", @@ -499,6 +540,7 @@ { "cell_type": "code", "execution_count": null, + "id": "cell-12", "metadata": {}, "outputs": [], "source": [ @@ -550,6 +592,7 @@ { "cell_type": "code", "execution_count": null, + "id": "cell-13", "metadata": {}, "outputs": [], "source": [ @@ -590,6 +633,7 @@ }, { "cell_type": "markdown", + "id": "cell-14", "metadata": {}, "source": [ "### How the Client Works\n", @@ -609,25 +653,26 @@ }, { "cell_type": "markdown", + "id": "cell-15", "metadata": {}, "source": [ "---\n", "\n", "
\n", "\n", - "# 🎮 Part 6: Interactive Demo\n", + "# 🎮 Part 6: Using Real OpenSpiel\n", "\n", - "### Now let's BUILD something!\n", + "### Now let's USE a production environment!\n", "\n", - "We'll create a **Catch game** following OpenEnv patterns,
\n", - "then watch **4 different AI policies** compete for the championship! 🏆\n", + "We'll play **Catch** using OpenEnv's **OpenSpiel integration** 🎯
\n", + "This is the real OpenSpiel game served through OpenEnv, not a hand-built toy!\n", "\n", "
\n", "\n", "**Get ready for:**\n", - "- ⚡ Live gameplay visualization\n", - "- 🤖 AI policy showdown\n", - "- 📊 Real-time learning metrics\n", + "- 🔌 Using existing environments (not building)\n", + "- 🤖 Testing policies against real games\n", + "- 📊 Live gameplay visualization\n", "- 🎯 Production-ready patterns\n", "\n", "
" @@ -635,6 +680,7 @@ }, { "cell_type": "markdown", + "id": "cell-16", "metadata": {}, "source": [ "## The Game: Catch 🔴🏓\n", @@ -644,11 +690,11 @@ "\n", "\n", "```\n", - "⬜ ⬜ 🔴 ⬜ ⬜ \n", + "⬜ ⬜ 🔴 ⬜ ⬜\n", "⬜ ⬜ ⬜ ⬜ ⬜ Ball\n", "⬜ ⬜ ⬜ ⬜ ⬜ falls\n", "⬜ ⬜ ⬜ ⬜ ⬜ down\n", - "⬜ ⬜ 🏓 ⬜ ⬜ \n", + "⬜ ⬜ 🏓 ⬜ ⬜\n", " Paddle\n", "```\n", "\n", @@ -658,7 +704,7 @@ "**Rules:**\n", "- 5×5 grid\n", "- Ball falls from random column\n", - "- Move paddle to catch it\n", + "- Move paddle left/right to catch it\n", "\n", "**Actions:**\n", "- `0` = Move LEFT ⬅️\n", @@ -675,12 +721,14 @@ "\n", "
\n", "\n", - "**🎯 Why This Game?**\n", + "**🎯 Why Catch?**\n", "- Simple rules (easy to understand)\n", - "- Visual (see what's happening)\n", "- Fast episodes (~5 steps)\n", "- Clear success/failure\n", - "- Perfect for testing policies!\n", + "- Part of OpenSpiel's 70+ games!\n", + "\n", + "**💡 The Big Idea:**\n", + "Instead of building this from scratch, we'll USE OpenEnv's existing OpenSpiel integration. Same interface, but production-ready!\n", "\n", "
" ] @@ -688,167 +736,198 @@ { "cell_type": "code", "execution_count": null, + "id": "cell-17", "metadata": {}, "outputs": [], "source": [ - "# Create environment and start a new episode\n", - "env = CatchEnvironment()\n", - "obs = env.reset()\n", - "\n", - "print(\"๐ŸŽฎ \" + \"=\"*58 + \" ๐ŸŽฎ\")\n", - "print(\" INITIAL GAME STATE\")\n", - "print(\"๐ŸŽฎ \" + \"=\"*58 + \" ๐ŸŽฎ\\n\")\n", - "\n", - "# Visualize the game board\n", - "env.render()\n", - "\n", - "# Show game info\n", - "print(f\"\\n๐Ÿ“ Game Info:\")\n", - "print(f\" ๐Ÿ”ด Ball at: column {obs.ball_position[1]} (row {obs.ball_position[0]})\")\n", - "print(f\" ๐Ÿ“ Paddle at: column {obs.paddle_position}\")\n", - "\n", - "print(f\"\\n๐Ÿ“Š Observation Details:\")\n", - "print(f\" โ€ข Legal actions: {obs.legal_actions} โ†’ [LEFT, STAY, RIGHT]\")\n", - "print(f\" โ€ข Info state size: {len(obs.info_state)} (5ร—5 grid flattened)\")\n", - "print(f\" โ€ข Episode done: {obs.done}\")\n", - "print(f\" โ€ข Current reward: {obs.reward}\")\n", - "\n", - "print(\"\\n๐Ÿ’ก The ball will fall down each step. 
Can your policy catch it?\")\n", - "print(\"=\"*62)" + "from envs.openspiel_env import OpenSpielEnv\n", + "from envs.openspiel_env.models import (\n", + " OpenSpielAction,\n", + " OpenSpielObservation,\n", + " OpenSpielState\n", + ")\n", + "from dataclasses import fields\n", + "\n", + "print(\"🎮 \" + \"=\"*64 + \" 🎮\")\n", + "print(\" ✅ Importing Real OpenSpiel Environment!\")\n", + "print(\"🎮 \" + \"=\"*64 + \" 🎮\\n\")\n", + "\n", + "print(\"📦 What we just imported:\")\n", + "print(\" • OpenSpielEnv - HTTP client for OpenSpiel games\")\n", + "print(\" • OpenSpielAction - Type-safe actions\")\n", + "print(\" • OpenSpielObservation - Type-safe observations\")\n", + "print(\" • OpenSpielState - Episode metadata\\n\")\n", + "\n", + "print(\"📋 OpenSpielObservation fields:\")\n", + "print(\" \" + \"─\" * 60)\n", + "for field in fields(OpenSpielObservation):\n", + " print(f\" • {field.name:25s} : {field.type}\")\n", + "\n", + "print(\"\\n\" + \"=\"*70)\n", + "print(\"\\n💡 This is REAL OpenEnv code - used in production!\")\n", + "print(\" • Wraps 6 OpenSpiel games (Catch, Tic-Tac-Toe, Poker, etc.)\")\n", + "print(\" • Type-safe actions and observations\")\n", + "print(\" • Works via HTTP (we\\'ll see that next!)\\n\")" ] }, { "cell_type": "code", "execution_count": null, + "id": "cell-18", "metadata": {}, "outputs": [], "source": [ - "import random\n", - "from dataclasses import dataclass\n", - "from typing import List, Tuple\n", + "import subprocess\n", + "import time\n", + "import sys\n", + "import os\n", "\n", - "# ============================================================================\n", - "# MODELS - Type-safe contracts (following OpenEnv pattern)\n", - "# ============================================================================\n", + "print(\"🚀 \" + \"=\"*64 + \" 🚀\")\n", + "print(\" Starting OpenSpiel Server (Catch Game)\")\n", + "print(\"🚀 \" + \"=\"*64 + \" 🚀\\n\")\n", "\n", - "@dataclass\n", - "class 
CatchObservation:\n", - " \"\"\"Type-safe observation following OpenEnv Observation base class.\"\"\"\n", - " info_state: List[float] # Grid as flat array\n", - " legal_actions: List[int] # [0, 1, 2] always\n", - " done: bool # Episode finished?\n", - " reward: float # +1 or 0\n", - " # Extra fields for visualization\n", - " ball_position: Tuple[int, int]\n", - " paddle_position: int\n", + "# Check if open_spiel is installed\n", + "try:\n", + " import pyspiel\n", + " print(\"✅ OpenSpiel is installed!\\n\")\n", + "except ImportError:\n", + " print(\"⚠️ OpenSpiel not found. Installing...\")\n", + " import subprocess\n", + " subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", \"open_spiel\"])\n", + " print(\"✅ OpenSpiel installed!\\n\")\n", "\n", + "# Start the OpenSpiel server in background\n", + "print(\"⚡ Starting FastAPI server for OpenSpiel Catch...\")\n", + "print(\" (This uses REAL OpenEnv + OpenSpiel integration)\\n\")\n", "\n", - "# ============================================================================\n", - "# ENVIRONMENT - Server-side logic (following OpenEnv Environment pattern)\n", - "# ============================================================================\n", + "# Determine the correct path\n", + "if IN_COLAB:\n", + " work_dir = \"/content/OpenEnv\"\n", + "else:\n", + " from pathlib import Path\n", + " work_dir = str(Path.cwd().parent.absolute())\n", + "\n", + "server_process = subprocess.Popen(\n", + " [sys.executable, \"-m\", \"uvicorn\",\n", + " \"envs.openspiel_env.server.app:app\",\n", + " \"--host\", \"0.0.0.0\",\n", + " \"--port\", \"8000\"],\n", + " env={**os.environ,\n", + " \"PYTHONPATH\": f\"{work_dir}/src\",\n", + " \"OPENSPIEL_GAME\": \"catch\",\n", + " \"OPENSPIEL_AGENT_PLAYER\": \"0\",\n", + " \"OPENSPIEL_OPPONENT_POLICY\": \"random\"},\n", + " stdout=subprocess.DEVNULL, # never read; an unread PIPE fills with uvicorn access logs and stalls the server\n", + " stderr=subprocess.PIPE,\n", + " text=True,\n", + " 
cwd=work_dir\n", + ")\n", "\n", - "class CatchEnvironment:\n", - " 
\"\"\"\n", - " Catch game following OpenEnv's Environment pattern.\n", - " \n", - " In production:\n", - " โ€ข Runs in Docker container\n", - " โ€ข Accessed via HTTPEnvClient\n", - " โ€ข Exposed via FastAPI server\n", - " \n", - " For this demo:\n", - " โ€ข We run it locally to see internals\n", - " โ€ข But the structure is identical!\n", - " \"\"\"\n", - " \n", - " def __init__(self, grid_size=5):\n", - " self.grid_size = grid_size\n", - " \n", - " def reset(self) -> CatchObservation:\n", - " \"\"\"Start new episode (implements Environment.reset()).\"\"\"\n", - " self.ball_row = 0\n", - " self.ball_col = random.randint(0, self.grid_size - 1)\n", - " self.paddle_col = self.grid_size // 2\n", - " self.done = False\n", - " return self._make_observation()\n", - " \n", - " def step(self, action: int) -> CatchObservation:\n", - " \"\"\"Execute action (implements Environment.step()).\n", - " \n", - " Args:\n", - " action: 0=LEFT, 1=STAY, 2=RIGHT\n", - " \"\"\"\n", - " # Move paddle\n", - " if action == 0 and self.paddle_col > 0:\n", - " self.paddle_col -= 1\n", - " elif action == 2 and self.paddle_col < self.grid_size - 1:\n", - " self.paddle_col += 1\n", - " \n", - " # Move ball down\n", - " self.ball_row += 1\n", - " \n", - " # Check if episode done\n", - " if self.ball_row >= self.grid_size - 1:\n", - " self.done = True\n", - " reward = 1.0 if self.ball_col == self.paddle_col else 0.0\n", - " else:\n", - " reward = 0.0\n", - " \n", - " return self._make_observation(reward)\n", - " \n", - " def _make_observation(self, reward=0.0) -> CatchObservation:\n", - " \"\"\"Create type-safe observation.\"\"\"\n", - " # Flatten grid to vector (like real RL environments do)\n", - " info_state = [0.0] * (self.grid_size * self.grid_size)\n", - " ball_idx = self.ball_row * self.grid_size + self.ball_col\n", - " paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col\n", - " info_state[ball_idx] = 1.0 # Ball = 1.0\n", - " info_state[paddle_idx] = 0.5 # Paddle = 0.5\n", - 
" \n", - " return CatchObservation(\n", - " info_state=info_state,\n", - " legal_actions=[0, 1, 2],\n", - " done=self.done,\n", - " reward=reward,\n", - " ball_position=(self.ball_row, self.ball_col),\n", - " paddle_position=self.paddle_col\n", - " )\n", - " \n", - " def render(self):\n", - " \"\"\"Visualize current state.\"\"\"\n", - " for row in range(self.grid_size):\n", - " line = \" \"\n", - " for col in range(self.grid_size):\n", - " if row == self.ball_row and col == self.ball_col:\n", - " line += \"๐Ÿ”ด \"\n", - " elif row == self.grid_size - 1 and col == self.paddle_col:\n", - " line += \"๐Ÿ“ \"\n", - " else:\n", - " line += \"โฌœ \"\n", - " print(line)\n", - "\n", - "\n", - "print(\"๐ŸŽ‰ \" + \"=\"*64 + \" ๐ŸŽ‰\")\n", - "print(\" โœ… Environment Created Following OpenEnv Pattern!\")\n", - "print(\"๐ŸŽ‰ \" + \"=\"*64 + \" ๐ŸŽ‰\")\n", - "print(\"\\n๐Ÿ“‹ What we just built:\")\n", - "print(\" โ€ข reset() โ†’ CatchObservation (type-safe!)\")\n", - "print(\" โ€ข step(action) โ†’ CatchObservation (type-safe!)\")\n", - "print(\" โ€ข render() โ†’ Visual display\")\n", - "print(\"\\n๐Ÿš€ In production: This would run in Docker + FastAPI\")\n", - "print(\" But the structure is EXACTLY the same!\")\n", - "print(\"\\n๐Ÿ’ก This is your blueprint for creating ANY OpenEnv environment!\\n\")" + "# Wait for server to start\n", + "print(\"โณ Waiting for server to start...\")\n", + "time.sleep(5)\n", + "\n", + "# Check if server is running\n", + "import requests\n", + "try:\n", + " response = requests.get('http://localhost:8000/health', timeout=2)\n", + " print(\"\\nโœ… OpenSpiel server is running!\")\n", + " print(\"๐ŸŒ Server URL: http://localhost:8000\")\n", + " print(\"๐Ÿ“ Endpoints available:\")\n", + " print(\" โ€ข POST /reset\")\n", + " print(\" โ€ข POST /step\")\n", + " print(\" โ€ข GET /state\")\n", + " print(\"\\n๐ŸŽฏ This is REAL OpenEnv + OpenSpiel in action!\")\n", + " print(\" โ€ข Running actual OpenSpiel Catch game\")\n", + " print(\" โ€ข Exposed via 
FastAPI HTTP server\")\n", + " print(\" • Using OpenEnv's standard interface\\n\")\n", + "except Exception as e:\n", + " print(f\"\\n❌ Server failed to start: {e}\")\n", + " print(\"\\n📋 Checking error output...\")\n", + " exited = server_process.poll() is not None # read() would block forever while the server is still running\n", + " if exited and server_process.stderr:\n", + " stderr = server_process.stderr.read()\n", + " if stderr:\n", + " print(stderr)\n", + " print(\"\\n💡 Make sure open_spiel is installed:\")\n", + " print(\" pip install open_spiel\")\n", + " raise" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, + "id": "cell-19", "metadata": {}, + "outputs": [], "source": [ - "### Test the Environment" + "print(\"📱 \" + \"=\"*64 + \" 📱\")\n", + "print(\" Connecting to OpenSpiel Server via HTTP\")\n", + "print(\"📱 \" + \"=\"*64 + \" 📱\\n\")\n", + "\n", + "# Create HTTP client for OpenSpiel\n", + "client = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", + "\n", + "print(\"✅ Client created!\")\n", + "print(\"\\n💡 What just happened:\")\n", + "print(\" • OpenSpielEnv is an HTTPEnvClient subclass\")\n", + "print(\" • It knows how to talk to OpenSpiel servers\")\n", + "print(\" • All communication is type-safe and over HTTP\")\n", + "print(\" • Same client works for ALL OpenSpiel games!\\n\")" ] }, { "cell_type": "code", "execution_count": null, + "id": "cell-20", "metadata": {}, "outputs": [], "source": [ + "print(\"🎮 \" + \"=\"*64 + \" 🎮\")\n", + "print(\" Testing Connection - Playing One Step\")\n", + "print(\"🎮 \" + \"=\"*64 + \" 🎮\\n\")\n", + "\n", + "# Reset the environment (HTTP POST /reset)\n", + "print(\"📤 Calling client.reset()...\")\n", + "print(\" Under the hood: HTTP POST to http://localhost:8000/reset\\n\")\n", + "\n", + "result = client.reset()\n", + "\n", + "print(\"📥 Received OpenSpielObservation:\")\n", + "print(f\" • info_state: {result.observation.info_state[:10]}... 
(first 10 values)\")\n", + "print(f\" โ€ข legal_actions: {result.observation.legal_actions}\")\n", + "print(f\" โ€ข game_phase: {result.observation.game_phase}\")\n", + "print(f\" โ€ข done: {result.done}\")\n", + "\n", + "# Take an action (HTTP POST /step)\n", + "print(\"\\n๐Ÿ“ค Calling client.step(OpenSpielAction(action_id=1, game_name=\\'catch\\'))...\")\n", + "print(\" Under the hood: HTTP POST to http://localhost:8000/step\\n\")\n", + "\n", + "action = OpenSpielAction(action_id=1, game_name=\"catch\") # STAY\n", + "result = client.step(action)\n", + "\n", + "print(\"๐Ÿ“ฅ Received response:\")\n", + "print(f\" โ€ข Reward: {result.reward}\")\n", + "print(f\" โ€ข Done: {result.done}\")\n", + "print(f\" โ€ข legal_actions: {result.observation.legal_actions}\")\n", + "\n", + "# Get state (HTTP GET /state)\n", + "state = client.state()\n", + "print(f\"\\n๐Ÿ“Š Episode state:\")\n", + "print(f\" โ€ข episode_id: {state.episode_id}\")\n", + "print(f\" โ€ข step_count: {state.step_count}\")\n", + "print(f\" โ€ข game_name: {state.game_name}\")\n", + "\n", + "print(\"\\n\" + \"=\"*70)\n", + "print(\"\\n๐ŸŽ‰ IT WORKS! We\\'re using REAL OpenSpiel via HTTP!\")\n", + "print(\" โœ… Type-safe communication\")\n", + "print(\" โœ… Same interface as any OpenEnv environment\")\n", + "print(\" โœ… Production-ready architecture\\n\")" ] }, { "cell_type": "markdown", + "id": "cell-21", "metadata": {}, "source": [ "---\n", @@ -887,93 +966,112 @@ "\n", "\n", "\n", + "**๐Ÿ’ก These policies work with ANY OpenSpiel game!**\n", + "\n", "
" ] }, { "cell_type": "code", "execution_count": null, + "id": "cell-22", "metadata": {}, "outputs": [], "source": [ + "import random\n", + "\n", "# ============================================================================\n", - "# POLICIES - Different AI strategies\n", + "# POLICIES - Different AI strategies (adapted for OpenSpiel)\n", "# ============================================================================\n", "\n", "class RandomPolicy:\n", " \"\"\"Baseline: Pure random guessing.\"\"\"\n", " name = \"๐ŸŽฒ Random Guesser\"\n", - " \n", - " def select_action(self, obs: CatchObservation) -> int:\n", + "\n", + " def select_action(self, obs: OpenSpielObservation) -> int:\n", " return random.choice(obs.legal_actions)\n", "\n", "\n", "class AlwaysStayPolicy:\n", " \"\"\"Bad strategy: Never moves.\"\"\"\n", " name = \"๐Ÿ›‘ Always Stay\"\n", - " \n", - " def select_action(self, obs: CatchObservation) -> int:\n", + "\n", + " def select_action(self, obs: OpenSpielObservation) -> int:\n", " return 1 # STAY\n", "\n", "\n", "class SmartPolicy:\n", " \"\"\"Optimal: Move paddle toward ball.\"\"\"\n", " name = \"๐Ÿง  Smart Heuristic\"\n", - " \n", - " def select_action(self, obs: CatchObservation) -> int:\n", - " ball_col = obs.ball_position[1]\n", - " paddle_col = obs.paddle_position\n", - " \n", - " if paddle_col < ball_col:\n", - " return 2 # Move RIGHT\n", - " elif paddle_col > ball_col:\n", - " return 0 # Move LEFT\n", - " else:\n", - " return 1 # STAY (already aligned)\n", + "\n", + " def select_action(self, obs: OpenSpielObservation) -> int:\n", + " # Parse OpenSpiel observation\n", + " # For Catch: info_state is a flattened 5x5 grid\n", + " # Ball position and paddle position encoded in the vector\n", + " info_state = obs.info_state\n", + "\n", + " # Find ball and paddle positions from info_state\n", + " # Catch uses a 5x5 grid, so 25 values\n", + " grid_size = 5\n", + "\n", + " # Find positions (ball = 1.0, paddle = 0.5 in the flattened grid)\n", + " ball_col = 
None\n", + " paddle_col = None\n", + "\n", + " for idx, val in enumerate(info_state):\n", + " if abs(val - 1.0) < 0.01: # Ball\n", + " ball_col = idx % grid_size\n", + " elif abs(val - 0.5) < 0.01: # Paddle\n", + " paddle_col = idx % grid_size\n", + "\n", + " if ball_col is not None and paddle_col is not None:\n", + " if paddle_col < ball_col:\n", + " return 2 # Move RIGHT\n", + " elif paddle_col > ball_col:\n", + " return 0 # Move LEFT\n", + "\n", + " return 1 # STAY (fallback)\n", "\n", "\n", "class LearningPolicy:\n", " \"\"\"Simulated RL: Epsilon-greedy exploration.\"\"\"\n", " name = \"๐Ÿ“ˆ Learning Agent\"\n", - " \n", + "\n", " def __init__(self):\n", " self.steps = 0\n", - " \n", - " def select_action(self, obs: CatchObservation) -> int:\n", + " self.smart_policy = SmartPolicy()\n", + "\n", + " def select_action(self, obs: OpenSpielObservation) -> int:\n", " self.steps += 1\n", - " \n", + "\n", " # Decay exploration rate over time\n", " epsilon = max(0.1, 1.0 - (self.steps / 100))\n", - " \n", + "\n", " if random.random() < epsilon:\n", " # Explore: random action\n", " return random.choice(obs.legal_actions)\n", " else:\n", " # Exploit: use smart strategy\n", - " ball_col = obs.ball_position[1]\n", - " paddle_col = obs.paddle_position\n", - " if paddle_col < ball_col:\n", - " return 2\n", - " elif paddle_col > ball_col:\n", - " return 0\n", - " else:\n", - " return 1\n", + " return self.smart_policy.select_action(obs)\n", "\n", "\n", "print(\"๐Ÿค– \" + \"=\"*64 + \" ๐Ÿค–\")\n", - "print(\" โœ… 4 Policies Created!\")\n", + "print(\" โœ… 4 Policies Created (Adapted for OpenSpiel)!\")\n", "print(\"๐Ÿค– \" + \"=\"*64 + \" ๐Ÿค–\\n\")\n", "\n", "policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]\n", "for i, policy in enumerate(policies, 1):\n", " print(f\" {i}. {policy.name}\")\n", "\n", - "print(\"\\n๐Ÿ’ก Each policy represents a different approach to solving the game!\")\n", - "print(\" Let's see who performs best! 
๐Ÿ†\\n\")" + "print(\"\\n๐Ÿ’ก These policies work with OpenSpielObservation!\")\n", + "print(\" โ€ข Read info_state (flattened grid)\")\n", + "print(\" โ€ข Use legal_actions\")\n", + "print(\" โ€ข Work with ANY OpenSpiel game that exposes these!\\n\")" ] }, { "cell_type": "markdown", + "id": "cell-23", "metadata": {}, "source": [ "### Watch a Policy Play!" @@ -982,64 +1080,76 @@ { "cell_type": "code", "execution_count": null, + "id": "cell-24", "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", - "def run_episode(env, policy, visualize=True, delay=0.4):\n", - " \"\"\"Run one episode with a policy.\"\"\"\n", - " \n", + "def run_episode(env, policy, visualize=True, delay=0.3):\n", + " \"\"\"Run one episode with a policy against OpenSpiel environment.\"\"\"\n", + "\n", " # RESET\n", - " obs = env.reset()\n", - " \n", + " result = env.reset()\n", + " obs = result.observation\n", + "\n", " if visualize:\n", " print(f\"\\n{'='*60}\")\n", " print(f\" ๐ŸŽฎ {policy.name}\")\n", - " print(f\" ๐Ÿ”ด Ball will fall at column: {obs.ball_position[1]}\")\n", + " print(f\" ๐ŸŽฒ Playing against OpenSpiel Catch\")\n", " print('='*60 + '\\n')\n", - " env.render()\n", " time.sleep(delay)\n", - " \n", + "\n", " total_reward = 0\n", " step = 0\n", " action_names = [\"โฌ…๏ธ LEFT\", \"๐Ÿ›‘ STAY\", \"โžก๏ธ RIGHT\"]\n", - " \n", + "\n", " # THE RL LOOP\n", " while not obs.done:\n", " # 1. Policy chooses action\n", - " action = policy.select_action(obs)\n", - " \n", - " # 2. Environment executes\n", - " obs = env.step(action)\n", - " \n", + " action_id = policy.select_action(obs)\n", + "\n", + " # 2. Environment executes (via HTTP!)\n", + " action = OpenSpielAction(action_id=action_id, game_name=\"catch\")\n", + " result = env.step(action)\n", + " obs = result.observation\n", + "\n", " # 3. 
Collect reward\n", - " total_reward += obs.reward\n", - " \n", + " if result.reward is not None:\n", + " total_reward += result.reward\n", + "\n", " if visualize:\n", - " print(f\"\\n๐Ÿ“ Step {step + 1}: {action_names[action]}\")\n", - " env.render()\n", + " print(f\"๐Ÿ“ Step {step + 1}: {action_names[action_id]} โ†’ Reward: {result.reward}\")\n", " time.sleep(delay)\n", - " \n", + "\n", " step += 1\n", - " \n", + "\n", " if visualize:\n", - " result = \"๐ŸŽ‰ CAUGHT!\" if total_reward > 0 else \"๐Ÿ˜ข MISSED\"\n", + " result_text = \"๐ŸŽ‰ CAUGHT!\" if total_reward > 0 else \"๐Ÿ˜ข MISSED\"\n", " print(f\"\\n{'='*60}\")\n", - " print(f\" {result} Reward: {total_reward}\")\n", + " print(f\" {result_text} Total Reward: {total_reward}\")\n", " print('='*60)\n", - " \n", + "\n", " return total_reward > 0\n", "\n", "\n", + "print(\"๐Ÿ“บ \" + \"=\"*64 + \" ๐Ÿ“บ\")\n", + "print(\" Watch Smart Policy Play Against OpenSpiel!\")\n", + "print(\"๐Ÿ“บ \" + \"=\"*64 + \" ๐Ÿ“บ\\n\")\n", + "\n", "# Demo: Watch Smart Policy in action\n", - "env = CatchEnvironment()\n", "policy = SmartPolicy()\n", - "run_episode(env, policy, visualize=True, delay=0.4)" + "run_episode(client, policy, visualize=True, delay=0.5)\n", + "\n", + "print(\"\\n๐Ÿ’ก You just watched REAL OpenSpiel Catch being played!\")\n", + "print(\" โ€ข Every action was an HTTP call\")\n", + "print(\" โ€ข Game logic runs in the server\")\n", + "print(\" โ€ข Client only sends actions and receives observations\\n\")" ] }, { "cell_type": "markdown", + "id": "cell-25", "metadata": {}, "source": [ "---\n", @@ -1048,7 +1158,9 @@ "\n", "
\n", "\n", - "Let's run **50 episodes** for each policy and see who wins!\n", + "Let's run **50 episodes** for each policy against **REAL OpenSpiel** and see who wins!\n", + "\n", + "This is production code - every action is an HTTP call to the OpenSpiel server!\n", "\n", "
" ] @@ -1056,153 +1168,167 @@ { "cell_type": "code", "execution_count": null, + "id": "cell-26", "metadata": {}, "outputs": [], "source": [ - "def evaluate_policies(num_episodes=50):\n", - " \"\"\"Compare all policies over many episodes.\"\"\"\n", + "def evaluate_policies(env, num_episodes=50):\n", + " \"\"\"Compare all policies over many episodes using real OpenSpiel.\"\"\"\n", " policies = [\n", " RandomPolicy(),\n", " AlwaysStayPolicy(),\n", " SmartPolicy(),\n", " LearningPolicy(),\n", " ]\n", - " \n", + "\n", " print(\"\\n๐Ÿ† \" + \"=\"*66 + \" ๐Ÿ†\")\n", " print(f\" POLICY SHOWDOWN - {num_episodes} Episodes Each\")\n", + " print(f\" Playing against REAL OpenSpiel Catch!\")\n", " print(\"๐Ÿ† \" + \"=\"*66 + \" ๐Ÿ†\\n\")\n", - " \n", + "\n", " results = []\n", " for policy in policies:\n", " print(f\"โšก Testing {policy.name}...\", end=\" \")\n", - " env = CatchEnvironment()\n", - " successes = sum(run_episode(env, policy, visualize=False) \n", + " successes = sum(run_episode(env, policy, visualize=False)\n", " for _ in range(num_episodes))\n", " success_rate = (successes / num_episodes) * 100\n", " results.append((policy.name, success_rate, successes))\n", " print(f\"โœ“ Done!\")\n", - " \n", + "\n", " print(\"\\n\" + \"=\"*70)\n", " print(\" ๐Ÿ“Š FINAL RESULTS\")\n", " print(\"=\"*70 + \"\\n\")\n", - " \n", + "\n", " # Sort by success rate (descending)\n", " results.sort(key=lambda x: x[1], reverse=True)\n", - " \n", + "\n", " # Award medals to top 3\n", " medals = [\"๐Ÿฅ‡\", \"๐Ÿฅˆ\", \"๐Ÿฅ‰\", \" \"]\n", - " \n", + "\n", " for i, (name, rate, successes) in enumerate(results):\n", " medal = medals[i]\n", " bar = \"โ–ˆ\" * int(rate / 2)\n", " print(f\"{medal} {name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})\")\n", - " \n", + "\n", " print(\"\\n\" + \"=\"*70)\n", " print(\"\\nโœจ Key Insights:\")\n", " print(\" โ€ข Random (~20%): Baseline - pure luck ๐ŸŽฒ\")\n", " print(\" โ€ข Always Stay (~20%): Bad strategy - stays center ๐Ÿ›‘\")\n", " 
print(\" โ€ข Smart (100%): Optimal - perfect play! ๐Ÿง \")\n", " print(\" โ€ข Learning (~85%): Improves over time ๐Ÿ“ˆ\")\n", - " print(\"\\n๐ŸŽ“ This is Reinforcement Learning in action:\")\n", - " print(\" 1. Start with exploration (trying random things)\")\n", - " print(\" 2. Learn from rewards (what works, what doesn't)\")\n", - " print(\" 3. Converge to optimal behavior (smart strategy)\")\n", - " print(\"\\n๐ŸŽฏ The Learning Agent gets smarter with every episode!\\n\")\n", + " print(\"\\n๐ŸŽ“ This is Reinforcement Learning + OpenEnv in action:\")\n", + " print(\" 1. We USED existing OpenSpiel environment (didn\\'t build it)\")\n", + " print(\" 2. Type-safe communication over HTTP\")\n", + " print(\" 3. Same code works for ANY OpenSpiel game\")\n", + " print(\" 4. Production-ready architecture\\n\")\n", "\n", "# Run the epic competition!\n", - "print(\"๐ŸŽฎ Starting the showdown...\")\n", - "evaluate_policies(num_episodes=50)" + "print(\"๐ŸŽฎ Starting the showdown against REAL OpenSpiel...\\n\")\n", + "evaluate_policies(client, num_episodes=50)" ] }, { "cell_type": "markdown", + "id": "cell-27", "metadata": {}, "source": [ "---\n", "\n", "\n", - "# Part 9: Using Real OpenSpiel ๐ŸŽฎ\n", + "# Part 9: Switching to Other Games ๐ŸŽฎ\n", "\n", "
\n", "\n", - "## What We Just Built vs Production OpenSpiel\n", + "## What We Just Used: Real OpenSpiel! ๐ŸŽ‰\n", + "\n", + "In Parts 6-8, we **USED** the existing OpenSpiel Catch environment:\n", "\n", "\n", "\n", - "\n", - "\n", - "\n", + "\n", + "\n", "\n", "\n", - "\n", - "\n", - "\n", + "\n", + "\n", "\n", "\n", - "\n", - "\n", - "\n", + "\n", + "\n", "\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", + "\n", + "\n", "\n", "\n", - "\n", - "\n", - "\n", + "\n", + "\n", "\n", "
- Component | Our Demo | OpenEnv + OpenSpiel
+ What We Did | How It Works
- Environment | Local Python class | Docker container
+ Imported | OpenSpielEnv client (pre-built)
- Communication | Direct function calls | HTTP/JSON
+ Started | OpenSpiel server via uvicorn
- Client | Direct access | HTTPEnvClient
- Type Safety | ✅ Dataclasses | ✅ Dataclasses
+ Connected | HTTP client to server
- API | reset(), step() | reset(), step() (same!)
+ Played | Real OpenSpiel Catch game
\n", "\n", - "**๐ŸŽฏ Same structure, production features!**\n", + "**๐ŸŽฏ This is production code!** Every action was an HTTP call to a real OpenSpiel environment.\n", "\n", "
\n", "\n", - "### Using OpenSpiel Integration:\n", + "## ๐ŸŽฎ 6 Games Available - Same Interface!\n", "\n", - "```python\n", - "# 1. Install OpenSpiel\n", - "!pip install open_spiel\n", + "The beauty of OpenEnv? **Same code, different games!**\n", "\n", - "# 2. Import OpenEnv's integration\n", - "from envs.openspiel_env import OpenSpielEnv, OpenSpielAction\n", - "\n", - "# 3. Connect to server (HTTP!)\n", + "```python\n", + "# We just used Catch\n", "env = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", + "# game_name=\"catch\" was set via environment variable\n", "\n", - "# 4. Same API you just learned!\n", - "result = env.reset()\n", - "result = env.step(OpenSpielAction(action_id=2, game_name=\"catch\"))\n", - "state = env.state()\n", - "\n", - "# 5. Switch games by changing game_name:\n", - "result = env.step(OpenSpielAction(action_id=4, game_name=\"tic_tac_toe\"))\n", + "# Want Tic-Tac-Toe instead? Just change the game!\n", + "# Start server with: OPENSPIEL_GAME=tic_tac_toe uvicorn ...\n", + "# Same client code works!\n", "```\n", "\n", "
\n", "\n", - "**๐ŸŽฎ 6 Games Available:**\n", + "**๐ŸŽฎ All 6 Games:**\n", "\n", - "1. `\"catch\"` - What we just built!\n", - "2. `\"tic_tac_toe\"` - Classic 3ร—3\n", - "3. `\"kuhn_poker\"` - Imperfect information poker\n", - "4. `\"cliff_walking\"` - Grid navigation\n", - "5. `\"2048\"` - Tile puzzle\n", - "6. `\"blackjack\"` - Card game\n", + "1. โœ… **`catch`** - What we just used!\n", + "2. **`tic_tac_toe`** - Classic 3ร—3\n", + "3. **`kuhn_poker`** - Imperfect information poker\n", + "4. **`cliff_walking`** - Grid navigation\n", + "5. **`2048`** - Tile puzzle\n", + "6. **`blackjack`** - Card game\n", "\n", - "**All use the exact same interface!**\n", + "**All use the exact same OpenSpielEnv client!**\n", "\n", - "
" + "
\n", + "\n", + "### Try Another Game (Optional):\n", + "\n", + "```python\n", + "# Stop the current server (kill the server_process)\n", + "# Then start a new game:\n", + "\n", + "server_process = subprocess.Popen(\n", + " [sys.executable, \"-m\", \"uvicorn\",\n", + " \"envs.openspiel_env.server.app:app\",\n", + " \"--host\", \"0.0.0.0\",\n", + " \"--port\", \"8000\"],\n", + " env={**os.environ,\n", + " \"PYTHONPATH\": f\"{work_dir}/src\",\n", + " \"OPENSPIEL_GAME\": \"tic_tac_toe\", # Changed!\n", + " \"OPENSPIEL_AGENT_PLAYER\": \"0\",\n", + " \"OPENSPIEL_OPPONENT_POLICY\": \"random\"},\n", + " # ... rest of config\n", + ")\n", + "\n", + "# Same client works!\n", + "client = OpenSpielEnv(base_url=\"http://localhost:8000\")\n", + "result = client.reset() # Now playing Tic-Tac-Toe!\n", + "```\n", + "\n", + "**๐Ÿ’ก Key Insight**: You don't rebuild anything - you just USE different games with the same client!\n" ] }, { "cell_type": "markdown", + "id": "cell-28", "metadata": {}, "source": [ "---\n", @@ -1337,6 +1463,7 @@ }, { "cell_type": "markdown", + "id": "cell-29", "metadata": {}, "source": [ "---\n", @@ -1350,6 +1477,7 @@ }, { "cell_type": "markdown", + "id": "cell-30", "metadata": {}, "source": [ "## What You Learned\n", @@ -1402,6 +1530,7 @@ }, { "cell_type": "markdown", + "id": "cell-31", "metadata": {}, "source": [ "## OpenEnv vs Traditional RL\n", @@ -1468,6 +1597,7 @@ }, { "cell_type": "markdown", + "id": "cell-32", "metadata": {}, "source": [ "\n",