# Demonstrating `mobile-env:smart-city`

`mobile-env` is a simple, open environment for training, testing, and evaluating algorithms in a decentralized metaverse setting.

* `mobile-env:smart-city` is written in pure Python
* It allows simulating various scenarios with moving users in a cellular network with a single base station and multiple stationary sensors
* `mobile-env:smart-city` implements the standard [Gymnasium](https://gymnasium.farama.org/) (previously [OpenAI Gym](https://gym.openai.com/)) interface such that it can be used with all common frameworks for reinforcement learning
* `mobile-env:smart-city` is not restricted to reinforcement learning approaches but can also be used with conventional control approaches or dummy benchmark algorithms
* It can be configured easily (e.g., adjusting the number and movement of users, properties of cells, etc.)
* It is also easy to extend `mobile-env:smart-city`, e.g., by implementing different observations, actions, or rewards

As such, `mobile-env:smart-city` is a simple platform for testing RL algorithms in a decentralized metaverse setting.


**Demonstration Steps:**

This demonstration consists of the following steps:

1. Installation and usage of `mobile-env` with dummy actions
2. Configuration of `mobile-env` and adjustment of the observation space (optional)
3. Training a single-agent reinforcement learning approach with [`stable-baselines3`](https://github.com/DLR-RM/stable-baselines3)
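Step 1 runs the environment with dummy actions. Because `mobile-env:smart-city` implements the Gymnasium interface, such a rollout only needs `reset` and `step`. The sketch below is a hypothetical helper (`run_episodes` is not part of `mobile-env` or SB3) that plays any policy for a number of episodes and returns the mean and standard deviation of the episode returns, the same kind of summary that SB3's `evaluate_policy` reports later. It assumes the single-agent smart-city handler follows the standard five-tuple `step` API.

```python
import statistics
from typing import Any, Callable, Optional, Tuple


def run_episodes(
    env: Any,
    policy: Callable[[Any], Any],
    n_episodes: int,
    seed: Optional[int] = None,
) -> Tuple[float, float]:
    """Hypothetical helper (not part of mobile-env or SB3): roll out `policy`
    on a Gymnasium-style env and return the mean and population standard
    deviation of the episode returns."""
    if n_episodes < 1:
        raise ValueError("n_episodes must be at least 1")
    returns = []
    for episode in range(n_episodes):
        # Seed only the first reset; later resets continue from the env's RNG state
        obs, _info = env.reset(seed=seed if episode == 0 else None)
        episode_return, done = 0.0, False
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, _info = env.step(action)
            episode_return += float(reward)
            # An episode ends when the task terminates or a time limit truncates it
            done = terminated or truncated
        returns.append(episode_return)
    return statistics.fmean(returns), statistics.pstdev(returns)
```

With `mobile-env` installed, a random-action baseline would be `env = gym.make(ENV_NAME)` followed by `run_episodes(env, lambda obs: env.action_space.sample(), n_episodes=10, seed=0)`, where `ENV_NAME` is the environment ID from the parameters cell.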

In [None]:
# Install Stable-Baselines3 (SB3); only SB3 v2.0.0+ supports Gymnasium.
# tqdm and rich are required for model.learn(progress_bar=True) used below.
%pip install stable-baselines3==2.0.0 tensorboard tqdm rich
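The pinned version matters because only SB3 2.0.0 and later target Gymnasium. A guard like the one below (hypothetical helpers, not part of SB3) could check the installed version before training. It reads the version with `importlib.metadata` and compares only the numeric release segment, so pre-release suffixes such as `a5` are ignored rather than ordered as PEP 440 would; `packaging.version.Version` is the more complete option if that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Minimum SB3 release with Gymnasium support (per the install note above)
MIN_SB3 = (2, 0, 0)


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '2.0.0a5' -> (2, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def sb3_supports_gymnasium(ver: str) -> bool:
    """True if the release segment is >= 2.0.0 (missing parts count as 0)."""
    release = (release_tuple(ver) + (0, 0, 0))[:3]
    return release >= MIN_SB3


def installed_sb3_version() -> Optional[str]:
    """Return the installed stable-baselines3 version, or None if absent."""
    try:
        return version("stable-baselines3")
    except PackageNotFoundError:
        return None
```

In the notebook, `v = installed_sb3_version()` followed by `assert v is not None and sb3_supports_gymnasium(v)` would stop early with a clear failure instead of an obscure error during training.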

In [None]:
# Importing necessary libraries
import gymnasium as gym
import mobile_env  # importing mobile_env registers its environments with Gymnasium

# predefined smart-city scenario
from mobile_env.scenarios.smart_city import MComSmartCity

# display the default configuration of the scenario
MComSmartCity.default_config()
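`default_config()` returns the full default configuration, while the training loop later overrides a single entry with `gym.make(ENV_NAME, config={"seed": seed})`. To illustrate how a partial override relates to the defaults, here is a hypothetical recursive merge (`merge_config` is not a `mobile-env` function): nested dictionaries are merged key by key, and everything else is replaced. Whether `mobile-env` merges nested keys the same way is an assumption worth checking against its source, and the keys in the example are illustrative rather than taken from `MComSmartCity.default_config()`.

```python
import copy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: layer `overrides` onto `defaults` without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge recursively so untouched nested keys survive
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, or new keys simply replace the default value
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative keys only; inspect MComSmartCity.default_config() for the real ones
defaults = {"seed": 0, "ue": {"velocity": 1.5, "height": 1.5}}
config = merge_config(defaults, {"seed": 111, "ue": {"velocity": 3.0}})
```

The deep copies keep the dictionary returned by `default_config()` intact, so the same defaults can be reused for every seed.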

In [None]:
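# Parameters
ENV_NAME = "mobile-smart_city-smart_city_handler-v0"
LOG_DIR = "./ppo_logs/"            # TensorBoard logs
MODEL_DIR = "./ppo_models/"        # saved PPO models
SEEDS = [111, 222, 333, 444, 555]  # one training run per seed
NUM_SEEDS = len(SEEDS)             # derived from SEEDS so the two cannot drift apart
EVAL_EPISODES = 100                # evaluation episodes per trained model
MAX_TIMESTEPS = 10_000             # total training timesteps per run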
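With a single environment, SB3's PPO gathers rollouts of `n_steps` transitions (2048 by default) and keeps collecting full rollouts while its step counter is below `total_timesteps`. `MAX_TIMESTEPS = 10_000` therefore translates into five PPO iterations and 10,240 collected steps. A small sketch of that arithmetic, assuming the default `n_steps` is left unchanged (`ppo_rollout_budget` is a hypothetical helper, not an SB3 function):

```python
from typing import Tuple


def ppo_rollout_budget(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> Tuple[int, int]:
    """Estimate (PPO iterations, environment steps actually collected).

    Assumes SB3-style on-policy training: each iteration collects a full rollout
    of n_steps transitions per environment, and training continues while the
    step counter is still below total_timesteps.
    """
    steps_per_rollout = n_steps * n_envs
    # Integer ceiling division: a partially needed rollout is still collected in full
    iterations = (total_timesteps + steps_per_rollout - 1) // steps_per_rollout
    return iterations, iterations * steps_per_rollout


# With MAX_TIMESTEPS = 10_000 and PPO's default n_steps=2048 on one environment:
print(ppo_rollout_budget(10_000))  # -> (5, 10240)
```

Choosing `MAX_TIMESTEPS` as a multiple of 2048 would make the requested and actual training budgets match.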

In [None]:
import os
os.makedirs(LOG_DIR, exist_ok=True)
os.makedirs(MODEL_DIR, exist_ok=True)

In [None]:
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Train, save, and evaluate one PPO model per seed
for idx, seed in enumerate(SEEDS):
    print(f"\n==== Training Run {idx + 1}/{len(SEEDS)} with Seed {seed} ====\n")

    # Seed the environment through its config; Monitor records episode returns and lengths
    env = Monitor(gym.make(ENV_NAME, config={"seed": seed}))

    # Initialize the PPO model with the same seed for reproducibility
    model = PPO("MlpPolicy", env, seed=seed, tensorboard_log=LOG_DIR, verbose=1)
    print(f"Environment seed: {env.unwrapped.seed}")

    # Train the model; progress_bar=True requires tqdm and rich
    model.learn(
        total_timesteps=MAX_TIMESTEPS,
        log_interval=1,
        tb_log_name=f"PPO_seed_{seed}",
        progress_bar=True,
    )

    # Save the trained model
    model_path = os.path.join(MODEL_DIR, f"ppo_seed_{seed}.zip")
    model.save(model_path)
    print(f"Model saved at {model_path}")

    # Evaluate the deterministic policy on a separate, identically seeded environment
    eval_env = Monitor(gym.make(ENV_NAME, config={"seed": seed}))
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=EVAL_EPISODES, deterministic=True
    )

    print(f"Seed {seed} Evaluation Results:")
    print(f"Mean Reward: {mean_reward:.2f}, Std Reward: {std_reward:.2f}")
    print(f"Finished training with seed: {seed}\n")

    # Cleanup
    env.close()
    eval_env.close()
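Each run prints its own mean and standard deviation, but comparing training seeds usually calls for one cross-seed summary. A minimal sketch, assuming the loop is extended to store `results[seed] = (mean_reward, std_reward)` in a dict (the notebook does not create `results`; that name and `summarize_seeds` are hypothetical). The cross-seed spread is the sample standard deviation of the per-seed means, so it reflects how much the trained policy varies from seed to seed rather than episode-to-episode noise.

```python
import statistics
from typing import Dict, Tuple


def summarize_seeds(results: Dict[int, Tuple[float, float]]) -> Tuple[float, float]:
    """Hypothetical helper: aggregate per-seed (mean_reward, std_reward) pairs
    into (mean of the per-seed means, sample std of the per-seed means)."""
    means = [mean for mean, _std in results.values()]
    if len(means) < 2:
        raise ValueError("need at least two seeds for a sample standard deviation")
    return statistics.fmean(means), statistics.stdev(means)
```

The per-seed `std_reward` values stay in the dict for reference but do not enter this summary; reporting `mean ± std` over the five seeds this way is a common convention for RL results.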
