# Advanced Agent Training with MLFlow-Assist 🚀

This notebook demonstrates advanced agent training capabilities including:
1. Multi-agent evolutionary training
2. Meta-learning across different tasks
3. Curriculum learning progression
4. Performance visualization
5. Advanced environment interactions

[![Buy me a coffee](https://img.shields.io/badge/Buy%20me%20a%20coffee-happyvibess-orange)](https://www.buymeacoffee.com/happyvibess)

In [None]:
import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from torch.utils.data import DataLoader

from mlflow_assist.agents import (
    AdvancedTrainer,
    AdvancedTrainingConfig,
    RLAgent,
    RLConfig
)
from mlflow_assist.agents.environments import (
    CurriculumEnv,
    TaskDifficulty,
    MultiAgentEnv,
    MetaLearningEnv
)

## 1. Multi-Agent Evolutionary Training 🧬

Train a population of agents using evolutionary strategies:

In [None]:
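# Create base environment and agent.
# LunarLander needs the Box2D extra: pip install "gymnasium[box2d]".
# Gymnasium 1.0+ registers this environment as 'LunarLander-v3'; use that ID there.
env = gym.make('LunarLander-v2')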
config = RLConfig(
    state_size=env.observation_space.shape[0],
    action_size=env.action_space.n,
    hidden_size=128
)
base_agent = RLAgent(config)

# Configure evolutionary training
trainer_config = AdvancedTrainingConfig(
    population_size=20,
    evolution_rate=0.1,
    num_workers=4
)

# Create trainer
trainer = AdvancedTrainer(
    config=trainer_config,
    base_agent=base_agent,
    train_env=env,
    experiment_name="lunar_lander_evolution"
)

# Train population
best_agent = trainer.train_population(
    num_generations=50,
    steps_per_generation=1000,
    tournament_size=4
)
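
For intuition about what `population_size`, `evolution_rate`, and `tournament_size` could control, here is a minimal, library-free sketch of tournament selection with Gaussian mutation and elitism on a toy fitness function. It is an assumption about the general technique, not a description of how `AdvancedTrainer.train_population` works internally; `evolve`, `mutate`, `tournament_select`, and `sphere_fitness` are hypothetical names.

```python
import random


def tournament_select(population, fitnesses, tournament_size, rng):
    """Return the fittest of `tournament_size` randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def mutate(genome, evolution_rate, rng):
    """Add Gaussian noise with standard deviation `evolution_rate` to every gene."""
    return [g + rng.gauss(0.0, evolution_rate) for g in genome]


def evolve(fitness_fn, genome_len=3, population_size=20, evolution_rate=0.1,
           tournament_size=4, num_generations=50, seed=0):
    """Toy evolutionary loop; returns the last elite genome and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(population_size)]
    best_history = []
    elite = population[0]
    for _ in range(num_generations):
        fitnesses = [fitness_fn(g) for g in population]
        best_idx = max(range(population_size), key=lambda i: fitnesses[i])
        elite = population[best_idx]
        best_history.append(fitnesses[best_idx])
        # Elitism: keep the best genome unchanged, refill the rest
        # with mutated copies of tournament winners.
        children = [
            mutate(tournament_select(population, fitnesses, tournament_size, rng),
                   evolution_rate, rng)
            for _ in range(population_size - 1)
        ]
        population = [elite] + children
    return elite, best_history


def sphere_fitness(genome):
    """Higher is better; the optimum is the all-zero genome."""
    return -sum(g * g for g in genome)


best, history = evolve(sphere_fitness)
```

Because the elite genome is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next.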

## 2. Meta-Learning Across Tasks 🎯

Train an agent that can quickly adapt to new tasks:

In [None]:
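# Create meta-learning environment
class CustomMetaEnv(MetaLearningEnv):
    def sample_tasks(self, num_tasks):
        # Each task is a dict with one gravity value drawn uniformly from [-1.0, 0.0)
        return [
            {"gravity": np.random.uniform(-1.0, 0.0)}
            for _ in range(num_tasks)
        ]
    
    def set_task(self, task):
        # Only stores the value; the simulation itself must read self.gravity
        self.gravity = task["gravity"]

meta_env = CustomMetaEnv()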

# Configure meta-learning
meta_config = AdvancedTrainingConfig(
    meta_learning=True,
    meta_lr=0.001,
    num_tasks=5
)

# Create trainer
meta_trainer = AdvancedTrainer(
    config=meta_config,
    base_agent=base_agent,
    train_env=meta_env,
    experiment_name="meta_learning"
)

# Train with meta-learning
meta_trainer.train_meta(
    num_epochs=100,
    tasks_per_batch=4,
    adaptation_steps=5
)
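
To make the roles of `tasks_per_batch` and `adaptation_steps` concrete, the sketch below runs a Reptile-style meta-update on a one-parameter toy problem. Reptile is swapped in here purely for illustration; which algorithm `train_meta` actually uses is not stated in this notebook. The names `adapt`, `reptile`, `sample_gravity`, and `inner_lr` are hypothetical, and the toy uses a larger `meta_lr` than the cell above so that it converges within 100 epochs.

```python
import random


def adapt(theta, target, inner_lr, adaptation_steps):
    """Take a few gradient steps on the task loss (theta - target) ** 2."""
    for _ in range(adaptation_steps):
        grad = 2.0 * (theta - target)
        theta -= inner_lr * grad
    return theta


def reptile(sample_task, num_epochs=100, tasks_per_batch=4, adaptation_steps=5,
            inner_lr=0.1, meta_lr=0.1, seed=0):
    """Return a meta-parameter that adapts quickly to tasks from `sample_task`."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(num_epochs):
        # Inner loop: adapt a copy of theta to each sampled task.
        adapted = [
            adapt(theta, sample_task(rng), inner_lr, adaptation_steps)
            for _ in range(tasks_per_batch)
        ]
        # Outer loop (Reptile): move theta toward the mean adapted parameter.
        mean_adapted = sum(adapted) / len(adapted)
        theta += meta_lr * (mean_adapted - theta)
    return theta


def sample_gravity(rng):
    """Toy task: the target value is a gravity drawn from [-1.0, 0.0]."""
    return rng.uniform(-1.0, 0.0)


meta_theta = reptile(sample_gravity)
```

In this toy setting the meta-parameter settles near the middle of the task range, which is the starting point from which a few adaptation steps reach any sampled task fastest on average.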

## 3. Curriculum Learning 📚

Train an agent with progressively harder tasks:

In [None]:
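# Create curriculum environment
base_env = gym.make('BipedalWalker-v3')
curr_env = CurriculumEnv(base_env)

# BipedalWalker has a 24-dim observation and a continuous 4-dim action space,
# so the LunarLander-sized base_agent does not fit. Build a matching agent.
# Assumption: RLAgent accepts a continuous action dimension as action_size.
walker_agent = RLAgent(RLConfig(
    state_size=base_env.observation_space.shape[0],
    action_size=base_env.action_space.shape[0],
    hidden_size=128
))

# Train with curriculum
for difficulty in TaskDifficulty:
    print(f"\nTraining at {difficulty.value} level...")
    curr_env.difficulty = difficulty
    
    # Train for this difficulty
    trainer = AdvancedTrainer(
        config=AdvancedTrainingConfig(curriculum_learning=True),
        base_agent=walker_agent,
        train_env=curr_env,
        experiment_name=f"curriculum_{difficulty.value}"
    )
    
    trainer.train_population(
        num_generations=20,
        steps_per_generation=500
    )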
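
The loop above moves to the next difficulty after a fixed training budget. A common alternative is to promote the agent only once it clears a performance threshold. The sketch below shows that gating logic with a stand-in enum and a toy evaluation function; `Difficulty`, `run_curriculum`, `toy_evaluate`, and `promote_at` are hypothetical and not part of `mlflow_assist`, and the level names are an assumption about what `TaskDifficulty` contains.

```python
from enum import Enum


class Difficulty(Enum):
    """Hypothetical stand-in for TaskDifficulty."""
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


def run_curriculum(evaluate, promote_at=0.8, max_rounds_per_level=10):
    """Advance to the next level only after `evaluate` reaches `promote_at`.

    Returns the training log and the level the agent got stuck on (or None).
    """
    log = []
    for level in Difficulty:
        for round_idx in range(1, max_rounds_per_level + 1):
            score = evaluate(level, round_idx)
            log.append((level.value, round_idx, score))
            if score >= promote_at:
                break  # Threshold met: promote to the next level.
        else:
            # Threshold never reached: stop instead of moving to harder tasks.
            return log, level
    return log, None


def toy_evaluate(level, round_idx):
    """Toy score that rises with training rounds, more slowly on harder levels."""
    rounds_needed = {Difficulty.EASY: 2, Difficulty.MEDIUM: 4, Difficulty.HARD: 6}
    return min(1.0, round_idx / rounds_needed[level])


log, stuck_at = run_curriculum(toy_evaluate)
```

In a real run, `evaluate` would train for one round at the given level and return a normalized evaluation score.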

## 4. Performance Visualization 📈

Analyze and visualize training results:

In [None]:
def plot_training_curves(metrics_dict, title):
    """Plot each metric series in metrics_dict (name -> list of values) on one figure."""
    plt.figure(figsize=(12, 6))
    for metric_name, values in metrics_dict.items():
        plt.plot(values, label=metric_name)
    plt.title(title)
    plt.xlabel('Generation/Epoch')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True)
    plt.show()

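# Plot evolutionary training progress.
# Caution: `trainer` was rebound inside the curriculum loop in section 3, so when the
# cells run top to bottom it no longer points at the evolutionary trainer from section 1.
# Keep that trainer under its own name if you want its metrics here.
evo_metrics = {
    'Best Fitness': trainer.get_metrics('best_fitness'),
    'Average Fitness': trainer.get_metrics('avg_fitness')
}
plot_training_curves(evo_metrics, 'Evolutionary Training Progress')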

# Plot meta-learning progress
meta_metrics = {
    'Meta Loss': meta_trainer.get_metrics('meta_loss'),
    'Adaptation Rate': meta_trainer.get_metrics('adaptation_rate')
}
plot_training_curves(meta_metrics, 'Meta-Learning Progress')

# Plot curriculum learning progress
curr_metrics = {}
for diff in TaskDifficulty:
    curr_metrics[f'{diff.value}_performance'] = trainer.get_metrics(f'performance_{diff.value}')
plot_training_curves(curr_metrics, 'Curriculum Learning Progress')
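
Per-generation metrics from reinforcement learning runs tend to be noisy, so it can help to smooth each series before plotting. The helper below is a small sketch of a simple moving average in plain Python; `moving_average` is a hypothetical helper, not something provided by `mlflow_assist`.

```python
def moving_average(values, window):
    """Return the simple moving average over `window` consecutive points.

    The result has len(values) - window + 1 entries, or none if the
    series is shorter than the window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    values = list(values)
    if len(values) < window:
        return []
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new point, drop the oldest one.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


example = moving_average([1, 2, 3, 4, 5], window=3)
```

A smoothed series can be passed to `plot_training_curves` in place of the raw values, for example `{'Best Fitness (smoothed)': moving_average(raw_values, 5)}`.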

## 5. Advanced Environment Interactions 🌍

Wrap several environments in one multi-agent environment and train a team with self-play:

In [None]:
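# Create multi-agent environment
class TeamEnvironment(MultiAgentEnv):
    def __init__(self, num_agents=3):
        super().__init__(num_agents)
        # One independent LunarLander instance per agent
        self.agents = [gym.make('LunarLander-v2') for _ in range(num_agents)]
    
    def step(self, actions):
        # Gymnasium's step() returns (obs, reward, terminated, truncated, info)
        results = [env.step(a) for env, a in zip(self.agents, actions)]
        states, rewards, terminated, truncated, infos = zip(*results)
        # An episode is done once it is either terminated or truncated
        dones = [t or tr for t, tr in zip(terminated, truncated)]
        return list(states), list(rewards), dones, {}
    
    def reset(self):
        # Gymnasium's reset() returns (obs, info); keep only the observations
        return [env.reset()[0] for env in self.agents]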

# Train team of agents
team_env = TeamEnvironment(num_agents=3)
team_config = AdvancedTrainingConfig(
    num_agents=3,
    self_play=True
)

team_trainer = AdvancedTrainer(
    config=team_config,
    base_agent=base_agent,
    train_env=team_env,
    experiment_name="team_training"
)

# Train the team
team_agents = team_trainer.train_population(
    num_generations=30,
    steps_per_generation=800
)
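
The per-agent bookkeeping in a wrapper like `TeamEnvironment` can be checked without Box2D by using a toy environment that follows the Gymnasium conventions: `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. `CountdownEnv` and `step_all` below are hypothetical names used only for this sketch.

```python
class CountdownEnv:
    """Toy stand-in for a Gymnasium env that terminates after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        truncated = False
        # The reward simply echoes the action so the output is easy to check.
        return self.t, float(action), terminated, truncated, {}


def step_all(envs, actions):
    """Step every env once and merge the 5-tuples field by field."""
    results = [env.step(a) for env, a in zip(envs, actions)]
    states, rewards, terminated, truncated, infos = zip(*results)
    # An agent is done when its episode is terminated or truncated.
    dones = [t or tr for t, tr in zip(terminated, truncated)]
    return list(states), list(rewards), dones, list(infos)


envs = [CountdownEnv(horizon=h) for h in (1, 2, 3)]
initial_states = [env.reset()[0] for env in envs]
states, rewards, dones, infos = step_all(envs, [0, 1, 0])
```

After one step only the first toy environment is done, which shows why a team wrapper needs a policy for agents that finish early, such as resetting them individually or masking their actions.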

## Next Steps 🎯

Try experimenting with:
1. Different environments and tasks
2. Custom evolutionary strategies
3. More complex meta-learning setups
4. Advanced curriculum design
5. Custom multi-agent scenarios

For more examples and documentation, check out:
- [Documentation](../../docs/)
- [Example Notebooks](../notebooks/)
- [GitHub Repository](https://github.com/happyvibess/mlflow-assist)
