# MARL for Distributed Optimization

## Introduction

In this notebook, we'll explore how Multi-Agent Reinforcement Learning (MARL) can be applied to distributed optimization problems. This approach is particularly relevant for app modernization scenarios where different components of a system need to be optimized simultaneously, taking into account their interdependencies.

## Setup

First, let's import the necessary libraries and set up our environment.

In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)
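# Anomaly detection gives more informative autograd tracebacks but slows training;
# disable it once debugging is done
torch.autograd.set_detect_anomaly(True)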


## Implementing Distributed Optimization Agents

We'll create a simple environment where multiple agents need to collaborate to optimize a global objective function. Each agent will be responsible for optimizing a subset of parameters.
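As a compact summary of the environment defined in the next cell, the dynamics can be written as follows. The symbols are introduced here only for illustration: $\theta_t \in [-1, 1]^d$ is the shared parameter vector, $a_t^{i}$ is the update proposed by agent $i$ for its slice, $N$ is the number of agents, and $T$ is the target value. The sketch assumes $d$ is divisible by $N$.

$$
\begin{aligned}
\theta_{t+1} &= \operatorname{clip}\big(\theta_t + [a_t^{1}; \dots; a_t^{N}],\; -1,\; 1\big), \\
r_t &= -\Big|\, T - \sum_{j=1}^{d} \theta_{t+1,j} \Big|, \\
\text{done}_t &= \Big[\, \Big| T - \sum_{j=1}^{d} \theta_{t+1,j} \Big| < 0.1 \,\Big].
\end{aligned}
$$

Every agent observes the full vector $\theta_t$ and receives the same reward $r_t$. With the hyperparameters used in this notebook, $N = 4$, $d = 20$, $T = 5$, and each $a_t^{i}$ has 5 components.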

In [14]:
class OptimizationAgent(nn.Module):
    """MLP that maps the full parameter vector to an update for this agent's slice."""
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

class DistributedOptimizationEnvironment:
    def __init__(self, num_agents, param_dim, target_value):
        self.num_agents = num_agents
        self.param_dim = param_dim
        self.target_value = target_value
        self.reset()
        
    def reset(self):
        self.global_params = torch.rand(self.param_dim) * 2 - 1  # Initialize between -1 and 1
        return self.global_params.repeat(self.num_agents, 1)
    
    def step(self, actions):
        # Agent i owns the contiguous slice [i * chunk, (i + 1) * chunk)
        chunk = self.param_dim // self.num_agents
        updated = [
            self.global_params[i * chunk:(i + 1) * chunk] + action
            for i, action in enumerate(actions)
        ]
        # Trailing parameters not owned by any agent are carried over unchanged
        updated.append(self.global_params[len(actions) * chunk:])
        # torch.cat builds a new tensor, so no in-place write enters the autograd graph
        new_global_params = torch.clamp(torch.cat(updated), -1, 1)

        # Reward is the negative absolute distance between the parameter sum and the target
        error = torch.abs(self.target_value - torch.sum(new_global_params))
        reward = -error
        done = error < 0.1

        # Store a detached copy so the next step starts a fresh graph
        self.global_params = new_global_params.detach()

        return self.global_params.repeat(self.num_agents, 1), reward.repeat(self.num_agents), done

# Hyperparameters
num_agents = 4
param_dim = 20
hidden_dim = 64
target_value = 5.0

# Initialize agents and environment
agents = [OptimizationAgent(param_dim, hidden_dim, param_dim // num_agents) for _ in range(num_agents)]
env = DistributedOptimizationEnvironment(num_agents, param_dim, target_value)
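
The environment assigns each agent a contiguous slice of `param_dim // num_agents` parameters. As a quick sanity check, the sketch below lists those slices and any trailing indices that no agent would update when `param_dim` is not divisible by `num_agents`. The helper `agent_slices` is a hypothetical name introduced here for illustration and is not part of the notebook.

```python
def agent_slices(param_dim, num_agents):
    """Return each agent's (start, end) index range and the indices no agent owns."""
    chunk = param_dim // num_agents
    slices = [(i * chunk, (i + 1) * chunk) for i in range(num_agents)]
    unowned = list(range(num_agents * chunk, param_dim))
    return slices, unowned


# Configuration used in this notebook: 20 parameters, 4 agents
slices, unowned = agent_slices(20, 4)
print(slices)   # [(0, 5), (5, 10), (10, 15), (15, 20)]
print(unowned)  # []

# An uneven split leaves the last parameters without an owner
slices_uneven, unowned_uneven = agent_slices(22, 4)
print(unowned_uneven)  # [20, 21]
```

With the notebook's settings the split is exact. For other settings it may be worth asserting `param_dim % num_agents == 0` before building the agents.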

## Training Loop

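Now, let's implement a training loop in which the agents learn to jointly optimize the global objective. Because the shared reward is a differentiable function of the agents' actions, the loop backpropagates the negated reward through the environment step into each agent's network instead of estimating a policy gradient.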

In [15]:
def train_agents(num_episodes, max_steps):
    optimizers = [optim.Adam(agent.parameters(), lr=0.001) for agent in agents]
    episode_rewards = []
    
    for episode in range(num_episodes):
        states = env.reset()
        episode_reward = 0

        for step in range(max_steps):
            actions = [agent(states[i].unsqueeze(0)).squeeze(0) for i, agent in enumerate(agents)]
            next_states, rewards, done = env.step(actions)
            episode_reward += rewards[0].item()

            # Every agent receives the same shared reward, so the joint loss
            # is the negated sum of the per-agent rewards
            total_loss = -rewards.sum()

            # Update all agents from the shared loss
            for optimizer in optimizers:
                optimizer.zero_grad()
            total_loss.backward()
            for optimizer in optimizers:
                optimizer.step()

            states = next_states  # env.step() already returns detached states
            if done:
                break

        episode_rewards.append(episode_reward)
        if episode % 100 == 0:
            print(f"Episode {episode}, Avg Reward: {np.mean(episode_rewards[-100:]):.2f}")

    return episode_rewards


# Train the agents
num_episodes = 1000
max_steps = 50
rewards = train_agents(num_episodes, max_steps)

# Plot the learning curve
plt.plot(rewards)
plt.title("Learning Curve")
plt.xlabel("Episode")
plt.ylabel("Total Reward")
plt.show()

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 5]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
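
PyTorch raises this error when a tensor that autograd saved for the backward pass is modified in place before `backward()` runs. One common way to trigger it in a training loop is to backpropagate through a graph that was built before `optimizer.step()` updated the weights. The sketch below reproduces that pattern on a small stand-in model; the model, data, and function names are hypothetical and independent of the agents above, so it illustrates the mechanism rather than diagnosing this particular run.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical two-layer model, unrelated to the agents in this notebook
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(2, 4)


def stale_graph_backward():
    """Backpropagate through a graph built before the weights were updated."""
    loss = model(x).pow(2).mean()
    loss.backward(retain_graph=True)  # keep the graph alive on purpose
    optimizer.step()                  # updates the weights in place
    try:
        loss.backward()               # the graph still expects the old weights
    except RuntimeError as err:
        return str(err)
    return None


def fresh_graph_backward():
    """Rebuild the graph after the update, then backpropagate once."""
    optimizer.zero_grad()
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


error_message = stale_graph_backward()
fresh_loss = fresh_graph_backward()
print(error_message is not None and "inplace operation" in error_message)
print(fresh_loss)
```

The usual remedy is to make sure every `backward()` call only traverses tensors created after the most recent `optimizer.step()`, for example by detaching any state carried from one step to the next.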

## Evaluating the Optimized Solution

Let's evaluate the final solution found by our agents and visualize how close it is to the target value.

In [None]:
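def evaluate_solution(max_eval_steps=50):
    # Roll out the agents from a fresh start without tracking gradients.
    # max_eval_steps is an assumed cap that mirrors max_steps used in training.
    states = env.reset()
    with torch.no_grad():
        for _ in range(max_eval_steps):
            actions = [agent(states[i].unsqueeze(0)).squeeze(0) for i, agent in enumerate(agents)]
            states, _, done = env.step(actions)
            if done:
                break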
    
    final_value = torch.sum(env.global_params).item()
    
    print(f"Target value: {env.target_value}")
    print(f"Final optimized value: {final_value}")
    print(f"Difference: {abs(env.target_value - final_value)}")
    
    plt.figure(figsize=(10, 5))
    plt.bar(range(param_dim), env.global_params.numpy())
    plt.title("Final Parameter Values")
    plt.xlabel("Parameter Index")
    plt.ylabel("Value")
    plt.axhline(y=env.target_value / param_dim, color='r', linestyle='--', label='Target Average')
    plt.legend()
    plt.show()

evaluate_solution()
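
The dashed "Target Average" line in the plot marks `target_value / param_dim`, the value every parameter would take if the target sum were spread evenly. The short sketch below computes that value and checks whether the target is reachable at all, given that parameters are clamped to [-1, 1]. The helper `target_summary` is a hypothetical name used only for this illustration.

```python
def target_summary(target_value, param_dim, low=-1.0, high=1.0):
    """Summarize the reachable range of the parameter sum under clamping."""
    min_sum = param_dim * low
    max_sum = param_dim * high
    return {
        "min_sum": min_sum,
        "max_sum": max_sum,
        "target_average": target_value / param_dim,
        "reachable": min_sum <= target_value <= max_sum,
    }


# Values used in this notebook: target_value = 5.0, param_dim = 20
summary = target_summary(5.0, 20)
print(summary)
# {'min_sum': -20.0, 'max_sum': 20.0, 'target_average': 0.25, 'reachable': True}
```

Many different parameter vectors sum to the target, so the bars need not all sit on the dashed line for the difference to be small.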

## Conclusion

In this notebook, we implemented a MARL approach to distributed optimization. The setup shows how multiple agents can be trained to jointly optimize a global objective function, with each agent controlling a subset of the parameters. This approach has several potential applications in app modernization:

1. Optimizing multiple components of a large-scale application simultaneously
2. Balancing resource allocation across different microservices
3. Tuning hyperparameters of distributed machine learning models in modernized AI/ML pipelines

Future work could involve:
- Implementing more complex objective functions that better represent real-world app modernization challenges
- Exploring different communication strategies between agents to improve coordination
- Applying this approach to specific app modernization tasks, such as optimizing database queries or load balancing
