# Code Optimization: Robot Controller for Pick-and-Place Tasks

## Introduction

This tutorial demonstrates how to use the `trace` package to optimize a policy for a simulated robot performing a pick-and-place task.

## Setup and Installation

This example requires [LLF-Bench](https://github.com/microsoft/LLF-Bench) in addition to `trace`. With `trace` already installed, install LLF-Bench and its Meta-World dependencies as follows:

    git clone https://github.com/microsoft/LLF-Bench.git
    cd LLF-Bench
    pip install -e ".[metaworld]"
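The optimizer used later in this tutorial loads its LLM settings with `config_list_from_json("OAI_CONFIG_LIST")` from `autogen`. In the `autogen` versions this example targets, that call reads an environment variable named `OAI_CONFIG_LIST` or, if none is set, a file of that name in the working directory containing a JSON list of model configurations. Below is a minimal sketch that writes such a file. The model name mirrors this tutorial's default, and reading the key from an `OPENAI_API_KEY` environment variable is an assumption; adapt both to your setup.

```python
import json
import os

# Placeholder configuration: replace the model name and key with your own values.
config_list = [
    {
        "model": "gpt-4-0125-preview",
        "api_key": os.environ.get("OPENAI_API_KEY", "<your-api-key>"),
    }
]

# Write the list to a file named OAI_CONFIG_LIST in the current working directory.
with open("OAI_CONFIG_LIST", "w") as f:
    json.dump(config_list, f, indent=2)
```

Keep this file out of version control, since it contains your API key.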

Let's start by importing the necessary libraries.

In [1]:
from autogen import config_list_from_json
import llfbench
import random
import numpy as np
import opto.trace as trace
from opto.optimizers import FunctionOptimizerV2Memory
from opto.trace.bundle import ExceptionNode
from opto.trace.errors import ExecutionError

## Environment Setup

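Define a helper that parses the environment's observation strings into lists of floats, and a `TracedEnv` wrapper that exposes the environment to Trace: `reset` and `step` return the observation as a traced node, and any exception raised inside `step` is wrapped in an `ExceptionNode` so that the error can later be used as feedback.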

In [2]:
def parse_obs(obs):
    """Parse the observation string into a dictionary of lists of floats."""
    import json

    obs = json.loads(obs)
    for key in obs:
        obs[key] = obs[key].replace("[", "").replace("]", "").split()
        obs[key] = [float(i) for i in obs[key]]
    return obs

class TracedEnv:
    def __init__(self, env_name, seed=0, relative=True):
        self.seed = seed
        self.env_name = env_name
        self.relative = relative
        self.init()

    def init(self):
        random.seed(self.seed)
        np.random.seed(self.seed)
        self.env = llfbench.make(self.env_name)
        self.env.reset(seed=self.seed)
        self.env.action_space.seed(self.seed)
        self.env.control_mode("relative" if self.relative else "absolute")
        self.obs = None

    @trace.bundle()
    def reset(self):
        """Reset the environment and return the initial observation."""
        obs, info = self.env.reset()
        obs["observation"] = parse_obs(obs["observation"])
        self.obs = obs
        return obs, info

    def step(self, action):
        try:
            control = action.data if isinstance(action, trace.Node) else action
            next_obs, reward, termination, truncation, info = self.env.step(control)
            next_obs["observation"] = parse_obs(next_obs["observation"])
            self.obs = next_obs
        except Exception as e:
            e_node = ExceptionNode(
                e,
                inputs={"action": action},
                description="[exception] The operator step raises an exception.",
                name="exception_step",
            )
            raise ExecutionError(e_node)

        @trace.bundle()
        def step(action):
            return next_obs

        next_obs = step(action)
        return next_obs, reward, termination, truncation, info
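`parse_obs` expects a JSON object whose values are NumPy-style array strings (space-separated numbers in square brackets), which is why it strips the brackets and splits on whitespace. The sketch below runs the same function on a hypothetical observation string; the keys `hand_pos` and `goal_pos` are made up for illustration, and the real keys come from the LLF-Bench environment.

```python
import json


def parse_obs(obs):
    """Parse the observation string into a dictionary of lists of floats."""
    obs = json.loads(obs)
    for key in obs:
        # Each value looks like "[ 0.1  0.8  0.25]": drop the brackets, split on whitespace.
        obs[key] = obs[key].replace("[", "").replace("]", "").split()
        obs[key] = [float(i) for i in obs[key]]
    return obs


# Hypothetical observation string with irregular spacing, as NumPy often prints arrays.
raw = json.dumps({"hand_pos": "[0.0 0.6 0.2]", "goal_pos": "[ 0.1  0.8  0.25]"})
parsed = parse_obs(raw)
print(parsed)  # {'hand_pos': [0.0, 0.6, 0.2], 'goal_pos': [0.1, 0.8, 0.25]}
```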

## Rollout Function

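Define a function that rolls out the controller for up to `horizon` steps. It stops early on termination, truncation, or success, and returns the trajectory together with the execution error, if any.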

In [3]:
def rollout(env, horizon, controller):
    """Rollout a controller in an env for horizon steps."""
    traj = dict(observation=[], action=[], reward=[], termination=[], truncation=[], success=[], info=[])
    obs, info = env.reset()
    traj["observation"].append(obs)

    for t in range(horizon):
        controller_input = obs["observation"]
        error = None
        try:
            action = controller(controller_input)
            next_obs, reward, termination, truncation, info = env.step(action)
        except ExecutionError as e:
            error = e
            break

        if error is None:
            traj["observation"].append(next_obs)
            traj["action"].append(action)
            traj["reward"].append(reward)
            traj["termination"].append(termination)
            traj["truncation"].append(truncation)
            traj["success"].append(info["success"])
            traj["info"].append(info)
            if termination or truncation or info["success"]:
                break
            obs = next_obs
    return traj, error
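The rollout's control flow can be exercised without Trace or LLF-Bench. The sketch below replaces the environment with a hypothetical one-dimensional `StubEnv` (move a point from 0.0 to a goal at 1.0) and replaces `ExecutionError` with a hypothetical `StubError`. It shows the two outcomes the optimizer later distinguishes: a complete trajectory that ends in success, and an early stop with an error.

```python
class StubError(Exception):
    """Hypothetical stand-in for Trace's ExecutionError."""


class StubEnv:
    """Hypothetical 1-D task: move a point from 0.0 to a goal at 1.0."""

    def __init__(self):
        self.pos = 0.0

    def reset(self):
        self.pos = 0.0
        return {"observation": {"pos": [self.pos]}}, {}

    def step(self, action):
        # Reject malformed actions, the way a broken controller would fail.
        if not isinstance(action, (int, float)):
            raise StubError(f"invalid action: {action!r}")
        self.pos += action
        success = abs(self.pos - 1.0) < 1e-6
        reward = -abs(self.pos - 1.0)
        return {"observation": {"pos": [self.pos]}}, reward, False, False, {"success": success}


def rollout(env, horizon, controller):
    """Same control flow as the tutorial's rollout, without Trace nodes."""
    traj = dict(observation=[], action=[], reward=[], termination=[], truncation=[], success=[], info=[])
    obs, info = env.reset()
    traj["observation"].append(obs)
    for _ in range(horizon):
        try:
            action = controller(obs["observation"])
            next_obs, reward, termination, truncation, info = env.step(action)
        except StubError as e:
            return traj, e  # stop at the first error and report it
        traj["observation"].append(next_obs)
        traj["action"].append(action)
        traj["reward"].append(reward)
        traj["termination"].append(termination)
        traj["truncation"].append(truncation)
        traj["success"].append(info["success"])
        traj["info"].append(info)
        if termination or truncation or info["success"]:
            break
        obs = next_obs
    return traj, None


traj, error = rollout(StubEnv(), horizon=10, controller=lambda obs: 0.25)
print(len(traj["action"]), traj["success"][-1], sum(traj["reward"]))  # 4 True -1.5

bad_traj, bad_error = rollout(StubEnv(), horizon=10, controller=lambda obs: "left")
print(type(bad_error).__name__, len(bad_traj["action"]))  # StubError 0
```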

## Optimize using Trace

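Define the optimization loop. In each iteration, the controller is rolled out, the outcome is turned into text feedback (success and return, or the error message), `optimizer.backward` propagates that feedback to the controller, and `optimizer.step` asks the LLM to update the controller's code.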

In [4]:
def optimize_policy(
    env_name,
    horizon,
    n_episodes=10,
    n_optimization_steps=100,
    seed=0,
    relative=True,
    feedback_type="a",
    verbose=False,
    model="gpt-4-0125-preview",
):
    # NOTE: `feedback_type` and `model` are not used below; the LLM backend
    # is configured through the OAI_CONFIG_LIST file.

    @trace.bundle(trainable=True)
    def controller(obs):
        """A feedback controller that computes the action based on the observation."""
        return [0, 0, 0, 0]

    optimizer = FunctionOptimizerV2Memory(controller.parameters(), config_list=config_list_from_json("OAI_CONFIG_LIST"))
    # The objective does not change between iterations, so set it once.
    optimizer.objective = f"The goal is to optimize the pick-and-place task. {optimizer.default_objective}"

    env = TracedEnv(env_name, seed=seed, relative=relative)

    print("Optimization Starts")
    for i in range(n_optimization_steps):
        env.init()  # re-create the environment with the same seed every iteration
        traj, error = rollout(env, horizon, controller)

        if error is None:
            # Attach the episode outcome to the final observation node.
            feedback = f"Success: {traj['success'][-1]}\nReturn: {sum(traj['reward'])}"
            target = traj["observation"][-1]["observation"]
        else:
            # Attach the error message to the node that raised the exception.
            feedback = str(error)
            target = error.exception_node

        optimizer.zero_feedback()
        optimizer.backward(target, feedback)
        optimizer.step(verbose=verbose)

        print(f"Iteration: {i}, Feedback: {feedback}, Parameter: {controller.parameter.data}")

    # Evaluate the optimized controller on n_episodes rollouts, one seed per episode.
    returns = []
    for episode in range(n_episodes):
        eval_env = TracedEnv(env_name, seed=seed + episode, relative=relative)
        traj, _ = rollout(eval_env, horizon, controller)
        returns.append(sum(traj["reward"]))
    print("Final Returns:", returns)
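Trace's optimizer reads the feedback text and rewrites the controller's code with an LLM. The sketch below keeps only the shape of the loop (roll out, score, update, repeat) and swaps in plain random search over a single number, a deliberately simple stand-in that is not how Trace updates parameters. The one-dimensional task and the helper names `run_episode` and `feedback_message` are hypothetical; `feedback_message` produces the same text format as the feedback string in `optimize_policy`.

```python
import random


def run_episode(step_size, horizon=10):
    """Hypothetical 1-D task: move from 0.0 toward a goal at 1.0 with a constant step."""
    pos, total = 0.0, 0.0
    for _ in range(horizon):
        pos += step_size
        total += -abs(pos - 1.0)
        if abs(pos - 1.0) < 0.05:  # close enough to the goal counts as success
            return True, total
    return False, total


def feedback_message(success, ret):
    # Same text format as the feedback string built in optimize_policy.
    return f"Success: {success}\nReturn: {ret}"


rng = random.Random(0)
param = 0.0  # analogous to the initial controller that always returns zeros
best_success, best_return = run_episode(param)
for i in range(50):
    # Random-search stand-in for optimizer.step(): perturb, keep only improvements.
    candidate = param + rng.uniform(-0.1, 0.1)
    success, ret = run_episode(candidate)
    if ret > best_return:
        param, best_success, best_return = candidate, success, ret

print(feedback_message(best_success, best_return))
print("Parameter:", round(param, 3))
```

The LLM-based optimizer differs in what it can change: instead of nudging a number, it can rewrite the controller's logic in response to the text of the feedback.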

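## Execute the Optimization Process

Run the optimization on the `llf-metaworld-pick-place-v2` environment with a 10-step horizon and 100 optimization steps.
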
In [5]:
optimize_policy(
    env_name="llf-metaworld-pick-place-v2",
    horizon=10,
    n_episodes=10,
    n_optimization_steps=100,
    seed=0,
    relative=True,
    feedback_type="a",
    verbose=True,
    model="gpt-4-0125-preview"
)

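Running this cell in the original notebook failed with the error below, meaning the installed `autogen` package does not provide `OpenAIWrapper`, which the optimizer tries to use. If you hit it, install an `autogen` version compatible with your `trace` installation.

    AttributeError: module 'autogen' has no attribute 'OpenAIWrapper'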

## Running the Notebook

Run the cells in order to see how the simulation and optimization proceed. Try changing parameters such as `horizon` or `n_optimization_steps` and observe how they affect the optimization.


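With `verbose=True`, each optimizer step prints the prompt it sends to the language model. The prompt begins by describing the structure of the problem the model is asked to solve: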
You're tasked to solve a coding/algorithm problem. You will see the instruction, the code, the documentation of each function used in the code, and the feedback about the execution result.

Specifically, a problem will be composed of the following parts:
- #Instruction: the instruction which describes the things you need to do or the question you should answer.
- #Code: the code defined in the problem.
- #Documentation: the documentation of each function used in #Code. The explanation might be incomplete and just contain high-level description. You can use the values in #Others to help infer how those functions work.
- #Variables: the input variables that you can change.
- #Constraints: the constraints or descriptions of the variables in #Variables.
- #Inputs: the values of other inputs to the code, which are not changeable.
- #Others: the intermediate values created through the code execution.
- #Outputs: the result of the code output.
- #Feedback: the feedback about the code
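Trace fills these sections in from the traced computation graph. The sketch below only illustrates the layout: a hypothetical `render_problem` helper that emits each section as a `#Name` header followed by its content, in the order listed above, applied to made-up contents for this tutorial's controller. It is not Trace's actual prompt formatter.

```python
# Section names in the order the prompt lists them.
SECTIONS = ["Instruction", "Code", "Documentation", "Variables", "Constraints",
            "Inputs", "Others", "Outputs", "Feedback"]


def render_problem(parts):
    """Hypothetical renderer: one '#Name' header per section, followed by its content."""
    blocks = [f"#{name}\n{parts.get(name, '')}" for name in SECTIONS]
    return "\n\n".join(blocks)


# Made-up section contents, for illustration only.
problem = render_problem({
    "Instruction": "Optimize the pick-and-place controller.",
    "Variables": "controller: def controller(obs): return [0, 0, 0, 0]",
    "Feedback": "Success: False",
})
print(problem)
```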

This completes the tutorial on using the Trace package to optimize a robot controller for a pick-and-place task. Happy optimizing!