## REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction

### Setup
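Register for an [OpenAI API key](https://openai.com/blog/openai-api/) to use GPT-4 (there's a free trial). Store it in a `.env` file as `OPENAI_API_KEY`, optionally together with `OPENAI_API_BASE_URL`; the cell below loads both.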

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()
API_KEY = os.getenv('OPENAI_API_KEY')
API_BASE_URL = os.getenv('OPENAI_API_BASE_URL')
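
`os.getenv` silently returns `None` when a variable is missing, so a missing key only surfaces later as an authentication error. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of REFLECT or `python-dotenv`.

```python
import os

def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

After `load_dotenv()`, one could write `API_KEY = require_env("OPENAI_API_KEY")` to fail fast instead of passing `None` to the prompter.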

In [3]:
# Add HuggingFace mirror endpoint to avoid connection timeout issues when downloading models
# os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Add proxy settings to avoid connection issues.
# Adjust the port to match your local proxy, or remove these lines if you do not use one.
os.environ["https_proxy"] = "http://127.0.0.1:15732"
os.environ["http_proxy"] = "http://127.0.0.1:15732"
os.environ["all_proxy"] = "socks5://127.0.0.1:15732"

In [None]:
%cd main
%pwd

In [None]:
import json
import pickle  # used below to load global_sg.pkl
from IPython.display import HTML
from base64 import b64encode

from main.gen_data import *
from main.data import load_data
from main.exp import *
from main.execute_replan import run_correction
from LLM.prompt import LLMPrompter

# You may change the GPT version here
llm_prompter = LLMPrompter(gpt_version="gpt-4o", api_key=API_KEY, base_url=API_BASE_URL)

with open('tasks.json') as f:
    tasks = json.load(f)

def show_video(video_path, video_width=300):
    # Embed the MP4 as a base64 data URL so it plays inline in the notebook
    with open(video_path, "rb") as f:
        video_file = f.read()
    video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"
    return HTML(f"""<video width={video_width} controls><source src="{video_url}"></video>""")

### Data Generation in Simulation
We provide a few example task configurations in ``tasks.json``. Let's take the first one as an example.

The robot's task is to "make coffee", and the robot failed because it could not "put the mug inside the coffee machine because there was already a cup inside it, occupying the space".

In [5]:
task_info = {
    "name": "make coffee", # name of the task
    "task_idx": 5, # index of the task, defined in TASK_LIST in constants.py
    "num_samples": 1, # the number of samples to generate, this applies for randomly injected failures (e.g. see "Task 6" in tasks.json, to automatically generate two dropping failures which occurred at different times)
    "failure_injection": False, # whether to inject failures manually or automatically
    "folder_name": "makeCoffee-1", # name of the folder to save data
    "scene": "FloorPlan16", # scene id as in ai2thor
    "chosen_failure": "occupied", # selected failure type, can also be blocking, occupied_put, ambiguous_plan, wrong_perception, drop, failed_action, and missing_step. See tasks.json for examples.
    "gt_failure_reason": "The robot failed to put the mug inside the coffee machine because there was already a cup inside it, occupying the space.",
    "gt_failure_step": "00:51",
    "preactions": [ # actions taken to set object state before task execution
        "(dirty_obj, Mug)"
    ],
    "failure_injection_params" : { # parameters used to configure the environment according to chosen_failure
        "src_obj_type": "Cup",
        "target_obj_type": "CoffeeMachine",
        "disp_x": 0.0,
        "disp_z": 0.05,
        "disp_y": 0.02
    },
    "actions": [ # the original robot plan
        "(navigate_to_obj, Mug)",
        "(pick_up, Mug)",
        "(navigate_to_obj, Sink)",
        "(put_on, Mug, SinkBasin)",
        "(toggle_on, Faucet)",
        "(toggle_off, Faucet)",
        "(pick_up, Mug)",
        "(pour, Mug, Sink)",
        "(navigate_to_obj, CoffeeMachine)",
        "(put_in, Mug, CoffeeMachine)",
        "(toggle_on, CoffeeMachine)",
        "(toggle_off, CoffeeMachine)",
        "(pick_up, Mug)",
        "(put_on, Mug, CounterTop)"
    ],
    "success_condition": "a clean mug is filled with coffee and on top of the countertop."
}
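
The plan entries above are plain strings such as `"(put_in, Mug, CoffeeMachine)"`, and `gt_failure_step` is a timestamp string. The sketch below shows one way such values could be split into usable parts. Both helpers are hypothetical and written only for illustration; REFLECT's own parsing may differ, and the `MM:SS` reading of `"00:51"` is an assumption.

```python
def parse_action(action):
    """Split "(pick_up, Mug)" into ("pick_up", ["Mug"])."""
    parts = [p.strip() for p in action.strip().strip("()").split(",")]
    return parts[0], parts[1:]

def timestamp_to_seconds(stamp):
    """Convert a string assumed to be "MM:SS", such as "00:51", into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)

example_actions = ["(navigate_to_obj, Mug)", "(put_in, Mug, CoffeeMachine)"]
for action in example_actions:
    name, args = parse_action(action)
    print(name, args)

print(timestamp_to_seconds("00:51"))  # 51
```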

In [None]:
# the data will be saved under {data_path}/thor_tasks/makeCoffee/makeCoffee-1.
run_data_gen(data_path=os.getcwd(), task=task_info)

In [None]:
FOLDER_NAME = 'makeCoffee/makeCoffee-1'  # specify the task folder name here. In this example, it's makeCoffee/makeCoffee-1.
show_video(f'thor_tasks/{FOLDER_NAME}/original-video.mp4')

### Hierarchical Summary
Once we have the task execution data, we can generate a hierarchical summary of the robot's experiences. The summary contains three levels:
1. The sensory-input summary converts raw sensory data (RGB-D, audio, robot state) into a structured format.
2. The event-based summary is composed of captions for selected key event frames (e.g. changes in the visual scene graph, audio events, robot events).
3. The subgoal-based summary is composed of the end frame of each subgoal, which speeds up failure localization.

Please check `main/state_summary/makeCoffee/makeCoffee-1` for the generated summaries.
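
To make the event-based and subgoal-based levels concrete, here is a toy sketch of frame selection. It is not REFLECT's implementation: the frame fields `relations`, `sound`, and `action` are hypothetical, and treating each consecutive run of the same action as one subgoal is an assumption made for illustration.

```python
def select_key_frames(frames):
    """Keep indices of frames whose scene graph, sound, or action differs from the previous frame."""
    key_indices = []
    previous = None
    for index, frame in enumerate(frames):
        signature = (frozenset(frame["relations"]), frame["sound"], frame["action"])
        if signature != previous:
            key_indices.append(index)
        previous = signature
    return key_indices

def subgoal_end_frames(frames):
    """Return the index of the last frame in each consecutive run of the same action."""
    ends = []
    for index, frame in enumerate(frames):
        is_last = index == len(frames) - 1
        if is_last or frames[index + 1]["action"] != frame["action"]:
            ends.append(index)
    return ends

# Toy data: each frame holds scene-graph triples, an optional sound label, and the current action.
toy_frames = [
    {"relations": {("Mug", "on", "CounterTop")}, "sound": None, "action": "navigate_to_obj"},
    {"relations": {("Mug", "on", "CounterTop")}, "sound": None, "action": "navigate_to_obj"},
    {"relations": {("Mug", "held_by", "robot")}, "sound": None, "action": "pick_up"},
    {"relations": {("Mug", "inside", "SinkBasin")}, "sound": "water runs", "action": "toggle_on"},
]
print(select_key_frames(toy_frames))   # [0, 2, 3]
print(subgoal_end_frames(toy_frames))  # [1, 2, 3]
```

The event-based view keeps every frame where something changed, while the subgoal-based view keeps far fewer frames, which is why it is useful for quickly narrowing down where a failure occurred.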

In [None]:
WITH_AUDIO = 1 # 1: use audio detected with wav2clip, 0: use ground-truth audio information
events, task, object_list, interact_actions, nav_actions = load_data(f"thor_tasks/{FOLDER_NAME}", task_info)
print(len(events))

# Sensory-input summary
detected_sounds = []
if WITH_AUDIO == 1:
    detected_sounds = run_sound_module(FOLDER_NAME, object_list)
generate_scene_graphs(FOLDER_NAME, events, object_list, nav_actions, interact_actions, WITH_AUDIO, detected_sounds)
with open(f'state_summary/{FOLDER_NAME}/global_sg.pkl', 'rb') as f:
    global_sg = pickle.load(f)
    print("================ Global SG ================")
    print(global_sg)

# Event-based summary & Subgoal-based summary
generate_summary(FOLDER_NAME, events, nav_actions, interact_actions, WITH_AUDIO, detected_sounds)

### Failure Reasoning and Correction with LLM
`LLM/prompts.json` contains all the prompt templates used to query GPT-4.

The failure explanation and correction plan generated by the LLM are automatically saved to `LLM/makeCoffee/makeCoffee-1/response.json`. You can disable auto-save by changing the "save" parameter in `LLM/prompts.json` to "false" for these two queries.
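
Once the cells below have run, the saved file can be inspected directly. The sketch below makes no assumption about the keys inside `response.json` and simply pretty-prints whatever is there; `load_response` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path

def load_response(path):
    """Return the parsed JSON file, or None if it has not been written yet."""
    path = Path(path)
    if not path.exists():
        return None
    with path.open() as f:
        return json.load(f)

response = load_response("LLM/makeCoffee/makeCoffee-1/response.json")
if response is None:
    print("No saved response yet; run the reasoning cell first.")
else:
    print(json.dumps(response, indent=2))
```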

In [None]:
run_reasoning(FOLDER_NAME, llm_prompter, global_sg)

In [None]:
generate_replan(FOLDER_NAME, llm_prompter, global_sg, events[-1], object_list)
run_correction(data_path=os.getcwd(), f_name=FOLDER_NAME)

In [None]:
show_video(f'recovery/{FOLDER_NAME}/recovery-video.mp4')

### Another example
Let's look at another example. The robot's task is to "boil water", and the robot failed because "it missed the step to pick up the pot from sink before moving to stove burner".

In [None]:
task_info = tasks["Task 2"]
task_info

In [None]:
run_data_gen(data_path=os.getcwd(), task=task_info)

In [None]:
FOLDER_NAME = 'boilWater/boilWater-1'
show_video(f'thor_tasks/{FOLDER_NAME}/original-video.mp4')

In [None]:
WITH_AUDIO = 1
events, task, object_list, interact_actions, nav_actions = load_data(f"thor_tasks/{FOLDER_NAME}", task_info)
print(len(events))

# ===========sensory-input summary============
detected_sounds = []
if WITH_AUDIO == 1:
    detected_sounds = run_sound_module(FOLDER_NAME, object_list)
generate_scene_graphs(FOLDER_NAME, events, object_list, nav_actions, interact_actions, WITH_AUDIO, detected_sounds)
with open(f'state_summary/{FOLDER_NAME}/global_sg.pkl', 'rb') as f:
    global_sg = pickle.load(f)
    print("================ Global SG ================")
    print(global_sg)

# ===========event-based summary & subgoal-based summary============
generate_summary(FOLDER_NAME, events, nav_actions, interact_actions, WITH_AUDIO, detected_sounds)

In [None]:
run_reasoning(FOLDER_NAME, llm_prompter, global_sg)

In [None]:
generate_replan(FOLDER_NAME, llm_prompter, global_sg, events[-1], object_list)
run_correction(data_path=os.getcwd(), f_name=FOLDER_NAME)

In [None]:
show_video(f'recovery/{FOLDER_NAME}/recovery-video.mp4')