# Generating Executable Action Plans with Environmentally-Aware Language Models

### Official code for the paper [Generating Executable Action Plans with Environmentally-Aware Language Models](https://arxiv.org/abs/2210.04964).

Large Language Models (LLMs) trained using massive text datasets have recently shown promise in generating action plans for robotic agents from high-level text queries. However, these models typically do not consider the robot's environment, resulting in generated plans that may not actually be executable due to ambiguities in the planned actions or environmental constraints. In this paper, we propose an approach to generate environmentally-aware action plans that can be directly mapped to executable agent actions. Our approach involves integrating environmental objects and object relations as additional inputs into LLM action plan generation to provide the system with an awareness of its surroundings, resulting in plans where each generated action is mapped to objects present in the scene. We also design a novel scoring function that, along with generating the action steps and associating them with objects, helps the system disambiguate among object instances and take into account their states. We evaluate our approach using the VirtualHome simulator and the ActivityPrograms knowledge base. Our results show that the action plans generated from our system outperform prior work in terms of their correctness and executability by 5.3% and 8.9%, respectively.

## 1. Setup

In [1]:
!git clone https://github.com/hri-ironlab/scene_aware_language_planner.git
%cd scene_aware_language_planner/
!pip install -r requirements.txt
%cd src

Cloning into 'scene_aware_language_planner'...
remote: Enumerating objects: 66, done.
remote: Counting objects: 100% (66/66), done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 66 (delta 11), reused 55 (delta 8), pack-reused 0
Unpacking objects: 100% (66/66), 30.47 MiB | 6.46 MiB/s, done.
/content/scene_aware_language_planner
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.27.4-py3-none-any.whl (6.8 MB)
Collecting sentence-transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting openai
  Downloading openai-0.27.4-py3-none-any.whl (70 kB)


In [2]:
import openai
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from sentence_transformers import util as st_utils
# from evaluate import load
import pickle
import json
import re
import copy
from tqdm import tqdm
import os
import random
from pathlib import Path
import sys
import datetime
import pprint
import gc
import time

sys.path.append('../datasets/')

import add_preconds
import check_programs

GPU = 0
if torch.cuda.is_available():
  torch.cuda.set_device(GPU)
OPENAI_KEY = ""  # replace this with your OpenAI API key, if you choose to use OpenAI API



## 2. Models

### Model Hyperparameters

In [5]:
source = 'huggingface'  # select from ['openai', 'huggingface']
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
planning_lm_id = 'gpt2-large'
translation_lm_id = 'sentence-transformers/all-roberta-large-v1'

### Planning LM Initialization

Available language models for the **Planning LM** can be found at:
- [Huggingface Transformers](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads)
- [OpenAI API](https://beta.openai.com/docs/engines) (to use it, set `source = 'openai'` and paste your OpenAI API key into `OPENAI_KEY` in the Setup section; it is later assigned to `openai.api_key`)

In [6]:
def lm_engine(source, planning_lm_id, device):
  # tokenizer and model stay None for the OpenAI backend, which needs no local weights
  tokenizer, model = None, None
  if source == 'huggingface':
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(planning_lm_id)
    model = AutoModelForCausalLM.from_pretrained(planning_lm_id, pad_token_id=tokenizer.eos_token_id).to(device)

  def _generate(prompt, sampling_params, max_tokens, stop):
    if source == 'openai':
      response = openai.Completion.create(engine=planning_lm_id, prompt=prompt, **sampling_params)
      generated_samples = [response['choices'][i]['text'] for i in range(sampling_params['n'])]
      # calculate mean log prob across tokens
      mean_log_probs = [np.mean(response['choices'][i]['logprobs']['token_logprobs']) for i in range(sampling_params['n'])]
    elif source == 'huggingface':
      input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
      prompt_len = input_ids.shape[-1]

      output_dict = model.generate(input_ids, max_length=prompt_len + max_tokens, **sampling_params)
      # discard the prompt (only take the generated text)
      generated_samples = tokenizer.batch_decode(output_dict.sequences[:, prompt_len:])
      # calculate per-token logprob
      vocab_log_probs = torch.stack(output_dict.scores, dim=1).log_softmax(-1)  # [n, length, vocab_size]
      token_log_probs = torch.gather(vocab_log_probs, 2, output_dict.sequences[:, prompt_len:, None]).squeeze(-1).tolist()  # [n, length]
      # truncate each sample if it contains '\n' (the current step is finished)
      # e.g. 'open fridge\n<|endoftext|>' -> 'open fridge'
      for i, sample in enumerate(generated_samples):
        stop_idx = sample.index(stop) if stop in sample else None
        generated_samples[i] = sample[:stop_idx]
        token_log_probs[i] = token_log_probs[i][:stop_idx]
      # calculate mean log prob across tokens
      mean_log_probs = [np.mean(token_log_probs[i]) for i in range(sampling_params['num_return_sequences'])]
    generated_samples = [sample.strip().lower() for sample in generated_samples]
    return generated_samples, mean_log_probs

  return _generate, tokenizer, model

# generator = lm_engine(source, planning_lm_id, device)
generator, tokenizer, model = lm_engine(source, planning_lm_id, device)
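The scoring step inside `_generate` can be hard to follow because of the tensor indexing. Below is a minimal, dependency-free sketch of the same idea: take the log-softmax of each step's logits, pick the log-probability of the sampled token, stop at the end-of-step token, and average. The function names, the toy logits, and the 3-token vocabulary are assumptions made for illustration and are not part of the repository. This sketch truncates by token position, whereas the cell above computes `stop_idx` as a character offset in the decoded string, so the two can differ when a token spans several characters.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def mean_log_prob(step_logits, token_ids, stop_id=None):
    """Mean log-probability of the sampled tokens, cut off before stop_id."""
    log_probs = []
    for logits, tok in zip(step_logits, token_ids):
        if tok == stop_id:
            break  # the current step is finished; ignore everything after it
        log_probs.append(log_softmax(logits)[tok])
    # An empty sample gets the worst possible score instead of raising an error.
    return sum(log_probs) / len(log_probs) if log_probs else float("-inf")

# Hypothetical toy data: 4 generation steps over a 3-token vocabulary,
# where token id 2 plays the role of '\n'.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
toy_tokens = [0, 1, 2, 0]
score = mean_log_prob(toy_logits, toy_tokens, stop_id=2)
```

Only the first two tokens contribute to `score` here, because the third token is the stop token.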


### Translation LM Initialization

Available language models for the **Translation LM** can be found at:
- [Sentence Transformers](https://huggingface.co/sentence-transformers)

In [7]:
translation_lm = SentenceTransformer(translation_lm_id).to(device)
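The notebook imports `st_utils` from `sentence_transformers`, which suggests that the Translation LM embeddings are compared by cosine similarity to map a free-form generated step onto the closest admissible action. The matching code itself is not shown in this section, so the following is only a sketch of that lookup under this assumption. It uses plain Python lists in place of tensors; `find_most_similar` and the 3-dimensional toy embeddings are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def find_most_similar(query_embedding, corpus_embeddings):
    """Return (index, score) of the corpus entry closest to the query."""
    scores = [cosine_similarity(query_embedding, e) for e in corpus_embeddings]
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return best_idx, scores[best_idx]

# Hypothetical 3-d embeddings standing in for translation_lm.encode(...) output.
toy_actions = ["open fridge", "walk to kitchen", "switch on tv"]
toy_action_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query_embedding = [0.1, 0.9, 0.2]  # stands in for an embedded generated step

idx, score = find_most_similar(toy_query_embedding, toy_action_embeddings)
matched_action = toy_actions[idx]
```

With real embeddings, the same lookup would run over `action_list_embeddings`, and a low best score could be used to reject a generated step.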


## 3. Download and Preprocess Dataset

The entire dataset can be downloaded and formatted using these instructions. However, you don't need to do this step just to run the demo.

The following steps have been tested on a MacBook, but they should also work on Linux/Windows with small modifications.

1. Download programs_processed_precond_nograb_morepreconds.zip from http://virtual-home.org/release/programs/programs_processed_precond_nograb_morepreconds.zip
2. Unzip programs_processed_precond_nograb_morepreconds.zip to Scene_Aware_Language_Planner/dataset/
3. Download the VirtualHome executable v1.0 from https://github.com/xavierpuigf/virtualhome/releases/tag/v1.0.0
4. Start the VirtualHome executable
5. Run Scene_Aware_Language_Planner/make_dataset/make_dataset.py
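Steps 1 and 2 above can also be scripted. The sketch below uses only the Python standard library; `download_and_extract` is a hypothetical helper that is not part of the repository, and the destination directory is taken from step 2. Steps 3 to 5 still have to be done by hand.

```python
import urllib.request
import zipfile
from pathlib import Path

PROGRAMS_URL = (
    "http://virtual-home.org/release/programs/"
    "programs_processed_precond_nograb_morepreconds.zip"
)

def download_and_extract(url, dest_dir):
    """Download a zip archive (unless already present) and extract it into dest_dir."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / url.rsplit("/", 1)[-1]
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
    return archive_path

# Example call (commented out so that nothing is downloaded when the cell is run by accident):
# download_and_extract(PROGRAMS_URL, "Scene_Aware_Language_Planner/dataset")
```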

## 4. Load Dataset from Scratch

The formatted dataset can be loaded from scratch using this code. This step is not required for the demo.

### Load setup

In [None]:
%cd ../datasets/

/content/drive/MyDrive/Scene_Aware_Language_Planner/datasets


In [None]:
# Load tasks (one task name per line; the file ends with a newline, hence [:-1])
with open("tasks.txt") as tasks_file:
    tasks = tasks_file.read().split('\n')[:-1]
random.shuffle(tasks)

validation_tasks_init = tasks[:25]
test_tasks_init = tasks[25:125]
example_tasks_init = tasks[125:]

In [None]:
len(tasks), len(example_tasks_init), len(validation_tasks_init), len(test_tasks_init)

(285, 160, 25, 100)
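The split above depends on the global random state, so every run yields a different partition. If a reproducible split is needed, one option is to shuffle with a dedicated, seeded generator. This is a sketch rather than what the notebook does: `split_tasks`, its `seed` parameter, and the placeholder task names are assumptions, while the split sizes (25 validation, 100 test, the rest as examples) follow the cell above.

```python
import random

def split_tasks(tasks, n_val=25, n_test=100, seed=0):
    """Shuffle a copy of `tasks` with a private RNG and cut it into three parts."""
    rng = random.Random(seed)  # does not touch the global random state
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    examples = shuffled[n_val + n_test:]
    return validation, test, examples

# Placeholder task names; the real list is read from tasks.txt.
toy_tasks = [f"task_{i}" for i in range(285)]
val_split, test_split, example_split = split_tasks(toy_tasks)
```

With 285 tasks this yields 25, 100, and 160 items, matching the sizes printed above.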

### Load Action embeddings

In [None]:
%%time
# create action embeddings using Translation LM
with open('available_actions.json', 'r') as f:
    action_list = json.load(f)

action_list_embeddings = translation_lm.encode(action_list, batch_size=512, convert_to_tensor=True, device=device)  # lower batch_size if limited by GPU memory
len(action_list_embeddings)

CPU times: user 22.3 s, sys: 421 ms, total: 22.8 s
Wall time: 20.1 s


### Load Example Dataset

In [None]:
%%time
# Load example tasks and action plans
robot_ap_paths = list(Path('action_plans_NL').rglob("*.txt"))
random.shuffle(robot_ap_paths)

example_tasks = []
example_aps = []
example_ap_paths = []
temp_example_tasks_set = set(copy.deepcopy(example_tasks_init))
for ap_path in tqdm(robot_ap_paths):
  task = str(ap_path).split('/')[3]
  if task not in temp_example_tasks_set:
    continue
  
  temp_example_tasks_set.remove(task)
  example_tasks.append(task)
  example_ap_paths.append(ap_path)

  program = open(ap_path).read().split('\n')
  action_plan = []
  object_plan = []
  object_ids = []
  for step in program:
    step = step.split(" - ")
    action_plan.append(step[0])
    step[1] = step[1].replace(' ', '').split("(")
    object_plan.append(step[1][0].split(","))
    object_ids.append(step[1][1][:-1].split(", "))
  example_aps.append([action_plan, object_plan, object_ids])

100%|██████████| 6020/6020 [00:00<00:00, 276185.53it/s]

CPU times: user 77.2 ms, sys: 23.1 ms, total: 100 ms
Wall time: 100 ms
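The parsing loop above implies that each line of an action-plan file has the form `<action text> - <object names> (<object ids>)`. The sketch below isolates that parsing in a small function so it can be checked on its own. The sample lines are hypothetical and were written to match the format the loop expects; they are not copied from the dataset. Unlike the cell above, this version skips blank lines and splits the ids on `,`, so a step with two objects yields two separate ids.

```python
def parse_step(line):
    """Split '<action> - <objects> (<ids>)' into (action, objects, ids)."""
    action, rest = line.split(" - ", 1)
    objects_part, ids_part = rest.replace(" ", "").split("(", 1)
    objects = objects_part.split(",")
    ids = ids_part.rstrip(")").split(",")
    return action, objects, ids

def parse_program(text):
    """Parse a whole action-plan file into [action_plan, object_plan, object_ids]."""
    steps = [parse_step(line) for line in text.split("\n") if line.strip()]
    action_plan = [step[0] for step in steps]
    object_plan = [step[1] for step in steps]
    object_ids = [step[2] for step in steps]
    return [action_plan, object_plan, object_ids]

# Hypothetical file content in the assumed format.
toy_program = "walk to kitchen - kitchen (1)\nput apple on table - apple, table (12, 34)\n"
toy_plan = parse_program(toy_program)
```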





In [None]:
%%time
# Load example init graphs
example_init_graphs = []
for ap_path in tqdm(example_ap_paths):
  ap_path = str(ap_path)
  task = ap_path.split("/")[3]
  graph_name = ap_path.split("/")[-1].replace('txt', 'json')
  graph_path = ap_path.replace('action_plans_NL', 'init_graphs').replace('txt', 'json')

  init_graph = json.load(open(graph_path))
  init_graph['nodes_summary'] = {}
  init_graph['edges_summary'] = {}
  init_graph['nodes_num'] = len(init_graph['nodes'])
  init_graph['edges_num'] = len(init_graph['edges'])

  graph_nodes = {}
  for node_id in range(init_graph['nodes_num']):
    graph_nodes[init_graph['nodes'][node_id]['id']] = init_graph['nodes'][node_id]
  
  for node_id in range(init_graph['nodes_num']):
    if init_graph['nodes'][node_id]['class_name'] in init_graph['nodes_summary']:
      init_graph['nodes_summary'][init_graph['nodes'][node_id]['class_name']].append((set(init_graph['nodes'][node_id]['properties']), set(init_graph['nodes'][node_id]['states'])))
    else:
      init_graph['nodes_summary'][init_graph['nodes'][node_id]['class_name']] = [(set(init_graph['nodes'][node_id]['properties']), set(init_graph['nodes'][node_id]['states']))]

  for edge_id in range(init_graph['edges_num']):
    init_graph['edges'][edge_id]['from_name'] = graph_nodes[init_graph['edges'][edge_id]['from_id']]['class_name']
    init_graph['edges'][edge_id]['to_name'] = graph_nodes[init_graph['edges'][edge_id]['to_id']]['class_name']

  for edge_id in range(init_graph['edges_num']):
    if (init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type']) in init_graph['edges_summary']:
      init_graph['edges_summary'][(init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type'])] += 1
    else:
      init_graph['edges_summary'][(init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type'])] = 1
  example_init_graphs.append(init_graph)

100%|██████████| 160/160 [00:04<00:00, 33.61it/s]

CPU times: user 4.33 s, sys: 486 ms, total: 4.81 s
Wall time: 4.77 s
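The graph-loading loop above builds two summaries per scene: for every object class, the list of `(properties, states)` pairs of its instances, and for every `(from class, to class, relation)` triple, the number of matching edges. The same computation can be sketched more compactly with `collections`. `summarize_graph` is a hypothetical helper and the toy graph is invented for illustration; only the key names (`nodes`, `edges`, `id`, `class_name`, `properties`, `states`, `from_id`, `to_id`, `relation_type`) come from the code above.

```python
from collections import Counter, defaultdict

def summarize_graph(graph):
    """Return (nodes_summary, edges_summary) for a VirtualHome-style scene graph."""
    nodes_by_id = {node["id"]: node for node in graph["nodes"]}

    # class name -> list of (properties, states) pairs, one per instance
    nodes_summary = defaultdict(list)
    for node in graph["nodes"]:
        nodes_summary[node["class_name"]].append(
            (set(node["properties"]), set(node["states"]))
        )

    # (from class, to class, relation) -> number of such edges
    edges_summary = Counter(
        (
            nodes_by_id[edge["from_id"]]["class_name"],
            nodes_by_id[edge["to_id"]]["class_name"],
            edge["relation_type"],
        )
        for edge in graph["edges"]
    )
    return dict(nodes_summary), dict(edges_summary)

# Invented toy scene: two apples inside one closed fridge.
toy_graph = {
    "nodes": [
        {"id": 1, "class_name": "fridge", "properties": ["CAN_OPEN"], "states": ["CLOSED"]},
        {"id": 2, "class_name": "apple", "properties": ["GRABBABLE"], "states": []},
        {"id": 3, "class_name": "apple", "properties": ["GRABBABLE"], "states": []},
    ],
    "edges": [
        {"from_id": 2, "to_id": 1, "relation_type": "INSIDE"},
        {"from_id": 3, "to_id": 1, "relation_type": "INSIDE"},
    ],
}
toy_nodes_summary, toy_edges_summary = summarize_graph(toy_graph)
```

Keeping several instances per class in the summary is what makes it possible to tell apart object instances that share a name but differ in state.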





In [None]:
%%time
# create example task embeddings using Translation LM
example_task_embeddings = translation_lm.encode(example_tasks, batch_size=512, convert_to_tensor=True, device=device)  # lower batch_size if limited by GPU memory

CPU times: user 43.3 ms, sys: 3.05 ms, total: 46.4 ms
Wall time: 34.4 ms


### Load Validation Dataset

In [None]:
%%time
# Load validation tasks and action plans
robot_ap_paths = list(Path('action_plans_NL').rglob("*.txt"))
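val_repeat = 1  # number of shuffled passes; the output shown below (5 progress bars, 125 plans) came from a run with val_repeat = 5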

validation_tasks = []
val_aps = []
val_ap_paths = []
for _ in range(val_repeat):
  random.shuffle(robot_ap_paths)
  temp_val_tasks_set = set(copy.deepcopy(validation_tasks_init))
  for ap_path in tqdm(robot_ap_paths):
    task = str(ap_path).split('/')[3]
    if task not in temp_val_tasks_set:
      continue
    
    temp_val_tasks_set.remove(task)
    validation_tasks.append(task)
    val_ap_paths.append(ap_path)

    program = open(ap_path).read().split('\n')
    action_plan = []
    object_plan = []
    object_ids = []
    for step in program:
      step = step.split(" - ")
      action_plan.append(step[0])
      step[1] = step[1].replace(' ', '').split("(")
      object_plan.append(step[1][0].split(","))
      object_ids.append(step[1][1][:-1].split(", "))
    val_aps.append([action_plan, object_plan, object_ids])

100%|██████████| 6020/6020 [00:00<00:00, 409505.67it/s]
100%|██████████| 6020/6020 [00:00<00:00, 743010.04it/s]
100%|██████████| 6020/6020 [00:00<00:00, 743163.12it/s]
100%|██████████| 6020/6020 [00:00<00:00, 766978.83it/s]
100%|██████████| 6020/6020 [00:00<00:00, 753587.72it/s]

CPU times: user 126 ms, sys: 40.3 ms, total: 166 ms
Wall time: 162 ms





In [None]:
%%time
# Load validation init graphs
val_init_graphs = []
for ap_path in tqdm(val_ap_paths):
  ap_path = str(ap_path)
  task = ap_path.split("/")[3]
  graph_name = ap_path.split("/")[-1].replace('txt', 'json')
  graph_path = ap_path.replace('action_plans_NL', 'init_graphs').replace('txt', 'json')

  init_graph = json.load(open(graph_path))
  init_graph['nodes_summary'] = {}
  init_graph['edges_summary'] = {}
  init_graph['nodes_num'] = len(init_graph['nodes'])
  init_graph['edges_num'] = len(init_graph['edges'])

  graph_nodes = {}
  for node_id in range(init_graph['nodes_num']):
    graph_nodes[init_graph['nodes'][node_id]['id']] = init_graph['nodes'][node_id]
  
  for node_id in range(init_graph['nodes_num']):
    if init_graph['nodes'][node_id]['class_name'] in init_graph['nodes_summary']:
      init_graph['nodes_summary'][init_graph['nodes'][node_id]['class_name']].append((set(init_graph['nodes'][node_id]['properties']), set(init_graph['nodes'][node_id]['states'])))
    else:
      init_graph['nodes_summary'][init_graph['nodes'][node_id]['class_name']] = [(set(init_graph['nodes'][node_id]['properties']), set(init_graph['nodes'][node_id]['states']))]

  for edge_id in range(init_graph['edges_num']):
    init_graph['edges'][edge_id]['from_name'] = graph_nodes[init_graph['edges'][edge_id]['from_id']]['class_name']
    init_graph['edges'][edge_id]['to_name'] = graph_nodes[init_graph['edges'][edge_id]['to_id']]['class_name']

  for edge_id in range(init_graph['edges_num']):
    if (init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type']) in init_graph['edges_summary']:
      init_graph['edges_summary'][(init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type'])] += 1
    else:
      init_graph['edges_summary'][(init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type'])] = 1
  val_init_graphs.append(init_graph)

100%|██████████| 125/125 [00:06<00:00, 19.76it/s]

CPU times: user 5.87 s, sys: 539 ms, total: 6.4 s
Wall time: 6.33 s





In [None]:
%%time
# Load validation objects
val_objects = []
val_object_names = []
for ap_path in tqdm(val_ap_paths):
  ap_path = str(ap_path)
  task = ap_path.split("/")[3]
  graph_name = ap_path.split("/")[-1].replace('txt', 'json')
  objects_path = ap_path.replace('action_plans_NL', 'init_graphs').replace('txt', 'json')

  objects = json.load(open(objects_path))['nodes']
  object_names = list(set([obj['class_name'] for obj in objects]))

  val_objects.append(objects)
  val_object_names.append(object_names)

100%|██████████| 125/125 [00:01<00:00, 90.28it/s]

CPU times: user 1.22 s, sys: 181 ms, total: 1.4 s
Wall time: 1.39 s





In [None]:
# create available object embeddings using Translation LM
val_object_name_embeddings = []
for object_names in tqdm(val_object_names):
  val_object_name_embeddings.append(translation_lm.encode(object_names, batch_size=512, convert_to_tensor=True, device=device))  # lower batch_size if limited by GPU memory

100%|██████████| 125/125 [00:10<00:00, 12.26it/s]


### Load Test Dataset

In [None]:
%%time
# Load test tasks and action plans
robot_ap_paths = list(Path('action_plans_NL').rglob("*.txt"))
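test_repeat = 1  # number of shuffled passes; the output shown below (5 progress bars, 500 plans) came from a run with test_repeat = 5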

test_tasks = []
test_aps = []
test_ap_paths = []
for _ in range(test_repeat):
  random.shuffle(robot_ap_paths)
  temp_test_tasks_set = set(copy.deepcopy(test_tasks_init))
  for ap_path in tqdm(robot_ap_paths):
    task = str(ap_path).split('/')[3]
    if task not in temp_test_tasks_set:
      continue
    
    temp_test_tasks_set.remove(task)
    test_tasks.append(task)
    test_ap_paths.append(ap_path)

    program = open(ap_path).read().split('\n')
    action_plan = []
    object_plan = []
    object_ids = []
    for step in program:
      step = step.split(" - ")
      action_plan.append(step[0])
      step[1] = step[1].replace(' ', '').split("(")
      object_plan.append(step[1][0].split(","))
      object_ids.append(step[1][1][:-1].split(", "))
    test_aps.append([action_plan, object_plan, object_ids])

100%|██████████| 6020/6020 [00:00<00:00, 325328.36it/s]
100%|██████████| 6020/6020 [00:00<00:00, 419520.99it/s]
100%|██████████| 6020/6020 [00:00<00:00, 495170.03it/s]
100%|██████████| 6020/6020 [00:00<00:00, 510023.03it/s]
100%|██████████| 6020/6020 [00:00<00:00, 525882.24it/s]

CPU times: user 134 ms, sys: 48.4 ms, total: 182 ms
Wall time: 177 ms





In [None]:
%%time
# Load test init graphs
test_init_graphs = []
for ap_path in tqdm(test_ap_paths):
  ap_path = str(ap_path)
  task = ap_path.split("/")[3]
  graph_name = ap_path.split("/")[-1].replace('txt', 'json')
  graph_path = ap_path.replace('action_plans_NL', 'init_graphs').replace('txt', 'json')

  init_graph = json.load(open(graph_path))
  init_graph['nodes_summary'] = {}
  init_graph['edges_summary'] = {}
  init_graph['nodes_num'] = len(init_graph['nodes'])
  init_graph['edges_num'] = len(init_graph['edges'])

  graph_nodes = {}
  for node_id in range(init_graph['nodes_num']):
    graph_nodes[init_graph['nodes'][node_id]['id']] = init_graph['nodes'][node_id]
  
  for node_id in range(init_graph['nodes_num']):
    if init_graph['nodes'][node_id]['class_name'] in init_graph['nodes_summary']:
      init_graph['nodes_summary'][init_graph['nodes'][node_id]['class_name']].append((set(init_graph['nodes'][node_id]['properties']), set(init_graph['nodes'][node_id]['states'])))
    else:
      init_graph['nodes_summary'][init_graph['nodes'][node_id]['class_name']] = [(set(init_graph['nodes'][node_id]['properties']), set(init_graph['nodes'][node_id]['states']))]

  for edge_id in range(init_graph['edges_num']):
    init_graph['edges'][edge_id]['from_name'] = graph_nodes[init_graph['edges'][edge_id]['from_id']]['class_name']
    init_graph['edges'][edge_id]['to_name'] = graph_nodes[init_graph['edges'][edge_id]['to_id']]['class_name']

  for edge_id in range(init_graph['edges_num']):
    if (init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type']) in init_graph['edges_summary']:
      init_graph['edges_summary'][(init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type'])] += 1
    else:
      init_graph['edges_summary'][(init_graph['edges'][edge_id]['from_name'], init_graph['edges'][edge_id]['to_name'], init_graph['edges'][edge_id]['relation_type'])] = 1
  test_init_graphs.append(init_graph)

100%|██████████| 500/500 [00:18<00:00, 26.98it/s]

CPU times: user 17.2 s, sys: 1.48 s, total: 18.7 s
Wall time: 18.5 s





In [None]:
%%time
# Load test objects
test_objects = []
test_object_names = []
for ap_path in tqdm(test_ap_paths):
  ap_path = str(ap_path)
  task = ap_path.split("/")[3]
  graph_name = ap_path.split("/")[-1].replace('txt', 'json')
  objects_path = ap_path.replace('action_plans_NL', 'init_graphs').replace('txt', 'json')

  objects = json.load(open(objects_path))['nodes']
  object_names = list(set([obj['class_name'] for obj in objects]))

  test_objects.append(objects)
  test_object_names.append(object_names)

100%|██████████| 500/500 [00:07<00:00, 64.52it/s]

CPU times: user 6.99 s, sys: 842 ms, total: 7.83 s
Wall time: 7.75 s





In [None]:
# create available object embeddings using Translation LM
test_object_name_embeddings = []
for object_names in tqdm(test_object_names):
  test_object_name_embeddings.append(translation_lm.encode(object_names, batch_size=512, convert_to_tensor=True, device=device))  # lower batch_size if limited by GPU memory

100%|██████████| 500/500 [00:40<00:00, 12.27it/s]


In [None]:
print(len(example_tasks), len(example_aps), len(example_ap_paths), len(example_init_graphs))
print(len(validation_tasks), len(val_aps), len(val_ap_paths), len(val_init_graphs), len(val_objects))
print(len(test_tasks), len(test_aps), len(test_ap_paths), len(test_init_graphs), len(test_objects))

160 160 160 160
125 125 125 125 125
500 500 500 500 500


## 5. Save Dataset Pickles

The loaded dataset can be stored as pickle files using this code for fast reloading. This step is not required for the demo.

### Create Pickles Directory

In [None]:
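# use a filesystem-safe timestamp (str(datetime.datetime.now()) contains spaces and colons)
ds_pickles_dir = "ds_pickles-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
os.mkdir(ds_pickles_dir)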

### Save Example Dataset

In [None]:
with open(ds_pickles_dir + '/example_tasks.pickle', 'wb') as handle:
    pickle.dump(example_tasks, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
with open(ds_pickles_dir + '/example_aps.pickle', 'wb') as handle:
    pickle.dump(example_aps, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(ds_pickles_dir + '/example_ap_paths.pickle', 'wb') as handle:
    pickle.dump(example_ap_paths, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
with open(ds_pickles_dir + '/example_init_graphs.pickle', 'wb') as handle:
    pickle.dump(example_init_graphs, handle, protocol=pickle.HIGHEST_PROTOCOL)

### Save Validation Dataset

In [None]:
with open(ds_pickles_dir + '/validation_tasks.pickle', 'wb') as handle:
    pickle.dump(validation_tasks, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
with open(ds_pickles_dir + '/val_aps.pickle', 'wb') as handle:
    pickle.dump(val_aps, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(ds_pickles_dir + '/val_ap_paths.pickle', 'wb') as handle:
    pickle.dump(val_ap_paths, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
with open(ds_pickles_dir + '/val_init_graphs.pickle', 'wb') as handle:
    pickle.dump(val_init_graphs, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
with open(ds_pickles_dir + '/val_objects.pickle', 'wb') as handle:
    pickle.dump(val_objects, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(ds_pickles_dir + '/val_object_names.pickle', 'wb') as handle:
    pickle.dump(val_object_names, handle, protocol=pickle.HIGHEST_PROTOCOL)

### Save Test Dataset

In [None]:
with open(ds_pickles_dir + '/test_tasks.pickle', 'wb') as handle:
    pickle.dump(test_tasks, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
with open(ds_pickles_dir + '/test_aps.pickle', 'wb') as handle:
    pickle.dump(test_aps, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(ds_pickles_dir + '/test_ap_paths.pickle', 'wb') as handle:
    pickle.dump(test_ap_paths, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
with open(ds_pickles_dir + '/test_init_graphs.pickle', 'wb') as handle:
    pickle.dump(test_init_graphs, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
with open(ds_pickles_dir + '/test_objects.pickle', 'wb') as handle:
    pickle.dump(test_objects, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(ds_pickles_dir + '/test_object_names.pickle', 'wb') as handle:
    pickle.dump(test_object_names, handle, protocol=pickle.HIGHEST_PROTOCOL)
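The save cells above all repeat the same open-and-dump pattern. A pair of small helpers can express it once. This is a sketch, not code from the repository: `save_pickles` and `load_pickles` are hypothetical names, and the file layout (`<directory>/<name>.pickle`) mirrors the cells above.

```python
import pickle
from pathlib import Path

def save_pickles(directory, **objects):
    """Write each keyword argument to <directory>/<name>.pickle."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for name, obj in objects.items():
        with open(directory / f"{name}.pickle", "wb") as handle:
            pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_pickles(directory, *names):
    """Read <directory>/<name>.pickle for every name and return a dict."""
    loaded = {}
    for name in names:
        with open(Path(directory) / f"{name}.pickle", "rb") as handle:
            loaded[name] = pickle.load(handle)
    return loaded

# Example usage with the variables defined in this notebook (commented out):
# save_pickles(ds_pickles_dir, test_tasks=test_tasks, test_aps=test_aps)
# data = load_pickles("ds_pickles_demo", "test_tasks", "test_aps")
```

Pickle files can execute arbitrary code when loaded, so only load pickles from a source you trust.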

## 6. Load Dataset from Pickles

Load the dataset for the demo.

### Load setup

In [8]:
%cd ../datasets/

/content/scene_aware_language_planner/datasets


In [10]:
ds_pickles_dir = "ds_pickles_demo"

### Load Action embeddings

In [11]:
%%time
# create action embeddings using Translation LM
with open('available_actions.json', 'r') as f:
    action_list = json.load(f)

action_list_embeddings = translation_lm.encode(action_list, batch_size=512, convert_to_tensor=True, device=device)  # lower batch_size if limited by GPU memory
len(action_list_embeddings)

CPU times: user 1min 11s, sys: 1.01 s, total: 1min 12s
Wall time: 1min 5s


164163

### Load Example Dataset

In [12]:
with open(ds_pickles_dir + '/example_tasks.pickle', 'rb') as handle:
    example_tasks = pickle.load(handle)

In [13]:
with open(ds_pickles_dir + '/example_aps.pickle', 'rb') as handle:
    example_aps = pickle.load(handle)

with open(ds_pickles_dir + '/example_ap_paths.pickle', 'rb') as handle:
    example_ap_paths = pickle.load(handle)

In [14]:
with open(ds_pickles_dir + '/example_init_graphs.pickle', 'rb') as handle:
    example_init_graphs = pickle.load(handle)

In [15]:
%%time
# create example task embeddings using Translation LM
example_task_embeddings = translation_lm.encode(example_tasks, batch_size=512, convert_to_tensor=True, device=device)  # lower batch_size if limited by GPU memory

CPU times: user 51.1 ms, sys: 1.35 ms, total: 52.4 ms
Wall time: 37.9 ms


### Load Validation Dataset

In [16]:
with open(ds_pickles_dir + '/validation_tasks.pickle', 'rb') as handle:
    validation_tasks = pickle.load(handle)

In [17]:
with open(ds_pickles_dir + '/val_aps.pickle', 'rb') as handle:
    val_aps = pickle.load(handle)

with open(ds_pickles_dir + '/val_ap_paths.pickle', 'rb') as handle:
    val_ap_paths = pickle.load(handle)

In [18]:
with open(ds_pickles_dir + '/val_init_graphs.pickle', 'rb') as handle:
    val_init_graphs = pickle.load(handle)

In [19]:
with open(ds_pickles_dir + '/val_objects.pickle', 'rb') as handle:
    val_objects = pickle.load(handle)

with open(ds_pickles_dir + '/val_object_names.pickle', 'rb') as handle:
    val_object_names = pickle.load(handle)

In [20]:
# create available object embeddings using Translation LM
val_object_name_embeddings = []
for object_names in tqdm(val_object_names):
  val_object_name_embeddings.append(translation_lm.encode(object_names, batch_size=512, convert_to_tensor=True, device=device))  # lower batch_size if limited by GPU memory


100%|██████████| 25/25 [00:02<00:00, 12.24it/s]


### Load Test Dataset

In [21]:
with open(ds_pickles_dir + '/test_tasks.pickle', 'rb') as handle:
    test_tasks = pickle.load(handle)

In [22]:
with open(ds_pickles_dir + '/test_aps.pickle', 'rb') as handle:
    test_aps = pickle.load(handle)

with open(ds_pickles_dir + '/test_ap_paths.pickle', 'rb') as handle:
    test_ap_paths = pickle.load(handle)

In [23]:
with open(ds_pickles_dir + '/test_init_graphs.pickle', 'rb') as handle:
    test_init_graphs = pickle.load(handle)

In [24]:
with open(ds_pickles_dir + '/test_objects.pickle', 'rb') as handle:
    test_objects = pickle.load(handle)

with open(ds_pickles_dir + '/test_object_names.pickle', 'rb') as handle:
    test_object_names = pickle.load(handle)

In [25]:
# create available object embeddings using Translation LM
test_object_name_embeddings = []
for object_names in tqdm(test_object_names):
  test_object_name_embeddings.append(translation_lm.encode(object_names, batch_size=512, convert_to_tensor=True, device=device))  # lower batch_size if limited by GPU memory



In [26]:
print(len(example_aps), len(example_ap_paths), len(example_init_graphs))
print(len(val_aps), len(val_ap_paths), len(val_init_graphs), len(val_objects))
print(len(test_aps), len(test_ap_paths), len(test_init_graphs), len(test_objects))
print(len(list(set(example_ap_paths))), len(list(set(val_ap_paths))), len(list(set(test_ap_paths))))

160 160 160
25 25 25 25
100 100 100 100
160 25 100


## 7. Helper Classes

In [27]:
class Hyperparameters:
  # def __init__(self, res_path, source='huggingface', MAX_STEPS=20, P=0.5, TEMP=0.3, WT_SCENE=0.0, MAX_EXAMPLES=10, SAMPLE_MATCH_NUM=1, LLM_ACT=0.3, LLM_ACT_PREV=0.3, ACT_CUTOFF_THRESHOLD=0.7, ACT_CUTOFF_THRESHOLD_PREV=0.8, WT_ACT=1.0, LLM_OBJ=0.0, DIST_OBJ=0.0, OBJ_CUTOFF_THRESHOLD=0.0, WT_OBJ=0.0, STEP_CUTOFF_THRESHOLD=0.8, sampling_params=None):
  # def __init__(self, res_path, source='huggingface', MAX_STEPS=20, P=0.5, TEMP=0.1, WT_SCENE=0.0, MAX_EXAMPLES=10, SAMPLE_MATCH_NUM=1, LLM_ACT=0.3, LLM_ACT_PREV=0.3, ACT_CUTOFF_THRESHOLD=0.7, ACT_CUTOFF_THRESHOLD_PREV=0.8, WT_ACT=1.0, LLM_OBJ=0.0, DIST_OBJ=0.0, OBJ_CUTOFF_THRESHOLD=0.0, WT_OBJ=0.0, STEP_CUTOFF_THRESHOLD=0.8, sampling_params=None):
  def __init__(self, res_path, source='huggingface', MAX_STEPS=20, P=0.5, TEMP=0.3, WT_SCENE=0.25, MAX_EXAMPLES=10, SAMPLE_MATCH_NUM=1, LLM_ACT=0.3, LLM_ACT_PREV=0.3, ACT_CUTOFF_THRESHOLD=0.7, ACT_CUTOFF_THRESHOLD_PREV=0.8, WT_ACT=1.0, LLM_OBJ=0.1, DIST_OBJ=0.5, OBJ_CUTOFF_THRESHOLD=0.0, WT_OBJ=0.5, STEP_CUTOFF_THRESHOLD=1.6, sampling_params=None):
    self.res_path = res_path
    self.source = source
    self.MAX_STEPS = MAX_STEPS
    self.P = P  # hyperparameter for early stopping heuristic to detect whether Planning LM believes the plan is finished

    self.TEMP = TEMP

    self.WT_SCENE = WT_SCENE
    self.MAX_EXAMPLES = MAX_EXAMPLES

    self.SAMPLE_MATCH_NUM = SAMPLE_MATCH_NUM
    self.LLM_ACT = LLM_ACT
    self.LLM_ACT_PREV = LLM_ACT_PREV
    self.ACT_CUTOFF_THRESHOLD = ACT_CUTOFF_THRESHOLD
    self.ACT_CUTOFF_THRESHOLD_PREV = ACT_CUTOFF_THRESHOLD_PREV
    self.WT_ACT = WT_ACT

    self.LLM_OBJ = LLM_OBJ
    self.DIST_OBJ = DIST_OBJ
    self.OBJ_CUTOFF_THRESHOLD = OBJ_CUTOFF_THRESHOLD
    self.WT_OBJ = WT_OBJ

    self.STEP_CUTOFF_THRESHOLD = STEP_CUTOFF_THRESHOLD # ACT_CUTOFF_THRESHOLD + WT_OBJ * OBJ_CUTOFF_THRESHOLD

    if source == 'openai':
      openai.api_key = OPENAI_KEY
      self.sampling_params = {
        "max_tokens": 10,
        "temperature": 0.6,
        "top_p": 0.9,
        "n": 10,
        "logprobs": 1,
        "presence_penalty": 0.5,
        "frequency_penalty": 0.3,
        "stop": '\n'
        }
      
      self.sampling_params_PREV = {
        "max_tokens": 10,
        "temperature": 0.6,
        "top_p": 0.9,
        "n": 10,
        "logprobs": 1,
        "presence_penalty": 0.5,
        "frequency_penalty": 0.3,
        "stop": '\n'
        }

    if source == 'huggingface':
      self.sampling_params = {
        "temperature": self.TEMP,
        "top_p": 0.9,
        "num_return_sequences": 20,
        "repetition_penalty": 1.2,
        'use_cache': True,
        'output_scores': True,
        'return_dict_in_generate': True,
        'do_sample': True,
        }
      
      self.sampling_params_PREV = {
        "temperature": 0.1,
        "top_p": 0.9,
        "num_return_sequences": 10,
        "repetition_penalty": 1.2,
        'use_cache': True,
        'output_scores': True,
        'return_dict_in_generate': True,
        'do_sample': True,
        }

    self.max_tokens = 10
    self.stop = '\n'
  
  def _as_dict(self):
    # single source of truth for the settings shown by __repr__ and __str__
    return dict(res_path = self.res_path, MAX_STEPS = self.MAX_STEPS, P = self.P, 
            TEMP = self.TEMP, WT_SCENE = self.WT_SCENE, MAX_EXAMPLES = self.MAX_EXAMPLES, 
            SAMPLE_MATCH_NUM = self.SAMPLE_MATCH_NUM, LLM_ACT = self.LLM_ACT, 
            LLM_ACT_PREV = self.LLM_ACT_PREV, ACT_CUTOFF_THRESHOLD = self.ACT_CUTOFF_THRESHOLD, 
            ACT_CUTOFF_THRESHOLD_PREV = self.ACT_CUTOFF_THRESHOLD_PREV, 
            WT_ACT = self.WT_ACT, LLM_OBJ = self.LLM_OBJ, DIST_OBJ = self.DIST_OBJ, 
            OBJ_CUTOFF_THRESHOLD = self.OBJ_CUTOFF_THRESHOLD, WT_OBJ = self.WT_OBJ, 
            STEP_CUTOFF_THRESHOLD = self.STEP_CUTOFF_THRESHOLD, 
            sampling_params = self.sampling_params,
            sampling_params_PREV = self.sampling_params_PREV)

  def __repr__(self):
    # return the formatted settings instead of printing them as a side effect
    return pprint.pformat(self._as_dict())

  def __str__(self):
    return repr(self)

In [28]:
class Scores:
  def __init__(self):
    self.samples = []
    self.NL_samples = []
    self.overall_scores = []
    self.best_overall_score_ind = None
    self.best_overall_score = -np.inf

    self.action_matching_scores = []
    self.action_LLM_scores = []

    self.object_matching_scores = []
    self.object_LLM_scores = []
    self.object_disamb_scores = []
  
  def __repr__(self):
    # return the formatted scores instead of printing them as a side effect
    return pprint.pformat(dict(samples = self.samples, NL_samples = self.NL_samples, overall_scores = np.round(self.overall_scores, 3), action_matching_scores = np.round(self.action_matching_scores, 3), 
                       action_LLM_scores = np.round(self.action_LLM_scores, 3), object_matching_scores = np.round(self.object_matching_scores, 3), 
                       object_LLM_scores = np.round(self.object_LLM_scores, 3), object_disamb_scores = np.round(self.object_disamb_scores, 3)))

  def getOverallScores(self, hyperparams):
    for score_idx in range(len(self.action_matching_scores)):
      overall_score = hyperparams.WT_ACT * self.action_matching_scores[score_idx] + \
                      hyperparams.LLM_ACT * self.action_LLM_scores[score_idx] + \
                      hyperparams.WT_OBJ * self.object_matching_scores[score_idx] + \
                      hyperparams.LLM_OBJ * self.object_LLM_scores[score_idx] + \
                      hyperparams.DIST_OBJ * self.object_disamb_scores[score_idx]

      self.overall_scores.append(overall_score)

    self.best_overall_score_ind = np.argmax(self.overall_scores)
    self.best_overall_score = self.overall_scores[self.best_overall_score_ind]
  
  def getOverallScoresBlind(self, hyperparams):
    for score_idx in range(len(self.action_matching_scores)):
      overall_score = self.action_matching_scores[score_idx] + hyperparams.LLM_ACT_PREV * self.action_LLM_scores[score_idx]
      self.overall_scores.append(overall_score)

    self.best_overall_score_ind = np.argmax(self.overall_scores)
    self.best_overall_score = self.overall_scores[self.best_overall_score_ind]
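
As a minimal sketch of what `getOverallScores` computes, the snippet below recomputes the same weighted sum with NumPy for three hypothetical candidate steps. The weights mirror the defaults in `Hyperparameters`; the score values are made up for illustration and do not come from any run of the planner.

```python
import numpy as np

# Default weights from the Hyperparameters class above.
WT_ACT, LLM_ACT, WT_OBJ, LLM_OBJ, DIST_OBJ = 1.0, 0.3, 0.5, 0.1, 0.5

# Hypothetical scores for three candidate steps (illustrative values only).
action_matching = np.array([0.90, 0.80, 0.95])  # cosine similarity to a known action
action_llm = np.array([-0.5, -0.2, -2.0])       # log-probabilities from the planning LM
object_matching = np.array([1.0, 0.9, 0.4])
object_llm = np.array([-1.0, -1.2, -0.8])
object_disamb = np.array([0.9, 1.0, 0.1])

# Same weighted sum as getOverallScores, vectorized over candidates.
overall = (WT_ACT * action_matching + LLM_ACT * action_llm
           + WT_OBJ * object_matching + LLM_OBJ * object_llm
           + DIST_OBJ * object_disamb)
best_idx = int(np.argmax(overall))
print(np.round(overall, 3), best_idx)
```

The candidate with the highest action-matching score alone (index 2) loses here because its LM and object terms are weak, which is the behavior the combined score is meant to produce.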

## 8. Helper functions

In [29]:
# get similarity between 2 graphs
def graphIoU(current_graph, example_graph):
  '''
  get similarity between 2 graphs
  input: current graph, example graph to compare with
  output: IoU of nodes and edges
  '''

  common_node_num = 0
  common_edge_num = 0
  curr_nodes_num = current_graph['nodes_num']
  curr_edges_num = current_graph['edges_num']
  example_nodes_num = example_graph['nodes_num']
  example_edges_num = example_graph['edges_num']

  example_relevant_nodes = copy.deepcopy(example_graph['nodes_summary'])
  example_relevant_edges = copy.deepcopy(example_graph['edges_summary'])

  for curr_node_name in current_graph['nodes_summary']:
    if curr_node_name in example_relevant_nodes:
      curr_state_props = current_graph['nodes_summary'][curr_node_name]
      for node in curr_state_props:
        if node in example_relevant_nodes[curr_node_name]:
          common_node_num += 1
          example_relevant_nodes[curr_node_name].remove(node)

  for curr_edge in current_graph['edges_summary']:
    if curr_edge in example_relevant_edges:
      common_edge_num += min(example_relevant_edges[curr_edge], current_graph['edges_summary'][curr_edge])

  node_IoU = common_node_num / (curr_nodes_num + example_nodes_num - common_node_num)
  edge_IoU = common_edge_num / (curr_edges_num + example_edges_num - common_edge_num)
  # print(np.round(node_IoU, 3), np.round(edge_IoU, 3), np.round(node_IoU + edge_IoU, 3))
  return (node_IoU + edge_IoU)/2
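
The intersection-over-union idea behind `graphIoU` can be sketched on plain Python sets. This is a simplification: it ignores the per-node state lists and edge multiplicities that `graphIoU` handles, and the scene contents below are hypothetical rather than taken from the dataset.

```python
def set_iou(a, b):
    """Intersection over union of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical scene summaries (names are illustrative only).
curr_nodes = {("fridge", "CLOSED"), ("tv", "OFF"), ("sofa", "CLEAN")}
ex_nodes = {("fridge", "CLOSED"), ("tv", "ON"), ("sofa", "CLEAN"), ("lamp", "OFF")}
curr_edges = {("tv", "INSIDE", "livingroom"), ("fridge", "INSIDE", "kitchen")}
ex_edges = {("tv", "INSIDE", "livingroom"), ("lamp", "INSIDE", "bedroom")}

node_iou = set_iou(curr_nodes, ex_nodes)   # 2 shared out of 5 distinct
edge_iou = set_iou(curr_edges, ex_edges)   # 1 shared out of 3 distinct
scene_score = (node_iou + edge_iou) / 2    # same averaging as graphIoU
print(round(node_iou, 3), round(edge_iou, 3), round(scene_score, 3))
```

Unlike `graphIoU`, this sketch guards against an empty union, which would otherwise cause a division by zero.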

In [30]:
# hyperparameters for plan generation
def getHyperparams(res_path, **kwargs):
  '''
  hyperparameters for plan generation
  input: result directory, optional keyword overrides for Hyperparameters
  output: a Hyperparameters object
  '''

  hyperparam_obj = Hyperparameters(res_path, **kwargs)
  return hyperparam_obj

In [31]:
# helper function for finding similar sentence in a corpus given a query
def find_most_similar(query_str, corpus_embedding, num=1):
  query_embedding = translation_lm.encode(query_str, convert_to_tensor=True, device=device)
  # calculate cosine similarity against each candidate sentence in the corpus
  cos_scores = st_utils.pytorch_cos_sim(query_embedding, corpus_embedding)[0].detach().cpu().numpy()

  most_similar_idx = np.argpartition(cos_scores, -1 * num)[-1 * num:]
  most_similar_idx = most_similar_idx[np.argsort(cos_scores[most_similar_idx])][::-1]
  # retrieve high-ranked index and similarity score
  matching_score = cos_scores[most_similar_idx]
  return most_similar_idx, matching_score
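
The top-`num` selection in `find_most_similar` relies on `np.argpartition` followed by a sort of only the selected entries. A small sketch with made-up similarity values (no embedding model involved, and `top_k_indices` is a hypothetical helper name) shows the two steps in isolation:

```python
import numpy as np

def top_k_indices(scores, k=1):
    """Indices of the k largest scores, ordered from highest to lowest."""
    scores = np.asarray(scores)
    top = np.argpartition(scores, -k)[-k:]      # the k largest, in arbitrary order
    return top[np.argsort(scores[top])][::-1]   # sort just those k, descending

cos_scores = np.array([0.12, 0.87, 0.45, 0.91, 0.30])  # made-up similarities
idx = top_k_indices(cos_scores, k=3)
print(idx, cos_scores[idx])
```

Partitioning first avoids sorting the whole corpus when only a few matches are needed.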

In [32]:
def getSimilarEg(task, current_init_graph, hyperparams):
  '''
  get the example most similar to the given task and graph
  input: current task and graph, hyperparameters
  output: id of the most similar example
  '''
  example_ids, example_task_scores = find_most_similar(task, example_task_embeddings, hyperparams.MAX_EXAMPLES)
  example_scene_scores = [graphIoU(current_init_graph, example_init_graphs[idx]) for idx in example_ids]
  example_scores = [example_task_scores[i] + hyperparams.WT_SCENE * example_scene_scores[i] for i in range(hyperparams.MAX_EXAMPLES)]
  # print([example_tasks[i] for i in example_ids])
  # print('example_task_scores', [np.round(sc, 3) for sc in example_task_scores])
  # print('example_scene_scores', [np.round(sc, 3) for sc in example_scene_scores])
  # print('example_scores', [np.round(sc, 3) for sc in example_scores])
  example_idx = example_ids[np.argmax(example_scores)]
  return example_idx

In [33]:
def getSimilarEgBlind(task, hyperparams):
  '''
  get the example most similar to the given task (the scene graph is ignored)
  input: current task, hyperparameters
  output: id of the most similar example
  '''
  example_ids, example_task_scores = find_most_similar(task, example_task_embeddings, 1)
  example_idx = example_ids[0]
  return example_idx

In [34]:
def printExampleandTask(val_idx, example_task, example_actions, example_objects, task, f_print, print_str="Our Output"):
  '''
  Print example and current task
  '''
  print('-'*10 + print_str + '-'*10)
  print('-'*10 + ' GIVEN EXAMPLE ' + '-'*10, "val idx:", val_idx)
  print(f'Task: {example_task}')
  for step, (ex, obj) in enumerate(zip(example_actions, example_objects)):
    print(f'Step {step+1}:', ex, end=' '*(55-len(f'Step {step+1}: ' + ex )))
    print(", ".join(obj))
  print('-'*10 + ' EXAMPLE END ' + '-'*10)
  print(f'\nTask: {task}')

  f_print.write(str(val_idx) + "\n")
  f_print.write(print_str + "\n")
  f_print.write('-'*10 + ' GIVEN EXAMPLE ' + '-'*10 + "\n")
  f_print.write(example_task + "\n")
  for step, (ex, obj) in enumerate(zip(example_actions, example_objects)):
    f_print.write(f'Step {step+1}: ' + ex + ' '*(55-len(f'Step {step+1}: ' + ex )))
    f_print.write(", ".join(obj) + "\n")
  f_print.write('-'*10 + ' EXAMPLE END ' + '-'*10 + "\n")
  f_print.write(f'\nTask: {task}\n')

In [35]:
def printExampleandTaskBlind(val_idx, example_task, example_actions, task, f_print, print_str="Previous work Output"):
  '''
  Print example and current task
  '''
  print('-'*10 + print_str + '-'*10)
  print('-'*10 + ' GIVEN EXAMPLE ' + '-'*10, "val idx:", val_idx)
  print(f'Task: {example_task}')
  for step, ex in enumerate(example_actions):
    print(f'Step {step+1}:', ex)
  print('-'*10 + ' EXAMPLE END ' + '-'*10)
  print(f'\nTask: {task}')

  f_print.write(str(val_idx) + "\n")
  f_print.write(print_str + "\n")
  f_print.write('-'*10 + ' GIVEN EXAMPLE ' + '-'*10 + "\n")
  f_print.write(example_task + "\n")
  for step, ex in enumerate(example_actions):
    f_print.write(f'Step {step+1}: ' + ex + "\n")
  f_print.write('-'*10 + ' EXAMPLE END ' + '-'*10 + "\n")
  f_print.write(f'\nTask: {task}\n')

In [36]:
def reduceSamples(samples, log_probs):
  '''
  remove redundant samples for next action
  '''
  samples_dict = {}
  for sample, log_prob in zip(samples, log_probs):
    if sample not in samples_dict:
      samples_dict[sample] = log_prob
    elif log_prob > samples_dict[sample]:
      samples_dict[sample] = log_prob

  samples = []
  log_probs = []
  for sample in samples_dict:
    samples.append(sample)
    log_probs.append(samples_dict[sample])

  return samples, log_probs
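
To illustrate the deduplication performed by `reduceSamples`, here is a compact equivalent applied to three hypothetical samples. The function name `dedupe_keep_best` and the sample strings are assumptions for the example only.

```python
def dedupe_keep_best(samples, log_probs):
    """Keep one entry per distinct sample, with its highest log-probability."""
    best = {}
    for sample, log_prob in zip(samples, log_probs):
        if sample not in best or log_prob > best[sample]:
            best[sample] = log_prob
    # dicts preserve insertion order, so first-seen order is kept
    return list(best), list(best.values())

samples = ["Walk to kitchen", "Open fridge", "Walk to kitchen"]
log_probs = [-1.2, -0.7, -0.4]
unique_samples, unique_log_probs = dedupe_keep_best(samples, log_probs)
print(unique_samples, unique_log_probs)
```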

In [37]:
def processActions(step, samples, log_probs, previous_action, SAMPLE_MATCH_NUM, LLM_ACT):
  '''
  Process sample actions and return possible next actions for this step
  '''
  NL_samples = [] # Possible actions for this step
  action_matching_scores = []
  action_LLM_scores = []

  for sample, log_prob in zip(samples, log_probs):
    most_similar_ids, matching_scores = find_most_similar(sample, action_list_embeddings, SAMPLE_MATCH_NUM)
    for most_similar_idx, matching_score in zip(most_similar_ids, matching_scores):
      tx_action = action_list[most_similar_idx]
      NL_sample = (tx_action[0].upper() + tx_action[1:]).replace('_', ' ') # 'open_fridge' -> 'Open fridge'

      # heuristic for penalizing generating the same action as the last action
      if step > 1 and NL_sample == previous_action:
        matching_score -= 0.5

      NL_samples.append(NL_sample)
      action_matching_scores.append(matching_score)
      action_LLM_scores.append(log_prob)
  return NL_samples, action_matching_scores, action_LLM_scores

In [38]:
# helper function for finding the object in a query
def findObjs(NL_sample):
  NL_action0 = ["Sleep", "Stand up", "Wake up"]
  NL_action1 = ["Close ", "Cut ", "Drink ", "Drop ", "Eat ", "Find ", "Grab ", "Greet ", "Lie on ", "Look at ", 
                "Move ", "Open ", "Plug in ", "Plug out ", "Point at ", "Pull ", "Push ", "Put back ", 
                "Take off ", "Put on ", "Read ", "Rinse ", "Run to ", "Scrub ", "Sit on ", "Squeeze ", 
                "Switch off ", "Switch on ", "Touch ", "Turn to ", "Type on ", "Walk to ", "Wash ", "Watch ", "Wipe ", "Release "]
  NL_action2 = [["Pour ", " into "], ["Put ", " on "], ["Put ", " in "]]

  if NL_sample in NL_action0:
    return []
  
  if any([NL_sample.startswith(NL_action) for NL_action in NL_action1]):
    NL_action = [NL_action for NL_action in NL_action1 if NL_sample.startswith(NL_action)]
    if len(NL_action) > 1:
      raise Exception('multiple actions 1')
    else:
      sample_objects_NL = NL_sample.replace(NL_action[0], '')
      return [sample_objects_NL]
  
  if any([(NL_sample.startswith(NL_action[0]) and NL_action[1] in NL_sample) for NL_action in NL_action2]):
    NL_action = [NL_action for NL_action in NL_action2 if (NL_sample.startswith(NL_action[0]) and NL_action[1] in NL_sample)]
    if len(NL_action) > 1:
      raise Exception('multiple actions 2')
    else:
      sample_objects_NL = NL_sample.split(NL_action[0][0])[-1]
      sample_objects_NL = sample_objects_NL.split(NL_action[0][1])
      return sample_objects_NL

  print("NL_sample for no object match", NL_sample)
  raise Exception('no object match')
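
The parsing strategy in `findObjs` (no-object actions, then single-object prefixes, then two-object patterns) can be sketched as follows. This is a simplified variant, not the notebook's function: it uses only a subset of the prefixes, slices off the prefix instead of calling `replace`, and returns on the first match rather than raising when several patterns match. The name `split_objects` is hypothetical.

```python
NO_OBJECT_ACTIONS = {"Sleep", "Stand up", "Wake up"}
# Subset of the single-object prefixes, for illustration only.
ONE_OBJECT_PREFIXES = ["Open ", "Grab ", "Walk to ", "Put on ", "Put back "]
TWO_OBJECT_PATTERNS = [("Pour ", " into "), ("Put ", " on "), ("Put ", " in ")]

def split_objects(step):
    """Return the object phrases mentioned in a natural-language step."""
    if step in NO_OBJECT_ACTIONS:
        return []
    for prefix in ONE_OBJECT_PREFIXES:
        if step.startswith(prefix):
            return [step[len(prefix):]]
    for prefix, joiner in TWO_OBJECT_PATTERNS:
        if step.startswith(prefix) and joiner in step:
            # split only once so the second object may itself contain the joiner
            return step[len(prefix):].split(joiner, 1)
    raise ValueError(f"no object match for: {step!r}")

for step in ["Stand up", "Open fridge", "Put on shirt", "Put apple on table"]:
    print(step, "->", split_objects(step))
```

Checking the single-object prefixes first matters: "Put on shirt" must be read as one object rather than as the two-object pattern "Put ... on ...".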

In [39]:
def distance_score(last_obj, curr_obj):
  if not last_obj or not curr_obj:
    return 1.0

  # objects without a bounding box cannot be localized, so they get the lowest score
  if last_obj['bounding_box'] is None:
    return 0.0

  if curr_obj['bounding_box'] is None:
    return 0.0

  last_obj_loc = np.array(last_obj['bounding_box']['center'])
  curr_obj_loc = np.array(curr_obj['bounding_box']['center'])

  dist = np.linalg.norm(last_obj_loc - curr_obj_loc)
  return np.exp(-dist/100)

def disambiguate_obj(possible_objects_robot, previous_objects, NL_sample, action_plan):
  # score each candidate instance by its distance to the previous objects, with a penalty for repeats
  possible_obj_scores = []
  for curr_obj in possible_objects_robot:
    curr_obj_scores_temp = []
    for last_obj in previous_objects:
      obj_dist_score = distance_score(last_obj, curr_obj) # score for distance

      obj_repeat_score = 0.0 # score to discourage repeating after releasing object
      for step in action_plan:
          if NL_sample == step[0] and curr_obj in step[1]:
              obj_repeat_score -= 1.0
      
      curr_obj_scores_temp.append(obj_dist_score + obj_repeat_score)
    
    if len(curr_obj_scores_temp) == 0:
      curr_obj_scores_temp.append(1.0)
    
    possible_obj_scores.append(np.mean(curr_obj_scores_temp))

  possible_obj_max_score = max(possible_obj_scores)
  return possible_obj_max_score, possible_objects_robot[possible_obj_scores.index(possible_obj_max_score)]
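
The distance term in `distance_score` is an exponential decay of the Euclidean distance between bounding-box centers. A small sketch with hypothetical coordinates shows how nearer instances receive higher scores; `proximity_score` and its `scale` parameter are illustrative names, with `scale=100.0` matching the constant used above.

```python
import numpy as np

def proximity_score(center_a, center_b, scale=100.0):
    """exp(-distance / scale): 1.0 for identical centers, decaying with distance."""
    dist = np.linalg.norm(np.asarray(center_a) - np.asarray(center_b))
    return float(np.exp(-dist / scale))

last_center = [0.0, 1.0, 0.0]    # hypothetical center of the previously used object
near_center = [3.0, 1.0, 4.0]    # 5 units away
far_center = [30.0, 1.0, 40.0]   # 50 units away

near_score = proximity_score(last_center, near_center)
far_score = proximity_score(last_center, far_center)
print(round(near_score, 4), round(far_score, 4))
```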

In [40]:
def processObjects(NL_samples, current_object_names, current_objects, current_objects_embeddings, \
                   curr_obj_prompt, previous_objects, action_plan, LLM_OBJ, DIST_OBJ):
  object_matching_scores = []
  object_LLM_scores = []
  object_disamb_scores = []

  objects_robot = [] # Object instances for each action at this step
  tx_object_robot_names = [] # Object instance names for each action at this step

  name_equivalence_dict = {}
  for NL_sample in NL_samples: # For each action
    sample_object_matching_scores = []
    sample_object_LLM_scores = []
    sample_object_disamb_scores = []
    sample_objects_robot = []
    sample_object_robot_names = []

    sample_object_names_NL = findObjs(NL_sample)
    # print("########", NL_sample, sample_object_names_NL)
    for object_NL_name in sample_object_names_NL: # For each object in the action
      # Object Matching Score
      [tx_obj_idx], [object_matching_score] = find_most_similar(object_NL_name.replace(' ', '_'), current_objects_embeddings)
      tx_object_robot_name = current_object_names[tx_obj_idx]
      tx_object_NL_name = tx_object_robot_name.replace('_', ' ')
      # print("######## Object Matching Score", tx_object_robot_name, object_matching_score)

      # Object LLM Score
      object_LLM_score = 0
      for prompt_obj_name in list(curr_obj_prompt):
        object_LLM_string = prompt_obj_name + f' and {tx_object_NL_name} are related'
        perplexity_input = tokenizer(object_LLM_string, return_tensors="pt").to(device)
        with torch.no_grad():
          object_LLM_loss = model(**perplexity_input, labels=perplexity_input["input_ids"]).loss
          # print("######## Object LLM Loss", object_LLM_string, object_LLM_loss.item())
        object_LLM_score -= np.log(object_LLM_loss.item())
      object_LLM_score = object_LLM_score / len(curr_obj_prompt)
      # print("######## Object LLM Score", object_LLM_score)

      # Object Disambiguation Score
      # tx_object_robot_name_unity = tx_object_robot_name.replace('_', '')
      # print(tx_object_robot_name, tx_object_robot_name_unity)
      possible_objects_robot = [obj for obj in current_objects if obj['class_name'] == tx_object_robot_name]
      object_disamb_score, object_robot = disambiguate_obj(possible_objects_robot, previous_objects, NL_sample, action_plan)
      object_robot['class_name'] = tx_object_robot_name
      # print(f'######## {len(possible_objects_robot)} Object Disambiguation Score', object_disamb_score, object_robot)

      sample_object_matching_scores.append(object_matching_score)
      sample_object_LLM_scores.append(object_LLM_score)
      sample_object_disamb_scores.append(object_disamb_score)
      sample_objects_robot.append(object_robot)
      sample_object_robot_names.append(tx_object_robot_name)

    if len(sample_object_names_NL) == 0:
      sample_object_matching_scores.append(0)
      sample_object_LLM_scores.append(0)
      sample_object_disamb_scores.append(0)

    object_matching_scores.append(np.mean(sample_object_matching_scores))
    object_LLM_scores.append(np.mean(sample_object_LLM_scores))
    object_disamb_scores.append(np.mean(sample_object_disamb_scores))

    objects_robot.append(sample_objects_robot)
    tx_object_robot_names.append(sample_object_robot_names)
      
  return tx_object_robot_names, objects_robot, object_matching_scores, object_LLM_scores, object_disamb_scores

## 9. Main Functions

In [60]:
def mainOur(hyperparams, idx, log_file, eval_mode):
  if eval_mode == 'validation':
    task = validation_tasks[idx]
    gt_ap = val_aps[idx]
    current_init_graph = val_init_graphs[idx]

    current_objects = val_objects[idx]
    current_object_names = val_object_names[idx]
    current_objects_embeddings = val_object_name_embeddings[idx]

  elif eval_mode == 'test':
    task = test_tasks[idx]
    gt_ap = test_aps[idx]
    current_init_graph = test_init_graphs[idx]

    current_objects = test_objects[idx]
    current_object_names = test_object_names[idx]
    current_objects_embeddings = test_object_name_embeddings[idx]
  
  # action step, objects
  # define query task
  action_plan = []
  previous_action = ''
  previous_objects = []

  # find most relevant example
  example_idx = getSimilarEg(task, current_init_graph, hyperparams)

  # modify example for creating prompt
  example_task = example_tasks[example_idx]
  example_action_plan = example_aps[example_idx]

  example_actions = example_action_plan[0]
  example_object_names = example_action_plan[1]
  example = f'Task: {example_task}'
  for i in range(len(example_actions)):
    example += f'\nStep {i + 1}: {example_actions[i]}'

  NL_example_object_names = [obj_name.replace('_', ' ') for object_names in example_object_names for obj_name in object_names]

  # construct initial prompt
  curr_prompt = f'{example}\n\nTask: {task}'
  curr_obj_prompt = set(NL_example_object_names)
  
  # print example and query task
  printExampleandTask(idx, example_task, example_actions, example_object_names, task, log_file, "Our Output")

  for step in range(1, hyperparams.MAX_STEPS + 1):
    # tic = time.perf_counter()
    step_scores = Scores()
    # query Planning LM for single-step action candidates
    samples, log_probs = generator(curr_prompt + f'\nStep {step}:', hyperparams.sampling_params, hyperparams.max_tokens, hyperparams.stop)

    # terminate early if top P*100% of samples are all 0-length (ranked by log prob)
    top_samples_ids = np.argsort(log_probs)[-int(hyperparams.P * len(samples)):]
    are_zero_length = all([len(samples[i]) == 0 for i in top_samples_ids])
    if are_zero_length:
      log_file.write(f'\n[Terminating early because top {hyperparams.P*100}% of samples are all 0-length]\n\n\n')
      print((f'\n[Terminating early because top {hyperparams.P*100}% of samples are all 0-length]'))
      break
    
    # Process actions
    samples, log_probs = reduceSamples(samples, log_probs)
    NL_samples, step_scores.action_matching_scores, step_scores.action_LLM_scores = \
      processActions(step, samples, log_probs, previous_action, hyperparams.SAMPLE_MATCH_NUM, hyperparams.LLM_ACT)
    # print("NL_samples:", NL_samples)
    # print("action_LLM_scores:", step_scores.action_LLM_scores, '\n')
    # print("action_matching_scores:", step_scores.action_matching_scores, '\n')

    # Process objects
    tx_object_robot_names, objects_robot, step_scores.object_matching_scores, step_scores.object_LLM_scores, step_scores.object_disamb_scores = \
      processObjects(NL_samples, current_object_names, current_objects, current_objects_embeddings, \
                      curr_obj_prompt, previous_objects, action_plan, hyperparams.LLM_OBJ, hyperparams.DIST_OBJ)
    # print(tx_object_robot_names, objects_robot)

    # Calculate overall score
    step_scores.getOverallScores(hyperparams)
    best_sample = NL_samples[step_scores.best_overall_score_ind]
    best_object = objects_robot[step_scores.best_overall_score_ind]
    best_object_robot_name = tx_object_robot_names[step_scores.best_overall_score_ind]

    print(f'Step {step}: ' + best_sample, end=' '*(55-len(f'Step {step}: ' + best_sample)))
    print(", ".join(best_object_robot_name))
    # print('\n\n')
    step_scores.samples = samples
    step_scores.NL_samples = NL_samples
    # print(step_scores)

    # terminate early when the best overall score is below STEP_CUTOFF_THRESHOLD;
    # otherwise continue autoregressive generation from the selected action
    below_threshold = step_scores.best_overall_score < hyperparams.STEP_CUTOFF_THRESHOLD
    if below_threshold:
      log_file.write(f'\n[Terminating early because best overall score is lower than CUTOFF_THRESHOLD ({step_scores.best_overall_score} < {hyperparams.STEP_CUTOFF_THRESHOLD})]\n\n\n')
      print(f'\n[Terminating early because best overall score is lower than CUTOFF_THRESHOLD ({step_scores.best_overall_score} < {hyperparams.STEP_CUTOFF_THRESHOLD})]')
      break
    
    previous_action = best_sample
    previous_objects = best_object
    action_plan.append([best_sample, best_object])

    curr_prompt += f'\nStep {step}: {best_sample}'
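    for object_name in best_object_robot_name:
      # store the natural-language form (spaces, not underscores) to match the names already in curr_obj_prompt
      curr_obj_prompt.add(object_name.replace('_', ' '))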
    
    # toc = time.perf_counter()
    # our_time.append(toc - tic)
    # print(f"average step time of {len(our_time)} steps is {np.round(np.mean(our_time), 3)} seconds")
    
  out_ap_path = task + ".txt"
  # write the final plan as "<action> - <object classes> (<object ids>)", one step per line
  with open(out_ap_path, "w") as result_file:
    for step in action_plan:
      result_file.write(step[0] + " - ")
      result_file.write(', '.join([obj['class_name'] for obj in step[1]]) + " ")
      result_file.write('(' + ', '.join([str(obj['id']) for obj in step[1]]) + ')')
      result_file.write('\n')

  for step in action_plan:
    log_file.write(step[0] + " - ")
    log_file.write(', '.join([obj['class_name'] for obj in step[1]]) + " ")
    log_file.write('(' + ', '.join([str(obj['id']) for obj in step[1]]) + ')')
    log_file.write('\n')
  log_file.write('\n\n')

  return action_plan, current_init_graph, gt_ap
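
The early-stopping heuristic used in the generation loop (stop when the most likely `P` fraction of samples are all empty strings) can be isolated as below. This is a sketch under one added assumption: it forces at least one sample to be inspected, whereas the loop above uses `int(P * len(samples))` directly. The function name and sample values are hypothetical.

```python
import numpy as np

def should_stop(samples, log_probs, p=0.5):
    """True when the top p fraction of samples (by log-prob) are all empty."""
    k = max(1, int(p * len(samples)))        # assumption: inspect at least one sample
    top_ids = np.argsort(log_probs)[-k:]     # indices of the k most likely samples
    return all(len(samples[i]) == 0 for i in top_ids)

# The two most likely samples are empty: the LM appears to consider the plan finished.
stop_a = should_stop(["", "", "Open fridge", ""], [-0.1, -0.2, -3.0, -0.3])
# The most likely sample is a real action: keep generating.
stop_b = should_stop(["Open fridge", "", "", ""], [-0.1, -0.2, -0.5, -0.9])
print(stop_a, stop_b)
```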

In [44]:
def mainPrev(hyperparams, idx, log_file, eval_mode):
  if eval_mode == 'validation':
    task = validation_tasks[idx]
    gt_ap = val_aps[idx]
    current_init_graph = val_init_graphs[idx]

    current_objects = val_objects[idx]
    current_object_names = val_object_names[idx]
    current_objects_embeddings = val_object_name_embeddings[idx]

  elif eval_mode == 'test':
    task = test_tasks[idx]
    gt_ap = test_aps[idx]
    current_init_graph = test_init_graphs[idx]

    current_objects = test_objects[idx]
    current_object_names = test_object_names[idx]
    current_objects_embeddings = test_object_name_embeddings[idx]
  
  # action step, objects
  # define query task
  action_plan = []
  previous_action = ''

  # find most relevant example
  example_idx = getSimilarEgBlind(task, hyperparams)

  # modify example for creating prompt
  example_task = example_tasks[example_idx]
  example_action_plan = example_aps[example_idx]

  example_actions = example_action_plan[0]
  example = f'Task: {example_task}'
  for i in range(len(example_actions)):
    example += f'\nStep {i + 1}: {example_actions[i]}'

  # construct initial prompt
  curr_prompt = f'{example}\n\nTask: {task}'
  
  # print example and query task
  printExampleandTaskBlind(idx, example_task, example_actions, task, log_file)

  for step in range(1, hyperparams.MAX_STEPS + 1):
    # tic = time.perf_counter()
    step_scores = Scores()
    # query Planning LM for single-step action candidates
    samples, log_probs = generator(curr_prompt + f'\nStep {step}:', hyperparams.sampling_params_PREV, hyperparams.max_tokens, hyperparams.stop)

    # terminate early if top P*100% of samples are all 0-length (ranked by log prob)
    top_samples_ids = np.argsort(log_probs)[-int(hyperparams.P * len(samples)):]
    are_zero_length = all([len(samples[i]) == 0 for i in top_samples_ids])
    if are_zero_length:
      log_file.write(f'\n[Terminating early because top {hyperparams.P*100}% of samples are all 0-length]\n\n\n')
      print((f'\n[Terminating early because top {hyperparams.P*100}% of samples are all 0-length]'))
      break
    
    # Process actions
    # samples, log_probs = reduceSamples(samples, log_probs)
    # print("samples:", samples)
    NL_samples, step_scores.action_matching_scores, step_scores.action_LLM_scores = \
      processActions(step, samples, log_probs, previous_action, 1, hyperparams.LLM_ACT_PREV)
    # print("NL_samples:", NL_samples)
    # print("action_LLM_scores:", step_scores.action_LLM_scores, '\n')
    # print("action_matching_scores:", step_scores.action_matching_scores, '\n')

    # Calculate overall score
    step_scores.getOverallScoresBlind(hyperparams)
    best_sample = NL_samples[step_scores.best_overall_score_ind]

    print(f'Step {step}: ' + best_sample)
    # print('\n\n')
    step_scores.samples = samples
    step_scores.NL_samples = NL_samples
    # print(step_scores)

    # terminate early when the best overall score is below ACT_CUTOFF_THRESHOLD_PREV;
    # otherwise continue autoregressive generation from the selected action
    below_threshold = step_scores.best_overall_score < hyperparams.ACT_CUTOFF_THRESHOLD_PREV
    if below_threshold:
      log_file.write(f'\n[Terminating early because best overall score is lower than CUTOFF_THRESHOLD ({step_scores.best_overall_score} < {hyperparams.ACT_CUTOFF_THRESHOLD_PREV})]\n\n\n')
      print(f'\n[Terminating early because best overall score is lower than CUTOFF_THRESHOLD ({step_scores.best_overall_score} < {hyperparams.ACT_CUTOFF_THRESHOLD_PREV})]')
      break
    
    previous_action = best_sample
    action_plan.append(best_sample)
    curr_prompt += f'\nStep {step}: {best_sample}'

    # toc = time.perf_counter()
    # prev_time.append(toc - tic)
    # print(f"average step time of {len(prev_time)} steps is {np.round(np.mean(prev_time), 3)} seconds")
    
  out_ap_path = task + ".txt"
  # write the final plan, one action per line
  with open(out_ap_path, "w") as result_file:
    for step in action_plan:
      result_file.write(step)
      result_file.write('\n')

  for step in action_plan:
    log_file.write(step)
    log_file.write('\n')
  log_file.write('\n\n')

  return action_plan, current_init_graph, gt_ap

## 10. Evaluation functions

### Step count

In [45]:
def getSteps(action_plan):
  return len(action_plan)

### Executability

In [46]:
def getExecutability(action_plan, init_graph, id_mapping):
  low = 0
  high = len(action_plan)
  mid = high
  
  success = 0
  message = ""
  while mid != low:
    try:
      preconds = add_preconds.get_preconds_script(action_plan[:mid]).printCondsJSON()
      info = check_programs.check_script(action_plan[:mid], preconds, graph_path=None, 
                                        inp_graph_dict=copy.deepcopy(init_graph), 
                                        id_mapping=copy.deepcopy(id_mapping))
      message, _, graph_state_list, _, _, _, _, _ = info
      # print(info[4])
      success = (message == 'Script is executable')
    except Exception:
      print(mid, "add_preconds or check_script failed")
      success = False
    # print(low, mid, high, len(action_plan), success)

    if success:
      low = mid
    else:
      high = mid
    mid = (low + high) // 2
  return mid
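
`getExecutability` searches for the longest executable prefix of a plan by bisection. The sketch below reproduces that search with a stub predicate in place of the VirtualHome checker; `longest_ok_prefix` and the toy plan are hypothetical. Note that bisection assumes monotonicity: if a prefix is executable, every shorter prefix is taken to be executable too.

```python
def longest_ok_prefix(plan, is_executable):
    """Length of the longest prefix accepted by is_executable (bisection search)."""
    low, high = 0, len(plan)
    mid = high                      # try the full plan first
    while mid != low:
        if is_executable(plan[:mid]):
            low = mid               # this prefix works, try a longer one
        else:
            high = mid              # this prefix fails, try a shorter one
        mid = (low + high) // 2
    return mid

plan = ["Walk to kitchen", "Open fridge", "Grab milk", "Fly to moon", "Close fridge"]

def stub_check(prefix):
    # Stand-in for the simulator check: any prefix containing the bad step fails.
    return "Fly to moon" not in prefix

executable_steps = longest_ok_prefix(plan, stub_check)
print(executable_steps)
```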

### LCS

In [47]:
def LCSubStr(X, Y):
  m = len(X)
  n = len(Y)
  LCSuff = [[0 for k in range(n+1)] for l in range(m+1)]
  result = 0  # length of the longest common contiguous run found so far
  
  # Following steps to build
  # LCSuff[m+1][n+1] in bottom up fashion
  for i in range(m + 1):
    for j in range(n + 1):
      if (i == 0 or j == 0):
        LCSuff[i][j] = 0
      elif (X[i-1] == Y[j-1]):
        LCSuff[i][j] = LCSuff[i-1][j-1] + 1
        result = max(result, LCSuff[i][j])
      else:
        LCSuff[i][j] = 0
  return result
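
`LCSubStr` measures the longest *contiguous* run of steps shared by two plans. A memory-lighter sketch of the same dynamic program, keeping only two rows of the table, is shown below with hypothetical plans; `longest_common_run` is an illustrative name.

```python
def longest_common_run(xs, ys):
    """Length of the longest contiguous run of items common to xs and ys."""
    best = 0
    prev = [0] * (len(ys) + 1)
    for x in xs:
        curr = [0] * (len(ys) + 1)
        for j, y in enumerate(ys, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1   # extend the run ending at the previous pair
                best = max(best, curr[j])
        prev = curr
    return best

plan = ["Walk to kitchen", "Open fridge", "Grab milk", "Close fridge"]
gt = ["Walk to kitchen", "Find fridge", "Open fridge", "Grab milk",
      "Close fridge", "Drink milk"]
run_length = longest_common_run(plan, gt)
print(run_length)
```

Here the extra "Find fridge" step in the ground truth breaks the run after the first step, so only the last three steps count.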

In [48]:
def gap_lcs(lh, rh): 
    ln, rn = len(lh), len(rh) 
    k1 = ln
    k2 = rn

    memo = {}
    choices = {}
    def rec(li, ri, l_budget, r_budget):
        # At the end of either sequence we are forced to use
        # Case 1: terminate the match.
        if li >= ln or ri >= rn:
            return 0 

        # Cache results. This limits the complexity to O(ln * rn * k1 * k2).
        # Without this the recursion would take exponential time.
        key = (li, ri, l_budget, r_budget) 
        if key in memo: 
            return memo[key]

        # Case 1: terminate the match.
        res = 0
        choice = (0, 0)

        # Case 2: matching characters, extend the sequence.
        if lh[li] == rh[ri]:
            test = 1 + rec(li + 1, ri + 1, k1, k2)
            if test > res:
                res = test
                choice = (1, 1)

        # Case 3: skip the left character if there's still budget.
        if l_budget > 0:
            test = rec(li + 1, ri, l_budget - 1, r_budget)
            if test > res:
                res = test
                choice = (1, 0)

        # Case 4: skip the right character if there's still budget.
        if r_budget > 0:
            test = rec(li, ri + 1, l_budget, r_budget - 1)
            if test > res:
                res = test
                choice = (0, 1)

        memo[key] = res
        choices[key] = choice
        return res

    # Find the best combination of starting points within the two sequences.
    # This is so the gap constraint will not apply to skips at the start.
    res = 0
    best_li, best_ri = 0, 0
    for li in range(ln):
        for ri in range(rn):
            test = rec(li, ri, k1, k2)
            if test > res:
                res, best_li, best_ri = test, li, ri

    # Reconstruct the LCS by following the choices we tracked,
    # starting from the best start we found.
    li, ri = best_li, best_ri
    l_budget, r_budget = k1, k2

    path = []
    while True:
        key = (li, ri, l_budget, r_budget)

        # Case 1.
        if key not in choices:
            break
        inc_li, inc_ri = choices[key]

        # Case 1.
        if inc_li == 0 and inc_ri == 0:
            break

        if inc_li == 1 and inc_ri == 1:
            # Case 2.
            l_budget, r_budget = k1, k2
            path.append((lh[li], li, ri))
        else:
            # Cases 3 and 4.
            l_budget -= inc_li
            r_budget -= inc_ri

        li += inc_li
        ri += inc_ri
    return len(path)

In [49]:
def getLCS(action_plan, gt_ap, mode):
  if mode == "our":
    action_plan = [step[0] for step in action_plan]
  
  # print(gt_ap, '\n')
  # print(action_plan, '\n')
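  # Normalize by the longer plan so the score lies in [0, 100].
  denom = max(len(action_plan), len(gt_ap))
  if denom == 0:
    return 0.0  # assumption: two empty plans score 0 instead of raising ZeroDivisionError
  return 100 * gap_lcs(action_plan, gt_ap) / denom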
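
The LCS score is the length of the longest common subsequence of the two plans, divided by the length of the longer plan and expressed as a percentage. The sketch below illustrates that normalization with a textbook dynamic-programming LCS rather than `gap_lcs`. Because `gap_lcs` sets both skip budgets to the full sequence lengths, the gap constraint appears never to bind, so the two should agree on the length; that equivalence is an assumption, not something the notebook states. The names `lcs_length` and `lcs_percent` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def lcs_percent(plan, gt_plan):
    """LCS length normalized by the longer plan, in percent."""
    longest = max(len(plan), len(gt_plan))
    if longest == 0:
        return 0.0  # assumption: two empty plans score 0
    return 100 * lcs_length(plan, gt_plan) / longest


plan = ["Walk to kitchen", "Find fridge", "Open fridge"]
gt_plan = ["Walk to kitchen", "Open fridge"]
print(lcs_percent(plan, gt_plan))
```

Here two of the three generated steps appear in order in the ground truth, so the score is two thirds of 100.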

### Final Correctness

In [50]:
def getUpdatedGraph(action_plan, init_graph, id_mapping):
  success = False
  while not success:
    preconds = add_preconds.get_preconds_script(action_plan).printCondsJSON()
    info = check_programs.check_script(action_plan, preconds, graph_path=None, 
                                      inp_graph_dict=copy.deepcopy(init_graph), 
                                      id_mapping=copy.deepcopy(id_mapping))
    message, _, graph_state_list, _, _, _, _, _ = info
    success = (message == 'Script is executable')
    action_plan = action_plan[:-1]

  start_graph = copy.deepcopy(graph_state_list[0])
  end_graph = copy.deepcopy(graph_state_list[-1])

  start_nodes = {}
  for node in start_graph['nodes']:
    node['properties'] = set(node['properties'])
    node['states'] = set(node['states'])
    node.pop('prefab_name')
    node.pop('bounding_box')
    start_nodes[node['id']] = node

  end_nodes = {}
  for node in end_graph['nodes']:
    node['properties'] = set(node['properties'])
    node['states'] = set(node['states'])
    node.pop('prefab_name')
    node.pop('bounding_box')
    end_nodes[node['id']] = node

  start_edges = []
  for edge in start_graph['edges']:
    edge['from_name'] = start_nodes[edge['from_id']]['class_name']
    edge['to_name'] = start_nodes[edge['to_id']]['class_name']
    edge.pop('from_id')
    edge.pop('to_id')
    start_edges.append(edge)
  
  end_edges = []
  for edge in end_graph['edges']:
    edge['from_name'] = end_nodes[edge['from_id']]['class_name']
    edge['to_name'] = end_nodes[edge['to_id']]['class_name']
    edge.pop('from_id')
    edge.pop('to_id')
    end_edges.append(edge)

  changed_nodes = []
  for node_key in end_nodes:
    if start_nodes[node_key] == end_nodes[node_key]:
      continue

    end_nodes[node_key].pop('id')
    changed_nodes.append(end_nodes[node_key])
  
  changed_edges = []
  for edge in end_edges:
    if edge not in start_edges:
      changed_edges.append(edge)

  return changed_nodes, changed_edges

In [51]:
def getFinalCorrectness(updated_graph, gt_updated_graph):
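  updated_nodes = updated_graph[0]
  updated_edges = updated_graph[1]
  # Copy the ground-truth lists: matched items are removed below, and the
  # caller may reuse gt_updated_graph across several evaluate() calls.
  gt_updated_nodes = list(gt_updated_graph[0])
  gt_updated_edges = list(gt_updated_graph[1])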

  common_node_num = 0
  common_edge_num = 0
  updated_nodes_num = len(updated_nodes)
  updated_edges_num = len(updated_edges)
  gt_updated_nodes_num = len(gt_updated_nodes)
  gt_updated_edges_num = len(gt_updated_edges)
 
  for node in updated_nodes:
    if node in gt_updated_nodes:
      common_node_num += 1
      gt_updated_nodes.remove(node)

  for edge in updated_edges:
    if edge in gt_updated_edges:
      common_edge_num += 1
      gt_updated_edges.remove(edge)
  
  if (updated_nodes_num + gt_updated_nodes_num - common_node_num) == 0:
    node_IoU = 1.0
  else:
    node_IoU = common_node_num / (updated_nodes_num + gt_updated_nodes_num - common_node_num)

  if (updated_edges_num + gt_updated_edges_num - common_edge_num) == 0:
    edge_IoU = 1.0
  else:
    edge_IoU = common_edge_num / (updated_edges_num + gt_updated_edges_num - common_edge_num)

  # print(node_IoU, edge_IoU)
  return (node_IoU + edge_IoU) / 2.0
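
Final correctness is the mean of two intersection-over-union (IoU) scores, one over changed nodes and one over changed edges. The sketch below isolates the IoU step for a single pair of lists. The helper name `list_iou` and the dictionary keys in the example are illustrative only, not taken from VirtualHome.

```python
def list_iou(pred, gt):
    """Intersection over union of two lists, matching items one-to-one.

    Works for unhashable items such as dicts. The gt list is copied so
    the caller's data is left untouched.
    """
    remaining = list(gt)
    common = 0
    for item in pred:
        if item in remaining:
            common += 1
            remaining.remove(item)  # each ground-truth item matches once
    union = len(pred) + len(gt) - common
    # Two empty lists are treated as a perfect match, as in the function above.
    return 1.0 if union == 0 else common / union


# Illustrative edges; the "relation" key is a made-up field name.
pred_edges = [
    {"from_name": "juice", "relation": "INSIDE", "to_name": "fridge"},
    {"from_name": "character", "relation": "CLOSE", "to_name": "fridge"},
]
gt_edges = [
    {"from_name": "juice", "relation": "INSIDE", "to_name": "fridge"},
    {"from_name": "character", "relation": "CLOSE", "to_name": "glass"},
]
print(list_iou(pred_edges, gt_edges))
```

One edge is shared and the union holds three distinct edges, so the IoU is one third.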

### Evaluation helper functions

In [52]:
def getIDmap(action_plan, init_graph, mode):
  id_mapping = {}
  id_mapping_success = [True] * len(action_plan)
  if mode == "our":
    for step in action_plan:
      objects = step[1]
      for obj in objects:
        id_mapping[(obj['class_name'], obj['id'])] = obj['id']
  
  elif mode == "prev":
    for step_id, step in enumerate(action_plan):
      object_names_NL = findObjs(step)
      object_names_robot = [obj.replace(' ', '_') for obj in object_names_NL]
      for obj_name in object_names_robot:
        graph_objects = [obj for obj in init_graph['nodes'] if obj['class_name'] == obj_name]
        if len(graph_objects) == 0:
          id_mapping[(obj_name, 1)] = None
          id_mapping_success[step_id] = False
        else:
          graph_object_random = random.choice(graph_objects)
          id_mapping[(obj_name, 1)] = graph_object_random['id']
  
  return id_mapping, id_mapping_success

In [53]:
def getRobotAP(action_plan, mode):
  robot_action0 = ["[SLEEP]", "[STANDUP]", "[WAKEUP]"]
  robot_action1 = ["[CLOSE]", "[CUT]", "[DRINK]", "[DROP]", "[EAT]", "[FIND]", "[GRAB]", "[GREET]", "[LIE]", "[LOOKAT]", 
                  "[MOVE]", "[OPEN]", "[PLUGIN]", "[PLUGOUT]", "[POINTAT]", "[PULL]", "[PUSH]", "[PUTOBJBACK]", 
                  "[PUTOFF]", "[PUTON]", "[READ]", "[RINSE]", "[RUN]", "[SCRUB]", "[SIT]", "[SQUEEZE]", 
                  "[SWITCHOFF]", "[SWITCHON]", "[TOUCH]", "[TURNTO]", "[TYPE]", "[WALK]", "[WASH]", "[WATCH]", "[WIPE]", "[RELEASE]"]
  robot_action2 = ["[POUR]", "[PUTBACK]", "[PUTIN]"]

  NL_action0 = ["Sleep", "Stand up", "Wake up"]
  NL_action1 = ["Close ", "Cut ", "Drink ", "Drop ", "Eat ", "Find ", "Grab ", "Greet ", "Lie on ", "Look at ", 
                "Move ", "Open ", "Plug in ", "Plug out ", "Point at ", "Pull ", "Push ", "Put back ", 
                "Take off ", "Put on ", "Read ", "Rinse ", "Run to ", "Scrub ", "Sit on ", "Squeeze ", 
                "Switch off ", "Switch on ", "Touch ", "Turn to ", "Type on ", "Walk to ", "Wash ", "Watch ", "Wipe ", "Release "]
  NL_action2 = [["Pour ", " into "], ["Put ", " on "], ["Put ", " in "]]

  action_plan_robot = []
  for step in action_plan:
    if mode == "our":
      obj_names = [obj['class_name'] for obj in step[1]]
      obj_IDs = [obj['id'] for obj in step[1]]
      step = step[0]
    elif mode == "prev":
      obj_names = [obj.replace(' ', '_') for obj in findObjs(step)]
      obj_IDs = [1] * len(obj_names)
    
    if step in NL_action0:
      robot_step = robot_action0[NL_action0.index(step)]
    
    elif any([step.startswith(NL_action) for NL_action in NL_action1]):
      robot_action = [robot_action1[idx] for idx in range(len(NL_action1)) if step.startswith(NL_action1[idx])]
      if len(robot_action) > 1:
        raise Exception('multiple actions 1')
      else:
        robot_step = robot_action[0] + " <" + obj_names[0] + "> (" + str(obj_IDs[0]) + ")"

    elif any([(step.startswith(NL_action[0]) and NL_action[1] in step) for NL_action in NL_action2]):
      robot_action = [robot_action2[idx] for idx in range(len(NL_action2)) if (step.startswith(NL_action2[idx][0]) and NL_action2[idx][1] in step)]
      if len(robot_action) > 1:
        raise Exception('multiple actions 2')
      else:
        robot_step = robot_action[0] + " <" + obj_names[0] + "> (" + str(obj_IDs[0]) + ") <" + obj_names[1] + "> (" + str(obj_IDs[1]) + ")"
    
    action_plan_robot.append(robot_step)
  return action_plan_robot
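
The function above turns each natural-language step into a script line of the form `[ACTION] <class_name> (id)`, with one `<class_name> (id)` pair per object. A minimal sketch of just that formatting step is shown below. The helper name `format_robot_step` and the object IDs in the example are hypothetical.

```python
def format_robot_step(action, objects):
    """Build one script line from an action token and (class_name, id) pairs.

    Mirrors the string layout used above: zero, one, or two objects,
    each rendered as "<class_name> (id)".
    """
    parts = [action]
    for class_name, obj_id in objects:
        parts.append("<" + class_name + "> (" + str(obj_id) + ")")
    return " ".join(parts)


# The IDs below are made up for illustration.
print(format_robot_step("[SLEEP]", []))
print(format_robot_step("[WALK]", [("kitchen", 3)]))
print(format_robot_step("[PUTIN]", [("groceries", 5), ("fridge", 7)]))
```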

In [54]:
def evaluate(action_plan, init_graph, gt_ap, id_mapping_gt, gt_action_plan_robot, gt_step_count, gt_exec, gt_exec_len, gt_updated_graph, mode):
  # id map
  if id_mapping_gt is None:
    id_mapping_gt = {}
  id_mapping, id_mapping_success = getIDmap(action_plan, init_graph, mode)
  # print(id_mapping)
  # print(id_mapping_success)

  # robot AP
  if gt_action_plan_robot is None:
    gt_action_plan_robot = getRobotAP(gt_ap, "prev")
  action_plan_robot = getRobotAP(action_plan, mode)

  # Truncate the plan at the first step whose objects could not be mapped.
  for step_id in range(len(action_plan_robot)):
    if not id_mapping_success[step_id]:
      action_plan_robot = action_plan_robot[:step_id]
      break
  # print(action_plan_robot)

  # steps
  if gt_step_count is None:
    gt_step_count = getSteps(gt_ap)
  step_count = getSteps(action_plan)
  # print("steps:", gt_step_count, step_count)

  # executability
  if gt_exec is None:
    if len(gt_ap) == 0:
      gt_exec_len = 0
      gt_exec = 0
    else:
      gt_exec_len = getExecutability(gt_action_plan_robot, init_graph, id_mapping_gt)
      gt_exec = 100 * gt_exec_len / len(gt_ap)
  if len(action_plan) == 0:
    exec_len = 0
    exec = 0
  else:
    exec_len = getExecutability(action_plan_robot, init_graph, id_mapping)
    exec = 100 * exec_len / len(action_plan)

  # print("exec:", gt_exec_len, gt_exec, exec_len, exec)

  # LCS
  LCS = getLCS(action_plan, gt_ap, mode)
  # print("LCS:", LCS)

  # final correctness
  if gt_updated_graph is None:
    gt_updated_graph = getUpdatedGraph(gt_action_plan_robot[:gt_exec_len], init_graph, id_mapping_gt)
  updated_graph = getUpdatedGraph(action_plan_robot[:exec_len], init_graph, id_mapping)
  final_correctness = getFinalCorrectness(updated_graph, gt_updated_graph) * 100
  # print("final_correctness:", final_correctness)

  result_dict = {"step count": step_count, "executability": exec, "LCS": LCS, "final correctness": final_correctness}
  gt_result_dict = {"gt id mapping": id_mapping_gt, "gt AP robot": gt_action_plan_robot, "gt step count": gt_step_count, 
                    "gt executability length": gt_exec_len, "gt executability": gt_exec, "gt updated graph": gt_updated_graph}
  return result_dict, gt_result_dict
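
Each call to `evaluate` returns one `result_dict` per task, so reporting a metric over a whole split means averaging the same key across a list of these dictionaries. A small sketch of that aggregation follows, using the standard library instead of NumPy. The helper name `summarize` and the example numbers are hypothetical; the keys are the ones `result_dict` uses above.

```python
from statistics import mean

METRIC_KEYS = ("step count", "executability", "LCS", "final correctness")


def summarize(result_list, keys=METRIC_KEYS):
    """Average each metric over a list of per-task result dicts."""
    if not result_list:
        return {}  # nothing to average
    return {key: round(mean(res[key] for res in result_list), 2) for key in keys}


# Made-up per-task results, for illustration only.
example_results = [
    {"step count": 4, "executability": 100.0, "LCS": 50.0, "final correctness": 40.0},
    {"step count": 6, "executability": 50.0, "LCS": 25.0, "final correctness": 60.0},
]
print(summarize(example_results))
```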

## 11. Plan Generation

### Load Hyperparameters

In [73]:
%cd ../output/

/content/scene_aware_language_planner/output


In [74]:
result_path = "results-" + str(datetime.datetime.now())
result_path = result_path.replace(' ', '-')

hyperparameterList = []
hyperparameterList.append(getHyperparams(res_path = result_path, source=source, WT_SCENE=0.25, TEMP=0.3, STEP_CUTOFF_THRESHOLD=1.2))
# hyperparameterList.append(getHyperparams(res_path = result_path, source=source, DIST_OBJ=0.0, WT_SCENE=0.0, STEP_CUTOFF_THRESHOLD=0.9))

os.makedirs(result_path)
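
The directory name built above embeds the default `datetime` string, which contains colons (for example `07:05:45`). Colons are not allowed in file names on Windows, so a portable variant might format the timestamp explicitly. The sketch below is one possible approach, not the notebook's method; the helper name `make_result_dirname` is hypothetical.

```python
import datetime


def make_result_dirname(now=None):
    """Return a results directory name with no spaces, colons, or dots."""
    if now is None:
        now = datetime.datetime.now()
    # %f is the microsecond field, zero-padded to six digits.
    return "results-" + now.strftime("%Y-%m-%d-%H-%M-%S-%f")


fixed_time = datetime.datetime(2023, 4, 11, 7, 5, 45, 341613)
print(make_result_dirname(fixed_time))
```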

In [75]:
%cd $result_path

/content/scene_aware_language_planner/output/results-2023-04-11-07:05:45.341613


### Validation

In [76]:
# Setup
os.makedirs('val')
%cd val

log_path = "log.txt"
log_file = open(log_path, "a")

/content/scene_aware_language_planner/output/results-2023-04-11-07:05:45.341613/val


In [77]:
%%time
action_plan_list = []
action_plan_PREV_list = []

all_gt_result_list = []
all_result_list = []
all_PREV_result_list = []

for hyperparams in tqdm(hyperparameterList):
  print("\n------------------ HYPERPARAMS ------------------")
  print(hyperparams)
  print("---------------------------------------------------")

  gt_result_list = []
  result_list = []
  PREV_result_list = []

  for val_idx in tqdm(range(len(validation_tasks))):
  # for val_idx in tqdm(range(10,13)):
    action_plan, init_graph, gt_ap = mainOur(hyperparams, val_idx, log_file, eval_mode='validation')
    action_plan_list.append(action_plan)
    # print('\n')
    # action_plan_PREV, init_graph, gt_ap = mainPrev(hyperparams, val_idx, log_file, eval_mode='validation')
    # action_plan_PREV_list.append(action_plan_PREV)

    # print(gt_ap[0])
    # print([step[0] for step in action_plan])
    # print([step[1] for step in action_plan])
    # print(action_plan_PREV)

    # result_dict, gt_result_dict = evaluate(action_plan, init_graph, gt_ap[0], None, None, None, None, None, None, mode="our")
    # PREV_result_dict, gt_result_dict = evaluate(action_plan_PREV, init_graph, gt_ap[0], None, None, None, None, None, None, mode="prev")
    # PREV_result_dict, _ = evaluate(action_plan_PREV, init_graph, gt_ap[0], 
    #                                gt_result_dict["gt id mapping"], 
    #                                gt_result_dict["gt AP robot"], 
    #                                gt_result_dict["gt step count"], 
    #                                gt_result_dict["gt executability"], 
    #                                gt_result_dict["gt executability length"], 
    #                                gt_result_dict["gt updated graph"], mode="prev")

    # gt_result_dict.pop("gt id mapping")
    # gt_result_dict.pop("gt AP robot")
    # gt_result_dict.pop("gt executability length")
    # gt_result_dict.pop("gt updated graph")

    # gt_result_list.append(gt_result_dict)
    # result_list.append(result_dict)
    # PREV_result_list.append(PREV_result_dict)

    # print("gt_result_dict")
    # pprint.pprint(gt_result_dict)
    # print('\n')
    # print("result_dict")
    # pprint.pprint(result_dict)
    # print('\n')
    # print("PREV_result_dict")
    # pprint.pprint(PREV_result_dict)
    # print('\n')

    # print('\n')
    # print("step count:        ", np.round(np.mean([res['gt step count'] for res in gt_result_list]), 2), "|", 
    #                      np.round(np.mean([res['step count'] for res in result_list]), 2), "|", 
    #                      np.round(np.mean([res['step count'] for res in PREV_result_list]), 2))
    
    # print("executability:     ", np.round(np.mean([res['gt executability'] for res in gt_result_list]), 2), "|", 
    #                         np.round(np.mean([res['executability'] for res in result_list]), 2), "|", 
    #                         np.round(np.mean([res['executability'] for res in PREV_result_list]), 2))
    
    # print("LCS:               ", np.round(np.mean([res['LCS'] for res in result_list]), 2), "|", 
    #               np.round(np.mean([res['LCS'] for res in PREV_result_list]), 2))
    
    # print("final correctness: ", np.round(np.mean([res['final correctness'] for res in result_list]), 2), "|", 
    #                             np.round(np.mean([res['final correctness'] for res in PREV_result_list]), 2))
    
    # print('\n\n')
    gc.collect()
    torch.cuda.empty_cache()
  all_gt_result_list.append(gt_result_list)
  all_result_list.append(result_list)
  all_PREV_result_list.append(PREV_result_list)

log_file.close()

  0%|          | 0/1 [00:00<?, ?it/s]


------------------ HYPERPARAMS ------------------
{'ACT_CUTOFF_THRESHOLD': 0.7,
 'ACT_CUTOFF_THRESHOLD_PREV': 0.8,
 'DIST_OBJ': 0.5,
 'LLM_ACT': 0.3,
 'LLM_ACT_PREV': 0.3,
 'LLM_OBJ': 0.1,
 'MAX_EXAMPLES': 10,
 'MAX_STEPS': 20,
 'OBJ_CUTOFF_THRESHOLD': 0.0,
 'P': 0.5,
 'SAMPLE_MATCH_NUM': 1,
 'STEP_CUTOFF_THRESHOLD': 1.2,
 'TEMP': 0.3,
 'WT_ACT': 1.0,
 'WT_OBJ': 0.5,
 'WT_SCENE': 0.25,
 'res_path': 'results-2023-04-11-07:05:45.341613',
 'sampling_params': {'do_sample': True,
                     'num_return_sequences': 20,
                     'output_scores': True,
                     'repetition_penalty': 1.2,
                     'return_dict_in_generate': True,
                     'temperature': 0.3,
                     'top_p': 0.9,
                     'use_cache': True},
 'sampling_params_PREV': {'do_sample': True,
                          'num_return_sequences': 10,
                          'output_scores': True,
                          'repetition_penalty': 1.2,
      


  0%|          | 0/25 [00:00<?, ?it/s]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 0
Task: Watch  TV
Step 1: Walk to living room                            living_room
Step 2: Walk to remote control                         remote_control
Step 3: Find remote control                            remote_control
Step 4: Grab remote control                            remote_control
Step 5: Walk to couch                                  couch
Step 6: Sit on couch                                   couch
Step 7: Touch remote control                           remote_control
Step 8: Find television                                television
Step 9: Switch on television                           television
Step 10: Turn to television                            television
Step 11: Watch television                              television
---------- EXAMPLE END ----------

Task: Play games
Step 1: Walk to living room                            dining_room
Step 2: Walk to remote control                        


  4%|▍         | 1/25 [00:03<01:24,  3.52s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 1
Task: Put away clean clothes
Step 1: Find soap                                      soap
Step 2: Turn to soap                                   soap
Step 3: Point at soap                                  soap
Step 4: Wash soap                                      soap
Step 5: Turn to soap                                   soap
Step 6: Look at soap                                   soap
---------- EXAMPLE END ----------

Task: Wash clothes
Step 1: Find laundry detergent                         laundry_detergent
Step 2: Turn to laundry detergent                      laundry_detergent
Step 3: Point at laundry detergent                     laundry_detergent
Step 4: Wash laundry detergent                         laundry_detergent
Step 5: Turn to laundry detergent                      laundry_detergent
Step 6: Look at laundry detergent                      laundry_detergent
Step 7: Wash laundry detergent           


  8%|▊         | 2/25 [00:18<03:50, 10.00s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 2
Task: Watch  TV
Step 1: Walk to living room                            living_room
Step 2: Walk to remote control                         remote_control
Step 3: Find remote control                            remote_control
Step 4: Grab remote control                            remote_control
Step 5: Walk to couch                                  couch
Step 6: Sit on couch                                   couch
Step 7: Touch remote control                           remote_control
Step 8: Find television                                television
Step 9: Switch on television                           television
Step 10: Turn to television                            television
Step 11: Watch television                              television
---------- EXAMPLE END ----------

Task: Listen to music
Step 1: Walk to living room                            dining_room
Step 2: Walk to remote control                   


 12%|█▏        | 3/25 [00:21<02:32,  6.94s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 3
Task: Put away groceries
Step 1: Walk to kitchen                                kitchen
Step 2: Walk to fridge                                 fridge
Step 3: Find fridge                                    fridge
Step 4: Open fridge                                    fridge
Step 5: Find groceries                                 groceries
Step 6: Grab groceries                                 groceries
Step 7: Put groceries in fridge                        groceries, fridge
Step 8: Close fridge                                   fridge
---------- EXAMPLE END ----------

Task: Put groceries in Fridge
Step 1: Walk to fridge                                 freezer
Step 2: Walk to freezer                                freezer
Step 3: Open freezer                                   freezer
Step 4: Open fridge                                    freezer
Step 5: Find fridge                                    freezer
Ste


 16%|█▌        | 4/25 [00:43<04:30, 12.88s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 4
Task: Take nap
Step 1: Walk to bedroom                                bedroom
Step 2: Walk to pillow                                 pillow
Step 3: Find pillow                                    pillow
Step 4: Grab pillow                                    pillow
Step 5: Find bed                                       bed
Step 6: Put pillow on bed                              pillow, bed
Step 7: Lie on bed                                     bed
Step 8: Sleep                                          
---------- EXAMPLE END ----------

Task: Go to sleep
Step 1: Walk to bedroom                                bedroom
Step 2: Walk to pillow                                 pillow
Step 3: Find pillow                                    pillow
Step 4: Grab pillow                                    pillow
Step 5: Find bed                                       bed
Step 6: Put pillow in bed                              p


 20%|██        | 5/25 [00:52<03:53, 11.67s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 5
Task: vacuum carpet
Step 1: Walk to living room                            living_room
Step 2: Walk to closet                                 closet
Step 3: Open closet                                    closet
Step 4: Find vacuum cleaner                            vacuum_cleaner
Step 5: Grab vacuum cleaner                            vacuum_cleaner
Step 6: Pull vacuum cleaner                            vacuum_cleaner
Step 7: Plug in vacuum cleaner                         vacuum_cleaner
Step 8: Switch on vacuum cleaner                       vacuum_cleaner
Step 9: Pull vacuum cleaner                            vacuum_cleaner
Step 10: Push vacuum cleaner                           vacuum_cleaner
Step 11: Pull vacuum cleaner                           vacuum_cleaner
Step 12: Push vacuum cleaner                           vacuum_cleaner
Step 13: Pull vacuum cleaner                           vacuum_cleaner
Step 14: Pu


 24%|██▍       | 6/25 [01:00<03:14, 10.24s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 6
Task: Do taxes
Step 1: Walk to home office                            home_office
Step 2: Walk to chair                                  chair
Step 3: Find chair                                     chair
Step 4: Pull chair                                     chair
Step 5: Sit on chair                                   chair
Step 6: Find computer                                  computer
Step 7: Switch on computer                             computer
Step 8: Turn to computer                               computer
Step 9: Look at computer                               computer
Step 10: Find document                                 document
Step 11: Grab document                                 document
Step 12: Read document                                 document
Step 13: Find keyboard                                 keyboard
Step 14: Type on keyboard                              keyboard
Step 15: Switch off 


 28%|██▊       | 7/25 [01:28<04:48, 16.02s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 7
Task: Browse internet
Step 1: Walk to home office                            home_office
Step 2: Walk to desk                                   desk
Step 3: Find chair                                     chair
Step 4: Sit on chair                                   chair
Step 5: Find computer                                  computer
Step 6: Switch on computer                             computer
---------- EXAMPLE END ----------

Task: Playing video game
Step 1: Type on home office                            home_office
Step 2: Watch television                               television
Step 3: Watch television                               television

[Terminating early because best overall score is lower than CUTOFF_THRESHOLD (0.9479031951009745 < 1.2)]



 32%|███▏      | 8/25 [01:35<03:44, 13.21s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 8
Task: Clean room
Step 1: Find toy                                       toy
Step 2: Grab toy                                       toy
Step 3: Find clothes dress                             clothes_dress
Step 4: Grab clothes dress                             clothes_dress
Step 5: Find vacuum cleaner                            vacuum_cleaner
Step 6: Wipe vacuum cleaner                            vacuum_cleaner
---------- EXAMPLE END ----------

Task: Clean bathroom
Step 1: Find toilet paper                              toilet_paper
Step 2: Grab toilet paper                              toilet_paper
Step 3: Grab toothbrush                                toothbrush
Step 4: Grab tooth paste                               tooth_paste
Step 5: Grab shampoo                                   shampoo
Step 6: Wash hair                                      hair

[Terminating early because best overall score is lower than 


 36%|███▌      | 9/25 [01:46<03:21, 12.62s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 9
Task: Have snack
Step 1: Walk to kitchen                                kitchen
Step 2: Walk to glass                                  glass
Step 3: Find glass                                     glass
Step 4: Grab glass                                     glass
Step 5: Walk to fridge                                 fridge
Step 6: Open fridge                                    fridge
Step 7: Find juice                                     juice
Step 8: Grab juice                                     juice
Step 9: Pour juice into glass                          juice, glass
Step 10: Put juice in fridge                           juice, fridge
Step 11: Close fridge                                  fridge
Step 12: Walk to cupboard                              cupboard
Step 13: Open cupboard                                 cupboard
Step 14: Find food snack                               food_snack
Step 15: Grab food s


 40%|████      | 10/25 [01:54<02:47, 11.16s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 10
Task: Write report
Step 1: Walk to home office                            home_office
Step 2: Walk to desk                                   desk
Step 3: Find computer                                  computer
Step 4: Switch on computer                             computer
Step 5: Find chair                                     chair
Step 6: Pull chair                                     chair
Step 7: Sit on chair                                   chair
Step 8: Turn to computer                               computer
Step 9: Look at computer                               computer
Step 10: Find mouse                                    mouse
Step 11: Grab mouse                                    mouse
Step 12: Pull mouse                                    mouse
Step 13: Touch mouse                                   mouse
Step 14: Turn to computer                              computer
Step 15: Look at computer   


 44%|████▍     | 11/25 [02:26<04:06, 17.58s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 11
Task: Take shower
Step 1: Walk to bathroom                               bathroom
Step 2: Walk to shower                                 shower
Step 3: Find soap                                      soap
Step 4: Scrub soap                                     soap
Step 5: Find water                                     water
Step 6: Rinse water                                    water
---------- EXAMPLE END ----------

Task: Apply lotion
Step 1: Put shampoo on face                            shampoo, picture
Step 2: Put shampoo on hair                            shampoo, comb

[Terminating early because best overall score is lower than CUTOFF_THRESHOLD (1.04862895648698 < 1.2)]



 48%|████▊     | 12/25 [02:32<03:00, 13.88s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 12
Task: Read
Step 1: Walk to book                                   book
Step 2: Walk to bookshelf                              bookshelf
Step 3: Find book                                      book
Step 4: Grab book                                      book
Step 5: Find chair                                     chair
Step 6: Sit on chair                                   chair
Step 7: Read book                                      book
---------- EXAMPLE END ----------

Task: Read to child
Step 1: Walk to child                                  child
Step 2: Walk to kids bedroom                           dining_room

[Terminating early because best overall score is lower than CUTOFF_THRESHOLD (1.099229675065992 < 1.2)]



 52%|█████▏    | 13/25 [02:35<02:08, 10.74s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 13
Task: Browse internet
Step 1: Walk to home office                            home_office
Step 2: Walk to desk                                   desk
Step 3: Find chair                                     chair
Step 4: Sit on chair                                   chair
Step 5: Find computer                                  computer
Step 6: Switch on computer                             computer
---------- EXAMPLE END ----------

Task: Browse computer
Step 1: Turn to computer                               computer
Step 2: Open window                                    window
Step 3: Turn to home office                            home_office
Step 4: Walk to desk                                   desk
Step 5: Walk to computer                               computer
Step 6: Close window                                   window
Step 7: Turn to home office                            home_office

[Terminating early


 56%|█████▌    | 14/25 [02:51<02:14, 12.26s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 14
Task: Open front door
Step 1: Walk to entrance hall                          entrance_hall
---------- EXAMPLE END ----------

Task: Open door
Step 1: Put keys in bedroom                            keys, bedroom
Step 2: Open cupboard                                  cupboard
Step 3: Put on light                                   light
Step 4: Switch off light                               light
Step 5: Put on light                                   light
Step 6: Put on light                                   light

[Terminating early because best overall score is lower than CUTOFF_THRESHOLD (1.1883280408000603 < 1.2)]



 60%|██████    | 15/25 [03:10<02:22, 14.28s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 15
Task: Admire art
Step 1: Walk to living room                            living_room
Step 2: Walk to painting                               painting
Step 3: Turn to painting                               painting
Step 4: Look at painting                               painting
Step 5: Find painting                                  painting
Step 6: Turn to painting                               painting
Step 7: Look at painting                               painting
---------- EXAMPLE END ----------

Task: Draw picture
Step 1: Walk to living room                            dining_room
Step 2: Walk to drawing                                drawing
Step 3: Turn to drawing                                drawing
Step 4: Look at drawing                                drawing
Step 5: Find drawing                                   drawing
Step 6: Turn to drawing                                drawing
Step 7: Look at d


 64%|██████▍   | 16/25 [03:37<02:43, 18.15s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 16
Task: Watch  TV
Step 1: Walk to living room                            living_room
Step 2: Walk to remote control                         remote_control
Step 3: Find remote control                            remote_control
Step 4: Grab remote control                            remote_control
Step 5: Walk to couch                                  couch
Step 6: Sit on couch                                   couch
Step 7: Touch remote control                           remote_control
Step 8: Find television                                television
Step 9: Switch on television                           television
Step 10: Turn to television                            television
Step 11: Watch television                              television
---------- EXAMPLE END ----------

Task: Watch fly
Step 1: Walk to living room                            dining_room
Step 2: Walk to remote control                        


 68%|██████▊   | 17/25 [03:40<01:49, 13.69s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 17
Task: Wash dishes
Step 1: Walk to kitchen                                kitchen
Step 2: Walk to sink                                   sink
Step 3: Find faucet                                    faucet
Step 4: Switch on faucet                               faucet
Step 5: Find dish soap                                 dish_soap
Step 6: Grab dish soap                                 dish_soap
Step 7: Pour dish soap into sink                       dish_soap, sink
Step 8: Put back dish soap                             dish_soap
Step 9: Find sponge                                    sponge
Step 10: Grab sponge                                   sponge
Step 11: Find bowl                                     bowl
Step 12: Grab bowl                                     bowl
Step 13: Scrub bowl                                    bowl
Step 14: Rinse bowl                                    bowl
Step 15: Find dishrack    


 72%|███████▏  | 18/25 [05:09<04:13, 36.19s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 18
Task: Cut bread
Step 1: Walk to kitchen                                kitchen
Step 2: Walk to kitchen cabinet                        kitchen_cabinet
Step 3: Find kitchen cabinet                           kitchen_cabinet
Step 4: Open kitchen cabinet                           kitchen_cabinet
Step 5: Find knife                                     knife
Step 6: Grab knife                                     knife
Step 7: Find plate                                     plate
Step 8: Grab plate                                     plate
Step 9: Close kitchen cabinet                          kitchen_cabinet
Step 10: Find table                                    table
Step 11: Put plate on table                            plate, table
Step 12: Put knife on table                            knife, table
Step 13: Find fridge                                   fridge
Step 14: Open fridge                                   


 76%|███████▌  | 19/25 [05:24<02:59, 29.85s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 19
Task: Wash face
Step 1: Walk to bathroom                               bathroom
Step 2: Walk to rag                                    rag
Step 3: Find rag                                       rag
Step 4: Grab rag                                       rag
Step 5: Walk to sink                                   sink
Step 6: Find faucet                                    faucet
Step 7: Switch on faucet                               faucet
Step 8: Put rag on sink                                rag, sink
Step 9: Grab rag                                       rag
Step 10: Find face soap                                face_soap
Step 11: Grab face soap                                face_soap
Step 12: Pour face soap into rag                       face_soap, rag
Step 13: Put back face soap                            face_soap
Step 14: Find face                                     face
Step 15: Scrub face            


 80%|████████  | 20/25 [05:33<01:57, 23.51s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 20
Task: Push all chairs in
Step 1: Walk to dining room                            dining_room
Step 2: Walk to table                                  table
Step 3: Find chair                                     chair
Step 4: Push chair                                     chair
Step 5: Find chair                                     chair
Step 6: Push chair                                     chair
Step 7: Find chair                                     chair
Step 8: Push chair                                     chair
Step 9: Find chair                                     chair
Step 10: Push chair                                    chair
---------- EXAMPLE END ----------

Task: Play musical chairs
Step 1: Walk to dining room                            dining_room
Step 2: Walk to table                                  table
Step 3: Find chair                                     chair
Step 4: Push chair            


 84%|████████▍ | 21/25 [05:55<01:32, 23.22s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 21
Task: Shred receipts
Step 1: Walk to home office                            home_office
Step 2: Walk to filing cabinet                         filing_cabinet
Step 3: Find filing cabinet                            filing_cabinet
Step 4: Open filing cabinet                            filing_cabinet
Step 5: Find receipt                                   receipt
Step 6: Grab receipt                                   receipt
Step 7: Close filing cabinet                           filing_cabinet
Step 8: Find electrical outlet                         electrical_outlet
Step 9: Find shredder                                  shredder
Step 10: Plug in shredder                              shredder
Step 11: Switch on shredder                            shredder
Step 12: Put receipt on shredder                       receipt, shredder
---------- EXAMPLE END ----------

Task: Shredding
Step 1: Walk to home office           


 88%|████████▊ | 22/25 [06:04<00:56, 18.80s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 22
Task: Story reading time
Step 1: Walk to kids bedroom                           kids_bedroom
Step 2: Walk to child                                  child
Step 3: Find child                                     child
Step 4: Greet child                                    child
Step 5: Find book                                      book
Step 6: Grab book                                      book
Step 7: Find bed                                       bed
Step 8: Sit on bed                                     bed
Step 9: Turn to child                                  child
Step 10: Look at child                                 child
Step 11: Turn to book                                  book
Step 12: Point at book                                 book
Step 13: Read book                                     book
---------- EXAMPLE END ----------

Task: Write story
Step 1: Walk to kids bedroom                        


 92%|█████████▏| 23/25 [06:09<00:29, 14.65s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 23
Task: Rearrange photo frames
Step 1: Walk to living room                            living_room
Step 2: Walk to wall                                   wall
Step 3: Find picture                                   picture
Step 4: Turn to picture                                picture
Step 5: Look at picture                                picture
Step 6: Grab picture                                   picture
Step 7: Walk to wall                                   wall
Step 8: Put picture on wall                            picture, wall
Step 9: Find picture                                   picture
Step 10: Turn to picture                               picture
Step 11: Look at picture                               picture
Step 12: Grab picture                                  picture
Step 13: Walk to wall                                  wall
Step 14: Put picture on wall                           picture, wall
---


 96%|█████████▌| 24/25 [06:34<00:17, 17.82s/it]

----------Our Output----------
---------- GIVEN EXAMPLE ---------- val idx: 24
Task: Do homework
Step 1: Walk to living room                            living_room
Step 2: Walk to couch                                  couch
Step 3: Find couch                                     couch
Step 4: Sit on couch                                   couch
Step 5: Find paper                                     paper
Step 6: Grab paper                                     paper
Step 7: Drop paper                                     paper
---------- EXAMPLE END ----------

Task: Do work
Step 1: Walk to kitchen                                oven
Step 2: Walk to sink                                   sink
Step 3: Walk to kitchen counter                        kitchen_counter
Step 4: Walk to sink                                   sink
Step 5: Walk to sink                                   sink

[Terminating early because best overall score is lower than CUTOFF_THRESHOLD (0.7863678698720713 < 1.2)]



100%|██████████| 25/25 [06:45<00:00, 16.24s/it]
100%|██████████| 1/1 [06:45<00:00, 405.92s/it]

CPU times: user 6min 46s, sys: 1.59 s, total: 6min 47s
Wall time: 6min 45s
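
Several plans in the log above stop with the message "Terminating early because best overall score is lower than CUTOFF_THRESHOLD". The stopping rule this message describes can be sketched roughly as follows. This is only an illustration, not the repository's implementation: `generate_plan`, `Proposer`, and `toy_proposer` are hypothetical names, the greedy step-by-step loop is an assumption, and the scores in the toy proposer are made up. Only the threshold value 1.2 is taken from the log.

```python
from typing import Callable, List, Tuple

CUTOFF_THRESHOLD = 1.2  # value reported in the log message above

# Hypothetical interface: given the steps chosen so far, return
# (step_text, overall_score) candidates for the next step.
Proposer = Callable[[List[str]], List[Tuple[str, float]]]


def generate_plan(propose_candidates: Proposer, max_steps: int = 20) -> List[str]:
    """Greedily extend a plan; stop once the best candidate scores below the cutoff."""
    plan: List[str] = []
    for _ in range(max_steps):
        candidates = propose_candidates(plan)
        if not candidates:
            break
        best_step, best_score = max(candidates, key=lambda c: c[1])
        if best_score < CUTOFF_THRESHOLD:
            # The best available next step is too weak, so the plan ends here.
            break
        plan.append(best_step)
    return plan


def toy_proposer(plan: List[str]) -> List[Tuple[str, float]]:
    """Illustrative proposer whose scores decay as the plan grows (made-up numbers)."""
    score = 2.0 - 0.5 * len(plan)
    return [(f"Step {len(plan) + 1}: Walk to sink", score), ("unrelated step", score - 1.0)]


print(generate_plan(toy_proposer))
```

With the toy proposer, the best scores are 2.0, 1.5, and 1.0, so the loop keeps two steps and stops at the third, which mirrors how the short plans in the log end before the task is complete.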





### Test

In [71]:
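# Setup: create a directory for the test-split outputs and switch into it
os.makedirs('test', exist_ok=True)  # exist_ok avoids a FileExistsError if the cell is re-run
%cd test

# Log file for the test run; it is closed after the evaluation loop below
log_path = "log.txt"
log_file = open(log_path, "w")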

/content/scene_aware_language_planner/output/results-2023-04-11-07:03:46.977989/test


In [None]:
%%time
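# Generated action plans, one entry per test task (our method and the prior-work baseline)
action_plan_list = []
action_plan_PREV_list = []

# Evaluation results, one inner list per hyperparameter setting
all_gt_result_list = []
all_result_list = []
all_PREV_result_list = []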
for hyperparams in tqdm(hyperparameterList):
  print("\n------------------ HYPERPARAMS ------------------")
  print(hyperparams)
  print("---------------------------------------------------")

  gt_result_list = []
  result_list = []
  PREV_result_list = []
  for test_idx in tqdm(range(len(test_tasks))):
    # Debug alternative: restrict the loop to a few tasks, e.g. tqdm(range(10, 13))
    action_plan, init_graph, gt_ap = mainOur(hyperparams, test_idx, log_file, eval_mode='test')
    action_plan_list.append(action_plan)
    # print('\n')
    # action_plan_PREV, init_graph, gt_ap = mainPrev(hyperparams, test_idx, log_file, eval_mode='test')
    # action_plan_PREV_list.append(action_plan_PREV)

    # print(gt_ap[0])
    # print([step[0] for step in action_plan])
    # print([step[1] for step in action_plan])
    # print(action_plan_PREV)

    result_dict, gt_result_dict = evaluate(action_plan, init_graph, gt_ap[0], None, None, None, None, None, None, mode="our")
    # PREV_result_dict, gt_result_dict = evaluate(action_plan_PREV, init_graph, gt_ap[0], None, None, None, None, None, None, mode="prev")
    # PREV_result_dict, _ = evaluate(action_plan_PREV, init_graph, gt_ap[0], 
    #                                gt_result_dict["gt id mapping"], 
    #                                gt_result_dict["gt AP robot"], 
    #                                gt_result_dict["gt step count"], 
    #                                gt_result_dict["gt executability"], 
    #                                gt_result_dict["gt executability length"], 
    #                                gt_result_dict["gt updated graph"], mode="prev")

    # gt_result_dict.pop("gt id mapping")
    # gt_result_dict.pop("gt AP robot")
    # gt_result_dict.pop("gt executability length")
    # gt_result_dict.pop("gt updated graph")

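    # NOTE: these appends are disabled, so the three per-run lists stay empty and
    # the all_*_result_list containers below only collect empty lists.
    # gt_result_list.append(gt_result_dict)
    # result_list.append(result_dict)
    # PREV_result_list.append(PREV_result_dict)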

    # print("gt_result_dict")
    # pprint.pprint(gt_result_dict)
    # print('\n')
    # print("result_dict")
    # pprint.pprint(result_dict)
    # print('\n')
    # print("PREV_result_dict")
    # pprint.pprint(PREV_result_dict)
    # print('\n')

    # print('\n', len(gt_result_list), '\n')
    # print("step count:", np.round(np.mean([res['gt step count'] for res in gt_result_list]), 3), 
    #       np.round(np.mean([res['step count'] for res in result_list]), 3), 
    #       np.round(np.mean([res['step count'] for res in PREV_result_list]), 3))
    
    # print("executability:", np.round(np.mean([res['gt executability'] for res in gt_result_list]), 3), 
    #       np.round(np.mean([res['executability'] for res in result_list]), 3), 
    #       np.round(np.mean([res['executability'] for res in PREV_result_list]), 3))
    
    # print("LCS:", np.round(np.mean([res['LCS'] for res in result_list]), 3), 
    #       np.round(np.mean([res['LCS'] for res in PREV_result_list]), 3))
    
    # print("final correctness:", np.round(np.mean([res['final correctness'] for res in result_list]), 3), 
    #       np.round(np.mean([res['final correctness'] for res in PREV_result_list]), 3))
    
    # print('\n\n')
  all_gt_result_list.append(gt_result_list)
  all_result_list.append(result_list)
  all_PREV_result_list.append(PREV_result_list)

log_file.close()
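
The commented-out reporting code in the cell above averages per-task metrics such as `step count`, `executability`, `LCS`, and `final correctness`. If the `result_list.append(result_dict)` line is re-enabled, a summary along those lines could be computed after the loop. The sketch below assumes each result is a dictionary keyed by those metric names, as in the commented-out prints; the helper `summarize_results` is hypothetical, and the numbers in `toy_results` are made up for illustration.

```python
from statistics import mean


def summarize_results(result_list,
                      keys=("step count", "executability", "LCS", "final correctness")):
    """Return the mean of each metric over a list of per-task result dicts."""
    summary = {}
    for key in keys:
        values = [res[key] for res in result_list if key in res]
        # Report None when no task provided the metric (for example, an empty list).
        summary[key] = round(mean(values), 3) if values else None
    return summary


# Toy input with made-up numbers, for illustration only.
toy_results = [
    {"step count": 6, "executability": 1.0, "LCS": 0.5, "final correctness": 0.8},
    {"step count": 4, "executability": 0.0, "LCS": 0.25, "final correctness": 0.4},
]
print(summarize_results(toy_results))
```

Returning `None` for a missing metric keeps the helper safe to call on an empty `result_list`, which is what the loop currently produces while the appends are commented out.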