A Corpus of Natural Language Instructions for Collaborative Manipulation

Introduction

This page hosts a dataset of natural language instructions for object specification in manipulation scenarios. It comprises 1582 individual written instructions collected via online crowdsourcing, each elicited by one of 28 scenario images. The dataset is particularly useful for researchers in natural language processing, human-robot interaction, and robotic tabletop manipulation. In addition to serving as a rich corpus of domain-specific language, it provides a benchmark of image/instruction pairs for system evaluations and highlights inherent challenges in tabletop object specification.

Associated Journal Publication

R. Scalise*, S. Li*, H. Admoni, S. Rosenthal, and S. Srinivasa, "Natural Language Instructions for Human-Robot Collaborative Manipulation", International Journal of Robotics Research, in press.

Data and Access Code

  1. Primary Dataset: Natural Language Instructions Corpus

    • Data in CSV

    • Downsampled Data in CSV

      (We downsampled the data from 1582 to 1400 instructions and used these 1400 instructions in the evaluation study.)

    • Access code in Python (a minimal loading sketch is also shown after this list)

    • Example of one row from the main instruction dataset table:


      | Field | Value |
      | ----- | ----- |
      | Instruction | Pick up the yellow cube. |
      | Index | 1341 |
      | Scenario | Configuration_1_v1.png |
      | AgentType | human |
      | Difficulty | 1 |
      | TimeToComplete | 00:00:16 |
      | Strategy | Tried to find something that would differ the specific cube from others |
      | Challenging | Moderately challenging at the beginning but it get's easier with practice. |
      | GeneralComments | (empty) |
      | Age | 28 |
      | Gender | female |
      | Occupation | Engineer |
      | ComputerUsage | 15-20 |
      | DominantHand | Right |
      | EnglishFirst | 1 |
      | ExpWithRobots | 3 |
      | ExpWithRCCars | 1 |
      | ExpWithFPS | 5 |
      | ExpWithRTS | 3 |
      | ExpWithRobotComments | Yes, I had to build one in one of my classes |

      • Instruction: The instruction the participant generated when prompted with the scenario stimulus image.

      • Index: This is Instruction ID #1341; it is consistent across all data files.

      • Scenario: The stimulus image used to elicit the instruction (here, Configuration_1_v1.png).

      • AgentType: The participant was told they were instructing a human rather than a robot.

      • Difficulty: The participant's difficulty rating for this scenario, on a scale from 1 to 5 (5 = most difficult).

      • TimeToComplete: The participant spent 16 seconds generating their instruction.

      • Demographics: Please refer to Table 2 in the IJRR Data Paper for further details.

  2. Supplementary Dataset: Instruction Evaluation

    • Full Data in JSON

    • Full Data in CSV

    • Averaged Data in JSON

    • Averaged Data in CSV

    • Python code to access JSON data

    • Python code to access CSV data (an analysis sketch is also shown after this list)

      • Note: in the Study 2 access code, r_target_block_index refers to the index of the target block. The indices of all blocks, and the target-block indices for both versions of each scenario on the tabletop, are annotated in images_code.pdf.

    • Example of one row from the evaluation data table:


      | Field | Value |
      | ----- | ----- |
      | Instruction | Pick up the yellow cube. |
      | Index | 1341 |
      | Scenario | Configuration_1_v1.png |
      | NumOfWords | 5 |
      | TargetBlockId | 1 |
      | ClickedBlockId | 1 |
      | Correctness | 1 |
      | TimeToComplete | 3.593606 |
      | DifficultyComm | Nice Game and Fun |
      | ObsHardComm | Nothing |
      | ObsEasyComm | Nothing |
      | AddiComm | (empty) |
      | Age | 37 |
      | Gender | female |
      | Occupation | SEO |
      | ComputerUsage | >20 |
      | DominantHand | Right |
      | EnglishFirst | 1 |
      | ExpWithRobots | 6 |
      | ExpWithRCCars | 6 |
      | ExpWithFPS | 6 |
      | ExpWithRTS | 6 |
      | ExpWithRobotComments | No Idea |
      | InternalUserID | 165 |

      • Instruction: The instruction the participant was prompted with when searching for the target block in the source stimulus image.

      • Index: This is Instruction ID #1341; it is consistent across all data files.

      • Scenario: In the evaluation study, there was no distinction between versions of a stimulus image (e.g., 'v1' or 'v2'). Configuration_01_v1.png and Configuration_01_v2.png both correspond to Configuration_01.png, since they are identical once the red indicator arrow is removed.

      • NumOfWords: The number of words in this instruction.

      • TargetBlockId: A number assigned to each block within an image, as defined on page 1 of images_code.pdf. In the original source images, the target block was block '1'.

      • ClickedBlockId: The block selected by the evaluating participant, using the same numbering as TargetBlockId.

      • Correctness: Whether the participant selected the target block (1 if correct, 0 if wrong). Here, the participant chose correctly.

      • TimeToComplete: The participant spent ~3.6 seconds evaluating this instruction.

      • Demographics: Please refer to Table 3 in the IJRR Data Paper for further details.
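
For convenience, here is a minimal sketch of how the primary instruction CSV could be loaded in Python. The file path ("data/instructions.csv") is a placeholder, and the column names are assumed to match the example row above (Index, Instruction, Scenario, AgentType, ...); adjust both to the actual files in this repository.

```python
import csv

# Placeholder path: substitute the actual "Data in CSV" file from this repository.
CSV_PATH = "data/instructions.csv"

with open(CSV_PATH, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"Loaded {len(rows)} instructions")

# Instruction IDs are consistent across all data files, so the example
# row above can be looked up directly by its Index.
example = next(r for r in rows if r["Index"] == "1341")
print(example["Instruction"], example["Scenario"], example["AgentType"])
```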
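
Similarly, a sketch of a simple analysis over the evaluation CSV: it groups rows by their arrow-free scenario (collapsing v1/v2, as described under Scenario above) and reports the fraction of correct block selections. The path and column names are again assumptions based on the example row; block indices follow the numbering annotated in images_code.pdf.

```python
import csv
from collections import defaultdict

# Placeholder path: substitute the actual "Full Data in CSV" file.
EVAL_CSV_PATH = "data/evaluation_full.csv"

def base_scenario(name: str) -> str:
    """Collapse 'Configuration_01_v1.png' / 'Configuration_01_v2.png' into
    'Configuration_01.png': both versions are identical once the red
    indicator arrow is removed."""
    stem = name.rsplit(".", 1)[0]
    if stem.endswith(("_v1", "_v2")):
        stem = stem[:-3]
    return stem + ".png"

correct = defaultdict(int)
total = defaultdict(int)

with open(EVAL_CSV_PATH, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        scenario = base_scenario(row["Scenario"])
        total[scenario] += 1
        correct[scenario] += int(row["Correctness"])  # 1 if correct, 0 if wrong

for scenario in sorted(total):
    rate = correct[scenario] / total[scenario]
    print(f"{scenario}: {rate:.1%} correct over {total[scenario]} evaluations")
```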

Stimulus Images

  1. The stimulus images used in Study 1

    • "Configuration_example_page.png" is specifically used in the example page of the online Mechanical Turk study.

    • Images from "Configuration_01_**.png" to "Configuration_14_**.png" are the images used as actual stimuli. From each of the 14 configurations, there are 2 possible target blocks selected which are indicated by a red arrow ("Configuration_**_v1.png" and "Configuration_**_v2.png"). In total, there are 28 unique scenarios in the set of stimulus.

    • An example stimulus image: stimulus_image_example_1

  2. The stimulus images used in Study 2

    • This set contains one image used on the example page and 14 different images used as the actual stimuli.

    • An example stimulus image: stimulus_image_example_2
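
As a small illustration of the Study 1 naming convention described above, the 28 scenario filenames can be enumerated as follows (the zero-padded numbering is an assumption based on the "Configuration_01_**.png" pattern):

```python
# 14 configurations, each with two target-block versions (v1 and v2),
# giving the 28 unique Study 1 scenarios.
study1_scenarios = [
    f"Configuration_{cfg:02d}_v{version}.png"
    for cfg in range(1, 15)
    for version in (1, 2)
]
assert len(study1_scenarios) == 28
print(study1_scenarios[:2])  # ['Configuration_01_v1.png', 'Configuration_01_v2.png']
```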

Publications based on this dataset

  1. Conference Papers

    Shen Li*, Rosario Scalise*, Henny Admoni, Stephanie Rosenthal, and Siddhartha S. Srinivasa. Spatial references and perspective in natural language instructions for collaborative manipulation. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2016.

  2. Workshop Papers

    Shen Li*, Rosario Scalise*, Henny Admoni, Stephanie Rosenthal, and Siddhartha S. Srinivasa. Perspective in Natural Language Instructions for Collaborative Manipulation. In Proceedings of the Robotics: Science and Systems Workshop on Model Learning for Human-Robot Communication, 2016.

  3. Posters

    Workshop at Robotics: Science and Systems 2016 - Model Learning for Human-Robot Communication

Contact

If you have any questions about the dataset, or would like to collaborate with us on human-robot communication, please contact us! We are always excited to collaborate.

You can reach either of us via email:

Rosario Scalise rscalise@andrew.cmu.edu

Shen Li shenli@cmu.edu