SIMMC 2.1 Dataset

Summary

Our challenge focuses on the new SIMMC 2.1 dataset, which is grounded in an immersive virtual environment in the shopping domain: furniture and fashion.

The dataset was collected through the multimodal dialog simulator, followed by a manual paraphrasing step.

The following paper describes in detail the dataset, the collection process, and the annotations we provide:

Satwik Kottur*, Seungwhan Moon*, Alborz Geramifard and Babak Damavandi, "SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations" (2021).

If you want to publish experimental results with our dataset or use the baseline models, please cite the following article:

@article{kottur2021simmc,
  title={SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations},
  author={Kottur, Satwik and Moon, Seungwhan and Geramifard, Alborz and Damavandi, Babak},
  journal={arXiv preprint arXiv:2104.08667},
  year={2021}
}

Dataset Splits

We randomly split each of our SIMMC 2.0 dataset into four components:

Split	Number of Dialogs
Train (64%)	7307
Dev (5%)	563
Test-Dev (15%)	1687
Test-Std (15%)	1687

NOTE

Dev is for hyperparameter selection and other modeling choices.
Test-Dev is the publicly available test set to measure model performance and report results outside the challenge.
Test-Std is used as the main test set for evaluation for Challenge Phase 2 (to be released later).

Download the Dataset

We are hosting our datasets in this Github Repository (with Git LFS). First, install Git LFS

$ git lfs install

Clone our repository to download both the dataset and the code:

$ git clone https://github.com/facebookresearch/simmc2.git

Overview of the Dataset Repository

The data are made available in the following files:

[Main Data]
- Full dialogs: ./simmc2.1_dials_dstc11_{train|dev|devtest|test}.json
- Scene images: ./simmc2_scene_images_dstc10_public.zip
- Scene JSONs: ./simmc2_scene_jsons_dstc10_public.zip

[Metadata]
- Fashion metadta: ./fashion_prefab_metadata_all.json
- Furniture metadata: ./furniture_prefab_metadata_all.json

NOTE: The test set will be made available after DSTC10.

Data Format

For each {train|dev|devtest} split, the JSON data is formatted as follows:

{
  "split": support.extract_split_from_filename(json_path),
  "version": "simmc2.1_dstc11",
  "year": 2021,
  "domain": FLAGS.domain,
  "dialogue_data": [
    {
      “dialogue”: [
        {
          “turn_idx”: <int>,      
          “system_transcript”: <str>,
          “system_transcript_annotated”: 
          {
            “act”: <str>,
            "act_attributes": {
              "slot_values": {
                <str> slot_name : <str> slot_value, ...
              },
              "request_slots": [ <str> ],
              "object": [ <int> ],
          },
          “transcript”: <str>,
          “transcript_annotated”: {
	      (same format as system_transcript_annotated, plus the following)
              “disambiguation_label”: {0, 1},
              “disambiguation_candidates”: [ <int> ],
              “disambiguation_candidates_raw”: [ <int> or <str> ],	      	  
	  },
        }, // end of a turn (always sorted by turn_idx)
        ...
      ],
      “dialogue_idx”: <int>,    
      “domains”: <str>,    
      “mentioned_object_ids”: [ <int> ],
      "scene_ids": {
        <int> start_turn_id : <str> scene_id
      }
    }
  ] // end of a dialogue
}

The scene information file ({scene_name}_scene.json) is formatted as follows:

{
	"scenes": [
		{
			"objects": [
				{
					"index": <int>, 
					"unique_id": <int>,
					"prefab_path": <str>,
					"bbox": [<int>, <int>, <int>, <int>],
					"position": [<float>, <float>, <float>, <float>],
				}
			],
			"relationships": {
				"<relation>": {
					<obj_id>: <list of objects with relation to obj_id>
				}
			}
		}
	]
}

bbox: x, y, height, width (x and y are of top left corner of the bounding box)
Please see models/utils/visualize_bboxes.py to better understand these coordinates.
position: Position in the 3D scene, can be ignored for modeling index: Index for the instance in the scene unique_id: Unique index for the instance based on the object

The data can be processed with respective data readers / preprocessing scripts for each sub-task (please refer to the respective README documents). Each sub-task will describe which fields can be used as input.

NOTES

transcript_annotated provides the detailed structural intents, slots and values for each USER turn. system_transcript_annotated provides the similar information for ASSISTANT turns. object field in act_attributes includes a list of objects referred to in each turn - each marked with a local index throughout the dialog (obj_idx).

For instance, a transcript_annotated with act: DA:REQUEST:ADD_TO_CART:CLOTHING with an object field [2, 3] would annotate a user belief state with the intention of adding objects 2 and 3 to the cart.

Participants may use the visual image information for inspection, or as training signals for some of the sub-tasks.

We also release the metadata for each object referred in the dialog data:

{
    <int> object_id: {
        “metadata”: {dict},
        “url”: <str> source image
    }, // end of an object
}

Attributes for each object either pulled from the original sources or annotated manually. Note that some of the catalog-specific attributes (e.g. availableSizes, brand, etc.) were randomly and synthetically generated.

Each item in a catalog metadata has a unique <int> object_id. Each scene_json defines the mapping from the local_idx (local to each dialog), to its canonical object_id reference, for each dialog. This local_idx is used in transcript_annotated as an object slot. For example, given a local_id_to_obj_id_map = {0: 123, 1: 234, 2: 345} -- the transcript_annotated: {‘act’: ‘DA:REQUEST:ADD_TO_CART’, ‘objects’: [2]} would indicate this particular dialog act performed upon OBJECT_2 (2 == local_idx), which has a canonical reference to an object with object_id: 345. We are including this information in case you want to refer to the additional information provided in the metadata.json file.

NOTES on Additional Annotations in SIMMC 2.1 (Updated from SIMMC 2.0)

The main difference is in the following fields: object, disambiguation_label, disambiguation_candidates, and disambiguation_candidates_raw, all under the transcript_annotated field.

disambiguation_label == 1 indicates that the specific turn has an ambiguous mention (and vice versa).

disambiguation_candidates lists all possible object IDs that could be identified given an ambiguous mention.

disambiguation_candidates_raw lists all individual object IDs or object categories that could be identified given an ambiguous mention.

For instance, an utterance with an ambiguous mention "How much is this shirt", disambiguation_candidates_raw could be labeled as ['all', 'shirt'], while disambiguation_candidates = [1, 3, 4, 5, ...] - listing all object IDs that are shirts.

Please note that for turns with ambiguous mentions, object field is annotated as empty.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

SIMMC 2.1 Dataset

Summary

Dataset Splits

Download the Dataset

Overview of the Dataset Repository

Data Format

Files

README.md

Latest commit

History

README.md

File metadata and controls

SIMMC 2.1 Dataset

Summary

Dataset Splits

Download the Dataset

Overview of the Dataset Repository

Data Format