Task-Oriented Dialogue Dataset Survey

A survey of task-oriented dialogue datasets, with information about recently released datasets.

See Survey Here or in Excel File

Name	Introduction	Multi/Single Turn	Task	Task Detail	Publicly Accessible	Links	Size & Stats	Included Labels	Missing Labels
Dialog bAbI tasks data	1. Facebook's task-oriented dialogue dataset, consisting of 6 different tasks. 2. The data for tasks 1-5 is constructed automatically from bot-to-bot chats (Bot2Bot); the data for task 6 is simply a reformatted version of the DSTC2 dataset. 3. A shared database is included. 4. This is the only task-oriented dataset among the bAbI tasks. 5. Its goal is to evaluate end-to-end systems, so it has no intent or slot labels.	M	Task Oriented	Book a table at a restaurant	Yes	Download: https://research.fb.com/downloads/babi/ Paper: http://arxiv.org/abs/1605.07683	For each task: train 1,000; dev 1,000; test 1,000. Tasks 1-5 also have a second test set (with suffix -OOV.txt) that contains dialogs including entities not present in the training set.	API call; Full database	Slot; Intent; User Act; Agent Act
Stanford Dialog Dataset	1. Stanford NLP Group's dataset for an in-car assistant agent. 2. Human2Human. 3. A quick intro: http://m.sohu.com/n/499803391/	M	Task Oriented	In-car assistant agent: schedule, weather, navigation	Yes	Download: http://nlp.stanford.edu/projects/kvret/kvret_dataset_public.zip Paper: https://arxiv.org/abs/1705.05414	Training dialogues 2,425; validation dialogues 302; test dialogues 304; avg. utterances per dialogue 5.25	Dialogue-level database; User Act (inform, request slots); Agent Act (inform, request slots)	API call; Intent; Slot
Stanford Dialog Dataset Labeled	1. The Stanford data relabeled by us with slots and intents. 2. Human2Human. 3. A quick intro to the Stanford data: http://m.sohu.com/n/499803391/ 4. Annotation handbook: https://docs.google.com/document/d/1ROARKf8AJNnG2_nPINe1Xm5Rza7V0jPnQV8io09hcFY/edit	M	Task Oriented	In-car assistant agent: schedule, weather, navigation	No	N/A	Training dialogues 2,425; validation dialogues 302; test dialogues 304; avg. utterances per dialogue 5.25	Slot; Intent	API call. Sample alignment is needed to obtain the following: Dialogue-level database; User Act (inform, request slots); Agent Act (inform, request slots); Agent reply
Lingxi (灵犀) data	1. All data is single-turn user input that has already been word-segmented; it is fairly noisy. 2. Part-of-speech tagging and slot labeling are complete. 3. Language: Chinese.	S	Task Oriented	User logs from a conversational bot service	No	N/A	Utterances: 5,132	Slot; POS	Agent reply; Intent; API call; Database
DSTC-2	1. Human2Bot restaurant booking dataset. 2. For usage, refer to: http://camdial.org/~mh521/dstc/downloads/handbook.pdf 3. Each dialogue is stored in its own folder, which contains the log and the labels.	M	Task Oriented	Restaurant booking	Yes	http://camdial.org/~mh521/dstc/	Train 1,612 calls; dev 506 calls; test 1,117 dialogs	Slot; User Act (inform, request slots); Agent Act (inform, request slots)	Intent; API call; Database
CamRest676	The CamRest676 Human2Human dataset contains the following three JSON files: 1. CamRest676.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn. 2. CamRestDB.json: the Cambridge restaurant database file, containing restaurants in the Cambridge, UK area and a set of attributes. 3. The ontology file, which specifies all the values the three informable slots can take.	M	Task Oriented	Restaurant booking	Yes	Download: https://www.repository.cam.ac.uk/handle/1810/260970 Paper: https://arxiv.org/abs/1604.04562	676 dialogues in total; 1,500 turns in total; train:dev:test split 3:1:1 (test set not given)	Slot; User Act (inform, request slots); Agent Act (inform, request slots)	Intent; API call; Database
Human-human goal oriented dataset	1. A travel booking dataset released by Maluuba. 2. Designed for a new task, frame tracking, which allows comparing entities mentioned earlier in the dialogue. 3. Homepage: https://datasets.maluuba.com/Frames 4. Human2Human.	M	Task Oriented	Travel booking	Yes	Download: https://datasets.maluuba.com/Frames/dl Paper: https://arxiv.org/abs/1706.01690 https://1drv.ms/b/s!Aqj1OvgfsHB7dsg42yp2BzDUK6U	Dialogues 1,369; turns 19,986; average user satisfaction (on a 1-5 scale) 4.58	Frame; User agenda; User Act (inform, request slots); Agent Act (inform, request slots); API call; User satisfaction; Task success; Database; Entity reference	Intent
DSTC4	1. The data, named TourSG, consists of 35 dialog sessions on tourist information for Singapore, collected from Skype calls between three tour guides and 35 tourists. 2. All recorded dialogs, 21 hours in total, have been manually transcribed and annotated with speech acts and semantic labels at the turn level. 3. Homepage: http://www.colips.org/workshop/dstc4/data.html 4. Human2Human.	M	Task Oriented	Querying tourist information	No	N/A	Train 20 dialogs; test 15 dialogs	Speech act (User & Agent); Semantic labels (Intent? User & Agent); Topic per turn (Intent?)	N/A
Movie Booking Dataset	1. (Microsoft) Raw conversational data collected via Amazon Mechanical Turk, with annotations provided by domain experts. 2. Human2Human.	M	Task Oriented	Movie booking	Yes	Download: https://github.com/MiuLab/TC-Bot#data Paper: TC-bot	280 dialogues; approximately 11 turns per dialogue	User Act (inform, request slots); Agent Act (inform, request slots); Intent; Slots	Database; API call
Microsoft Dialogue Challenge	Human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experiment platform with built-in simulators in each domain, for training and evaluation purposes.	M	Task Oriented	Movie-Ticket Booking; Restaurant Reservation; Taxi Ordering	Yes	Paper: https://arxiv.org/pdf/1807.11125.pdf	Movie-Ticket Booking: 11 intents, 29 slots, 2,890 dialogues; Restaurant Reservation: 11 intents, 30 slots, 4,103 dialogues; Taxi Ordering: 11 intents, 29 slots, 3,094 dialogues	Intent; Slots	Database; API call
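
The label columns in the table above can be used to shortlist datasets for a given task. The following Python sketch shows one way to do that on a simplified, hypothetical excerpt of the table. The shortened column names (`Name`, `Public`, `Included`, `Missing`), the `; ` separators, the trimmed label text, and the helper names `load_rows`, `split_labels`, and `datasets_with_label` are all assumptions made for illustration; they are not part of the survey or of the Excel file.

```python
import csv
import io

# Hypothetical excerpt of the survey table: tab-separated, four shortened
# columns only, label lists simplified and written with "; " separators.
SURVEY_TSV = (
    "Name\tPublic\tIncluded\tMissing\n"
    "Dialog bAbI tasks data\tYes\tAPI call; Full database\t"
    "Slot; Intent; User Act; Agent Act\n"
    "DSTC-2\tYes\tSlot; User Act; Agent Act\tIntent; API call; Database\n"
    "Movie Booking Dataset\tYes\tUser Act; Agent Act; Intent; Slots\t"
    "Database; API call\n"
    "DSTC4\tNo\tSpeech act; Semantic labels; Topic per turn\tN/A\n"
)


def load_rows(tsv_text):
    """Parse tab-separated text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def split_labels(cell):
    """Split a ';'-separated label cell into a set of lowercase labels."""
    return {part.strip().lower() for part in cell.split(";") if part.strip()}


def datasets_with_label(rows, label, public_only=True):
    """Return names of datasets whose 'Included' cell lists `label` exactly."""
    wanted = label.lower()
    names = []
    for row in rows:
        # Optionally skip datasets that are not publicly accessible.
        if public_only and row["Public"] != "Yes":
            continue
        if wanted in split_labels(row["Included"]):
            names.append(row["Name"])
    return names


rows = load_rows(SURVEY_TSV)
print(datasets_with_label(rows, "User Act"))
print(datasets_with_label(rows, "Speech act", public_only=False))
```

Matching is exact per label, so a query for `Slot` would not match a cell that lists `Slots`; normalizing label spellings first would be needed for a real filter over the full table.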
