fendaq/Task-Oriented-Dialogue-Dataset-Survey

 
 

Task-Oriented Dialogue Dataset Survey

A dataset survey about task-oriented dialogue, including information about recent datasets.

See Survey Here or in Excel File

| Name | Introduction | Multi/Single Turn | Task | Task Detail | Publicly Accessible | Links | Size & Stats | Included Labels | Missing Labels |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dialog bAbI tasks data | 1. Facebook's task-oriented dialogue dataset, consisting of 6 different tasks.<br>2. The data for tasks 1-5 is constructed automatically from bot-to-bot chats (Bot2Bot). The data for task 6 is simply the reformatted DSTC2 dataset.<br>3. A shared database is included.<br>4. This is the only task-oriented dataset among the bAbI tasks.<br>5. Its goal is to evaluate end-to-end systems, so there are no intent or slot labels. | M | Task Oriented | Book a table at a restaurant | Yes | Download: https://research.fb.com/downloads/babi/<br>Paper: http://arxiv.org/abs/1605.07683 | For each task: training 1000, development 1000, test 1000.<br>For tasks 1-5, there is a second test set (with suffix -OOV.txt) that contains dialogs including entities not present in the training set. | API call<br>Full database | Slot<br>Intent<br>User Act<br>Agent Act |
| Stanford Dialog Dataset | 1. Stanford NLP group's in-car assistant data.<br>2. Human2Human<br>3. A quick intro: http://m.sohu.com/n/499803391/ | M | Task Oriented | In-car assistant: schedule, weather, navigation | Yes | Download: http://nlp.stanford.edu/projects/kvret/kvret_dataset_public.zip<br>Paper: https://arxiv.org/abs/1705.05414 | Training dialogues 2,425<br>Validation dialogues 302<br>Test dialogues 304<br>Avg. # of utterances per dialogue 5.25 | Dialogue-level database<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | API call<br>Intent<br>Slot |
| Stanford Dialog Dataset Labeled | 1. The Stanford data relabeled by us with slots and intents.<br>2. Human2Human<br>3. A quick intro to the Stanford data: http://m.sohu.com/n/499803391/<br>4. Annotation handbook: https://docs.google.com/document/d/1ROARKf8AJNnG2_nPINe1Xm5Rza7V0jPnQV8io09hcFY/edit | M | Task Oriented | In-car assistant: schedule, weather, navigation | No | N/A | Training dialogues 2,425<br>Validation dialogues 302<br>Test dialogues 304<br>Avg. # of utterances per dialogue 5.25 | Slot<br>Intent | API call<br>Sample alignment with the original data is needed to get the following:<br>Dialogue-level database<br>User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>Agent reply |
| 灵犀数据 (Lingxi data) | 1. All data is single-turn user input that has already been word-segmented. It is fairly noisy.<br>2. Part-of-speech tagging and slot labeling are complete.<br>3. Language: Chinese | S | Task Oriented | User logs of a conversational robot service | No | N/A | Utterances: 5132 | Slot<br>POS | Agent reply<br>Intent<br>API call<br>Database |
| DSTC-2 | 1. Human2Bot restaurant booking dataset.<br>2. For usage refer to: http://camdial.org/~mh521/dstc/downloads/handbook.pdf<br>3. Each dialogue is stored in its own folder, which contains a log and a label file. | M | Task Oriented | Restaurant booking | Yes | http://camdial.org/~mh521/dstc/ | Train 1612 calls<br>Dev 506 calls<br>Test 1117 dialogs | Slot<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | Intent<br>API call<br>Database |
| CamRest676 | The CamRest676 Human2Human dataset contains the following three JSON files:<br>1. CamRest676.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn.<br>2. CamRestDB.json: the Cambridge restaurant database file, containing restaurants in the Cambridge, UK area and a set of attributes.<br>3. The ontology file, specifying all the values the three informable slots can take. | M | Task Oriented | Restaurant booking | Yes | Download: https://www.repository.cam.ac.uk/handle/1810/260970<br>Paper: https://arxiv.org/abs/1604.04562 | Total 676 dialogues<br>Total 1500 turns<br>Train:Dev:Test 3:1:1 (test set not given) | Slot<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | Intent<br>API call<br>Database |
| Human-human goal oriented dataset | 1. Maluuba released a travel booking dataset.<br>2. Designed for a new task: frame tracking (allows comparison between entities from the dialogue history).<br>3. Homepage: https://datasets.maluuba.com/Frames<br>4. Human2Human | M | Task Oriented | Travel booking | Yes | Download: https://datasets.maluuba.com/Frames/dl<br>Paper: https://arxiv.org/abs/1706.01690<br>https://1drv.ms/b/s!Aqj1OvgfsHB7dsg42yp2BzDUK6U | Dialogues 1369<br>Turns 19986<br>Average user satisfaction (from 1-5) 4.58 | Frame<br>User agenda<br>User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>API call<br>User satisfaction<br>Task success<br>Database<br>Entity reference | Intent |
| DSTC4 | 1. The data, named TourSG, consists of 35 dialog sessions on tourist information for Singapore, collected from Skype calls between three tour guides and 35 tourists.<br>2. All recorded dialogs, with a total length of 21 hours, have been manually transcribed and annotated with speech acts and semantic labels at the turn level.<br>3. Homepage: http://www.colips.org/workshop/dstc4/data.html<br>4. Human2Human | M | Task Oriented | Query tourist information | No | N/A | Train 20 dialogs<br>Test 15 dialogs | Speech act (User & Agent)<br>Semantic labels (Intent? User & Agent)<br>Topic for turn (Intent?) | N/A |
| Movie Booking Dataset | 1. (Microsoft) Raw conversational data collected via Amazon Mechanical Turk, with annotations provided by domain experts.<br>2. Human2Human | M | Task Oriented | Movie booking | Yes | Download: https://github.com/MiuLab/TC-Bot#data<br>Paper: TC-bot | 280 dialogues<br>Approximately 11 turns per dialogue | User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>Intent<br>Slots | Database<br>API call |
| Microsoft Dialogue Challenge | Human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experiment platform with built-in simulators in each domain, for training and evaluation purposes. | M | Task Oriented | Movie-ticket booking<br>Restaurant reservation<br>Taxi ordering | Yes | Paper: https://arxiv.org/pdf/1807.11125.pdf | Movie-ticket booking: 11 intents, 29 slots, 2890 dialogues<br>Restaurant reservation: 11 intents, 30 slots, 4103 dialogues<br>Taxi ordering: 11 intents, 29 slots, 3094 dialogues | Intent<br>Slots | Database<br>API call |
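
One way to use the survey is to pick datasets by the labels a model needs. The following is a minimal sketch, assuming the "Public Accessible" and "Included Label" columns above are transcribed by hand into Python records. The `DatasetEntry` class, the `find_datasets` helper, and the lowercase label keys (such as `user_act`) are hypothetical names chosen for illustration; they are not part of this repository or of any dataset's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One survey row; only the fields needed for filtering (hypothetical)."""
    name: str
    public: bool
    included: frozenset  # label keys transcribed from "Included Label"


# Hand-transcribed from the survey table; label keys are illustrative.
SURVEY = [
    DatasetEntry("Dialog bAbI tasks data", True,
                 frozenset({"api_call", "database"})),
    DatasetEntry("Stanford Dialog Dataset", True,
                 frozenset({"database", "user_act", "agent_act"})),
    DatasetEntry("Stanford Dialog Dataset Labeled", False,
                 frozenset({"slot", "intent"})),
    DatasetEntry("灵犀数据", False, frozenset({"slot", "pos"})),
    DatasetEntry("DSTC-2", True,
                 frozenset({"slot", "user_act", "agent_act"})),
    DatasetEntry("CamRest676", True,
                 frozenset({"slot", "user_act", "agent_act"})),
    DatasetEntry("Human-human goal oriented dataset", True,
                 frozenset({"frame", "user_agenda", "user_act", "agent_act",
                            "api_call", "user_satisfaction", "task_success",
                            "database", "entity_reference"})),
    DatasetEntry("DSTC4", False,
                 frozenset({"speech_act", "semantic_labels", "topic"})),
    DatasetEntry("Movie Booking Dataset", True,
                 frozenset({"user_act", "agent_act", "intent", "slot"})),
    DatasetEntry("Microsoft Dialogue Challenge", True,
                 frozenset({"intent", "slot"})),
]


def find_datasets(required, public_only=True):
    """Return names of datasets whose included labels cover `required`."""
    required = frozenset(required)
    return [
        entry.name
        for entry in SURVEY
        if required <= entry.included and (entry.public or not public_only)
    ]


if __name__ == "__main__":
    # Public datasets usable for joint intent detection and slot filling.
    print(find_datasets({"slot", "intent"}))
    # Public datasets that ship both API calls and a database.
    print(find_datasets({"api_call", "database"}))
```

With the table as transcribed here, the first query returns the Movie Booking Dataset and the Microsoft Dialogue Challenge, and the second returns the Dialog bAbI tasks data and the Human-human goal oriented dataset. Passing `public_only=False` also includes the datasets marked "No" in the accessibility column.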
