Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

[TOD][Datasets][Easy] Taskmaster(1) in Conversations format #4189

Merged
merged 141 commits into from Dec 23, 2021

Conversation

moyapchen
Copy link
Contributor

Title. I only include System + UserSimulator Teachers here since that's all we need right now from dataset.

There's non-fb people that made edits in the original version of Taskmaster, so keep those teachers around too.

Datasets added in this substack:

  • Google SGD

    • Google SGD Simulation Splits (In-domain, Out-domain)
    • MetalWoz
    • MSR_E2E
    • Multidogo
    • MultiWoz V2.2
    • Taskmaster
    • Taskmaster2
    • Taskmaster3 (TicketTalk)

    Test plan:
    Regression test, parlai dd of dataset

Moya Chen added 25 commits November 15, 2021 20:15
See documentation block in `tod_agents.py`
As noted in the README, this agent takes data generated from `tod_world_script.py` and dumps it out to a teacher.

(Note that I tried setting up a regression test for this teacher, but I ran into issues getting it to save the output directory to not be something that included my local homedir name in it..)
See documentation block in `tod_agents.py`

(I'm not 100% sure if `conftest.py` is a right file to change, though I did notice that `pytest.ini` was necessary to get pytest to run.)
See documentation in `tod_world_script.py` for usage.
Refactor Google SGD away from old format into TOD Conversations format.

Datasets added in this substack:
* *Google SGD*
   * Google SGD Simulation Splits (In-domain, Out-domain)
* MetalWoz
* MSR_E2E
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Code for processing Google SGD into In-domain and Out-domain data via `build.py`, using via agents.

Datasets added in this substack:
* Google SGD
   * **Google SGD Simulation Splits (In-domain, Out-domain)**
* MetalWoz
* MSR_E2E
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Code for process MetalWoz into System + User Simulator teachers

Getting it to be in the Conversations format is a pain, so I don't even try here. (It's documented this way in the paper as well)

----------------------------
Datasets added in this substack:
* Google SGD
   * Google SGD Simulation Splits (In-domain, Out-domain)
* **MetalWoz**
* MSR_E2E
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Title. I only include System + UserSimulator Teachers here since that's all we need right now from dataset.

Datasets added in this substack:
* Google SGD
   * Google SGD Simulation Splits (In-domain, Out-domain)
* MetalWoz
* **MSR_E2E**
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Title. I only include System + UserSimulator Teachers here since that's all we need right now from dataset.

Datasets added in this substack:
* Google SGD
   * Google SGD Simulation Splits (In-domain, Out-domain)
* MetalWoz
* **MSR_E2E**
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Title. I only include System + UserSimulator Teachers here since that's all we need right now from dataset.

There are so many versions of MultiWoz, but this one is closest to our simulator.

---------------------------------

Datasets added in this substack:
* Google SGD
   * Google SGD Simulation Splits (In-domain, Out-domain)
* MetalWoz
* **MSR_E2E**
* Multidogo
* MultiWoz V2.2
* Taskmaster
* Taskmaster2
* Taskmaster3 (TicketTalk)

Test plan:
Regression test, `parlai dd` of dataset
Title. I only include System + UserSimulator Teachers here since that's all we need right now from dataset.

There's non-fb people that made edits in the original version of Taskmaster, so keep those teachers around too.
---------------
Datasets added in this substack:
* Google SGD
   * Google SGD Simulation Splits (In-domain, Out-domain)
   * MetalWoz
   * MSR_E2E
   * Multidogo
   * MultiWoz V2.2
   * **Taskmaster**
   * Taskmaster2
   * Taskmaster3 (TicketTalk)

   Test plan:
   Regression test, `parlai dd` of dataset
Moya Chen added 26 commits December 22, 2021 09:54
…ta and new one; relized I was missing a +1 in the episode length count
Base automatically changed from simpler_tod_5f_multiwoz_v22 to main December 23, 2021 02:28
@moyapchen moyapchen merged commit ce92103 into main Dec 23, 2021
@moyapchen moyapchen deleted the simpler_tod_5g_taskmaster1 branch December 23, 2021 02:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants