This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

AirDialogue dataset #2663

Merged
merged 9 commits into master from airdialogue on Jun 5, 2020


domrigoglioso
Contributor

Patch description
Add AirDialogue dataset to tasks (https://github.com/google/airdialogue)

Logs

python3 examples/display_data.py -t airdialogue

[creating task(s): airdialogue]
- - - NEW EPISODE: airdialogue - - -
Hello
   Hello
How may I help you?
   Can you help me to change my recent reservation because my trip dates are got postponed?
I will help you with that please share your name to proceed further?
   Edward hall here.
Please wait for a while.
   Sure, take your own time.
There is no active reservation found under your name to amend it.
   That's ok, thank you for checking.
Thank you for choosing us.
   
- - - NEW EPISODE: airdialogue - - -
HI
   Hello
How may I be of your address?
   I want to book a flight ticket to Las Vegas with price under 1000. Can you please help me with it?
Sure, can I know your connection limit?
   I need a single connection.
Please let me know your boarding and landing points.
   Airport codes are HOU-LAS.
[ loaded 321459 episodes with a total of 2224199 examples ]
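
As a quick sanity check on the totals in the last log line, the average number of examples per episode can be computed directly. This is plain arithmetic on the two reported counts; the variable names are illustrative only.

```python
# Counts copied from the final line of the display_data log.
episodes = 321459
examples = 2224199

# Average number of examples per episode.
avg_examples_per_episode = examples / episodes
print(f"{avg_examples_per_episode:.2f} examples per episode on average")
```

That works out to roughly seven examples per episode.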

Data tests (if applicable)
python tests/datatests/test_new_tasks.py
This took exceptionally long to run for me (it had not finished after running overnight), so I'm not sure what to do here.

@github-actions

Your PR contains a change to a task. Please paste the results of the following command into a comment:

python tests/datatests/test_new_tasks.py

Contributor

@emilydinan left a comment

awesome job! 😄 thanks for the very clean code. i have a few small changes requested in the comments


# set up path to data (specific to each dataset)
jsons_path = os.path.join(
    opt['datapath'], 'airdialogue', 'airdialogue_data', 'airdialogue'
Contributor

hmm, this path is a little bit redundant. why not just os.path.join(opt['datapath'], 'airdialogue')?

Contributor Author

The path should be a little less redundant now. It's airdialogue/airdialogue/ for all the JSON files. There's also another folder for other resources, airdialogue/resources, which is why it's still a little redundant. I could also change airdialogue/airdialogue to something like airdialogue/data if you think that would be better.

Contributor Author

Correction: it would be airdialogue_data/airdialogue now, not airdialogue/airdialogue.
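
For readers following the path discussion, here is a minimal sketch of the layout described in this thread. The `opt` dict and its `/tmp/ParlAI/data` value are hypothetical stand-ins, and placing `resources` next to the JSON folder under `airdialogue_data` is an assumption drawn from the comments above, not confirmed by the merged code.

```python
import os

# Hypothetical options dict; the datapath value is an illustrative placeholder.
opt = {"datapath": "/tmp/ParlAI/data"}

# Layout described in the thread: JSON files live under airdialogue_data/airdialogue.
jsons_path = os.path.join(opt["datapath"], "airdialogue_data", "airdialogue")

# Assumption: the extra resources folder mentioned in the thread sits beside it.
resources_path = os.path.join(opt["datapath"], "airdialogue_data", "resources")

print(jsons_path)
print(resources_path)
```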

parlai/tasks/airdialogue/agents.py (resolved)
@@ -1130,4 +1130,15 @@
"<https://arxiv.org/abs/1911.03842>."
),
},
{
"id": "AirDialogue",
"display_name": "AirDialogue",
Contributor

👍

parlai/tasks/airdialogue/agents.py (three outdated threads, resolved)
@domrigoglioso
Contributor Author

So it seems like the dataset was made private.
I'm getting a forbidden error from https://storage.googleapis.com/airdialogue/airdialogue_data.tar.gz

@emilydinan
Contributor

> So it seems like the dataset was made private.
> I'm getting a forbidden error from https://storage.googleapis.com/airdialogue/airdialogue_data.tar.gz

hmmm, i wonder if this was on purpose. maybe we can file a bug on the repo

@emilydinan
Contributor

following up here, do we still get the 403 error?
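
One way to re-check the status without downloading the whole archive is a HEAD request. The sketch below uses only the Python standard library; the helper name `http_status` and its `opener` parameter are hypothetical, introduced here for illustration and to allow testing without network access.

```python
import urllib.error
import urllib.request

URL = "https://storage.googleapis.com/airdialogue/airdialogue_data.tar.gz"


def http_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; 403 means forbidden.
        return err.code


# Usage (performs a real network request, so it is left commented out):
# print(http_status(URL))
```

A return value of 403 would mean the archive is still forbidden; 200 would mean it is publicly readable again.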

@stephenroller
Contributor

Presumably, no one replied to my GitHub issue. We should email them, I suppose.

@stephenroller
Contributor

I just emailed the first author, hopefully we get a response.

@stephenroller
Contributor

Actually I just downloaded the dataset.

@@ -31,4 +32,14 @@ def build(opt):
for downloadable_file in RESOURCES:
    downloadable_file.download_file(dpath)

# Re-organize the directory to be less redundant
print('reorganizing airdialogue directory')
Contributor

would it make more sense to just download the file to opt['datapath'] so it unzips properly?

Contributor Author

Good point, I did not think of that! I'll fix that.
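
The suggestion can be illustrated with a small standalone sketch: if the archive already contains a top-level `airdialogue_data/` folder (an assumption based on the archive name and the paths discussed above), extracting it straight into the data root keeps that layout and needs no reorganizing step. This uses only the standard library and a tiny stand-in archive rather than ParlAI's own download helpers; `make_demo_archive`, `extract_into`, and `demo.json` are hypothetical names.

```python
import io
import os
import tarfile
import tempfile


def make_demo_archive(archive_path):
    """Build a tiny stand-in archive with a top-level airdialogue_data/ folder."""
    payload = b"{}\n"
    with tarfile.open(archive_path, "w:gz") as tar:
        info = tarfile.TarInfo("airdialogue_data/airdialogue/demo.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))


def extract_into(archive_path, datapath):
    """Extract directly into datapath so the archive's own layout is kept."""
    # Use the safer 'data' extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(datapath, **kwargs)


with tempfile.TemporaryDirectory() as datapath:
    archive = os.path.join(datapath, "airdialogue_data.tar.gz")
    make_demo_archive(archive)
    extract_into(archive, datapath)
    extracted = os.path.join(datapath, "airdialogue_data", "airdialogue", "demo.json")
    found = os.path.exists(extracted)
    print(found)
```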

Contributor

@emilydinan left a comment

looks great, nice job!!! 😄

one last little change, left in comments, and then i think it should be ready to go

parlai/tasks/airdialogue/agents.py (resolved)
@domrigoglioso merged commit 3c20e47 into master on Jun 5, 2020
@domrigoglioso deleted the airdialogue branch on June 5, 2020 17:34
Gnivom pushed a commit to Gnivom/ParlAI that referenced this pull request Jun 8, 2020