Questions regarding Strategies for Structuring Story Generation #993
Comments
I am answering here instead of by email so it's more searchable.

#1 - For each generated story of 250 words, we compared against all of the stories in the training set and found the longest common sequence of consecutive tokens; e.g. the phrase "it was a dark and stormy night" would count for 7. Then we took the average across the test set. Note this is a bit slow to compute.

#2 - For all of the evaluations in the work, we compared 250-word generated stories. The reason is that if one model tends to generate longer outputs than another, it would have an advantage, since the number of verbs generated is highly correlated with length. The percentage reported is out of 250 words. So the number of unique verbs is computed as: take the generated story -> tag with spaCy POS tagging -> count the number of unique verbs -> divide by 250 to get a percentage -> average across all generated stories in the test set.

#3 - Yes, the dataset does not contain genre labels. One thing I was interested in is whether we generate different entities depending on the story context (e.g. fantasy stories may have more unique entity names), so I annotated a few stories myself. |
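For concreteness, here is a minimal sketch of how these two metrics could be computed. This is not the original evaluation code; the function names, the whitespace tokenization, and the choice of spaCy model are assumptions made only for illustration.

```python
# A minimal sketch (not the authors' evaluation script) of the two metrics described above.
# Assumes stories are whitespace-tokenizable strings and that spaCy with an English model
# (e.g. en_core_web_sm) is available for POS tagging.
import spacy

nlp = spacy.load("en_core_web_sm")

def longest_common_token_substring(gen_tokens, train_tokens):
    """Length of the longest run of consecutive tokens shared by the two sequences."""
    best = 0
    # Classic O(n*m) dynamic programming over token positions.
    prev = [0] * (len(train_tokens) + 1)
    for g in gen_tokens:
        cur = [0] * (len(train_tokens) + 1)
        for j, t in enumerate(train_tokens, start=1):
            if g == t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def copy_length(generated_story, training_stories):
    """Longest consecutive overlap between one generated story and any training story."""
    gen_tokens = generated_story.split()
    return max(longest_common_token_substring(gen_tokens, t.split())
               for t in training_stories)

def unique_verb_ratio(generated_story, length=250):
    """Fraction of unique verbs in a generated story, normalized by the fixed 250-word length."""
    doc = nlp(generated_story)
    verbs = {tok.lower_ for tok in doc if tok.pos_ == "VERB"}
    return len(verbs) / length
```

Averaging copy_length and unique_verb_ratio over all generated test-set stories would then give numbers analogous to the longest-common-sequence and unique-verb metrics described in the answer.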
Thank you for your detailed answer. |
In stage 2, it's the loss with respect to the ground-truth stories in the test set, conditioned upon stage 1. I think GitHub has lost whatever you wrote next to "Generated" because it looks like [?], but concretely: in the summary-based decomposition, stage 1 is the loss of prompt -> summary with respect to the silver summaries (i.e. the output of the summarizer on the stories), and stage 2 is the loss of summary -> story. The stage 2 loss is not conditioned upon the generation of stage 1, but on the real summaries. |
@huihuifan No, I actually put the [?] there on purpose, to express that I was unsure whether it is conditioned upon the ground-truth extracted action plan (or the real summaries, in your example) or the action plan generated in stage 1. Thanks again. |
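Written out, the two-stage objective described above can be read as two independent teacher-forced losses. The notation below is a sketch of that decomposition, not taken from the paper: x is the prompt, z the silver summary, and y the story.

```latex
% Stage 1 and stage 2 are each trained against gold/silver targets,
% never against stage 1's own generated output.
\mathcal{L}_1 = -\log p_{\theta}(z \mid x)   % stage 1: prompt x -> silver summary z
\qquad
\mathcal{L}_2 = -\log p_{\phi}(y \mid z)     % stage 2: silver summary z -> story y
```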
Summary: As per fairinternal/fairseq-py#983, we find that you cannot just modify the defaults of arch because it will still "require" you to input something on the command line. I'm not sure why this is required where task, criterion, etc (and actually, anything else) are not. Additionally, the [argparse docs](https://docs.python.org/3/library/argparse.html#required) claim that required=True is bad form so should be avoided. Pull Request resolved: fairinternal/fairseq-py#993 Differential Revision: D19446805 Pulled By: myleott fbshipit-source-id: 53221bb8bc1cea66197c5cee48a307b88d0d20b7
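The argparse point above can be illustrated with a small, self-contained example. This is not the actual fairseq parser; the flag name and the default value are only illustrative. Giving the argument a default and leaving required unset lets the command line omit it entirely.

```python
import argparse

parser = argparse.ArgumentParser()

# With required=True, argparse errors out even when a default is supplied,
# so the flag must always be passed on the command line:
# parser.add_argument("--arch", required=True, default="transformer")

# Supplying only a default makes the flag optional: omitting it uses the default.
parser.add_argument("--arch", default="transformer")

args = parser.parse_args([])      # no --arch given on the command line
print(args.arch)                  # -> "transformer"
```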
Hi,
I recently read this paper and really enjoyed it. Thanks for the nice work.
In this regard, I have a few questions, mainly about the evaluation, which I would appreciate your answering.
1- When computing the Longest Common Sub-sequence, did you mean the longest common substring? There is a subtle difference between a sub-sequence and a substring: did you require the tokens to be consecutive or not? And for that, did you concatenate the whole training set into a single sequence, find the LCS for each test instance, and average across the test set?
2- There are also two nice verb diversity evaluations. How is the reported percentage of unique verbs computed, and is it affected by the length of the generated stories?
3- Regarding Fig. 8: I assume the "WritingPrompt" dataset doesn't contain genre labels, right? Did you hand-annotate the genre for a couple of stories, or did you use a classifier for that? If so, will you be able to release the genre information along with the code as well?
Many thanks in advance,