Questions regarding Strategies for Structuring Story Generation #993

Closed

fabrahman opened this issue Aug 7, 2019 · 4 comments

@fabrahman

Hi,

I recently read this paper and really enjoyed it. Thanks for the nice work.
I have a few questions, mainly about the evaluation, and would appreciate your answers.

1- When computing the Longest Common Sub-sequence, did you mean the longest common substring? There is a subtle difference between a sub-sequence and a substring: did you require the tokens to be consecutive or not? And for that, did you concatenate the whole training set into a single sequence, find the LCS for each test instance, and average across the test set?

2- There are also two nice verb-diversity evaluations.

  • (1) the number of unique verbs averaged across all stories: does this mean the number of unique verbs averaged across all verbs in all stories? That is, if the value is 10.3, does it mean that 10.3% of all the generated verbs are unique?

3- Regarding Fig 8: I assume the WritingPrompts dataset doesn't contain genre labels, right? Did you hand-annotate the genre for a couple of stories, or did you use a classifier? If the latter, would you be able to release the genre information along with the code?

Many thanks in advance,

@huihuifan
Contributor

I'm answering here instead of over email so it's more searchable.

#1 - For each generated story of 250 words, we compared against all of the stories in the training set and tried to find the longest common sequence, e.g. the phrase "it was a dark and stormy night" would count for 7, and the tokens need to be consecutive. Then we took the average across the test set. Note this is a bit slow to compute.
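
For concreteness, here is a minimal sketch of that metric (my reconstruction, not the released evaluation code): a token-level longest common substring between each generated story and each training story, taking the max over the training set and averaging over the test set. The function names and the per-story comparison are assumptions.

```python
# Hypothetical reconstruction of the copying metric described above;
# not the authors' code.

def longest_common_substring(gen_tokens, train_tokens):
    """Length of the longest run of consecutive tokens shared by both lists."""
    best = 0
    prev = [0] * (len(train_tokens) + 1)  # classic O(n*m) dynamic program
    for g in gen_tokens:
        curr = [0] * (len(train_tokens) + 1)
        for j, t in enumerate(train_tokens, start=1):
            if g == t:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def average_copy_length(generated_stories, training_stories):
    """Average, over generated stories, of the longest consecutive token
    sequence that also appears verbatim somewhere in the training set."""
    scores = []
    for gen in generated_stories:
        gen_tokens = gen.split()[:250]  # fixed-length stories, per the thread
        scores.append(max(
            longest_common_substring(gen_tokens, train.split())
            for train in training_stories
        ))
    return sum(scores) / len(scores)
```

As written, this is quadratic per story pair, which is why the metric is slow to compute against a large training set.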

#2 - For all of the evaluations in the work, we compared 250-word generated stories. The reason is that if one model tends to generate longer outputs than another, it would have an advantage, since the number of verbs generated is highly correlated with length. The percentage reported is out of 250 words. So the # of unique verbs is: take the generated story -> tag with spaCy POS tagging -> count the number of unique verbs -> divide by 250 to get the % -> average across all generated stories in the test set.
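
A hedged sketch of that pipeline, assuming spaCy's en_core_web_sm model; whether verbs are deduplicated by surface form or by lemma is my guess, not something the thread specifies:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def unique_verb_percentage(stories, length=250):
    """Unique verbs as a % of a fixed 250-word budget, averaged over stories."""
    scores = []
    for story in stories:
        doc = nlp(" ".join(story.split()[:length]))  # truncate to 250 words
        # Deduplicating by lemma is an assumption; surface forms would also work.
        verbs = {tok.lemma_ for tok in doc if tok.pos_ == "VERB"}
        scores.append(100.0 * len(verbs) / length)
    return sum(scores) / len(scores)
```

So a reported value of 10.3 would mean that, on average, about 26 distinct verbs appear per 250-word story.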

#3 - Yes, the dataset does not contain genre labels. One thing I was interested in was whether we generated different entities depending on the story context (e.g. fantasy stories may have more unique entity names), so I annotated a few stories myself.

@fabrahman
Author

Thank you for your detailed answer.
I have just one more question, about Section 6.1, where you compute the negative log-likelihood.
For example, when computing p(x|z*) in Stage 2, did you compute the loss with respect to the ground-truth stories in the test set?
And is it conditioned on the "Generated[?] Action Plan, Z*"?
Really appreciate your answer.

@huihuifan
Contributor

In stage 2, it's the loss with respect to the ground-truth stories in the test set, conditioned upon stage 1. I think GitHub has lost whatever you wrote next to "Generated", because it looks like [?], but concretely:

In the summary-based decomposition, stage 1 is the loss of prompt -> summary with respect to the silver summaries (i.e. the output of the summarizer on the stories), and stage 2 is the loss of summary -> story. The stage 2 loss is not conditioned upon the generation of stage 1, but on the real summaries.
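
To make the conditioning concrete, here is a hedged PyTorch sketch of the stage-2 evaluation as described: the per-token negative log-likelihood of the gold story given the real (silver) summary, under teacher forcing. The model interface is hypothetical, not fairseq's actual API.

```python
import torch
import torch.nn.functional as F

def stage2_nll(model, summary_tokens, story_tokens):
    """NLL of the ground-truth story conditioned on the *real* summary,
    not on whatever stage 1 generated."""
    # Assumed interface: model returns logits of shape [story_len, vocab_size]
    # under teacher forcing on the gold story.
    logits = model(source=summary_tokens, target=story_tokens)
    return F.cross_entropy(logits, story_tokens)  # mean per-token NLL
```

The key point is the second argument: `summary_tokens` comes from the dataset's silver summaries, so stage 2 is evaluated independently of stage 1's generation quality.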

@fabrahman
Copy link
Author

@huihuifan No, actually I put the [?] there on purpose, to express my doubt about whether it is conditioned on the ground-truth extracted action plan (the real summaries, in your example) or on the action plan generated in stage 1.
From your explanation, I gather that it is the loss of "extracted action plans from true stories" -> true stories.

Thanks again.

facebook-github-bot pushed a commit that referenced this issue Jan 17, 2020
Summary:
As per fairinternal/fairseq-py#983, we find that you cannot just modify the default of arch, because argparse will still "require" you to input something on the command line. I'm not sure why this is required when task, criterion, etc. (and actually, anything else) are not. Additionally, the [argparse docs](https://docs.python.org/3/library/argparse.html#required) say that required=True is bad form and should be avoided.
Pull Request resolved: fairinternal/fairseq-py#993

Differential Revision: D19446805

Pulled By: myleott

fbshipit-source-id: 53221bb8bc1cea66197c5cee48a307b88d0d20b7
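
For illustration, a minimal argparse sketch (a standalone toy, not the actual fairseq parser) of the behavior the commit describes: with required=True, the default is never used because argparse still demands the flag on the command line, so the fix is to drop required=True and let the default apply.

```python
import argparse

parser = argparse.ArgumentParser()
# Before: the default is effectively dead code, since argparse errors out
# whenever --arch is omitted from the command line.
# parser.add_argument("--arch", required=True, default="transformer")

# After: --arch is optional and the default actually takes effect.
parser.add_argument("--arch", default="transformer")

args = parser.parse_args([])  # no --arch given
print(args.arch)              # -> "transformer"
```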
louismartin pushed a commit to louismartin/fairseq that referenced this issue Mar 24, 2020
moussaKam pushed a commit to moussaKam/language-adaptive-pretraining that referenced this issue Sep 29, 2020