Questions regarding Strategies for Structuring Story Generation #993
Comments
I am answering here instead of by email so it's more searchable.

#1 - For each generated story of 250 words, we compared against all of the stories in the training set and found the longest common sequence of consecutive tokens; e.g. the phrase "it was a dark and stormy night" would count for 7. Then we took the average across the test set. Note this is a bit slow to compute.

#2 - For all of the evaluations in the work, we compared 250-word generated stories. The reason is that if one model tends to generate longer outputs than another, it would have an advantage, since the number of verbs generated is highly correlated with length. The percentage reported is out of 250 words. So the number of unique verbs is computed as: take the generated story -> tag with spaCy POS tagging -> count the number of unique verbs -> divide by 250 to get a percentage -> average across all generated stories in the test set.

#3 - Yes, the dataset does not contain genre labels. One thing I was interested in is whether we generate different entities depending on the story context (e.g. fantasy stories may have more unique entity names), so I annotated a few stories myself. |
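For concreteness, here is a minimal sketch of how these two metrics could be computed. This is not the original evaluation code; the function names, the whitespace tokenization, and the choice of spaCy model are assumptions made only for illustration.

```python
# A minimal sketch (not the authors' evaluation script) of the two metrics described above.
# Assumes stories are whitespace-tokenizable strings and that spaCy with an English model
# (e.g. en_core_web_sm) is available for POS tagging.
import spacy

nlp = spacy.load("en_core_web_sm")

def longest_common_token_substring(gen_tokens, train_tokens):
    """Length of the longest run of consecutive tokens shared by the two sequences."""
    best = 0
    # Classic O(n*m) dynamic programming over token positions.
    prev = [0] * (len(train_tokens) + 1)
    for g in gen_tokens:
        cur = [0] * (len(train_tokens) + 1)
        for j, t in enumerate(train_tokens, start=1):
            if g == t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def copy_length(generated_story, training_stories):
    """Longest consecutive overlap between one generated story and any training story."""
    gen_tokens = generated_story.split()
    return max(longest_common_token_substring(gen_tokens, t.split())
               for t in training_stories)

def unique_verb_ratio(generated_story, length=250):
    """Fraction of unique verbs in a generated story, normalized by the fixed 250-word length."""
    doc = nlp(generated_story)
    verbs = {tok.lower_ for tok in doc if tok.pos_ == "VERB"}
    return len(verbs) / length
```

Averaging copy_length and unique_verb_ratio over all generated test-set stories would then give numbers analogous to the longest-common-sequence and unique-verb metrics described in the answer.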
Thank you for your detailed answer. |
In stage 2, it's the loss with respect to the ground-truth stories in the test set, conditioned upon stage 1. I think GitHub has lost whatever you wrote next to "Generated" because it looks like [?], but concretely: in the summary-based decomposition, stage 1 is the loss of prompt -> summary with respect to the silver summaries (i.e. the output of the summarizer on the stories), and stage 2 is the loss of summary -> story. The stage 2 loss is not conditioned upon the generation of stage 1, but on the real summaries. |
@huihuifan No, I actually put the [?] there on purpose, to express that I was unsure whether it is conditioned upon the ground-truth extracted action plan (or the real summaries, in your example) or the action plan generated in stage 1. Thanks again. |
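Written out, the two-stage objective described above can be read as two independent teacher-forced losses. The notation below is a sketch of that decomposition, not taken from the paper: x is the prompt, z the silver summary, and y the story.

```latex
% Stage 1 and stage 2 are each trained against gold/silver targets,
% never against stage 1's own generated output.
\mathcal{L}_1 = -\log p_{\theta}(z \mid x)   % stage 1: prompt x -> silver summary z
\qquad
\mathcal{L}_2 = -\log p_{\phi}(y \mid z)     % stage 2: silver summary z -> story y
```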
Summary: As per fairinternal/fairseq-py#983, we find that you cannot just modify the defaults of arch because it will still "require" you to input something on the command line. I'm not sure why this is required where task, criterion, etc (and actually, anything else) are not. Additionally, the [argparse docs](https://docs.python.org/3/library/argparse.html#required) claim that required=True is bad form so should be avoided. Pull Request resolved: fairinternal/fairseq-py#993 Differential Revision: D19446805 Pulled By: myleott fbshipit-source-id: 53221bb8bc1cea66197c5cee48a307b88d0d20b7
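The argparse point above can be illustrated with a small, self-contained example. This is not the actual fairseq parser; the flag name and the default value are only illustrative. Giving the argument a default and leaving required unset lets the command line omit it entirely.

```python
import argparse

parser = argparse.ArgumentParser()

# With required=True, argparse errors out even when a default is supplied,
# so the flag must always be passed on the command line:
# parser.add_argument("--arch", required=True, default="transformer")

# Supplying only a default makes the flag optional: omitting it uses the default.
parser.add_argument("--arch", default="transformer")

args = parser.parse_args([])      # no --arch given on the command line
print(args.arch)                  # -> "transformer"
```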
Hi,
I recently read this paper and really enjoyed it. Thanks for the nice work.
In this regard, I have a few questions, mainly about the evaluation, which I would appreciate your answering.
1- When computing the Longest Common Sub-sequence, did you mean the longest common substring? There is a subtle difference between a sub-sequence and a substring: did you require the tokens to be consecutive or not? And for that, did you concatenate the whole training set into a single sequence, find the LCS for each test instance, and average across the test set?
2- There are also two nice verb diversity evaluations. How is the reported percentage of unique verbs computed, and is it affected by the length of the generated stories?
3- Regarding Fig. 8: I assume the "WritingPrompt" dataset doesn't contain genre labels, right? Did you hand-annotate the genre for a couple of stories, or did you use a classifier for that? If so, will you be able to release the genre information along with the code as well?
Many thanks in advance,