
Sampling code flags descriptions (support for --help?) #27

Closed · ilopezfr opened this issue Feb 15, 2019 · 13 comments
Labels: good first issue, help wanted

Comments

@ilopezfr commented Feb 15, 2019

Is there a list of the flags for both conditional and unconditional models with their definitions?
(I looked in the blog and paper and couldn't find any mention.)

In particular, for reproducibility purposes, it'd be great to know the definitions of temperature and top_k and how choosing different values for these affects the results.

Thanks!

@opencoca commented Feb 17, 2019

From exploring things, I've found that the following flags work with the sample generators:

   --model_name='117M'  # Which model to use
   --seed=None          # Unsure
   --nsamples=1         # How many samples to return
   --batch_size=None    # This doesn't work with the unconditional sample generator
   --length=None        # How much text to return in each sample
   --temperature=1      # How predictable or surprising the generated text will be
   --top_k=0            # Unsure
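
For reference, these flags are just the keyword arguments of the scripts' entry functions; the repo's sampling scripts hand those functions to python-fire, which turns keyword arguments into CLI flags. A minimal sketch of that pattern (not the actual script body):

    import fire

    def sample_model(model_name='117M', seed=None, nsamples=1,
                     batch_size=1, length=None, temperature=1, top_k=0):
        """Print samples from the model (sketch; real body omitted)."""
        print(locals())

    if __name__ == '__main__':
        # fire exposes keyword arguments as flags, e.g.
        #   python3 sample.py --temperature=0.8 --top_k=40
        fire.Fire(sample_model)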

@ilopezfr (Author) commented Feb 18, 2019

Thanks, @opencoca! I think most of them are self-explanatory, but I'm curious to know the exact definitions of temperature and top_k and how choosing different values for these affects the results.

@kettenfett commented Feb 18, 2019

I also want to know what --top_k does.
Also, a seed is usually used to make outputs reproducible, but this is not the case for gpt-2 when --seed is used. So what does it do then?

@WuTheFWasThat (Collaborator) commented Feb 20, 2019

@philippHRO RE seed: that sounds like a bug, I filed #58.

Temperature scales logits before sampling prior to softmax. Top_k truncates the set of logits considered to those with the highest values.
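
For concreteness, here is a minimal NumPy sketch of both operations (an illustration only, not the repo's actual sampling code; the function and variable names are mine):

    import numpy as np

    def sample_token(logits, temperature=1.0, top_k=0, rng=None):
        """Sample one token index from logits with temperature and top-k."""
        rng = rng or np.random.default_rng()
        # Temperature: scale logits before the softmax. T < 1 sharpens the
        # distribution; T > 1 flattens it.
        logits = np.asarray(logits, dtype=np.float64) / temperature
        # Top-k: keep only the k highest logits; 0 means no truncation.
        if top_k > 0:
            cutoff = np.sort(logits)[-top_k]
            logits = np.where(logits < cutoff, -np.inf, logits)
        # Softmax over the (possibly truncated) logits, then sample.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)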

@WuTheFWasThat WuTheFWasThat changed the title Model flags definition Sampling code flags descriptions (support for --help?) Feb 20, 2019
@WuTheFWasThat WuTheFWasThat added help wanted good first issue labels Feb 20, 2019
@kettenfett commented Feb 22, 2019

> @philippHRO RE seed: that sounds like a bug, I filed #58.
>
> Temperature scales logits before sampling prior to softmax. Top_k truncates the set of logits considered to those with the highest values.

So by setting top_k=40, I'm saying "Give me the top 40% of logits"? Does that mean a smaller value for top_k generates better ("more natural") text outputs, because the net is only using the best logits?

@madisonmay (Contributor) commented Feb 22, 2019

@philippHRO it means "Give me the top 40 logits". So setting top_k=1 should produce deterministic results, because there is no randomness in the sampling process -- you always select the token with the highest probability. "More natural" is a subjective thing, but setting top_k=1 is almost definitely not what you want if you want "more natural" text outputs.

Running python3 src/generate_unconditional_samples.py --top_k=1 produces text that loops because the model always goes with the "safe" option that's usually just a high frequency word / phrase. See generated sample below:

The first time I saw the new version of the game, I was so excited. I was so excited to see the new version of the game, I was so excited to see the new version of the game, I was so excited to see the new version of the game, I was so excited to see the new version of the game, I was so excited to see the new version of the game...

You need some amount of randomness / variation for things to feel more natural.
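
To see why top_k=1 is deterministic: after the truncation only the single highest logit survives, so the softmax puts all of the probability mass on it and "sampling" reduces to argmax. A tiny illustration (the example values are mine):

    import numpy as np

    logits = np.array([2.0, 1.0, 0.5])
    # top_k=1: mask everything except the single highest logit.
    masked = np.where(logits < logits.max(), -np.inf, logits)
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    print(probs)  # [1. 0. 0.] -- all mass on one token,
                  # so "sampling" always picks the argmax.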

@ArmaanBhullar (Contributor) commented Feb 24, 2019

Hi, I came here looking for the same thing. It looks like the --help flag needs fixing so it returns the parameters and their definitions.
If no one has picked this up, I'd like to take it.

@WuTheFWasThat (Collaborator) commented Feb 24, 2019

@ArmaanBhullar go for it!

@ArmaanBhullar (Contributor) commented Feb 26, 2019

Based on the above discussion and a pass through the code, I came up with the following doc (for the interactive model):

"""
Interactively run the model
:model_name=117M : Which model to use (default 117M)
:seed=None : Seed for random number generators; fix the seed to reproduce results
:nsamples=1 : number of samples to return
:batch_size=None : Number of batches (the model runs nsamples // batch_size times; each batch run is independent of the previous run)
:length=None : Length of text to be returned, inclusive of punctuation etc.
:temperature=1 : Controls degree of surprise in the final output
:top_k=0 : Number of logits to be sampled; top_k=1 gives deterministic output
"""
@WuTheFWasThat can you confirm if this fits?
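
A rough, self-contained sketch of the batch_size / nsamples interaction that doc describes (the run_model_batch stand-in is hypothetical; the real scripts run a TensorFlow sampling graph instead):

    # Hypothetical stand-in for one batched model run.
    def run_model_batch(batch_size):
        return ['<sample text>'] * batch_size

    nsamples, batch_size = 4, 2
    assert nsamples % batch_size == 0
    generated = 0
    # Each iteration is an independent run producing batch_size samples.
    for _ in range(nsamples // batch_size):
        for text in run_model_batch(batch_size):
            generated += 1
            print('=' * 40, 'SAMPLE', generated, '=' * 40)
            print(text)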

@WuTheFWasThat (Collaborator) commented Feb 26, 2019

Sure, thanks! Some nits/comments:

- for seed, mention it's an integer? (or if using argparse or some library enforcing it, then no need)
- for nsamples, "number" should be capitalized
- batch_size should have a default, right?
- length is in tokens

Here are my attempts at temperature and top_k:

:temperature=1: Float value controlling randomness in the Boltzmann distribution. Lower temperature results in less random completions. As the temperature approaches zero, the model will become deterministic and repetitive. Higher temperature results in more random completions.
:top_k=0: Integer value controlling diversity. 1 means only 1 word is considered for each step (token), resulting in deterministic completions, while 40 means 40 words are considered at each step. 0 is a special setting meaning no restrictions. 40 generally is a good value.
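
A quick numeric illustration of the temperature description above (the example logits are mine):

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    logits = np.array([2.0, 1.0, 0.5])
    for t in (0.1, 1.0, 10.0):
        print(t, softmax(logits / t).round(3))
    # 0.1   [1.    0.    0.   ]  -> nearly deterministic
    # 1.0   [0.629 0.231 0.14 ]  -> the unscaled distribution
    # 10.0  [0.362 0.327 0.311]  -> close to uniform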

@ArmaanBhullar (Contributor) commented Feb 27, 2019

Thanks, @WuTheFWasThat! I incorporated these in a pull request.

@rohuns commented Mar 4, 2019

@WuTheFWasThat Has there been any analysis of the optimal temperature and top_k settings for performance on some predefined set of prompts? Optimal could mean human eval, perplexity, or any other eval metric.

@WuTheFWasThat (Collaborator) commented Mar 5, 2019

No analysis. We started using top_k = 40 based on a very small amount of experimentation, but we really haven't explored it very thoroughly.
