In the notebook gpt2-sentiment-control.ipynb (Optimize model section):

`logits = [torch.tensor(output[1]["score"]) for output in sentiment_pipe(texts, **sentiment_pipe_kwargs)]`

Why do we store `output[1]["score"]` as the reward? I assumed this follows the statement "We will use the logits for the positive class as a reward signal for the language model," but does `sentiment_pipe` always place the positive class at index 1 of its output?
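For context, a more robust way to extract the reward would be to look the positive class up by label rather than by position. The sketch below is a hypothetical illustration (not the notebook's actual code), assuming each pipeline output is a list of `{"label": ..., "score": ...}` dicts, as a `transformers` text-classification pipeline returns when all class scores are requested; the notebook would then wrap each score in `torch.tensor` as before.

```python
# Hedged sketch: select the positive-class score by label instead of a
# fixed index. Assumes each per-text output is a list of dicts with
# "label" and "score" keys; positive_scores is a hypothetical helper.

def positive_scores(outputs, positive_label="POSITIVE"):
    """Return the positive-class score for each text, looked up by label."""
    return [
        next(d["score"] for d in per_text if d["label"] == positive_label)
        for per_text in outputs
    ]

# Mock output mimicking sentiment_pipe(texts, **sentiment_pipe_kwargs),
# with the class order deliberately varying between texts:
mock = [
    [{"label": "NEGATIVE", "score": 0.1}, {"label": "POSITIVE", "score": 0.9}],
    [{"label": "POSITIVE", "score": 0.7}, {"label": "NEGATIVE", "score": 0.3}],
]
print(positive_scores(mock))  # → [0.9, 0.7]
```

Selecting by label makes the reward extraction independent of the order in which the pipeline happens to return the class scores.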