This repository has been archived by the owner on Aug 28, 2021. It is now read-only.
I have a few questions; I would be grateful if you could answer them.

Could you please tell me what this argument does and what it affects?
parser.add_argument('--smart_bob', action='store_true', default=False, help='make Bob smart again')
In *Deal or No Deal? End-to-End Learning for Negotiation Dialogues*, Section 6.1 says: "During reinforcement learning, we use a learning rate of 0.1, clip gradients above 1.0, and use a discount factor of γ=0.95."

But in reinforce.py the default is `parser.add_argument('--gamma', type=float, default=0.99, help='discount factor')`. Does this difference matter for learning?

Likewise, reinforce.py has `parser.add_argument('--clip', type=float, default=0.1, help='gradient clip')`, while the paper clips gradients above 1.0.
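For intuition on why the γ value can matter, here is a minimal sketch (not code from this repo) of how the discount factor weights a reward that arrives only at the end of a dialogue, as in REINFORCE-style training where the agreement reward comes at the final turn:

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = sum_k gamma**k * r_{t+k} for each step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A 10-turn dialogue with reward only at the end:
rewards = [0.0] * 9 + [1.0]
print(discounted_returns(rewards, 0.95)[0])  # first turn sees ~0.63 of the reward
print(discounted_returns(rewards, 0.99)[0])  # first turn sees ~0.91 of the reward
```

With γ=0.99 the early turns receive a much larger share of the final reward than with γ=0.95, so the two defaults really do assign different credit to early actions.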
The REINFORCE learning rate and gradient clip: the defaults in the script are

parser.add_argument('--rl_lr', type=float, default=0.002, help='RL learning rate')
parser.add_argument('--rl_clip', type=float, default=2.0, help='RL gradient clip')

but the code snippet in the README uses `--rl_lr 0.00001 \` and `--rl_clip 0.0001 \`. Does this difference matter for learning?
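To illustrate what a clip value like 0.0001 versus 2.0 does, here is a small sketch (my own, not repo code) of norm-based gradient clipping, mimicking the behavior of PyTorch's `torch.nn.utils.clip_grad_norm_`:

```python
import math

def clip_by_norm(grads, max_norm):
    """Scale gradients down so their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads

grads = [3.0, 4.0]                  # L2 norm = 5.0
print(clip_by_norm(grads, 2.0))     # rescaled to norm 2.0
print(clip_by_norm(grads, 0.0001))  # rescaled to a tiny norm: near-zero updates
```

So with `--rl_clip 0.0001` almost every update is scaled down to a very small norm, which (together with the tiny `--rl_lr`) makes the RL fine-tuning far more conservative than the script defaults.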
How long does it take to run the reinforce script with the arguments from that snippet?

During training, many dialogues appear in which one of the agents repeats a single word over and over. Is that expected?