Help me understand :) #2
Thank you for reaching out.
Best,
Thank you for replying. I have some further questions! :)
Boris
1) Look at the "requests" plot to confirm your theory. If requests are high and not decreasing as you train, then you might need to play with the rewards, add a convnet before the LSTM (if your data is larger images), increase the capacity of the network, or switch from the LSTM to attention. If requests are decreasing, then just train longer. If requests are very low, then maybe you are penalizing requests too much. I would simplify the problem to the same 2 classes per episode with the same labels and see if the network can memorize them (the LSTM is useless in this case). If it learns that, then randomize the labels; if it learns that too, then the general setup is working, and you can increase the pool of classes that you sample the 2 classes from. If you run into a problem, you may need to add a convnet or switch away from LSTMs. A "matching network" architecture with a Q-function output might scale better to harder problems than an LSTM: if you received the label for an example, you would add the pair to your example set (https://arxiv.org/abs/1606.04080).
2) I think the RL^2 paper tried not resetting the LSTM state, but it only hurt (https://arxiv.org/abs/1611.02779). That fits with my intuition: I would want anything that is consistent across episodes to be trained into the weights of the network, in order to minimize the work done by the LSTM.
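The staged simplification in 1) could be sketched as an episode sampler like the following (a hypothetical helper for illustration; `sample_episode` and its arguments are made up here, not taken from this repo):

```python
import random

def sample_episode(class_pool, episode_len=30, randomize_labels=True, seed=None):
    """Sample a 2-class episode for the staged debugging setup.

    Stage 1: class_pool = [0, 1], randomize_labels=False -> pure memorization.
    Stage 2: randomize_labels=True -> labels reshuffled every episode.
    Stage 3: grow class_pool -> back toward the full problem.
    """
    rng = random.Random(seed)
    classes = rng.sample(class_pool, 2)   # pick this episode's 2 classes
    labels = [0, 1]
    if randomize_labels:
        rng.shuffle(labels)               # per-episode label assignment
    mapping = dict(zip(sorted(classes), labels))
    episode = []
    for _ in range(episode_len):
        c = rng.choice(classes)
        episode.append((c, mapping[c]))   # (class id, within-episode label)
    return episode
```

If stage 1 fails, the problem is in the basic setup rather than in the meta-learning aspect, since nothing changes between episodes.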
…On Fri, Oct 19, 2018 at 6:13 AM borismilicevic ***@***.***> wrote:
Thank you for replying. I have some further questions!
1.
How would you interpret a gradual decrease of the loss function while accuracy on the validation set does not increase (it stays below 25%)? I am using my own data, which contains 3 possible labels. Does that mean the agent is eager to request labels? Maybe changing the reward parameters could change its attitude.
2.
What would be the consequence of keeping LSTM memory between batches of episodes during training? In that case, I assume, I would have to keep last_label as well.
Boris
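To make the per-episode reset discussed in 2) concrete, here is a toy loop contrasting zeroing the recurrent state at episode boundaries with carrying it over (a sketch only; `rnn_step`, `run_episodes`, and all shapes are invented for illustration, not this repo's code):

```python
import numpy as np

def rnn_step(x, h, W, U):
    # One step of a vanilla RNN cell: h' = tanh(W x + U h).
    return np.tanh(W @ x + U @ h)

def run_episodes(episodes, hidden=8, reset_state=True, seed=0):
    """Run episodes through a toy RNN, optionally zeroing the hidden
    state at each episode boundary."""
    rng = np.random.default_rng(seed)
    in_dim = episodes[0][0].shape[0]
    W = rng.normal(scale=0.5, size=(hidden, in_dim))
    U = rng.normal(scale=0.5, size=(hidden, hidden))
    h = np.zeros(hidden)
    finals = []
    for ep in episodes:
        if reset_state:
            # Anything consistent across episodes should live in the
            # weights, not be carried over in the recurrent state.
            h = np.zeros(hidden)
        for x in ep:
            h = rnn_step(x, h, W, U)
        finals.append(h.copy())
    return finals
```

With resetting, identical episodes produce identical final states; with carry-over, the second episode starts from leftover state, so its behavior depends on what came before it.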
Thank you for responding thus far but I have two more questions.
Thanks in advance!
1. I don't have any problem-specific advice. Generally, larger is better, until you overfit or training is too slow to iterate on.
2. No current reason. There was a bug early in development, and 0.5 made it "work" before I realized the bug. I agree that larger values are more common and will capture longer dependencies.
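On 2., the down-weighting Boris describes is easy to quantify: a reward k steps in the future is weighted by gamma**k in the return, so gamma = 0.5 effectively ignores everything beyond the first few steps, while gamma = 0.9 does not (a quick standalone illustration, not code from this repo):

```python
def discounted_weights(gamma, horizon):
    # Weight gamma**k applied to a reward k steps in the future in
    # the return G_t = sum_k gamma**k * r_{t+k}.
    return [gamma ** k for k in range(horizon)]

# With gamma = 0.5 the reward 10 steps ahead is scaled by ~0.001,
# while with gamma = 0.9 it still carries ~0.35 of its full weight.
w_half = discounted_weights(0.5, 11)
w_nine = discounted_weights(0.9, 11)
```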
…On Tue, Oct 30, 2018 at 6:39 AM borismilicevic ***@***.***> wrote:
Thank you for responding thus far but I have two more questions.
1. Could you give me any advice on how to set the number of LSTM units (num_lstm_units)? What should I base this parameter on? Maybe the shape of the input feature vector? If my data has only two features, for example, I doubt I should keep the number of LSTM units at 200.
2. Any particular reason the discount factor is set to 0.5? Isn't it more common in Q-learning for it to be set around 0.9? I feel like this greatly decreases the importance of later steps in an episode, meaning only the first few steps of an episode have a significant impact on the loss function. Any advice on how to set this parameter?
Thanks in advance!
I am currently looking into your code. I've read the paper behind it, and I must say it is most impressive and really interesting. The code is quite readable and for the most part easy to understand, but there are small details I need clarification on. I am rather new to TensorFlow's Estimator mechanism, but I've done a lot of reading to understand your code better.
This confuses me: why do you treat the dense layer differently from the LSTM cell? Does this mean that with each new batch of episodes a new "blank" dense layer is created?
Would you be so kind as to answer these questions?
Thanks in advance!