
Help me understand :) #2

Open
borismilicevic opened this issue Oct 17, 2018 · 5 comments

Comments

@borismilicevic

I am currently looking into your code. I've read the paper behind it, and I must say it is most impressive and really interesting. The code is quite readable and for the most part easy to understand, but there are small details I need clarification on. I am rather new to TensorFlow's estimator mechanism, but I've done a lot of reading just to understand your code better.

  1. The agent contains all the trainable parts, i.e. the complete network architecture, within itself. While the LSTM cell is stored as a private attribute, the dense layer behind it is created "just in time". At the start of each new batch the agent begins with "reuse=False", creates a new dense layer, then changes "reuse" to True. So at t=1 a new dense layer is created, and for t>1 (for the rest of the batch of episodes) the existing dense layer is used.
    This confuses me. Why do you treat the dense layer differently from the LSTM cell? Does this mean that a new "blank" dense layer is created for each new batch of episodes?
  2. I assume the same LSTM cell, once created, is used at each training step. But its internal state is reset with each new batch of episodes, so the LSTM's memory is not transferred from batch to batch. Am I right?

Would you be so kind as to answer these?
Thanks in advance!

@markpwoodward
Owner

Thank you for reaching out.

  1. I think you mostly have it. tf.layers.dense creates an object behind the scenes and reuses it when "reuse" is set to True, as you pointed out. For clarity I probably should have instantiated a tf.layers.Dense object in the constructor and then call()'ed it in next_action(), as I do with rnn_cell. The Agent code only sets up the graph, and it is only run once (well, next_action() is called time_steps times); training starts after the graph is set up. So next_action() isn't actually "called" on each episode: each episode feeds the inputs, which run through the already-created graph.
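The create-once/reuse-thereafter behavior described above can be sketched framework-agnostically (all names here are hypothetical stand-ins, not the repository's actual code; tf.layers.dense's variable scoping is only mimicked, not reproduced):

```python
class Dense:
    """Stand-in for a dense layer that owns its weights (created once)."""
    def __init__(self, units):
        self.units = units  # real weights would be allocated here

_layer_cache = {}

def dense(units, name, reuse):
    # Mimics tf.layers.dense's reuse flag: reuse=False creates the layer
    # (erroring if it already exists), reuse=True fetches the existing one.
    if not reuse:
        if name in _layer_cache:
            raise ValueError(f"layer {name!r} already exists")
        _layer_cache[name] = Dense(units)
    return _layer_cache[name]

# t = 1: reuse=False creates the layer; t > 1: reuse=True returns the same one.
first = dense(32, "policy_out", reuse=False)
later = dense(32, "policy_out", reuse=True)
assert first is later  # same layer (same weights) at every time step
```

Since the graph is built only once, the dense layer is also created only once; no "blank" layer appears per batch.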

  2. You are correct. Each batch starts with Agent.rnn_state_t zeroed out, since the episode graph starts with self.rnn_state_t equal to rnn_cell.zero_state().
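In pure Python, the per-batch reset amounts to something like the following sketch (hypothetical helper names, mirroring rnn_cell.zero_state; the LSTM step itself is omitted):

```python
def zero_state(batch_size, num_units):
    """LSTM state is a (cell state, hidden state) pair, both all-zero."""
    c = [[0.0] * num_units for _ in range(batch_size)]
    h = [[0.0] * num_units for _ in range(batch_size)]
    return c, h

def start_batch(episodes, num_units=8):
    # Whatever state the previous batch ended with is discarded:
    # every batch of episodes begins from the zero state.
    return zero_state(len(episodes), num_units)

c, h = start_batch(["episode_0", "episode_1"])
assert all(v == 0.0 for row in c for v in row)
assert all(v == 0.0 for row in h for v in row)
```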

Best,
Mark

@borismilicevic
Author

borismilicevic commented Oct 19, 2018

Thank you for replying. I got some further questions! :)

  1. How would you interpret a gradual decrease of the loss while accuracy on the validation set does not increase (it stays below 25%)? I am using my own data, which contains 3 possible labels. Does that mean the agent is eager to request the label? Maybe changing the reward parameters could change its behavior.

  2. What would be the consequence of keeping the LSTM memory between batches of episodes during training? In that case, I assume, I would have to keep last_label as well.

Boris

@markpwoodward
Owner

markpwoodward commented Oct 19, 2018 via email

@borismilicevic
Author

Thank you for responding thus far but I have two more questions.

  1. Could you give me any advice on how to set the number of LSTM units (num_lstm_units)?
    What should I base this parameter on? Maybe the shape of the input feature vector?
    If my data has only two features, for example, I doubt I should keep the number of LSTM units at 200.
  2. Is there any particular reason the discount factor is set to 0.5? Isn't it more common in Q-learning for it to be set around 0.9? I feel this greatly decreases the importance of later steps in an episode, meaning only the first few steps have a larger impact on the loss function. Any advice on how to set this parameter?
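The effect in question 2 is easy to quantify with a toy example (constant rewards of 1, not data from the repository): the discounted return with gamma = 0.5 saturates after only a few steps, while gamma = 0.9 keeps later steps relevant.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * r_t for a single episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0] * 10  # reward of 1 at each of 10 steps

# With gamma = 0.5, the weight on step t is 0.5**t: step 5 already
# contributes only ~3% of step 0's weight.
print(round(discounted_return(rewards, 0.5), 3))  # ≈ 1.998 (near its limit of 2)
print(round(discounted_return(rewards, 0.9), 3))  # ≈ 6.513
```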

Thanks in advance!

@markpwoodward
Owner

markpwoodward commented Nov 2, 2018 via email
