
Loss function/Labels for neural network used? #4

Open
abhigenie92 opened this issue Jun 25, 2017 · 2 comments

Comments

@abhigenie92

I understand the backpropagation in policy gradient networks, but I am not sure how your code works with Keras's auto-differentiation.

That is, how do you transform it into a supervised learning problem?
For example, the code below:

Y = self.probs + self.learning_rate * np.squeeze(np.vstack([gradients]))

Why is Y not a one-hot vector for the action taken?
You compute the gradient as if the action taken were correct, i.e. as if Y were the one-hot vector, then multiply it by the reward at the corresponding time step. But during training you feed the result in as the correction.
I think one could instead multiply the rewards by the one-hot vector and feed that in directly.

If possible, please clarify my doubt. :)
https://github.com/keon/policy-gradient/blob/master/pg.py#L67
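
For context, here is a minimal sketch of what the quoted pseudo-label construction computes. The names probs, actions, rewards, and learning_rate mirror the quantities in pg.py, but this standalone helper is an assumption for illustration (discounting and reward normalization are omitted), not the repo's actual code:

import numpy as np

def build_targets(probs, actions, rewards, learning_rate):
    # probs: (T, num_actions) action probabilities output by the network
    # actions: (T,) indices of the actions actually taken
    # rewards: (T,) returns for each time step
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(len(actions)), actions] = 1.0
    # reward-weighted difference between the taken action and the prediction
    gradients = (one_hot - probs) * rewards[:, None]
    # pseudo-labels, matching the quoted line:
    # Y = self.probs + self.learning_rate * np.squeeze(np.vstack([gradients]))
    return probs + learning_rate * gradients

Feeding these targets to model.train_on_batch with the compiled cross-entropy loss lets Keras's auto-differentiation push the output toward the reward-weighted one-hot labels, which is how the policy gradient update is cast as a supervised problem.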

@abhigenie92 abhigenie92 changed the title Loss function for neural network? Loss function/Labels for neural network used? Jun 25, 2017
@LinkToPast1990

opt = Adam(lr=self.learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=opt)

First, I think the error signal should be gradient = (y - prob) * reward.
Second, we have already set the learning_rate of opt.

So, should Y be self.probs + np.vstack([gradients])?
Then Y - Y_predict = Y - self.probs = np.vstack([gradients]).
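
To illustrate the point with hypothetical numbers (a sketch, not code from the repo), the two target constructions differ only in whether the learning rate is applied inside the label:

import numpy as np

probs = np.array([[0.3, 0.7]])        # network output for one state
one_hot = np.array([[1.0, 0.0]])      # action actually taken
reward = 2.0                          # return for that step
learning_rate = 0.001

gradients = (one_hot - probs) * reward

Y1 = probs + learning_rate * gradients  # as in pg.py: Y1 - probs = lr * gradients
Y2 = probs + gradients                  # suggested here: Y2 - probs = gradients

# Adam(lr=learning_rate) already scales each parameter update, so Y1 applies
# the learning rate twice; Y2 keeps the error signal Y - probs equal to the
# raw reward-weighted gradient.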
