Loss Calculation Question #10
Hi @jpanaro, glad you are still working on this project. See my answers for your questions below:
I hope these comments help. Let me know if you have any more questions!
Your comments were very helpful! A few more small questions that cropped up:

@lvwerra assuming memory is not an issue, do you expect the code to run fine if the minibatch size is set to something > 1?
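To make the question concrete, here is a minimal sketch of what batching the update with a minibatch size > 1 could look like: instead of passing a single sample index to the update step, the batch is sliced into index chunks. The helper name and its arguments are illustrative, not the library's actual API.

```python
# Hypothetical helper: yield index chunks of `minibatch_size` covering
# all `n_samples` in a batch, so each update step sees > 1 sample.
def iter_minibatches(n_samples, minibatch_size):
    for start in range(0, n_samples, minibatch_size):
        yield list(range(start, min(start + minibatch_size, n_samples)))

# With the default batch_size = 256 and forward_batch_size = 16,
# this yields 16 minibatches of 16 samples each.
batches = list(iter_minibatches(256, 16))
```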
Hello! Thanks again for your help last time. I am now close to finishing my system, but I have a couple of questions about the loss function inside the PPOTrainer class.
For the following lines (here): using the default parameters (ppo_epochs = 4 and batch_size = 256), we run the second loop, and therefore the train_minibatch function, a total of 1024 times but with only 256 unique samples. This means we also backpropagate the calculated loss and take an optimizer step 1024 times, i.e. 4 times for every sample in the batch, since we only pass a single sample to train_minibatch. Is my understanding correct here? And if so, is there a reason we do this per sample instead of, for example, per forward_batch_size (default 16)?
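The loop structure I am describing can be sketched as follows (a minimal reconstruction, assuming the default parameters above; not the library's exact code):

```python
# With ppo_epochs = 4 and batch_size = 256, the inner loop body runs
# 4 * 256 = 1024 times: one backward pass + optimizer step per sample
# per PPO epoch, over only 256 unique samples.
ppo_epochs = 4
batch_size = 256

steps = 0
for _ in range(ppo_epochs):
    # each epoch re-visits the same batch, typically in shuffled order
    for idx in range(batch_size):
        # train_minibatch(model, sample[idx]) would do forward,
        # backward, and optimizer.step() on this single sample
        steps += 1
# steps is now 1024
```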
In the loop found here, we calculate the reversed advantages for later use in the loss function, and we run this loop for the length of the query. Would you mind explaining why it has to be approached this way? In my system there is no separation of query and response, just input features and a text caption as output, so I am struggling with how to adapt this loop in particular.
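For context, the reversed loop is the usual way Generalized Advantage Estimation (GAE) is computed in PPO implementations: the advantage at step t depends recursively on the advantage at step t+1, so iterating backwards lets it be computed in one pass. Below is a hedged, self-contained sketch of that pattern; the function name and parameter names are illustrative, not necessarily those used in the repository.

```python
# Sketch of GAE computed in reverse over per-token rewards and values.
# gamma is the discount factor, lam the GAE smoothing parameter.
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    advantages = []
    lastgaelam = 0.0
    T = len(rewards)
    for t in reversed(range(T)):
        # bootstrap from the next value; 0 beyond the last step
        next_value = values[t + 1] if t < T - 1 else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        # recursion: A_t = delta_t + gamma * lam * A_{t+1}
        lastgaelam = delta + gamma * lam * lastgaelam
        advantages.append(lastgaelam)
    # built back-to-front, so reverse into chronological order
    return advantages[::-1]
```

Because the recursion only needs reward/value sequences, it should not matter whether those tokens came from a query/response pair or, as in your case, from a caption generated from input features; the loop just has to cover the generated tokens.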