I can’t figure out whether SGD / mini-batch training is actually implemented and available in DeepXDE without the user modifying the source code.
I’m trying to train a [3] + [128]*4 + [3] NN on a 2D Riemann problem, where I would like to use some tens of thousands of training points. Doing it with full-batch gradient descent would take about one hour per 1000 epochs (the most powerful hardware available to me is the Google Colab GPU, whatever GPU that is).
I tried a small-batch-like approach on my own, without touching the source code, partly because I’m not sure what I would need to change.
It’s more or less like this:

1. I train on a small batch of domain/BC/IC points, say 200/100/100, for about 10000 epochs, saving the model with `save_better_only=True`.
2. I restart training, restoring the saved model, on a brand-new set of 200/100/100 points for another 10000 epochs.
3. And so on...
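The restart-and-resample idea above can be illustrated on a toy problem (plain NumPy, no DeepXDE): a single parameter is fit by gradient descent, each outer "run" draws a brand-new batch of points, and the parameter is carried over between runs, as restoring the model does. This is only a sketch of the principle, not DeepXDE code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(n):
    # a fresh batch of "collocation points" for each outer run
    # (the analogue of resampling the 200/100/100 sets)
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, 3.0 * x  # toy target: y = 3x

w = 0.0                       # model parameter, carried over ("restored") between runs
for run in range(5):          # each run mimics one 10000-epoch DeepXDE session
    x, y = sample_batch(200)  # brand-new training points for this run
    for epoch in range(200):  # inner gradient-descent epochs on this batch
        grad = 2.0 * np.mean((w * x - y) * x)
        w -= 0.1 * grad

print(w)  # converges near 3.0 even though every run saw different points
```

The point is that nothing is conceptually wrong with resampling between runs: the parameters keep improving as long as each batch is drawn from the same distribution.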
One thing I don’t like about this approach is that I want the saved model to be a real improvement, but every time I restart training, after the first 1000 new epochs I get a message that the train loss improved from inf to the new loss, e.g.:

Epoch 1000: train loss improved from inf to 8.91e-01, saving model to ...

where 8.91e-01 is much higher than the loss I reached at the end of the previous 10000-epoch run. I feel like each restart overwrites the checkpoint with a worse model.
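The "improved from inf" message suggests the checkpoint callback resets its best-loss record on every run. One workaround (a sketch of my own, not DeepXDE API) is to persist the best loss to disk yourself and only checkpoint when the new loss beats the best loss across *all* runs:

```python
import json
import os

def load_best(path="best_loss.json"):
    # carry the best loss over from previous runs instead of restarting at inf
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["best"]
    return float("inf")

def maybe_save(loss, path="best_loss.json"):
    # returns True when the caller should checkpoint the model
    best = load_best(path)
    if loss < best:
        with open(path, "w") as f:
            json.dump({"best": loss}, f)
        return True
    return False
```

After each run one would call `maybe_save(final_loss)` and only overwrite the saved model when it returns True, so a restart that ends at 8.91e-01 can no longer clobber a checkpoint that had reached a lower loss.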
Another approach I’m testing, inspired by #305, is:

1. I train on a medium-size set of IC points, say 5000 points for 10000 epochs, setting the loss weights and the numbers of points for the PDE and BC terms to zero. I save the model with `save_better_only=True`.
2. I restart training, restoring the model, on a brand-new set of 0/0/5000 points for another 10000 epochs.
3. And so on...
4. When I’m satisfied with the IC results, I move on to the BC part, always keeping some points for the IC, say 0/5000/100.
5. When I’m satisfied with the BC results, I move on to the PDE part, always keeping some points for the BC and IC, say 5000/100/100.
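The staged IC → BC → PDE schedule above can be written down as data and driven by one loop. Here `train_stage` is a hypothetical stand-in for one compile/train/restore cycle (in DeepXDE the weights would presumably go to the `loss_weights` argument of `model.compile`, and the point counts to the data object); this is a sketch of the schedule, not a working DeepXDE script:

```python
# Staged curriculum from the post: first IC only, then BC (+ some IC points),
# then PDE (+ some BC/IC points). Each "points" tuple is
# (num_domain, num_boundary, num_initial); a weight of 0 disables that loss term.
stages = [
    {"points": (0, 0, 5000),     "weights": (0.0, 0.0, 1.0)},  # IC only
    {"points": (0, 5000, 100),   "weights": (0.0, 1.0, 1.0)},  # add BC, keep some IC
    {"points": (5000, 100, 100), "weights": (1.0, 1.0, 1.0)},  # add PDE, keep BC/IC
]

def run_curriculum(stages, train_stage):
    # train_stage(points, weights) would wrap one DeepXDE
    # compile + train + save/restore session for that stage
    for stage in stages:
        train_stage(stage["points"], stage["weights"])
```

Making the schedule explicit like this at least keeps the stages reproducible and easy to rearrange while experimenting.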
That’s what I’m thinking about; I’m not sure it is a sound way to train a NN. I’m still in the middle of testing it, and it still “improves” the train loss from inf to the new loss every time it restarts, but it seems to work better than the first approach.
Thank you
Riccardo