I can’t figure out whether SGD / mini-batch training is actually implemented and available in DeepXDE without the user modifying the source code.
I’m trying to train a [3] + [128]*4 + [3] NN on a 2D Riemann problem, where I would like to use some tens of thousands of training points. Doing it with full-batch gradient descent would take about one hour per 1000 epochs (the most powerful hardware available to me is the Google Colab GPU, whatever GPU that is).
I tried a small-batch-like approach on my own, without touching the source code, partly because I’m not sure what I would need to change.
It’s more or less like this:

1. I train on a small batch of domain/BC/IC points, say 200/100/100, for about 10000 epochs, saving the model with `save_better_only=True`.
2. I restart training, restoring the saved model, on a brand-new set of 200/100/100 points for another 10000 epochs.
3. And so on...
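The restart-and-resample idea above can be illustrated on a toy problem (plain NumPy, no DeepXDE): a single parameter is fit by gradient descent, each outer "run" draws a brand-new batch of points, and the parameter is carried over between runs, as restoring the model does. This is only a sketch of the principle, not DeepXDE code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(n):
    # a fresh batch of "collocation points" for each outer run
    # (the analogue of resampling the 200/100/100 sets)
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, 3.0 * x  # toy target: y = 3x

w = 0.0                       # model parameter, carried over ("restored") between runs
for run in range(5):          # each run mimics one 10000-epoch DeepXDE session
    x, y = sample_batch(200)  # brand-new training points for this run
    for epoch in range(200):  # inner gradient-descent epochs on this batch
        grad = 2.0 * np.mean((w * x - y) * x)
        w -= 0.1 * grad

print(w)  # converges near 3.0 even though every run saw different points
```

The point is that nothing is conceptually wrong with resampling between runs: the parameters keep improving as long as each batch is drawn from the same distribution.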
One thing I don’t like about this approach is that I want the saved model to be a real improvement, but every time I restart training, after the first 1000 new epochs I get a message that the train loss improved from inf to the new loss, e.g.:

Epoch 1000: train loss improved from inf to 8.91e-01, saving model to ...

where 8.91e-01 is much higher than the loss I reached at the end of the previous 10000-epoch run. I feel like each restart overwrites the checkpoint with a worse model.
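The "improved from inf" message suggests the checkpoint callback resets its best-loss record on every run. One workaround (a sketch of my own, not DeepXDE API) is to persist the best loss to disk yourself and only checkpoint when the new loss beats the best loss across *all* runs:

```python
import json
import os

def load_best(path="best_loss.json"):
    # carry the best loss over from previous runs instead of restarting at inf
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["best"]
    return float("inf")

def maybe_save(loss, path="best_loss.json"):
    # returns True when the caller should checkpoint the model
    best = load_best(path)
    if loss < best:
        with open(path, "w") as f:
            json.dump({"best": loss}, f)
        return True
    return False
```

After each run one would call `maybe_save(final_loss)` and only overwrite the saved model when it returns True, so a restart that ends at 8.91e-01 can no longer clobber a checkpoint that had reached a lower loss.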
Another approach I’m testing, inspired by #305, is:

1. I train on a medium-size set of IC points, say 5000 points for 10000 epochs, setting the loss weights and the numbers of points for the PDE and BC terms to zero. I save the model with `save_better_only=True`.
2. I restart training, restoring the model, on a brand-new set of 0/0/5000 points for another 10000 epochs.
3. And so on...
4. When I’m satisfied with the IC results, I move on to the BC part, always keeping some points for the IC, say 0/5000/100.
5. When I’m satisfied with the BC results, I move on to the PDE part, always keeping some points for the BC and IC, say 5000/100/100.
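The staged IC → BC → PDE schedule above can be written down as data and driven by one loop. Here `train_stage` is a hypothetical stand-in for one compile/train/restore cycle (in DeepXDE the weights would presumably go to the `loss_weights` argument of `model.compile`, and the point counts to the data object); this is a sketch of the schedule, not a working DeepXDE script:

```python
# Staged curriculum from the post: first IC only, then BC (+ some IC points),
# then PDE (+ some BC/IC points). Each "points" tuple is
# (num_domain, num_boundary, num_initial); a weight of 0 disables that loss term.
stages = [
    {"points": (0, 0, 5000),     "weights": (0.0, 0.0, 1.0)},  # IC only
    {"points": (0, 5000, 100),   "weights": (0.0, 1.0, 1.0)},  # add BC, keep some IC
    {"points": (5000, 100, 100), "weights": (1.0, 1.0, 1.0)},  # add PDE, keep BC/IC
]

def run_curriculum(stages, train_stage):
    # train_stage(points, weights) would wrap one DeepXDE
    # compile + train + save/restore session for that stage
    for stage in stages:
        train_stage(stage["points"], stage["weights"])
```

Making the schedule explicit like this at least keeps the stages reproducible and easy to rearrange while experimenting.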
That’s what I’m thinking about; I’m not sure it is a sound way to train a NN. I’m still in the middle of testing it, and it still “improves” the train loss from inf to the new loss every time it restarts, but it seems to work better than the first approach.
Thank you
Riccardo