
Big Rescaling + O(0.001) IC + Complicated-Shape Domain #345

Closed
Ryszard2 opened this issue Jul 23, 2021 · 18 comments

@Ryszard2 commented Jul 23, 2021

Good morning Dr. @lululxvi,

I'm stuck on a problem I already asked about some time ago. It's a Riemann problem for the shallow water equations, and the domain is complicated and big:

  • Lmax = 16,500 m
  • T = 2,000 s

This is the original IC for the water height:

[Image: IC Original]

I rescaled the problem heavily in a way that keeps the equations unchanged. I did it this way because otherwise coefficients very far from unity appear in the equations.
Now:

  • Lmax = 1 m
  • T = 15.57 s
  • IC unfortunately becomes O(0.001)

[Image: IC Scaled]
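
For reference, a minimal sketch of a classical form-invariant SWE scaling that reproduces these numbers (shown only for orientation; this recipe is an assumption, not necessarily the exact one used): x* = x/L0, t* = t/T0, u* = u T0/L0, h* = h g T0^2/L0^2.

g  = 9.81            # m/s^2
L0 = 16500.0         # length scale (Lmax)
T0 = 2000.0 / 15.57  # time scale implied by T -> 15.57 s, about 128.5 s
U0 = L0 / T0         # velocity scale, about 128.5 m/s
H0 = U0**2 / g       # height scale, about 1.7e3 m
print(2.0 / H0)      # a hypothetical 2 m water height becomes ~1e-3, i.e. O(0.001)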

The problems are that:

  • I can't get the training to converge at all, essentially because the loss for the height IC won't come down
  • I can't hard-constrain the ICs or the BCs, because doing so produces NaNs
  • After a lot of trial and error, I concluded that I need many training points, which small-batch resampling can provide, but the height IC still keeps the loss fairly high
  • The input data all come in the form of matrices, which really seems to be a problem for learning the IC

The code seems correct to me; I can't figure out what I'm missing to get the job done.

Thank you
Riccardo

@lululxvi (Owner)

As you discovered, the first issue could be the training of the IC. Could you try training with only the IC loss, to see whether the IC can be learned well? You can set the PDE to None, or set the weight of the PDE loss to 0.
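
A minimal sketch of the loss_weights route (assuming the usual Model.compile/train API; model comes from the rest of the script, and the number and order of the loss components must match your own PDE/IC/BC setup):

# Zero out every loss component except the troublesome IC term.
# Eight components are assumed here, as in this problem; adjust to your setup.
model.compile("adam", lr=0.001, loss_weights=[0, 0, 0, 1, 0, 0, 0, 0])
model.train(epochs=200000)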

@Ryszard2 (Author)

Thank you @lululxvi

Say I run a couple hundred thousand epochs to learn the one IC that bothers me (there are 3 ICs, but the other 2 are not a big deal), and I end up with a train loss like this:

[0.00e+00, 0.00e+00, 0.00e+00, 4.19e-07, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00]

as a result of this loss weights setup:

model.compile('adam', lr=0.001, loss_weights=[0, 0, 0, 1e2, 0, 0, 0, 0])

How do I proceed in the next training phase?
Am I supposed to set the remaining weights so that every component of the loss is O(1e-7) to match the loss I've achieved with the IC component?

By the way, I settled on a [50]*4 NN; I can't afford a bigger one :-(

Thank you
Riccardo

@Ryszard2 (Author)

Another question, @lululxvi

is it reasonable to sample 1,000 training points, train for about 20,000 epochs, then sample 1,000 new training points for another 20,000 epochs, and continue this way up to a few hundred thousand epochs?

@lululxvi (Owner)

You can use the two-stage training in this paper: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007575 with the code https://github.com/alirezayazdani1/SBINNs.

Yes, resampling makes sense. Here is an example: https://github.com/lululxvi/deepxde/blob/master/examples/diffusion_1d_resample.py using dde.callbacks.PDEResidualResampler
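
A minimal sketch of wiring the resampler into training (assuming the usual Model API; model comes from the rest of the script, and the linked example shows the full setup):

# Redraw the training points every 100 iterations during training.
resampler = dde.callbacks.PDEResidualResampler(period=100)
model.compile("adam", lr=0.001)
losshistory, train_state = model.train(epochs=100000, callbacks=[resampler])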

@Ryszard2 (Author)

Thank you @lululxvi

Do the numbers in diffusion_1d_resample.py

num_domain   = 40,
num_boundary = 20,
num_initial  = 10,

resampler = dde.callbacks.PDEResidualResampler(period=100)

make sense also for the small 2D domain I'm working on? Or should the numbers of points be increased, keeping the resampling period fixed at 100?

If this works, it would completely bypass, in a situation with limited computational power, the annoying trade-off between the NN size and the number of points.

@lululxvi (Owner) commented Sep 1, 2021

The resampling period seems OK. You may increase the other three numbers by a factor of 10. It is always good to have more points if the computational cost is not a problem.

@Ryszard2 (Author) commented Sep 1, 2021

Thank you @lululxvi

First of all, I rescaled the IC completely, like this:

[Image: index]

But learning it is still a challenge I have never faced before. So far I have done it without small-batch resampling, only by setting the non-BC loss weights to zero. After 600k epochs, some bothersome non-physical negative water remains, stuck in the narrow coves near the border:

[Image: ic1]

Now I'm going all-in for another few hundred thousand epochs, starting from the good-but-not-great model I already obtained after 600k epochs, training on the IC/BC only with this setting:

num_domain   = 0,
num_boundary = 400,
num_initial  = 400,

resampler = dde.callbacks.PDEResidualResampler(period=100)

hoping for faster learning of this absurd IC :-(
I prefer not to move on to learning the PDE outputs until the IC is properly learned.

Edit: It didn't work out; this setting made the learning of the IC much worse :-(

Re-Edit: I noticed that setting the training like this:

num_domain   = 0,
num_boundary = 1000,
num_initial  = 4000,

resampler = dde.callbacks.PDEResidualResampler(period=100)

works a little better. It's not as good as full gradient descent with 10,000 IC points, but it seems that sampling a huge number of IC points every 100 epochs could, for this specific case, lead to something good. I can't figure out why that is.

@lululxvi (Owner) commented Sep 5, 2021

To clarify, you currently only learn the IC and BC, right? Does the full gradient descent with 10,000 IC points work?

@Ryszard2 (Author) commented Sep 8, 2021

@lululxvi I made a simple test and realized that the share of sampled points that goes to the complex, non-homogeneous little sector is less than 10% of the points generated, i.e. 90% of the training points are placed in the easy, homogeneous big sector!

This is a pretty big waste, so I decided to anchor the points by myself with the random_points generator:

[Image: ic]

With this number and distribution of training points, and with the [256]*5 NN I'm finally using, full gradient descent makes total sense, and I'm sure the IC will be learned perfectly.
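
A rough sketch of this kind of manual anchoring (the geometry, the narrow-sector test, and the point counts below are placeholders, not the actual setup):

import numpy as np
import deepxde as dde

# Stand-in geometry for the sketch; the real problem uses a complicated polygon.
geom = dde.geometry.Rectangle([0, 0], [1, 1])

def in_narrow_sector(xy):
    # Hypothetical test for the narrow sector; replace with your own region check.
    return (xy[:, 0] < 0.2) & (xy[:, 1] > 0.6)

# Oversample the domain, then keep a disproportionate share of narrow-sector points.
cand = geom.random_points(100000)
narrow = cand[in_narrow_sector(cand)]
wide = cand[~in_narrow_sector(cand)][:5000]
ic_xy = np.vstack([narrow, wide])
# Place the points on the initial slice t = 0; they can then be passed to
# dde.data.TimePDE via the anchors= argument.
ic_anchors = np.hstack([ic_xy, np.zeros((len(ic_xy), 1))])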

Still, the spatio-temporal problem of the PDE and BC remains.
The highly irregular distribution of the IC forces me to choose the training points; I'll anchor the points myself, but I can't run full gradient descent with that many points, so I'd ask you:

  • say I anchor a zillion points in the spatio-temporal domain: would it be possible for DeepXDE to split that array into small batches? I would like to:

    1. create the giant array of training points for the IC, BC, and domain
    2. ask DeepXDE to take 1,000 points, train on them for 100 epochs, then take the next 1,000, and so on until the end of the array.

Is that possible?
Maybe every small batch should include points from every component (PDE, BC, and IC)?

Honestly, I can't see another way to tackle this problem.

@lululxvi (Owner)

It is great to see that you made some good progress.

If the training points are sampled by yourself, then DeepXDE won't split them into batches; currently DeepXDE only splits the points that it samples itself. Here are a few thoughts:

  • Actually, I think you may not need a zillion points. Usually a relatively dense, fixed set of points works well enough, and there is no need to use lots of batches of different points.
  • If you still want to use batches, then you may have to modify the function:
    def resample_train_points(self):

    to define yourself how the points are resampled each time.
    Then you can use it together with dde.callbacks.PDEResidualResampler.
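
A rough sketch of that idea (the attribute and method names below, e.g. train_x_all and train_next_batch, follow the DeepXDE source of that period; verify them against the version you run):

import deepxde as dde

class PoolResamplingPDE(dde.data.TimePDE):
    """Sketch: draw each resample from a precomputed pool of hand-picked points."""

    def __init__(self, *args, point_pool=None, batch_size=1000, **kwargs):
        self.point_pool = point_pool  # big (N, 3) array of hand-picked (x, y, t) points
        self.batch_size = batch_size
        self._offset = 0
        super().__init__(*args, **kwargs)

    def resample_train_points(self):
        start = self._offset
        self._offset = (self._offset + self.batch_size) % len(self.point_pool)
        # Replace the randomly drawn points with the next slice of the pool,
        # then rebuild the training arrays as the parent class does.
        self.train_x_all = self.point_pool[start:start + self.batch_size]
        self.train_x, self.train_y = self.train_next_batch()
        return self.train_x, self.train_y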

@Ryszard2 (Author) commented Sep 16, 2021

Indeed, @lululxvi, I was thinking about a kind of trade-off:

  • I sample the residual points myself, with the nonuniform distribution I want, ONLY for the IC (and maybe the BC too; I'll think about it)
  • I let DeepXDE do the sampling/resampling of the domain points

The idea is to make sure the complicated IC is learned well. There would be no strict need for my anchoring afterwards, since the hyperbolic nature of the problem moves the solution downstream, where the polygon widens and the training points are naturally more numerous.
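
A rough sketch of this trade-off (assuming the anchors= argument of dde.data.TimePDE and the resampler callback; geomtime, pde, the ICs/BC, net, and my_ic_anchors are placeholders from the rest of the script):

# Hand-picked IC points (t = 0) go in via anchors=, while DeepXDE keeps drawing
# and periodically redrawing its own domain/boundary points.
data = dde.data.TimePDE(
    geomtime, pde, [ic_h, ic_u, ic_v, bc],
    num_domain=10000, num_boundary=1000, num_initial=0,
    anchors=my_ic_anchors,
)
model = dde.Model(data, net)
model.compile("adam", lr=0.001)
resampler = dde.callbacks.PDEResidualResampler(period=100)
model.train(epochs=200000, callbacks=[resampler])

The anchors should be preserved across resamples, since the data object appends them to the freshly drawn random points; this is worth double-checking against your DeepXDE version.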

@Rdfing commented Sep 18, 2021

@Ryszard2 I am very interested in this project. Could you please share a link to your paper once you get DeepXDE working on the 2D SWE? Thanks, Haochen

@Ryszard2 (Author)

@Rdfing actually this is a test case for my MSc thesis, a considerably challenging case.

@Rdfing commented Sep 24, 2021

Wow, that is impressive! I have gotten the SWE to work with PINNs for some simple 2D benchmark cases, but I have not tried this benchmark.

Ryszard2 closed this as completed Nov 7, 2021
@FZUcipher

(quoting @lululxvi's earlier reply about training with only the IC loss)

@lululxvi Excuse me, Lulu. I would like to ask whether to set the weights of all components except the IC to 0 (including the BC and PDE), or just set the weight of the PDE to 0?

@lululxvi (Owner) commented Jun 3, 2022

@FZUcipher See the FAQ entry "Q: I failed to train the network or get the right solution, e.g., large training loss, unbalanced losses."

@thevelvetunderground

(quoting @Ryszard2's comment of Sep 1, 2021 above)

Hello, I'd like to know: how do you visualize the training points randomly sampled in the geometry?

@gongsunlijiu

(quoting @Ryszard2's comment of Sep 8, 2021 above)

@Ryszard2 Hello, I saw your approach to the sampling points, which is very effective.
I am now thinking about the same problem and would like to ask how you customize the sampling points to achieve different sampling densities for the narrow and wide areas. Thank you very much.
