
General discussion for train loss, test loss value, and result accuracy #84

Closed
sumantkrsoni opened this issue Jul 11, 2020 · 39 comments

@sumantkrsoni

Hello @lululxvi,

  • How can we reach a desired accuracy in the output result (say, an error of 1e-8)?

  • Can we correlate the train and test loss values with the accuracy of the result?

  • Also, how can we prevent overfitting and underfitting?

@lululxvi
Owner

  • You need to train the network such that the train loss is small.
  • Yes, if we have sufficient residual points, then the train loss, test loss, and accuracy are highly correlated.
  • If we have sufficient residual points, overfitting is not an issue. If the train loss is small, we don't have underfitting.
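
For completeness, a minimal sketch of checking accuracy against a reference solution; the names geom, model, and u_exact are assumed placeholders, not code from this thread:

    import deepxde as dde

    # Dense evaluation grid covering the domain and its boundary
    X = geom.uniform_points(10000, boundary=True)
    y_pred = model.predict(X)        # network prediction at the evaluation points
    y_true = u_exact(X)              # reference/analytical solution (assumed available)
    print("Relative L2 error:", dde.metrics.l2_relative_error(y_true, y_pred))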

@sumantkrsoni
Author

Thanks, @lululxvi, you have clarified my doubts.

One more doubt arises about small values of the train loss.

  • Suppose we have two values for the training loss:
    a) train loss == 1e-3
    b) train loss == 1e-4

Can we say that the train loss in (b) is better than in (a)?

@lululxvi
Owner

If the training residual points are dense, yes. You can also check the test error by using denser test points.
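
For instance, a minimal sketch of using a denser test set; geom, pde, and bcs are placeholder names for the geometry, PDE residual, and boundary conditions, and the point counts are illustrative:

    # num_test draws a separate, denser set of points that is used only to
    # report the test loss/metrics during training.
    data = dde.data.PDE(
        geom, pde, bcs,
        num_domain=5000,      # residual points used for training
        num_boundary=500,     # boundary points used for training
        num_test=20000,       # denser points used to evaluate the test loss
    )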

@sumantkrsoni
Author

Thanks @lululxvi. That sorts out my doubts.

@sumantkrsoni
Author

Dear Dr. @lululxvi,
I have trained a model, but the training graph shows the test loss as zero at every iteration (a line parallel to the x-axis at y = 0).

[image: training_graph]

Also, the result of this run differs slightly from the computed reference result.
What should I do?

Please comment.

@lululxvi
Owner

lululxvi commented Sep 8, 2020

I didn't get what you mean. In the plot, the test loss is not zero.

@sumantkrsoni
Author

Sorry @lululxvi,
actually, I was asking about the train loss.
I have trained the model for various numbers of iterations with various parameter combinations, but the training loss stays of order 1e-1.

  • From the above training loss plot, can we say my model is learning well?

@lululxvi
Owner

lululxvi commented Sep 9, 2020

No, the training loss is too large. See "Q: I failed to train the network or get the right solution, e.g., the training loss is large." in the FAQ.

@sumantkrsoni
Author

Thanks @lululxvi,
I have gone through the FAQ section to improve the training process of my model. Unfortunately, the tricks from the FAQ did not help.

For your kind information,
my governing equations are:

ds_xx + ds_yy - Ra * dT_x = 0

dT_y * ds_x - dT_x * ds_y - (dT_xx + dT_yy) = 0

Neumann BC:
dT_x = 0 at x = 0
dT_y = 0 at y = 0 and y = 1

Dirichlet BC:
s = 0 at x = 0, x = 1, y = 0, and y = 1

I have coded this in the attached file:
Vprasad1984_2.zip
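
For reference, here is a minimal sketch of how this coupled system and its boundary conditions could be set up in DeepXDE. It is a reconstruction from the equations above rather than the attached code, and names such as pde, boundary_left, and boundary_horizontal are chosen for illustration:

    import numpy as np
    import deepxde as dde

    Ra = 50
    square = dde.geometry.Rectangle([0, 0], [1, 1])

    def pde(x, y):
        # Component 0 of y is s, component 1 is T
        ds_x = dde.grad.jacobian(y, x, i=0, j=0)
        ds_y = dde.grad.jacobian(y, x, i=0, j=1)
        dT_x = dde.grad.jacobian(y, x, i=1, j=0)
        dT_y = dde.grad.jacobian(y, x, i=1, j=1)
        ds_xx = dde.grad.hessian(y, x, component=0, i=0, j=0)
        ds_yy = dde.grad.hessian(y, x, component=0, i=1, j=1)
        dT_xx = dde.grad.hessian(y, x, component=1, i=0, j=0)
        dT_yy = dde.grad.hessian(y, x, component=1, i=1, j=1)
        eq1 = ds_xx + ds_yy - Ra * dT_x
        eq2 = dT_y * ds_x - dT_x * ds_y - (dT_xx + dT_yy)
        return [eq1, eq2]

    def boundary_left(x, on_boundary):
        return on_boundary and np.isclose(x[0], 0)

    def boundary_horizontal(x, on_boundary):
        return on_boundary and (np.isclose(x[1], 0) or np.isclose(x[1], 1))

    zero = lambda x: np.zeros((len(x), 1))
    bc_s = dde.DirichletBC(square, zero, lambda x, on_b: on_b, component=0)    # s = 0 on all edges
    bc_T_left = dde.NeumannBC(square, zero, boundary_left, component=1)        # dT/dn = 0 (dT_x = 0) at x = 0
    bc_T_hori = dde.NeumannBC(square, zero, boundary_horizontal, component=1)  # dT/dn = 0 (dT_y = 0) at y = 0, 1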

The training graph for the train and test loss is:

[image: traininggraph10000]

The numerical train loss values during the epochs are:

[image: trainingvalue]

The BC on the left edge of the domain is not being learned properly.

Kindly comment and give some tips to make it learn properly.

@lululxvi
Owner

The code looks OK. Do you have an exact solution to compare against?

@sumantkrsoni
Author

Yes @lululxvi,
the streamline plot is:
[image: RPaperStreamline_Ra_50]

The isotherm plot is:
[image: RPaper_Isotherm_Ra_50]

The streamline plot obtained from the DeepXDE simulation looks similar to the reference result above, but the numerical values are not close to it.

I am a little bit confused by the final outcomes of the deep learning solution.

Kindly suggest something.

@lululxvi
Owner

lululxvi commented Sep 10, 2020

Try:

  • Increase num_domain to 10000
  • Increase network size
  • 10000 steps is definitely not enough. Train for a longer time, e.g., 1 million steps with a smaller learning rate.
  • Enforce DirichletBC exactly

@sumantkrsoni
Author

sumantkrsoni commented Sep 11, 2020

Following your suggestions, I have modified the code as follows:

  • increased num_domain to 10000:
    num_domain = 10000
    num_boundary = 2000
    num_test = 10000
    data = dde.data.PDE(square, pde, BC, num_domain, num_boundary, num_test)
  • increased the network size:

layer_size = [2] + [80]*3 + [2]

  • increased the number of training steps:

losshistory, train_state = model.train(epochs=95000, callbacks=[early_stopping], disregard_previous_best=True)

  • enforced the exact DirichletBC by transforming the outputs S and T to:
    S_new = x*(1-x)*y*(1-y)*S
    T_new = (1-x)*T

and code is:

net.apply_output_transform(
    lambda x, y: tf.concat(
        [
            x[:, 0:1] * (1 - x[:, 0:1]) * x[:, 1:2] * (1 - x[:, 1:2]) * y[:, 0:1],
            (1 - x[:, 0:1]) * y[:, 1:2],
        ],
        axis=1,
    )
)

But, unfortunately, the intermediate learning process does not look good, especially for the Neumann BC.
After enforcing the exact BC on S and T, the loss goes to zero in several BC columns, while
the training loss for the Neumann BC on T at the left edge stays at a huge value (about 1).

The intermediate training loss at 20000 out of 95000 epochs is:

[image: training 20k]

For a long time I have not been able to get the Neumann BC learned properly.

Kindly give me some tips to get rid of this learning issue.

Thanks

@lululxvi
Owner

Could you try an even larger network, e.g., [256] * 5, and a smaller learning rate, e.g., 1e-4?
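
A minimal sketch of this suggestion (the data object is assumed to come from the existing script; this is not code from the thread):

    # Larger network and smaller learning rate, as suggested above
    net = dde.maps.FNN([2] + [256] * 5 + [2], "tanh", "Glorot uniform")
    model = dde.Model(data, net)
    model.compile("adam", lr=1e-4)
    losshistory, train_state = model.train(epochs=1000000)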

@sumantkrsoni
Author

sumantkrsoni commented Sep 11, 2020

Dear @lululxvi Sir,
Can you comment on the learning graph for the different BCs on the edges?
After enforcing the exact BC, the training loss values become zero in a few columns,
and I do not understand why the remaining BCs are not learning.

Kindly comment.

Also, following your suggestion, I have further modified my code.

All the above changes remain the same; only the network and the learning rate have been modified:

network size:

layer_size = [2] + [260]*5 + [2]

learning rate:

model.compile("adam", lr=1e-4)

After the run finishes, I will let you know.

@sumantkrsoni
Author

sumantkrsoni commented Sep 11, 2020

Dear @lululxvi,
after modifying

  • network size = [256] * 5
  • learning rate = 1e-4
  • num_domain = 10000

I unfortunately get an error that looks like a memory/size limit being exceeded; snapshots:

[image: error2]

[image: error_message]

After reducing the number of hidden layers from 5 to 4, the code runs, but it takes a huge amount of time.

@sumantkrsoni
Author

sumantkrsoni commented Sep 12, 2020

Unfortunately, no luck again with your suggestions.

I kept the parameters as follows:

Rayleigh number:

    def main():
        Ra = 50
        ............

Number of domain and boundary points:

    num_domain = 7000
    num_boundary = 2000
    num_test = 10000
    data = dde.data.PDE(square, pde, BC, num_domain, num_boundary, num_test)

NN network size:

layer_size = [2] + [150]*3 + [2]

Enforcing the exact Dirichlet BC for s and T:

net.apply_output_transform(
    lambda x, y: tf.concat(
        [
            x[:, 0:1] * (1 - x[:, 0:1]) * x[:, 1:2] * (1 - x[:, 1:2]) * y[:, 0:1],
            (1 - x[:, 0:1]) * y[:, 1:2],
        ],
        axis=1,
    )
)

learning rate: 1e-8

    early_stopping = dde.callbacks.EarlyStopping(min_delta=1e-8, patience=15000)

number of epochs: 95000

    losshistory, train_state = model.train(epochs=95000, callbacks=[early_stopping], disregard_previous_best=True)

Combining all these modifications in the code, the training graph and output results are really disappointing.

Streamline (S) plot after 95k epochs:
[image: streamline_95k]

Temperature (T) plot after 95k epochs:
[image: isotherm_95k]

Training loss during the last few epochs, which is very large:
[image: training_95k]

Training graph over the whole run:
[image: training graph95k]

I am eagerly waiting for your response, @lululxvi sir.

Kindly take a look at my code file:
Vprasad1984_2b.zip

@lululxvi
Owner

It is really strange that the training loss does not go down. I am not sure what is wrong here. Could you try an easier setup, e.g., a smaller Rayleigh number?

@sumantkrsoni
Author

Sir, the Rayleigh number is already quite small (Ra = 50), which I think is very small compared to Ra = 1e4 or 1e5.

I am not sure that going below Ra = 50 would affect anything.

Could you please suggest an alternative way to deal with this coupled equation using the deep learning method?

@lululxvi
Owner

lululxvi commented Sep 15, 2020

I think this problem could be solved. Is this the only problem you are trying to solve? How about first trying other similar problems you have in mind, so that we may find out what is wrong here?

@sumantkrsoni
Author

Before moving on to the next problem, sir:
in my code script, I had defined ds_yy and dt_yy as

    ds_yy = tf.gradients(ds_x, x)[0][:, 1:2]
    dt_yy = tf.gradients(dt_x, x)[0][:, 1:2]

which, I think, should be

    ds_yy = tf.gradients(ds_y, x)[0][:, 1:2]
    dt_yy = tf.gradients(dt_y, x)[0][:, 1:2]
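
For comparison, the same second derivatives can also be written with DeepXDE's gradient utilities, which avoids chaining tf.gradients by hand (a minimal sketch; component 0 is assumed to be s and component 1 to be T):

    ds_yy = dde.grad.hessian(y, x, component=0, i=1, j=1)   # d^2 s / dy^2
    dt_yy = dde.grad.hessian(y, x, component=1, i=1, j=1)   # d^2 T / dy^2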

@sumantkrsoni
Author

I am trying to solve a similar problem,
where only the Neumann BC on the left vertical wall has been changed to a Dirichlet BC.

The same training loss problem was occurring in that simulation as well.

    bc_left_s = dde.DirichletBC(square, fnZero, boundary_left, component=0)
    bc_left_t = dde.DirichletBC(square, fnOnes, boundary_left, component=1)

Now I am going to implement it with the same parameters that you suggested,

and I will let you know the final result after the run.

@sumantkrsoni
Author

Dear @lululxvi Sir,
I have implemented the same type of problem with a minor change in the boundary condition:

  • applying a Dirichlet BC in place of the Neumann BC on the left vertical edge.

The corresponding code is:

    bc_left_s = dde.DirichletBC(square, fnZero, boundary_left, component=0)
    bc_left_t = dde.DirichletBC(square, fnOnes, boundary_left, component=1)

Unfortunately, no luck again: the training loss is still high, and the final output does not match the result from the other method.

[image: training loss]

Training values for 95k epochs:

[image: training values95k]

The result patterns match to some extent, but the numerical values are not the same.

Find the attached code script:

Vprasad1984_2c_corrected.zip

Please comment on how to improve the output values.

@sumantkrsoni
Author

@lululxvi Sir,
please suggest why the training loss is still so high.
How can I fix it?

@lululxvi
Owner

Could you try a larger network, exact DirichletBC, and running for more iterations (e.g., one million)?

@sumantkrsoni
Author

Dear @lululxvi Sir, can we predict the training loss behavior from the first 20k iterations out of one million iterations?

I am asking because one million iterations with your suggested combination of parameters take 10-11 hours to complete the training process.

From the behavior of the first 10k iterations, I could report the nature of the training process to you.

Please comment.

@lululxvi
Owner

No, theoretically it is almost impossible; otherwise the whole deep learning community would be happy. 10 hours seems acceptable.

Neural networks are very flexible, and tuning neural networks requires experience. That's why I suggest you start from slightly simpler examples.

@sumantkrsoni
Author

You are right, @lululxvi Sir.

Neural networks are very flexible, and tuning neural networks requires experience. That's why I suggest you start from slightly simpler examples.

I have successfully solved some simple examples using DeepXDE, and now I am trying to solve some well-known coupled differential equations.

#107 and #84 are two such problems. If either of them gives a clue for getting a smaller train loss, then I could solve both problems at once.

I'll run the code for #84 and let you know about the output and the training loss behaviour.

Thanks

@sumantkrsoni
Author

Could you try a larger network, exact DirichletBC, and running for more iterations (e.g., one million)?

Based on your suggestion, I have run the code for 100000 iterations with the mentioned parameters.

The whole run took almost 23 hours to complete.

The learning process has improved compared to the previous parameter combinations.

[image: 1Lack_iteration]

Unfortunately, the results are not close to the solution.

The output results and code are attached herewith:
code_square.zip

Kindly suggest how to improve the training loss. (@lululxvi, @smao-astro)

@sumantkrsoni
Author

Hi Dr. @lululxvi,
I am curious about a few points concerning the training process in the DeepXDE module.

Am I right about these points?
a) The data sampling for boundary and domain points is done separately.
b) During the training process, the losses associated with the boundary and the domain are computed separately on the boundary data and the domain data.

Kindly help me figure out my doubts about the above points.

@lululxvi
Owner

lululxvi commented Nov 14, 2020

Yes for both questions. Also, see #39 and "Q: More details about DeepXDE source code, and want to modify DeepXDE, e.g., to use multiple GPUs and mini batch." in the FAQ.

@sumantkrsoni
Author

Thanks for your suggestion.

Also, could you clarify one more doubt:

  • Does the data feeding for the boundary conditions and the PDE use separate data (BC data and domain data points),
    or
    is there no restriction on choosing the training points between BC data and domain data?

@lululxvi
Owner

Here are the details:

  1. DeepXDE generates all the points (including inside domain, on boundary, and anchors) according to the arguments specified in PDE(...).
  2. All the points in Step 1 are used for PDE loss.
  3. For each BC/IC, we loop over all the points from Step 1 and pick out the points that satisfy the definition of this BC/IC.
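
As an illustration of these three steps, a minimal sketch (the names square, pde, and the BC list follow the earlier snippets; the point counts are illustrative):

    # Step 1: all points (domain + boundary + optional anchors) are generated here.
    # Step 2: every generated point contributes to the PDE residual loss.
    # Step 3: each BC/IC scans this same point set and keeps only the points that
    #         satisfy its own boundary definition (e.g., boundary_left).
    data = dde.data.PDE(
        square,
        pde,
        [bc_left_s, bc_left_t],
        num_domain=10000,
        num_boundary=2000,
        anchors=None,        # optional user-supplied extra points
        num_test=10000,
    )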

@sumantkrsoni
Author

Dear Lu Sir,
Could you help me figure out my doubts:

  • Does DeepXDE build only a single network for a problem?
  • How can we access the neurons at the output layer?

Say we have two neurons at the output layer; can we access them as
first output neuron ----------> N[:, 0:1]
second output neuron -------> N[:, 1:]

  • How does the Neumann boundary condition use the boundary data, especially since it involves gradients of the output variable?
    Does it use the BC data before or after the gradients are taken?

Thanks Sir

@lululxvi
Owner

lululxvi commented Jan 8, 2021

  • DeepXDE supports several different types of networks. If you use FNN, then yes, it is a single network.
  • Yes, the way to access the output is correct.
  • I don't quite get your question. DeepXDE computes the normal derivative automatically at each Neumann BC point; see
    def normal_derivative(self, X, inputs, outputs, beg, end):
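
To illustrate the first two points, a minimal sketch (the names square and boundary_left follow the earlier snippets; the returned residuals are placeholders only to show the slicing):

    import numpy as np

    def pde(x, y):
        s = y[:, 0:1]   # first output neuron
        T = y[:, 1:2]   # second output neuron
        ds_x = dde.grad.jacobian(y, x, i=0, j=0)
        dT_x = dde.grad.jacobian(y, x, i=1, j=0)
        return [ds_x, dT_x]   # placeholder residuals

    # component=1 tells this BC to constrain the second output (T); for a
    # NeumannBC, the normal derivative of that component is evaluated at the
    # boundary points selected for this BC.
    bc_T_left = dde.NeumannBC(
        square, lambda x: np.zeros((len(x), 1)), boundary_left, component=1
    )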

@sumantkrsoni
Author

Thanks Sir.

If I am not wrong, can we say the normal_derivative term deals with separate data on the boundary over which that term is defined?

@lululxvi
Owner

I don't understand your question. What do you mean by "separate data"?

@sumantkrsoni
Author

I got the answer to my question.

Thanks Lu Sir.

@Laoliu66

Laoliu66 commented Aug 6, 2022

@sumantkrsoni Hello, I have encountered similar problems. Have you solved them? Could you share your code, if convenient? Thank you.
