
General discussion for train loss, test loss value, and result accuracy #84

Closed
sumantkrsoni opened this issue Jul 11, 2020 · 39 comments

@sumantkrsoni

Hello @lululxvi,

  • How can we reach a desired accuracy in the output result (say, an error of 1e-8)?

  • Can we correlate the train and test loss values with the accuracy of the result?

  • Also, how can we prevent overfitting and underfitting?

@lululxvi
Owner

  • You need to train the network such that the train loss is small.
  • Yes, if we have sufficient residual points, then the train loss, test loss, and accuracy are highly correlated.
  • If we have sufficient residual points, overfitting is not an issue. If the train loss is small, we don't have underfitting.
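
For completeness, a minimal sketch of checking accuracy against a reference solution; the names geom, model, and u_exact are assumed placeholders, not code from this thread:

    import deepxde as dde

    # Dense evaluation grid covering the domain and its boundary
    X = geom.uniform_points(10000, boundary=True)
    y_pred = model.predict(X)        # network prediction at the evaluation points
    y_true = u_exact(X)              # reference/analytical solution (assumed available)
    print("Relative L2 error:", dde.metrics.l2_relative_error(y_true, y_pred))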

@sumantkrsoni
Author

Thanks, @lululxvi, you have clarified my doubts.

One more doubt arises about small values of the train loss.

  • Suppose we have two values for the training loss:
    a) train loss == 1e-3
    b) train loss == 1e-4

Can we say that the train loss in (b) is better than in (a)?

@lululxvi
Owner

If the training residual points are dense, yes. You can also check the test error by using denser test points.
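
For instance, a minimal sketch of using a denser test set; geom, pde, and bcs are placeholder names for the geometry, PDE residual, and boundary conditions, and the point counts are illustrative:

    # num_test draws a separate, denser set of points that is used only to
    # report the test loss/metrics during training.
    data = dde.data.PDE(
        geom, pde, bcs,
        num_domain=5000,      # residual points used for training
        num_boundary=500,     # boundary points used for training
        num_test=20000,       # denser points used to evaluate the test loss
    )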

@sumantkrsoni
Author

Thanks @lululxvi. That sorts out my doubts.

@sumantkrsoni
Author

Dear Dr. @lululxvi,
I have trained a model, but the training graph shows the test loss as zero at every iteration (a line parallel to the x-axis at y = 0).

[image: training_graph]

Also, the result of this run differs slightly from the computed reference result.
What should I do?

Please comment.

@lululxvi
Owner

lululxvi commented Sep 8, 2020

I didn't get what you mean. In the plot, the test loss is not zero.

@sumantkrsoni
Author

Sorry @lululxvi,
actually, I was asking about the train loss.
I have trained the model for various numbers of iterations with various parameter combinations, but the training loss stays of order 1e-1.

  • From the above training loss plot, can we say my model is learning well?

@lululxvi
Owner

lululxvi commented Sep 9, 2020

No, the training loss is too large. See "Q: I failed to train the network or get the right solution, e.g., the training loss is large." in the FAQ.

@sumantkrsoni
Author

Thanks @lululxvi,
I have gone through the FAQ section to improve the training process of my model. Unfortunately, the tricks from the FAQ did not help.

For your kind information,
my governing equations are:

ds_xx + ds_yy - Ra * dT_x = 0

dT_y * ds_x - dT_x * ds_y - (dT_xx + dT_yy) = 0

Neumann BC:
dT_x = 0 at x = 0
dT_y = 0 at y = 0 and y = 1

Dirichlet BC:
s = 0 at x = 0, x = 1, y = 0, and y = 1

I have coded this in the attached file:
Vprasad1984_2.zip
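
For reference, here is a minimal sketch of how this coupled system and its boundary conditions could be set up in DeepXDE. It is a reconstruction from the equations above rather than the attached code, and names such as pde, boundary_left, and boundary_horizontal are chosen for illustration:

    import numpy as np
    import deepxde as dde

    Ra = 50
    square = dde.geometry.Rectangle([0, 0], [1, 1])

    def pde(x, y):
        # Component 0 of y is s, component 1 is T
        ds_x = dde.grad.jacobian(y, x, i=0, j=0)
        ds_y = dde.grad.jacobian(y, x, i=0, j=1)
        dT_x = dde.grad.jacobian(y, x, i=1, j=0)
        dT_y = dde.grad.jacobian(y, x, i=1, j=1)
        ds_xx = dde.grad.hessian(y, x, component=0, i=0, j=0)
        ds_yy = dde.grad.hessian(y, x, component=0, i=1, j=1)
        dT_xx = dde.grad.hessian(y, x, component=1, i=0, j=0)
        dT_yy = dde.grad.hessian(y, x, component=1, i=1, j=1)
        eq1 = ds_xx + ds_yy - Ra * dT_x
        eq2 = dT_y * ds_x - dT_x * ds_y - (dT_xx + dT_yy)
        return [eq1, eq2]

    def boundary_left(x, on_boundary):
        return on_boundary and np.isclose(x[0], 0)

    def boundary_horizontal(x, on_boundary):
        return on_boundary and (np.isclose(x[1], 0) or np.isclose(x[1], 1))

    zero = lambda x: np.zeros((len(x), 1))
    bc_s = dde.DirichletBC(square, zero, lambda x, on_b: on_b, component=0)    # s = 0 on all edges
    bc_T_left = dde.NeumannBC(square, zero, boundary_left, component=1)        # dT/dn = 0 (dT_x = 0) at x = 0
    bc_T_hori = dde.NeumannBC(square, zero, boundary_horizontal, component=1)  # dT/dn = 0 (dT_y = 0) at y = 0, 1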

The training graph for the train and test loss is:

[image: traininggraph10000]

The numerical train loss values during the epochs are:

[image: trainingvalue]

The BC on the left edge of the domain is not being learned properly.

Kindly comment and give some tips to make it learn properly.

@lululxvi
Owner

The code looks OK. Do you have an exact solution to compare against?

@sumantkrsoni
Author

Yes @lululxvi,
the streamline plot is:
[image: RPaperStreamline_Ra_50]

The isotherm plot is:
[image: RPaper_Isotherm_Ra_50]

The streamline plot obtained from the DeepXDE simulation looks similar to the reference result above, but the numerical values are not close to it.

I am a little bit confused by the final outcomes of the deep learning solution.

Kindly suggest something.

@lululxvi
Owner

lululxvi commented Sep 10, 2020

Try:

  • Increase num_domain to 10000
  • Increase network size
  • 10000 steps is definitely not enough. Train for a longer time, e.g., 1 million steps with a smaller learning rate.
  • Enforce DirichletBC exactly

@sumantkrsoni
Author

sumantkrsoni commented Sep 11, 2020

Following your suggestions, I have modified the code as follows:

  • increased num_domain to 10000:
    num_domain = 10000
    num_boundary = 2000
    num_test = 10000
    data = dde.data.PDE(square, pde, BC, num_domain, num_boundary, num_test)
  • increased the network size:

layer_size = [2] + [80]*3 + [2]

  • increased the number of training steps:

losshistory, train_state = model.train(epochs=95000, callbacks=[early_stopping], disregard_previous_best=True)

  • enforced the exact DirichletBC by transforming the outputs S and T to:
    S_new = x*(1-x)*y*(1-y)*S
    T_new = (1-x)*T

and code is:

net.apply_output_transform(
    lambda x, y: tf.concat(
        [
            x[:, 0:1] * (1 - x[:, 0:1]) * x[:, 1:2] * (1 - x[:, 1:2]) * y[:, 0:1],
            (1 - x[:, 0:1]) * y[:, 1:2],
        ],
        axis=1,
    )
)

But, unfortunately, the intermediate learning process does not look good, especially for the Neumann BC.
After enforcing the exact BC on S and T, the loss goes to zero in several BC columns, while
the training loss for the Neumann BC on T at the left edge stays at a huge value (about 1).

The intermediate training loss at 20000 out of 95000 epochs is:

[image: training 20k]

For a long time I have not been able to get the Neumann BC learned properly.

Kindly give me some tips to get rid of this learning issue.

Thanks

@lululxvi
Owner

Could you try an even larger network, e.g., [256] * 5, and a smaller learning rate, e.g., 1e-4?
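
A minimal sketch of this suggestion (the data object is assumed to come from the existing script; this is not code from the thread):

    # Larger network and smaller learning rate, as suggested above
    net = dde.maps.FNN([2] + [256] * 5 + [2], "tanh", "Glorot uniform")
    model = dde.Model(data, net)
    model.compile("adam", lr=1e-4)
    losshistory, train_state = model.train(epochs=1000000)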

@sumantkrsoni
Author

sumantkrsoni commented Sep 11, 2020

Dear @lululxvi Sir,
Can you comment on the learning graph for the different BCs on the edges?
After enforcing the exact BC, the training loss values become zero in a few columns,
and I do not understand why the remaining BCs are not learning.

Kindly comment.

Also, following your suggestion, I have further modified my code.

All the above changes remain the same; only the network and the learning rate have been modified:

network size:

layer_size = [2] + [260]*5 + [2]

learning rate:

model.compile("adam", lr=1e-4)

After the run finishes, I will let you know.

@sumantkrsoni
Author

sumantkrsoni commented Sep 11, 2020

Dear @lululxvi,
after modifying

  • network size = [256] * 5
  • learning rate = 1e-4
  • num_domain = 10000

I unfortunately get an error that looks like a memory/size limit being exceeded; snapshots:

[image: error2]

[image: error_message]

After reducing the number of hidden layers from 5 to 4, the code runs, but it takes a huge amount of time.

@sumantkrsoni
Author

sumantkrsoni commented Sep 12, 2020

Unfortunately, no luck again with your suggestions.

I kept the parameters as follows:

Rayleigh number:

    def main():
        Ra = 50
        ............

Number of domain and boundary points:

    num_domain = 7000
    num_boundary = 2000
    num_test = 10000
    data = dde.data.PDE(square, pde, BC, num_domain, num_boundary, num_test)

NN network size:

layer_size = [2] + [150]*3 + [2]

Enforcing the exact Dirichlet BC for s and T:

net.apply_output_transform(
    lambda x, y: tf.concat(
        [
            x[:, 0:1] * (1 - x[:, 0:1]) * x[:, 1:2] * (1 - x[:, 1:2]) * y[:, 0:1],
            (1 - x[:, 0:1]) * y[:, 1:2],
        ],
        axis=1,
    )
)

learning rate: 1e-8

    early_stopping = dde.callbacks.EarlyStopping(min_delta=1e-8, patience=15000)

number of epochs: 95000

    losshistory, train_state = model.train(epochs=95000, callbacks=[early_stopping], disregard_previous_best=True)

Combining all these modifications in the code, the training graph and output results are really disappointing.

Streamline (S) plot after 95k epochs:
[image: streamline_95k]

Temperature (T) plot after 95k epochs:
[image: isotherm_95k]

Training loss during the last few epochs, which is very large:
[image: training_95k]

Training graph over the whole run:
[image: training graph95k]

I am eagerly waiting for your response, @lululxvi sir.

Kindly take a look at my code file:
Vprasad1984_2b.zip

@lululxvi
Owner

It is really strange that the training loss does not go down. I am not sure what is wrong here. Could you try an easier setup, e.g., a smaller Rayleigh number?

@sumantkrsoni
Author

Sir, the Rayleigh number is already quite small (Ra = 50), which I think is very small compared to Ra = 1e4 or 1e5.

I am not sure that going below Ra = 50 would affect anything.

Could you please suggest an alternative way to deal with this coupled equation using the deep learning method?

@lululxvi
Owner

lululxvi commented Sep 15, 2020

I think this problem could be solved. Is this the only problem you are trying to solve? How about first trying other similar problems you have in mind, so that we may find out what is wrong here?

@sumantkrsoni
Author

Before moving on to the next problem, sir:
in my code script, I had defined ds_yy and dt_yy as

    ds_yy = tf.gradients(ds_x, x)[0][:, 1:2]
    dt_yy = tf.gradients(dt_x, x)[0][:, 1:2]

which, I think, should be

    ds_yy = tf.gradients(ds_y, x)[0][:, 1:2]
    dt_yy = tf.gradients(dt_y, x)[0][:, 1:2]
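
For comparison, the same second derivatives can also be written with DeepXDE's gradient utilities, which avoids chaining tf.gradients by hand (a minimal sketch; component 0 is assumed to be s and component 1 to be T):

    ds_yy = dde.grad.hessian(y, x, component=0, i=1, j=1)   # d^2 s / dy^2
    dt_yy = dde.grad.hessian(y, x, component=1, i=1, j=1)   # d^2 T / dy^2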

@sumantkrsoni
Author

I am trying to solve a similar problem,
where only the Neumann BC on the left vertical wall has been changed to a Dirichlet BC.

The same training loss problem was occurring in that simulation as well.

    bc_left_s = dde.DirichletBC(square, fnZero, boundary_left, component=0)
    bc_left_t = dde.DirichletBC(square, fnOnes, boundary_left, component=1)

Now I am going to implement it with the same parameters that you suggested,

and I will let you know the final result after the run.

@sumantkrsoni
Author

Dear @lululxvi Sir,
I have implemented the same type of problem with a minor change in the boundary condition:

  • applying a Dirichlet BC in place of the Neumann BC on the left vertical edge.

The corresponding code is:

    bc_left_s = dde.DirichletBC(square, fnZero, boundary_left, component=0)
    bc_left_t = dde.DirichletBC(square, fnOnes, boundary_left, component=1)

Unfortunately, no luck again: the training loss is still high, and the final output does not match the result from the other method.

[image: training loss]

Training values for 95k epochs:

[image: training values95k]

The result patterns match to some extent, but the numerical values are not the same.

Find the attached code script:

Vprasad1984_2c_corrected.zip

Please comment on how to improve the output values.

@sumantkrsoni
Author

@lululxvi Sir,
please suggest why the training loss is still so high.
How can I fix it?

@lululxvi
Owner

Could you try a larger network, exact DirichletBC, and running for more iterations (e.g., one million)?

@sumantkrsoni
Author

Dear @lululxvi Sir, can we predict the training loss behavior from the first 20k iterations out of one million iterations?

I am asking because one million iterations with your suggested combination of parameters take 10-11 hours to complete the training process.

From the behavior of the first 10k iterations, I could report the nature of the training process to you.

Please comment.

@lululxvi
Owner

No, theoretically it is almost impossible; otherwise the whole deep learning community would be happy. 10 hours seems acceptable.

Neural networks are very flexible, and tuning neural networks requires experience. That's why I suggest you start from slightly simpler examples.

@sumantkrsoni
Author

You are right, @lululxvi Sir.

Neural networks are very flexible, and tuning neural networks requires experience. That's why I suggest you start from slightly simpler examples.

I have successfully solved some simple examples using DeepXDE, and now I am trying to solve some well-known coupled differential equations.

#107 and #84 are two such problems. If either of them gives a clue for getting a smaller train loss, then I could solve both problems at once.

I'll run the code for #84 and let you know about the output and the training loss behaviour.

Thanks

@sumantkrsoni
Author

Could you try a larger network, exact DirichletBC, and running for more iterations (e.g., one million)?

Based on your suggestion, I have run the code for 100000 iterations with the mentioned parameters.

The whole run took almost 23 hours to complete.

The learning process has improved compared to the previous parameter combinations.

[image: 1Lack_iteration]

Unfortunately, the results are not close to the solution.

The output results and code are attached herewith:
code_square.zip

Kindly suggest how to improve the training loss. (@lululxvi, @smao-astro)

@sumantkrsoni
Author

Hi Dr. @lululxvi,
I am curious about a few points concerning the training process in the DeepXDE module.

Am I right about these points?
a) The data sampling for boundary and domain points is done separately.
b) During the training process, the losses associated with the boundary and the domain are computed separately on the boundary data and the domain data.

Kindly help me figure out my doubts about the above points.

@lululxvi
Owner

lululxvi commented Nov 14, 2020

Yes for both questions. Also, see #39 and "Q: More details about DeepXDE source code, and want to modify DeepXDE, e.g., to use multiple GPUs and mini batch." in the FAQ.

@sumantkrsoni
Author

Thanks for your suggestion.

Also, could you clarify one more doubt:

  • Does the data feeding for the boundary conditions and the PDE use separate data (BC data and domain data points),
    or
    is there no restriction on choosing the training points between BC data and domain data?

@lululxvi
Owner

Here are the details:

  1. DeepXDE generates all the points (including inside domain, on boundary, and anchors) according to the arguments specified in PDE(...).
  2. All the points in Step 1 are used for PDE loss.
  3. For each BC/IC, we loop over all the points from Step 1 and pick out the points that satisfy the definition of this BC/IC.
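
As an illustration of these three steps, a minimal sketch (the names square, pde, and the BC list follow the earlier snippets; the point counts are illustrative):

    # Step 1: all points (domain + boundary + optional anchors) are generated here.
    # Step 2: every generated point contributes to the PDE residual loss.
    # Step 3: each BC/IC scans this same point set and keeps only the points that
    #         satisfy its own boundary definition (e.g., boundary_left).
    data = dde.data.PDE(
        square,
        pde,
        [bc_left_s, bc_left_t],
        num_domain=10000,
        num_boundary=2000,
        anchors=None,        # optional user-supplied extra points
        num_test=10000,
    )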

@sumantkrsoni
Author

Dear Lu Sir,
Could you help me figure out my doubts:

  • Does DeepXDE build only a single network for a problem?
  • How can we access the neurons at the output layer?

Say we have two neurons at the output layer; can we access them as
first output neuron ----------> N[:, 0:1]
second output neuron -------> N[:, 1:]

  • How does the Neumann boundary condition use the boundary data, especially since it involves gradients of the output variable?
    Does it use the BC data before or after the gradients are taken?

Thanks Sir

@lululxvi
Owner

lululxvi commented Jan 8, 2021

  • DeepXDE supports several different types of networks. If you use FNN, then yes, it is a single network.
  • Yes, the way to access the output is correct.
  • I don't quite get your question. DeepXDE computes the normal derivative automatically at each Neumann BC point; see
    def normal_derivative(self, X, inputs, outputs, beg, end):
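
To illustrate the first two points, a minimal sketch (the names square and boundary_left follow the earlier snippets; the returned residuals are placeholders only to show the slicing):

    import numpy as np

    def pde(x, y):
        s = y[:, 0:1]   # first output neuron
        T = y[:, 1:2]   # second output neuron
        ds_x = dde.grad.jacobian(y, x, i=0, j=0)
        dT_x = dde.grad.jacobian(y, x, i=1, j=0)
        return [ds_x, dT_x]   # placeholder residuals

    # component=1 tells this BC to constrain the second output (T); for a
    # NeumannBC, the normal derivative of that component is evaluated at the
    # boundary points selected for this BC.
    bc_T_left = dde.NeumannBC(
        square, lambda x: np.zeros((len(x), 1)), boundary_left, component=1
    )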

@sumantkrsoni
Author

Thanks Sir.

If I am not wrong, can we say the normal_derivative term deals with separate data on the boundary over which that term is defined?

@lululxvi
Owner

I don't understand your question. What do you mean by "separate data"?

@sumantkrsoni
Author

I got the answer to my question.

Thanks Lu Sir.

@Laoliu66

Laoliu66 commented Aug 6, 2022

@sumantkrsoni Hello, I have encountered similar problems. Have you solved them? Could you share your code, if convenient? Thank you.
