shell_simple_3d crashes #123
I notice that the CFL number is rather large (1.0) for this parameter file, and that the crash happens on the second time step after a refinement. Is it possible that the bdf2 scheme is losing stability? That is to say, could the old_solution and the old_old_solution be sufficiently different across refinement levels that the quadratic extrapolation is not well behaved? Did this cookbook work previously? A test of this theory would be (1) refining after, say, 5 timesteps instead of 15, or (2) turning off bdf2 and just using implicit Euler.
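(Editorial aside: the extrapolation being referred to, written in the usual variable-step two-point form; this is a sketch, and the exact expression in the code may differ.)

```latex
% Extrapolation of the two old solutions to the new time level,
% with step sizes k_n = t^{n+1} - t^n and k_{n-1} = t^n - t^{n-1}:
\tilde{u}^{\,n+1}
  = \left(1 + \frac{k_n}{k_{n-1}}\right) u^{n}
  - \frac{k_n}{k_{n-1}}\, u^{n-1}
```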
I'm quite certain I ran the input file pretty much as is to generate the corresponding section in the manual...
and I think we should be unconditionally stable, so the CFL number should not be the problem. This could be related to the change we made to the linear residual.
On 06/06/2014 01:52 PM, Timo Heister wrote: [...]
Good point. I've been running this on a branch here but would appreciate [...]

W.
Wolfgang Bangerth, email: bangerth@math.tamu.edu
Wasn't the change to the linear residual only in the Stokes system? It appears to me that the failure is in the solve for the temperature system.
I can reproduce the problem here (even with 4 cpus). Changing how often we do AMR doesn't dramatically change the result. I still see the crash with 2 instead of 3 initial adaptive steps (doesn't make the problem much smaller though).
I managed to get the crash much sooner (timestep 6 with 900k DoFs):
I can confirm that Timo's simplified prm file crashes on my workstation with 4 cores.
I filled an hour of free time with some tests concerning this problem, and while I have no solution, here are some observations and ways to circumvent the crash:

- Interestingly enough, GMRES is unable to reduce the residual at all.
- If I switch from entropy viscosity to first order viscosity, everything is working fine.

I am not sure if any of the old solution variables is wrong after mesh refinement, or if we for some reason do not apply enough stabilization.
The entropy variation jumps up by a factor of 2 in this step, and the extrapolated_advection_field_range for the temperature is 973, 2398.36 instead of the 973, 1973 seen in every step before.
Hmm, the velocity changes a lot between the old solution and the old old solution, and the maximum entropy stabilization and the advection system residual are calculated with the mean velocity of the two. However, the velocity is decreasing, so in theory we should only get overstabilization, not too little stabilization. With higher stabilization values for beta and c_R one can delay the crash to later time steps, but it still occurs.
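(Editorial aside: the stabilization bound that beta and c_R enter, in the form the entropy-viscosity scheme is usually written; the exact norms and constants in the code may differ.)

```latex
% Per-cell artificial viscosity: a first-order bound scaled by beta,
% capped against an entropy-residual term scaled by c_R:
\nu_h\big|_K = \min\!\left(
    \beta\, h_K\, \|\mathbf{u}\|_{\infty,K},\;
    c_R\, h_K^{2}\,
      \frac{\|r_E(T)\|_{\infty,K}}{\|E(T)-\bar{E}(T)\|_{\infty,\Omega}}
  \right)
```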
You know what the problem could be? old_old_solution is interpolated, so it might not be divergence free. The increase of the max temperature in global_field_range is not the problem... |
Ok, that jump in entropy variation is not surprising if the extrapolated_advection_field_range changes, because we approximate the average temperature by taking the arithmetic mean of the maximal and minimal temperature, and (973+2398.36)/2 is much farther away from 1473 than (973+1973)/2. The interesting question would be why the extrapolated_advection_field_range changes. This would require largely different old_solution and old_old_solution temperature values just because of the mesh refinement in the step before. By the way, do you know why we use an iterated trapezoidal quadrature rule for the estimation of the extrapolated_advection_field_range? I was just confused by the difference from the Gauss quadrature used otherwise in the assembly.
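(Editorial aside: a rough consistency check for the factor of 2, assuming the entropy is of the form E(T) = (T - Tbar)^2/2 with Tbar the arithmetic mean of the field range; this is a reading of the scheme, not the code's exact definition.)

```latex
\bar{T}_{\text{before}} = \tfrac{973 + 1973}{2} = 1473\,\mathrm{K}, \qquad
\bar{T}_{\text{now}}    = \tfrac{973 + 2398.36}{2} \approx 1685.7\,\mathrm{K},
\qquad
\left(\frac{2398.36 - 1685.7}{1973 - 1473}\right)^{2}
  = \left(\frac{712.7}{500}\right)^{2} \approx 2.0
```

So a quantity that scales with the squared deviation from the mean temperature would indeed roughly double, consistent with the observed jump in the entropy variation.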
I know you wrote further up that the time stepping scheme should be unconditionally stable, but I would like to come back to this point for a moment. As far as I can see, we calculate the entropy stabilization with the old_solution; however, in the assembly of the equation we use the extrapolated current_linearization_point velocities, and we know the velocities have changed a lot between old_old_solution and old_solution, so the current_linearization_point will be quite different from old_solution (even more so because the velocities are much lower now, and the timestep length is therefore 3 times larger than in the timestep before). Additionally, if I use the old_solution to assemble the matrix for test purposes, the solver converges well. Maybe calculating the entropy viscosity with the old solution is not accurate enough in this particular case?
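(Editorial aside: plugging the reported step-size ratio of 3 into the extrapolation sketched earlier shows how strongly the two old solutions are weighted; again a sketch, not the code's exact expression.)

```latex
\frac{k_n}{k_{n-1}} = 3
\quad\Longrightarrow\quad
\tilde{u}^{\,n+1} = 4\,u^{n} - 3\,u^{n-1}
```

Any difference between old_solution and old_old_solution therefore enters the linearization point amplified by a factor of 3.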
Rene, so if you replace the extrapolated Stokes solution by old_solution in the entropy computation, it works? |
Has anybody attempted to do a git bisect for this problem?
What I did so far was replace the current_velocities in the matrix assembly in assembly.cc:1597 by the old velocities. This is clearly wrong (changing our timestepping just there but nowhere else), but stable. Currently we use the old_velocities for calculating the entropy viscosity, but we might test replacing them by the current velocities. I think we did not do this because it is some kind of nonlinearity (basing the assembly of the matrix on an extrapolated quantity)? It is however a bit more work to replace all the calls to scratch.explicit_material_model by the normal material_model, and I am currently a bit short on schedule, so I can only look into it in two weeks or so. @ian-r-rose: I do not think somebody did a bisect yet, but I faintly remember that I tried to run it with aspect-1.1 and it was already in there. Might be wrong though.
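(Editorial aside: a minimal, self-contained C++ sketch of the two choices being compared, the extrapolated linearization-point velocity versus the lagged old velocity, using the RMS velocities from the crash log quoted at the end of this thread as scalar stand-ins; the names and the scalar setting are illustrative assumptions, and this is not ASPECT's assembly code.)

```cpp
// Illustrative only: scalar stand-ins for the velocity at one quadrature
// point; not ASPECT's data structures.
#include <cassert>
#include <cstdio>

struct VelocitySample
{
  double old_value;      // u^n     (RMS velocity at timestep 16 in the log: 0.00439 m/year)
  double old_old_value;  // u^{n-1} (RMS velocity at timestep 15 in the log: 0.0434 m/year)
};

// Variable-step two-point extrapolation to the new time level,
// u* = (1 + k_n/k_{n-1}) u^n - (k_n/k_{n-1}) u^{n-1},
// i.e. what an extrapolated linearization point would use.
double extrapolated_velocity(const VelocitySample &v,
                             const double k_n, const double k_nm1)
{
  const double r = k_n / k_nm1;
  return (1.0 + r) * v.old_value - r * v.old_old_value;
}

// The experiment described above: simply lag the velocity and use u^n as-is.
// Stable in the test, but it changes the time discretization of the
// advection term in this one place only.
double lagged_velocity(const VelocitySample &v)
{
  return v.old_value;
}

int main()
{
  const VelocitySample v{0.00439, 0.0434};

  // Step-size ratio of 3, as reported above.
  const double u_star = extrapolated_velocity(v, /*k_n=*/3.0, /*k_nm1=*/1.0);
  const double u_old  = lagged_velocity(v);

  // The extrapolation overshoots past zero because the velocity dropped
  // by an order of magnitude while the time step grew.
  assert(u_star < 0.0 && u_old > 0.0);
  std::printf("extrapolated = %g, lagged = %g\n", u_star, u_old);
  return 0;
}
```

With these numbers the extrapolated value even changes sign, which illustrates how far the linearization point can stray from either of the two old solutions after a refinement-induced jump in the velocity.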
Not a proper bisection, but I get the same broken behavior as far back as the switch to cmake. I didn't go further back due to not wanting to switch up the build system.
I recall that the issue predates the switch to git for sure. If it even predates the switch to cmake then it's indeed pretty old. I dimly recall having done a bisection at one point in the past and finding that it was a completely innocent-looking patch -- it may have been the one where Timo fiddled with the way we compute the relative tolerance for convergence. But I really can't say for sure any more. I think finding the patch that caused this would really help nail down the problem.
I went back all the way to 5385905 (May 2013), fought a bit to make it compile, and it fails in the same way. So I don't have a revision where my reduced code is working.
Is that before or after the section in the manual was added? I know I ran it when I wrote the section... |
before. The file got added in Feb 2014:
What do you propose? Maybe it worked back then but only on a finer mesh?
(I am now rerunning your original shell_3d as it was when you checked it in; we'll see where that goes.)
Original shell_3d (b5241f3) crashes in step 17 (as W. wrote in the first message in this thread). So either it never worked with this resolution, or the changes were inside deal.II. |
Good news: it looks like it is working with the recent manifold changes (my file, 42 timesteps and running). We need to do some more testing, though. 👍
Let's get a release out right now, quick, pronto, before anyone finds something that doesn't work! ;-) |
Confirmed to work now. Not fast, but that's because it's a big model I suppose. |
When running shell_simple_3d.prm with 32 processors, it works for 16 time steps but then I get this:
```
*** Timestep 15: t=7.83037e+06 years
Solving temperature system... 21 iterations.
Solving Stokes system... 30+14 iterations.
Postprocessing:
RMS, max velocity: 0.0434 m/year, 0.162 m/year
Temperature min/avg/max: 973 K, 1409 K, 1973 K
Heat fluxes through boundary parts: -7.917e+12 W, 3.016e+12 W
Writing depth average output/depth_average.vtu
Number of active cells: 74,324 (on 6 levels)
Number of degrees of freedom: 3,053,696 (2,213,988+101,712+737,996)
*** Timestep 16: t=8.35066e+06 years
Solving temperature system... 19 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 30+19 iterations.
Postprocessing:
Writing graphical output: output/solution-00008
RMS, max velocity: 0.00439 m/year, 0.0298 m/year
Temperature min/avg/max: 973 K, 1408 K, 1973 K
Heat fluxes through boundary parts: -7.68e+12 W, 2.922e+12 W
*** Timestep 17: t=1.00836e+07 years
Solving temperature system...
An error occurred in line <828> of file </u/bangerth/p/deal.II/3/install-bottom/include/deal.II/lac/solver_gmres.h> in function
void dealii::SolverGMRES<VECTOR>::solve(const MATRIX&, VECTOR&, const VECTOR&, const PRECONDITIONER&) [with MATRIX = dealii::TrilinosWrappers::SparseMatrix, PRECONDITIONER = dealii::TrilinosWrappers::PreconditionILU, VECTOR = dealii::TrilinosWrappers::MPI::Vector]
The violated condition was:
false
The name and call sequence of the exception was:
SolverControl::NoConvergence (this->control().last_step(), this->control().last_value())
Additional Information:
Iterative method reported convergence failure in step 737996 with residual 3.30962e+26
```
Does anyone else see this? I'm running with
```
mpirun -np 32 aspect ../Aspect/cookbooks/shell_simple_3d.prm
```