
Assertion `i*j < _val.size()' failed when coupling displaced and undisplaced variables with nonzero BC #9659

Closed
tophmatthews opened this issue Aug 14, 2017 · 29 comments · Fixed by #14311
Assignees: andrsd
Labels: C: Framework, P: normal, T: defect

@tophmatthews (Contributor) commented Aug 14, 2017

Description of the enhancement or error report

When running a tensor mechanics problem with debug, the following assertion fails:

Output
Time Step  0, time = 0
                dt = 0

Time Step  1, time = 1
                dt = 1
 0 Nonlinear |R| = 1.414214e-03
Assertion `i*j < _val.size()' failed.
i*j = 0
_val.size() = 0


sh: line 1: 56583 Trace/BPT trap: 5       gdb -p 56581 -batch -ex bt -ex detach 2> /dev/null > temp_print_trace.njF0L8
Stack frames: 32
0: 0   libmesh_dbg.0.dylib                 0x000000010e4d93af libMesh::print_trace(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) + 2287
1: 1   libmesh_dbg.0.dylib                 0x000000010e4c81ca libMesh::MacroFunctions::report_error(char const*, int, char const*, char const*) + 634
2: 2   libsolid_mechanics-dbg.0.dylib      0x0000000109fab9f8 libMesh::DenseMatrix<double>::operator()(unsigned int, unsigned int) + 1752
3: 3   libmoose-dbg.0.dylib                0x000000010c84b3fa Kernel::computeOffDiagJacobian(unsigned int) + 426
4: 4   libtensor_mechanics-dbg.0.dylib     0x000000010b07c3d3 StressDivergenceTensors::computeOffDiagJacobian(unsigned int) + 259
5: 5   libmoose-dbg.0.dylib                0x000000010c00d86a ComputeFullJacobianThread::computeJacobian() + 1434
6: 6   libmoose-dbg.0.dylib                0x000000010c024b4b ComputeJacobianThread::onElement(libMesh::Elem const*) + 347
7: 7   libmoose-dbg.0.dylib                0x000000010bec9889 ThreadedElementLoopBase<libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*> >::operator()(libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*> const&, bool) + 665
8: 8   libmoose-dbg.0.dylib                0x000000010c552f3b void libMesh::Threads::parallel_reduce<libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*>, ComputeFullJacobianThread>(libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*> const&, ComputeFullJacobianThread&) + 155
9: 9   libmoose-dbg.0.dylib                0x000000010c54fca5 NonlinearSystemBase::computeJacobianInternal(libMesh::SparseMatrix<double>&, Moose::KernelType) + 1813
10: 10  libmoose-dbg.0.dylib                0x000000010c55498a NonlinearSystemBase::computeJacobian(libMesh::SparseMatrix<double>&, Moose::KernelType) + 106
11: 11  libmoose-dbg.0.dylib                0x000000010c0dd745 FEProblemBase::computeJacobian(libMesh::NumericVector<double> const&, libMesh::SparseMatrix<double>&, Moose::KernelType) + 1717
12: 12  libmoose-dbg.0.dylib                0x000000010c0dd07e FEProblemBase::computeJacobian(libMesh::NonlinearImplicitSystem&, libMesh::NumericVector<double> const&, libMesh::SparseMatrix<double>&) + 78
13: 13  libmoose-dbg.0.dylib                0x000000010c521c0b Moose::compute_jacobian(libMesh::NumericVector<double> const&, libMesh::SparseMatrix<double>&, libMesh::NonlinearImplicitSystem&) + 299
14: 14  libmesh_dbg.0.dylib                 0x000000010faa9eb3 __libmesh_petsc_snes_jacobian + 6531
15: 15  libpetsc.3.7.dylib                  0x00000001150bca7e SNESComputeJacobian + 926
16: 16  libpetsc.3.7.dylib                  0x00000001150f00cc SNESSolve_NEWTONLS + 1356
17: 17  libpetsc.3.7.dylib                  0x00000001150c118a SNESSolve + 858
18: 18  libmesh_dbg.0.dylib                 0x000000010faa50bc libMesh::PetscNonlinearSolver<double>::solve(libMesh::SparseMatrix<double>&, libMesh::NumericVector<double>&, libMesh::NumericVector<double>&, double, unsigned int) + 2652
19: 19  libmesh_dbg.0.dylib                 0x000000010fbc10b0 libMesh::NonlinearImplicitSystem::solve() + 1264
20: 20  libmoose-dbg.0.dylib                0x000000010cdae87e TimeIntegrator::solve() + 46
21: 21  libmoose-dbg.0.dylib                0x000000010c52391c NonlinearSystem::solve() + 956
22: 22  libmoose-dbg.0.dylib                0x000000010c0d7f28 FEProblemBase::solve() + 136
23: 23  libmoose-dbg.0.dylib                0x000000010cdcaec4 TimeStepper::step() + 36
24: 24  libmoose-dbg.0.dylib                0x000000010c722f92 Transient::solveStep(double) + 962
25: 25  libmoose-dbg.0.dylib                0x000000010c722b8d Transient::takeStep(double) + 253
26: 26  libmoose-dbg.0.dylib                0x000000010c722823 Transient::execute() + 211
27: 27  libmoose-dbg.0.dylib                0x000000010c49e19a MooseApp::executeExecutioner() + 170
28: 28  libmoose-dbg.0.dylib                0x000000010c4a085d MooseApp::run() + 157
29: 29  bison-dbg                           0x0000000108fd6150 main + 592
30: 30  libdyld.dylib                       0x00007fff81fe65ad start + 1
31: 31  ???                                 0x0000000000000003 0x0 + 3
[0] /Users/topher/projects/moose/scripts/../libmesh/installed/include/libmesh/dense_matrix.h, line 837, compiled nodate at notime
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor

It seems to happen when the DenseMatrix is formed for the off-diagonal Jacobian, and only when a nonzero BC is present. It also requires an additional variable beyond the displacement variables, and strain = FINITE.

Rationale for the enhancement or information for reproducing the error

The following input file can be used to reproduce the error with tensor-mechanics-dbg.

Input File
[GlobalParams]
  displacements = 'disp_x disp_y'
[]

[Mesh]
  type = GeneratedMesh
  dim = 2
[]

[Variables]
  [./temp]
  [../]
[]

[Modules]
  [./TensorMechanics]
    [./Master]
      [./all]
        add_variables = true
        strain = FINITE
      [../]
    [../]
  [../]
[]

[Kernels]
  [./dt]
    type = TimeDerivative
    variable = temp
  [../]
[]

[BCs]
  [./no_x]
    type = DirichletBC
    variable = disp_x
    boundary = left
    value = 1.0e-3
  [../]
[]


[Materials]
  [./stress]
    type = ComputeFiniteStrainElasticStress
  [../]
  [./elasticity_tensor]
    type = ComputeIsotropicElasticityTensor
    youngs_modulus = 1e11
    poissons_ratio = 0.3
  [../]
[]


[Preconditioning]
  [./SMP]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
[]

Identified impact

Fixing this will allow more complicated input files to be run in debug mode.

@permcody added labels C: Modules, P: normal, T: defect on Aug 15, 2017
@dschwen (Member) commented Aug 16, 2017

I'm looking at this right now. Notes on what I've found so far:

  • This occurs in TM_all0 which acts on disp_x
  • The ke off diagonal Jacobian block for the temp variable is empty
  • Removing temp or setting SMP full = false makes the error go away
  • Switching to strain = SMALL and ComputeLinearElasticStress make the error go away
  • putting temperature = temp into the TM action block makes the error go away

Looks like an off-diagonal Jacobian is computed "unexpectedly". I.e. during setup MOOSE does not know the variables are coupled, but it computes the off-diagonal Jacobian anyway...

@dschwen (Member) commented Aug 16, 2017

This is not a bug in the TensorMechanics action. If I unfold the action using the extremely useful DumpProblem object (#8876)... @permcody, I get

Input File
[GlobalParams]
  displacements = 'disp_x disp_y'
[]

[Mesh]
  type = GeneratedMesh
  dim = 2
[]

[Variables]
  [./temp]
  [../]
  [./disp_x]
  [../]
  [./disp_y]
  [../]
[]

[Kernels]
  [./TM_all0]
    type = StressDivergenceTensors
    component = 0
    use_displaced_mesh = true
    variable = disp_x
  [../]
  [./TM_all1]
    type = StressDivergenceTensors
    component = 1
    use_displaced_mesh = true
    variable = disp_y
  [../]
  [./dt]
    type = TimeDerivative
    variable = temp
  [../]
[]

[BCs]
  [./no_x]
    type = DirichletBC
    variable = disp_x
    boundary = left
    value = 1.0e-3
  [../]
[]


[Materials]
  [./stress]
    type = ComputeFiniteStrainElasticStress
  [../]
  [./elasticity_tensor]
    type = ComputeIsotropicElasticityTensor
    youngs_modulus = 1e11
    poissons_ratio = 0.3
  [../]
  [./all_strain]
    type = ComputeFiniteStrain
  [../]
[]


[Preconditioning]
  [./SMP]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
[]

This fails with the same error.

@dschwen (Member) commented Aug 16, 2017

The issue seems to be coupling a variable that has kernels only on the displaced mesh with a variable that has kernels only on the undisplaced mesh.

Minimal example without tensor mechanics:

[GlobalParams]
  displacements = 'u'
[]

[Mesh]
  type = GeneratedMesh
  dim = 1
[]

[Variables]
  [./u]
  [../]
  [./v]
  [../]
[]

[Kernels]
  [./u]
    type = Diffusion
    use_displaced_mesh = true
    variable = u
  [../]
  [./v]
    type = Diffusion
    use_displaced_mesh = false
    variable = v
  [../]
[]

[BCs]
  [./no_x]
    type = DirichletBC
    variable = u
    boundary = left
    value = 1.0e-3
  [../]
[]

[Preconditioning]
  [./SMP]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
[]

@dschwen changed the title from "Assertion `i*j < _val.size()' failed in tensor mechanics with nonzero BC" to "Assertion `i*j < _val.size()' failed when coupling displaced and undisplaced variables with nonzero BC" on Aug 16, 2017
@tophmatthews (Contributor Author)

This came up again. Should this really be a restriction? ping @dschwen @permcody

@dschwen (Member) commented Mar 17, 2018

Yeah, bummer, this issue just got forgotten. Let me 🏓 @andrsd, who should know all about the allocation of the Jacobian blocks. Maybe he has some fresh ideas.

@andrsd (Contributor) commented Mar 19, 2018

So, you guys have a PDE where the volumetric terms are on the displaced mesh and its BCs are on the undisplaced one? Or is this just a simplification of something more complex?

BTW: just by looking at the input file, this should work. I have a suspicion about where the problem might be...

@dschwen (Member) commented Mar 19, 2018

@andrsd, I don't exactly remember, but I thought this required coupling u and v (where u is on the displaced and v is on the undisplaced mesh), and then adding a nonzero BC (no idea whether the BC is on the displaced or undisplaced mesh, though).

@andrsd (Contributor) commented Mar 19, 2018

The minimal example that breaks has variable v on the non-displaced mesh. It is not coupled anywhere as far as I can see.

I suspect the real problem is that a kernel is on a displaced mesh - those would be q-point values - plus there is a nodal BC - those would be nodal values - on a non-displaced mesh. And we do not flip the right flag to compute one of those... It is just my working theory for now...

@dschwen (Member) commented Mar 19, 2018

D'oh, you're right, there is no coupling. I guess you just need variables on both meshes.

@tophmatthews (Contributor Author)

Cool, looks like you guys are tracking it down? I just don't want this to get lost again :)

@andrsd (Contributor) commented Mar 23, 2018

So, here is the problem (mostly for me so that I do not forget):

  1. SMP full = true has to be active, so that we are computing all off-diagonal blocks.
  2. Then we resize the Jacobian blocks in Assembly like so:
     jacobianBlock(vi, vj).resize(ivar.dofIndices().size(), jvar.dofIndices().size());
    
     Now, u is on the displaced mesh and v is not. So in the non-displaced instance for variable v, we resize the block to (2, 0).
  3. In Kernel::computeOffDiagJacobian for the v variable we get:
     _test.size() == 2
     _phi.size() == 2
    
     And then we index into the previously sized block, which has the wrong dimensions. (See the sketch after this list.)
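
To make the failure mode concrete, here is a minimal standalone C++ sketch (not MOOSE or libMesh source; the struct and all names in it are invented for illustration) of the sequence above: the block gets resized to (2, 0), which allocates no storage, and the kernel's 2x2 write then trips the same kind of bounds assertion that libMesh's DenseMatrix::operator() raises. Since asserts are compiled out of optimized builds, this only aborts in dbg/devel methods, matching where the bug shows up.

#include <cassert>
#include <vector>

// Toy stand-in for a dense Jacobian block; the real libMesh DenseMatrix
// asserts `i*j < _val.size()' in operator() (dense_matrix.h), as seen in
// the backtrace above.
struct DenseMatrixSketch
{
  std::vector<double> _val;
  unsigned int _m = 0, _n = 0;

  void resize(unsigned int m, unsigned int n)
  {
    _m = m;
    _n = n;
    _val.resize(m * n); // (2, 0) -> zero-length storage
  }

  double & operator()(unsigned int i, unsigned int j)
  {
    assert(i < _m && j < _n); // with _n == 0 no (i, j) is valid: always fires
    return _val[i * _n + j];
  }
};

int main()
{
  DenseMatrixSketch ke;
  ke.resize(2, 0); // step 2: block sized from jvar.dofIndices().size() == 0
  ke(0, 0) += 1.0; // step 3: kernel still writes _test.size() x _phi.size() entries
  return 0;
}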

@andrsd self-assigned this on Mar 29, 2018
@tophmatthews (Contributor Author)

Looks like this is close, I'll be glad to see this resolved after all this time!

@tophmatthews (Contributor Author)

I guess my last comment was confusing for some... this isn't closed, but it sounds close, since @andrsd has an idea of what the problem is.

@tophmatthews (Contributor Author)

@andrsd @dschwen Okay, it looks like this was solved by something else! Maybe the simplified input file given here should be implemented as a test to ensure it doesn't get broken in the future?

@tophmatthews (Contributor Author)

Looks like this has been solved by something else.

@tophmatthews (Contributor Author) commented May 31, 2018

UGGG!!!! This problem still exists. Here is another condensed input file that reproduces the error. As far as I can tell, no line can be removed without clearing the error. This is causing problems with some large BISON runs I'm trying to run through debug and is hindering development. Any ideas @dschwen @andrsd ?

Input File
[GlobalParams]
  displacements = 'disp_x disp_y'
  order = SECOND
[]

[Mesh]
  type = GeneratedMesh
  dim = 2
  nx = 10
  ny = 10
  second_order = true
[]

[Variables]
  [./temp]
    initial_condition = 298
  [../]
  [./diff]
  [../]
[]

[Modules/TensorMechanics/Master]
  [./tm]
    add_variables = true
  [../]
[]

[Kernels]
  [./temp_dt]
    type = TimeDerivative
    variable = temp
  [../]
  [./diff_dt]
    type = TimeDerivative
    variable = diff
  [../]
[]

[ThermalContact]
  [./thermal_contact]
    type = GapHeatTransfer
    variable = temp
    master = left
    slave = right
    quadrature = true
  [../]
[]

[Materials]
  [./clad_elasticity_tensor]
    type = ComputeIsotropicElasticityTensor
    youngs_modulus = 1
    poissons_ratio = 0.3
  [../]
  [./clad_stress]
    type = ComputeLinearElasticStress
  [../]
[]


[Preconditioning]
  [./SMP]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
[]

@tophmatthews reopened this on May 31, 2018
@tophmatthews (Contributor Author)

This latest input has to be run with combined-dbg.

@naveen-pr

Any new thoughts on this issue? I've been trying to run some test cases with the peridynamics module and am running into the same error in devel mode.

@tophmatthews (Contributor Author)

ping @dschwen @andrsd ???

@dschwen (Member) commented Nov 6, 2018

I ran the input with the latest moose/libmesh combined-dbg and it errors out with

Solve failed and timestep already at or below dtmin, cannot continue!

but not with the assertion you mentioned.

Argh, never mind, the one in your comment from May 31st still triggers the bug.

@dschwen (Member) commented Nov 6, 2018

Ok, some preliminary observations:

TaggingInterface::prepareMatrixTag sets up the _local_ke matrix size. For the temp/diff coupling block only, this sets up the matrix as 9x0 a few times; the vast majority of calls set it up to the correct 9x9 size. Scaling with mesh size indicates that only the size in the IntegratedBC is screwed up (the kernels get the correct _local_ke). It is always the matrix dimension corresponding to the diff variable that is messed up.

This can be traced further up to Assembly::prepareJacobianBlock, where dofIndices().size() for that variable is occasionally 0.

The problem also seems to vanish when using only first-order elements (setting order 1 on temp or diff alone does not make the problem go away).
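
As a hedged illustration of that sizing path (standalone C++, not MOOSE source; the types and helper here are assumptions), the block dimensions come straight from whatever dof indices are cached per variable on a given assembly, so a variable that was never reinitialized on that assembly contributes a zero dimension:

#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a MOOSE variable: cached_dofs gets filled only when the
// variable is reinitialized on a particular (displaced or undisplaced) assembly.
struct VariableSketch
{
  std::vector<std::size_t> cached_dofs;
};

// Mirrors the resize quoted earlier in this thread:
//   jacobianBlock(vi, vj).resize(ivar.dofIndices().size(), jvar.dofIndices().size());
std::pair<std::size_t, std::size_t>
blockSize(const VariableSketch & ivar, const VariableSketch & jvar)
{
  return {ivar.cached_dofs.size(), jvar.cached_dofs.size()};
}

int main()
{
  VariableSketch temp, diff;
  temp.cached_dofs.assign(9, 0); // second-order QUAD9 element: 9 dofs
  // diff was never reinit'd on this assembly, so its dof cache is empty...
  std::pair<std::size_t, std::size_t> sz = blockSize(temp, diff);
  (void)sz; // ...yielding the bad 9x0 block observed above
  return 0;
}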

@tophmatthews (Contributor Author)

@dschwen @andrsd This popped up again and is slowing development... any way we can get this through?! What can I do to help?

@dschwen (Member) commented Apr 23, 2019

I'm on travel this week

@tophmatthews (Contributor Author)

Whew...I think this was fixed along the way...

@permcody (Member)

Really? It's not referenced. I hope we have a test case, or if not adding one now would be great!

@tophmatthews (Contributor Author)

There are three different inputs in this issue that should prove it.

@tophmatthews (Contributor Author)

Nope! Still a problem!!!!! I closed this prematurely somehow.

I keep coming around to this when I have to use dbg on thermo-mechanical problems. Once I get pulled away to something else and I stop pushing this, I forget, @dschwen forgets, @andrsd forgets, everyone else forgets, and it gets consistently lost.

This was so close at one point, with @andrsd identifying the problem, but it never got solved.

I'll open a PR with a test that clearly fails so this hopefully gets more visibility, and this damn thing gets closed finally!

@permcody (Member) commented Nov 5, 2019

Gonna rename this to "Topher's bug" - Seems like it's the bane of your existence.

@tophmatthews (Contributor Author)

Glad someone could sense my frustration... :)
