Parallel low-order refined classes for spaces of vectors #3167

Closed · karthichockalingam opened this issue Aug 23, 2022 · 8 comments · Fixed by #3177
@karthichockalingam commented Aug 23, 2022

Hello,

I used the fix from #3153 to run preconditioned linear elasticity problems (ex2.cpp). When running LOR serially, it works and converges.

In parallel, the LOR classes seem limited (by type) to Hypre solvers: LORSolver<HypreBoomerAMG> compiles and runs, whereas LORSolver<GSSmoother> errors out at runtime with the following message:

Assembling: r.h.s. ... matrix ...
SparseSmoother::SetOperator : not a SparseMatrix!.

Unfortunately, the elasticity problem with LORSolver<HypreBoomerAMG> severely underperforms, to the point that passing no preconditioner at all is better.

Can you help sort out whether LOR works in parallel for linear elasticity problems?

Thank you!

@pazner (Member) commented Aug 23, 2022

Hi @karthichockalingam,

The LOR classes are not limited to Hypre solvers; you can use any solver that works with a SparseMatrix in serial, or with a HypreParMatrix in parallel.

The problem you are running into is that in parallel, the LOR discretization produces a HypreParMatrix rather than a SparseMatrix, while GSSmoother is intended only for serial problems using SparseMatrix. If you want a parallel Gauss-Seidel, you can use HypreSmoother, e.g.

LORSolver<HypreSmoother> gs(a, ess_dofs);
gs.GetSolver().SetType(HypreSmoother::GS);

Note that Gauss-Seidel is not usually very suitable for parallel runs, so you might want to use a different HypreSmoother::Type.
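
For instance, a minimal variation of the snippet above using an l1-scaled Jacobi smoother, which parallelizes well (assuming the same a and ess_dofs as before):

// Same LOR setup as above, but with a smoother type that is robust in
// parallel; HypreSmoother::Chebyshev is another common option.
LORSolver<HypreSmoother> smoother(a, ess_dofs);
smoother.GetSolver().SetType(HypreSmoother::l1Jacobi);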


To address your second question: I don't know offhand whether the LOR solvers will work well for elasticity problems. LOR is known to work well for many elliptic problems, but it depends on the PDE you are solving. However, if the LOR solvers work properly in serial, they should also be OK in parallel. Maybe you can post the source code so we can reproduce locally.

pazner self-assigned this Aug 23, 2022
tzanio self-assigned this Aug 23, 2022
@karthichockalingam (Author)

Thank you for the prompt response. Yes, I was able to use the HypreSmoother options.
But I still find that the parallel version takes significantly more iterations than the serial version.
As I mentioned before, the unpreconditioned solve still converges in significantly fewer iterations.

Here is the source code if you would like to have a look. :-)

@pazner (Member) commented Aug 29, 2022

I think it's because the LOR class wasn't matching the vdim ordering of the high-order space, so in parallel it was using a different ordering. Does #3177 fix it for you?
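
For context, a minimal sketch of the two vdim orderings in question (the names pmesh, fec, and dim are illustrative, not taken from #3177):

// A vector-valued space stores its vdim components either interleaved per
// node (byVDIM: xyz xyz ...) or grouped per component (byNODES: xxx yyy zzz).
// The LOR space must use the same ordering as the high-order space for the
// degrees of freedom to line up.
ParFiniteElementSpace fes_byvdim(&pmesh, &fec, dim, Ordering::byVDIM);
ParFiniteElementSpace fes_bynodes(&pmesh, &fec, dim, Ordering::byNODES);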

@karthichockalingam (Author) commented Aug 31, 2022

I tried the fix. The LOR preconditioning in parallel now performs better than not providing one, for the most part. However, I run into an issue when the problem is run with polynomial order 4 or higher. It exits with the following error message:

MFEM abort: (r,c,f) = (421,427,973)
... in function: int mfem::STable3D::operator()(int, int, int) const
... in file: general/stable3d.cpp:112

Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1

I was able to reproduce the same error both on my local machine and on the cluster.

Here are the source code and the mesh file; the problem is run with the command mpirun -np 2 ./ex2p_lor -m tagged_ted_beam.msh -o 4

@pazner (Member) commented Aug 31, 2022

Your code seems to work for me with order 4 and the given mesh. Maybe try pulling from master, running make clean, rebuilding in debug mode, etc., and if it still fails, post the backtrace.
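
For reference, a clean debug rebuild with MFEM's GNU makefile looks something like this (the exact config options depend on your setup):

make config MFEM_DEBUG=YES MFEM_USE_MPI=YES
make clean
make -j 4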

It looks like your mesh is a tet mesh. LOR preconditioners will work much better with all-hex meshes (they will not really help at all with tet meshes).

@karthichockalingam (Author) commented Sep 1, 2022

I pulled fresh from master and built MFEM in debug mode.
I am faced with the same issue, and found that it is an assertion failure:

Assertion failed: (l_edge >= 0) is false:
 --> invalid shared edge
 ... in function: void mfem::ParMesh::FinalizeParTopo()
 ... in file: mesh/pmesh.cpp:899
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 1.

The program only fails under the following conditions:
(i) when running in parallel on two or more cores (it runs fine on a single core);
(ii) when using the LOR preconditioner (it runs fine, with no assertion failure, without a preconditioner);
(iii) when using polynomial order 4 or higher.

For some reason I was not able to get a backtrace while running on multiple cores. Below is the backtrace from running on a single core (where there is no assertion failure, so I just set a breakpoint at FinalizeParTopo).

    frame #0:  0x0000000100266755 ex2p_lor_tet_beam mfem::ParMesh::FinalizeParTopo(this=0x0000000109025200) at pmesh.cpp:890:23
    frame #1:  0x000000010026b1db ex2p_lor_tet_beam mfem::ParMesh::MakeRefined_(this=0x0000000109025200, orig_mesh=0x000000010484b000, ref_factor=4, ref_type=1) at pmesh.cpp:1357:19
    frame #2:  0x000000010026b3c3 ex2p_lor_tet_beam mfem::ParMesh::MakeRefined(orig_mesh=0x000000010484b000, ref_factor=4, ref_type=1) at pmesh.cpp:1370:21
    frame #3:  0x00000001006f903e ex2p_lor_tet_beam mfem::ParLORDiscretization::FormLORSpace(this=0x0000000103404d80) at lor.cpp:512:79
    frame #4:  0x00000001006f7cb7 ex2p_lor_tet_beam mfem::LORBase::GetFESpace(this=0x0000000103404d80) const at lor.cpp:362:63
    frame #5:  0x00000001006f7e36 ex2p_lor_tet_beam mfem::LORBase::LegacyAssembleSystem(this=0x0000000103404d80, a_ho=0x0000000102d116e0, ess_dofs=0x00007ff7bfeff1c0) at lor.cpp:395:44
    frame #6:  0x00000001006f7dd1 ex2p_lor_tet_beam mfem::LORBase::AssembleSystem(this=0x0000000103404d80, a_ho=0x0000000102d116e0, ess_dofs=0x00007ff7bfeff1c0) at lor.cpp:382:27
    frame #7:  0x00000001006f8cf3 ex2p_lor_tet_beam mfem::ParLORDiscretization::ParLORDiscretization(this=0x0000000103404d80, a_ho_=0x0000000102d116e0, ess_tdof_list=0x00007ff7bfeff1c0, ref_type_=1) at lor.cpp:491:18
    frame #8:  0x000000010000526d ex2p_lor_tet_beam mfem::LORSolver<mfem::HypreBoomerAMG>::LORSolver(this=0x00007ff7bfefed90, a_ho=0x0000000102d116e0, ess_tdof_list=0x00007ff7bfeff1c0, ref_type=1) at lor.hpp:225:13
    frame #9:  0x000000010000439a ex2p_lor_tet_beam main(argc=5, argv=0x00007ff7bfeff4f0) at ex2p_lor_tet_beam.cpp:275:51
    frame #10: 0x00000001013dd52e dyld`start + 462

Thank you for letting me know that the LOR preconditioner is not best suited for tet meshes.

@tzanio (Member) commented Sep 1, 2022

@karthichockalingam -- I edited your post above for clarity, can you please check it?

@karthichockalingam (Author)

Thank you @tzanio - it reads fine. I wasn't exactly sure how to format it.
