Segmentation fault in ParGridFunction::ExchangeFaceNbrData() when using Nedelec finite elements #1440
Comments
Hello, @hongbo-yao, Best wishes,
Hi @mlstowell, I tested ex3p and ex22p again and they still produce the segmentation fault. I doubt it is caused by a failure to allocate memory, since I did not refine the mesh in either case and my laptop has 16 GB of memory (number of edge unknowns: 125872). Could it be caused by METIS? My version is metis-5.1.0 and I will try an older version later. I will switch to my computer at school once I am back there, but for now I have to wait a while. In the meantime I will keep investigating to find out what exactly happened. Thanks,
Not METIS's fault; 4.0.3 still produces the segmentation fault.
Do you get the segfault on 1 MPI rank? If so, you can run this in a debugger, and when the segfault happens the debugger will stop and let you examine the function call stack, e.g. in lldb.
Hi, 1 process passed the test; more than 1 process failed.
OK, you can still run the debugger with two ranks, but it is a little trickier. Try running this command: mpirun -np 2 xterm -e lldb -- ./ex1p, replacing ./ex1p with the executable and arguments you want to debug.
Hi @v-dobrev, this may be because the debugger tool is not installed; I will install it, test again, and let you know when I make progress. But I found a confusing phenomenon: metis-4.0.3 passes with 1, 2, 3 processes but fails with 4 processes, while metis-5.1.0 passes with 1, 2 processes but fails with 3, 4 processes. However, earlier they all failed with more than one process; a little random... It is also affected by the mesh: in my sphere2.geo file, if I reduce the computational domain and regenerate the .msh, it passes on 4 ranks. Best,
I think the debugger will tell us more. The random behavior may mean memory is used after it is freed, or something like that.
Hi @v-dobrev, before showing the outputs, here are some changes I made in ex22p (inside an if (myid == 0) block). Following this advice I ran the suggested command, but it seems that xterm and lldb cannot work together with mpirun, since the same errors also happened with ex1p. Thanks,
This is strange; I had no issues running this on my machine. What MPI are you using? Is it MPICH, OpenMPI, or something else? On my Mac I use OpenMPI v2.1.6.
Hi @v-dobrev, it can finally run. The following is the output of the command:
(lldb) target create "./ex22p"
Please take a look,
It looks like you are running an optimized build without debug information. Can you rebuild MFEM in debug mode and re-run? For example, you can use make pdebug.
Hi @v-dobrev, it seems we found it. Here is the output with the debug build:
(lldb) target create "./ex22p"
I see what the issue is. However, I'm not sure you really need to call ExchangeFaceNbrData() in ex3p or ex22p.
Hi @v-dobrev, what I want to do is compute the face jumps of the electric field in parallel (see #1417). The face jumps are part of the widely used residual-based error estimator (volume residual + face jumps; MFEM only supports the ZZ error estimator) for Maxwell's equations, so I think I really do need to call ExchangeFaceNbrData() in ex3p or ex22p. Are there other ways to achieve this goal? Without calling ExchangeFaceNbrData() I can still compute the face jumps except on the MPI-shared faces, but I think it is better to include the MPI-shared faces. Thanks!
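For context, here is a minimal sketch (added for illustration; it is not code from this thread) of how a loop over the MPI-shared faces can be organized once ExchangeFaceNbrData() works. It assumes "mfem.hpp" is included with using namespace mfem; as in the MFEM examples, pmesh is the ParMesh, and u is a ParGridFunction (e.g. u.real() from ex22p); the jump evaluation itself is only indicated by a comment.

```cpp
// Fetch the dof values of elements that touch this rank's shared faces but
// live on other ranks ("face-neighbor" elements).
u.ExchangeFaceNbrData();
ParFiniteElementSpace &pfes = *u.ParFESpace();

for (int sf = 0; sf < pmesh->GetNSharedFaces(); sf++)
{
   FaceElementTransformations *tr = pmesh->GetSharedFaceTransformations(sf);
   if (tr == NULL) { continue; }

   // Elem1 is always a local element; Elem2 is a face-neighbor element
   // owned by another rank, numbered after the local elements.
   const int nbr_el = tr->Elem2No - pmesh->GetNE();

   // Dof values of the neighbor element, taken from the exchanged data
   // (negative, sign-encoded vdofs are handled by GetSubVector).
   Array<int> nbr_vdofs;
   pfes.GetFaceNbrElementVDofs(nbr_el, nbr_vdofs);
   Vector nbr_vals;
   u.FaceNbrData().GetSubVector(nbr_vdofs, nbr_vals);

   // ... evaluate the tangential trace of u on both sides of this face and
   //     accumulate the jump into the residual-based error indicator ...
}
```

This is essentially the same access pattern MFEM's parallel shared-face assembly uses, so the estimator's shared-face terms can follow it once the exchange works for H(curl) spaces.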
OK, I see. This makes sense. I think the fix is to replace the line in ParGridFunction::ExchangeFaceNbrData() (line 235 at commit 7690eca) with

    const int ldof = d_send_ldof[i];
    d_send_data[i] = d_data[ldof >= 0 ? ldof : -1-ldof];

Try it out and check to see if you get the same result on different numbers of processors. If this is the right fix, we can create a branch to merge it into master.
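As a side note added here for illustration (it is not part of the original comment): MFEM stores a local dof whose value enters with a flipped sign as the negative index -1-ldof, and H(curl)/H(div) spaces (Nedelec, Raviart-Thomas) are exactly the spaces that produce such sign-encoded dofs, which would explain why the raw index only goes out of bounds for those elements. The proposed replacement decodes the index before using it; a standalone sketch of the convention:

```cpp
// Standalone illustration (not MFEM source) of the index convention the fix
// relies on: a non-negative entry is an ordinary local dof index, while a
// negative entry k encodes the dof -1-k, whose value is used with a flipped
// sign in the local numbering.
#include <cassert>

inline int DecodeDof(int ldof) { return (ldof >= 0) ? ldof : -1 - ldof; }

int main()
{
   assert(DecodeDof(7)  == 7);   // ordinary dof: used as-is
   assert(DecodeDof(-1) == 0);   // encodes dof 0 (sign-flipped)
   assert(DecodeDof(-8) == 7);   // encodes dof 7 (sign-flipped)
   return 0;
}
```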
Thanks, @v-dobrev. Both ex3p (H(curl)) and ex22p (H(curl) and H(div)) passed with 1-4 ranks for both optimized and debug builds, but it would be better to do more tests before merging. Finally, sincere thanks for your help on this issue! Hongbo
Issue reported by: @hongbo-yao (#1440)
Hi,
I want to report a segmentation fault that happens when calling ParGridFunction::ExchangeFaceNbrData() in ex3p and ex22p.
sphere2.msh is my mesh file. In ex22p, I first set
const char *mesh_file = "sphere2.msh"; int ser_ref_levels = 0; int par_ref_levels = 0;
and call
u.real().ExchangeFaceNbrData();
after step 13: a->RecoverFEMSolution(U, b, u);
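Put together, the modification looks roughly like this (a paraphrased sketch, not the exact diff; a, b, U and u are the variable names of the unmodified ex22p, with u.real() being the real part of the complex solution):

```cpp
// Defaults changed in ex22p instead of passing them on the command line:
const char *mesh_file = "sphere2.msh";
int ser_ref_levels = 0;
int par_ref_levels = 0;

// ...

// Step 13 of ex22p (unchanged): recover the parallel solution u from U.
a->RecoverFEMSolution(U, b, u);

// Added line: exchange the dof values of elements on the other side of the
// MPI-shared faces so that face jumps can be computed afterwards. With
// "-p 1" (the H(curl)/Nedelec problem) this call is where the segmentation
// fault occurs.
u.real().ExchangeFaceNbrData();
```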
When I run ex22p with the command mpirun -np 4 ./ex22p -p 0, it works.
When I run ex22p with the command mpirun -np 4 ./ex22p -p 1, a segmentation fault happens.
A similar fault happens in ex3p, which uses Nedelec elements.
Here are my mesh files (please remove the .txt extension before using them):
sphere2.geo.txt
sphere2.msh.txt
Thanks,
Hongbo