Test problem for HYPRE_MIXEDINT #326
Comments
Assuming you configured with --enable-mixed-int, this is the correct way to run it.
Which machine are you running it on? Did this problem work for you running it with 64-bit integers?
From: Julian Andrej ***@***.***>
Sent: Monday, April 12, 2021 7:59 AM
To: hypre-space/hypre ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [hypre-space/hypre] Test problem for HYPRE_MIXEDINT (#326)
In the process of getting mfem to work with the HYPRE_MIXEDINT option (see mfem/mfem#1583, https://github.com/mfem/mfem/pull/1583) we are running into issues.
I tried to run the current hypre version (recent git master) using the ij test executable with
$ srun -n1728 -ppbatch -A *** ./test/ij -P 12 12 12 -n 1400 1400 1400
to run a large enough test. This fails with a memory error:
ij: hypre_memory.c:34: hypre_OutOfMemory: Assertion `0' failed.
[***:mpi_rank_341][error_sighandler] Caught error: Aborted (signal 6)
Am I using the option wrong?
|
Yes I configured with --enable-mixed-int. I ran the test on quartz. I did not try to run with the bigint option only, but I can do that if that helps. |
It’s possible that you just ran out of memory, since this is a very large problem. If the 64-bit integer version works, then there really is an issue with the mixed-int version. Another thing you could try with the mixed-int version is to add -agg_nl 1 to your command line, which gives an AMG version with lower complexity and lower memory requirements.
|
Using your suggested option, -agg_nl 1, works fine. Thanks! |
This problem is now twice as big as the previous one, so it is possible you ran out of memory.
You also solve this as a scalar problem but using a nodal interpolation (however that should not be the problem).
However, we haven’t really tested this interpolation for mixed-int, so there could be an issue.
Can you try to rerun setting -n 1200 1200 600?
From: Julian Andrej ***@***.***>
Sent: Monday, April 12, 2021 10:30 AM
To: hypre-space/hypre ***@***.***>
Cc: Yang, Ulrike Meier ***@***.***>; Comment ***@***.***>
Subject: Re: [hypre-space/hypre] Test problem for HYPRE_MIXEDINT (#326)
I tried another option since in mfem a simple Laplace problem works fine with mixed-int. Elasticity with the systems option fails.
When I run
$ srun -n1728 -ppbatch -A ceed ./test/ij -P 12 12 12 -n 1200 1200 1200 -sysL 2 -agg_nl 1 -intptype 10
the example segfaults (without further information about errors etc.)
Is the combination of -agg_nl 1 -intptype 10 supposed to work on -sysL 2? I expect this to produce a 7pt stencil with 2 equations.
|
Probably not, but this problem is also much smaller and can be solved with 32-bit integers only, so I don’t expect this to be a mixed-int problem.
You generally would not run it this way. Can you add -nf 2 -nodal 1 to this and see what happens?
Thanks
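The size arguments in this thread can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that -n gives the global grid dimensions, that -sysL 2 doubles the number of unknowns per grid point, and that the suggested -n 1200 1200 600 run keeps -sysL 2. The helper name global_unknowns is hypothetical and not part of hypre.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def global_unknowns(nx, ny, nz, num_functions=1):
    """Global unknown count: grid points times unknowns per point.

    Assumption: -n is the global grid size and -sysL sets num_functions.
    """
    return nx * ny * nz * num_functions


# Problem sizes mentioned in this thread.
cases = {
    "-n 1400 1400 1400": global_unknowns(1400, 1400, 1400),
    "-n 1200 1200 1200 -sysL 2": global_unknowns(1200, 1200, 1200, 2),
    "-n 1200 1200 600 -sysL 2": global_unknowns(1200, 1200, 600, 2),
    "-n 700 700 700 -sysL 2": global_unknowns(700, 700, 700, 2),
}

for label, n in cases.items():
    fits = n <= INT32_MAX
    print(f"{label}: {n:,} unknowns, fits in 32-bit index: {fits}")
```

Under these assumptions only the first two cases exceed the 32-bit range, which is consistent with the remark that the 700 x 700 x 700 run does not need 64-bit global indices at all.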
From: Julian Andrej ***@***.***>
Sent: Monday, April 12, 2021 11:04 AM
To: hypre-space/hypre ***@***.***>
Cc: Yang, Ulrike Meier ***@***.***>; Comment ***@***.***>
Subject: Re: [hypre-space/hypre] Test problem for HYPRE_MIXEDINT (#326)
The test also fails with a much smaller allocation
$ srun -n1728 -ppbatch -A ceed ./test/ij -P 12 12 12 -n 700 700 700 -sysL 2 -agg_nl 1 -interptype 10
so I don't suspect running OOM here.
|
I just realized that it doesn’t make sense to combine -interptype 10 with aggressive coarsening. It also fails on a small problem with 2 processes.
Obviously, this should not just segfault, so we have to do something about that. For now, remove -agg_nl 1.
From: Julian Andrej ***@***.***>
Sent: Monday, April 12, 2021 11:25 AM
To: hypre-space/hypre ***@***.***>
Cc: Yang, Ulrike Meier ***@***.***>; Comment ***@***.***>
Subject: Re: [hypre-space/hypre] Test problem for HYPRE_MIXEDINT (#326)
Running
$ srun -n1728 -ppbatch -A ceed ./test/ij -P 12 12 12 -n 700 700 700 -sysL 2 -agg_nl 1 -interptype 10 -nf 2 -nodal 1
segfaults without further information
|
I'm also having problems with the mixedint tests, presumably the same as or related to the problem reported here. I'm building 2.21.0 (patch set here). After building the mixedint library and tests, TEST_ams, for instance, generates this output (corresponding to solvers.out.10 from TEST_ams/solvers.jobs; the segfault also occurs for out.8, 9 and 11; I have 8 processors on this system, if that's relevant)
I'm also building the standard and bigint configurations. bigint does not generate an mpirun segfault. This is with openmpi 4.1.0 and pmix 4.0.0.
More distressing than the MPI segfault itself, the error correlates with a complete Linux kernel meltdown. For some reason the hard drive bus seems to get detached after the MPI segfault is triggered, causing all filesystems to be dropped to read-only, hence a complete system failure. Since /var is also made read-only, I can't give an exact log of this behaviour.
My kernel crash is reproducible in the sense that it currently happens every time I run the mixedint tests, but it does not occur with the bigint tests. However, the precise point at which the filesystem lockup occurs varies: sometimes during TEST_ams, sometimes TEST_ij or TEST_lobpcg, most often in TEST_lobpcg.
With respect to the workaround suggested above, there is no |