Segmentation fault with lbpm_color_simulator on CPU #39

Closed
ahmedsrizk95 opened this issue Aug 10, 2021 · 6 comments

Comments

@ahmedsrizk95

ahmedsrizk95 commented Aug 10, 2021

We have installed LBPM, and all the executables run well except for lbpm_color_simulator, which fails with a segmentation fault.

We tried a different number of nodes each time, and it gives an error like this:

> --------------------------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 6 with PID 755395 on node cn-03-14 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------

or this:

> [cn-04-23:762490] *** An error occurred in MPI_Isend
> 
> [cn-04-23:762490] *** reported by process [3580887041,21]
> 
> [cn-04-23:762490] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
> 
> [cn-04-23:762490] *** MPI_ERR_OTHER: known error not in list
> 
> [cn-04-23:762490] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 
> [cn-04-23:762490] *** and potentially your MPI job)
> 
> [cn-02-52:759507] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
> 
> [cn-02-52:759507] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

or this:

> --------------------------------------------------------------------------
> 
> WARNING: Open MPI failed to TCP connect to a peer MPI process. This
> 
> should not happen.
> 
> 
>  
> 
> Your Open MPI job may now hang or fail.
> 
> 
>  
> 
>  Local host: cn-06-33
> 
>  PID: 785099
> 
>  Message: connect() to 172.18.6.34:1024 failed
> 
>  Error: Resource temporarily unavailable (11)
> 
> --------------------------------------------------------------------------

or this:

> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 3 with PID 761535 on node cn-02-44 exited on signal 6 (Aborted).

@JamesEMcClure
Collaborator

Try setting the environment variable export MPI_THREAD_MULTIPLE=1

If that does not work please provide the MPI implementation (e.g. mvapich2 or openmpi) and version so that we can give good advice on how to proceed.

@ahmedsrizk95
Author

After setting the environment variable export MPI_THREAD_MULTIPLE=1 it gives this error:

> Program abort called in file '/apps/ku/build/lbpm/lbpm-2020/common/Database.cpp' at line 83:
>    Variable FlowAdaptor was not found in database
> Bytes used = 300266512
> Stack Trace:
>  [7] 0x00000046f56e:  lbpm_color_simulator                                    _start
>    [7] 0x00000046eb40:  lbpm_color_simulator                                      main  lbpm_color_simulator.cpp:112
>      [7] 0x00000054ba0f:  lbpm_color_simulator               ScaLBL_ColorModel::Run(int)  basic_string.h:222
>        [7] 0x0000005aed7f:  lbpm_color_simulator  Database::getDatabase(std::string const&)  shared_ptr.h:510
>          [7] 0x0000005ae2cb:  lbpm_color_simulator     Database::getData(std::string const&)  Database.cpp:83
>            [1] 0x000000556696:  lbpm_color_simulator  StackTrace::Utilities::abort(std::string const&, std::string const&, int)  stl_vector.h:108
>            | [1] 0x00000055cc51:  lbpm_color_simulator                   StackTrace::backtrace()
>            |   [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>            [6] 0x000000553682:  lbpm_color_simulator  StackTrace::Utilities::terminate(StackTrace::abort_error const&)
>              [6] 0x000000555c45:  lbpm_color_simulator     StackTrace::abort_error::what() const
>                [6] 0x00000055ffad:  lbpm_color_simulator                     getRemoteCallStacks()
>                  [6] 0x147fc0e186bb:             libc.so.6                             __sched_yield
>  [14] 0x14d5ff520dc3:             libc.so.6                                     clone
>    [14] 0x14d6000a014a:       libpthread.so.0                                            pthread_create.c
>      [7] 0x14d5feb58c94:     libopen-pal.so.40                                            epoll.c:409
>      | [7] 0x14d5ff5210f7:             libc.so.6                                epoll_wait
>      |   [7] 0x14d6000aab20:       libpthread.so.0                                            sigaction.c
>      |     [7] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>      |       [7] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>      [7] 0x14d5feb181fe:     libopen-pal.so.40                                            opal_progress_threads.c:104
>        [7] 0x14d5feb6553d:     libopen-pal.so.40                                            poll.c:167
>          [7] 0x14d5ff515a41:             libc.so.6                                    __poll
>            [7] 0x14d6000aab20:       libpthread.so.0                                            sigaction.c
>              [7] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>                [7] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
> Program abort called in file '/apps/ku/build/lbpm/lbpm-2020/common/Database.cpp' at line 83:
>    Variable FlowAdaptor was not found in database
> Bytes used = 305164080
> Stack Trace:
>  [2] 0x00000046f56e:  lbpm_color_simulator                                    _start
>    [2] 0x00000046eb40:  lbpm_color_simulator                                      main  lbpm_color_simulator.cpp:112
>      [2] 0x00000054ba0f:  lbpm_color_simulator               ScaLBL_ColorModel::Run(int)  basic_string.h:222
>        [2] 0x0000005aed7f:  lbpm_color_simulator  Database::getDatabase(std::string const&)  shared_ptr.h:510
>          [2] 0x0000005ae2cb:  lbpm_color_simulator     Database::getData(std::string const&)  Database.cpp:83
>            [1] 0x000000556696:  lbpm_color_simulator  StackTrace::Utilities::abort(std::string const&, std::string const&, int)  stl_vector.h:108
>            | [1] 0x00000055cc51:  lbpm_color_simulator                   StackTrace::backtrace()
>            |   [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>            [1] 0x000000553682:  lbpm_color_simulator  StackTrace::Utilities::terminate(StackTrace::abort_error const&)
>              [1] 0x000000555c45:  lbpm_color_simulator     StackTrace::abort_error::what() const
>                [1] 0x00000055ffad:  lbpm_color_simulator                     getRemoteCallStacks()
>                  [1] 0x14d5ff5076bb:             libc.so.6                             __sched_yield
>  [4] 0x14b9b7567dc3:             libc.so.6                                     clone
>    [4] 0x14b9b80e714a:       libpthread.so.0                                            pthread_create.c
>      [2] 0x14b9b6b9fc94:     libopen-pal.so.40                                            epoll.c:409
>      | [2] 0x14b9b75680f7:             libc.so.6                                epoll_wait
>      |   [2] 0x14b9b80f1b20:       libpthread.so.0                                            sigaction.c
>      |     [2] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>      |       [2] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>      [2] 0x14b9b6b5f1fe:     libopen-pal.so.40                                            opal_progress_threads.c:104
>        [2] 0x14b9b6bac53d:     libopen-pal.so.40                                            poll.c:167
>          [2] 0x14b9b755ca41:             libc.so.6                                    __poll
>            [2] 0x14b9b80f1b20:       libpthread.so.0                                            sigaction.c
>              [2] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>                [2] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
> Program abort called in file '/apps/ku/build/lbpm/lbpm-2020/common/Database.cpp' at line 83:
>    Variable FlowAdaptor was not found in database
> Bytes used = 304768640
> Stack Trace:
>  [1] 0x00000046f56e:  lbpm_color_simulator                                    _start
>    [1] 0x00000046eb40:  lbpm_color_simulator                                      main  lbpm_color_simulator.cpp:112
>      [1] 0x00000054ba0f:  lbpm_color_simulator               ScaLBL_ColorModel::Run(int)  basic_string.h:222
>        [1] 0x0000005aed7f:  lbpm_color_simulator  Database::getDatabase(std::string const&)  shared_ptr.h:510
>          [1] 0x0000005ae2cb:  lbpm_color_simulator     Database::getData(std::string const&)  Database.cpp:83
>            [1] 0x000000556696:  lbpm_color_simulator  StackTrace::Utilities::abort(std::string const&, std::string const&, int)  stl_vector.h:108
>              [1] 0x00000055cc51:  lbpm_color_simulator                   StackTrace::backtrace()
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>  [2] 0x14f9b021ddc3:             libc.so.6                                     clone
>    [2] 0x14f9b0d9d14a:       libpthread.so.0                                            pthread_create.c
>      [1] 0x14f9af855c94:     libopen-pal.so.40                                            epoll.c:409
>      | [1] 0x14f9b021e0f7:             libc.so.6                                epoll_wait
>      |   [1] 0x14f9b0da7b20:       libpthread.so.0                                            sigaction.c
>      |     [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>      |       [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>      [1] 0x14f9af8151fe:     libopen-pal.so.40                                            opal_progress_threads.c:104
>        [1] 0x14f9af86253d:     libopen-pal.so.40                                            poll.c:167
>          [1] 0x14f9b0212a41:             libc.so.6                                    __poll
>            [1] 0x14f9b0da7b20:       libpthread.so.0                                            sigaction.c
>              [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
> Program abort called in file '/apps/ku/build/lbpm/lbpm-2020/common/Database.cpp' at line 83:
>    Variable FlowAdaptor was not found in database
> Bytes used = 295198464
> Stack Trace:
>  [1] 0x00000046f56e:  lbpm_color_simulator                                    _start
>    [1] 0x00000046eb40:  lbpm_color_simulator                                      main  lbpm_color_simulator.cpp:112
>      [1] 0x00000054ba0f:  lbpm_color_simulator               ScaLBL_ColorModel::Run(int)  basic_string.h:222
>        [1] 0x0000005aed7f:  lbpm_color_simulator  Database::getDatabase(std::string const&)  shared_ptr.h:510
>          [1] 0x0000005ae2cb:  lbpm_color_simulator     Database::getData(std::string const&)  Database.cpp:83
>            [1] 0x000000556696:  lbpm_color_simulator  StackTrace::Utilities::abort(std::string const&, std::string const&, int)  stl_vector.h:108
>              [1] 0x00000055cc51:  lbpm_color_simulator                   StackTrace::backtrace()
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>  [2] 0x147fc0e31dc3:             libc.so.6                                     clone
>    [2] 0x147fc19b114a:       libpthread.so.0                                            pthread_create.c
>      [1] 0x147fc0469c94:     libopen-pal.so.40                                            epoll.c:409
>      | [1] 0x147fc0e320f7:             libc.so.6                                epoll_wait
>      |   [1] 0x147fc19bbb20:       libpthread.so.0                                            sigaction.c
>      |     [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>      |       [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>      [1] 0x147fc04291fe:     libopen-pal.so.40                                            opal_progress_threads.c:104
>        [1] 0x147fc047653d:     libopen-pal.so.40                                            poll.c:167
>          [1] 0x147fc0e26a41:             libc.so.6                                    __poll
>            [1] 0x147fc19bbb20:       libpthread.so.0                                            sigaction.c
>              [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
> Program abort called in file '/apps/ku/build/lbpm/lbpm-2020/common/Database.cpp' at line 83:
>    Variable FlowAdaptor was not found in database
> Bytes used = 301353328
> Stack Trace:
>  [1] 0x00000046f56e:  lbpm_color_simulator                                    _start
>    [1] 0x00000046eb40:  lbpm_color_simulator                                      main  lbpm_color_simulator.cpp:112
>      [1] 0x00000054ba0f:  lbpm_color_simulator               ScaLBL_ColorModel::Run(int)  basic_string.h:222
>        [1] 0x0000005aed7f:  lbpm_color_simulator  Database::getDatabase(std::string const&)  shared_ptr.h:510
>          [1] 0x0000005ae2cb:  lbpm_color_simulator     Database::getData(std::string const&)  Database.cpp:83
>            [1] 0x000000556696:  lbpm_color_simulator  StackTrace::Utilities::abort(std::string const&, std::string const&, int)  stl_vector.h:108
>              [1] 0x00000055cc51:  lbpm_color_simulator                   StackTrace::backtrace()
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>  [2] 0x14a21b475dc3:             libc.so.6                                     clone
>    [2] 0x14a21bff514a:       libpthread.so.0                                            pthread_create.c
>      [1] 0x14a21aaadc94:     libopen-pal.so.40                                            epoll.c:409
>      | [1] 0x14a21b4760f7:             libc.so.6                                epoll_wait
>      |   [1] 0x14a21bfffb20:       libpthread.so.0                                            sigaction.c
>      |     [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>      |       [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>      [1] 0x14a21aa6d1fe:     libopen-pal.so.40                                            opal_progress_threads.c:104
>        [1] 0x14a21aaba53d:     libopen-pal.so.40                                            poll.c:167
>          [1] 0x14a21b46aa41:             libc.so.6                                    __poll
>            [1] 0x14a21bfffb20:       libpthread.so.0                                            sigaction.c
>              [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
> Program abort called in file '/apps/ku/build/lbpm/lbpm-2020/common/Database.cpp' at line 83:
>    Variable FlowAdaptor was not found in database
> Bytes used = 284188736
> Stack Trace:
>  [1] 0x00000046f56e:  lbpm_color_simulator                                    _start
>    [1] 0x00000046eb40:  lbpm_color_simulator                                      main  lbpm_color_simulator.cpp:112
>      [1] 0x00000054ba0f:  lbpm_color_simulator               ScaLBL_ColorModel::Run(int)  basic_string.h:222
>        [1] 0x0000005aed7f:  lbpm_color_simulator  Database::getDatabase(std::string const&)  shared_ptr.h:510
>          [1] 0x0000005ae2cb:  lbpm_color_simulator     Database::getData(std::string const&)  Database.cpp:83
>            [1] 0x000000556696:  lbpm_color_simulator  StackTrace::Utilities::abort(std::string const&, std::string const&, int)  stl_vector.h:108
>              [1] 0x00000055cc51:  lbpm_color_simulator                   StackTrace::backtrace()
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>  [2] 0x1545c9238dc3:             libc.so.6                                     clone
>    [2] 0x1545c9db814a:       libpthread.so.0                                            pthread_create.c
>      [1] 0x1545c8870c94:     libopen-pal.so.40                                            epoll.c:409
>      | [1] 0x1545c92390f7:             libc.so.6                                epoll_wait
>      |   [1] 0x1545c9dc2b20:       libpthread.so.0                                            sigaction.c
>      |     [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>      |       [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>      [1] 0x1545c88301fe:     libopen-pal.so.40                                            opal_progress_threads.c:104
>        [1] 0x1545c887d53d:     libopen-pal.so.40                                            poll.c:167
>          [1] 0x1545c922da41:             libc.so.6                                    __poll
>            [1] 0x1545c9dc2b20:       libpthread.so.0                                            sigaction.c
>              [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
> Program abort called in file '/apps/ku/build/lbpm/lbpm-2020/common/Database.cpp' at line 83:
>    Variable FlowAdaptor was not found in database
> Bytes used = 271527168
> Stack Trace:
>  [1] 0x00000046f56e:  lbpm_color_simulator                                    _start
>    [1] 0x00000046eb40:  lbpm_color_simulator                                      main  lbpm_color_simulator.cpp:112
>      [1] 0x00000054ba0f:  lbpm_color_simulator               ScaLBL_ColorModel::Run(int)  basic_string.h:222
>        [1] 0x0000005aed7f:  lbpm_color_simulator  Database::getDatabase(std::string const&)  shared_ptr.h:510
>          [1] 0x0000005ae2cb:  lbpm_color_simulator     Database::getData(std::string const&)  Database.cpp:83
>            [1] 0x000000556696:  lbpm_color_simulator  StackTrace::Utilities::abort(std::string const&, std::string const&, int)  stl_vector.h:108
>              [1] 0x00000055cc51:  lbpm_color_simulator                   StackTrace::backtrace()
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>  [2] 0x14b183224dc3:             libc.so.6                                     clone
>    [2] 0x14b183da414a:       libpthread.so.0                                            pthread_create.c
>      [1] 0x14b18285cc94:     libopen-pal.so.40                                            epoll.c:409
>      | [1] 0x14b1832250f7:             libc.so.6                                epoll_wait
>      |   [1] 0x14b183daeb20:       libpthread.so.0                                            sigaction.c
>      |     [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>      |       [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
>      [1] 0x14b18281c1fe:     libopen-pal.so.40                                            opal_progress_threads.c:104
>        [1] 0x14b18286953d:     libopen-pal.so.40                                            poll.c:167
>          [1] 0x14b183219a41:             libc.so.6                                    __poll
>            [1] 0x14b183daeb20:       libpthread.so.0                                            sigaction.c
>              [1] 0x00000055a144:  lbpm_color_simulator                                            <artificial>
>                [1] 0x00000055a103:  lbpm_color_simulator                                            <artificial>
> --------------------------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 5 with PID 846076 on node cn-02-52 exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
> 

Regarding Open MPI: we installed it from the OpenHPC package. The version is mpirun (Open MPI) 4.0.5, compiled with gcc-9.3.
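
The log above repeats the same abort once per MPI rank, which buries the one line that matters (`Variable FlowAdaptor was not found in database`). Below is a minimal Python sketch for collapsing such output into one line per distinct abort. It assumes the abort header format shown in this log; `summarize_aborts` and the sample path are hypothetical and not part of LBPM.

```python
import re
from collections import Counter

# Matches the abort header as it appears in the log above, followed by the
# message on the next line. Leading "> " quote markers are tolerated.
ABORT_RE = re.compile(
    r"Program abort called in file '(?P<file>[^']+)' at line (?P<line>\d+):\s*\n"
    r"[>\s]*(?P<message>[^\n]+)"
)

def summarize_aborts(log_text):
    """Count how many times each distinct (file, line, message) abort occurs."""
    counts = Counter()
    for m in ABORT_RE.finditer(log_text):
        key = (m.group("file"), int(m.group("line")), m.group("message").strip())
        counts[key] += 1
    return counts

# Illustrative input only: a shortened path and two repeated aborts.
sample = (
    "Program abort called in file '/path/to/Database.cpp' at line 83:\n"
    "   Variable FlowAdaptor was not found in database\n"
    "Bytes used = 1\n"
    "Program abort called in file '/path/to/Database.cpp' at line 83:\n"
    "   Variable FlowAdaptor was not found in database\n"
)

for (path, line, message), count in summarize_aborts(sample).items():
    print(f"{count}x {path}:{line}: {message}")
```

Run against the full mpirun output, this would likely reduce the repeated per-rank traces to a single entry pointing at the missing database variable.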

@JamesEMcClure
Collaborator

Add the following to your input database

FlowAdaptor {

}

@ahmedsrizk95
Author

Noted and worked. Thanks.
But why did this not happen before? I was previously able to run lbpm_color_simulator without this FlowAdaptor section.

@JamesEMcClure
Collaborator

We added new functionality that requires a new section in the input database. There are keys that can be populated within this section to control the behavior.

Basically the FlowAdaptor provides a way to dynamically adapt the fluid geometry so that you can more easily change the fluid saturation during steady-state flow.
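
Since a missing section only surfaces as an abort after the MPI job has started, a pre-flight check on the input file can catch it earlier. The sketch below is one way to do that in Python, assuming the input database uses the `Name { ... }` block syntax shown earlier in this thread. The helper names and the `OtherSection` placeholder are hypothetical. The parser is deliberately naive: it does not handle comments or braces inside quoted strings.

```python
import re

def top_level_sections(text):
    """Return the names of top-level `Name { ... }` blocks, in order."""
    names = []
    depth = 0
    # Tokens: an identifier directly followed by '{', a bare '{', or a '}'.
    for match in re.finditer(r"([A-Za-z_]\w*)\s*\{|\{|\}", text):
        if match.group(0) == "}":
            depth = max(depth - 1, 0)
        else:
            if depth == 0 and match.group(1):
                names.append(match.group(1))
            depth += 1
    return names

def missing_sections(text, required):
    """Return the required section names that are absent at the top level."""
    present = set(top_level_sections(text))
    return [name for name in required if name not in present]

# Illustrative input only: `OtherSection` is a placeholder, not a real LBPM key.
example = """
OtherSection {
    key = 1
}
"""

missing = missing_sections(example, ["FlowAdaptor"])
if missing:
    print("Input database is missing sections:", ", ".join(missing))
```

Only top-level blocks are counted, so a `FlowAdaptor` block nested inside another section would still be reported as missing.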

@ahmedsrizk95
Author

Thank you very much. The issue is now solved, so I will close it. Thank you again.
