OpenMPI error in FunctionDT #9498
Comments
Have you run your input file through a debugger? It should be pretty easy to narrow this down. Also run in serial when you do that, and you should get a line number. At the very least, an input file that we can use to reproduce the problem would be helpful.
Below is an input file that fails, at least on my workstation. I tried different --n-threads values and got the same error each time. Don't get me wrong: I know there is an error in the input file and that MOOSE should stop. It would just be nicer if the error were caught by MOOSE's error system instead of MPI ;) `[Mesh] [Functions] [./forcing_fn] [Variables] [ICs] [Kernels] [./diff] [./ffn] [BCs] [Executioner] start_time = 0 [Outputs]`
Well, maybe... For one-time checks and simple input validation errors, yes, you should get a sensible error message. However, we also tend to use assertions when a check has to happen in a hot spot, to avoid the expense of branching. We remove those assertions in the optimized binary for maximum speed, so we have a mix. We always recommend that developers check their inputs in debug mode first, to see whether we at least catch the problem with an assert. It does sound like this case should be looked at. Thanks for the input file.
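A minimal C++ sketch of the two kinds of checks described above; this is not MOOSE code, and `hotLookup` and `validateSameLength` are hypothetical names. The standard `assert` is compiled out when `NDEBUG` is defined, which is how optimized builds usually drop such checks (MOOSE's `mooseAssert` is similar in spirit), while the one-time validation is an ordinary branch that stays in every build.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hot-path accessor: the bounds check is a plain assert, so it vanishes when
// the code is compiled with NDEBUG defined (typical for optimized builds).
// With the assert gone, an out-of-range index is undefined behavior.
double hotLookup(const std::vector<double> & values, std::size_t i)
{
  assert(i < values.size() && "index out of range");
  return values[i];
}

// One-time input validation: an ordinary branch, compiled into every build,
// so the user always gets a readable message instead of a crash.
void validateSameLength(const std::vector<double> & a, const std::vector<double> & b)
{
  if (a.size() != b.size())
    throw std::invalid_argument("inputs must have the same length (" + std::to_string(a.size()) +
                                " vs " + std::to_string(b.size()) + ")");
}
```

Calling `validateSameLength({0.0, 1.0, 2.0}, {0.1, 0.2})` throws with a message naming both sizes, whatever the build type; `hotLookup` only catches a bad index in debug builds.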
I'm not sure why this is contentious. This class is definitely missing an up-front input validation step. Right in the constructor it should check that time_t and time_dt have the same length and call mooseError() if they don't. I'm honestly really surprised that it doesn't...
This is a real issue.
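A hedged sketch of the constructor check suggested above, in plain C++ so it compiles on its own. `FunctionDTSketch` is a hypothetical stand-in, not MOOSE's FunctionDT class, and throwing `std::invalid_argument` stands in for the `mooseError()` call a real MOOSE object would make.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a FunctionDT-like time stepper (not MOOSE's class).
class FunctionDTSketch
{
public:
  FunctionDTSketch(std::vector<double> times, std::vector<double> dts)
    : _time_t(std::move(times)), _time_dt(std::move(dts))
  {
    // Validate once, up front, before any lookup can index past the end of
    // the shorter vector. In MOOSE this would be a mooseError() call; the
    // exception here only stands in for it.
    if (_time_t.size() != _time_dt.size())
      throw std::invalid_argument("time_t and time_dt must have the same length (time_t has " +
                                  std::to_string(_time_t.size()) + " entries, time_dt has " +
                                  std::to_string(_time_dt.size()) + ")");
  }

  std::size_t size() const { return _time_t.size(); }

private:
  std::vector<double> _time_t;  // times at which dt values are given
  std::vector<double> _time_dt; // dt value paired with each entry of _time_t
};
```

Because the check runs once at construction, it adds nothing to the cost of each time step, which fits the one-time-check case rather than the hot-spot case.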
In reply to Cody Permann (Fri, Jul 14, 2017, 10:35 AM).
To be clear: this has nothing to do with MPI. You'll just end up with a segfault if time_t and time_dt don't have the same size. When that happens you may get MPI errors spewed to your screen because a process died unexpectedly... but it has nothing to do with MPI itself.
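To illustrate why a size mismatch crashes instead of producing a genuine communication error, here is a sketch that assumes the stepper linearly interpolates dt between (time_t[i], time_dt[i]) pairs; that interpolation scheme and the name `interpolateDt` are assumptions for illustration, not taken from FunctionDT's source. Reading the shorter vector past its end with `operator[]` is undefined behavior (often a segfault, which the MPI launcher may then report as an abort); the bounds-checked `.at()` used here turns it into a `std::out_of_range` exception instead.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical piecewise-linear dt(t) lookup over (times[i], dts[i]) pairs.
// Assumes times is strictly increasing; values are clamped outside its range.
// The loop walks the indices of `times` but reads `dts` at the same index,
// so a shorter `dts` is read out of bounds. .at() makes that throw
// std::out_of_range; dts[i] would be undefined behavior instead.
double interpolateDt(const std::vector<double> & times, const std::vector<double> & dts, double t)
{
  if (times.empty() || dts.empty())
    throw std::invalid_argument("times and dts must be non-empty");
  if (t <= times.front())
    return dts.at(0);
  for (std::size_t i = 1; i < times.size(); ++i)
  {
    if (t <= times[i])
    {
      // Linear weight of t between times[i - 1] and times[i].
      const double w = (t - times[i - 1]) / (times[i] - times[i - 1]);
      return dts.at(i - 1) + w * (dts.at(i) - dts.at(i - 1));
    }
  }
  return dts.at(times.size() - 1);
}
```

With `times = {0, 10, 20}` and `dts = {1, 2, 4}`, `t = 15` gives 3; with `dts = {1, 2}`, the same call throws rather than reading past the end of the buffer.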
In reply to Derek Gaston (Fri, Jul 14, 2017, 12:14 PM).
Description of the enhancement or error report
Creating a FunctionDT TimeStepper with time_t and time_dt inputs of different sizes causes an unexpected MPI communication error such as:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
There should be a check in the constructor that time_t and time_dt have the same size (or wherever the bug occurs; I am not sure where).
Rationale for the enhancement or information for reproducing the error
This would give the user a clearer (and less scary) explanation of why their input file causes an error, and of what to change in it.
Identified impact
Changes to the FunctionDT TimeStepper or one of its attributes (I am not sure where the problem happens).