-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Security Issue with MPI_Info_get by Introducing New Replacement Function MPI_Info_get_string #146
Comments
Given that there was so much discussion on #133, why not change the paradigm: just have MPI effectively return a Yes, it's "new" in that MPI will be allocating memory and giving it back to the user. Cope. I ask this question because given how much controversy exists here, and how much time we're spending on a relatively minor (albeit important) issue, perhaps we should break the sacred rule of not having MPI allocate memory (that the application must ultimately free) and do a simpler thing, even though we have shied away from it in the past. |
Maybe a dumb question since I missed the last face-to-face, but has |
@jsquyres would this memory be allocated by C memory allocation functions ( |
@omor1's question can be extended by pointing out that the C interfaces for MPI are used from many other languages to access MPI functionality. Dealing with C allocated memory in C# and Java (for example) imposes additional unwelcome burdens on the programmer. Also, if MPI allocates it - who frees it and how? We would need MPI_INFO_FREE_STRING for this purpose. |
@jsquyres <https://github.com/jsquyres> would this memory be allocated by
C memory allocation functions (malloc and friends)? By MPI_ALLOC_MEM?
What is the behavior in Fortran? I'm not opposed to have MPI allocate it
internally and return it to the user, but specifying the behavior more
concretely is probably a good idea, particularly with regards to the
Fortran question.
Fortran doesn’t need this function or any other function attempting to
solve the top-level problem because it doesn’t have the clown car
implementation of strings that C does.
|
Given that there was so much discussion on #133
<#133>, why not change the
paradigm: just have MPI effectively return a strdup()? MPI can guarantee
to make enough space for the entire string (and, if necessary, any required
null terminator). The caller is responsible for freeing the string. This
seems much simpler.
Yes, it's "new" in that MPI will be allocating memory and giving it back
to the user. Cope.
No. This is awful.
I ask this question because given how much controversy exists here, and
how much time we're spending on a relatively minor (albeit important)
issue, perhaps we should break the sacred rule of not having MPI allocate
memory (that the application must ultimately free) and do a simpler thing,
even though we have shied away from it in the past.
For good reason. As noted by another, there is no safe way to use this
function with other language bindings, unless you specify that the memory
is freed with MPI_Free_mem, which is gross, because MPI_Alloc_mem might be
an expensive allocator because of network-awareness.
|
It's a string. In the control path (not the fast path). This is info, for pete's sake. Not performance-critical message passing. You can keep arguing over the nit-picky ways to handle the horribleness of strings in C (guaranteeing NULL termination, +/- 1 characters, how users will/will not screw it up, ...etc.), or you could just make the darn thing like Or you could make it point to internal MPI implication memory (that doesn't have to be freed, but the user had better not modify it). Just make it simple. We've seen that making it complicated leads not only to wailing and gnashing of teeth, but also to real world bugs. |
@dholmes-epcc-ed-ac-uk I hear what you're saying, but I also say: when MPI supports C#, we can have this conversation again. At that time, I foresee that the answer will be: for C#, the binding for MPI_INFO_GET should do something different that the C binding (just like the Fortran binding will do something different than the C binding). 😉 |
I agree with @jsquyres here. Users calling from other languages have to deal with any memory allocated by MPI. If they call MPI_Alloc_mem then they have to call MPI_Free_mem. If they call the shiny new info functions they have to call free. They can just deal with it. It is a C library afterall. |
I like the second solution proposed by @jsquyres. I don't think that there's necessarily a huge number of use cases where the user would want to modify the returned string; and if they do, they can make a copy using standard techniques in the language. This avoids the complexity of requiring the user to manage the memory. So the C binding would look something like: int MPI_Info_get_value(MPI_Info info, const char *key, int *valuelen, const char **value, int *flag); This avoids an extra call to |
Let me repeat: this is info. Who cares about an extra string copy?
That seems... awkward. You're fixing one problem and creating a whole new set of potential problems. |
I agree with this sentiment, I just thought that this alternate path might reduce the controversy regarding language/binding interoperability.
This already exists, though to a lesser degree. If the value associated with a key is replaced with a value of different size in between calls to MPI_INFO_GET_VALUELEN and MPI_INFO_GET, a number of problems result. If the new value is longer, the returned value is truncated; whether this is a problem depends on what exactly the user is doing with the value. If the new value is shorter, the problem might be much worse, as it is unspecified whether MPI will write a null terminator; in the worst case, the user may be unable to determine where the string ends. MPI allocating the string and returning the size in the same call solves both problems, but then you get the current controversy... |
And how to do it in Fortran.
Allocated arrays through C-Pointers are a real burden in Fortran.
Therefore up to now, we restricted such ALLOCATE Interfaces for important
cases, like MPI_ALLOC_MEM, MPI_WIN_ALLOCATE, MPI_WIN_ALLOCATE_SHARED.
|
I do not beileve that the bindings need to do exactly the same things in the different languages. Different languages have different semantics -- particularly with respect to strings. Meaning: let the bindings do language-natural things with strings. E.g., do managed memory with C#. Don't allocate memory in Fortran. Allocate member in C. ...etc. Also: returning a pointer to internal storage seems problematic -- it has thread safety / TOCTOU problems. E.g., imagine one thread changing the value which another thread -- outside of the MPI API -- is reading the value of the string (because it has a pointer to the internal string). I.e., returning a pointer to internal storage gives up any ability to enforce atomic access to the string. |
Are there any cases in the current spec where the C and Fortran bindings do different things? |
@omor1 Yes. MPI_SIZEOF is one example. |
@jsquyres that particular function doesn't have a C binding. Similarly, neither does MPI_F_SYNC_REG. I'd argue that a lack of binding is different than the binding doing different things. Of course, a potential solution to Fortran allocation problems is to simply not have a Fortran binding for the proposed MPI_INFO_GET_STRING—after all, the problem with MPI_INFO_GET is a problem only in C. |
Right -- you asked if we have functions that have differences between C and Fortran. I cited one, you cited some more. There's also MPI_INIT. ...and others. |
We talked about this in the Forum today. A key constraint here is to get this into MPI-4, and the current text is the minimum distance fix for this issue. So for radically different ideas -- e.g., strdup()-like semantics -- there just isn't enough time. So let's table all that discussion and go with the current proposal. Let's take an earmark to re-visit all string handling for MPI-NEXT. |
This passed the no-no and first vote in Albuquerque, New Mexico on 2019-12-12. |
This proposal passed a second and final vote at the February 2020 meeting: Yes - 23 |
Problem
See #133 for description of the problem. This is an alternative solution to the existing ticket #133
Proposal
Instead of two functions to replace the existing ones, this solution has a function that can be used to both query the required length of a string buffer and fetch a info value string into a given buffer.
Changes to the Text
Chapter 9: Added new function MPI_Info_get_string
Impact on Implementations
Implementations will have to provide one new function that has the functionality of the existing MPI_Info_get and MPI_Info_get_valuelen functions with the exception that they must account for the null terminator in C and always include one, even in truncated string cases.
Impact on Users
Users should modify existing code to use the new function.
References
references to secure coding practices can be found here: #133
Pull request:
https://github.com/mpi-forum/mpi-standard/pull/129
Minor changes since Zurich are in commit:
https://github.com/mpi-forum/mpi-standard/pull/129/commits/54cdbf011049e469467cd0343ee5a8bc53c3a488
PDFs of major changes:
misc-2_safe_get_changes.pdf
The text was updated successfully, but these errors were encountered: