Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Security Issue with MPI_Info_get by Introducing New Replacement Function MPI_Info_get_string #146

Closed
regrant opened this issue Aug 21, 2019 · 22 comments
Assignees
Labels
had reading Completed the formal proposal reading no-wg Discussion doesn't have a current working group passed final vote Passed the final formal vote passed first vote Passed the first formal vote scheduled reading Reading is scheduled for the next meeting

Comments

@regrant
Copy link
Member

regrant commented Aug 21, 2019

Problem

See #133 for description of the problem. This is an alternative solution to the existing ticket #133

Proposal

Instead of two functions to replace the existing ones, this solution has a function that can be used to both query the required length of a string buffer and fetch a info value string into a given buffer.

Changes to the Text

Chapter 9: Added new function MPI_Info_get_string

Impact on Implementations

Implementations will have to provide one new function that has the functionality of the existing MPI_Info_get and MPI_Info_get_valuelen functions with the exception that they must account for the null terminator in C and always include one, even in truncated string cases.

Impact on Users

Users should modify existing code to use the new function.

References

references to secure coding practices can be found here: #133

Pull request:
https://github.com/mpi-forum/mpi-standard/pull/129

Minor changes since Zurich are in commit:
https://github.com/mpi-forum/mpi-standard/pull/129/commits/54cdbf011049e469467cd0343ee5a8bc53c3a488

PDFs of major changes:
misc-2_safe_get_changes.pdf

@regrant regrant self-assigned this Aug 21, 2019
@regrant regrant added mpi-4.0 no-wg Discussion doesn't have a current working group labels Aug 21, 2019
@regrant regrant added this to the 2019-09 Zurich, Switzerland milestone Aug 21, 2019
@regrant regrant added the scheduled reading Reading is scheduled for the next meeting label Aug 28, 2019
@wesbland wesbland added the had reading Completed the formal proposal reading label Dec 2, 2019
@jsquyres
Copy link
Member

jsquyres commented Dec 8, 2019

Given that there was so much discussion on #133, why not change the paradigm: just have MPI effectively return a strdup()? MPI can guarantee to make enough space for the entire string (and, if necessary, any required null terminator). The caller is responsible for freeing the string. This seems much simpler.

Yes, it's "new" in that MPI will be allocating memory and giving it back to the user. Cope.

I ask this question because given how much controversy exists here, and how much time we're spending on a relatively minor (albeit important) issue, perhaps we should break the sacred rule of not having MPI allocate memory (that the application must ultimately free) and do a simpler thing, even though we have shied away from it in the past.

@raffenet
Copy link

raffenet commented Dec 8, 2019

Maybe a dumb question since I missed the last face-to-face, but has getenv-like behavior also been considered?

@omor1
Copy link
Member

omor1 commented Dec 8, 2019

@jsquyres would this memory be allocated by C memory allocation functions (malloc and friends)? By MPI_ALLOC_MEM? What is the behavior in Fortran? I'm not opposed to have MPI allocate it internally and return it to the user, but specifying the behavior more concretely is probably a good idea, particularly with regards to the Fortran question.

@dholmes-epcc-ed-ac-uk
Copy link
Member

@omor1's question can be extended by pointing out that the C interfaces for MPI are used from many other languages to access MPI functionality. Dealing with C allocated memory in C# and Java (for example) imposes additional unwelcome burdens on the programmer.

Also, if MPI allocates it - who frees it and how? We would need MPI_INFO_FREE_STRING for this purpose.

@jeffhammond
Copy link
Member

jeffhammond commented Dec 8, 2019 via email

@jeffhammond
Copy link
Member

jeffhammond commented Dec 8, 2019 via email

@jsquyres
Copy link
Member

jsquyres commented Dec 9, 2019

It's a string. In the control path (not the fast path). This is info, for pete's sake. Not performance-critical message passing.

You can keep arguing over the nit-picky ways to handle the horribleness of strings in C (guaranteeing NULL termination, +/- 1 characters, how users will/will not screw it up, ...etc.), or you could just make the darn thing like strdup(): the user has to free() it (forget MPI_ALLOC_FREE here).

Or you could make it point to internal MPI implication memory (that doesn't have to be freed, but the user had better not modify it).

Just make it simple. We've seen that making it complicated leads not only to wailing and gnashing of teeth, but also to real world bugs.

@dholmes-epcc-ed-ac-uk
Copy link
Member

@jsquyres
Copy link
Member

jsquyres commented Dec 9, 2019

@dholmes-epcc-ed-ac-uk I hear what you're saying, but I also say: when MPI supports C#, we can have this conversation again.

At that time, I foresee that the answer will be: for C#, the binding for MPI_INFO_GET should do something different that the C binding (just like the Fortran binding will do something different than the C binding). 😉

@hjelmn
Copy link

hjelmn commented Dec 9, 2019

I agree with @jsquyres here. Users calling from other languages have to deal with any memory allocated by MPI. If they call MPI_Alloc_mem then they have to call MPI_Free_mem. If they call the shiny new info functions they have to call free. They can just deal with it. It is a C library afterall.

@omor1
Copy link
Member

omor1 commented Dec 9, 2019

I like the second solution proposed by @jsquyres. I don't think that there's necessarily a huge number of use cases where the user would want to modify the returned string; and if they do, they can make a copy using standard techniques in the language. This avoids the complexity of requiring the user to manage the memory.

So the C binding would look something like:

int MPI_Info_get_value(MPI_Info info, const char *key, int *valuelen, const char **value, int *flag);

This avoids an extra call to MPI_Info_get_valuelen; the valuelen parameter can be omitted otherwise. (I thought that doing this might also avoid potential TOCTOU issues, but on further examination I see it doesn't. We might need to include a notice that the value associated with key cannot be replaced nor deleted and that the info object cannot be freed until the user is done with value.)

@jsquyres
Copy link
Member

jsquyres commented Dec 9, 2019

This avoids an extra call to MPI_Info_get_valuelen; the valuelen parameter can be omitted otherwise.

Let me repeat: this is info. Who cares about an extra string copy?

(I thought that doing this might also avoid potential TOCTOU issues, but on further examination I see it doesn't. We might need to include a notice that the value associated with key cannot be replaced nor deleted and that the info object cannot be freed until the user is done with value.)

That seems... awkward. You're fixing one problem and creating a whole new set of potential problems.

@omor1
Copy link
Member

omor1 commented Dec 9, 2019

Let me repeat: this is info. Who cares about an extra string copy?

I agree with this sentiment, I just thought that this alternate path might reduce the controversy regarding language/binding interoperability.

That seems... awkward. You're fixing one problem and creating a whole new set of potential problems.

This already exists, though to a lesser degree. If the value associated with a key is replaced with a value of different size in between calls to MPI_INFO_GET_VALUELEN and MPI_INFO_GET, a number of problems result. If the new value is longer, the returned value is truncated; whether this is a problem depends on what exactly the user is doing with the value. If the new value is shorter, the problem might be much worse, as it is unspecified whether MPI will write a null terminator; in the worst case, the user may be unable to determine where the string ends.

MPI allocating the string and returning the size in the same call solves both problems, but then you get the current controversy...

@RolfRabenseifner
Copy link

RolfRabenseifner commented Dec 9, 2019 via email

@jsquyres
Copy link
Member

jsquyres commented Dec 9, 2019

I do not beileve that the bindings need to do exactly the same things in the different languages. Different languages have different semantics -- particularly with respect to strings. Meaning: let the bindings do language-natural things with strings. E.g., do managed memory with C#. Don't allocate memory in Fortran. Allocate member in C. ...etc.

Also: returning a pointer to internal storage seems problematic -- it has thread safety / TOCTOU problems. E.g., imagine one thread changing the value which another thread -- outside of the MPI API -- is reading the value of the string (because it has a pointer to the internal string).

I.e., returning a pointer to internal storage gives up any ability to enforce atomic access to the string.

@omor1
Copy link
Member

omor1 commented Dec 9, 2019

Are there any cases in the current spec where the C and Fortran bindings do different things?

@jsquyres
Copy link
Member

jsquyres commented Dec 9, 2019

@omor1 Yes. MPI_SIZEOF is one example.

@omor1
Copy link
Member

omor1 commented Dec 9, 2019

@jsquyres that particular function doesn't have a C binding. Similarly, neither does MPI_F_SYNC_REG. I'd argue that a lack of binding is different than the binding doing different things. Of course, a potential solution to Fortran allocation problems is to simply not have a Fortran binding for the proposed MPI_INFO_GET_STRING—after all, the problem with MPI_INFO_GET is a problem only in C.

@jsquyres
Copy link
Member

Right -- you asked if we have functions that have differences between C and Fortran. I cited one, you cited some more. There's also MPI_INIT. ...and others.

@jsquyres
Copy link
Member

jsquyres commented Dec 11, 2019

We talked about this in the Forum today. A key constraint here is to get this into MPI-4, and the current text is the minimum distance fix for this issue. So for radically different ideas -- e.g., strdup()-like semantics -- there just isn't enough time. So let's table all that discussion and go with the current proposal. Let's take an earmark to re-visit all string handling for MPI-NEXT.

@wesbland
Copy link
Member

This passed the no-no and first vote in Albuquerque, New Mexico on 2019-12-12.

@wesbland wesbland added the passed first vote Passed the first formal vote label Dec 12, 2019
@wesbland wesbland removed this from the 2019-09 Zurich, Switzerland milestone Dec 12, 2019
@wesbland wesbland added this to the 2019-12 Albuquerque, NM, USA milestone Dec 12, 2019
@wesbland wesbland added the passed final vote Passed the final formal vote label Feb 21, 2020
@wesbland
Copy link
Member

This proposal passed a second and final vote at the February 2020 meeting:

Yes - 23
No - 0
Abstain - 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
had reading Completed the formal proposal reading no-wg Discussion doesn't have a current working group passed final vote Passed the final formal vote passed first vote Passed the first formal vote scheduled reading Reading is scheduled for the next meeting
Projects
None yet
Development

No branches or pull requests

9 participants