Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAdd a way to return allocated strings to C and then free them later #25777
Conversation
rust-highfive
assigned
huonw
May 25, 2015
This comment has been minimized.
This comment has been minimized.
|
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override) |
bluss
reviewed
May 25, 2015
| let len_with_nul = len as usize + 1; | ||
| let vec = Vec::from_raw_parts(ptr as *mut u8, len_with_nul, len_with_nul); | ||
| CString::from_vec_unchecked(vec) | ||
| } |
This comment has been minimized.
This comment has been minimized.
bluss
May 25, 2015
Contributor
This function needsa warning itself that it only handles precisely the pointers returned from into_raw_ptr. Or otherwise explain the preconditions.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
One question is whether this should be included in the higher-level documentation, as an example of how to return allocated strings via FFI. |
This comment has been minimized.
This comment has been minimized.
|
Also note I've left all stability attributes off these methods, as I'm not sure what is most appropriate. |
alexcrichton
added
the
T-libs
label
May 26, 2015
This comment has been minimized.
This comment has been minimized.
|
Interesting! I agree with @bluss that this needs some documentation bolstering to outline some of the key aspects of this API:
In terms of naming this may want to consider either Stability-wise these just need explicit At a higher level, I'm curious what you were doing to want to do this kind of operation? I would not expect this to be a common operation at the FFI boundary (due to the need to free the resources at some point). |
This comment has been minimized.
This comment has been minimized.
I've not done anything specific myself, but I've helped various people want to do something similar. It's a straightforward next step after an FFI example that adds two numbers; one that concats two strings.
Perhaps I'm missing something, but how would you foresee any library that does heavy string processing to be called? One of the concepts that I'm excited most for Rust is writing a core library in Rust and then exposing C bindings. We can then use that library in other languages (Ruby is of particular interest to me). I feel like
|
This comment has been minimized.
This comment has been minimized.
FWIW, I see |
This comment has been minimized.
This comment has been minimized.
I suppose it definitely depends on the exact library in play, but I would expect some other owned
Yeah I do agree that this seems like a good tool to have in the toolbox when dealing with FFI, I'm just somewhat hesitant because memory management is often quite difficult when it comes to an FFI boundary and I don't think this will necessarily open up the door to easy management of strings per se. For example in the question you linked to the returned string would need to be deallocated at some point (assuming the incoming string is changed to Not that any of that should stop this at the gates, just something to keep in mind! |
This comment has been minimized.
This comment has been minimized.
But you'd have the same issue if you return the |
shepmaster
force-pushed the
shepmaster:cstring-return-to-c
branch
from
ff86ab6
to
12151bc
May 26, 2015
This comment has been minimized.
This comment has been minimized.
|
@bluss @alexcrichton I've updated the docs a smidge and added the stability attributes. |
This comment has been minimized.
This comment has been minimized.
If the caller were to free the strings, wouldn't they want to use |
This comment has been minimized.
This comment has been minimized.
Generally, no. Many larger C libraries have their own allocators, and you need to use them to prevent passing invalid pointers between allocators. GLib and libxml2 are two examples that come to mind. You can potentially compile these libraries with alternate allocation functions — perhaps you have your own allocator that does extra profiling for example.
I don't know how, as I haven't seen anything parameterized by allocator yet. However, were this to ever exist, it could be neat to use |
This comment has been minimized.
This comment has been minimized.
|
@shepmaster, @gkoz yeah I agree that we want to not recommend using I think this all looks good to me, thanks @shepmaster! I realize now though that the sibling function of cc @rust-lang/libs, with the renaming of |
This comment has been minimized.
This comment has been minimized.
netvl
reviewed
May 28, 2015
| // determinable from the string length, and shrinking to fit | ||
| // is the only way to be sure. | ||
| let mut vec = self.inner; | ||
| vec.shrink_to_fit(); |
This comment has been minimized.
This comment has been minimized.
netvl
May 28, 2015
Contributor
I thought that shrink_to_fit does not guarantee that the capacity will be exactly equal to length. Isn't this important here?
This comment has been minimized.
This comment has been minimized.
bluss
May 28, 2015
Contributor
The implementation in Vec guarantees that vec.capacity() == vec.len(), so the vector has no more information to than ptr + length either at that point. Maybe the docs should update to guarantee that(?).
This comment has been minimized.
This comment has been minimized.
shepmaster
May 28, 2015
Author
Member
For reference, Vec::shrink_to_fit says:
Shrinks the capacity of the vector as much as possible.
It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.
This comment has been minimized.
This comment has been minimized.
netvl
May 28, 2015
Contributor
Here's a quote from the docs:
It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.
Maybe it should state more clearly what actually happens there or removed altogether if it does not hold anymore. Also vec.capacity() == vec.len() does not mean that the actually allocated memory will be equal to vec.len(), does it?
This comment has been minimized.
This comment has been minimized.
bluss
May 28, 2015
Contributor
It doesn't. The vector needs the value capacity as a cookie to give to the allocator when freeing the memory. If len == capacity is fine in the vector internally, that's going be a compatible cookie value to use.
This comment has been minimized.
This comment has been minimized.
shepmaster
May 28, 2015
Author
Member
could this actually switch to using into_boxed_slice actually
@alexcrichton that was originally my first attempt, but I would have needed to change CString, as you noticed. I can give that refactoring a shot to start with, and then add this commit on top of that.
This comment has been minimized.
This comment has been minimized.
shepmaster
May 28, 2015
Author
Member
as we already depend on shrink_to_fit to actually "shrink to fit"
Should the docs for this method be updated? Or maybe some non-doc-comments added that denote that it's being used to truly shrink-to-fit?
This comment has been minimized.
This comment has been minimized.
alexcrichton
May 28, 2015
Member
I'd leave the shrink_to_fit docs as-is for now, but layering on a second commit sounds good to me!
This comment has been minimized.
This comment has been minimized.
Gankro
May 28, 2015
Contributor
The note on allocators is, to my knowledge (I'm pretty sure I wrote that doc comment), a reservation for future behaviour that is not implemented.
This comment has been minimized.
This comment has been minimized.
Gankro
May 28, 2015
Contributor
In particular it might happen once we get local allocators which naturally might favour a more coarse-grain than full-blown jemalloc.
This comment has been minimized.
This comment has been minimized.
|
cc me (i'm wondering about potential overlap with allocator API's that should be coming in the future). |
This comment has been minimized.
This comment has been minimized.
@alexcrichton I actually deliberately chose to not call it that, because |
This comment has been minimized.
This comment has been minimized.
|
r? @alexcrichton (Transferring reviewership, I haven't been keeping track at all...) |
rust-highfive
assigned
alexcrichton
and unassigned
huonw
Jun 1, 2015
This comment has been minimized.
This comment has been minimized.
|
@shepmaster ah yes indeed! I think I would personally err on the side of |
alexcrichton
added
the
I-needs-decision
label
Jun 2, 2015
shepmaster
force-pushed the
shepmaster:cstring-return-to-c
branch
from
12151bc
to
a425a42
Jun 3, 2015
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton OK, updated with the change to |
alexcrichton
reviewed
Jun 4, 2015
| pub unsafe fn from_ptr(ptr: *const libc::c_char) -> CString { | ||
| let len = libc::strlen(ptr); | ||
| let len_with_nul = len as usize + 1; | ||
| let vec = Vec::from_raw_parts(ptr as *mut u8, len_with_nul, len_with_nul); |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jun 4, 2015
Member
Instead of going through Vec, I think it may be best to go directly to a Box<[u8]> because it's where the pointer originally came from, for example:
let len = libc:;strlen(ptr) + 1;
let slice = slice::from_raw_parts(ptr, len as usize);
CString { inner: mem::transmute(slice) }Going through Vec and then possibly hitting shrink_to_fit may be adding a bit more indirection than necessary (and may also be somewhat less robust)
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jun 4, 2015
Member
Ah as one final note, could you add to the docs that this method will calculate the length of the string specified by ptr as well?
This comment has been minimized.
This comment has been minimized.
shepmaster
Jun 6, 2015
Author
Member
@alexcrichton both concerns addressed! Let me know if the added doc line conveys what you intended.
This comment has been minimized.
This comment has been minimized.
|
Looks great to me, thanks @shepmaster! |
shepmaster
force-pushed the
shepmaster:cstring-return-to-c
branch
from
a425a42
to
e20a6db
Jun 6, 2015
This comment has been minimized.
This comment has been minimized.
alexcrichton
removed
the
I-needs-decision
label
Jun 10, 2015
bors
added a commit
that referenced
this pull request
Jun 10, 2015
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
bors
merged commit e20a6db
into
rust-lang:master
Jun 11, 2015
This comment has been minimized.
This comment has been minimized.
|
Isn't there a naming inconsistency now between slice and Box that provide "from/into_raw" and CString with "from/into_ptr"? |
shepmaster
deleted the
shepmaster:cstring-return-to-c
branch
Jun 11, 2015
This comment has been minimized.
This comment has been minimized.
Kind of, but there was already one with |
This comment has been minimized.
This comment has been minimized.
|
@gkoz not that it can't be changed of course, but at least we thought about it some |
This comment has been minimized.
This comment has been minimized.
|
@shepmaster I'd ask @alexcrichton to take another look at this. The Maybe |
This comment has been minimized.
This comment has been minimized.
|
@gkoz thanks for pointing this out, I hadn't considered the |
This comment has been minimized.
This comment has been minimized.
|
For what it's worth, I think |
This comment has been minimized.
This comment has been minimized.
|
Apparently some APIs do want to When writing a Zabbix loadable module:
Corroborated by https://github.com/zabbix/zabbix/blob/trunk/include/module.h#L109-114 /* NOTE: always allocate new memory for val! DON'T USE STATIC OR STACK MEMORY!!! */
#define SET_STR_RESULT(res, val) \
( \
(res)->type |= AR_STRING, \
(res)->str = (char *)(val) \
)So there might be some demand for a I wonder how the RFC: Allow changing the default allocator is going to impact this. Looks like |
shepmaster commentedMay 25, 2015
As far as I was able to determine, it's currently impossible to allocate a C NUL-terminated string in Rust and then return it to C (transferring ownership), without leaking memory. There is support for passing the string to C (borrowing).
To complicate matters, it's not possible for the C code to just call
freeon the allocated string, due to the different allocators in use.CStringhas no way to recreate itself from a pointer. This commit adds one. This is complicated a bit because RustVecs want the pointer, size, and capacity.To deal with that, another method to shrink and "leak" the
CStringto achar *is also provided.We can then use
strlento determine the length of the string, which must match the capacity.TODO
Box<[u8]>Example code
With this example code:
Compiled using
rustc --crate-type dylib str.rs, I was able to link against it from C (gcc -L. -l str str.c -o str):As well as dynamically link via Ruby: