Adds intern and an intern_static! macro #255
rust_src/src/symbols.rs
Outdated
/// Intern (e.g. create a symbol for) a `&'static str`, avoiding
/// allocation. This macro is inherently unsafe: if the string contains internal
/// NULL bytes (`\0`), we will violate memory safety.
Why does it violate memory safety? It would just use the shorter string since you're using `strlen` to determine the length. (And all you're passing is a pointer to static memory, which can't leak.)
My comment here comes from my understanding that handling strings with internal NULL bytes is undefined behaviour in C, and from the "unsafe" denotation on the equivalent std method. You might be right, though.
Let's assume intern_1 didn't take the length; then as far as the C API is concerned the string would end at the first null byte. That's not undefined behavior, but of course not what you intended either.
The method on CString is unsafe because it maintains the invariant that its string has the same length for Rust and C consumers, so not checking must be an unsafe function.
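To make the two points above concrete, here is a minimal sketch using only the standard library. The helper names `c_visible_len` and `cstring_accepts` are made up for illustration and are not part of this PR. The first shows that a C-style scan simply stops at the first NUL, and the second shows that the checked `CString::new` constructor rejects interior NULs to keep the Rust and C lengths in agreement.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: the length a C consumer would see for `bytes`,
/// i.e. the number of bytes before the first NUL.
fn c_visible_len(bytes: &[u8]) -> usize {
    assert!(bytes.contains(&0), "buffer must contain a NUL terminator");
    // SAFETY: the buffer contains a NUL, so the scan stays within `bytes`.
    let c_view = unsafe { CStr::from_ptr(bytes.as_ptr() as *const c_char) };
    c_view.to_bytes().len()
}

/// Hypothetical helper: whether the checked `CString::new` accepts `bytes`.
/// It returns an error if `bytes` contains a NUL anywhere.
fn cstring_accepts(bytes: &[u8]) -> bool {
    CString::new(bytes).is_ok()
}
```

With these, `c_visible_len(b"my-\0dangerous-\0string\0")` sees only the three bytes of `my-`: a truncated string, but not an out-of-bounds read.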
rust_src/src/symbols.rs
Outdated
/// NULL bytes (`\0`), we will violate memory safety.
macro_rules! intern_static {
($s:expr) => ({
let __intern_s = concat!($s, "\0") as *const str as *const [::libc::c_char] as *const ::libc::c_char;
This should be concat!(...).as_ptr() as *const libc::c_char
Right.
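A minimal sketch of the suggested simplification, assuming `std::os::raw::c_char` in place of `libc::c_char` so the example needs no external crate. The names `static_c_str!` and `read_back` are hypothetical and exist only for this illustration.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical macro: append a NUL at compile time and take a pointer to the
/// resulting string constant. No heap allocation is involved.
macro_rules! static_c_str {
    ($s:expr) => {
        concat!($s, "\0").as_ptr() as *const ::std::os::raw::c_char
    };
}

/// Hypothetical helper: read the pointer back the way a C consumer would.
///
/// # Safety
/// `ptr` must point to a NUL-terminated string in static memory.
unsafe fn read_back(ptr: *const c_char) -> &'static str {
    unsafe { CStr::from_ptr(ptr) }
        .to_str()
        .expect("string literal is valid UTF-8")
}
```

For example, `unsafe { read_back(static_c_str!("foo")) }` yields `"foo"`.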
I'm a newbie in Rust, but can it be checked at compile time inside the macro?
Actually, since
rust_src/src/symbols.rs
Outdated
/// NULL bytes (`\0`), we will violate memory safety.
macro_rules! intern_static {
($s:expr) => ({
let __intern_s = concat!($s, "\0") as *const str as *const [::libc::c_char] as *const ::libc::c_char;
By the way, Rust has hygienic macros. No longer do we have to name our variables in macros horrible underscore salads (unless you like the convention and did it for a reason besides to avoid shadowing at macro call sites).
I did it for convention, although it has been a while since I've looked at the source code for the standard library macros. Happy to change it, though.
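As a small illustration of the hygiene point, here is a sketch with a hypothetical `double!` macro (not from this PR). Its internal binding uses a plain name, yet it does not capture or shadow the caller's variable of the same name.

```rust
/// Hypothetical macro: binds its argument to a local called `tmp`.
macro_rules! double {
    ($e:expr) => {{
        // Hygiene keeps this `tmp` distinct from any `tmp` at the call site.
        let tmp = $e;
        tmp * 2
    }};
}

fn demo() -> (i32, i32) {
    let tmp = 10;
    // `tmp + 1` refers to the caller's `tmp`, even inside the expansion.
    let doubled = double!(tmp + 1);
    (tmp, doubled)
}
```

Here `demo()` returns `(10, 22)`: the caller's `tmp` is untouched, and the macro doubles `11`.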
@birkenfeld Well, that seems obvious in retrospect. Don't know why I was overthinking it. I'll change it.
@s-kostyaev Using a procedural macro, yes. Using the basic
@atheriel Thanks for the reply. Maybe it would be better to use a procedural macro in this case and add this compile-time check? It could prevent a memory leak.
No leaks possible in this case.
There's no "passing a string". There's only passing a pointer to a string constant in static memory.
Static memory can't leak. Thanks for the explanation.
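On the compile-time check asked about above: on newer toolchains than this thread had access to (assuming a compiler with loops in `const fn` and `assert!` in const contexts), a check of this kind can be sketched without a procedural macro. The names `has_interior_nul` and `checked_static_c_str!` are hypothetical. Treat this as an illustration, not as what the PR implements.

```rust
/// Hypothetical const fn: true if `s` contains a NUL byte anywhere.
const fn has_interior_nul(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == 0 {
            return true;
        }
        i += 1;
    }
    false
}

/// Hypothetical macro: refuses to compile if the literal contains a NUL,
/// then yields a pointer to the NUL-terminated string constant.
macro_rules! checked_static_c_str {
    ($s:expr) => {{
        // Evaluated at compile time; a failing assertion is a build error.
        const _: () = assert!(
            !has_interior_nul($s),
            "string contains an interior NUL byte"
        );
        concat!($s, "\0").as_ptr() as *const ::std::os::raw::c_char
    }};
}
```

With this sketch, `checked_static_c_str!("my-\0dangerous")` should fail to build, while `checked_static_c_str!("foo")` behaves like the unchecked version.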
I should be able to rewrite this and rebase tonight. Thanks for all the comments!
@birkenfeld How does this look?
Can't
Signed-off-by: Aaron Jacobs <atheriel@gmail.com>
@birkenfeld Yep.
Ok, now it's just Travis complaining that the function is never used, which generates an error (@Wilfred how to proceed?). Otherwise looks fine!
Signed-off-by: Aaron Jacobs <atheriel@gmail.com>
@birkenfeld I've fixed the dead code issue. I also realized that this function can clearly be generic, to accommodate other string-like types.
Nice!
It's a common pattern in the C codebase to call `intern("some static c string")`. I thought that it would be nice to have a similar API available in Rust. Unfortunately, there's no way of doing this with the `std::ffi::CString` interface that avoids heap allocation and unnecessary copies. I've written a macro that avoids this by concatenating a null byte to the end of a string at compile time and casting to a `*const c_char`. This allows us to write

This macro is, and always will be, unsafe. If you pass it a string with internal NULL bytes, e.g. `intern("my-\0dangerous-\0string")`, then C will not correctly deduce the length of the string, and we'll leak memory. However, for symbols defined at compile time and clearly audited by the project, I don't think this is a problem.

One could also add a safe, non-static version of `intern` using the `std::ffi::CString` machinery.
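A rough sketch of what such a safe, generic version might look like. Everything here is an assumption for illustration: `stub_intern_1` is a stand-in for the C entry point (assumed to take a pointer and a length), and `intern_checked` is a hypothetical name. The real function would return a Lisp object, not a `usize`.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

/// Stand-in for the C function (assumption: pointer plus length). This stub
/// only reports how many bytes it saw before the NUL terminator.
unsafe fn stub_intern_1(ptr: *const c_char, len: isize) -> usize {
    let seen = unsafe { CStr::from_ptr(ptr) }.to_bytes().len();
    debug_assert_eq!(seen as isize, len);
    seen
}

/// Hypothetical safe wrapper: accepts any string-like type, heap-allocates a
/// NUL-terminated copy, and reports interior NULs as an error.
fn intern_checked<T: AsRef<str>>(name: T) -> Result<usize, NulError> {
    let s = name.as_ref();
    let c_name = CString::new(s)?; // fails on interior NUL bytes
    // SAFETY: `c_name` is NUL-terminated and outlives the call.
    Ok(unsafe { stub_intern_1(c_name.as_ptr(), s.len() as isize) })
}
```

The trade-off relative to the macro is one allocation and copy per call, in exchange for a safe signature that works for `&str`, `String`, and similar types.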