Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds intern and an intern_static! macro #255

Merged
merged 2 commits into from Jul 23, 2017
Merged

Adds intern and an intern_static! macro #255

merged 2 commits into from Jul 23, 2017

Conversation

atheriel
Copy link
Contributor

It's a common pattern in the C codebase to call intern("some static c string"). I thought that it would be nice to have a similar API available in Rust. Unfortunately, there's no way of doing this with the std::cstring interface that avoids heap-allocating and unnecessary copies. I've written a macro that avoids this by concatenating a null byte to the end of a string at compile time, and casting to a *const c_char. This allows us to write

let symbol = unsafe { intern_static!("my-symbol-1") };

This macro is, and always will be, unsafe. If you pass it a string with internal NULL bytes, e.g. intern("my-\0dangerous-\0string"), then C will not correctly deduce the length of the string, and we'll leak memory. However, for symbols defined at compile time and clearly audited by the project, I don't think this is a problem.

One could also add a safe, non-static version of intern using the std::cstring machinery.

@atheriel atheriel mentioned this pull request Jul 20, 2017

/// Intern (e.g. create a symbol for) a `&'static str`, avoiding
/// allocation. This macro is inherently unsafe: if the string contains internal
/// NULL bytes (`\0`), we will violate memory safety.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it violate memory safety? It would just use the shorter string since you're using strlen to determine the length. (And all you're passing is a pointer to static memory, which can't leak.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment here comes from my understanding is that handling of strings with internal NULL bytes is undefined behaviour in C, and from the "unsafe" denotation on the equivalent std method . You might be right, though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume intern_1 didn't take the length; then as far as the C API is concerned the string would end at the first null byte. That's not undefined behavior, but of course not what you intended either.

The method on CString is unsafe because it maintains the invariant that its string has the same length for Rust and C consumers, so not checking must be an unsafe function.

/// NULL bytes (`\0`), we will violate memory safety.
macro_rules! intern_static {
($s:expr) => ({
let __intern_s = concat!($s, "\0") as *const str as *const [::libc::c_char] as *const ::libc::c_char;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be concat!(...).as_ptr() as *const libc::c_char

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

@s-kostyaev
Copy link

This macro is, and always will be, unsafe. If you pass it a string with internal NULL bytes, e.g. intern("my-\0dangerous-\0string"), then C will not correctly deduce the length of the string, and we'll leak memory. However, for symbols defined at compile time and clearly audited by the project, I don't think this is a problem.

I'm newbe in rust, but can it be checked at compile time inside macro?

@birkenfeld
Copy link
Collaborator

Actually, since intern_1 gets the length as well, I don't think we even need the trailing NULL byte.

/// NULL bytes (`\0`), we will violate memory safety.
macro_rules! intern_static {
($s:expr) => ({
let __intern_s = concat!($s, "\0") as *const str as *const [::libc::c_char] as *const ::libc::c_char;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, Rust has hygienic macros. No longer do we have to name our variables in macros horrible underscore salads (unless you like the convention and did it for a reason besides to avoid shadowing at macro call sites).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it for convention, although it has been a while since I've looked at the source code for the standard library macros. Happy to change it, though.

@atheriel
Copy link
Contributor Author

Actually, since intern_1 gets the length as well, I don't think we even need the trailing NULL byte.

@birkenfeld Well, that seems obvious in retrospect. Don't know why I was overthinking it. I'll change it.

@atheriel
Copy link
Contributor Author

@s-kostyaev Using a procedural macro, yes. Using the basic macro_rules!, no.

@s-kostyaev
Copy link

s-kostyaev commented Jul 20, 2017

@atheriel Thanks for reply. Maybe would be better to use procedural macro in this case and add this compile time check? It can prevent memory leak.

@birkenfeld
Copy link
Collaborator

Maybe would be better to use procedural macro in this case and add this compile time check? It can prevent memory leak.

No leaks possible in this case.

@s-kostyaev
Copy link

If you pass it a string with internal NULL bytes, e.g. intern("my-\0dangerous-\0string"), then C will not correctly deduce the length of the string, and we'll leak memory.

@birkenfeld
Copy link
Collaborator

There's no "passing a string". There's only passing a pointer to a string constant in static memory.

@s-kostyaev
Copy link

Static memory can't leak. Thanks for explanation.

@atheriel
Copy link
Contributor Author

I should be able to rewrite this and rebase tonight. Thanks for all the comments!

@atheriel
Copy link
Contributor Author

@birkenfeld How does this look?

@birkenfeld
Copy link
Collaborator

Can't intern directly take a &str, and the macro can be removed?

Signed-off-by: Aaron Jacobs <atheriel@gmail.com>
@atheriel
Copy link
Contributor Author

@birkenfeld Yep.

Copy link
Collaborator

@birkenfeld birkenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now it's just Travis complaining that the function is never used, which generates an error (@Wilfred how to proceed?). Otherwise looks fine!

Signed-off-by: Aaron Jacobs <atheriel@gmail.com>
@atheriel
Copy link
Contributor Author

@birkenfeld I've fixed the dead code issue. I also realized that this function can clearly be generic, to accomodate other string-like types.

@birkenfeld
Copy link
Collaborator

Nice!

@birkenfeld birkenfeld merged commit c688163 into remacs:master Jul 23, 2017
@atheriel atheriel deleted the intern-static branch July 23, 2017 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants