Tracking Issue for os_str_slice
#118485
Comments
This was discussed somewhat in the ACP (rust-lang/libs-team#306). Copying over the parts I find relevant :) Starting with "lowest common denominator invariants" can always be relaxed later, as we'd be switching cases from asserting to not-asserting. The other
Looking over #118484, this looks like it'll have a lot of overhead to enforce the invariants when building higher-level operations on top, overhead that would go away with native support for those operations. This is mostly an observation, as I'm not sure what else we can do for now while pattern API support is at a standstill.
There's a fast path for splitting on ASCII, which you'll always take when doing traditional options parsing. My hunch is that that's the common case in general, that even if part of the string is Unicode or raw bytes you'll typically be hunting for some bit of ASCII syntax.

It can be made a lot faster on Windows (see #118484 (comment)), and I want to try that if I can get a local Windows environment set up. (EDIT: the

If we relax the requirements for Unix then it just becomes a normal slicing operation there. That's one reason that feels attractive to me. Otherwise we have to keep doing something like the current validation.

The ASCII fast path can be made faster by skipping bounds checks and giving it its own function. Here's what I get by mucking around in compiler explorer:

But I don't know how far it's worth going, and if we choose relaxed checks then we can just get rid of it.
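To make the strict check and its ASCII fast path concrete, here is a minimal sketch of that kind of boundary validation. It is an illustration written for this thread, not the code in std: the name `is_checked_boundary` is hypothetical, and it returns a `bool` where the real method would panic. It assumes the strict rule is "an index is valid at the start, at the end, or immediately before or after a valid UTF-8 sequence".

```rust
use std::str;

/// Hypothetical helper: returns true if `index` is a boundary that a strict,
/// platform-independent check could accept for these encoded bytes.
fn is_checked_boundary(bytes: &[u8], index: usize) -> bool {
    if index > bytes.len() {
        return false;
    }
    if index == 0 || index == bytes.len() {
        return true;
    }
    // ASCII fast path: an ASCII byte on either side is a complete UTF-8
    // sequence, so the index is adjacent to valid UTF-8.
    if bytes[index - 1].is_ascii() || bytes[index].is_ascii() {
        return true;
    }
    let (before, after) = bytes.split_at(index);
    // A UTF-8 sequence is at most 4 bytes long, so 4 bytes are enough to
    // decide whether `after` starts with a complete character.
    let after = after.get(..4).unwrap_or(after);
    let starts_with_char = match str::from_utf8(after) {
        Ok(_) => true,
        Err(err) => err.valid_up_to() != 0,
    };
    if starts_with_char {
        return true;
    }
    // Otherwise accept only if `before` ends with a complete multi-byte
    // character (the 1-byte case was handled by the ASCII fast path).
    (2..=index.min(4)).any(|len| str::from_utf8(&before[index - len..]).is_ok())
}
```

Under this rule an index in the middle of `é` is rejected, and so is an index between two raw non-UTF-8 bytes, which is the case a relaxed Unix rule would allow.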
With strict checks it would technically be optimal for user code to have a function like this:

```rust
use std::ffi::OsStr;

fn slice_os_str(s: &OsStr, start: usize, end: usize) -> &OsStr {
    #[cfg(all(target_vendor = "fortanix", target_env = "sgx"))]
    use std::os::fortanix_sgx::ffi::OsStrExt;
    #[cfg(target_os = "hermit")]
    use std::os::hermit::ffi::OsStrExt;
    #[cfg(target_os = "solid")]
    use std::os::solid::ffi::OsStrExt;
    #[cfg(unix)]
    use std::os::unix::ffi::OsStrExt;
    #[cfg(target_os = "wasi")]
    use std::os::wasi::ffi::OsStrExt;
    #[cfg(target_os = "xous")]
    use std::os::xous::ffi::OsStrExt;

    #[cfg(any(
        unix,
        target_os = "wasi",
        target_os = "hermit",
        all(target_vendor = "fortanix", target_env = "sgx"),
        target_os = "solid",
        target_os = "xous"
    ))]
    return OsStr::from_bytes(&s.as_bytes()[start..end]);

    #[cfg(not(any(
        unix,
        target_os = "wasi",
        target_os = "hermit",
        all(target_vendor = "fortanix", target_env = "sgx"),
        target_os = "solid",
        target_os = "xous"
    )))]
    return s.slice_encoded_bytes(start..end);
}
```

And that's a little sad.

So all in all I'm back to preferring to skip the check on these platforms.
Regarding performance, the question I asked myself is why we can't have a subset of the Pattern API that doesn't take
My thought is that we should have such an API but that this method will be easier to get stabilized. It's a small and unopinionated MVP.
That doesn't make them mutually exclusive. My point was that for more performance-critical code, we can look to #109350 while we can have

#109350 mirrors an API on one type into another, is related to an approved RFC, and trims down the biggest, most contentious part of that RFC. My hope is that it can be a relatively quick API to stabilize. It hasn't gotten much attention but I'm looking into that.
With respect to the restrictions, I am generally inclined toward @blyxxyz's point of view here where we shouldn't need them on Unix because its representation is already set in stone. With that said, keeping uniform restrictions does make the behavior easier to explain and reason about. (I think y'all mentioned that.) But I think most importantly for me anyway, if we start with uniform restrictions, we can always relax them later when we've got some solid use cases motivating us to do that. For std, I am generally inclined to the conservative posture because of our unique stability constraints.
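As a small illustration of why the Unix representation makes the restrictions unnecessary there, the sketch below slices through the stable `OsStrExt` trait at an index that falls in the middle of a multi-byte character. The helper name `slice_unix` is hypothetical and the example only compiles on Unix targets.

```rust
use std::ffi::OsStr;
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;

/// Hypothetical helper: slices a Unix `OsStr` at arbitrary byte offsets.
/// It panics only if the range is out of bounds, like any byte slice.
#[cfg(unix)]
fn slice_unix(s: &OsStr, start: usize, end: usize) -> &OsStr {
    OsStr::from_bytes(&s.as_bytes()[start..end])
}
```

For example, `slice_unix(OsStr::from_bytes(b"\xC3\xA9=x"), 1, 3)` yields the bytes `\xA9=`, a result that a uniform strict check would refuse to produce.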
Add substring API for `OsStr`

This adds a method for taking a substring of an `OsStr`, which in combination with [`OsStr::as_encoded_bytes()`](https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.as_encoded_bytes) makes it possible to implement most string operations in safe code.

API:

```rust
impl OsStr {
    pub fn slice_encoded_bytes<R: ops::RangeBounds<usize>>(&self, range: R) -> &Self;
}
```

Motivation, examples and research at rust-lang/libs-team#306.

Tracking issue: rust-lang#118485

cc `@epage`
r? libs-api
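A typical use is splitting an option such as `--color=auto` at an ASCII delimiter. The sketch below runs on stable Rust, so it stands in for the new method with `OsStr::from_encoded_bytes_unchecked` and an `unsafe` block. The helper name `split_once_eq` is hypothetical. With `#![feature(os_str_slice)]` enabled, the two unsafe calls could presumably be replaced by `arg.slice_encoded_bytes(..idx)` and `arg.slice_encoded_bytes(idx + 1..)`, removing the `unsafe`.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits `arg` at the first ASCII `=`, if there is one.
fn split_once_eq(arg: &OsStr) -> Option<(&OsStr, &OsStr)> {
    let bytes = arg.as_encoded_bytes();
    // The encoding is a self-synchronizing superset of UTF-8, so a `b'='`
    // byte can only be an actual ASCII `=`.
    let idx = bytes.iter().position(|&b| b == b'=')?;
    // SAFETY: both halves come from `as_encoded_bytes` on the same `OsStr`
    // and are split immediately before and after `=`, which is a valid
    // non-empty UTF-8 substring.
    unsafe {
        Some((
            OsStr::from_encoded_bytes_unchecked(&bytes[..idx]),
            OsStr::from_encoded_bytes_unchecked(&bytes[idx + 1..]),
        ))
    }
}
```

Because the delimiter is ASCII, this kind of split always takes the fast path discussed in the comments, whichever validation rule ends up being chosen.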
Rollup merge of rust-lang#118569 - blyxxyz:platform-os-str-slice, r=Mark-Simulacrum

Move `OsStr::slice_encoded_bytes` validation to platform modules

This delegates OS string slicing (`OsStr::slice_encoded_bytes`) validation to the underlying platform implementation. For now that results in increased performance and better error messages on Windows without any changes to semantics. In the future we may want to provide different semantics for different platforms.

The existing implementation is still used on Unix and most other platforms and is now optimized a little better.

Tracking issue: rust-lang#118485

cc `@epage`, `@BurntSushi`
Feature gate: `#![feature(os_str_slice)]`

This is a tracking issue for an API for taking substrings of `OsStr`, which in combination with `OsStr::as_encoded_bytes()` would make it possible to implement most string operations in (portable) safe code.

Public API

Steps / History

- ACP: rust-lang/libs-team#306
- Implementation: #118484, Move `OsStr::slice_encoded_bytes` validation to platform modules #118569

Unresolved Questions

`OsStr` is already fully specified to be arbitrary bytes by means of the `OsStrExt` trait. Should we:

Footnotes

https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html