Skip to content

Conversation

thaliaarchi
Copy link
Contributor

Implement a UTF-8 version of OsStr/OsString, in addition to the existing bytes and WTF-8 platform-dependent encodings.

This is applicable for several platforms, but I've currently only implemented it for Motor OS:

  • WASI uses Unicode paths, but currently reexports the Unix bytes-assuming OsStrExt/OsStringExt traits.
    • wasi:filesystem APIs:

      Paths are passed as interface-type strings, meaning they must consist of a sequence of Unicode Scalar Values (USVs). Some filesystems may contain paths which are not accessible by this API.

    • In wasi-filesystem#17, it was decided that applications can use any Unicode transformation format, so we're free to use UTF-8 (and probably already do). This was chosen over specifically UTF-8 or an ad hoc encoding which preserves paths not representable in UTF-8.

      The current API uses strings for filesystem paths, which contains sequences of Unicode scalar values (USVs), which applications can work with using strings encoded in UTF-8, UTF-16, or other Unicode encodings.

      This does mean that the API is unable to open files which do not have well-formed Unicode encodings, which may want separate APIs for handling such paths or may want something like the arf-strings proposal, but if we need that we should file a new issue for it.

  • As of Redox OS 0.7.0, "All paths are now required to be UTF-8, and the kernel enforces this". This appears to have been implemented in commit d331f72f (Use UTF-8 for all paths, 2021-02-14). Redox does not have OsStrExt/OsStringExt.
  • Motor OS guarantees that its OS strings are UTF-8 in its current OsStrExt/OsStringExt traits, but they're still internally bytes like Unix.

This is an alternate approach to #147797, which reuses the existing bytes OsString and relies on the safety properties of from_encoded_bytes_unchecked. Compared to that, this also gains efficiency from propagating the UTF-8 invariant to the whole type, as it never needs to test for UTF-8 validity.

Note that Motor OS currently does not build until #147930 merges.

cc @tgross35 (for earlier review)
cc @alexcrichton, @rylev, @loganek (for WASI)
cc @lasiotus (for Motor OS)
cc @jackpot51 (for Redox OS)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 21, 2025
@rustbot
Copy link
Collaborator

rustbot commented Oct 21, 2025

r? @ibraheemdev

rustbot has assigned @ibraheemdev.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Copy link
Contributor

@tgross35 tgross35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me, just a few small suggestions.

r? tgross35
@rustbot author

View changes since this review

@rustbot rustbot assigned tgross35 and unassigned ibraheemdev Oct 21, 2025
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 21, 2025
@rustbot
Copy link
Collaborator

rustbot commented Oct 21, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@rustbot
Copy link
Collaborator

rustbot commented Oct 21, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@thaliaarchi
Copy link
Contributor Author

@tgross35 Thanks for the helpful feedback. I've addressed all of your comments.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 21, 2025
@thaliaarchi
Copy link
Contributor Author

Unless WASI and/or Redox OS platform maintainers speak up here, I am creating separate PRs for those platforms for more focused attention.

@tgross35
Copy link
Contributor

Thanks!

@bors r+ rollup

@bors
Copy link
Collaborator

bors commented Oct 21, 2025

📌 Commit 7e2b76e has been approved by tgross35

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 21, 2025
bors added a commit that referenced this pull request Oct 22, 2025
Rollup of 7 pull requests

Successful merges:

 - #141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s)
 - #142339 (Add NonNull pattern types)
 - #147768 (Code refactoring on hir report_no_match_method_error)
 - #147788 (const Cell methods)
 - #147932 (Create UTF-8 version of `OsStr`/`OsString`)
 - #147933 (os_str: Make platform docs more consistent)
 - #147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit aa65c31 into rust-lang:master Oct 22, 2025
11 checks passed
@rustbot rustbot added this to the 1.92.0 milestone Oct 22, 2025
@thaliaarchi thaliaarchi deleted the utf8-osstring branch October 22, 2025 20:35
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Oct 23, 2025
Rollup of 7 pull requests

Successful merges:

 - rust-lang/rust#141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s)
 - rust-lang/rust#142339 (Add NonNull pattern types)
 - rust-lang/rust#147768 (Code refactoring on hir report_no_match_method_error)
 - rust-lang/rust#147788 (const Cell methods)
 - rust-lang/rust#147932 (Create UTF-8 version of `OsStr`/`OsString`)
 - rust-lang/rust#147933 (os_str: Make platform docs more consistent)
 - rust-lang/rust#147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+)

r? `@ghost`
`@rustbot` modify labels: rollup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants