[Stdlib] Document `StringSlice[byte=]` subscript performance in docstring by msaelices · Pull Request #6251 · modular/modular

msaelices · 2026-03-22T23:21:05Z

Summary

Add a note to StringSlice.__getitem__(byte=) docstring explaining that the UTF-8 boundary check runs in release builds, and pointing users to as_bytes() for performance-sensitive byte-scanning loops.

Assisted-by: AI

Add a note to StringSlice.__getitem__(byte=) docstring explaining that the UTF-8 boundary check runs in release builds, and pointing users to as_bytes() for performance-sensitive byte-scanning loops. Signed-off-by: Manuel Saelices <msaelices@gmail.com>

Copilot

Pull request overview

Updates the StringSlice.__getitem__(byte=) docstring to clarify performance implications of byte subscripting, specifically calling out the UTF-8 boundary check and recommending as_bytes() for tight byte-scanning loops.

Changes:

Added a docstring note that __getitem__(byte=) performs a UTF-8 codepoint-boundary check in typical builds.
Pointed users to as_bytes() / Span[Byte] for performance-sensitive byte iteration.

Comments suppressed due to low confidence (1)

mojo/stdlib/std/collections/string/string_slice.mojo:1133

The docstring describes this as "byte-level indexing" that may return a partial/invalid UTF-8 sequence and says the return is a "String containing a single byte", but the implementation requires a UTF-8 start byte (via debug_assert[assert_mode="safe"]) and returns a StringSlice whose length is the full codepoint byte length. Please update the docstring to reflect that this returns the codepoint slice at a given byte offset (and that the offset must be a codepoint boundary).

        """Gets a single byte at the specified byte index.

        This performs byte-level indexing, not character (codepoint) indexing.
        For strings containing multi-byte UTF-8 characters, this may return a
        partial or invalid character sequence. For proper character access, use
        `codepoint_slices()` or iterate over the string directly.

        This method includes a UTF-8 codepoint boundary check that runs even
        in release builds. For performance-sensitive byte-scanning loops (e.g.
        parsers, regex engines), use `as_bytes()` to access the underlying
        `Span[Byte]` directly, which avoids this overhead.

        Parameters:
            I: A type that can be used as an index.

        Args:
            byte: The byte index (0-based). Negative indices count from the end.

        Returns:
            A new String containing a single byte at the specified position.
        """

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-22T23:25:28Z

mojo/stdlib/std/collections/string/string_slice.mojo

+        This method includes a UTF-8 codepoint boundary check that runs even
+        in release builds. For performance-sensitive byte-scanning loops (e.g.


The new note says the UTF-8 boundary check "runs even in release builds", but the code is a debug_assert[assert_mode="safe"], which can be compiled out when assertions are disabled (e.g. assert_mode=none). Consider rewording to something like "a debug_assert[assert_mode=\"safe\"] boundary check (enabled by default, including in release builds)" to avoid overstating the guarantee.

Suggested change

This method includes a UTF-8 codepoint boundary check that runs even

in release builds. For performance-sensitive byte-scanning loops (e.g.

This method includes a `debug_assert[assert_mode="safe"]` UTF-8

codepoint boundary check (enabled by default, including in release

builds). For performance-sensitive byte-scanning loops (e.g.

Specify that the check is a debug_assert[assert_mode="safe"] rather than saying it "runs even in release builds", since it can be compiled out with -D ASSERT=none. Signed-off-by: Manuel Saelices <msaelices@gmail.com>

NathanSWard · 2026-03-25T20:06:01Z

!sync

github-actions bot added mojo-stdlib Tag for issues related to standard library waiting-on-review labels Mar 22, 2026

Merge branch 'main' into stringslice-byte-docstring

6d31e6b

msaelices marked this pull request as ready for review March 22, 2026 23:23

msaelices requested a review from a team as a code owner March 22, 2026 23:23

Copilot AI review requested due to automatic review settings March 22, 2026 23:23

msaelices mentioned this pull request Mar 22, 2026

[Stdlib] Optimize StringSlice[byte=] subscript performance #6250

Closed

Copilot started reviewing on behalf of msaelices March 22, 2026 23:23 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

[Stdlib] Clarify assert_mode in byte= docstring

7037c17

Specify that the check is a debug_assert[assert_mode="safe"] rather than saying it "runs even in release builds", since it can be compiled out with -D ASSERT=none. Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices changed the title ~~[Stdlib] Document byte= subscript performance in docstring~~ [Stdlib] Document StringSlice[byte=] subscript performance in docstring Mar 23, 2026

NathanSWard removed the waiting-on-review label Mar 25, 2026

modular-automation bot assigned NathanSWard Mar 25, 2026

modularbot added the imported-internally Signals that a given pull request has been imported internally. label Mar 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Stdlib] Document `StringSlice[byte=]` subscript performance in docstring#6251

[Stdlib] Document `StringSlice[byte=]` subscript performance in docstring#6251
msaelices wants to merge 3 commits intomodular:mainfrom
msaelices:stringslice-byte-docstring

msaelices commented Mar 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 22, 2026

Uh oh!

NathanSWard commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		This method includes a UTF-8 codepoint boundary check that runs even
		in release builds. For performance-sensitive byte-scanning loops (e.g.

-        This method includes a UTF-8 codepoint boundary check that runs even
-        in release builds. For performance-sensitive byte-scanning loops (e.g.
+        This method includes a `debug_assert[assert_mode="safe"]` UTF-8
+        codepoint boundary check (enabled by default, including in release
+        builds). For performance-sensitive byte-scanning loops (e.g.

Conversation

msaelices commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

NathanSWard commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

msaelices commented Mar 22, 2026 •

edited

Loading