Skip to content

Conversation

@mxsm
Copy link
Owner

@mxsm mxsm commented Jan 4, 2026

fix #74

Summary by CodeRabbit

  • New Features

    • Enhanced string performance through optimized storage for short strings, reducing unnecessary memory allocations and improving cache locality for faster operations.
  • Tests

    • Added extensive test coverage for string creation, conversions, equality comparisons, Unicode support, hashing, ordering, and storage behavior to ensure reliability and correctness.

✏️ Tip: You can customize this high-level summary in your review settings.

…ng to reduce heap allocations for short strings
Copilot AI review requested due to automatic review settings January 4, 2026 15:32
…ng to reduce heap allocations for short strings
@coderabbitai
Copy link

coderabbitai bot commented Jan 4, 2026

Warning

Rate limit exceeded

@mxsm has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 44 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 4c1b987 and 9c82f71.

📒 Files selected for processing (1)
  • src/serde.rs

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Introduces Small String Optimization (SSO) to CheetahString by adding an Inline variant to store strings up to 23 bytes directly within the struct, replacing the Empty variant. This reduces heap allocations for short strings. Comprehensive test suite validates the new inline storage path and boundary behaviors across multiple scenarios.

Changes

Cohort / File(s) Summary
Core SSO Implementation
src/cheetah_string.rs
Added Inline variant with 23-byte capacity for direct string storage; updated Default and empty constructors to use Inline; modified as_str, as_bytes, len, and is_empty to handle Inline data; updated from_slice and from_string to select storage strategy based on string length; removed Empty variant throughout; added INLINE_CAPACITY constant
Test Coverage
tests/sso.rs
New comprehensive test suite covering creation, length checks, equality, cloning, Unicode handling, From/Into conversions, hashing, ordering, byte access, dereferencing, formatting, mixed storage behavior, and boundary testing at 23/24-byte thresholds

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A speedy optimization hops into place,
Twenty-three bytes fit without a heap trace,
Short strings now inline, quick and spry,
No allocations needed—they fly by!
Tests verify each boundary with care,
SSO magic dancing through the air! ✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: implementing SSO for CheetahString to reduce heap allocations for short strings, directly matching the PR objectives.
Linked Issues check ✅ Passed The code changes successfully implement SSO by adding an Inline variant for storing short strings (up to 23 bytes) without heap allocation, directly fulfilling issue #74's requirement.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing SSO: the Inline variant addition, capacity constant, constructor updates, and conversion methods. The test file addition comprehensively covers the new functionality.
Docstring Coverage ✅ Passed Docstring coverage is 96.88% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Small String Optimization (SSO) for the CheetahString type to reduce heap allocations for short strings (≤23 bytes). The optimization introduces an Inline variant in the InnerString enum that stores strings up to 23 bytes directly within the struct, eliminating heap allocations for common short string cases.

Key changes:

  • Replaces the Empty variant with Inline { len: 0, data: [0; 23] } for representing empty strings
  • Adds SSO logic to from_slice() and from_string() methods to choose between inline and heap storage based on string length
  • Updates all match statements to handle the new Inline variant across methods like as_str(), as_bytes(), len(), and is_empty()

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/cheetah_string.rs Core SSO implementation with new Inline variant, updated constructors and accessor methods, and comprehensive documentation of the SSO strategy
tests/sso.rs Comprehensive test suite covering SSO behavior including boundary cases (23/24 bytes), unicode handling, empty strings, and compatibility with existing functionality

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ng to reduce heap allocations for short strings
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/cheetah_string.rs (4)

371-375: SSO bypass: from_arc_string doesn't use inline storage for short strings.

from_arc_string always creates an ArcString variant, even when the string is ≤ 23 bytes and could use inline storage. This bypasses SSO optimization.

While preserving the Arc may be intentional for sharing semantics, it's inconsistent with the SSO goal of reducing heap allocations for short strings.

🔎 Proposed refactor to check length and use inline storage
 #[inline]
 pub fn from_arc_string(s: Arc<String>) -> Self {
+    if s.len() <= INLINE_CAPACITY {
+        // Use inline storage for short strings
+        let mut data = [0u8; INLINE_CAPACITY];
+        data[..s.len()].copy_from_slice(s.as_bytes());
+        CheetahString {
+            inner: InnerString::Inline {
+                len: s.len() as u8,
+                data,
+            },
+        }
+    } else {
         CheetahString {
             inner: InnerString::ArcString(s),
         }
+    }
 }

269-273: SSO bypass: from_vec doesn't use inline storage for short byte vectors.

Similar to from_arc_string, from_vec always creates ArcVecString even for short inputs. Consider checking length and using inline storage for consistency with SSO.

Note: This requires copying and dropping the Vec, which has a cost, but eliminates the heap allocation for short strings.


293-299: SSO bypass: try_from_vec doesn't use inline storage for short byte vectors.

After UTF-8 validation, short byte vectors could use inline storage instead of ArcVecString for consistency with SSO goals.

🔎 Proposed refactor
 pub fn try_from_vec(v: Vec<u8>) -> Result<Self, Utf8Error> {
     // Validate UTF-8
     std::str::from_utf8(&v)?;
+    if v.len() <= INLINE_CAPACITY {
+        let mut data = [0u8; INLINE_CAPACITY];
+        data[..v.len()].copy_from_slice(&v);
+        Ok(CheetahString {
+            inner: InnerString::Inline {
+                len: v.len() as u8,
+                data,
+            },
+        })
+    } else {
         Ok(CheetahString {
             inner: InnerString::ArcVecString(Arc::new(v)),
         })
+    }
 }

546-547: Document memory layout implications of inline storage.

The comment correctly notes the 24-byte size (23 + 1), but consider adding a note about the impact on the overall CheetahString size. Since Inline is the largest variant, all CheetahString instances will be at least 24 bytes (plus discriminant and alignment), even when storing shorter Arc/static variants. This is an expected SSO tradeoff but worth documenting for maintainers.

📝 Suggested documentation enhancement
-/// Maximum capacity for inline string storage (23 bytes + 1 byte for length = 24 bytes total)
+/// Maximum capacity for inline string storage (23 bytes + 1 byte for length = 24 bytes total).
+/// Note: Since Inline is the largest variant, all CheetahString instances will be at least
+/// 24 bytes (plus enum discriminant and alignment), even for Arc/static variants. This is
+/// an intentional tradeoff to enable zero-allocation storage for strings ≤ 23 bytes.
 const INLINE_CAPACITY: usize = 23;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 750ab5d and 4c1b987.

📒 Files selected for processing (2)
  • src/cheetah_string.rs
  • tests/sso.rs
🔇 Additional comments (16)
src/cheetah_string.rs (7)

20-23: LGTM: Correct default initialization with inline storage.

The default implementation correctly initializes an empty string using inline storage with zero length.


180-184: LGTM: Safe conversion from inline storage to String.

The unsafe block is justified—inline strings are only created from validated UTF-8 sources in from_slice and from_string.


333-348: LGTM: Correct inline vs. Arc storage decision.

The implementation correctly chooses inline storage for strings ≤ 23 bytes and Arc for longer strings. The length cast to u8 is safe since it's bounded by INLINE_CAPACITY.


353-368: Consider SSO consistency: from_string allocates Arc for already-owned short strings.

When from_string is called with a short owned String, the implementation copies bytes to inline storage and drops the original String. While correct, this means we copy+drop instead of just storing. This is an intentional SSO tradeoff to eliminate the heap allocation, but it's worth noting the cost for owned inputs.

For strings > 23 bytes, the String is moved into an Arc, which is optimal.


388-392: LGTM: Safe UTF-8 conversion for inline strings.

The unsafe block is justified—the invariant that inline data up to len is valid UTF-8 is maintained by all Inline constructors (from_slice, from_string).


412-412: LGTM: Correct implementations for inline storage access.

The as_bytes, len, and is_empty methods correctly handle the Inline variant using the stored length.

Also applies to: 424-424, 436-436


562-577: LGTM: Well-documented enum variants with clear SSO design.

The variant ordering (Inline first) and comprehensive documentation clearly communicate the SSO strategy and storage options.

tests/sso.rs (9)

3-9: LGTM: Correct empty string test.

Properly validates the empty string behavior with inline storage (len=0).


11-27: LGTM: Comprehensive short string tests.

Tests correctly cover various lengths up to the 23-byte boundary, validating inline storage behavior.


29-46: LGTM: Critical boundary tests at 23 and 24 bytes.

These tests correctly validate the transition point between inline storage (≤ 23 bytes) and Arc storage (≥ 24 bytes).


66-80: LGTM: Unicode byte length correctly validated.

Line 69 correctly verifies that "你好" is 6 bytes (2 chars × 3 bytes each in UTF-8). Line 77's test string "你好世界啊啊啊" is correctly noted as 21 bytes (7 chars × 3 bytes).


82-96: LGTM: Conversion tests validate String interop.

Tests correctly verify From<String> and Into<String> conversions for short strings using inline storage.


98-108: LGTM: Equality tests cover multiple comparison paths.

Tests validate PartialEq implementations including CheetahString-to-CheetahString, CheetahString-to-str, and bidirectional comparisons.


110-119: LGTM: Hash implementation validated via HashMap.

Tests confirm that inline strings hash consistently and work correctly as HashMap keys.


154-164: LGTM: Mixed storage test validates inline and Arc paths.

This test confirms that short and long strings coexist correctly with different storage strategies.


193-208: Note: try_from_bytes uses inline storage, but try_from_vec uses ArcVecString.

The test at line 205 for try_from_vec passes but uses ArcVecString storage (as per the implementation), not inline storage. This is consistent with the current implementation but means try_from_vec bypasses SSO even for short inputs.

The test correctly validates functionality but doesn't exercise the inline path for try_from_vec.

@mxsm mxsm merged commit dff947c into main Jan 4, 2026
7 checks passed
@mxsm mxsm deleted the feat-74 branch January 6, 2026 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Small String Optimization (SSO) for CheetahString to reduce heap allocations for short strings

3 participants