-
Notifications
You must be signed in to change notification settings - Fork 6
[ISSUE #74]🚀Implement Small String Optimization (SSO) for CheetahString to reduce heap allocations for short strings #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ng to reduce heap allocations for short strings
…ng to reduce heap allocations for short strings
|
Warning Rate limit exceeded@mxsm has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 44 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughIntroduces Small String Optimization (SSO) to CheetahString by adding an Inline variant to store strings up to 23 bytes directly within the struct, replacing the Empty variant. This reduces heap allocations for short strings. Comprehensive test suite validates the new inline storage path and boundary behaviors across multiple scenarios. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements Small String Optimization (SSO) for the CheetahString type to reduce heap allocations for short strings (≤23 bytes). The optimization introduces an Inline variant in the InnerString enum that stores strings up to 23 bytes directly within the struct, eliminating heap allocations for common short string cases.
Key changes:
- Replaces the
Emptyvariant withInline { len: 0, data: [0; 23] }for representing empty strings - Adds SSO logic to
from_slice()andfrom_string()methods to choose between inline and heap storage based on string length - Updates all match statements to handle the new
Inlinevariant across methods likeas_str(),as_bytes(),len(), andis_empty()
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/cheetah_string.rs |
Core SSO implementation with new Inline variant, updated constructors and accessor methods, and comprehensive documentation of the SSO strategy |
tests/sso.rs |
Comprehensive test suite covering SSO behavior including boundary cases (23/24 bytes), unicode handling, empty strings, and compatibility with existing functionality |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ng to reduce heap allocations for short strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/cheetah_string.rs (4)
371-375: SSO bypass: from_arc_string doesn't use inline storage for short strings.
from_arc_stringalways creates anArcStringvariant, even when the string is ≤ 23 bytes and could use inline storage. This bypasses SSO optimization.While preserving the Arc may be intentional for sharing semantics, it's inconsistent with the SSO goal of reducing heap allocations for short strings.
🔎 Proposed refactor to check length and use inline storage
#[inline] pub fn from_arc_string(s: Arc<String>) -> Self { + if s.len() <= INLINE_CAPACITY { + // Use inline storage for short strings + let mut data = [0u8; INLINE_CAPACITY]; + data[..s.len()].copy_from_slice(s.as_bytes()); + CheetahString { + inner: InnerString::Inline { + len: s.len() as u8, + data, + }, + } + } else { CheetahString { inner: InnerString::ArcString(s), } + } }
269-273: SSO bypass: from_vec doesn't use inline storage for short byte vectors.Similar to
from_arc_string,from_vecalways createsArcVecStringeven for short inputs. Consider checking length and using inline storage for consistency with SSO.Note: This requires copying and dropping the Vec, which has a cost, but eliminates the heap allocation for short strings.
293-299: SSO bypass: try_from_vec doesn't use inline storage for short byte vectors.After UTF-8 validation, short byte vectors could use inline storage instead of
ArcVecStringfor consistency with SSO goals.🔎 Proposed refactor
pub fn try_from_vec(v: Vec<u8>) -> Result<Self, Utf8Error> { // Validate UTF-8 std::str::from_utf8(&v)?; + if v.len() <= INLINE_CAPACITY { + let mut data = [0u8; INLINE_CAPACITY]; + data[..v.len()].copy_from_slice(&v); + Ok(CheetahString { + inner: InnerString::Inline { + len: v.len() as u8, + data, + }, + }) + } else { Ok(CheetahString { inner: InnerString::ArcVecString(Arc::new(v)), }) + } }
546-547: Document memory layout implications of inline storage.The comment correctly notes the 24-byte size (23 + 1), but consider adding a note about the impact on the overall
CheetahStringsize. SinceInlineis the largest variant, allCheetahStringinstances will be at least 24 bytes (plus discriminant and alignment), even when storing shorter Arc/static variants. This is an expected SSO tradeoff but worth documenting for maintainers.📝 Suggested documentation enhancement
-/// Maximum capacity for inline string storage (23 bytes + 1 byte for length = 24 bytes total) +/// Maximum capacity for inline string storage (23 bytes + 1 byte for length = 24 bytes total). +/// Note: Since Inline is the largest variant, all CheetahString instances will be at least +/// 24 bytes (plus enum discriminant and alignment), even for Arc/static variants. This is +/// an intentional tradeoff to enable zero-allocation storage for strings ≤ 23 bytes. const INLINE_CAPACITY: usize = 23;
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/cheetah_string.rstests/sso.rs
🔇 Additional comments (16)
src/cheetah_string.rs (7)
20-23: LGTM: Correct default initialization with inline storage.The default implementation correctly initializes an empty string using inline storage with zero length.
180-184: LGTM: Safe conversion from inline storage to String.The unsafe block is justified—inline strings are only created from validated UTF-8 sources in
from_sliceandfrom_string.
333-348: LGTM: Correct inline vs. Arc storage decision.The implementation correctly chooses inline storage for strings ≤ 23 bytes and Arc for longer strings. The length cast to
u8is safe since it's bounded byINLINE_CAPACITY.
353-368: Consider SSO consistency: from_string allocates Arc for already-owned short strings.When
from_stringis called with a short owned String, the implementation copies bytes to inline storage and drops the original String. While correct, this means we copy+drop instead of just storing. This is an intentional SSO tradeoff to eliminate the heap allocation, but it's worth noting the cost for owned inputs.For strings > 23 bytes, the String is moved into an Arc, which is optimal.
388-392: LGTM: Safe UTF-8 conversion for inline strings.The unsafe block is justified—the invariant that inline data up to
lenis valid UTF-8 is maintained by all Inline constructors (from_slice,from_string).
412-412: LGTM: Correct implementations for inline storage access.The
as_bytes,len, andis_emptymethods correctly handle the Inline variant using the stored length.Also applies to: 424-424, 436-436
562-577: LGTM: Well-documented enum variants with clear SSO design.The variant ordering (Inline first) and comprehensive documentation clearly communicate the SSO strategy and storage options.
tests/sso.rs (9)
3-9: LGTM: Correct empty string test.Properly validates the empty string behavior with inline storage (len=0).
11-27: LGTM: Comprehensive short string tests.Tests correctly cover various lengths up to the 23-byte boundary, validating inline storage behavior.
29-46: LGTM: Critical boundary tests at 23 and 24 bytes.These tests correctly validate the transition point between inline storage (≤ 23 bytes) and Arc storage (≥ 24 bytes).
66-80: LGTM: Unicode byte length correctly validated.Line 69 correctly verifies that "你好" is 6 bytes (2 chars × 3 bytes each in UTF-8). Line 77's test string "你好世界啊啊啊" is correctly noted as 21 bytes (7 chars × 3 bytes).
82-96: LGTM: Conversion tests validate String interop.Tests correctly verify
From<String>andInto<String>conversions for short strings using inline storage.
98-108: LGTM: Equality tests cover multiple comparison paths.Tests validate
PartialEqimplementations including CheetahString-to-CheetahString, CheetahString-to-str, and bidirectional comparisons.
110-119: LGTM: Hash implementation validated via HashMap.Tests confirm that inline strings hash consistently and work correctly as HashMap keys.
154-164: LGTM: Mixed storage test validates inline and Arc paths.This test confirms that short and long strings coexist correctly with different storage strategies.
193-208: Note: try_from_bytes uses inline storage, but try_from_vec uses ArcVecString.The test at line 205 for
try_from_vecpasses but usesArcVecStringstorage (as per the implementation), not inline storage. This is consistent with the current implementation but meanstry_from_vecbypasses SSO even for short inputs.The test correctly validates functionality but doesn't exercise the inline path for
try_from_vec.
fix #74
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.