[refactor] #1985: Reduce size of `Name` struct #2365
Codecov Report
@@ Coverage Diff @@
## iroha2-dev #2365 +/- ##
==============================================
+ Coverage 65.50% 70.05% +4.55%
==============================================
Files 133 142 +9
Lines 24697 27874 +3177
==============================================
+ Hits 16177 19528 +3351
+ Misses 8520 8346 -174
Force-pushed from 837595e to 4f79413.
}

/// Union representing const-string variants: inlined or boxed.
/// Distinction between variants are achieved by setting least significant bit for inlined variant.
I'm afraid that this might be fragile.
Have you checked that non-ASCII characters also work?
Yes, I checked it; I will add a test case for that.
In general, something like proptest would be helpful for testing.
Re: proptest: it would be good for unit tests, if a bit expensive. We plan on using Hypothesis from Python downstream.
Still, this is an easy unit test to write and comment.
Another quick and dirty way to ensure that UTF-8 is fully supported is to construct a boxed string whenever you see non-ASCII characters.
data_model/src/lib.rs (outdated)
impl Name {
    pub(crate) const fn empty() -> Self {
-        Self(String::new())
+        Self(ConstString::new())
If `Name` is immutable (and I think it is?), this can be a shared const value.
I didn't think about this! Thank you for the suggestion.
Do we need an empty `Id` in the first place?
Yes, we have to support empty `Name`s/`Id`s, at least for now. There is an issue tracking the discussion on whether we'll allow or disallow empty identifiers.
So the approach has to be one of two. Either we publish this as a separate crate that could be shared with other projects, in which case we need to allow empty values, or we tailor it directly to what our `Name` struct needs (in which case we could equally just merge the two and get rid of another layer of indirection).
I don't mind either way, but I think that a `ConstString` could be useful in Sora.
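A minimal sketch of the shared-const idea raised earlier in this thread. It uses a hypothetical `Name` newtype over `String` (the representation before this PR), not the crate's actual type, and relies on `String::new` being a `const fn` that does not allocate:

```rust
/// Hypothetical stand-in for the pre-PR `Name(String)` newtype.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Name(String);

impl Name {
    /// Shared empty value: `String::new` is a `const fn` and does not
    /// allocate, so this constant is evaluated at compile time.
    pub const EMPTY: Name = Name(String::new());

    /// The existing constructor can simply return the constant.
    pub const fn empty() -> Self {
        Self::EMPTY
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

A `const` item is copied into each use site, which costs nothing for a value that owns no allocation; a `static` would only be needed if a single shared address mattered.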
impl PartialOrd for ConstString {
    #[inline]
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        PartialOrd::partial_cmp(&**self, &**other)
Check whether you need to `impl` this or whether you can derive it, @Erigara.
}

#[test]
fn const_string_display() {
This is a wholly unnecessary test. All you need to do for coverage is to `println!` inside other tests, which IMO would also help if someone is modifying the code.
}

#[test]
fn const_string_layout() {
This test is important. I'd put it at the top.
}

#[test]
fn const_string_new() {
This test needs refactoring. You aren't testing a constructor; you need to test a `ConstString`'s lifecycle.
use super::*;

fn run_with_strings(f: impl Fn(String)) {
I like this function, since you can now add more strings to the iterator. I would suggest adding strings that contain multi-byte UTF-8 characters here and seeing whether the tests keep working. Consider the cases of 4-byte characters, many 2-byte characters, ASCII all the way except the last character, an odd number of multi-byte characters, wonky alignment…
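One way to act on this suggestion is to keep the sample strings in one place. The sketch below is hypothetical: the `SAMPLES` name and list are assumptions, the 15-byte inline capacity in the comments is taken from the PR description, and the helper only mirrors the signature of the `run_with_strings` under review:

```rust
/// Hypothetical sample set following the review suggestions.
pub const SAMPLES: &[&str] = &[
    "",                 // empty
    "name",             // short ASCII
    "exactly 15 byte",  // ASCII, exactly at the assumed inline capacity
    "sixteen bytes!!!", // ASCII, one byte over
    "🦀🦀🦀",           // 4-byte characters, 12 bytes: still fits inline
    "🦀🦀🦀🦀",         // 4-byte characters, 16 bytes: byte 15 is mid-character
    "ёжик в тумане",    // many 2-byte characters
    "aaaaaaaaaaaaaaé",  // ASCII all the way except the last character
    "ééé",              // odd number of 2-byte characters
    "a€",               // 3-byte character starting at an odd offset
];

/// Calls `f` once per sample, with the same signature as the helper under review.
pub fn run_with_strings(f: impl Fn(String)) {
    for s in SAMPLES {
        f((*s).to_owned());
    }
}
```

In a real test, `f` would build a `ConstString` from each sample and assert that it derefs back to the same `&str`.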
Force-pushed from 34dc508 to 1d237dd.
Good job!
/// Can't be derived.
impl Eq for ConstString {}

impl Serialize for ConstString {
How come there is no deserialize implementation?
The `Name` struct doesn't have `Deserialize` and `Decode`, so I decided not to implement them for now.
I can implement them if needed.
It has them implemented manually, no?
Oh, I missed them...
I will add `Decode` and `Deserialize` for `ConstString`!
Force-pushed from 1d237dd to 4bd3416.
Force-pushed from eea0e7c to a38603b.
Signed-off-by: Shanin Roman <shanin1000@yandex.ru>
…Name` Signed-off-by: Shanin Roman <shanin1000@yandex.ru>
Force-pushed from a38603b to 4c298f3.
/// Distinction between variants are achieved by tagging most significant bit of field `len`:
/// - for inlined variant MSB of `len` is always equal to 1, it's enforced by `InlinedString` constructor;
/// - for boxed variant MSB of `len` is always equal to 0, it's enforced by the fact
/// that `Box` and `Vec` never allocate more than`isize::MAX bytes`.
Suggested change:
-/// that `Box` and `Vec` never allocate more than`isize::MAX bytes`.
+/// that `Box` and `Vec` never allocate more than `isize::MAX bytes`.
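To make the tagging rule in the doc comment above concrete, here is a byte-level model that does not touch the real union. It assumes the little-endian 64-bit layout in which the 16 bytes are a `(ptr, len)` pair, so the MSB of `len` lives in the last byte; `encode_inlined`, `encode_boxed`, `decode_inlined` and the 15-byte capacity are hypothetical names and values for illustration only:

```rust
/// Assumed inline capacity: 16 bytes minus one byte for the tagged length.
pub const INLINE_CAP: usize = 15;
/// MSB of the last byte, i.e. the MSB of a little-endian 64-bit `len`.
pub const TAG_BIT: u8 = 0x80;

/// Byte model of the 16-byte `(ptr, len)` pair on a little-endian 64-bit target.
pub type Repr = [u8; 16];

/// Telling the variants apart only needs one bit.
pub fn is_inlined(repr: &Repr) -> bool {
    (repr[15] & TAG_BIT) != 0
}

/// Inlined variant: data in bytes 0..15, tag bit and length in byte 15.
pub fn encode_inlined(s: &str) -> Option<Repr> {
    if s.len() > INLINE_CAP {
        return None; // too long: would be boxed instead
    }
    let mut repr = [0u8; 16];
    repr[..s.len()].copy_from_slice(s.as_bytes());
    repr[15] = TAG_BIT | s.len() as u8;
    Some(repr)
}

/// Boxed variant: pointer in bytes 0..8, `len` in bytes 8..16 (little-endian).
/// No allocation exceeds `isize::MAX` bytes, so the MSB of `len` stays 0.
pub fn encode_boxed(ptr: u64, len: u64) -> Repr {
    assert!(len <= i64::MAX as u64, "no allocation can be this large");
    let mut repr = [0u8; 16];
    repr[..8].copy_from_slice(&ptr.to_le_bytes());
    repr[8..].copy_from_slice(&len.to_le_bytes());
    repr
}

/// Reads an inlined string back; the bytes came from a `&str`, so any
/// UTF-8 content (ASCII or not) round-trips unchanged.
pub fn decode_inlined(repr: &Repr) -> Option<&str> {
    if !is_inlined(repr) {
        return None;
    }
    let len = usize::from(repr[15] & !TAG_BIT);
    std::str::from_utf8(&repr[..len]).ok()
}
```

Because only the byte length is stored, a string either fits completely or is boxed, so a multi-byte character is never split at the inline boundary.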
/// - for inlined variant MSB of `len` is always equal to 1, it's enforced by `InlinedString` constructor;
/// - for boxed variant MSB of `len` is always equal to 0, it's enforced by the fact
/// that `Box` and `Vec` never allocate more than`isize::MAX bytes`.
/// For little-endian 64bit architecture memory layout of `ConstStringData` is following:
I can't see any other mentions of `ConstStringData`. It looks like it was deprecated.
impl Ord for ConstString {
    #[inline]
    fn cmp(&self, other: &Self) -> Ordering {
        Ord::cmp(&**self, &**other)
What's the reason for dereferencing and then taking a reference again? Why not just `*self`?
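On the question above: inside these impls `self` has type `&ConstString`. `*self` is the `ConstString` itself, which is neither the `&Self` that `Ord::cmp` expects nor something that can be moved out of a borrow, while `Ord::cmp(self, other)` would simply call the impl being written. Assuming, as the `&**self` pattern suggests, that `ConstString` derefs to `str`, `**self` is the `str` and `&**self` reborrows it as `&str`, whose ordering compares contents. A minimal sketch with a hypothetical `Text` type standing in for `ConstString`:

```rust
use std::cmp::Ordering;
use std::ops::Deref;

/// Hypothetical stand-in for `ConstString` that derefs to `str`.
pub struct Text(Box<str>);

impl Deref for Text {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

impl PartialEq for Text {
    fn eq(&self, other: &Self) -> bool {
        // self: &Text, *self: Text, **self: str (via Deref), &**self: &str
        PartialEq::eq(&**self, &**other)
    }
}

impl Eq for Text {}

impl PartialOrd for Text {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Text {
    fn cmp(&self, other: &Self) -> Ordering {
        // Compares string contents. `Ord::cmp(self, other)` would call this
        // same method and recurse forever; `*self` has type `Text`, not `&Text`.
        Ord::cmp(&**self, &**other)
    }
}
```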
($($ty:ty,)*) => {
    impl_eq!($($ty),*);
};
($($ty:ty),*) => {$(
Suggested change:
-($($ty:ty,)*) => {
-    impl_eq!($($ty),*);
-};
-($($ty:ty),*) => {$(
+($($ty:ty),* $(,)?) => {$(
The result is the same, but it's easier to write and read.
Not really, because with such an approach `impl_eq!(,)` would be a valid macro invocation.
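A self-contained sketch of the two-arm shape being discussed. The body of the real `impl_eq!` is not shown in this excerpt, so the generated `PartialEq` impls and the `Label` type (a stand-in for `ConstString`) are assumptions:

```rust
/// Hypothetical stand-in for `ConstString`.
pub struct Label(Box<str>);

impl Label {
    pub fn new(s: &str) -> Self {
        Label(s.into())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Two-arm shape from the PR. The first arm accepts only lists in which every
/// type is followed by a comma and forwards them, minus the trailing comma, to
/// the second arm. A lone `,` cannot start a type, so `impl_eq!(,)` matches
/// neither arm and is rejected at compile time.
macro_rules! impl_eq {
    ($($ty:ty,)*) => {
        impl_eq!($($ty),*);
    };
    ($($ty:ty),*) => {$(
        impl PartialEq<$ty> for Label {
            fn eq(&self, other: &$ty) -> bool {
                self.as_str() == AsRef::<str>::as_ref(other)
            }
        }

        impl PartialEq<Label> for $ty {
            fn eq(&self, other: &Label) -> bool {
                AsRef::<str>::as_ref(self) == other.as_str()
            }
        }
    )*};
}

impl_eq!(String, Box<str>,); // trailing comma: first arm, then second
impl_eq!(str); // no trailing comma: second arm directly
```

One trade-off worth knowing: with the two-arm shape, an empty `impl_eq!()` matches the first arm with zero types and keeps re-invoking itself until the recursion limit is hit, whereas the single-arm `$(,)?` form accepts both `impl_eq!()` and `impl_eq!(,)` as no-ops.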
impl<T> From<T> for ConstString
where
    T: TryInto<InlinedString>,
    <T as TryInto<InlinedString>>::Error: Into<BoxedString>,
Would anybody use this blanket implementation? It seems to me that it's easier to implement `From<MyType> for ConstString` manually than to implement `TryFrom<MyType> for InlinedString where Error...`
Interesting to note that you can have the private `InlinedString` as a bound on a public trait.
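A simplified sketch of how this blanket `From` impl dispatches between the two variants. It uses an `enum` instead of the PR's `union` to stay free of `unsafe`, and the `Inlined`, `Boxed` and `Small` types and the 15-byte capacity are hypothetical:

```rust
use std::convert::{TryFrom, TryInto};

/// Assumed inline capacity, matching the 15 bytes quoted in the PR description.
const INLINE_CAP: usize = 15;

/// Hypothetical inline variant: up to `INLINE_CAP` bytes stored in place.
#[derive(Debug)]
pub struct Inlined {
    len: u8,
    buf: [u8; INLINE_CAP],
}

impl Inlined {
    pub fn as_str(&self) -> &str {
        // Only ever filled from a `&str`, so this cannot fail.
        std::str::from_utf8(&self.buf[..self.len as usize]).expect("valid UTF-8")
    }
}

impl<'a> TryFrom<&'a str> for Inlined {
    /// On failure, hand the input back so it can be boxed instead.
    type Error = &'a str;

    fn try_from(s: &'a str) -> Result<Self, Self::Error> {
        if s.len() > INLINE_CAP {
            return Err(s);
        }
        let mut buf = [0u8; INLINE_CAP];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        Ok(Inlined { len: s.len() as u8, buf })
    }
}

/// Hypothetical heap variant.
#[derive(Debug)]
pub struct Boxed(Box<str>);

impl From<&str> for Boxed {
    fn from(s: &str) -> Self {
        Boxed(s.into())
    }
}

/// Enum stand-in for the union-based `ConstString`.
#[derive(Debug)]
pub enum Small {
    Inlined(Inlined),
    Boxed(Boxed),
}

impl Small {
    pub fn as_str(&self) -> &str {
        match self {
            Small::Inlined(i) => i.as_str(),
            Small::Boxed(b) => &b.0,
        }
    }
}

/// Same bounds as the blanket impl under review: try the inline form
/// first, and convert the rejected input into the boxed form.
impl<T> From<T> for Small
where
    T: TryInto<Inlined>,
    <T as TryInto<Inlined>>::Error: Into<Boxed>,
{
    fn from(value: T) -> Self {
        match value.try_into() {
            Ok(inlined) => Small::Inlined(inlined),
            Err(rejected) => Small::Boxed(rejected.into()),
        }
    }
}
```

With this shape, each new source type needs a `TryFrom<X> for Inlined` whose error converts into `Boxed`; the alternative raised above is one direct `From<X>` impl per type.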
/// `BoxedString` is `Send` because the data they
/// reference is unaliased. Aliasing invariant is enforced by
Suggested change:
-/// `BoxedString` is `Send` because the data they
-/// reference is unaliased. Aliasing invariant is enforced by
+/// [`BoxedString`] is [`Send`] because the data it
+/// references is unaliased. Aliasing invariant is enforced by
/// `BoxedString` is `Sync` because the data they
/// reference is unaliased. Aliasing invariant is enforced by
Suggested change:
-/// `BoxedString` is `Sync` because the data they
-/// reference is unaliased. Aliasing invariant is enforced by
+/// [`BoxedString`] is [`Sync`] because the data it
+/// references is unaliased. Aliasing invariant is enforced by
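A sketch of why these `unsafe impl`s are needed at all: a type that owns its heap data through a raw pointer loses the automatic `Send`/`Sync` impls that `Box<str>` would have, so they must be restored by hand with a safety argument like the doc comments above. `RawBoxedStr` is a hypothetical stand-in for `BoxedString`, not the PR's code:

```rust
use std::ptr::NonNull;

/// Hypothetical stand-in for `BoxedString`: owns a heap `str` through a raw
/// pointer. Because of `NonNull`, it is neither `Send` nor `Sync` automatically.
pub struct RawBoxedStr {
    ptr: NonNull<u8>,
    len: usize,
}

impl RawBoxedStr {
    pub fn new(s: &str) -> Self {
        let boxed: Box<str> = s.into();
        let len = boxed.len();
        // Leak the box; from here on `ptr` is the only owner of the allocation.
        let raw: *mut str = Box::into_raw(boxed);
        let ptr = NonNull::new(raw as *mut u8).expect("Box pointers are never null");
        RawBoxedStr { ptr, len }
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: `ptr` and `len` describe a live, uniquely owned `Box<str>`.
        unsafe {
            let bytes = std::slice::from_raw_parts(self.ptr.as_ptr(), self.len);
            std::str::from_utf8_unchecked(bytes)
        }
    }
}

impl Drop for RawBoxedStr {
    fn drop(&mut self) {
        let slice = std::ptr::slice_from_raw_parts_mut(self.ptr.as_ptr(), self.len);
        // SAFETY: rebuilds exactly the `Box<str>` leaked in `new`, exactly once.
        unsafe { drop(Box::from_raw(slice as *mut str)) };
    }
}

// SAFETY: the allocation is uniquely owned (never aliased), just like `Box<str>`,
// so moving the owner to another thread is sound.
unsafe impl Send for RawBoxedStr {}

// SAFETY: only `&self -> &str` access is exposed and the bytes are never
// mutated after construction, so sharing references across threads is sound.
unsafe impl Sync for RawBoxedStr {}
```

Running such tests under Miri, as the PR description suggests, is a good way to check that the leak in `new` and the reconstruction in `Drop` stay paired.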
…r#2365) Signed-off-by: Shanin Roman <shanin1000@yandex.ru> Signed-off-by: BAStos525 <jungle.vas@yandex.ru>
Description of the Change
- `ConstString`: an immutable, inlinable string which uses a `union` internally to store the boxed and inlined variants in the same space;
- Replace `String` with `ConstString` in the `Name` struct.

Issue
Closes #1985.
Benefits
All structs that use `Name` take up less space. We save at least one machine word for every instance of `Name`, and up to 23 bytes for a 15-byte string on a 64-bit architecture. Strings of up to 15 bytes are also stored inline (for example, on the stack), which avoids a heap allocation.

Possible Drawbacks
`ConstString` uses `unsafe` code, which increases the risk of bugs and vulnerabilities.

Usage Examples or Tests
Run unit tests:
Run unit tests with Miri to find leaks and UB:
Alternate Designs
It is possible to use an `enum` instead of a `union` to avoid `unsafe`, but it takes 24 bytes instead of 16, see.
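The size figures in the Benefits and Alternate Designs sections can be checked with `std::mem::size_of`. The representations below are hypothetical stand-ins, not the PR's actual types, and the footprint helpers assume a heap capacity equal to the string length with no allocator overhead. Rust does not guarantee the layout of `String` or of this enum, so the 64-bit figures reflect current compilers:

```rust
use std::mem::size_of;

/// Hypothetical inlined variant: 15 data bytes plus one tagged length byte.
#[derive(Clone, Copy)]
#[repr(C)]
pub struct InlinedRepr {
    pub buf: [u8; 15],
    pub len_tag: u8,
}

/// Hypothetical boxed variant: the raw parts of a `Box<str>`.
#[derive(Clone, Copy)]
#[repr(C)]
pub struct BoxedRepr {
    pub ptr: *mut u8,
    pub len: usize,
}

/// Union: both variants share the same 16 bytes; the tag lives inside `len`.
pub union UnionRepr {
    pub inlined: InlinedRepr,
    pub boxed: BoxedRepr,
}

/// Enum alternative: no `unsafe`, but the discriminant needs its own space,
/// because the inline variant leaves no unused bytes or invalid values to reuse.
pub enum EnumRepr {
    Inlined { len: u8, buf: [u8; 15] },
    Boxed(Box<str>),
}

/// Bytes used by a `String` holding `len` bytes (capacity == len assumed).
pub fn string_footprint(len: usize) -> usize {
    size_of::<String>() + len
}

/// Same for the 16-byte union-based string that inlines up to 15 bytes,
/// assuming the boxed variant allocates exactly `len` bytes.
pub fn const_string_footprint(len: usize) -> usize {
    size_of::<UnionRepr>() + if len <= 15 { 0 } else { len }
}
```

On a 64-bit target the union is 16 bytes and the enum 24, and the helpers reproduce the 23-byte saving for a 15-byte string and the one-word saving for long strings.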