Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refactor] #1985: Reduce size of Name struct #2365

Merged
merged 2 commits into from
Jun 22, 2022

Conversation

Erigara
Copy link
Contributor

@Erigara Erigara commented Jun 16, 2022

Description of the Change

  • Add ConstString: immutable, inlinable string, which uses union internally to store boxed and inlined variants in the same space;
  • Replace String with ConstString in Name struct.

Issue

Closes #1985.

Benefits

All structs that uses Name take up less space.
We win at least one machine word for every instance of Name and up to 23 byte for strings 15 bytes long on 64-bit architecture, also strings shorter than 15 bytes are stored on the stack, which avoids allocation.

Possible Drawbacks

ConstString uses unsafe code, which increases the risk of bugs and vulnerabilities.

Usage Examples or Tests

Run unit tests:

cargo test --package iroha_data_primitives --lib -- conststr::tests --nocapture

Run unit tests with miri to find leaks and UB:

cargo +nightly-2022-04-20 miri test --package iroha_data_primitives --lib -- conststr::tests --nocapture

Alternate Designs

It possible to use enum instead of union to avoid unsafe, but it take 24 bytes instead of 16, see.

@github-actions github-actions bot added the iroha2-dev The re-implementation of a BFT hyperledger in RUST label Jun 16, 2022
@codecov
Copy link

codecov bot commented Jun 16, 2022

Codecov Report

Merging #2365 (a38603b) into iroha2-dev (6973f71) will increase coverage by 4.55%.
The diff coverage is 82.98%.

❗ Current head a38603b differs from pull request most recent head 4c298f3. Consider uploading reports for the commit 4c298f3 to get more accurate results

@@              Coverage Diff               @@
##           iroha2-dev    #2365      +/-   ##
==============================================
+ Coverage       65.50%   70.05%   +4.55%     
==============================================
  Files             133      142       +9     
  Lines           24697    27874    +3177     
==============================================
+ Hits            16177    19528    +3351     
+ Misses           8520     8346     -174     
Impacted Files Coverage Δ
cli/derive/src/lib.rs 74.72% <ø> (ø)
cli/src/torii/mod.rs 28.88% <ø> (ø)
client/src/http.rs 47.82% <ø> (ø)
config/src/lib.rs 15.38% <0.00%> (+1.09%) ⬆️
core/src/lib.rs 100.00% <ø> (ø)
core/src/smartcontracts/isi/block.rs 100.00% <ø> (ø)
core/src/smartcontracts/isi/domain.rs 39.82% <ø> (ø)
core/src/smartcontracts/isi/triggers.rs 0.00% <0.00%> (ø)
core/src/smartcontracts/isi/tx.rs 25.00% <ø> (ø)
core/src/smartcontracts/isi/world.rs 9.52% <0.00%> (+0.43%) ⬆️
... and 115 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77e6429...4c298f3. Read the comment docs.

}

/// Union representing const-string variants: inlined or boxed.
/// Distinction between variants are achieved by setting least significant bit for inlined variant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that this might be fragile

Have you checked that non-ASCII characters also work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i checked it, i will add test case for that.
In general, something like proptest would be helpful for testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: proptest, It would be good for unit tests, if a bit expensive. We plan on using hypothesis from Python downstream.

Still, there's this is an easy unit test to write and comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another quick and dirty way to ensure that UTF-8 is fully supported, is to construct a boxed string when you see non-Ascii characters.


impl Name {
pub(crate) const fn empty() -> Self {
Self(String::new())
Self(ConstString::new())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if Name is immutable, and I think it is?, this can be a shared const value

Copy link
Contributor Author

@Erigara Erigara Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think about this! Thank you for suggestion.
Do we need empty Id in the first place?

Copy link
Contributor

@mversic mversic Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have to support empty Name/Ids at at least for now. There is an issue tracking the discussion on whether we'll allow or disallow empty identifiers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the approach has to be one of two. Either we publish this as a separate crate that could be shared with other projects, in which case we need to allow empty values, or we tailor it directly to what our Name struct needs (in which case we could equally just merge the two and get rid of another layer of indirection.

I don't mind either way, but I think that a ConstString could be useful in Sora.

data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Show resolved Hide resolved
data_model/primitives/src/conststr.rs Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Show resolved Hide resolved
impl PartialOrd for ConstString {
#[inline]
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
PartialOrd::partial_cmp(&**self, &**other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if you need to impl or if you can derive, @Erigara

data_model/primitives/src/conststr.rs Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
}

/// Union representing const-string variants: inlined or boxed.
/// Distinction between variants are achieved by setting least significant bit for inlined variant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another quick and dirty way to ensure that UTF-8 is fully supported, is to construct a boxed string when you see non-Ascii characters.

data_model/primitives/src/conststr.rs Show resolved Hide resolved
}

#[test]
fn const_string_display() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a wholly unnecessary test. All you need to do for coverage is to println! inside of other tests, which IMO would also help if someone is modifying the code.

}

#[test]
fn const_string_layout() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is important. I'd put it at the top.

}

#[test]
fn const_string_new() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test needs refactoring. You don't test a constructor, you need to test a ConstString's lifecycle.


use super::*;

fn run_with_strings(f: impl Fn(String)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this function, since you can now add more strings to the iterator. I would suggest adding strings that contain UTF-8 characters here and seeing if the tests keep working. Consider the cases of 4-byte characters. many 2-bytes, ASCII all the way except last, odd number of UTF-8, wonky alignment…

@Erigara Erigara force-pushed the reduce_name_size branch 2 times, most recently from 34dc508 to 1d237dd Compare June 21, 2022 08:31
appetrosyan
appetrosyan previously approved these changes Jun 21, 2022
Copy link
Contributor

@appetrosyan appetrosyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Show resolved Hide resolved
/// Can't be derived.
impl Eq for ConstString {}

impl Serialize for ConstString {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come there is no deserialze implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name struct doesn't have Deserialize and Decode, so i decided not to implement them for now.
I can implement then if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has them implemented manually, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i missed them...
I will add Decode and Deserialize for ConstString!

data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Show resolved Hide resolved
data_model/primitives/src/conststr.rs Show resolved Hide resolved
data_model/src/lib.rs Outdated Show resolved Hide resolved
data_model/src/lib.rs Outdated Show resolved Hide resolved
data_model/src/lib.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
@appetrosyan appetrosyan added Optimization Something isn't working as well as it should Unsafe labels Jun 22, 2022
@safinsaf safinsaf deleted the branch hyperledger:iroha2-dev June 22, 2022 11:56
@safinsaf safinsaf closed this Jun 22, 2022
@safinsaf safinsaf reopened this Jun 22, 2022
@Erigara Erigara force-pushed the reduce_name_size branch 2 times, most recently from eea0e7c to a38603b Compare June 22, 2022 12:54
Signed-off-by: Shanin Roman <shanin1000@yandex.ru>
…Name`

Signed-off-by: Shanin Roman <shanin1000@yandex.ru>
@appetrosyan appetrosyan merged commit eeb19b2 into hyperledger:iroha2-dev Jun 22, 2022
/// Distinction between variants are achieved by tagging most significant bit of field `len`:
/// - for inlined variant MSB of `len` is always equal to 1, it's enforced by `InlinedString` constructor;
/// - for boxed variant MSB of `len` is always equal to 0, it's enforced by the fact
/// that `Box` and `Vec` never allocate more than`isize::MAX bytes`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// that `Box` and `Vec` never allocate more than`isize::MAX bytes`.
/// that `Box` and `Vec` never allocate more than `isize::MAX bytes`.

/// - for inlined variant MSB of `len` is always equal to 1, it's enforced by `InlinedString` constructor;
/// - for boxed variant MSB of `len` is always equal to 0, it's enforced by the fact
/// that `Box` and `Vec` never allocate more than`isize::MAX bytes`.
/// For little-endian 64bit architecture memory layout of `ConstStringData` is following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see any other mentions about ConstStringData. Looks like it was deprecated

data_model/primitives/src/conststr.rs Outdated Show resolved Hide resolved
impl Ord for ConstString {
#[inline]
fn cmp(&self, other: &Self) -> Ordering {
Ord::cmp(&**self, &**other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason of dereferencing and taking reference after that? Why not just *self?

Comment on lines +153 to +169
($($ty:ty,)*) => {
impl_eq!($($ty),*);
};
($($ty:ty),*) => {$(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
($($ty:ty,)*) => {
impl_eq!($($ty),*);
};
($($ty:ty),*) => {$(
($($ty:ty),* $(,)?) => {$(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is the same but it's easier to write and read

Copy link
Contributor Author

@Erigara Erigara Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really because with such approach impl_eq!(,) would be valid macro invocation.

Comment on lines +182 to +198
impl<T> From<T> for ConstString
where
T: TryInto<InlinedString>,
<T as TryInto<InlinedString>>::Error: Into<BoxedString>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anybody would use this blanked implementation? For me it seems like it'se easier to implement From<MyType> for ConstString manualy than implementing TryFrom<MyType> for InlinedString where Error...

Copy link
Contributor

@mversic mversic Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting to note that you can have private InlinedString as a bound on a public trait

Comment on lines +369 to +370
/// `BoxedString` is `Send` because the data they
/// reference is unaliased. Aliasing invariant is enforced by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `BoxedString` is `Send` because the data they
/// reference is unaliased. Aliasing invariant is enforced by
/// [`BoxedString`] is [`Send`] because the data it
/// references is unaliased. Aliasing invariant is enforced by

Comment on lines +375 to +376
/// `BoxedString` is `Sync` because the data they
/// reference is unaliased. Aliasing invariant is enforced by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `BoxedString` is `Sync` because the data they
/// reference is unaliased. Aliasing invariant is enforced by
/// [`BoxedString`] is [`Sync`] because the data it
/// references is unaliased. Aliasing invariant is enforced by

@Erigara Erigara deleted the reduce_name_size branch June 22, 2022 14:50
BAStos525 pushed a commit to BAStos525/soramitsu-iroha that referenced this pull request Jul 8, 2022
…r#2365)

Signed-off-by: Shanin Roman <shanin1000@yandex.ru>
Signed-off-by: BAStos525 <jungle.vas@yandex.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
iroha2-dev The re-implementation of a BFT hyperledger in RUST Optimization Something isn't working as well as it should
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants