
Integer as enum type and optimized constrained and variable-sized integer encoding #289

Merged
merged 24 commits into from
Sep 5, 2024

Conversation

Nicceboy
Copy link
Contributor

@Nicceboy Nicceboy commented Aug 4, 2024

The enum approach seemed to get more complex than it should with the IntegerType trait, so I instead tried wrapping a struct with a generic:

/// `Integer` type is a newtype wrapper for any integer type which implements the `IntegerType` trait. Uses `i128` by default if no type is given.
#[derive(Debug, Clone, Ord, Hash, Eq, PartialEq, PartialOrd)]
#[non_exhaustive]
pub struct Integer<I: IntegerType = i128>(pub I);

which seems to work fine in the end. It also makes it easy to add any new integer type that implements the trait.
Encoding speed for OER is now more than doubled for integers.

One concern is that many standards are defined with just the Integer type, and those now internally default to i128. Is that a problem?
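A minimal, self-contained sketch of how the newtype reads in use; the `IntegerType` trait here is a bare stand-in for rasn's real trait, which has more methods:

```rust
// Stand-in for rasn's `IntegerType` trait, reduced to a marker here.
trait IntegerType {}
impl IntegerType for i128 {}
impl IntegerType for u64 {}

/// Newtype wrapper over any `IntegerType`, defaulting to `i128`.
#[derive(Debug, Clone, PartialEq)]
struct Integer<I: IntegerType = i128>(pub I);

fn main() {
    let default: Integer = Integer(42);     // uses the i128 default
    let narrow: Integer<u64> = Integer(42); // any implementing type works
    assert_eq!(default.0, 42i128);
    assert_eq!(narrow.0, 42u64);
}
```

The default type parameter is what lets existing `Integer` usages keep compiling while still allowing narrower types where they are wanted.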

@Nicceboy Nicceboy marked this pull request as draft August 4, 2024 20:21
@XAMPPRocky
Copy link
Collaborator

One concern is that many standards are defined with just the Integer type, and those now internally default to i128. Is that a problem?

I think in terms of correctness, yes, because this is now a fixed size, whereas an integer with constraints is not. In fairness, it's such a large integer that it likely exceeds any fixed-size optimisation. But I'm generally unsure about not matching the behaviour of a variable-sized integer in terms of API, because with this implementation, all libraries would have to re-export the integer generic to allow it to be configurable by applications, and I don't think that we should do that.

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Aug 4, 2024

Would it be better if we default to BigInt internally instead? I think that would work similarly to before; it is just slower by default. Instead of a type alias it would just be a newtype.

all libraries would have to re-export the integer generic to allow it to be configurable by applications, and I don't think that we should do that.

Do you mean cases where they want to use the Integer defined in standards differently than originally? Yeah, that does not work with this approach, unless we feature-gate the default type. Maybe that would be enough for most.

But I don't think this will get any better in terms of performance, since there is now no branching in encoding, and it's quite similar to the previously added decoding approach.

Code with the enum approach gets quite complex, but it is possible, at least if we use a single primitive type internally.

Edit:

An enum could be used with an internal generic for the primitive type, but then there is again one default type which cannot be changed when reusing standards, for example.

Alternatively, I tried variants for every primitive integer type, but that added a lot of complexity and branching.

The easiest option is just using one primitive type (e.g. i128), which makes it possible to avoid allocations (what I initially did). With the current state of the library, there are no benefits for smaller integer types in terms of performance, unless we target very low-end devices.

There were also some challenges in benefiting from the added trait in the enum case, which might make it confusing. But maybe it does not matter.

@Nicceboy Nicceboy force-pushed the integer-encoding branch 2 times, most recently from 3eeed71 to b1656bf Compare August 5, 2024 00:23
@XAMPPRocky
Copy link
Collaborator

XAMPPRocky commented Aug 5, 2024

Code with the enum approach gets quite complex, but it is possible, at least if we use a single primitive type internally.

I do think the enum approach is the way forward, because I think it offers the best tradeoffs. How I would design the enum is like the following, using isize and a Box rather than u128. With this the size of the enum is 16 bytes rather than 32 bytes on 64-bit systems (and smaller on 32-bit systems, which a lot of embedded hardware uses), so we're halving the overall size, which should make it slightly faster and easier to optimise.

enum Integer {
    Primitive(isize),
    Variable(Box<num_bigint::BigInt>),
}
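The size claim can be checked with a self-contained sketch. `BigInt` below is a stand-in struct for `num_bigint::BigInt`; since only the Box pointer is stored inline, the stand-in's contents do not change the enum's size:

```rust
use std::mem::size_of;

// Stand-in for num_bigint::BigInt; only the Box pointer is stored
// inline, so the stand-in's size does not affect the enum's size.
struct BigInt(Vec<u32>);

// Boxed design: tag plus a pointer-sized payload.
enum BoxedInteger {
    Primitive(isize),
    Variable(Box<BigInt>),
}

// For comparison: an i128 payload forces 16-byte alignment.
enum WideInteger {
    Primitive(i128),
    Variable(Box<BigInt>),
}

fn main() {
    // On 64-bit targets this prints 16 and 32.
    println!("boxed: {}", size_of::<BoxedInteger>());
    println!("wide:  {}", size_of::<WideInteger>());
    assert!(size_of::<BoxedInteger>() < size_of::<WideInteger>());
    // Tag plus pointer-sized payload on both 32- and 64-bit targets.
    assert_eq!(size_of::<BoxedInteger>(), 2 * size_of::<usize>());
}
```

Boxing the BigInt keeps the common `Primitive` path small and cache-friendly, at the cost of one pointer indirection on the rare `Variable` path.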

I also think, to make the implementation easier, let's start by adding the type to the crate without replacing the Integer type, and then replace Integer usage inside the codecs (e.g. internals like length decoding). Once that is covered, focus on replacing it as the Integer; that way we don't need to focus on making a complete integer API, we can focus on just what the codecs need at first, and then on making a good public API.

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Aug 5, 2024

I see.

What would you think if I try to do it as:

pub enum Integer<P: IntegerType + PrimInt = isize> {
    Primitive(P),
    Variable(Box<BigInt>),
}

That would make it easier to use internally. E.g. if we use primitive types like u64 when defining ASN.1 types, then for optimal usage they would always need to be cast to the isize type, which truncates the value. I was using i128 initially for that reason, to avoid conversion to BigInt while avoiding truncation. But the above allows their usage directly.
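The truncation concern is easy to demonstrate with plain std, no rasn types involved:

```rust
fn main() {
    // A u64 value above isize::MAX cannot survive an `as isize` cast:
    let big: u64 = u64::MAX;
    // The cast keeps only the low bits, which are all ones, so it wraps
    // to -1 on both 32- and 64-bit targets.
    assert_eq!(big as isize, -1);

    // Keeping the generic parameter lets such values stay in their own
    // type instead of being cast (and truncated) to isize.
    let kept: u64 = big; // no cast, no loss
    assert_eq!(kept, u64::MAX);
}
```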

@XAMPPRocky
Copy link
Collaborator

XAMPPRocky commented Aug 5, 2024

That would make it easier to use internally. E.g. if we use primitive types like u64 when defining ASN.1 types, then for optimal usage they would always need to be cast to the isize type, which truncates the value. I was using i128 initially for that reason, to avoid conversion to BigInt while avoiding truncation. But the above allows their usage directly.

I think there's no getting around converting to BigInt. If we want to have this as an API, I would expect it to handle over- and underflowing into a BigInt even if it were i128. With that in mind, I don't think we'd get a lot of value from having it be generic, because using a smaller integer type wouldn't actually make the type smaller; it would always be at least the size of a pointer, so an Integer<u8> shouldn't have any performance difference from an Integer<u64> because it will still be 16 bytes.

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Aug 5, 2024

That would make it easier to use internally. E.g. if we use primitive types like u64 when defining ASN.1 types, then for optimal usage they would always need to be cast to the isize type, which truncates the value. I was using i128 initially for that reason, to avoid conversion to BigInt while avoiding truncation. But the above allows their usage directly.

I think there's no getting around converting to BigInt. If we want to have this as an API, I would expect it to handle over- and underflowing into a BigInt even if it were i128. With that in mind, I don't think we'd get a lot of value from having it be generic, because using a smaller integer type wouldn't actually make the type smaller; it would always be at least the size of a pointer, so an Integer<u8> shouldn't have any performance difference from an Integer<u64> because it will still be 16 bytes.

I think we are thinking of different kinds of performance. The main issue with BigInt is that it always allocates heap memory and then frees it, which is a costly operation. Even if there is no underflow or overflow, every constrained type is currently converted to BigInt. With constrained types I would not like that to happen; conversion should happen only if the values are summed or subtracted on the application side. Underflow or overflow during encoding/decoding of constrained types is typically a constraint error. Most modern standards I see are mostly constrained, particularly because of the performance benefits in encoding/decoding speed (the reduced memory usage is a secondary benefit).

If we always use the variable-sized type, we lose one big benefit of using Rust, and even Java might be faster (with the current BigInt types it actually is). Rust could be a replacement for C/C++ in many areas, but it isn't if it does not provide competitive speed, and as a result a safer language like C# or something else gets chosen instead.

Some areas have high requirements for encoding/decoding speed, but they suffer the security issues of C/C++ when relying on them; we could achieve the same performance with Rust while mitigating the memory issues and moving many other bugs to compile time.

The main reason I have been working on this crate is to improve the above, but it is not a competitive C/C++ replacement if we don't use the stack more.

For non-constrained types it is feasible to add the automatic conversion, but the main benefit would come from stack-based constrained types that avoid heap allocation entirely.
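The stack-only fast path argued for here can be sketched as follows. `encode_minimal_be` is a hypothetical helper, not rasn's actual API; it writes the minimal two's-complement big-endian octets of an i64 without any heap allocation:

```rust
// Hypothetical helper: minimal big-endian two's-complement encoding of a
// constrained integer, entirely on the stack (no BigInt, no allocation).
fn encode_minimal_be(value: i64, buf: &mut [u8; 8]) -> &[u8] {
    *buf = value.to_be_bytes();
    // Strip redundant leading 0x00/0xFF octets that carry no value,
    // keeping at least one octet and preserving the sign bit.
    let mut start = 0;
    while start < 7 {
        let lead = buf[start];
        let sign_preserved = ((buf[start + 1] & 0x80) == 0) == (lead == 0x00);
        if (lead == 0x00 || lead == 0xFF) && sign_preserved {
            start += 1;
        } else {
            break;
        }
    }
    &buf[start..]
}

fn main() {
    assert_eq!(encode_minimal_be(1, &mut [0; 8]), &[0x01][..]);
    assert_eq!(encode_minimal_be(-1, &mut [0; 8]), &[0xFF][..]);
    assert_eq!(encode_minimal_be(128, &mut [0; 8]), &[0x00, 0x80][..]);
    assert_eq!(encode_minimal_be(-129, &mut [0; 8]), &[0xFF, 0x7F][..]);
}
```

The caller supplies a fixed stack buffer and receives a borrowed slice, so the whole encode of a constrained value involves no system allocator at all.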

@Nicceboy Nicceboy closed this Aug 15, 2024
@XAMPPRocky
Copy link
Collaborator

XAMPPRocky commented Aug 15, 2024

I think we are thinking of different kinds of performance. The main issue with BigInt is that it always allocates heap memory and then frees it, which is a costly operation. Even if there is no underflow or overflow, every constrained type is currently converted to BigInt.

I'm not sure I understand, because:

  • Constrained integers are decoded directly into their primitive type. (The same still needs to be done for encodes.)
  • This discussion has been explicitly about unconstrained integers, which is what types::Integer is.
  • With an enum like the one I showed above, it would not allocate memory until we construct a Variable variant, which would only happen on underflow or overflow.

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Aug 15, 2024

Maybe the title was a bit misleading in that sense. This PR attempted to solve all cases of using integers, particularly the issue with constraints which are defined by using primitive integers (which I counted as integers too).

Constrained integers are decoded directly into their primitive type. (The same still needs to be done for encodes.)

My primary reason for suggesting the generic parameter for the enum was this: all the primitive types are currently converted to the general Integer type. With the generic they could be used as they are, just wrapped into the Integer type internally, and encoded with the same logic as everything else. With the generic in the enum we can handle both cases, allocation happens only if the value is larger than any type Rust supports (if decided so), and the memory footprint could always be the smallest possible. I think changing the current behavior by other means would need larger changes than there are benefits, if we use something other than the encode_integer trait method for primitives.

If there were a separate implementation for encoding primitives directly on top of an Integer type which uses isize internally for lower values, then it is likely okay, but personally I am not sure it is worth it just to avoid wrapping the value in the Integer type internally.

Or maybe the easy option, now that I think about it, is to use a plain generic as the parameter for encode_integer, and we just use Integer as your suggested new variable-sized enum Integer type, which then implements the IntegerType trait. We would need only small changes to this PR so that the above function accepts all types.
Decoding also needs only slight changes.

Maybe I can explore it a bit, but that would be the fourth implementation I have tried.

@XAMPPRocky
Copy link
Collaborator

Or maybe the easy option, now that I think about it, is to use a plain generic as the parameter for encode_integer, and we just use Integer as your suggested new variable-sized enum Integer type, which then implements the IntegerType trait. We would need only small changes to this PR so that the above function accepts all types.
Decoding also needs only slight changes.

Maybe I can explore it a bit, but that would be the fourth implementation I have tried.

I believe that is what I originally stated we should do for encoding here: #254 (comment)

@Nicceboy
Copy link
Contributor Author

I would have appreciated a bit more information, as this PR also currently uses the trait as your comment suggests, just for the variants of the enum instead of the direct type.

@XAMPPRocky
Copy link
Collaborator

Sorry for not being clear.

I think encoding should essentially match the way decode_integer works, except that instead of just a generic argument, we replace the current types::Integer argument with I: types::IntegerType and provide a new method to_bytes_be to convert it to bytes. That way we can accept all integer types directly and not perform any conversions when encoding integers.
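A rough, self-contained sketch of that shape; the trait and method names follow the comment, but the exact signatures here are assumptions, not rasn's real API:

```rust
// Assumed shape of the trait: every integer type can produce its own
// big-endian bytes, so encoders need no conversion through types::Integer.
trait IntegerType {
    fn to_bytes_be(&self) -> Vec<u8>;
}

// Implement it for the primitive types via a small macro.
macro_rules! impl_integer_type {
    ($($int:ty),+ $(,)?) => {
        $(impl IntegerType for $int {
            fn to_bytes_be(&self) -> Vec<u8> {
                self.to_be_bytes().to_vec()
            }
        })+
    };
}

impl_integer_type!(i8, i16, i32, i64, i128, isize, u8, u16, u32, u64, u128, usize);

// An encoder method can then be generic instead of taking types::Integer.
fn encode_integer<I: IntegerType>(value: &I) -> Vec<u8> {
    value.to_bytes_be()
}

fn main() {
    assert_eq!(encode_integer(&256u16), vec![0x01, 0x00]);
    assert_eq!(encode_integer(&-1i8), vec![0xFF]);
}
```

A real codec would trim or pad the bytes per its rules; the point of the sketch is only that the generic bound removes the intermediate conversion.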

Separately, I think it's worth moving types::Integer to the enum we showed before, because then for unconstrained integers we can write the encode/decode code so it decodes into an isize when possible and only uses BigInt when the value exceeds the maximum size. This would heavily reduce the number of heap allocations in standards which don't use constrained integers (so we can't use primitive types exclusively) but nevertheless don't have large integer values.

If there's anything unclear, please feel free to open a discussion, and we can discuss it more in depth.

@Nicceboy
Copy link
Contributor Author

I think it is quite clear now. There was a misunderstanding about constrained versus variable-sized integers; I thought that with isize we were trying to optimize both cases. This PR needs only small changes to match the above.

@Nicceboy Nicceboy reopened this Aug 16, 2024
@Nicceboy Nicceboy changed the title Newtype Integer and optimized integer encoding Integer as enum type and optimized constrained and variable-sized integer encoding Aug 22, 2024
@Nicceboy
Copy link
Contributor Author

Nicceboy commented Sep 3, 2024

There is one, maybe minor, behavior left to decide (just how PartialEq/Eq works, or what level of performance optimization we should consider) for

#[derive(Debug, Clone, Ord, Hash, Eq, PartialEq, PartialOrd)]
pub enum Integer {
    Primitive(isize),
    Variable(Box<BigInt>),
}

Currently builds fail on 32-bit platforms: when decoding constrained Integer types (at least in OER), isize is 32 bits, so if we decode a constrained integer with roughly a 64-bit range (so the type is not actually i64 directly), for example, it may not fit into isize. However, in many cases it would fit if the byte stream just has leading zeros or ones, which do not actually carry the value.

Some logic could be implemented in the codecs to detect that and drop the leading bytes, or the easy solution is to decode it as Variable sized. However, this will result in different internal variants when encoding the value, decoding it, and comparing the results.

The encoding part is currently aware of leading bytes (because that is necessary for fixed-size integers): it does not convert to a bigger type just because of the required leading bytes, and always uses isize when possible.
So a roundtrip will result in different internal variants, and currently the equality check fails.

Should we optimize at the codec level so that codecs drop leading bytes and more values can use isize when decoding, or should we just base the equality check on the value alone and not the internal variant?

I think that checking those leading bytes is not worth the effort currently, as it mainly impacts 32-bit platforms, so I will change the equality to just compare numeric values.
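Value-based equality can be sketched like this; `BigInt` is a stand-in for `num_bigint::BigInt` so the example is self-contained, and the real implementation would compare the BigInt's numeric value instead:

```rust
// Stand-in for num_bigint::BigInt, wide enough for the sketch.
#[derive(Debug, Clone)]
struct BigInt(i128);

#[derive(Debug, Clone)]
enum Integer {
    Primitive(isize),
    Variable(Box<BigInt>),
}

impl PartialEq for Integer {
    fn eq(&self, other: &Self) -> bool {
        // Compare numeric values, not internal variants, so a roundtrip
        // that decodes into the other variant still compares equal.
        fn value(i: &Integer) -> i128 {
            match i {
                Integer::Primitive(p) => *p as i128,
                Integer::Variable(b) => b.0,
            }
        }
        value(self) == value(other)
    }
}

impl Eq for Integer {}

fn main() {
    let encoded = Integer::Primitive(70_000);
    let decoded = Integer::Variable(Box::new(BigInt(70_000)));
    assert_eq!(encoded, decoded); // equal despite different variants
}
```

Note that a hand-written `PartialEq` like this must be paired with a matching `Hash` implementation (hashing the numeric value, not the variant), or equal values could hash differently.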

@XAMPPRocky
Copy link
Collaborator

Should we optimize at the codec level so that codecs drop leading bytes and more values can use isize when decoding, or should we just base the equality check on the value alone and not the internal variant?

I believe the other codecs do remove leading bytes first; if that's correct, it would probably be good to match.

I think that checking those leading bytes is not worth the effort currently, as it mainly impacts 32-bit platforms, so I will change the equality to just compare numeric values.

Make an issue for it and it can be pursued and tested separately. It's not needed for the initial implementation.

@Nicceboy
Copy link
Contributor Author

Nicceboy commented Sep 4, 2024

I believe this can be reviewed at this point.

Everything integer related is now moved into types/integer.rs.

Closes #308 #296

Some initial bench results are below, based on this. UPER does not see as significant improvements because there are many other allocations. I was planning to remove as many allocations as possible next, starting with OER.

[benchmark results screenshot]

@Nicceboy Nicceboy marked this pull request as ready for review September 4, 2024 21:38
Copy link
Collaborator

@XAMPPRocky XAMPPRocky left a comment


Have one question but doesn't need to be addressed in this PR.

Thank you for your PR! And thank you for working on these integer improvements over the past few months!

isize,
u8,
u16,
u32,
u64,
// TODO cannot support u128 as it is a constrained type by default and current constraints use i128 for bounds
// u128,
Copy link
Collaborator


Can we implement this with its signed variant being BigInt?

Copy link
Contributor Author


Could you clarify this a bit more?

Casting u128 to signed bytes indeed currently changes the meaning. We could use BigInt there.

This particular line was more related to how constraints currently work, which use i128 for bounds, here:

rasn/src/types.rs

Lines 270 to 296 in 0aa23e3

macro_rules! asn_integer_type {
    ($($int:ty),+ $(,)?) => {
        $(
            impl AsnType for $int {
                const TAG: Tag = Tag::INTEGER;
                const CONSTRAINTS: Constraints<'static> = Constraints::new(&[
                    constraints::Constraint::Value(Extensible::new(constraints::Value::new(constraints::Bounded::const_new(<$int>::MIN as i128, <$int>::MAX as i128)))),
                ]);
            }
        )+
    }
}

asn_integer_type! {
    i8,
    i16,
    i32,
    i64,
    i128,
    isize,
    u8,
    u16,
    u32,
    u64,
    u128, // TODO upper constraint truncated
    usize,
}

Maybe we could change the upper constraint bound to be of type u128. That makes checking the bounds more challenging, as more type conversions are involved, but it should be possible.
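The truncation mentioned in the TODO can be seen directly; a minimal illustration in plain Rust, no rasn types involved:

```rust
fn main() {
    // u128::MAX does not fit in i128, so the cast used for the bound wraps:
    assert_eq!(u128::MAX as i128, -1);
    // A bound built as `<u128>::MIN as i128` to `<u128>::MAX as i128`
    // would therefore describe 0..=-1 instead of 0..=u128::MAX.
    assert_eq!(u128::MIN as i128, 0);
}
```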

Copy link
Contributor Author


Actually, I tried to change the constraint behavior, and it does not seem to be easy.

E.g. if we add more types to

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Bounded<S, E> {
    #[default]
    None,
    Single(S),
    Range {
        start: Option<S>,
        end: Option<E>,
    },
}

We would need to re-adjust the constraint logic a lot, because many functions currently return a single type and expect the type to be the same for the Single and Range variants, for example.

@XAMPPRocky XAMPPRocky merged commit 0417ff5 into librasn:main Sep 5, 2024
65 checks passed