Feature request: Modular (or Machine) arithmetic #103

recoules · 2021-07-02T09:39:14Z

Hi,

I am currently (in my spare time, so pretty slowly) reworking the implementation of binsec bitvectors.

My idea is (in the same way as Z) to represent small values as native OCaml integer. For this, I use 10 bits for encoding the size of the bitvector (so the compact representation can handle bitvector up to 1023 bits) and the remaining bits are for the (sign extended) value. Other cases fall through standard boxed representation.

As I use Z as the underlying implementation, I think it often performs more check or more work than if it was directly written as a native case inside zarith library where some shortcuts could have been taken. Additionally, I would imagine that a fast, carefully optimized modular arithmetic (aka fixed size bitvector arithmetic) may also interest some people outside of binsec project if it was distributed with zarith.

So would you consider adding a module M (for either Modular or Machine arithmetic) alongside Z and Q (possibly starting from my work when it would be ready) ?

The text was updated successfully, but these errors were encountered:

antoinemine · 2021-07-02T14:01:46Z

This is interesting, but I am not considering adding this at the moment.
I think that, with the optimisations that you are proposing, there will be very little code sharing with the low-level implementation in ZArith:

you want to use two's complement representation, but Zarith uses GMP's sign bit plus unsigned integer representation for long numbers
Zarith ensures that operations on short integers are mapped directly to the OCaml ones; this is not the case if you reserve 10 bits in every number

recoules · 2021-07-02T15:03:48Z

Zarith ensures that operations on short integers are mapped directly to the OCaml ones; this is not the case if you reserve 10 bits in every number

Well, in fact, it only needs a wrapper that shift right (asr 10) the value and then, the operation can be mapped directly to the OCaml one. Then, before returning, if ((x lsl 10) asr 10) = x then you know you can use compact form.

you want to use two's complement representation, but Zarith uses GMP's sign bit plus unsigned integer representation for long numbers

If think on the contrary that we want to use the GMP's sign bit plus unsigned value because for positive value, it changes nothing but for example, if you want to represent a big negative 128 bit value, e.g. -9 223 372 036 854 775 807, you would prefer to represent it as -0x7fffffffffffffff,128 rather than 0xffffffffffffffff8000000000000001,128. It is still very hypothetical.

I think that, with the optimisations that you are proposing, there will be very little code sharing with the low-level implementation in ZArith:

In fact, as I am very tempted to reuse some of the helper functions and constant declared in the stub c file, I thought it could be smarter to directly embed it in the same library rather than creating my own "almost copy-pasted" project.

Also, as a side note, I am wondering why there is a need for long numbers to have a word reserved for sign | size. Assuming you are willing to truncate ocaml memory layout (aka, updating the header), the size would be directly mapped to the one of the header (minus one because of the custom block). Yet, we still need a bit for the sign but, if the custom block is used (and I think it is the default implementation choice), we can use different struct custom_operations addresses to make the difference between positive and negative. Thus, for a small static duplication cost, we save one word by allocated long Z. I am planning to try this because it sounds fun but I agree it looks pretty pointless.

antoinemine · 2022-01-05T14:40:41Z

Sorry for the delay in answering.

Zarith ensures that operations on short integers are mapped directly to the OCaml ones; this is not the case if you reserve 10 bits in every number

Well, in fact, it only needs a wrapper that shift right (asr 10) the value and then, the operation can be mapped directly to the OCaml one. Then, before returning, if ((x lsl 10) asr 10) = x then you know you can use compact form.

If I understand correctly, it is possible that you have a bitvector in compact representation that does not fit into a machine integer (bitsize between 53 and 1024 I guess). In that case, you'll do asr and lsl on big numbers. I think that it could end up much more costly than keeping the size separately, as an OCaml int, and let the large integer be unshifted.

you want to use two's complement representation, but Zarith uses GMP's sign bit plus unsigned integer representation for long numbers

If think on the contrary that we want to use the GMP's sign bit plus unsigned value because for positive value, it changes nothing but for example, if you want to represent a big negative 128 bit value, e.g. -9 223 372 036 854 775 807, you would prefer to represent it as -0x7fffffffffffffff,128 rather than 0xffffffffffffffff8000000000000001,128. It is still very hypothetical.

For the modular arithmetic part, would it be possible to simply use the standard Z.add, Z.sub, etc operations, followed with an extract (or signed_extract) ?
The current downside is that there is no fast (OCaml) path for these operators, but we could work on this.

Also, as a side note, I am wondering why there is a need for long numbers to have a word reserved for sign | size. Assuming you are willing to truncate ocaml memory layout (aka, updating the header), the size would be directly mapped to the one of the header (minus one because of the custom block). Yet, we still need a bit for the sign but, if the custom block is used (and I think it is the default implementation choice), we can use different struct custom_operations addresses to make the difference between positive and negative. Thus, for a small static duplication cost, we save one word by allocated long Z. I am planning to try this because it sounds fun but I agree it looks pretty pointless.

Modifying the header of an OCaml block after its allocation seems an extremely bad idea to me.

recoules · 2022-01-05T15:32:34Z

If` I understand correctly, it is possible that you have a bitvector in compact representation that does not fit into a machine integer (bitsize between 53 and 1024 I guess).

It is not exactly that.
In a first time, only the positive value that actually fit in 52 bits can use the compact representation (cover usual small constants).
Then we can figure out that handling "sign extension" could be beneficial to cover small negative constant too.
For instance -1<64> is a good candidate and could be represented as -1 lsl 10 lor 64 instead of Bigint 0xffffffffffffffffL; assuming all methods deal correctly with the "sign extension" compact representation.

For the modular arithmetic part, would it be possible to simply use the standard Z.add, Z.sub, etc operations, followed with an extract (or signed_extract) ?
The current downside is that there is no fast (OCaml) path for these operators, but we could work on this.

This is actually what is done today. I opened some times ago the PR #90 (need to be reviewed) for adding fast path to extract operations.

Modifying the header of an OCaml block after its allocation seems an extremely bad idea to me.

The function Obj.truncate makes this (but it could be more efficient to redo it inline). It can not really mess with the garbage collector since the "remaining" bytes will not be reclaimed right away (it is not as if we changed the pointer in the minor heap to mark them reallocable). However, it could help if the block is promoted as only the useful part will be copied in major heap.

Still I expect all 64 bits common values to fall in the compact form so, saving 1 word time to time should not really matter a lot.

Gbury · 2022-01-05T15:40:06Z

Modifying the header of an OCaml block after its allocation seems an extremely bad idea to me.

The function Obj.truncate makes this (but it could be more efficient to redo it inline). It can not really mess with the garbage collector since the "remaining" bytes will not be reclaimed right away (it is not as if we changed the pointer in the minor heap to mark them reallocable). However, it could help if the block is promoted as only the useful part will be copied in major heap.

Still I expect all 64 bits common values to fall in the compact form so, saving 1 word time to time should not really matter a lot.

Truncating ocaml values after their creation breaks assumptions made by the flambda optimization pass of the compiler. From what I remember, with multicore (i.e. ocaml 5.0), it is also unsafe to modify headers of objects. And finally, the Obj.truncate function has been deprecated since april 2019 (see ocaml/ocaml@466d3bc ), so I would really recommend not to use it.

recoules · 2022-01-05T15:54:51Z

@Gbury Thank you for the precision. Of course, I would not recommend to truncate arbitrary values, but here is a special case: the value is created by an external function (so flambda is, I think, out of the discussion) and, more importantly, the header will be updated before this external function return (meaning it is the only owner of the value).

Still, I do not really know how the garbage collector will work in multicore with external function calls. Can an external C primitive be interrupted by the garbage collector without calling anything related to OCaml? If yes, then I think a lot of code will become unsafe. If it is still no, then it is safe to modify the header since no one except the C function can use the value yet.

antoinemine · 2022-01-06T07:12:51Z

It is not exactly that. In a first time, only the positive value that actually fit in 52 bits can use the compact representation (cover usual small constants).

OK, this seems more reasonable.

For the modular arithmetic part, would it be possible to simply use the standard Z.add, Z.sub, etc operations, followed with an extract (or signed_extract) ?
The current downside is that there is no fast (OCaml) path for these operators, but we could work on this.

This is actually what is done today.

OK. I was not sure, given the current bitvector.ml code (linked on top of the issue).

I opened some times ago the PR #90 (need to be reviewed) for adding fast path to extract operations.

Sorry, I meant a way to code the current fast path in OCaml instead of C, as done in the current version of most operators.
PR #90 is for calls that output a small int even though the arguments are not small.

Modifying the header of an OCaml block after its allocation seems an extremely bad idea to me.

Given the discussion, it is clearly not future-proof. I am against making and exploiting such assumptions on the GC.

recoules · 2022-01-06T08:07:20Z

PR #90 is for calls that output a small int even though the arguments are not small.

It was originally for this but I updated it a time ago (#90 (comment)) to give fast path in OCaml (because small integer input restrict is a special case where the function return a small integer).

Given the discussion, it is clearly not future-proof. I am against making and exploiting such assumptions on the GC.

Fair enough, it is not the center of the topic anyway.
Still, for my personal curiosity (since I have not closely followed the last GC development), I would be curious to know an expert view (@damiendoligez). My hypothesis is that each domain will have its own minor heap and that, as long as there is no reference on a newly created value in another minor heap / in the major heap and that there is no OCaml allocation in this minor heap, then it is safe to do what you want with this value (eg. make a big alloc and split it in smaller values, truncate it, etc.). If no, some older code may not be compatible anymore. Also, I am wondering if the major collection can be triggered when one domain is currently running an external primitive? If so, does this mean that all OCaml parameter should be declared volatile in the C part?

xavierleroy · 2022-01-10T18:03:49Z

Re: truncating an already-allocated block, this was supported in OCaml up to versions 4.xx (function Obj.truncate), but rather inefficiently. In particular, reducing the size of a major heap block by one creates a one-word fragment that cannot be used for allocation, causing heap fragmentation.

In Multicore OCaml, which became OCaml 5.00 earlier today, Obj.truncate is not supported at all because of strange race conditions between truncating an object and concurrently accessing it. So, don't even think of using it.

antoinemine · 2023-01-02T11:44:37Z

Now that #90 is merged, it should be possible to write a modular arithmetic library on top of Zarith in an efficient way in 100% pure OCaml. So, closing the issue.

antoinemine mentioned this issue Mar 3, 2022

Add fast path when extraction leads to intnat #90

Merged

antoinemine closed this as completed Jan 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: Modular (or Machine) arithmetic #103

Feature request: Modular (or Machine) arithmetic #103

recoules commented Jul 2, 2021

antoinemine commented Jul 2, 2021

recoules commented Jul 2, 2021

antoinemine commented Jan 5, 2022

recoules commented Jan 5, 2022

Gbury commented Jan 5, 2022

recoules commented Jan 5, 2022 •

edited

Loading

antoinemine commented Jan 6, 2022

recoules commented Jan 6, 2022

xavierleroy commented Jan 10, 2022

antoinemine commented Jan 2, 2023

Feature request: Modular (or Machine) arithmetic #103

Feature request: Modular (or Machine) arithmetic #103

Comments

recoules commented Jul 2, 2021

antoinemine commented Jul 2, 2021

recoules commented Jul 2, 2021

antoinemine commented Jan 5, 2022

recoules commented Jan 5, 2022

Gbury commented Jan 5, 2022

recoules commented Jan 5, 2022 • edited Loading

antoinemine commented Jan 6, 2022

recoules commented Jan 6, 2022

xavierleroy commented Jan 10, 2022

antoinemine commented Jan 2, 2023

recoules commented Jan 5, 2022 •

edited

Loading