Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Modular (or Machine) arithmetic #103

Closed
recoules opened this issue Jul 2, 2021 · 10 comments
Closed

Feature request: Modular (or Machine) arithmetic #103

recoules opened this issue Jul 2, 2021 · 10 comments

Comments

@recoules
Copy link
Contributor

recoules commented Jul 2, 2021

Hi,

I am currently (in my spare time, so pretty slowly) reworking the implementation of binsec bitvectors.

My idea is (in the same way as Z) to represent small values as native OCaml integer. For this, I use 10 bits for encoding the size of the bitvector (so the compact representation can handle bitvector up to 1023 bits) and the remaining bits are for the (sign extended) value. Other cases fall through standard boxed representation.

As I use Z as the underlying implementation, I think it often performs more check or more work than if it was directly written as a native case inside zarith library where some shortcuts could have been taken. Additionally, I would imagine that a fast, carefully optimized modular arithmetic (aka fixed size bitvector arithmetic) may also interest some people outside of binsec project if it was distributed with zarith.

So would you consider adding a module M (for either Modular or Machine arithmetic) alongside Z and Q (possibly starting from my work when it would be ready) ?

@antoinemine
Copy link
Collaborator

This is interesting, but I am not considering adding this at the moment.
I think that, with the optimisations that you are proposing, there will be very little code sharing with the low-level implementation in ZArith:

  • you want to use two's complement representation, but Zarith uses GMP's sign bit plus unsigned integer representation for long numbers
  • Zarith ensures that operations on short integers are mapped directly to the OCaml ones; this is not the case if you reserve 10 bits in every number

@recoules
Copy link
Contributor Author

recoules commented Jul 2, 2021

Zarith ensures that operations on short integers are mapped directly to the OCaml ones; this is not the case if you reserve 10 bits in every number

Well, in fact, it only needs a wrapper that shift right (asr 10) the value and then, the operation can be mapped directly to the OCaml one. Then, before returning, if ((x lsl 10) asr 10) = x then you know you can use compact form.

you want to use two's complement representation, but Zarith uses GMP's sign bit plus unsigned integer representation for long numbers

If think on the contrary that we want to use the GMP's sign bit plus unsigned value because for positive value, it changes nothing but for example, if you want to represent a big negative 128 bit value, e.g. -9 223 372 036 854 775 807, you would prefer to represent it as -0x7fffffffffffffff,128 rather than 0xffffffffffffffff8000000000000001,128. It is still very hypothetical.

I think that, with the optimisations that you are proposing, there will be very little code sharing with the low-level implementation in ZArith:

In fact, as I am very tempted to reuse some of the helper functions and constant declared in the stub c file, I thought it could be smarter to directly embed it in the same library rather than creating my own "almost copy-pasted" project.

Also, as a side note, I am wondering why there is a need for long numbers to have a word reserved for sign | size. Assuming you are willing to truncate ocaml memory layout (aka, updating the header), the size would be directly mapped to the one of the header (minus one because of the custom block). Yet, we still need a bit for the sign but, if the custom block is used (and I think it is the default implementation choice), we can use different struct custom_operations addresses to make the difference between positive and negative. Thus, for a small static duplication cost, we save one word by allocated long Z. I am planning to try this because it sounds fun but I agree it looks pretty pointless.

@antoinemine
Copy link
Collaborator

Sorry for the delay in answering.

Zarith ensures that operations on short integers are mapped directly to the OCaml ones; this is not the case if you reserve 10 bits in every number

Well, in fact, it only needs a wrapper that shift right (asr 10) the value and then, the operation can be mapped directly to the OCaml one. Then, before returning, if ((x lsl 10) asr 10) = x then you know you can use compact form.

If I understand correctly, it is possible that you have a bitvector in compact representation that does not fit into a machine integer (bitsize between 53 and 1024 I guess). In that case, you'll do asr and lsl on big numbers. I think that it could end up much more costly than keeping the size separately, as an OCaml int, and let the large integer be unshifted.

you want to use two's complement representation, but Zarith uses GMP's sign bit plus unsigned integer representation for long numbers

If think on the contrary that we want to use the GMP's sign bit plus unsigned value because for positive value, it changes nothing but for example, if you want to represent a big negative 128 bit value, e.g. -9 223 372 036 854 775 807, you would prefer to represent it as -0x7fffffffffffffff,128 rather than 0xffffffffffffffff8000000000000001,128. It is still very hypothetical.

For the modular arithmetic part, would it be possible to simply use the standard Z.add, Z.sub, etc operations, followed with an extract (or signed_extract) ?
The current downside is that there is no fast (OCaml) path for these operators, but we could work on this.

Also, as a side note, I am wondering why there is a need for long numbers to have a word reserved for sign | size. Assuming you are willing to truncate ocaml memory layout (aka, updating the header), the size would be directly mapped to the one of the header (minus one because of the custom block). Yet, we still need a bit for the sign but, if the custom block is used (and I think it is the default implementation choice), we can use different struct custom_operations addresses to make the difference between positive and negative. Thus, for a small static duplication cost, we save one word by allocated long Z. I am planning to try this because it sounds fun but I agree it looks pretty pointless.

Modifying the header of an OCaml block after its allocation seems an extremely bad idea to me.

@recoules
Copy link
Contributor Author

recoules commented Jan 5, 2022

If` I understand correctly, it is possible that you have a bitvector in compact representation that does not fit into a machine integer (bitsize between 53 and 1024 I guess).

It is not exactly that.
In a first time, only the positive value that actually fit in 52 bits can use the compact representation (cover usual small constants).
Then we can figure out that handling "sign extension" could be beneficial to cover small negative constant too.
For instance -1<64> is a good candidate and could be represented as -1 lsl 10 lor 64 instead of Bigint 0xffffffffffffffffL; assuming all methods deal correctly with the "sign extension" compact representation.

For the modular arithmetic part, would it be possible to simply use the standard Z.add, Z.sub, etc operations, followed with an extract (or signed_extract) ?
The current downside is that there is no fast (OCaml) path for these operators, but we could work on this.

This is actually what is done today. I opened some times ago the PR #90 (need to be reviewed) for adding fast path to extract operations.

Modifying the header of an OCaml block after its allocation seems an extremely bad idea to me.

The function Obj.truncate makes this (but it could be more efficient to redo it inline). It can not really mess with the garbage collector since the "remaining" bytes will not be reclaimed right away (it is not as if we changed the pointer in the minor heap to mark them reallocable). However, it could help if the block is promoted as only the useful part will be copied in major heap.

Still I expect all 64 bits common values to fall in the compact form so, saving 1 word time to time should not really matter a lot.

@Gbury
Copy link

Gbury commented Jan 5, 2022

Modifying the header of an OCaml block after its allocation seems an extremely bad idea to me.

The function Obj.truncate makes this (but it could be more efficient to redo it inline). It can not really mess with the garbage collector since the "remaining" bytes will not be reclaimed right away (it is not as if we changed the pointer in the minor heap to mark them reallocable). However, it could help if the block is promoted as only the useful part will be copied in major heap.

Still I expect all 64 bits common values to fall in the compact form so, saving 1 word time to time should not really matter a lot.

Truncating ocaml values after their creation breaks assumptions made by the flambda optimization pass of the compiler. From what I remember, with multicore (i.e. ocaml 5.0), it is also unsafe to modify headers of objects. And finally, the Obj.truncate function has been deprecated since april 2019 (see ocaml/ocaml@466d3bc ), so I would really recommend not to use it.

@recoules
Copy link
Contributor Author

recoules commented Jan 5, 2022

@Gbury Thank you for the precision. Of course, I would not recommend to truncate arbitrary values, but here is a special case: the value is created by an external function (so flambda is, I think, out of the discussion) and, more importantly, the header will be updated before this external function return (meaning it is the only owner of the value).

Still, I do not really know how the garbage collector will work in multicore with external function calls. Can an external C primitive be interrupted by the garbage collector without calling anything related to OCaml? If yes, then I think a lot of code will become unsafe. If it is still no, then it is safe to modify the header since no one except the C function can use the value yet.

@antoinemine
Copy link
Collaborator

It is not exactly that. In a first time, only the positive value that actually fit in 52 bits can use the compact representation (cover usual small constants).

OK, this seems more reasonable.

For the modular arithmetic part, would it be possible to simply use the standard Z.add, Z.sub, etc operations, followed with an extract (or signed_extract) ?
The current downside is that there is no fast (OCaml) path for these operators, but we could work on this.

This is actually what is done today.

OK. I was not sure, given the current bitvector.ml code (linked on top of the issue).

I opened some times ago the PR #90 (need to be reviewed) for adding fast path to extract operations.

Sorry, I meant a way to code the current fast path in OCaml instead of C, as done in the current version of most operators.
PR #90 is for calls that output a small int even though the arguments are not small.

Modifying the header of an OCaml block after its allocation seems an extremely bad idea to me.

Given the discussion, it is clearly not future-proof. I am against making and exploiting such assumptions on the GC.

@recoules
Copy link
Contributor Author

recoules commented Jan 6, 2022

PR #90 is for calls that output a small int even though the arguments are not small.

It was originally for this but I updated it a time ago (#90 (comment)) to give fast path in OCaml (because small integer input restrict is a special case where the function return a small integer).

Given the discussion, it is clearly not future-proof. I am against making and exploiting such assumptions on the GC.

Fair enough, it is not the center of the topic anyway.
Still, for my personal curiosity (since I have not closely followed the last GC development), I would be curious to know an expert view (@damiendoligez). My hypothesis is that each domain will have its own minor heap and that, as long as there is no reference on a newly created value in another minor heap / in the major heap and that there is no OCaml allocation in this minor heap, then it is safe to do what you want with this value (eg. make a big alloc and split it in smaller values, truncate it, etc.). If no, some older code may not be compatible anymore. Also, I am wondering if the major collection can be triggered when one domain is currently running an external primitive? If so, does this mean that all OCaml parameter should be declared volatile in the C part?

@xavierleroy
Copy link
Contributor

Re: truncating an already-allocated block, this was supported in OCaml up to versions 4.xx (function Obj.truncate), but rather inefficiently. In particular, reducing the size of a major heap block by one creates a one-word fragment that cannot be used for allocation, causing heap fragmentation.

In Multicore OCaml, which became OCaml 5.00 earlier today, Obj.truncate is not supported at all because of strange race conditions between truncating an object and concurrently accessing it. So, don't even think of using it.

@antoinemine
Copy link
Collaborator

Now that #90 is merged, it should be possible to write a modular arithmetic library on top of Zarith in an efficient way in 100% pure OCaml. So, closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants