Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster integer formatting to arbitrary base #2360

Open
triska opened this issue Mar 17, 2024 · 6 comments
Open

Faster integer formatting to arbitrary base #2360

triska opened this issue Mar 17, 2024 · 6 comments

Comments

@triska
Copy link
Contributor

triska commented Mar 17, 2024

In one of my cryptographic applications of Scryer Prolog, the bottleneck turns out to be formatting large integers in base 16, using library(format).

We can use the ~Nr format control sequence to format an integer to base (radix) N, such as for example:

?- phrase(format_("~16r", [121212]), Cs).
   Cs = "1d97c".

With larger integers, this is currently somewhat slow. For example, with the following definitions:

:- use_module(library(format)).
:- use_module(library(time)).
:- use_module(library(between)).
:- use_module(library(dcgs)).

exp(E) :-
        N is 2^E,
        between(1, N, _).

We have:

?- I is 2^2^8,
   time((exp(12),phrase(format_("~16r", [I]), _),false)).
   % CPU time: 1.766s, 8_413_227 inferences
   false.

The formatting code is written in Prolog and should preferably remain in Prolog, because if every tiny detail is hardcoded in Rust itself, then there will never be enough incentive to make actual Prolog code faster:

integer_to_radix(I0, R, Which, Cs) :-

The question I have is therefore: What are the most fundamental, smallest building blocks that would help to make this Prolog code faster? Only that functionality should be implemented in Rust, if at all.

For example, currently, both I0 mod R and I0 // R are computed in the code:

{ M is I0 mod R,

I see that the dashu crate exposes functionality to compute both results in a single operation:

https://docs.rs/dashu-int/latest/dashu_int/struct.IBig.html#impl-DivRem-for-IBig

Would such a predicate be a sensible addition to library(arithmetic), to obtain both the quotient and the remainder of two integers with better efficiency than is currently possible?

Any other ideas? I would greatly appreciate all suggestions and help with this issue. Thank you a lot!

@mthom
Copy link
Owner

mthom commented Mar 17, 2024

Rust uses the ryu crate to print floating point numbers to strings quickly but I'm not sure if there is an equivalent for numbers. perhaps @cmpute could advise?

@cmpute
Copy link
Contributor

cmpute commented Mar 18, 2024

There are several ways to support fast printing in dashu:

  1. Fast formatting with base 2, 8, 10 and 16 is already supported through the conventional Rust protocal. (Display, Debug, etc)
  2. Fast formatting with arbitrary base (up to 32) is already supported through dashu_int::IBig::in_radix method.
  3. Fast conversion to arbitrary base digits is in dev plan (the API currently in my mind is like dashu_int::IBig::to_digits(base: Word) -> Vec<Word>)
  4. Fast division with constant divisor is already supported through the dashu_int::fast_div::ConstDivisor

Not sure which approach is best suitable for Scryer Prolog. Let me know as well if you want different APIs :)

@triska
Copy link
Contributor Author

triska commented Mar 18, 2024

Thank you a lot @cmpute! I have published a draft (#2362) that makes the in_radix method available as an internal predicate. library(format) supports bases up to 36, so using this method would require adjustments in library(format): It would mean that the fast code can be used for most of the cases, but not for all of the cases that the library covers.

It may be interesting to have this functionality, for very large integers, if the need arises. I will first try to improve format_//2 in other ways that ideally only require changes in the engine that are as generally useful as possible.

@cmpute
Copy link
Contributor

cmpute commented Mar 19, 2024

Oh my bad, there was a typo, in_radix actually supports radix up to 36, which is in line with your need. Besides, whether to use upper case for the output is supported through the alternative flag (as documented here)

@triska
Copy link
Contributor Author

triska commented Mar 19, 2024

@cmpute: Thank you a lot for the clarification!

In my tests, the test case above can be made to run about 4 times faster by using the native in_radix method instead of the Prolog-based implementation. Personally, I think this speedup does not justify making this particular API available in Scryer, since it further increases the size of the Rust-based engine and also adds additional complexity to library(format) which should ideally remain portable between Prolog systems, and thus also needs to handle the situation that such a native method is not available in other Prolog systems.

I wonder whether adding a more general predicate to Scryer that also covers the functionality of the in_radix method is worth adding instead, for example, a predicate such as format_integer(FormatString, Integer). However, it is unclear how the specific functionality of the in_radix method could be best plugged into such a construct, and also how much "Rust-specifics" (such as the conventions of Rust format strings) should be exposed by the interface which should ideally be independent of the underlying implementation language.

The "arbitrary base digits" API mentioned above seems interesting, and may indeed be worth exposing as an interface predicate when it becomes available, especially if it satisfies these desiderata.

@cmpute
Copy link
Contributor

cmpute commented Mar 20, 2024

I will link this issue to the commit when the digits conversion functionality is added. However, it worth noting that this API won't get as much optimization as the small base conversions (at least not in the near future), i.e. the speed improvement can be smaller.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants