@czurnieden
Contributor

Both shortcuts are implemented as the internal functions s_mp_radix_size_radix_10 and s_mp_log_power_of_two so it would be easy to make a function mp_radix_size restricted to the bases {2,4,8,10,16,32,64} if that is wanted for version 2.0.0.

@czurnieden czurnieden requested review from minad and sjaeckel October 13, 2019 03:14
const uint64_t inv_log_2_10 = {0x4d104d427de7fbccULL};
when MP_8BIT got the boot.
*/
const uint16_t inv_log_2_10[4] = {0x4d10u, 0x4d42u, 0x7de7u, 0xfbccu};
Member

how about

   const uint8_t inv_log_2_10[] = {0x4du, 0x10u, 0x4du, 0x42u, 0x7du, 0xe7u, 0xfbu, 0xccu};
   mp_int bi_bit_count, bi_k, t;
   int i, bit_count;
   if ((err = mp_init_multi(&bi_bit_count, &bi_k, &t, NULL)) != MP_OKAY) {
      return err;
   }
   if ((err = mp_from_ubin(&bi_k, inv_log_2_10, sizeof(inv_log_2_10))) != MP_OKAY) {
      return err;
   }
...

Contributor Author

Yes, that would be simpler (and most likely even faster), but I didn't want to add a dependency on a function that you might not use elsewhere, and once MP_8BIT is gone it is down to just two mp_get_u32, one big-shift and one big-add.

But if you like it more with mp_from_ubin: drop me a note and I'll change it, no problem.

Member

I would prefer from_ubin, which is the canonical importer for large integers from binary data. Alternatively, you could statically initialize the const mp_int, but then you have to recompute the array for each supported digit size.
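
To illustrate the byte-array variant: the eight bytes in the suggestion are the big-endian form of the 64-bit constant 0x4d104d427de7fbcc (approximately log10(2) scaled by 2^64). The sketch below is only a plain-C stand-in; `from_ubin_u64` is a hypothetical helper that mimics a most-significant-byte-first import into a `uint64_t` instead of an mp_int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian bytes of the constant, copied from the suggestion above. */
static const uint8_t inv_log_2_10[] = {
   0x4du, 0x10u, 0x4du, 0x42u, 0x7du, 0xe7u, 0xfbu, 0xccu
};

/* Hypothetical stand-in for a big-endian importer such as mp_from_ubin:
 * the first byte in the buffer is the most significant one. */
static uint64_t from_ubin_u64(const uint8_t *buf, size_t len)
{
   uint64_t r = 0;
   size_t i;

   assert(len <= sizeof(r));   /* this sketch handles at most 8 bytes */
   for (i = 0; i < len; i++) {
      r = (r << 8) | buf[i];   /* shift in one byte at a time */
   }
   return r;
}
```

Calling `from_ubin_u64(inv_log_2_10, sizeof(inv_log_2_10))` yields the same value as the `uint64_t` literal in the diff, independent of the digit size the library was built with.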

Contributor Author

Added a branch for MP_16BIT, which is the only configuration left that needs that part now that MP_8BIT got the boot. Every other digit size must support 64-bit integers to function, which allows a simple mp_set_u64 for the rest.

Member

Ok, but please unroll the loop. set, mul, set, add.
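
A rough sketch of what the unrolled "set, mul, set, add" sequence could look like, using a plain `uint64_t` as a stand-in for the mp_int and the four 16-bit limbs from the diff. The function name `assemble_inv_log_2_10` is made up for this illustration; the real code would use the corresponding mp_* calls.

```c
#include <stdint.h>

/* Assemble the 64-bit constant from four 16-bit limbs without a loop.
 * Each step mirrors "mul by 2^16, set temp, add" on a big integer. */
static uint64_t assemble_inv_log_2_10(void)
{
   static const uint16_t limbs[4] = {0x4d10u, 0x4d42u, 0x7de7u, 0xfbccu};
   uint64_t k, t;

   k = limbs[0];                       /* set                   */

   k <<= 16;                           /* mul (by 2^16)         */
   t = limbs[1];                       /* set                   */
   k += t;                             /* add                   */

   k <<= 16;
   t = limbs[2];
   k += t;

   k <<= 16;
   t = limbs[3];
   k += t;

   return k;
}
```

The result equals 0x4d104d427de7fbcc, the same value that mp_set_u64 would load directly on configurations with 64-bit support.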

@sjaeckel
Member

👍

@minad
Member

minad commented Oct 14, 2019

Can we discuss this later for 2.0 and concentrate on finishing 1.2 now?

@sjaeckel
Member

> Can we discuss this later for 2.0 and concentrate on finishing 1.2 now?

fine by me, can you please put milestones on the open issues that you think should still go into 1.2?

@minad
Member

minad commented Oct 14, 2019

Ok, I will do. If nothing remains will you create a separate branch and we start on 2.0 in develop?

@czurnieden
Contributor Author

@sjaeckel Yes, I must admit, I like this one most, too.

@minad

> Can we discuss this later for 2.0 and concentrate on finishing 1.2 now?

Don't worry, we're still threadsafe.

> Ok, I will do. If nothing remains will you create a separate branch and we start on 2.0 in develop?

I hope you're talking to @sjaeckel here? ;-)

@czurnieden czurnieden force-pushed the radix_size_exact_table branch 2 times, most recently from ae47fef to 393bf3d Compare October 15, 2019 19:13
@czurnieden czurnieden force-pushed the radix_size_exact_table branch 2 times, most recently from b7a420e to bbdd25c Compare October 19, 2019 18:34
@minad
Member

minad commented Oct 22, 2019

@czurnieden alternatively to #368 use this, which optimizes the function for base 10 only? And then don't provide mp_radix_overestimate?

@minad
Member

minad commented Oct 22, 2019

What do you prefer?

@czurnieden
Contributor Author

> alternatively to #368 use this, which optimizes the function for base 10 only?

This one has the extra functions for base 10 and powers-of-two and does the rest with mp_log

> And then don't provide mp_radix_overestimate?

The original mp_radix_overestimate (#369) is exact, not an overestimate (there was an error in the test rig), so it might be an alternative. It is fast (O(1) with little overhead, especially since MP_8BIT is gone) but nearly twice the size.

The second version, #368, is a really rough one (but can be tamed down with an additional small table) and is, in my opinion, not really an alternative, even with the extra table.

> What do you prefer?

I prefer this one: O(1) for base 10 and powers-of-two, and the rest with mp_log (O(log n)). There are not many cases where you need anything more than the bases 2, (8), 16, 10, and 64.
It is slightly bigger than #368 (if #368 has the extra table), though not by much.
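
For readers unfamiliar with the base-10 shortcut, here is a minimal sketch of the idea on a `uint64_t` rather than an mp_int: multiply the bit count by a fixed-point approximation of log10(2) and add one. It assumes a 32-bit scale (the top half of the 64-bit constant from the diff); all function names are invented for the illustration, and the result counts digits only (no sign, no terminating NUL).

```c
#include <stdint.h>

/* Upper 32 bits of 0x4d104d427de7fbcc, i.e. log10(2) scaled by 2^32
 * and truncated. Assumption: a 32-bit scale is enough for this sketch. */
#define INV_LOG_2_10_Q32 0x4d104d42ULL

/* Number of significant bits in n (0 for n == 0). */
static int bit_count_u64(uint64_t n)
{
   int bits = 0;
   while (n != 0) {
      bits++;
      n >>= 1;
   }
   return bits;
}

/* O(1) estimate of the decimal digit count of n once the bit count is
 * known. For 64-bit inputs it is never too small and at most one too large. */
static int decimal_size_estimate(uint64_t n)
{
   uint64_t bits;

   if (n == 0) {
      return 1;
   }
   bits = (uint64_t)bit_count_u64(n);
   return (int)((bits * INV_LOG_2_10_Q32) >> 32) + 1;
}

/* Exact digit count by repeated division, for comparison only. */
static int decimal_size_exact(uint64_t n)
{
   int digits = 1;
   while (n >= 10) {
      n /= 10;
      digits++;
   }
   return digits;
}
```

For example, 999 and 1000 both have ten significant bits, so both get an estimate of 4, which is exact for 1000 and one too many for 999.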

And there was the discussion that we restrict LTM to these bases which can be done with this PR quite quickly: just add a test for the bases given and replace the call to mp_log with a call to s_mp_log_power_of_two.
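
The power-of-two case can be sketched in the same spirit. This is not the code of s_mp_log_power_of_two, just a plain-C illustration of why these bases are cheap: with radix = 2^k, the digit count is the bit count divided by k, rounded up. The helper names are hypothetical, and the result again excludes sign and NUL.

```c
#include <stdint.h>

/* Returns k if radix == 2^k for some radix in {2,4,8,16,32,64}, else 0. */
static int log2_of_pow2_radix(int radix)
{
   int k = 0;

   if (radix < 2 || radix > 64 || (radix & (radix - 1)) != 0) {
      return 0;
   }
   while ((1 << k) < radix) {
      k++;
   }
   return k;
}

/* Exact digit count of n in a power-of-two radix,
 * or -1 if the radix is not a supported power of two. */
static int pow2_radix_size(uint64_t n, int radix)
{
   int k = log2_of_pow2_radix(radix);
   int bits = 0;

   if (k == 0) {
      return -1;
   }
   if (n == 0) {
      return 1;
   }
   while (n != 0) {
      bits++;
      n >>= 1;
   }
   return (bits + k - 1) / k;   /* ceil(bits / k) */
}
```

A restricted radix-size function along the lines discussed here would accept base 10 plus the radices for which `log2_of_pow2_radix` is non-zero, and reject everything else.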

Yes, I really like this one.

And now back to rebasing *sigh* ;-)

@minad
Member

minad commented Oct 22, 2019

I don't want to restrict ltm to only a few bases. But we could offer optimized versions for base 10 for sure.

Is the following a correct summary?

  1. Minimal mp_radix_size_overestimate, very large error #368 - O(1) estimation, small, error < 29?
  2. mp_radix_size replacement O(1) for all bases, large tables #369 - O(1) exact replacement for mp_radix_size, but larger than current mp_log based version
  3. Addition of shortcuts for bases that are powers of two and for base 10 to mp_radix_size #371 - mp_radix_size specialised for base 10 in O(1). Power of two bases already optimized via mp_log.

If we choose 2. or 3. we wouldn't need the overestimate function? This would make the API nicer. And we could still have the slower log fallback available via conditional MP_HAS compilation if that is preferred.

Edit: so it is either 2 or 3.

@minad
Member

minad commented Oct 22, 2019

I think I agree with you then - we should take this version.

@czurnieden czurnieden force-pushed the radix_size_exact_table branch from bbdd25c to ae492e0 Compare October 22, 2019 20:25

#define LTM_RADIX_SIZE_SCALE 64
#define LTM_RADIX_SIZE_CONST_SHIFT 32
int s_mp_radix_size_radix_10(const mp_int *a, size_t *size)
Member

@czurnieden why don't you add a function s_mp_log10 which is called from mp_log and used indirectly by mp_radix_size?

Contributor Author

I was, and still am, a bit torn between calling that optimization in mp_log or in mp_radix_size.

Calling it in mp_radix_size allows for easy reduction to the restricted radix-set 2,4,8,10,16,32,64 and would also get rid of the dependency on mp_log, which isn't a very small function.

Calling it in mp_log is cleaner if we want to keep the full radix-set.

Mmh…
*strokes sesquipedalian beard*
I don't know.

Member

Well, but you can strip down mp_log by disabling s_mp_log and only enabling the power of two and base 10 versions. We had that discussion in #389. I think it should go to mp_log since it is more general.

Contributor Author

If you strip down 'mp_log' you don't have a general log-function anymore.

If you strip down 'mp_radix_size' you still have a radix-size function, just not for the small range of radices 2-64, only for the smaller range of powers-of-two and base 10.

You expect of a log-function that it works over the whole range, with no holes in it.

You expect a radix-size function to work over a very small range; it might even have holes in it. Restricting string input/output to only a handful of bases, sometimes down to only two (10 and 16), won't get many complaints (vid. e.g.: printf).

Member

Yes, but we don't get an optimized log function if we add the base10 optimization only to radix_size. We already have other functions with holes in them if configured as such.

@czurnieden
Contributor Author

> If we choose 2. or 3. we wouldn't need the overestimate function?

Yepp, exactly.

bn_mp_log_u32.c Outdated
/* SPDX-License-Identifier: Unlicense */

/* Compute log_{base}(a) */
static mp_word s_pow(mp_word base, mp_word exponent)
Member

This file is called mp_log_u32.c in develop

Contributor Author

Ah, great, knew that this large renaming would get me at some point! ;-)

@minad minad mentioned this pull request Oct 22, 2019
@czurnieden
Contributor Author

I think I'll put this to rest, too, and bury it beside #401

@czurnieden czurnieden closed this Oct 25, 2019
@minad
Member

minad commented Oct 25, 2019

@czurnieden so shall we consider adding mp_radix_size_overestimate again?

@minad minad mentioned this pull request Oct 25, 2019
@czurnieden
Contributor Author

> so shall we consider adding mp_radix_size_overestimate again?

Back to where we started? Ok.
But which one?

  1. The full table for all bases [-0, +1]
  2. a. The smallest full table with the brutal error [-0,+28,000]
    b. The small full table with the not-so-brutal error [-0,+200] (upper limit approx.)
  3. Powers-of-two (from mp_log?) [±0] and base 10 only [-0,+1]

(You may add one to the upper limit for the sign to skip testing for the sign)
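
To make the trade-off concrete, here is a deliberately crude table-free overestimate for any radix from 2 to 64, shown on a `uint64_t`. It is not the method of any of the PRs mentioned; it simply rounds log2(radix) down to an integer k and uses ceil(bits / k), so the error grows with the size of the number. The function names are invented, and the result always reserves one character for the sign (so the caller need not test for it) and one for the terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>

/* floor(log2(v)) for v >= 1. */
static int floor_log2_int(int v)
{
   int k = 0;
   while (v > 1) {
      v >>= 1;
      k++;
   }
   return k;
}

/* Buffer size that is never too small for n written in the given radix:
 * digits (overestimated) + 1 for a sign + 1 for the NUL terminator.
 * Returns 0 for a radix outside 2..64. */
static size_t radix_size_overestimate(uint64_t n, int radix)
{
   int bits = 0;
   int k;

   if (radix < 2 || radix > 64) {
      return 0;
   }
   k = floor_log2_int(radix);
   while (n != 0) {
      bits++;
      n >>= 1;
   }
   if (bits == 0) {
      bits = 1;                 /* zero still needs one digit */
   }
   return (size_t)((bits + k - 1) / k) + 2u;
}
```

For the largest 64-bit value in base 10 this returns 24, while 20 digits plus sign and NUL need 22, which shows how quickly a table-free bound loses precision compared with the [-0, +1] variants listed above.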

Using the full tables makes only sense when we have a fast number conversion for all bases. I hacked something together to give @MasterDuke17 an example of how he might be able to extend his Barrett_toDecimal. I can clean it up (it's a bit of a mess) but it still will be relatively large.

On the other side: the Barrett_toDecimal seems to work quite well as it is now and is not very large. The only thing left is to enlarge the leaves (4 decimal digits is way too small, 500-600 bits (tunable) seems to be a better cut-off) and maybe make use of the fact that 2 * 5 = 10.

So: shall all bases belong to us, or shall we restrict versions ≥2.0.0 to the small set {2, 4, 8, 10, 16, 32, 64}?

@minad
Member

minad commented Oct 26, 2019

We shall not restrict the bases but we can provide faster to_radix/to_radix_overestimate for 10, 2^n. This is what I would like :)
