Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it normal that to_string() for bigint takes so much(8-10 mins)? #79

Open
Moldoteck opened this issue Jan 20, 2019 · 14 comments
Open

Comments

@Moldoteck
Copy link

Well, i have an big int with ~50K digits. And when i want to print it/write it to file, to_string() takes very much. Is there a way to boost this?
P.s. i'm using --release
P.P.s this number is result for calculating factorial(1000000)

@cuviper
Copy link
Member

cuviper commented Jan 24, 2019

Converting such a large binary number to decimal requires a lot of division by 10 (or powers of 10). I measured 560 seconds (9m20s) for this number from BigUint. For comparison, the same in Python took me 384 seconds (6m24s) -- same order of magnitude, but we do have room for improvement.

It's much faster to use hexadecimal -- less than 10ms.

@Moldoteck
Copy link
Author

The ideea is that:
When i do something like
write!(filename,{:?},mynum)
It is almost instant, but it writes the number as json like many numbers separated by comma. It is possible to take advantage from this and print number faster by iterating the data array?

@cuviper
Copy link
Member

cuviper commented Jan 24, 2019

We don't directly expose the data array, because we may change the internal details in the future, like choosing a different word size. However, if you enable the "serde" feature of num-bigint, you can use serde_json for this:
https://docs.rs/serde_json/1.0.37/serde_json/#creating-json-by-serializing-data-structures

This is quite fast, and also has the advantage that you can simply deserialize it. Any other serde format will work for this too.

@programmerjake
Copy link

decimal conversion takes so long because bigint doesn't use an efficient algorithm.
when using bc from GNU coreutils (which I think uses libgmp), printing 2^166096 (close to 10^50000) takes about a second on my smartphone.

using the Schönhage–Strassen algorithm for multiplication with Newton's method for division along with recursive subdivision for the base conversion should improve the algorithm to O(n * log(n)^j * log(log(n))^k) running time where j and k are small numbers that I won't bother to compute.

From what I recall, the currently implemented algorithm has O(n^3) running time, which is much worse (around 10000^3 operations, which would take around 300s at 3Gops/s).

@cuviper
Copy link
Member

cuviper commented Aug 25, 2019

PRs welcome!

@chirsz-ever
Copy link

decimal conversion takes so long because bigint doesn't use an efficient algorithm.
when using bc from GNU coreutils (which I think uses libgmp), printing 2^166096 (close to 10^50000) takes about a second on my smartphone.

I read the source code of bc then I found that the reason why it outputs numbers so fast is that bc uses decimal digits representation internally...

@programmerjake
Copy link

decimal conversion takes so long because bigint doesn't use an efficient algorithm.
when using bc from GNU coreutils (which I think uses libgmp), printing 2^166096 (close to 10^50000) takes about a second on my smartphone.

I read the source code of bc then I found that the reason why it outputs numbers so fast is that bc uses decimal digits representation internally...

Well, if bc does use decimal, then all the time would be spent on converting 2^166096 to decimal in the process of calculating that power of 2, so it wouldn't be much faster.

I just tried using libgmp (through python3 with gmpy2), converting 2^33219280 (has exactly 10,000,000 decimal digits) takes around 2s on my desktop (Ryzen 5 2600).

from gmpy2 import mpz
len((mpz(1) << 33219280).digits())

I know for a fact that libgmp uses either base 2^32 or 2^64 internally.

@tczajka
Copy link
Contributor

tczajka commented Oct 15, 2020

This will get fixed by #169.

@Moldoteck
Copy link
Author

This will get fixed by #169.

Thanks 👍

@seanyoung
Copy link

For display purposes, we often types we don't care about the exact value, but we want a visual representation of some sort.For example I'm using bigint in solang, and if the source code is:

contract C {
    uint public a = 0x42 << 0x1000;
}

The error message is:

error: value 68929666173268065441655678907297209250277640437127329505359409896740957926122792152022569587709838850498747043152991347307179846920262378079373581798190108344433160451884859890463864948869325408752252620782492450940837043400249041214374089166742716754789321974583044468704747216847828201554886533717469914441337972474197993872495771592557222079572243926566348973999761965241895651054178391269138225751793146952429238097003702352676784449795583594206767356612476729034555526070556182110034136368176157825790350818918680381187579004110590505254414061021345580607020527860296286314915887827214755800931314516484844723571229390274210314498301135085782169058787061833048081352669155166748440562902015274828003678438834758235744983872502940358864173884374362418421893050854849978562010362466242620085674985353278126304735758052063795187738140115584115393670010597080656489254414544708321337728396892217877211121649029592465037162750628525088209561398145517576740389232774117638701953386535741992764967317125237969417691252073510511769718651225882588926353299502641873735260637594404246088340952162161101156716117693793415175187367098516802584372524774806321669598085959288834521960432288439734685792771594710206781956091110750466608176562176 does not fit into type uint256.

How about a to_string_abbreviated() which prints something like:

  • 68929666173e500
  • 68929666173..(500 digits)..6608176562176
  • 68929666173..(abbreviated)..6608176562176

Not sure on the exact format - suggestions welcome.

@estebank
Copy link

Would the project consider accepting a PR that changes the BigInt type to be something along the lines of

pub struct BigInt<T: BigIntFormatting = Short> {
    sign: Sign,
    data: BigUint,
    formatter: PhantomData<T>,
}

In order to provide a "fast" default that prints normally only up until an acceptable size and uses scientific notation beyond that, but users still retain the ability to opt-into the current behavior relatively easily. I find this slightly better than .to_string_abbreviated() because it protects the unaware without overly restricting them.

All of this wouldn't be needed if Rust had custom formatting specifiers, but that's not the Rust we have today :)

@cuviper
Copy link
Member

cuviper commented Sep 27, 2023

I think a type parameter would be way too intrusive. I also think that defaulting to an abbreviated form would be a surprising change for the "unaware" on the other end of the spectrum.

@estebank
Copy link

@cuviper would the prior suggestion of a new method with conditional formatting predicated on a length check be acceptable instead?

@cuviper
Copy link
Member

cuviper commented Mar 5, 2024

Let's maybe start with an abbreviated form, and leave conditionals external for now. I fear that might still be somewhat slow to find the upper decimal digits, but maybe it can discard a lot with large divisors. I'm open to experimentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants