Skip to content
This repository has been archived by the owner on Mar 9, 2024. It is now read-only.

Poor performance of public_address() and public_view_key() #42

Closed
moneroexamples opened this issue Jan 29, 2019 · 14 comments
Closed

Poor performance of public_address() and public_view_key() #42

moneroexamples opened this issue Jan 29, 2019 · 14 comments

Comments

@moneroexamples
Copy link
Contributor

moneroexamples commented Jan 29, 2019

I'm using your project (good work by the way!) to generate hundreds/thousands of addresses and get their public address and private keys. But their generation is, unexpectedly, very slow. Some basic tests in python using timeit on Intel Core i7-3820 @ 3.8GHz:

In [1]: from monero import seed

In [2]: s = seed.Seed()

In [3]: %timeit s.public_address()
1.08 s ± 9.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit s.public_view_key()
544 ms ± 9.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %timeit s.public_spend_key()
540 ms ± 3.53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Any reason for this poor performance and any way/fix to make it faster? Generation of thousands addresses and keys on regular basis with that speed (basically one every 1 to 2 seconds) is slow.

@lalanza808
Copy link
Contributor

Can't offer much insight in the speed of the specific generation, but can you tell me how are you generating the hundreds/thousands of addresses? A simple for loop?

@moneroexamples
Copy link
Contributor Author

Hi. Very basic loop (for now):

    n = 1000
    for i in range(n):
        print(f"{i+1}/{n}")

        s = monero.seed.Seed()
        s_address = s.public_address(net_type)
        s_viewkey = s.public_view_key()
    
        # other processing of the generated address, viewkey, e.g.
        print(s_address, s_viewkey) 

The bottleneck are these two lines:

        s_address = s.public_address(net_type)
        s_viewkey = s.public_view_key()

@lalanza808
Copy link
Contributor

Even if the address generation took half the time, doing it hundreds or thousands of times per X will be time consuming without parallelism. This could be a good next step:

import concurrent.futures
from monero import seed
from time import time


# Times to loop
loop_count = 100


def generate_seed():
    s = seed.Seed()
    s_address = s.public_address()
    print(f"Generated new address: {s_address}")
    return s_address

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for i in range(loop_count):
            future = executor.submit(generate_seed)
            futures.append(future)


if __name__ == "__main__":
    t_start = time()
    main()
    t_end = time()
    t_total = t_end - t_start
    print(f"\n\n[+] {t_total} seconds elapsed generating {loop_count} addresses.")
$ python -V
Python 3.6.0

@lalanza808
Copy link
Contributor

Output of script:

... snip ...
[+] 30.121598958969116 seconds elapsed generating 100 addresses.

@moneroexamples
Copy link
Contributor Author

Thanks. I will look into it. Nevertheless, it still would be intresting to know why it is so slow. If not now, then then in the long term, it would be valuable.

@emesik
Copy link
Contributor

emesik commented Jan 29, 2019

It's slow because neither our ed25519 implementation is optimal, nor handling of its outputs is.

It's pure Python and it was designed to serve the most common use cases in the first row, namely having a working wallet with full features, not optimized for horizontal scaling. For example, the keys are internally kept as string hex representations because it's human-readable. However, for computations it would be much faster to keep the int (or even a point representation for public keys) and not convert (hexlify/unhexlify) each time.

The best solution would be to make some optimizations in Python code but also switch over to a low-level implementation of ed25519 arithmetic, like libsodium or orlp's. However, that requires creating Python/C bindings and dealing with all new set of problems typical to low-level solutions (building on different platforms, falling back to pure Python when lib is unavaliable, etc.)

Any help from a person experienced in this kind of work would be very welcome.

@moneroexamples
Copy link
Contributor Author

moneroexamples commented Jan 30, 2019

I looked into the code. I see the problem is that ed25519.public_from_secret_hex() takes 0.5 seconds, due to scalarmult taking 0.5 seconds (its a recursive function). At the moment I can't look into making it better, assuming there is a way of doing it better?. Could be good exercise for future.

But, I also found that ed25519.public_from_secret_hex() are execute twice, for no apparent reason. In this code example

 s = seed.Seed()
s.public_address()
s.public_spend_key()
s.public_view_key()

ed25519.public_from_secret_hex() is going to be executed 4 times, adding up to 2 seconds of execution time! The quick fix that I implemented for myself now, is to calculate it only when necessary and store the calculated value for future use. One could do this:

    def public_spend_key(self):
        if self.ed_pub_spend_key:       
            return self.ed_pub_spend_key    
        self.ed_pub_spend_key = ed25519.public_from_secret_hex(self.secret_spend_key())
        return self.ed_pub_spend_key    
    
    def public_view_key(self):
        if self.ed_pub_view_key:        
            return self.ed_pub_view_key     
        self.ed_pub_view_key =  ed25519.public_from_secret_hex(self.secret_view_key())
        return self.ed_pub_view_key 

where self.ed_pub_spend_key and self.ed_pub_view_key are initialized to None in the __init__. This will cut the execution number of ed25519.public_from_secret_hex to 2 in the code snippet from before, and reduce time by 50%.

@emesik
Copy link
Contributor

emesik commented Jan 30, 2019

Could you create a PR for that value caching? Just please prepend underscore to the attribute names (self._ed_*_key) so they won't tempt to be accessed directly. The existing tests should cover that code part already.

PS: I'm pretty sure that recursion in scalarmult can be unrolled. We should look at existing implementations and check how to do it, or just use one if possible.

@moneroexamples
Copy link
Contributor Author

@emesik
Sure. I will submit PR once ready.

Regarding scalarmult, I also checked how generation of addresses and keys perform on pypy3. Often its much faster then cpython when loops (recursions as well?) are involved. But it was worse. Guess is that some parts (pysha3) use C, and pypy does not perform well when C extensions are involved.

@emesik
Copy link
Contributor

emesik commented Jan 31, 2019

Thanks!

@moneroexamples
Copy link
Contributor Author

No problem.

@moneroexamples
Copy link
Contributor Author

Little updated.

Ended up using the address generator in multiprocessing.Pool to speed it up and also wrapped it in asyncio as all my code was asyncio. With these I could efficiently work with address generation.

As a side note. The issue originated from development of short asyncio program for stress-testing openmonero. The script utilizes monero-python and it is here is someone is interested in it:
https://github.com/moneroexamples/openmonero/blob/devel/scripts/batch_import_test.py

@sanderfoobar
Copy link

sanderfoobar commented May 18, 2019

I managed to go from seed->public address in 0.091552 seconds using wallet_api2 directly on a single thread and making sure major:minor lookahead was set to 1:1.

This is offtopic for monero-python however.

@emesik
Copy link
Contributor

emesik commented Nov 20, 2019

With 9aee656 I have changed the ed25519 implementation to one that cut the test suite running time by two orders of magnitude.

Integrating low-level C/C++ code from monero or libsodium could potentially cut the times further, however we still stay in pure Python world, so I stopped looking for other solutions ATM.

@emesik emesik closed this as completed Nov 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants