
Performance optimizations #53

Merged · 3 commits · Jun 11, 2020
Conversation

kolomenkin (Contributor) commented May 26, 2020

This is a draft PR.

I would like to discuss the following questions:

  1. The technical solution of caching character mappings for decoding: is it good enough?
  2. Should I keep the file for measuring algorithm speed? Should I change anything there?

The results of the speed measurements are stored in separate files inside the PR:

perf.encode.orig.txt - before optimization
perf.encode.from_bytes.txt - after optimization

perf.decode.orig.txt - before optimization
perf.decode.mapping.txt - after optimization

It was a surprise for me that the biggest impact was on short strings (25 bytes of entropy, as in a Bitcoin wallet address), while longer strings saw a smaller percentage improvement.

P.S. I'm going to remove the redundant files in the final PR: .editorconfig, .gitignore, and my naive additional unit tests. I will also clean up the code and write informative commit messages.

keis (Owner) commented May 27, 2020

Thanks!

At a quick glance the numbers look promising; I'll go through and add some review comments in the code. The benchmark code has already served its purpose, so there is no need to include it. As an aside, do you know of the timeit module from the standard library? IIRC it does pretty much what your measure function is doing.

The short vs. long string result is interesting; I'm guessing that for long strings the map lookup is a smaller part compared to the number crunching that follows.
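The file name perf.encode.from_bytes.txt suggests the encode speedup came from int.from_bytes. A self-contained timeit sketch (stand-in functions of my own, not the PR's actual code) comparing manual byte-by-byte accumulation against the builtin on a 25-byte payload:

```python
import os
import timeit

payload = os.urandom(25)  # roughly the entropy of a Bitcoin address

def to_int_loop(v: bytes) -> int:
    # pre-optimization style: accumulate the big integer byte by byte
    acc = 0
    for byte in v:
        acc = acc * 256 + byte
    return acc

def to_int_builtin(v: bytes) -> int:
    # post-optimization style: a single C-level conversion
    return int.from_bytes(v, 'big')

assert to_int_loop(payload) == to_int_builtin(payload)

t_loop = timeit.timeit(lambda: to_int_loop(payload), number=50_000)
t_builtin = timeit.timeit(lambda: to_int_builtin(payload), number=50_000)
print(f"loop: {t_loop:.3f}s  int.from_bytes: {t_builtin:.3f}s")
```

On short inputs the fixed Python-level loop overhead dominates the total cost, which is consistent with the biggest win being observed on short strings.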

# map = array.array('b', [alphabet.find(x) for x in range(256)])
# map = bytes(fix_position(alphabet.find(x)) for x in range(256))

map = GLOBAL_MAP.get(alphabet, None)
keis (Owner):
rather than dealing with caching in decode lets use functools.lru_cache to abstract this bit away

@lru_cache()
def get_map(alphabet: bytes):
    return bytes(...)

kolomenkin (Contributor) replied May 28, 2020:
Cool! I did not know about lru_cache. This also solves the potential problem of an oversized cache in case somebody uses numerous alphabets.
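A minimal sketch of the lru_cache approach discussed above; the helper name and the 255 "not found" sentinel follow the diff fragments in this thread, but the final merged code may differ:

```python
from functools import lru_cache

@lru_cache()
def _get_map(alphabet: bytes) -> bytes:
    # 256-entry reverse lookup table: byte value -> digit value,
    # with 255 marking bytes that are not in the alphabet
    return bytes(alphabet.index(b) if b in alphabet else 255
                 for b in range(256))

BITCOIN_ALPHABET = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZ' \
                   b'abcdefghijkmnopqrstuvwxyz'
digits = _get_map(BITCOIN_ALPHABET)
print(digits[ord('1')], digits[ord('z')], digits[ord('0')])  # 0 57 255
```

Because the cache is keyed by the alphabet, repeated decodes with the same alphabet hit the cache, and an application using many alphabets is bounded by lru_cache's maxsize (128 by default) rather than growing without limit.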

@@ -23,6 +24,16 @@
alphabet = BITCOIN_ALPHABET


def fix_position(position: int) -> int:
    return position if position >= 0 else 255
keis (Owner):
This could be written as position % 256 and inlined, which I think makes sense as it's about fitting the number into a byte.
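The % 256 trick works because bytes.find returns -1 on a miss and Python's modulo result is always non-negative, so the miss sentinel lands exactly on 255. A tiny check with a toy alphabet (illustration only, not the library's alphabet):

```python
# bytes.find returns -1 when the byte is absent; in Python, -1 % 256 == 255,
# so the "invalid" sentinel fits in a single byte with no branch needed.
toy_alphabet = b'123456789'
table = bytes(toy_alphabet.find(x) % 256 for x in range(256))
print(table[ord('1')], table[ord('9')], table[ord('A')])  # 0 8 255
```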


map = GLOBAL_MAP.get(alphabet, None)
if not map:
# map = array.array('b', [alphabet.find(x) for x in range(256)])
keis (Owner):
What's the consideration between array and bytes? Is there a difference?

kolomenkin (Contributor) replied May 28, 2020:
From an algorithmic point of view they are very similar for this task.
I found array to be a bit slower in my case, so I used bytes.
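A quick way to reproduce that comparison with timeit (my own micro-benchmark sketch, not the PR's measurement file); note the array typecode must be unsigned ('B') for the 255 sentinel to fit:

```python
import array
import timeit

alphabet = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
as_bytes = bytes(alphabet.find(x) % 256 for x in range(256))
as_array = array.array('B', as_bytes)  # 'B' = unsigned byte, holds 255

data = alphabet * 100  # stand-in for a string being decoded
t_bytes = timeit.timeit(lambda: [as_bytes[c] for c in data], number=2_000)
t_array = timeit.timeit(lambda: [as_array[c] for c in data], number=2_000)
print(f"bytes lookup: {t_bytes:.3f}s  array lookup: {t_array:.3f}s")
```

Both tables hold identical values; any speed difference comes purely from the indexing path of the two container types.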

kolomenkin (Contributor) commented May 28, 2020

I looked at timeit before writing the performance test, but I felt it would take as much effort to adapt it as to write the same thing manually. It also did not look very flexible.

I agree about the long-string optimization results. Big-number arithmetic takes most of the time in those cases, so everything else matters much less.

@keis keis mentioned this pull request Jun 5, 2020
keis (Owner) commented Jun 6, 2020

Hey @kolomenkin, did you have a chance to have another look at this? It would be really good to have this landed in master, and there's no real blocker.

kolomenkin (Contributor) replied:

> Hey @kolomenkin did you have a chance to have another look at this? It would be really good to have this landed in master and there's no real blocker

Sorry, I was busy.
Now I have updated and cleaned up the pull request.

(Two review threads on base58/__init__.py were marked outdated and resolved.)
@kolomenkin kolomenkin changed the title Performance optimizations - draft PR Performance optimizations Jun 11, 2020
@keis keis merged commit 13080e2 into keis:master Jun 11, 2020
keis (Owner) commented Jun 11, 2020

Everything looks good to me, merged 🥳

Thanks again for all the work you put into this!
