Performance optimizations #53
Thanks! At a quick glance the numbers look promising; I'll go through and add some review comments in the code. The benchmark code has already served its purpose, so there's no need to include it. As an aside, do you know the timeit module from the standard library? IIRC it does pretty much what your measure function is doing. The short vs. long string result is interesting. I'm guessing that for long strings the map lookup is a smaller part of the work compared to the number crunching that follows.
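For reference, a minimal sketch of how the standard-library timeit module could stand in for a hand-written measure function. The helper name `best_of` and the stand-in workload `work` are hypothetical and are not part of this PR; the 25-byte and 1000-byte payloads are only illustrative sizes for "short" and "long" inputs.

```python
import timeit


def work(data: bytes) -> int:
    # Stand-in workload (an assumption): replace with the call being benchmarked.
    return int.from_bytes(data, 'big')


def best_of(func, number: int = 1000, repeat: int = 3) -> float:
    # timeit.repeat returns one total time per repetition;
    # the minimum is the least noisy estimate.
    return min(timeit.repeat(func, number=number, repeat=repeat))


for label, payload in (('short', b'\x01' * 25), ('long', b'\x01' * 1000)):
    seconds = best_of(lambda: work(payload))
    print(f'{label}: {seconds:.6f} s per 1000 calls')
```

Taking the minimum over several repetitions is the approach the timeit documentation recommends, since higher values mostly reflect interference from other processes.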
base58/__init__.py
# map = array.array('b', [alphabet.find(x) for x in range(256)])
# map = bytes(fix_position(alphabet.find(x)) for x in range(256))

map = GLOBAL_MAP.get(alphabet, None)
Rather than dealing with caching in decode, let's use functools.lru_cache
to abstract this bit away:
from functools import lru_cache

@lru_cache  # the bare decorator form needs Python 3.8+
def get_map(alphabet: bytes):
    return bytes(...)
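A slightly fuller sketch of this suggestion, under stated assumptions: the table body reuses the `alphabet.find(x)` expression from the diff, out-of-alphabet bytes are mapped to 255, and `maxsize=16` is an arbitrary illustrative bound. It is not necessarily the code that was merged.

```python
from functools import lru_cache

BITCOIN_ALPHABET = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


@lru_cache(maxsize=16)  # bounded, so many distinct alphabets cannot grow the cache forever
def get_map(alphabet: bytes) -> bytes:
    # 256-entry table: byte value -> digit index.
    # bytes.find returns -1 for a missing byte; -1 % 256 == 255 marks "invalid".
    return bytes(alphabet.find(x) % 256 for x in range(256))


table = get_map(BITCOIN_ALPHABET)
print(table[ord('1')], table[ord('z')], table[ord('0')])
print(get_map.cache_info())
```

Because `bytes` objects are hashable, the alphabet can be used directly as the cache key, and repeated calls with the same alphabet return the already-built table.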
Cool! I did not know about lru_cache. This also solves the potential problem of the cache growing too large if somebody uses numerous alphabets.
base58/__init__.py
@@ -23,6 +24,16 @@
alphabet = BITCOIN_ALPHABET


def fix_position(position: int) -> int:
    return position if position >= 0 else 255
This could be written as position % 256
and inlined, which I think makes sense, as it's about fitting the number into a byte.
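A quick check of that equivalence, as a sketch: `find` only ever returns -1 or a valid index, and Python's `%` always yields a non-negative result for a positive modulus, so the two forms agree over that whole range.

```python
def fix_position(position: int) -> int:
    # Version from the diff under review.
    return position if position >= 0 else 255


# Possible inputs here: -1 (not found) or an index that already fits in a byte.
candidates = range(-1, 256)
print(all(fix_position(p) == p % 256 for p in candidates))
print(-1 % 256)
```

The two forms differ only for values below -1, which `find` never produces.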
base58/__init__.py
map = GLOBAL_MAP.get(alphabet, None)
if not map:
    # map = array.array('b', [alphabet.find(x) for x in range(256)])
what's the consideration between array and bytes? is there a difference?
From an algorithmic point of view they are very similar for this task.
I found array to be a bit slower in my case, so I used bytes.
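To make the comparison concrete, here is a sketch of both table types built from the expressions in the diff. One practical difference is the value range: `array.array('b', ...)` stores signed chars, so the -1 from `find` fits unchanged, while `bytes` only accepts 0..255 and needs the sentinel remapped. The speed difference mentioned above is not measured here.

```python
import array

ALPHABET = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

# Signed-char array: "not found" can stay as -1.
table_array = array.array('b', [ALPHABET.find(x) for x in range(256)])

# bytes: every entry must be in 0..255, so -1 is remapped to 255.
table_bytes = bytes(ALPHABET.find(x) % 256 for x in range(256))

# Both tables agree on every byte that belongs to the alphabet.
print(all(table_array[c] == table_bytes[c] for c in ALPHABET))
print(table_array[ord('0')], table_bytes[ord('0')])
```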
I looked at timeit before writing the performance test, but I felt that adopting it would take about as much effort as writing the same thing manually, and it did not look very flexible. I agree about the results for long strings: big-number arithmetic takes most of the time in those cases, and everything else matters much less.
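To illustrate the two kinds of work being compared, here is a sketch of a table-based decode. The function name `decode_sketch` is hypothetical and this is not the library's actual implementation. The table lookup costs the same for every character, while the `acc * base + digit` step operates on an integer that keeps growing with the input length.

```python
BITCOIN_ALPHABET = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


def decode_sketch(data: bytes, alphabet: bytes = BITCOIN_ALPHABET) -> bytes:
    base = len(alphabet)
    # Lookup table: byte value -> digit index, 255 for bytes outside the alphabet.
    table = bytes(alphabet.find(x) % 256 for x in range(256))

    acc = 0
    for ch in data:
        digit = table[ch]            # constant-time lookup
        if digit == 255:
            raise ValueError(f'invalid character: {bytes([ch])!r}')
        acc = acc * base + digit     # big-integer arithmetic, grows with input

    # Each leading "zero digit" character encodes one leading zero byte.
    zeros = len(data) - len(data.lstrip(alphabet[0:1]))
    body = acc.to_bytes((acc.bit_length() + 7) // 8, 'big')
    return b'\x00' * zeros + body


print(decode_sketch(b'2g'))
```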
Hey @kolomenkin, did you have a chance to have another look at this? It would be really good to have this landed in master, and there's no real blocker.
6af672b to f6e3f21
Sorry, I was busy.
f6e3f21 to eecbb08
eecbb08 to 8cd2773
8cd2773 to 4e283f0
Everything looks good to me, merged 🥳 Thanks again for all the work you put into this!
This is a draft PR.
I would like to discuss the following questions:
The results of the speed measurements are stored in separate files inside the PR:
perf.encode.orig.txt - before optimization
perf.encode.from_bytes.txt - after optimization
perf.decode.orig.txt - before optimization
perf.decode.mapping.txt - after optimization
It was a surprise to me that the biggest impact was on short strings (25 bytes of entropy, as in a Bitcoin wallet address). Longer strings gained less in percentage terms.
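The file name perf.encode.from_bytes.txt suggests that the encode side was optimized with `int.from_bytes`. Assuming that, a minimal sketch of such an encoder follows; the name `encode_sketch` is hypothetical and this is not necessarily the code in the PR.

```python
BITCOIN_ALPHABET = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


def encode_sketch(data: bytes, alphabet: bytes = BITCOIN_ALPHABET) -> bytes:
    base = len(alphabet)
    # Leading zero bytes are encoded separately, one "zero digit" each.
    zeros = len(data) - len(data.lstrip(b'\x00'))

    # One C-level call converts the whole input to an integer,
    # instead of accumulating it byte by byte in a Python loop.
    acc = int.from_bytes(data, 'big')

    out = bytearray()
    while acc:
        acc, rem = divmod(acc, base)
        out.append(alphabet[rem])    # digits come out least significant first
    out.extend(alphabet[0:1] * zeros)
    out.reverse()
    return bytes(out)


print(encode_sketch(b'a'))
```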
P.S. I'm going to remove the redundant files in the final PR: editorconfig, gitignore, and my naive additional unit tests. I will also clean up the code and write informative commit messages.