-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add minlen feature to base62.encode() #7
Conversation
And there's no performance penalty.
return ord(ch) - ord('a') + 36 | ||
else: | ||
try: | ||
return CHARSET.index(ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
이거 좋네요.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
처음엔 문자열 스캔을 없애 성능을 최적화하려고 했었는데, ord()
의 성능이 불투명하기도 하고, 차이가 거의 없을 것 같기도 해서, 차라리 문자열 스캔을 편하게 해버리는 쪽으로 방향을 틀었습니다 ㅎㅎ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
만약 성능이 걱정된다면 다음과 같이 상수 시간으로 접근할 수 있는 자료구조를 만드는 편이 좋을 것 같습니다.
CHARSET_INDEX = {'0': 0, '1': 1, ..., 'A': 10, ..., 'z': 61}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
이흥섭님의 의견:
REVERSE_CHARSET = {v: k for k, v in enumerate(CHARSET)}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ㄴ 저도 이것이 더 좋아보입니다.
@@ -31,22 +31,24 @@ def bytes_to_int(s, byteorder='big', signed=False): | |||
return sum(ds) | |||
|
|||
|
|||
def encode(n): | |||
def encode(n, minlen=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
패딩의 길이가 아니라 인코딩 된 문자열의 최소 길이를 지정하는 인자니까 기본값이 1
이 되어야 하지 않을까요?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ㄴ 문자열의 길이가 0
일 수 있어서요.
@@ -15,6 +15,7 @@ | |||
|
|||
def test_basic(): | |||
assert base62.encode(0) == '0' | |||
assert base62.encode(0, minlen=5) == '00000' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
이것과 비슷하게 decode()
쪽에도 테스트 케이스를 추가했으면 좋겠습니다. 예를 들면,
assert base62.decode('00000') == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
반영하여 추가했습니다.
return ord(ch) - ord('a') + 36 | ||
else: | ||
try: | ||
return CHARSET.index(ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
만약 성능이 걱정된다면 다음과 같이 상수 시간으로 접근할 수 있는 자료구조를 만드는 편이 좋을 것 같습니다.
CHARSET_INDEX = {'0': 0, '1': 1, ..., 'A': 10, ..., 'z': 61}
return ord(ch) - ord('a') + 36 | ||
else: | ||
try: | ||
return CHARSET.index(ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
이흥섭님의 의견:
REVERSE_CHARSET = {v: k for k, v in enumerate(CHARSET)}
This PR will add minlen feature. In other words, it does zero padding for encode output string.
This implementation (and naming) is copied from the famous and widely used python bitcoin library
pybitcointools :
https://github.com/vbuterin/pybitcointools/blob/a82b00686b51677b047098e8968074a783e054a1/bitcoin/py2specials.py