[mypyc] Add efficient librt.base64.b64decode #20263

JukkaL · 2025-11-19T16:06:49Z

Decoding can be 10x faster than stdlib if the input is valid base64, or if the input has extra non-base64 characters only at the end. Similar to the base64 encode implementation I added recently, this uses SIMD instructions when available.

The implementation first tries to decode the input optimistically, assuming valid base64. If this fails, we fall back to a slow path: a preprocessing step removes extra characters, and then we perform a strict base64 decode on the cleaned-up input.
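As a rough illustration of the two-phase approach described above, here is a minimal scalar sketch. It is not the code from this PR: the names `b64_value`, `strict_decode` and `decode_sketch` are hypothetical, there is no SIMD, and the fast path here is simpler than the real one (it does not tolerate trailing junk, and the exact strictness rules are an assumption).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Map a base64 alphabet character to its 6-bit value, or -1 otherwise. */
static int b64_value(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Strict decode: length must be a multiple of 4 and '=' may only appear
   as one or two trailing padding characters. Returns the number of bytes
   written, or -1 on invalid input. out needs room for (len / 4) * 3 bytes. */
static long strict_decode(const unsigned char *src, size_t len,
                          unsigned char *out) {
    if (len % 4 != 0) return -1;
    size_t pad = 0;
    if (len >= 4 && src[len - 1] == '=') {
        pad = (src[len - 2] == '=') ? 2 : 1;
    }
    size_t n = 0;
    for (size_t i = 0; i < len; i += 4) {
        uint32_t acc = 0;
        /* Only the final group of four may contain padding. */
        size_t quad_pad = (i + 4 == len) ? pad : 0;
        for (size_t j = 0; j < 4; j++) {
            int v = (j >= 4 - quad_pad) ? 0 : b64_value(src[i + j]);
            if (v < 0) return -1;
            acc = (acc << 6) | (uint32_t)v;
        }
        out[n++] = (unsigned char)(acc >> 16);
        if (quad_pad < 2) out[n++] = (unsigned char)(acc >> 8);
        if (quad_pad < 1) out[n++] = (unsigned char)acc;
    }
    return (long)n;
}

/* Fast path first; on failure, filter out extra characters and retry.
   out needs room for (len / 4) * 3 bytes. Returns -1 on error. */
long decode_sketch(const unsigned char *src, size_t len, unsigned char *out) {
    long n = strict_decode(src, len, out);  /* optimistic fast path */
    if (n >= 0) return n;

    /* Slow path: keep only alphabet characters and '='. */
    unsigned char *clean = malloc(len ? len : 1);
    if (clean == NULL) return -1;
    size_t m = 0;
    for (size_t i = 0; i < len; i++) {
        if (b64_value(src[i]) >= 0 || src[i] == '=') clean[m++] = src[i];
    }
    n = strict_decode(clean, m, out);
    free(clean);
    return n;
}
```

With this sketch, `"TWFu"` decodes on the fast path, while `"TW\nFu"` fails the fast path (length 5), gets filtered to `"TWFu"`, and decodes on the second attempt. The cost of the slow path is one extra pass over the input plus a temporary buffer.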

The semantics aren't 100% compatible with stdlib. First, we raise ValueError on invalid padding instead of binascii.Error, since I don't want a runtime dependency on the unrelated binascii module. This needs to be documented, but stdlib can already raise ValueError on other conditions, so the deviation is not huge. Also, some invalid inputs are checked more strictly for padding violations. The stdlib implementation has some mysterious behaviors with invalid inputs that didn't seem worth replicating.

The function only accepts a single ASCII str or bytes argument for now, since that seems to be by far the most common use case. The stdlib function also accepts buffer objects and a validate argument.

The slow path is still somewhat faster than stdlib (on the order of 1.3x to 2x for longer inputs), at least if the input is much smaller than L1 cache size.

Got the initial fast path implementation from ChatGPT, but did a bunch of manual edits afterwards and reviewed carefully.

p-sawicki · 2025-11-19T16:12:18Z

mypyc/lib-rt/librt_base64.c

+    return ((c >= 'A' && c <= 'Z') | (c >= 'a' && c <= 'z') |
+            (c >= '0' && c <= '9') | (c == '+') | (c == '/') | (allow_padding && c == '='));


is there a reason to mix logical && and bitwise |?

No particularly good reason, mainly to highlight that these don't need branches at runtime (we don't want many mispredicted branches). I think this was from ChatGPT output and I thought it was okay in this use case. The semantics are identical, so it's probably better to use logical operators consistently though.

I did some experiments, and using || might help compilers generate faster code, so I will use it.
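For reference, a sketch of what the predicate might look like with `||` used consistently. The function name and the parameter types are assumptions, since the quoted diff shows only the return expression; `c` and `allow_padding` come from the diff.

```c
#include <stdbool.h>

/* Hypothetical name and signature; only the expression is from the diff.
   Each comparison yields 0 or 1 and has no side effects, so replacing
   '|' with '||' keeps the result the same. */
static inline bool is_base64_char(unsigned char c, bool allow_padding) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/' ||
           (allow_padding && c == '=');
}
```

Because the operands are side-effect free, the compiler remains free to evaluate them without branches even though `||` formally short-circuits.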

github-actions · 2025-11-19T17:07:53Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

JukkaL added 11 commits November 19, 2025 10:58

Add b64decode and some test coverage (not full)

1336ada

Update stub

b7643e6

Improve tests

e4501f9

Accept ascii str arguments

3655416

Update b64decode stub

aeca40a

Filter out invalid base64 characters

836c5f3

Check for invalid padding

cc45fde

Test more

62666f4

Clean up implementation

65fd305

Update docstrings

6349a39

Fix typo

3d09104

p-sawicki approved these changes Nov 19, 2025

Refactor function based on review comment

ca42d33

JukkaL merged commit 35e843c into master Nov 19, 2025
21 checks passed

JukkaL deleted the mypyc-base64-4-decode branch November 19, 2025 17:18

JukkaL added a commit that referenced this pull request Nov 20, 2025

[mypyc] Add primitive for librt.base64.b64decode

5f8faca

Forgot to add this in #20263.

JukkaL mentioned this pull request Nov 20, 2025

[mypyc] Add primitive for librt.base64.b64decode #20272

Merged

JukkaL added a commit that referenced this pull request Nov 20, 2025

[mypyc] Add primitive for librt.base64.b64decode (#20272)

89782cc

Forgot to add this in #20263.
