Vectorize the ASCII check using SSE2 instructions by rhpvorderman · Pull Request #74 · marcelm/dnaio

rhpvorderman · 2022-04-15T08:20:48Z

SSE2 is guaranteed to be present on all AMD64 (x86_64) platforms, so a simple check for such a platform is sufficient to enable the instruction set without running into compile problems.

This increases the ASCII check speed from 20 GB/s to 50 GB/s, making the cost of creating our ASCII strings almost free.
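For readers who have not opened the diff, here is a minimal sketch of the general technique: OR all 16-byte blocks together and inspect the top bit of every byte once at the end. It is not the code from this PR. The name `sketch_is_ascii` and the `SKETCH_USE_SSE2` macro are made up for illustration, and the scalar fallback is an assumption so that the snippet also compiles on non-x86 platforms.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define SKETCH_USE_SSE2 1
#endif

/* Hypothetical helper (not dnaio's actual function): returns 1 if every
 * byte in buf[0..len) is ASCII (value < 0x80), otherwise 0. */
static int sketch_is_ascii(const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    unsigned char all_chars = 0;
#ifdef SKETCH_USE_SSE2
    __m128i all_words = _mm_setzero_si128();
    while (len >= 16) {
        /* Unaligned load; OR accumulates every bit seen so far. */
        all_words = _mm_or_si128(all_words,
                                 _mm_loadu_si128((const __m128i *)p));
        p += 16;
        len -= 16;
    }
    /* movemask gathers the most significant bit of each of the 16 bytes. */
    if (_mm_movemask_epi8(all_words)) {
        return 0;
    }
#endif
    /* Scalar loop for the tail (or for everything without SSE2). */
    while (len > 0) {
        all_chars |= *p++;
        len--;
    }
    return !(all_chars & 0x80);
}
```

The check is deferred to the end on purpose: the loop body is a load and an OR with no branch, which is what makes this kind of loop fast.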

rhpvorderman · 2022-04-19T13:03:28Z

Regarding your other point, I disagree. Querying a documented, pre-defined compiler macro is totally fine and in my opinion not worse than relying on platform.machine(), which is at the same level of "universalness", demonstrated by having to check for both "x86_64" and "AMD64".

At least with platform.machine there are only two options: x86_64 and AMD64. There are quite a lot of C compilers out there: GCC, Clang, MSVC, the Intel C compiler, the AMD Optimizing Compiler, etc. So there is definitely going to be more variety in pre-defined compiler macros. There is no standardization in this space at all, so I feel the platform.machine choice is safer.

Point taken for the "documented" part. The macros should be at least as stable as the platform.machine option.
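For comparison, the macro-based alternative discussed above could look roughly like the sketch below. It is not what the PR ended up doing. `SKETCH_HAVE_SSE2` and `sketch_have_sse2` are hypothetical names; the macros tested are the ones GCC and Clang (`__x86_64__`, `__amd64__`) and MSVC (`_M_X64`, `_M_AMD64`) predefine for x86-64 targets, and other compilers may need additional entries.

```c
/* Detect an x86-64 target from predefined compiler macros.
 * GCC/Clang define __x86_64__ and __amd64__; MSVC defines _M_X64 and
 * _M_AMD64. Any other compiler is an untested assumption here. */
#if defined(__x86_64__) || defined(__amd64__) || \
    defined(_M_X64) || defined(_M_AMD64)
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

/* Hypothetical accessor so the compile-time result can be inspected. */
static int sketch_have_sse2(void)
{
    return SKETCH_HAVE_SSE2;
}
```

The length of that `#if` condition is the point of the disagreement: the list of spellings depends on the compiler, whereas the setup.py check only has to match two strings.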

marcelm · 2022-04-20T08:54:48Z

+        n -= 1;
+    }
+    // Check the most significant bits in the accumulated words and chars.
+    return !(_mm_movemask_epi8(all_words) || (all_chars & ASCII_MASK_1BYTE));


Nice how the movemask instruction is such a good fit here.

It is a very useful instruction. The intrinsic compare functions set the most significant bit too (they set every bit of a byte for which the comparison holds). So if you compare one vector to another, you end up with a vector of bytes with the most significant bit set wherever the comparison is true. There is also a popcnt (POPCOUNT) instruction that simply reports the number of set bits. So you can use _mm_cmpeq_epi8 + _mm_movemask_epi8 + popcnt to count the matching bytes, and 16 minus that count is the Hamming distance between two vectors, in just a handful of instructions.
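A minimal sketch of that compare + movemask + popcount idea for two 16-byte sequences follows. The names `sketch_hamming16` and `sketch_popcount16` are made up for illustration. The bit count is done with a portable loop here, since the hardware popcnt instruction needs its own compiler flag; on GCC or Clang, `__builtin_popcount` would be the obvious replacement.

```c
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define SKETCH_USE_SSE2 1
#endif

/* Portable bit count; stands in for the popcnt instruction. */
static int sketch_popcount16(unsigned int x)
{
    int n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: number of positions at which the 16-byte
 * sequences a and b differ. Both must be at least 16 bytes long. */
static int sketch_hamming16(const char *a, const char *b)
{
#ifdef SKETCH_USE_SSE2
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* 0xFF in every byte where a and b are equal, 0x00 elsewhere. */
    __m128i eq = _mm_cmpeq_epi8(va, vb);
    /* One bit per byte: set where the bytes matched. */
    unsigned int equal_mask = (unsigned int)_mm_movemask_epi8(eq);
    return 16 - sketch_popcount16(equal_mask);
#else
    int i, d = 0;
    for (i = 0; i < 16; i++) {
        d += (a[i] != b[i]);
    }
    return d;
#endif
}
```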

There is also the blend family (for example _mm_blendv_epi8, which needs SSE4.1 rather than SSE2), where you create a new vector from two other vectors, based on a provided mask. Very useful, as this allows branchless programming while still using conditionals: create a mask with a compare function, calculate the two possible result vectors, then select based on the mask. They use this in minimap2 for the alignment algorithm. So that might be interesting for cutadapt.
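The mask-then-select pattern can be sketched as below, using an element-wise maximum of signed bytes as a toy conditional. To stay within SSE2, the select is written with and/andnot/or instead of a blend intrinsic; with SSE4.1 available, `_mm_blendv_epi8(vb, va, mask)` would do the same selection in one step. `sketch_max16` is a hypothetical name and has nothing to do with minimap2's or cutadapt's code.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define SKETCH_USE_SSE2 1
#endif

/* Hypothetical helper: out[i] = a[i] > b[i] ? a[i] : b[i] for 16 signed
 * bytes, computed without a branch per element. */
static void sketch_max16(const int8_t *a, const int8_t *b, int8_t *out)
{
#ifdef SKETCH_USE_SSE2
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* Step 1: the condition becomes a mask (0xFF where a > b). */
    __m128i mask = _mm_cmpgt_epi8(va, vb);
    /* Step 2: both candidate results already exist (va and vb).
     * Step 3: select, i.e. (mask & va) | (~mask & vb). */
    __m128i res = _mm_or_si128(_mm_and_si128(mask, va),
                               _mm_andnot_si128(mask, vb));
    _mm_storeu_si128((__m128i *)out, res);
#else
    int i;
    for (i = 0; i < 16; i++) {
        out[i] = a[i] > b[i] ? a[i] : b[i];
    }
#endif
}
```

In a real kernel the two candidate vectors would usually be the results of two different computations, which is where evaluating both and selecting afterwards pays off compared with a branch per element.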

marcelm · 2022-04-20T08:56:57Z

Looks good now – although I am a bit disappointed that the setup.py isn’t so nice and short anymore ...

Thanks!

rhpvorderman · 2022-04-20T10:42:00Z

Looks good now – although I am a bit disappointed that the setup.py isn’t so nice and short anymore ...

The sacrifices we make for a few % performance gains... What have we become?!

If it makes you feel better you can take a look at the python-isal setup.py ;-). Although that has become slightly less verbose with the move to a pure C extension.

Vectorize the ASCII check using SSE2 instructions

883b6b1

rhpvorderman requested a review from marcelm April 19, 2022 09:54

marcelm reviewed Apr 19, 2022

Comment thread src/dnaio/_core.pyx

marcelm reviewed Apr 19, 2022

Comment thread setup.py Outdated

Comment thread src/dnaio/_core.pyx

Simplify DEFINE_MACROS condition

a6ff28e

marcelm reviewed Apr 20, 2022

Remove outdated comment

44a69b9

marcelm merged commit 3f261a3 into marcelm:main Apr 20, 2022

rhpvorderman deleted the SSE2 branch April 20, 2022 10:31

marcelm merged 3 commits into marcelm:main from rhpvorderman:SSE2