Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Adds missing extract implementations for AVX512. #2926

Merged
merged 1 commit into from Jan 25, 2022

Conversation

rrahn
Copy link
Contributor

@rrahn rrahn commented Jan 7, 2022

This PR adds the missing implementations for the family of extract functions specific to the AVX512 instruction set, as they are much faster then the default implementation.

A bit on the background:
These functions allow to extract only a part (half, i.e. 2x256 bit, quarter, i.e. 4x128 bit, or eighth, i.e. 8x64 bit) of the given simd vector and are used in combination with the upcast function in order to sign/zero extend a simd vector to a set of simd vectors with larger operand types, e.g. one vint_8x64_t vector to 2 vint_16x32_t vectors, with 8 (16) being the bit-size of the operands and 64 (32) being the number of operands that fit into a single __m512i vector type.

The extracted bytes will be placed in the lower bits of the target simd vector. There is no dedicated intrinsic to extract only 64 bits, so I had to emulate it with the intrinsic to extract 128 bit and using an additional shuffle operation for the uneven indices. This shuffle operation exchanges the higher 64 bits with the lower 64 bits of each of the four 128 bit lanes of the source simd vector. When upcasting this vector, only the first 64 bits are considered (Note this is only needed when going from 8-bit operands to 64-bit operands)

@rrahn rrahn requested review from a team and remyschwab and removed request for a team January 7, 2022 14:47
@vercel
Copy link

vercel bot commented Jan 7, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/seqan/seqan3/35cna2ZNEWZn9HFawYVQjC8ckpJn
✅ Preview: https://seqan3-git-fork-rrahn-feature-avx512extract-seqan.vercel.app

@codecov
Copy link

codecov bot commented Jan 7, 2022

Codecov Report

Merging #2926 (c7b45d8) into master (0f5fb9d) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2926   +/-   ##
=======================================
  Coverage   98.28%   98.28%           
=======================================
  Files         267      267           
  Lines       11462    11462           
=======================================
  Hits        11265    11265           
  Misses        197      197           
Impacted Files Coverage Δ
include/seqan3/utility/simd/algorithm.hpp 100.00% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f5fb9d...c7b45d8. Read the comment docs.

@rrahn rrahn requested review from a team and smehringer and removed request for a team January 20, 2022 14:39
Copy link
Member

@smehringer smehringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about tests for the new functions?

Comment on lines +263 to +265
#if defined(__AVX512DQ__)
template <uint8_t index, simd::simd_concept simd_t>
constexpr simd_t extract_quarter_avx512(simd_t const & src);
constexpr simd_t extract_quarter_avx512(simd_t const & src)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are extract_quarter/eighth_avx512 wrapped in #if defined(__AVX512DQ__) but extract_half_avx512 isn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests existed already (for SSE and AVX2). I ran the tests on icebear with AVX512 support to check the results.

The additional check is because the extract quarter requires an intrinsic that is only available by the AVX-512 CD subset. Not all AVX512 platforms might have this additional intrinsics subset, but still you can use other AVX512 intrinsics. The extract half doesn't need it because it only works with AVX512-F which is the foundational intrinsics set used by all that support AVX512.

@smehringer smehringer merged commit e1728f5 into seqan:master Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants