Experiment with instruction level parallelism in pixelsearch1x.c #40

Open: wants to merge 497 commits into base: master

@wind0204 (Contributor) commented Jan 2, 2024

Please check whether this version performs better than the non-parallel version, if it interests you.

  • Dispatch 4 vector operations per loop iteration in pixelsearch1x.c to allow higher throughput. (I guess a CPU with a decode width of 5 or more could reach the same throughput with just 2 vector operations per iteration.)
  • MOVMSKPS has twice the throughput of PMOVMSKB on AMD Zen 2. (I guess this might relieve the bottleneck on Zen 2.)

Best regards.

iseahound and others added 27 commits October 4, 2023 20:42
Finalize ImageSearch1 code to be efficient and bug-free.
Supports transparency properly.
Fast!
…search_loop_could_go_out_of_range

Fix bug where focus search loop could go out of range
Rename pack to iter
Fix comment for pixelsearchall3
…canline_was_skipped

Fix bug where the last scanline was skipped
Labels: None yet
Projects: None yet
Linked issues: None yet

3 participants