optimize NEON functions required for libjpeg-turbo #646

Open
30 tasks done
nemequ opened this issue Nov 23, 2020 · 7 comments

Comments

@nemequ
Member

nemequ commented Nov 23, 2020

@kleisauke is trying to get libjpeg-turbo working on WASM using SIMDe. Here is a list of functions which aren't implemented yet:

  • vaddhn_s32
  • vld1q_dup_s16
  • vld1q_lane_s16
  • vld2_u8
  • vmlal_lane_s16
  • vmlal_lane_u16
  • vmlsl_lane_s16
  • vmlsl_lane_u16
  • vmull_lane_s16
  • vmull_lane_u16
  • vqdmulh_lane_s16
  • vqdmulhq_lane_s16
  • vqrdmulhq_lane_s16
  • vqrshrn_n_s16
  • vqshluq_n_s16
  • vqshrn_n_s16
  • vrshrn_n_s32
  • vrshrn_n_u16
  • vrshrn_n_u32
  • vshll_n_s16
  • vshrn_n_s32
  • vshrn_n_u16
  • vshrn_n_u32
  • vsriq_n_u16
  • vst2_lane_u16
  • vst2q_u8
  • vst3_lane_u8
  • vst4_lane_u16
  • vst4_lane_u8
  • vsubhn_s32
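
For readers unfamiliar with the naming scheme, here is a rough scalar sketch of what two of the listed intrinsics compute per lane. It is only an illustration under stated assumptions: the `ref_` function names are hypothetical, plain arrays stand in for the NEON vector types, and this is not SIMDe's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vaddhn_s32: add two vectors of four
 * 32-bit lanes, then keep only the high 16 bits of each 32-bit sum. */
static void ref_vaddhn_s32(int16_t r[4], const int32_t a[4], const int32_t b[4]) {
  for (size_t i = 0; i < 4; i++) {
    /* Add as unsigned so the sum wraps modulo 2^32 without signed overflow. */
    uint32_t sum = (uint32_t)a[i] + (uint32_t)b[i];
    /* Assumes the usual two's-complement conversion back to int16_t. */
    r[i] = (int16_t)(uint16_t)(sum >> 16);
  }
}

/* Hypothetical scalar model of vrshrn_n_s32: rounding shift right by n
 * (1 <= n <= 16), then truncate each lane to 16 bits. */
static void ref_vrshrn_n_s32(int16_t r[4], const int32_t a[4], int n) {
  for (size_t i = 0; i < 4; i++) {
    /* Add half of the divisor before shifting; widen so the add cannot overflow. */
    int64_t rounded = (int64_t)a[i] + ((int64_t)1 << (n - 1));
    /* Assumes >> on a negative value is an arithmetic shift, as on mainstream compilers. */
    r[i] = (int16_t)(uint16_t)(rounded >> n);
  }
}
```

For example, under this model `ref_vaddhn_s32` maps the lane pair `0x00010000 + 0x00010000` to `2`, and `ref_vrshrn_n_s32` with `n = 4` maps `24` to `2`.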
@kleisauke

kleisauke commented Jun 3, 2021

Here's a list of completed functions with their corresponding commits:

@nemequ
Member Author

nemequ commented Jun 3, 2021

Thanks for the reminder! I added some more earlier today, and we'll try to get that last one done soon; I think @Glitch18 is planning to take care of it.

@Glitch18
Contributor

Glitch18 commented Jun 3, 2021

Yes. Will be pushing the commit soon!

@nemequ
Member Author

nemequ commented Jun 3, 2021

BTW, once this is done I'd be very interested in any performance data which could point us to something we might be able to optimize in SIMDe. See https://github.com/simd-everywhere/simde/wiki/Performance-Tuning#finding-performance-problems

@kleisauke

kleisauke commented Jun 3, 2021

Great, thanks! I'll re-run the benchmark in test/bench within Chrome/Firefox once this is done. For Node.js, this requires an update of V8 to 9.1 (nodejs/node#38273) to match the renumbered/finalized WASM SIMD opcodes.

@kleisauke

vqshluq_n_s16 was implemented with commit 77af9f1, which makes it possible to compile libjpeg-turbo for WebAssembly with SIMD support (by reusing the Arm Neon intrinsics, see commit kleisauke/wasm-vips@acd4c81). 🎉

I'll re-run the benchmarks and post the results soon. Feel free to close this issue.

@kleisauke

First set of benchmarking/profiling results can be found here: test/bench/README-simde.md.

It seems that reusing the Arm Neon intrinsics for WASM made it ~3.5x slower than its C implementation (on this benchmark). The highest tick counts (>= 10) were observed in these functions (ordered from high to low):

  • simde_vshlq_u16
  • simde_vld3_u8
  • simde_vld4q_s16
  • simde_vclzq_s16
  • simde_vld1q_lane_s16
  • simde_vtrn1q_s32
  • simde_vtrn2q_s32
  • simde_vtrn1q_s16
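
As background on the top entry, here is a scalar sketch of the per-lane semantics of `vshlq_u16`, assuming the Arm-documented behaviour that each lane is shifted by the signed low byte of the matching lane in the second operand (negative counts shift right). The `ref_vshlq_u16` name is hypothetical and this is not SIMDe's code. WASM SIMD's `i16x8.shl` applies one scalar shift count to all lanes, which may be part of why a per-lane variable shift is costly to emulate there; the profile itself does not establish that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vshlq_u16 (eight 16-bit lanes). */
static void ref_vshlq_u16(uint16_t r[8], const uint16_t a[8], const int16_t b[8]) {
  for (size_t i = 0; i < 8; i++) {
    /* Only the low byte of each shift lane is used, read as a signed value. */
    int s = b[i] & 0xff;
    if (s >= 128) s -= 256;

    if (s >= 16 || s <= -16) {
      r[i] = 0;                                /* every bit is shifted out */
    } else if (s >= 0) {
      r[i] = (uint16_t)((uint32_t)a[i] << s);  /* left shift, truncated to 16 bits */
    } else {
      r[i] = (uint16_t)(a[i] >> -s);           /* negative count: logical right shift */
    }
  }
}
```

Because every lane carries its own count, a fallback has to either loop over lanes or combine several uniform shifts with masks, whereas a shift by a single constant maps directly to one instruction.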

Note that libjpeg-turbo is considering a whole new SIMD implementation just for WASM, so please don't spend too much time on this.

@mr-c changed the title from "NEON functions required for libjpeg-turbo" to "optimize NEON functions required for libjpeg-turbo" on Aug 17, 2021