v0.4.0

@smu160 released this 07 Jun 21:43
· 15 commits to main since this release

What's Changed

  • feat: add DIT FFT algorithm & bit reversal control by @smu160 in #37
  • Fix build by @Shnatsel in #39
  • Require gfni target feature for the AVX-512 codepath by @Shnatsel in #41
  • Use a fast, non-secure PRNG in benchmarking harness by @Shnatsel in #43
  • Drop lto=true from Cargo.toml by @Shnatsel in #44
  • don't multiversion on ARM by @Shnatsel in #45
  • perf: add initial cache-blocked dit fft impl by @smu160 in #48
  • Use .as_chunks_mut() instead of .chunks_exact_mut() for better performance by @Shnatsel in #51
  • Benchmarks: Lower criterion sample size to 20 from its default of 100 by @Shnatsel in #52
  • Report throughput in bytes in addition to Melem/s by @Shnatsel in #53
  • multithreading via Rayon by @Shnatsel in #50
  • Move DIF-only function to DIF file by @Shnatsel in #54
  • No explicit inlining by @Shnatsel in #56
  • Use mul_add with a float inversion instead of mul_neg_add by @Shnatsel in #57
  • Revert "Use mul_add with a float inversion instead of mul_neg_add" by @Shnatsel in #59
  • Port to fearless_simd by @Shnatsel in #58
  • fearless_simd + BRAVO by @Shnatsel in #64
  • Turn x << 1 into x * 2 and add comments on all the other uses of << by @Shnatsel in #66
  • Refactor dit planner by @Shnatsel in #67
  • Determine SIMD level in the planner by @Shnatsel in #69
  • Update examples/bench.rs by @Shnatsel in #71
  • Less dynamic dispatch by @Shnatsel in #72
  • Switch to released fearless_simd 0.4.0 by @Shnatsel in #77
  • targeted conversion to a for loop for just enough inlining by @Shnatsel in #79
  • chore: Remove DIF and COBRA implementations by @smu160 in #82
  • Use RustFFT more efficiently in benchmarks for a fair comparison by @Shnatsel in #81
  • Simplify and optimize BRAVO by @Shnatsel in #83
  • Improve standalone benchmarks by @Shnatsel in #84
  • Extend large-size benchmarks to 32 bits by @Shnatsel in #85
  • Double the chunk size for f64 by @Shnatsel in #86
  • README updates by @Shnatsel in #70
  • Options: default to 16k parallelism threshold by @Shnatsel in #87
  • Make planner fields pub(crate) by @Shnatsel in #88
  • Move 'How is it so fast?' down in the README by @Shnatsel in #91
  • Remove wide and multiversion dependencies by @Shnatsel in #90
  • add a benchmark for planner by @Shnatsel in #76
  • Initial work on interleaving performance by @Shnatsel in #94
  • Make BRAVO operating on chunks more explicit by @Shnatsel in #96
  • perf: add a FFT 32 codelet to fuse first 5 stages by @smu160 in #101
  • perf: add tiling to bravo to turn it into co-bravo by @smu160 in #106
  • Speed up random signal generation by removing per-element sqrt and sin_cos by @Shnatsel in #109
  • Use std::hint::black_box for more accurate benchmarks by @Shnatsel in #110
  • Use std::hint::black_box for more accurate small-size benchmarks by @Shnatsel in #111
  • perf: remove extra pass used for inverse transforms by @smu160 in #114
  • PoC: Make the codelets operate entirely in registers by @Shnatsel in #113
  • fix: remove old C FFTW benches, improve methodology, and plots by @smu160 in #117
  • feat: add real valued FFT/IFFT by @smu160 in #105
  • Use interleave() instead of zip_low()/zip_high() for better perf on AVX2 by @Shnatsel in #116
  • release/v0.4.0 by @smu160 in #119

Full Changelog: v0.3.0...v0.4.0