Release v0.4.0 · smu160/PhastFT

What's Changed

feat: add DIT FFT algorithm & bit reversal control by @smu160 in #37
Fix build by @Shnatsel in #39
Require gfni target feature for the AVX-512 codepath by @Shnatsel in #41
Use a fast, non-secure PRNG in benchmarking harness by @Shnatsel in #43
Drop lto=true from Cargo.toml by @Shnatsel in #44
don't multiversion on ARM by @Shnatsel in #45
perf: add initial cache-blocked dit fft impl by @smu160 in #48
Use .as_chunks_mut() instead of .chunks_exact_mut() for better performance by @Shnatsel in #51
Benchmarks: Lower criterion sample size to 20 from its default of 100 by @Shnatsel in #52
Report throughput in bytes in addition to Melem/s by @Shnatsel in #53
multithreading via Rayon by @Shnatsel in #50
Move DIF-only function to DIF file by @Shnatsel in #54
No explicit inlining by @Shnatsel in #56
Use mul_add with a float inversion instead of mul_neg_add by @Shnatsel in #57
Revert "Use mul_add with a float inversion instead of mul_neg_add" by @Shnatsel in #59
Port to fearless_simd by @Shnatsel in #58
fearless_simd + BRAVO by @Shnatsel in #64
Turn x << 1 into x * 2 and add comments on all the other uses of << by @Shnatsel in #66
Refactor dit planner by @Shnatsel in #67
Determine SIMD level in the planner by @Shnatsel in #69
Update examples/bench.rs by @Shnatsel in #71
Less dynamic dispatch by @Shnatsel in #72
Switch to released fearless_simd 0.4.0 by @Shnatsel in #77
targeted conversion to a for loop for just enough inlining by @Shnatsel in #79
chore: Remove DIF and COBRA implementations by @smu160 in #82
Use RustFFT more efficiently in benchmarks for a fair comparison by @Shnatsel in #81
Simplify and optimize BRAVO by @Shnatsel in #83
Improve standalone benchmarks by @Shnatsel in #84
Extend large-size benchmarks to 32 bits by @Shnatsel in #85
Double the chunk size for f64 by @Shnatsel in #86
README updates by @Shnatsel in #70
Options: default to 16k parallelism threshold by @Shnatsel in #87
Make planner fields pub(crate) by @Shnatsel in #88
Move 'How is it so fast?' down in the README by @Shnatsel in #91
Remove wide and multiversion dependencies by @Shnatsel in #90
add a benchmark for planner by @Shnatsel in #76
Initial work on interleaving performance by @Shnatsel in #94
Make BRAVO operating on chunks more explicit by @Shnatsel in #96
perf: add a FFT 32 codelet to fuse first 5 stages by @smu160 in #101
perf: add tiling to bravo to turn it into co-bravo by @smu160 in #106
Speed up random signal generation by removing per-element sqrt and sin_cos by @Shnatsel in #109
Use std::hint::black_box for more accurate benchmarks by @Shnatsel in #110
Use std::hint::black_box for more accurate small-size benchmarks by @Shnatsel in #111
perf: remove extra pass used for inverse transforms by @smu160 in #114
PoC: Make the codelets operate entirely in registers by @Shnatsel in #113
fix: remove old C FFTW benches, improve methodology, and plots by @smu160 in #117
feat: add real valued FFT/IFFT by @smu160 in #105
Use interleave() instead of zip_low()/zip_high() for better perf on AVX2 by @Shnatsel in #116
release/v0.4.0 by @smu160 in #119

Full Changelog: v0.3.0...v0.4.0
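
For readers unfamiliar with the terms in #37, the following is a minimal textbook sketch of a radix-2 decimation-in-time (DIT) FFT preceded by a bit-reversal permutation. It is not PhastFT's implementation: the function names `bit_reverse_permute` and `fft_dit` are hypothetical, the separate real/imaginary slices are an assumption made for illustration, and none of the SIMD, cache-blocking, codelet, or BRAVO work listed above is reflected here.

```rust
use std::f64::consts::PI;

/// Hypothetical helper: reorder both slices in place so that the element at
/// index i moves to the index obtained by reversing the low log2(n) bits of i.
fn bit_reverse_permute(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    let bits = n.trailing_zeros(); // log2(n) when n is a power of two
    if bits == 0 {
        return; // n == 1: nothing to reorder
    }
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j {
            // Swap each pair once.
            re.swap(i, j);
            im.swap(i, j);
        }
    }
}

/// Hypothetical in-place forward radix-2 DIT FFT (unnormalized).
/// Assumes the input length is a power of two.
fn fft_dit(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    assert_eq!(n, im.len(), "real and imaginary parts must match in length");
    assert!(n.is_power_of_two(), "length must be a power of two");

    // DIT consumes its input in bit-reversed order and produces
    // naturally ordered output.
    bit_reverse_permute(re, im);

    let mut len = 2; // size of the sub-transforms merged in this stage
    while len <= n {
        let half = len / 2;
        let step = -2.0 * PI / len as f64;
        for start in (0..n).step_by(len) {
            for k in 0..half {
                // Twiddle factor w = exp(-2*pi*i*k/len).
                let (w_im, w_re) = (step * k as f64).sin_cos();
                let (a, b) = (start + k, start + k + half);
                // Butterfly: (x_a, x_b) -> (x_a + w*x_b, x_a - w*x_b).
                let t_re = re[b] * w_re - im[b] * w_im;
                let t_im = re[b] * w_im + im[b] * w_re;
                re[b] = re[a] - t_re;
                im[b] = im[a] - t_im;
                re[a] += t_re;
                im[a] += t_im;
            }
        }
        len *= 2;
    }
}
```

Calling `fft_dit` on a unit impulse of length 8 should yield a flat spectrum (every real part 1, every imaginary part 0, up to rounding). The sketch recomputes each twiddle factor on the fly for brevity; a production implementation would typically precompute them.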
