Releases: smu160/PhastFT
v0.4.0
What's Changed
- feat: add DIT FFT algorithm & bit reversal control by @smu160 in #37
- Fix build by @Shnatsel in #39
- Require gfni target feature for the AVX-512 codepath by @Shnatsel in #41
- Use a fast, non-secure PRNG in benchmarking harness by @Shnatsel in #43
- Drop lto=true from Cargo.toml by @Shnatsel in #44
- don't multiversion on ARM by @Shnatsel in #45
- perf: add initial cache-blocked dit fft impl by @smu160 in #48
- Use `.as_chunks_mut()` instead of `.chunks_exact_mut()` for better performance by @Shnatsel in #51
- Benchmarks: Lower criterion sample size to 20 from its default of 100 by @Shnatsel in #52
- Report throughput in bytes in addition to Melem/s by @Shnatsel in #53
- multithreading via Rayon by @Shnatsel in #50
- Move DIF-only function to DIF file by @Shnatsel in #54
- No explicit inlining by @Shnatsel in #56
- Use mul_add with a float inversion instead of mul_neg_add by @Shnatsel in #57
- Revert "Use mul_add with a float inversion instead of mul_neg_add" by @Shnatsel in #59
- Port to fearless_simd by @Shnatsel in #58
- fearless_simd + BRAVO by @Shnatsel in #64
- Turn `x << 1` into `x * 2` and add comments on all the other uses of << by @Shnatsel in #66
- Refactor dit planner by @Shnatsel in #67
- Determine SIMD level in the planner by @Shnatsel in #69
- Update examples/bench.rs by @Shnatsel in #71
- Less dynamic dispatch by @Shnatsel in #72
- Switch to released fearless_simd 0.4.0 by @Shnatsel in #77
- targeted conversion to a for loop for just enough inlining by @Shnatsel in #79
- chore: Remove DIF and COBRA implementations by @smu160 in #82
- Use RustFFT more efficiently in benchmarks for a fair comparison by @Shnatsel in #81
- Simplify and optimize BRAVO by @Shnatsel in #83
- Improve standalone benchmarks by @Shnatsel in #84
- Extend large-size benchmarks to 32 bits by @Shnatsel in #85
- Double the chunk size for f64 by @Shnatsel in #86
- README updates by @Shnatsel in #70
- Options: default to 16k parallelism threshold by @Shnatsel in #87
- Make planner fields pub(crate) by @Shnatsel in #88
- Move 'How is it so fast?' down in the README by @Shnatsel in #91
- Remove `wide` and `multiversion` dependencies by @Shnatsel in #90
- add a benchmark for planner by @Shnatsel in #76
- Initial work on interleaving performance by @Shnatsel in #94
- Make BRAVO operating on chunks more explicit by @Shnatsel in #96
- perf: add a FFT 32 codelet to fuse first 5 stages by @smu160 in #101
- perf: add tiling to bravo to turn it into co-bravo by @smu160 in #106
- Speed up random signal generation by removing per-element sqrt and sin_cos by @Shnatsel in #109
- Use std::hint::black_box for more accurate benchmarks by @Shnatsel in #110
- Use std::hint::black_box for more accurate small-size benchmarks by @Shnatsel in #111
- perf: remove extra pass used for inverse transforms by @smu160 in #114
- PoC: Make the codelets operate entirely in registers by @Shnatsel in #113
- fix: remove old C FFTW benches, improve methodology, and plots by @smu160 in #117
- feat: add real valued FFT/IFFT by @smu160 in #105
- Use interleave() instead of zip_low()/zip_high() for better perf on AVX2 by @Shnatsel in #116
- release/v0.4.0 by @smu160 in #119
Full Changelog: v0.3.0...v0.4.0
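Two entries above (#110, #111) switch the benchmarks to `std::hint::black_box`. The sketch below shows the general pattern only and is not taken from the PhastFT benchmark code: `energy` and `time_energy` are hypothetical names, and the workload is a stand-in for an FFT call. Passing inputs and results through `black_box` discourages the compiler from hoisting or discarding the work being timed.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload standing in for an FFT call:
/// sums the squared magnitudes of a split real/imaginary signal.
fn energy(reals: &[f64], imags: &[f64]) -> f64 {
    reals
        .iter()
        .zip(imags)
        .map(|(re, im)| re * re + im * im)
        .sum()
}

/// Times `iters` calls of `energy`. Inputs and outputs go through
/// `black_box` so the optimizer treats them as opaque and keeps the work.
fn time_energy(reals: &[f64], imags: &[f64], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        let result = energy(black_box(reals), black_box(imags));
        black_box(result);
    }
    start.elapsed()
}

fn main() {
    let reals = vec![1.0_f64; 1024];
    let imags = vec![2.0_f64; 1024];
    let elapsed = time_energy(&reals, &imags, 1_000);
    println!("1000 iterations took {elapsed:?}");
}
```

Without `black_box`, a loop like this can be optimized down to a single call or removed entirely in release builds, which would make small-size timings look faster than they are.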
v0.3.0
v0.2.2
v0.2.1
v0.2.0
What's Changed
- Updated benchmark instructions and fixed typos by @dmtrinh in #10
- Swapped out use of sincos() for more portability across platforms by @dmtrinh in #11
- Bump black from 24.1.1 to 24.3.0 in /benches by @dependabot in #19
- Add support for `f32`, as well as `f64` by @smu160 in #17
- Hotfix by @smu160 in #20
- Add automatic CPU feature detection by @calebzulawski in #21
New Contributors
- @dmtrinh made their first contribution in #10
- @calebzulawski made their first contribution in #21
Full Changelog: v0.1.1...v0.2.0