Releases · smu160/PhastFT · GitHub

07 Jun 21:43

smu160

v0.4.0 Latest

What's Changed

feat: add DIT FFT algorithm & bit reversal control by @smu160 in #37
Fix build by @Shnatsel in #39
Require gfni target feature for the AVX-512 codepath by @Shnatsel in #41
Use a fast, non-secure PRNG in benchmarking harness by @Shnatsel in #43
Drop lto=true from Cargo.toml by @Shnatsel in #44
don't multiversion on ARM by @Shnatsel in #45
perf: add initial cache-blocked dit fft impl by @smu160 in #48
Use .as_chunks_mut() instead of .chunks_exact_mut() for better performance by @Shnatsel in #51
Benchmarks: Lower criterion sample size to 20 from its default of 100 by @Shnatsel in #52
Report throughput in bytes in addition to Melem/s by @Shnatsel in #53
multithreading via Rayon by @Shnatsel in #50
Move DIF-only function to DIF file by @Shnatsel in #54
No explicit inlining by @Shnatsel in #56
Use mul_add with a float inversion instead of mul_neg_add by @Shnatsel in #57
Revert "Use mul_add with a float inversion instead of mul_neg_add" by @Shnatsel in #59
Port to fearless_simd by @Shnatsel in #58
fearless_simd + BRAVO by @Shnatsel in #64
Turn x << 1 into x * 2 and add comments on all the other uses of << by @Shnatsel in #66
Refactor dit planner by @Shnatsel in #67
Determine SIMD level in the planner by @Shnatsel in #69
Update examples/bench.rs by @Shnatsel in #71
Less dynamic dispatch by @Shnatsel in #72
Switch to released fearless_simd 0.4.0 by @Shnatsel in #77
targeted conversion to a for loop for just enough inlining by @Shnatsel in #79
chore: Remove DIF and COBRA implementations by @smu160 in #82
Use RustFFT more efficiently in benchmarks for a fair comparison by @Shnatsel in #81
Simplify and optimize BRAVO by @Shnatsel in #83
Improve standalone benchmarks by @Shnatsel in #84
Extend large-size benchmarks to 32 bits by @Shnatsel in #85
Double the chunk size for f64 by @Shnatsel in #86
README updates by @Shnatsel in #70
Options: default to 16k parallelism threshold by @Shnatsel in #87
Make planner fields pub(crate) by @Shnatsel in #88
Move 'How is it so fast?' down in the README by @Shnatsel in #91
Remove wide and multiversion dependencies by @Shnatsel in #90
add a benchmark for planner by @Shnatsel in #76
Initial work on interleaving performance by @Shnatsel in #94
Make BRAVO operating on chunks more explicit by @Shnatsel in #96
perf: add a FFT 32 codelet to fuse first 5 stages by @smu160 in #101
perf: add tiling to bravo to turn it into co-bravo by @smu160 in #106
Speed up random signal generation by removing per-element sqrt and sin_cos by @Shnatsel in #109
Use std::hint::black_box for more accurate benchmarks by @Shnatsel in #110
Use std::hint::black_box for more accurate small-size benchmarks by @Shnatsel in #111
perf: remove extra pass used for inverse transforms by @smu160 in #114
PoC: Make the codelets operate entirely in registers by @Shnatsel in #113
fix: remove old C FFTW benches, improve methodology, and plots by @smu160 in #117
feat: add real valued FFT/IFFT by @smu160 in #105
Use interleave() instead of zip_low()/zip_high() for better perf on AVX2 by @Shnatsel in #116
release/v0.4.0 by @smu160 in #119
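
For readers unfamiliar with the decimation-in-time (DIT) approach and the bit-reversal step mentioned in this release, the following is a minimal textbook sketch of a radix-2 DIT FFT over separate real and imaginary slices. It is not PhastFT's implementation, which relies on SIMD, codelets, and cache blocking; the function names `bit_reverse_permute` and `fft_dit` are hypothetical and chosen only for illustration.

```rust
use std::f64::consts::PI;

/// Hypothetical helper: permutes `data` in place so that the element at
/// index `i` ends up at the index formed by reversing the low log2(n) bits
/// of `i`. The length must be a power of two.
fn bit_reverse_permute(data: &mut [f64]) {
    let n = data.len();
    assert!(n.is_power_of_two());
    if n <= 2 {
        return; // the permutation is the identity for n = 1 and n = 2
    }
    let bits = n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j {
            data.swap(i, j);
        }
    }
}

/// Hypothetical textbook radix-2 DIT FFT (forward transform, unnormalized).
/// Input is taken in natural order; output is in natural order.
fn fft_dit(reals: &mut [f64], imags: &mut [f64]) {
    let n = reals.len();
    assert_eq!(n, imags.len());
    assert!(n.is_power_of_two());

    // DIT consumes bit-reversed input, so permute first.
    bit_reverse_permute(reals);
    bit_reverse_permute(imags);

    let mut len = 2;
    while len <= n {
        let half = len / 2;
        let step = -2.0 * PI / len as f64;
        for start in (0..n).step_by(len) {
            for k in 0..half {
                // Twiddle factor w = exp(-2*pi*i*k / len).
                let angle = step * k as f64;
                let (w_re, w_im) = (angle.cos(), angle.sin());
                let (i, j) = (start + k, start + k + half);
                // Butterfly: t = w * x[j]; x[j] = x[i] - t; x[i] = x[i] + t.
                let t_re = reals[j] * w_re - imags[j] * w_im;
                let t_im = reals[j] * w_im + imags[j] * w_re;
                reals[j] = reals[i] - t_re;
                imags[j] = imags[i] - t_im;
                reals[i] += t_re;
                imags[i] += t_im;
            }
        }
        len *= 2;
    }
}
```

Calling `fft_dit` on the real signal `[0, 1, 2, 3]` with zero imaginary parts yields `[6, -2+2i, -2, -2-2i]` up to floating-point rounding. Skipping the permutation step would leave the caller responsible for supplying bit-reversed input, which is one plausible reading of "bit reversal control", though the release notes do not spell out the option.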

Full Changelog: v0.3.0...v0.4.0

Contributors

Shnatsel and smu160

04 Sep 14:51

smu160

v0.3.0

Full Changelog: v0.2.2...v0.3.0

Fixes

Bump bytemuck
Add docs for new interleaved format API

03 Sep 14:41

smu160

v0.2.2

What's Changed

Update docs on normalization after IFFT
Add deinterleaving function by @Shnatsel in #32
Make planner reusable by @smu160 in #33
Add functions to support interleaved complex numbers by @smu160 in #27
Make required cfg show up on docs.rs by @Shnatsel in #34
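
To illustrate what interleaving and deinterleaving mean here, the sketch below converts between an interleaved layout (`[re0, im0, re1, im1, ...]`) and two separate slices of real and imaginary parts. The function names and signatures are assumptions made for illustration and may not match the functions added in these pull requests.

```rust
/// Hypothetical helper: splits `[re0, im0, re1, im1, ...]` into
/// (`[re0, re1, ...]`, `[im0, im1, ...]`). The input length must be even.
fn deinterleave(interleaved: &[f64]) -> (Vec<f64>, Vec<f64>) {
    assert!(interleaved.len() % 2 == 0, "expected (re, im) pairs");
    interleaved
        .chunks_exact(2)
        .map(|pair| (pair[0], pair[1]))
        .unzip()
}

/// Hypothetical helper: the inverse of `deinterleave`.
fn interleave(reals: &[f64], imags: &[f64]) -> Vec<f64> {
    assert_eq!(reals.len(), imags.len());
    reals
        .iter()
        .zip(imags)
        .flat_map(|(&re, &im)| [re, im])
        .collect()
}
```

A split layout keeps all real parts contiguous and all imaginary parts contiguous, which tends to suit SIMD loads, while many callers already hold interleaved complex data; conversion helpers of this kind bridge the two.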

Full Changelog: v0.2.1...v0.2.2

Contributors

Shnatsel and smu160

03 May 18:40

smu160

v0.2.1

What's Changed

advertise runtime feature selection by @Shnatsel in #25
Fixes inverse FFT output order issue by @smu160 in #29

Full Changelog: v0.2.0...v0.2.1

Contributors

Shnatsel and smu160

01 May 18:14

smu160

v0.2.0

What's Changed

Updated benchmark instructions and fixed typos by @dmtrinh in #10
Swapped out use of sincos() for more portability across platforms by @dmtrinh in #11
Bump black from 24.1.1 to 24.3.0 in /benches by @dependabot in #19
Add support for f32, as well as f64 by @smu160 in #17
Hotfix by @smu160 in #20
Add automatic CPU feature detection by @calebzulawski in #21

New Contributors

@dmtrinh made their first contribution in #10
@calebzulawski made their first contribution in #21

Full Changelog: v0.1.1...v0.2.0

Contributors

calebzulawski, dmtrinh, and 2 other contributors
