You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the Rust standard library, it's quite common for fold to be implemented in terms of try_fold. In your original code you construct a TakeWhile and call fold on it. But TakeWhile's fold implementation delegates to try_fold!
The general rule to follow is something like this:
If you need to completely consume an iterator, use for_each or fold.
Use for_each if you don't need to maintain an accumulator.
Use fold if you do need to maintain an accumulator.
If you need to partly consume an iterator, use try_for_each or try_fold.
Same rules about accumulators apply.
Bold is mine as I wonder if we are using for_each (convenient) at multiple places instead of fold.
(There is also Itertools::foreach but this one is okay.)
Why do I care? While benchmarking a new MapForGrouping::fold, I found out that some_slice_iter::for_each does not delegate to fold (default behavior) and is not optimized * (just while let Some(x) = self.next() { f(x); }) while some_slice_iter::fold is optimized (which is useful to see benchmark improvements of specializing fold methods of our adaptors by delegating to the inner iterator).
*: The for_each has a comment: "We override the default implementation, which uses try_fold, because this simple implementation generates less LLVM IR and is faster to compile."
The text was updated successfully, but these errors were encountered:
Well, apart from Itertools::join that is known to be slow anyway, I benchmarked each "for_each → fold" change and got mostly severe regressions. The only "win" was -5% for into_group_map.
Use fold if you do need to maintain an accumulator.
I would tweak this one slightly:
Use fold if you need owned access to an accumulator
If the folder is something like |v, x| { v.push(x); v }, then it's (very slightly) better to do that as .for_each(|x| v.push(x)), using an FnMut closure to refer to local state.
Last people looked into it, threading the passing of a Vec or similar through all the folds was harder for LLVM to optimize well than if there's nothing (well, a trivial ()) to pass along. So if the owned access to the accumulator -- fold's superpower -- isn't needed, in μoptimized code it's better to use for_each capturing a &mut instead.
#780 (comment) by jswrenn
Bold is mine as I wonder if we are using
for_each
(convenient) at multiple places instead offold
.Current
for_each
uses:Itertools::{counts, partition_map, join}
min_set_impl
GroupingMap::{aggregate, collect}
into_group_map
k_smallest
(There is also
Itertools::foreach
but this one is okay.)Why do I care? While benchmarking a new
MapForGrouping::fold
, I found out thatsome_slice_iter::for_each
does not delegate tofold
(default behavior) and is not optimized * (justwhile let Some(x) = self.next() { f(x); }
) whilesome_slice_iter::fold
is optimized (which is useful to see benchmark improvements of specializingfold
methods of our adaptors by delegating to the inner iterator).*: The
for_each
has a comment: "We override the default implementation, which usestry_fold
, because this simple implementation generates less LLVM IR and is faster to compile."The text was updated successfully, but these errors were encountered: