Add vector version of mapAccumL that behaves like the list version #228
well, let's work it out and measure! https://hackage.haskell.org/package/vector-0.12.0.1/docs/Data-Vector-Fusion-Stream-Monadic.html Write a mapAccumL for the Stream type, and then it's super easy to have that be the internal implementation for the rest :)
(modulo a few subtleties: I think you need to make sure your stream step function is non-recursive, but that should be fine. Plus there's some other stuff, but I'm here to help (and trick someone into learning stream fusion :))
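To illustrate the suggestion above, here is a minimal sketch of `mapAccumL` at the stream level. The `Stream`/`Step` types below are small stand-ins for vector's `Data.Vector.Fusion.Stream.Monadic` (the real module's definitions are essentially these, plus `INLINE` pragmas and size hints), and `mapAccumLS`, `fromListS`, and `toListS` are hypothetical names used only for this example:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Functor.Identity (runIdentity)

-- Minimal stand-ins for vector's stream types.
data Step s a = Yield a s | Skip s | Done

data Stream m a = forall s. Stream (s -> m (Step s a)) s

-- mapAccumL over a monadic stream: the accumulator is threaded through
-- the stream state. Note the step function is a single non-recursive
-- case analysis, which is the shape stream fusion needs.
mapAccumLS :: Monad m => (a -> b -> (a, c)) -> a -> Stream m b -> Stream m c
mapAccumLS f z (Stream step s0) = Stream step' (z, s0)
  where
    step' (acc, s) = do
      r <- step s
      pure $ case r of
        Yield x s' -> let (acc', y) = f acc x in Yield y (acc', s')
        Skip s'    -> Skip (acc, s')
        Done       -> Done

-- Helpers so the sketch runs standalone.
fromListS :: Monad m => [a] -> Stream m a
fromListS xs = Stream step xs
  where
    step []     = pure Done
    step (y:ys) = pure (Yield y ys)

toListS :: Monad m => Stream m a -> m [a]
toListS (Stream step s0) = go s0
  where
    go s = step s >>= \r -> case r of
      Yield x s' -> (x :) <$> go s'
      Skip s'    -> go s'
      Done       -> pure []

main :: IO ()
main = print . runIdentity $
  toListS (mapAccumLS (\a b -> (a + b, a)) 0 (fromListS [1, 2, 3 :: Int]))
```

One caveat this sketch makes visible: the final accumulator is simply dropped when the stream is consumed, so returning it (as the list version does) needs extra plumbing outside the stream itself.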
I've updated my PR with an implementation of it.
Specifically, these benchmark results:
ok, so we need to do some digging, it seems. Also perhaps some comparing of the Core generated for each of these... hrmm
(also, thanks for getting the ball rolling!)
What are the best options for generating Core in this situation? My prior attempts at generating Core have created quite an unintelligible monstrosity (at least for me).
I don't remember the precise spelling of the flags, though.
also: adding those [0] and [1] annotations is usually driven by a reason! (We could try to do some sort of rewrite rule to write back out to the optimized version when we can't fuse, but let's profile using -g3 and perf tools or something to understand what's happening for each of these.)
With those annotations, I've tried to keep them the same as what the library already uses. I've read a little bit about rewrite rules and how the ordering of inlining and rewriting matters, although I haven't got a good way to debug these.
BTW, thanks for jumping on and helping with this 😀
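For readers unfamiliar with the `[0]`/`[1]` annotations discussed above: they are GHC inlining-phase controls, which hold a definition back so that rewrite rules have a chance to fire first. A minimal standalone illustration (`myMap` is a hypothetical function, not from vector; vector's own pragmas follow the same pattern):

```haskell
-- INLINE [1] means: do not inline myMap before simplifier phase 1,
-- so the RULES below can still match calls to it in earlier phases.
myMap :: (a -> b) -> [a] -> [b]
{-# INLINE [1] myMap #-}
myMap = map

-- A fusion-style rule: collapse two traversals into one.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

main :: IO ()
main = print (myMap (+ 1) (myMap (* 2) [1, 2, 3 :: Int]))
```

Whether the rule fires or not, the result is the same (`[3,5,7]`); the annotations only change which intermediate code GHC gets to optimize, which is why getting their phases right matters for the benchmarks here.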
Both were generated with ./project.sh bench --ghc-options "-ddump-simpl -dsuppress-all"
You will want the dump-to-file flag (-ddump-to-file) as well, and then upload them as attachments :)
@treeowl, while I understand stream fusion isn't the flavor you're most comfy with, do you have any good suggestions John can try to help with the fusion stuff? Or something? :)
One crazy thought I just had is a hybrid version: let's look at the versions where the result is stream-structured but one or none of the inputs are.
I think we need to be careful... our stream benchmarks may just be including something we assume isn't being recomputed. Or something.
Fast hand-rolled version:
Slower lazy StateT-based version:
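The two shapes being compared can be sketched as follows. This is a base-only sketch over lists to stay self-contained (the actual PR works over `Data.Vector`, and the real State-based version uses transformers' `StateT`); all names here are hypothetical:

```haskell
-- "Hand-rolled": an explicit left-to-right loop threading the accumulator,
-- matching Data.List.mapAccumL's semantics.
mapAccumLHand :: (a -> b -> (a, c)) -> a -> [b] -> (a, [c])
mapAccumLHand f = go
  where
    go acc []     = (acc, [])
    go acc (x:xs) = let (acc', y)  = f acc x
                        (accN, ys) = go acc' xs
                    in (accN, y : ys)

-- A tiny lazy State monad, standing in for transformers' State.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State mf <*> State ma = State $ \s ->
    let (f, s')  = mf s
        (a, s'') = ma s'
    in (f a, s'')

instance Monad (State s) where
  State ma >>= k = State $ \s ->
    let (a, s') = ma s in runState (k a) s'

-- "State-based": express mapAccumL as a traversal in the State monad,
-- the shape that could reuse the library's mapM machinery.
mapAccumLState :: (a -> b -> (a, c)) -> a -> [b] -> (a, [c])
mapAccumLState f z xs =
  let (ys, acc) = runState (traverse step xs) z
      step x    = State $ \s -> let (s', y) = f s x in (y, s')
  in (acc, ys)

main :: IO ()
main = do
  print (mapAccumLHand  (\a b -> (a + b, a)) 0 [1, 2, 3 :: Int])
  print (mapAccumLState (\a b -> (a + b, a)) 0 [1, 2, 3 :: Int])
```

Both compute the same answer; the benchmark question in this thread is purely about how well GHC optimizes each shape.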
After some discussion on the GHC channel on IRC with mpickering, I found that:
And:
Could conversion to list be the cause of the performance issue?
I'm not awake yet, so I've not caught up on everything. But it's definitely true that GHC is much more conservative about simplifying code it thinks is recursive, so this could be a culprit. I'll try to investigate a bit today.
I tried writing an alternate mapM:

```haskell
mapM :: forall m v a b . Monad m => (a -> m b) -> Bundle m v a -> Bundle m v b
{-# INLINE_FUSED mapM #-}
mapM f Bundle { sElems = s, sChunks = c, sVector = v, sSize = n } = Bundle
  { sElems  = S.mapM f s
  , sChunks = S.mapM fc c
  , sVector = Nothing
  , sSize   = n
  }
  where
    fc :: Chunk v a -> m (Chunk v b)
    fc = undefined
```
One question: how does producing the final seed affect fusion on the other side? Another question: how do the general problems fusing
This fell off the wagon but should be revisited at some point.
Yes please 😁
Anything I can do to help move this forward?
I've implemented a hand-rolled version, and another two versions based on a combination of `mapM` and the lazy and strict versions of the `State` monad: haskell-works/hw-prim#38
The benchmarks show that the hand-rolled version runs two times faster than the lazy state monad version and 16 times faster than the strict state monad version. I found the slow performance of the strict monad version most surprising.
I'm aware that the version using `mapM` might enable fusion; however, it is a fair bit slower than a hand-rolled version that defeats fusion. I would love to have a fusion-enabled version that runs as fast as the hand-rolled version. Would that be possible?