Performance of Generic-based instances. #200
Comments
Marking `gput` for … That is, we use … with all other generic put instances being INLINE. The only remaining question is how big the impact of this would be on compile time. Here is the full fast version of the …
Encoding speed increase based on the variations I tested:
binary-master: 1.0
This was collected using this cursed benchmark, which, while pretty unrealistic, at least gives an idea of the potential impact of full specialization. You need to compile with -ffull-laziness for these results to make sense, or you will just measure allocation performance.
@AndreasPK did this change make it into the binary library? You also used this as a motivating example for improving the inliner. Can you offer a repro case, so I can see if GHC MR !11579 gets it?
@simonpj I don't believe there have been any …
@AndreasPK @kolmodin what is the status here? So far as I can tell, @AndreasPK has identified that the … I'm not close enough to the problem or the solution, but it certainly sounds attractive …
As is perhaps obvious from my other recent tickets, I've been looking at how Binary gets compiled as part of investigating GHC performance.
While looking at the resulting code I found that generic-based instances generally don't fully optimize away the overhead of generics.
In particular I've looked at slight variations of this code:
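As a hypothetical stand-in for that kind of code (the type `Shape` and its constructors are made up, not taken from the issue), a `Binary` instance derived via GHC.Generics looks like this; the empty instance picks up binary's Generic-based default `put` and `get`:

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical example type: a small sum type with some fields.
data Shape
  = Point
  | Circle Int
  | Rect Int Int
  deriving (Eq, Show, Generic)

-- Empty instance: put/get come from binary's Generic-based defaults,
-- i.e. the code path whose optimisation this issue is about.
instance Binary Shape

-- Sanity check that encoding followed by decoding is the identity.
roundTrips :: Shape -> Bool
roundTrips s = decode (encode s) == s
```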
I found that for an expression like this (having split the deriving into its own module):
on GHC's master branch it currently results in Core like this:
This would look fine if the code we call in the Derive module were just the `put` method for each constructor. Sadly, these methods instead all end up calling the generic put method (though at least with a statically computed generic representation of the individual constructor).
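For comparison, a hand-written instance is roughly what one would hope each per-constructor put reduces to: a tag byte followed by the fields, with no generic machinery left. This is a hypothetical sketch reusing the made-up `Shape` type; the tags 0, 1 and 2 are chosen by hand.

```haskell
import Data.Binary (Binary (..), decode, encode, getWord8, putWord8)

data Shape = Point | Circle Int | Rect Int Int
  deriving (Eq, Show)

-- Hand-written instance: each constructor writes its tag, then its fields.
instance Binary Shape where
  put Point      = putWord8 0
  put (Circle r) = putWord8 1 >> put r
  put (Rect w h) = putWord8 2 >> put w >> put h

  get = do
    tag <- getWord8
    case tag of
      0 -> pure Point
      1 -> Circle <$> get
      2 -> Rect <$> get <*> get
      _ -> fail ("Shape: unknown tag " ++ show tag)

roundTrips :: Shape -> Bool
roundTrips s = decode (encode s) == s
```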
For runtime performance, the issue here is that `$w$cgput` doesn't get inlined. For a regular function it is rather large, so it not being inlined is not unexpected. But we could trivially force it to inline by adding INLINE pragmas to the methods in `Data.Binary.Generic`. I tried this, and for encoding the example data type above via its generic instance, runtime allocations went down by around a third and runtime improved significantly as well (though I didn't take exact runtime measurements).
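Where the pragmas go can be sketched on a self-contained toy generic encoder. `Enc`, `GEnc` and `genericEnc` are hypothetical names, not binary's `GBinaryPut`, and encoding to `[Word8]` stands in for `Put`; the sum encoding is also simplified.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, TypeOperators #-}
import Data.Word (Word8)
import GHC.Generics

-- Hypothetical field-level class, standing in for Binary.
class Enc a where
  enc :: a -> [Word8]

instance Enc Word8 where
  enc w = [w]
  {-# INLINE enc #-}

-- Hypothetical generic class, standing in for GBinaryPut. Every instance
-- method is INLINE so GHC can unfold the whole traversal at the use site.
class GEnc f where
  genc :: f p -> [Word8]

instance GEnc U1 where
  genc U1 = []
  {-# INLINE genc #-}

instance Enc c => GEnc (K1 i c) where
  genc (K1 x) = enc x
  {-# INLINE genc #-}

instance GEnc f => GEnc (M1 i t f) where
  genc (M1 x) = genc x
  {-# INLINE genc #-}

instance (GEnc a, GEnc b) => GEnc (a :*: b) where
  genc (x :*: y) = genc x ++ genc y
  {-# INLINE genc #-}

-- Simplified sum encoding: one tag byte per :+: level (binary instead
-- emits a single tag computed from the constructor's index).
instance (GEnc a, GEnc b) => GEnc (a :+: b) where
  genc (L1 x) = 0 : genc x
  genc (R1 y) = 1 : genc y
  {-# INLINE genc #-}

genericEnc :: (Generic a, GEnc (Rep a)) => a -> [Word8]
genericEnc = genc . from
{-# INLINE genericEnc #-}

data Shape = Point | Circle Word8 | Rect Word8 Word8
  deriving (Show, Generic)

instance Enc Shape where
  enc = genericEnc
```

With every instance method marked INLINE, a call such as `enc (Circle 3)` can be unfolded at the call site into a direct construction of the byte list, instead of a call into one large shared generic worker like `$w$cgput`. Whether GHC fully eliminates the generic representation still depends on the optimiser.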
This isn't, sadly, enough to eliminate the overhead completely. The resulting partially specialized code looks something like this in pseudocode:
If `put_con` were inlined, it would cancel out with `generic_var0_rep`; the same would happen if it were specialized by SpecConstr, but since it's non-recursive that doesn't happen either. Note that nothing here allocates, so this is a good win over the current behaviour. But a lot of conditional branches are taken to compute the encoding, and the overhead grows with larger types.
Maybe this too can be fixed with some well-placed INLINE pragmas or `inline` applications.
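The `inline` idea can be illustrated with `GHC.Exts.inline`, which asks GHC to inline a function at one particular call site, provided its unfolding is available (hence the INLINABLE pragma). `ConRep`, `putCon`, `putCircle` and `putCircleInlined` are hypothetical names, loosely modelled on `put_con` and `generic_var0_rep`; in this toy, `putCon` is small enough that GHC would likely inline it anyway, so it only shows the mechanism.

```haskell
import Data.Word (Word8)
import GHC.Exts (inline)

-- Hypothetical stand-in for a statically known per-constructor
-- representation.
data ConRep = First | Second | Third

-- Non-recursive worker that inspects the representation to pick a tag.
-- INLINABLE exposes its unfolding, which `inline` needs to do anything.
putCon :: ConRep -> [Word8] -> [Word8]
putCon rep fields = tag : fields
  where
    tag = case rep of
      First  -> 0
      Second -> 1
      Third  -> 2
{-# INLINABLE putCon #-}

-- If GHC declines to inline putCon here, the case on `rep` runs on
-- every call even though the argument is a known constant.
putCircle :: Word8 -> [Word8]
putCircle r = putCon Second [r]

-- `inline` asks GHC to inline putCon at this call site, after which
-- case-of-known-constructor can reduce the tag to the literal 1.
putCircleInlined :: Word8 -> [Word8]
putCircleInlined r = inline putCon Second [r]
```

Once `putCon` is inlined with the known argument `Second`, the `case` disappears at compile time, which is the kind of cancellation the text hopes for between `put_con` and `generic_var0_rep`.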