New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
partially remove polymorphic compare for constructor_tag #406
Conversation
compare_val is quite slow and was accounting for a significant fraction of the time spent compiling my program when profiled with perf. Hopefully this will be modestly faster.
Did you observe any improvement in compilation time on your program after applying this change? |
I didn't have time to test it yet because I would have to rebuild my entire application and it was already the end of the workday. But perf reported about 20% total time spent in compare_val of which over 95% was traced back to this call site. My experience typically has been that compare_val is always slower than an explicit comparison function. But I'll get you numbers on Monday. |
typing/types.ml
Outdated
| c -> c) | ||
| Cstr_constant _, _ -> 1 | ||
| Cstr_block _, _ -> 1 | ||
| _ -> -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems wrong. compare_tag (Cstr_constant _) (Cst_block _) = compare_tag (Cst_block _) (Cstr_constant _) = 1
Also, while using substraction as comparison is ok for those small integers, it's not really faster than the correct version.
You only need equal_tag
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh good catch. I've written these functions before but I seem to have forgotten a case. Alright, I'll rewrite it with an equals method.
It would be very interesting to know how much faster your build get with this optimization. If it is significant, maybe the native code generator should inline the first level for comparison of sum types. Anybody knows if this has been considered before? |
And my guess is that the true bottleneck is the structural comparison of 20% of whole compilation time is insane. It would be interesting to know more details about your particular program. Let me guess further: it is generated code that contains lots of nested modules? |
I timed compilation of my program before and after this change. Compilation time dropped from roughly 600ms to roughly 475 ms. This is a relatively small program that is generated automatically which unmarshals a term from a string literal, performs match-intensive processing on it, prints a small amount of output, and then terminates. It does link against a rather complex runtime but the compilation of that runtime is not included in these running times. |
typing/types.ml
Outdated
| Cstr_block i1, Cstr_block i2 -> i2 = i1 | ||
| Cstr_extension (path1, b1), Cstr_extension (path2, b2) -> path1 = path2 && b1 = b2 | ||
| _ -> false | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be an exhaustive match to catch problems if the type is extended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm not sure I quite follow what you mean. You mean you want it to exhaustively match the structure of Path.t? Or are you suggesting something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chambart means that there should be no final | _ -> false
case. The code should be such that, if we add a new constructor to the input datatype, we get an exhaustivity warning.
I personally structure equality tests as follows to ensure this property:
| K1 p, K1 q -> ...
| K1 _, _ | _, K1 _ -> false
| K2 p, K2 q -> ...
| K2 _, _ | _, K2 _ -> false
...
| Kn p, Kn p -> ...
(The last case does not have a catch-all sibling because all constructors have been matched.)
I'm not sure whether this generates more or less efficient matching code, so maybe it would make sense to re-run your benchmark if you change this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to omitting the last case is to expand only the left column and group the non-matching constructors together:
K1 p, K1 q -> ...
| K2 p, K2 q -> ...
...
| Kn p, Kn q -> ...
| (K1 _|K2 _|...|Kn _), _ -> false
This seems to result in slightly better lambda code, but I also find it easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've addressed this and did not see any significant impact to performance. I suspect because the OCAML compiler detects that the pattern is exhaustive with code as it exists currently and therefore it does not need to actually perform the check once jumping to this case.
Why only sum types? Records and tuples would also be useful. And one could go further and generate specialized comparison code for the known structure of the type (probably generating recursive functions for each required type node in each unit). |
add compiler patch (cf ocaml/ocaml#406)
Conflicts: typing/parmatch.ml
typing/types.ml
Outdated
@@ -304,6 +304,12 @@ and constructor_tag = | |||
| Cstr_extension of Path.t * bool (* Extension constructor | |||
true if a constant false if a block*) | |||
|
|||
let equal_tag t1 t2 = match (t1, t2) with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistic comment: "match" should be on the next line.
typing/types.ml
Outdated
let equal_tag t1 t2 = match (t1, t2) with | ||
| Cstr_constant i1, Cstr_constant i2 -> i2 = i1 | ||
| Cstr_block i1, Cstr_block i2 -> i2 = i1 | ||
| Cstr_extension (path1, b1), Cstr_extension (path2, b2) -> path1 = path2 && b1 = b2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use Path.same
here I think.
I think efforts to remove uses of polymorphic comparison are to be encouraged, so I move in favour of merging this patch, after a couple of comments have been addressed. |
I believe this should address the remaining issues, but if you have more feedback let me know. I can also do a history cleanup but if you want me to do that I would prefer to wait until all other feedback is addressed so I'm not doing it multiple times. Note that I also did some profiling on code recently and found that the time spent in ocamlc on one of my programs decreased from 3 minutes to 1 minute if I replaced the List.exists call in this PR with a Hashtbl. Would you like that included here or as a separate PR subsequent to the merging of this one? |
| Cstr_extension (path1, b1), Cstr_extension (path2, b2) -> | ||
Path.same path1 path2 && b1 = b2 | ||
| (Cstr_constant _|Cstr_block _|Cstr_unboxed|Cstr_extension _), _ -> false | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I find this more readable, should compiles fast and generate efficient code
let equal_tag t1 =
match t1 with
| Cstr_constant i1 ->
begin match t2 with Cstr_constant i2 -> i1 = i2
| _ -> false
end
| ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it the compiler's job to make it fast and efficient?
Regarding readability and maintainability: you lose exhaustiveness checks this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readability, I prefer the original version. I find @bobzhang's version harder to read.
I don't know what goes on under the hood of ocaml's optimizer obviously but
I would actually expect the code as written now to be faster than matching
each argument individually because the compiler won't be able to create
jump instructions as effectively. Am I missing something?
…On Feb 6, 2017 7:33 PM, "Max Mouratov" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In typing/types.ml <#406>:
> @@ -322,6 +322,15 @@ and constructor_tag =
| Cstr_extension of Path.t * bool (* Extension constructor
true if a constant false if a block*)
+let equal_tag t1 t2 =
+ match (t1, t2) with
+ | Cstr_constant i1, Cstr_constant i2 -> i2 = i1
+ | Cstr_block i1, Cstr_block i2 -> i2 = i1
+ | Cstr_unboxed, Cstr_unboxed -> true
+ | Cstr_extension (path1, b1), Cstr_extension (path2, b2) ->
+ Path.same path1 path2 && b1 = b2
+ | (Cstr_constant _|Cstr_block _|Cstr_unboxed|Cstr_extension _), _ -> false
+
Isn't it the compiler's job to make it fast and efficient?
Regarding readability and maintainability: you lose exhaustiveness checks
this way.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#406>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AE3jnYzjShE9DSR29WxFHg4Ez6AtVKMAks5rZ8nRgaJpZM4HBk0j>
.
|
@dwightguth You might as well put the Hashtbl change into this patch. Can you do a bit more benchmarking to make sure it isn't also a regression on average? |
Yeah, I can do that. I don't have a lot of experience benchmarking compilers though, do you have any advice on what type of programs you'd like to see as part of the benchmark? |
For this one maybe just check the time spent building the compiler distribution to make sure it hasn't degraded noticeably. I think it should be fine. |
I ran the following command twice, once before bootstrapping the compiler and once after, and these are the results:
So it looks like if there is an effect on the average case performance, it is small but positive. |
Changes
Outdated
@@ -378,7 +378,11 @@ Next version (4.05.0): | |||
(Alain Frisch, report by Anil Madhavapeddy) | |||
|
|||
- PR#7443, GPR#990: spurious unused open warning with local open in patterns | |||
(Florian Angeletti, report by Gabriel Scherer) | |||
(Florian Angeletti, report by Gabriel Scherer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you removed a parenthesis by mistake
Changes
Outdated
|
||
- GPR#406: remove polymorphic comparison for Types.constructor_tag in compiler | ||
(Dwight Guth, | ||
review by Gabriel Radanne, Pierre Chambart, Mark Shinwell) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please respect the format and put it all on one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that multi-line credits are acceptable, and in particular may be used to respect the 80-columns rule. I'm not sure we care about which wrapping strategy is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adjusted it to wrap only at 80 characters. If you want it all on one line even though it now would be longer than 80 characters on one line, let me know and I can do that as well.
| Cstr_extension (path1, b1), Cstr_extension (path2, b2) -> | ||
Path.same path1 path2 && b1 = b2 | ||
| (Cstr_constant _|Cstr_block _|Cstr_unboxed|Cstr_extension _), _ -> false | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readability, I prefer the original version. I find @bobzhang's version harder to read.
Now that @damiendoligez has approved I'm pretty sure this is ok to merge, even though AppVeyor failed for some reason. |
Remove ephemeron usage of RPC
* Always rebuild the term inside subfunctions * Wording and clarify comment
compare_val is quite slow and was accounting for a significant
fraction of the time spent compiling my program when profiled with
perf. Hopefully this will be modestly faster.