Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partially remove polymorphic compare for constructor_tag #406

Merged
merged 14 commits into from Apr 28, 2017

Conversation

dwightguth
Copy link
Contributor

compare_val is quite slow and was accounting for a significant
fraction of the time spent compiling my program when profiled with
perf. Hopefully this will be modestly faster.

compare_val is quite slow and was accounting for a significant
fraction of the time spent compiling my program when profiled with
perf. Hopefully this will be modestly faster.
@gasche
Copy link
Member

gasche commented Jan 9, 2016

Did you observe any improvement in compilation time on your program after applying this change?

@dwightguth
Copy link
Contributor Author

I didn't have time to test it yet because I would have to rebuild my entire application and it was already the end of the workday. But perf reported about 20% total time spent in compare_val of which over 95% was traced back to this call site. My experience typically has been that compare_val is always slower than an explicit comparison function. But I'll get you numbers on Monday.

typing/types.ml Outdated
| c -> c)
| Cstr_constant _, _ -> 1
| Cstr_block _, _ -> 1
| _ -> -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems wrong. compare_tag (Cstr_constant _) (Cst_block _) = compare_tag (Cst_block _) (Cstr_constant _) = 1
Also, while using substraction as comparison is ok for those small integers, it's not really faster than the correct version.

You only need equal_tag anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh good catch. I've written these functions before but I seem to have forgotten a case. Alright, I'll rewrite it with an equals method.

@let-def
Copy link
Contributor

let-def commented Jan 9, 2016

It would be very interesting to know how much faster your build get with this optimization.

If it is significant, maybe the native code generator should inline the first level for comparison of sum types. Anybody knows if this has been considered before?

@murmour
Copy link
Contributor

murmour commented Jan 9, 2016

But perf reported about 20% total time spent in compare_val of which over 95% was traced back to this call site. My experience typically has been that compare_val is always slower than an explicit comparison function.

And my guess is that the true bottleneck is the structural comparison of Path.t, not a couple of redundant branches (tag checks) that you've shaved off prematurely.

20% of whole compilation time is insane. It would be interesting to know more details about your particular program. Let me guess further: it is generated code that contains lots of nested modules?

@dwightguth
Copy link
Contributor Author

I timed compilation of my program before and after this change. Compilation time dropped from roughly 600ms to roughly 475 ms. This is a relatively small program that is generated automatically which unmarshals a term from a string literal, performs match-intensive processing on it, prints a small amount of output, and then terminates. It does link against a rather complex runtime but the compilation of that runtime is not included in these running times.

typing/types.ml Outdated
| Cstr_block i1, Cstr_block i2 -> i2 = i1
| Cstr_extension (path1, b1), Cstr_extension (path2, b2) -> path1 = path2 && b1 = b2
| _ -> false

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be an exhaustive match to catch problems if the type is extended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not sure I quite follow what you mean. You mean you want it to exhaustively match the structure of Path.t? Or are you suggesting something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chambart means that there should be no final | _ -> false case. The code should be such that, if we add a new constructor to the input datatype, we get an exhaustivity warning.

I personally structure equality tests as follows to ensure this property:

| K1 p, K1 q -> ...
| K1 _, _ | _, K1 _ -> false
| K2 p, K2 q -> ...
| K2 _, _ | _, K2 _ -> false
...
| Kn p, Kn p -> ...

(The last case does not have a catch-all sibling because all constructors have been matched.)

I'm not sure whether this generates more or less efficient matching code, so maybe it would make sense to re-run your benchmark if you change this function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to omitting the last case is to expand only the left column and group the non-matching constructors together:

    K1 p, K1 q -> ...
  | K2 p, K2 q -> ...
    ...
  | Kn p, Kn q -> ...
  | (K1 _|K2 _|...|Kn _), _ -> false

This seems to result in slightly better lambda code, but I also find it easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed this and did not see any significant impact to performance. I suspect because the OCAML compiler detects that the pattern is exhaustive with code as it exists currently and therefore it does not need to actually perform the check once jumping to this case.

@damiendoligez damiendoligez added this to the 4.04-or-later milestone Jan 12, 2016
@alainfrisch
Copy link
Contributor

If it is significant, maybe the native code generator should inline the first level for comparison of sum types.

Why only sum types? Records and tuples would also be useful. And one could go further and generate specialized comparison code for the known structure of the type (probably generating recursive functions for each required type node in each unit).

dwightguth pushed a commit to runtimeverification/k that referenced this pull request Jan 20, 2016
dwightguth added a commit to runtimeverification/k that referenced this pull request Jan 20, 2016
Conflicts:
	typing/parmatch.ml
@damiendoligez damiendoligez removed this from the 4.04 milestone Aug 3, 2016
typing/types.ml Outdated
@@ -304,6 +304,12 @@ and constructor_tag =
| Cstr_extension of Path.t * bool (* Extension constructor
true if a constant false if a block*)

let equal_tag t1 t2 = match (t1, t2) with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic comment: "match" should be on the next line.

typing/types.ml Outdated
let equal_tag t1 t2 = match (t1, t2) with
| Cstr_constant i1, Cstr_constant i2 -> i2 = i1
| Cstr_block i1, Cstr_block i2 -> i2 = i1
| Cstr_extension (path1, b1), Cstr_extension (path2, b2) -> path1 = path2 && b1 = b2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use Path.same here I think.

@mshinwell
Copy link
Contributor

mshinwell commented Dec 28, 2016

I think efforts to remove uses of polymorphic comparison are to be encouraged, so I move in favour of merging this patch, after a couple of comments have been addressed.

@dwightguth
Copy link
Contributor Author

I believe this should address the remaining issues, but if you have more feedback let me know. I can also do a history cleanup but if you want me to do that I would prefer to wait until all other feedback is addressed so I'm not doing it multiple times.

Note that I also did some profiling on code recently and found that the time spent in ocamlc on one of my programs decreased from 3 minutes to 1 minute if I replaced the List.exists call in this PR with a Hashtbl. Would you like that included here or as a separate PR subsequent to the merging of this one?

| Cstr_extension (path1, b1), Cstr_extension (path2, b2) ->
Path.same path1 path2 && b1 = b2
| (Cstr_constant _|Cstr_block _|Cstr_unboxed|Cstr_extension _), _ -> false

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I find this more readable, should compiles fast and generate efficient code

let equal_tag t1 =
   match t1 with 
   | Cstr_constant i1 -> 
      begin match t2 with Cstr_constant i2 -> i1 = i2 
       | _ -> false 
      end
  | ..
      

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the compiler's job to make it fast and efficient?

Regarding readability and maintainability: you lose exhaustiveness checks this way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, I prefer the original version. I find @bobzhang's version harder to read.

@dwightguth
Copy link
Contributor Author

dwightguth commented Feb 7, 2017 via email

bobzhang added a commit to rescript-lang/ocaml that referenced this pull request Feb 7, 2017
@mshinwell
Copy link
Contributor

@dwightguth You might as well put the Hashtbl change into this patch. Can you do a bit more benchmarking to make sure it isn't also a regression on average?

@dwightguth
Copy link
Contributor Author

Yeah, I can do that. I don't have a lot of experience benchmarking compilers though, do you have any advice on what type of programs you'd like to see as part of the benchmark?

@mshinwell
Copy link
Contributor

For this one maybe just check the time spent building the compiler distribution to make sure it hasn't degraded noticeably. I think it should be fine.

@dwightguth
Copy link
Contributor Author

I ran the following command twice, once before bootstrapping the compiler and once after, and these are the results:

$ time sh -c './configure && make world && make bootstrap && make world.opt' # before bootstrapping
<snipped>
real	6m32.920s
user	5m50.812s
sys	0m16.232s

$ time sh -c './configure && make world && make bootstrap && make world.opt' # after bootstrapping
<snipped>
real	6m20.863s
user	5m46.392s
sys	0m15.668s

So it looks like if there is an effect on the average case performance, it is small but positive.

Changes Outdated
@@ -378,7 +378,11 @@ Next version (4.05.0):
(Alain Frisch, report by Anil Madhavapeddy)

- PR#7443, GPR#990: spurious unused open warning with local open in patterns
(Florian Angeletti, report by Gabriel Scherer)
(Florian Angeletti, report by Gabriel Scherer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you removed a parenthesis by mistake

Changes Outdated

- GPR#406: remove polymorphic comparison for Types.constructor_tag in compiler
(Dwight Guth,
review by Gabriel Radanne, Pierre Chambart, Mark Shinwell)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please respect the format and put it all on one line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that multi-line credits are acceptable, and in particular may be used to respect the 80-columns rule. I'm not sure we care about which wrapping strategy is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted it to wrap only at 80 characters. If you want it all on one line even though it now would be longer than 80 characters on one line, let me know and I can do that as well.

| Cstr_extension (path1, b1), Cstr_extension (path2, b2) ->
Path.same path1 path2 && b1 = b2
| (Cstr_constant _|Cstr_block _|Cstr_unboxed|Cstr_extension _), _ -> false

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, I prefer the original version. I find @bobzhang's version harder to read.

@mshinwell
Copy link
Contributor

Now that @damiendoligez has approved I'm pretty sure this is ok to merge, even though AppVeyor failed for some reason.

@mshinwell mshinwell merged commit 7d48407 into ocaml:trunk Apr 28, 2017
EduardoRFS pushed a commit to EduardoRFS/ocaml that referenced this pull request Dec 6, 2020
poechsel added a commit to poechsel/ocaml that referenced this pull request Apr 16, 2021
* Always rebuild the term inside subfunctions

* Wording and clarify comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet