New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various fixes for -dtimings #1027

Merged
merged 7 commits into from Feb 3, 2017

Conversation

Projects
None yet
4 participants
@sliquister
Contributor

sliquister commented Feb 1, 2017

When I tried to use -dtimings to figure out where the time is spent in the compiler pipeline, I ran into the problem that it gives unreliable results:

  1. the "typing" category is completely missing for mli's, resulting in a big gap between the "all" category and the sum of everything else
  2. anything that runs subcommands, like -pp, is not measured because Sys.time doesn't count time spent in subprocesses. This means the compiler underestimates its running time by a good amount.
  3. the -ppx commands, reading marshalled ast and checking its invariants are not measured (other than in the "all" category)

This pull request fixes these points, and I now get relatively good measurements.

Fixing 1. is straightforward.

For 2., I know of two ways to get the resource usage of subprocesses on linux: getrusage and wait4, and judging from searching around and looking at the python stdlib, neither of them works on windows.
So I added a function similar to Sys.time that includes cpu time of subprocesses if possible, but I don't expose it in Sys, since it's not very portable.

For 3., I redefined the "parsing" category to include everything from "we have a filename" to "we have an ast we can typecheck". I made a subcategory for -ppx as it's probably worth separating out, like -pp.

Valentin Gatien-Baron added some commits Jan 31, 2017

Valentin Gatien-Baron
add a variant of Sys.time that include user and system time of subpro…
…cesses

It's not exposed in the stdlib, as it doesn't seem to be implementable
on windows
Valentin Gatien-Baron
including time spent in -pp, -ppx, linker etc in -dtimings
One can see the difference by running for instance:
time ocamlc -dtimings -pp 'function f { find ~ | head -n 100000 > /dev/null; }; f' a.ml
@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Feb 2, 2017

Member

Is it necessary to bootstrap here, or did you do it out of habit? I thought that #324 had made bootstrapping unnecessary in that scenario.

Member

gasche commented Feb 2, 2017

Is it necessary to bootstrap here, or did you do it out of habit? I thought that #324 had made bootstrapping unnecessary in that scenario.

@sliquister

This comment has been minimized.

Show comment
Hide comment
@sliquister

sliquister Feb 2, 2017

Contributor

I bootstrapped because I couldn't build otherwise.
Maybe #324 only solves gets rid of bootstrap for adding primitives to modules that aren't linked in the compiler?

Contributor

sliquister commented Feb 2, 2017

I bootstrapped because I couldn't build otherwise.
Maybe #324 only solves gets rid of bootstrap for adding primitives to modules that aren't linked in the compiler?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Feb 2, 2017

Member

I reviewed the code and it looks OK to me. One thing I wondered about is if it wouldn't be nice to separate computation of self-process time and children-time, both in the exposed primitive (we could have two functions instead of one trying to do both) and in the display of results.

(One way to do this that I hadn't thought of when writing my comment is to have hierarchical timer, and add a timer around what we know is the sub-process call. This allows the user to differentiate both without having to do an explicit separation of the two measurements. So maybe this isn't so useful after all.)

Member

gasche commented Feb 2, 2017

I reviewed the code and it looks OK to me. One thing I wondered about is if it wouldn't be nice to separate computation of self-process time and children-time, both in the exposed primitive (we could have two functions instead of one trying to do both) and in the display of results.

(One way to do this that I hadn't thought of when writing my comment is to have hierarchical timer, and add a timer around what we know is the sub-process call. This allows the user to differentiate both without having to do an explicit separation of the two measurements. So maybe this isn't so useful after all.)

@gasche gasche added the approved label Feb 2, 2017

@sliquister

This comment has been minimized.

Show comment
Hide comment
@sliquister

sliquister Feb 2, 2017

Contributor

Thanks.

I think if we wanted more detailed breakdown, we would create more categories, like -pp is a subcategory of parsing. The split between subprocess or current process is not really how you want to split time, as you could have plugins as subprocesses or as cmxs.

The current interface of dtimings is actually annoying with subcategories because you have to know which categories include what other categories to make sense of it. If people agree, I wouldn't mind changing the code so we print categories as a tree: /, /parsing, /parsing/-pp, /parsing/-ppx, /typing, /generate, generate/clambda, generate/cmm etc instead ol all, parsing, -pp, -ppx, typing, generate, clambda, cmm etc.

Contributor

sliquister commented Feb 2, 2017

Thanks.

I think if we wanted more detailed breakdown, we would create more categories, like -pp is a subcategory of parsing. The split between subprocess or current process is not really how you want to split time, as you could have plugins as subprocesses or as cmxs.

The current interface of dtimings is actually annoying with subcategories because you have to know which categories include what other categories to make sense of it. If people agree, I wouldn't mind changing the code so we print categories as a tree: /, /parsing, /parsing/-pp, /parsing/-ppx, /typing, /generate, generate/clambda, generate/cmm etc instead ol all, parsing, -pp, -ppx, typing, generate, clambda, cmm etc.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Feb 2, 2017

Member

Yes, I think giving a hierarchical view would be important.

(Another thing I wondered about your PR is whether it's really worth having separate -pp and -ppx timer, or whether just having the Preprocessing timer for both would be fine. I am undecided on it, and would respect the choice you made, but I must say that Dash_ppx is a rather ugly constructor name.)

Member

gasche commented Feb 2, 2017

Yes, I think giving a hierarchical view would be important.

(Another thing I wondered about your PR is whether it's really worth having separate -pp and -ppx timer, or whether just having the Preprocessing timer for both would be fine. I am undecided on it, and would respect the choice you made, but I must say that Dash_ppx is a rather ugly constructor name.)

Valentin Gatien-Baron
@sliquister

This comment has been minimized.

Show comment
Hide comment
@sliquister

sliquister Feb 2, 2017

Contributor

Ok. I think I prefer to change the display as a second pull request, then (it's nice-to-have, but this pull request is necessary to be able to use -dtimings at all).

About -pp and -ppx, I don't feel strongly either way. Iit seems marginally better to separate them, as "parsing" is already essentially -pp + -ppx.
I can make a change to hide lines that would reported as 0.000s, since these lines are essentially noise, so that at least the -ppx line wouldn't show up if you don't use -ppx.

Contributor

sliquister commented Feb 2, 2017

Ok. I think I prefer to change the display as a second pull request, then (it's nice-to-have, but this pull request is necessary to be able to use -dtimings at all).

About -pp and -ppx, I don't feel strongly either way. Iit seems marginally better to separate them, as "parsing" is already essentially -pp + -ppx.
I can make a change to hide lines that would reported as 0.000s, since these lines are essentially noise, so that at least the -ppx line wouldn't show up if you don't use -ppx.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Feb 3, 2017

Contributor

For the avoidance of doubt this work will be covered by Jane Street's corporate CLA.

Contributor

mshinwell commented Feb 3, 2017

For the avoidance of doubt this work will be covered by Jane Street's corporate CLA.

@chambart

This comment has been minimized.

Show comment
Hide comment
@chambart

chambart Feb 3, 2017

Contributor

I do approve the change too. I didn't really care about preprocessing when I added that so I
didn't even notice the limitation. Thank you for solving that.

If you are willing to spend some more time on that, it would be a great improvement to have
hierarchical timers. When I try to optimize some part of the compiler I usually add some finer
grain timers that I don't commit to avoid adding too much noise. I never took the time to
seriously do that.

Lexify landmark can be a good example.

If you are willing to spend even more time. Recording allocated blocks (Gc.minor_words)
can be quite useful. (I have some implementation somewhere, but I would be ashamed to
show it in the current state...)

This PR is good as it is right now. If you want to go further, this should go in another one.

Contributor

chambart commented Feb 3, 2017

I do approve the change too. I didn't really care about preprocessing when I added that so I
didn't even notice the limitation. Thank you for solving that.

If you are willing to spend some more time on that, it would be a great improvement to have
hierarchical timers. When I try to optimize some part of the compiler I usually add some finer
grain timers that I don't commit to avoid adding too much noise. I never took the time to
seriously do that.

Lexify landmark can be a good example.

If you are willing to spend even more time. Recording allocated blocks (Gc.minor_words)
can be quite useful. (I have some implementation somewhere, but I would be ashamed to
show it in the current state...)

This PR is good as it is right now. If you want to go further, this should go in another one.

@gasche gasche merged commit 297d00c into ocaml:trunk Feb 3, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Feb 3, 2017

Member

Based on @chambart's review (thanks!) I decided to go ahead and merge.

Member

gasche commented Feb 3, 2017

Based on @chambart's review (thanks!) I decided to go ahead and merge.

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment