Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optionally write the output of -dlambda etc into a file #1913

Merged
merged 1 commit into from Jul 27, 2018

Conversation

Projects
None yet
7 participants
@sliquister
Copy link
Contributor

commented Jul 18, 2018

Right now, -dlambda, -dcmm, -dlive and all such debug flags cause the compiler to write to stderr. This works fine when running the compiler by hand, but when using a build system, it's usually easier or more convenient to change flags for a bunch of files at once, at which point the printing to stderr isn't so nice.

So I propose the addition of a flag -dintofile that makes the compiler write all such output into a file <compilation-unit>.mldump, or <exe>.mldump depending on what is being built.

I initially had the command line take the filename to write into, but I think it's preferable to have the compiler choose it, similar to ocamlopt -S. It's simpler for the build system, simpler to setup editor modes if needed.

Implementation-wise, many files are touched, but the changes are simple: the debug output already goes into this ppf argument being passed around, so I rename it to ~ppf_dumpto clarify the intended use of the formatter (and avoid people starting using it to print errors).
Then I grepped for Clflags.dump to see if I was missing some flags, and found some prints directly on stderr, which I made use this ppf_dump.
I had a previous version where the formatter was in a global variable (as is traditional), but I think passing the formatter around is better because it's clear when we do and don't need to provide such a formatter.

@shindere

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 18, 2018

-dfile was actually the previous name I had for this option! I thought if the d means debug or dump, it makes a bit more sense to say dump into file or debug into file than debug file or dump file, which sounds more like -dsource.

I didn't really consider printing to several file.
I think it's more convenient for tooling to have a single file? As in, if you have keybindings to go from .ml to .mldump, that's all you need. If you have 3 files, now the tool needs to ask which file you want to go in (some of which may exit only due to option from previous builds). It's harder to gitignore such files, though that can be made to work with a different naming convention. A build system that cares about undeclared targets would have to know about every option.
The compiler runs cmm, live, regalloc, cmm, live, regalloc, etc for one source file; we can't create one file per toplevel declaration, so what would you expect? The dcmm would contain the concatenation of all the dcmm output?

While writing this, I realized that <compilation-unit>.mldump clearly isn't specific enough: it will cause compilation of native, bytecode and interface to all write to the same file, so I'm going to add the .cmi, .cmo and .cmx in the name.

@trefis

trefis approved these changes Jul 18, 2018

Copy link
Contributor

left a comment

I've reviewed the code and it seems all fine to me.

I was a bit surprised to this this new flag in Common_options instead of Compiler_options.
Is there a situation where you'd like to be in the toplevel with -dwhatever -dintofile?

About the name, I disagree with Sebastian: -dfile seems worse to me. I'd be fine with -dtofile (... or -dtf), but that's probably a sign that my english is bad.

I agree with Sebastian that at some point we might want to dump to different files, but that it is a discussion for another day.

Thanks for this work @sliquister, several people have wanted this option for a while!

if !Clflags.dump_parsetree then Printast. top_phrase ppf phrase;
if !Clflags.dump_source then Pprintast.top_phrase ppf phrase;
if !Clflags.dump_parsetree then Printast.top_phrase ppf phrase;
if !Clflags.dump_source then Pprintast.top_phrase ppf phrase;

This comment has been minimized.

Copy link
@trefis

trefis Jul 18, 2018

Contributor

That diff is a bit gratuitous.

This comment has been minimized.

Copy link
@sliquister

sliquister Jul 18, 2018

Author Contributor

Oh yeah, that's probably leftover from previous changes, I'll remove it while I squash all the commits.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

Btw, I'm pretty sure this is going to conflict with #1703, and I'm not sure which one (if any) is going to be easier to rebase.

Given that @Drup has already waited a long while, I'm tempted to wait for his PR to get merged first and ask @sliquister to rebase this one (sorry).
But you guys will know better than me, what do you think?

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

(On the other hand, this PR is ready now, and there's always the possibility that @mshinwell won't review @Drup's PR before 2021, so... 😆 )

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

How about something like -dump-to-file? I don't like the similarity between -dtofile and e.g. -dsource.

@shindere

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 18, 2018

@trefis yeah they will conflict, but I don't think the resolution will be problematic: resolve in favor of Drup's PR, and then reimplement this pull request in the couple of files. What'd be annoying is to have conflict in 30 different files.

@lpw25 -dintofile was chosen to express that it's related to the other -dfoo option, while not writing more output itself. The first aspect is lost with -dump-to-file, but if people prefer that, I can change it.

@shindere

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

@shindere

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

@shindere: he is.

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 18, 2018

I completely missed @trefis's initial comment, somehow. I put the flag to control where to dump along with the flags that enables dumping. I could maybe move it to Compiler_options, I'm not sure what tool uses what set of options. In the toplevel, I implemented something because the flag was supported; I thought it may not be useless to say -dlambda -dintofile and then instead of getting the lambda in your terminal, you can look at it on occasion without having a ton of stuff in your terminal scrollback.

@shindere

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 18, 2018

I'm not convinced either by the common .mldump file for all modes, and I think there is enough non-consensus to pause despite @trefis' approval: please don't merge this yet.

I agree with @shindere that per-format extensions would be better. Either .dcmm or .cmm are fine by me. I have a mild preference for .cmm, and I don't think there is a risk of conflict with the testsuite given that for each foo.cmm in the testsuite there is no corresponding foo.ml file.

-S already generate its output in foo.s (instead of printing it), and I think that going this way would be nice.

I'm skeptical that the situation is worse for build systems for several extensions than for one. Either the build system is smart about parsing the user-provided command-line parameters, and it knows exactly when one of these files will be output (same with one extension or several), or it does not, and it has to pessimistically assume that any of them may be output (same with one extension or several).

The .gitignore stuff will just adapt, it's not a big deal, and it's not even clear that you want to have these files in your .gitignore anyway: either you never use them, and it doesn't matter, or you use them for specific reasons and you would probably prefer to see them and clean them explicitly. It's only if you had a pipeline that produces these files (1) in an automated way and (2) outside a specific build-directory that is already ignored, that you would need to add them, and then its amortized by the cost of setting up whatever optimization you are using.

The question of what to do when outputting disjoint pieces for each phrase remains (with either mldump or phase-specific information). Just piping whatever we print today as a whole seems a fine first step to me.

What about -dtiming and -dprofile?

P.S.: to me -d means display. Maybe the original intent was debug, but the original people made the explicit choice to not document these options, so it's on them.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

I'm not convinced either by the common .mldump file for all modes, and I think there is enough non-consensus to pause

I disagree with you there.
I'm not saying I wouldn't want several files eventually, but the current PR is already a clear improvement over not having that option at all, and it seems fine to improve from this point later on. However, I don't know whether sliquister has the time (or the motivation) to do that work right now, in which case do you really think we should just forget about the current state of the PR?

People are of course free to do more work right now, my approval was just meant as "I'd be happy to merge what is currently there, if nothing more happens".

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 18, 2018

If it goes in the language in the current state, people will adapt their tooling and workflow to support it, and would then have to redo the work if we decide to change things. That sounds bad, and enough of an additional frustration to not change things.
Or maybe your idea was that we could easily change while it is in trunk, before a release. But I don't want to be working against a time-bomb, having to make sure that I have an alternative proposal ready before the next release.

It's fine if we disagree on that point, but to me things that change the tool-oriented interface of the compiler should be considered more carefully than things that change the human-oriented interface; we can iterate five times on an error message reformulation if we fancy, not on how we produce new files.

@gasche
Copy link
Member

left a comment

(Let's make this official.)

Note: I'm not "requesting" that things be done in the way that I propose, but I request a more careful discussion of what it is that we want to do.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

Original author of the -d options here.

First, "d" means "dump", but "display" works too.

Second, for debugging the back-end of ocamlopt (something I did a lot of when @gasche was still in elementary school), I find it convenient to have all outputs in one file (redirected from stderr in the old days).

Since each top-level function F1...Fn is sent through the back-end pipeline one after the other, the output of ocamlopt -dspill -dsplit, say, contains the code for function F1 before and after the splitting pass, then the code for F2 before and after, etc. It's quite convenient for debugging issues with the pass in question.

With separate files, having this kind of combined view takes more work, not so much with 2 files but certainly with 3 files or more.

@xclerc

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

(If we go with only .mldump now, we can then almost safely
move to distinct outputs by then using .cmm.mldump,
.lambda.mldump, etc. -- and, taking @xavierleroy's comment
into account, it could be triggered by yet another command-line
option.)

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 18, 2018

We could also decide, as a hybrid approach, to group some of the passes together but not all. The backend passes that print per-function, they could be grouped together, but personally I would find it weird for -dlambda -dcse, or even -dcmm -dcse, to print in the same file.

I thought of suggesting .mach as the name of the grouped file, but in fact I don't like it; this should be a pretty-printing of a single Mach representation of the code (not sure at which point). Maybe .machpasses?

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 19, 2018

@trefis I moved the option to Compiler_options, and removed most of the changes to the toplevel as a result.

@gasche -dtimings and -dprofile are rather different from the other -d options. I'd rather keep them unchanged until we have a need to change them, especially since the change is not obvious (timing is about the whole command vs -dlambda and friends are about one source file).

Now about the multiple files, I'm surprised people are insisting on it, especially given that the compiler is already printing everything in one file, along with all the errors. I don't see what is this migration cost that would justify keeping the status quo over this pull request. We're talking about looking at intermediate asts here, it's not like every ocaml user does this kind of things every 10min. None of the -d flags are documented, and we're not in a case where everyone knows about them despite the lack of documentation (like for lowercase polymorphic variants).

@sliquister sliquister force-pushed the sliquister:doutput2 branch 2 times, most recently from 14d132f to 093f86b Jul 21, 2018

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2018

If it goes in the language in the current state, people will adapt their tooling and workflow to support it, and would then have to redo the work if we decide to change things.

I have to agree with @sliquister here that this argument seems very weak given the current context (what tooling? what workflow?).

@sliquister sliquister force-pushed the sliquister:doutput2 branch from 093f86b to 96e1079 Jul 24, 2018

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 24, 2018

Well @sliquister mentioned two kind of changes to adapt to this PR:

  • changing .gitignore files
  • refining the rules of build systems, at least those that want to be aware of all the potential outputs of each build action

Both things would have to be changed if the interface changes.

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Jul 24, 2018

I looked at what ghc does for comparison:

$ ghc -ddump-simpl -ddump-parsed -ddump-to-file -c -ddump-asm -ddump-llvm a.hs
$ ls -1
a.dump-llvm
a.dump-parsed
a.dump-simpl
a.hi
a.hs
a.o

so I renamed the output to <target>.dump, and presumably we'd use the same convention in the future: <target>.dump-lambda. Maybe the flag should be renamed to -dump-to-file as well, like Leo was saying, to at least match the file extension (though then it'd be desirable to rename the -dcmm as -dump-cmm etc at some point).

This way, you can ignore *.dump{,-*} and that's that. No need to force cmm, lambda to show up in there.

For the build system, all I can say is it takes 15s to write let targets = if List.mem args "-dintofile" then [ cmx ^ ".dump" ] else [], not so much when you have to check for every -d flag, many of which only apply in bytecode or clambda or flambda, or are conditional on others, or only apply when compiling but not linking.

@sliquister sliquister force-pushed the sliquister:doutput2 branch 2 times, most recently from b37d42c to c45ffce Jul 26, 2018

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2018

Allow me to resume the conversation.

It is true that that if we merge this PR as is, and then change the meaning of the option to dump to different files, any build system with specific support for that option, which also keeps track of the targets of each rule, will have to be updated.
What I tried to say (in a somewhat unclear and unpolite way) in my previous message was: apart from janestreet's internal jenga rules (which @sliquister is maintaining) I don't expect any build system to add support for that option. I might of course be wrong, and dune might very well add support for it.
That problem though is easily avoidable if, instead of changing the behavior of that cmdline flag, we introduce yet another flag for that new behavior (-dump-to-separate-files). Then build systems authors/maintainers are free to update their support whenever they want to (or even not at all).

In my opinion, that should be enough to justify merging the current PR as is.

Considering that:

  • @sliquister did the present work
  • he is also the one who is going to have to update a build system
  • he is not is not planning to update this PR to dump to several files (I had confirmation of that offline)
  • you (the only one currently objecting to this PR) don't have the time (right now?) to update this PR
  • you, a priori, are not going to add any specific support in any build system

I kindly ask you to reconsider your objection, and allow this PR to be merged.

@gasche gasche dismissed their stale review Jul 26, 2018

(apparently a consensus?)

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 26, 2018

@trefis, I agree with your analysis and I was thinking of writing a message of the same effect anyway. I still believe that the proposed behavior is the wrong one, and that at least all whole-program representations (not per-function representations) should be printed to separate file, but (1) apparently no one else cares except maybe Sébastien and (2) indeed I don't currently have the time to do anything constructive about it. Let's agree to disagree and let you take your responsibilities if you want to merge this PR.

@sliquister sliquister force-pushed the sliquister:doutput2 branch from c45ffce to 15679b3 Jul 26, 2018

@trefis trefis merged commit ae1317c into ocaml:trunk Jul 27, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@sliquister sliquister deleted the sliquister:doutput2 branch Jul 27, 2018

@Drup Drup referenced this pull request Jul 27, 2018

Merged

Refactor Compile/Optcompile #1703

@gasche

This comment has been minimized.

Copy link
Member

commented Sep 8, 2018

Why are we writing the file to <target>.dump instead of <source>.dump? If I run ocamlc -o foo test.ml, the "target file" is test.cmo, but this is somewhat arbitrary: it could have been test.cmi or foo for example.

(I wanted to use -dump-into-file to implement #2030 but the use of the target makes it inconvenient, it is easier to just redirect the error output to a file.)

@sliquister

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.