Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change semantics of -fhpc to not accumulate data over multiple runs #612

Merged
merged 8 commits into from Feb 27, 2024

Conversation

BinderDavid
Copy link
Contributor

@BinderDavid BinderDavid commented Sep 13, 2023

The proposal has been accepted; the following discussion is mostly of historic interest.


If I compile a program with the -fhpc option and run it, I expect to obtain a .tix file which contains information about the code covered during the execution of that run. This, however, is surprisingly not the behaviour that is implemented.
Instead, the generated .tix file contains the accumulated coverage of this execution of the program and all previous runs. I propose to change the semantics to only generate the coverage information for one run of the program.

Rendered

BinderDavid added a commit to BinderDavid/ghc-proposals that referenced this pull request Sep 13, 2023

## Effect and Interactions
If a user wants to combine the results of multiple runs, then this behaviour can be trivially implemented in userland.
The `hpc` binary has the subcommand `hpc combine` which allows to do this on the commandline, and the HPC library on [Hackage](https://hackage.haskell.org/package/hpc) provides the tools to do that programmatically from within Haskell programs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“trivial” is a stretch. If I want to get coverage on binary a, which is called indirectly by some other processes (a test suite driver, or even application logic that happens to shell out to a), then it is going to be rather tricky to copy away the .tix file in time for the next run. I believe I have relied on that behavior before, so it’s not theoretical.

It seems more trivial to delete the .tix file in between runs :-)

Maybe the old behaviour could be guarded by a RTS flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“trivial” is a stretch. If I want to get coverage on binary a, which is called indirectly by some other processes (a test suite driver, or even application logic that happens to shell out to a), then it is going to be rather tricky to copy away the .tix file in time for the next run. I believe I have relied on that behavior before, so it’s not theoretical.

Thanks, I removed the word "trivially", and turned nobody to "very few people" ;) The usecase you describe is actually one that I have too, but where I ran into the following restriction of cabal: haskell/cabal#9251 Edit: It is also possible to set the environment variable HPCTIXFILE to a different value on each invocation instead of copying the files around after they have been created.

Maybe the old behaviour could be guarded by a RTS flag?

I have added this in the "Alternatives" section. I could also implement that RTS flag, if the comittee decides that this alternative is better.

@tomjaguarpaw
Copy link
Contributor

I'm confused. Why is the name of the .tix file set implicitly? In Haskell we're not usually fans of implicit behaviour!

@BinderDavid
Copy link
Contributor Author

I'm confused. Why is the name of the .tix file set implicitly? In Haskell we're not usually fans of implicit behaviour!

Yes, the name of the tix file is implicitly set to the name of the executable. This behaviour can currently only be overriden by setting the HPCTIXFILE environment variable. If we would now start to always require the name of the tixfile to be set explicitly, the resulting breakage would probably be way too large, so I am not proposing that. I can add a RTS flag to set the name of the tixfile in addition to the environment variable, but this change is backwards compatible, and I can do that as part of a normal issue/merge request on gitlab.

@elaforge
Copy link

From the discourse comment:

I also don’t use the accumulate feature (and would like to see it gone), but I do use HPCTIXFILE to send parallel processes to different tix files which are then merged.

I’ve been continually annoyed by “tix doesn’t match” for something like 15 years now, as far as I can tell it cannot be worked around (writing scripts to clear out tix files constantly never worked, I figured out why once long ago but forget now). It just means I either don’t use hpc, or my ghci sessions will regularly die and I have to restart them. So if your change will fix that too, then my only request is you go back in time and commit it 15 years ago!

That said, I'm not totally sure this change will actually fix my case, which is: start ghci, load modules, recompile modules, reload, run any function -> tix doesn't match. That's different from rerunning a standalone binary. But if the crashes are due to the implicit "read the last tix" and this removes that, then it should also fix the crashes.

@BinderDavid
Copy link
Contributor Author

In related news: My merge request to add a flag which controls whether tix files are written has just been merged (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11327). With this MR there is now a new group of HPC related RTS-Flags, and it would now be easy to add another flag which controls whether tix files are read, for example --read-tix-file=<yes|no>, as described in the alternatives section.

@nomeata I didn't receive a lot of comments here, and I also advertised on the Haskell discourse (https://discourse.haskell.org/t/do-you-rely-on-fhpc-accumulating-coverage-over-multiple-runs/7584). Do you think it is too early to bring this proposal before the committee, or should I do something else first?

@nomeata
Copy link
Contributor

nomeata commented Sep 27, 2023

It’s probably good to get it through now. In light of the increased emphasis on not breaking things in GHC, and the fact that there are RTS flags for this now, would you want to change the proposal to keep the old aggregating functionality (at least opt-in), or propose as is?

@BinderDavid
Copy link
Contributor Author

It’s probably good to get it through now. In light of the increased emphasis on not breaking things in GHC, and the fact that there are RTS flags for this now, would you want to change the proposal to keep the old aggregating functionality (at least opt-in), or propose as is?

I have modified the proposal and now propose to guard the old behaviour behind the RTS flag --read-tix-file=<yes|no>. But I still propose to change the default behaviour to not read in tix files anymore. The name of the flag is chosen to express that it is complementary to the existing flag --write-tix-file=<yes|no>. Completely removing the functionality is now part of the alternatives section.

I hereby propose to bring the proposal before the committee :)

@nomeata nomeata added the Pending shepherd recommendation The shepherd needs to evaluate the proposal and make a recommendataion label Sep 27, 2023
@BinderDavid
Copy link
Contributor Author

I just opened a MR with the implementation of the change here: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11401

@BinderDavid
Copy link
Contributor Author

Friendly ping @nomeata @angerman :) I am happy to provide any more information that is needed. The implementation in the linked MR is ready, and this proposal will only determine what the default of the additional RTS flag will be. If this proposal is accepted, then the default will be --read-tix-file=no, otherwise --read-tix-file=yes.

@angerman
Copy link
Contributor

@BinderDavid I'm fairly strong on backwards compatibility with migration paths, as I hate surprising behaviour changes. While I agree that tix is probably not that frequently used, and I agree this whole implicit behaviour is ... uh, questionable.

What is your take on the following idea wrt to this proposal?

  1. default to yes for 2 GHC releases, wich will also emit a warning that the existing tix file is being read, and that this behaviour will change in GHC+2 releases. This message should also link to this proposal".
  2. GHC+2 releases down the line (we may be able to implement this with CPP in GHC), it defaults to no?

This should give users enough time to be given a heads up that the tix behaviour will change, and that they'll need to be aware of the change coming in GHC+2?

@BinderDavid
Copy link
Contributor Author

What is your take on the following idea wrt to this proposal?

Yes, that sounds like a good idea. I would propose something like the following warning text which is emitted whenever a tix file is successfully read by the RTS:

Deprecation notice: An executable instrumented with -fhpc currently
accumulates the coverage over multiple runs by default. As described
in GHC proposal 612, this default will change in a future version. If you
want to explicitly opt-in to the old behaviour you can use the
RTS flag --read-tix-file=yes

Do you want me to update the text of the proposal?

@angerman
Copy link
Contributor

@BinderDavid that looks pretty good, two things to note thought:

  • can we add the full https://github.com link? Otherwise people will have to lookup GHC proposal 612 with google or something else, being explicit about the url here, is probably very helpful to users.
  • I find the opt-in notice to be a bit confusing. Can I opt-in to that now during the deprecation phase? It reads a bit like that, but then I would think that would need to be --read-tix-file=no? And only after the deprecation phase be --read-tix-file=yes?

@BinderDavid
Copy link
Contributor Author

Ok, what do you think about the following variant:

Deprecation notice: An executable instrumented with -fhpc currently
accumulates the coverage over multiple runs by default. As described
in https://github.com/ghc-proposals/ghc-proposals/pull/612, this default
will change in a future version. If you want to keep using the old behaviour
you can start using the RTS flag --read-tix-file=yes now.

I find the opt-in notice to be a bit confusing. Can I opt-in to that now during the deprecation phase? It reads a bit like that, but then I would think that would need to be --read-tix-file=no? And only after the deprecation phase be --read-tix-file=yes?

Yes, you can use the flag --read-tix-file=yes from the same time that I would add this warning notice. The flag would have no effect in the deprecation period, since yes remains the default. After the deprecation period the default would change to no, and the --read-tix-file=yes has the effect of keeping the old behaviour. The flag is called --read-tix-file because the RTS has to read in a tix file from disk in order to accumulate the tix counts.

@angerman
Copy link
Contributor

@BinderDavid sorry for me failing to explain myself properly. Here is what I meant:

Deprecation notice: An executable instrumented with -fhpc currently
accumulates the coverage over multiple runs by default. As described
in https://github.com/ghc-proposals/ghc-proposals/pull/612, this default
will change in a future version. If you want to opt in to the new behaviour
you can pass the RTS flag --read-tix-file=no, which will be the default
in GHC X.Y.

We could consider having a message after the deprecation phase:

Since GHC X.Y, -fhpc does not accumulate coverage over multiple runs by
default anymore. See https://github.com/ghc-proposals/ghc-proposals/pull/612.
To opt-out of this new default behaviour you can pass the RTS flag --read-tix-file=yes.

Whether or not we want that extra notice for maybe one GHC release though I'm not
completely sure. Maybe only with -v? Or not at all?

@BinderDavid
Copy link
Contributor Author

@BinderDavid sorry for me failing to explain myself properly. Here is what I meant:

Deprecation notice: An executable instrumented with -fhpc currently
accumulates the coverage over multiple runs by default. As described
in https://github.com/ghc-proposals/ghc-proposals/pull/612, this default
will change in a future version. If you want to opt in to the new behaviour
you can pass the RTS flag --read-tix-file=no, which will be the default
in GHC X.Y.

Ah sorry. Now I understand what you meant. Yes, I can implement it like that.

We could consider having a message after the deprecation phase:

Since GHC X.Y, -fhpc does not accumulate coverage over multiple runs by
default anymore. See https://github.com/ghc-proposals/ghc-proposals/pull/612.
To opt-out of this new default behaviour you can pass the RTS flag --read-tix-file=yes.

Whether or not we want that extra notice for maybe one GHC release though I'm not completely sure. Maybe only with -v? Or not at all?

I would prefer it if we don't spam the users too much. I could simply check if a file <name>.tix is present, and in that case emit the message that a previous version would have ingested that file, but that the current version doesn't.

@tomjaguarpaw
Copy link
Contributor

In the interest of avoiding breaking changes, could we perhaps add a new flag that has the behaviour you want, keeping the previous behaviour around? For example, we could add -fhpc-overwrite.

@BinderDavid
Copy link
Contributor Author

In the interest of avoiding breaking changes, could we perhaps add a new flag that has the behaviour you want, keeping the previous behaviour around? For example, we could add -fhpc-overwrite.

Do you mean as an alternative to -fhpc? As I see it -fhpc is a compile-time option which enables that an executable is instrumented with the tick boxes which are required to collect coverage information. Whether the instrumented binary reads a tix file from disk should in my opinion be a concern of the RTS. I therefore think we shouldn't have separate compile-time options but resolve the situation using RTS flags.

But if you just suggest that we shouldn't change the default but make it an option to tell the RTS not to read in the tix file, then I have implemented that in the open MR https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11401 . This adds a flag --read-tix-file=<yes|no>, the question is just what the default of that flag should be. (The name of the flag is chosen to mirror the name of the existing --write-tix-file=<yes|no> flag which controls whether to write a tix file at the end of execution).

@angerman
Copy link
Contributor

I think at this point I'll recommend acceptance of this proposal, and see what the rest of the steering committees view on the default is. I tend towards changing the default with an appropriate grace/deprecation period and notice, as I consider this implicit behaviour surprising.

@adamgundry adamgundry added Pending committee review The committee needs to evaluate the proposal and make a decision and removed Pending shepherd recommendation The shepherd needs to evaluate the proposal and make a recommendataion labels Jan 25, 2024
@angerman
Copy link
Contributor

@BinderDavid what's your take on this suggestion by Simon

  • In the first release, GHC continues with the current behaviour, but
    if it finds an old file (of any kind, even in the wrong format) it emits
    a warning (before attempting to read it) saying

    I am reading in the existing tix file, and will add hpc info from this run
    to the existing data in that file. GHC 9.12 will cease looking for an
    existing tix file. If you positively want to add hpc info to the current
    tix file, use -fread-tix-file

  • In the next release, it stops reading the file

@BinderDavid
Copy link
Contributor Author

This sounds very sensible to me, and should be quite easy to implement. I am happy to implement this or any other similar deprecation strategy that is deemed the most suitable.

@angerman
Copy link
Contributor

@BinderDavid unless anyone speaks up over the weekend, let's I'll mark this proposal as accepted and merge it. Let's implement it as discusses then, and get this done :-)

@BinderDavid
Copy link
Contributor Author

@angermann Should I modify the proposal and add the specific deprecation strategy outlined by Simon to the text of the proposal before it is merged?

@angerman
Copy link
Contributor

@BinderDavid yes, please do so. This proposal is now accepted! I'll merge it once you updated the deprecation strategy! Thank you!

@angerman angerman removed the Pending committee review The committee needs to evaluate the proposal and make a decision label Feb 26, 2024
@angerman angerman added the Accepted The committee has decided to accept the proposal label Feb 26, 2024
@BinderDavid
Copy link
Contributor Author

@angerman I added the deprecation strategy in aaf984e and created 0612-fhpc-accumulation.md instead of modifying the original template.

Many thanks for the sheparding of the proposal ❤️ .

@angerman angerman merged commit c7419cd into ghc-proposals:master Feb 27, 2024
1 check passed
adamgundry added a commit that referenced this pull request Feb 27, 2024
@chreekat
Copy link

The old behaviour will be changed over two consecutive releases:

In the first release, GHC continues with the current behaviour, but if it finds an old file (of any kind, even in the wrong format) it emits a warning (before attempting to read it) saying
I am reading in the existing tix file, and will add hpc info from this run to the existing data in that file. GHC 9.12 will cease looking for an existing tix file. If you positively want to add hpc info to the current tix file, use --read-tix-file=yes

Sorry to comment on an accepted proposal, but to be clear, there is no "GHC" available to print any warning. There's just the actual compiled program. This deprecation warning is output by the compiled program.[1] Isn't that wildly backward incompatible? How many people test their software by running it and expecting certain output? How many of those have -fhpc enabled? Surely "some", at the very least. Could this expectation be clarified?

[1]: As demonstrated by the nice example at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11401

@BinderDavid
Copy link
Contributor Author

BinderDavid commented Feb 28, 2024

The likelihood that someone is running a program with hpc enabled in production is extremely low. An instrumented program runs much, much slower than an uninstrumented program. (Edit: Original paper says a slowdown by a factor of 3-5)

@BinderDavid
Copy link
Contributor Author

The deprecation warning itself is emitted on stderr, so I hope that if people are testing the output of their executables they check for stdout. But it could happen, of course.

@angerman
Copy link
Contributor

I agree with @BinderDavid here, people who run -fhpc will have just built their program with hpc, and then run subsequently run that (most likely in CI). We could argue that there is no "ghc", but there is ghc in the form of the RTS and lib:ghc, ... in every build product. Should the message rather be this build product was compiled with GHC X.Y? GHC X.Y+2, will change the behaviour, once you compile your program against a newer GHC the behaviour will change? @chreekat?

@chreekat
Copy link

I do think modifying stderr output from compiled programs will break project test suites in the wild. That was my primary concern.

I don't feel strongly about this, I just wanted it understood that this behavior will probably break somebody's CI. :p

@chreekat
Copy link

P.S. I bet people do use hpc in production. But then the stderr message is probably a good thing.

@adamgundry
Copy link
Contributor

I do think modifying stderr output from compiled programs will break project test suites in the wild.

True, but it can be worked around by passing --read-tix-file=yes explicitly, right? (At least I think it should, but see https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11401#note_550679). And it will only appear when the project would otherwise be silently impacted by the changed default. So it seems better to have an explicit message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted The committee has decided to accept the proposal
Development

Successfully merging this pull request may close these issues.

None yet

7 participants