New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/timer #2550

Open
wants to merge 20 commits into
base: develop
from

Conversation

Projects
None yet
5 participants
@bgoodri
Contributor

bgoodri commented Jun 14, 2018

Submission Checklist

  • [?] Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary

Use time difference rather than iteration difference to determine whether to send "Iteration: ..." to the logger.

Also, check whether refresh > 0 before printing "Elapsed Time: ..."

Also, print the estimated time to completion by dividing the number of iterations to go by the ratio of the number of interations completed since the last refresh to the refresh interval.

Intended Effect

Get Andrew to shut up

Fixes #2070

How to Verify

Look at a run that takes more than refresh seconds

Side Effects

Departs from past behavior. Any (misguided) tool that is parsing the progress based on iteration differences might break.

Also, the default value of refresh used by all the interfaces are not particularly appropriate when (re)interpreted in seconds. 100 seconds is too long as a default. 30 sounds about right.

Documentation

None

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Trustees of Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

bgoodri added some commits Jun 13, 2018

use time difference to print progress
also test whether refresh > 0 before printing Elapsed Time:

@bgoodri bgoodri added the feature label Jun 14, 2018

@bgoodri bgoodri added this to the 2.17.1++ milestone Jun 14, 2018

bgoodri added some commits Jun 14, 2018

refresh = 0 -> refresh = 5
this allows the logger to get the Elapsed Time stuff
ibid
[ci skip]
ibid
[ci skip]
ibid
[ci skip]
ibid
[ci skip]
ibid
[ci skip]
ibid
[ci skip]
ibid
[ci skip]
ibid
[ci skip]
ibid
[ci skip]
@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 15, 2018

Jenkins, retest this please.

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 15, 2018

Jenkins, test this please.

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 15, 2018

Jenkins, WTF is wrong with you?

@bbbales2

This comment has been minimized.

Member

bbbales2 commented Jun 15, 2018

Hahahahaha, the chat thing got disabled. Look for “retry” in here: http://discourse.mc-stan.org/t/new-jenkins-jobs-tutorial/2383

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 15, 2018

Possibly there is some test in CmdStan that is broken either by expecting a fixed number of entries in the logger or by passing refresh = 0 and breaking because it no longer prints the Elapsed Time thing at the end. But looking at the log, there is no actual error message.

@syclik

This comment has been minimized.

Member

syclik commented Jun 16, 2018

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 16, 2018

@bob-carpenter

This comment has been minimized.

Contributor

bob-carpenter commented Jun 16, 2018

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 16, 2018

I can't make it through the entire unit tests due to #2549 but I don't think that is what Jenkins is experiencing.

@bob-carpenter

This comment has been minimized.

Contributor

bob-carpenter commented Jun 16, 2018

There should be issues created for something this major and the PR should reference the issue. I think it saves time on false starts for development.

Can we maintain backward compatibility by making time a second condition on iteration output? That is, we would have two refresh values, N for minimum iterations and T for minimum seconds between iterations. That is, we wait for N iterations and T seconds to go by. Then if we set N = 0, the refresh will be totally controlled by time. If we set T = 0, everything is backward compatible and controlled by iteration. Thus backard compatible default behavior would have T = 0. Turning off output would then be turning T or N up very high, though that's awkward and we could use a negative number and arguably would have to for N to preserve backward compatibility.

Rather than an interval timed refresh, I probably want something with a growing span, like a refresh at 1s, 10s, 1m, 5m, 30m, and maybe once an hour thereafter. I just want to know the whole thing hasn't gone to sleep or gotten stuck.

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 16, 2018

There is a referenced issue: #2070

We could certainly get more fancy, but what in the current behavior of the progress logger is worth preserving? I have never thought the strong presumption for backwards compatibility extended to how things look as long as the draws are the same as they were before.

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 16, 2018

Also, the progress logger does not include the chain ID currently, but for most of Stan's history it did. Then it went away, came back, and went away again. No one complained that doing so broke their homemade progress parser, so my guess is there are no homemade progress parsers to break.

@bob-carpenter

This comment has been minimized.

Contributor

bob-carpenter commented Jun 17, 2018

@bob-carpenter

This comment has been minimized.

Contributor

bob-carpenter commented Jun 17, 2018

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jun 17, 2018

The backwards compatibility standard you are applying is inconsistent with the standard of backward compatibility that has been applied in the past. In addition to the chain ID hokey pokey, we have changed the number of digits in the print output, changed the way n_eff is calculated, and are changing the way R_hat is calculated. In each of those situations, we didn't introduce another flag that defaults to false in order to keep doing things the old way unless the user flips it on; we forced people to do things the new way because it is better.

The progress reports is even less consequential than those things. The only harm is if someone has an old script where they specify refresh = 1000 or something in order to keep the progress output minimal and then they run their old script again, which gets reinterpreted as "refresh every 1000 seconds", then the progress output will be even more minimal than what they might have been expecting based on Stan's past behavior. But that isn't a big deal.

@bob-carpenter

This comment has been minimized.

Contributor

bob-carpenter commented Jun 17, 2018

@syclik

This comment has been minimized.

Member

syclik commented Jun 18, 2018

I'll review the comments so far and put up some thoughts tomorrow.

@syclik

This comment has been minimized.

Member

syclik commented Jun 18, 2018

I just saw the discussion here:
http://discourse.mc-stan.org/t/timed-iteration-updates-and-backward-compatibility-standards/4552/12

I'll post a response there tomorrow.

add doubling mechanism
consider defaulting to 10s

also estimate time left
@bob-carpenter

This comment has been minimized.

Contributor

bob-carpenter commented Jul 11, 2018

@bgoodri --- any idea what's up with this failure? We're about to release 2.18 and this would be nice to include if it's just something like the tests needing to be kicked off again.

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jul 11, 2018

No. This is what I was complaining to @seantalts about: Jenkins alleges that the CmdStan tests failed but if you open up the log there is no actual error message or test failure.

But Daniel is not agreeing with the PR anyway, so we'll probably have to take it up for 2.19.

@seantalts

This comment has been minimized.

Member

seantalts commented Jul 11, 2018

Weird, the same bug was happening on a Math PR. I added a workaround to the Jenkinsfile in Math but not here. I will open a PR.

@@ -22,8 +23,9 @@ namespace stan {
* @param[in] finish end iteration number used for printing messages
* @param[in] num_thin when save is true, a draw will be written to the
* mcmc_writer every num_thin iterations
* @param[in] refresh number of iterations to print a message. If
* refresh is zero, iteration number messages will not be printed
* @param[in] refresh initial number of seconds between progress messages.

This comment has been minimized.

@syclik

syclik Jul 13, 2018

Member

@bgoodri, was this really the intended behavior? The refresh rate gets doubled?

@syclik

This comment has been minimized.

Member

syclik commented Jul 13, 2018

I'm working on fixing it so it can handle this, but I'm really confused.

Do you really want "1" second refresh to mean it'll get printed at times:

  • 1 sec
  • 3 sec
  • 7 sec
  • 15 sec
  • 31 sec

and so on?

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Jul 13, 2018

@seantalts seantalts modified the milestones: 2.18.0, 2.18.0++ Jul 13, 2018

@bob-carpenter

This comment has been minimized.

Contributor

bob-carpenter commented Jul 13, 2018

@bob-carpenter

This comment has been minimized.

Contributor

bob-carpenter commented Oct 17, 2018

I think we're OK merging this now after discussions of what pieces of the API need to remain stable. Is there something that needs to be done to get this working with tests?

@bgoodri

This comment has been minimized.

Contributor

bgoodri commented Oct 17, 2018

@syclik has a different branch now that does the same thing in practice
https://discourse.mc-stan.org/t/development-of-time-based-refresh/5988

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment