Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop "Total running time" when generating the documentation #390

Merged
merged 1 commit into from Sep 3, 2018

Conversation

lamby
Copy link
Contributor

@lamby lamby commented Jun 11, 2018

Whilst working on the Reproducible Builds effort [0], we noticed
that sphinx-gallery could not be built reproducibly.

Patch attached that removes the "Total running time of the script"
messages from the documentation build.

(There is some filesystem-related unreproduciblity that remains, alas.)

This was originally filed in Debian as #901307 [1].

[0] https://reproducible-builds.org/
[1] https://bugs.debian.org/901307

@codecov-io
Copy link

codecov-io commented Jun 11, 2018

Codecov Report

Merging #390 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #390   +/-   ##
=======================================
  Coverage   95.37%   95.37%           
=======================================
  Files          29       29           
  Lines        2186     2186           
=======================================
  Hits         2085     2085           
  Misses        101      101

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d876e30...aa65cbc. Read the comment docs.

@lesteve
Copy link
Member

lesteve commented Jun 11, 2018

Disclaimer: I am not very familiar with what "reproducible builds" means in a Python context. Probably a very naive comment, but I have to admit that I can't really see a good reason why the docs need to be exactly reproducible.

@lamby
Copy link
Contributor Author

lamby commented Jun 11, 2018

Thanks.

Re. "Python context", in @Debian we build and ship the generated documentation, so ensuring that the results are the same is essential. :) (The "took 0.2s" is not even very useful for an end-user given that this is essentially a lame microbenchmark of the build server...)

@GaelVaroquaux
Copy link
Contributor

GaelVaroquaux commented Jun 11, 2018 via email

@lamby
Copy link
Contributor Author

lamby commented Jun 11, 2018

It's non-deterministic and essentially random value that is microbenchmarking the current CPU load of the machine building documentation so whilst I might agree that there is a difference between 20 seconds and 0.0002 seconds, I don't think any of them are long enough to offset the "cost" of not having a reproducible build :)

@GaelVaroquaux
Copy link
Contributor

GaelVaroquaux commented Jun 11, 2018 via email

@larsoner
Copy link
Contributor

I don't see a problem with providing a config option to disable the messages during build. Would that solve your problem @lamby ? I guess you'd need to have a patch for our conf.py that you use, so maybe it's not so helpful.

@sandrotosi
Copy link

we can patch docs/conf.py no problem, so this would indeed solve Debian's problem

@lamby
Copy link
Contributor Author

lamby commented Jun 11, 2018

Oh sure.. but I'd rather this is fixed upstream! :)

In terms of a config option, simply detect the presence of the SOURCE_DATE_EPOCH envionment variable.

@larsoner
Copy link
Contributor

For 90%+ of our users, having the example run times reported is useful, even if there is some variability in the numbers (ballpark is good enough!). So I agree with @GaelVaroquaux that we should leave the default "on". For people who need builds to give identical output with identical input, the config option should work.

@GaelVaroquaux
Copy link
Contributor

GaelVaroquaux commented Jun 11, 2018 via email

@larsoner
Copy link
Contributor

In terms of a config option, simply detect the presence of the SOURCE_DATE_EPOCH envionment variable.

I'm not sure what you mean by this.

Typically in sphinx-gallery config options are set by values in conf.py. These can be overridden by passing -D values during the build, or by users tweaking their own conf.py values.

@Titan-C
Copy link
Member

Titan-C commented Jun 11, 2018

I like the idea of reproducible builds, but I don't see how much sense it makes to have the docs reproducible. I do agree that the examples inside Sphinx-Gallery are so quick that the elapsed time adds more noise than content, but some of our users do appreciate this feature, since their examples can be very time consuming and they want to warn their users about this.
We have the feature to hide this stats for projects where it makes no sense. But in the aim of discoverability of features Sphinx-Gallery includes this information.

@lamby
Copy link
Contributor Author

lamby commented Jun 11, 2018

Given that a reproducible build is a niche case and there seems to be some objection to removing these timings from the source, if you detect that someone is building the package documentation with the SOURCE_DATE_EPOCH environment variable exported you can assume that they would like a reproducible build and thus "drop" the timings.

I don't see how much sense it makes to have the docs reproducible.

@Debian is shipping the documentation pre-built in HTML form ← this may be the bit you are missing. :)

@Titan-C
Copy link
Member

Titan-C commented Jun 11, 2018

These can be overridden by passing -D values during the build.

We also need to make this option recognizable. But is true, we can keep our config the way it is and provide this information for our online version of the docs, as we want to advertise this feature. The Debian project could build enabling a flag to drop this statistics.

@lesteve
Copy link
Member

lesteve commented Jun 12, 2018

Is my understanding correct that you can patch conf.py inside the Debian package with the changes in this PR? If that's the case I would be in favour of doing this for now.

Let me try to give you my perspective from a small corner of the Scientific Python ecosystem: documentation build is very likely not reproducible and "upstream" is unlikely to care enough to accept PRs unless they have a very small impact, in particular in terms of maintenance cost.

As an example, I have noticed in the past that the scikit-learn examples do not produce the exact same plots. It is likely that the main problem is some unset random state somewhere, which seems fixable, but it is going to be painful to find out which examples are the culprits.

On top of this, suppose I want to demonstrate in an example how random.random and random.seed works with a snippet like this:

# random numbers will be different on each run. In particular if you are looking at the example
# on the website you are very likely not going to get the same numbers
for i in range(3):
    print(random.random())

# now we set the random state and you should get the same numbers on each run
random.setstate(0)
for i in range(3):
    print(random.random())

The output of the example is not reproducible but that is exactly the point. I am not sure what the reproduciblebuilds approach would be in such a case.

@lamby
Copy link
Contributor Author

lamby commented Jun 12, 2018

documentation build is very likely not reproducible and "upstream" is unlikely to care enough to accept PRs unless they have a very small impact

I have encountered many such examples that deliberately use random numbers. In almost all cases PRs were accepted that if and only if some flag is set (eg .SOURCE_DATE_EPOCH) then it is built reproducibly. :)

(That avoids very single downstream distribution having to manage their own patch — sure, Debian can patch the software, but what about Arch Linux, Nix, Opensuse, Guix, etc. etc? They are all striving to be reproducible too..)

I worry we are are talking at cross-purposes here :)

@lesteve lesteve force-pushed the master branch 3 times, most recently from 0c1f0d4 to 49d4f11 Compare June 13, 2018 18:18
@GaelVaroquaux
Copy link
Contributor

This PR has not gone anywhere.

I think that the point of view of developers is that an option to not report computing time is possible, but it should be an option, off by default, and well documented.

I think that I will close this PR, with the goal to unclutter our PR history, unless there is objection.

@lamby
Copy link
Contributor Author

lamby commented Aug 31, 2018

I think that I will close this PR, with the goal to unclutter our PR history, unless there is objection.

Understandable. Would you accept a patch that dropped this time if-and-only-if the SOURCE_DATE_EPOCH variable is set?

@GaelVaroquaux
Copy link
Contributor

GaelVaroquaux commented Aug 31, 2018 via email

@lamby
Copy link
Contributor Author

lamby commented Aug 31, 2018

I would rather have a more explicit variable name.

Unfortunately, the name is somewhat "stuck". See: https://reproducible-builds.org/specs/source-date-epoch/

@larsoner
Copy link
Contributor

larsoner commented Sep 2, 2018

How about if SOURCE_DATE_EPOCH is set, we override min_reported_time to be np.inf?

Once #374 is merged, we will also want to override it, too, since it's not going to be deterministic, either. So we need to think about the general case of what we wan to do to have reproducible builds. I think it's reasonable to mention in the config that having SOURCE_DATE_EPOCH set will override specific config vars, so far just min_reported_time but soon also show_memory, too.

…phinx-gallery#390)

Whilst working on the Reproducible Builds effort [0], we noticed that
sphinx-gallery could not be built reproducibly.

This patch removes the "Total running time of the script" messages from the
documentation build if the SOURCE_DATE_EPOCH [1] environment variable is
exported in the environment.

This was originally filed in Debian as #901307 [2].

 [0] https://reproducible-builds.org/
 [1] https://reproducible-builds.org/specs/source-date-epoch/
 [2] https://bugs.debian.org/901307
@lamby
Copy link
Contributor Author

lamby commented Sep 3, 2018

Sounds good to me. I've updated this PR to match, rebasing against the current master whilst I'm at it.

Copy link
Contributor

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to only modify the doc build for Sphinx gallery itself only (current PR) or for all packages that use SG? I think I misunderstood the scope of your suggestion. I thought we were talking about the latter but I realize now you might mean the former.

@lamby
Copy link
Contributor Author

lamby commented Sep 3, 2018

Kinda both… in the sense that sphinx gallery ships examples of itself:

image

│ │ │ ├── ./usr/share/doc/python-sphinx-gallery-doc/html/_sources/auto_examples/plot_choose_thumbnail.rst.txt
│ │ │ │ @@ -64,15 +64,15 @@
│ │ │ │  
│ │ │ │          plt.show()
│ │ │ │  
│ │ │ │  
│ │ │ │      if __name__ == '__main__':
│ │ │ │          main()
│ │ │ │  
│ │ │ │ -**Total running time of the script:** ( 0 minutes  0.048 seconds)
│ │ │ │ +**Total running time of the script:** ( 0 minutes  0.035 seconds)

@larsoner
Copy link
Contributor

larsoner commented Sep 3, 2018

+1 for merge from my end

If other packages that use SG to build docs want 100% reproducible builds, this is one way to do it. And there is wide demand we could in the future build it into the package itself, but don't need to for now.

@Titan-C Titan-C merged commit 0cf4603 into sphinx-gallery:master Sep 3, 2018
@Titan-C
Copy link
Member

Titan-C commented Sep 3, 2018

I merged this. Setting this to the configuration file and not to the build part is the correct way to go. It also sets the example for other projects that might be interested in doing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants