Drop "Total running time" when generating the documentation #390
Codecov Report
@@ Coverage Diff @@
## master #390 +/- ##
=======================================
Coverage 95.37% 95.37%
=======================================
Files 29 29
Lines 2186 2186
=======================================
Hits 2085 2085
Misses 101 101
|
Disclaimer: I am not very familiar with what "reproducible builds" means in a Python context. Probably a very naive comment, but I have to admit that I can't really see a good reason why the docs need to be exactly reproducible. |
Thanks. Re. "Python context", in @Debian we build and ship the generated documentation, so ensuring that the results are the same is essential. :) (The "took 0.2s" is not even very useful for an end-user given that this is essentially a lame microbenchmark of the build server...) |
(The "took 0.2s" is not even very useful for an end-user given that
this is essentially a lame microbenchmark of the build server...)
If it's 0.2s, it's useful as a check that the code isn't very
long running. And in some situations it's 10 s, or 10 min, in which case it
is really useful to the user.
|
It's a non-deterministic and essentially random value that microbenchmarks the current CPU load of the machine building the documentation. So whilst I might agree that there is a difference between 20 seconds and 0.0002 seconds, I don't think any of them are long enough to offset the "cost" of not having a reproducible build :) |
I might agree that there is a difference between 20 seconds and 0.0002
seconds, I don't think any of them are long enough to offset the "cost"
of not having a reproducible build :)
I disagree. From a user point of view it's huge.
|
I don't see a problem with providing a config option to disable the messages during build. Would that solve your problem @lamby ? I guess you'd need to have a patch for our |
we can patch docs/conf.py no problem, so this would indeed solve Debian's problem |
Oh sure.. but I'd rather this is fixed upstream! :) In terms of a config option, simply detect the presence of the |
For 90%+ of our users, having the example run times reported is useful, even if there is some variability in the numbers (ballpark is good enough!). So I agree with @GaelVaroquaux that we should leave the default "on". For people who need builds to give identical output with identical input, the config option should work. |
Oh sure.. but I'd rather this is fixed upstream! :)
It's not a bug. The user cares, when looking at a library, to know
whether this library is slow or fast.
|
I'm not sure what you mean by this. Typically in |
I like the idea of reproducible builds, but I don't see how much sense it makes to have the docs reproducible. I do agree that the examples inside Sphinx-Gallery are so quick that the elapsed time adds more noise than content, but some of our users do appreciate this feature, since their examples can be very time consuming and they want to warn their users about this. |
Given that a reproducible build is a niche case and there seems to be some objection to removing these timings from the source, if you detect that someone is building the package documentation with the
@Debian is shipping the documentation pre-built in HTML form ← this may be the bit you are missing. :) |
We also need to make this option recognizable. But it is true that we can keep our config the way it is and provide this information in our online version of the docs, as we want to advertise this feature. The Debian project could build with a flag enabled to drop these statistics. |
Is my understanding correct that you can patch conf.py inside the Debian package with the changes in this PR? If that's the case I would be in favour of doing this for now. Let me try to give you my perspective from a small corner of the Scientific Python ecosystem: documentation builds are very likely not reproducible and "upstream" is unlikely to care enough to accept PRs unless they have a very small impact, in particular in terms of maintenance cost. As an example, I have noticed in the past that the scikit-learn examples do not produce the exact same plots. It is likely that the main problem is some unset random state somewhere, which seems fixable, but it is going to be painful to find out which examples are the culprits. On top of this, suppose I want to demonstrate in an example how
import random
# random numbers will be different on each run. In particular if you are looking at the example
# on the website you are very likely not going to get the same numbers
for i in range(3):
    print(random.random())
# now we seed the generator and you should get the same numbers on each run
random.seed(0)
for i in range(3):
    print(random.random())
The output of the example is not reproducible but that is exactly the point. I am not sure what the reproducible builds approach would be in such a case. |
I have encountered many such examples that deliberately use random numbers. In almost all cases PRs were accepted that if and only if some flag is set (eg . (That avoids every single downstream distribution having to manage their own patch. Sure, Debian can patch the software, but what about Arch Linux, Nix, openSUSE, Guix, etc.? They are all striving to be reproducible too.) I worry we are talking at cross-purposes here :) |
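As an illustration of the "only when a flag is set" pattern mentioned above, here is a minimal sketch. It assumes the flag is the SOURCE_DATE_EPOCH environment variable discussed elsewhere in this thread; the helper name `example_rng` is hypothetical and not taken from any project.

```python
import os
import random


def example_rng(environ):
    """Return a random.Random that is seeded only for reproducible builds.

    Hypothetical helper: `environ` is any mapping shaped like os.environ.
    """
    rng = random.Random()
    if "SOURCE_DATE_EPOCH" in environ:
        # Assumption: reuse the epoch value (an integer, per the spec) as the
        # seed, so that two builds with the same value print the same numbers.
        rng.seed(int(environ["SOURCE_DATE_EPOCH"]))
    return rng


rng = example_rng(os.environ)
for _ in range(3):
    print(rng.random())
```

A normal build still prints different numbers on each run, so the example keeps making its point; only builds that export the variable get fixed output.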
This PR has not gone anywhere. I think that the point of view of developers is that an option to not report computing time is possible, but it should be an option, off by default, and well documented. I think that I will close this PR, with the goal to unclutter our PR history, unless there is objection. |
Understandable. Would you accept a patch that dropped this time if-and-only-if the |
Understandable. Would you accept a patch that dropped this time if-and-only-if
the SOURCE_DATE_EPOCH variable is set?
I would rather have a more explicit variable name. This behavior is quite
an indirect one.
|
Unfortunately, the name is somewhat "stuck". See: https://reproducible-builds.org/specs/source-date-epoch/ |
How about if SOURCE_DATE_EPOCH is set, we override Once #374 is merged, we will also want to override it, too, since it's not going to be deterministic, either. So we need to think about the general case of what we want to do to have reproducible builds. I think it's reasonable to mention in the config that having |
…phinx-gallery#390)
Whilst working on the Reproducible Builds effort [0], we noticed that sphinx-gallery could not be built reproducibly.
This patch removes the "Total running time of the script" messages from the documentation build if the SOURCE_DATE_EPOCH [1] environment variable is exported in the environment.
This was originally filed in Debian as #901307 [2].
[0] https://reproducible-builds.org/
[1] https://reproducible-builds.org/specs/source-date-epoch/
[2] https://bugs.debian.org/901307
Sounds good to me. I've updated this PR to match, rebasing against the current master whilst I'm at it. |
Do you mean to modify the doc build for Sphinx-Gallery itself only (current PR) or for all packages that use SG? I think I misunderstood the scope of your suggestion. I thought we were talking about the latter but I realize now you might mean the former.
Kinda both… in the sense that sphinx gallery ships examples of itself:
|
+1 for merge from my end. If other packages that use SG to build docs want 100% reproducible builds, this is one way to do it. And if there is wide demand, we could in the future build it into the package itself, but we don't need to for now. |
I merged this. Setting this to the configuration file and not to the build part is the correct way to go. It also sets the example for other projects that might be interested in doing this. |
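For other projects that want to follow the same approach, a minimal sketch of what such a conf.py change could look like. It assumes sphinx-gallery's `min_reported_time` setting is the knob that hides the running-time message; the helper name `reported_time_threshold` is hypothetical, and the exact change merged here may differ.

```python
import os
import sys


def reported_time_threshold(environ):
    """Return the run time (in seconds) below which timings are not reported.

    Hypothetical helper: `environ` is any mapping shaped like os.environ.
    """
    if "SOURCE_DATE_EPOCH" in environ:
        # Reproducible build requested: use a threshold that no example can
        # reach, so no running time is written into the generated pages.
        return sys.maxsize
    # Default: report every running time.
    return 0


sphinx_gallery_conf = {
    # Other sphinx-gallery settings for the project would go here.
    "min_reported_time": reported_time_threshold(os.environ),
}
```

Keeping the check in conf.py leaves the default "on" for regular builds, while a distribution that exports SOURCE_DATE_EPOCH gets identical output without carrying its own patch.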
Whilst working on the Reproducible Builds effort [0], we noticed
that sphinx-gallery could not be built reproducibly.
Patch attached that removes the "Total running time of the script"
messages from the documentation build.
(There is some filesystem-related unreproducibility that remains, alas.)
This was originally filed in Debian as #901307 [1].
[0] https://reproducible-builds.org/
[1] https://bugs.debian.org/901307