New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop "Total running time" when generating the documentation #390

Merged
merged 1 commit into from Sep 3, 2018

Conversation

Projects
None yet
7 participants
@lamby
Contributor

lamby commented Jun 11, 2018

Whilst working on the Reproducible Builds effort [0], we noticed
that sphinx-gallery could not be built reproducibly.

Patch attached that removes the "Total running time of the script"
messages from the documentation build.

(There is some filesystem-related unreproduciblity that remains, alas.)

This was originally filed in Debian as #901307 [1].

[0] https://reproducible-builds.org/
[1] https://bugs.debian.org/901307

@codecov-io

This comment has been minimized.

Show comment
Hide comment
@codecov-io

codecov-io Jun 11, 2018

Codecov Report

Merging #390 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #390   +/-   ##
=======================================
  Coverage   95.37%   95.37%           
=======================================
  Files          29       29           
  Lines        2186     2186           
=======================================
  Hits         2085     2085           
  Misses        101      101

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d876e30...aa65cbc. Read the comment docs.

codecov-io commented Jun 11, 2018

Codecov Report

Merging #390 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #390   +/-   ##
=======================================
  Coverage   95.37%   95.37%           
=======================================
  Files          29       29           
  Lines        2186     2186           
=======================================
  Hits         2085     2085           
  Misses        101      101

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d876e30...aa65cbc. Read the comment docs.

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Jun 11, 2018

Contributor

Disclaimer: I am not very familiar with what "reproducible builds" means in a Python context. Probably a very naive comment, but I have to admit that I can't really see a good reason why the docs need to be exactly reproducible.

Contributor

lesteve commented Jun 11, 2018

Disclaimer: I am not very familiar with what "reproducible builds" means in a Python context. Probably a very naive comment, but I have to admit that I can't really see a good reason why the docs need to be exactly reproducible.

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Jun 11, 2018

Contributor

Thanks.

Re. "Python context", in @Debian we build and ship the generated documentation, so ensuring that the results are the same is essential. :) (The "took 0.2s" is not even very useful for an end-user given that this is essentially a lame microbenchmark of the build server...)

Contributor

lamby commented Jun 11, 2018

Thanks.

Re. "Python context", in @Debian we build and ship the generated documentation, so ensuring that the results are the same is essential. :) (The "took 0.2s" is not even very useful for an end-user given that this is essentially a lame microbenchmark of the build server...)

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Jun 11, 2018

Contributor
Contributor

GaelVaroquaux commented Jun 11, 2018

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Jun 11, 2018

Contributor

It's non-deterministic and essentially random value that is microbenchmarking the current CPU load of the machine building documentation so whilst I might agree that there is a difference between 20 seconds and 0.0002 seconds, I don't think any of them are long enough to offset the "cost" of not having a reproducible build :)

Contributor

lamby commented Jun 11, 2018

It's non-deterministic and essentially random value that is microbenchmarking the current CPU load of the machine building documentation so whilst I might agree that there is a difference between 20 seconds and 0.0002 seconds, I don't think any of them are long enough to offset the "cost" of not having a reproducible build :)

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Jun 11, 2018

Contributor
Contributor

GaelVaroquaux commented Jun 11, 2018

@larsoner

This comment has been minimized.

Show comment
Hide comment
@larsoner

larsoner Jun 11, 2018

Contributor

I don't see a problem with providing a config option to disable the messages during build. Would that solve your problem @lamby ? I guess you'd need to have a patch for our conf.py that you use, so maybe it's not so helpful.

Contributor

larsoner commented Jun 11, 2018

I don't see a problem with providing a config option to disable the messages during build. Would that solve your problem @lamby ? I guess you'd need to have a patch for our conf.py that you use, so maybe it's not so helpful.

@sandrotosi

This comment has been minimized.

Show comment
Hide comment
@sandrotosi

sandrotosi Jun 11, 2018

we can patch docs/conf.py no problem, so this would indeed solve Debian's problem

sandrotosi commented Jun 11, 2018

we can patch docs/conf.py no problem, so this would indeed solve Debian's problem

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Jun 11, 2018

Contributor

Oh sure.. but I'd rather this is fixed upstream! :)

In terms of a config option, simply detect the presence of the SOURCE_DATE_EPOCH envionment variable.

Contributor

lamby commented Jun 11, 2018

Oh sure.. but I'd rather this is fixed upstream! :)

In terms of a config option, simply detect the presence of the SOURCE_DATE_EPOCH envionment variable.

@larsoner

This comment has been minimized.

Show comment
Hide comment
@larsoner

larsoner Jun 11, 2018

Contributor

For 90%+ of our users, having the example run times reported is useful, even if there is some variability in the numbers (ballpark is good enough!). So I agree with @GaelVaroquaux that we should leave the default "on". For people who need builds to give identical output with identical input, the config option should work.

Contributor

larsoner commented Jun 11, 2018

For 90%+ of our users, having the example run times reported is useful, even if there is some variability in the numbers (ballpark is good enough!). So I agree with @GaelVaroquaux that we should leave the default "on". For people who need builds to give identical output with identical input, the config option should work.

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Jun 11, 2018

Contributor
Contributor

GaelVaroquaux commented Jun 11, 2018

@larsoner

This comment has been minimized.

Show comment
Hide comment
@larsoner

larsoner Jun 11, 2018

Contributor

In terms of a config option, simply detect the presence of the SOURCE_DATE_EPOCH envionment variable.

I'm not sure what you mean by this.

Typically in sphinx-gallery config options are set by values in conf.py. These can be overridden by passing -D values during the build, or by users tweaking their own conf.py values.

Contributor

larsoner commented Jun 11, 2018

In terms of a config option, simply detect the presence of the SOURCE_DATE_EPOCH envionment variable.

I'm not sure what you mean by this.

Typically in sphinx-gallery config options are set by values in conf.py. These can be overridden by passing -D values during the build, or by users tweaking their own conf.py values.

@Titan-C

This comment has been minimized.

Show comment
Hide comment
@Titan-C

Titan-C Jun 11, 2018

Member

I like the idea of reproducible builds, but I don't see how much sense it makes to have the docs reproducible. I do agree that the examples inside Sphinx-Gallery are so quick that the elapsed time adds more noise than content, but some of our users do appreciate this feature, since their examples can be very time consuming and they want to warn their users about this.
We have the feature to hide this stats for projects where it makes no sense. But in the aim of discoverability of features Sphinx-Gallery includes this information.

Member

Titan-C commented Jun 11, 2018

I like the idea of reproducible builds, but I don't see how much sense it makes to have the docs reproducible. I do agree that the examples inside Sphinx-Gallery are so quick that the elapsed time adds more noise than content, but some of our users do appreciate this feature, since their examples can be very time consuming and they want to warn their users about this.
We have the feature to hide this stats for projects where it makes no sense. But in the aim of discoverability of features Sphinx-Gallery includes this information.

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Jun 11, 2018

Contributor

Given that a reproducible build is a niche case and there seems to be some objection to removing these timings from the source, if you detect that someone is building the package documentation with the SOURCE_DATE_EPOCH environment variable exported you can assume that they would like a reproducible build and thus "drop" the timings.

I don't see how much sense it makes to have the docs reproducible.

@Debian is shipping the documentation pre-built in HTML form ← this may be the bit you are missing. :)

Contributor

lamby commented Jun 11, 2018

Given that a reproducible build is a niche case and there seems to be some objection to removing these timings from the source, if you detect that someone is building the package documentation with the SOURCE_DATE_EPOCH environment variable exported you can assume that they would like a reproducible build and thus "drop" the timings.

I don't see how much sense it makes to have the docs reproducible.

@Debian is shipping the documentation pre-built in HTML form ← this may be the bit you are missing. :)

@Titan-C

This comment has been minimized.

Show comment
Hide comment
@Titan-C

Titan-C Jun 11, 2018

Member

These can be overridden by passing -D values during the build.

We also need to make this option recognizable. But is true, we can keep our config the way it is and provide this information for our online version of the docs, as we want to advertise this feature. The Debian project could build enabling a flag to drop this statistics.

Member

Titan-C commented Jun 11, 2018

These can be overridden by passing -D values during the build.

We also need to make this option recognizable. But is true, we can keep our config the way it is and provide this information for our online version of the docs, as we want to advertise this feature. The Debian project could build enabling a flag to drop this statistics.

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Jun 12, 2018

Contributor

Is my understanding correct that you can patch conf.py inside the Debian package with the changes in this PR? If that's the case I would be in favour of doing this for now.

Let me try to give you my perspective from a small corner of the Scientific Python ecosystem: documentation build is very likely not reproducible and "upstream" is unlikely to care enough to accept PRs unless they have a very small impact, in particular in terms of maintenance cost.

As an example, I have noticed in the past that the scikit-learn examples do not produce the exact same plots. It is likely that the main problem is some unset random state somewhere, which seems fixable, but it is going to be painful to find out which examples are the culprits.

On top of this, suppose I want to demonstrate in an example how random.random and random.seed works with a snippet like this:

# random numbers will be different on each run. In particular if you are looking at the example
# on the website you are very likely not going to get the same numbers
for i in range(3):
    print(random.random())

# now we set the random state and you should get the same numbers on each run
random.setstate(0)
for i in range(3):
    print(random.random())

The output of the example is not reproducible but that is exactly the point. I am not sure what the reproduciblebuilds approach would be in such a case.

Contributor

lesteve commented Jun 12, 2018

Is my understanding correct that you can patch conf.py inside the Debian package with the changes in this PR? If that's the case I would be in favour of doing this for now.

Let me try to give you my perspective from a small corner of the Scientific Python ecosystem: documentation build is very likely not reproducible and "upstream" is unlikely to care enough to accept PRs unless they have a very small impact, in particular in terms of maintenance cost.

As an example, I have noticed in the past that the scikit-learn examples do not produce the exact same plots. It is likely that the main problem is some unset random state somewhere, which seems fixable, but it is going to be painful to find out which examples are the culprits.

On top of this, suppose I want to demonstrate in an example how random.random and random.seed works with a snippet like this:

# random numbers will be different on each run. In particular if you are looking at the example
# on the website you are very likely not going to get the same numbers
for i in range(3):
    print(random.random())

# now we set the random state and you should get the same numbers on each run
random.setstate(0)
for i in range(3):
    print(random.random())

The output of the example is not reproducible but that is exactly the point. I am not sure what the reproduciblebuilds approach would be in such a case.

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Jun 12, 2018

Contributor

documentation build is very likely not reproducible and "upstream" is unlikely to care enough to accept PRs unless they have a very small impact

I have encountered many such examples that deliberately use random numbers. In almost all cases PRs were accepted that if and only if some flag is set (eg .SOURCE_DATE_EPOCH) then it is built reproducibly. :)

(That avoids very single downstream distribution having to manage their own patch — sure, Debian can patch the software, but what about Arch Linux, Nix, Opensuse, Guix, etc. etc? They are all striving to be reproducible too..)

I worry we are are talking at cross-purposes here :)

Contributor

lamby commented Jun 12, 2018

documentation build is very likely not reproducible and "upstream" is unlikely to care enough to accept PRs unless they have a very small impact

I have encountered many such examples that deliberately use random numbers. In almost all cases PRs were accepted that if and only if some flag is set (eg .SOURCE_DATE_EPOCH) then it is built reproducibly. :)

(That avoids very single downstream distribution having to manage their own patch — sure, Debian can patch the software, but what about Arch Linux, Nix, Opensuse, Guix, etc. etc? They are all striving to be reproducible too..)

I worry we are are talking at cross-purposes here :)

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Aug 31, 2018

Contributor

This PR has not gone anywhere.

I think that the point of view of developers is that an option to not report computing time is possible, but it should be an option, off by default, and well documented.

I think that I will close this PR, with the goal to unclutter our PR history, unless there is objection.

Contributor

GaelVaroquaux commented Aug 31, 2018

This PR has not gone anywhere.

I think that the point of view of developers is that an option to not report computing time is possible, but it should be an option, off by default, and well documented.

I think that I will close this PR, with the goal to unclutter our PR history, unless there is objection.

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Aug 31, 2018

Contributor

I think that I will close this PR, with the goal to unclutter our PR history, unless there is objection.

Understandable. Would you accept a patch that dropped this time if-and-only-if the SOURCE_DATE_EPOCH variable is set?

Contributor

lamby commented Aug 31, 2018

I think that I will close this PR, with the goal to unclutter our PR history, unless there is objection.

Understandable. Would you accept a patch that dropped this time if-and-only-if the SOURCE_DATE_EPOCH variable is set?

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Aug 31, 2018

Contributor
Contributor

GaelVaroquaux commented Aug 31, 2018

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Aug 31, 2018

Contributor

I would rather have a more explicit variable name.

Unfortunately, the name is somewhat "stuck". See: https://reproducible-builds.org/specs/source-date-epoch/

Contributor

lamby commented Aug 31, 2018

I would rather have a more explicit variable name.

Unfortunately, the name is somewhat "stuck". See: https://reproducible-builds.org/specs/source-date-epoch/

@larsoner

This comment has been minimized.

Show comment
Hide comment
@larsoner

larsoner Sep 2, 2018

Contributor

How about if SOURCE_DATE_EPOCH is set, we override min_reported_time to be np.inf?

Once #374 is merged, we will also want to override it, too, since it's not going to be deterministic, either. So we need to think about the general case of what we wan to do to have reproducible builds. I think it's reasonable to mention in the config that having SOURCE_DATE_EPOCH set will override specific config vars, so far just min_reported_time but soon also show_memory, too.

Contributor

larsoner commented Sep 2, 2018

How about if SOURCE_DATE_EPOCH is set, we override min_reported_time to be np.inf?

Once #374 is merged, we will also want to override it, too, since it's not going to be deterministic, either. So we need to think about the general case of what we wan to do to have reproducible builds. I think it's reasonable to mention in the config that having SOURCE_DATE_EPOCH set will override specific config vars, so far just min_reported_time but soon also show_memory, too.

Don't show "Total running time" if SOURCE_DATE_EPOCH is set (Closes: #…
…390)

Whilst working on the Reproducible Builds effort [0], we noticed that
sphinx-gallery could not be built reproducibly.

This patch removes the "Total running time of the script" messages from the
documentation build if the SOURCE_DATE_EPOCH [1] environment variable is
exported in the environment.

This was originally filed in Debian as #901307 [2].

 [0] https://reproducible-builds.org/
 [1] https://reproducible-builds.org/specs/source-date-epoch/
 [2] https://bugs.debian.org/901307
@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Sep 3, 2018

Contributor

Sounds good to me. I've updated this PR to match, rebasing against the current master whilst I'm at it.

Contributor

lamby commented Sep 3, 2018

Sounds good to me. I've updated this PR to match, rebasing against the current master whilst I'm at it.

@larsoner

Do you mean to only modify the doc build for Sphinx gallery itself only (current PR) or for all packages that use SG? I think I misunderstood the scope of your suggestion. I thought we were talking about the latter but I realize now you might mean the former.

@lamby

This comment has been minimized.

Show comment
Hide comment
@lamby

lamby Sep 3, 2018

Contributor

Kinda both… in the sense that sphinx gallery ships examples of itself:

image

│ │ │ ├── ./usr/share/doc/python-sphinx-gallery-doc/html/_sources/auto_examples/plot_choose_thumbnail.rst.txt
│ │ │ │ @@ -64,15 +64,15 @@
│ │ │ │  
│ │ │ │          plt.show()
│ │ │ │  
│ │ │ │  
│ │ │ │      if __name__ == '__main__':
│ │ │ │          main()
│ │ │ │  
│ │ │ │ -**Total running time of the script:** ( 0 minutes  0.048 seconds)
│ │ │ │ +**Total running time of the script:** ( 0 minutes  0.035 seconds)
Contributor

lamby commented Sep 3, 2018

Kinda both… in the sense that sphinx gallery ships examples of itself:

image

│ │ │ ├── ./usr/share/doc/python-sphinx-gallery-doc/html/_sources/auto_examples/plot_choose_thumbnail.rst.txt
│ │ │ │ @@ -64,15 +64,15 @@
│ │ │ │  
│ │ │ │          plt.show()
│ │ │ │  
│ │ │ │  
│ │ │ │      if __name__ == '__main__':
│ │ │ │          main()
│ │ │ │  
│ │ │ │ -**Total running time of the script:** ( 0 minutes  0.048 seconds)
│ │ │ │ +**Total running time of the script:** ( 0 minutes  0.035 seconds)
@larsoner

This comment has been minimized.

Show comment
Hide comment
@larsoner

larsoner Sep 3, 2018

Contributor

+1 for merge from my end

If other packages that use SG to build docs want 100% reproducible builds, this is one way to do it. And there is wide demand we could in the future build it into the package itself, but don't need to for now.

Contributor

larsoner commented Sep 3, 2018

+1 for merge from my end

If other packages that use SG to build docs want 100% reproducible builds, this is one way to do it. And there is wide demand we could in the future build it into the package itself, but don't need to for now.

@Titan-C Titan-C merged commit 0cf4603 into sphinx-gallery:master Sep 3, 2018

5 checks passed

ci/circleci: build_docs Your tests passed on CircleCI!
Details
codecov/patch Coverage not affected when comparing d876e30...aa65cbc
Details
codecov/project 95.37% remains the same compared to d876e30
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@Titan-C

This comment has been minimized.

Show comment
Hide comment
@Titan-C

Titan-C Sep 3, 2018

Member

I merged this. Setting this to the configuration file and not to the build part is the correct way to go. It also sets the example for other projects that might be interested in doing this.

Member

Titan-C commented Sep 3, 2018

I merged this. Setting this to the configuration file and not to the build part is the correct way to go. It also sets the example for other projects that might be interested in doing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment