Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building unchanged documentation takes very long #14204

Closed
jdemeyer opened this issue Feb 28, 2013 · 30 comments
Closed

Building unchanged documentation takes very long #14204

jdemeyer opened this issue Feb 28, 2013 · 30 comments

Comments

@jdemeyer
Copy link

Even with #14156, rebuilding the documentation when nothing needs to be done (make doc right after make doc) still takes a very long time.

In sage-5.7:

real    0m48.142s
user    0m37.120s
sys     0m10.110s

In sage-5.8.beta2:

real    5m52.020s
user    4m39.910s
sys     1m5.140s

CC: @jhpalmieri @vbraun @hivert @williamstein

Component: build

Author: Jeroen Demeyer

Reviewer: John Palmieri

Merged: sage-5.8.beta4

Issue created by migration from https://trac.sagemath.org/ticket/14204

@jhpalmieri
Copy link
Member

comment:1

I don't know. I think there are 74 documents that need to be built: 56 modules in the reference manual, 1 master reference manual doc, and 17 others. When you run make doc, 57 of those get built twice. For each build, on my machine, it takes 6 seconds to run Sphinx and decide that nothing needs to be done. This all adds up.

I wonder if it could be sped up much by calling Sphinx internally in Python, instead of as a command-line call each time. Otherwise, I don't have any ideas. Users should be advised to run sage --docbuild reference/algebras html, etc., rather than rebuilding the full reference manual. They should also be advised to buy a machine with lots of cores and set MAKE=make -jN for some large number N :)

Should this really be a blocker?

@jhpalmieri
Copy link
Member

comment:2

For what it's worth, you can cut down on the build time by setting MAKE=make -jN, where N is somewhat larger than the number of processors, at least if nothing needs to be rebuilt. On my machine, which has 2 processors:

$ export MAKE='make -j2' 
$ time make doc
...
real	4m59.845s
user	4m33.320s
sys	2m0.269s

and

$ export MAKE='make -j4' 
$ time make doc
...
real	4m4.408s
user	4m41.729s
sys	1m59.842s

@jdemeyer
Copy link
Author

jdemeyer commented Mar 1, 2013

comment:3

Replying to @jhpalmieri:

Should this really be a blocker?

That's a personal opinion of course.

In my opinion, the answer is absolutely "Yes". If you regularly run make ptestlong, this really is a massive regression. I think the problem mentioned in this ticket is much worse than the problem solved in #6495.

@jhpalmieri
Copy link
Member

comment:4

How long does the rest of make ptestlong take? Why does a few extra minutes matter that much?

@jdemeyer
Copy link
Author

jdemeyer commented Mar 1, 2013

comment:5

Replying to @jhpalmieri:

How long does the rest of make ptestlong take? Why does a few extra minutes matter that much?

OK, with 4 processes on sage.math, the documentation takes 188s and the doctesting (without --long) takes 1065s, with #12415. So, it's certainly noticable.

Why does a few extra minutes matter that much?

It's not that those extra minutes matter that much, but those lost minutes matter much more to me than the minutes saved by #6495.

@jdemeyer
Copy link
Author

jdemeyer commented Mar 1, 2013

comment:6

Replying to @jhpalmieri:

For each build, on my machine, it takes 6 seconds to run Sphinx and decide that nothing needs to be done.

Do you have any idea what Sphinx does in those 6 seconds? Does it import something from sage?

@jhpalmieri
Copy link
Member

comment:7

If I put a call to cProfile in the docbuild command in spkg/bin/sage when building reference/homology, it doesn't tell me much:

         178370 function calls (174697 primitive calls) in 5.695 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    4.087    2.043    4.087    2.043 {posix.waitpid}
       29    1.125    0.039    1.125    0.039 {method 'read' of 'file' objects}
        1    0.041    0.041    5.696    5.696 builder.py:12(<module>)
       15    0.019    0.001    0.032    0.002 sre_compile.py:301(_optimize_unicode)
       ...

If I put it instead into the Python command called in builder.py, I get more information:

         962034 function calls (948847 primitive calls) in 4.297 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    18537    1.142    0.000    1.142    0.000 {method 'read' of 'file' objects}
       64    0.199    0.003    2.480    0.039 all.py:1(<module>)
    23751    0.159    0.000    0.279    0.000 intersphinx.py:90(split_lines)
35847/35841    0.133    0.000    0.172    0.000 posixpath.py:60(join)
       57    0.112    0.002    0.571    0.010 intersphinx.py:74(read_inventory_v2)
        1    0.081    0.081    2.803    2.803 all.py:38(<module>)
        2    0.079    0.039    0.079    0.040 {cPickle.load}
    29365    0.063    0.000    0.063    0.000 {_codecs.utf_8_decode}
 2269/568    0.057    0.000    0.163    0.000 sre_parse.py:379(_parse)
       17    0.053    0.003    2.075    0.122 all.py:3(<module>)
       ...

We're calling certain functions a huge number of times, but we're not importing sage.all (as far as I can tell). I don't know why we're calling posixpath.join 35000 times, for example: that seems like a lot. Could it be reading every file in the Sage library each time, instead of only the files referenced in that section?

@jhpalmieri
Copy link
Member

comment:8

Actually, it is importing sage.all every time.

@jdemeyer
Copy link
Author

jdemeyer commented Mar 2, 2013

comment:9

I think the issue is

        build_command = 'python '+os.path.join(SAGE_DOC, 'common', 'custom-sphinx-build.py')
        build_command += ' -b %s -d %s %s %s %s'%(type, self._doctrees_dir(),
                                                  options, self.dir,
                                                  output_dir)
        logger.debug(build_command)
        subprocess.call(build_command, shell=True)

Could we replace this subprocess stuff by a simple Python function call or an execfile()?

@jhpalmieri
Copy link
Member

comment:10

Replying to @jdemeyer:

I think the issue is

        build_command = 'python '+os.path.join(SAGE_DOC, 'common', 'custom-sphinx-build.py')
        build_command += ' -b %s -d %s %s %s %s'%(type, self._doctrees_dir(),
                                                  options, self.dir,
                                                  output_dir)
        logger.debug(build_command)
        subprocess.call(build_command, shell=True)

Could we replace this subprocess stuff by a simple Python function call or an execfile()?

I tried to do that, but I think Sphinx has a memory leak (this has been suggested before). For whatever reason, after a little while it uses a lot of memory and slows way down. Actually, the inventory build sped up (from almost 2 minutes before down to about half a minute on an OS X machine with 2 cores), but the html build does not.

In more detail, I tried importing sage.all once, and then using a Python function call to build each document: I basically stuck all of custom-sphinx-builder.py into builder.py, and replaced the subprocess.call(...) with

        from sphinx.cmdline import main
        main(args)

This way sage.all would be imported just once, not 75 times. But there are issues. Maybe we should call main(args) some other way, so the memory is guaranteed to be released afterward? Any suggestions?

@jdemeyer
Copy link
Author

jdemeyer commented Mar 2, 2013

comment:11

Replying to @jhpalmieri:

Any suggestions?

Use multiprocessing.Pool(maxtasksperchild=1). This way, every child would just call Sphinx once, no cleanup to worry about. Make sure that everything that needs to be imported is imported by the master process (the one that calls ``multiprocessing.Pool()`.

@vbraun
Copy link
Member

vbraun commented Mar 3, 2013

comment:12

My laptop builds the new-style documentation in 55 seconds with 8x parallel if nothing is changed. And this is from cold caches.

Jeroen, you are just trying to replace the subprocess call with a fork. That just saves us a couple of Sage startups, each of which is 1.5 seconds. Thats maybe 10% performance that you can get out of there, right?

You need to pull in a number of files across the different doc pieces to figure out if stuff is out-of-date, so I guess this makes this ticket extremely dependent on small file read performance. Thats just the price of having to synchronize multiple separate sub-documents.

William, is there a chance to get a SSD (presumably into sage.math) for Jeroen?

@jdemeyer
Copy link
Author

jdemeyer commented Mar 3, 2013

comment:13

Replying to @vbraun:

My laptop builds the new-style documentation in 55 seconds with 8x parallel if nothing is changed. And this is from cold caches.

Which is probably about the same time as one process for the old docbuilder.

Jeroen, you are just trying to replace the subprocess call with a fork.

Sure.

That just saves us a couple of Sage startups, each of which is 1.5 seconds. Thats maybe 10% performance that you can get out of there, right?

I doubt it. My timings suggest that some documents need 2 ~ 3 seconds to build, which is pretty close to the Sage startup time. So I think there is a substantial gain there.

I think the elephant in the room is still: why did we care so much about speeding up the initial build of the documentation (#6495) at the expense of rebuilds of the documentation? When doing a few rebuilds of the documentation, you lose all the time which is gained by #6495.

@jhpalmieri
Copy link
Member

comment:14

Why did we care so much about speeding up the initial build of the documentation (#6495) at the expense of rebuilds of the documentation?

When I was thinking about #6495, I was imagining the following situations, among others:

  • the initial make
  • some people still clone the Sage repo, which usually triggers a full rebuild of the docs
  • edit some files (say in combinat) and run sage --docbuild reference/combinat html
  • building the pdf docs

For all of these, #6495 helps considerably. When I think about running make ptestlong (which I do more frequently than make ptest), I didn't think a little added time was much of an issue.

@jdemeyer
Copy link
Author

jdemeyer commented Mar 4, 2013

comment:15

Could we at the very least get rid of the subprocess call? It probably won't make the documentation build as fast as before, but there should be some considerable gain.

@jdemeyer
Copy link
Author

jdemeyer commented Mar 4, 2013

comment:16

Replying to @vbraun:

William, is there a chance to get a SSD (presumably into sage.math) for Jeroen?

There already is a SSD in sage.math for Jeroen, and the timings I mention are with that SSD.

@jdemeyer
Copy link
Author

jdemeyer commented Mar 4, 2013

comment:17

This is a proof-of-concept patch for getting rid of subprocess, giving a speedup of roughly a factor 2.

@jdemeyer
Copy link
Author

jdemeyer commented Mar 4, 2013

comment:18

Perhaps more imports could be done it the master process, but I don't know how much there is to gain. Running strace shows that the docbuilder tries to open() a non-existing file 270200 times(!).

@jdemeyer
Copy link
Author

jdemeyer commented Mar 4, 2013

comment:19

Add a cache for the intersphinx inventory, reducing the total time again by more than a factor of 2. Still untested proof-of-concept.

@jdemeyer
Copy link
Author

jdemeyer commented Mar 4, 2013

comment:20

Patch seems to work, I'm setting it to needs_review very tentatively. Somebody who knows Sphinx better than me should look at it.

With this patch, building the documentation with 2 threads is now faster than before #6495, so it would fix the issue.

@jdemeyer
Copy link
Author

jdemeyer commented Mar 4, 2013

Author: Jeroen Demeyer

@jhpalmieri
Copy link
Member

comment:21

The patch seems to work as advertised, and the changes make sense to me. Volker, do you concur? (Note that the commit message is not ideal ;)

@jdemeyer
Copy link
Author

jdemeyer commented Mar 5, 2013

Attachment: 14204_sphinx_fork.patch.gz

@jdemeyer
Copy link
Author

jdemeyer commented Mar 5, 2013

comment:22

Fixed commit message.

@jhpalmieri
Copy link
Member

Reviewer: John Palmieri

@jdemeyer
Copy link
Author

jdemeyer commented Mar 7, 2013

Merged: sage-5.8.beta4

@kcrisman
Copy link
Member

comment:25

I have a followup question. I'm working with a student on learning to develop, and his first patch is a pretty basic documentation upgrade. Changing just one file and then doing sage -docbuild reference html takes eons on a reasonably modern Ubuntu laptop because it checks ALL the files. It's true that we didn't necessarily set the number of parallel cores. But it still really slows down checking whether things look right. Any thoughts, or is this a wontfix? I'm hesitant to open a ticket until I know it really should be considered a bug, though I do think it is a pretty nasty (non-mathematical) regression.

@jhpalmieri
Copy link
Member

comment:26

All the more reason to set MAKE so it uses multiple threads! On my (not reasonably modern) OS X machine, rebuilding the documentation after modifying one file, with MAKE unset, takes

real	2m44.637s
user	2m7.278s
sys	0m22.776s

With MAKE set to use two cores:

real	1m55.552s
user	2m12.224s
sys	0m24.907s

On sage.math, with MAKE='make -j12':

real	0m34.899s
user	1m34.650s
sys	0m16.800s

So using multiple cores is very advantageous. Note also that if you are only modifying a single file, or files within one section of the reference manual, be aware that you can do

sage --docbuild reference/homology html

(or reference/combinat or reference/matrices or ...). This will be very fast (except when rebuilding the combinat docs ;), even with a single core.

@jdemeyer
Copy link
Author

comment:27

Replying to @kcrisman:

I have a followup question. I'm working with a student on learning to develop, and his first patch is a pretty basic documentation upgrade. Changing just one file and then doing sage -docbuild reference html takes eons on a reasonably modern Ubuntu laptop because it checks ALL the files.

On my reasonably modern Gentoo laptop, this takes about 40 seconds (with 1 thread), which I personally find reasonable.

I do think it is a pretty nasty (non-mathematical) regression.

I don't think it's such a big regression. With 1 thread, it is slightly slower than before, but not by much.

If you do want to create a ticket, this is the place to do it: http://bitbucket.org/birkenfeld/sphinx/issues/

@kcrisman
Copy link
Member

comment:28

I see, so this isn't something we can fix more without getting rid of parallel documentation building. I think this was a lot faster before because it only had to do a sweep through one directory.

With Sage 5.2, after touching one file and sage -b, the documentation build takes well under 15 seconds, but in 5.9 the same thing took about 80 seconds (I had to use a watch for this, since it didn't print it out).

Maybe we could add something about both of John's solutions to the Sage manuals developer page? I'd be open to opening such a ticket. Maybe my student could even write the patch ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants