Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin #27490

Closed
embray opened this issue Mar 15, 2019 · 41 comments
Closed

Comments

@embray
Copy link
Contributor

embray commented Mar 15, 2019

For Cygwin versions less than 3.0.0 (and only Cygwin) this replaces the use of multiprocessing.Pool in the sage_setup.docbuild.build_many function, with a naïve but "good enough" (it works in general) parallel process pool that does not rely on starting processes from threads.

This is needed for #27214, because the specific combination of using MAP_NORESERVE mmaps and forking processes from a thread can result in a bug in Cygwin (fixed in 3.0.0) which causes unhandled segfaults to occur in any code that is run during the docbuild which uses libgap.

So this is really only needed so that the docs can continue to be built on systems (including my primary development environment, as well as the buildbot) that do not yet have Cygwin >= 3.0.0 once #27214 is applied.

CC: @jdemeyer

Component: porting: Cygwin

Author: Erik Bray

Branch: fe0e3ea

Reviewer: Jeroen Demeyer

Issue created by migration from https://trac.sagemath.org/ticket/27490

@embray embray added this to the sage-8.7 milestone Mar 15, 2019
@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

Branch: u/embray/ticket-27490

@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

Author: Erik Bray

@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

Commit: f9aabb7

@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

comment:1

Jeroen, you will likely have thoughts about this. Keep in mind, it's not meant to be at all robust--just a quick workaround so I don't have to spend too much more time on it. But if you have any thoughts on straightforward improvements to this I'm all for it.

Obviously a better workaround, if it were possible, would be to use the much talked-about idea for generalizing the parallel processing loop from the Sage doctester. But since we don't have that yet this will do for now.


New commits:

f9aabb7Trac #27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin

@jdemeyer
Copy link

comment:2

I'm just wondering why you even bother with parallel docbuilding in the first place. The obvious solution is just using a single process.

@jdemeyer
Copy link

comment:3

Also, in general I don't like code of the form

if A_is_not_broken:
    A()
else:
    B()

If B works well enough to replace A, then why don't we just use B unconditionally?

This is especially relevant when A involves multiprocessing.Pool which has other issues (that's why I didn't use it in the doctester).

So I would like to know if there is a good reason to not use your "simplistic multiprocessing.Pool replacement" on all systems.

@jdemeyer
Copy link

comment:4

Some comments on the code:

  1. This should use os.wait() instead of time.sleep(5). Unless I'm missing something, this should be robust.

  2. A few comments would be helpful.

  3. not any(filter(None, workers)) is not an easy condition to understand. I would write it as all(w is None for w in workers).

  4. I don't think that there is a need to gracefully shutdown the workers in the finally block. A hard kill os.kill(w.pid, signal.SIGKILL) may be better because it guarantees to kill the process (cysignals catches SIGTERM which does sys.exit() but that might not actually kill the process in a sufficiently messed up state). For extra safety, maybe call is_alive() before killing the process.

In general, I like it. It's simple but the use case is sufficiently simple that we don't need anything more complicated. And it says a lot about multiprocess.Pool that this might actually work better than multiprocessing.Pool.

@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

comment:5

Thanks for having a look.

Replying to @jdemeyer:

Some comments on the code:

  1. This should use os.wait() instead of time.sleep(5). Unless I'm missing something, this should be robust.

Yes, that should be much better.

  1. A few comments would be helpful.

+1

  1. not any(filter(None, workers)) is not an easy condition to understand. I would write it as all(w is None for w in workers).

I thought it was pretty straightforward, but I guess the not any is a little confusing.

  1. I don't think that there is a need to gracefully shutdown the workers in the finally block. A hard kill os.kill(w.pid, signal.SIGKILL) may be better because it guarantees to kill the process (cysignals catches SIGTERM which does sys.exit() but that might not actually kill the process in a sufficiently messed up state). For extra safety, maybe call is_alive() before killing the process.

I think there is: Or at least to try to SIGTERM first. Reason being this block can be reached if one process exits in error, while other processes are still working perfectly fine. You want to gracefully shut them down and ensure that their atexit handlers run, clean up temp files, etc.

In general, I like it. It's simple but the use case is sufficiently simple that we don't need anything more complicated. And it says a lot about multiprocess.Pool that this might actually work better than multiprocessing.Pool.

I don't know that it says a lot. I don't think it actually works "better" on the whole, just in this one case. Keep in mind also that multiprocessing.Pool implements a lot of generic functionality (e.g. multiple simultaneous map_async jobs) that simply aren't needed for this simpler use case.

One downside to this approach is that there is no data returned from the child processes to the parent. So for example an exception raised in a worker cannot be re-raised from the parent. Instead I just raise a generic RuntimeError if the process exited with an error code. I simulated this case in testing and you can still see the worker's exception and traceback printed to stderr, so that was good-enough for my purposes.

@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

comment:6

Replying to @jdemeyer:

Also, in general I don't like code of the form

if A_is_not_broken:
    A()
else:
    B()

If B works well enough to replace A, then why don't we just use B unconditionally?

This is especially relevant when A involves multiprocessing.Pool which has other issues (that's why I didn't use it in the doctester).

So I would like to know if there is a good reason to not use your "simplistic multiprocessing.Pool replacement" on all systems.

Partly for the reason I mentioned at the end of my previous comment, and partly just because I need this now and although I'm convinced it's robust-enough for my use it's still not well-tested.

How about for now we special-case this, and then for the next release make it a priority to finally get at least an initial version of the doctest forker code released and replace it with that?

@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

comment:7

Replying to @jdemeyer:

I'm just wondering why you even bother with parallel docbuilding in the first place. The obvious solution is just using a single process.

At first I did just replace this with just plain map(target, args) but I wasn't satisfied: It was slow (obviously) and made memory usage even worse than it already is, over time. It still seems better, if not kind of brushing other problems under the rug, to build each sub-doc in its own process that can easily be cleaned up when it's done.

The parallel version only took about 20 minutes to get working.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 15, 2019

Branch pushed to git repo; I updated commit sha1. New commits:

219e9c4A little bit of import cleanup
88771dfTrac #27490: Address some review comments and other cleanup:

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 15, 2019

Changed commit from f9aabb7 to 88771df

@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

comment:9

I addressed most of your comments, but I still have the big if/else. Since that sage_setup.docbuild.__init__ module is already quite large, perhaps I could move that code to a utility module at least.

I'd still be open to just using it on all platforms, but I'm wary, given that this isn't battle-tested.

@jdemeyer
Copy link

comment:10

Replying to @embray:

I'd still be open to just using it on all platforms, but I'm wary, given that this isn't battle-tested.

I see your point.

@jdemeyer
Copy link

Reviewer: Jeroen Demeyer

@jdemeyer
Copy link

comment:11

I'm willing to give this the benefit of the doubt. You'll probably be the only user of this code anyway.

I'm sure that there is room for improvement (I still don't like the finally block for example), but we can defer that to the point that we decide to use this code for all systems.

@embray
Copy link
Contributor Author

embray commented Mar 15, 2019

comment:12

Thanks. However, I need to double back now since my last change broke something. It's not starting new processes up after previous ones finish.

@embray
Copy link
Contributor Author

embray commented Mar 18, 2019

comment:21

Which "abomination"? I'm not necessarily going to do anything at your behest if you put it in such negative terms.

@embray
Copy link
Contributor Author

embray commented Mar 18, 2019

comment:22

(Which is not not say I necessarily think this is pretty as-is but it's still not even clear exactly what you're asking to move).

@vbraun
Copy link
Member

vbraun commented Mar 18, 2019

comment:23

Im asking you to move the parallel implementation of multiprocessing into a separate file, ideally with a small doctstring explaining what is going on here.

@embray
Copy link
Contributor Author

embray commented Mar 19, 2019

comment:24

But both versions? Just the one I added?

@vbraun
Copy link
Member

vbraun commented Mar 19, 2019

comment:25

At least for now just the one you added in the else branch. Thanks.

@embray
Copy link
Contributor Author

embray commented Mar 19, 2019

comment:26

Okay, I'll do that.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 19, 2019

Branch pushed to git repo; I updated commit sha1. New commits:

78f8938Trac #27490: Moved the alternate build_many implementation into a

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 19, 2019

Changed commit from 1e5b1f5 to 78f8938

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 19, 2019

Branch pushed to git repo; I updated commit sha1. New commits:

3a35e4dfixup

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 19, 2019

Changed commit from 78f8938 to 3a35e4d

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 19, 2019

Changed commit from 3a35e4d to fe0e3ea

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 19, 2019

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

fe0e3eaTrac #27490: Moved the alternate build_many implementation into a

@embray
Copy link
Contributor Author

embray commented Mar 19, 2019

comment:30

(Just removed an unused import.)

@vbraun
Copy link
Member

vbraun commented Mar 19, 2019

Changed branch from u/embray/ticket-27490 to fe0e3ea

@vbraun
Copy link
Member

vbraun commented Mar 19, 2019

comment:33

Followup at #27514

@vbraun
Copy link
Member

vbraun commented Mar 19, 2019

Changed commit from fe0e3ea to none

dkrenn added a commit to dkrenn/sage that referenced this issue May 25, 2023
…regular-guess

* u/dkrenn/sequences/rec-hash: (8211 commits)
  Updated SageMath version to 8.7
  Updated SageMath version to 8.7.rc0
  Trac sagemath#27490: Moved the alternate build_many implementation into a sage_setup.docbuild.utils module.
  Trac sagemath#27490: Further fixes in use of os.wait()
  Trac sagemath#27214: Patch GAP to allocate its memory pool using MAP_NORESERVE on Cygwin
  Trac sagemath#27490: Address some review comments and other cleanup:
  A little bit of import cleanup
  Trac sagemath#27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin
  Fix alarm() test when cysignals was compiled with debugging
  Trac sagemath#27485: Use sdh_cmake in the spkg-install for primecount.
  Trac sagemath#27484: Add shd_cmake helper for running cmake with the correct flags for building Sage SPKGs.
  cysignals should be a normal dependency
  Upgrade to Cysignals 1.10.2
  Upgrade to notebook-5.7.6
  Trac sagemath#27461: Add abs tol on this test to account for minor numerical difference on Cygwin due to libm differences.
  Replacing None < infinity comparison with equivalent code.
  Some last little tidbits for uniformity.
  Removing some code duplication for __pth_root (changed to _pth_root_func).
  One more xderinv added.
  trac 27474: move some references to the master bibliography file.
  ...
dkrenn added a commit to dkrenn/sage that referenced this issue Jul 4, 2023
…ular-warning

* u/dkrenn/sequences/k-regular-guess: (8211 commits)
  Updated SageMath version to 8.7
  Updated SageMath version to 8.7.rc0
  Trac sagemath#27490: Moved the alternate build_many implementation into a sage_setup.docbuild.utils module.
  Trac sagemath#27490: Further fixes in use of os.wait()
  Trac sagemath#27214: Patch GAP to allocate its memory pool using MAP_NORESERVE on Cygwin
  Trac sagemath#27490: Address some review comments and other cleanup:
  A little bit of import cleanup
  Trac sagemath#27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin
  Fix alarm() test when cysignals was compiled with debugging
  Trac sagemath#27485: Use sdh_cmake in the spkg-install for primecount.
  Trac sagemath#27484: Add shd_cmake helper for running cmake with the correct flags for building Sage SPKGs.
  cysignals should be a normal dependency
  Upgrade to Cysignals 1.10.2
  Upgrade to notebook-5.7.6
  Trac sagemath#27461: Add abs tol on this test to account for minor numerical difference on Cygwin due to libm differences.
  Replacing None < infinity comparison with equivalent code.
  Some last little tidbits for uniformity.
  Removing some code duplication for __pth_root (changed to _pth_root_func).
  One more xderinv added.
  trac 27474: move some references to the master bibliography file.
  ...
dkrenn added a commit to dkrenn/sage that referenced this issue Jul 4, 2023
SageMath version 8.7, Release Date: 2019-03-23

* tag '8.7': (943 commits)
  Updated SageMath version to 8.7
  Updated SageMath version to 8.7.rc0
  Trac sagemath#27490: Moved the alternate build_many implementation into a sage_setup.docbuild.utils module.
  Trac sagemath#27490: Further fixes in use of os.wait()
  Trac sagemath#27214: Patch GAP to allocate its memory pool using MAP_NORESERVE on Cygwin
  Trac sagemath#27490: Address some review comments and other cleanup:
  A little bit of import cleanup
  Trac sagemath#27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin
  Fix alarm() test when cysignals was compiled with debugging
  Trac sagemath#27485: Use sdh_cmake in the spkg-install for primecount.
  Trac sagemath#27484: Add shd_cmake helper for running cmake with the correct flags for building Sage SPKGs.
  cysignals should be a normal dependency
  Upgrade to Cysignals 1.10.2
  Upgrade to notebook-5.7.6
  Trac sagemath#27461: Add abs tol on this test to account for minor numerical difference on Cygwin due to libm differences.
  Replacing None < infinity comparison with equivalent code.
  Some last little tidbits for uniformity.
  Removing some code duplication for __pth_root (changed to _pth_root_func).
  One more xderinv added.
  trac 27474: move some references to the master bibliography file.
  ...
dkrenn added a commit to dkrenn/sage that referenced this issue Aug 3, 2023
…ounded

* u/dkrenn/k-regular-warning: (8211 commits)
  Updated SageMath version to 8.7
  Updated SageMath version to 8.7.rc0
  Trac sagemath#27490: Moved the alternate build_many implementation into a sage_setup.docbuild.utils module.
  Trac sagemath#27490: Further fixes in use of os.wait()
  Trac sagemath#27214: Patch GAP to allocate its memory pool using MAP_NORESERVE on Cygwin
  Trac sagemath#27490: Address some review comments and other cleanup:
  A little bit of import cleanup
  Trac sagemath#27490: Simplistic multiprocessing.Pool replacement for parallel docbuild on older Cygwin
  Fix alarm() test when cysignals was compiled with debugging
  Trac sagemath#27485: Use sdh_cmake in the spkg-install for primecount.
  Trac sagemath#27484: Add shd_cmake helper for running cmake with the correct flags for building Sage SPKGs.
  cysignals should be a normal dependency
  Upgrade to Cysignals 1.10.2
  Upgrade to notebook-5.7.6
  Trac sagemath#27461: Add abs tol on this test to account for minor numerical difference on Cygwin due to libm differences.
  Replacing None < infinity comparison with equivalent code.
  Some last little tidbits for uniformity.
  Removing some code duplication for __pth_root (changed to _pth_root_func).
  One more xderinv added.
  trac 27474: move some references to the master bibliography file.
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants