Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel map reduce on SearchForest #13580

Closed
hivert opened this issue Oct 7, 2012 · 145 comments
Closed

Parallel map reduce on SearchForest #13580

hivert opened this issue Oct 7, 2012 · 145 comments

Comments

@hivert
Copy link

hivert commented Oct 7, 2012

Implement a map reduce algorithm in parallel on large sets described by a SearchForest. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.

CC: @sagetrac-sage-combinat @nathanncohen @jdemeyer @seblabbe

Component: combinatorics

Keywords: map-reduce, days57, days77

Author: Florent Hivert, Jean-Baptiste Priez, Nathann Cohen

Branch: 134c1fa

Reviewer: Sébastien Labbé, Jean-Baptiste Priez

Issue created by migration from https://trac.sagemath.org/ticket/13580

@hivert hivert added this to the sage-5.11 milestone Oct 7, 2012
@hivert hivert self-assigned this Oct 7, 2012
@seblabbe
Copy link
Contributor

seblabbe commented Feb 9, 2013

comment:1

Salut Florent et Nathann,

I am starting to think/work on #6637... I have some questions. Mainly, I would like to know what is this ticket, because the above one liner in the description does not say much...

  • Will SearchForest survive or not?
  • Does it replace SearchForest?
  • Does it improve SearchForest?
  • Does it use SearchForest?
  • Or is it used by SearchForest?

Also, Florent wrote on sage-devel in October 2012 that

   I'm also in the process of finalizing a patch which do parallel and even
   distributed map-reduce on recursively enumerated sets (currently badly named
   SearchForest, I'll change the name in my patch, ticket #13580, patch on
   Sage-Combinat queue [1]).
  • I do not see in the cited patch that the name of SearchForest is changed.
  • What would be a better name for SearchForest ?
  • What patch is the more recent? the "old" one or the "experimental" one?
    trac_13580-map_reduce-fh.patch #+experimental
    map_reduce_improved_loop-fh.patch #+experimental
    map_reduce_condition-fh.patch #+experimental
    trac_13580-map_reduce-old-fh.patch 

@jdemeyer jdemeyer modified the milestones: sage-5.11, sage-5.12 Aug 13, 2013
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.1, sage-6.2 Jan 30, 2014
@hivert
Copy link
Author

hivert commented Apr 7, 2014

Branch: u/hivert/13580/map_reduce

@hivert
Copy link
Author

hivert commented Apr 9, 2014

New commits:

693c672Imported code from trac_13580-map_reduce-fh.patch + fixed multiline doctests.

@hivert
Copy link
Author

hivert commented Apr 9, 2014

Commit: 693c672

@hivert
Copy link
Author

hivert commented Apr 9, 2014

Changed keywords from map-reduce to map-reduce, days57

@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.2, sage-6.3 May 6, 2014
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.3, sage-6.4 Aug 10, 2014
@nthiery
Copy link
Contributor

nthiery commented Mar 6, 2015

Changed branch from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 9, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

addb17a13580: Trivial rest fix

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 9, 2015

Changed commit from 693c672 to addb17a

@hivert
Copy link
Author

hivert commented May 8, 2015

Changed branch from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 20, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

92e6e68Merge branch 't/13580/map_reduce' into 13580

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 20, 2015

Changed commit from addb17a to 92e6e68

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 20, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

734c748#13580 Fixed test after merging #6637

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 20, 2015

Changed commit from 92e6e68 to 734c748

@seblabbe
Copy link
Contributor

comment:13

I saw the following typo in map reduce file while looking at the previous commit: "As an example, ee compute"

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 21, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

f234f60#13580 : improved documentation

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 21, 2015

Changed commit from 734c748 to f234f60

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 23, 2015

Changed commit from f234f60 to 7a33037

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 7, 2016

Branch pushed to git repo; I updated commit sha1. New commits:

46cbab913580: Fixed doctests to pass on Darwin

@seblabbe
Copy link
Contributor

seblabbe commented Apr 7, 2016

comment:93

It works now! :

Using --optional=dot2tex,mpir,python2,sage
Doctesting 1 file using 2 threads.
sage -t --warn-long 84.3 src/sage/parallel/map_reduce.py
    [296 tests, 32.56 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------

@seblabbe
Copy link
Contributor

seblabbe commented Apr 7, 2016

comment:94

some typos:

compute the the cardinality

its job. Non only mapping -> Not only

method so that to compute the number of element you only need to call:: -> wrap the line

precedingly by we only -> previously? , but

I would suggest to replace this:

        OUTPUT: Calling :meth:`task_done` decrement the counter and returns
        its value after the decrementation.

by

        OUTPUT: 

        Calling :meth:`task_done` decrement the counter and returns
        its value after the decrementation.

to follow sage standards (more than once).

Also, I would replace this:

INPUT: ``o`` -- a node

by

INPUT: 

- ``o`` -- a node

to follow sage standards (more than once).

the computation are done i -> s missing

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 7, 2016

Changed commit from 46cbab9 to 134c1fa

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 7, 2016

Branch pushed to git repo; I updated commit sha1. New commits:

134c1fa13580: doc rereading

@tscrim
Copy link
Collaborator

tscrim commented Apr 7, 2016

comment:96

I am wondering what else is currently needed for this ticket before a positive review?

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

@seblabbe
Copy link
Contributor

seblabbe commented Apr 8, 2016

comment:97

What I had in mind for the next step to do was to move the SearchForest code from backtrack file into the recursivelyenumerated_set.pyx file and into the hierarchy of RecEnumSet classes so that it benefits for example of methods like the newly added to_digraph.

@hivert
Copy link
Author

hivert commented Apr 8, 2016

comment:98

Replying to @tscrim:

I am wondering what else is currently needed for this ticket before a positive review?

Sebastien is currently rereading it. I think he is very close to put a positive review.

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

Concerning Cythonization of map-reduce, my current opinion is that, the code
is designed for case where the computation of the children is costly. In this
case the cost of the current code should be negligible. There are a lot of
function calls, I'm using Python synchronization primitive, but most
importantly, communication is are by pickling/unpickling which is very
slow. So the idea is to reduce at most at possible the communication, but
without drastically changing the technology, I don't think you get a large
improvement without great effort. You'll need to rewrite all the communication
primitive and to inline the function call to the client code.

On the other hand, If you are allowed to write the code directly in C/C++
(which is the case for small basic objects), then the Cilk extension of C is
wonderful. It's a small extension which is supported by GCC since version 5,
ICC and in recent CLANG (https://cilkplus.github.io/). I've several code where
I've been able to have very large speedup. See
e.g. https://hal.archives-ouvertes.fr/hal-00823339/file/article.pdf if you are
curious.

Now if you have a specific use case, I'll be happy to experiment.

@hivert
Copy link
Author

hivert commented Apr 8, 2016

comment:99

Replying to @seblabbe:

What I had in mind for the next step to do was to move the SearchForest code from backtrack file into the recursivelyenumerated_set.pyx file and into the hierarchy of RecEnumSet classes so that it benefits for example of methods like the newly added to_digraph.

That was also the next goal in my mind.

@seblabbe
Copy link
Contributor

seblabbe commented Apr 8, 2016

comment:100

See #16351

@tscrim
Copy link
Collaborator

tscrim commented Apr 8, 2016

comment:101

Replying to @hivert:

Replying to @tscrim:

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

Concerning Cythonization of map-reduce, my current opinion is that, the code
is designed for case where the computation of the children is costly. In this
case the cost of the current code should be negligible. There are a lot of
function calls, I'm using Python synchronization primitive, but most
importantly, communication is are by pickling/unpickling which is very
slow. So the idea is to reduce at most at possible the communication, but
without drastically changing the technology, I don't think you get a large
improvement without great effort. You'll need to rewrite all the communication
primitive and to inline the function call to the client code.

On the other hand, If you are allowed to write the code directly in C/C++
(which is the case for small basic objects), then the Cilk extension of C is
wonderful. It's a small extension which is supported by GCC since version 5,
ICC and in recent CLANG (https://cilkplus.github.io/). I've several code where
I've been able to have very large speedup. See
e.g. https://hal.archives-ouvertes.fr/hal-00823339/file/article.pdf if you are
curious.

Now if you have a specific use case, I'll be happy to experiment.

Christian, Vivien, and I are working on trying to speed up the iteration of finite Coxeter groups by using (subgroups of) permutation groups and the iterator method of generic Coxeter groups, which uses a SearchForest (see #11187 and the CoxeterGroups category). Our goal is to try and match or beat GAP3 iteration time (which we are about ~30x slower).

We are in the process pushing our successor computation code to Cython and some of our other methods, but we would appreciate any insights you have.

Yet the code on this ticket will definitely help with that. Thank you for finishing it.

@tscrim tscrim modified the milestones: sage-6.4, sage-7.2 Apr 8, 2016
@hivert
Copy link
Author

hivert commented Apr 8, 2016

comment:102

Replying to @tscrim:

Christian, Vivien, and I are working on trying to speed up the iteration of
finite Coxeter groups by using (subgroups of) permutation groups and the
iterator method of generic Coxeter groups, which uses a SearchForest (see
#11187 and the CoxeterGroups category). Our goal is to try and match or
beat GAP3 iteration time (which we are about ~30x slower).

I'll be both very happy and interested to have usecase for this code.

We are in the process pushing our successor computation code to Cython and
some of our other methods, but we would appreciate any insights you have.

Do you have a typical example relying on SearchForest of code on top of
#11187 that I can profile ? Note that I included in the documentation of
map_reduce.py a guide on how to profile the parallel code. I'd like to measure
the proportion of time that is spend on Coxeter code one one hand and on the
parallel infrastructure on the other.

Now, concerning coxeter group, I'm quite sure that rewriting part of the code
in C/C++ and using advanced parallel stuff (AVX instruction set + SIMD) you
can get speedup a large as several hundreds and even thousand ! As a witness
on https://github.com/hivert/IVMPG I wrote a code whose goal is to enumerate
integer vector modulo permutation group. Compared to the code we have in Sage
which has been Cythonized and optimized by Simon King I manage to get speedup
a large as x2000 ! I need help on the build system to be able to distribute
it. I'm pretty sure the code there is very similar to what you need in Coxeter
groups.

By the way, should we transfer this discussion on #11187 ?

@tscrim
Copy link
Collaborator

tscrim commented Apr 8, 2016

comment:103

Replying to @hivert:

By the way, should we transfer this discussion on #11187 ?

Done.

@vbraun
Copy link
Member

vbraun commented Apr 8, 2016

Changed branch from u/hivert/ticket/13580 to 134c1fa

@sagetrac-tmonteil
Copy link
Mannequin

sagetrac-tmonteil mannequin commented Apr 15, 2016

comment:105

This ticket breaks the doctests on my patchbot, i do not know whether it is because it is run from a VM or from a 32-bit system, see http://patchbot.sagemath.org/log/0/debian/8.3/i686/3.16.0-4-586/tmonteil-debian-jessie-32/2016-04-15%2012:13:01?short

@sagetrac-tmonteil
Copy link
Mannequin

sagetrac-tmonteil mannequin commented Apr 15, 2016

Changed commit from 134c1fa to none

@sagetrac-tmonteil
Copy link
Mannequin

sagetrac-tmonteil mannequin commented Apr 15, 2016

comment:106

Note also that the VM emulates a "Pentium III (Katmai)"

@hivert
Copy link
Author

hivert commented Apr 15, 2016

comment:107

Replying to @sagetrac-tmonteil:

This ticket breaks the doctests on my patchbot, i do not know whether it is because it is run from a VM or from a 32-bit system, see http://patchbot.sagemath.org/log/0/debian/8.3/i686/3.16.0-4-586/tmonteil-debian-jessie-32/2016-04-15%2012:13:01?short

From the error message, it is neither because of 32 bits nor because of the VM but because your (virtual) machine has only one core. The goal of the patch is to distribute computation on several cores. The doctest assumes that your machine has at least 2 cores. I should fix that. Though I'm not sure what to do on those kind of machine. Pretend that they are two core and let the computation run on those two cores ?

@hivert
Copy link
Author

hivert commented Apr 15, 2016

comment:108

Replying to @hivert:

From the error message, it is neither because of 32 bits nor because of the VM but because your (virtual) machine has only one core. The goal of the patch is to distribute computation on several cores. The doctest assumes that your machine has at least 2 cores. I should fix that. Though I'm not sure what to do on those kind of machine. Pretend that they are two core and let the computation run on those two cores ?

I just checked that I get the very same errors on my machine pretending that it has only one core.

@sagetrac-tmonteil
Copy link
Mannequin

sagetrac-tmonteil mannequin commented Apr 15, 2016

comment:109

OK thanks for isolating the culplrit, i made #20449 as a follwup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants