Parallel map reduce on SearchForest #13580

hivert · 2012-10-07T17:07:38Z

Implement a map reduce algorithm in parallel on large sets described by a SearchForest. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.

CC: @sagetrac-sage-combinat @nathanncohen @jdemeyer @seblabbe

Component: combinatorics

Keywords: map-reduce, days57, days77

Author: Florent Hivert, Jean-Baptiste Priez, Nathann Cohen

Branch: 134c1fa

Reviewer: Sébastien Labbé, Jean-Baptiste Priez

Issue created by migration from https://trac.sagemath.org/ticket/13580

The text was updated successfully, but these errors were encountered:

seblabbe · 2013-02-09T18:12:42Z

comment:1

Salut Florent et Nathann,

I am starting to think/work on #6637... I have some questions. Mainly, I would like to know what is this ticket, because the above one liner in the description does not say much...

Will SearchForest survive or not?
Does it replace SearchForest?
Does it improve SearchForest?
Does it use SearchForest?
Or is it used by SearchForest?

Also, Florent wrote on sage-devel in October 2012 that

   I'm also in the process of finalizing a patch which do parallel and even
   distributed map-reduce on recursively enumerated sets (currently badly named
   SearchForest, I'll change the name in my patch, ticket #13580, patch on
   Sage-Combinat queue [1]).

I do not see in the cited patch that the name of SearchForest is changed.
What would be a better name for SearchForest ?
What patch is the more recent? the "old" one or the "experimental" one?

    trac_13580-map_reduce-fh.patch #+experimental
    map_reduce_improved_loop-fh.patch #+experimental
    map_reduce_condition-fh.patch #+experimental
    trac_13580-map_reduce-old-fh.patch

hivert · 2014-04-07T21:54:44Z

Branch: u/hivert/13580/map_reduce

hivert · 2014-04-09T02:56:21Z

New commits:

`693c672`	`Imported code from trac_13580-map_reduce-fh.patch + fixed multiline doctests.`

hivert · 2014-04-09T02:56:21Z

Commit: 693c672

hivert · 2014-04-09T02:56:21Z

Changed keywords from map-reduce to map-reduce, days57

nthiery · 2015-03-06T11:16:51Z

Changed branch from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce

sagetrac-git · 2015-03-09T15:47:18Z

Branch pushed to git repo; I updated commit sha1. New commits:

`addb17a`	`13580: Trivial rest fix`

sagetrac-git · 2015-03-09T15:47:18Z

Changed commit from 693c672 to addb17a

hivert · 2015-05-08T22:01:24Z

Changed branch from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce

sagetrac-git · 2015-05-20T22:49:54Z

Branch pushed to git repo; I updated commit sha1. New commits:

`92e6e68`	`Merge branch 't/13580/map_reduce' into 13580`

sagetrac-git · 2015-05-20T22:49:54Z

Changed commit from addb17a to 92e6e68

sagetrac-git · 2015-05-20T23:06:24Z

Branch pushed to git repo; I updated commit sha1. New commits:

`734c748`	`#13580 Fixed test after merging #6637`

sagetrac-git · 2015-05-20T23:06:24Z

Changed commit from 92e6e68 to 734c748

seblabbe · 2015-05-21T07:51:43Z

comment:13

I saw the following typo in map reduce file while looking at the previous commit: "As an example, ee compute"

sagetrac-git · 2015-05-21T15:42:06Z

Branch pushed to git repo; I updated commit sha1. New commits:

`f234f60`	`#13580 : improved documentation`

sagetrac-git · 2015-05-21T15:42:06Z

Changed commit from 734c748 to f234f60

sagetrac-git · 2015-11-23T11:06:25Z

Changed commit from f234f60 to 7a33037

sagetrac-git · 2016-04-07T17:57:42Z

Branch pushed to git repo; I updated commit sha1. New commits:

`46cbab9`	`13580: Fixed doctests to pass on Darwin`

seblabbe · 2016-04-07T18:00:19Z

comment:93

It works now! :

Using --optional=dot2tex,mpir,python2,sage
Doctesting 1 file using 2 threads.
sage -t --warn-long 84.3 src/sage/parallel/map_reduce.py
    [296 tests, 32.56 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------

seblabbe · 2016-04-07T18:29:18Z

comment:94

some typos:

compute the the cardinality

its job. Non only mapping -> Not only

method so that to compute the number of element you only need to call:: -> wrap the line

precedingly by we only -> previously? , but

I would suggest to replace this:

        OUTPUT: Calling :meth:`task_done` decrement the counter and returns
        its value after the decrementation.

by

        OUTPUT: 

        Calling :meth:`task_done` decrement the counter and returns
        its value after the decrementation.

to follow sage standards (more than once).

Also, I would replace this:

INPUT: ``o`` -- a node

by

INPUT: 

- ``o`` -- a node

to follow sage standards (more than once).

the computation are done i -> s missing

sagetrac-git · 2016-04-07T21:11:48Z

Changed commit from 46cbab9 to 134c1fa

sagetrac-git · 2016-04-07T21:11:48Z

Branch pushed to git repo; I updated commit sha1. New commits:

`134c1fa`	`13580: doc rereading`

tscrim · 2016-04-07T23:02:46Z

comment:96

I am wondering what else is currently needed for this ticket before a positive review?

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

seblabbe · 2016-04-08T00:43:54Z

comment:97

What I had in mind for the next step to do was to move the SearchForest code from backtrack file into the recursivelyenumerated_set.pyx file and into the hierarchy of RecEnumSet classes so that it benefits for example of methods like the newly added to_digraph.

hivert · 2016-04-08T01:05:41Z

comment:98

Replying to @tscrim:

I am wondering what else is currently needed for this ticket before a positive review?

Sebastien is currently rereading it. I think he is very close to put a positive review.

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

Concerning Cythonization of map-reduce, my current opinion is that, the code
is designed for case where the computation of the children is costly. In this
case the cost of the current code should be negligible. There are a lot of
function calls, I'm using Python synchronization primitive, but most
importantly, communication is are by pickling/unpickling which is very
slow. So the idea is to reduce at most at possible the communication, but
without drastically changing the technology, I don't think you get a large
improvement without great effort. You'll need to rewrite all the communication
primitive and to inline the function call to the client code.

On the other hand, If you are allowed to write the code directly in C/C++
(which is the case for small basic objects), then the Cilk extension of C is
wonderful. It's a small extension which is supported by GCC since version 5,
ICC and in recent CLANG (https://cilkplus.github.io/). I've several code where
I've been able to have very large speedup. See
e.g. https://hal.archives-ouvertes.fr/hal-00823339/file/article.pdf if you are
curious.

Now if you have a specific use case, I'll be happy to experiment.

hivert · 2016-04-08T01:07:04Z

comment:99

Replying to @seblabbe:

What I had in mind for the next step to do was to move the SearchForest code from backtrack file into the recursivelyenumerated_set.pyx file and into the hierarchy of RecEnumSet classes so that it benefits for example of methods like the newly added to_digraph.

That was also the next goal in my mind.

seblabbe · 2016-04-08T08:28:49Z

comment:100

See #16351

tscrim · 2016-04-08T09:28:57Z

comment:101

Replying to @hivert:

Replying to @tscrim:

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

Concerning Cythonization of map-reduce, my current opinion is that, the code
is designed for case where the computation of the children is costly. In this
case the cost of the current code should be negligible. There are a lot of
function calls, I'm using Python synchronization primitive, but most
importantly, communication is are by pickling/unpickling which is very
slow. So the idea is to reduce at most at possible the communication, but
without drastically changing the technology, I don't think you get a large
improvement without great effort. You'll need to rewrite all the communication
primitive and to inline the function call to the client code.

On the other hand, If you are allowed to write the code directly in C/C++
(which is the case for small basic objects), then the Cilk extension of C is
wonderful. It's a small extension which is supported by GCC since version 5,
ICC and in recent CLANG (https://cilkplus.github.io/). I've several code where
I've been able to have very large speedup. See
e.g. https://hal.archives-ouvertes.fr/hal-00823339/file/article.pdf if you are
curious.

Now if you have a specific use case, I'll be happy to experiment.

Christian, Vivien, and I are working on trying to speed up the iteration of finite Coxeter groups by using (subgroups of) permutation groups and the iterator method of generic Coxeter groups, which uses a SearchForest (see #11187 and the CoxeterGroups category). Our goal is to try and match or beat GAP3 iteration time (which we are about ~30x slower).

We are in the process pushing our successor computation code to Cython and some of our other methods, but we would appreciate any insights you have.

Yet the code on this ticket will definitely help with that. Thank you for finishing it.

hivert · 2016-04-08T09:51:35Z

comment:102

Replying to @tscrim:

Christian, Vivien, and I are working on trying to speed up the iteration of
finite Coxeter groups by using (subgroups of) permutation groups and the
iterator method of generic Coxeter groups, which uses a SearchForest (see
#11187 and the CoxeterGroups category). Our goal is to try and match or
beat GAP3 iteration time (which we are about ~30x slower).

I'll be both very happy and interested to have usecase for this code.

We are in the process pushing our successor computation code to Cython and
some of our other methods, but we would appreciate any insights you have.

Do you have a typical example relying on SearchForest of code on top of
#11187 that I can profile ? Note that I included in the documentation of
map_reduce.py a guide on how to profile the parallel code. I'd like to measure
the proportion of time that is spend on Coxeter code one one hand and on the
parallel infrastructure on the other.

Now, concerning coxeter group, I'm quite sure that rewriting part of the code
in C/C++ and using advanced parallel stuff (AVX instruction set + SIMD) you
can get speedup a large as several hundreds and even thousand ! As a witness
on https://github.com/hivert/IVMPG I wrote a code whose goal is to enumerate
integer vector modulo permutation group. Compared to the code we have in Sage
which has been Cythonized and optimized by Simon King I manage to get speedup
a large as x2000 ! I need help on the build system to be able to distribute
it. I'm pretty sure the code there is very similar to what you need in Coxeter
groups.

By the way, should we transfer this discussion on #11187 ?

tscrim · 2016-04-08T10:07:32Z

comment:103

Replying to @hivert:

By the way, should we transfer this discussion on #11187 ?

Done.

vbraun · 2016-04-08T22:40:20Z

Changed branch from u/hivert/ticket/13580 to 134c1fa

sagetrac-tmonteil · 2016-04-15T15:39:00Z

comment:105

This ticket breaks the doctests on my patchbot, i do not know whether it is because it is run from a VM or from a 32-bit system, see http://patchbot.sagemath.org/log/0/debian/8.3/i686/3.16.0-4-586/tmonteil-debian-jessie-32/2016-04-15%2012:13:01?short

sagetrac-tmonteil · 2016-04-15T15:39:00Z

Changed commit from 134c1fa to none

sagetrac-tmonteil · 2016-04-15T15:44:37Z

comment:106

Note also that the VM emulates a "Pentium III (Katmai)"

hivert · 2016-04-15T20:01:58Z

comment:107

Replying to @sagetrac-tmonteil:

This ticket breaks the doctests on my patchbot, i do not know whether it is because it is run from a VM or from a 32-bit system, see http://patchbot.sagemath.org/log/0/debian/8.3/i686/3.16.0-4-586/tmonteil-debian-jessie-32/2016-04-15%2012:13:01?short

From the error message, it is neither because of 32 bits nor because of the VM but because your (virtual) machine has only one core. The goal of the patch is to distribute computation on several cores. The doctest assumes that your machine has at least 2 cores. I should fix that. Though I'm not sure what to do on those kind of machine. Pretend that they are two core and let the computation run on those two cores ?

hivert · 2016-04-15T20:16:07Z

comment:108

Replying to @hivert:

From the error message, it is neither because of 32 bits nor because of the VM but because your (virtual) machine has only one core. The goal of the patch is to distribute computation on several cores. The doctest assumes that your machine has at least 2 cores. I should fix that. Though I'm not sure what to do on those kind of machine. Pretend that they are two core and let the computation run on those two cores ?

I just checked that I get the very same errors on my machine pretending that it has only one core.

sagetrac-tmonteil · 2016-04-15T20:41:23Z

comment:109

OK thanks for isolating the culplrit, i made #20449 as a follwup.

hivert added this to the sage-5.11 milestone Oct 7, 2012

hivert added c: combinatorics labels Oct 7, 2012

hivert self-assigned this Oct 7, 2012

jdemeyer modified the milestones: sage-5.11, sage-5.12 Aug 13, 2013

sagetrac-vbraun-spam mannequin modified the milestones: sage-6.1, sage-6.2 Jan 30, 2014

sagetrac-vbraun-spam mannequin modified the milestones: sage-6.2, sage-6.3 May 6, 2014

sagetrac-vbraun-spam mannequin modified the milestones: sage-6.3, sage-6.4 Aug 10, 2014

seblabbe added s: positive review and removed s: needs work labels Apr 8, 2016

tscrim modified the milestones: sage-6.4, sage-7.2 Apr 8, 2016

vbraun removed the s: positive review label Apr 8, 2016

vbraun closed this as completed in a8c9bc0 Apr 8, 2016

seblabbe mentioned this issue Feb 9, 2013

standardize the interface to TransitiveIdeal and friends #6637

Closed

This was referenced Apr 8, 2016

Implementation of finite reflection groups #11187

Closed

Fix enumeration of finitely generated semigroups #20403

Closed

sagetrac-tmonteil mannequin mentioned this issue Aug 27, 2019

Let the doctest of map_reduce work for single-core computers #20449

Closed

Parallel map reduce on SearchForest #13580

Parallel map reduce on SearchForest #13580

Comments

hivert commented Oct 7, 2012

seblabbe commented Feb 9, 2013

hivert commented Apr 7, 2014

hivert commented Apr 9, 2014

hivert commented Apr 9, 2014

hivert commented Apr 9, 2014

nthiery commented Mar 6, 2015

sagetrac-git mannequin commented Mar 9, 2015

sagetrac-git mannequin commented Mar 9, 2015

hivert commented May 8, 2015

sagetrac-git mannequin commented May 20, 2015

sagetrac-git mannequin commented May 20, 2015

sagetrac-git mannequin commented May 20, 2015

sagetrac-git mannequin commented May 20, 2015

seblabbe commented May 21, 2015

sagetrac-git mannequin commented May 21, 2015

sagetrac-git mannequin commented May 21, 2015

sagetrac-git mannequin commented Nov 23, 2015

sagetrac-git mannequin commented Apr 7, 2016

seblabbe commented Apr 7, 2016

seblabbe commented Apr 7, 2016

sagetrac-git mannequin commented Apr 7, 2016

sagetrac-git mannequin commented Apr 7, 2016

tscrim commented Apr 7, 2016

seblabbe commented Apr 8, 2016

hivert commented Apr 8, 2016

hivert commented Apr 8, 2016

seblabbe commented Apr 8, 2016

tscrim commented Apr 8, 2016

hivert commented Apr 8, 2016

tscrim commented Apr 8, 2016

vbraun commented Apr 8, 2016

sagetrac-tmonteil mannequin commented Apr 15, 2016

sagetrac-tmonteil mannequin commented Apr 15, 2016

sagetrac-tmonteil mannequin commented Apr 15, 2016

hivert commented Apr 15, 2016

hivert commented Apr 15, 2016

sagetrac-tmonteil mannequin commented Apr 15, 2016