
Parmap in a nutshell

Parmap is a minimalistic library that lets OCaml programs exploit multicore architectures with minimal modifications: if you want to use your many cores to accelerate an operation which happens to be a map, a fold, or a map/fold (map-reduce), just use Parmap's parmap, parfold and parmapfold primitives in place of the standard List.map and friends, and specify the number of subprocesses to use via the optional ~ncores parameter.
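As a minimal sketch, assuming the Parmap API in which input sequences are wrapped in the Parmap.L (list) or Parmap.A (array) constructors:

```ocaml
(* Parallel map over a list, using 4 worker processes. *)
let squares =
  Parmap.parmap ~ncores:4 (fun x -> x * x) (Parmap.L [1; 2; 3; 4; 5])
(* squares = [1; 4; 9; 16; 25] *)

(* Parallel fold over the same list: parfold takes the folding
   function, the sequence, the initial accumulator, and the operator
   used to combine the partial results computed by each worker. *)
let sum =
  Parmap.parfold ~ncores:4 (+) (Parmap.L [1; 2; 3; 4; 5]) 0 (+)
(* sum = 15 *)
```

Check parmap.mli for the exact signatures shipped with your version.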

See the example directory for a couple of runnable programs.


Parmap is not meant to be a replacement for a full-fledged implementation of parallelism skeletons (map, reduce, pipe, and the many others described in the scientific literature since the late 1980s, much earlier than the specific implementation by Google engineers that popularised them). It is meant, instead, to let you quickly leverage the idle processing power of your extra cores when handling a heavy computational load.

The principle of Parmap is very simple: when you call one of the three available primitives (map, fold, or mapfold), your sequential OCaml program forks into n subprocesses (you choose n), and each subprocess performs the computation on 1/n of the data, in chunks of a size you can choose, returning the results to the parent process through a shared memory area; the parent resumes execution once all the children have terminated and the data has been collected.

You need to run your program on a single multicore machine; repeat after me: Parmap is not meant to run on a cluster. See one of the many available (re)implementations of the map-reduce scheme for that.

By forking the parent process on a single machine, the children get access, for free, to all the data structures already built, even the imperative ones. As long as your computation inside the map/fold does not produce side effects that need to be preserved, the final result will be the same as that of the sequential operation; the only difference is that you might get it faster.
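A small illustrative sketch of that caveat (the counter below is a hypothetical example, not part of Parmap): mutations performed inside the mapped function happen in the forked children and never reach the parent.

```ocaml
(* A mutable counter living in the parent's heap. *)
let counter = ref 0

let _results =
  Parmap.parmap ~ncores:2
    (fun x -> incr counter; x + 1)  (* increments the child's own copy *)
    (Parmap.L [1; 2; 3; 4])

let () =
  (* The parent's counter is still 0: each child mutated its own
     copy-on-write view of the heap, and only the returned values
     travel back through the shared memory area. *)
  Printf.printf "counter = %d\n" !counter
```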

The OCaml code is reasonably simple and only marginally relies on external C libraries: most of the magic is done by your operating system's fork and memory mapping mechanisms. One could gain some speed by implementing a marshal/unmarshal operation directly on bigarrays, but we have not done this yet.

Of course, if you happen to have open channels, files, or other connections that should only be used by the parent process, your program may behave in a very weird way: for example, do not open a graphics window before calling a Parmap primitive, and do not use this library if your program is multi-threaded!

Pinning processes to physical CPUs

To obtain maximum speed, Parmap tries to pin the worker processes to a CPU, using the scheduler affinity interface available in recent Linux kernels. Similar functionality may be obtained on other platforms through slightly different APIs. Contributions to support those other APIs are welcome; just make sure you use autoconf properly.
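Pinning can also be switched off, which may help when several Parmap programs share the same machine. The sketch below assumes the disable_core_pinning entry point added in later Parmap releases; check parmap.mli for your version.

```ocaml
(* Turn off CPU pinning before calling any Parmap primitive, so the
   OS scheduler is free to migrate the worker processes. *)
let () = Parmap.disable_core_pinning ()
```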

Using Parmap with Ocamlnat

You can use Parmap in a native toplevel (quite useful if you use the native toplevel to perform fast interactive computations), but remember that you need to load the .cmxs modules into it; an example is given in example/

Preservation of output order in Parmap

If the number of chunks is equal to the number of cores, it is easy to preserve the order of the elements of the sequence passed to the map/fold operations, so the result will be a list in the same order as if the sequential function had been applied to the input. This is what the parmap, parmapfold and parfold functions do when the chunksize argument is not used.

If the user specifies a chunksize that is different from the number of cores, there is no general way to preserve the ordering, so the elements of the result of Parmap.parmap f l are not necessarily in the same order as those of the sequential map.

In general, using a small chunksize helps balance the load among the workers and provides better speed, at the price of losing the ordering: there is a tradeoff, and it is up to the user to choose the solution that best suits their needs.
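The two behaviours above can be sketched as follows (assuming the ~chunksize optional parameter of the Parmap primitives):

```ocaml
(* Default chunking (one chunk per core): the output order matches
   the input order, as with a sequential List.map. *)
let ordered =
  Parmap.parmap ~ncores:4 succ (Parmap.L [1; 2; 3; 4; 5; 6; 7; 8])

(* Explicit small chunksize: better load balancing, but the result
   may be a permutation of the sequential result. *)
let maybe_unordered =
  Parmap.parmap ~ncores:4 ~chunksize:1 succ
    (Parmap.L [1; 2; 3; 4; 5; 6; 7; 8])
```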

Fast map on arrays and on float arrays

Visiting an array is much faster than visiting a list, and converting an array to and from a list is expensive on large data structures, so we provide a specialised version of map on arrays, array_parmap, that behaves exactly like parmap.
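A minimal sketch of the array variant (again, see parmap.mli for the exact signature in your version):

```ocaml
(* array_parmap takes and returns a plain array, avoiding the
   list <-> array conversions that Parmap.A wrapping would imply. *)
let doubled : int array =
  Parmap.array_parmap ~ncores:4 (fun x -> 2 * x) [| 1; 2; 3; 4 |]
```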

We also provide a highly optimised specialised parmap version that is targeted to float arrays, array_float_parmap, that allows you to perform parallel computation on very large float arrays efficiently, without the boxing/unboxing overhead introduced by the other primitives, including array_parmap.
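As a sketch of the float-array variant:

```ocaml
(* array_float_parmap maps over a float array without boxing each
   element, writing the results through a shared memory area. *)
let a = Array.init 10_000_000 float_of_int

let roots : float array =
  Parmap.array_float_parmap ~ncores:4 sqrt a
```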

To understand the efficiency issues involved in the case of large arrays of float, here is a short summary of the steps that any implementation of a parallel map function must perform.

  1. create a float array to hold the result of the computation. This operation is expensive: on an Intel i7, creating a 10M float array takes 50 milliseconds

             Objective Caml version 3.12.0 - native toplevel
        # #load "unix.cmxs";;
        # let d = Unix.gettimeofday() in ignore(Array.create 10000000 0.); Unix.gettimeofday() -. d;;
        - : float = 0.0501301288604736328
  2. create a shared memory area

  3. possibly copy the result array to the shared memory area

  4. perform the computation in the children writing the result in the shared memory area

  5. possibly copy the result back to the OCaml array

All implementations need to do 1, 2 and 4; steps 3 and/or 5 may be omitted depending on what the user wants to do with the result.

The array_float_parmap primitive performs steps 1, 2, 4 and 5. It is possible to share steps 1 and 2 among subsequent calls to the parallel function by preallocating the result array and the shared memory buffer, and passing them as optional parameters to array_float_parmap: this can save a significant amount of time if the array is very large.
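The preallocation pattern can be sketched as below; the ~result and ~sharedbuffer parameter names and the init_shared_buffer function are taken from the Parmap interface, so check parmap.mli for the exact names in your version.

```ocaml
let a = Array.init 10_000_000 float_of_int

(* Allocate the result array (step 1) and the shared memory buffer
   (step 2) once... *)
let result = Array.make (Array.length a) 0.
let buf = Parmap.init_shared_buffer a

(* ...then reuse them across calls, so each call only pays for
   steps 4 and 5. *)
let _ = Parmap.array_float_parmap ~ncores:4 ~result ~sharedbuffer:buf sqrt a
let _ = Parmap.array_float_parmap ~ncores:4 ~result ~sharedbuffer:buf cos a
```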