Ray supports running distributed python programs with the multiprocessing.Pool API using Ray Actors instead of local processes. This makes it easy to scale existing applications that use multiprocessing.Pool
from a single node to a cluster.
To get started, first install Ray, then use ray.util.multiprocessing.Pool
in place of multiprocessing.Pool
. This will start a local Ray cluster the first time you create a Pool
and distribute your tasks across it. See the Run on a Cluster section below for instructions to run on a multi-node Ray cluster instead.
from ray.util.multiprocessing import Pool
def f(index):
return index
pool = Pool()
for result in pool.map(f, range(100)):
print(result)
The full multiprocessing.Pool
API is currently supported. Please see the multiprocessing documentation for details.
Warning
The context
argument in the Pool
constructor is ignored when using Ray.
This section assumes that you have a running Ray cluster. To start a Ray cluster, please refer to the cluster setup instructions.
To connect a Pool
to a running Ray cluster, you can specify the address of the head node in one of two ways:
- By setting the
RAY_ADDRESS
environment variable. - By passing the
ray_address
keyword argument to thePool
constructor.
from ray.util.multiprocessing import Pool
# Starts a new local Ray cluster.
pool = Pool()
# Connects to a running Ray cluster, with the current node as the head node.
# Alternatively, set the environment variable RAY_ADDRESS="auto".
pool = Pool(ray_address="auto")
# Connects to a running Ray cluster, with a remote node as the head node.
# Alternatively, set the environment variable RAY_ADDRESS="<ip_address>:<port>".
pool = Pool(ray_address="<ip_address>:<port>")
You can also start Ray manually by calling ray.init()
(with any of its supported configuration options) before creating a Pool
.