Skip to content

Commit

Permalink
Small fixes.
Browse files Browse the repository at this point in the history
  • Loading branch information
robertnishihara committed Mar 28, 2017
1 parent 46b5e56 commit b951db1
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 8 deletions.
2 changes: 1 addition & 1 deletion doc/source/internals-overview.rst
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@ Several things happen when a driver or worker calls ``ray.get`` on an object ID.
- The driver or worker goes to the object store on the same node and requests
the relevant object. Each object store consists of two components, a
shared-memory key-value store of immutable objects, and an manager to
shared-memory key-value store of immutable objects, and a manager to
coordinate the transfer of objects between nodes.

- If the object is not present in the object store, the manager checks the
Expand Down
13 changes: 6 additions & 7 deletions doc/source/serialization.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,7 @@ There are a number of situations in which Ray will place objects in the object
store.

1. The return values of a remote function.
store.
2. ``x`` in a call to ``ray.put(x)``.
2. The value ``x`` in a call to ``ray.put(x)``.
3. Large objects or objects other than simple primitive types that are passed
as arguments into remote functions.

Expand All @@ -18,7 +17,7 @@ nesting. To place an object in the object store or send it between processes,
it must first be converted to a contiguous string of bytes. This process is
known as serialization. The process of converting the string of bytes back into a
Python object is known as deserialization. Serialization and deserialization
are often bottlenecks in distributed computing, if the time needed to compute
are often bottlenecks in distributed computing if the time needed to compute
on the data is relatively low.

Pickle is one example of a library for serialization and deserialization in
Expand All @@ -31,7 +30,7 @@ Python.
pickle.dumps([1, 2, 3]) # prints b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
pickle.loads(b'\x80\x03]q\x00(K\x01K\x02K\x03e.') # prints [1, 2, 3]
Pickle (and the variant we use, cloudpickle) is general-purpose. They can
Pickle (and the variant we use, cloudpickle) is general-purpose. It can
serialize a large variety of Python objects. However, for numerical workloads,
pickling and unpickling can be inefficient. For example, if multiple processes
want to access a Python list of numpy arrays, each process must unpickle the
Expand All @@ -40,9 +39,9 @@ overheads, even when all processes are read-only and could easily share memory.

In Ray, we optimize for numpy arrays by using the `Apache Arrow`_ data format.
When we deserialize a list of numpy arrays from the object store, we still
create a Python list of numpy array objects. However, rather than copy each
numpy array over again, each numpy array object is essentially a pointer to its
address in shared memory. There are some advantages to this form of
create a Python list of numpy array objects. However, rather than copy each
numpy array over again, each numpy array object holds a pointer to the relevant
array held in shared memory. There are some advantages to this form of
serialization.

- Deserialization can be very fast.
Expand Down

0 comments on commit b951db1

Please sign in to comment.