Rename chunk to chunk_size for all tensor expressions (#74)
* rename chunks to chunk_size, pass expression tests
qinxuye authored and wjsi committed Dec 27, 2018
1 parent 52aa0a2 commit 68c04d3
Showing 103 changed files with 1,158 additions and 1,159 deletions.
2 changes: 1 addition & 1 deletion README.rst
@@ -155,7 +155,7 @@ After all mars processes are started, users can run
.. code-block:: python
sess = new_session('http://<web_ip>:<ui_port>')
a = mt.ones((2000, 2000), chunks=200)
a = mt.ones((2000, 2000), chunk_size=200)
b = mt.inner(a, a)
sess.run(b)
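For callers that still pass the old keyword, a transitional shim along the following lines could map ``chunks`` onto ``chunk_size``. This is only a sketch: nothing in this commit shows Mars shipping such a helper, and the name ``resolve_chunk_size`` is hypothetical.

```python
import warnings


def resolve_chunk_size(chunk_size=None, **kwargs):
    """Return the effective chunk size, accepting the legacy ``chunks`` keyword.

    Hypothetical helper for illustration; it is not part of Mars.
    """
    legacy = kwargs.pop("chunks", None)
    if kwargs:
        # Anything else is a genuine mistake, so fail loudly.
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    if legacy is None:
        return chunk_size
    if chunk_size is not None:
        raise TypeError("pass either chunk_size or the legacy chunks, not both")
    warnings.warn("'chunks' was renamed to 'chunk_size'",
                  DeprecationWarning, stacklevel=2)
    return legacy
```

A wrapper would call ``resolve_chunk_size(chunk_size, **kwargs)`` and forward the result as ``chunk_size=...``, so old call sites keep working while emitting a warning.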
6 changes: 3 additions & 3 deletions docs/source/distributed/prepare.rst
@@ -3,7 +3,7 @@
Graph Preparation
=================
When a tensor graph is submitted into Mars scheduler, a graph comprises of
operands and chunks will be generated given ``chunks`` parameters passed in
operands and chunks will be generated given ``chunk_size`` parameters passed in
data sources.

Graph Compose
@@ -16,8 +16,8 @@ without branches. For example, when executing code
.. code-block:: python
import mars.tensor as mt
a = mt.random.rand(100, chunks=100)
b = mt.random.rand(100, chunks=100)
a = mt.random.rand(100, chunk_size=100)
b = mt.random.rand(100, chunk_size=100)
c = (a + b).sum()
Mars will compose operand ADD and SUM into one FUSE node. RAND operands are
4 changes: 2 additions & 2 deletions docs/source/install.rst
@@ -17,7 +17,7 @@ After installation, you can simply open a Python console and run
import mars.tensor as mt
from mars.session import new_session
a = mt.ones((5, 5), chunks=3)
a = mt.ones((5, 5), chunk_size=3)
b = a * 4
# if there isn't a local session,
# execute will create a default one first
@@ -98,7 +98,7 @@ After all Mars processes are started, you can open a Python console and run
import mars.tensor as mt
from mars.session import new_session
sess = new_session('http://<web_ip>:<ui_port>')
a = mt.ones((2000, 2000), chunks=200)
a = mt.ones((2000, 2000), chunk_size=200)
b = mt.inner(a, a)
sess.run(b)
16 changes: 8 additions & 8 deletions docs/source/tensor/datasource.rst
@@ -40,16 +40,16 @@ multi-dimensional tensor will be supported later soon.
Chunks
------

In mars tensor, we tile a tensor into small chunks. Argument ``chunks`` is not always required,
In mars tensor, we tile a tensor into small chunks. Argument ``chunk_size`` is not always required,
a chunk's bytes occupation will be 128M for the default setting.
However, user can specify each chunk's size in a more flexible way which may be adaptive to the data scale.
The fact is that chunk's size may effect heavily on the performance of execution.

The options or arguments which will effect the chunk's size are listed below:

- Change ``options.tensor.chunk_size_limit`` which is 128*1024*1024(128M) by default.
- Specify ``chunks`` as integer, like ``5000``, means chunk's size is 5000 at most for all dimensions
- Specify ``chunks`` as tuple, like ``(5000, 3000)``
- Specify ``chunk_size`` as integer, like ``5000``, means chunk's size is 5000 at most for all dimensions
- Specify ``chunk_size`` as tuple, like ``(5000, 3000)``
- Explicitly define sizes of all chunks along all dimensions, like ``((5000, 5000, 2000), (2000, 1000))``
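The three ``chunk_size`` forms listed above can be illustrated with a small pure-Python sketch that expands each form into explicit per-dimension chunk extents. It does not call Mars and is not Mars's tiling code; the function names are hypothetical, and the byte-based default (the 128M limit) is deliberately not modelled.

```python
from itertools import product


def normalize_chunk_size(shape, chunk_size):
    """Expand a chunk_size spec into explicit chunk extents per dimension."""
    if isinstance(chunk_size, int):
        # One integer applies to every dimension.
        chunk_size = (chunk_size,) * len(shape)
    if len(chunk_size) != len(shape):
        raise ValueError("chunk_size needs one entry per dimension")
    extents = []
    for dim, spec in zip(shape, chunk_size):
        if isinstance(spec, int):
            if spec <= 0:
                raise ValueError("chunk sizes must be positive")
            # Full-size chunks first, then one smaller remainder chunk.
            full, rest = divmod(dim, spec)
            sizes = (spec,) * full + ((rest,) if rest else ())
        else:
            # Explicit extents must cover the dimension exactly.
            sizes = tuple(spec)
            if sum(sizes) != dim:
                raise ValueError("explicit chunk sizes must sum to the dimension")
        extents.append(sizes)
    return tuple(extents)


def chunk_shapes(shape, chunk_size):
    """List the shape of every chunk in row-major order."""
    return list(product(*normalize_chunk_size(shape, chunk_size)))


if __name__ == "__main__":
    print(normalize_chunk_size((8, 6), 3))
    print(normalize_chunk_size((8, 6), (3, 2)))
    print(normalize_chunk_size((12000, 3000), ((5000, 5000, 2000), (2000, 1000))))
```

Under this sketch an integer is an upper bound on each chunk's extent, so the last chunk along a dimension is smaller whenever the dimension is not an exact multiple.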

Chunks Examples
@@ -68,9 +68,9 @@ Assume we have such a tensor with the data shown below.
4 2 4 6 2 0
6 8 2 6 5 4
We will show how different ``chunks=`` arguments will tile the tensor.
We will show how different ``chunk_size=`` arguments will tile the tensor.

``chunks=3``:
``chunk_size=3``:

.. code-block:: python
@@ -85,7 +85,7 @@ We will show how different ``chunks=`` arguments will tile the tensor.
4 2 4 6 2 0
6 8 2 6 5 4
``chunks=2``:
``chunk_size=2``:

.. code-block:: python
@@ -101,7 +101,7 @@ We will show how different ``chunks=`` arguments will tile the tensor.
4 2 4 6 2 0
6 8 2 6 5 4
``chunks=(3, 2)``:
``chunk_size=(3, 2)``:

.. code-block:: python
@@ -116,7 +116,7 @@ We will show how different ``chunks=`` arguments will tile the tensor.
4 2 4 6 2 0
6 8 2 6 5 4
``chunks=((3, 1, 2, 2), (3, 2, 1))``:
``chunk_size=((3, 1, 2, 2), (3, 2, 1))``:

.. code-block:: python
2 changes: 1 addition & 1 deletion docs/source/tensor/execution.rst
@@ -30,7 +30,7 @@ More than one mars tensors can be passed to ``session.run``, and calculate the r

.. code-block:: python
>>> a = mt.ones((5, 5), chunks=3)
>>> a = mt.ones((5, 5), chunk_size=3)
>>> b = a + 1
>>> c = a * 4
>>> sess.run(b, c)
4 changes: 2 additions & 2 deletions mars/config.py
@@ -290,8 +290,8 @@ def validate(x):
})

# Tensor
default_options.register_option('tensor.chunks', None, validator=any_validator(is_null, is_integer), serialize=True)
default_options.register_option('tensor.chunk_size_limit', 128 * 1024 ** 2, validator=(is_integer, is_float))
default_options.register_option('tensor.chunk_size', None, validator=any_validator(is_null, is_integer), serialize=True)
default_options.register_option('tensor.chunk_store_limit', 128 * 1024 ** 2, validator=(is_integer, is_float))
default_options.register_option('tensor.rechunk.threshold', 4, validator=is_integer, serialize=True)
default_options.register_option('tensor.rechunk.chunk_size_limit', int(1e8), validator=is_integer, serialize=True)
default_options.register_option('tensor.combine_size', 4, validator=is_integer, serialize=True)
8 changes: 4 additions & 4 deletions mars/deploy/local/tests/test_cluster.py
@@ -65,7 +65,7 @@ def testLocalCluster(self):
with new_session(endpoint) as session:
api = session._api

t = mt.ones((3, 3), chunks=2)
t = mt.ones((3, 3), chunk_size=2)
result = session.run(t)

np.testing.assert_array_equal(result, np.ones((3, 3)))
@@ -75,13 +75,13 @@ def testLocalCluster(self):
def testLocalClusterWithWeb(self):
with new_cluster(scheduler_n_process=2, worker_n_process=3, web=True) as cluster:
with cluster.session as session:
t = mt.ones((3, 3), chunks=2)
t = mt.ones((3, 3), chunk_size=2)
result = session.run(t)

np.testing.assert_array_equal(result, np.ones((3, 3)))

with new_session('http://' + cluster._web_endpoint) as session:
t = mt.ones((3, 3), chunks=2)
t = mt.ones((3, 3), chunk_size=2)
result = session.run(t)

np.testing.assert_array_equal(result, np.ones((3, 3)))
@@ -128,7 +128,7 @@ def testMultipleOutputTensorExecute(self):
with new_cluster(scheduler_n_process=2, worker_n_process=2) as cluster:
session = cluster.session

t = mt.random.rand(20, 5, chunks=5)
t = mt.random.rand(20, 5, chunk_size=5)
r = mt.linalg.svd(t)

res = session.run((t,) + r)
6 changes: 3 additions & 3 deletions mars/operands/base.py
@@ -304,13 +304,13 @@ class Rechunk(HasInput):
_op_type_ = OperandDef.RECHUNK

_input = KeyField('input')
_chunks = AnyField('chunks')
_chunk_size = AnyField('chunk_size')
_threshold = Int32Field('threshold')
_chunk_size_limit = Int64Field('chunk_size_limit')

@property
def chunks(self):
return self._chunks
def chunk_size(self):
return self._chunk_size

@property
def threshold(self):
8 changes: 4 additions & 4 deletions mars/scheduler/tests/test_graph.py
@@ -30,8 +30,8 @@ def testGraphActor(self):
session_id = str(uuid.uuid4())
graph_key = str(uuid.uuid4())

arr = mt.random.randint(10, size=(10, 8), chunks=4)
arr_add = mt.random.randint(10, size=(10, 8), chunks=4)
arr = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr_add = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr2 = arr + arr_add

graph = arr2.build_graph(compose=False)
@@ -103,7 +103,7 @@ def testGraphWithSplit(self):
session_id = str(uuid.uuid4())
graph_key = str(uuid.uuid4())

arr = mt.ones(12, chunks=4)
arr = mt.ones(12, chunk_size=4)
arr_split = mt.split(arr, 2)
arr_sum = arr_split[0] + arr_split[1]

@@ -176,7 +176,7 @@ def testSameKey(self, *_):
session_id = str(uuid.uuid4())
graph_key = str(uuid.uuid4())

arr = mt.ones((5, 5), chunks=3)
arr = mt.ones((5, 5), chunk_size=3)
arr2 = mt.concatenate((arr, arr))

graph = arr2.build_graph(compose=False)
8 changes: 4 additions & 4 deletions mars/scheduler/tests/test_main.py
@@ -139,8 +139,8 @@ def testMain(self):
uid=SessionActor.gen_name(session_id),
address=scheduler_address,
session_id=session_id)
a = ones((100, 100), chunks=30) * 2 * 1 + 1
b = ones((100, 100), chunks=30) * 2 * 1 + 1
a = ones((100, 100), chunk_size=30) * 2 * 1 + 1
b = ones((100, 100), chunk_size=30) * 2 * 1 + 1
c = (a * b * 2 + 1).sum()
graph = c.build_graph()
targets = [c.key]
@@ -163,8 +163,8 @@ def testMain(self):
state = self.wait_for_termination(session_ref, graph_key)
self.assertEqual(state, GraphState.FAILED)

a = ones((100, 50), chunks=30) * 2 + 1
b = ones((50, 200), chunks=30) * 2 + 1
a = ones((100, 50), chunk_size=30) * 2 + 1
b = ones((50, 200), chunk_size=30) * 2 + 1
c = a.dot(b)
graph = c.build_graph()
targets = [c.key]
18 changes: 9 additions & 9 deletions mars/scheduler/tests/test_operand.py
@@ -151,8 +151,8 @@ def write_mock_meta():
@mock.patch(OperandActor.__module__ + '.OperandActor._get_raw_execution_ref')
@mock.patch(OperandActor.__module__ + '.OperandActor._free_worker_data')
def testOperandActor(self, *_):
arr = mt.random.randint(10, size=(10, 8), chunks=4)
arr_add = mt.random.randint(10, size=(10, 8), chunks=4)
arr = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr_add = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr2 = arr + arr_add

session_id = str(uuid.uuid4())
@@ -163,7 +163,7 @@ def testOperandActor(self, *_):
@mock.patch(OperandActor.__module__ + '.OperandActor._get_raw_execution_ref')
@mock.patch(OperandActor.__module__ + '.OperandActor._free_worker_data')
def testOperandActorWithSameKey(self, *_):
arr = mt.ones((5, 5), chunks=3)
arr = mt.ones((5, 5), chunk_size=3)
arr2 = mt.concatenate((arr, arr))

session_id = str(uuid.uuid4())
@@ -174,8 +174,8 @@ def testOperandActorWithSameKey(self, *_):
@mock.patch(OperandActor.__module__ + '.OperandActor._get_raw_execution_ref')
@mock.patch(OperandActor.__module__ + '.OperandActor._free_worker_data')
def testOperandActorWithRetry(self, *_):
arr = mt.random.randint(10, size=(10, 8), chunks=4)
arr_add = mt.random.randint(10, size=(10, 8), chunks=4)
arr = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr_add = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr2 = arr + arr_add

session_id = str(uuid.uuid4())
@@ -190,8 +190,8 @@ def testOperandActorWithRetry(self, *_):
@mock.patch(OperandActor.__module__ + '.OperandActor._get_raw_execution_ref')
@mock.patch(OperandActor.__module__ + '.OperandActor._free_worker_data')
def testOperandActorWithRetryAndFail(self, *_):
arr = mt.random.randint(10, size=(10, 8), chunks=4)
arr_add = mt.random.randint(10, size=(10, 8), chunks=4)
arr = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr_add = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr2 = arr + arr_add

session_id = str(uuid.uuid4())
@@ -209,8 +209,8 @@ def testOperandActorWithCancel(self, *_):
import logging
logging.basicConfig(level=logging.DEBUG)

arr = mt.random.randint(10, size=(10, 8), chunks=4)
arr_add = mt.random.randint(10, size=(10, 8), chunks=4)
arr = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr_add = mt.random.randint(10, size=(10, 8), chunk_size=4)
arr2 = arr + arr_add

session_id = str(uuid.uuid4())