Use size_t/ssize_t to support large count transfers #135

Merged
merged 1 commit into from
Aug 8, 2019

Conversation

Akshay-Venkatesh
Contributor

  • Uses size_t/ssize_t types in the send/recv path and the temporary allocation path to support transfers larger than 1GB
  • Should fix "Error with recv_future with large data" (#133)
    • Fixes the cupy 2GB test that Benjamin pointed to
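
As an illustration of why the length type matters, here is a minimal sketch using Python's ctypes. It is not code from this PR, and the variable names are made up for the example. It only shows that a 2GB byte count does not fit in a 32-bit signed int, while size_t holds it.

```python
import ctypes

# 2 GiB, the size used in the cupy/numba tests (2147483648 bytes).
nbytes = 2 * 1024**3

# ctypes does no overflow checking, so storing the count in a 32-bit
# signed int silently wraps around to a negative value.
as_int = ctypes.c_int(nbytes).value

# size_t is unsigned (64 bits wide on 64-bit Linux), so the count survives.
as_size_t = ctypes.c_size_t(nbytes).value

print(as_int, as_size_t)  # -2147483648 2147483648
```

A negative or truncated length passed down to an allocation or a send/recv call is the kind of failure that a wider length type avoids.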

@Akshay-Venkatesh
Contributor Author

cupy test with 2GB:

$ UCX_RNDV_SCHEME=put_zcopy UCX_MEMTYPE_CACHE=n UCX_TLS=rc,cuda_copy py.test -vs tests/test_send_recv_obj.py::test_send_recv_cupy
======================================================== test session starts =========================================================
platform linux -- Python 3.7.2, pytest-4.3.0, py-1.8.0, pluggy-0.9.0 -- /home/akvenkatesh/py/install/bin/python3
cachedir: .pytest_cache
rootdir: /home/akvenkatesh/ucx-py, inifile:
plugins: repeat-0.8.0, asyncio-0.10.0
collected 1 item                                                                                                                     

tests/test_send_recv_obj.py::test_send_recv_cupy[2147483648] [1565132621.652003] [prm-dgx-30:46177:0]         parser.c:1487 UCX  WARN  unused env variables: UCX_HOME, UCX_PATH, UCX_PY_CUDA_PATH,... (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
PASSED

===================================================== 1 passed in 111.67 seconds =====================================================

numba test with 2GB:

$ UCX_RNDV_SCHEME=put_zcopy UCX_MEMTYPE_CACHE=n UCX_TLS=rc,cuda_copy py.test -vs tests/test_send_recv_obj.py::test_send_recv_numba
======================================================== test session starts =========================================================
platform linux -- Python 3.7.2, pytest-4.3.0, py-1.8.0, pluggy-0.9.0 -- /home/akvenkatesh/py/install/bin/python3
cachedir: .pytest_cache
rootdir: /home/akvenkatesh/ucx-py, inifile:
plugins: repeat-0.8.0, asyncio-0.10.0
collected 1 item                                                                                                                     

tests/test_send_recv_obj.py::test_send_recv_numba[2147483648] {'shape': (2147483648,), 'strides': (1,), 'data': (47183470526464, False), 'typestr': '|u1', 'version': 0}
[1565133104.835556] [prm-dgx-30:47109:0]         parser.c:1487 UCX  WARN  unused env variables: UCX_HOME, UCX_PATH, UCX_PY_CUDA_PATH,... (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
sending length = 2147483648
(1,)
CAI
{'shape': (2147483648,), 'typestr': '|u1', 'descr': [('', '|u1')], 'data': (47187765493760, False), 'version': 0}
PASSED

===================================================== 1 passed in 111.65 seconds =====================================================

@quasiben You mentioned issues with cudf. On that note, did you get around the issues you had faced when using datatype <i4 instead of |u1? Just wondering whether the cudf issue is orthogonal to the large-count problem.

@mrocklin
Collaborator

mrocklin commented Aug 6, 2019

The changes here look sensible to me, to the extent that I'm able to judge.

@Akshay-Venkatesh I'm now starting to think about how to handle larger buffers. Eventually we can probably cut these up on the dask side and send lots of moderately sized messages. I assume that the ideal size is somewhere in the 100MB-1GB range?

@mrocklin
Collaborator

mrocklin commented Aug 6, 2019

(in general of course, I'm sure that this is highly dependent on hardware)
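
The idea of cutting a large buffer into moderately sized messages could look roughly like the sketch below. This is an assumption-laden illustration, not dask or ucx-py code: `iter_chunks` and the chunk size are hypothetical, and it only handles contiguous host buffers (a GPU array would need device-side slicing instead of `memoryview`).

```python
def iter_chunks(buf, chunk_size):
    """Yield consecutive zero-copy slices of buf, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Flat byte view over the buffer; no data is copied.
    view = memoryview(buf).cast("B")
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]


# Hypothetical usage: a 10-byte buffer split into chunks of at most 4 bytes.
sizes = [len(chunk) for chunk in iter_chunks(bytearray(10), 4)]
print(sizes)  # [4, 4, 2]
```

Each chunk would then be sent as its own message, with the receiver reassembling them in order.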

@Akshay-Venkatesh
Contributor Author

@quasiben Tested up to 2GB transfers with numba and datatype <i4. No issues so far.

$ UCX_RNDV_SCHEME=put_zcopy UCX_MEMTYPE_CACHE=n UCX_TLS=rc,cuda_copy,cuda_ipc py.test -vs tests/test_send_recv_obj.py::test_send_recv_numba
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.7.2, pytest-4.3.0, py-1.8.0, pluggy-0.9.0 -- /home/akvenkatesh/py/install/bin/python3
cachedir: .pytest_cache
rootdir: /home/akvenkatesh/ucx-py, inifile:
plugins: repeat-0.8.0, asyncio-0.10.0
collected 4 items                                                                                                                                                                                                 

tests/test_send_recv_obj.py::test_send_recv_numba[67108864] {'shape': (67108864,), 'strides': (4,), 'data': (47264370262016, False), 'typestr': '<i4', 'version': 0}
[1565135030.856031] [prm-dgx-30:50482:0]         parser.c:1487 UCX  WARN  unused env variables: UCX_HOME, UCX_PATH, UCX_PY_CUDA_PATH,... (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
(4,)
{'shape': (67108864,), 'typestr': '<i4', 'descr': [('', '<i4')], 'data': (47265041350656, False), 'version': 0}
PASSED
tests/test_send_recv_obj.py::test_send_recv_numba[134217728] {'shape': (134217728,), 'strides': (4,), 'data': (47266786181120, False), 'typestr': '<i4', 'version': 0}
(4,)
{'shape': (134217728,), 'typestr': '<i4', 'descr': [('', '<i4')], 'data': (47267859922944, False), 'version': 0}
PASSED
tests/test_send_recv_obj.py::test_send_recv_numba[268435456] {'shape': (268435456,), 'strides': (4,), 'data': (47269504090112, False), 'typestr': '<i4', 'version': 0}
(4,)
{'shape': (268435456,), 'typestr': '<i4', 'descr': [('', '<i4')], 'data': (47271651573760, False), 'version': 0}
PASSED
tests/test_send_recv_obj.py::test_send_recv_numba[536870912] {'shape': (536870912,), 'strides': (4,), 'data': (47274906353664, False), 'typestr': '<i4', 'version': 0}
(4,)
{'shape': (536870912,), 'typestr': '<i4', 'descr': [('', '<i4')], 'data': (47279201320960, False), 'version': 0}
PASSED

============================================================================================ 4 passed in 85.36 seconds ============================================================================================
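
For readers relating the element counts in this log to transfer sizes, the sketch below computes the byte size from the shape and typestr fields printed above. `interface_nbytes` is a hypothetical helper, not part of ucx-py, and it assumes a contiguous array (strides are ignored).

```python
def interface_nbytes(iface):
    """Return the byte size described by an array-interface style dict."""
    # typestr looks like '<i4' or '|u1': byte order, type kind, item size in bytes.
    itemsize = int(iface["typestr"][2:])
    count = 1
    for dim in iface["shape"]:
        count *= dim
    return count * itemsize


# Largest case in the run above: 536870912 elements of <i4.
largest = interface_nbytes({"shape": (536870912,), "typestr": "<i4"})
print(largest)  # 2147483648
```

So the four parametrized cases cover 256MB, 512MB, 1GB and 2GB when counted in bytes.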

@Akshay-Venkatesh
Contributor Author

@mrocklin You're right about the ideal range being hardware dependent. As long as the size of large buffers is <= physical memory limits, you shouldn't have to worry about having to slice the buffer inside dask.

This argument may change for CUDA managed memory, where you can allocate more than the physical limits and it's not illegal to use counts as big as the allocation. I've not tested the out-of-core case, but today managed memory transfers go through pipeline transfers, so that case should be supported.

Member

@quasiben quasiben left a comment


All working 👍 Merging!

@quasiben quasiben merged commit bf85723 into devel Aug 8, 2019
@quasiben quasiben deleted the use-sizet-for-length branch October 14, 2019 18:20
Successfully merging this pull request may close these issues.

Error with recv_future with large data
3 participants