[SYCL] Fix memory deallocation size, fix 2 CTS test fails on Windows #702

v-klochkov · 2019-10-05T02:54:23Z

Signed-off-by: Vyacheslav N Klochkov vyacheslav.n.klochkov@intel.com

mibintc

Nice work! Can you add test cases for this? Is this unique to Windows? Why does it not fail on Linux?

v-klochkov · 2019-10-07T15:35:00Z

Thank you for the review and comments. I added a test case to the buffer.cpp LIT test.

I suppose it did not fail on Linux for one of these reasons:
a) the allocation is aligned to a larger number of bytes, so there is no need to create a copy of the buffer, i.e. the original memory can be reused.
b) I don't know for sure, but on Linux the deallocator might simply ignore the argument that says how many elements are being deallocated.
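Point (b) can be made concrete. The C++ allocator requirements say the `n` passed to `deallocate(p, n)` must equal the value passed to the `allocate` call that returned `p`; a given implementation may ignore that argument or rely on it. The sketch below is only an illustration: `CheckingAllocator` is a hypothetical allocator, not part of the SYCL runtime, that records each allocation's element count and counts deallocations that pass a different one.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <unordered_map>

// Hypothetical allocator used only to illustrate the bug class: it remembers
// how many elements each allocation holds and counts deallocations that pass
// a different number (undefined behavior for a real allocator).
template <typename T> struct CheckingAllocator {
  using value_type = T;

  // Bookkeeping shared by all CheckingAllocator<T>: pointer -> element count.
  static std::unordered_map<void *, std::size_t> &live() {
    static std::unordered_map<void *, std::size_t> Map;
    return Map;
  }
  // Number of deallocate() calls whose count did not match allocate().
  static std::size_t &mismatches() {
    static std::size_t Count = 0;
    return Count;
  }

  CheckingAllocator() = default;
  template <typename U> CheckingAllocator(const CheckingAllocator<U> &) {}

  T *allocate(std::size_t N) {
    T *P = static_cast<T *>(::operator new(N * sizeof(T)));
    live()[P] = N;
    return P;
  }

  void deallocate(T *P, std::size_t N) {
    auto It = live().find(P);
    if (It == live().end() || It->second != N)
      ++mismatches(); // a lenient allocator would silently accept this
    if (It != live().end())
      live().erase(It);
    ::operator delete(P);
  }
};

template <typename T, typename U>
bool operator==(const CheckingAllocator<T> &, const CheckingAllocator<U> &) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CheckingAllocator<T> &, const CheckingAllocator<U> &) {
  return false;
}
```

Allocating 3 elements and releasing them with a count of 2 leaves `mismatches()` at 1; a runtime that behaves like a lenient allocator would never notice the wrong count.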

mibintc

Thanks a lot for fixing this. I saw similar problems in OpenCL_interop_constructors, and I hope this fixes those failures too.

keryell · 2019-10-07T16:32:29Z

sycl/include/CL/sycl/detail/sycl_mem_obj_t.hpp

+    size_t AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>);
+    size_t AllocationCount = get_size() / AllocatorValueSize;
+    AllocationCount += (get_size() % AllocatorValueSize) ? 1 : 0;
+    return AllocationCount;
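The diff computes a round-up (ceiling) division: the object holds `get_size()` bytes, and the allocator must be asked for enough `value_type` elements to cover all of them, counting a trailing partial element as a whole one. A standalone sketch of the same logic, where the hypothetical free function `allocation_count` stands in for the member function and takes its inputs as parameters:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standalone version of the diff: how many elements of
// ElemSize bytes are needed to hold SizeInBytes bytes.
inline std::size_t allocation_count(std::size_t SizeInBytes,
                                    std::size_t ElemSize) {
  std::size_t AllocationCount = SizeInBytes / ElemSize;
  // A partial trailing element still needs a whole element of storage.
  AllocationCount += (SizeInBytes % ElemSize) ? 1 : 0;
  return AllocationCount;
}
```

For example, 10 bytes with a 4-byte `value_type` need 3 elements (12 bytes); truncating to `10 / 4 == 2` would under-count by one element.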


What about:

auto constexpr AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>);
return (get_size() + AllocatorValueSize - 1) / AllocatorValueSize;

?
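The suggested one-liner relies on the identity ceil(a / b) = (a + b - 1) / b for unsigned integers (as long as the addition does not overflow). A sketch as a hypothetical free function template, where the template parameter `T` plays the role of `allocator_value_type_t<AllocatorT>`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical free-function form of the suggested one-liner.
template <typename T>
constexpr std::size_t allocation_count_v2(std::size_t SizeInBytes) {
  constexpr std::size_t AllocatorValueSize = sizeof(T);
  // Adding (divisor - 1) before dividing rounds the quotient up.
  return (SizeInBytes + AllocatorValueSize - 1) / AllocatorValueSize;
}

// Evaluated at compile time: 10 bytes need three 4-byte elements.
static_assert(allocation_count_v2<std::uint32_t>(10) == 3, "ceil(10 / 4)");
```

It agrees with the quotient-plus-remainder form for every input where `SizeInBytes + AllocatorValueSize - 1` fits in `size_t`.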

v-klochkov

Well, this looks better if we do not care about possible overflow on 32-bit targets in the expression (get_size() + AllocatorValueSize - 1).
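The overflow concern can be shown by simulating a 32-bit `size_t` with `std::uint32_t` (an assumption for illustration only). With the two hypothetical helpers below, a size within `N - 1` of the maximum makes the add-then-divide form wrap around, while the quotient-plus-remainder form never forms a value larger than `Size`.

```cpp
#include <cassert>
#include <cstdint>

// Add-then-divide form on a 32-bit unsigned type: Size + N - 1 can exceed
// UINT32_MAX and wrap; the cast keeps the arithmetic modulo 2^32.
inline std::uint32_t ceil_div_add(std::uint32_t Size, std::uint32_t N) {
  return static_cast<std::uint32_t>(Size + N - 1) / N;
}

// Quotient-plus-remainder form: no intermediate value exceeds Size.
inline std::uint32_t ceil_div_mod(std::uint32_t Size, std::uint32_t N) {
  return Size / N + ((Size % N) ? 1 : 0);
}
```

For `Size = 4294967295` and `N = 4`, the first helper wraps to `2 / 4 == 0`, while the second returns the correct 1073741824. As noted in the reply below, a size that close to the limit is unlikely to be a practical concern.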

The current version of the code may produce two DIV operations (depending on the compiler), but that is easy to fix by storing the result of get_size() in a variable, as in this modified version of the original code:

auto AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>);
auto Size = get_size();
auto AllocationCount = Size / AllocatorValueSize;
return AllocationCount + ((Size % AllocatorValueSize) ? 1 : 0);

Compilers can usually combine two consecutive divisions with the same operands, (A / B) and (A % B), into a single division instruction.
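As a sketch of the variant in which `get_size()` is read only once, here are two hypothetical helpers that take the size and element size as parameters. They also show why the conditional needs its own parentheses: `+` binds more tightly than `?:`, so without them the whole sum becomes the condition and the result collapses to 0 or 1.

```cpp
#include <cassert>
#include <cstddef>

// Size / N and Size % N share the same operands, so a compiler can
// typically compute both with a single division instruction.
inline std::size_t count_from_size(std::size_t Size, std::size_t N) {
  std::size_t Quotient = Size / N;
  std::size_t Remainder = Size % N;
  return Quotient + (Remainder ? 1 : 0);
}

// Unparenthesized form, kept only to show what it actually computes:
// it parses as (Size / N + Size % N) ? 1 : 0.
inline std::size_t count_from_size_misparsed(std::size_t Size, std::size_t N) {
  return Size / N + (Size % N) ? 1 : 0;
}
```

With `Size = 10` and `N = 4`, the first helper returns 3, while the misparsed one returns 1 because `2 + 2` is nonzero.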

Do you still recommend your version, or the modified variant shown above?

keryell

If get_size() + AllocatorValueSize - 1 overflows, we have other problems to worry about first... :-)
The ?: might also be inefficient. Anyway, I hope the division is simply replaced by some bit operations in my constexpr version.
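For unsigned operands and a divisor that is a compile-time power of two, division by `N` is equivalent to a right shift by log2(N), and optimizing compilers commonly apply that rewrite. A sketch that performs it explicitly, assuming the element size is known at compile time; `ceil_div_pow2` is a hypothetical name, and like the constexpr suggestion it contains no `?:` branch.

```cpp
#include <cassert>
#include <cstddef>

// Round-up division by a compile-time power of two, using a shift instead
// of a division instruction.
template <std::size_t N>
constexpr std::size_t ceil_div_pow2(std::size_t Size) {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
  std::size_t Shift = 0;
  while ((std::size_t{1} << Shift) != N) // Shift = log2(N)
    ++Shift;
  return (Size + N - 1) >> Shift;
}

static_assert(ceil_div_pow2<4>(10) == 3, "ceil(10 / 4)");
```

When `sizeof` of the allocator's value type is not a power of two, a compiler typically falls back to a multiply-by-reciprocal sequence rather than a plain shift.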
But in the first place, I am surprised that get_count() is computed from get_size() and not the other way around. Is that because you are focused on the allocated memory rather than on the number of elements of the object as seen by the user (a buffer of n objects of type T, for example)?
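The alternative raised here can be sketched as an object that stores its element count and derives the byte size from it, so no rounding is needed when the count is handed back to the allocator. `MemObjSketch` is hypothetical and is not how the SYCL memory object class is written.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical memory object that keeps the element count as the source of
// truth and computes the byte size on demand.
template <typename T> class MemObjSketch {
public:
  explicit MemObjSketch(std::size_t Count) : MCount(Count) {}
  std::size_t get_count() const { return MCount; }
  std::size_t get_size() const { return MCount * sizeof(T); }

private:
  std::size_t MCount;
};
```

This only removes the rounding when the allocator's `value_type` matches the element type; when they differ, a bytes-to-elements conversion like the one in the diff is still needed, which is part of why such a change is larger than this fix.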
Anyway, I hope these functions are not on the critical path of a real application...

v-klochkov

OK, I changed the code to your version. Thank you.
Regarding storing the number of allocated elements instead of the number of bytes, I am not changing that in this patch. It would require a separate, more elaborate and riskier change set, a discussion with the original author, and a good reason for the fix.

Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>

…sts (#702) This allows running the Image tests currently supported by the CUDA BE even if Image support is disabled by default. This follows #5256

v-klochkov requested review from romanovvlad and mibintc October 5, 2019 02:54

romanovvlad previously approved these changes Oct 7, 2019

mibintc suggested changes Oct 7, 2019

v-klochkov dismissed romanovvlad’s stale review via 79d5871 October 7, 2019 15:30

v-klochkov force-pushed the public_vklochkov_buffer_ctor branch from 22fef0c to 79d5871 October 7, 2019 15:30

v-klochkov requested review from mibintc and romanovvlad October 7, 2019 15:35

mibintc previously approved these changes Oct 7, 2019

keryell reviewed Oct 7, 2019

[SYCL] Fix memory deallocation size, fix 2 CTS test fails on Windows

236b90d

Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>

v-klochkov dismissed mibintc’s stale review via 236b90d October 7, 2019 21:09

v-klochkov force-pushed the public_vklochkov_buffer_ctor branch from 79d5871 to 236b90d October 7, 2019 21:09

romanovvlad approved these changes Oct 8, 2019

romanovvlad merged commit 866d634 into intel:sycl Oct 8, 2019

v-klochkov deleted the public_vklochkov_buffer_ctor branch October 10, 2019 19:22