-
Notifications
You must be signed in to change notification settings - Fork 700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Fix memory deallocation size, fix 2 CTS test fails on Windows #702
[SYCL] Fix memory deallocation size, fix 2 CTS test fails on Windows #702
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Can you add test cases for this? Is this unique to Windows? Why does it not fail on Linux?
22fef0c
to
79d5871
Compare
Thank you for the review and comments. I added a test case to buffer.cpp LIT test. I suppose it did not fail on Linux because one of these |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for fixing this. I saw similar problems on OpenCL_interop_constructors and I hope it fixes those failures too.
size_t AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>); | ||
size_t AllocationCount = get_size() / AllocatorValueSize; | ||
AllocationCount += (get_size() % AllocatorValueSize) ? 1 : 0; | ||
return AllocationCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about:
auto constexpr AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>);
return (get_size() + AllocatorValueSize - 1)/AllocatorValueSize;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this looks better if we do not care about possible overflow on 32-bit targets in this expression: (get_size() + AllocatorValueSize - 1).
The current version of code may produce two DIV operations (depending on compiler), but that can be fixed easily by having this modified version of the original code (stored the result of get_size() to a var):
auto AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>);
auto Size = get_size();
auto AllocationCount = Size / AllocatorValueSize;
return AllocationCount + (Size % AllocatorValueSize) ? 1 : 0;
Usually compilers can optimize two sequential divs: (A / B) and ( A % B) into one asm division operation.
Do you still recommend using your version? or the modified variant showed above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have an overflow on get_size() + AllocatorValueSize - 1
we have some other problems to worry first... :-)
The problem is also the ?:
that might be inefficient. Anyway, I hope that this is just replaced by some bit operations in my constexpr
version.
But at the first place I am surprised that get_count()
is computed from get_size()
and not the opposite... Because you are focused on the allocated memory and not the number of elements of the object seen by the user (a buffer of n
objects T
for example)?
Anyway, I hope these functions are not on the critical path of a real application...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I changed the code to your version. Thank you.
Regarding keeping the number of elements allocated instead of the number of bytes..., I do not change it in this patch. It would require a separate more elaborate and risky change-set, talk to original author, and a good reason for fix.
Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
79d5871
to
236b90d
Compare
…sts (intel#702) This allows to run Image tests currently supported by the CUDA BE even if the Image support is disabled by default. This follows intel#5256
Signed-off-by: Vyacheslav N Klochkov vyacheslav.n.klochkov@intel.com