Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Fix memory deallocation size, fix 2 CTS test fails on Windows #702

Merged
merged 1 commit into from
Oct 8, 2019

Conversation

v-klochkov
Copy link
Contributor

Signed-off-by: Vyacheslav N Klochkov vyacheslav.n.klochkov@intel.com

romanovvlad
romanovvlad previously approved these changes Oct 7, 2019
Copy link

@mibintc mibintc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Can you add test cases for this? Is this unique to Windows? Why does it not fail on Linux?

@v-klochkov
Copy link
Contributor Author

Thank you for the review and comments. I added a test case to buffer.cpp LIT test.

I suppose it did not fail on Linux because one of these
a) allocation is aligned to bigger number of bytes, thus no need to create a copy of the buffer, i.e. the original memory can be re-used.
b) I don't know, but Linux might just not care about the argument telling how many elements are deallocated.

mibintc
mibintc previously approved these changes Oct 7, 2019
Copy link

@mibintc mibintc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for fixing this. I saw similar problems on OpenCL_interop_constructors and I hope it fixes those failures too.

size_t AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>);
size_t AllocationCount = get_size() / AllocatorValueSize;
AllocationCount += (get_size() % AllocatorValueSize) ? 1 : 0;
return AllocationCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

auto constexpr AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>);
return (get_size() + AllocatorValueSize - 1)/AllocatorValueSize;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this looks better if we do not care about possible overflow on 32-bit targets in this expression: (get_size() + AllocatorValueSize - 1).

The current version of code may produce two DIV operations (depending on compiler), but that can be fixed easily by having this modified version of the original code (stored the result of get_size() to a var):

auto AllocatorValueSize = sizeof(allocator_value_type_t<AllocatorT>);
auto Size = get_size();
auto AllocationCount = Size / AllocatorValueSize;
return AllocationCount + (Size % AllocatorValueSize) ? 1 : 0;

Usually compilers can optimize two sequential divs: (A / B) and ( A % B) into one asm division operation.

Do you still recommend using your version? or the modified variant showed above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have an overflow on get_size() + AllocatorValueSize - 1 we have some other problems to worry first... :-)
The problem is also the ?: that might be inefficient. Anyway, I hope that this is just replaced by some bit operations in my constexpr version.
But at the first place I am surprised that get_count() is computed from get_size() and not the opposite... Because you are focused on the allocated memory and not the number of elements of the object seen by the user (a buffer of n objects T for example)?
Anyway, I hope these functions are not on the critical path of a real application...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I changed the code to your version. Thank you.
Regarding keeping the number of elements allocated instead of the number of bytes..., I do not change it in this patch. It would require a separate more elaborate and risky change-set, talk to original author, and a good reason for fix.

Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
@romanovvlad romanovvlad merged commit 866d634 into intel:sycl Oct 8, 2019
@v-klochkov v-klochkov deleted the public_vklochkov_buffer_ctor branch October 10, 2019 19:22
aelovikov-intel pushed a commit that referenced this pull request Feb 17, 2023
…sts (#702)

This allows to run Image tests currently supported by the CUDA BE
even if the Image support is disabled by default.

This follows #5256
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Feb 23, 2023
…sts (intel#702)

This allows to run Image tests currently supported by the CUDA BE
even if the Image support is disabled by default.

This follows intel#5256
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants