Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add device_uvector::reserve and device_buffer::reserve #1079

Merged

Conversation

upsj
Copy link
Contributor

@upsj upsj commented Aug 3, 2022

I am building a parser that outputs variable-sized blocks of data. To collect them, I would like to use pre-allocated device_uvectors, using size() to keep track of how much memory is already in use. Setting the capacity and size manually works at the moment by calling vec.resize(capacity, stream); vec.resize(size, stream); on an empty vector, but this seems unnecessarily complicated. Since device_uvector otherwise already closely matches the std::vector interface, I want to propose adding reserve to the interface.

TODO:

  • Add to Python interface

@upsj upsj requested a review from a team as a code owner August 3, 2022 07:00
@upsj upsj requested review from harrism and jrhemstad August 3, 2022 07:00
@github-actions github-actions bot added the cpp Pertains to C++ code label Aug 3, 2022
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable addition. One comment / question about making it a little cleaner.

include/rmm/device_buffer.hpp Outdated Show resolved Hide resolved
@upsj upsj requested a review from a team as a code owner August 3, 2022 11:36
@github-actions github-actions bot added the Python Related to RMM Python API label Aug 3, 2022
Comment on lines 303 to 306
auto tmp = device_buffer{new_size, stream, _mr};
RMM_CUDA_TRY(
cudaMemcpyAsync(new_data, data(), size(), cudaMemcpyDefault, this->stream().value()));
deallocate_async();
_data = new_data;
_size = new_size;
_capacity = new_size;
cudaMemcpyAsync(tmp.data(), data(), size(), cudaMemcpyDefault, this->stream().value()));
std::swap(tmp, *this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have sworn I had a reason for not using CAS here originally, but now I'm struggling to remember what it could have been.

Oh well ¯_(ツ)_/¯

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work correctly without adding/specializing a swap function for device_buffer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have this question. From what I can tell std::swap should be safe, but I may be missing something. Also is this "CAS"? There's no comparison, just a swap, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also is this "CAS"?

Sorry, overloaded acronym. "copy and swap"

Does this work correctly without adding/specializing a swap function for device_buffer?

Yeah, the default implementation of swap will just use the move ctor: https://stackoverflow.com/a/25286610/11341974

Copy link
Contributor

@bdice bdice Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I read the link about the behavior of swap, and that all seems fine. But could we just move-assign the new buffer to this instance instead of swapping?

Copy link
Contributor Author

@upsj upsj Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine as well. No need to move around the old buffer ;) Changed it to std::move

include/rmm/device_uvector.hpp Outdated Show resolved Hide resolved
Comment on lines 303 to 306
auto tmp = device_buffer{new_size, stream, _mr};
RMM_CUDA_TRY(
cudaMemcpyAsync(new_data, data(), size(), cudaMemcpyDefault, this->stream().value()));
deallocate_async();
_data = new_data;
_size = new_size;
_capacity = new_size;
cudaMemcpyAsync(tmp.data(), data(), size(), cudaMemcpyDefault, this->stream().value()));
std::swap(tmp, *this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have this question. From what I can tell std::swap should be safe, but I may be missing something. Also is this "CAS"? There's no comparison, just a swap, right?

@bdice bdice added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Aug 4, 2022
auto tmp = device_buffer{new_capacity, stream, _mr};
auto const old_size = size();
RMM_CUDA_TRY(cudaMemcpyAsync(tmp.data(), data(), size(), cudaMemcpyDefault, stream.value()));
*this = std::move(tmp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you just move tmp over *this, is the old memory of this properly deallocated? I want to ensure there is no memory leak. With swap, it's obvious to me there is no leak because tmp is on the stack and when it goes out of scope it will be destroyed, taking the old memory of this with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the old memory of this properly deallocated?

Yep, it will invoke the move assignment operator of device_buffer which will deallocate the original memory.

/**
* @brief Move assignment operator moves the contents from `other`.
*
* This `device_buffer`'s current device memory allocation will be deallocated
* on `stream()`.
*
* If a different stream is required, call `set_stream()` on
* the instance before assignment. After assignment, this instance's stream is
* replaced by the `other.stream()`.
*
* @param other The `device_buffer` whose contents will be moved.
*/
device_buffer& operator=(device_buffer&& other) noexcept
{
if (&other != this) {
deallocate_async();
_data = other._data;
_size = other._size;
_capacity = other._capacity;
set_stream(other.stream());
_mr = other._mr;
other._data = nullptr;
other._size = 0;
other._capacity = 0;
other.set_stream(cuda_stream_view{});
}
return *this;
}

Copy link
Contributor

@bdice bdice Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the move is safe. However I have a question about streams. Should the old buffer be destroyed on the same stream as the copy (to ensure the copy is complete), or the stream the old buffer was constructed with (current behavior)? Should the copy always occur on the stream used to construct the original buffer to ensure that the reserve sequences after the constructor’s allocation? (I don’t remember seeing an explicit sync but need to look again.) Is this solved by the call to set_stream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this solved by the call to set_stream?

Yes. Everything here is happening all on the provided stream argument.

It would be an error to construct a device_buffer on s1 and reserve on s2 without first doing a synchronization between s1 and s2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic. I wanted to be sure I understood that correctly. Approving now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all. Sounds right.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reservations from me! Thanks @upsj!

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @upsj !

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @upsj !

@harrism harrism added this to PR-WIP in v22.10 Release via automation Aug 6, 2022
@harrism harrism changed the title Add device_uvector::reserve Add device_uvector::reserve and device_buffer::reserve Aug 6, 2022
@harrism
Copy link
Member

harrism commented Aug 6, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d3b1dfb into rapidsai:branch-22.10 Aug 6, 2022
v22.10 Release automation moved this from PR-WIP to Done Aug 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp Pertains to C++ code improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Related to RMM Python API
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants