Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace default stream for scalars and column factories usages (because of defaulted arguments) #14354

Merged
merged 13 commits into from
Dec 13, 2023

Conversation

karthikeyann
Copy link
Contributor

@karthikeyann karthikeyann commented Nov 2, 2023

Description

This PR contributes to #13744

Adds missing stream for scalars, column factories
Uses right set_null_mask for moving null masks instead of copy. (due to defaulted stream argument).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@karthikeyann karthikeyann added libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 2, 2023
@karthikeyann karthikeyann requested a review from a team as a code owner November 2, 2023 18:45
Comment on lines 374 to 377
cudf::string_scalar newline{options.get_line_terminator(), true, stream};
auto p_str_col_w_nl = cudf::strings::detail::join_strings(str_column_view,
newline,
string_scalar("", false),
string_scalar("", false, stream),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline is defined separately while the 3rd parameter is defined inline. Can you make them consistent please?

Copy link
Contributor Author

@karthikeyann karthikeyann Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think, it's not a matter of being consistent. That temporary scalar is not required after this usage. So, it's not a named parameter. its scope ends after function call.
newline is used below.

Comment on lines 483 to 484
cudf::string_scalar(delimiter_str, true, stream),
narep,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One parameter is inline defined while the other is not. Please make them consistent.

Copy link
Contributor Author

@karthikeyann karthikeyann Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That temporary scalar is not required after this usage. So, it's not a named parameter. its scopes end after function call.
Here narep is used below this line too.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started a review earlier and didn't submit it -- sorry. Flushing comments for now, and I'll try to get back to completing the review soon.

empty_like(values),
0,
{},
cudf::get_default_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is introducing a use of the default stream, which we want to avoid. There are public APIs accepting a user stream that can call this, which won't use the desired stream. We need to refactor all callers of this code path, and pass their stream through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the stream relevant in this case? Is any allocation actually happening when creating an empty column?

Copy link
Contributor

@ttnghia ttnghia Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've filed an issue a while ago for it: #11463 and there is also another issue for it: #13737.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So before that issue is closed, we don't have any input stream to put into line 115 above and have to use the default stream anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

groupby needs a bigger cleanup; I can exclude it as part of this PR. It should be a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed groupby from this PR.

cpp/src/io/csv/writer_impl.cu Outdated Show resolved Hide resolved
cpp/src/lists/dremel.cu Outdated Show resolved Hide resolved
@karthikeyann karthikeyann changed the title Add missing stream for scalars, column factories Replace default stream for scalars, column factories (because of defaulted arguments) Nov 14, 2023
@karthikeyann karthikeyann changed the title Replace default stream for scalars, column factories (because of defaulted arguments) Replace default stream for scalars and column factories (because of defaulted arguments) Nov 14, 2023
@karthikeyann karthikeyann changed the title Replace default stream for scalars and column factories (because of defaulted arguments) Replace default stream for scalars and column factories usages (because of defaulted arguments) Nov 14, 2023
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Posted a few follow-up questions.

cpp/src/unary/cast_ops.cu Show resolved Hide resolved
cpp/src/table/table.cpp Outdated Show resolved Hide resolved
cpp/src/scalar/scalar.cpp Outdated Show resolved Hide resolved
bdice
bdice previously requested changes Nov 14, 2023
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not add new calls to cudf::get_default_stream() in internal implementations. @vuule just raised this here: https://github.com/rapidsai/cudf/pull/14354/files#r1393097787

It's the same problem as mentioned here: #14354 (comment)

Is there a way we can fix this in order, so that those calls have the stream passed through by their caller?

@vuule
Copy link
Contributor

vuule commented Nov 14, 2023

We should not add new calls to cudf::get_default_stream() in internal implementations. @vuule just raised this here: https://github.com/rapidsai/cudf/pull/14354/files#r1393097787

It's the same problem as mentioned here: #14354 (comment)

Is there a way we can fix this in order, so that those calls have the stream passed through by their caller?

I'm okay with explicitly passing the default stream in cases where we don't have the user stream available (yet). It makes the default stream use very visible so we'll find these places easily once we get to updating those APIs to take a stream. IMO this is a good change, even though it does not really change the stream.

@vyasr
Copy link
Contributor

vyasr commented Nov 14, 2023

I would recommend just adding a stream parameter with a default value and then forwarding that to the column constructor. Is there any reason we can't do that? It shouldn't break any existing APIs. While we're at it, we probably want to add a default mr parameter too. The table shouldn't be default copy-constructible.

I think either leaving the implicit default stream or adding an explicit argument to the column constructor is fine as a short-term solution, but since columns already support streams in the public API I don't see any reason to delay adding that support for tables.

@karthikeyann
Copy link
Contributor Author

As @vuule mentioned, default stream is added to make its use visible and hence allows us to replace it in future PRs.
@vyasr Adding default arguments (stream/mr) will not break existing APIs. I am also concerned about their future usage ignoring these defaults, which will again need to be fixed by similar cleanup-PRs (if they are not caught by existing stream tests). Is there any alternative or a tradeoff? Ideally, we should prefer to make it hard to use the APIs wrongly, and easy to use it in intended way.

I thought about explicit keyword for copy constructor of scalar (I think, it's similar for column as well). I feared if it might cause our code be too verbose. But I compiled libcudf with explicit for scalar; only scalar tests need to be updated. Also, this might be a breaking change, as users need to update their code to call the copy constructor explicitly.

Moving this PR for 24.02.

@karthikeyann karthikeyann changed the base branch from branch-23.12 to branch-24.02 November 15, 2023 19:16
@karthikeyann karthikeyann requested a review from a team as a code owner November 15, 2023 19:16
@bdice bdice dismissed their stale review November 16, 2023 00:46

Removing my blocking review, after further conversation.

Comment on lines 165 to 166
cudf::string_scalar(options_.get_true_value(), true, stream_),
cudf::string_scalar(options_.get_false_value(), true, stream_),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind making a change similar to the one in #14444?

@wence-
Copy link
Contributor

wence- commented Nov 20, 2023

None of the Python changes look like they belong in this PR. Is it because you were previously targeted to 23.12 and have some changes from there that are not yet in 24.02 due to the forward-merge no coming across yet?

@karthikeyann
Copy link
Contributor Author

karthikeyann commented Nov 28, 2023

@wence- pulled 24.02. python changes are gone.

@wence- wence- removed request for a team and shwina November 29, 2023 10:16
@wence-
Copy link
Contributor

wence- commented Nov 29, 2023

@wence- pulled 24.02. python changes are gone.

Thanks! Removed the now unnecessary codeowner review requests.

cpp/src/scalar/scalar.cpp Outdated Show resolved Hide resolved
cpp/src/strings/convert/convert_booleans.cu Outdated Show resolved Hide resolved
cpp/src/strings/convert/convert_durations.cu Outdated Show resolved Hide resolved
cpp/src/strings/convert/convert_floats.cu Outdated Show resolved Hide resolved
cpp/src/strings/convert/convert_integers.cu Outdated Show resolved Hide resolved
cpp/src/strings/convert/convert_ipv4.cu Outdated Show resolved Hide resolved
cpp/src/table/table.cpp Outdated Show resolved Hide resolved
cpp/src/scalar/scalar.cpp Outdated Show resolved Hide resolved
@karthikeyann
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 0698438 into rapidsai:branch-24.02 Dec 13, 2023
67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

6 participants