Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ExecutionSpace partitioning function #4096

Merged
merged 4 commits into from
Jul 19, 2021

Conversation

crtrott
Copy link
Member

@crtrott crtrott commented Jun 15, 2021

This allows the creation of multiple instances, but only CUDA implements an actual implementation, everyone else returns the same thing multiple times for now.

The design is as a customization point, which would allow for ADL hopefully.

// Customization point for backends
// Default behavior is to return the passed in instance
template <class ExecSpace, class... Args>
std::array<ExecSpace, sizeof...(Args)> partition_space(ExecSpace space,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we intentionally requiring people to partition into a compile-time sized set of args? Should we have a variant where this takes a vector of weights and returns a vector of instances?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm interesting question, we could have both, that said I didn't see a real need for that much runtime choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The partition count in EMPIRE will be runtime-determined

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, it could just call the current unweighted CUDA implementation that just makes separate streams one by one, but that's probably not the usage you were going for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, using std::vector now

void check_equalness(ExecSpace, ExecSpace) {}

#ifdef KOKKOS_ENABLE_CUDA
void check_equalness(Kokkos::Cuda exec1, Kokkos::Cuda exec2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps check_distinctive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

sum1);
Kokkos::parallel_reduce(
Kokkos::RangePolicy<TEST_EXECSPACE>(instances[1], 0, N), SumFunctor(),
sum2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This poses the same testing challenge as I noted in #4059 - confirming that various kernels actually ran on distinct execution spaces, and that they were at least hypothetically concurrent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the distinctive part is probably good enough for now

@crtrott
Copy link
Member Author

crtrott commented Jun 16, 2021

  • Allow runtime size via some vector type.
  • Return always std::vector.
  • No normalization requirement.
  • Support float and int, but make sure its either, not a mix?

This allows the creation of multiple instances,
only CUDA implements an actual implementation, everyone else
returns the same thing multiple times for now.
@crtrott
Copy link
Member Author

crtrott commented Jul 1, 2021

I addressed all the points: for the int/float thing I allow mix, I am using a fold expression to static_assert that they all are, that fold expresssion is protected by feature test macro.

@crtrott
Copy link
Member Author

crtrott commented Jul 1, 2021

@PhilMiller @DavidPoliakoff I think i addressed everything

@PhilMiller
Copy link
Contributor

For runtime-determined partition count, I think the partition functions need to take std::vector rather than variadic arguments, too?

@DavidPoliakoff
Copy link
Contributor

@PhilMiller yeah, I believe we discussed this in the meeting. Should either be a vector, or a template<template typename> thingy, and I recommend vector

Comment on lines 259 to 273
template <class... Args>
std::vector<Cuda> partition_space(Cuda space, Args...) {
std::vector<Cuda> instances(sizeof...(Args));
#ifdef __cpp_fold_expressions
static_assert(
(... && std::is_arithmetic_v<Args>),
"Kokkos Error: partitioning arguments must be integers or floats");
#endif
for (int s = 0; s < int(sizeof...(Args)); s++) {
cudaStream_t stream;
CUDA_SAFE_CALL(cudaStreamCreate(&stream));
instances[s] = Cuda(stream, true);
}
return instances;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/var/jenkins/workspace/Kokkos/core/src/Kokkos_Cuda.hpp:260:40: error: unused parameter 'space' [clang-diagnostic-unused-parameter]
std::vector<Cuda> partition_space(Cuda space, Args...) {
                                       ^
/var/jenkins/workspace/Kokkos/core/src/Kokkos_Cuda.hpp:269:5: error: use of undeclared identifier 'CUDA_SAFE_CALL' [clang-diagnostic-error]
    CUDA_SAFE_CALL(cudaStreamCreate(&stream));
    ^

@crtrott crtrott added this to In progress in Kokkos Release 3.5 Jul 14, 2021
@crtrott crtrott moved this from In progress to Awaiting Feedback in Kokkos Release 3.5 Jul 15, 2021
@crtrott crtrott force-pushed the partition_space branch 3 times, most recently from 62c6a23 to 62ef2f3 Compare July 16, 2021 20:24
Also moved the CUda overloads to the Instances header file.
@PhilMiller
Copy link
Contributor

I think the variadic version only needs to be implemented once, in the generic code, since it can then call the version taking a vector<T> weights and let the overload resolution happen there.

Copy link
Contributor

@PhilMiller PhilMiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will suite EMPIRE's needs well enough. The interpretation of the weights will become an interesting question down the line, but it'll be good to get this step integrated

Copy link
Contributor

@DavidPoliakoff DavidPoliakoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very lightly opposed to having a vector and a variadic overload. If anybody else wants to back this up, I'll change it to a request changes. But I think this is a good implementation of what we're aiming at

@crtrott crtrott merged commit ab87061 into kokkos:develop Jul 19, 2021
Kokkos Release 3.5 automation moved this from Awaiting Feedback to Done Jul 19, 2021
}

template <class T>
std::vector<HIP> partition_space(const HIP &, std::vector<T> &weights) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are weights by non-const reference?

};

template <class ExecSpace>
void check_distinctive(ExecSpace, ExecSpace) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean distinct from one another?


TEST(TEST_CATEGORY, partitioning_by_args) {
auto instances =
Kokkos::Experimental::partition_space(TEST_EXECSPACE(), 1, 1.);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we capture anywhere the rational for being able to mix types?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants