
[CUDA] Set GPU device ID in threads #6028

Merged: 3 commits merged into master from cuda/fix-gpu-device-id-bug on Aug 13, 2023

Conversation

shiyu1994 (Collaborator):

This fixes the issue reported in #6018. As described there, an illegal memory access can currently occur when using gpu_device_id > 0 with the CUDA tree learner. The cause is that some CUDA memory is allocated inside OpenMP threads: the device ID is set in the main thread, but not in the newly created threads, so memory allocated in those threads resides on GPU 0 (the default device), which differs from gpu_device_id. Accessing that memory from gpu_device_id then causes an illegal memory access.
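For background, the CUDA runtime's current device is per host thread: a cudaSetDevice call in the main thread does not carry over to OpenMP worker threads, which start on device 0. Below is a minimal standalone sketch of that behavior (not LightGBM code; it assumes a machine with at least two GPUs and uses device 1 to stand in for gpu_device_id > 0):

// Build with: nvcc -Xcompiler -fopenmp device_per_thread.cu
#include <cuda_runtime.h>
#include <omp.h>
#include <cstdio>

int main() {
  cudaSetDevice(1);  // the main thread now targets GPU 1
  #pragma omp parallel num_threads(4)
  {
    int device = -1;
    cudaGetDevice(&device);
    // Thread 0 (the master) reports device 1; the other workers report
    // device 0, because the current device is thread-local state. Any
    // cudaMalloc they perform lands on GPU 0, so kernels running on GPU 1
    // would hit illegal memory accesses when touching those buffers.
    std::printf("thread %d sees device %d\n", omp_get_thread_num(), device);
  }
  return 0;
}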

Here's one example of the affected code:

#pragma omp parallel for schedule(static) num_threads(num_threads_)
for (int column_index = 0; column_index < num_columns_; ++column_index) {
  OMP_LOOP_EX_BEGIN();
  const int8_t bit_type = column_bit_type[column_index];
  if (column_data[column_index] != nullptr) {
    // is dense column
    if (bit_type == 4) {
      column_bit_type_[column_index] = 8;
      InitOneColumnData<false, true, uint8_t>(column_data[column_index], nullptr, &data_by_column_[column_index]);
    } else if (bit_type == 8) {
      InitOneColumnData<false, false, uint8_t>(column_data[column_index], nullptr, &data_by_column_[column_index]);
    } else if (bit_type == 16) {
      InitOneColumnData<false, false, uint16_t>(column_data[column_index], nullptr, &data_by_column_[column_index]);
    } else if (bit_type == 32) {
      InitOneColumnData<false, false, uint32_t>(column_data[column_index], nullptr, &data_by_column_[column_index]);
    } else {
      Log::Fatal("Unknown column bit type %d", bit_type);
    }

@@ -224,6 +223,7 @@ void CUDAColumnData::ResizeWhenCopySubrow(const data_size_t num_used_indices) {
   #pragma omp parallel for schedule(static) num_threads(num_threads_)
   for (int column_index = 0; column_index < num_columns_; ++column_index) {
     OMP_LOOP_EX_BEGIN();
+    SetCUDADevice(gpu_device_id_, __FILE__, __LINE__);
Collaborator:

do we need to set this inside the loop?

shiyu1994 (Collaborator, Author):

Moved it outside: d4695ff
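For readers following along, one way to act on this suggestion is to split the combined parallel-for so each worker thread sets the device exactly once when it joins the team. A sketch of that pattern follows (it may not match commit d4695ff line for line):

#pragma omp parallel num_threads(num_threads_)
{
  // Set the device once per thread, not once per loop iteration.
  SetCUDADevice(gpu_device_id_, __FILE__, __LINE__);
  #pragma omp for schedule(static)
  for (int column_index = 0; column_index < num_columns_; ++column_index) {
    OMP_LOOP_EX_BEGIN();
    // ... per-column CUDA allocations now happen on gpu_device_id_ ...
    OMP_LOOP_EX_END();
  }
}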

@jameslamb changed the title from [CUDA][fix] Set GPU device ID in threads to [CUDA] Set GPU device ID in threads on Aug 11, 2023
@jameslamb (Collaborator) left a comment:

Thanks for the thorough explanation! It's really helpful for my understanding 😊

@jameslamb (Collaborator):

I just merged the latest master into this branch to pick up the fixes from #6032.

@shiyu1994 , could you go into the repo settings and check this "Always suggest updating pull request branches" box?


That would add a button on PRs that you can click to merge master into them, so there wouldn't be a need to pull and push locally. I've found that it saves a bit of work in other projects I work on.

@shiyu1994 (Collaborator, Author):

> could you go into the repo settings and check this "Always suggest updating pull request branches" box?

Done.

@shiyu1994 merged commit 5c9e61d into master on Aug 13, 2023. 41 checks passed.
@shiyu1994 deleted the cuda/fix-gpu-device-id-bug branch on August 13, 2023 at 15:14.

This pull request has been automatically locked because there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions bot locked as resolved and limited conversation to collaborators on Nov 15, 2023