
Build fails with Thrust 2.1: pinned_allocator.h removed #202

Open
torrance opened this issue Apr 26, 2023 · 7 comments

@torrance

pinned_allocator.h was removed as part of pull request NVIDIA/thrust#1611, and a commit referenced from there mentions "Remove thrust::system::cuda::experimental::pinned_allocator.h, which has been deprecated for a long time."

I have no idea what it has been deprecated in favour of. An old issue suggests universal_host_pinned_allocator, but that doesn't appear to actually exist anywhere.

What should it be replaced with?

@benbarsdell
Collaborator

Apparently this is the replacement:

#include <thrust/system/cuda/memory.h>

using pinned_allocator = thrust::mr::stateless_resource_allocator<
    T, thrust::system::cuda::universal_host_pinned_memory_resource>;
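For example, the alias drops in as the allocator parameter of a host_vector. A minimal sketch (requires nvcc and the CUDA Thrust headers; the element type and size here are placeholders):

```cuda
#include <thrust/host_vector.h>
#include <thrust/mr/allocator.h>
#include <thrust/system/cuda/memory.h>

// Replacement for the removed thrust::system::cuda::experimental::pinned_allocator
template <typename T>
using pinned_allocator = thrust::mr::stateless_resource_allocator<
    T, thrust::system::cuda::universal_host_pinned_memory_resource>;

int main() {
    // Backed by page-locked (pinned) host memory, so asynchronous
    // host<->device copies can overlap with kernel execution.
    thrust::host_vector<float, pinned_allocator<float>> hv(1024, 0.0f);
    return 0;
}
```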

@torrance
Author

torrance commented Apr 27, 2023

At your suggestion, I've made the following change:

diff --git a/src/fft.cu b/src/fft.cu
index eeace96..26cd458 100644
--- a/src/fft.cu
+++ b/src/fft.cu
@@ -44,7 +44,7 @@
 #include "ArrayIndexer.cuh"
 #include <thrust/device_vector.h>
 #include <thrust/host_vector.h>
-#include <thrust/system/cuda/experimental/pinned_allocator.h>
+#include <thrust/system/cuda/memory.h>

 #include <cufft.h>
 #include <cufftXt.h>
@@ -63,9 +63,9 @@ class BFfft_impl {
        bool             _using_load_callback;
        thrust::device_vector<char> _dv_tmp_storage;
        thrust::device_vector<CallbackData> _dv_callback_data;
-       typedef thrust::cuda::experimental::pinned_allocator<CallbackData> pinned_allocator_type;
+       using pinned_allocator_type = thrust::mr::stateless_resource_allocator<CallbackData, thrust::universal_host_pinned_memory_resource>;
        thrust::host_vector<CallbackData, pinned_allocator_type> _hv_callback_data;

And that builds.

However, all FFT-related tests currently fail, specifically those using fftshift, which appears to be exactly where this host_vector is used. In these cases, the odata array is all zeros, suggesting something failed to transfer from host to device, or vice versa.

I can't be certain that's the cause, since this is my first time trying to build bifrost, but seems likely.

@torrance
Author

torrance commented May 3, 2023

A little more investigation shows that the entire callback that performs the fftshift isn't running.

What's incredible is that if I add an empty print statement to post_fftshift the callback works and is called:

diff --git a/src/fft_kernels.cu b/src/fft_kernels.cu
index 9aefa89..7ec352c 100644
--- a/src/fft_kernels.cu
+++ b/src/fft_kernels.cu
@@ -28,6 +28,7 @@
 
 #include "fft_kernels.h"
 #include "cuda.hpp"
+#include "stdio.h"
 
 __device__
 inline size_t pre_fftshift(size_t        offset,
@@ -56,6 +57,8 @@ inline Complex post_fftshift(size_t        offset,
        // For forward transforms with apply_fftshift=true, we cyclically shift
        //   the output data by phase-rotating the input data here.
        if( cb->do_fftshift && !cb->inverse ) {
+               if (offset == 0) printf("");
+
                for( int d=0; d<cb->ndim; ++d ) {
                        // Compute the index of this element along dimension d
                        // **TODO: 64-bit indexing support

What's more incredible is that if I add this print statement only to the parent function, in this case callback_load_cf32, it doesn't work and nothing is printed. Only if the print is added to post_fftshift do both print statements print anything at all.

I have no idea what's going on here.
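For context, the standard cuFFT pattern for wiring up a load callback looks roughly like this (a sketch, not bifrost's exact code; the names other than callback_load_cf32 are illustrative). It may be relevant because the host fetches the device-side function pointer through a __device__ symbol, so if the compiler discards or mishandles the device function, the callback can silently never run; a printf keeping the symbol alive would be consistent with that:

```cuda
#include <cufft.h>
#include <cufftXt.h>

// callback_load_cf32 is assumed to be declared elsewhere (e.g. fft_kernels.h).
// The host cannot take the address of a __device__ function directly, so the
// pointer is published through a device symbol and copied back to the host.
__device__ cufftCallbackLoadC d_load_ptr = callback_load_cf32;

void set_load_callback(cufftHandle plan, void* cb_data) {
    cufftCallbackLoadC h_load_ptr;
    cudaMemcpyFromSymbol(&h_load_ptr, d_load_ptr, sizeof(h_load_ptr));
    cufftXtSetCallback(plan, (void**)&h_load_ptr, CUFFT_CB_LD_COMPLEX, &cb_data);
}
```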

@jaycedowell
Collaborator

I've updated the self-hosted runner to Ubuntu 20.04 and CUDA 12.0, and I'm now seeing this in the CI. I'm also getting a 'cuda/stream.hpp(85): error: namespace "cuda::std" has no member "runtime_error"' error there as well. Working through those locally, I can get Bifrost to build, but all of the test_fft tests are failing with a lot of zero-filled results.

I played around with this a little bit and ended up with fewer errors after changing the declaration of CallbackData in fft_kernels.h to struct __attribute__((packed)) CallbackData. I'm not sure why this would matter, but I now only get errors on the complex-to-real transform tests.

@jaycedowell
Collaborator

I think my complex-to-real errors are from an older version of the test suite (I've been testing on "ibverb-support"). As of abee49a CI looks to be ok.

@torrance
Author

> I'm also getting a 'cuda/stream.hpp(85): error: namespace "cuda::std" has no member "runtime_error"' error there as well

Yes, I got that too and had to make it an absolute import.

> I played around with this a little bit and ended up with fewer errors if I changed the declaration of CallbackData in fft_kernels.h to be a struct __attribute__((packed)) CallbackData. I'm not sure why this would matter but I now only get errors on the complex-to-real transform tests.

I can confirm this works for me too, however the compiler complains:

fft_kernels.h:109:13: warning: ignoring packed attribute because of unpacked non-POD field ‘int_fastdiv CallbackData::istrides [3]’
  109 |  int_fastdiv istrides[3]; // Note: Elements, not bytes

...so I'm not sure why it works, especially since the compiler is telling me it's being ignored (!).

@jaycedowell
Collaborator

I also got that compiler warning. I'm hesitant to call this "solved" since it's not clear why this change makes any difference. But it does seem to yield the correct FFT results and it doesn't appear to cause any problems with earlier versions of CUDA. Maybe this is a "works for me".
