-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bit reverse #528
bit reverse #528
Conversation
@@ -164,4 +189,66 @@ namespace vec_ops { | |||
|
|||
return CHK_LAST(); | |||
} | |||
|
|||
template <typename E> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
existing template for vec_ops cannot be reused here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted the inplace template, it's merged with bit_reverse
icicle/src/vec_ops/vec_ops.cu
Outdated
} | ||
} | ||
template <typename E> | ||
__global__ void bit_reverse_inplace_kernel(E* input, unsigned n, unsigned shift) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could merge those two kernels like the implementation in ntt.cu is doing and simplify the bit_reverse() since it would not have to choose which one to call
icicle/src/vec_ops/vec_ops.cu
Outdated
} | ||
unsigned shift = __builtin_clz(size) + 1; | ||
unsigned num_blocks = (size + MAX_THREADS_PER_BLOCK - 1) / MAX_THREADS_PER_BLOCK; | ||
if (input == output) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is interesting only if the input/output are on device.
if your input/output is not on device then anyway you can copy only the input, reverse in place and copy back to the host output. If input and output both on host it doesn't really matter if they are the same or not.
@@ -113,6 +113,27 @@ namespace vec_ops { | |||
device_context::DeviceContext& ctx, | |||
bool on_device, | |||
bool is_async); | |||
|
|||
struct BitReverseConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) looks like the vecOpsConfig so consider reusing it instead of adding a new struct.
(2) for inplace, input==output but the config may tell you that one is on device but the other is not. How do handle that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(2) I use only is_output_on_device
for inplace because input
isn't used in this scenario.
let mut result = vec![F::one(); TEST_SIZE]; | ||
let result = HostSlice::from_mut_slice(&mut result); | ||
let cfg = BitReverseConfig::default(); | ||
bit_reverse(&intermediate_result[..], &cfg, result).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically every function such that x=f(f(x) would pass this test, including f(x)=x.
Maybe you should verify that it really reversed? for example by using NTT NR followed by reverse, compared to NTT NN. Alternatively check that the values (or maybe a few random indices) moved how you expected
icicle/src/vec_ops/vec_ops.cu
Outdated
|
||
template <typename E> | ||
cudaError_t bit_reverse(const E* input, unsigned size, BitReverseConfig& cfg, E* output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that you have a limit of 2^32-1 elements. Maybe use u64
This PR adds bit reverse operation support to icicle