Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bit reverse #528

Merged
merged 14 commits into from
Jun 2, 2024
Merged

bit reverse #528

merged 14 commits into from
Jun 2, 2024

Conversation

nonam3e
Copy link
Contributor

@nonam3e nonam3e commented May 27, 2024

This PR adds bit reverse operation support to icicle

@@ -164,4 +189,66 @@ namespace vec_ops {

return CHK_LAST();
}

template <typename E>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existing template for vec_ops cannot be reused here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted the inplace template, it's merged with bit_reverse

@nonam3e nonam3e marked this pull request as ready for review May 28, 2024 10:54
}
}
template <typename E>
__global__ void bit_reverse_inplace_kernel(E* input, unsigned n, unsigned shift)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could merge those two kernels like the implementation in ntt.cu is doing and simplify the bit_reverse() since it would not have to choose which one to call

}
unsigned shift = __builtin_clz(size) + 1;
unsigned num_blocks = (size + MAX_THREADS_PER_BLOCK - 1) / MAX_THREADS_PER_BLOCK;
if (input == output) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is interesting only if the input/output are on device.
if your input/output is not on device then anyway you can copy only the input, reverse in place and copy back to the host output. If input and output both on host it doesn't really matter if they are the same or not.

@@ -113,6 +113,27 @@ namespace vec_ops {
device_context::DeviceContext& ctx,
bool on_device,
bool is_async);

struct BitReverseConfig {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) looks like the vecOpsConfig so consider reusing it instead of adding a new struct.
(2) for inplace, input==output but the config may tell you that one is on device but the other is not. How do handle that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(2) I use only is_output_on_device for inplace because input isn't used in this scenario.

@jeremyfelder jeremyfelder mentioned this pull request May 29, 2024
let mut result = vec![F::one(); TEST_SIZE];
let result = HostSlice::from_mut_slice(&mut result);
let cfg = BitReverseConfig::default();
bit_reverse(&intermediate_result[..], &cfg, result).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically every function such that x=f(f(x) would pass this test, including f(x)=x.
Maybe you should verify that it really reversed? for example by using NTT NR followed by reverse, compared to NTT NN. Alternatively check that the values (or maybe a few random indices) moved how you expected


template <typename E>
cudaError_t bit_reverse(const E* input, unsigned size, BitReverseConfig& cfg, E* output)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you have a limit of 2^32-1 elements. Maybe use u64

@nonam3e nonam3e requested a review from yshekel May 29, 2024 17:01
@nonam3e nonam3e requested a review from jeremyfelder May 30, 2024 09:20
@ChickenLover ChickenLover merged commit 8e62bde into main Jun 2, 2024
25 checks passed
@ChickenLover ChickenLover deleted the timur/stwo/bit-reverse branch June 2, 2024 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants