-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SIMD backend arithmetic #592
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev #592 +/- ##
==========================================
- Coverage 94.36% 91.56% -2.80%
==========================================
Files 69 74 +5
Lines 8689 9335 +646
Branches 8689 9335 +646
==========================================
+ Hits 8199 8548 +349
- Misses 420 717 +297
Partials 70 70 ☔ View full report in Codecov by Sentry. |
51d9333
to
d2ca30c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 9 files at r1, all commit messages.
Reviewable status: 2 of 9 files reviewed, 1 unresolved discussion (waiting on @andrewmilson)
crates/prover/src/core/backend/simd/m31.rs
line 54 at r1 (raw file):
/// Deinterleaves two vectors. pub fn deinterleave(self, other: Self) -> (Self, Self) { let (a, b) = self.0.deinterleave(other.0);
I thought I came up with this name!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 9 files reviewed, 5 unresolved discussions (waiting on @andrewmilson)
crates/prover/src/core/backend/simd/m31.rs
line 226 at r1 (raw file):
// Each c_i contains |0|prod_lo|prod_hi|0|0|prod_lo|prod_hi|0| let c0: u32x4 = unsafe { transmute(vqdmull_s32(a0, b0)) };
Can you put this in compiler explorer and check you get only the instruction you intended, and not some extra moves / swizzles?
crates/prover/src/core/backend/simd/m31.rs
line 236 at r1 (raw file):
// *_lo contain `|prod_lo|0|prod_lo|0|prod_lo0|0|prod_lo|0|`. let mut c0_c1_lo: u32x4 = LoEvensConcatHiEvens::concat_swizzle(c0, c1);
Don't you have deinterleave for this?
crates/prover/src/core/backend/simd/m31.rs
line 268 at r1 (raw file):
let [a0, a1, a2, a3]: [v128; 4] = unsafe { transmute(a) }; let b_double = b.0 << 1;
Is this the same as b.0+b.0? Couldn't that be more efficient?
crates/prover/src/core/backend/simd/m31.rs
line 445 at r1 (raw file):
PackedBaseField(prod_lows) + PackedBaseField(prod_highs) }
How can we test all of these architecture specific implementations?
a5e2491
to
206a5e3
Compare
d7bc78f
to
dbb8ada
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 9 files reviewed, 5 unresolved discussions (waiting on @spapinistarkware)
crates/prover/src/core/backend/simd/m31.rs
line 54 at r1 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
I thought I came up with this name!!
Lol. Funnily enough my VSCode spell checker doesn't think it's a word 😆
crates/prover/src/core/backend/simd/m31.rs
line 226 at r1 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Can you put this in compiler explorer and check you get only the instruction you intended, and not some extra moves / swizzles?
All looks good: https://godbolt.org/z/nzjM4TY7M
crates/prover/src/core/backend/simd/m31.rs
line 236 at r1 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Don't you have deinterleave for this?
True, thanks
crates/prover/src/core/backend/simd/m31.rs
line 268 at r1 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Is this the same as b.0+b.0? Couldn't that be more efficient?
Yes, updated
crates/prover/src/core/backend/simd/m31.rs
line 445 at r1 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
How can we test all of these architecture specific implementations?
I meant to add a TODO (added now) to test these platforms in the CI.
- Will make sure target feature neon is enabled on our mac tests.
- On avx512 machine plan to run tests once each for target features:
-avx2,-avx512,-simd128
,+avx2,-avx512,-simd128
,-avx2,+avx512,-simd128
,-avx2,-avx512,+simd128
(wasm target wasm32-wasi)
d2ca30c
to
c3c6fd2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 9 files at r1.
Reviewable status: 5 of 9 files reviewed, 9 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)
crates/prover/src/core/backend/simd/m31.rs
line 16 at r1 (raw file):
pub const LOG_N_LANES: u32 = 4; pub const N_LANES: usize = 1 << LOG_N_LANES;
In the avx we stored only
pub const K_BLOCK_SIZE: usize = 16;
Code quote:
pub const LOG_N_LANES: u32 = 4;
pub const N_LANES: usize = 1 << LOG_N_LANES;
crates/prover/src/core/backend/simd/m31.rs
line 76 at r1 (raw file):
/// /// Vector elements must be in the range `[0, P]`. pub unsafe fn from_simd(v: Simd<u32, N_LANES>) -> Self {
Suggestion:
pub unsafe fn from_simd_unchecked(v: Simd<u32, N_LANES>) -> Self {
crates/prover/src/core/backend/simd/m31.rs
line 107 at r1 (raw file):
// When c in [P,2P], then c-P in [0,P] which is always less than [P,2P]. // When c in [0,P-1], then c-P in [2^32-P,2^32-1] which is always greater than [0,P-1]. Self(Simd::simd_min(c, c - MODULUS))
Suggestion:
Can you call reduce instead?
and move the documentation there?
Code quote:
// Apply min(c, c-P) to each word.
// When c in [P,2P], then c-P in [0,P] which is always less than [P,2P].
// When c in [0,P-1], then c-P in [2^32-P,2^32-1] which is always greater than [0,P-1].
Self(Simd::simd_min(c, c - MODULUS))
crates/prover/src/core/backend/simd/m31.rs
line 118 at r1 (raw file):
} impl Mul for PackedBaseField {
Consider moving these _mul* implementations to a separated file
Code quote:
impl Mul for PackedBaseField {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 9 files reviewed, 9 unresolved discussions (waiting on @andrewmilson and @shaharsamocha7)
crates/prover/src/core/backend/simd/m31.rs
line 226 at r1 (raw file):
Previously, andrewmilson (Andrew Milson) wrote…
All looks good: https://godbolt.org/z/nzjM4TY7M
Doesn't look good to me
uAaA0MyaQJSyleLYTbSw1T4lKzcV4HZ3iwWQsSavZI9hJBAA%3D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 9 files reviewed, 9 unresolved discussions (waiting on @andrewmilson and @shaharsamocha7)
crates/prover/src/core/backend/simd/m31.rs
line 226 at r1 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Doesn't look good to me
uAaA0MyaQJSyleLYTbSw1T4lKzcV4HZ3iwWQsSavZI9hJBAA%3D
i.e. the issue is that you transmute, and move between representations that are not a full vector (u32x2).
Try to keep all the algorithm using full vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 9 files reviewed, 9 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)
crates/prover/src/core/backend/simd/m31.rs
line 226 at r1 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
i.e. the issue is that you transmute, and move between representations that are not a full vector (u32x2).
Try to keep all the algorithm using full vectors.
I don't think your example is fair since [u32; 16] has different alignment. What is the issue here https://godbolt.org/z/o9qE6bW86?
2c82ea1
to
d8abf68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 9 files reviewed, 8 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)
crates/prover/src/core/backend/simd/m31.rs
line 16 at r1 (raw file):
Previously, shaharsamocha7 wrote…
In the avx we stored only
pub const K_BLOCK_SIZE: usize = 16;
avx also has VECS_LOG_SIZE = 4
crates/prover/src/core/backend/simd/m31.rs
line 107 at r1 (raw file):
Previously, shaharsamocha7 wrote…
Suggestion:
Can you call reduce instead?
and move the documentation there?
The functionality is the same but the documentation is different. These docs are for reducing c in [0, 2P] to [0, P] but reduce docs are for [0, P] to [0, P)
crates/prover/src/core/backend/simd/m31.rs
line 118 at r1 (raw file):
Previously, shaharsamocha7 wrote…
Consider moving these _mul* implementations to a separated file
What's the perceived benefit? To me these are all functions relating to m31 arithmetic so think they belong in this file. Also I'm generally in favour of larger files - the standard library has many examples of 3,000+ line files.
crates/prover/src/core/backend/simd/m31.rs
line 445 at r1 (raw file):
Previously, andrewmilson (Andrew Milson) wrote…
I meant to add a TODO (added now) to test these platforms in the CI.
- Will make sure target feature neon is enabled on our mac tests.
- On avx512 machine plan to run tests once each for target features:
-avx2,-avx512,-simd128
,+avx2,-avx512,-simd128
,-avx2,+avx512,-simd128
,-avx2,-avx512,+simd128
(wasm target wasm32-wasi)
Can restrict the tests to core::backend::simd
d8abf68
to
063029c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 9 files reviewed, 3 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)
crates/prover/src/core/backend/simd/utils.rs
line 4 at r3 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Where is this used? Isn't it replaced by interleave and deinterleav()?
It's used by bit_reverse16
. LoEvensConcatHiEvens
(now removed) was replaced by interleave and deinterleave
b8a5161
to
92fb94e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 9 files reviewed, 3 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)
crates/prover/src/core/backend/simd/qm31.rs
line 16 at r3 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r4.
Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)
crates/prover/src/core/backend/simd/cm31.rs
line 65 at r4 (raw file):
let Self([a, b]) = self; Self([a.double(), b.double()]) }
Do we need those?
We don't have them in the AVX512 impl
Code quote:
/// Interleaves two vectors.
pub fn interleave(self, other: Self) -> (Self, Self) {
let Self([a_evens, b_evens]) = self;
let Self([a_odds, b_odds]) = other;
let (a_lhs, a_rhs) = a_evens.interleave(a_odds);
let (b_lhs, b_rhs) = b_evens.interleave(b_odds);
(Self([a_lhs, b_lhs]), Self([a_rhs, b_rhs]))
}
/// Deinterleaves two vectors.
pub fn deinterleave(self, other: Self) -> (Self, Self) {
let Self([a_self, b_self]) = self;
let Self([a_other, b_other]) = other;
let (a_evens, a_odds) = a_self.deinterleave(a_other);
let (b_evens, b_odds) = b_self.deinterleave(b_other);
(Self([a_evens, b_evens]), Self([a_odds, b_odds]))
}
/// Doubles each element in the vector.
pub fn double(self) -> Self {
let Self([a, b]) = self;
Self([a.double(), b.double()])
}
crates/prover/src/core/backend/simd/qm31.rs
line 219 at r4 (raw file):
iter.copied().sum() } }
Why do we need both?
Code quote:
impl Sum for PackedQM31 {
fn sum<I>(mut iter: I) -> Self
where
I: Iterator<Item = Self>,
{
let first = iter.next().unwrap_or_else(Self::zero);
iter.fold(first, |a, b| a + b)
}
}
impl<'a> Sum<&'a Self> for PackedQM31 {
fn sum<I>(iter: I) -> Self
where
I: Iterator<Item = &'a Self>,
{
iter.copied().sum()
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)
crates/prover/src/core/backend/simd/cm31.rs
line 65 at r4 (raw file):
Previously, shaharsamocha7 wrote…
Do we need those?
We don't have them in the AVX512 impl
Interleave and deinterleave replace interleave_with and deinterleave_with. I added double since there is a an opportunity to have a more optimal implementation
crates/prover/src/core/backend/simd/qm31.rs
line 219 at r4 (raw file):
Previously, shaharsamocha7 wrote…
Why do we need both?
Just quality of life. It's nice to have the ability to just do thing.iter().sum(). Also it's consistent with the std library e.g. implements sum on usize and &usise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 9 files reviewed, 3 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)
crates/prover/src/core/backend/simd/cm31.rs
line 65 at r4 (raw file):
Previously, andrewmilson (Andrew Milson) wrote…
Interleave and deinterleave replace interleave_with and deinterleave_with. I added double since there is a an opportunity to have a more optimal implementation
but in avx512 we only have those for base field. Do we also need it for CM31/QM31?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 9 files reviewed, 3 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)
crates/prover/src/core/backend/simd/cm31.rs
line 65 at r4 (raw file):
Previously, shaharsamocha7 wrote…
but in avx512 we only have those for base field. Do we also need it for CM31/QM31?
Yes, not in the code yet but I'm using CM31/QM31 interleave, deinterleave and double for GKR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @spapinistarkware)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check the specific mul implementations.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @spapinistarkware)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)
crates/prover/src/core/backend/simd/utils.rs
line 6 at r4 (raw file):
pub struct LoLoInterleaveHiLo; impl<const N: usize> Swizzle<N> for LoLoInterleaveHiLo {
Consider to call those structs as follows:
InterleaveLow
InterleaveHigh
InterleaveEvens
InterleaveOdds
Code quote:
LoLoInterleaveHiLo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @spapinistarkware)
crates/prover/src/core/backend/simd/utils.rs
line 4 at r3 (raw file):
Previously, andrewmilson (Andrew Milson) wrote…
It's used by
bit_reverse16
.LoEvensConcatHiEvens
(now removed) was replaced by interleave and deinterleave
🤦♂️ my bad you're right. LoEvensConcatHiEvens
was replaced by deinterleave. LoEvensConcatHiEvens
LoLoInterleaveHiLo
was replaced by interleave. Updated
92fb94e
to
a82d991
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 9 files reviewed, 3 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)
crates/prover/src/core/backend/simd/utils.rs
line 6 at r4 (raw file):
Previously, shaharsamocha7 wrote…
Consider to call those structs as follows:
InterleaveLow
InterleaveHigh
InterleaveEvens
InterleaveOdds
Done.
d555471
to
7c485c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r5, all commit messages.
Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)
crates/prover/src/core/backend/simd/m31.rs
line 263 at r6 (raw file):
} /// Returns `a * b`.
I wonder if this function should be defined that way.
Is b_double defined correctly? (Values of PackedM31 should be in [0,P], no?)
Suggestion:
/// Returns `a * b_double`.
crates/prover/src/core/backend/simd/utils.rs
line 17 at r6 (raw file):
} const fn parity_interleave<const N: usize>(odd: bool) -> [usize; N] {
Consider adding doc test to understand the indices it returns
Code quote:
const fn parity_interleave<const N: usize>(odd: bool) -> [usize; N] {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)
crates/prover/src/core/backend/simd/m31.rs
line 263 at r6 (raw file):
Previously, shaharsamocha7 wrote…
I wonder if this function should be defined that way.
Is b_double defined correctly? (Values of PackedM31 should be in [0,P], no?)
My bad. Removed
crates/prover/src/core/backend/simd/utils.rs
line 17 at r6 (raw file):
Previously, shaharsamocha7 wrote…
Consider adding doc test to understand the indices it returns
I'd like to but these types are private and doc tests require public types.
7c485c7
to
d7299c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 9 files at r1, 2 of 3 files at r4, 1 of 2 files at r5, 1 of 1 files at r7, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @andrewmilson)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @andrewmilson)
This change is![Reviewable](https://camo.githubusercontent.com/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)