Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SIMD backend arithmetic #592

Merged
merged 1 commit into from
May 15, 2024
Merged

Conversation

andrewmilson
Copy link
Contributor

@andrewmilson andrewmilson commented May 1, 2024

This change is Reviewable

@codecov-commenter
Copy link

codecov-commenter commented May 1, 2024

Codecov Report

Attention: Patch coverage is 54.02477% with 297 lines in your changes are missing coverage. Please review.

Project coverage is 91.56%. Comparing base (89b4fcd) to head (d7299c5).
Report is 2 commits behind head on dev.

Files Patch % Lines
crates/prover/src/core/backend/simd/m31.rs 50.32% 155 Missing ⚠️
crates/prover/src/core/backend/simd/qm31.rs 52.27% 84 Missing ⚠️
crates/prover/src/core/backend/simd/cm31.rs 62.59% 49 Missing ⚠️
crates/prover/src/core/backend/simd/utils.rs 69.23% 8 Missing ⚠️
crates/prover/src/core/backend/simd/mod.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #592      +/-   ##
==========================================
- Coverage   94.36%   91.56%   -2.80%     
==========================================
  Files          69       74       +5     
  Lines        8689     9335     +646     
  Branches     8689     9335     +646     
==========================================
+ Hits         8199     8548     +349     
- Misses        420      717     +297     
  Partials       70       70              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@spapinistarkware spapinistarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 9 files at r1, all commit messages.
Reviewable status: 2 of 9 files reviewed, 1 unresolved discussion (waiting on @andrewmilson)


crates/prover/src/core/backend/simd/m31.rs line 54 at r1 (raw file):

    /// Deinterleaves two vectors.
    pub fn deinterleave(self, other: Self) -> (Self, Self) {
        let (a, b) = self.0.deinterleave(other.0);

I thought I came up with this name!!

Copy link
Collaborator

@spapinistarkware spapinistarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 9 files reviewed, 5 unresolved discussions (waiting on @andrewmilson)


crates/prover/src/core/backend/simd/m31.rs line 226 at r1 (raw file):

    // Each c_i contains |0|prod_lo|prod_hi|0|0|prod_lo|prod_hi|0|
    let c0: u32x4 = unsafe { transmute(vqdmull_s32(a0, b0)) };

Can you put this in compiler explorer and check you get only the instruction you intended, and not some extra moves / swizzles?


crates/prover/src/core/backend/simd/m31.rs line 236 at r1 (raw file):

    // *_lo contain `|prod_lo|0|prod_lo|0|prod_lo0|0|prod_lo|0|`.
    let mut c0_c1_lo: u32x4 = LoEvensConcatHiEvens::concat_swizzle(c0, c1);

Don't you have deinterleave for this?


crates/prover/src/core/backend/simd/m31.rs line 268 at r1 (raw file):

    let [a0, a1, a2, a3]: [v128; 4] = unsafe { transmute(a) };

    let b_double = b.0 << 1;

Is this the same as b.0+b.0? Couldn't that be more efficient?


crates/prover/src/core/backend/simd/m31.rs line 445 at r1 (raw file):

    PackedBaseField(prod_lows) + PackedBaseField(prod_highs)
}

How can we test all of these architecture specific implementations?

@andrewmilson andrewmilson force-pushed the andrew/dev/field_rand branch 2 times, most recently from d7bc78f to dbb8ada Compare May 1, 2024 14:20
Base automatically changed from andrew/dev/field_rand to dev May 1, 2024 14:50
Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 9 files reviewed, 5 unresolved discussions (waiting on @spapinistarkware)


crates/prover/src/core/backend/simd/m31.rs line 54 at r1 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

I thought I came up with this name!!

Lol. Funnily enough my VSCode spell checker doesn't think it's a word 😆


crates/prover/src/core/backend/simd/m31.rs line 226 at r1 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Can you put this in compiler explorer and check you get only the instruction you intended, and not some extra moves / swizzles?

All looks good: https://godbolt.org/z/nzjM4TY7M


crates/prover/src/core/backend/simd/m31.rs line 236 at r1 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Don't you have deinterleave for this?

True, thanks


crates/prover/src/core/backend/simd/m31.rs line 268 at r1 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Is this the same as b.0+b.0? Couldn't that be more efficient?

Yes, updated


crates/prover/src/core/backend/simd/m31.rs line 445 at r1 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

How can we test all of these architecture specific implementations?

I meant to add a TODO (added now) to test these platforms in the CI.

  • Will make sure target feature neon is enabled on our mac tests.
  • On avx512 machine plan to run tests once each for target features: -avx2,-avx512,-simd128, +avx2,-avx512,-simd128, -avx2,+avx512,-simd128, -avx2,-avx512,+simd128 (wasm target wasm32-wasi)

Copy link
Collaborator

@shaharsamocha7 shaharsamocha7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 9 files at r1.
Reviewable status: 5 of 9 files reviewed, 9 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)


crates/prover/src/core/backend/simd/m31.rs line 16 at r1 (raw file):

pub const LOG_N_LANES: u32 = 4;

pub const N_LANES: usize = 1 << LOG_N_LANES;

In the avx we stored only
pub const K_BLOCK_SIZE: usize = 16;

Code quote:

pub const LOG_N_LANES: u32 = 4;

pub const N_LANES: usize = 1 << LOG_N_LANES;

crates/prover/src/core/backend/simd/m31.rs line 76 at r1 (raw file):

    ///
    /// Vector elements must be in the range `[0, P]`.
    pub unsafe fn from_simd(v: Simd<u32, N_LANES>) -> Self {

Suggestion:

    pub unsafe fn from_simd_unchecked(v: Simd<u32, N_LANES>) -> Self {

crates/prover/src/core/backend/simd/m31.rs line 107 at r1 (raw file):

        // When c in [P,2P], then c-P in [0,P] which is always less than [P,2P].
        // When c in [0,P-1], then c-P in [2^32-P,2^32-1] which is always greater than [0,P-1].
        Self(Simd::simd_min(c, c - MODULUS))

Suggestion:
Can you call reduce instead?
and move the documentation there?

Code quote:

        // Apply min(c, c-P) to each word.
        // When c in [P,2P], then c-P in [0,P] which is always less than [P,2P].
        // When c in [0,P-1], then c-P in [2^32-P,2^32-1] which is always greater than [0,P-1].
        Self(Simd::simd_min(c, c - MODULUS))

crates/prover/src/core/backend/simd/m31.rs line 118 at r1 (raw file):

}

impl Mul for PackedBaseField {

Consider moving these _mul* implementations to a separated file

Code quote:

impl Mul for PackedBaseField {

Copy link
Collaborator

@spapinistarkware spapinistarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 9 unresolved discussions (waiting on @andrewmilson and @shaharsamocha7)


crates/prover/src/core/backend/simd/m31.rs line 226 at r1 (raw file):

Previously, andrewmilson (Andrew Milson) wrote…

All looks good: https://godbolt.org/z/nzjM4TY7M

Doesn't look good to me
uAaA0MyaQJSyleLYTbSw1T4lKzcV4HZ3iwWQsSavZI9hJBAA%3D

Copy link
Collaborator

@spapinistarkware spapinistarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 9 unresolved discussions (waiting on @andrewmilson and @shaharsamocha7)


crates/prover/src/core/backend/simd/m31.rs line 226 at r1 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Doesn't look good to me
uAaA0MyaQJSyleLYTbSw1T4lKzcV4HZ3iwWQsSavZI9hJBAA%3D

i.e. the issue is that you transmute, and move between representations that are not a full vector (u32x2).
Try to keep all the algorithm using full vectors.

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 9 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)


crates/prover/src/core/backend/simd/m31.rs line 226 at r1 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

i.e. the issue is that you transmute, and move between representations that are not a full vector (u32x2).
Try to keep all the algorithm using full vectors.

I don't think your example is fair since [u32; 16] has different alignment. What is the issue here https://godbolt.org/z/o9qE6bW86?

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 8 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)


crates/prover/src/core/backend/simd/m31.rs line 16 at r1 (raw file):

Previously, shaharsamocha7 wrote…

In the avx we stored only
pub const K_BLOCK_SIZE: usize = 16;

avx also has VECS_LOG_SIZE = 4


crates/prover/src/core/backend/simd/m31.rs line 107 at r1 (raw file):

Previously, shaharsamocha7 wrote…

Suggestion:
Can you call reduce instead?
and move the documentation there?

The functionality is the same but the documentation is different. These docs are for reducing c in [0, 2P] to [0, P] but reduce docs are for [0, P] to [0, P)


crates/prover/src/core/backend/simd/m31.rs line 118 at r1 (raw file):

Previously, shaharsamocha7 wrote…

Consider moving these _mul* implementations to a separated file

What's the perceived benefit? To me these are all functions relating to m31 arithmetic so think they belong in this file. Also I'm generally in favour of larger files - the standard library has many examples of 3,000+ line files.


crates/prover/src/core/backend/simd/m31.rs line 445 at r1 (raw file):

Previously, andrewmilson (Andrew Milson) wrote…

I meant to add a TODO (added now) to test these platforms in the CI.

  • Will make sure target feature neon is enabled on our mac tests.
  • On avx512 machine plan to run tests once each for target features: -avx2,-avx512,-simd128, +avx2,-avx512,-simd128, -avx2,+avx512,-simd128, -avx2,-avx512,+simd128 (wasm target wasm32-wasi)

Can restrict the tests to core::backend::simd

@andrewmilson andrewmilson force-pushed the 04-30-Add_SIMD_backend_arithmetic branch from d8abf68 to 063029c Compare May 2, 2024 19:23
Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 3 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)


crates/prover/src/core/backend/simd/utils.rs line 4 at r3 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Where is this used? Isn't it replaced by interleave and deinterleav()?

It's used by bit_reverse16. LoEvensConcatHiEvens (now removed) was replaced by interleave and deinterleave

@andrewmilson andrewmilson force-pushed the 04-30-Add_SIMD_backend_arithmetic branch from b8a5161 to 92fb94e Compare May 6, 2024 19:16
Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 3 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)


crates/prover/src/core/backend/simd/qm31.rs line 16 at r3 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

.

Done.

Copy link
Collaborator

@shaharsamocha7 shaharsamocha7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r4.
Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)


crates/prover/src/core/backend/simd/cm31.rs line 65 at r4 (raw file):

        let Self([a, b]) = self;
        Self([a.double(), b.double()])
    }

Do we need those?
We don't have them in the AVX512 impl

Code quote:

    /// Interleaves two vectors.
    pub fn interleave(self, other: Self) -> (Self, Self) {
        let Self([a_evens, b_evens]) = self;
        let Self([a_odds, b_odds]) = other;
        let (a_lhs, a_rhs) = a_evens.interleave(a_odds);
        let (b_lhs, b_rhs) = b_evens.interleave(b_odds);
        (Self([a_lhs, b_lhs]), Self([a_rhs, b_rhs]))
    }

    /// Deinterleaves two vectors.
    pub fn deinterleave(self, other: Self) -> (Self, Self) {
        let Self([a_self, b_self]) = self;
        let Self([a_other, b_other]) = other;
        let (a_evens, a_odds) = a_self.deinterleave(a_other);
        let (b_evens, b_odds) = b_self.deinterleave(b_other);
        (Self([a_evens, b_evens]), Self([a_odds, b_odds]))
    }

    /// Doubles each element in the vector.
    pub fn double(self) -> Self {
        let Self([a, b]) = self;
        Self([a.double(), b.double()])
    }

crates/prover/src/core/backend/simd/qm31.rs line 219 at r4 (raw file):

        iter.copied().sum()
    }
}

Why do we need both?

Code quote:

impl Sum for PackedQM31 {
    fn sum<I>(mut iter: I) -> Self
    where
        I: Iterator<Item = Self>,
    {
        let first = iter.next().unwrap_or_else(Self::zero);
        iter.fold(first, |a, b| a + b)
    }
}

impl<'a> Sum<&'a Self> for PackedQM31 {
    fn sum<I>(iter: I) -> Self
    where
        I: Iterator<Item = &'a Self>,
    {
        iter.copied().sum()
    }
}

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)


crates/prover/src/core/backend/simd/cm31.rs line 65 at r4 (raw file):

Previously, shaharsamocha7 wrote…

Do we need those?
We don't have them in the AVX512 impl

Interleave and deinterleave replace interleave_with and deinterleave_with. I added double since there is a an opportunity to have a more optimal implementation


crates/prover/src/core/backend/simd/qm31.rs line 219 at r4 (raw file):

Previously, shaharsamocha7 wrote…

Why do we need both?

Just quality of life. It's nice to have the ability to just do thing.iter().sum(). Also it's consistent with the std library e.g. implements sum on usize and &usise

Copy link
Collaborator

@shaharsamocha7 shaharsamocha7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 3 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)


crates/prover/src/core/backend/simd/cm31.rs line 65 at r4 (raw file):

Previously, andrewmilson (Andrew Milson) wrote…

Interleave and deinterleave replace interleave_with and deinterleave_with. I added double since there is a an opportunity to have a more optimal implementation

but in avx512 we only have those for base field. Do we also need it for CM31/QM31?

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 3 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)


crates/prover/src/core/backend/simd/cm31.rs line 65 at r4 (raw file):

Previously, shaharsamocha7 wrote…

but in avx512 we only have those for base field. Do we also need it for CM31/QM31?

Yes, not in the code yet but I'm using CM31/QM31 interleave, deinterleave and double for GKR

Copy link
Collaborator

@shaharsamocha7 shaharsamocha7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @spapinistarkware)

Copy link
Collaborator

@shaharsamocha7 shaharsamocha7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check the specific mul implementations.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @spapinistarkware)

Copy link
Collaborator

@shaharsamocha7 shaharsamocha7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)


crates/prover/src/core/backend/simd/utils.rs line 6 at r4 (raw file):

pub struct LoLoInterleaveHiLo;

impl<const N: usize> Swizzle<N> for LoLoInterleaveHiLo {

Consider to call those structs as follows:
InterleaveLow
InterleaveHigh
InterleaveEvens
InterleaveOdds

Code quote:

LoLoInterleaveHiLo

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @spapinistarkware)


crates/prover/src/core/backend/simd/utils.rs line 4 at r3 (raw file):

Previously, andrewmilson (Andrew Milson) wrote…

It's used by bit_reverse16. LoEvensConcatHiEvens (now removed) was replaced by interleave and deinterleave

🤦‍♂️ my bad you're right. LoEvensConcatHiEvens was replaced by deinterleave. LoEvensConcatHiEvens LoLoInterleaveHiLo was replaced by interleave. Updated

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 9 files reviewed, 3 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)


crates/prover/src/core/backend/simd/utils.rs line 6 at r4 (raw file):

Previously, shaharsamocha7 wrote…

Consider to call those structs as follows:
InterleaveLow
InterleaveHigh
InterleaveEvens
InterleaveOdds

Done.

@andrewmilson andrewmilson force-pushed the 04-30-Add_SIMD_backend_arithmetic branch 2 times, most recently from d555471 to 7c485c7 Compare May 11, 2024 02:33
Copy link
Collaborator

@shaharsamocha7 shaharsamocha7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r5, all commit messages.
Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions (waiting on @andrewmilson and @spapinistarkware)


crates/prover/src/core/backend/simd/m31.rs line 263 at r6 (raw file):

}

/// Returns `a * b`.

I wonder if this function should be defined that way.
Is b_double defined correctly? (Values of PackedM31 should be in [0,P], no?)

Suggestion:

/// Returns `a * b_double`.

crates/prover/src/core/backend/simd/utils.rs line 17 at r6 (raw file):

}

const fn parity_interleave<const N: usize>(odd: bool) -> [usize; N] {

Consider adding doc test to understand the indices it returns

Code quote:

const fn parity_interleave<const N: usize>(odd: bool) -> [usize; N] {

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions (waiting on @shaharsamocha7 and @spapinistarkware)


crates/prover/src/core/backend/simd/m31.rs line 263 at r6 (raw file):

Previously, shaharsamocha7 wrote…

I wonder if this function should be defined that way.
Is b_double defined correctly? (Values of PackedM31 should be in [0,P], no?)

My bad. Removed


crates/prover/src/core/backend/simd/utils.rs line 17 at r6 (raw file):

Previously, shaharsamocha7 wrote…

Consider adding doc test to understand the indices it returns

I'd like to but these types are private and doc tests require public types.

@andrewmilson andrewmilson force-pushed the 04-30-Add_SIMD_backend_arithmetic branch from 7c485c7 to d7299c5 Compare May 12, 2024 13:48
Copy link
Collaborator

@spapinistarkware spapinistarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 9 files at r1, 2 of 3 files at r4, 1 of 2 files at r5, 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @andrewmilson)

Copy link
Collaborator

@shaharsamocha7 shaharsamocha7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @andrewmilson)

@andrewmilson andrewmilson merged commit 8d61196 into dev May 15, 2024
12 checks passed
@andrewmilson andrewmilson deleted the 04-30-Add_SIMD_backend_arithmetic branch May 15, 2024 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants