Add support for AVX-512 VNNI saturating dot products #5807
Conversation
@@ -1093,6 +1093,12 @@ class VectorSubs : public IRMutator {
                reduce_op = VectorReduce::Or;
            }
        }
    } else if (const Call *call_op = store->value.as<Call>()) {
        if (call_op->is_intrinsic(Call::saturating_add)) {
We should probably run find_intrinsics before vectorize_loops in lowering, otherwise this will not work if people write saturating add as a pattern they expect to match rather than using the (currently Halide::Internal) intrinsic.
This is a fairly big change to make, because the simplifier and other lowering passes don't understand saturating_add (yet? @rootjalex @abadams).
That sounds reasonable. I will try to move find_intrinsics earlier in the pipeline and run the tests.
@dsharletg I placed the find_intrinsics pass right before vectorize_loops, thanks for the suggestion (b880a61). Tests are passing locally; I will keep an eye on the buildbots.
One of the failures was a compilation failure in Bounds.cpp, on handling the bounds of this new vector op (because this case is not covered). A quick fix would be just adding: … (essentially, give up on the bounds of this type).
Looks like Monotonic.cpp has a switch statement on the …
@mcleary The switch statement in Simplify_Exprs.cpp that is missing the …
Thanks for the feedback. I will make sure to address this issue today. Could you give me any hints on what should be done, if anything, with the second? https://github.com/halide/Halide/blob/master/src/Simplify_Exprs.cpp#L116
Thanks for pointing out this one. I will add the …
The second switch statement has a … I am surprised that we don't have any …
@mcleary I spoke to @abadams about this yesterday; he agreed that this switch statement (Line 1576 in 5aa1a65) should simply give up for now.
That sounds good; to avoid repeating code I just left the …
I moved the …
What you did works fine, no need to change it.
Hmm, I would probably not inline it, but I don't know where best to put it. @abadams or @dsharletg might have opinions on that?
Yeah, this seems to be unrelated. I'm trying to track down the cause, but I think you can ignore this specific failure for now.
src/Lower.cpp
Outdated
@@ -332,6 +332,11 @@ Module lower(const vector<Function> &output_funcs,
    debug(2) << "Lowering after unrolling:\n"
             << s << "\n\n";

    debug(1) << "Finding intrinsics...\n";
Nit: I think this should be above unrolling, just because unrolling and vectorizing are kind of similar, and finding intrinsics should be irrelevant for unrolling (if anything it might make the IR smaller before unrolling, which is kind of nice I guess).
Thanks for the suggestion. I noticed a failure in the simd_op_check when I ran it with the Sapphire Rapids target enabled after moving the find_intrinsics pass. I will investigate this today and report back.
When I ran the tests in #5807 (comment) I forgot to enable the correct target to trigger the generation of the new instructions. After some experimentation I found this.
The first block is supposed to be Lower.cpp without modifications, so we have unrolling, vectorize, and find_intrinsics. As you can see, find_intrinsics is able to replace the multiplication with a widening_mul, and this is the pattern expected in CodeGen_X86.
Find Intrinsics After Vectorize Loops
// Unrolling
f[(f.s1.x.v1*16) + f.s1.x.v2] =
let t7.s = int16((((((f.s1.x.v1*16) + f.min.0) + f.s1.x.v2)*2) + f.s1.rdom$x)) in
saturating_add(f[(f.s1.x.v1*16) + f.s1.x.v2], int32(t7.s)*int32(t7.s))
// Vectorize
f[ramp(f.s1.x.v1*16, 1, 16) aligned(16, 0)] = (int32x16)saturating_add(f[ramp(f.s1.x.v1*16, 1, 16) aligned(16, 0)], (int32x16)vector_reduce(SaturatingAdd, (int32x32((int16x32)t7.s.widened.f.s1.rdom$x)*int32x32((int16x32)t7.s.widened.f.s1.rdom$x))))
// Find Intrinsics
f[ramp(f.s1.x.v1*16, 1, 16) aligned(16, 0)] = (int32x16)saturating_add(f[ramp(f.s1.x.v1*16, 1, 16) aligned(16, 0)], (int32x16)vector_reduce(SaturatingAdd, (int32x32)widening_mul((int16x32)t7.s.widened.f.s1.rdom$x, (int16x32)t7.s.widened.f.s1.rdom$x)))
When I placed find_intrinsics before vectorize/unrolling I don't get the widening_mul in the expression, which won't match the expected pattern; thus, simd_op_check will fail for the expression
check("vpdpwssds*zmm", 16, saturating_sum(i32(0), i32(in_i16(2 * x + r)) * in_i16(2 * x + r + 32)));
Find Intrinsics Before Vectorize Loops
// Unrolling
f[(f.s1.x.v1*16) + f.s1.x.v2] =
let t7.s = int16((((((f.s1.x.v1*16) + f.min.0) + f.s1.x.v2)*2) + f.s1.rdom$x)) in
saturating_add(f[(f.s1.x.v1*16) + f.s1.x.v2], int32(t7.s)*int32(t7.s))
// Find Intrinsics
f[(f.s1.x.v1*16) + f.s1.x.v2] =
let t7.s = int16((((((f.s1.x.v1*16) + f.min.0) + f.s1.x.v2)*2) + f.s1.rdom$x)) in
saturating_add(f[(f.s1.x.v1*16) + f.s1.x.v2], int32(t7.s)*int32(t7.s))
// Vectorize
let t7.s.widened.f.s1.rdom$x = int16x32(ramp(((f.s1.x.v1*16) + f.min.0)*2, 1, 32))
f[ramp(f.s1.x.v1*16, 1, 16) aligned(16, 0)] = (int32x16)saturating_add(f[ramp(f.s1.x.v1*16, 1, 16) aligned(16, 0)], (int32x16)vector_reduce(SaturatingAdd, (int32x32((int16x32)t7.s.widened.f.s1.rdom$x)*int32x32((int16x32)t7.s.widened.f.s1.rdom$x))))
A quick solution is to repeat the find_intrinsics pass after vectorize_loops, but I was wondering if you could give me any hints on what to look at to fix this and keep find_intrinsics before unrolling.
Another possible solution is to add more overloads to the list of patterns, something like
{VectorReduce::SaturatingAdd, 2, i32(wild_i16x_) * i32(wild_i16x_), "saturating_dot_product", {}, Pattern::CombineInit},
{VectorReduce::SaturatingAdd, 2, i32(widening_mul(wild_i16x_, wild_i16x_)), "saturating_dot_product", {}, Pattern::CombineInit},
to cover the cases where a widening_mul is not present, or even replace the widening_mul with the explicit multiplication.
As I suspected, just moving the find_intrinsics pass will break the tests, not only for the newly added saturating dot product, but for other intrinsics as well.
https://buildbot.halide-lang.org/master/#/builders/53/builds/115/steps/12/logs/stdio
I guess both of my proposed solutions work, but I would like to know your thoughts as well @dsharletg.
It looks like VectorizeLoops.cpp looks for particular expressions in quite a few places that would also need to look for intrinsics, for example here: https://github.com/halide/Halide/blob/master/src/VectorizeLoops.cpp#L1067
I don't think we should add new overloads to the list of patterns. That's basically just replicating the logic in find_intrinsics, which is non-trivial.
@dsharletg I rebased and reverted the last two commits to avoid the failure in the tests. I understand your reasoning for moving the find_intrinsics pass earlier in the pipeline, but that doesn't seem to work well.
(force-pushed from ddd00ad to 872c5a2)
I can't reply to the existing thread about where we do find_intrinsics, but I'm fine with leaving this as-is for now, despite missing patterns for saturating add for vector reductions. I just have a few nits about the new saturating_sum.
src/InlineReductions.h
Outdated
@@ -36,6 +36,7 @@ namespace Halide {
 */
//@{
Expr sum(Expr, const std::string &s = "sum");
Expr saturating_sum(const Expr &init_val, Expr e, const std::string &s = "saturating_sum");
Does this need an init_val in some way that the other inline reductions don't? I think it would be better if this was consistent with the other inline reductions.
Thinking about it, you are right: init_val is not required, I will revert it. When I first wrote this function I had the expression saturating_sum(u8(f(x)) * i8(f(x))) in mind. When using groups of 4 8-bit integers and a 32-bit accumulator you can never saturate the result unless init_val is something close to INT_MAX, but I didn't consider the general case where this won't map directly to the VNNI instructions. I guess this ties in with my question #5825, since inline reductions are basically a shortcut for some common expression patterns.
init_val removed in a16d661
test/correctness/simd_op_check.h
Outdated
@@ -208,7 +210,7 @@ class SimdOpCheckTest {
    g.compute_at(f, x)
        .update()
        .split(x, xo, xi, vector_width)
-       .atomic()
+       .atomic(has_inline_reduction.override_associativity_test)
I would just make this unconditionally true, rather than check for saturating_sum.
Thinking about this more, this does indicate that saturating dot products are kind of a problem... are they actually useful? It seems like only an unsigned saturating sum would be useful, a signed saturating sum is going to do weird things when there are mixes of positive and negative terms in the summation. (But this instruction set has only signed saturating sums!)
The result could saturate at the max while adding a bunch of positive terms, and then after adding a bunch of negative terms the result will be something far from the max; but that's incorrect if the positive terms are bigger than the negative terms.
My understanding was that it's only unsigned saturating dot products that are really useful. signed ones aren't even associative!
From my understanding of the operation, at the instruction level the saturation occurs at the end, after the element products are summed, so the associativity of the terms within one instruction is not really a problem. A signed saturating sum is indeed not associative, but the ability to override the associativity test makes it explicit that you know what you are doing.
I didn't want to set it to always be true to avoid interfering with existing tests.
My understanding was that it's only unsigned saturating dot products that are really useful. signed ones aren't even associative!
I don't know much about the field, but saturating dot products, both signed and unsigned, are used in VoIP applications. You are right though, they are not associative.
atomic(true) won't break anything that atomic() didn't. Basically setting that to always true reduces coverage of atomic()'s associativity test, but IMO it's better to just accept that (very slight) loss in coverage to avoid the extra logic/possibly brittle test behavior.
Regarding associativity: I agree if this instruction were only a 4-way dot product, the saturation is uninteresting. But this is an accumulating dot product instruction, so it can be used in an arbitrarily large loop, and presumably it would be in ML applications (I assume the reason this instruction is being added). In that case, the saturation is hard to reason about.
atomic(true) won't break anything that atomic() didn't. Basically setting that to always true reduces coverage of atomic()'s associativity test, but IMO it's better to just accept that (very slight) loss in coverage to avoid the extra logic/possibly brittle test behavior.
That sounds reasonable, I've set it to true in a1d2685
Are there any other thoughts on this? I'm happy to change it if needed.
The OSX failure is unrelated (will be fixed by #5841), should be good to land.
That's excellent. I will merge master here to make sure everything is OK.
src/Simplify_Internal.h
Outdated
@@ -28,6 +28,20 @@
namespace Halide {
namespace Internal {

namespace {
I think this should not be in an anonymous namespace. This will cause the linker to put multiple copies of this function in the compiled library (normally, functions with inline linkage are not duplicated, except when they are inlined of course).
Do you prefer to have the function defined in some translation unit? Because the way the Simplify pass is split I'm not sure where to put it.
I think inline is fine (if it had to go somewhere, I guess Simplify.cpp is the right place).
BTW, I've added a use case of this in the interpret_nn branch: https://github.com/halide/Halide/blob/interpret_nn/apps/hannk/halide/convolution_generator.cpp#L78. I haven't tested it so I have no idea if it works or is fast... but it would be an interesting thing to try :) edit: I guess this PR is specifically for saturating dot products, which that doesn't use; it's using the previous VNNI dot product support.
I will give it a try today. Thanks for pointing it out.
This might be a little bit out of scope, but how do I run this app? I built everything using Make but I can't see an executable. When I tried the scripts, they asked for adb. Does it only work on Android?
It should work on x86 too. Here's how I run it:
Where that .tflite file is just the mobilenet v2 downloaded from https://www.tensorflow.org/lite/guide/hosted_models. To try using the VNNI dot products, it should just require the right target flag added to HL_TARGET (and a machine that can run it). The branch is in a rough WIP state, so the documentation and functionality is spotty :) |
This commit adds support for Intel VNNI saturating dot product
instructions vpdpbusds and vpdpwssds
This was accomplished by adding a new VectorReduce operation
to perform the saturating_add and exposing a new inline reduction,
saturating_sum. Users can then write
RDom r(0, 4);
f(x) = saturating_sum(i32(0), i16(i8(g(x + r)) * u8(h(x + r))))
bool override_associativity_test = true;
int vector_width = 4;
Var xo, xi;
f.update()
.split(x, xo, xi, vector_width)
.atomic(override_associativity_test)
.vectorize(r)
.vectorize(xi);
to lower the expression into a call to vpdpbusds.
Note that override_associativity_test is set to true, or Halide will fail
to prove the associativity of the saturating_add operation.
Add support for VectorReduce::SaturatingAdd in CodeGen_LLVM
Code is correctly generated when no intrinsic is available to perform
a saturating dot product.
Add vpdpbusds, vpdpwssds tests to simd_op_check
Test if the saturating dot product instructions are being generated
for AVX512_SapphireRapids targets