[Operators] Improving fp32 matrix multiplication on x86 CPUs #378

BolinSNLHM · 2023-11-17T03:00:40Z

No description provided.


yaoyaoding · 2023-11-17T18:48:39Z

Thanks a lot for the contribution! I will review it soon. Before that, could you spend some time benchmarking the new implementation against baselines (e.g., PyTorch and TVM)?

BolinSNLHM · 2023-11-27T00:20:46Z

The preliminary performance results of the improved version of the matmul operator:

(Results for the 1x1024x1024 case, listed here because its bars are too low to be visible:
Hidet: 0.22 ms; Ansor: 0.04 ms; Torch: 0.08 ms)
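
For readers who want to reproduce a comparison of this kind, latencies like those above are usually collected with a warm-up phase followed by a median over repeated runs. The following is only a minimal sketch under that assumption, not the script used for this PR: `benchmark`, `compare`, and `naive_matmul` are hypothetical helper names, and the pure-Python matmul is a stand-in for the real candidates.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeat=10):
    """Return the median latency of fn() in milliseconds."""
    # Warm-up runs are discarded so one-time costs do not skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def naive_matmul(a, b):
    """Placeholder candidate: triple-loop matmul on nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def compare(candidates, a, b):
    """Time every named candidate on the same inputs; return {name: ms}."""
    return {name: benchmark(lambda f=f: f(a, b)) for name, f in candidates.items()}


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
results = compare({"naive": naive_matmul}, a, b)
```

In a real comparison the dictionary passed to `compare` would hold the Hidet, PyTorch, and Ansor callables, all run on the same input shapes.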

yaoyaoding

Thanks @BolinSNLHM !

It looks good to me.

yaoyaoding · 2023-12-11T21:11:47Z

python/mat_new.py

@@ -0,0 +1,148 @@
+import numpy as np


Remove this file.

yaoyaoding · 2023-12-11T21:12:40Z

python/hidet/graph/ops/matmul/matmul_f32_x86.py

@@ -64,7 +66,7 @@ def __init__(self, a: TensorNode, b: TensorNode):
        )

        super().__init__(
-            name='matmul_f32_x86',
+            name='matmul_f32_x86_v2',


Maybe still use matmul_f32_x86?

deleted redundant file

use the original name

yaoyaoding · 2023-12-14T18:28:10Z

Thanks @BolinSNLHM !

BolinSNLHM added 30 commits May 27, 2023 22:01

.

efe3e14

Merge branch 'hidet-org:main' into main

b19a212

.

d7e4043

Merge branch 'main' of github.com:BolinSNLHM/hidet into main

e13af0a

added basic openMP primitives

a7bce75

Merge branch 'main' into omp

bad483c

added those primitives back

d7f6469

let me pretend like it's all good for tonight

f211a48

...

bbb5afc

working on refactoring

569fb49

ready to be tested on the eco server

b32ea73

fix stupid error

dbbb2b6

..

014f5c1

fix more error

2d82325

..

11c9e70

fixing hidet script error

4586e89

...:

65c3b9d

....

286c107

...

bfacaf8

..

8246466

..

7518042

fixing strange error

f8a97b2

more errors

1a87c27

more err

3104473

...

68bc03d

...

9059ca3

global

df5a177

global var

27da1ba

.

fca3694

.

14973b4

BolinSNLHM added 21 commits August 30, 2023 10:30

.

073266a

..

d736d96

.

83118f3

....

df1cc83

..

a85e56f

kept debugging the matrix mul kernel

728ec9a

bruh

dfdf084

fixed a dumb bug that got me stuck for way too much longer than necessary

d2e1ab4

.

0c0efe0

remove prints

1bd2cfe

.

6721ed2

..

442fbd2

logic error fix in packing of A

b4e00e9

seems like still bugs, but they disappear with print...

ad9c453

fix bug caused by static local variable

d34f031

...

954da89

fix alignment

78d09c4

cleanup

838a61e

Merge branch 'fix-zero-init' into main

6f572a4

ready for PR

3fbb635

......

656bbd0

avoid changing function attributes from outside

ebcc78f

yaoyaoding reviewed Dec 11, 2023


BolinSNLHM added 3 commits December 11, 2023 20:42

Delete python/mat_new.py

fa39456

deleted redundant file

Update matmul_f32_x86.py

b61722d

use the original name

Merge branch 'hidet-org:main' into main

575acaf

yaoyaoding merged commit 264beec into hidet-org:main Dec 14, 2023
2 checks passed
