Optimize 2x2 AVX2 kernel #1280

Closed
jiyuanzFB wants to merge 2 commits

jiyuanzFB
Contributor

Summary:
before:

                      MKL_FP32 m =     2 n =    16 k =   170 Gflops =  17.6311 GBytes =  11.2269
                         FBP_t m =     2 n =    16 k =   170 Gflops =  19.0948 GBytes =  12.1589

                      MKL_FP32 m =     2 n =    16 k =   171 Gflops =  15.8830 GBytes =  10.1126
                         FBP_t m =     2 n =    16 k =   171 Gflops =  19.0033 GBytes =  12.0993

                      MKL_FP32 m =     2 n =    16 k =   172 Gflops =  17.6952 GBytes =  11.2653
                         FBP_t m =     2 n =    16 k =   172 Gflops =  19.2606 GBytes =  12.2618

                      MKL_FP32 m =     2 n =    16 k =   173 Gflops =  17.6115 GBytes =  11.2108
                         FBP_t m =     2 n =    16 k =   173 Gflops =  19.1484 GBytes =  12.1891

                      MKL_FP32 m =     2 n =    16 k =   174 Gflops =  17.8602 GBytes =  11.3679
                         FBP_t m =     2 n =    16 k =   174 Gflops =  16.6249 GBytes =  10.5816

                      MKL_FP32 m =     2 n =    16 k =   175 Gflops =  17.9387 GBytes =  11.4167
                         FBP_t m =     2 n =    16 k =   175 Gflops =  19.1257 GBytes =  12.1721

                      MKL_FP32 m =     2 n =    16 k =   176 Gflops =  17.8279 GBytes =  11.3450
                         FBP_t m =     2 n =    16 k =   176 Gflops =  16.8833 GBytes =  10.7439

                      MKL_FP32 m =     2 n =    16 k =   177 Gflops =  17.9972 GBytes =  11.4516
                         FBP_t m =     2 n =    16 k =   177 Gflops =  16.7458 GBytes =  10.6553

                      MKL_FP32 m =     2 n =    16 k =   178 Gflops =  18.2086 GBytes =  11.5849
                         FBP_t m =     2 n =    16 k =   178 Gflops =  19.2716 GBytes =  12.2613

                      MKL_FP32 m =     2 n =    16 k =   179 Gflops =  18.1097 GBytes =  11.5209
                         FBP_t m =     2 n =    16 k =   179 Gflops =  19.3037 GBytes =  12.2805

after:

                      MKL_FP32 m =     2 n =    16 k =   170 Gflops =  19.0005 GBytes =  12.0988
                         FBP_t m =     2 n =    16 k =   170 Gflops =  23.3374 GBytes =  14.8604

                      MKL_FP32 m =     2 n =    16 k =   171 Gflops =  18.5612 GBytes =  11.8178
                         FBP_t m =     2 n =    16 k =   171 Gflops =  20.3515 GBytes =  12.9577

                      MKL_FP32 m =     2 n =    16 k =   172 Gflops =  18.9043 GBytes =  12.0350
                         FBP_t m =     2 n =    16 k =   172 Gflops =  23.7081 GBytes =  15.0933

                      MKL_FP32 m =     2 n =    16 k =   173 Gflops =  17.0170 GBytes =  10.8323
                         FBP_t m =     2 n =    16 k =   173 Gflops =  23.3654 GBytes =  14.8735

                      MKL_FP32 m =     2 n =    16 k =   174 Gflops =  17.8984 GBytes =  11.3922
                         FBP_t m =     2 n =    16 k =   174 Gflops =  24.1328 GBytes =  15.3604

                      MKL_FP32 m =     2 n =    16 k =   175 Gflops =  18.6496 GBytes =  11.8692
                         FBP_t m =     2 n =    16 k =   175 Gflops =  25.5657 GBytes =  16.2707

                      MKL_FP32 m =     2 n =    16 k =   176 Gflops =  19.1214 GBytes =  12.1682
                         FBP_t m =     2 n =    16 k =   176 Gflops =  23.1066 GBytes =  14.7042

                      MKL_FP32 m =     2 n =    16 k =   177 Gflops =  19.1469 GBytes =  12.1831
                         FBP_t m =     2 n =    16 k =   177 Gflops =  25.6815 GBytes =  16.3411

                      MKL_FP32 m =     2 n =    16 k =   178 Gflops =  20.7594 GBytes =  13.2079
                         FBP_t m =     2 n =    16 k =   178 Gflops =  25.4730 GBytes =  16.2068

                      MKL_FP32 m =     2 n =    16 k =   179 Gflops =  18.8219 GBytes =  11.9740
                         FBP_t m =     2 n =    16 k =   179 Gflops =  25.5965 GBytes =  16.2838
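To read the two listings side by side, the per-shape speedup can be computed as the ratio of the "after" Gflops to the "before" Gflops. The following is a minimal sketch, not part of the PR: the helper names `parse` and `speedups` are hypothetical, and only two FBP_t rows (k = 170 and k = 179) are copied from the listings above as sample input. Note that the MKL_FP32 baseline also moves between the two runs (17.6311 vs 19.0005 Gflops at k = 170), so part of any ratio may be run-to-run noise.

```python
import re

# Sample rows copied from the "before" and "after" listings (k = 170 and k = 179).
BEFORE = """
FBP_t m =     2 n =    16 k =   170 Gflops =  19.0948 GBytes =  12.1589
FBP_t m =     2 n =    16 k =   179 Gflops =  19.3037 GBytes =  12.2805
"""
AFTER = """
FBP_t m =     2 n =    16 k =   170 Gflops =  23.3374 GBytes =  14.8604
FBP_t m =     2 n =    16 k =   179 Gflops =  25.5965 GBytes =  16.2838
"""

# Matches one benchmark row: kernel name, m, n, k, and the Gflops figure.
ROW = re.compile(
    r"(\w+)\s+m =\s*(\d+)\s+n =\s*(\d+)\s+k =\s*(\d+)\s+Gflops =\s*([\d.]+)"
)


def parse(text):
    """Map (kernel, m, n, k) -> Gflops for every benchmark row in text."""
    rows = {}
    for match in ROW.finditer(text):
        name, m, n, k, gflops = match.groups()
        rows[(name, int(m), int(n), int(k))] = float(gflops)
    return rows


def speedups(before, after):
    """Return after/before Gflops ratios for rows present in both runs."""
    b, a = parse(before), parse(after)
    return {key: a[key] / b[key] for key in b if key in a}


if __name__ == "__main__":
    for (name, m, n, k), ratio in sorted(speedups(BEFORE, AFTER).items()):
        print(f"{name} m={m} n={n} k={k}: {ratio:.2f}x")
```

Pasting the full listings into `BEFORE` and `AFTER` gives the ratio for every k, including the MKL_FP32 rows, which can serve as a rough noise reference.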

Differential Revision: D39265908

Differential Revision: D39214267

fbshipit-source-id: 601465c847cdd6e13eaceb1be682f8431d66441f
Differential Revision: D39265908

fbshipit-source-id: a3c4f60b0ce9f43610b1da30fe3c2e2161ffcdb2
@netlify

netlify bot commented Sep 6, 2022

Deploy Preview for eclectic-stroopwafel-199537 canceled.

Name Link
🔨 Latest commit 4b6f1c9
🔍 Latest deploy log https://app.netlify.com/sites/eclectic-stroopwafel-199537/deploys/6316b688f07e4a0009af95cb

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D39265908
