jianyuh
Member

@jianyuh jianyuh commented Dec 11, 2019

Stack from ghstack:

Original commit changeset: d22448b90843

On Skylake T6:

Single Core:
(Note: the benchmark generated batch_size=47 for the first case and batch_size=56 for the second. Despite the larger batch, the vectorized version is still faster than the original non-vectorized reference C version.)

  • Before the PR:
native_layer_norm        0.81%            5.884ms          0.81%            5.884ms          122.580us        NaN              0.000us          0.000us          48               [[47, 1, 1024], [1024], [1024]]
  • After the PR:
native_layer_norm        0.68%            5.053ms          0.68%            5.053ms          105.272us        NaN              0.000us          0.000us          48               [[56, 1, 1024], [1024], [1024]]

20 Cores:

  • Before the PR:
native_layer_norm        1.65%            41.682ms         1.65%            41.682ms         868.365us        NaN              0.000us          0.000us          48               [[61, 64, 1024], [1024], [1024]]
  • After the PR:
native_layer_norm        1.34%            33.829ms         1.34%            33.829ms         704.771us        NaN              0.000us          0.000us          48               [[61, 64, 1024], [1024], [1024]]
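
To make the comparison concrete, here is a small sketch that derives speedups from the average per-call times in the rows above. The per-row normalization for the single-core case is an assumption: it supposes that time scales linearly with the leading (batch) dimension, which the rows do not confirm. The variable and function names are illustrative only.

```python
# Average time per call in microseconds, copied from the profiler rows above.
single_before_us, single_before_batch = 122.580, 47
single_after_us, single_after_batch = 105.272, 56
multi_before_us = 868.365
multi_after_us = 704.771


def speedup(before, after):
    """Return how many times faster `after` is than `before`."""
    return before / after


# Raw single-core ratio, ignoring that the "after" run had a larger batch.
single_raw = speedup(single_before_us, single_after_us)

# Single-core ratio normalized per batch row (assumes linear scaling in batch).
single_per_row = speedup(
    single_before_us / single_before_batch,
    single_after_us / single_after_batch,
)

# 20-core ratio; both runs used the same input shape, so no normalization.
multi = speedup(multi_before_us, multi_after_us)

print(f"single core, raw:     {single_raw:.2f}x")
print(f"single core, per row: {single_per_row:.2f}x")
print(f"20 cores:             {multi:.2f}x")
```

Under these assumptions the ratios come out to roughly 1.16x (single core, raw), 1.39x (single core, per row), and 1.23x (20 cores).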

Differential Revision: D18936428
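
For readers unfamiliar with the operation being benchmarked, the following is a pure-Python sketch of layer normalization over one row, plus a variant that accumulates its sums in 8 partial accumulators to mimic how a 256-bit vector of floats would be reduced. It is not the kernel from this PR: the lane count, the `eps` value, the sum-of-squares variance formula, and all function names are assumptions made for illustration.

```python
import math

LANES = 8  # assumption: 8 float32 lanes, as in one 256-bit vector register


def layer_norm_reference(row, weight, bias, eps=1e-5):
    """Scalar reference: normalize one row, then apply scale and shift."""
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n
    rstd = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * rstd * w + b for x, w, b in zip(row, weight, bias)]


def lane_sums(row, lanes=LANES):
    """Accumulate sum and sum of squares in `lanes` partial accumulators,
    then reduce them horizontally, as a SIMD loop typically would."""
    acc = [0.0] * lanes
    acc_sq = [0.0] * lanes
    for i, x in enumerate(row):
        acc[i % lanes] += x
        acc_sq[i % lanes] += x * x
    return sum(acc), sum(acc_sq)


def layer_norm_lanewise(row, weight, bias, eps=1e-5):
    """Same result as the reference, using lane-wise accumulated moments."""
    n = len(row)
    total, total_sq = lane_sums(row)
    mean = total / n
    # Var[x] = E[x^2] - E[x]^2, clamped at zero against rounding error.
    var = max(total_sq / n - mean * mean, 0.0)
    rstd = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * rstd * w + b for x, w, b in zip(row, weight, bias)]


# One row of width 1024, matching the last dimension in the benchmark shapes.
row = [math.sin(i) for i in range(1024)]
weight = [1.0] * 1024
bias = [0.0] * 1024
out_ref = layer_norm_reference(row, weight, bias)
out_lane = layer_norm_lanewise(row, weight, bias)
max_diff = max(abs(a - b) for a, b in zip(out_ref, out_lane))
print(f"max abs difference between the two variants: {max_diff:.3e}")
```

The two variants differ only in the order of floating-point accumulation, so their outputs agree up to rounding error.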

@kostmo
Member

kostmo commented Dec 12, 2019

CircleCI build failures summary

As of commit a3ad513:

  • 2/2 broken upstream at merge base 679b20b (see grid view)
    • You may want to rebase on the viable/strict branch (see its recency history):
      • If your commit is newer than viable/strict, you can try basing on an older, stable commit:
        git fetch origin viable/strict
        git rebase --onto FETCH_HEAD $(git merge-base origin/master HEAD)
        
      • If your commit is older than viable/strict:
        git fetch origin viable/strict
        git rebase FETCH_HEAD
        
  • 0/2 failures introduced in this PR
  • 1/2 recognized as flaky

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

2 upstream failures recognized by patterns:

These builds matched patterns, but were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI.

This comment has been revised 3 times.

jianyuh added a commit that referenced this pull request Dec 12, 2019
…on using Vec256"

Pull Request resolved: #31127

ghstack-source-id: 95420889

Differential Revision: [D18936428](https://our.internmc.facebook.com/intern/diff/D18936428/)
@jianyuh jianyuh requested a review from jamesr66a December 12, 2019 06:44
@jianyuh
Member Author

jianyuh commented Dec 12, 2019

@jamesr66a: could you re-stamp this PR? It was reverted due to some hypothesis issues that turned out to be a false positive. The original PR is #29104. Thanks!

@facebook-github-bot
Contributor

This pull request has been merged in 066e3ed.

@facebook-github-bot facebook-github-bot deleted the gh/jianyuh/52/head branch December 16, 2019 15:17
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
…on using Vec256" (pytorch#31127)

Summary:
Pull Request resolved: pytorch#31127

ghstack-source-id: 95420889

Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"

buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"

python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval

Differential Revision: D18936428

fbshipit-source-id: 8cae33d35fb338b5ac49b1597c2709152612d6e5