fix segfault for EmbeddingBag on CPU slow path when include_last_offset is true #90358

mingfeima · 2022-12-07T04:52:34Z

Stack from ghstack:

-> fix segfault for EmbeddingBag on CPU slow path when include_last_offset is true #90358

This PR fixes the segfault reported in #89677. The crash is a double free caused by an invalid read.

The reported case fails on the slow path of EmbeddingBag for float32, at EmbeddingBag.cpp#L451.

The root cause is that, in the reported case, add_indices contains an index that exceeds the range of output_data.

The offsets are given as

{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}

indices has 55 elements, and offsets[-1] != indices.size(0).

When include_last_offset is true, the output has shape {offsets.size(0) - 1, weight.sizes()[1]}, which here is {10, 5}.
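As a quick check on these numbers, the sketch below (plain Python, not PyTorch code; the variable names are illustrative) derives the bag sizes from offsets under the CSR-style reading that include_last_offset implies, where bag i covers indices[offsets[i]:offsets[i + 1]]. It suggests that the 10 bags account for only 53 of the 55 indices.

```python
# Offsets and index count from the reported case.
offsets = [0, 6, 12, 15, 25, 32, 40, 42, 46, 53, 53]
num_indices = 55

# With include_last_offset=True the last offset closes the final bag,
# so there is one bag fewer than there are offsets.
num_bags = len(offsets) - 1

# Bag i spans the half-open range [offsets[i], offsets[i + 1]).
bag_sizes = [end - start for start, end in zip(offsets, offsets[1:])]

covered = sum(bag_sizes)            # indices that belong to some bag
uncovered = num_indices - covered   # trailing indices past offsets[-1]

print(num_bags)    # 10
print(bag_sizes)   # [6, 6, 3, 10, 7, 8, 2, 4, 7, 0]
print(uncovered)   # 2
```

The last bag is empty (53 to 53), and two indices sit past offsets[-1]; both facts matter for the failure described next.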
Before this fix, add_indices is as follows (I have rearranged the 1D tensor into rows, so there are 10 rows in total):

### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10

The last row has index 10, which is out of range for the output tensor, whose size is [10, 5].

The reason is that make_offset2bag at EmbeddingBag.cpp#L66 produces the following offset2bag:

### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0

Notice that index 53 is counted twice, because 53 appears twice in offsets.
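The two listings above can be reproduced with a small pure-Python model. This is a sketch only, not the ATen code: it assumes the offset2bag listing shows per-position counts (with position 0 decremented by one) and that add_indices is the running sum of those counts truncated to 55 elements. The function name make_offset2bag_buggy is hypothetical.

```python
from itertools import accumulate

def make_offset2bag_buggy(offsets, num_indices):
    """Pure-Python model of the pre-fix bag assignment described in this PR."""
    # One counter per index position, plus one extra slot so that an
    # offset equal to num_indices still has somewhere to land.
    counts = [0] * (num_indices + 1)
    for off in offsets:          # every offset is counted, including the last
        counts[off] += 1
    counts[0] -= 1               # make bag numbering start at 0
    # The running sum turns "a bag starts here" marks into bag ids;
    # the extra slot is dropped at the end.
    return list(accumulate(counts))[:num_indices]

offsets = [0, 6, 12, 15, 25, 32, 40, 42, 46, 53, 53]
num_indices = 55
num_bags = len(offsets) - 1      # include_last_offset=True gives 10 output rows

add_indices = make_offset2bag_buggy(offsets, num_indices)
out_of_range = [i for i in add_indices if i >= num_bags]

print(max(add_indices))  # 10
print(out_of_range)      # [10, 10]
```

Because position 53 receives two marks, the running sum jumps from 8 straight to 10, so the last two entries point one row past the end of a 10-row output.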

The fix is to ignore the last entry of offsets when include_last_offset is true. This behavior also aligns with CUDA; see the quote in #57208 (comment).
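A minimal sketch of that idea, again in plain Python rather than the actual C++ change: the function below is a hypothetical model in which the last offset is dropped before counting whenever include_last_offset is set. The name and signature are illustrative and do not mirror the ATen function.

```python
from itertools import accumulate

def make_offset2bag(offsets, num_indices, include_last_offset):
    """Pure-Python model of bag assignment; not the ATen implementation."""
    if include_last_offset:
        # The fix described above: the last offset only closes the final
        # bag, so it must not open a new one.
        offsets = offsets[:-1]
    counts = [0] * (num_indices + 1)
    for off in offsets:
        counts[off] += 1
    counts[0] -= 1               # make bag numbering start at 0
    return list(accumulate(counts))[:num_indices]

offsets = [0, 6, 12, 15, 25, 32, 40, 42, 46, 53, 53]
num_indices = 55
num_bags = len(offsets) - 1

fixed = make_offset2bag(offsets, num_indices, include_last_offset=True)

print(max(fixed))                             # 9
print(all(0 <= b < num_bags for b in fixed))  # True
```

In this model the two trailing positions (53 and 54) map to bag 9, which is inside the 10-row output. How the real kernel treats indices past offsets[-1] is not something this sketch establishes.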

cc @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot · 2022-12-07T04:52:36Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90358

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b6e030e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang · 2022-12-13T12:31:03Z

I'm not sure the new code is better 😂 It's a lot more complicated but still written fairly naively

ezyang

ok, this is much better. Can we have a test though?

mingfeima · 2022-12-15T00:01:01Z

> ok, this is much better. Can we have a test though?

Test case updated! I added a minimal case that triggers the 'double free' issue without this fix.

mingfeima · 2022-12-16T02:06:28Z

@pytorchbot merge

pytorchmergebot · 2022-12-16T02:08:10Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

fix segfault for EmbeddingBag on CPU slow path when include_last_offs…

ed23f78

…et is true [ghstack-poisoned]

mingfeima added a commit that referenced this pull request Dec 7, 2022

fix segfault for EmbeddingBag on CPU slow path when include_last_offs…

acf2521

…et is true ghstack-source-id: 0e9efb9 Pull Request resolved: #90358

mingfeima added the topic: not user facing, intel, and module: cpu labels Dec 7, 2022

pytorchbot added the open source label Dec 7, 2022

mingfeima mentioned this pull request Dec 7, 2022

torch.nn.functional.embedding_bag Trigger "IOT instruction" Failure #89677

Closed

mingfeima requested review from albanD, ezyang, jbschlosser, kurtamohler and malfet December 8, 2022 01:49

ezyang added the ciflow/trunk label Dec 12, 2022

ezyang reviewed Dec 12, 2022

aten/src/ATen/native/EmbeddingBag.cpp (outdated, resolved)

ezyang reviewed Dec 12, 2022

aten/src/ATen/native/EmbeddingBag.cpp (outdated, resolved)

mingfeima requested a review from ezyang December 13, 2022 08:24

mingfeima marked this pull request as draft December 13, 2022 09:06

ezyang reviewed Dec 13, 2022

aten/src/ATen/native/EmbeddingBag.cpp (resolved)

mingfeima added a commit that referenced this pull request Dec 13, 2022

fix segfault for EmbeddingBag on CPU slow path when include_last_offs…

1b78cd0

…et is true ghstack-source-id: a15f19e Pull Request resolved: #90358

mingfeima marked this pull request as ready for review December 14, 2022 01:07

mingfeima requested a review from ezyang December 14, 2022 01:07

ezyang approved these changes Dec 14, 2022

mingfeima added a commit that referenced this pull request Dec 14, 2022

fix segfault for EmbeddingBag on CPU slow path when include_last_offs…

9c3729b

…et is true ghstack-source-id: 7b8d830 Pull Request resolved: #90358

mingfeima requested a review from ezyang December 14, 2022 23:59

ezyang approved these changes Dec 15, 2022

pytorchmergebot added the Merged label Dec 16, 2022

pytorchmergebot closed this in 9d52361 Dec 16, 2022

facebook-github-bot deleted the gh/mingfeima/91/head branch June 8, 2023 18:01
