
Conversation

mingfeima (Collaborator) commented Dec 7, 2022

Stack from ghstack:

This PR fixes the segfault reported in #89677, a `double free` issue caused by an `invalid read`.

The reported issue breaks in the slow path of `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451).

The root cause is that, for the reported case, `add_indices` contains an index that exceeds the range of `output_data`.

The offsets are given as
```
{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```

`indices` has 55 elements, and `offsets[-1] != indices.size(0)`.

When `include_last_offset` is true, the `output` will have shape `{offsets.size(0) - 1, weight.sizes()[1]}`, which is `{10, 5}`.
Originally, `add_indices` will be (the 1D tensor is re-arranged by rows below, so 10 rows in total):

```
### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10
```

The last row has index 10, which is out of range for the output tensor, whose size is `[10, 5]`.

The reason is that `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) gives the following `offset2bag`:

```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```

Notice that index 53 is added twice (the `2` above).
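
To make the mapping concrete, below is a small Python sketch of the position-to-bag assignment (`bag_ids` is a hypothetical helper written for illustration, not the actual ATen code). Bucketizing positions against all 11 offsets reproduces the out-of-range bag id 10; dropping the trailing offset, which is what this PR does (see the fix below), keeps every bag id within the 10 output rows:

```python
import torch

def bag_ids(offsets, num_indices, drop_last_offset):
    # Hypothetical sketch of the offset2bag mapping: position p belongs to
    # the last bag whose start offset is <= p.
    boundaries = offsets[:-1] if drop_last_offset else offsets
    positions = torch.arange(num_indices)
    return torch.searchsorted(boundaries, positions, right=True) - 1

offsets = torch.tensor([0, 6, 12, 15, 25, 32, 40, 42, 46, 53, 53])

# Old behavior: the duplicated trailing offset 53 pushes positions 53 and 54
# into bag 10, one past the last row of the {10, 5} output.
print(bag_ids(offsets, 55, drop_last_offset=False).max())  # tensor(10)

# Fixed behavior: the last offset is ignored, so positions 53 and 54 fall
# into bag 9, the last valid row.
print(bag_ids(offsets, 55, drop_last_offset=True).max())   # tensor(9)
```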

The fix is to ignore the last element of `offsets` when `include_last_offset` is true. This also aligns the CPU behavior with CUDA; see the quote from #57208 (comment).
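
For reference, a minimal Python sketch of the failing pattern, based on the shapes above (the embedding table size of `64 x 5` and the random `weight`/`indices` values are illustrative assumptions; whether a given build takes the float32 slow path may vary). The key ingredients from the report are `include_last_offset=True` and `offsets[-1] != indices.size(0)`:

```python
import torch
import torch.nn.functional as F

# Shapes from the report: 55 indices, 11 offsets whose last element (53)
# does not equal indices.size(0) (55).
weight = torch.randn(64, 5, dtype=torch.float32)
indices = torch.randint(0, 64, (55,))
offsets = torch.tensor([0, 6, 12, 15, 25, 32, 40, 42, 46, 53, 53])

# With include_last_offset=True the output has shape
# {offsets.size(0) - 1, weight.sizes()[1]} == {10, 5}. Before this fix,
# positions 53 and 54 could be accumulated into a non-existent row 10,
# corrupting memory.
out = F.embedding_bag(indices, weight, offsets, mode="sum",
                      include_last_offset=True)
print(out.shape)  # torch.Size([10, 5])
```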

cc @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot bot commented Dec 7, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90358

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b6e030e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mingfeima added a commit that referenced this pull request Dec 7, 2022
…et is true

ghstack-source-id: 0e9efb9
Pull Request resolved: #90358
mingfeima added the topic: not user facing, intel, and module: cpu labels Dec 7, 2022
ezyang added the ciflow/trunk label Dec 12, 2022
…e_last_offset is true"


This PR is to fix the segfault reported at #89677, this is a `double free` issue caused by `invalid read`.

The reported issue broke at slow path for `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451)

Root cause is that `add_indices` has index which exceeds range of `output_data`, for the reported case.

The offsets are given as
```
{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```

The `indices` has 55 elements and `offsets[-1] != indices.size(0)`.

When `include_last_offset` is true, the `output` will be in the shape of {offsets.size(0) - 1, weight.sizes()[1]}, which will be {10, 5}.
Originally, `add_indices` will be (i re-arange the 1D tensor by rows, so here 10 rows in total)
```
### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10
```
The last row has index of 10 which is out of range of output tensor whose size is [10, 5].

The reason is `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) would give the following `offset2bag`:
```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```

Notice for index 53, it is added twice.

The fix is ignore the last index from `offsets` when `include_last_offset` is true, also this behavior aligns with CUDA, quote from #57208 (comment)

cc jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
…e_last_offset is true"


This PR is to fix the segfault reported at #89677, this is a `double free` issue caused by `invalid read`.

The reported issue broke at slow path for `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451)

Root cause is that `add_indices` has index which exceeds range of `output_data`, for the reported case.

The offsets are given as
```
{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```

The `indices` has 55 elements and `offsets[-1] != indices.size(0)`.

When `include_last_offset` is true, the `output` will be in the shape of {offsets.size(0) - 1, weight.sizes()[1]}, which will be {10, 5}.
Originally, `add_indices` will be (i re-arange the 1D tensor by rows, so here 10 rows in total)
```
### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10
```
The last row has index of 10 which is out of range of output tensor whose size is [10, 5].

The reason is `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) would give the following `offset2bag`:
```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```

Notice for index 53, it is added twice.

The fix is ignore the last index from `offsets` when `include_last_offset` is true, also this behavior aligns with CUDA, quote from #57208 (comment)

cc jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
mingfeima requested a review from ezyang December 13, 2022 08:24
mingfeima marked this pull request as draft December 13, 2022 09:06
ezyang (Contributor) commented Dec 13, 2022

I'm not sure the new code is better 😂 It's a lot more complicated but still written fairly naively

…e_last_offset is true"


This PR is to fix the segfault reported at #89677, this is a `double free` issue caused by `invalid read`.

The reported issue broke at slow path for `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451)

Root cause is that `add_indices` has index which exceeds range of `output_data`, for the reported case.

The offsets are given as
```
{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```

The `indices` has 55 elements and `offsets[-1] != indices.size(0)`.

When `include_last_offset` is true, the `output` will be in the shape of {offsets.size(0) - 1, weight.sizes()[1]}, which will be {10, 5}.
Originally, `add_indices` will be (i re-arange the 1D tensor by rows, so here 10 rows in total)
```
### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10
```
The last row has index of 10 which is out of range of output tensor whose size is [10, 5].

The reason is `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) would give the following `offset2bag`:
```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```

Notice for index 53, it is added twice.

The fix is ignore the last index from `offsets` when `include_last_offset` is true, also this behavior aligns with CUDA, quote from #57208 (comment)

cc jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Dec 13, 2022
…et is true

ghstack-source-id: a15f19e
Pull Request resolved: #90358
mingfeima marked this pull request as ready for review December 14, 2022 01:07
mingfeima requested a review from ezyang December 14, 2022 01:07
ezyang (Contributor) left a comment

ok, this is much better. Can we have a test though?

…e_last_offset is true"


This PR is to fix the segfault reported at #89677, this is a `double free` issue caused by `invalid read`.

The reported issue broke at slow path for `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451)

Root cause is that `add_indices` has index which exceeds range of `output_data`, for the reported case.

The offsets are given as
```
{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```

The `indices` has 55 elements and `offsets[-1] != indices.size(0)`.

When `include_last_offset` is true, the `output` will be in the shape of {offsets.size(0) - 1, weight.sizes()[1]}, which will be {10, 5}.
Originally, `add_indices` will be (i re-arange the 1D tensor by rows, so here 10 rows in total)
```
### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10
```
The last row has index of 10 which is out of range of output tensor whose size is [10, 5].

The reason is `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) would give the following `offset2bag`:
```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```

Notice for index 53, it is added twice.

The fix is ignore the last index from `offsets` when `include_last_offset` is true, also this behavior aligns with CUDA, quote from #57208 (comment)

cc jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
…e_last_offset is true"


This PR is to fix the segfault reported at #89677, this is a `double free` issue caused by `invalid read`.

The reported issue broke at slow path for `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451)

Root cause is that `add_indices` has index which exceeds range of `output_data`, for the reported case.

The offsets are given as
```
{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```

The `indices` has 55 elements and `offsets[-1] != indices.size(0)`.

When `include_last_offset` is true, the `output` will be in the shape of {offsets.size(0) - 1, weight.sizes()[1]}, which will be {10, 5}.
Originally, `add_indices` will be (i re-arange the 1D tensor by rows, so here 10 rows in total)
```
### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10
```
The last row has index of 10 which is out of range of output tensor whose size is [10, 5].

The reason is `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) would give the following `offset2bag`:
```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```

Notice for index 53, it is added twice.

The fix is ignore the last index from `offsets` when `include_last_offset` is true, also this behavior aligns with CUDA, quote from #57208 (comment)

cc jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
…e_last_offset is true"


This PR is to fix the segfault reported at #89677, this is a `double free` issue caused by `invalid read`.

The reported issue broke at slow path for `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451)

Root cause is that `add_indices` has index which exceeds range of `output_data`, for the reported case.

The offsets are given as
```
{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```

The `indices` has 55 elements and `offsets[-1] != indices.size(0)`.

When `include_last_offset` is true, the `output` will be in the shape of {offsets.size(0) - 1, weight.sizes()[1]}, which will be {10, 5}.
Originally, `add_indices` will be (i re-arange the 1D tensor by rows, so here 10 rows in total)
```
### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10
```
The last row has index of 10 which is out of range of output tensor whose size is [10, 5].

The reason is `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) would give the following `offset2bag`:
```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```

Notice for index 53, it is added twice.

The fix is ignore the last index from `offsets` when `include_last_offset` is true, also this behavior aligns with CUDA, quote from #57208 (comment)

cc jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Dec 14, 2022
…et is true

ghstack-source-id: 7b8d830
Pull Request resolved: #90358
mingfeima requested a review from ezyang December 14, 2022 23:59
mingfeima (Collaborator, Author) commented:

> ok, this is much better. Can we have a test though?

Test case updated! Added a minimal case that triggers the `double free` issue without this fix.
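
For illustration, a sketch of what such a minimal regression test could look like (hypothetical; the test actually added in this PR may differ). It assumes, per the CUDA-aligned behavior described above, that once the trailing offset is ignored the last bag extends to the end of `indices`:

```python
import torch
import torch.nn.functional as F

def test_embedding_bag_include_last_offset_short_offsets():
    # offsets[-1] (5) != indices.size(0) (7): the pattern that used to
    # write out of bounds and could double free.
    weight = torch.randn(10, 3)
    indices = torch.tensor([0, 1, 2, 3, 4, 5, 6])
    offsets = torch.tensor([0, 2, 5, 5])
    out = F.embedding_bag(indices, weight, offsets, mode="sum",
                          include_last_offset=True)
    # Reference: bag i sums weight rows indices[offsets[i]:offsets[i+1]],
    # with the last bag running to the end of indices.
    ref = torch.stack([
        weight[indices[0:2]].sum(0),
        weight[indices[2:5]].sum(0),
        weight[indices[5:7]].sum(0),
    ])
    torch.testing.assert_close(out, ref)
```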

mingfeima (Collaborator, Author) commented:

@pytorchbot merge

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


facebook-github-bot deleted the gh/mingfeima/91/head branch June 8, 2023 18:01

Labels

- ciflow/trunk (Trigger trunk jobs on your pull request)
- intel (This tag is for PR from Intel)
- Merged
- module: cpu (CPU specific problem, e.g., perf, algorithm)
- open source
- topic: not user facing (topic category)

Projects

Status: Done


4 participants