Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generator optimization: Overlap locals that never have storage live at the same time #60187

Merged
merged 11 commits into from Jun 12, 2019

Conversation

Projects
None yet
10 participants
@tmandry
Copy link
Contributor

commented Apr 23, 2019

The specific goal of this optimization is to optimize async fns which use await!. Notably, await! has an enclosing scope around the futures it awaits (definition), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. We care about storage-liveness when computing the layout, because knowing a field is StorageDead is the only way to prove it will not be accessed, either directly or through a reference.

To determine whether we can overlap two locals in the generator layout, we look at whether they might both be StorageLive at any point in the MIR. We use the MaybeStorageLive dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

  1. Start with the set of locals assigned to only one variant. The rest are disqualified.
  2. For each pair of locals which may conflict and are not assigned to the same variant, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}

where every mention of A and B refer to the same field, which does not move when changing variants. Note that A and B would automatically be sent to the prefix in this example. Assuming that X is never StorageLive at the same time as either Y or Z, it would be allowed to overlap with them.

Note that if two locals (Y and Z in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.


Depends on:

  • #59897 Multi-variant layouts for generators
  • #60840 Preserve local scopes in generator MIR
  • #61373 Emit StorageDead along unwind paths for generators

Before merging:

  • Wrap the types of all generator fields in MaybeUninitialized in layout::ty::field (opened #60889)
  • Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
  • Clean up TODO
  • Fix the layout code to enforce that the same field never moves around in the generator
  • Add tests for async/await
  • Reduce # bits we store by half, since the conflict relation is symmetric (note: decided not to do this, for simplicity)
  • Store liveness information for each yield point in our GeneratorLayout, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Apr 23, 2019

r? @davidtwco

(rust_highfive has picked a reviewer for you, use r? to override)

@tmandry tmandry changed the title Generator optimization: Overlap locals that never have storage live at the same time [WIP] Generator optimization: Overlap locals that never have storage live at the same time Apr 23, 2019

@Zoxc

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

@bors try

@bors

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

⌛️ Trying commit e27b6fd with merge faed5e2...

bors added a commit that referenced this pull request Apr 23, 2019

Auto merge of #60187 - tmandry:generator-optimization, r=<try>
[WIP] Generator optimization: Overlap locals that never have storage live at the same time

This depends on #59897 (which is currently failing). Only the commits from "Emit StorageDead for all locals" and on are relevant.

The reason I'm opening this now is so we can get some perf numbers and discuss the approach.
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Apr 23, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:155d63f6:start=1555982977591402820,finish=1555982978514722160,duration=923319340
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:37] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:37] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:03:39] some tidy checks failed
[00:03:39] 
[00:03:39] 
[00:03:39] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:39] 
[00:03:39] 
[00:03:39] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:39] Build completed unsuccessfully in 0:00:45
[00:03:39] Build completed unsuccessfully in 0:00:45
[00:03:39] make: *** [tidy] Error 1
[00:03:39] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0094b6c6
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Tue Apr 23 01:33:29 UTC 2019
---
travis_time:end:15be1d78:start=1555983210069161032,finish=1555983210073859753,duration=4698721
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:07c2e027
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:11effaa0
travis_time:start:11effaa0
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:078d6cef
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

☀️ Try build successful - checks-travis
Build commit: faed5e2

@Zoxc

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

@rust-timer

This comment has been minimized.

Copy link

commented Apr 23, 2019

Success: Queued faed5e2 with parent 004c549, comparison URL.

@rust-timer

This comment has been minimized.

Copy link

commented Apr 23, 2019

Finished benchmarking try commit faed5e2

@tmandry

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2019

Benchmarking results do not look good. Next I'm trying an approach that doesn't create more basic blocks in the MIR. Instead it propagates StorageDeads forward along the cleanup path, which means we may hit some extra StorageDeads along the way. This shouldn't cause any problems, but I will have to update the analysis pass of the optimization if we keep this.

If the benchmarks still do not look good, I'll try a smarter approach of assembling the StorageDead statements than putting them in a vec and reversing them. At this point I suspect the larger performance hit is due to new basic blocks, but I'm not sure.

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Apr 23, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:20ec75c0:start=1556049886727429768,finish=1556049981616995594,duration=94889565826
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:52] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:52] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:03:54] some tidy checks failed
[00:03:54] 
[00:03:54] 
[00:03:54] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:54] 
[00:03:54] 
[00:03:54] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:54] Build completed unsuccessfully in 0:00:44
[00:03:54] Build completed unsuccessfully in 0:00:44
[00:03:54] Makefile:67: recipe for target 'tidy' failed
[00:03:54] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:02451030
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Tue Apr 23 20:10:25 UTC 2019
---
travis_time:end:00da78d6:start=1556050226138966140,finish=1556050226150654242,duration=11688102
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:25678716
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:00be7c32
travis_time:start:00be7c32
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:15f7112a
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@cramertj

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

@bors try

@bors

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

⌛️ Trying commit 8bbe344 with merge 32434c7...

bors added a commit that referenced this pull request Apr 23, 2019

Auto merge of #60187 - tmandry:generator-optimization, r=<try>
[WIP] Generator optimization: Overlap locals that never have storage live at the same time

This depends on #59897 (which is currently failing). Only the commits from "Emit StorageDead for all locals" and on are relevant.

The reason I'm opening this now is so we can get some perf numbers and discuss the approach.
@bors

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

☀️ Try build successful - checks-travis
Build commit: 32434c7

@cramertj

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

@rust-timer

This comment has been minimized.

Copy link

commented Apr 23, 2019

Success: Queued 32434c7 with parent 4eff852, comparison URL.

@rust-lang rust-lang deleted a comment from rust-timer Apr 23, 2019

@rust-lang rust-lang deleted a comment from rust-timer Apr 23, 2019

@rust-timer

This comment has been minimized.

Copy link

commented Apr 23, 2019

Finished benchmarking try commit 32434c7

@Zoxc

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

@tmandry Only changing StorageDead generation for generators should get rid of the performance regressions.

@tmandry

This comment has been minimized.

Copy link
Contributor Author

commented Apr 24, 2019

@tmandry Only changing StorageDead generation for generators should get rid of the performance regressions.

@Zoxc yep, that's what I'll fall back to, if needed!

@rust-timer

This comment has been minimized.

Copy link

commented Apr 24, 2019

Finished benchmarking try commit 32434c7

@tmandry tmandry force-pushed the tmandry:generator-optimization branch 2 times, most recently from 7294f02 to dcd5da8 Apr 25, 2019

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Apr 25, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:1f485cf0:start=1556156481484382630,finish=1556156482401242044,duration=916859414
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:04:11] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:11] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:04:13] some tidy checks failed
[00:04:13] 
[00:04:13] 
[00:04:13] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:13] 
[00:04:13] 
[00:04:13] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:13] Build completed unsuccessfully in 0:00:45
[00:04:13] Build completed unsuccessfully in 0:00:45
[00:04:13] make: *** [tidy] Error 1
[00:04:13] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:201bfb6e
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Thu Apr 25 01:45:47 UTC 2019
---
travis_time:end:02be76d9:start=1556156747911994846,finish=1556156747916916594,duration=4921748
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0e908098
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:084779d0
travis_time:start:084779d0
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0f7238cd
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

☔️ The latest upstream changes (presumably #60250) made this pull request unmergeable. Please resolve the merge conflicts.

@tmandry tmandry force-pushed the tmandry:generator-optimization branch from 029e0be to 9f3ad88 Jun 10, 2019

@tmandry tmandry force-pushed the tmandry:generator-optimization branch from 54630d7 to aeefbc4 Jun 11, 2019

@tmandry

This comment has been minimized.

Copy link
Contributor Author

commented Jun 11, 2019

@bors r=eddyb

@bors

This comment has been minimized.

Copy link
Contributor

commented Jun 11, 2019

📌 Commit aeefbc4 has been approved by eddyb

@bors

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2019

⌛️ Testing commit aeefbc4 with merge 05083c2...

bors added a commit that referenced this pull request Jun 12, 2019

Auto merge of #60187 - tmandry:generator-optimization, r=eddyb
Generator optimization: Overlap locals that never have storage live at the same time

The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.**

To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

1. Start with the set of locals assigned to only one variant. The rest are disqualified.
2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

```rust
enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}
```

where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them.

Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.

---

Depends on:
- [x] #59897 Multi-variant layouts for generators
- [x] #60840 Preserve local scopes in generator MIR
- [x] #61373 Emit StorageDead along unwind paths for generators

Before merging:

- [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889)
- [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
- [x] Clean up TODO
- [x] Fix the layout code to enforce that the same field never moves around in the generator
- [x] Add tests for async/await
- [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity)
- [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))

Centril added a commit to Centril/rust that referenced this pull request Jun 12, 2019

Rollup merge of rust-lang#60187 - tmandry:generator-optimization, r=e…
…ddyb

Generator optimization: Overlap locals that never have storage live at the same time

The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.**

To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

1. Start with the set of locals assigned to only one variant. The rest are disqualified.
2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

```rust
enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}
```

where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them.

Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.

---

Depends on:
- [x] rust-lang#59897 Multi-variant layouts for generators
- [x] rust-lang#60840 Preserve local scopes in generator MIR
- [x] rust-lang#61373 Emit StorageDead along unwind paths for generators

Before merging:

- [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened rust-lang#60889)
- [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
- [x] Clean up TODO
- [x] Fix the layout code to enforce that the same field never moves around in the generator
- [x] Add tests for async/await
- [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity)
- [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at rust-lang#59897 (comment))
@bors

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: eddyb
Pushing 05083c2 to master...

@bors bors added the merged-by-bors label Jun 12, 2019

bors added a commit that referenced this pull request Jun 12, 2019

Auto merge of #61758 - Centril:rollup-ew2uxng, r=Centril
Rollup of 9 pull requests

Successful merges:

 - #60187 (Generator optimization: Overlap locals that never have storage live at the same time)
 - #61348 (Implement Clone::clone_from for Option and Result)
 - #61568 (Use Symbol, Span in libfmt_macros)
 - #61632 (ci: Collect CPU usage statistics on Azure)
 - #61654 (use pattern matching for slices destructuring)
 - #61671 (implement nth_back for Range(Inclusive))
 - #61688 (is_fp and is_floating_point do the same thing, remove the former)
 - #61705 (Pass cflags rather than cxxflags to LLVM as CMAKE_C_FLAGS)
 - #61734 (Migrate rust-by-example to MdBook2)

Failed merges:

r? @ghost

@bors bors merged commit aeefbc4 into rust-lang:master Jun 12, 2019

2 checks passed

Travis CI - Pull Request Build Passed
Details
homu Test successful
Details

@tmandry tmandry deleted the tmandry:generator-optimization branch Jun 14, 2019

bors added a commit that referenced this pull request Jun 26, 2019

Auto merge of #62072 - eddyb:generator-memory-index, r=tmandry
rustc: correctly transform memory_index mappings for generators.

Fixes #61793, closes #62011 (previous attempt at fixing #61793).

During #60187, I made the mistake of suggesting that the (re-)computation of `memory_index` in `ty::layout`, after generator-specific logic split/recombined fields, be done off of the `offsets` of those fields (which needed to be computed anyway), as opposed to the `memory_index`.

`memory_index` maps each field to its in-memory order index, which ranges over the same `0..n` values as the fields themselves, making it a bijective mapping, and more specifically a permutation (indeed, it's the permutation resulting from field reordering optimizations).

Each field has an unique "memory index", meaning a sort based on them, even an unstable one, will not put them in the wrong order. But offsets don't have that property, because of ZSTs (which do not increase the offset), so sorting based on the offset of fields alone can (and did) result in wrong orders.

Instead of going back to sorting based on (slices/subsets of) `memory_index`, or special-casing ZSTs to make sorting based on offsets produce the right results (presumably), as #62011 does, I opted to drop sorting altogether and focus on `O(n)` operations involving *permutations*:
* a permutation is easily inverted (see the `invert_mapping` `fn`)
  * an `inverse_memory_index` was already employed in other parts of the `ty::layout` code (that is, a mapping from memory order to field indices)
  * inverting twice produces the original permutation, so you can invert, modify, and invert again, if it's easier to modify the inverse mapping than the direct one
* you can modify/remove elements in a permutation, as long as the result remains dense (i.e. using every integer in `0..len`, without gaps)
  * for splitting a `0..n` permutation into disjoint `0..x` and `x..n` ranges, you can pick the elements based on a `i < x` / `i >= x` predicate, and for the latter, also subtract `x` to compact the range to `0..n-x`
  * in the general case, for taking an arbitrary subset of the permutation, you need a renumbering from that subset to a dense `0..subset.len()` - but notably, this is still `O(n)`!
* you can merge permutations, as long as the result remains disjoint (i.e. each element is unique)
  * for concatenating two `0..n` and `0..m` permutations, you can renumber the elements in the latter to `n..n+m`
* some of these operations can be combined, and an inverse mapping (be it a permutation or not) can still be used instead of a forward one by changing the "domain" of the loop performing the operation

I wish I had a nicer / more mathematical description of the recombinations involved, but my focus was to fix the bug (in a way which preserves information more directly than sorting would), so I may have missed potential changes in the surrounding generator layout code, that would make this all more straight-forward.

r? @tmandry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.