Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coroutines] The cross suspend point analysis is not scaling #62348

Closed
dyzsr opened this issue Apr 25, 2023 · 4 comments
Closed

[Coroutines] The cross suspend point analysis is not scaling #62348

dyzsr opened this issue Apr 25, 2023 · 4 comments
Assignees
Labels
coroutines C++20 coroutines

Comments

@dyzsr
Copy link

dyzsr commented Apr 25, 2023

The compilation time problem of C++20 coroutine still exists in Clang 16.0.2, after the fix to #58650.

Here shows the comparison of 2 scenarios (a2.cpp and a3.app).

  • a2.cpp is like:
#include <cstdio>
#include <vector>
#include "task.h"
task t() {
std::vector<int> v(100000);
int val = 0;
if (v[1]) val++;
if (v[2]) val++;
if (v[3]) val++;
if (v[4]) val++;
...
if (v[99999]) val++;
if (v[100000]) val++;
printf("%d\n", val);
co_return;
}
  • a3.cpp is like:
#include <cstdio>
#include <vector>
#include "task.h"
task t() {
std::vector<int> v(100000);
int val = 0;
auto foo = [&](int i) { if (v[i]) val++; };
foo(1);
foo(2);
foo(3);
...
foo(99999);
foo(100000);
printf("%d\n", val);
co_return;
}

Generation and test scripts:

  • gen2.cpp generates use cases that possibly enable bounds checking.
#include <cassert>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv) {
  assert(argc == 2);
  int n = atoi(argv[1]);
  printf("#include <cstdio>\n");
  printf("#include <vector>\n");
  printf("#include \"task.h\"\n");
  printf("task t() {\n");
  printf("std::vector<int> v(%d);\n", n);
  printf("int val = 0;\n");
  for (int i = 1; i <= n; i++) printf("if (v[%d]) val++;\n", i);
  printf("printf(\"%%d\\n\", val);\n");
  printf("co_return;\n");
  printf("}\n");
  return 0;
}
  • gen3.cpp generates use cases that try to avoid bounds checking.
#include <cassert>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv) {
  assert(argc == 2);
  int n = atoi(argv[1]);
  printf("#include <cstdio>\n");
  printf("#include <vector>\n");
  printf("#include \"task.h\"\n");
  printf("task t() {\n");
  printf("std::vector<int> v(%d);\n", n);
  printf("int val = 0;\n");
  printf("auto foo = [&](int i) { if (v[i]) val++; };\n");
  for (int i = 1; i <= n; i++) printf("foo(%d);\n", i);
  printf("printf(\"%%d\\n\", val);\n");
  printf("co_return;\n");
  printf("}\n");
  return 0;
}
  • test2.sh
#!/bin/sh

clang++ -o gen2 gen2.cpp
for i in 20000 40000 60000 80000 100000
do
  printf "n: %d\n" $i
  ./gen2 $i > a2.cpp
  time clang++ -c a2.cpp -std=c++20 -stdlib=libc++
  printf "\n"
done
  • test3.sh
#!/bin/sh

clang++ -o gen3 gen3.cpp
for i in 20000 40000 60000 80000 100000
do
  printf "n: %d\n" $i
  ./gen3 $i > a3.cpp
  time clang++ -c a3.cpp -std=c++20 -stdlib=libc++
  printf "\n"
done

Compilation time:

  • test2
n: 20000

real    0m7.986s
user    0m7.537s
sys     0m0.430s

n: 40000

real    0m24.699s
user    0m23.221s
sys     0m1.420s

n: 60000

real    1m2.934s
user    0m57.135s
sys     0m5.651s

n: 80000

real    1m46.364s
user    1m36.321s
sys     0m9.787s

n: 100000

real    2m51.611s
user    2m40.468s
sys     0m10.736s
  • test3
n: 20000

real    0m2.319s
user    0m2.194s
sys     0m0.120s

n: 40000

real    0m5.908s
user    0m5.227s
sys     0m0.667s

n: 60000

real    0m10.007s
user    0m9.271s
sys     0m0.712s

n: 80000

real    0m13.532s
user    0m11.979s
sys     0m1.522s

n: 100000

real    0m26.432s
user    0m23.669s
sys     0m2.697s
@EugeneZelenko EugeneZelenko added coroutines C++20 coroutines and removed new issue labels Apr 25, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 25, 2023

@llvm/issue-subscribers-coroutines

@ChuanqiXu9 ChuanqiXu9 self-assigned this May 4, 2023
@ChuanqiXu9
Copy link
Member

The root cause problem is that coroutines uses a textbook style data flow analysis for the lifetime analysis, which is powerful but not scaling. I feel it is not easy to refactor this fundamentally given it runs almost good for 5 years, we don't touch it frequently. So I can possibly only add some pruning techniques.

ChuanqiXu9 added a commit that referenced this issue May 5, 2023
…int information

Mitigate #62348

The root cause for the above issue is that we used a textbook dataflow
analysis for the cross suspend point information. The analysis is
powerful but not scaling.

It is not easy to improve the current algorithm and the patch tries to
prune some branches to mitigate the problems.

Before the patch:

```
n: 20000

real	0m11.081s
user	0m10.597s
sys	0m0.320s

n: 40000

real	0m32.927s
user	0m31.403s
sys	0m1.043s

n: 60000

real	1m2.145s
user	0m58.903s
sys	0m2.268s

n: 80000

real	1m47.143s
user	1m41.630s
sys	0m3.857s

n: 100000

real	2m34.758s
user	2m26.587s
sys	0m5.922s
```

After the patch:

```
n: 20000

real	0m10.418s
user	0m9.945s
sys	0m0.311s

n: 40000

real	0m27.884s
user	0m26.430s
sys	0m1.036s

n: 60000

real	0m52.420s
user	0m49.321s
sys	0m2.267s

n: 80000

real	1m25.389s
user	1m20.247s
sys	0m3.856s

n: 100000

real	2m4.275s
user	1m56.405s
sys	0m5.975s
```

This patch intended to be a NFC patch.
@ChuanqiXu9 ChuanqiXu9 changed the title Coroutine compilation time possibly affected by array bounds checking [Coroutines] The cross suspend point analysis is not scaling May 5, 2023
@ChuanqiXu9
Copy link
Member

ChuanqiXu9 commented May 5, 2023

I tried to mitigate the issue. Now the numbers are:

Before:

n: 20000

real	0m11.081s
user	0m10.597s
sys	0m0.320s

n: 40000

real	0m32.927s
user	0m31.403s
sys	0m1.043s

n: 60000

real	1m2.145s
user	0m58.903s
sys	0m2.268s

n: 80000

real	1m47.143s
user	1m41.630s
sys	0m3.857s

n: 100000

real	2m34.758s
user	2m26.587s
sys	0m5.922s

After:

n: 20000

real	0m10.418s
user	0m9.945s
sys	0m0.311s

n: 40000

real	0m27.884s
user	0m26.430s
sys	0m1.036s

n: 60000

real	0m52.420s
user	0m49.321s
sys	0m2.267s

n: 80000

real	1m25.389s
user	1m20.247s
sys	0m3.856s

n: 100000

real	2m4.275s
user	1m56.405s
sys	0m5.975s

It is slightly better now. But I don't have an idea to fix the true issue really.

@witstorm95
Copy link
Contributor

witstorm95 commented Jul 7, 2023

@ChuanqiXu9 I add a patch for this issue, please help to review it.

https://reviews.llvm.org/D154695

ChuanqiXu9 pushed a commit that referenced this issue Jul 27, 2023
information.

Fixed #62348

Propagate cross suspend point information by visiting CFG.

Just only go through two times at most, you can get all the cross
suspend point information.

Before the patch:

```
n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata
552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata
1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata
3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata
6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata
10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

```
After the patch:

```
n: 20000
3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata
551736maxresident)k
0inputs+8848outputs (0major+126192minor)pagefaults 0swaps

n: 40000
6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata
1787848maxresident)k
0inputs+17600outputs (0major+432212minor)pagefaults 0swaps

n: 60000
9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata
3809108maxresident)k
0inputs+26352outputs (0major+931280minor)pagefaults 0swaps

n: 80000
12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata
6603432maxresident)k
0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps

n: 100000
16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata
10212808maxresident)k
0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps

```

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D154695
eymay pushed a commit to eymay/llvm-project that referenced this issue Jul 28, 2023
information.

Fixed llvm#62348

Propagate cross suspend point information by visiting CFG.

Just only go through two times at most, you can get all the cross
suspend point information.

Before the patch:

```
n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata
552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata
1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata
3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata
6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata
10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

```
After the patch:

```
n: 20000
3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata
551736maxresident)k
0inputs+8848outputs (0major+126192minor)pagefaults 0swaps

n: 40000
6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata
1787848maxresident)k
0inputs+17600outputs (0major+432212minor)pagefaults 0swaps

n: 60000
9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata
3809108maxresident)k
0inputs+26352outputs (0major+931280minor)pagefaults 0swaps

n: 80000
12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata
6603432maxresident)k
0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps

n: 100000
16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata
10212808maxresident)k
0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps

```
eymay pushed a commit to eymay/llvm-project that referenced this issue Jul 28, 2023
information.

Fixed llvm#62348

Propagate cross suspend point information by visiting CFG.

Just only go through two times at most, you can get all the cross
suspend point information.

Before the patch:

```
n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata
552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata
1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata
3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata
6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata
10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

```
After the patch:

```
n: 20000
3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata
551736maxresident)k
0inputs+8848outputs (0major+126192minor)pagefaults 0swaps

n: 40000
6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata
1787848maxresident)k
0inputs+17600outputs (0major+432212minor)pagefaults 0swaps

n: 60000
9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata
3809108maxresident)k
0inputs+26352outputs (0major+931280minor)pagefaults 0swaps

n: 80000
12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata
6603432maxresident)k
0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps

n: 100000
16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata
10212808maxresident)k
0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps

```

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D154695
ChuanqiXu9 pushed a commit that referenced this issue Aug 9, 2023
…bout cross suspend infomation to reach a fixed point faster.

Fixed #62348

Propagate cross suspend point information along reverse post-order.
It does not modify the original function, just selects a better
traversal order.

Before the patch:

```
n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata
552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata
1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata
3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata
6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata
10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

```
After the patch:

```
n: 20000
3.08user 0.12system 0:03.21elapsed 99%CPU (0avgtext+0avgdata
551012maxresident)k
0inputs+8848outputs (0major+129349minor)pagefaults 0swaps

n: 40000
5.88user 0.33system 0:06.22elapsed 99%CPU (0avgtext+0avgdata
1789248maxresident)k
0inputs+17600outputs (0major+435096minor)pagefaults 0swaps

n: 60000
8.84user 0.77system 0:09.63elapsed 99%CPU (0avgtext+0avgdata
3807800maxresident)k
0inputs+26352outputs (0major+939119minor)pagefaults 0swaps

n: 80000
11.64user 1.58system 0:13.23elapsed 99%CPU (0avgtext+0avgdata
6604708maxresident)k
0inputs+35096outputs (0major+1629566minor)pagefaults 0swaps

n: 100000
15.21user 2.56system 0:17.79elapsed 99%CPU (0avgtext+0avgdata
10208828maxresident)k
8inputs+43848outputs (0major+2526611minor)pagefaults 0swaps

```

Reviewed By: MatzeB, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D156850
razmser pushed a commit to razmser/llvm-project that referenced this issue Sep 8, 2023
information.

Fixed llvm#62348

Propagate cross suspend point information by visiting CFG.

Just only go through two times at most, you can get all the cross
suspend point information.

Before the patch:

```
n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata
552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata
1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata
3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata
6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata
10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

```
After the patch:

```
n: 20000
3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata
551736maxresident)k
0inputs+8848outputs (0major+126192minor)pagefaults 0swaps

n: 40000
6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata
1787848maxresident)k
0inputs+17600outputs (0major+432212minor)pagefaults 0swaps

n: 60000
9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata
3809108maxresident)k
0inputs+26352outputs (0major+931280minor)pagefaults 0swaps

n: 80000
12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata
6603432maxresident)k
0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps

n: 100000
16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata
10212808maxresident)k
0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps

```
razmser pushed a commit to razmser/llvm-project that referenced this issue Sep 8, 2023
information.

Fixed llvm#62348

Propagate cross suspend point information by visiting CFG.

Just only go through two times at most, you can get all the cross
suspend point information.

Before the patch:

```
n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata
552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata
1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata
3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata
6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata
10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

```
After the patch:

```
n: 20000
3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata
551736maxresident)k
0inputs+8848outputs (0major+126192minor)pagefaults 0swaps

n: 40000
6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata
1787848maxresident)k
0inputs+17600outputs (0major+432212minor)pagefaults 0swaps

n: 60000
9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata
3809108maxresident)k
0inputs+26352outputs (0major+931280minor)pagefaults 0swaps

n: 80000
12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata
6603432maxresident)k
0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps

n: 100000
16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata
10212808maxresident)k
0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps

```

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D154695
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
coroutines C++20 coroutines
Projects
Status: Done
Development

No branches or pull requests

5 participants