race for and hyper for need to improve (maybe for in general) #2844

Closed
JJ opened this issue Apr 16, 2019 · 10 comments

Comments

@JJ
Collaborator

JJ commented Apr 16, 2019

The Problem

Screenshot from 2019-04-16 09-54-00

A 100k-iteration loop with race takes 3x as long as the plain for loop, and the same goes for hyper. It gets worse: if the loop is twice the size, it times out after 10 seconds. So does the plain for loop without hyper or race.

Expected Behavior

Doubling the size should, if anything, have improved the relative time for hyper/race, since the overhead is spread over more iterations.

Steps to Reproduce

my @fib = 1,1, * + * ... *;
my @fib100k = @fib[^100000];
my @fib100k-plus = @fib[1..100001];
my $start = now;
my @sum-fib; 
race   for ^100000 { @sum-fib = @fib100k[$_] + @fib100k-plus[$_] }; say now - $start 

Environment

  • Operating system:
  • Compiler version (perl6 -v):

https://gist.github.com/Whateverable/9a525148d40a3ff0e96963906e9e0263

@lizmat
Contributor

lizmat commented Apr 16, 2019

It all depends on the workload that you're giving per iteration. In this example, the workload is very small compared to the overhead of starting threads, cutting up the batches, and collecting the results. If you run this with the snapper, you'll see it takes a long time before even the first worker thread is started. Before that, it is just building the @fib100k and @fib100k-plus arrays.

$ perl6 -Msnapper 1
0.51403902
Telemetry Report of Process #83816 (2019-04-16T08:18:15Z)
Number of Snapshots: 33
Initial/Final Size: 61120 / 715128 Kbytes
Total Time:           3.32 seconds
Total CPU Usage:      3.78 seconds
Supervisor thread ran the whole time

wallclock  util%  max-rss  gw      gtc
   104697  18.85    31820
   102112  14.98     5520
   103780  12.78     4388
   112146  15.25     7620
   100602  13.35    11248
   104380  12.89    13328
   103834  14.04    15660
   105194  13.15    15920
   102599  15.09    14672
   102010  14.34    19176
   105188  13.07    20180
   105192  13.82    21852
   105178  14.48    24412
   105190  12.93    21728
   102362  13.42    22264
   107731  13.80    25044
   109874  14.00    29144
   111269  14.15    29740
   109920  14.24    31548
   114243  14.21    32428
   100901  13.37    30064
   105919  13.49    31308
   102526  14.03    31488
   104258  13.80    25872
   102814  13.29    25224
   105183  14.14    23336
   105188  15.78    18284   1      163
   105173  17.38    32608          680
   105178  14.57    15096          765
   105177  14.61    19076          645
   103745  14.15     3600          594
    52276  14.47      360          284
--------- ------ -------- --- --------
  3315839  14.25   654008   1     3131

Legend:
wallclock  Number of microseconds elapsed
    util%  Percentage of CPU utilization (0..100%)
  max-rss  Maximum resident set size (in Kbytes)
       gw  The number of general worker threads
      gtc  The number of tasks completed in general worker threads

@jnthn
Member

jnthn commented Apr 16, 2019

Further to the fact that adding two numbers together is simply not enough work to make the parallelism worth it, the code is also not threadsafe, since it writes to @sum-fib from multiple threads.

@lizmat
Contributor

lizmat commented Apr 16, 2019

Also, there's something very wrong with:

my @sum-fib; 
race   for ^100000 { @sum-fib = @fib100k[$_] + @fib100k-plus[$_] }

I'm not sure what the intent is here, but initializing the same array from multiple threads is very bad indeed. Probably the only reason this code doesn't segfault is that it never actually got around to running on more than one thread. This happens if the overhead of managing batches and setting up the next batch is larger than the time it takes to run a batch asynchronously.

@JJ
Collaborator Author

JJ commented Apr 16, 2019

Ah, right. Very wrong. That should have been @sum-fib[$_]. I'm not at my best, today...

@jnthn
Member

jnthn commented Apr 16, 2019

> Ah, right. Very wrong. That should have been @sum-fib[$_]. I'm not at my best, today...

I guessed that's what was meant, but unless you declared a fixed-size array up front, it's still not safe, due to the resize operation.

@JJ
Collaborator Author

JJ commented Apr 16, 2019

@jnthn What would be the best way to illustrate this, then? Set up a channel and do something a bit more complicated inside the loop that writes to that channel?

@jnthn
Member

jnthn commented Apr 16, 2019

@JJ A Channel isn't needed, just use the result of the loop. Example:

$ time perl6 -e 'my @a = do for ^100_000 { .is-prime }'

real	0m17,686s
user	0m17,739s
sys	0m0,020s

$ time perl6 -e 'my @a = hyper for ^100_000 { .is-prime }'

real	0m6,202s
user	0m20,347s
sys	0m0,056s
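
Applied to the loop from the original report, the same "use the result of the loop" idea might look like the sketch below. It is an illustration, not code posted in this thread: the sizes are reduced from 100_000 to 1_000 to keep it quick, and the names @fib1k and @fib1k-plus are made up for the example. Both slices are assigned in the main thread before the parallel loop starts, so the workers only read from already-built arrays and never write to a shared one.

```raku
# Sketch only: smaller sizes and hypothetical array names, not the issue's exact code.
my @fib = 1, 1, * + * ... *;

# Array assignment is eager, so both slices are fully built here,
# in the main thread, before any worker runs.
my @fib1k      = @fib[^1_000];
my @fib1k-plus = @fib[1..1_000];

# No shared array is written from the workers: each iteration returns
# its value and the results of the hyper loop are collected into @sum-fib.
my @sum-fib = hyper for ^1_000 { @fib1k[$_] + @fib1k-plus[$_] };

say @sum-fib.elems;
```

hyper is used here rather than race because it keeps the results in input order, which matters if @sum-fib[$i] is meant to correspond to index $i. As noted above, adding two numbers per iteration is still too little work for the parallel version to be expected to win on time.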

@JJ
Collaborator Author

JJ commented Apr 16, 2019

Ah, great. Thanks.

Also, I get an error here:

my @fib = 1,1, * + * ... *; my $c = Channel.new; my $start = now; race for ^1000 { $c.send( [+] @fib[$_..($_+5000)] ) }; say now -  $start
# Error: A worker in a parallel iteration (hyper or race) initiated here:
#   in block <unit> at /tmp/NCaUDiZCHU line 1
#
# Died at:
#     This continuation has already been invoked

@timo
Member

timo commented Apr 16, 2019

You're causing the Fibonacci array to be reified from multiple threads. That's not possible with an iterator like gather/take or (the current implementation of) the ... operator.
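
One way to sidestep the concurrent reification, sketched under assumptions: copy the needed part of the lazy sequence into an ordinary array in the main thread first, then let the workers read only from that copy. The name @fib-eager is made up for this example, and the Channel is replaced by the result of the loop, as suggested earlier in the thread. The slice ^6_000 covers the highest index the loop touches (999 + 5000).

```raku
# Sketch only: @fib-eager is a hypothetical name, not part of the original report.
my @fib = 1, 1, * + * ... *;

# Array assignment is eager, so indices 0..5999 are generated here,
# by a single thread, before the parallel loop starts.
my @fib-eager = @fib[^6_000];

my $start = now;

# Workers only read from the already-built array; each iteration
# returns its sum instead of sending it to a Channel.
my @sums = race for ^1_000 { [+] @fib-eager[$_ .. ($_ + 5_000)] };

say now - $start;
say @sums.elems;
```

Since race does not promise to keep results in input order, @sums may come back in a different order than the loop indices; hyper would be the choice if the order matters.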

@JJ
Collaborator Author

JJ commented Apr 16, 2019

OK, thanks. It's best if I close this now.

@JJ JJ closed this as completed Apr 16, 2019