Switch from std::sync::mpsc to crossbeam-channel #1146

tavianator · 2022-10-24T15:29:36Z

walk: Switch back to crossbeam-channel
walk: Use a bounded queue.

tavianator · 2022-10-24T15:33:34Z

This is about a 5% perf improvement for me. @sharkdp Do you still experience the perf regression from #895 (comment)? If so, can you send me the perf.data files from a perf record run before and after this change?

sharkdp · 2022-10-24T16:43:01Z

I currently don't have access to my laptop, but will get some benchmark results next week. Thank you!

This would also resolve the race condition panic bug with the std Rust channels, correct?

tavianator · 2022-10-24T17:10:21Z

> I currently don't have access to my laptop, but will get some benchmark results next week. Thank you!

Great! perf stat output would also be helpful.

By the way, if you're taking these results on a laptop, make sure you set the scaling governor to performance (sudo cpupower frequency-set -g performance). It's possible that different frequency scaling behaviour explains the regression you observed. My laptop, for example, runs the bfs tests faster when there are 4 simultaneous copies, since it scales up more aggressively:

tavianator@graphene $ time ./tests/tests.sh
./tests/tests.sh  3.72s user 0.78s system 114% cpu 3.951 total
tavianator@graphene $ time parallel -u -N0 ./tests/tests.sh ::: {1..4}
parallel -u -N0 ./tests/tests.sh ::: {1..4}  6.56s user 0.85s system 321% cpu 2.306 total

With the performance governor, the numbers make sense:

tavianator@graphene $ sudo cpupower frequency-set -g performance
tavianator@graphene $ time ./tests/tests.sh
./tests/tests.sh  1.55s user 0.18s system 109% cpu 1.580 total
tavianator@graphene $ time parallel -u -N0 ./tests/tests.sh ::: {1..4}
parallel -u -N0 ./tests/tests.sh ::: {1..4}  6.46s user 0.80s system 323% cpu 2.244 total
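A benchmark harness could check the governor before timing anything. The following is a minimal sketch, assuming a Linux system that exposes cpufreq through sysfs; it only inspects CPU 0, and the helper name `is_performance` is hypothetical.

```rust
use std::fs;

/// sysfs file that reports the active governor for CPU 0 on Linux cpufreq systems.
/// Assumption: other cores use the same governor; only cpu0 is checked here.
const GOVERNOR_PATH: &str = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";

/// Returns true if the raw sysfs contents name the "performance" governor.
fn is_performance(raw: &str) -> bool {
    raw.trim() == "performance"
}

fn main() {
    match fs::read_to_string(GOVERNOR_PATH) {
        Ok(raw) if is_performance(&raw) => println!("governor: performance"),
        Ok(raw) => eprintln!(
            "warning: governor is '{}', timings may be skewed by frequency scaling",
            raw.trim()
        ),
        // The file is absent on non-Linux systems or when cpufreq is unavailable.
        Err(e) => eprintln!("could not read {}: {}", GOVERNOR_PATH, e),
    }
}
```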

> This would also resolve the race condition panic bug with the std Rust channels, correct?

Yes, the first commit should fix #1060/#1113.

sharkdp · 2022-10-31T13:40:39Z

Here are the results. I compared master (4257034) with master + the two commits in this branch cherry-picked. Scaling governor was set to "performance" (see also: sharkdp/hyperfine#239).

The good news is that I see no significant performance difference in the "simple pattern" benchmark anymore! However, I do see a consistent 20% regression in the "interactive output" benchmark. I can try to play with the channel size later on, or add perf results. Can anyone else reproduce this?

`fd` regression benchmark

Interactive output

Command	Mean [s]	Min [s]	Max [s]	Relative
`./fd-master '' '/home/shark/Informatik/'`	1.570 ± 0.053	1.504	1.644	1.00
`./fd-crossbeam '' '/home/shark/Informatik/'`	1.861 ± 0.015	1.837	1.884	1.19 ± 0.04
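The Relative column can be reproduced from the Mean column. The sketch below assumes first-order error propagation for a ratio of two independent measurements; the `Timing` struct and `relative` function are illustrative names, not part of fd or the benchmark tooling.

```rust
/// Mean and standard deviation of one benchmark, both in the same unit.
#[derive(Clone, Copy)]
struct Timing {
    mean: f64,
    stddev: f64,
}

/// Ratio `slow / fast` and its uncertainty.
/// Assumption: the two measurements are independent, so relative errors add in quadrature.
fn relative(slow: Timing, fast: Timing) -> (f64, f64) {
    let ratio = slow.mean / fast.mean;
    let rel_err =
        ((slow.stddev / slow.mean).powi(2) + (fast.stddev / fast.mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // Values copied from the "Interactive output" table (seconds).
    let master = Timing { mean: 1.570, stddev: 0.053 };
    let crossbeam = Timing { mean: 1.861, stddev: 0.015 };

    let (ratio, err) = relative(crossbeam, master);
    println!("{:.2} ± {:.2}", ratio, err); // prints "1.19 ± 0.04"
}
```

With these inputs the result agrees with the 1.19 ± 0.04 reported in the table.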

No pattern

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master --hidden --no-ignore '' '/home/shark/Informatik/'`	388.7 ± 6.2	380.8	402.2	1.00 ± 0.02
`./fd-crossbeam --hidden --no-ignore '' '/home/shark/Informatik/'`	386.8 ± 3.1	382.1	390.5	1.00

Simple pattern

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master '.*[0-9]\.jpg$' '/home/shark/Informatik/'`	172.8 ± 2.1	170.3	180.6	1.00
`./fd-crossbeam '.*[0-9]\.jpg$' '/home/shark/Informatik/'`	173.5 ± 1.4	171.5	177.2	1.00 ± 0.01

Simple pattern (-HI)

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master -HI '.*[0-9]\.jpg$' '/home/shark/Informatik/'`	350.9 ± 2.8	347.0	355.8	1.00
`./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/shark/Informatik/'`	351.7 ± 2.7	346.1	355.6	1.00 ± 0.01

File extension

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master -HI --extension jpg '' '/home/shark/Informatik/'`	370.8 ± 3.6	365.4	375.9	1.00
`./fd-crossbeam -HI --extension jpg '' '/home/shark/Informatik/'`	371.2 ± 7.7	365.8	392.1	1.00 ± 0.02

File type

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master -HI --type l '' '/home/shark/Informatik/'`	351.4 ± 3.1	346.8	355.9	1.00 ± 0.01
`./fd-crossbeam -HI --type l '' '/home/shark/Informatik/'`	350.6 ± 3.1	347.4	358.5	1.00

Command execution

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master 'ab' '/home/shark/Informatik/' --exec echo`	398.7 ± 1.7	396.1	401.3	1.00
`./fd-crossbeam 'ab' '/home/shark/Informatik/' --exec echo`	403.9 ± 2.4	400.1	407.3	1.01 ± 0.01

Command execution (large output)

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master -tf 'ab' '/home/shark/Informatik/' --exec cat`	353.5 ± 5.5	349.4	368.4	1.00
`./fd-crossbeam -tf 'ab' '/home/shark/Informatik/' --exec cat`	355.7 ± 2.2	351.5	359.9	1.01 ± 0.02

Cold cache

Command	Mean [s]	Min [s]	Max [s]	Relative
`./fd-master -HI '.*[0-9]\.jpg$' '/home/shark/Informatik/'`	3.036 ± 0.051	3.000	3.094	1.01 ± 0.02
`./fd-crossbeam -HI '.*[0-9]\.jpg$' '/home/shark/Informatik/'`	3.018 ± 0.041	2.992	3.065	1.00

tavianator · 2022-10-31T14:52:07Z

Indeed, that one I can reproduce:

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master '' '/home/tavianator/code/llvm/llvm-project'`	686.5 ± 27.0	653.7	739.0	1.00
`./fd-feature '' '/home/tavianator/code/llvm/llvm-project'`	783.2 ± 20.9	752.8	820.0	1.14 ± 0.05

sharkdp · 2022-10-31T20:01:57Z

Increasing the channel size helps. The performance with a size of 2^14=16384 or higher seems to be similar to master.

Are there any drawbacks in doing so by default (higher memory usage)?
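For context on what the channel size controls: a bounded channel accepts items only until its capacity is reached, after which a sender has to wait for the receiver to drain a slot. The sketch below shows that behaviour with the standard library's `sync_channel` so it runs without extra dependencies; this is a stand-in, since the PR itself uses crossbeam-channel (presumably its `bounded` constructor). The function name `accepted_before_full` is hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Pushes items with no consumer running until the bounded queue reports "full",
/// and returns how many items were accepted.
fn accepted_before_full(capacity: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected for the whole function.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    loop {
        match tx.try_send(accepted) {
            Ok(()) => accepted += 1,
            // A blocking `send` would wait at this point; `try_send` reports it instead.
            Err(TrySendError::Full(_)) => return accepted,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
}

fn main() {
    // Tiny hypothetical capacity so the full state is reached immediately.
    println!("accepted before full: {}", accepted_before_full(4));
}
```

A larger capacity lets producers run further ahead of the consumer before they stall, at the cost of holding more queued items in memory.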

tavianator · 2022-10-31T20:38:01Z

> Are there any drawbacks in doing so by default (higher memory usage)?

Yeah, just higher memory usage. 16k * nthreads seems fine to me, though.
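A rough way to put a number on "16k * nthreads" is to multiply the capacity by the size of one queued item. This is a sketch under stated assumptions: `Entry` is a hypothetical stand-in for whatever fd actually sends over the channel, and the estimate covers only the inline size of each slot, not heap data such as the path bytes.

```rust
use std::mem::size_of;
use std::path::PathBuf;

/// Hypothetical stand-in for a queued walk result; fd's real type may differ.
#[allow(dead_code)]
struct Entry {
    path: PathBuf,
    depth: usize,
}

/// Capacity as discussed in the thread: 2^14 slots per thread.
fn channel_capacity(threads: usize) -> usize {
    16_384 * threads
}

/// Upper bound on the inline slot storage of a completely full queue, in bytes.
/// Heap allocations owned by each entry (e.g. the path string) are not counted.
fn queue_bytes_upper_bound(threads: usize) -> usize {
    channel_capacity(threads) * size_of::<Entry>()
}

fn main() {
    let threads = 8; // hypothetical thread count
    println!(
        "{} slots, at most {} KiB of inline slot storage",
        channel_capacity(threads),
        queue_bytes_upper_bound(threads) / 1024
    );
}
```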

sharkdp

Awesome. Then let's increase the channel size and finally switch to crossbeam. Really looking forward to seeing all those tickets closed. Thank you 👍

tavianator · 2022-11-01T13:56:45Z

I bumped the channel size; it should be good to go.

tmccombs · 2022-11-01T15:57:30Z

once the conflicts are resolved anyway :)

sharkdp · 2022-11-01T18:48:47Z

Thank you for the update.

sharkdp approved these changes Oct 31, 2022

sharkdp mentioned this pull request Oct 31, 2022

Switch from std::sync::mpsc to flume #942

Closed

tavianator marked this pull request as ready for review November 1, 2022 13:53

tavianator force-pushed the crossbeam branch from fc87204 to f030f60 Compare November 1, 2022 13:55

tavianator added 2 commits November 1, 2022 14:01

walk: Switch back to crossbeam-channel

3742e44

Fixes sharkdp#933. Fixes sharkdp#1060. Fixes sharkdp#1113.

walk: Use a bounded queue.

5cf0c66

Fixes sharkdp#918.

tavianator force-pushed the crossbeam branch from f030f60 to 5cf0c66 Compare November 1, 2022 18:03

sharkdp merged commit 5278405 into sharkdp:master Nov 1, 2022
