Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parallelism using rayon #110

Merged
merged 6 commits into from
May 7, 2024
Merged

Implement parallelism using rayon #110

merged 6 commits into from
May 7, 2024

Conversation

Cadair
Copy link
Member

@Cadair Cadair commented Feb 8, 2024

This is really pushing my rust skills, but I think it works.

Without parallelism enabled:

v1.2 (FORTRAN)
    Unnamed: 0  nseeds      time
0            0       1  0.094430
1            1       2  0.088924
2            2       4  0.092906
3            3       8  0.098172
4            4      16  0.094207
5            5      32  0.097889
6            6      64  0.098379
7            7     128  0.110099
8            8     256  0.139267
9            9     512  0.188014
10          10    1024  0.296180
11          11    2048  0.509781
v2.0 (Rust)
    Unnamed: 0  nseeds      time
0            0       1  0.003156
1            1       2  0.005489
2            2       4  0.011511
3            3       8  0.024024
4            4      16  0.048538
5            5      32  0.092113
6            6      64  0.183648
7            7     128  0.364994
8            8     256  0.729221
9            9     512  1.458128
10          10    1024  2.927491
11          11    2048  5.816854
v2.0 (Rust - Parallel)
    Unnamed: 0  nseeds      time
0            0       1  0.003991
1            1       2  0.003396
2            2       4  0.004035
3            3       8  0.006338
4            4      16  0.008281
5            5      32  0.016827
6            6      64  0.038427
7            7     128  0.066799
8            8     256  0.132020
9            9     512  0.257077
10          10    1024  0.509309
11          11    2048  1.011820

Figure_1

This is rebased on #111 - See 73ec541 for the relevant diff

@Cadair Cadair force-pushed the parallelism branch 3 times, most recently from 73ec541 to 78e90c3 Compare February 13, 2024 21:28
@Cadair Cadair marked this pull request as ready for review February 13, 2024 21:28
@codecov-commenter
Copy link

codecov-commenter commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.82%. Comparing base (ae934de) to head (78e90c3).
Report is 14 commits behind head on main.

❗ Current head 78e90c3 differs from pull request most recent head ad122fe. Consider uploading reports for the commit ad122fe to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #110   +/-   ##
=======================================
  Coverage   96.82%   96.82%           
=======================================
  Files           2        2           
  Lines         126      126           
=======================================
  Hits          122      122           
  Misses          4        4           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@Will-Shanks Will-Shanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving my comments here from slack per @nabobalis request 😄 Hope that helps!

seeds.slice(s![i, ..]),
let all_lines: Vec<StreamlineResult> = seeds
.axis_iter(Axis(0))
.into_par_iter()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't need/want the work stealing that rayon does, its a little more leg work to implement, but batching the seeds and using scoped_threads would probably be faster since it doesn't have the communication overhead.

src/trace.rs Outdated Show resolved Hide resolved
@nabobalis
Copy link
Contributor

Moving my comments here from slack per @nabobalis request 😄 Hope that helps!

Thank you @Will-Shanks !

(result.status, result.line)
}).unzip();

let extracted_lines_views: Vec<ArrayView2<f64>> = extracted_lines.iter().map(|arr| arr.view()).collect();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Will-Shanks If you don't mind me continuing to pick your brains, I finally came back to this.

I think I understood what you said about combining the things into a single map, but I couldn't figure out a way around this extra loop to convert the arrays into views which stack needed?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After playing around with it I agree with you, I couldn't come up with anything better that the borrow checker approved of.

     let extracted_lines_views: Vec<ArrayView2<f64>> =
-        extracted_lines.iter().map(|arr| arr.view()).collect();
-    let xs = stack(Axis(0), extracted_lines_views.as_slice()).unwrap();
-    return (statuses, xs);
+        extracted_lines.par_iter().map(|arr| arr.view()).collect();
+    let xs = stack(Axis(0), &extracted_lines_views).unwrap();
+    (statuses, xs)
 }
  • if view() is lightweight (which I'm guessing it probably is) the iter/par_iter change probably isn't productive
  • I would hope extracted_lines_views.as_slice() and &extracted_lines_views end up as the same machine code, but I think the latter should be preferred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The par_iter seems to have a marginal positive impact for large numbers of slices, and a large negative impact for smaller number of seeds:

Figure_1

So I think it's probably best without. &extracted_lines_views works though.

@Cadair
Copy link
Member Author

Cadair commented May 1, 2024

My latest comparative benchmark (not to be directly compared against the one in the OP as it's a different machine):

Figure_1

@Will-Shanks
Copy link

@Cadair a small suggestion I would make is to run cargo fmt before committing, I also personally try and keep cargo clippy [--fix] happy as it often has good suggestions, and use cargo outdated -R and cargo udeps to help keep track of outdated/unused dependencies.

Of course this is mostly a personal preference so feel free to take it or leave it :)

@nabobalis
Copy link
Contributor

@Cadair a small suggestion I would make is to run cargo fmt before committing, I also personally try and keep cargo clippy [--fix] happy as it often has good suggestions, and use cargo outdated -R and cargo udeps to help keep track of outdated/unused dependencies.

Of course this is mostly a personal preference so feel free to take it or leave it :)

Are these possible to do with a pre-commit config?

@Will-Shanks
Copy link

@Cadair a small suggestion I would make is to run cargo fmt before committing, I also personally try and keep cargo clippy [--fix] happy as it often has good suggestions, and use cargo outdated -R and cargo udeps to help keep track of outdated/unused dependencies.
Of course this is mostly a personal preference so feel free to take it or leave it :)

Are these possible to do with a pre-commit config?

I don't have that setup, but I don't see why you couldn't. I'd probably only do that for cargo fmt as the other 3 can take a minute to run, you could also consider adding them as CI tests.

@Cadair
Copy link
Member Author

Cadair commented May 2, 2024

I was planning on adding a formatter after this PR gets merged (and before we do the next release), thanks for the pointers on the other cargo commands they look really useful.

@Cadair Cadair mentioned this pull request May 2, 2024
@Cadair
Copy link
Member Author

Cadair commented May 3, 2024

pre-commit.ci autofix

@Will-Shanks
Copy link

@Cadair do you have any notes on how you're generating those graphs? Y'all have got me curious enough now I'd like to try running it, but I can't figure out what I'm doing wrong. When I try and run one of the scripts in benchmarks after running maturin develop I get a ModuleNotFoundError: No module named 'streamtracer._streamtracer_rust' error

@Cadair
Copy link
Member Author

Cadair commented May 3, 2024

I am doing a plain pip install [-e] . then running benchmark.py then benchmark_plot.py to plot out the CSVs.

@Cadair
Copy link
Member Author

Cadair commented May 7, 2024

I am going to merge this to unblock #137 and #135. @Will-Shanks I am very happy to continue to take suggestions as and when you have any though!

@Cadair Cadair merged commit 905c5ab into sunpy:main May 7, 2024
14 checks passed
@Cadair Cadair deleted the parallelism branch May 7, 2024 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants