Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct errors in slicing of node collections #3132

Open
wants to merge 55 commits into
base: master
Choose a base branch
from

Conversation

heplesser
Copy link
Contributor

@heplesser heplesser commented Mar 4, 2024

This PR fixes #3108 by providing correct NodeCollection::{rank,thread}_local_begin(() implementations for composite node collections.

To do:

  • Improve documentation of algorithm for selection of first nc entry
  • Extend set of tests
  • Move slice_positions_if_sliced_nc() code to more suitable location

I removed the ConnBuilder refactoring (see #3106) from this PR to reduce complexity and ensure we can integrate this bug-fixing PR asap. The refactoring can then following in a later PR.

@heplesser heplesser added T: Bug Wrong statements in the code or documentation S: High Should be handled next I: No breaking change Previously written code will work as before, no one should note anything changing (aside the fix) labels Mar 4, 2024
@heplesser heplesser added this to the NEST 3.7 milestone Mar 4, 2024
@heplesser heplesser added this to In progress in Kernel via automation Mar 4, 2024
@heplesser heplesser marked this pull request as draft March 4, 2024 14:45
@heplesser heplesser marked this pull request as ready for review March 7, 2024 15:27
@heplesser
Copy link
Contributor Author

@mlober @gtrensch This PR is ready for review now. I have reduced the scope by postponing the refactoring of ConnBuilders to use local iterators. So this is a pure bugfix PR and would be nice to get into NEST 3.7.

@diesmann You may want to check the documentation in numerics.h/cpp.

@heplesser heplesser changed the title Correct errors in slicing of node collections and use thread-local node iterators in ConnBuilders Correct errors in slicing of node collections Mar 8, 2024
Copy link
Contributor

@gtrensch gtrensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution 👍. Looks good to me.

@heplesser heplesser requested review from gtrensch and removed request for mlober March 12, 2024 20:17
@heplesser heplesser force-pushed the fix_nc_slicing branch 2 times, most recently from 07a9890 to 26c96b7 Compare April 18, 2024 20:51
@heplesser heplesser marked this pull request as ready for review April 19, 2024 13:42
@heplesser
Copy link
Contributor Author

@gtrensch @jstapmanns @diesmann After much hard work, I believe I have solved all problems now and that the testsuite is broad enough to cover all eventualities. Kindly review the new version of the PR. I suggest you start with the documentation near the beginning of node_collection.h.

@heplesser heplesser force-pushed the fix_nc_slicing branch 4 times, most recently from 7123538 to abeb018 Compare April 23, 2024 12:26
…ithub Linux runners due to performance issues; the tests are run under macOS
…thub now is macOS 14 and only 3.10 and later are available
@heplesser
Copy link
Contributor Author

@gtrensch @jstapmanns Ping!

@gtrensch
Copy link
Contributor

The author of the original model with which the problem was originally discovered has reported that the full model now works perfectly!

Copy link
Contributor

@jstapmanns jstapmanns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @heplesser, for implementing these slicing algorithms. In general, I'm pretty confident they do what they are supposed to do, but I still have a question about the example given in the description (which I find very helpful to understand the implementation).

* All thr 2 * | * *
* All thr 3 * | * *
* -----------------------------------------------------------------
* [::3] 1stInPt # | #
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two questions regarding this example.

  1. To my understanding, [::3] 1stInPt should point to the object with GID = 9 because
    n_pe = 1 + ( cumul_abs_size_[ part_idx_ - 1 ] - 1 - first_elem_ ) / stride = 1 + (5-1-0)/3 = 2, so that next_abs_idx = first_elem_ + n_pe * stride_ = 0 + 2*3 = 6, and finally
    next_loc_idx = next_abs_idx - cumul_abs_size_[ part_idx_ - 1 ] = 6 - 5 = 1.
  2. I think I do not completely understand the behavior of the local iterators. Shouldn't [::3] thr 2 yield every third element that is local to thread 2? In this case, objects with GID=[2, 10, 14] are on thread 2, which would mean [::3] thr 2 = [2]. Similarly for thread 3, where GID=[3, 14, 15].

Copy link
Contributor

@gtrensch gtrensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution 👍. Approved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I: No breaking change Previously written code will work as before, no one should note anything changing (aside the fix) S: High Should be handled next T: Bug Wrong statements in the code or documentation
Projects
Kernel
  
In progress
Development

Successfully merging this pull request may close these issues.

NEST behaves incorrectly or fails when connecting sliced layers while using MPI
4 participants