Improved performance of threaded connection generation for large networks #485

Merged

merged 26 commits into nest:master from heplesser:tammo_threaded_perf2 on Nov 21, 2016

Conversation

5 participants
@heplesser
Contributor

heplesser commented Sep 15, 2016

This PR integrates work by @tammoippen that improves the performance of threaded connection generation.

I suggest @jougs and @jakobj as reviewers.

@jakobj

Contributor

jakobj commented Sep 19, 2016

short notice: I am currently running some simulations on JQ with this branch, just to double-check. I'll get around to reviewing the code this week.

@jakobj

I can't comment on the performance yet, but one thing that comes to my mind: if someone doesn't use GidCollections which are ranges in the connect call for very large networks, the performance (most likely) takes a big hit, since every find operation needs to iterate through the whole GidCollection. Are there any measures in place that prevent the regular user from doing this?
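A minimal sketch of the lookup asymmetry described above (the types and names are illustrative stand-ins, not NEST's actual GidCollection API): a contiguous GID range answers a containment query in O(1), while an explicit list of GIDs needs a linear scan.

#include <cstddef>
#include <vector>

typedef std::size_t index;

// Contiguous block of GIDs, e.g. as produced by a single Create call.
struct GidRange
{
  index first, last;
  bool contains( index gid ) const
  {
    return first <= gid && gid <= last; // O(1)
  }
};

// Arbitrary, possibly unsorted collection of GIDs.
struct GidList
{
  std::vector< index > gids;
  bool contains( index gid ) const
  {
    // O(n) scan -- for very large networks this can dominate connection time
    for ( std::size_t i = 0; i < gids.size(); ++i )
      if ( gids[ i ] == gid )
        return true;
    return false;
  }
};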

nestkernel/conn_builder.cpp
- sum_dist += static_cast< double >( targets_on_vp[ k ].size() );
- sum_partitions += static_cast< unsigned int >( num_conns_on_vp[ k ] );
+ sum_dist += static_cast< double >( number_of_targets_on_vp[ k ] );
+ sum_partitions += static_cast< uint >( num_conns_on_vp[ k ] );

@jakobj

jakobj Sep 20, 2016

Contributor

shouldn't this be unsigned int in a similar spirit as #343?

@heplesser

heplesser Sep 26, 2016

Contributor

Will be fixed in next commit.
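For context on the point above: uint is a platform typedef (e.g. from glibc's sys/types.h), not standard C++, so the spelled-out form is the portable choice, which is the same concern as in #343. A two-line illustration:

unsigned int sum_partitions = 0u; // standard C++, compiles everywhere
// uint sum_partitions = 0u;      // non-standard typedef; may fail on other toolchains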

nestkernel/connection_manager.cpp
+ // back to the system anyway. Hence, why bother cleaning up our highly
+ // scattered connection infrastructure? They do not have any open files, which
+ // need to be closed or similar.
+ // delete_connections_();

@jakobj

jakobj Sep 20, 2016

Contributor

I think the comment is enough. No need to keep the commented-out delete_connections_() around.

@heplesser

heplesser Sep 26, 2016

Contributor

I removed the last line. The method itself is needed elsewhere in the code in conjunction with ResetKernel.

- for ( tSConnector::nonempty_iterator iit = it->nonempty_begin();
- iit != it->nonempty_end();
- ++iit )
+#pragma omp for schedule( static, 1 )

@jakobj

jakobj Sep 20, 2016

Contributor

is there a reason this is an omp for and not an omp parallel block? I appreciate the move to omp for, but for consistency it might be better to stick with our old threading model and use omp parallel.

@heplesser

heplesser Sep 26, 2016

Contributor

I think the explicit for schedule( static, 1 ) here makes sense, since it will ensure a fixed assignment of threads to loop iterations. I would rather think we should introduce it in other places as well (might improve performance?).
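A minimal, self-contained demo (not NEST code) of the fixed mapping referred to here: with schedule( static, 1 ), chunks of one iteration are handed out round-robin in thread order, so iteration i always lands on thread i modulo the team size.

// Compile with e.g. g++ -fopenmp demo.cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

int main()
{
#pragma omp parallel
  {
#pragma omp for schedule( static, 1 )
    for ( int i = 0; i < 8; ++i )
    {
#ifdef _OPENMP
      // deterministic: iteration i is always handled by thread i % num_threads
      std::printf( "iteration %d -> thread %d\n", i, omp_get_thread_num() );
#else
      std::printf( "iteration %d -> serial\n", i );
#endif
    }
  }
  return 0;
}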

@jakobj

jakobj Oct 3, 2016

Contributor

well, we are using the thread id in many places to directly access connections_[ tid ] in an omp parallel block, as you point out, so this assignment should also hold here. I agree with you that we should think about our threading model in NEST, but this might be beyond the scope of this PR.

@heplesser

heplesser Oct 4, 2016

Contributor

@tammoippen I think you added the omp for schedule. Do you have a comment on this?

@tammoippen

tammoippen Oct 4, 2016

Contributor

My hope was that this makes the code clearer: there are already a lot of #ifdefs, and not using the #pragma omp for schedule( static, 1 ) would increase this further. Another point is that the indentation becomes 'more' wrong if we are compiling with OpenMP, as the outer loop is gone and the inner loop is indented one level too much. Here is a version with the pragma omp for removed:

void
nest::ConnectionManager::delete_connections_()
{
#ifdef _OPENMP
#pragma omp parallel
  {
    size_t t = kernel().vp_manager.get_thread_id();
#else
    for ( size_t t = 0; t < connections_.size(); ++t )
    {
#endif
      for (
        tSConnector::nonempty_iterator iit = connections_[ t ].nonempty_begin();
        iit != connections_[ t ].nonempty_end();
        ++iit )
      {
#ifdef USE_PMA
        validate_pointer( *iit )->~ConnectorBase();
#else
      delete validate_pointer( *iit );
#endif
      }
      connections_[ t ].clear();
#ifdef _OPENMP
    }
#endif

#if defined _OPENMP && defined USE_PMA
#ifdef IS_K
    poormansallocpool[ t ].destruct();
    poormansallocpool[ t ].init();
#else
    poormansallocpool.destruct();
    poormansallocpool.init();
#endif
#endif

#ifdef _OPENMP
  }
#endif
}

@heplesser

heplesser Oct 4, 2016

Contributor

@tammoippen Thank you for the explanation! Makes sense to me.
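To make the contrast concrete, a generic, runnable sketch of the omp for form (the data structure is illustrative, not the NEST function itself): because pragmas are simply ignored when compiling without OpenMP, a single plain loop serves both builds and most of the #ifdefs above disappear.

#include <cstdio>
#include <vector>

int main()
{
  // One data slot per thread, analogous to connections_[ t ] in NEST.
  std::vector< std::vector< int > > per_thread( 4, std::vector< int >( 100, 1 ) );

#pragma omp parallel
  {
    // Without OpenMP the pragmas are ignored and this is an ordinary serial
    // loop over all slots; with OpenMP, slot t is cleared by exactly one thread.
#pragma omp for schedule( static, 1 )
    for ( int t = 0; t < static_cast< int >( per_thread.size() ); ++t )
    {
      per_thread[ t ].clear();
    }
  }

  std::printf( "all %zu slots cleared\n", per_thread.size() );
  return 0;
}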

#ifdef USE_PMA
- validate_pointer( *iit )->~ConnectorBase();
+ validate_pointer( *iit )->~ConnectorBase();

@jakobj

jakobj Sep 20, 2016

Contributor

two leading spaces.

@heplesser

heplesser Sep 26, 2016

Contributor

As far as I can see, the indentation is correct wrt the bracket before the #ifdef.

@jakobj

jakobj Oct 3, 2016

Contributor

you're right. my mistake.

nestkernel/conn_builder.cpp
+ }
+ else
+ {
+ for ( SparseNodeArray::const_iterator it = get_local_nodes().begin();

@jakobj

jakobj Sep 20, 2016

Contributor

I don't really like that we need to iterate over a vector that is (for good reasons) hidden deep in the node manager. Would it be possible to ask the NodeManager for a GidCollection of all local nodes?

@heplesser

heplesser Sep 26, 2016

Contributor

@jakobj The problem with a GidCollection would be that it represents global sets of GIDs in a rather abstract fashion that first needs to be expanded. The SparseNodeArray gives us access to local node pointers right away. But we risk breaking encapsulation here, so I will look at how to implement this more cleanly.

@jakobj

Contributor

jakobj commented Sep 23, 2016

so here are some preliminary results from JQ weak scaling benchmarks (12500 neurons/node, 8 threads/node, 500 ms simulation duration, hpc_benchmark.sli):

[figure: jq_weak_master_threadedperf2]

dark blue: current master
pink: threaded_perf2
walltime is cumulative:
dotted: creation of nodes
dashed: + creation of edges
solid: + simulation

@heplesser

Contributor

heplesser commented Sep 26, 2016

@jakobj Thank you for your review! I have tried to answer most points above and will fix the access to the SparseNodeArray shortly. We will soon convert NEST to GidCollections throughout, so that issue should be fixed.

@heplesser

Contributor

heplesser commented Sep 27, 2016

@jakobj I just pushed a commit that should address the issues you raised. Access to the SparseNodeArray is now encapsulated better, and ConnBuilder is no longer a friend of NetworkManager. Since all access is via const_iterators, it is also safe. Once we have removed subnets, we can simplify the code in ConnBuilder further (see #493).
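A minimal sketch of the encapsulation pattern described here, using hypothetical stand-in types (the real signatures appear in the node_manager.h diff reviewed below): the manager exposes read-only iterators over its node array instead of granting friendship.

#include <vector>

struct Node; // opaque here; only pointers are stored

// Stand-in for NEST's SparseNodeArray: only const iteration is public.
class SparseNodeArray
{
public:
  typedef std::vector< Node* >::const_iterator const_iterator;
  const_iterator begin() const { return nodes_.begin(); }
  const_iterator end() const { return nodes_.end(); }
private:
  std::vector< Node* > nodes_;
};

class NodeManager
{
public:
  // Read-only access: callers can iterate local nodes but not mutate the
  // array, so ConnBuilder no longer needs to be a friend.
  SparseNodeArray::const_iterator local_nodes_begin() const
  {
    return local_nodes_.begin();
  }
  SparseNodeArray::const_iterator local_nodes_end() const
  {
    return local_nodes_.end();
  }
private:
  SparseNodeArray local_nodes_;
};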

@heplesser

Contributor

heplesser commented Sep 29, 2016

@suku248 could you look at this?

@jakobj

Contributor

jakobj commented Oct 3, 2016

looks good so far. I don't have a strong opinion on the whole omp for vs omp parallel section, so I would also be fine if you left this unchanged. considering your comment on GidCollections, that's a 👍 from me.

@suku248

Done with 1st review; waiting for response from @heplesser and/or @tammoippen

nestkernel/conn_builder.cpp
+
+ const index sgid = ( *sources_ )[ idx ];
+
+ if ( tid != target_thread )

@suku248

suku248 Oct 21, 2016

This could be checked earlier right after target_thread has been defined.

@heplesser

heplesser Oct 24, 2016

Contributor

Thanks, fixed and cleaned up a bit in the vicinity.
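A generic, runnable sketch of the early-exit suggestion above (names are illustrative; the actual loop in conn_builder.cpp differs): test thread ownership as soon as target_thread is known, before doing any further per-connection work.

#include <cstdio>

int main()
{
  const int tid = 0;          // id of the calling thread
  const int num_threads = 4;

  for ( int tgid = 0; tgid < 8; ++tgid )
  {
    const int target_thread = tgid % num_threads; // hypothetical ownership rule
    if ( tid != target_thread )
    {
      continue; // skip immediately; no need to look up sources etc. first
    }
    std::printf( "thread %d connects target %d\n", tid, tgid );
  }
  return 0;
}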

nestkernel/node_manager.h
+inline SparseNodeArray::const_iterator
+NodeManager::local_nodes_end() const
+{
+ return local_nodes_.begin();

@suku248

suku248 Oct 21, 2016

Replace begin() with end().

@heplesser

heplesser Oct 24, 2016

Contributor

@suku248 Thank you for catching this! Nasty copy-paste error, and rather unfortunate that it did not trigger any test. I will try to create tests that detect this.
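Why no test fired, as a runnable toy example (hypothetical stand-ins, not NEST code): if end() aliases begin(), loops over local nodes execute zero times, so connections are silently missing rather than anything crashing; only a test that checks an expected count notices.

#include <cassert>
#include <vector>

int main()
{
  std::vector< int > local_nodes{ 1, 2, 3 };

  // Simulate the copy-paste bug: "end" accidentally returns begin().
  std::vector< int >::const_iterator begin = local_nodes.begin();
  std::vector< int >::const_iterator end = local_nodes.begin(); // should be end()

  int n_connected = 0;
  for ( std::vector< int >::const_iterator it = begin; it != end; ++it )
  {
    ++n_connected; // never executed
  }

  // A count-based test detects the silent failure: 0 != 3, assert fires.
  assert( n_connected == static_cast< int >( local_nodes.size() ) );
  return 0;
}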

@heplesser

Contributor

heplesser commented Oct 24, 2016

@suku248 I have now pushed commits that add tests for the bug you discovered, fix the issues you pointed out, clean up the code a little more, and merge recent changes from the master branch.

@suku248

Only minor changes to documentation required

testsuite/mpitests/test_all_to_all.sli
+
+ Description:
+ test_all_to_all.sli checks that all-to-all connections are created
+ correctly if then number of targets exceeds the number of local nodes.

@suku248

suku248 Nov 21, 2016

Replace "then" with "the" in all 4 tests

+ correctly if then number of targets exceeds the number of local nodes.
+
+ Author: Hans Ekkehard Plesser
+ SeeAlso: testsuite::test_one_to_one, testsuite::test_fixed_indegree,

@suku248

suku248 Nov 21, 2016

Replace testsuite::test_fixed_indegree with testsuite::test_all_to_all

+
+ Author: Hans Ekkehard Plesser
+ SeeAlso: testsuite::test_one_to_one, testsuite::test_fixed_indegree,
+ testsuite::test_bernoulli

@suku248

suku248 Nov 21, 2016

Replace testsuite::test_bernoulli with testsuite::test_pairwise_bernoulli

testsuite/mpitests/test_one_to_one.sli
+ correctly if then number of targets exceeds the number of local nodes.
+
+ Author: Hans Ekkehard Plesser
+ SeeAlso: testsuite::test_one_to_one, testsuite::test_fixed_indegree,

@suku248

suku248 Nov 21, 2016

Replace testsuite::test_one_to_one with testsuite::test_all_to_all

testsuite/mpitests/test_one_to_one.sli
+
+ Author: Hans Ekkehard Plesser
+ SeeAlso: testsuite::test_one_to_one, testsuite::test_fixed_indegree,
+ testsuite::test_bernoulli

@suku248

suku248 Nov 21, 2016

Replace testsuite::test_bernoulli with testsuite::test_pairwise_bernoulli

+
+ Author: Hans Ekkehard Plesser
+ SeeAlso: testsuite::test_one_to_one, testsuite::test_fixed_indegree,
+ testsuite::test_bernoulli

@suku248

suku248 Nov 21, 2016

Replace testsuite::test_bernoulli with testsuite::test_all_to_all

@suku248

suku248 commented Nov 21, 2016

👍 for merge

@heplesser

Contributor

heplesser commented Nov 21, 2016

@suku248 Thank you! I have just committed the requested changes to the documentation.

@heplesser heplesser merged commit ba239ae into nest:master Nov 21, 2016

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed

@heplesser heplesser deleted the heplesser:tammo_threaded_perf2 branch Mar 11, 2017
