
Split blocks automatically #264

merged 10 commits into master from split-allocator-host-snips on Feb 7, 2020

@hunse hunse commented Dec 4, 2019

Loihi is organized into cores with a fixed number of compartments on each, and since the start we've required users to manually break their model into Ensembles that will each fit on one core.

This PR automates splitting larger ensembles to fit across multiple cores. This allows users to create the model in terms of structures that work well conceptually, and worry less about how that is going to map to Loihi.

There are two ways of using this functionality. One is to let nengo-loihi figure things out itself, in which case it simply splits large ensembles sequentially (putting the first N neurons on one core, the next N on the next core, etc.). This works well for NEF or other fully-connected Ensembles that have a fairly uniform structure in terms of input and output connections.
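The sequential strategy can be sketched in a few lines of plain Python, using the 1024-compartment-per-core limit mentioned later in this PR as the batch size (the function name and signature are illustrative, not nengo-loihi's API):

```python
def sequential_split(n_neurons, max_per_core=1024):
    # Assign the first `max_per_core` neurons to one core, the next
    # batch to the next core, and so on; the last core may hold fewer.
    return [
        (start, min(start + max_per_core, n_neurons))
        for start in range(0, n_neurons, max_per_core)
    ]

print(sequential_split(2500))  # [(0, 1024), (1024, 2048), (2048, 2500)]
```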

The second way to use block splitting is to provide instructions on how you (as a user) want an ensemble to be split. This is done via the full_shape and block_shape config options.

import nengo
import nengo_loihi

with nengo.Network() as net:
    nengo_loihi.add_params(net)  # adds the splitting config options
    a = nengo.Ensemble(120, 1)  # 120 neurons, viewed as a 6 x 5 x 4 grid
    net.config[a].full_shape = (6, 5, 4)
    net.config[a].block_shape = (3, 2, 3)

The block_shape specifies the shape that a single block (i.e. one core) will represent. The maximum number of compartments on that core is the product of the shape's elements. We then tile that shape to fill the full shape. In the above example, we'll have 2 cores in the first dimension (since 6 \ 3 = 2, where \ denotes ceiling division, ceil(a / b)), 3 cores in the second dimension (5 \ 2 = 3), and 2 cores in the third (4 \ 3 = 2). The total number of cores is 2 * 3 * 2 = 12, and the layout of the cores is (2, 3, 2).

We then "rebalance" the block_shape so that it is as uniform as possible across cores given this layout, by taking the ceiling division of each element of the full shape by the corresponding number of cores in that dimension: (6, 5, 4) \ (2, 3, 2) = (3, 2, 2). This is close to the original block shape, but with the last dimension being 2 instead of 3. In the first dimension, 2 cores of length 3 go evenly into 6, so everything is already balanced there. In the second dimension, we'll have 2 cores of length 2 and one of length 1 to make up 5; this isn't perfectly balanced, but no way of making one core shorter and another longer does better. In the last dimension, though, the original block shape would give one core of length 3 and one of length 1 to make up the 4. Two cores of length 2 are more balanced (and use no more cores), so we use that instead.
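The arithmetic above can be captured in a short Python sketch (an illustrative reimplementation, not the actual nengo-loihi code):

```python
import math

def split_full_shape(full_shape, block_shape):
    # Cores needed per dimension: ceiling division of full by block
    n_cores = tuple(math.ceil(f / b) for f, b in zip(full_shape, block_shape))
    # Rebalance: the most uniform per-core shape for this core layout
    rebalanced = tuple(math.ceil(f / n) for f, n in zip(full_shape, n_cores))
    return n_cores, rebalanced

n_cores, block = split_full_shape((6, 5, 4), (3, 2, 3))
print(n_cores)  # (2, 3, 2) -> 2 * 3 * 2 = 12 cores
print(block)    # (3, 2, 2), the rebalanced block shape
```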

This PR also adds a number of features that we needed for a recent project.

  • "Make transform builder pluggable": This allows users to write custom builders for their own transforms. At this point, they still have to re-write the whole connection builder function (though of course they can use one of ours as a template), just as we have a completely different function for building connections with Convolutional transforms vs those with Dense/Sparse transforms.
  • "Move model validation to": This allows validation to be turned off if desired, and follows naturally from block splitting. By default, large ensembles will be split and the whole model will be validated. However, this may be detrimental to performance in the emulator, where the large blocks do not need to be split. To allow this to work, validation must be turned off, so that the large blocks to not result in an error.
  • "Added GreedyChip allocator": This is our second multi-chip allocator. The first---the RoundRobin allocator---alternates between chips when placing blocks, which is great for testing, but results in extra inter-chip communication for models where nearby blocks are more likely to be connected or to get input from a common source (which is typical of most models, particularly when we introduce block splitting). The GreedyChip allocator fills one chip first, before moving to the next chip. It allows a maximum number of cores per chip to be specified, so chips do not need to be fully utilized.
  • "Nengo IO SNIP uses spike packing": Currently, when probing spikes in the IO SNIP, we get voltages for all neurons, and then check on the superhost if this is equal to a magic number to determine if a neuron has spiked. This moves that check to the IO SNIP, so that rather than transmitting back a 32-bit voltage for each neuron, we just send a 1-bit spike, resulting in reduced memory usage in the IO SNIP and reduced data transfer back to the superhost.
  • "Allow initial synapse index bits to exceed max": We were checking whether the synapse index bits exceeds the maximum when we created the synapse, which is problematic for creating large ensembles that later get split. Now, we put in a placeholder of -1 if we exceed the max, and check that these are no longer present when we do validation (indicating large blocks have been properly split). This commit could be merged into one of the block splitting commits.
  • "Better reporting of board connection errors": We currently log board connection errors, but if logging is off (default), the user gets no feedback about why we couldn't connect to the board. So make this part of the exception we send to them. Also, only try to connect 3 times by default instead of 10, since often connection fails because of a problem with the model, which won't get fixed by repeated attempts.
  • "Any exception fails TensorFlow and NengoDL import": Importing versions of these that we don't support can sometimes result in errors other than ImportErrors (like AttributeErrors). So if we get any error when trying to import, just mark it as not available.
  • "Represent inputs at board level instead of core": Inputs can input to multiple cores, even across multiple chips. Rather than storing them at the Core level in our model, store them at the Board level instead. Makes the allocators cleaner.

Based on #261.


  • What happens if != ensemble.n_neurons?


tbekolay commented Jan 21, 2020

Note to self: see if the spiking MNIST example can be simplified using this PR.

hunse and others added 3 commits Feb 7, 2020
This is required to allow probing across multiple blocks.
Since this means that probes are not specific to one block,
we should no longer consider them as part of a block, and
instead the mental model is that they lie alongside the
`LoihiInput` and `LoihiBlock` in the hierarchy of objects.
So, a Loihi Model is a collection of `LoihiBlock`s that get
input from `LoihiInput`s and get output from `LoihiProbe`s.
Accordingly, we've renamed `Probe` to `LoihiProbe` for
consistency, and moved it to a new top-level file.

Co-authored-by: Kinjal Patel <>

Co-authored-by: Trevor Bekolay <>
The reason for not connecting can be any error that NxSDK throws
when trying to connect, including problems with building the model,
as well as actual connectivity problems. We now include the actual
NxSDK error message in our error message to better inform users.

Also reduce the number of connection attempts, since if it fails
more than 3 times it's likely due to a model problem, not a
connectivity problem.
hunse and others added 4 commits Feb 7, 2020
Previously we ignored the case when they weren't installed,
but raised other errors from having bad versions installed.
We don't really care why they don't import, just that they don't,
so now we ignore any issue with trying to import these.
Inputs can project to multiple chips and cores,
so represent them at the board level instead.
This commit introduces a new build step that splits blocks
produced by the build process that would fail validation
because they cannot fit on a single neuron core. Each large
block is split into smaller blocks that can fit on a single
neuron core. Splitting happens automatically when a model would
not otherwise fit, which most commonly occurs when an ensemble
has more than 1024 neurons. However, it can also be invoked
manually with the new `block_shape` Ensemble configuration option,
which is added in `add_params`. This parameter accepts a
`BlockShape` instance, which specifies how the ensemble should be
split. See the `BlockShape` documentation for more details.

Adding a new build step also added more complexity to the
Simulator.__init__ method, so we moved many of the build-related
steps that were previously in __init__ to the `build_network`
function. In doing so, it is now harder to half-build a model,
and the Simulator.__init__ method is less complex.

Note that the block splitting process occurs before discretization.
This means that specifying the `block_shape` of an ensemble in an
existing model can change that model's behavior. No other existing
models will change: blocks are only split automatically when they
are too large to fit on one neuron core, and such large models
were not possible previously. We believe that doing discretization
after splitting will only yield improvements, but we have not
rigorously tested that belief.

Co-authored-by: Trevor Bekolay <>
This commit also adds some build functions to the API
documentation page.
@tbekolay tbekolay merged commit 316fa5c into master Feb 7, 2020
2 of 3 checks passed
@tbekolay tbekolay deleted the split-allocator-host-snips branch Feb 7, 2020