In gpu_cycle.hlsl, after ly.SpikeFmG, aggregate a list of neuron indexes that spiked:
uint spikeIdx;
InterlockedAdd(ctx.SpikedCount, 1, spikeIdx); // spikeIdx receives the previous value
Spikers[spikeIdx] = nin; // store our index
Then, in the SynCa and SendSpike kernels, check that idx.x < ctx.SpikedCount and operate on Neurons[Spikers[idx.x]].
For single-cycle runs, the launch can be sized specifically from ctx.SpikedCount; otherwise just launch over the number of neurons, and the threads past the end of the list will bail. Allocate the Spikers list at the full number of neurons to handle the worst case -- optimizing that is not likely to be worthwhile.
SynCa should actually use a second list that also applies the UpdtThr cutoff -- build the Spikers list first and then that one.
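A minimal sketch of the consumer side described above, assuming simplified stand-in types: the Neuron and Context structs, the binding slots, the thread-group size of 64, and the SynCaNeuron helper are all hypothetical placeholders, not the actual axon definitions. It also assumes the counter is reset to 0 before the neuron kernel runs each cycle, which the proposal does not spell out.

```hlsl
// Hypothetical minimal types; the real structs have many more fields.
struct Neuron {
    float Spike;
    float CaSpkP;
    float CaSpkD;
};

struct Context {
    uint SpikedCount; // number of valid entries in Spikers this cycle
};

// Binding slots are placeholders for illustration only.
[[vk::binding(0, 0)]] RWStructuredBuffer<Context> Ctx;
[[vk::binding(1, 0)]] RWStructuredBuffer<Neuron> Neurons;
[[vk::binding(2, 0)]] RWStructuredBuffer<uint> Spikers; // sized to the full neuron count

// Hypothetical stand-in for the per-neuron SynCa or SendSpike work.
void SynCaNeuron(uint nin, inout Neuron nrn) {
    // per-spiking-neuron work would go here
}

[numthreads(64, 1, 1)]
void main(uint3 idx : SV_DispatchThreadID) {
    // Launched over the full neuron count: threads past the end of the
    // compacted list have nothing to do and bail immediately.
    if (idx.x >= Ctx[0].SpikedCount) {
        return;
    }
    uint nin = Spikers[idx.x]; // indirect through the list of spikers
    SynCaNeuron(nin, Neurons[nin]);
}
```

Because the producer appends with an atomic counter, the order of entries in Spikers is not deterministic, so consecutive threads may touch neurons that are far apart in memory.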
Surprisingly, this is not beneficial for SendSpike, SynCaSend, or SynCaRecv on either the Mac M1 or the A100. My guess is that the number of neurons is small enough that just running all of them doesn't cost much relative to the added cost of indirecting through the extra index and processing neurons out of order.
Specifically, it was slower on the M1, and maybe 5% faster on the A100.
For SendSpike, this approach does not even work, because it needs to call PostSpike on all neurons.
Here's the actual working code for future reference:
gpu_cycle.hlsl:
[[vk::binding(1, 3)]] RWStructuredBuffer<uint> Spikers; // [[Neurons]] -- indexes of those that spiked

void CycleNeuron2(in LayerParams ly, uint nin, inout Neuron nrn, in Pool pl, in Pool lpl, in LayerVals vals) {
	uint ni = nin - ly.Idxs.NeurSt; // layer-based as in Go
	GInteg(Ctx[0], ly, ni, nin, nrn, pl, vals);
	ly.SpikeFmG(Ctx[0], ni, nrn);
	// important: because we have to refer to Ctx[0] directly here for atomic, and can't use
	// a ctx arg, we *can't* also have that ctx arg -- it will overwrite anything we do!
	float updtThr = ly.Learn.CaLrn.UpdtThr;
	// only list neurons that spiked and are not below the UpdtThr cutoff on both CaSpkP and CaSpkD
	if ((nrn.Spike > 0) && !((nrn.CaSpkP < updtThr) && (nrn.CaSpkD < updtThr))) {
		int spIdx;
		InterlockedAdd(Ctx[0].NSpiked, 1, spIdx); // spIdx receives the previous count
		Spikers[spIdx] = nin; // used later for SendSpike, SynCa
	}
}