GPU: Make a list of neuron indexes that spiked, use that for SynCa and SendSpike #171

rcoreilly · 2023-02-16T02:47:43Z

In gpu_cycle.hlsl, after ly.SpikeFmG, aggregate a list of neuron indexes that spiked:

    uint spikeIdx = InterlockedAdd(ctx.SpikedCount, 1); // returns previous value
    Spikers[spikeIdx] = nin; // store our index

Then for SynCa, SendSpike, check if idx.x < ctx.SpikedCount and use Neurons[Spikers[idx.x]].

For single-cycle, the launch can be sized specifically from ctx.SpikedCount; otherwise just launch over the number of neurons, and threads past the end of the list will bail. Allocate the full number of neurons for the Spikers list to handle the worst case -- not likely worth optimizing that.
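As a rough CPU-side illustration of the list-building idea above, here is a minimal Go sketch. It is not the shader code: goroutines stand in for GPU threads, and the names `CollectSpikers`, `ForEachSpiker`, and the `spiked` input slice are hypothetical. It assumes the counter starts at zero each cycle, which the issue text does not state explicitly.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// CollectSpikers is a CPU analogue of the proposed GPU pass: every worker
// that sees a spike reserves a slot with an atomic add and stores its
// neuron index in that slot. spiked[i] is a hypothetical stand-in for
// "neuron i spiked this cycle".
func CollectSpikers(spiked []bool) ([]uint32, uint32) {
	spikers := make([]uint32, len(spiked)) // worst case: every neuron spikes
	var nSpiked uint32                      // assumed to be reset to 0 each cycle
	var wg sync.WaitGroup
	for i := range spiked {
		wg.Add(1)
		go func(nin uint32) {
			defer wg.Done()
			if !spiked[nin] {
				return
			}
			// AddUint32 returns the new value, so subtract 1 to get the
			// previous value, which is the slot reserved for this neuron.
			idx := atomic.AddUint32(&nSpiked, 1) - 1
			spikers[idx] = nin
		}(uint32(i))
	}
	wg.Wait()
	return spikers, nSpiked
}

// ForEachSpiker mimics launching one thread per neuron, where every thread
// whose index is at or past nSpiked bails immediately.
func ForEachSpiker(spikers []uint32, nSpiked uint32, fn func(nin uint32)) {
	for i := range spikers {
		if uint32(i) >= nSpiked {
			continue // end of the list: nothing to do
		}
		fn(spikers[i])
	}
}

func main() {
	spiked := []bool{false, true, false, true, true}
	spikers, n := CollectSpikers(spiked)
	// The order of the stored indexes can differ from run to run.
	fmt.Println(n, spikers[:n])
}
```

Because slots are handed out in whatever order the workers reach the atomic add, the list is unordered, so any consumer has to tolerate visiting neurons out of index order.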

SynCa should actually use a second list that also applies the UpdtThr cutoff -- can do the Spikers list first and then add that.
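For reference, the cutoff test for that second list can be sketched in Go as below. It mirrors the condition used in the CycleNeuron2 code posted later in this thread; the function name `NeedsSynCa` and the flat float32 parameters are hypothetical, chosen only to keep the example self-contained.

```go
package main

import "fmt"

// NeedsSynCa reports whether a neuron would go on the SynCa list: it must
// have spiked, and it is skipped only when both CaSpkP and CaSpkD are
// below the UpdtThr cutoff.
func NeedsSynCa(spike, caSpkP, caSpkD, updtThr float32) bool {
	if spike <= 0 {
		return false // did not spike this cycle
	}
	bothBelow := caSpkP < updtThr && caSpkD < updtThr
	return !bothBelow
}

func main() {
	fmt.Println(NeedsSynCa(1, 0.5, 0.01, 0.05))  // true: CaSpkP is above the cutoff
	fmt.Println(NeedsSynCa(1, 0.01, 0.02, 0.05)) // false: both are below the cutoff
	fmt.Println(NeedsSynCa(0, 0.5, 0.5, 0.05))   // false: no spike
}
```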


rcoreilly · 2023-02-16T08:17:52Z

This is surprisingly not beneficial for SendSpike, SynCaSend or SynCaRecv, on either the Mac M1 or the A100. My guess is that the numbers of neurons are small enough that just running all of them doesn't matter so much, relative to the additional costs of indirecting through the extra index and processing things out of order.

Specifically, on M1 it was slower, and on the A100 it was maybe 5% faster.

For SendSpike, it actually does not even work to do this because it needs to call PostSpike on all neurons.

Here's the actual working code for future reference:

gpu_cycle.hlsl:

    [[vk::binding(1, 3)]] RWStructuredBuffer<uint> Spikers;  // [[Neurons]] -- indexes of those that spiked

    void CycleNeuron2(in LayerParams ly, uint nin, inout Neuron nrn, in Pool pl, in Pool lpl, in LayerVals vals) {
    	uint ni = nin - ly.Idxs.NeurSt; // layer-based as in Go

    	GInteg(Ctx[0], ly, ni, nin, nrn, pl, vals);
    	ly.SpikeFmG(Ctx[0], ni, nrn);

    	// important: because we have to refer to Ctx[0] directly here for atomic, and can't use
    	// a ctx arg, we *can't* also have that ctx arg -- it will overwrite anything we do!
    	float updtThr = ly.Learn.CaLrn.UpdtThr;
    	if ((nrn.Spike > 0) && !((nrn.CaSpkP < updtThr) && (nrn.CaSpkD < updtThr))) {
    		int spIdx = InterlockedAdd(Ctx[0].NSpiked, 1);
    		Spikers[spIdx] = nin; // used later for SendSpike, SynCa
    	}
    }

gpu_syncasend.hlsl, gpu_syncarecv.hlsl:

    [numthreads(64, 1, 1)]
    void main(uint3 idx : SV_DispatchThreadID) { // over Neurons
    	if (idx.x < Ctx[0].NSpiked) {
    		SynCaSend(Ctx[0], Spikers[idx.x], Neurons[Spikers[idx.x]]);
    	}
    }
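On the host side, the two launch options discussed above (over all neurons, or over only the spiked count) differ only in how many thread groups are dispatched. A minimal Go sketch of that calculation follows, assuming the group size of 64 from `[numthreads(64, 1, 1)]`; the helper name `NumGroups` and the example counts are hypothetical, and the actual dispatch call is not shown.

```go
package main

import "fmt"

// threadsPerGroup matches the [numthreads(64, 1, 1)] attribute in the shader.
const threadsPerGroup = 64

// NumGroups returns how many thread groups are needed to cover n items,
// rounding up so a partial final group is still launched.
func NumGroups(n, threads uint32) uint32 {
	return (n + threads - 1) / threads
}

func main() {
	nNeurons := uint32(1000) // hypothetical total neuron count
	nSpiked := uint32(130)   // hypothetical spiked count read back from the GPU

	// Launching over all neurons: threads with idx.x >= NSpiked just bail.
	fmt.Println(NumGroups(nNeurons, threadsPerGroup))
	// Launching over only the spiked count: needs the count on the host first.
	fmt.Println(NumGroups(nSpiked, threadsPerGroup))
}
```

Either way the `idx.x < Ctx[0].NSpiked` guard in the shader is still needed, because the last group usually contains threads past the end of the list.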

The NSpiked counts in bench/run_gpu.sh are highly variable -- 0 to 40%.

rcoreilly mentioned this issue Feb 16, 2023

Threads.SynCa and SynCaFun no longer needed (after GPU PR merge) #172

Closed

rcoreilly closed this as completed Feb 20, 2023
