GPU: Make a list of neuron indexes that spiked, use that for SynCa and SendSpike #171

Closed
rcoreilly opened this issue Feb 16, 2023 · 1 comment

Comments

@rcoreilly
Member

In gpu_cycle.hlsl, after ly.SpikeFmG, aggregate a list of neuron indexes that spiked:

    uint spikeIdx;
    InterlockedAdd(ctx.SpikedCount, 1, spikeIdx); // spikeIdx receives the previous value
    Spikers[spikeIdx] = nin; // store our index

Then for SynCa, SendSpike, check if idx.x < ctx.SpikedCount and use Neurons[Spikers[idx.x]].

For single-cycle runs, the dispatch can be sized from ctx.SpikedCount; otherwise just launch over the full number of neurons and let threads past the end of the list bail. Allocate the Spikers list at the full number of neurons to handle the worst case; that is not likely worth optimizing.
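For reference, here is a CPU-side Go sketch of the pattern described above. It is not the GPU code: goroutines stand in for GPU threads, and `collectSpikers` and `visitSpikers` are made-up names for illustration. Each "thread" whose neuron spiked reserves a slot with an atomic add, and the follow-up pass runs over all neurons and bails past the count.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// collectSpikers mimics the per-neuron kernel: every goroutine whose neuron
// spiked reserves a slot with an atomic add and stores its index there.
// The list is allocated for the worst case (every neuron spikes).
func collectSpikers(spiked []bool) ([]uint32, uint32) {
	spikers := make([]uint32, len(spiked))
	var count atomic.Uint32
	var wg sync.WaitGroup
	for i := range spiked {
		wg.Add(1)
		go func(nin uint32) {
			defer wg.Done()
			if spiked[nin] {
				// Add returns the new value; subtract 1 to get the previous one.
				slot := count.Add(1) - 1
				spikers[slot] = nin
			}
		}(uint32(i))
	}
	wg.Wait()
	return spikers, count.Load()
}

// visitSpikers mimics the follow-up kernel launched over all neurons:
// any "thread" with idx >= n bails immediately. The result is sorted only
// so that the output is deterministic; slot order in spikers is not.
func visitSpikers(spikers []uint32, n uint32, nNeurons int) []uint32 {
	var out []uint32
	for idx := 0; idx < nNeurons; idx++ {
		if uint32(idx) >= n {
			continue // past the end of the list
		}
		out = append(out, spikers[idx])
	}
	sort.Slice(out, func(a, b int) bool { return out[a] < out[b] })
	return out
}

func main() {
	spiked := []bool{false, true, false, true, true, false}
	spikers, n := collectSpikers(spiked)
	fmt.Println(n, visitSpikers(spikers, n, len(spiked))) // 3 [1 3 4]
}
```

Note that the order of entries in the list depends on thread scheduling, so anything consuming it processes neurons out of order.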

SynCa should actually use a second list that also checks for the UpdtThr cutoff. The Spikers list can be done first, and then that one.
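A small sequential Go sketch of the two-list idea, assuming the cutoff test is "skip the neuron only if both CaSpkP and CaSpkD are below UpdtThr". The `neuron` struct, `aboveUpdtThr`, and `buildLists` are illustrative names, not existing code, and the values in `main` are invented.

```go
package main

import "fmt"

// neuron holds only the fields this sketch needs.
type neuron struct {
	Spike, CaSpkP, CaSpkD float32
}

// aboveUpdtThr reports whether at least one calcium trace reaches the cutoff
// (assumed rule: skip only when both traces are below updtThr).
func aboveUpdtThr(n neuron, updtThr float32) bool {
	return !(n.CaSpkP < updtThr && n.CaSpkD < updtThr)
}

// buildLists returns the indexes of all neurons that spiked, plus the
// subset of those that also pass the UpdtThr cutoff (the list for SynCa).
func buildLists(neurons []neuron, updtThr float32) (spikers, synCa []uint32) {
	for i, n := range neurons {
		if n.Spike <= 0 {
			continue
		}
		spikers = append(spikers, uint32(i))
		if aboveUpdtThr(n, updtThr) {
			synCa = append(synCa, uint32(i))
		}
	}
	return spikers, synCa
}

func main() {
	neurons := []neuron{
		{Spike: 1, CaSpkP: 0.5, CaSpkD: 0.1},   // spiked, above cutoff
		{Spike: 0, CaSpkP: 0.9, CaSpkD: 0.9},   // did not spike
		{Spike: 1, CaSpkP: 0.01, CaSpkD: 0.02}, // spiked, both traces below cutoff
		{Spike: 1, CaSpkP: 0.01, CaSpkD: 0.3},  // spiked, one trace above cutoff
	}
	spikers, synCa := buildLists(neurons, 0.05)
	fmt.Println(spikers, synCa) // [0 2 3] [0 3]
}
```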

@rcoreilly
Member Author

Surprisingly, this is not beneficial for SendSpike, SynCaSend, or SynCaRecv, on either the Mac M1 or the A100. I guess the numbers of neurons are small enough that just running all of them doesn't matter much, relative to the additional costs of indirecting through the extra index and processing neurons out of order.

Specifically, on M1 it was slower, and on the A100 it was maybe 5% faster.

For SendSpike, it actually does not even work to do this because it needs to call PostSpike on all neurons.

Here's the actual working code for future reference:

gpu_cycle.hlsl:

[[vk::binding(1, 3)]] RWStructuredBuffer<uint> Spikers;  // [[Neurons]] -- indexes of those that spiked

void CycleNeuron2(in LayerParams ly, uint nin, inout Neuron nrn, in Pool pl, in Pool lpl, in LayerVals vals) {
	uint ni = nin - ly.Idxs.NeurSt; // layer-based as in Go
	
	GInteg(Ctx[0], ly, ni, nin, nrn, pl, vals);
	ly.SpikeFmG(Ctx[0], ni, nrn);
	
	// important: because we have to refer to Ctx[0] directly here for atomic, and can't use 
	// a ctx arg, we *can't* also have that ctx arg -- it will overwrite anything we do!
	float updtThr = ly.Learn.CaLrn.UpdtThr;
	if ((nrn.Spike > 0) && !((nrn.CaSpkP < updtThr) && (nrn.CaSpkD < updtThr))) {
		int spIdx = InterlockedAdd(Ctx[0].NSpiked, 1);
		Spikers[spIdx] = nin; // used later for SendSpike, SynCa
	}
}

gpu_syncasend.hlsl, gpu_syncarecv.hlsl:

[numthreads(64, 1, 1)]
void main(uint3 idx : SV_DispatchThreadID) { // over Neurons
	if (idx.x < Ctx[0].NSpiked) {
		SynCaSend(Ctx[0], Spikers[idx.x], Neurons[Spikers[idx.x]]);
	}
}

The NSpiked counts in bench/run_gpu.sh are highly variable, from 0 to 40%.
