Efficient spike/spike-like event recording #372
Conversation
…tructs and update pointers
* Allocate and zero shared memory for block's spikes
* Atomic or shared memory if spike is emitted
# Conflicts:
#	generate_swig_interfaces.py
#	include/genn/backends/cuda/backend.h
#	src/genn/backends/cuda/backend.cc
…rather than in nasty !init block
Codecov Report

@@            Coverage Diff             @@
##           master     #372      +/-   ##
==========================================
+ Coverage   86.25%   86.46%   +0.20%
==========================================
  Files          70       70
  Lines       12072    12327     +255
==========================================
+ Hits        10413    10658     +245
- Misses       1659     1669      +10
I'm getting a compile error when building PyGeNN. Isn't openmode a C string?
Well, that last-minute addition was clearly not great 😄 can you get the latest and try?
Okay, it compiles and the results look sensible. I thought they looked short at first, but I just wasn't setting the number of save steps high enough.
The API is fine, but maybe it would be convenient for lazy people to be able to set recording in the neuron population constructor? Also see the per-neuron push/pull recording buffer note.
I see you plan to do it for neuron state vars (and synapse state?) as well, which could be handy.
@@ -278,6 +300,14 @@ def delay_slots(self):

    def size(self):
        return self.pop.get_num_neurons()
Would be cool to have a per-layer pull_recording_buffers_from_device
method, so you can selectively pull recording buffers without pulling anything else.
What would be the use case for doing that? If you've enabled recording on a neuron population wouldn't you always want the data at the end of the recording period?
Good point, I suppose you would.
… implement for single-threaded CPU...
Turning over the metaphorical rock by adding tests uncovered a number of loose ends and bugs (including what appears to be a bug in NVIDIA's OpenCL implementation, which needs further investigation). These are all now addressed and this should be good to go.
Did I miss it, or should we add a few words about this to the manual? Otherwise, approved.
# Conflicts:
#	src/genn/genn/code_generator/groupMerged.cc
The standard GeNN idiom for recording spikes was to use a synchronous cudaMemcpy to read the number of spikes emitted by each population in the current timestep, then a second synchronous cudaMemcpy to read that many spikes, and finally copy the host data into a host-side data structure. Especially in simulations with a 0.1 ms timestep and lots of populations, this pretty much prevents any chance of real-time performance. This PR introduces a very simple system which lets you allocate any remaining GPU memory for spike recording, meaning that, in many simulations, you can run for a large number of timesteps without any device->host memory transfers.

Mark a neuron population for spike recording in C++:
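The original snippet did not survive extraction; as a hedged sketch, in GeNN 4.x the flag lives on NeuronGroup (the population name, neuron model, and parameter/initialiser objects below are placeholders, not taken from the PR):

```cpp
// Model definition code; "Neurons", the LIF model and the
// paramValues/varInitialisers objects are placeholders for this sketch
NeuronGroup *pop = model.addNeuronPopulation<NeuronModels::LIF>(
    "Neurons", 1000, paramValues, varInitialisers);

// Ask the code generator to build a spike-recording bitfield for this population
pop->setSpikeRecordingEnabled(true);
```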
or Python:
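The Python snippet was also lost; a sketch assuming the PyGeNN interface, where `pop` is the object returned by `add_neuron_population`:

```python
# pop was returned by GeNNModel.add_neuron_population(...);
# set this before the model is built and loaded
pop.spike_recording_enabled = True
```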
Allocate a number of timesteps of spike recording buffer in C++:
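A sketch of the generated-runner calls, assuming the GeNN 4.x runner API where allocateRecordingBuffers is emitted alongside allocateMem (the 1000-timestep figure is illustrative):

```cpp
// Simulation code using the generated runner
allocateMem();
// Size every recording-enabled population's bitfield for 1000 timesteps
allocateRecordingBuffers(1000);
initialize();
initializeSparse();
```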
or Python:
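In PyGeNN the equivalent is a keyword argument at load time (sketch, assuming the PyGeNN GeNNModel API):

```python
# The recording buffer size is passed when the model is loaded
model.load(num_recording_timesteps=1000)
```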
Pull the spike recording buffers in C++:
pullRecordingBuffersFromDevice();
or Python:
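The PyGeNN counterpart of the call above (sketch, assuming the snake_case PyGeNN wrapper name):

```python
# One transfer copies every population's recording bitfield to the host
model.pull_recording_buffers_from_device()
```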
Access recording data in C++ (where this helper function is in userprojects - you can do whatever you want with the data):
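A sketch of using the userproject helper; the exact signature of writeTextSpikeRecording, the recordSpkNeurons buffer name, and the sizes/dt below are assumptions for illustration:

```cpp
#include "spikeRecorder.h"  // userproject helper header

// Dump the recorded bitfield for population "Neurons" as (time, id) rows;
// buffer name, population size, timestep count and dt are placeholders
writeTextSpikeRecording("spikes.csv", recordSpkNeurons,
                        1000 /*neurons*/, 1000 /*timesteps*/, 0.1 /*dt, ms*/);
```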
or Python:
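In PyGeNN the decoded data is exposed as a property on the population (sketch, assuming the `spike_recording_data` property):

```python
# After pulling, each recording-enabled population exposes its spikes
# already decoded from the bitfield into parallel arrays
spike_times, spike_ids = pop.spike_recording_data
```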
I think recording spikes is the most common and most inefficient case, but this could easily be extended in future to record subsets of neurons/state variables.
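Under the hood, a recording buffer of this kind is just a bitfield with one bit per neuron per timestep, packed into 32-bit words. As a self-contained illustration (the exact word layout here is an assumption of this sketch, not taken from the PR), such a buffer can be decoded into spike times and IDs with NumPy:

```python
import numpy as np

def decode_spike_record(record_words, num_neurons, dt=1.0):
    """Decode a (timesteps, ceil(num_neurons / 32)) array of uint32
    bitmask words into (spike_times, spike_ids) arrays.

    Assumes bit i of each row's bit-stream corresponds to neuron i,
    with little-endian bit and byte order (as on typical x86/ARM hosts).
    """
    words = np.ascontiguousarray(record_words, dtype=np.uint32)
    # Reinterpret each row's words as bytes, then unpack to one bit per neuron
    bits = np.unpackbits(words.view(np.uint8), axis=1, bitorder="little")
    # Row index = timestep, column index = neuron ID
    timesteps, ids = np.nonzero(bits[:, :num_neurons])
    return timesteps * dt, ids
```

For example, a two-timestep buffer for 40 neurons where neurons 0 and 33 fire in the first recorded timestep and neuron 5 in the second decodes to three (time, id) pairs.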
From the upcoming PyGeNN paper, this disproportionately helps Python and slow CPUs, where copying the data and sticking it in a host-side data structure is particularly costly.