more extensive logging #754
Agreed, this would be cool. If we start with something easy, like using the ACG, amplitudes, and a few dozen simple features, then it is definitely feasible. I have something similar in Suite2p, where a classifier is trained on each user's own manual decisions in the GUI. This way the classifier can be customized to any user's data, with their particular type of recordings.
Agreed in principle. One detail: how would we encode relationships between cells?
It isn't a simple classification problem like Suite2p: there are multiple outcomes for each cell (keep, drop, merge with another).
Multi-class classifiers? Not sure exactly how it would work, but we can start simple, like just cleaning up the output of Kilosort as well as possible. If we had concrete log data, it would probably become clearer what can and cannot be done this way.
Great idea! We need to come up with a full spec of what we need in the log, e.g.:

- timestamp
- action_name
- best_cluster_id
- target_cluster_id (optional)
- pc_feature_dim?
- template_feature_dim?
- best_cluster_acg?
- target_cluster_acg?
- amplitudes?
- ...

What happens when the user clicks on clusters manually without using the "wizard" keyboard shortcuts? When the user selects "merge" in the dropdown menu vs. using the "g" keyboard shortcut? Should that be recorded too? Then we could just record all of this information in a file (for example in a "pickle" binary format) and save it alongside the other files in the directory. Ideas welcome (perhaps in TaskLogger).
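To make the field list above concrete, here is a minimal sketch of what one logged record could look like. The record layout and the `append_record` helper are hypothetical (they follow the proposed field names, not any existing phy API); JSON lines are used instead of pickle here just because they are easy to inspect by hand.

```python
import json
import tempfile
import time
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical record layout for the proposed action log; field names
# follow the list in the comment above, not an existing phy API.
@dataclass
class ActionRecord:
    timestamp: float
    action_name: str                         # e.g. "merge", "split", "delete"
    best_cluster_id: int
    target_cluster_id: Optional[int] = None  # only set for pairwise actions

def append_record(path, record):
    """Append one record as a JSON line (human-readable, unlike pickle)."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_path = tempfile.mktemp(suffix=".log")
append_record(log_path, ActionRecord(time.time(), "merge", 12, target_cluster_id=34))
```

The optional fields with a trailing "?" above (ACGs, amplitudes, feature dims) could be added to the dataclass later once the spec is settled.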
In principle, I guess all that needs to be logged is what the user did – because everything else (e.g. the autocorrelations that were available) can be recalculated as required. It might be good to log who the user is, since different users might have different styles.
There is a question of whether to include what the user was looking at when they made a decision, but I'm not sure this is important. We are assuming the users are making correct decisions: if there was some information they could have used but didn't, that should also go into the classifier.
Certainly we should record menu uses as well as keyboard shortcuts. The simplest thing would be to record all actions (including changes of view as well as consequential actions such as merges or deletions); it can't do any harm.
Finally, maybe we should start by training the classifier on keep/delete decisions, rather than merges? This might be easier… Nick, when you do manual spike sorting, do you start by throwing a lot of clusters away? That part could probably be automated.
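As a sketch of the keep/delete starting point suggested above: with logged decisions as labels, even a very simple classifier could be trained on a couple of features. Everything here is illustrative, with made-up toy data standing in for logged features, and a nearest-centroid rule standing in for a real classifier (e.g. a random forest).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for logged decisions: each row is a feature vector
# (say, firing rate and mean amplitude) and the label is the user's
# keep (1) / delete (0) choice.  Real features would come from the log.
X_keep = rng.normal([5.0, 80.0], [1.0, 10.0], size=(50, 2))
X_del = rng.normal([0.5, 20.0], [0.3, 5.0], size=(50, 2))
X = np.vstack([X_keep, X_del])
y = np.array([1] * 50 + [0] * 50)

# Minimal nearest-centroid classifier, a placeholder for something
# stronger like scikit-learn's RandomForestClassifier.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Return 0 (delete) or 1 (keep) for one feature vector."""
    d = np.linalg.norm(centroids - np.asarray(x), axis=1)
    return int(np.argmin(d))
```

The merge decision is harder because it involves pairs of clusters, which is why starting with per-cluster keep/delete labels seems the easier first target.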
We just had a discussion about this, and came up with a concrete suggestion. The idea is that every time the user views a cluster you log:

- Autocorrelogram, subsampled on a power scale, i.e. bin n spans A·n^p to A·(n+1)^p milliseconds, for example with n = 0 to 100, A = 1 and p = 1.5. (I think this is better than a log scale, since log will over-emphasize around 0.)
- Mean template amplitude vs. time for that cluster, with time evenly binned into 100 bins.
- Mean firing rate vs. time, with time evenly binned into 100 bins.
- Histogram of template amplitude for the cell, again binned into 100 bins from 0 to, say, 3× the mean.
- Template waveform (represented by PCs).

For every pair of clusters viewed simultaneously you also log:

- Histogram of the projection onto the difference between the templates.

You also log what the user did with the cluster (delete, mark as good, merge, split, etc.). This would be a good start to building a learning algorithm. It may turn out we don't need all this information, but we can remove the rest later.
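The power-scale binning proposed above is easy to state in code: the edges of bin n are A·n^p and A·(n+1)^p ms, so bin widths grow with lag but the first bins keep sub-millisecond detail. A minimal sketch with the example parameters (n up to 100, A = 1, p = 1.5):

```python
import numpy as np

def power_scale_edges(n_bins=100, A=1.0, p=1.5):
    """Bin edges in ms: bin n spans [A * n**p, A * (n + 1)**p),
    following the power-scale scheme suggested above."""
    n = np.arange(n_bins + 1)
    return A * n ** p

edges = power_scale_edges()
# Widths grow monotonically: fine resolution near 0 ms, coarse at long
# lags, without the extreme compression of a log scale around zero.
widths = np.diff(edges)
```

An ACG could then be subsampled with `np.histogram(lags_ms, bins=edges)`, giving a fixed-length 100-value feature regardless of recording length.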
Sounds good, I'll have a go after the holiday.
@rossant @kdharris101 @marius10p
We had an interesting discussion the other day about the possibility of using all the manual decisions that people make to train an algorithm to mimic those decisions.
A first step would be to increase the logging that's done in the existing log file (or maybe make a new one that's more detailed). E.g. you'd want to include when someone selects a cluster but doesn't make a decision about it, when they review a best vs. similar comparison and then skip to the next similar, and also anything that would help clarify what they are looking at, like which views are currently visible. You'd also want a way to uniquely identify which dataset they're looking at, e.g. by storing an MD5 hash of the relevant files.
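The MD5-based dataset identification could be as simple as hashing the relevant files in a fixed order. A sketch (the `dataset_id` helper is hypothetical, and which files count as "relevant" is left open here):

```python
import hashlib

def dataset_id(paths, chunk=1 << 20):
    """Hypothetical dataset fingerprint: MD5 over the relevant files,
    streamed in chunks so large .dat/.npy files need not fit in RAM."""
    h = hashlib.md5()
    for path in sorted(paths):  # sorted, so discovery order doesn't matter
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
    return h.hexdigest()
```

Hashing only the smaller metadata files (templates, spike clusters) rather than the raw recording would keep this fast while still identifying the dataset.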
A second step could involve uploading the log files to a database automatically. Uploading the full dataset would probably be prohibitive, but for any sorting done within our lab the datasets would be available. Alternatively, we could work on finding some minimal set that could be uploaded, or consider versions of what to upload: e.g. maybe you want to train just the "good vs. mua" decision, so you upload only a few simple things like the ACG, waveform amplitude, and a PC projection of the nearest few clusters.
The end result of all this could be an algorithm that says "I think most people would call this Good with 87% confidence" or "I think most people would merge these two with 43% confidence". In the limit of a good/trusted algorithm, it could apply confident operations for you.
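The suggestion/auto-apply behaviour described here could hang off the probability a classifier reports. A tiny illustrative helper (the function name, message format, and the 0.95 auto-apply threshold are all made up for this sketch):

```python
# Hypothetical helper turning a classifier probability into the kind of
# suggestion described above.  The auto-apply threshold is arbitrary.
def suggest(label, prob, auto_threshold=0.95):
    """Return (message, auto_apply) for one predicted operation."""
    msg = f"I think most people would call this {label} with {prob:.0%} confidence"
    return msg, prob >= auto_threshold

msg, auto = suggest("Good", 0.87)  # suggested, but below the auto threshold
```

In practice `prob` would come from something like a classifier's `predict_proba` output, and low-confidence cases would simply be left to the user.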