Improve default dimensions in feature grid view #360

Closed
rossant opened this issue Jun 17, 2015 · 28 comments · Fixed by #366

Comments

@rossant
Contributor

rossant commented Jun 17, 2015

@nsteinme what should be the x and y dimensions in subplot (i, j)? Previously, x=dim_j, y=dim_i, now this is reversed. I can find arguments for both ways. WDYT?

Also, maybe we could show different dimensions in (i, j) and (j, i) (currently, the plots are just symmetric)

@rossant rossant added this to the v0.2.0 milestone Jun 17, 2015
@rossant rossant self-assigned this Jun 17, 2015
@nsteinme
Contributor

Hm. Definitely it makes sense to show different dimensions in (i,j) vs (j,i) rather than just mirroring. Are you saying that it currently picks just one channel, and shows you PC1, PC2, and PC3 from that channel in column 1, 2, and 3? And the same for a different channel in the rows?

Well, I don't want to claim I'm certain I have the best answer. But here's how I'd do it if I were doing it from scratch:

(1,1) = time vs. depth (where depth is a feature computed for each spike - did I request this as a feature? I think it's a really good idea).

Then I would keep the x axis of all first-row and first-column plots as Time (like it currently is, but keeping time on x rather than y, which I think is much easier to look at).

Column 2, 3, 4 = channel x, y, and z where those are the best channels for the first selected cluster.

Then you plot PC1, PC1, PC2, and PC3 for each of the four rows in that column (all on the y-axes)

Row 2, 3, 4 = channel m, n, p where those are the best channels for the second selected.
Then PC1, PC1, PC2, and PC3 in the four columns of that row (all on the x-axes).

So for example plot (3, 1) is channelN PC1 vs Time. Plot (3,2) is ChannelN PC1 [again] vs ChannelX PC2. Plot (4,4) is channel P PC3 vs channel Z PC3. etc.

This gives you 18 different features to look at, plus six different PC1vsTime plots.

It could know to pick a different channel for m, n, or p if those were already selected in x, y, and z.

Just one idea! Just thinking how to maximize information there.

There could be another "mode" that would be very useful in which all plots were PC1 vs Time, and the plots were arranged by their spatial layout on the probe. So plot(2,1) is the channel physically just above plot(3,1), and you're seeing PC1 vs Time for each. This would really help see drift.

@rossant rossant changed the title Question about axes in feature grid view Improve default dimensions in feature grid view Jun 17, 2015
@rossant
Contributor Author

rossant commented Jun 17, 2015

Great stuff in there!

(where depth is a feature computed for each spike - did I request this as a feature? I think it's a really good idea)

what is it exactly? is it the "waveform mean position"?

@nsteinme
Contributor

Yes, the y-coordinate of it. Except for each spike individually, rather
than for the mean waveform.


@rossant rossant added the ready label Jun 17, 2015
@nsteinme
Contributor

More generally, if I compute a quantity of my own for each spike, can I
plot that quantity as a feature in feature view? This would be a really
great thing!


@rossant
Contributor Author

rossant commented Jun 18, 2015

great idea! you can already register your own cluster statistic, we could do the same for spikes, and then add an option to plot it in the feature view

what do you have in mind specifically? would it be ok if the interface was something like:

def myspikequantity(session, cluster):
    # Get the spikes in that cluster
    spike_ids = session.model.spike_train(cluster)
    ...
    return somearraywithonevalueperspike

Basically computing your quantities on a per-cluster basis: this would just make my life simpler because this mechanism already exists...
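To make the interface concrete, here is a hypothetical per-cluster statistic using it: a "depth" value per spike, taken as the mask-weighted mean of the channel y-positions. The kernel is a plain NumPy function (the `session`/`model` accessor names above are the real proposal; everything in this sketch is an assumption):

```python
import numpy as np

def spike_depths(masks, channel_y):
    """One depth value per spike: the mask-weighted mean of channel depths.

    masks:     (n_spikes, n_channels) per-spike channel masks
    channel_y: (n_channels,) y-coordinate of each channel on the probe
    """
    # Weighted average over channels, one scalar per spike.
    return (masks * channel_y).sum(axis=1) / masks.sum(axis=1)
```

Inside a `myspikequantity(session, cluster)` function, one would fetch the cluster's masks and probe geometry and return `spike_depths(masks, channel_y)`.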

@nsteinme
Contributor

Yep, I think that would do it. I don't think computing on a per-cluster
basis is a problem. By the way, would the thing you compute be stored to
hard drive so it's available for the next time that you start phy on the
same dataset?


@rossant
Contributor Author

rossant commented Jun 18, 2015

By the way, would the thing you compute be stored to
hard drive so it's available for the next time that you start phy on the
same dataset?

No. The assumption is that custom stats are fast to compute and they are always computed on-the-fly. But that's something we could change if necessary. How intensive are your functions? What is it that you want to compute exactly?

(that's also the case for the default stats, except mean features/masks/waveforms which take a lot if time to compute due to the high I/O cost of loading everything in RAM...)

@nsteinme
Contributor

The main thing is computing the x,y location of each individual spike, which optimally would be based on the waveform amplitude on each channel for each spike. That might be fairly intensive. You might be able to do it on the fly for a selection of waveforms.

Another thing that I would sometimes find useful (and I know at least two other people who think it is a major feature) is computing the value of the waveform at a particular sample. I.e. just to take the value session.store.waveforms(clusterID)[spikeNum, specifiedSampleNum, specifiedChannelNum]. Then plotting that as a feature so you can use the lasso to split on it.
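The amplitude-at-a-sample feature described above reduces to one array indexing operation. A minimal sketch (the array shape `(n_spikes, n_samples, n_channels)` matches what `session.store.waveforms(clusterID)` is described as returning; the function name is hypothetical):

```python
import numpy as np

def amplitude_at_sample(waveforms, sample, channel):
    """Per-spike feature: raw waveform value at one (sample, channel).

    waveforms: (n_spikes, n_samples, n_channels) array of waveforms.
    Returns a (n_spikes,) array, plottable like any other feature.
    """
    return waveforms[:, sample, channel]
```

The result could then be lassoed in the feature view to split the cluster.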

@rossant
Contributor Author

rossant commented Jun 18, 2015

You don't have fast access to the waveforms of all spikes. They are fetched dynamically from the raw data, which is slow for thousands of waveforms. So instead, phy samples ~100 waveforms per cluster and saves them for fast per-cluster read access.

So you can compute quickly anything based on the waveforms as long as only this small subset of waveforms is used. Anything that uses all waveforms will be extremely slow.
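The per-cluster subsampling described above might look like this sketch (the function name and the choice of uniform random sampling are assumptions; phy's actual strategy may differ):

```python
import numpy as np

def sample_waveform_ids(spike_ids, n_max=100, seed=0):
    """Pick at most n_max spike ids per cluster for waveform extraction.

    Keeps ids sorted so the raw-data reads stay roughly sequential.
    """
    spike_ids = np.asarray(spike_ids)
    if len(spike_ids) <= n_max:
        return spike_ids
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(spike_ids, size=n_max, replace=False))
```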

@nsteinme
Contributor

That is an argument for being able to save the thing you compute so you
just have to do it once, yeah? You'd want to compute the depth for all the
ones that are plotted in feature view, not the ones that are plotted in
waveform view.


@nsteinme
Contributor

Or, how about this, let's say I computed the waveform positions myself and
saved them to a file. Can I just load that file and use the values as a
feature to plot in feature view?


@rossant
Contributor Author

rossant commented Jun 18, 2015

Or, how about this, let's say I computed the waveform positions myself and
saved them to a file. Can I just load that file and use the values as a
feature to plot in feature view?

that would work, but it's kind of a hack...

Ok, so that suggests a different approach:

  • there is a new notion of spike statistics for computing statistics that do not depend on the cluster assignments (this is critical)
  • you provide a function mystuff(spike_start, spike_stop) to compute all stats for the spikes between spike_start (included) and spike_stop (excluded). This is for performance/RAM reasons: it will be computed on a chunk-by-chunk basis (e.g. the first 10,000 spikes, the next 10,000, etc.). The function must return a NumPy array with spike_stop - spike_start elements in the first axis. To access features/masks/waveforms from within the function, you cannot use the store; instead use the model, which memmaps the HDF5 file. So something like W = model.waveforms[spike_start:spike_stop, ...].
  • the function is called during the store creation, and the results are saved in a single big file with the stats of all spikes
  • fetching the stats during the interactive session is very fast, and you can plot them in the feature view

cc @nippoo
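The chunk-by-chunk scheme above could be driven by a loop like this sketch (`model.n_spikes` and the driver function name are assumptions; only the `stat(spike_start, spike_stop)` contract comes from the proposal):

```python
import numpy as np

def compute_spike_stat(model, stat, chunk=10_000):
    """Evaluate a per-chunk spike statistic over all spikes and concatenate.

    `stat(spike_start, spike_stop)` must return an array whose first axis
    has spike_stop - spike_start elements, as described above.
    """
    out = []
    for start in range(0, model.n_spikes, chunk):
        stop = min(start + chunk, model.n_spikes)
        out.append(np.asarray(stat(start, stop)))
    # One big array with the stat for every spike, ready to save to disk.
    return np.concatenate(out, axis=0)
```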

@nippoo
Contributor

nippoo commented Jun 18, 2015

OK - so I'm not sure how useful this will be, in reality. The concept of an amplitude discriminator at a fixed time offset seems nice in theory, but isn't this the same goal that the PCs are trying to achieve? If you're finding that the features don't discriminate adequately between waveforms, perhaps try increasing the number of PCs per waveform from 3 to 5 or something?

Arbitrary user-defined amplitude splitting is always going to be REALLY slow (aka cups-of-coffee slow) since you're effectively bypassing PCA and going back to the waveforms. We should really avoid any clustering method based on the raw waveforms, since IIRC you should be able to get as much information as you need out of the features. (Correct me if I'm wrong)...

I'm not entirely confident of the maths, but @kdharris101 or @shabnamkadir might have some ideas to suggest about the best way of going around this?

@nsteinme
Contributor

The problem arises when some neuron has a really unusual waveform, and this
waveform shape is not represented well at all in the PCs. Typically, for
instance, this could be some kind of brief up-tick before the main negative
deflection. Then the PCs won't have it, but you can see with your eyes that
you could easily pick out the spikes from this neuron just by that one
sample. What if it were just computed for the currently selected clusters?

And adding more PCs isn't a good solution: it will take a lot longer to do
the clustering and there's no guarantee those extra ones will get it,
particularly if it is a small number of spikes in question (since they
won't contribute much to the total variance).


@nippoo
Contributor

nippoo commented Jun 18, 2015

We could write a function quite easily to split a cluster in two based on amplitude above/below a certain point, clickable in the WaveformView. But I reckon it'd be something along the lines of:

  • some keyboard shortcut + click on point in WaveformView (maybe highlight spikes below/above in a certain colour)
  • press split
  • wait a minute (probably about 1ms per spike)
  • result
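The split step in that workflow is itself cheap once the waveform values are in memory; the slow part is fetching the waveforms. A sketch of the partition (all names here are hypothetical, not phy API):

```python
import numpy as np

def split_by_amplitude(waveforms, spike_ids, sample, channel, threshold):
    """Partition spikes by waveform value at one (sample, channel).

    waveforms: (n_spikes, n_samples, n_channels) array for the cluster.
    Returns (above_ids, below_ids): ids with value >= / < threshold,
    i.e. the two clusters produced by the split.
    """
    values = waveforms[:, sample, channel]
    above = values >= threshold
    return spike_ids[above], spike_ids[~above]
```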

@nippoo
Contributor

nippoo commented Jun 18, 2015

So for a cluster with 100,000 spikes it'd take about 1-2 minutes, as a rough guess.

@rossant
Contributor Author

rossant commented Jun 18, 2015

I feel like this is the sort of things that's best done from IPython rather than hard-coded in the GUI...

@nsteinme
Contributor

Well, I'll put it this way. This is a feature that you can do in Plexon Offline Sorter (software I used to use), and I used it there sometimes. When I showed KV to Marius Bauza (O'Keefe lab - you remember him) he was adamant that this feature was critical. Then Dan Denman (Allen Inst) recently told me he thought this was an important feature*. So that's why it occurred to me to bring it up. I think those people would be willing to pay 30 seconds for it (it won't be a cluster with 100k spikes or its waveform shape *would* have been picked up by the PCA). But Kenneth thinks it's kinda hack-y and a little dubious, which I don't totally disagree with, so I don't have a problem with letting it be a DIY-in-ipython feature, at least for now, pending further user demands. I'll try to send you a screenshot next time I see where this would be useful.

Nick

  • Denman: "i've always had good results with the time domain is well - if
    the spikes are well-aligned, choosing a value at a time point away from the
    amplitude [trough or somewhere in repolarization] provides good separation."


@nippoo
Contributor

nippoo commented Jun 18, 2015

Same. Unless it's something you'll end up doing regularly...? If this is more than a once-off thing and it'll actually be useful, there should be some way of clicking to select the discriminator's standard and visually confirm it's about right.

@rossant
Contributor Author

rossant commented Jun 18, 2015

anyway, this is something that might be partially doable with this suggestion -- do you still need it? If so, I'll open a new issue.

@nippoo
Contributor

nippoo commented Jun 18, 2015

I feel like that might be too general a solution to a specific problem - probably YAGNI for the moment, though I think I've come round to Nick's viewpoint that we probably need something to deal with this particular issue.

@rossant
Contributor Author

rossant commented Jun 18, 2015

well, I think there are many other examples when you'll want to compute custom per-spike data and plot it (like position of the mouse at the time of every spike etc.) no?

@nsteinme
Contributor

Well there's a big difference between per-spike data that is computed from
the waveforms, and that which is computed from just the spike times (like
your position example). So I think you'd handle that differently - you'd
already have a vector of x and y positions (small enough for RAM by far),
and you'd just index into it.


@rossant
Contributor Author

rossant commented Jun 19, 2015

@nsteinme I'm gonna do this. What about the following API:

FEATURE VISUAL (low-level, rare user access)

read-only props:
view.x_dimensions = n*n array of strings or tuples (eg 'time', (7, 1), etc) describing the dimension in the x axis of every subplot
view.y_dimensions = n*n array of strings or tuples

function to change the dimensions:
set_x_dimensions({(1, 1): 'time', (2, 3): 'myextrafeature'})


FEATURE VIEW MODEL (high-level, user-exposed)

default matrix:

time, depth      time, (x, 0)       time, (y, 0)       time, (z, 0)
time, (x', 0)    (x', 0), (x, 0)    (x', 1), (y, 0)    (x', 2), (z, 0)
time, (y', 0)    (y', 0), (x, 1)    (y', 1), (y, 1)    (y', 2), (z, 1)
time, (z', 0)    (z', 0), (x, 2)    (z', 1), (y, 2)    (z', 2), (z, 2)

read-only props:
vm.x_channels = [x', y', z']
vm.y_channels = [x, y, z]

API to change the channels
vm.set_x_channel(1, y'')  # change the channel in the 3rd row (0-based indexing, and the first row is always time!)

@nippoo
Contributor

nippoo commented Jun 19, 2015

I'm not 100% sure what this API is supposed to achieve - but if I understand right, you're suggesting we should be able to define an arbitrary feature as the amplitude at a certain point and calculate it for all spikes? If so, I think this isn't really a very good solution to the problem - this will be for one cluster and the time offset will probably be different every time you want to do it, and you'll want to discard this information as soon as the split is complete.

But maybe I'm understanding wrongly?

@kdharris101

This is something the very early spike sorting systems had (e.g. Michael Recce’s software).

At that time I was against it as it makes it easy to “carve out” things that look like clear spikes from pure MUA. But I guess it could be useful in a modern context.


@nsteinme
Contributor

There are a couple of things this is addressing (correct me if I'm wrong,
Cyrille):

  1. Allowing the user to choose what's shown in the features grid
  2. enabling the display of custom features (for example, the main one that
    the discussion started with is depth along the probe of each spike, which
    I believe we all think would be very useful. For another example, it would
    also allow the kind of amplitude-at-a-single-time-sample type feature that
    became the later part of the discussion.)

Looks good to me!

Nick


@rossant
Contributor Author

rossant commented Jun 19, 2015

@nsteinme yes you understood correctly; you can define any n_spikes-long array and plot it wherever you want

@rossant rossant added done and removed ready labels Jun 20, 2015
@nippoo nippoo removed the done label Jun 24, 2015