# Convert recording and sorting extractor data to TINT format

The Hussaini lab uses the proprietary TINT software from Axona to analyze extracellular electrophysiology data. We can already read various Axona data formats (`raw` data or `unit` data) into spikeinterface, perform preprocessing and spike sorting, and export the data to NWB. We also want to be able to export data to the TINT format.

The TINT format is essentially the same as the `unit` data (`.X` and `.pos` files), with the addition of `.cut` or `.clu` files. The latter two contain information about the spike sorted units.

The conversion can be facilitated by using the existing tools from the Hussaini lab, which [convert `.bin` data to `.X` and `.pos`](https://github.com/HussainiLab/BinConverter/blob/master/BinConverter/core/ConversionFunctions.py). Some of this code is only relevant for using the GUI, which did not work for me. I cleared out GUI code and ran a conversion from `.bin` to `.X` and `.pos` in this notebook: [explore_hussaini_tools.ipynb](https://github.com/sbuergers/hussaini-lab-to-nwb-notebooks/blob/master/explore_hussaini_tools.ipynb).

They also already wrote a [`write_cut()`](https://github.com/GeoffBarrett/gebaSpike/blob/967097ec28592182ef9783d2d391930e1c63ca58/gebaSpike/core/writeCut.py) function.

We can test our solutions by reading data with these [Hussaini lab tools](https://github.com/HussainiLab/BinConverter/blob/master/BinConverter/core/Tint_Matlab.py). 

<a id='index'></a>
## Index

* [Testing functions](#testing_functions)
* [Hussaini-lab functions](#hussaini-lab_functions)
* [Convert Recording Extractor to TINT](#Convert_recording_extractor_to_tint)
* [Convert Sorting Extractor to TINT](#Convert_sorting_extractor_to_tint)

In [1]:
import sys
import os
from pathlib import Path
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (16, 8)
plt.rcParams.update({'font.size':14})
%matplotlib inline

import spikeextractors as se

print(sys.version, sys.platform, sys.executable)

3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] linux /home/sbuergers/spikeinterface/spikeinterface_new_api/venv/bin/python


In [2]:
# Directories

dir_name = Path('/mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/sample_bin_to_tint_no_bin')
print('Input directory = ', dir_name)

save_dir = dir_name / 'conversion_to_tint'
save_dir.mkdir(parents=True, exist_ok=True)
print('Output directory = ', save_dir)

Input directory =  /mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/sample_bin_to_tint_no_bin
Output directory =  /mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/sample_bin_to_tint_no_bin/conversion_to_tint


In [3]:
# Read cached spikeextractors data

r_cache = se.load_extractor_from_pickle(os.path.join(dir_name, 'cached_unit_data_no_bin_preproc.pkl'))

In [4]:
# Read NWB recording data

nwb_dir = Path(dir_name, 'nwb')
recording_nwb = se.NwbRecordingExtractor(nwb_dir / 'axona_tutorial_re2.nwb')



In [5]:
# Read NWB sorting data

sorting_nwb = se.NwbSortingExtractor(nwb_dir / 'axona_se_MS4.nwb', sampling_frequency=48000)

In [6]:
# Show data types of different objects

print(type(r_cache))
print(type(recording_nwb))
print(type(sorting_nwb))

<class 'spikeextractors.extractors.bindatrecordingextractor.bindatrecordingextractor.BinDatRecordingExtractor'>
<class 'spikeextractors.extractors.nwbextractors.nwbextractors.NwbRecordingExtractor'>
<class 'spikeextractors.extractors.nwbextractors.nwbextractors.NwbSortingExtractor'>


<a id="testing_functions"></a>
## Testing functions
[back to index](#index)

As we start exporting to putative TINT format, we will want to check if we can read it back in.

In [7]:
from spikeextractors.extractors.axonaunitrecordingextractor import AxonaUnitRecordingExtractor
import os


def test_axonaunitrecordingextractor(filename):
    '''Reads UNIT data with AxonaUnitRecordingExtractor and
    performs some simple operations as a sanity check. 
    
    Parameters
    ----------
    filename : str or Path
        Full filename of `.set` file (could be any extension actually)
    '''
    re = AxonaUnitRecordingExtractor(filename=filename)
    
    # TEST AXONARECORDINGEXTRACTOR
    # Retrieve some simple recording information and print it
    recording = re
    print('Channel ids = {}'.format(recording.get_channel_ids()))
    print('Num. channels = {}'.format(len(recording.get_channel_ids())))
    print('Sampling frequency = {} Hz'.format(recording.get_sampling_frequency()))
    print('Num. timepoints = {}'.format(recording.get_num_frames()))
    print('Stdev. on third channel = {}'.format(np.std(recording.get_traces(channel_ids=2))))
    print('Location of third electrode = {}'.format(
        recording.get_channel_property(channel_id=2, property_name='location')))
    print('Channel groups = {}'.format(recording.get_channel_groups()))
    
    # TEST NEO_READER (axonaio)
    print(recording.neo_reader.header['signal_channels'])
    
    
def test_tetrode_files(filename):
    '''Reads UNIT data with AxonaUnitRecordingExtractor and
    performs some simple operations as a sanity check. 
    Will only test .X  and .set files (no .clu or .cut, no .pos).
    
    Parameters
    ----------
    filename : str or Path
        Full filename of `.set` file (could be any extension actually)
    '''
    test_axonaunitrecordingextractor(filename)

<a id="hussaini-lab_functions"></a>
## Hussaini-lab functions
[back to index](#index)

`gebaSpike` expects `.cut` or `.clu` files to exist already and lets the user modify them. Its functions may therefore be of limited use for creating `.cut` or `.clu` files from scratch.

In [54]:
# From 
# https://github.com/GeoffBarrett/gebaSpike/blob/967097ec28592182ef9783d2d391930e1c63ca58/gebaSpike/main.py

def save_function(self):
    """
    this method will save the .cut file
    :return:
    """
    if self.cut_filename.text() == default_filename:
        return

    save_filename = os.path.realpath(self.cut_filename.text())

    if os.path.exists(save_filename):
        self.choice = None
        self.LogError.signal.emit('OverwriteCut!%s' % save_filename)
        while self.choice is None:
            time.sleep(0.1)

        if self.choice != QtWidgets.QMessageBox.Yes:
            return

    if len(self.tetrode_data) == 0:
        return

    # organize the cut data
    n_spikes_expected = self.tetrode_data.shape[1]
    n_spikes = len(np.asarray([item for sublist in self.cell_indices.values() for item in sublist]))

    # check that with the manipulation of the spikes, that we still have the correct number of spikes
    if n_spikes != n_spikes_expected:
        self.choice = None
        self.LogError.signal.emit('cutSizeError')
        while self.choice is None:
            time.sleep(0.1)
        return

    # we will check if we are missing some of the spikes somehow. If we kept track of them, then the indices from
    # the spikes, when sorted, should produce an array from 0 -> N-1 spikes.
    if not np.array_equal(np.sort(np.asarray([item for sublist in self.cell_indices.values() for item in sublist])),
                      np.arange(len(self.cut_data_original))):
        self.choice = None
        self.LogError.signal.emit('cutIndexError')
        while self.choice is None:
            time.sleep(0.1)
        return

    cut_values = np.zeros(n_spikes)
    for cell, cell_indices in self.cell_indices.items():
        cut_values[cell_indices] = cell

    if '.clu.' in save_filename:
        # save the .clu filename
        write_clu(save_filename, cut_values)
        self.choice = None
        self.LogError.signal.emit('saveCompleteClu')
        while self.choice is None:
            time.sleep(0.1)
        self.actions_made = False

    else:
        # save the cut filename
        write_cut(save_filename, cut_values)
        self.choice = None
        self.LogError.signal.emit('saveComplete')
        while self.choice is None:
            time.sleep(0.1)
        self.actions_made = False

In [55]:
# From 
# https://github.com/GeoffBarrett/gebaSpike/blob/967097ec28592182ef9783d2d391930e1c63ca58/gebaSpike/core/writeCut.py

def write_cut(cut_filename, cut, basename=None):
    if basename is None:
        basename = os.path.basename(os.path.splitext(cut_filename)[0])

    unique_cells = np.unique(cut)

    if 0 not in unique_cells:
        # if it happens that there is no zero cell, add it anyways
        unique_cells = np.insert(unique_cells, 0, 0)  # object, index, value to insert

    n_clusters = len(np.unique(cut))
    n_spikes = len(cut)

    write_list = []  # the list of values to write

    tab = '    '  # the spaces didn't line up with my tab so I just created a string with enough spaces
    empty_space = '               '  # some of the empty spaces don't line up to x tabs

    # we add 1 to n_clusters because zero is the garbage cell that no one uses
    write_list.append('n_clusters: %d\n' % (n_clusters))
    write_list.append('n_channels: 4\n')
    write_list.append('n_params: 2\n')
    write_list.append('times_used_in_Vt:%s' % ((tab + '0') * 4 + '\n'))

    zero_string = (tab + '0') * 8 + '\n'

    for cell_i in np.arange(n_clusters):
        write_list.append(' cluster: %d center:%s' % (cell_i, zero_string))
        write_list.append('%smin:%s' % (empty_space, zero_string))
        write_list.append('%smax:%s' % (empty_space, zero_string))
    write_list.append('\nExact_cut_for: %s spikes: %d\n' % (basename, n_spikes))

    # now the cut file lists 25 values per row
    n_rows = int(np.floor(n_spikes / 25))  # number of full rows

    remaining = int(n_spikes - n_rows * 25)
    cut_string = ('%3u' * 25 + '\n') * n_rows + '%3u' * remaining

    write_list.append(cut_string % (tuple(cut)))

    with open(cut_filename, 'w') as f:
        f.writelines(write_list)

In [56]:
# From 
# https://github.com/GeoffBarrett/gebaSpike/blob/967097ec28592182ef9783d2d391930e1c63ca58/gebaSpike/core/writeCut.py

def write_clu(clu_filename, data):
    # the .clu files and the .cut files are different since the .clu files are the .cut files (with no manual sorting)
    # without the headers, and the values go from 1 -> N instead of 0 -> N, (1-based numbering instead of 0-based). Thus
    # we add 1 to the .cut data to get the .clu data

    data = np.asarray(data).astype(int)  # ensuring that the data is the integer data-type

    data += 1  # making the data 1-based instead of 0-based

    # calculating the number of clusters
    n_clust = len(np.unique(data))

    # ensuring that the cluster number is the 1st value
    data = np.concatenate(([n_clust], data))

    # saving the data as a column (delimter='\n') and integer format.
    np.savetxt(clu_filename, data, fmt='%d', delimiter='\n')

<a id="Convert_recording_extractor_to_tint"></a>
## Convert Recording Extractor to TINT
[back to index](#index)

Since writing to TINT means creating `.X` tetrode files, we throw away all information in between spikes. There is therefore no point in converting the fake continuous recording used for spike sorting to TINT at all. We really only want to export the spike sorting output!
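
To make the "information in between spikes" point concrete, here is a minimal sketch of cutting fixed-length windows around spike samples from a continuous trace. `extract_snippets` is a hypothetical helper, and the window of 10 samples before and 40 samples after each spike is an illustrative assumption, not something taken from the Axona specification.

```python
import numpy as np


def extract_snippets(traces, spike_samples, pre=10, post=40):
    '''Cut a fixed window around each spike (hypothetical helper).

    traces : array of shape (n_channels, n_samples)
    spike_samples : sample indices of detected spikes
    Returns an array of shape (n_spikes, n_channels, pre + post).
    The default window length is an assumption for illustration only.
    '''
    spike_samples = np.asarray(spike_samples)
    n_samples = traces.shape[1]
    # Drop spikes whose window would run past either end of the recording
    valid = (spike_samples >= pre) & (spike_samples + post <= n_samples)
    # One row of sample indices per retained spike
    idx = spike_samples[valid][:, None] + np.arange(-pre, post)[None, :]
    return traces[:, idx].transpose(1, 0, 2)


traces = np.arange(4 * 200).reshape(4, 200)  # toy 4-channel recording
snippets = extract_snippets(traces, [5, 50, 180])
kept = snippets.shape[0] * snippets.shape[2]
print(snippets.shape, 'keeps', kept, 'of', traces.shape[1], 'samples per channel')
```

Everything outside the retained windows is lost, which is why round-tripping a continuous recording through tetrode files is not meaningful.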

In [None]:
# Anything to do here?

<a id="Convert_sorting_extractor_to_tint"></a>
## Convert Sorting Extractor to TINT
[back to index](#index)

There are several points in the pipeline at which we might want to export to TINT. Ideally it should work for any `SortingExtractor` object!

In [43]:
print('Where do we load data from?\n\n', dir_name)

Where do we load data from?

 /mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/sample_bin_to_tint_no_bin


From a sorting extractor we can obtain a list of spike sample arrays, one per unit. We can convert this to the `.clu` or `.cut` type array, which holds one unit ID label per spike, ordered by spike time.
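
As a small worked example of that conversion, the sketch below uses two made-up spike trains (the sample indices are hypothetical) and labels each spike with the index of its unit before ordering by time.

```python
import numpy as np

# Hypothetical spike trains (sample indices) for two units on one tetrode
spike_train = [np.array([10, 40, 70]), np.array([20, 30, 90])]

# Label every spike with the index of the unit it came from
labels = np.concatenate([np.full(len(st), i) for i, st in enumerate(spike_train)])
times = np.concatenate(spike_train)

# Order the labels by spike time; a stable sort keeps ties in unit order
order = np.argsort(times, kind='stable')
unit_labels = labels[order]
print(unit_labels)  # [0 1 1 0 0 1]
```

The resulting vector has one entry per spike, so its length equals the total number of spikes on the tetrode.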


In [None]:
cut_filename = Path('/mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/Axona_Tint_1ms/20201004_Tint_1.cut')

basename = os.path.basename(os.path.splitext(cut_filename)[0])

print(basename)

In [219]:
filename = Path('/mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/Axona_Tint_1ms/20201004_Tint.set')
print(filename)

Path(str(filename.with_suffix('')) + '_{}'.format(1) + '.cut')

/mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/Axona_Tint_1ms/20201004_Tint.set


PosixPath('/mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/Axona_Tint_1ms/20201004_Tint_1.cut')

In [40]:
def convert_spike_train_to_label_array(spike_train):
    '''Takes a list of arrays, where each array is a series of
    sample points at which a spike occurred for a given unit
    (each list item is a unit). Converts to .cut array, i.e.
    orders spike samples from all units and labels each sample
    with the appropriate unit ID.
    
    Parameters
    ----------
    spike_train : List of np.arrays
        Output of `get_units_spike_train()` method of sorting extractor
        
    Return
    ------
    unit_labels_sorted : np.array
        Each entry is the unit ID corresponding to the spike sample that
        occurred at this ordinal position
    '''

    # Generate Index array (indexing the unit for a given spike sample)
    unit_labels = []
    for i, l in enumerate(spike_train):
        unit_labels.append(np.ones((len(l),), dtype=int) * i)
    
    # Flatten lists and sort them
    spike_train_flat = np.concatenate(spike_train).ravel()
    unit_labels_flat = np.concatenate(unit_labels).ravel()

    sort_index = np.argsort(spike_train_flat)

    unit_labels_sorted = unit_labels_flat[sort_index]

    return unit_labels_sorted

In [66]:
def write_to_cut_file(cut_filename, unit_labels):
    '''Write spike sorting output to .cut file.
    
    Parameters
    ----------
    cut_filename : str or Path
        Full filename of .cut file to write to. A given .cut file belongs
        to a given tetrode file. For example, for tetrode `my_file.1`, the
        corresponding cut_filename should be `my_file_1.cut`.
    unit_labels : np.array
        Vector of unit labels for each spike sample (ordered by time of
        occurrence)
        
    Example
    -------
    # Given a sortingextractor called sorting_nwb:
    spike_train = sorting_nwb.get_units_spike_train()
    unit_labels = convert_spike_train_to_label_array(spike_train)
    write_to_cut_file(cut_filename, unit_labels)
    
    ---
    Largely based on gebaSpike implementation by Geoff Barrett
    https://github.com/GeoffBarrett/gebaSpike
    '''

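    # Derive the basename from the output path instead of relying on a
    # global variable defined in another cell
    basename = os.path.basename(os.path.splitext(str(cut_filename))[0])

    n_clusters = len(np.unique(unit_labels))
    n_spikes = len(unit_labels)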

    write_list = []

    tab = '    '
    empty_space = '               '

    write_list.append('n_clusters: %d\n' % (n_clusters))
    write_list.append('n_channels: 4\n')
    write_list.append('n_params: 2\n')
    write_list.append('times_used_in_Vt:%s' % ((tab + '0') * 4 + '\n'))

    zero_string = (tab + '0') * 8 + '\n'

    for cell_i in np.arange(n_clusters):
        write_list.append(' cluster: %d center:%s' % (cell_i, zero_string))
        write_list.append('%smin:%s' % (empty_space, zero_string))
        write_list.append('%smax:%s' % (empty_space, zero_string))
    write_list.append('\nExact_cut_for: %s spikes: %d\n' % (basename, n_spikes))

    # The unit label array consists of 25 values per row in .cut file
    n_rows = int(np.floor(n_spikes / 25))
    remaining = int(n_spikes - n_rows * 25)

    cut_string = ('%3u' * 25 + '\n') * n_rows + '%3u' * remaining

    write_list.append(cut_string % (tuple(unit_labels)))

    with open(cut_filename, 'w') as f:
        f.writelines(write_list)
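
To check a written `.cut` file, the label block can be parsed back in. This is a minimal sketch, assuming the layout produced by the writer above (header lines, an `Exact_cut_for:` line, then the labels). `read_cut_labels` is a hypothetical helper, not part of gebaSpike or spikeextractors, and the example text is made up.

```python
def read_cut_labels(text):
    '''Return the list of unit labels from the contents of a .cut file
    (hypothetical helper for sanity checks).'''
    lines = text.splitlines()
    # The label block starts right after the 'Exact_cut_for:' line
    start = next(i for i, line in enumerate(lines)
                 if line.startswith('Exact_cut_for:'))
    n_spikes = int(lines[start].split('spikes:')[1])
    # Splitting on whitespace assumes labels below 100, because the
    # '%3u' format leaves no separator for three-digit values
    labels = [int(tok) for tok in ' '.join(lines[start + 1:]).split()]
    if len(labels) != n_spikes:
        raise ValueError('Expected {} labels, found {}'.format(n_spikes, len(labels)))
    return labels


example = (
    'n_clusters: 2\n'
    'n_channels: 4\n'
    'n_params: 2\n'
    '\nExact_cut_for: my_file_1 spikes: 6\n'
    '  0  1  1  0  0  1'
)
cut_labels = read_cut_labels(example)
```

Comparing `cut_labels` with the vector that was passed to the writer gives a quick round-trip test.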

In [207]:
def write_to_clu_file(clu_filename, unit_labels):
    ''' .clu files are pruned .cut files, containing only a long vector of unit
    labels, which are 1-indexed, instead of 0-indexed. In addition, the very first
    entry is the total number of units.
    
    Parameters
    ----------
    clu_filename : str or Path
        Full filename of .clu file to write to. A given .clu file belongs
        to a given tetrode file. For example, for tetrode `my_file.1`, the
        corresponding clu_filename should be `my_file_1.clu`.
    unit_labels : np.array
        Vector of unit labels for each spike sample (ordered by time of
        occurrence)
        
    ---
    Largely based on gebaSpike implementation by Geoff Barrett
    https://github.com/GeoffBarrett/gebaSpike
    '''
    unit_labels = np.asarray(unit_labels).astype(int)
    unit_labels += 1

    n_clust = len(np.unique(unit_labels))
    unit_labels = np.concatenate(([n_clust], unit_labels))

    np.savetxt(clu_filename, unit_labels, fmt='%d', delimiter='\n')
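
The `.clu` layout described in the docstring (unit count first, then 1-indexed labels) can be verified with a round trip through a temporary file. This is a sketch with made-up labels and a hypothetical file name; it rebuilds the array inline rather than calling the function above so that it runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np

unit_labels = np.array([0, 1, 1, 0, 2])  # hypothetical 0-indexed labels

# Same layout as described above: unit count, then 1-indexed labels
clu = np.concatenate(([len(np.unique(unit_labels))], unit_labels + 1))

with tempfile.TemporaryDirectory() as tmp:
    clu_path = Path(tmp) / 'example_1.clu'  # hypothetical file name
    np.savetxt(clu_path, clu, fmt='%d')
    read_back = np.loadtxt(clu_path, dtype=int)

# Undo the header and the 1-based offset
n_units, labels_back = read_back[0], read_back[1:] - 1
```

If the round trip worked, `n_units` equals the number of distinct labels and `labels_back` matches the original vector.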

In [164]:
def set_cut_filename_from_basename(filename, tetrode_id):
    '''Given a str or Path object, assume the last entry after a slash
    is a filename, strip any file suffix, add tetrode ID label, and
    .cut suffix to name.
    
    Parameters
    ----------
    filename : str or Path
    tetrode_id : int
    '''
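    # Strip only the final suffix, so dots elsewhere in the path are preserved
    base = Path(filename).with_suffix('')
    return base.with_name('{}_{}.cut'.format(base.name, tetrode_id))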
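
For illustration, the naming convention can be spelled out with `pathlib` alone. The `.set` path below is hypothetical, and the four tetrodes are an assumption made for this example.

```python
from pathlib import Path

set_file = Path('my_dir/my_file.set')  # hypothetical .set file
base = set_file.with_suffix('')        # my_dir/my_file

# One .cut file per tetrode, using 1-indexed tetrode numbers
cut_files = [base.with_name('{}_{}.cut'.format(base.name, t)) for t in range(1, 5)]
for cut_file in cut_files:
    print(cut_file.name)
```

Only the final suffix is removed, so directories or base names that contain dots are left untouched.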

In [210]:
def write_unit_labels_to_file(sorting_extractor, filename):
    '''Write spike sorting output to .cut and .clu files, separately
    for each tetrode.
    
    Parameters
    ----------
    sorting_extractor : spikeextractors.SortingExtractor
    filename : str or Path
        Full filename of .set file or base-filename (i.e. the part of the
        filename all Axona files have in common). A given .cut file belongs
        to a given tetrode file. For example, for tetrode `my_file.1`, the
        corresponding cut_filename should be `my_file_1.cut`. This will be
        set automatically given the base-filename or set file.
        
    TODO: Any reason one might want to only convert some tetrodes or some
    samples? Should those be parameters?
    '''
    tetrode_ids = sorting_extractor.get_units_property(property_name='group')
    tetrode_ids = np.array(tetrode_ids)
    
    unit_ids = np.array(sorting_extractor.get_unit_ids())
    
    for i in np.unique(tetrode_ids):
        
        print('Converting Tetrode {}'.format(i))

        spike_train = sorting_extractor.get_units_spike_train(unit_ids=unit_ids[tetrode_ids==i])
        unit_labels = convert_spike_train_to_label_array(spike_train)

        # We use Axona conventions for filenames (tetrodes are 1 indexed)
        cut_filename = set_cut_filename_from_basename(filename, i + 1)
        clu_filename = Path(str(cut_filename).replace('.cut', '.clu'))

        write_to_cut_file(cut_filename, unit_labels)
        write_to_clu_file(clu_filename, unit_labels)

In [215]:
# We also have sorting data exported in `.nwb` format

nwb_dir = Path(dir_name, 'nwb')
sorting_nwb = se.NwbSortingExtractor(nwb_dir / 'axona_se_MS4.nwb', sampling_frequency=48000)

print(type(sorting_nwb))

<class 'spikeextractors.extractors.nwbextractors.nwbextractors.NwbSortingExtractor'>


In [217]:
print('Sampling frequency:', sorting_nwb.get_sampling_frequency(), 'Hz')

Sampling frequency: 48000 Hz


In [218]:
# Convert all tetrodes from sorting extractor to cut files
write_unit_labels_to_file(sorting_nwb, filename)

Converting Tetrode 0
Converting Tetrode 1
Converting Tetrode 2
Converting Tetrode 3


In [220]:
def write_to_tetrode_file(sorting_extractor, save_dir):
    '''Given a sorting extractor object create .X (tetrode) files.
    
    Parameters
    ----------
    sorting_extractor : spikeextractors.SortingExtractor
    save_dir : str or Path
        Directory where to save the output
    '''
    # TODO ...
    pass

In [232]:
def write_to_tint(sorting_extractor, filename):
    '''Given a sorting extractor object, write appropriate data
    to TINT format (from Axona). Will therefore create .X (tetrode),
    .cut and .clu (spike sorting information) files.
    
    Parameters
    ----------
    sorting_extractor : spikeextractors.SortingExtractor
    filename : str or Path
        Full path and base filename shared by all output files 
        (e.g. my_dir/my_file will yield
        my_dir/my_file.1, my_dir/my_file.2, ..., 
        my_dir/my_file_1.cut, my_dir/my_file_2.cut, ...,
        my_dir/my_file_1.clu, my_dir/my_file_2.clu, ...)
        If a file extension is given, it is simply ignored.
        
    Notes
    -----
    For details about the .X file format see:
    http://space-memory-navigation.org/DacqUSBFileFormats.pdf
    '''
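    # Accept str as well as Path, and make sure the output directory exists
    filename = Path(filename)
    filename.parent.absolute().mkdir(parents=True, exist_ok=True)
    
    # writes to .X files for each tetrode
    # TODO...
    # write_to_tetrode_file expects a directory, so pass the parent folder
    write_to_tetrode_file(sorting_extractor, filename.parent)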
    
    # writes to .cut and .clu files for each tetrode
    write_unit_labels_to_file(sorting_extractor, filename)
    
    # Position data?
    # TODO ...

In [233]:
filename = Path(
    '/mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/Axona_Tint_1ms/spikeextractors_to_tint/20201004_Tint'
)

In [234]:
write_to_tint(sorting_nwb, filename)

Converting Tetrode 0
Converting Tetrode 1
Converting Tetrode 2
Converting Tetrode 3
