Link to more in-depth descriptions here: https://blackrockneurotech.com/research/wp-content/ifu/LB-0023-7.00_NEV_File_Format.pdf

**Nev obj structure**

It has three main attributes/functions: basic_header, getdata(), extended_headers. The documentation mentions others, such as processroicommments(), but the files we have do not have those.

nevobj.basic_header returns a dictionary with the following keys/values:
- **key**: 'FileTypeID', **value**: str (e.g. 'NEURALEV')
- **key**: 'FileSpec', **value**: str with float (e.g. '2.3')
- **key**: 'AddFlags', **value**: int (likely bool 1/0)
- **key**: 'BytesInHeader', **value**: int
- **key**: 'BytesInDataPackets', **value**: int
- **key**: 'TimeStampResolution', **value**: int
- **key**: 'SampleTimeResolution', **value**: int
- **key**: 'TimeOrigin', **value**: datetime.datetime
- **key**: 'CreatingApplication', **value**: str (e.g. 'File Dialog v7.0.4')
- **key**: 'Comment', **value**: str
- **key**: 'NumExtendedHeaders', **value**: int

nevobj.getdata() takes a long time and returns a dictionary with the following structure:
- **key**: spike_events, **value**: dict
    - **key**: TimeStamps, **value**: list
        - A list of times (integers) in ascending order at which spikes occur (**note**: this is NOT the same as the total duration of the session)
        - The length of this list should, in theory, equal the number of spikes (aka threshold crossings)
    - **key**: Unit, **value**: list
        - A list whose length is equal to Timestamps. In all files I've opened, this list contains only 0s
    - **key**: Channel, **value**: list
        - A list that contains the channel number that corresponds to the spike event time in TimeStamps
        - If the first entry of Timestamps is 30 and the first entry of Channel is 2, that means that a spike occurred in channel 2 at time 30
    - **key**: Waveforms, ****value****: array
        - Array shape: num timestamps x num channels
        - The columns of this array contain the activity of the corresponding channel
- **key**: digital_events, **value**: dict (**note: not all files have digital_events - FR and Cage usually do not**)
    - **key**: Timestamps, **value**: list
        - Not the same timestamps as spike_events. Different lengths and values.
    - **key**: InsertionReason, **value**: list
        - A list whose length is equal to Timestamps. In all files I've opened, this list contains only 1s
    - **key**: UnparsedData, **value**: list
        - A list with integers that encode various task- and trial-related information. Details can be found here:  https://github.com/limblab/Behavior/blob/master/src/target/words.h
        
nevobj.extended_headers returns a list of dicts; the number of dicts equals 'NumExtendedHeaders' in nevobj.basic_headers. 3 dicts in a row correspond to one electrode and contain the following info:
- dict1
    - **key**: 'PacketID':, **value**: str (e.g. 'NEUEVWAV')
    - **key**: 'ElectrodeID':, **value**:  int
    - **key**: 'PhysicalConnector':, **value**: int
    - **key**: 'ConnectorPin':, **value**:  int,
    - **key**: 'DigitizationFactor':, **value**:  int,
    - **key**: 'EnergyThreshold':, **value**:  int,
    - **key**: 'HighThreshold':, **value**:  int,
    - **key**: 'LowThreshold':, **value**:  int,
    - **key**: 'NumSortedUnits':, **value**:  int,
    - **key**: 'BytesPerWaveform':, **value**:  int,
    - **key**: 'SpikeWidthSamples':, **value**:  int,
    - **key**: 'EmptyBytes':, **value**: bytes
- dict2: electrode number
    - **key**: PacketID:, **value**: str 'NEUEVLBL'
    - **key**: ElectrodeID:, **value**: int (e.g. 1, should correspond to dict1)
    - **key**: Label:, **value**: str (the actual electrode number - e.g. 'elec78')
    - **key**: EmptyBytes:, **value**:  bytes (e.g. b'\x00\x00\x00\x00\x00\x00')
- dict3: filter information (type, frequency)
    - **key**: PacketID, **value**: str (e.g. 'NEUEVFLT')
    - **key**: ElectrodeID, **value**: int (e.g. 1, should corresond to dicts 1,2)
    - **key**: HighFreqCorner, **value**: str with float (e.g. '250.0 Hz')
    - **key**: HighFreqOrder, **value**: int (e.g. 4),
    - **key**: HighFreqType, **value**: str (e.g. 'butterworth')
    - **key**: LowFreqCorner, **value**: str with float (e.g. '7500.0 Hz')
    - **key**: LowFreqOrder, **value**: int (e.g. 3)
    - **key**: LowFreqType, **value**: str (e.g. 'butterworth')
    - **key**: EmptyBytes, **value**: bytes
        
**Nsx obj structure**

nsxobj.basic_header produces a dictionary with the following structure: 
- **key**: 'FileTypeID', **value**: str (e.g.  'NEURALCD')
- **key**: 'FileSpec', **value**: str with float (e.g.  '2.3')
- **key**: 'BytesInHeader', **value**: int (e.g.  8762)
- **key**: 'Label', **value**: str representing a rate (e.g.  '2 kS/s')
- **key**: 'Comment', **value**: str 
- **key**: 'Period', **value**: int (e.g.  15)
- **key**: 'TimeStampResolution', **value**: int (e.g.  30000)
- **key**: 'TimeOrigin', **value**:(e.g.  datetime.datetime(2023, 2, 14, 21, 41, 36, 14000))
- **key**: 'ChannelCount', **value**: int (e.g.  128)

nsxobj.getdata() takes a few seconds a dict with the following structure:
- **key**: elec_ids, **value**: list
    - List of electrode id's
- **key**: start_time_s, **value**: float
    - Usually 0.
- **key**: data_time_s, **value**: str
    - In all of the files I've opened, this string has been 'all'.
- **key**: downsample, **value**: int
    - Likely boolean - 1s or 0s. In all the files I've opened, it's been 1.
- **key**: data, **value**: list
    - A list containing a numpy array. This contains emg, force, and other data.
- **key**: data_headers, **value**: list whose only element is a dict
    - **key**: Timestamp, **value**: int (always 0 from what I've seen)
    - **key**: NumDataPoints, **value**: int (with number of total data points - should equal time length of file times sampling rate)
- **key**: ExtendedHeaderIndices, **value**: list
    - A list containing the unique electrode ids - i.e. the length of this list equals the number of electrodes. Values usually between 1 - num electrodes.

nsxobj.extended_headers produces a list of dicts, one for each electrode. The dicts contain the following info:
- **key**: 'Type', **value**: str (e.g. 'CC')
- **key**: 'ElectrodeID', **value**: int (e.g. 1)
- **key**: 'ElectrodeLabel', **value**: str (e.g. 'elec109')
- **key**: 'PhysicalConnector', **value**: int (e.g. 1)
- **key**: 'ConnectorPin', **value**: int (e.g. 1)
- **key**: 'MinDigitalValue', **value**: int (e.g. -32764)
- **key**: 'MaxDigitalValue', **value**: int (e.g. 32764)
- **key**: 'MinAnalogValue', **value**: int (e.g. -8191)
- **key**: 'MaxAnalogValue', **value**: int (e.g. 8191)
- **key**: 'Units', **value**: str (e.g. 'uV')
- **key**: 'HighFreqCorner', **value**: str with float in hz (e.g. '0.3 Hz')
- **key**: 'HighFreqOrder', **value**: int (e.g. 1)
- **key**: 'HighFreqType', **value**: str (e.g. 'butterworth')
- **key**: 'LowFreqCorner', **value**: str with float in hz (e.g. '250.0 Hz')
- **key**: 'LowFreqOrder', **value**: int (e.g. 4)
- **key**: 'LowFreqType', **value**: str (e.g. 'butterworth')

EMG Files

- sessions_key - sessions table
- paper_key  - ignore
- filename - nsx file
- file_id - auto increments
- **rec_system** - 'Cerebus'. If there is emg, 'Cerebus, Jim Baker's wired'. Talk to xuan about matching up .rhd files with .nev files. rhd files are 'DSPW'
- sampling_rate - based on nsx file type. for .rhd, check headers
- **emg_quality** - don't worry for now
- **emg_notes** - don't worry for now
- **muscle_list - sometimes in nsx extended headers, but not always - seems to be only sometimes for FR files - e.g. NsxFileObj.extended_headers[hdr_idx]['ElectrodeLabel'] returns EMG_FCU1**. Plots that look like emgs oscillate around 0. Check to see if any files have emg-looking plots that are not properly labeled with muscle names

Some dates have nsx and ccf files, but no nev files. Are we interested in these? If so, how do we account for sessions key, given that data for the sessions table is pulled from nev files?

In [1]:
# Import dependencies
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import os
from os import path, system
import sys
import glob
import time
# from PyQt5.QtWidgets import QFileDialog

# brpylib is the module that contains functions/classes that allow us to open and extract data from .nev and .nsx files
sys.path.insert(0, r'C:\Users\aqy3283\Desktop\proc-kevin')
sys.path.insert(0, r'C:\Users\aqy3283\Desktop\xds\xds_python')
import load_intan_rhd_format
from Blackrock_Python_Utilities import brpylib

In [2]:
sampling_rate_dict = {'ns1': 500,'ns2': 1000,'ns3': 2000,'ns4': 10000,'ns5': 30000,'ns6': 30000}

In [4]:
base_dir = 'R:\Basic_Sciences\Phys\L_MillerLab\data'
cerebus_data_dict = {}
for monkey in sorted(os.listdir(base_dir)):
#     if monkey not in ['.DS_Store','archive','Backed_up_data', 'Behavior','chewie-delete','CompiledCOFiles','DeepLabCutVids','DLC_models','DPZ','FSMIT_DataRestore_03172021', 'Han_13B1_target','IMU','Jarvis','Jango_redo','Jango_target_redo','LoadCell','Mihili_12A3_target','OldCerebusTest','Rats','Rats_target','Test data','Thumbs.db']:
    if monkey == 'Pop_18E3':
        monkey_path = os.path.join(base_dir, monkey)
        x = [i for i in os.listdir(monkey_path) if 'cerebus' in i.lower()]
        if len(x) != 0:
            cerebus_path = os.path.join(monkey_path, x[0])
        else:
            cerebus_path = monkey_path
        print(cerebus_path)
        date_list = glob.glob(f"{cerebus_path}/*")
        #date_list_trunc = [i.split('\\')[-1] for i in date_list if '.' not in i]
        date_dict = {date: {} for date in date_list}
        cerebus_data_dict[monkey] = date_dict 

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData


In [10]:
cerebus_data_dict

{'Pop_18E3': {'R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20190327': {'nev_list': ['R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20190327\\20190327_Pop_FreeReaching001.nev'],
   'nsx_list': [],
   'ccf_list': ['R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20190327\\20190327_Pop_FreeReaching001.ccf'],
   'rhd_list': []},
  'R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20190329': {'nev_list': ['R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20190329\\20190329_Pop__PG_001.nev'],
   'nsx_list': ['R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20190329\\20190329_Pop__PG_001.ns3'],
   'ccf_list': ['R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20190329\\20190329_Pop__PG_001.ccf'],
   'rhd_list': []},
  'R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20190402': {'nev_list': ['R:\\Basic_Sciences\\Phys\\L_MillerLab\\da

In [6]:
for monkey in cerebus_data_dict:
    for date in cerebus_data_dict[monkey]:
        nev_list = glob.glob(f"{date}\\*.nev")
        nsx_list = glob.glob(f"{date}\\*.ns*")
        ccf_list = glob.glob(f"{date}\\*.ccf")
        rhd_list = glob.glob(f"{date}\\*.rhd")
        cerebus_data_dict[monkey][date]['nev_list'] = nev_list
        cerebus_data_dict[monkey][date]['nsx_list'] = nsx_list
        cerebus_data_dict[monkey][date]['ccf_list'] = ccf_list
        cerebus_data_dict[monkey][date]['rhd_list'] = rhd_list

In [15]:
samp_rate = load_intan_rhd_format.get_sampling_rate(cerebus_data_dict['Pop_18E3']['R:\\Basic_Sciences\\Phys\\L_MillerLab\\data\\Pop_18E3\\CerebusData\\20200821']['rhd_list'][0])


Reading Intan Technologies RHD2000 Data File, Version 1.5

n signal groups 7


In [16]:
samp_rate

2011.060546875

In [34]:
emg_dict = {'filename': [], 'rec_system': [], 'sampling_rate': [], 'muscle_list': []}

In [20]:
emg_dict = {'filename': [], 'rec_system': [], 'sampling_rate': [], 'muscle_list': []}

count_dict = {'neither': 0, 'rhd': 0, 'nsx': 0, 'both': 0}
both_files_count_dict = {'same': 0, 'nsx_more': 0, 'rhd_more': 0}
nsx_more_dates = []

for monkey in cerebus_data_dict:
    for date in cerebus_data_dict[monkey]:
        if (len(cerebus_data_dict[monkey][date]['rhd_list']) == 0) and (len(cerebus_data_dict[monkey][date]['nsx_list']) == 0):
            count_dict['neither']+=1
#         elif (len(cerebus_data_dict[monkey][date]['rhd_list']) == 0) and (len(cerebus_data_dict[monkey][date]['nsx_list']) != 0):
#             count_dict['nsx']+=1
#         elif (len(cerebus_data_dict[monkey][date]['rhd_list']) != 0) and (len(cerebus_data_dict[monkey][date]['nsx_list']) == 0):
#             count_dict['rhd']+=1
#         else:
#             count_dict['both']+=1
#             if len(cerebus_data_dict[monkey][date]['rhd_list']) > len(cerebus_data_dict[monkey][date]['nsx_list']):
#                 both_files_count_dict['rhd_more'] += 1 
#             elif len(cerebus_data_dict[monkey][date]['rhd_list']) < len(cerebus_data_dict[monkey][date]['nsx_list']): 
#                 both_files_count_dict['nsx_more'] += 1 
#                 [i for i in rhd_list]
#                 nsx_more_dates.append((date, len(cerebus_data_dict[monkey][date]['rhd_list']), len(cerebus_data_dict[monkey][date]['nsx_list']), ))
#             else:
#                 both_files_count_dict['same'] += 1 
        elif len(cerebus_data_dict[monkey][date]['rhd_list']) > 0:
            for rhd_file in cerebus_data_dict[monkey][date]['rhd_list']:
                filename = rhd_file.split('\\')[-1]
                sampling_rate = get_sampling_rate(rhd_file)
                
                emg_dict['filename'].append(filename)
                emg_dict['sampling_rate'].append(sampling_rate)
                emg_dict['rec_system'].append('DSPW')
        else:
            for nsx_file in cerebus_data_dict[monkey][date]['nsx_list']:
                filename = nsx_file.split('\\')[-1]
                file_ext = nsx_file[-3:]
                sampling_rate = sampling_rate_dict[file_ext]
                
                emg_dict['filename'].append(filename)
                emg_dict['sampling_rate'].append(sampling_rate)
                emg_dict['rec_system'].append('Cerebus, Jim Bakers Wired')
                
        muscle_labels_lst = []
        for nsx_file in cerebus_data_dict[monkey][date]['nsx_list']:
            nsxobj = brpylib.NsxFile(nsx_file)
            output_nsx = nsxobj.getdata()
            if output_nsx == 0:
                continue
            for plot_chan in output_nsx['elec_ids']:
                ch_idx  = output_nsx['elec_ids'].index(plot_chan)
                hdr_idx = output_nsx['ExtendedHeaderIndices'][ch_idx]
                label = nsxobj.extended_headers[hdr_idx]['ElectrodeLabel']
                muscle_labels_lst.append(label)
                muscle_labels_lst.append(',')
        full_labels_str = ''.join(muscle_labels_lst)[:-1]
        
        emg_dict['muscle_list'].append(full_labels_str)


R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190329\20190329_Pop__PG_001.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190403\20190403_Pop_KG_001.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190403\20190403_Pop_KG_002.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190409\20190409_Pop_PG_learning_001.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190423\20190423_Pop_key_001.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190424\20190424_Pop_key_001.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190424\20190424_Pop_key_002.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190429\20190429_Pop_wm_horiz_001.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusData\20190430\20190430_Pop_WM_horiz001.ns3 opened

R:\Basic_Sciences\Phys\L_MillerLab\data\Pop_18E3\CerebusD

KeyboardInterrupt: 

In [18]:
count_dict

{'neither': 15, 'rhd': 3, 'nsx': 132, 'both': 43}

In [19]:
both_files_count_dict

{'same': 17, 'nsx_more': 26, 'rhd_more': 0}

In [55]:
nsx_more_dates

[('Z:\\data\\Pop_18E3\\CerebusData\\20200313', 4, 8),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20200320', 8, 16),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20200626', 8, 18),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20200717', 8, 17),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20200724', 7, 16),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20200731', 4, 13),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20200821', 8, 13),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20200904', 9, 15),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20200925', 2, 3),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20201005', 7, 14),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20201020', 12, 21),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20210602', 11, 19),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20210611', 3, 6),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20210616', 8, 13),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20210618', 9, 25),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20210625', 8, 14),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20210630', 6, 11),
 ('Z:\\data\\Pop_18E3\\CerebusData\\20210702', 6, 11),
 ('Z:\\data

In [21]:
emg_dict

{'filename': ['20190329_Pop__PG_001.ns3',
  '20190403_Pop_KG_001.ns3',
  '20190403_Pop_KG_002.ns3',
  '20190409_Pop_PG_learning_001.ns3',
  '20190423_Pop_key_001.ns3',
  '20190424_Pop_key_001.ns3',
  '20190424_Pop_key_002.ns3',
  '20190429_Pop_wm_horiz_001.ns3',
  '20190430_Pop_WM_horiz001.ns3',
  '20190430_Pop_WM_horiz002.ns3',
  '20190503_Pop_wm_horiz_001.ns3',
  '20190503_Pop_wm_horiz_002.ns3',
  '20190506_Pop_wm_horiz001.ns3',
  '20190506_Pop_wm_horiz002.ns3',
  '20190603_Pop_horizWM_001.ns3',
  '20190603_Pop_MG_key_002.ns3',
  '20190604_Pop_MG_key_001.ns3',
  '20190604_Pop_MG_key_002.ns3',
  '20190604_Pop_MG_key_003.ns3',
  'pop_20190604.ns3',
  '20190605_Pop_horiz_WM_001.ns3',
  '20190605_Pop_horiz_WM_002.ns3',
  '20190605_Pop_horiz_WM_003.ns3',
  '20190606_Pop_freereach_004.ns3',
  '20190606_Pop_horiz_WM_001.ns3',
  '20190606_Pop_horiz_WM_002.ns3',
  '20190606_Pop_horiz_WM_003.ns3',
  '20190607_Pop_horiz_WM_001.ns3',
  '20190607_Pop_horiz_WM_002.ns3',
  '20190607_Pop_horiz_WM_00

In [39]:
for key in emg_dict.keys():
    print(key, len(emg_dict[key]))

filename 704
rec_system 704
sampling_rate 704
muscle_list 178


In [37]:
df = pd.DataFrame(emg_dict)

ValueError: All arrays must be of the same length

In [59]:
ns3file = 'Z:\\data\\Pop_18E3\\CerebusData\\20200626\\20200626_Pop_Cage_001.ns3'
ns6file = 'Z:\\data\\Pop_18E3\\CerebusData\\20200626\\20200626_Pop_Cage_001.ns3'

ns3obj = brpylib.NsxFile(ns3file)
output_ns3 = nsxobj.getdata()
ns6obj = brpylib.NsxFile(ns6file)
output_ns6 = nsxobj.getdata()


Z:\data\Pop_18E3\CerebusData\20200626\20200626_Pop_Cage_001.ns3 opened

Output data requested is larger than 1 GB, skipping

Z:\data\Pop_18E3\CerebusData\20200626\20200626_Pop_Cage_001.ns3 opened

Output data requested is larger than 1 GB, skipping


In [63]:
ns6_data_dict = {}
base_dir = 'Z:\data'
for monkey in sorted(os.listdir(base_dir)):
#     if monkey not in ['.DS_Store','archive','Backed_up_data', 'Behavior','chewie-delete','CompiledCOFiles','DeepLabCutVids','DLC_models','DPZ','FSMIT_DataRestore_03172021', 'Han_13B1_target','IMU','Jarvis','Jango_redo','Jango_target_redo','LoadCell','Mihili_12A3_target','OldCerebusTest','Rats','Rats_target','Test data','Thumbs.db']:
    if (monkey == 'Pancake_20K3') or (monkey == 'Pop_18E3'):
        print(monkey)
        ns6_data_dict[monkey] = {}
        monkey_path = os.path.join(base_dir, monkey)
        x = [i for i in os.listdir(monkey_path) if 'cerebus' in i.lower()]
        if len(x) != 0:
            cerebus_path = os.path.join(monkey_path, x[0])
        else:
            cerebus_path = monkey_path
        print(cerebus_path)
        ns6_list = glob.glob(f"{cerebus_path}\*\*.ns6")
        print(len(ns6_list))
        ns6_data_dict[monkey]['ns6_list'] = ns6_list

Pancake_20K3
Z:\data\Pancake_20K3\Cerebus_data
14
Pop_18E3
Z:\data\Pop_18E3\CerebusData
211
