
Add boxy files for testing #67

Merged · 4 commits · Jul 24, 2020

Conversation

@kuziekj (Contributor) commented on Jul 8, 2020

This adds testing files for the following PR: mne-tools/mne-python#7717

Total size is about 3.3 MB.

@larsoner (Member) commented on Jul 8, 2020

@rob-luke can you look?

@rob-luke (Member) commented on Jul 8, 2020

Can you please describe the process used to record this data? How long is the recording? Are there triggers in it? Do you have an image of the optode positions? Do you have a figure (or figures) summarising the settings used in the recording device? What version of the hardware and software did you use? Was it a recording on a person or a dummy? Etc.

All this information is useful for validating that the readers are operating correctly. It also helps when people report bugs in the future, because then we know whether those edge cases were covered by your testing data.

Please see this PR (and the follow-up PRs) for a suggestion of what is useful to include:

#51

#56

@kuziekj (Contributor, Author) commented on Jul 10, 2020

Hi @rob-luke,

Sorry for keeping you waiting, and I apologise for the very sparse details when I opened the PR. @kylemath and I are waiting to hear back from a colleague regarding some of the details you requested.

I'll be sure to keep you updated, hopefully we hear back soon. Thanks for your patience.

Take care.

@rob-luke (Member) commented

Sorry for keeping you waiting, and I apologise for the very sparse details

No need to apologise, mate; there is no rush on these things. Just shoot through the details when you get the chance. Then we should be able to get the reader merged soon; it's all coming together nicely.

@kuziekj (Contributor, Author) commented on Jul 17, 2020

All right, here are some details on the recordings and how everything was made.

This recording is of a single participant and is a truncated version of a longer experiment. The participant was presented with a checkerboard pattern that alternated at 1 Hz. The data is separated into four files (two blocks and two different channel montages), with each file containing ~3 seconds of data. Block and montage are indicated in the file name (e.g. 1anc071a.001 = montage a, block 1; 1anc071b.002 = montage b, block 2). Each montage contains data for ten sources and eight detectors.

Data was collected using an ISS Imagent system (specs can be found here: http://www.iss.com/biomedical/instruments/imagent.html) using the BOXY recording software (version 0.40). Below is an image of the settings used for recording, which can also be found at the beginning of each data file:

[screenshot: BOXY recording settings]

There are triggers, although these are part of a separate file for each block and channel montage, and are located here (MNE-testing-data\BOXY\boxy_short_recording\evt). Each event file contains five triggers, a combination of 1’s and 2’s, along with the onset of each trigger, in sample points. Triggers can also be sent using the ‘digaux’ port on the Imagent hardware and would appear directly in the data recording, although this is not the case for these short recordings. The triggers don’t mean anything for these shorter recordings, since they were placed arbitrarily, but for the complete experiment they indicate when the checkerboard pattern was reversed. The first few reversals are labelled as ‘2’ in the event that they should be treated differently from the rest of the reversals.
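
For illustration, here is a minimal sketch of how onset/trigger pairs like these could be turned into an MNE-style events array. The two-column text layout and the file name are assumptions for this example only, not the actual .evt specification:

    # Sketch only: build an MNE-style events array from (onset, trigger) pairs.
    # The two-column layout and the file name below are assumed for illustration.
    import numpy as np

    def read_evt_pairs(fname):
        pairs = np.loadtxt(fname, dtype=int, ndmin=2)  # columns: onset, trigger
        events = np.zeros((len(pairs), 3), dtype=int)
        events[:, 0] = pairs[:, 0]  # trigger onsets, in sample points
        events[:, 2] = pairs[:, 1]  # trigger values (1 or 2)
        return events

    # events = read_evt_pairs('example_block.evt')  # hypothetical file name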

Comparison files were created using the p_pod software (https://uofi.app.box.com/s/a2135zo5jludjq8y6p5qljgx1cx6ks44). We preprocessed the data using version 10.6.3, along with the following settings, which can be changed by clicking on the ‘Parameters’ tab on the main p_pod GUI:

[screenshots: p_pod 'Parameters' settings]

For preprocessing, the data was only normalised and epoched/averaged, so no filtering or quality controls were applied. We modified several p_pod files to save the raw, epoched, and evoked activity at various points to better compare with the MNE processed data. The following files were modified with the following code:

b_norm.m
(line 77)
for i_ch=1:boxy_hdr.n_chans

(lines 163-167)

    % save some data for python comparison
    if ~exist(fullfile(pwd, 'python_compare_files'), 'dir')
       mkdir(fullfile(pwd, 'python_compare_files'))
    end
    save(fullfile(pwd, 'python_compare_files', [file_name 'raw_data.mat']),'ac','dc','ph')

e_avg.m
(lines 90-93)

    if ~exist(fullfile(pwd, 'python_compare_files'), 'dir')
       mkdir(fullfile(pwd, 'python_compare_files'))
    end
    save(fullfile(pwd, 'python_compare_files', [file_name 'evoke_data.mat']),'aac','adc','aph')

e_sum_qual
(lines 178-180)

        dc_epochs = cell(n_trials,1);
        ac_epochs = cell(n_trials,1);
        ph_epochs = cell(n_trials,1);

(lines 354-356)

                dc_epochs{i_trial} = dc(start:stop,good_chs);
                ac_epochs{i_trial} = ac(start:stop,good_chs);
                ph_epochs{i_trial} = ph(start:stop,good_chs);

(lines 373-377)

        % save some data for python comparison
        if ~exist(fullfile(pwd, 'python_compare_files'), 'dir')
           mkdir(fullfile(pwd, 'python_compare_files'))
        end
        save(fullfile(pwd, 'python_compare_files', [file_name 'epoch_data.mat']),'ac_epochs','dc_epochs','ph_epochs')

The files p_pod now saves contain raw, epoched, and evoked data for each montage and block, located here (MNE-testing-data\BOXY\boxy_short_recording\boxy_p_pod_files).
Also included are a .mtg file, which contains labels, wavelengths, and modulation frequencies for each source/detector combination, and an .elp file, which contains coordinates for each source/detector and fiducial.
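
For anyone who wants to inspect those .mat files from Python, here is a minimal sketch using scipy; the file path is a placeholder, and the variable names ac, dc, and ph are the ones saved by the modified b_norm.m above:

    # Sketch only: load the p_pod raw output for inspection in Python.
    from scipy.io import loadmat

    mat = loadmat('python_compare_files/example_raw_data.mat')  # placeholder path
    ac, dc, ph = mat['ac'], mat['dc'], mat['ph']  # variables saved by b_norm.m
    print(ac.shape, dc.shape, ph.shape)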

To visualise the digitised channel coordinates, we used MATLAB and the “Optimized Coregistration Package” (OCP; version 1.0), downloaded from the same link used for p_pod. The channel coordinates from OCP and MNE are shown below. For the OCP plots, red dots indicate the raw channel locations and blue dots indicate their locations on the scalp.

[screenshots: OCP and MNE channel coordinate plots]

I hope this helps. Please let me know if there's anything else you need.

@kylemath, feel free to add anything else I may have missed.

Take care.

@rob-luke (Member) commented

Thanks for this additional information. I have a few follow-up questions below. But to summarise, I would like to confirm that this PR contains a single measurement exactly as it was exported from the vendor software (we don't want any custom post-processing or additional files; we want a recording in the format that any lab with the same hardware would get). And one additional file to verify the content of the raw data (a .mat file created with other analysis software, as you included already).

Below is an image of the settings used for recording

This seems like very few parameters. Does the ISS system only have a dozen possible settings? This is just surprising, as the NIRX system has half a dozen tabs, each with multiple parameters, and I would have expected more from an FD NIRS system as it's fundamentally more complex.

The data is separated into four files (two blocks and two different channel montages), with each file containing ~3 seconds of data. Block and montage is indicated in the file name (eg. 1anc071a.001 = montage a, block 1; 1anc071b.002 = montage b, block 2).

Is this how the vendor software outputs a single measurement? Or have you taken multiple measurements? To me this reads as if you have taken four measurements, with two different montages and two different experiments (what is a block?). Or does the vendor software save single measurements in multiple 350 KB files?

Basically we want a copy of the data from a single measurement exactly as it was exported from the vendor software. Unless there is a specific reason to have multiple measurements (for example the file format changed when the vendor software was updated). If we need multiple measurements I would prefer they were stored in different directories to make this explicit.

There are triggers, although these are part of a separate file for each block and channel montage, and are located here (MNE-testing-data\BOXY\boxy_short_recording\evt). Each event file contains five triggers, a combination of 1’s and 2’s, along with the onset of each trigger, in sample points.

Is this the format that the vendor (ISS) software exported the triggers? Or is this saved by a different program?

Did you get a screenshot from the vendor software with the order of the triggers? So that we can verify the order and timing of the triggers.

Triggers can also be sent using the ‘digaux’ port on the Imagent hardware and would appear directly in the data recording, although this is not the case for these short recordings.

This is good information. If you did not use the digaux port for sending triggers, how were triggers sent for these measurements?

For preprocessing, the data was only normalised and epoched/averaged, so no filtering or quality controls were applied. We modified several p_pod files to save the raw, epoched, and evoked activity at various points to better compare with the MNE processed data.

I don't think we want processed data files in the MNE test set (@larsoner what do you think?). The purpose of this test data is to ensure that MNE reads the data correctly, not to ensure that MNE post-processing is the same as your other software (MNE epoching etc. is already validated, and I wouldn't necessarily expect the two post-processing steps to be identical anyway). Can you include a simple mat or hdf5 file that has just the raw data in it?

Also included is a .mtg file, which contains labels, wavelengths, and modulation frequencies for each source/detector combination, and an .elp file, containing coordinates for each source/detector, and fiducial.

Where are these files from? Are they exported by default in the ISS software?

@kuziekj (Contributor, Author) commented on Jul 18, 2020

This seems like very few parameters. Does the ISS system only have a dozen possible settings? This is just surprising, as the NIRX system has half a dozen tabs, each with multiple parameters, and I would have expected more from an FD NIRS system as it's fundamentally more complex.

Yes you're correct, sorry for that. There's a configuration file that can get loaded before each experiment, to ensure parameters are consistent across recordings. I'll see if I can get a hold of that file and post the details.

Is this how the vendor software outputs a single measurement? Or have you taken multiple measurements? To me this reads as if you have taken four measurements, with two different montages and two different experiments (what is a block?). Or does the vendor software save single measurements in multiple 350 KB files?

Basically we want a copy of the data from a single measurement exactly as it was exported from the vendor software. Unless there is a specific reason to have multiple measurements (for example the file format changed when the vendor software was updated). If we need multiple measurements I would prefer they were stored in different directories to make this explicit.

Yes, this data is the result of multiple recordings/measurements; the BOXY recording software only makes a single file per recording. Blocks in this case are multiple recordings of a montage. All recordings are of the same experiment/task, though.

We wanted to include multiple montages and blocks since that is how the data is typically loaded and analysed in the p_pod software (all montages and blocks for each participant are combined).

Is this the format that the vendor (ISS) software exported the triggers? Or is this saved by a different program?

This is good information. If you did not use the digaux port for sending triggers, how were triggers sent for these measurements?

These .evt files are created in Matlab, using custom scripts for each experiment. The timings and trigger numbers are generated by the experiment presentation software (such as E-Prime), and the output file generated by the experiment software is read by the Matlab scripts. We mostly included these .evt files to ensure compatibility with older recordings. Sending triggers using TTL pulses through the digaux port would likely simplify this process, since the triggers would be recorded directly in the data.

Here is an image of the .evt contents; they should all be the same for these shorter files:

[screenshot: .evt file contents]

I don't think we want processed data files in the MNE test set (@larsoner what do you think?). The purpose of this test data is to ensure that MNE reads the data correctly, not to ensure that MNE post-processing is the same as your other software (MNE epoching etc. is already validated, and I wouldn't necessarily expect the two post-processing steps to be identical anyway). Can you include a simple mat or hdf5 file that has just the raw data in it?

Sorry for the confusion on this. MNE itself isn't doing any of the processing. P_pod always does a certain amount of processing to the data, and we wanted to replicate those steps when designing our code to import raw boxy files. So any processing is only done in the script used to import the boxy data files, for the sake of replicating the p_pod software. The most we have MNE do directly is epoch the data so that we can compare those to the epochs and evoked activity created by p_pod.

We do have mat files containing the raw, evoked, and epoched data generated from p_pod (BOXY\boxy_short_recording\boxy_p_pod_files) but not from data imported by MNE.

Where are these files from? Are they exported by default in the ISS software?

The channel coordinates are digitised using the Polhemus FASTRACK system, which creates the .elp file.

The .mtg file is created by a program called NOMAD, which @kylemath designed (so he can probably better address this).

But to summarise, I would like to confirm that this PR contains a single measurement exactly as it was exported from the vendor software (we don't want any custom post-processing or additional files; we want a recording in the format that any lab with the same hardware would get). And one additional file to verify the content of the raw data (a .mat file created with other analysis software, as you included already).

So we are making a test file that's designed to read in the raw boxy files with MNE (each montage and block, including the .mtg, .elp, and .evt files), epoch the raw data, and then compare the raw, epoched, and evoked activity to the raw, epoched, and evoked activity generated by p_pod (BOXY\boxy_short_recording\boxy_p_pod_files). Perhaps that's a bit too much for the sake of testing?

As you suggest, would it be better to instead include two mat files that each contain all the raw data (one from p_pod and the other from MNE) and compare the two, rather than reading/importing the raw boxy files directly?

So to summarise, we currently have the following files:

  • four raw boxy files (two montages and two blocks)
  • four .evt files (events for each montage and block)
  • one .elp file (digitised coordinates for each channel)
  • one .mtg file (channel labels and info for each montage/block)
  • ten .mat files (containing the raw, epoched, and evoked data processed by p_pod. I can reduce this by not forcing p_pod to save variables at specific points.)

@kylemath, would you be able to elaborate more on some of these questions?

@rob-luke (Member) commented

Thanks again for your clear explanation and response. I have a few suggestions on how to proceed. To summarise, I think we should keep this PR for the Imagent software output only. If you want to write Polhemus, EPrime, or NOMAD readers, that should be done separately and not intertwined with the Imagent reader. I suggest this as other labs with Imagent devices may not have the same Polhemus/NOMAD/EPrime setup as you, and we still want to be able to read their data. Additionally, breaking the testing into the smallest possible components makes it easier to identify bugs later.

There's a configuration file that can get loaded before each experiment, to ensure parameters are consistent across recordings. I'll see if I can get a hold of that file and post the details.

It would be handy if you could share this just in this thread. No need to commit this file with the PR. Having this information will allow us to know if future issues that people have are covered by your test data.

Yes this data is the result of multiple recordings/measurements

Ok can we restrict this to a single measurement file?

We wanted to include multiple montages and blocks since that is how the data is typically loaded and analysed in the p_pod software (all montages and blocks for each participant are combined).

I understand you want to be able to load multiple measurements like your other software. But we just want to test that the reader is accurately loading measured data. We want the smallest data set to accurately test the reader, and to reduce duplication where possible.

The way your lab reads multiple files is something you can test locally. Other labs may not take measurements in sets of four like your lab. We don't want to test the loading of your particular experimental design or lab setup, but to test that the boxy data is read accurately from disk.

These .evt files are created in Matlab, using custom scripts for each experiment.

OK, let's not include these files then and keep this as Imagent boxy files only.

so that we can compare those to the epochs and evoked activity created by p_pod.

I don't think we want to compare the epoching and evoked generation between your p_pod software and MNE. I think we want to keep this PR for reading boxy files only. As such, I suggest we just keep the .mat file that has the p_pod raw data for comparison with MNE.

The channel coordinates are digitised using the Polhemus FASTRACK system, which creates the .elp file.

OK, we should remove these files then, as this PR's data is to test the boxy file reader. Other labs may not have a Polhemus system, but we still want to read their data. If the boxy file format does not contain spatial information for the optodes, then we may need to write a Polhemus reader and write a tutorial on how to combine boxy and Polhemus data.

The .mtg file is created by a program called NOMAD, which @kylemath designed (so he can probably better address this).

Again, I don't think we should include this file either then (maybe we need a NOMAD reader?). The user should be able to take a measurement with the Imagent software and then directly load it into MNE. What does a montage file contain that the Polhemus file does not? Again, maybe we will need a tutorial on how to merge Imagent data with Polhemus and NOMAD location info.

So we are making a test file that's designed to read in the raw boxy files with MNE (each montage and block, including the .mtg, .elp, and .evt files), epoch the raw data, and then compare the raw, epoched, and evoked activity to the raw, epoched, and evoked activity generated by p_pod (BOXY\boxy_short_recording\boxy_p_pod_files). Perhaps that's a bit too much for the sake of testing?

I think it's great that you want to test so much. However, I think it would be clearer to outsiders such as myself if each of these components was tested separately and not intertwined. I think we should start simple and make sure that the boxy data is being read accurately. Then as step two we can test the post-processing, but probably only the Imagent-specific post-processing; epoching etc. is already tested elsewhere. It seems like we may also need to write and test other readers for Polhemus, EPrime, and NOMAD separately.

So to summarise my suggestion is to include:

  • One Imagent boxy file (e.g. 1anc071a.001)
  • One file demonstrating how the MATLAB based software read the raw file (e.g. 1anc071a.001_raw_data.mat)

And I suggest we do not include the following files (or they are placed in their own file readers):

  • Polhemus location data. If this is required then we should write a Polhemus reader. But I assume other labs could have different location digitisers or none, so we don't want to exclude those labs from using the MNE boxy reader.
  • Custom evt files. Other labs won't have the same custom scripts as your lab; I assume most labs use the native triggering in the system.
  • NOMAD mtg files. As above, if we need this we should write a NOMAD reader and add a tutorial about how to merge NOMAD and boxy files.

I would also like to say that I appreciate you taking the time to commit these test files. It's not often I feel like I ask people to do less. Please correct any misunderstandings I have; it's always tricky to understand someone else's lab setup without seeing it. Thanks for the attention to detail. @larsoner @agramfort do you agree with the direction I suggest here, or am I just creating extra work?

@kylemath commented

Thanks for the great review and comments, @rob-luke. I will have a look this coming week, and I think we agree with you: this PR should focus on the reader for the raw data.

I think the issue stems from the relatively small group of labs using this equipment, who all generally learn the same pipeline and rely on these same additional files and programs, but I agree this PR should be more agnostic to that process.

@rob-luke (Member) commented on Jul 19, 2020

Great. We can also see what the others think; they may already know some shortcuts for us. One thing I thought of already is that perhaps we can use some of the already implemented Polhemus code in MNE, except you seem to have a Fastrak and the code is for the Isotrak, but maybe they are compatible?

https://mne.tools/stable/generated/mne.channels.read_dig_polhemus_isotrak.html#mne.channels.read_dig_polhemus_isotrak

I think the issue stems from the relatively small group of labs using this equipment,

Yeah, we had the same thing with the NIRX system; everyone seems to pair it with Presentation by Neurobs. It makes sense to use the same system as another lab if it means you know it works, particularly because these things aren't cheap.

@kylemath commented on Jul 19, 2020 via email

@rob-luke (Member) commented

channel names is another example

This we might be able to get around. Originally we read the frequency info from the channel names, but now it's stored in info. So maybe we can relax certain criteria if it's blocking your progress.

thinking of sensor locations specifically

This I’m less sure how we can get around. Perhaps MNE has a standard for what to do if location info isn’t provided.

But these issues (and any others) are probably more suited for the reader thread in the MNE repo. It seems like you’ve considered these things thoroughly already, so feel free to make a list of unresolved issues in the reader PR and we can start brainstorming together.

@kylemath commented

Thanks for this additional information. I have a few follow-up questions below. But to summarise, I would like to confirm that this PR contains a single measurement exactly as it was exported from the vendor software (we don't want any custom post-processing or additional files; we want a recording in the format that any lab with the same hardware would get). And one additional file to verify the content of the raw data (a .mat file created with other analysis software, as you included already).

OK, so an overall comment: I think the issue here is that MOST labs that use this equipment also use all these same corollary files, since they were all trained in the same lab for the most part. The goal of moving to MNE and Python is to get away from this rigid analysis pipeline and reliance on a single lab group (Gratton/Fabiani), so I agree with making this as generic as possible.

Below is an image of the settings used for recording

This seems like very few parameters. Does the ISS system only have a dozen possible settings? This is just surprising, as the NIRX system has half a dozen tabs, each with multiple parameters, and I would have expected more from an FD NIRS system as it's fundamentally more complex.

That is pretty much all the settings. There are certainly a great many menus and options, but here are screenshots taken directly from the BOXY recording software for another experiment.

This first one formats the output file differently:
[screenshot: Screen Shot 2020-07-19 at 11 51 43 PM]

This one changes the overall recording time:
[screenshot: Screen Shot 2020-07-19 at 11 51 34 PM]

And these are the main settings that Jon posted above:
[screenshot: Screen Shot 2020-07-19 at 11 51 25 PM]

The data is separated into four files (two blocks and two different channel montages), with each file containing ~3 seconds of data. Block and montage is indicated in the file name (eg. 1anc071a.001 = montage a, block 1; 1anc071b.002 = montage b, block 2).

Is this how the vendor software outputs a single measurement? Or have you taken multiple measurements? To me this reads as if you have taken four measurements, with two different montages and two different experiments (what is a block?). Or does the vendor software save single measurements in multiple 350 KB files?

Basically we want a copy of the data from a single measurement exactly as it was exported from the vendor software. Unless there is a specific reason to have multiple measurements (for example the file format changed when the vendor software was updated). If we need multiple measurements I would prefer they were stored in different directories to make this explicit.

There are triggers, although these are part of a separate file for each block and channel montage, and are located here (MNE-testing-data\BOXY\boxy_short_recording\evt). Each event file contains five triggers, a combination of 1’s and 2’s, along with the onset of each trigger, in sample points.

Is this the format that the vendor (ISS) software exported the triggers? Or is this saved by a different program?

Did you get a screen shot from the vendor software with the order of the triggers? So that we verify the order and timing of the triggers.

We don't have this for this old data set, and we are still locked out of our lab here, but perhaps it is better to record a new data file instead of relying on these old ones, so we have access to information like this.

Triggers can also be sent using the ‘digaux’ port on the Imagent hardware and would appear directly in the data recording, although this is not the case for these short recordings.

This is good information. If you did not use the digaux port for sending triggers, how were triggers sent for these measurements?

This is an old data set where a single trigger would be sent at the onset of an E-Prime experiment, which saved the timing of stimuli, and a MATLAB script would have been used to convert those times from the E-Prime output file into this event file.

For preprocessing, the data was only normalised and epoched/averaged, so no filtering or quality controls were applied. We modified several p_pod files to save the raw, epoched, and evoked activity at various points to better compare with the MNE processed data.

I don't think we want processed data files in the MNE test set (@larsoner what do you think?). The purpose of this test data is to ensure that MNE reads the data correctly, not to ensure that MNE post-processing is the same as your other software (MNE epoching etc. is already validated, and I wouldn't necessarily expect the two post-processing steps to be identical anyway). Can you include a simple mat or hdf5 file that has just the raw data in it?

Yes, we can do this. This makes more sense and gives us a better idea of the scope of this loading test vs. the scope of our overall tutorial on using MNE for Imagent analysis of fast and slow optical signals.

Also included is a .mtg file, which contains labels, wavelengths, and modulation frequencies for each source/detector combination, and an .elp file, containing coordinates for each source/detector, and fiducial.

Where are these files from? Are they exported by default in the ISS software?

This, I think, is our stickiest issue. The Imagent system is very generic and just records channel numbers and detector numbers. These sources and detectors can be placed on the head in any configuration, and so their exact position on the head is never known by the BOXY software (kind of: BOXY CAN know channel distances in order to check for good signals, but that doesn't influence the output files).

So we use MATLAB software to automatically assign channel and source locations on the head for a given area of the brain of interest (github.com/kylemath/nomad), and that is what is used to make the .mtg file. The .elp file is made by the Polhemus, which gets the order of channels from another NOMAD output file (what a mess).

>>> Replies to the second comment >>>

Thanks again for your clear explanation and response. I have a few suggestions on how to proceed. To summarise, I think we should keep this PR for the Imagent software output only. If you want to write Polhemus, EPrime, or NOMAD readers, that should be done separately and not intertwined with the Imagent reader. I suggest this as other labs with Imagent devices may not have the same Polhemus/NOMAD/EPrime setup as you, and we still want to be able to read their data. Additionally, breaking the testing into the smallest possible components makes it easier to identify bugs later.

Agreed

There's a configuration file that can get loaded before each experiment, to ensure parameters are consistent across recordings. I'll see if I can get a hold of that file and post the details.

It would be handy if you could share this just in this thread. No need to commit this file with the PR. Having this information will allow us to know if future issues that people have are covered by your test data.

I attached screenshots of some settings above; we will find those that match the test data.

Yes this data is the result of multiple recordings/measurements

Ok can we restrict this to a single measurement file?

Yes @kuziekj - remember this is just for this test, not for our entire tutorial; this test is just a test of the load_boxy.py function we made.

We wanted to include multiple montages and blocks since that is how the data is typically loaded and analysed in the p_pod software (all montages and blocks for each participant are combined).

I understand you want to be able to load multiple measurements like your other software. But we just want to test that the reader is accurately loading measured data. We want the smallest data set to accurately test the reader, and to reduce duplication where possible.

Agreed, but doesn't read_NIRX.py load in multiple files?

The way your lab reads multiple files is something you can test locally. Other labs may not take measurements in sets of four like your lab. We don't want to test the loading of your particular experimental design or lab setup, but to test that the boxy data is read accurately from disk.

Ah, I see. We have files from different blocks (which anyone running this equipment would have) and also from different montages (moving optodes around on a second day), which they may not. I think we should keep the multiple blocks but not necessarily the multiple montages.

For this test we want as few files as possible; remember this test is different from our main tutorial example.

These .evt files are created in Matlab, using custom scripts for each experiment.

OK, let's not include these files then and keep this as Imagent boxy files only.

so that we can compare those to the epochs and evoked activity created by p_pod.

I don't think we want to compare the epoching and evoked generation between your p_pod software and MNE. I think we want to keep this PR for reading boxy files only. As such, I suggest we just keep the .mat file that has the p_pod raw data for comparison with MNE.

I think we were just confusing the scope of this test with the overall work we were doing on the tutorial.

The channel coordinates are digitised using the Polhemus FASTRACK system, which creates the .elp file.

Ok we should remove these files then, as this PR data is to test boxy file reader. Other labs may not have a Polhemus system, but we still want to read their data. If the boxy file format does not contain spatial information for the optodes then we may need to write a Polhemus reader and write a tutorial on how to combine boxy and Polhemus data.

This one I am unsure of, but I think you are correct.
This was another issue of us misunderstanding the scope of the test; this is just a single test of the reader, not of our whole analysis pipeline.

The .mtg file is created by a program called NOMAD, which @kylemath designed (so he can probably better address this).

Again, I don't think we should include this file either then (maybe we need a NOMAD reader?). The user should be able to take a measurement with the Imagent software and then directly load it into MNE. What does a montage file contain that the Polhemus file does not? Again, maybe we will need a tutorial on how to merge Imagent data with Polhemus and NOMAD location info.

Yes, I agree; we are making that tutorial.
The .mtg file has the names of each source and detector and records the order of those names in the recording (which one is channel 1, channel 2, etc.). The .elp file has the 3D coordinates.
So our load_boxy.py file, to match the scope of what you did with the NIRx data, loads both of these so it can include and assign channel names, and also so it knows the 3D coordinates. I agree this test PR is just for the simple raw data load, and we can focus on this stuff in the tutorial.
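
Purely to illustrate the idea, here is a minimal sketch of attaching externally parsed names and coordinates to a Raw object in MNE; the channel name and coordinates are invented, and this is not the actual load_boxy.py code:

    # Sketch only: attach externally parsed optode/fiducial positions to a Raw
    # object. All names and coordinates here are invented for illustration.
    import mne

    ch_pos = {'S1_D1 760': [0.08, 0.02, 0.06]}  # metres, head frame (made up)
    montage = mne.channels.make_dig_montage(
        ch_pos=ch_pos,
        nasion=[0.0, 0.10, 0.0],
        lpa=[-0.08, 0.0, 0.0],
        rpa=[0.08, 0.0, 0.0],
    )
    # raw.set_montage(montage)  # 'raw' would come from the boxy reader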

BUT - it will still be part of the load_BOXY.py file, so shouldn't we be testing it here? That is the confusion...

So we are making a test file that's designed to read in the raw boxy files with MNE (each montage and block, including the .mtg, .elp, and .evt files), epoch the raw data, and then compare the raw, epoched, and evoked activity to the raw, epoched, and evoked activity generated by p_pod (BOXY\boxy_short_recording\boxy_p_pod_files). Perhaps that's a bit too much for the sake of testing?

Agreed, we misunderstood the scope of the testing.

I think it's great that you want to test so much. However, I think it would be clearer to outsiders such as myself if each of these components was tested separately and not intertwined. I think we should start simple and make sure that the boxy data is being read accurately. Then as step two we can test the post-processing, but probably only the Imagent-specific post-processing; epoching etc. is already tested elsewhere. It seems like we may also need to write and test other readers for Polhemus, EPrime, and NOMAD separately.

So to summarise my suggestion is to include:

  • One Imagent boxy file (e.g. 1anc071a.001)
  • One file demonstrating how the MATLAB based software read the raw file (e.g. 1anc071a.001_raw_data.mat)

And I suggest we do not include the following files (or they are placed in their own file readers):

  • Polhemus location data. If this is required then we should write a Polhemus reader. But I assume other labs could have different location digitisers or none, so we don't want to exclude those labs from using the MNE boxy reader.
  • Custom evt files. Other labs won't have the same custom scripts as your lab; I assume most labs use the native triggering in the system.
  • NOMAD mtg files. As above, if we need this we should write a NOMAD reader and add a tutorial about how to merge NOMAD and boxy files.

I guess this is the one thing we need to discuss. In order to match the API of read_nirx.py, we included the reading of events, as well as the reading of channel names and locations, in the io/read_boxy.py file. So is this test a test of that entire file, or just of the parts of the file that load the raw data?

The outputs of both read_nirx and read_boxy are Raw objects, which INCLUDE the event information and the channel names and locations. Are you suggesting we remove those functions from read_boxy and create new io functions for them?

I would also like to say that I appreciate you taking the time to commit these test files. It's not often I feel like I ask people to do less. Please correct any misunderstandings I have; it's always tricky to understand someone else's lab setup without seeing it. Thanks for the attention to detail. @larsoner @agramfort do you agree with the direction I suggest here, or am I just creating extra work?

@kuziekj is very thorough

Thanks for the help everyone,

@rob-luke (Member) commented

Thanks for the additional info.

agreed, but doesn't read_NIRX.py load in multiple files?

Correct, that reader does load multiple files; I can see the confusion here. But all the NIRX files are created by the NIRX system automatically for every recording, so if you purchased a new NIRX system, plugged it in, and hit record, it would save these files without you having to install or run anything else (the evt file is created by the NIRX system, the montage is created by the NIRX system, etc.). I think this is different from the system you describe above, where you have multiple pieces of equipment and then other post-processing software (correct me if I am wrong).

BUT - it will still be part of the load_BOXY.py file, so shouldn't we be testing it here? That is the confusion...

I agree this is what we need to clear up. I think the load function should only load the data saved by the Imagent system; if this doesn't include spatial information then that is fine (and spatial information can be loaded elsewhere). We want anyone with an Imagent system to be able to load their data, not just those labs with the same post-processing equipment/software.

EEG systems don't usually record spatial information either. For example, BioSemi doesn't include electrode positions, but you can load a montage afterwards. See here for a description: https://mne.tools/stable/auto_tutorials/io/plot_20_reading_eeg_data.html#reading-electrode-locations-and-head-shapes-for-eeg-recordings

@larsoner (Member) commented

The channel coordinates are digitised using the Polhemus FASTRACK system, which creates the .elp file.

Ok we should remove these files then, as this PR data is to test boxy file reader. Other labs may not have a Polhemus system, but we still want to read their data. If the boxy file format does not contain spatial information for the optodes then we may need to write a Polhemus reader and write a tutorial on how to combine boxy and Polhemus data.

Was the Polhemus part of the setup recommended (or provided) by the manufacturer of the reader? If so, then I think we should include the Polhemus data as part of the testing data. I think that there are already readers that read standard dig formats if the files are available in the given directory (e.g., KIT and CTF come to mind, one or both might also be Polhemus).

The way those readers were written at one point was to use their own internal code in mne/io/whatever/... to read the digitized positions, and then populate info['dig'] and info['chs'] appropriately. But the more modern way to do this would be to read them using some read_dig_* function, like:

https://mne.tools/stable/generated/mne.channels.read_polhemus_fastscan.html#mne.channels.read_polhemus_fastscan

Then call directly self.set_montage(montage) in the constructor.

If, on the other hand, the Polhemus was purchased separately, then I think it makes sense to not include any of this stuff in the reader itself, and just make sure that doing this works:

raw = mne.io.read_raw_boxy(fname)
montage = mne.channels.read_polhemus_fastscan(dig_fname)
raw.set_montage(montage)

But even in that case, it would be nice to have the digitized data available for testing, so +1 for including it here, maybe in a subdirectory or something to make it clear that it's a non-standard (i.e., not produced by a standard setup) file.

@kuziekj (Contributor, Author) commented on Jul 21, 2020

Alright, I want to make sure I'm up to speed here.

So we currently have one big read_raw_boxy file that does a lot of different things. We want to break this down into several smaller files that do more specific/specialised things (e.g. read raw boxy data, read the .elp file, read the .mtg file, read the .evt file, etc.) that can be called when/if they are needed.

I agree that this is a good idea since other labs may not have all of these files, or they just want to look at the raw data and will get all the other information some other way.

@larsoner to answer your question about the Polhemus hardware, it wasn't included with the Imagent hardware. We're actually using a different Polhemus device in our lab (Polhemus G4) which doesn't natively create the .elp files, so I agree that we should keep the .elp files, and any .elp reader, separate from the raw boxy data reader.

So for this current PR, we only want to read in the raw boxy data and compare that to the raw data read in by another piece of software (which would be p_pod in this case). So we would have only two data files, one .mat file containing the raw boxy data as read in by p_pod, and one raw boxy data file (so a single montage and block).

So to clarify, should this .mat file contain the raw data that p_pod outputs normally, or the raw data just after p_pod reads in the boxy file but before anything else is done to the data? I ask because p_pod requires a minimum amount of processing/corrections to be applied to the data (specifically either normalisation or pulse correction) before it will output anything, or even read in a data file. Unfortunately, you can't just read in a data file with p_pod and then decide to do something else. Once you hit 'Go', the data file is read and any processing steps are automatically done, and an output file isn't generated until all these steps are finished. I can force p_pod to stop once the raw data file is read, but you don't normally have access to this, or to any of the variables p_pod creates, when p_pod is used as intended.

@larsoner (Member) commented

So for this current PR, we only want to read in the raw boxy data and compare that to the raw data read in by another piece of software (which would be p_pod in this case). So we would have only two data files, one .mat file containing the raw boxy data as read in by p_pod, and one raw boxy data file (so a single montage and block).

Sounds good to me. Then once the main reader is implemented we can add files for Polhemus, etc. to add those other features you need.

So to clarify, should this .mat file contain the raw data that p_pod outputs normally, or the raw data just after p_pod reads in the boxy file but before anything else is done to the data?

We definitely want the just-after-read-but-interrupted (abnormal use) file to get basic reading working and make sure we're reading the data correctly at that level first. If it's necessary / in scope for MNE to do the other steps that p_pod goes through to produce normal files, then you can also include them here so that we can test against them later.

WDYT?

@kuziekj (Contributor, Author) commented on Jul 21, 2020

@larsoner that sounds good to me, thanks. I assumed that's what we wanted, but always good to check first.

Most of those other processing steps are incorporated into the current boxy reader, but probably good to keep them separate going forward. Aside from normalisation and pulse correction, the extra steps mostly apply corrections to the phase data (fixing phase wrapping, removing outliers, converting to picoseconds). Different labs may not want to incorporate these steps, or maybe they'd want to go about them in a different way, so perhaps they don't need their own specific tests.
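
To give a rough idea of what those phase steps involve, here is an illustrative sketch only; the degree units and the 110 MHz modulation frequency are example assumptions, not values read from the files:

    # Sketch only: unwrap phase and convert it to a time delay in picoseconds.
    # Assumes phase in degrees and a 110 MHz modulation frequency (example values).
    import numpy as np

    f_mod = 110e6                                   # assumed modulation frequency, Hz
    ph_deg = np.array([350.0, 355.0, 2.0, 8.0])     # toy wrapped phase samples
    ph_rad = np.unwrap(np.deg2rad(ph_deg))          # fix phase wrapping
    delay_ps = ph_rad / (2 * np.pi * f_mod) * 1e12  # phase delay in picoseconds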

At any rate, I'll generate the two files we definitely need and I'll make sure they are good to go before updating this PR.

@larsoner (Member) commented

Different labs may not want to incorporate these steps, or maybe they'd want to go about them in a different way, so perhaps they don't need their own specific tests.

Indeed, or there might be parameters or underlying methods that change, etc. So better to separate into just reading and then processing with mne.preprocessing.fnirs.unwrap_phase or whatever.

Let's see what @rob-luke says but sounds like we have a plan!

@rob-luke (Member) commented on Jul 23, 2020

So better to separate into just reading and then processing

Yes sounds like a plan!

Most of those other processing steps are incorporated into the current boxy reader, but probably good to keep them separate going forward.

I think this is a good idea looking forward, also because then other FD NIRS systems can use this code, not just the Imagent reader.

so perhaps they don't need their own specific tests.

This bit I would disagree with; we will need to test all these steps too. Not everything needs to be tested against p_pod though; some fundamental steps can be tested with simple simulated arrays or manual numbers. I think it's preferable to test against first principles rather than another software package (as that package can have bugs too). But that's just a personal thing, and there are exceptions to my own rule (e.g. I tested the Beer-Lambert conversion against the NIRS toolbox).

Thanks @kuziekj

@larsoner (Member) left a review comment

Feel free to click the green button, push a commit to bump the version to 0.96, and cut a new release if you're happy @rob-luke

@kuziekj (Contributor, Author) commented on Jul 23, 2020

@larsoner did you want me to push a new commit with only the two data files, and excluding the others (.elp, .mtg, .evt, etc)?

@larsoner (Member) commented

Sure, whatever is consistent with the plan we hatched. Also, the next version number is now 0.96.

@larsoner (Member) commented

@rob-luke make sure you "Squash and merge" so that the intermediate files don't end up in history

@kuziekj (Contributor, Author) commented on Jul 23, 2020

Alright so now only two files are included:

  • P_pod loaded boxy data (contains dc, ac, and phase data): BOXY/boxy_short_recording/boxy_p_pod_files/1anc071a_001.mat
  • Raw boxy data (extension changed to .txt since that's the default format boxy saves data as): BOXY/boxy_short_recording/1anc071a_001.txt
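
As a rough outline of the check this enables (a sketch only; the .mat variable name and the array orientation are assumptions, and the real test lives in the mne-python PR):

    # Sketch only: the BOXY file read by MNE should match the raw data p_pod saved.
    import mne
    from scipy.io import loadmat

    raw = mne.io.read_raw_boxy('BOXY/boxy_short_recording/1anc071a_001.txt', preload=True)
    ppod = loadmat('BOXY/boxy_short_recording/boxy_p_pod_files/1anc071a_001.mat')

    mne_data = raw.get_data()        # channels x samples, all signal types together
    dc = ppod['dc']                  # assumed variable name for the DC intensity data
    print(mne_data.shape, dc.shape)  # check dimensions before comparing values
    # then e.g. numpy.testing.assert_allclose(...) on the matching channels/samples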

@rob-luke (Member) commented

Thanks @kuziekj and @kylemath

@rob-luke merged commit 8509ae6 into mne-tools:master on Jul 24, 2020