
API update for tsv files #634

Open
adam2392 opened this issue Nov 17, 2020 · 13 comments

Comments

@adam2392
Member

Describe the problem

With #601, an API for "updating" BIDS datasets was introduced in mne-bids. However, this currently only works for JSON files, which are the easier case because a JSON sidecar can be represented as a dictionary and merged with update().

I have many datasets, though, where additional information is appended at the channel level, encoded in the channels.tsv and electrodes.tsv files. It would be awesome to have a similarly structured API for updating the corresponding .tsv files inside a BIDS dataset.

Describe your solution

I don't have one, heh, but I was hoping the brilliant minds here would have an idea on how to update a .tsv file robustly and efficiently.

Describe possible alternatives

A naive way of solving the above problem would be to use a nested dictionary as the update. That is, an update dictionary for a channels.tsv file would look like:

{
    'C1': {
        'status_description': 'blah',
    },
    'C2': {
        'type': 'Electroencephalography',
    },
}

Then the corresponding status_description and type entries are updated for channel rows "C1" and "C2"; everything else is left as-is.

Additional context

@hoechenberger
Member

My take would be a list of tuples:

new_vals = [('row_val_1', 'col_val_1', val_1),
            ('row_val_n', 'col_val_n', val_n)]

To select all rows or columns, we could use None:

new_vals = [(None, 'col_val_1', val_1),  # all rows
            ('row_val_n', None, val_n)]  # all cols

We need a way to specify which column shall serve as the index, so specifying
a row value for indexing actually makes sense.

For example, to select all bad channels and change their
status_description, we'd do:

new_vals = [('status', 'bad', 'status_description', 'super bad')]

but one can already see how this is going to be horrible to use.

I believe what we actually want is pandas-style indexing (i.e.,
DataFrame.loc)

@agramfort
Member

agramfort commented Nov 18, 2020 via email

@hoechenberger
Member

hoechenberger commented Nov 18, 2020

Totally agree. I was just trying to think within the existing framework of the JSON updates.

But I would very much prefer using pandas here too.

We should not expose these implementation details to the users though, so let's discuss API :) I don't want to end up with one update function for JSON and another for TSV files, that's all

@adam2392
Member Author

adam2392 commented Nov 19, 2020

let's not reinvent the wheel here. If we have a way to read the events as a dataframe, we can let users do what they want and then offer them a way to write it back with some schema checks. my 2c

Agreed, internally we should definitely use a pandas DataFrame. The API would just be for consistency's sake, since I can see a lot of use for updating a list of channels with some updated column value in channels.tsv/electrodes.tsv.

new_vals = [('row_val_1', 'col_val_1', val_1),
            ('row_val_n', 'col_val_n', val_n)]

To select all rows or columns, we could use None:

new_vals = [(None, 'col_val_1', val_1),  # all rows 
            ('row_val_n', None, val_n)]  # all cols

I like this approach.

We need a way to specify which column shall serve as the index, so specifying
a row value for indexing actually makes sense.

I think for files such as channels.tsv/electrodes.tsv we can safely assume that the index is the name column. That way, if you want to update the status and status_description of all channels:

new_vals = [(None, 'status', 'bad'), (None, 'status_description', 'super bad')]
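A sketch of a helper applying such (row, col, val) tuples, where None selects all rows; apply_updates is a hypothetical name, not an existing mne-bids function:

```python
import io
import pandas as pd

def apply_updates(df, new_vals):
    """Apply (row, col, val) tuples to df; row=None means all rows."""
    for row, col, val in new_vals:
        if row is None:
            df[col] = val          # broadcast to every row
        else:
            df.loc[row, col] = val  # update a single cell
    return df

# Toy channels.tsv, indexed by the assumed 'name' column.
tsv = ("name\tstatus\tstatus_description\n"
       "C1\tgood\tn/a\n"
       "C2\tgood\tn/a\n")
df = pd.read_csv(io.StringIO(tsv), sep='\t', index_col='name')
new_vals = [(None, 'status', 'bad'), (None, 'status_description', 'super bad')]
apply_updates(df, new_vals)
```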

@agramfort
Member

agramfort commented Nov 19, 2020 via email

@hoechenberger
Member

I'm also not yet totally convinced what exactly we're trying to achieve and how to get there :)

@adam2392
Member Author

This would be a change to electrodes.tsv/channels.tsv. Say I need to change the description of a channel, or a set of channels, to resected, or say left temporal lobe:

new_vals = [
    ('C1', 'description', 'resected'),
    ('C2', 'description', 'resected'),
    ('C40', 'description', 'left temporal lobe'),
]

# index column is assumed based on the fact that it's a 'channels.tsv' file to be 'name'
update_sidecar_tsv(bids_path, new_vals)

I acknowledge this can potentially get messy down the road, but minimally I feel it would be useful to have a .tsv reader/writer that abstracts away the pandas DataFrame handling?

@hoechenberger
Member

but minimally I feel like it would be useful to have a tsv reader/writer that might abstract away the pandas DataFrame reading?

I see the biggest advantage in the fact that a BIDSPath can essentially match more than just one TSV file (an entire study), which could be useful

@jasmainak
Member

jasmainak commented Nov 19, 2020

to have a tsv reader/writer that might abstract away the pandas DataFrame reading?

is it more than 1 line of code to read pandas? We will add a dependency with no benefit. Can't you achieve the same with BIDSPath.match() and pd.read_csv(...) ?
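The "no new API" route could look roughly like this; Path.glob stands in for BIDSPath.match() here so the sketch runs without mne-bids, and the helper name is made up:

```python
from pathlib import Path
import pandas as pd

def read_matching_tsvs(root, pattern='**/*_channels.tsv'):
    """Read every matching sidecar TSV under root into a DataFrame."""
    return {p: pd.read_csv(p, sep='\t') for p in Path(root).glob(pattern)}
```

Users would then edit the returned DataFrames however they like and write them back with to_csv(sep='\t', na_rep='n/a').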

@hoechenberger
Member

is it more than 1 line of code to read pandas? We will add a dependency with no benefit. Can't you achieve the same with BIDSPath.match() and pd.read_csv(...) ?

Yes, and it'd be more flexible too, I suppose.

But this raises the question why we offer JSON sidecar updating and not TSV sidecar updating?

@agramfort
Member

agramfort commented Nov 19, 2020 via email

@hoechenberger
Member

cause pandas does exist for JSON :)

Well.

# %%
import json_tricks
import pandas as pd

x = {
    'foo': 'bar',
    'baz': {
         'xxx': 1,
         'yyy': False
    }
}

json_tricks.dump(x, '/tmp/foo.json')

df = pd.read_json('/tmp/foo.json')
print(df)
#      foo  baz
# xxx  bar    1
# yyy  bar    0

@agramfort
Member

agramfort commented Nov 19, 2020 via email
