API update for tsv files #634
Comments
My take would be a list of tuples:
new_vals = [('row_val_1', 'col_val_1', val_1),
            ('row_val_n', 'col_val_n', val_n)]
To select all rows or columns, we could use
new_vals = [(None, 'col_val_1', val_1),   # all rows
            ('row_val_n', None, val_n)]   # all cols
We need a way to specify which column shall serve as the index, so specifying
For example, to select all
new_vals = [('status', 'bad', 'status_description', 'super bad')]
but one can already see how this is going to be horrible to use. I believe what we actually want is pandas-style indexing (i.e., |
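For illustration, the tuple convention above (with None meaning "all rows" or "all columns") could be mapped onto pandas-style indexing roughly like this. It is only a sketch: `apply_updates` is a hypothetical helper, not an mne-bids function, and it assumes the table is already loaded as a pandas DataFrame whose index holds the row labels.

```python
import pandas as pd


def apply_updates(df, new_vals):
    """Apply (row, col, value) tuples in place; None selects all rows or columns.

    Hypothetical helper for illustration only, not part of mne-bids.
    """
    for row, col, val in new_vals:
        rows = slice(None) if row is None else row
        cols = slice(None) if col is None else col
        df.loc[rows, cols] = val
    return df


# Toy table standing in for a channels.tsv indexed by channel name.
df = pd.DataFrame(
    {"status": ["good", "good"], "status_description": ["n/a", "n/a"]},
    index=pd.Index(["C1", "C2"], name="name"),
)
apply_updates(
    df,
    [
        ("C1", "status", "bad"),                 # one cell
        (None, "status_description", "checked"),  # whole column
    ],
)
```

The index column is chosen when the table is loaded, so the tuples themselves stay at three elements.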
let's not reinvent the wheel here.
if we have a way to read the events as a dataframe we can let users do
what they want
and then we offer them a way to write it back doing some schema checks.
my 2c
… |
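A rough sketch of that read, edit, write-back flow, using plain pandas calls. `write_events_tsv` and the file name are hypothetical, and the column check is a deliberately minimal stand-in for whatever schema checks would actually be wanted.

```python
import os
import tempfile

import pandas as pd

# Columns checked here are an assumption for this sketch; a real check would
# follow the BIDS specification for the file type at hand.
REQUIRED_COLUMNS = ("onset", "duration")


def write_events_tsv(df, path, required=REQUIRED_COLUMNS):
    """Write df as TSV after a minimal column check (hypothetical helper)."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    df.to_csv(path, sep="\t", index=False, na_rep="n/a")


# Round trip on a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "sub-01_events.tsv")
with open(path, "w") as f:
    f.write("onset\tduration\ttrial_type\n0.5\t1.0\tgo\n2.0\t1.0\tn/a\n")

# Users edit the DataFrame however they like ...
events = pd.read_csv(path, sep="\t", na_values="n/a")
events.loc[events["trial_type"].isna(), "trial_type"] = "rest"

# ... and only the write-back goes through the check.
write_events_tsv(events, path)
```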
Totally agree. I was just trying to think within the existing framework of the JSON updates. But I would very much prefer using pandas here too. We should not expose these implementation details to the users though, so let's discuss API :) I don't want to end up with one update function for JSON and another for TSV files, that's all |
Agreed, internally we should definitely use a pandas DataFrame. The API would be just for consistency's sake, since I can see that there's a lot of use for updating a list of channels with some updated column value in
I like this approach.
I think for files such as
|
can you give me a concrete usage example for an event or channel tsv file?
… |
I'm also not yet totally convinced what exactly we're trying to achieve and how to get there :) |
This would be a change to
new_vals = [
    ('C1', 'description', 'resected'),
    ('C2', 'description', 'resected'),
    ('C40', 'description', 'left temporal lobe'),
]
# the index column is assumed to be 'name' because this is a 'channels.tsv' file
update_sidecar_tsv(bids_path, new_vals)
I acknowledge this can potentially be messy down the road, but minimally I feel like it would be useful to have a tsv reader/writer that might abstract away the pandas DataFrame reading? |
I see the biggest advantage in the fact that a |
is it more than 1 line of code to read pandas? We will add a dependency with no benefit. Can't you achieve the same with |
Yes, and it'd be more flexible too, I suppose. But this raises the question why we offer JSON sidecar updating and not TSV sidecar updating? |
cause pandas doesn't exist for JSON :)
what you describe is:
df = read_sidecar_tsv(bids_path)
new_vals = [
    ('C1', 'description', 'resected'),
    ('C2', 'description', 'resected'),
    ('C40', 'description', 'left temporal lobe'),
]
# set one cell per (row, column, value) tuple
for row, col, val in new_vals:
    df.at[row, col] = val
if we stick to such updates then yes maybe pandas is not necessary
and this could be done for all matching files.
|
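To make the "maybe pandas is not necessary" point concrete, here is a sketch of the same cell-by-cell update using only the standard library. `update_tsv_text` is a hypothetical name, and indexing rows by a `name` column is an assumption that fits a channels.tsv file.

```python
import csv
import io


def update_tsv_text(tsv_text, new_vals, index_col="name"):
    """Return tsv_text with (row, col, value) updates applied (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames)
    # Key each row by its index column; dicts keep the original row order.
    rows = {row[index_col]: row for row in reader}
    for row_key, col, val in new_vals:
        if row_key not in rows or col not in fieldnames:
            raise KeyError(f"no cell for row {row_key!r}, column {col!r}")
        rows[row_key][col] = val
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows.values())
    return out.getvalue()


channels = "name\ttype\tdescription\nC1\tECOG\tn/a\nC2\tECOG\tn/a\n"
updated = update_tsv_text(channels, [("C1", "description", "resected")])
```

Unlike `df.at`, this version refuses to create new rows or columns, which may or may not be the desired behaviour.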
Well.
# %%
import json_tricks
import pandas as pd
x = {
'foo': 'bar',
'baz': {
'xxx': 1,
'yyy': False
}
}
x_json = json_tricks.dump(x, '/tmp/foo.json')
df = pd.read_json('/tmp/foo.json')
print(df)
|
:)
can you try on a json as specified by bids?
more seriously let's stick to TSV here
… |
Describe the problem
With #601, an API for "updating" BIDS datasets was introduced in mne-bids. However, it currently only works for JSON files, which are easier because the change can be represented as a dictionary update().
I have many datasets where additional information is appended at the channel level, though, which is encoded in the channels.tsv, electrodes.tsv files. It would be awesome to have a similarly structured API for updating the corresponding .tsv files inside a BIDS dataset.
Describe your solution
I don't have one heh, but was hoping the brilliant minds here would have an idea on how to update a tsv file robustly and efficiently.
Describe possible alternatives
A naive way of solving the above problem would be to just have a nested dictionary as the dictionary update. That is:
the update dictionary entries for a channels.tsv file would look like:
Then the corresponding status_description and type are updated for channel rows "C1" and "C2". Everything else is left as is.
Additional context
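As a rough illustration of the nested-dictionary alternative described above: the dictionary layout, its values, and the helper name `update_from_nested_dict` are all assumptions made for this sketch, since the exact structure is not spelled out here.

```python
import pandas as pd


def update_from_nested_dict(df, updates):
    """Apply {row_label: {column: value}} updates in place (hypothetical helper)."""
    for row, cols in updates.items():
        for col, val in cols.items():
            df.at[row, col] = val
    return df


# Toy channels table indexed by channel name; values are made up.
channels = pd.DataFrame(
    {
        "type": ["ECOG", "ECOG", "ECOG"],
        "status_description": ["n/a", "n/a", "n/a"],
    },
    index=pd.Index(["C1", "C2", "C3"], name="name"),
)

# Assumed layout: outer keys are channel names, inner keys are column names.
updates = {
    "C1": {"status_description": "resected", "type": "SEEG"},
    "C2": {"status_description": "resected", "type": "SEEG"},
}
update_from_nested_dict(channels, updates)
```

Rows not named in the dictionary, such as "C3" here, are left untouched.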