ENH: Add expert option to load BIDS layouts from database file #187

Merged
merged 35 commits into from Dec 9, 2019

Conversation

Shotgunosine
Collaborator

Closes #186

@effigies
Collaborator

You're going to want to add it to these places, as well:

layout = BIDSLayout(bids_dir)

layout = bids.BIDSLayout(self.inputs.bids_dir, validate=False)

layout = BIDSLayout(self.inputs.bids_dir, force_index=force_index,
                    ignore=ignore, derivatives=derivatives)

layout = BIDSLayout(self.inputs.bids_dir, derivatives=derivatives)

And this layout should be built with derivatives.

@codecov-io

codecov-io commented Sep 27, 2019

Codecov Report

Merging #187 into master will decrease coverage by 0.02%.
The diff coverage is 82.14%.

@@            Coverage Diff             @@
##           master     #187      +/-   ##
==========================================
- Coverage   76.51%   76.48%   -0.03%     
==========================================
  Files          18       18              
  Lines        1026     1029       +3     
  Branches      179      181       +2     
==========================================
+ Hits          785      787       +2     
- Misses        149      150       +1     
  Partials       92       92

Flag      Coverage Δ
#ds003    76.48% <82.14%> (-0.03%) ⬇️

Impacted Files                Coverage Δ
fitlins/utils/bids.py         44.64% <100%> (-0.98%) ⬇️
fitlins/interfaces/bids.py    72.96% <62.5%> (-1.14%) ⬇️
fitlins/workflows/base.py     60.21% <75%> (-0.23%) ⬇️
fitlins/cli/run.py            84.49% <93.33%> (+2.14%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7d2675c...da3031d.

@Shotgunosine
Collaborator Author

Alright, trying to add the database_file option to those interfaces. Is there a preferred way to specify an optional input that should be an existing file path if provided, but None otherwise?

@effigies
Collaborator

effigies commented Sep 30, 2019

This or similar:

if database_file is not None:
    interface.inputs.database_file = database_file
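
A minimal sketch of that pattern, using a stand-in object rather than a real nipype interface. The `FakeInterface` class, the `configure` helper, and the example paths are hypothetical names introduced only for illustration; the point is simply that the optional input is left untouched unless a value was actually provided.

```python
from types import SimpleNamespace


class FakeInterface:
    """Hypothetical stand-in for a nipype interface; not part of fitlins."""

    def __init__(self):
        # Only the mandatory input is set up front.
        self.inputs = SimpleNamespace(bids_dir="/data/bids")


def configure(interface, database_file=None):
    """Set the optional input only when a value was actually provided."""
    if database_file is not None:
        interface.inputs.database_file = database_file
    return interface


with_db = configure(FakeInterface(), database_file="/scratch/layout.db")
without_db = configure(FakeInterface())

print(hasattr(with_db.inputs, "database_file"))     # True
print(hasattr(without_db.inputs, "database_file"))  # False
```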

@Shotgunosine
Collaborator Author

Shotgunosine commented Sep 30, 2019

Something like this should work for specifying it in the input spec?

class ModelSpecLoaderInputSpec(BaseInterfaceInputSpec):
    bids_dir = Directory(exists=True,
                         mandatory=True,
                         desc='BIDS dataset root directory')
    database_file = traits.File(exists=True,
                                desc='Optional path to bids database file.')
    model = traits.Either('default', InputMultiPath(File(exists=True)),
                          desc='Model filename')
    selectors = traits.Dict(desc='Limit models to those with matching inputs')

@effigies
Collaborator

Yes.

Collaborator

@effigies left a comment

Overall looks reasonable.

fitlins/workflows/base.py (outdated, resolved)
@effigies changed the title from "Database file" to "ENH: Add expert option to load BIDS layouts from database file" on Sep 30, 2019
@effigies
Collaborator

If this works for you, I'm good to merge it. It would be good to add tests at some point, but most of the lines we're currently missing require iterating over BIDS datasets with different properties, which makes it a finicky proposition.

@Shotgunosine
Collaborator Author

So far I've only tested that the code still works if you don't pass a database file. I'll try to test the functionality when a database file is passed sometime today and add it to the CircleCI config. I've got an abstract to write today though, so I may not get to it until later in the week.

@pep8speaks

pep8speaks commented Oct 2, 2019

Hello @Shotgunosine, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 fitlins.

Comment last updated at 2019-12-06 20:14:23 UTC

@effigies
Collaborator

effigies commented Oct 2, 2019

@Shotgunosine Heads up that I resolved merge conflicts, so you'll want to pull before any further work.

Collaborator

@effigies left a comment

Minor style changes.

fitlins/cli/run.py (outdated, resolved)
fitlins/cli/run.py (outdated, resolved)
fitlins/cli/run.py (outdated, resolved)
@effigies
Collaborator

Requires #197...

@Shotgunosine
Collaborator Author

@effigies This is pretty much good to go. I think there's an issue with adding the code coverage file for the test I added. Do you think you could take a look at that?

@effigies added this to "Waiting on @effigies" in Priority on Nov 19, 2019
@effigies
Collaborator

Yup. I'll get to this ASAP.

g_bids.add_argument('--force-index', action='store', default=None,
                    help='regex pattern or string to include files')
g_bids.add_argument('--ignore', action='store', default=None,
                    help='regex pattern or string to ignore files')
g_bids.add_argument('--desc-label', action='store', default='preproc',
                    help="use BOLD files with the provided description label")
g_bids.add_argument('--database-path', action='store', default=None,
                    help="Caution, this is an Expert level option subject to change! "
Collaborator

Not sure if this caution is necessary given how few people use this, and how it's overall an early stage project.

Collaborator

Yeah, this is turning into less of a one-off hack. I guess we can remove the warning.
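
For reference, a minimal sketch of how an option like this behaves once the warning is dropped. The program name, the group title, and the example path are assumptions made for illustration; only the option names, defaults, and help wording come from the diff in this thread.

```python
import argparse


def build_parser():
    """Build a small parser with a BIDS option group (illustrative only)."""
    parser = argparse.ArgumentParser(prog="fitlins-sketch")
    # The group title is an assumption; the thread only shows `g_bids`.
    g_bids = parser.add_argument_group("Options for filtering BIDS queries")
    g_bids.add_argument("--desc-label", action="store", default="preproc",
                        help="use BOLD files with the provided description label")
    # default=None lets downstream code tell "not given" apart from a real path.
    g_bids.add_argument("--database-path", action="store", default=None,
                        help="Path to directory containing SQLite database indices")
    return parser


opts = build_parser().parse_args(["--database-path", "/scratch/dbcache"])
print(opts.database_path)                           # /scratch/dbcache
print(build_parser().parse_args([]).database_path)  # None
```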

    reset_database = True
    make_layout = True
elif Path(opts.database_path).exists():
    layout = BIDSLayout.load(opts.database_path)
Collaborator

It might be better to let pybids handle some of this logic.

That is, if the path exists, let pybids decide whether to load the database from it.
The problem with the current approach is that if someone passes a valid db path but then changes the CLI arguments (so that they would actually want to re-index), no error will be thrown. Their CLI options are silently ignored in favor of the indexing options in the db_path.

I'd rather let pybids crash if the options mismatch. That also simplifies this logic.

Collaborator

👍
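
To make the "silently ignored options" concern concrete, here is a purely hypothetical sketch of failing loudly on a mismatch. It is not how pybids stores or validates its index: the sidecar file name `index_options.json` and both helper functions are invented for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_index_options(db_dir, options):
    """Record the options an index was built with (hypothetical sidecar file)."""
    db_dir = Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    (db_dir / "index_options.json").write_text(json.dumps(options, sort_keys=True))


def check_index_options(db_dir, requested):
    """Raise instead of silently ignoring options that differ from the saved ones."""
    saved = json.loads((Path(db_dir) / "index_options.json").read_text())
    mismatched = sorted(key for key in set(saved) | set(requested)
                        if saved.get(key) != requested.get(key))
    if mismatched:
        raise ValueError(f"Index was built with different options: {mismatched}")
    return saved


with tempfile.TemporaryDirectory() as tmp:
    save_index_options(tmp, {"ignore": "sub-01", "force_index": None})
    check_index_options(tmp, {"ignore": "sub-01", "force_index": None})  # passes
    try:
        check_index_options(tmp, {"ignore": "sub-02", "force_index": None})
    except ValueError as err:
        print(err)  # names the option that differs
```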

# Go ahead and initialize the layout database
if opts.database_path is None:
    database_path = Path(work_dir) / 'dbcache'
    reset_database = True
Collaborator

Do we need to manually ask pybids to reset_database? It will do it on its own if it doesn't find a valid sqlite file inside the database_path.

I suppose this is safer though, as it could be a mismatching dbcache from another run.

An alternative here is to clean up the dbcache folder on fitlins teardown (and put it somewhere more hidden, or at least make it .dbcache). Or use a tmpdir, although @effigies seemed to think that would be a problem (not guaranteed to be on disk).

I think the right solution is to use tempfile.TemporaryDirectory but set prefix to work_dir/.dbcache. That way it's cleaned up automatically. It can also be cleaned up manually with the cleanup function.

Collaborator

> I suppose this is safer though, as it could be a mismatching dbcache from another run.

Yes, that's the purpose here. If somebody passes a db file, then we assume they know that it's valid. If we're creating it, we don't want to take any chances, and will always clear it.

> I think the right solution is to use tempfile.TemporaryDirectory but set prefix to work_dir/.dbcache. That way it's cleaned up automatically.

That would be fine, but I tend to think of anything in the scratch directory as disposable anyway.
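
A minimal sketch of the temporary-directory idea, assuming the cache should live inside the work directory under a hidden name. The `make_db_cache` helper and the `.dbcache-` prefix are hypothetical. Note that in the standard library `dir=` chooses the parent folder and `prefix=` only shapes the generated name, so the two are passed separately here.

```python
import tempfile
from pathlib import Path


def make_db_cache(work_dir):
    """Create a hidden, self-cleaning cache directory inside work_dir."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # dir= picks the parent; prefix= only shapes the generated folder name.
    return tempfile.TemporaryDirectory(dir=str(work_dir), prefix=".dbcache-")


with tempfile.TemporaryDirectory() as scratch:  # stands in for the work dir
    cache = make_db_cache(scratch)
    cache_path = Path(cache.name)
    print(cache_path.name.startswith(".dbcache-"))  # True
    cache.cleanup()                                 # explicit, manual cleanup
    print(cache_path.exists())                      # False
```

The trade-off raised in this thread still applies: a self-cleaning directory removes the index that might otherwise be inspected when debugging.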

Collaborator

@adelavega, Nov 19, 2019

That's true. Maybe I'm overthinking this then, and it might actually be useful to have the index there for debugging.

Maybe I'm overcomplicating things here, but I could imagine it being useful not to force a re-index when you re-run fitlins several times. I guess you could just build it yourself and pass the path in, but it's nice not to have to manage that.

Feel free to ignore this suggestion.

Collaborator Author

I deffo want it for debugging, and this way, if they want to reuse a layout saved in a working directory, they can pass the path for it explicitly, otherwise it gets reset in order to avoid nasty surprises.
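
The rule this thread settles on can be sketched as a small helper. `resolve_database` is a hypothetical name, and the returned flag only stands for "rebuild the index"; how that flag is passed on to pybids is not shown here.

```python
from pathlib import Path


def resolve_database(database_path, work_dir):
    """Return (path, reset) following the rule discussed in this thread.

    No path given -> use a cache under work_dir and always rebuild it.
    Path given    -> trust the caller and reuse whatever is stored there.
    """
    if database_path is None:
        return Path(work_dir) / "dbcache", True
    return Path(database_path), False


# Default cache in the work directory: always reset.
print(resolve_database(None, "/scratch/work"))
# Explicit path from the user: reused as-is.
print(resolve_database("/data/saved_layout", "/scratch/work"))
```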

fitlins/interfaces/bids.py (resolved)
fitlins/interfaces/bids.py (resolved)
g_bids.add_argument('--force-index', action='store', default=None,
                    help='regex pattern or string to include files')
g_bids.add_argument('--ignore', action='store', default=None,
                    help='regex pattern or string to ignore files')
g_bids.add_argument('--desc-label', action='store', default='preproc',
                    help="use BOLD files with the provided description label")
g_bids.add_argument('--database-path', action='store', default=None,
                    help="Caution, this is an Expert level option subject to change! "
                         "Path to directory containing SQLite database indicies "
Collaborator

Suggested change
- "Path to directory containing SQLite database indicies "
+ "Path to directory containing SQLite database indices "

if opts.participant_label is not None:
    subject_list = bids.collect_participants(
        opts.bids_dir, participant_label=opts.participant_label,
        database_path=database_path)
Collaborator

Should we just pass the layout, rather than instantiating another one in the function?

Collaborator

Looks like it
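
A rough sketch of what passing the layout could look like. `FakeLayout` is a hypothetical stand-in that mimics only a `get_subjects()` method, and `collect_participants_from_layout` is an invented name with an invented signature; it is not the existing `bids.collect_participants` function.

```python
class FakeLayout:
    """Hypothetical stand-in for an already-built layout object."""

    def __init__(self, subjects):
        self._subjects = list(subjects)

    def get_subjects(self):
        return list(self._subjects)


def collect_participants_from_layout(layout, participant_label=None):
    """Filter subjects using an existing layout instead of building a new one."""
    all_subjects = layout.get_subjects()
    if participant_label is None:
        return all_subjects
    wanted = set(participant_label)
    # Keep the layout's ordering and drop labels it does not know about.
    return [sub for sub in all_subjects if sub in wanted]


layout = FakeLayout(["01", "02", "03"])
print(collect_participants_from_layout(layout, ["02", "04"]))  # ['02']
```

The design point is that the caller builds (or loads) the layout once and every helper reuses it, so there is no second indexing pass and no chance of the two layouts disagreeing.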

@@ -196,7 +229,7 @@ def run_fitlins(argv=None):
     except Exception:
         retcode = 1

-    layout = BIDSLayout(opts.bids_dir, derivatives=derivatives)
+    layout = BIDSLayout.load(database_path)
Collaborator

Now that layout is guaranteed to exist, I think we can just reuse it?

Suggested change
- layout = BIDSLayout.load(database_path)

fitlins/interfaces/bids.py (resolved)
@@ -142,6 +142,8 @@ class LoadBIDSModelInputSpec(BaseInterfaceInputSpec):
                          desc='BIDS dataset root directory')
     derivatives = traits.Either(traits.Bool, InputMultiPath(Directory(exists=True)),
                                 desc='Derivative folders')
+    database_path = Directory(exists=False,
+                              desc='Optional path to bids database directory.')
Collaborator

Here, you're setting this up as an alternative to bids_dir/derivatives, but in _run_interface, you're replacing them entirely. We should be consistent about which one we're doing.

Collaborator Author

Would you prefer it to be optional or to have LoadBIDSModel and BIDSSelect only work if passed a database_path?

Collaborator

Optional at the CLI level for sure. But we could set up a default internal path (don't we already do that?).

@@ -395,6 +382,8 @@ class BIDSSelectInputSpec(BaseInterfaceInputSpec):
                          desc='BIDS dataset root directories')
     derivatives = traits.Either(True, InputMultiPath(Directory(exists=True)),
                                 desc='Derivative folders')
+    database_path = Directory(exists=False,
+                              desc='Optional path to bids database path.')
Collaborator

Again. Inputs say optional, runtime says replace.

@effigies moved this from "Waiting on @effigies" to "Waiting on submitter" in Priority on Nov 19, 2019
setup.cfg (outdated, resolved)
@effigies
Collaborator

effigies commented Dec 4, 2019

@Shotgunosine Just a reminder that this one's waiting on you.

@Shotgunosine
Collaborator Author

Yeah, thanks, have not had time, but will try to get to it soon.

@Shotgunosine
Collaborator Author

Alright, changes made. Let me know if there's anything else to tweak.

@effigies
Collaborator

effigies commented Dec 6, 2019

Thanks for pushing on this. Today was slow (sick kid), but I'll try to get to this on Monday.

@effigies
Collaborator

effigies commented Dec 9, 2019

LGTM. Thanks for your patience.

Labels
None yet
Projects
Priority
Waiting on submitter
Development

Successfully merging this pull request may close these issues.

Add command line option for saved bids layout
5 participants