ENH: Make dataset description #29

Merged 3 commits on May 21, 2018


effigies
Collaborator

Following BEP 003:

Derived dataset and pipeline description

<dataset>/derivatives/<pipeline_name>/dataset_description.json

Keys:

  • HowToAcknowledge
  • PipelineDescription
    • Name
    • Version
    • CodeURL
    • DockerHubContainerTag
    • SingularityContainerURL
    • SingularityContainerMD5
  • SourceDatasetsURLs - a list of URLs to the source dataset(s) (mandatory)
  • SourceDatasetsVersions - a list of versions of the source dataset(s) (optional only if SourceDatasetsURLs specify the version unequivocally)
  • License
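A minimal sketch of how the keys listed above could be assembled and written to `<dataset>/derivatives/<pipeline_name>/dataset_description.json`. The function names `build_description` and `write_description` and all values are hypothetical placeholders for illustration, not the code in this PR; the container fields are left out here.

```python
import json
import os


def build_description(source_urls, source_versions=None):
    """Assemble a derivative dataset description (placeholder values only)."""
    desc = {
        "HowToAcknowledge": "Placeholder acknowledgement text",
        "PipelineDescription": {
            "Name": "example_pipeline",            # hypothetical pipeline name
            "Version": "0.0.0",                    # hypothetical version
            "CodeURL": "https://example.org/code", # hypothetical URL
        },
        # Mandatory: where the source dataset(s) came from
        "SourceDatasetsURLs": list(source_urls),
        "License": "Placeholder license",
    }
    # Only needed when the URLs do not pin the version unequivocally
    if source_versions is not None:
        desc["SourceDatasetsVersions"] = list(source_versions)
    return desc


def write_description(deriv_dir, desc):
    """Write dataset_description.json into deriv_dir and return its path."""
    os.makedirs(deriv_dir, exist_ok=True)
    path = os.path.join(deriv_dir, "dataset_description.json")
    with open(path, "w") as fobj:
        json.dump(desc, fobj, indent=4)
    return path
```

Keeping the dictionary construction separate from the file write makes it easy to check the metadata without touching the filesystem.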

We should start adding docker hub tags and singularity URLs as environment variables when we build images. I've put in a stub to get the MD5, which will attempt to pull the 'version' tag from Singularity Hub.
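As a rough sketch of the environment-variable idea: the variable names `EXAMPLE_DOCKER_TAG` and `EXAMPLE_SINGULARITY_URL` are assumptions made up for this example, and `file_md5` hashes a local image file rather than pulling anything from Singularity Hub, so it is not the stub described above.

```python
import hashlib
import os


def container_metadata(environ=None):
    """Collect container fields from environment variables, if they are set.

    The variable names are hypothetical; an image build would need to export them.
    """
    environ = os.environ if environ is None else environ
    meta = {}
    tag = environ.get("EXAMPLE_DOCKER_TAG")
    if tag:
        meta["DockerHubContainerTag"] = tag
    url = environ.get("EXAMPLE_SINGULARITY_URL")
    if url:
        meta["SingularityContainerURL"] = url
    return meta


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Fields that are not set in the environment are simply omitted, so the description stays valid when the pipeline runs outside a container.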

BIDS doesn't specify a field for version, which seems like an oversight, but perhaps there's another mechanism for marking it?

Long-term, it would be good to standardize the construction of this metadata in the BIDS-App spec and pybids.

Related to discussion in #28.

@adelavega
Collaborator

adelavega commented May 11, 2018

Okay, I think you're right about this. I was mostly confused about the semantics of calling this a "dataset", but I see that such a file was recommended in the original derivatives BEP. Otherwise, it very much makes sense to include such metadata in the outputs.

@effigies effigies requested a review from adelavega May 21, 2018 14:12
@effigies effigies changed the title [WIP] ENH: Make dataset description ENH: Make dataset description May 21, 2018
Collaborator

@adelavega adelavega left a comment


Looks good

@@ -135,9 +135,7 @@ def create_workflow(opts):
deriv_dir = op.join(output_dir, 'fitlins')
os.makedirs(deriv_dir, exist_ok=True)
Collaborator


You can omit this line, since the dir is also made in write_derivative_description

Collaborator Author


Does it make more sense for write_derivative_description to assume that deriv_dir exists?

Collaborator

@adelavega adelavega May 21, 2018


Probably, although what's nice about exist_ok is that you can use it either way.

Collaborator


If you think there may be other files you want to instantiate in the folder in the future, you could call the function something like create_derivative_template.

Collaborator Author


Okay. For now I'm just going to let create_workflow do the makedirs. Might change again in the future.
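The exist_ok point from this thread can be sketched as follows. `ensure_dir` is a hypothetical helper, not a function in this PR; it only illustrates why either the caller or the callee (or both) can safely create the directory.

```python
import os


def ensure_dir(path):
    """Create path if needed; calling this repeatedly is harmless."""
    # exist_ok=True suppresses FileExistsError when the directory already exists,
    # so it does not matter whether a caller created it first.
    os.makedirs(path, exist_ok=True)
    return path
```

Without `exist_ok=True`, a second `os.makedirs` call on the same path raises `FileExistsError`, which is why the location of the call would otherwise need to be decided once.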

@effigies effigies merged commit 1cc3afe into poldracklab:master May 21, 2018
@effigies effigies deleted the enh/description branch May 21, 2018 16:34