Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

boldref conversions should have the same datatype of corresponding bold #2000

Closed
oesteban opened this issue Feb 26, 2020 · 7 comments
Closed
Assignees
Labels
derivatives effort: low Estimated low effort task good first issue impact: low Estimated low impact task optimization
Milestone

Comments

@oesteban
Copy link
Member

Currently, they are float32 even when the _bold files are int16

@oesteban oesteban added derivatives effort: low Estimated low effort task good first issue impact: low Estimated low impact task optimization labels Feb 26, 2020
@oesteban oesteban added this to Triage in pipelines via automation Feb 26, 2020
@oesteban oesteban added this to the 20.0.1 milestone Feb 26, 2020
@oesteban oesteban moved this from Triage to Development backlog in pipelines Feb 26, 2020
@effigies
Copy link
Member

It's unclear what the problem is, here. We're reporting what we're using internally as a reference image, which is float32.

Is this aesthetic or is there a use case that makes matching dtypes necessary?

@effigies
Copy link
Member

Also, this will change outputs for non-erroring runs, so I would suggest making it 20.1, not 20.0.1.

@oesteban oesteban modified the milestones: 20.0.1, 20.1.0 Feb 26, 2020
@oesteban
Copy link
Member Author

We're reporting what we're using internally as a reference image, which is float32.

In my head, the outputs should not be a report of what internally happened rather than the ingredients for downstream analysis. With that mindset, the bold reference should be as consistent as possible w.r.t. the corresponding bold and data type is one of the most obvious features to do that.

For example, maybe in the future, we might switch the internal reference to a super-sampled reference for whatever reason. However, for downstream processing, it doesn't much sense to have a boldref with a resolution different to the bold it references (perhaps with the exception of sbrefs, and I'd be doubtful on this one too).

I would be happy to report that the internal boldref is float32 for the sake of transparency, but I don't think propagating the internal datatype helps shareability and ease-of-use in any way.

It's unclear what the problem is, here.

Given the argumentation above, the problem would be the inconsistency of the data type of bold-refs with that of the bold they reference.

@effigies
Copy link
Member

I might also just be missing what downstream uses there are... I thought it really was mostly a reporting thing.

@oesteban
Copy link
Member Author

The only use-case I can think of quickly is having an actual file to link from a SpatialReference field of another derivative.

@effigies
Copy link
Member

Okay. In that case the dtype is kind of irrelevant. I guess saving 50% on 3MB * 10k subjects (~15GB) is worth it.

@oesteban
Copy link
Member Author

Agreed, the proposal is for consistency rather than storage savings.

@oesteban oesteban modified the milestones: 20.1.x, 20.1.0 LTS Mar 25, 2020
@mgxd mgxd self-assigned this Apr 3, 2020
@mgxd mgxd closed this as completed Apr 9, 2020
pipelines automation moved this from Development backlog to Done Apr 9, 2020
@oesteban oesteban removed this from Done in pipelines Sep 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
derivatives effort: low Estimated low effort task good first issue impact: low Estimated low impact task optimization
Projects
None yet
Development

No branches or pull requests

3 participants