ConstructCSV when the directory layout is not in default format #532
Comments
Unfortunately,
The way NiftyNet did it was to search a given folder for all files named, for example, xxx_ct.nii.gz and xxx_gt.nii.gz, and save the names to a CSV.
I started implementing something but ran into a problem right from the get-go: How should the program know how to match subjects in a single folder?
In the above example, all files contain Additionally, the above structure is somewhat related to the Brain Imaging Data Structure (BIDS), a formalized mechanism to define data formats, but not entirely, since BIDS has definitions mostly for DICOM. Anyway, let me know what you think.
The subjects are matched based on their number, for example 001, 002, and so on. I guess it is not a very important issue; it would just be easier in the scenario where the data is structured in this manner, which is how it typically is in NiftyNet or MONAI 😊 https://niftynet.readthedocs.io/en/dev/filename_matching.html#automatic-filename-matching I think it is totally OK to keep it as it already is in GaNDLF as well, because it works once we have the right directory format.
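For anyone who wants to try the flat-folder matching described above, here is a minimal sketch. It assumes every file is named `<subjectID><suffix>` (for example `001_ct.nii.gz`) and that the caller supplies the channel and label suffixes explicitly. `group_flat_folder` and `rows_to_csv_text` are hypothetical helpers, not part of GaNDLF's construct_csv, and the `SubjectID`/`Channel_i`/`Label` header is an assumption about the expected CSV layout.

```python
import csv
import io
from collections import defaultdict


def group_flat_folder(filenames, channel_suffixes, label_suffix):
    """Group names like '001_ct.nii.gz' by the subject ID that precedes the suffix.

    Hypothetical helper: subjects missing any requested suffix are skipped.
    """
    suffixes = list(channel_suffixes) + [label_suffix]
    subjects = defaultdict(dict)
    for name in filenames:
        for suffix in suffixes:
            # The part before the suffix is taken as the subject ID.
            if name.endswith(suffix) and len(name) > len(suffix):
                subjects[name[: -len(suffix)]][suffix] = name
                break
    return [
        [subject_id] + [subjects[subject_id][s] for s in suffixes]
        for subject_id in sorted(subjects)
        if all(s in subjects[subject_id] for s in suffixes)
    ]


def rows_to_csv_text(rows, n_channels):
    """Render the grouped rows as CSV text (header names are an assumption)."""
    header = ["SubjectID"] + [f"Channel_{i}" for i in range(n_channels)] + ["Label"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

The file names could come from something like `sorted(p.name for p in Path(folder).iterdir())`. Passing several channel suffixes, such as `_T1.nii.gz` and `_T2.nii.gz`, covers the multi-channel case, at the cost of the user having to list the suffixes up front.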
What do you guys think about this, @AlexanderGetka-cbica, @Geeks-Sid?
While this is of great utility, construct_csv is starter code for folks to get going. There could be many more formats for folder structuring, and while it would be great to support all of them, it is currently not in our plans. But as always, pull requests are appreciated.
Cool, thanks for the input! What about you, @AlexanderGetka-cbica?
I wrote some code similar to this for the automatic multi-subject feature extraction pipeline on the IPP. But as I learned, any heuristically based method is going to fail at some point. The difficulty is this: If we can safely assume that subjectIDs only differ by number, then we actually can autodetect this case (and provide a switch just in case users actually don't get the output they expect.) Is that a reasonable assumption?
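A rough sketch of what such an autodetection check could look like, assuming the subject number is the first run of digits in each file name. `differ_only_by_number` is a hypothetical helper, not existing GaNDLF code, and like any heuristic it will misjudge names that break that assumption.

```python
import re


def differ_only_by_number(filenames):
    """Return True if all names are identical once the first digit run is masked.

    Assumption: the subject number is the first run of digits in the name.
    """
    # Mask only the first digit run so later digits (e.g. the '1' in 'T1') still count.
    templates = {re.sub(r"\d+", "#", name, count=1) for name in filenames}
    return len(templates) == 1
```

With this check, `001_ct.nii.gz` and `002_ct.nii.gz` are detected as differing only by number, while `001_T1.nii.gz` and `001_T2.nii.gz` are not, which is where a user-facing switch would still be needed.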
I think this is a very well-put argument. I'll ask @carlpe for more clarification.
For data sets consisting of only a single channel (_ct.nii.gz), the file names will differ only by number. But when we have multiple channels, for example several MR weightings (_T1.nii.gz, _T2.nii.gz), they will differ in more than the number. I suppose it might be better to keep it as you already have it now, as there is good reasoning behind the formatting.
Closing this until we have a different solution.
The Constructing the Data CSV section of the documentation shows users how to construct the data CSV automatically, but this assumes the data is in the following format:
Sometimes, the files are not in this order, but for example like this:
It would be great if it were possible to use constructCSV regardless of the directory format.
If this is already implemented, please show an example of how to do this in the documentation.
Thank you.