Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Unit Tests to utilize separate data store for sample images #167

Closed
wetzelj opened this issue Dec 7, 2020 · 14 comments
Closed

Comments

@wetzelj
Copy link
Contributor

wetzelj commented Dec 7, 2020

Created this issue to continue discussion started on PR #166:

With this proposed new issue, I believe that I have an opportunity to build out our test image data set with a larger variety of PHI-safe test images from a variety of makes/models of scanners. Our department has an annual employee art show in which we can image (XR, CT, US) objects to serve as the basis for works of art.

I really like this idea. And actually, we can fairly easy separate the small example data (e.g., dicom cookies) from those that are just used for testing, which arguably shouldn't even be stored in the repository because they are so large. I think what would make sense to do is follow the convention of pydicom-data, and even first ask if there is room to accept files for deid here to that library (so pydicom can have a single, shared data library). @darcymason would this be something we could consider? If we want to keep the libraries separate, and you'd want to make a new dataset not just for deid but for other dicom enthusiasts then we could also have it branded differently and then used by the library here. Either way, we could definitely have a function to decompress after install.

Having looked closer at pydicom-data, I wonder if it would be better for us to first attempt to convert the unit tests to use images that are already stored within pydicom-data? While I haven't done an exhaustive look to determine if all of our test cases could be covered by images in pydicom-data, it does appear that there's high potential that they could be. This could eliminate the need for a standalone deid test image repo.

@vsoch
Copy link
Member

vsoch commented Dec 7, 2020

@wetzelj I think this is a great idea - and we could still keep (the smaller) files alongside deid in case anyone wants a quick dataset to play with. If there is something missing, I'm hoping that @darcymason would be open to a PR for us to add it. That way, both repos can share the same testing data.

@wetzelj
Copy link
Contributor Author

wetzelj commented Dec 7, 2020

What are your thoughts on potentially upgrading pydicom beyond the current required version 1.3.0 with this?

For about two weeks now, I've been running the latest deid code with pydicom 2.1.1 and have seen no issues with this upgrade. Between 1.3.0 and 2.1.1 there have been some changes to pydicom.data which could impact how this issue is implemented.

@vsoch
Copy link
Member

vsoch commented Dec 7, 2020

Yep I think that would be totally okay - the reason for pinning was something related to the cleaner a while back, and likely it's not an issue anymore. I bumped to allow for >= 1.3.0 for the feedstock and haven't heard any issues (yet). https://github.com/conda-forge/deid-feedstock/blob/master/recipe/meta.yaml#L26. Definitely update the version to be >= whatever minimal one that you suspect needing, being conservative if possible if some folks might still be using a 1.x version.

@darcymason
Copy link
Member

so pydicom can have a single, shared data library). @darcymason would this be something we could consider?

Absolutely, no problem. Wouldn't want huge files (or a large number of smaller files) because some people will still download the whole set, but if just adding some reasonably sized ones, I would welcome that.

@vsoch
Copy link
Member

vsoch commented Dec 7, 2020

Awesome, thank you @darcymason ! @wetzelj you're good to go!

@vsoch
Copy link
Member

vsoch commented Sep 28, 2022

@wetzelj do you still want to work on this? I actually think it would be fine to creat a deid-data package too. If not I’m happy to take lead.

@wetzelj
Copy link
Contributor Author

wetzelj commented Sep 28, 2022

Good Morning @vsoch & @RobinFrcd - I won't object at all if you two want to take this on. Unfortunately I haven't been able to circle back to this and I don't see it on the horizon anytime soon. :(

I've been keeping an eye on this project - LOTS of good stuff going on! We're still using 0.2.28... The changes introduced since that version will be very handy when we have the opportunity to upgrade!

@vsoch
Copy link
Member

vsoch commented Sep 28, 2022

okay sounds good! I think I'm going to try just making a simple pip install to place the data in the expected spot. Stay tuned!

@vsoch
Copy link
Member

vsoch commented Sep 28, 2022

Looks like @RobinFrcd beat me to a PR - with an interesting idea too - moving the data to be alongside tests (and not installing to the package). @wetzelj what do you think of the two ideas?

  • have data in a separate package you install with pip, and it re-assembles as we have it now - tests / existing docs and functions to get a dataset continue to work
  • move the data to be a part of tests, and remove the docs that use get_dataset, etc., direct the user to clone the full repository if data is wanted.

@wetzelj
Copy link
Contributor Author

wetzelj commented Sep 28, 2022

The separate pip install solution seems slightly more intuitive to me. I'd prefer that approach - but honestly either would be good!

@vsoch
Copy link
Member

vsoch commented Sep 28, 2022

Gotcha. Thank you @wetzelj!

@vsoch
Copy link
Member

vsoch commented Sep 28, 2022

@wetzelj take a look at my idea here: #227 and @RobinFrcd #226

For the first, that was also my first gut instinct - it's a bit simpler in the sense that deid-data is just an external package that anyone can install, and deid uses it for tests (and someone could arguably use it for their own use case). I think it might be preferable to have this format, primarily so someone can easily install the data on a whim without cloning, and we can cleanly maintain the two projects (https://github.com/pydicom/deid-data). Let me know your thoughts!

@wetzelj
Copy link
Contributor Author

wetzelj commented Sep 28, 2022

I think this will be fine! Thanks for your work! :)

@vsoch
Copy link
Member

vsoch commented Sep 28, 2022

This is done! https://pypi.org/project/deid/0.2.36/ woot!

@vsoch vsoch closed this as completed Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants