Missing scientific (domain-specific) images and corresponding examples and tutorials #3384
Comments
I think this is extremely valuable. However, we have the size issue... A discussion started here: #3323 |
@sciunto There is the size issue indeed but it seems that it is due primarily to cython files; moreover the largest data files are the astronaut HOG data which are used only for testing, maybe it's possible to decrease their size by using a subimage. |
@emmanuelle just last week I gave a talk about scikit-image, and as I was browsing the gallery for useful examples, I noted this exact problem: we have a nice gallery of examples that feel too artificial to compel scientists. I ended up using the images from your blog posts. =) So, a big 👍 from me. We need to solve the scientific examples issue, and @sciunto if that means that we need to solve the size issue, then we need to solve that issue also. =) Since we are considering moving data out of the core package and have it download on demand, it might be a good opportunity to revisit our example data also. |
We might use images from Wikipedia. I was quickly browsing through wikipedia microscopy images, |
So far, we have used only CC0 / public domain images and, if I'm correct, that was a requirement made by @stefanv, I guess for compatibility with the BSD license. |
Two issues arise: 1) how to communicate such requirements with users
effectively and 2) do we want to place potentially surprising restrictions
in place. For number 2, we've argued strongly "no": you should be able to
use the output of scikit-image without breaching a patent or attribution
requirement.
|
I would not use data size as an excuse not to include sample images. The
data issue is not something that will magically go away. Furthermore, I
think it is really important to run benchmarks on “out of cache” datasets.
Performance on small image sizes might be ok, but for images larger than a
few MB, it is unlikely that performance will be the same since the data
won't fit in cache.
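
To illustrate the point about "out of cache" benchmarks, here is a minimal sketch that times the same operation on buffers of increasing size and reports the time per byte. The helper names (`time_per_byte`, `invert`) and the chosen sizes are assumptions made for this example; `invert` is only a stand-in for a real image operation, and in practice one would time an actual scikit-image function on real images.

```python
import time

# Lookup table mapping each byte value v to 255 - v (a simple "invert" operation).
INVERT_TABLE = bytes(255 - i for i in range(256))


def invert(buf):
    """Stand-in for an image operation: invert every byte of the buffer."""
    return buf.translate(INVERT_TABLE)


def time_per_byte(op, nbytes, repeats=5):
    """Return the best-of-`repeats` wall time per byte (in seconds) for `op`
    applied to a zero-filled buffer of `nbytes` bytes."""
    data = bytearray(nbytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        op(data)
        best = min(best, time.perf_counter() - start)
    return best / nbytes


if __name__ == "__main__":
    # Hypothetical sizes: one that likely fits in cache, two that likely do not.
    for nbytes in (32 * 1024, 4 * 1024 * 1024, 64 * 1024 * 1024):
        ns = time_per_byte(invert, nbytes) * 1e9
        print(f"{nbytes:>10d} bytes: {ns:.3f} ns/byte")
```

If the time per byte rises for the larger buffers, a benchmark run only on small sample images would understate the cost on realistic data.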
One way to get realistic images is to simply ask for them. We can ask for
them to be donated, maybe when people submit issues and we help them debug,
it can be appropriate to ask them to donate the image they posted under an
appropriate license. That would give us images that people actually care
about too!
On Mon, Sep 3, 2018 at 2:52 PM Stefan van der Walt wrote:
Two issues arise: 1) how to communicate such requirements with users
effectively and 2) do we want to place potentially surprising restrictions
in place. For number 2, we've argued strongly "no": you should be able to
use the output of scikit-image without breaching a patent or attribution
requirement.
|
Actually, I do hope to make the data problem go away, by hosting the data externally, and downloading it as necessary. We've been investigating tooling for that, such as https://github.com/frictionlessdata/datapackage-py, but currently it is too heavy a dependency. However, the developers have indicated interest in improving that situation, and we can even write our own very simplistic downloader that handles their metadata spec. |
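
As a rough idea of what a "very simplistic downloader" could look like, here is a minimal sketch using only the Python standard library. It downloads a file on first use, verifies it against a known SHA256 hash, and reuses the cached copy afterwards. The function name `fetch`, its parameters, and the cache layout are assumptions for illustration only. This is not the scikit-image API, and it does not implement the datapackage metadata spec.

```python
import hashlib
import urllib.request
from pathlib import Path


def _sha256(path):
    """Return the SHA256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def fetch(url, sha256, cache_dir):
    """Return a local path to the file at `url`, downloading it only if needed.

    Hypothetical helper: the file is stored in `cache_dir` under the last
    component of the URL, and is re-downloaded if the cached copy is missing
    or does not match the expected `sha256` hex digest.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]

    # Reuse the cached copy when it is present and intact.
    if target.exists() and _sha256(target) == sha256:
        return target

    with urllib.request.urlopen(url) as response:
        data = response.read()

    # Refuse to cache data that does not match the registered hash.
    if hashlib.sha256(data).hexdigest() != sha256:
        raise ValueError(f"SHA256 mismatch for {url}")

    target.write_bytes(data)
    return target
```

A data-loading function could then call `fetch` with a URL and hash taken from a small registry shipped with the package, so that only the registry, not the images, adds to the package size.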
I meant: "It won't magically go away if we don't add things to it." At tens of MB, the package is already too large. Cool package! |
Yes, CC0/public domain makes sense, thanks for your answers. Some packages using large datasets for their examples, like |
Yes to this, definitely. I'd also be ok with @stefanv's solution of not bundling large images, but instead allowing them to be downloaded on demand. It's hard to get good CC0/public domain scientific images. There was an offer to donate a 3D dataset (#2513), that conversation should be revived. I really want a 5D microscopy dataset available. If there is sufficient interest we (@jni and I) can ask someone at work if they can make us one and donate it, but we'd need to be clear about what we want it to be useful for. |
A note that https://github.com/fatiando/pooch is being discussed here on #3323 |
Do we start a new repository to push our new images? |
@sciunto I don't know whether a repository is the right place to store big images, compared to something like Figshare or the Open Science Framework. And if we are going to store some files on one of these storage platforms, perhaps we might as well store all of them? |
Just an fyi, MetPy is now using pooch, so that gives me a bit of extra confidence about using it. |
Hi guys, if you need real 3D data for testing from the medical field involving Frangi-Ridge detections: It contains a CLARITY (microscopy, large white vessels) sample, a susceptibility weighting image (black blood) and a time-of-flight (white blood). It is all in the nifti format (.nii). After the release of the 3D (nD) versions of Frangi filters and others, I'll be happy to provide examples of vessel segmentation using those. |
@braincharter what is the license on those files? |
For downloading images on request, and only if they are not already cached, it might be interesting to take a look at what the nilearn team has written: https://github.com/nilearn/nilearn/blob/master/nilearn/datasets/utils.py |
It's high on my priority list to check out Pooch, which does exactly that. Hopefully soon! |
I have some fluorescence microscopy data we could release under CC0 ( They are:
I took these while training on a Nikon inverted microscope, and the sample slides are nothing special (just the ones we get for free from manufacturers to test with). |
I guess my question is:
|
Awesome! They are going to be too big to stick in the repo directly, so if you share them online anywhere semi-useful (say, cloudstor), we can grab them and experiment with pooch (#3945). It would be super cool to use one or more of these in the tutorial at SciPy in 2 weeks. As far as I know the long term public-facing storage problem isn't yet solved in general, but these will be relatively small, so Open Science Framework (osf.io) is probably a good option. I put the data from the skan paper there. |
@GenevieveBuckley reading this conversation one year later :-). Would you be interested in adding these datasets to https://gitlab.com/scikit-image/data/ and writing tutorials about these images? |
I had forgotten, thanks for this! I might try and do it this weekend, for the sprint. |
I've opened a PR over at scikit-image/data on GitLab |
@scikit-image/core Who has the rights on gitlab? This PR is pending: https://gitlab.com/scikit-image/data/-/merge_requests/6 |
Thanks for the reminder @sciunto! For now it's Juan, Stéfan and me but we should add all the core devs who have an account on gitlab, so please ping me here or on zulip to give me your gitlab handle so that I can add you. |
@GenevieveBuckley I just commented in the gitlab PR (just one quick question). The images are gorgeous :-). Now we'll need tutorials using these beautiful images :-). Could you please explain what's to be seen in the images and what kind of image processing would typically be done on such images? |
@emmanuelle: I am also grlee77 on gitlab |
Not so gorgeous, if you look at the clipping at both the top and bottom of the histograms! I took them when I was learning to use the inverted Nikon, so image quality was not really the point. But since I own those images outright I can donate them here, whereas most other stuff I work on requires other people's permission to share. I'll put tutorials on my to-do list. I'm on leave next week, so I don't have a good estimate of when I might find time to start work on this. |
no pressure! thank you @GenevieveBuckley ! |
Thanks @GenevieveBuckley ! Great to know that you have ideas of tutorials using these images. @mkcor could be interested in collaborating with you on these tutorials if you think it's a good idea (Marianne has been working on life-science-oriented tutorials thanks to the CZI grant). She is on vacation next week, so no hurry at all. |
Absolutely! Talk soon. |
I think scientific users might be more inclined to start using scikit-image if our documentation showed more examples corresponding to applications in different scientific domains. This could include cell images, satellite / remote sensing images, hyperspectral images, materials science images, astronomy images, etc. We could ask users to contribute some open data and examples, but before doing that, is there a consensus that it would be useful?