Missing scientific (domain-specific) images and corresponding examples and tutorials #3384

Open
emmanuelle opened this issue Aug 31, 2018 · 33 comments
Labels
📄 type: Documentation Updates, fixes and additions to documentation 💬 Discussion

Comments

@emmanuelle
Member

emmanuelle commented Aug 31, 2018

I think scientific users could be more interested in starting using scikit-image if our documentation showed more examples corresponding to applications in different scientific domains. This could include cell images, satellite / remote sensing images, hyperspectral images, materials science images, astronomy images, etc. We could ask users to contribute some open data and examples, but before doing that is there a consensus that it would be useful?

@sciunto
Member

sciunto commented Sep 1, 2018

I think this is extremely valuable. However, we have the size issue... A discussion started here: #3323

@sciunto sciunto added 📄 type: Documentation Updates, fixes and additions to documentation 💬 Discussion labels Sep 1, 2018
@emmanuelle
Member Author

@sciunto There is the size issue indeed, but it seems to be due primarily to Cython files. Moreover, the largest data files are the astronaut HOG data, which are used only for testing; maybe it's possible to decrease their size by using a subimage.

@jni
Member

jni commented Sep 1, 2018

@emmanuelle just last week I gave a talk about scikit-image, and as I was browsing the gallery for useful examples, I noted this exact problem: we have a nice gallery of examples that feel too artificial to compel scientists. I ended up using the images from your blog posts. =) So, a big 👍 from me. We need to solve the scientific examples issue, and @sciunto if that means that we need to solve the size issue, then we need to solve that issue also. =)

Since we are considering moving data out of the core package and have it download on demand, it might be a good opportunity to revisit our example data also.

@sciunto
Member

sciunto commented Sep 2, 2018

So far, we have used only CC0 / public domain images and, if I'm correct, that was a requirement made by @stefanv, I guess for compatibility with the BSD license.

@stefanv
Member

stefanv commented Sep 3, 2018 via email

@hmaarrfk
Member

hmaarrfk commented Sep 4, 2018 via email

@stefanv
Member

stefanv commented Sep 4, 2018

Actually, I do hope to make the data problem go away, by hosting the data externally, and downloading it as necessary. We've been investigating tooling for that, such as https://github.com/frictionlessdata/datapackage-py, but currently it is too heavy a dependency. However, the developers have indicated interest in improving that situation, and we can even write our own very simplistic downloader that handles their metadata spec.

@hmaarrfk
Member

hmaarrfk commented Sep 4, 2018

I meant: "It won't magically go away if we don't add things to it" The package at 10s of MB is already too large.

Cool package!

@emmanuelle
Member Author

Yes, CC0/public domain makes sense, thanks for your answers. Some packages using large datasets for their examples, like nilearn, download them instead of shipping them (see for example http://nilearn.github.io/auto_examples/05_advanced/plot_neurovault_meta_analysis.html).

@GenevieveBuckley
Contributor

Yes to this, definitely. I'd also be ok with @stefanv's solution of not bundling large images, but instead allowing them to be downloaded on demand.

It's hard to get good CC0/public domain scientific images. There was an offer to donate a 3D dataset (#2513); that conversation should be revived.

I really want a 5D microscopy dataset available. If there is sufficient interest we (@jni and I) can ask someone at work if they can make us one and donate it, but we'd need to be clear about what we want it to be useful for.

@stefanv
Member

stefanv commented Oct 1, 2018

A note that https://github.com/fatiando/pooch is being discussed here on #3323

@sciunto
Member

sciunto commented Oct 6, 2018

Do we start a new repository to push our new images?

@jni
Member

jni commented Oct 7, 2018

@sciunto I don't know whether a repository is the right place to store big images, compared to something like Figshare or the Open Science Framework. And if we are going to store some files on one of these storage platforms, perhaps we might as well store all of them?

@jni
Member

jni commented Dec 17, 2018

Just an fyi, MetPy is now using pooch, so that gives me a bit of extra confidence about using it.

Unidata/MetPy#915

@braincharter

braincharter commented Feb 7, 2019

Hi guys, if you need real 3D data for testing from the medical field involving Frangi-Ridge detections:
https://www.dropbox.com/sh/f7uutop5v09fecc/AAAK7QJsqAS9OJZnzgOe39-5a?dl=0

It contains a CLARITY (microscopy, large white vessels) sample, a susceptibility weighting image (black blood) and a time-of-flight (white blood). It is all in the nifti format (.nii).

After the release of the 3D (nD) versions of Frangi filters and others, I'll be happy to provide examples of vessel segmentation using those.

@jni
Member

jni commented Feb 10, 2019

@braincharter what is the license on those files?

@emmanuelle
Member Author

For downloading images on request, and only if they are not already cached, it might be interesting to take a look at what the nilearn team has written: https://github.com/nilearn/nilearn/blob/master/nilearn/datasets/utils.py
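
To make the idea concrete, here is a minimal standard-library sketch of "download on request, only if not already cached". It is not nilearn's or Pooch's code: the function name `fetch_cached`, its parameters, and the `.part` temporary suffix are all hypothetical choices made for illustration, and a real implementation would also want retries, progress reporting, and a sensible default cache location.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def _sha256(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_cached(url, cache_dir, filename, sha256=None):
    """Return a local path to `filename`, downloading from `url` only if needed.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    # Reuse the cached copy when it exists and (if a hash was given) is intact.
    if target.exists() and (sha256 is None or _sha256(target) == sha256):
        return target

    # Download under a temporary name so an interrupted transfer never
    # leaves a half-written file under the final name.
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)

    # Reject the download if it does not match the expected hash.
    if sha256 is not None and _sha256(tmp) != sha256:
        tmp.unlink()
        raise ValueError(f"hash mismatch for {filename}")

    tmp.replace(target)
    return target
```

Calling it twice with the same arguments should touch the network only once; passing `sha256` additionally guards against a corrupted or silently changed file.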

@stefanv
Member

stefanv commented Apr 8, 2019

It's high on my priority list to check out Pooch, which does exactly that. Hopefully soon!

@GenevieveBuckley
Contributor

GenevieveBuckley commented Jun 25, 2019

I have some fluorescence microscopy data we could release under CC0 (I'm double checking this with our microscopists now. EDIT: our facility manager just confirmed I do hold the copyright).

They are:

  • 2D RGB image of kidney tissue
  • 2D image with four color channels, lily of the valley
  • I also have a quasi-3D volume of the same kidney tissue (also with 3 color channels). There are only 16 slices in the z-stack because the sample was a very thin slice. So it might not be very interesting but you are welcome to it.

I took these while training on a Nikon inverted microscope, and the sample slides are nothing special (just the ones we get for free from manufacturers to test with).

@GenevieveBuckley
Contributor

I guess my question is:

  1. If you want to take a look, how would you like me to share them with you?
  2. What information do you need to know about the sample and imaging conditions?

@jni
Member

jni commented Jun 25, 2019

Awesome! They are going to be too big to stick in the repo directly, so if you share them online anywhere semi-useful (say, cloudstor), we can grab them and experiment with pooch (#3945). It would be super cool to use one or more of these in the tutorial at SciPy in 2 weeks.

As far as I know the long term public-facing storage problem isn't yet solved in general, but these will be relatively small, so Open Science Framework (osf.io) is probably a good option. I put the data from the skan paper there.

@emmanuelle
Member Author

@GenevieveBuckley reading this conversation one year later :-). Would you be interested in adding these datasets to https://gitlab.com/scikit-image/data/ and writing tutorials about these images?

@GenevieveBuckley
Contributor

I had forgotten, thanks for this! I might try and do it this weekend, for the sprint.

@GenevieveBuckley
Contributor

I've opened a PR over at scikit-image/data on GitLab

@sciunto
Member

sciunto commented Aug 5, 2020

@scikit-image/core Who has the rights on gitlab? This PR is pending: https://gitlab.com/scikit-image/data/-/merge_requests/6

@emmanuelle
Member Author

Thanks for the reminder @sciunto! For now it's Juan, Stéfan and me but we should add all the core devs who have an account on gitlab, so please ping me here or on zulip to give me your gitlab handle so that I can add you.

@emmanuelle
Member Author

@GenevieveBuckley I just commented in the gitlab PR (just one quick question). The images are gorgeous :-). Now we'll need tutorials using these beautiful images :-). Could you please explain what's to be seen in the images and what kind of image processing would typically be done on such images?

@grlee77
Contributor

grlee77 commented Aug 5, 2020

@emmanuelle: I am also grlee77 on gitlab

@GenevieveBuckley
Contributor

Not so gorgeous, if you look at the clipping at both the top and bottom of the histograms! I took them when I was learning to use the inverted Nikon, so the purpose was not really about the quality of the images. But since I own those images outright I can donate them here, whereas most other stuff I work on requires other people's permission to share.

I'll put tutorials on my to-do list. I'm on leave next week, so I don't have a good estimate of when I might find time to start work on this.

@sciunto
Member

sciunto commented Aug 7, 2020

no pressure! thank you @GenevieveBuckley !

@emmanuelle
Member Author

Thanks @GenevieveBuckley ! Great to know that you have ideas of tutorials using these images. @mkcor could be interested in collaborating with you on these tutorials if you think it's a good idea (Marianne has been working on life-science-oriented tutorials thanks to the CZI grant). She is on vacation next week, so no hurry at all.

@mkcor
Member

mkcor commented Aug 11, 2020

Absolutely! Talk soon.
