Request for comment: road to scikit-image 1.0 #3263
@stefanv, @emmanuelle, and I have been working on a blog post about the future of scikit-image. We want much broader input. The only reason we haven't gotten more of you involved yet is a lack of organisation on our part. But let's call this issue the moment where we get organised.
The post is live on my blog here:
Please provide comments, suggestions, corrections, insults ;), etc, either here or on the blog. Ok maybe reserve insults for the blog...
I'm looking forward to working with all of you to make an (even more) amazing library!
Hey @jni, great writeup. I'm really impressed with the way you summarized and explained the values of the project.
Here are my thoughts.
User data + magic
I think maybe the most important step for the 1.0 version will be deciding how "care for the users’ data" and "we don’t do magic" go hand in hand with the library being easy to use.
A rule that I am converging to in my own work is:
I've been personally experimenting with the
Metadata + units
I've been experimenting quite a bit with
I expect that sticking to numpy might be a good idea seeing as many people are working toward making sure the
When xarray doesn't work, I often have to access the "data" attribute, or call
I think what you have posted is a nice summary and am in general agreement.
I agree that it could help to lower the barrier to inviting additional new core members to help broaden the base of regular contributors. Finding funding support to either hire new developers or allow existing contributors to spend more time on scikit-image would also help, but is obviously easier said than done. Have there been discussions of becoming a NumFocus sponsored project or otherwise applying for external funding?
If there are plans for an updated manuscript, I would be interested in contributing to that. Publications provide at least some justification of time spent for those of us at academic institutions, and I unfortunately joined a bit late to be involved with the initial publication related to the library.
I will just add here a few items that attracted me as a user and eventual developer for scikit-image as opposed to other libraries such as OpenCV or ITK:
Aside from the specific features above, the open and helpful nature of interactions with the core team on GitHub made a positive impression and added to my desire to contribute to the project.
I would like to see bi-weekly to monthly releases. This would help ensure that developers see a quick return on their investment in the scikit-image infrastructure and encourage them to contribute to scikit-image as opposed to other projects or their personal forks.
Xref mailing list: https://mail.python.org/pipermail/scikit-image/2018-September/005632.html
I'm back on this topic after dropping it for a while. =)
First: I promised a way to submit anonymous comments, and now I finally have it:
The site above assures me that comments are completely anonymous by design, with zero knowledge even from the developers/admins. Please share this link! Together with the link to the blog post.
About some of the existing comments:
I am on board with this. I think all of it is extremely useful, but PRs explicitly to simplify this process would be very welcome. (Thank you @hmaarrfk, for example, for your most recent PR removing the special-casing for PYAMG!)
I think this speaks to the mentorship part of the values. For me, @stefanv's mentorship when I was first contributing to this project was life-changing, and I definitely want to pay this forward. Could you elaborate on your ideal collaboration model for PRs? I think the main challenge here is that we are so widely distributed around the globe, so real-time collaboration suffers.
One idea is that we could have some sort of direct link "organize a micro-sprint", that sends an email to the mailing list with a topic, a doodle poll for availability, and a request for help implementing a feature. This might make it easier to get from "idea for complex feature" to "merge" in a short time, rather than months or years, and also to get help with stuff you are not super familiar with.
As a side point, I also didn't touch on it but I think as core developers we should push the final "polishing" commits ourselves, rather than ask contributors to fix things.
This to me is the most controversial of the comments. =P It is in a bit of tension with "we don't do magic", as well as the idea that we are, at our core, a scientific processing and analysis library. For example, @hmaarrfk has been making efforts to remove our core dependency on matplotlib, and I very much support these efforts: matplotlib is certainly a part of most interactive Python environments, but when it comes to producing a lean piece of software for an image analysis pipeline, it's just way too much.
HOWEVER, I think this sort of thing could find a very good place with the viewer sub-package, as something that ships separately to scikit-image.
I'd love to hear from other people on this topic: where do we want to fit in on the magic/convenience spectrum? Are they actually in tension or is this a false dichotomy?
Yes, this is a bugbear of mine also. I'd aim to deprecate importing
ditto. It requires careful thinking but I consider this very worthwhile.
Yes, I will definitely never suggest using anything other than NumPy arrays to represent our images. (This was also noted by Royi Avital on the blog post.) Metadata could be useful on the viewer side of things. Perhaps we could simply provide a
I think this is something we should consider in the far future. As Stéfan pointed out in the mailing list, one advantage of slowish release cycles is that we have time to correct things when funky APIs or actual bugs make their way into master. When we have a larger number of people on master, the number of people testing cutting-edge stuff will increase, and this might be more viable.
In terms of encouraging new contributors, this is a big deal for me, but I want to put most of the effort into reducing PR -> merge time, rather than merge -> release time.
Yes, my intention (which I am announcing here I guess =P) is to write papers periodically, and invite all contributors (core and not) to participate in the paper. I'd certainly like to write one to coincide with 1.0. I can attest from personal experience that my work with scikit-image was viewed quite differently at my university pre- and post-paper, and it is unfair that contributors who came after the paper don't get the same benefits. Of course I have many thoughts on the brokenness of the incentive system, but in the meantime, I think periodic papers offer the best way to credit new contributors.
Stéfan has briefly looked into NumFOCUS, but there has been no concerted effort to become sponsored by them. From what I can tell though, that funding is still small-potatoes, and not something that I think would be game-changing for skimage.
Part of the purpose of this roadmap, though, is to make it easier to get funding. So let's get it finished! I'm hoping to submit a PR with the "official" version (incorporating all this feedback) next month.
Ok, that's my take on the existing comments. If anyone has further input, please speak now or forever hold your peace! At the start of November, I'll collect all the comments, try to incorporate them as best I can, and then submit a PR proposing the adoption of the roadmap. At that point, I hope that we only have to deal with minor issues with the phrasing, not the overall vision. =)
I'd also like to make the point that some of the comments may lean more in one or the other direction. Sometimes a consensus can be built, and I hope that will be the case here, but other times this is difficult. Even if you only write in to agree with particular items of this vision, this is useful for us to get an idea of community support for various aspects of it.
I would like to see many of the rough edges sanded down in chaining together scikit-image functions. There are quite a few places where individual functions are great, but you have to do a bunch of fiddling to turn the results into a format that matches the input of the next skimage function.
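To make the kind of fiddling concrete, here is a minimal sketch of one such conversion step. It is only an illustration, not necessarily the case the commenter had in mind: it assumes one function hands back an (N, ndim) array of pixel coordinates (as `skimage.feature.peak_local_max` does) while the next one wants an integer label image (as the `markers` argument of watershed does). The helper name `coords_to_markers` is hypothetical and not part of scikit-image; the sketch uses only NumPy.

```python
import numpy as np


def coords_to_markers(coords, shape):
    """Turn an (N, ndim) array of pixel coordinates into an integer label image.

    Hypothetical glue helper for illustration; not a scikit-image function.
    """
    coords = np.asarray(coords)
    markers = np.zeros(shape, dtype=np.int32)
    # Give each coordinate its own label 1..N; 0 is left as background.
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    return markers


# Two example peak coordinates in a 4x4 image (made-up values).
coords = np.array([[1, 2], [3, 0]])
markers = coords_to_markers(coords, (4, 4))
```

It is only a few lines, but every user who chains these two steps has to rediscover them, which is the sort of rough edge that could be sanded down.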
The way I interpreted François' comment was that it was more about 'better abstraction of ideas' rather than 'doing magic' or 'matplotlib', and I'd agree with that. It's similar to what I'm saying here too.
@hmaarrfk huh? Nelle proofread this post before it was posted, and the BIDS sprint is acknowledged twice in it. =) If you mean she might want to put it in a central location, then yes, she might. =P
I have so many more posts (in my head) from that though. Life is way too short. LOL