Past, Present and Future of Open Science (Emergent session): Containers: Ticket to Valhalla or Ticket to the Inferno? #87

Open
jsheunis opened this issue Jun 25, 2020 · 67 comments

Comments

@jsheunis

Containers: Ticket to Valhalla or Ticket to the Inferno?

By David Kennedy, University of Massachusetts Medical School

  • Theme: Past, Present and Future of Open Science
  • Format: Emergent session

Abstract

The containerization of neuroimaging analysis workflows has quickly become a hot topic in the OSR and beyond. But with great power comes great responsibility. Containers are presented by some as the 'be-all and end-all' and by others as a 'dangerous bandaid for masking bad software development practices'. What's the poor researcher to do? In this session we hope to have a pleasant discussion of the pros and cons, useful application areas, and practical logistics of using containers in the 'real world'.
We propose to present this as a round table with input from a number of perspectives, followed by a dialogue and public discussion aimed at determining where the community stands regarding 'best practices' and the use of containers. The round table may include (subject to confirmation and further discussion): Jo Etzel, Pierre Bellec, Peer Herholz, Satra Ghosh, Agah Karakuzu.

Useful Links

https://github.com/ReproNim/neurodocker
https://ww5.aievolution.com/hbm1901/index.cfm?do=abs.viewAbs&abs=4639

Tagging @dnkennedy

@PeerHerholz

tagging @satra, @dnkennedy, @jaetzel and @pbellec as interested folks based on the Mattermost thread. Did I miss anyone/who else should be tagged?

@gllmflndn

Great! Don't forget @stebo85

@PeerHerholz

Thx @gllmflndn, was also thinking about @stebo85, but wasn't sure if he could make it given the time zones! @stebo85 is there a time that would work for you?
Also saw that I forgot @agahkarakuzu, sorry.

@pbellec

pbellec commented Jun 25, 2020

LGTM. I think Gael Varoquaux has pretty strong opinions about 'dangerous bandaid for masking bad software development practices' (could not locate the link, but I think he wrote a blog post about that a while back).

It would also be great to have someone speak about reproducibility and containers. Maybe Valerie Hayot?

I am happy to be dropped from the discussion, as I don't think I have expertise not covered by others. I guess I could play the role of devil's advocate, as I am not sold on the utility of containers as a software distribution tool.

@PeerHerholz

tagging @ValHayot, thx @pbellec.

@gllmflndn

Please stay @pbellec, we need a diversity of opinions!

Searching for Gael's blog post, I found this:
http://gael-varoquaux.info/programming/of-software-and-science-reproducible-science-what-why-and-how.html
http://ivory.idyll.org/blog/2014-containers.html

@PeerHerholz

tagging @GaelVaroquaux to check if he would be interested and has bandwidth to stop by.

@GaelVaroquaux

Happy to complain :).

When exactly do you need me?

@PeerHerholz

thx @GaelVaroquaux, the rock of complaints, hehe! Also tagging @hcp4715, who did a lot of work to introduce containers in his lab/institute and certainly has important and interesting points to add.

@hcp4715

hcp4715 commented Jun 25, 2020

@PeerHerholz, FYI, yesterday an excellent master's student in China wrote a Chinese tutorial that covers the whole process, from installing Docker to using heudiconv and running fmriprep, on both Linux and Windows (you can imagine the frustrations he experienced ;)). We put it on OSF: https://osf.io/naxgd/

@ValHayot

Hey, thanks for the tag. While I do have opinions, I'm no expert on the matter. I'll tag @gkiar and @ali4006 who have done extensive work on this.

I'll try to listen in though :)

@PeerHerholz

Thx @hcp4715, cool! Following up on our conversation in Mattermost: we need a diverse set of experience levels, use cases and backgrounds in order to create a fruitful discussion. So far I think the following have been mentioned (thx @emdupre):

  • tool developers that use containerization
  • tool developers for containerization
  • software consumers in small projects
  • software consumers at scale
  • educators that taught containerization

Please discuss and add further groups!

@emdupre

emdupre commented Jun 25, 2020

It looks like there's already a pretty clear mapping:

If you want to add more, then I'd aim for the slots with only one person. But not sure how big you're envisioning this !

@dnkennedy

Any comments on time for this? While there are still a number of open slots?
Wednesday or Thursday?
5am, 2pm, 3pm, 9pm, 10pm EDT?

@jaetzel

jaetzel commented Jun 25, 2020

All but the 5 am EDT slot are ok for me, either day. 5 am EDT is possible, but very early for me.

Any comments on time for this? While there are still a number of open slots?
Wednesday or Thursday?
5am, 2pm, 3pm, 9pm, 10pm EDT?

@hcp4715

hcp4715 commented Jun 25, 2020

Any comments on time for this? While there are still a number of open slots?
Wednesday or Thursday?
5am, 2pm, 3pm, 9pm, 10pm EDT?

5 am, 2 pm, and 3 pm EDT work for me (time zone CEST).

@satra

satra commented Jun 25, 2020

wed: 2,3,9 EDT
thu: 9 EDT

@GaelVaroquaux

wed, 2 and 3pm EDT?

@PeerHerholz

No preference on my side, all times would work for me!

@stebo85

stebo85 commented Jun 25, 2020

5:00 am, 9pm and 10pm work for us in Australia :)

@Starborn

Starborn commented Jun 25, 2020 via email

@dnkennedy

With huge apologies to the APAC time zone, due to the voting above, Wed 1 Jul 2020 7pm - 8pm (GMT) (Wed 7/1/2020 3:00 PM - 4:00 PM) is the time slot I requested. Can we come up with a way to extend the conversation (more or less formally?) to include APAC later in the day, perhaps seeded by the initial discussions in the above time slot? I am sorry, this was one of the hardest parts of agreeing to be the abstract submitter :-(

@jaetzel

jaetzel commented Jun 26, 2020

@Starborn, your comments ring so true for me, both the initial reaction at seeing what accessing a container actually looks like, and the amount of time involved. Even the impulse for more and more tutorials ...

@satra

satra commented Jun 26, 2020

pasting from mattermost.

a few questions to consider for discussion

  • is it appropriate to say that a container is software, and therefore all caveats of software apply to containers? does this conceptualization miss any key dimensions? or is it better to say a container is more like a computer?
  • how have containers made your research life better/worse?
  • for what applications/use-cases are containers well suited/ill suited?
  • what knowledge is required to create/use containers?
  • do containers make research more or less reproducible? or do they have no impact on reproducibility?
  • how do you find containers that do what you want to do?

it may also be useful to create and evaluate a set of polls prior to the discussion

@guiomar

guiomar commented Jun 26, 2020

Thanks for organizing this!!
Do you know if they can also be efficiently used with matlab code?

@yarikoptic

I would be a happy defense attorney for containers, hammers, chainsaws, and any other useful tool or tech!

@yarikoptic

@guiomar :

Do you know if they can also be efficiently used with matlab code?

"Efficiently" - not sure. But you can just place MATLAB inside and then expose the license from outside. I know that

@Starborn

Starborn commented Jun 28, 2020 via email

@civier

civier commented Jun 28, 2020

Hello All,
Just to clarify, the VNM (https://github.com/NeuroDesk/vnm) runs equally well on workstations and in the cloud. We are also looking into making it work on HPC, though there are several technical challenges with that. If anyone has experience with nesting Singularity containers, please get in contact with me at orenciv@gmail.com
Oren

@satra

satra commented Jun 28, 2020

@dnkennedy - was the time decided? and are there any todo's?

@complexbrains

complexbrains commented Jun 29, 2020

This event has been scheduled to be run on 01.07.2020, 19:00- 20:00 UTC

For more information, please go to https://ohbm.github.io/osr2020/schedule/emea

@dnkennedy

dnkennedy commented Jun 30, 2020

OK, for better or worse, I've tried to distill what and how we might present this OSR Containers session. We have a handful of invited folks who cover a variety of application areas (software developers, container developers, consumers, and educators). Each presenter gets 4 minutes to briefly say something about the 'good', the 'bad' or the 'good but difficult' issues with using containers in 'their real world'. We will keep a time clock, and a scoreboard: https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. I then want to open it up to the rest of the community for similar 4-minute statements about the good/bad/problem container issues in their world. This will be about collecting these issues, not solving them (we do not have enough time here to argue/solve/discuss the details of any of the issues at much length). With these proceedings, I posit that we can then, as a community (offline), attempt to develop a document along the lines @satra suggested and, by way of doing that, discuss/argue/debate/resolve (I hope) the details of the various issues. Of course, having @satra's points of discussion in mind can influence what good/bad/problem anyone brings up, but addressing those directly, I think, is too far-reaching for a 1-hour session with community involvement...

@satra

satra commented Jun 30, 2020

@dnkennedy - sounds like a plan! using the forum to listen to and aggregate different viewpoints would indeed be a great starting point.

@dnkennedy

@satra Any TODOs, you ask? Well, if folks can tolerate the design I put together, we need to promote the session and make sure that those who will be speaking know the 'ground rules' and scope. Some questions remain: should we let the speakers pre-fill their bullet points on the 'scoreboard'? I think there will only be one shared screen (mine), which can just be the 'scoreboard' with the community filling it in as we go. It MIGHT be possible for a speaker to provide me 1 slide or webpage that I could display.

@dnkennedy

The @satra post-session community white paper topics for discussion (as alluded to above):

  • is it appropriate to say that a container is software, and therefore all caveats of software apply to containers? does this conceptualization miss any key dimensions? or is it better to say a container is more like a computer?
  • how have containers made your research life better/worse?
  • for what applications/use-cases are containers well suited/ill suited?
  • what knowledge is required to create/use containers?
  • do containers make research more or less reproducible? or do they have no impact on reproducibility?
  • how do you find containers that do what you want to do?

@dnkennedy

So, do @GaelVaroquaux @PeerHerholz @gllmflndn @satra @ValHayot @stebo85 @hcp4715 @jaetzel and @pbellec consent to the plan outlined above and in https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0?

scheduled to be run on 01.07.2020, 19:00- 20:00 UTC

I have to provide email addresses to the OSR to get the Zoom links sent to y'all. I will be sharing my screen, with the 'scoreboard' of the aforementioned Google spreadsheet.

You can share with me 1 slide or one URL that I can try to show during your 4 minutes! Please stick to the 4 minutes, I will be draconian. Please try to stick to the enumeration of (any of the) good things, bad things, problem things and solution things about containers. You can pre-fill the aforementioned scoreboard/spreadsheet, if you want. These are all longer discussions for the future; in this session we are collecting... Feel free to also share links to other presentations, tools, and resources that you want, even though we cannot get into their details in this session's format.

@PeerHerholz

One important aspect I think that's missing is the scale, as outlined by @jaetzel and @emdupre in the Mattermost channel: on what project scale should/could/must containers be used?
Something along the lines of:
lab - institute/center - multisite - consortia, and single publication - multi-publication - software package with dataset intended for eventual public use, somewhere in there.
Re community feedback: a tweet/mattermost message should be sent out asap so that folks can gather and prepare information and their points.

@yarikoptic

One aspect that containers facilitate is standardization of application interfaces: BIDS-Apps, Flywheel gears, Brainlife ABC apps, Boutiques; and the even more generic Singularity SCI-F Apps (harmonization of entry points within a single container). Although such APIs can be used without containers, IMHO abstracting away the "software distribution" aspect helped people concentrate on the APIs, and now they are typically used only with containers. I think some exposure to those, and discussion of possible ways to improve interoperability (and metadata harmonization to facilitate discovery between associated platforms), would be a valuable topic.
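
To make the interface point above concrete, here is a minimal sketch of a BIDS-Apps-style command line in Python. The three positional arguments (`bids_dir`, `output_dir`, `analysis_level`) and the `--participant_label` option follow the BIDS-Apps convention; the program name `example-app` and the paths are hypothetical placeholders, and a real app would add its own options and do actual processing.

```python
import argparse


def build_parser():
    """Build a parser that mimics the common BIDS-Apps entry point."""
    parser = argparse.ArgumentParser(
        prog="example-app",  # hypothetical name, for illustration only
        description="Sketch of a BIDS-Apps-style command-line interface.",
    )
    # Positional arguments shared by BIDS-Apps
    parser.add_argument("bids_dir", help="input dataset in BIDS layout")
    parser.add_argument("output_dir", help="where results are written")
    parser.add_argument(
        "analysis_level",
        choices=["participant", "group"],
        help="run per participant or across the group",
    )
    # Optional restriction to a subset of participants
    parser.add_argument(
        "--participant_label",
        nargs="+",
        default=[],
        help="labels of the participants to process (without 'sub-')",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real dataset.
args = build_parser().parse_args(
    ["/data/bids", "/data/out", "participant", "--participant_label", "01", "02"]
)
print(args.analysis_level, args.participant_label)
```

Because every app exposes the same entry point, a platform can launch any of them in the same way, whether or not a container sits underneath.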

@yarikoptic

Questions concentrate around "research", but many participants and audience will also be "scientific software developers". So discussion of aspects related to software development where containers provided huge assistance IMHO is a worthwhile topic: use of containers for troubleshooting/debugging, continuous integration, etc.

@dnkennedy

A bunch of good points above. 1) scale. I collapsed the (larger and smaller) scale into the section on Software Consumers of various scale, so that scale dimension can still be explored... @yarikoptic 's additional lovely points may need a whole additional emergent session to really get to. But, to the extent that these are some of the pointers to some of the 'good' of containers, make sure you get them into the 'good' column of the 'scoreboard'!

@dnkennedy

Hi @gllmflndn @GaelVaroquaux @ValHayot @hcp4715. Please confirm that you're on board with this plan, and you have the zoom info. Sorry for the chaotic communication, too many channels of communication for my small internet-less brain...

@ValHayot

ValHayot commented Jul 1, 2020

Yep! works for me / got the email.

@raamana

raamana commented Jul 1, 2020

Another point to consider, which I noted in Mattermost last week and which is very important IMHO, is to prioritize numerical and algorithmic stability/reproducibility as the first resort for achieving reproducibility. When possible (it might not always be), this would return better bang for the buck IMHO than "nuking" the app with tons of layers of containers (even if one doesn't see them), which adds to the complexity of the app as well as to the difficulty of using it.

My experience with BIDS-App OPPNI and graynet/hiwenet partly contributed to the above point of view. Looking back, I feel standardizing HPC environments with the same stack would save a ton of effort and money, while moving the science forward. Just my 2¢.
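
As a small illustration of why numerical stability matters for reproducibility independently of any container, the sketch below (plain Python, with values chosen purely for illustration) shows a naive floating-point sum changing with the order of its inputs, while the standard library's `math.fsum` gives the same answer for both orderings. It is not taken from OPPNI or graynet/hiwenet.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point accumulation, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two different orders (illustrative values only).
values = [1e17, 1.0, -1e17]
reordered = [1e17, -1e17, 1.0]

# The naive result depends on the order: the 1.0 is lost when it is added
# to 1e17 first, because it is below the spacing of doubles at that magnitude.
print(naive_sum(values), naive_sum(reordered))   # 0.0 1.0

# math.fsum tracks the lost low-order bits and is order-independent here.
print(math.fsum(values), math.fsum(reordered))   # 1.0 1.0
```

An algorithm that is stable in this sense gives the same result across machines and library versions more readily, which is the kind of reproducibility the comment argues should come first.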

@gllmflndn

@dnkennedy Yes, got your email, thanks! Time is not ideal for me so if I miss the beginning of the session or lurk in the background, just skip me - or I'll try to send you a short summary of some of my thoughts on the topic.

@dnkennedy

Hi. In an above comment I put an incorrect link to the 'scorecard'. I corrected it above, but am repeating the correct 'scorecard' link here: https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. Apologies to anyone who tried that above link and didn't get let into an internal doc that was just my reconstruction of the Mattermost/Town hall container discussion thread before it moved to the containers channel.

@dnkennedy

OK, @gllmflndn, would love your input and thoughts regarding the containerization of all things SPM and beyond... Either in person, or at least in the scorecard doc (https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0).

@gllmflndn

@dnkennedy Thanks, just seeing the scorecard now - to be honest, my thoughts (and beyond) seem to be nicely covered by @satra and @GaelVaroquaux.

@dnkennedy

@gllmflndn It's ok to reiterate a little, that way we effectively "+1" some of the common topics that are important to multiple folks. If it's easier, I guess you can annotate the other points with a "+1" in some other way...

@dnkennedy

Hello again. Yesterday's session was the 'fun' part. Now the 'hard' work starts: sifting and consolidating the raw observations in order to see what came out. Any good ideas about how to proceed? Can we get volunteers to take a column each (C (Good), D (Bad), E (Problems), F (Solutions)) to distill into a bullet list of points (with a counter of how many times a similar thing came up)? [vertical integration]. Then we can follow that up with a horizontal integration...

@yarikoptic

And also make it available for comments. E.g. although I agreed with @GaelVaroquaux about "Encourage bad behavior from tool developer perspective (not worrying about portability, dependences)", I later reconsidered it: I saw many projects where trying to create a Dockerfile led developers to realize shortcomings of their build process/infrastructure and address them. So it is again a stick with two ends, and not all "black and white".

@dnkennedy changed the title from "Past, Present and Future of Open Science (Emergent session): Containers: Ticket to Valhala or Ticket to the Inferno?" to "Past, Present and Future of Open Science (Emergent session): Containers: Ticket to Valhalla or Ticket to the Inferno?" on Jul 2, 2020

@Starborn

Starborn commented Jul 2, 2020 via email

@dnkennedy

Hi @Starborn ; the raw notes from the session are at https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. The whole community is invited to help refactor these raw notes into a more coherent set of observations and then a more formal 'best practices' recommendation.

@Starborn

Starborn commented Jul 3, 2020 via email

@robertoostenveld

A thought that stuck with me following the online discussion was

If all the people who are spending time on making FreeSurfer (*) containers would contribute a bit to improving FreeSurfer's release/packaging/installation/deployment/infrastructure mechanisms, would that not be much more effective?

(*) you can insert your favourite software here instead of FreeSurfer, but it was one that was explicitly mentioned

I think that for many computer scientists it is more interesting to spend time on "your own" software/container than on someone else's open-source project. This reflects a problem with the academic incentive structure, which does not favour contributions to "someone else's" projects or software. The same problem applies not only to analysis software, but also to containers from other people.

@dnkennedy

This sentiment resonates with me. Is that to say there really should be only 1 FreeSurfer 6.0 container (again, taking a 'random' example), that it should live in some well-known standard place, and that everyone should use it unless there is a really good reason to make a new FreeSurfer 6.0 container, in which case fine, document why, and put it in a standard place?

@satra

satra commented Jul 3, 2020

even for freesurfer there are many use cases: neurodocker distributes a minimized freesurfer just for recon-all while most of these big packages have many needs. the freesurfer group themselves now release a version of freesurfer as a whole container.

yes, whole installations can (and are) be(ing) distributed by people who develop the software. but there are many use cases for container construction (e.g., fmriprep, giraffe.tools, optimize size for running/shipping).

take a look at the ga4gh registry of containers to see what can be done to help users. i think in this area they did a really good job: https://dockstore.org/
