
Steps towards a NeuroLibre pre-print publication #42

Open
jcohenadad opened this issue Jan 11, 2024 · 25 comments · Fixed by #91

@jcohenadad
Member

jcohenadad commented Jan 11, 2024

We plan to submit the manuscript as a preprint to NeuroLibre. In this issue we discuss what it takes, who is doing what, etc.

@mathieuboudreau mathieuboudreau self-assigned this Jan 17, 2024
@mathieuboudreau
Member

I told Nikola I'd take a crack at it with some time I have available this week, to get an idea of how quickly it could get done. I'll start a new repo for it, just because I need to mess with some GitHub settings to get it working with Jupyter Book.

@jcohenadad
Member Author

@mathieuboudreau we aim to submit the manuscript ~1 week from now. Do you have an ETA for the NeuroLibre submission?

@mathieuboudreau
Member

mathieuboudreau commented Feb 2, 2024

@jcohenadad A few updates and general points before answering:

  • Annoyingly, I had difficulty getting datalad set up (and working) in a Binder-generated container; I had to resort to downloading a snapshot of the data directly and hosting it as a Google Drive link. (For my other NeuroLibre submissions we ended up having to do this too, because NeuroLibre requires that data be fetched and archived using repo2data, and in addition to my troubles mentioned above, repo2data doesn't support fetching the derivatives/ folder when a datalad link is provided.)
  • Annoyingly x2, running the notebook (well, the registration step) requires more RAM than I can allocate to Docker on my 3-year-old laptop (6 GB of my system's 8 GB). MyBinder also could not handle this; it has a very low RAM allowance (2 GB). I reached out to Agah to see if I could use the NeuroLibre test server to test whether the notebook runs there, but the test server is not currently operational. The NeuroLibre production server (post-publication) has a ~7.8 GB RAM limit, which I think would make this notebook compatible (Colab has 12 GB), but I wouldn't be able to know for sure prior to publication. I asked Agah if it would be possible to raise this limit for a specific publication, and he said possibly.
  • Due to the issue above, I couldn't ensure that the notebook is reproducible in an environment outside Colab.
  • Due to the issue above x2, I resorted to running the notebook in Colab, zipping the data+output folder, and hosting it on Drive. I set this link as the data that repo2data will archive in the NeuroLibre submission (see the sketch of the requirement file after this list).
  • I've got the Jupyter Book build set up for NeuroLibre (single-page doc), and it auto-builds to GitHub Pages via GitHub Actions, so it should work fine with NeuroLibre's build.
  • Following this, I've set up a slightly prettified notebook draft that's compatible with a NeuroLibre submission.

Link: https://shimming-toolbox.github.io/rf-shimming-7t-neurolibre/
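
For reference, a minimal sketch of the repo2data requirement file this setup implies; the Drive URL and project name below are placeholders, not the actual submission values:

```json
{
  "src": "https://drive.google.com/uc?id=<FILE_ID>",
  "dst": "./data",
  "projectName": "rf-shimming-7t"
}
```

repo2data reads a data_requirement.json like this at build time, downloads src into dst, and NeuroLibre archives the result on its servers.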

As for getting it published in NeuroLibre before you click submit:

  • I think it's possible; @agahkarakuzu and I were able to get my T1 manuscript resubmission through the NeuroLibre steps quite quickly. At the minimum, a DOI is generated and available much earlier than NeuroLibre's final acceptance step.
  • The "Jupyter Book" as-is is just a copy of your notebook at the moment; I wasn't given any instructions to go beyond that. I was thinking of adding Plotly figures next, now that I've got the data "archived" and a much easier setup to work with, but for a one-week deadline maybe that could only be done for your line plots and not the images.
  • I had considered adding some sample images of the outputs of some of your steps to fill in the document a bit, but I can probably skip this.
  • If you want, and give me access to the manuscript, I could add some (not all) of its content to the notebook to give a bit more context (the Colab notebook is all the info I have about this work, so I can't really write it out myself).
  • I'd need information from the manuscript submission (e.g. a summary, which could be the abstract, and acknowledgements), and possibly a new paragraph or two (e.g. a statement of need).
  • I'd need a list of authors, their ORCIDs, and their affiliations
    • Both for the paper.md and to fill in the top of the Jupyter Book HTML page (see the paper.md sketch below)
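
For context, the paper.md for a NeuroLibre submission uses JOSS-style YAML front matter, so the author info would slot in roughly like this (all names and values below are placeholders):

```yaml
---
title: "Placeholder manuscript title"
authors:
  - name: Author One
    orcid: 0000-0000-0000-0000
    affiliation: 1
  - name: Author Two
    orcid: 0000-0000-0000-0001
    affiliation: "1, 2"
affiliations:
  - name: Institution A
    index: 1
  - name: Institution B
    index: 2
date: 2 February 2024
bibliography: paper.bib
---
```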

Hope this is all clear; let me know which directions you'd like me to go in for some of the points above. Maybe I could push @agahkarakuzu a bit more to get access to the production server during the review stage, to ensure that after publication the notebook would run on their servers without crashing during the registration step and without timing out.

@agahkarakuzu
Collaborator

agahkarakuzu commented Feb 2, 2024

@mathieuboudreau to clarify:

At the minimum, a DOI is generated and available much earlier than NeuroLibre's final acceptance step.

We can generate a DOI-formatted interactive preprint URL, but an official DOI will not be minted before the preprint is published. Also, to achieve this we need to have the repo submitted and the REVIEW started (so that we have the corresponding issue ID).

@jcohenadad
Member Author

Is there a way to avoid duplicating the repo into a new one? One scenario I anticipate: in six months we find a bug in the original repo, fix it there, and forget to fix it in the second repo.

@mathieuboudreau
Member

Is there a way to avoid duplicating the repo into a new one? One scenario I anticipate: in six months we find a bug in the original repo, fix it there, and forget to fix it in the second repo.

Yes, I think this is possible. I can either update this current repo to be NeuroLibre-compatible (and move any Colab-specific files/notebooks, though there may be a way to merge them together somehow), or simply make a specific branch and point NeuroLibre to that one. I was just using the new repo for dev so that you could mute it if you got annoyed by the frequent commits and such (but still wanted to keep an eye on this one).

@mathieuboudreau
Member

@jcohenadad I've done some updates and converted the plots to a Plotly figure (https://shimming-toolbox.github.io/rf-shimming-7t-neurolibre/). There are a few paths I can take from here, depending on your preference(s):

Path 1

Keep the overall structure of your Colab notebook (i.e. the text is a flow of technical info about the analysis), as it is now.

Path 2

Convert the NeuroLibre submission to a full "preprint" version of the manuscript, i.e. with all the text in it but with the code for the figures embedded as hidden cells in the HTML, like we did for our T1 mapping challenge manuscript and cNeuroMod manuscript.

The disadvantage of this path is that I'd have to wait until all the co-authors are done with their changes to the manuscript before adding the text/references to the NeuroLibre notebook and formatting the text (about a day of work). This means a bit more of a delay before submitting to NeuroLibre (and to MRM), since you gave the co-authors until Friday to give their feedback.
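
For what it's worth, hiding figure code in the HTML is done with standard Jupyter Book cell tags; a minimal sketch of the cell metadata (the tag names are Jupyter Book's, the source line is illustrative):

```json
{
  "cell_type": "code",
  "metadata": {"tags": ["hide-input"]},
  "source": ["fig.show()  # figure renders in the HTML; the code is collapsed"]
}
```

Using "remove-input" instead of "hide-input" strips the code from the HTML entirely rather than collapsing it behind a toggle.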

Now, for both paths, there are a few other decisions/limitations to consider:

Structure of repo notebook(s)

You mentioned that you'd like the notebook to live in this current repo, in case you want to update it in the future. Note that the NeuroLibre publication is essentially an archive of the notebook/data/environment; even if you make changes here, the HTML and the accompanying Jupyter Notebook that people view will never be updated, regardless of what you change here.

So that brings me to my question: would you want two separate notebooks in this repo (one just for NeuroLibre that would likely not be changed later on, and one mostly for a Colab link here that you would keep updating), or one notebook compatible with both? If the latter, a forewarning: to make the notebook work smoothly with both the NeuroLibre submission pipeline and the NeuroLibre BinderHub (i.e., not have it execute the actual pipeline by default, but only download the processed data and plot it, with an optional flag to run the full pipeline in BinderHub), the notebook would not be as clean as your current Colab one (most cells would have flags that change what is run depending on whether it's in plot-only mode, running in Binder, or running in Colab; see the sketch below).
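
To make that concrete, here is a minimal sketch of the kind of flag-gated logic I have in mind; the variable names and helper functions are illustrative, not the notebook's actual code:

```python
import os

# Illustrative flags; the real notebook may detect its environment differently.
RUN_PIPELINE = os.environ.get("RUN_PIPELINE", "0") == "1"   # opt-in full run (BinderHub/Colab)
IN_COLAB = os.environ.get("COLAB_RELEASE_TAG") is not None  # one common way to detect Colab

def download_processed_data():
    # Placeholder: fetch the archived outputs (e.g., the repo2data snapshot).
    print("Downloading processed data...")

def run_full_pipeline():
    # Placeholder: segmentation/registration/shimming steps (needs ~8 GB RAM).
    print("Running full processing pipeline...")

if RUN_PIPELINE:
    run_full_pipeline()         # only where resources allow (Colab, or BinderHub opt-in)
else:
    download_processed_data()   # default plot-only mode (safe on NeuroLibre/MyBinder)

# The plotting cells below then work on the same outputs in either case.
```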

Note that overall, Colab provides more resources, so it may be nice to keep a notebook compatible with it "as a backup" in case the notebook hits a limit in Binder (though I'd really like it to run completely there; @agahkarakuzu, have you got an ETA/idea of how I could test that during the submission?).

If you'd like a quick chat tomorrow to touch base on some of these questions, let me know. TL;DR: I can either do some final touches and submit to NeuroLibre close to what it is right now (likely by end of day tomorrow), or wait for the manuscript and make it look like a full preprint.

@jcohenadad
Member Author

Thank you so much for all your efforts @mathieuboudreau, the notebook as it is now looks great. I think "path 1" makes more sense because:

  • I don't see the point in duplicating the text of the published article (which will be open access)
  • Less work for you
  • No need to wait for co-authors to finish their edits

But if we go with path 1, one can also wonder: what is the point of a NeuroLibre book if it's essentially a 'more cosmetic' version of the Google Colab notebook? Some arguments:

  • results are already produced (i.e. no need to wait for the notebook to run)
    • counterargument: we can save the outputs from a notebook
      • possible counter-counterargument: can the saved outputs be interactive?
  • NeuroLibre is a publishable object

Tagging @nstikov @pbellec because this discussion is at the core of the user-who-wants-to-get-their-notebook experience.

@mathieuboudreau
Member

Thanks @jcohenadad!

Here are a few more benefits of using NeuroLibre:

  • It generates (and hosts) a Docker container image that stores a snapshot of the project plus the dependencies that were installed during submission. This image will live "forever" on the servers, as opposed to images generated by MyBinder (which are purged regularly), and there is no worry about complications from reinstalling the dependencies on Colab in 1, 2, 5, or 10 years (e.g. see your current requirements file, which has no pinned versions; see the example after this list).
  • NeuroLibre provides a technical evaluation of the submission (i.e. that the notebook runs and does what the authors say it does). They will also point out potential flaws in your environment/script setup.
  • A snapshot of the data is archived on the NeuroLibre servers (i.e. no downloads occur during a Docker session on their servers).
  • Access to the NeuroLibre BinderHub (more memory and a longer computation time limit vs. MyBinder).
  • Web hosting of the generated Jupyter Book.
  • As you mentioned, a DOI associated with the notebook that you can cite.
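
On the unpinned-requirements point: the difference is between a file that resolves to whatever is newest at install time and one that reproduces the same environment later (package names and versions below are just examples, not the repo's actual dependencies):

```text
# unpinned: "pip install -r requirements.txt" drifts as new releases come out
numpy
nibabel
plotly

# pinned: the same versions resolve years later (as long as the wheels stay available)
numpy==1.26.4
nibabel==5.2.0
plotly==5.18.0
```

Even pinning only pushes the problem down a level (transitive dependencies, system libraries), which is why archiving the built container image is the stronger guarantee.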

There might be more; @agahkarakuzu knows the backend inside and out, so he can likely comment best.

@pbellec

pbellec commented Feb 6, 2024

In brief: NeuroLibre tests the submission and archives everything needed to reproduce the work as a proper academic record, for the long run. Colab does not offer any of that.

@mathieuboudreau
Member

mathieuboudreau commented Feb 7, 2024

Debugging an issue related to a folder-permissions error during the SCT installation inside a Docker container. It worked for me every day last week, but today it started to fail. This would impact a NeuroLibre build, as I believe they use repo2docker as well.

Opened an issue on repo2docker: jupyterhub/repo2docker#1334

And DM'd @joshuacwnewton (for now; I'll post to the forum later if he deems it relevant to SCT specifically) with the following log from inside a Docker session with a blank repo2docker image:


mathieuboudreau@b363f05df303:~$ mkdir content
mathieuboudreau@b363f05df303:~$ cd content
mathieuboudreau@b363f05df303:~/content$ cd ..
mathieuboudreau@b363f05df303:~$ git clone https://github.com/spinalcordtoolbox/spinalcordtoolbox ~/content/sct

cd ~/content/sct
yes | ./install_sct
Cloning into '/home/mathieuboudreau/content/sct'...
remote: Enumerating objects: 60992, done.
remote: Counting objects: 100% (2211/2211), done.
remote: Compressing objects: 100% (1317/1317), done.
remote: Total 60992 (delta 1462), reused 1474 (delta 880), pack-reused 58781
Receiving objects: 100% (60992/60992), 119.15 MiB | 16.17 MiB/s, done.
Resolving deltas: 100% (35117/35117), done.


*******************************
* Welcome to SCT installation *
*******************************



Checking OS type and version...

Linux b363f05df303 6.6.12-linuxkit #1 SMP Fri Jan 19 08:53:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Checking requirements...


OK!


SCT version ......... 6.2.dev0
Installation type ... in-place
Operating system .... linux (unknown)
Shell config ........ /home/mathieuboudreau/.bashrc

SCT will be installed here: [/home/mathieuboudreau/content/sct]



Do you agree? [y]es/[n]o: 
Skipping copy of source files (source and destination folders are the same)


Do you want to add the sct_* scripts to your PATH environment? [y]es/[n]o: 
Downloading Miniconda...


wget -nv -O /tmp/tmp.j49W08kc15/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

2024-02-07 19:15:28 URL:https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh [141613749/141613749] -> "/tmp/tmp.j49W08kc15/miniconda.sh" [1]

Installing Miniconda...


bash /tmp/tmp.j49W08kc15/miniconda.sh -p /home/mathieuboudreau/content/sct/python -b -f

PREFIX=/home/mathieuboudreau/content/sct/python
Unpacking payload ...

Installing base environment...


Downloading and Extracting Packages: ...working... done

Downloading and Extracting Packages: ...working... done
Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.

Creating conda environment...


CondaError: Error encountered while attempting to create cache directory.
  Directory: /home/mathieuboudreau/.cache/conda/notices
  Exception: [Errno 13] Permission denied: '/home/mathieuboudreau/.cache/conda'


Installation failed!

Please find the file "/home/mathieuboudreau/content/sct/install_sct_log.txt",
then upload it as a .txt attachment in a new topic on SCT's forum:
--> http://forum.spinalcordmri.org/c/sct

mathieuboudreau@b363f05df303:~/content/sct$ 

@pbellec

pbellec commented Feb 7, 2024

in brief: neurolibre tests the submission and archives everything needed to reproduce the work as proper academic records, for the long run. Collab does not offer any of that.

Trying to refine the argument. TL;DR: Google Colab notebooks break relatively fast, and NeuroLibre preprints do not.

Google Colab notebooks decay very fast. Check this example from a tutorial with cNeuroMod data, which I believe was set up two years ago: https://colab.research.google.com/drive/10aKI0NcSqWbwxOgvBrcv6xk-LXhKNM2u?usp=sharing#scrollTo=iL3KlwjxgOOq
It won't run because it uses data hosted on Google Drive, and the data is no longer available.

As Mathieu already pointed out, this decay will also happen with dependencies. I would be very surprised if a Google Colab environment still runs after 3-5 years (or much less than that, really). A Google search for "broken dependencies google collab" gives over 100k hits (sample).
Fun fact: Ubuntu does retire its package channels. Pinning versions in a Docker build file does not mean that environment will still build in a few years. The only way to make work reproducible is to save the binary environment along with the code.

I would not be surprised if NeuroLibre preprints still run in a couple of decades, provided we manage to maintain the platform. I say this because we archive binary container images, and given the very large number of container binaries out there, it is almost certain that archivists will develop solutions for the long-term support of these images.

@mathieuboudreau
Member

Debugging an issue related to a folder-permissions error during the SCT installation inside a Docker container. It worked for me every day last week, but today it started to fail. This would impact a NeuroLibre build, as I believe they use repo2docker as well.

The issue is now fixed; see this comment: jupyterhub/repo2docker#1334 (comment)

@jcohenadad
Member Author

@mathieuboudreau just checking: what is the timeline for the NeuroLibre submission? Thanks

@mathieuboudreau
Member

@jcohenadad I made a PR last week (#91) and tagged/pinged you for review. I'm waiting for these changes to be merged before submitting to NeuroLibre, as you requested in this thread that the submission be hosted in this repo rather than the separate repo I had made. So the ETA is: as soon as you approve those changes/content and it gets merged into master, I'll submit (which takes just minutes).

@mathieuboudreau
Member

Currently checking that the script/setup reproduces on Colab as a last-minute sanity check (I already found and fixed one minor bug), and that the repo2docker setup also works. Will submit to NeuroLibre ASAP after this check.

@mathieuboudreau
Member

Submitted to NeuroLibre.

[Screenshot: NeuroLibre submission dashboard, 2024-02-14]

There is no public link yet; this is from my dashboard when I'm logged in. Once @agahkarakuzu or @pbellec accepts it for review, a GitHub issue will be opened on the NeuroLibre GitHub and I'll post the link here.

@jcohenadad
Member Author

Amazing! Is it OK to submit to MRM at this point?

@mathieuboudreau
Member

If we can wait a short time so that they can trigger the start of the process, then I believe it would generate a DOI that we could add to the manuscript (even though the NeuroLibre review process hasn't been completed).

@agahkarakuzu or @pbellec, could you do this ASAP?

@agahkarakuzu
Collaborator

agahkarakuzu commented Feb 15, 2024

@mathieuboudreau the DOI link (after publication) will be:

https://doi.org/10.55458/neurolibre.00025

Reproducible preprint will be served at:

https://preprint.neurolibre.org/10.55458/neurolibre.00025

I've seen your notes on the submission form regarding the two options (2-hour run vs. quick run); we'll test and see how it goes.

@mathieuboudreau if you give me write access to the repo, I can quickly push fixes needed, or I can send PRs, whichever is more convenient for you.

@mathieuboudreau
Member

@mathieuboudreau if you give me write access to the repo, I can quickly push fixes needed, or I can send PRs, whichever is more convenient for you.

Done.

@mathieuboudreau
Member

Amazing! Is it OK to submit to MRM at this point?

@jcohenadad the submission has passed the pre-review stage (neurolibre/neurolibre-reviews#24) and is currently under review (neurolibre/neurolibre-reviews#25).

I've updated the Data Availability Statement with the details Agah shared (DOI and link); they aren't active yet, but I've indicated in the manuscript that the preprint has been submitted and is under review, along with those links.

@mathieuboudreau
Member

@jcohenadad now that you've decided to keep the old Matplotlib-based figure in the manuscript instead of the Plotly one, would you like me to re-integrate into the notebook the Matplotlib code that generates that image? The lines of code that were removed are in 0d05160, since your comment in #91 (review) suggested switching to the Plotly figure for the manuscript (and thus the Matplotlib code wasn't needed anywhere anymore).

@jcohenadad
Member Author

I think there has been some misunderstanding: 0d05160#commitcomment-138731916
