Reviewer feedback #44

Closed
37 tasks done
gkiar opened this issue Dec 15, 2016 · 0 comments
@gkiar (Collaborator) commented Dec 15, 2016

Overview

Organized below is the feedback we received on the first submission of the SIC manuscript to GigaScience. I have broken the suggestions out into bulleted lists, where each bullet corresponds to an action I can take or an item I can address. A quote block is text from the reviewer explaining the nearby bulleted items.

My plan is to address each of these in the manuscript and, as I do, add a comment of my own describing how I addressed it, since these responses must be uploaded with the resubmission.

My goal is to be done addressing all contents of this issue by January 15th, 2017, one month from today.


Web Service

Figures

  • Fig. 1: potentially mirror the challenges described in the text (#39)

  • Fig. 4: fix the axis labels and numbers (#37)

  • While the authors have cost estimates spread throughout the paper, I believe further discussion is necessary.

    • Thus, perhaps the authors should include, for each step of the pipeline in Fig. 2, how much time it took, how much it cost, etc. (maybe as a table)?

It would help readers to understand, for a typically sized study, how much it costs to upload data, store it for X days/months, download it, and run the computation. In our experience, the costliest items to store on the cloud were the non-linear registration warps, and we had to maintain special scripts to keep our data store clean.
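A per-step cost table like the one requested could be generated from a simple linear model. The sketch below is a hypothetical illustration, not SIC code: `CloudRates` and `study_cost` are made-up names, every rate is a placeholder input to be filled in from the provider's current price list, and it assumes all costs scale linearly (no tiered pricing or free allowances). Derived outputs such as registration warps are tracked separately because the reviewer reports they dominated storage cost.

```python
from dataclasses import dataclass


@dataclass
class CloudRates:
    """Hypothetical unit prices; take real values from the provider's price list."""
    upload_per_gb: float         # $ per GB transferred into the cloud
    storage_per_gb_month: float  # $ per GB stored per month
    download_per_gb: float       # $ per GB transferred out of the cloud
    compute_per_hour: float      # $ per instance-hour


def study_cost(rates: CloudRates, raw_gb: float, derived_gb: float,
               months_stored: float, n_scans: int, hours_per_scan: float) -> dict:
    """Estimate the cost of one study, broken down by activity.

    raw_gb: input data uploaded to the cloud.
    derived_gb: pipeline outputs kept in storage and downloaded afterwards
        (e.g. non-linear registration warps).
    Assumes every cost scales linearly with volume, time, or instance-hours.
    """
    costs = {
        "upload": raw_gb * rates.upload_per_gb,
        "storage": (raw_gb + derived_gb) * months_stored * rates.storage_per_gb_month,
        "download": derived_gb * rates.download_per_gb,
        "compute": n_scans * hours_per_scan * rates.compute_per_hour,
    }
    # Sum the four activity costs computed above.
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    # Placeholder rates only; these are not real provider prices.
    rates = CloudRates(upload_per_gb=0.0, storage_per_gb_month=0.5,
                       download_per_gb=0.1, compute_per_hour=2.0)
    print(study_cost(rates, raw_gb=10, derived_gb=30, months_stored=2,
                     n_scans=40, hours_per_scan=1.5))
```

Running the same function once per pipeline step (with that step's timing and output size) would give one row of the requested table.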

Minor formatting

  • First line of the Discussion: there is a doubled "the".

Lit review

In its current form, it suffers from a few main issues (some of which could be remedied):

  • Lack of a fair literature review. The way the authors present it, it appears they are the first to have attempted this. For example, what is the relationship between what the authors present and the following:
    • G. B. Frisoni, A. Redolfi, D. Manset, M.-E. Rousseau, A. Toga, and A. C. Evans, "Virtual imaging laboratories for marker discovery in neurodegenerative diseases," Nature Reviews Neurology, vol. 7, no. 8, pp. 429-438, Jul. 2011.
    • I. Dinov, K. Lozev, P. Petrosyan, Z. Liu, P. Eggert, J. Pierce, A. Zamanyan, S. Chakrapani, J. Van Horn, D. S. Parker, R. Magsipoc, K. Leung, B. Gutman, R. Woods, and A. Toga, "Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline," PLoS ONE, vol. 5, no. 9, p. e13070, Sep. 2010.
    • neuGRID
    • outGRID
    • the effort on NeuroDebian
    • NeuroDebian on AWS (EC2): https://www.nitrc.org/forum/forum.php?forum_id=3664
    • M. Minervini, M. Damiano, V. Tucci, A. Bifone, A. Gozzi, and S. A. Tsaftaris, "Mouse Neuroimaging Phenotyping in the Cloud," 3rd International Conference on Image Processing Theory, Tools and Applications, Special Session on High Performance Computing in Computer Vision Applications (HPC-CVA), Istanbul, Turkey, Oct. 15-18, 2012.
    • M. Minervini, C. Rusu, M. Damiano, V. Tucci, A. Bifone, A. Gozzi, S.A. Tsaftaris, "Large-Scale Analysis of Neuroimaging Data on Commercial Clouds with Content-Aware Resource Allocation Strategies," International Journal of High Performance Computing Applications, Jan 17, 2014.

I personally find the above methods relevant, at least in terms of motivation (albeit some may have used different methods). Obviously, the last two were authored by my team a few years back, on the basis of a different Python-based backbone (PiCloud) that is now defunct. The second of these (the last in the list) went even further: it optimized resource allocation (the type of Amazon instance) with a machine-learning method that predicted the resource needs of non-linear registration in an atlas-based segmentation pipeline.
I am really fond of the authors' approach, as it adopts newer technologies (containers, etc.) that can perhaps make such systems future-proof. I should note that some of these technologies are also used by other systems in different applications. For example, there is a US-based initiative called CyVerse (iPlant) which the authors could explore as well.

Feasibility

  • Lack of discussion on how the current approach can be extended to use other tools such as FreeSurfer, ANTs, etc.

As I am sure you are aware, the same neuroimaging tools don't work for everyone. While I agree with the idea of having standardized pipelines, the ability to evolve said pipelines (as forks) can help the system evolve and even be maintained. Could you please expand on this?

Unfortunately, at least as I understand the code, running the same pipeline on the NKI1 dataset (40 scans) appears to be a linear process (i.e., one scan after another). This is reinforced by the authors' comment in the Discussion, related to Kubernetes, that it "would help enable SIC to scale well when working with big-data or running many parallel jobs." If this is true, the SIC framework loses one of the greatest advantages of cloud computing: scalability.

  • The authors should comment on this, particularly as addressing it would make the work a proper fit for the GigaScience journal.
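Because each scan is processed independently, the per-scan work is embarrassingly parallel. The sketch below is hypothetical and not taken from the SIC code: `process_scan`, `run_serial`, and `run_parallel` are illustrative names, and a thread pool on one machine stands in for what, on the cloud, would be one container or instance per scan (for example, scheduled by Kubernetes).

```python
from concurrent.futures import ThreadPoolExecutor


def process_scan(scan_id: str) -> str:
    """Hypothetical stand-in for running the full pipeline on one scan.

    In a real deployment this would launch an external job (e.g. one
    container or one cloud instance per scan) and wait for it to finish.
    """
    return f"{scan_id}: done"


def run_serial(scan_ids: list[str]) -> list[str]:
    """One scan after another: total time grows linearly with the scan count."""
    return [process_scan(s) for s in scan_ids]


def run_parallel(scan_ids: list[str], max_workers: int = 8) -> list[str]:
    """Independent scans run concurrently; pool.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_scan, scan_ids))


if __name__ == "__main__":
    scans = [f"sub-{i:02d}" for i in range(1, 41)]  # 40 scans, as in NKI1
    results = run_parallel(scans)
    print(len(results), results[0])
```

Threads are sufficient in this sketch because the Python process only waits on external jobs; the actual speed-up on the cloud would come from how many instances run those jobs at once.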

In my view, the main difficulty in the proposed pipeline is its inherent complexity. For instance, while the authors propose the use of Docker containers to make setup scripts and data loading easy, in a real scenario there are two main criticisms: 1) creating the Docker container is complex for research groups; for instance, the data scientists working on the MRI problem may not have that knowledge; 2) running the containers still requires some technical background.

  • Thus, a methodology and guidelines for approaching this problem should be considered, and the strengths and weaknesses should be presented in the Discussion.

Methods

  • Ported to an issue for handling Methods changes for SIC (#46)

  • Data Storage

    • What kinds of protocols should be considered? Only HTTP?
    • If the machines are virtualized, users might want different access points, for instance mounting storage via NFS or CIFS.
    • Moreover, could another API be used, for instance to mount the storage as a volume?
  • Cloud environments

    • Did you consider using API middleware to address the differences between providers? There are libraries that allow running machines on multiple clouds.
  • Docker

    • Docker is proposed to run on AWS EC2 in the case study. How would this differ from running it in a local datacenter?
    • Moreover, AWS already has a service dedicated to Docker containers. Could you consider using this kind of tool in your approach?
    • On the other hand, tools like Tutum already exist that may facilitate the deployment of Docker containers. Could a pre-installed machine help to deploy new containers?
  • Open standards for data

    • What are the standards, and how are they used? This should be clarified in the manuscript.
  • Did you consider several levels of security? For instance, allowing only the reviewers to access the container while it is available online?

  • How does this architecture differ from simply publishing a README with instructions? It is easy for the end user but complex for the developer/researcher.

  • Docker vs Vagrant?

    • Could a virtual machine do the same? What are the differences for the proposed pipeline? These technical details should be addressed in the Discussion, because in the end the manuscript is positioned as a technical research paper.

gkiar self-assigned this Dec 15, 2016
This was referenced Dec 22, 2016
gkiar closed this as completed Jan 6, 2017