
methods changes for sic #46

Closed · 16 tasks done

gkiar opened this issue Dec 22, 2016 · 2 comments

gkiar commented Dec 22, 2016

Each bullet is a piece of feedback taken directly from a reviewer; the indented line beneath each is my response. Once every item has a response, I will fold the question and answer into the relevant paragraphs of the methods section.

Methods

  • Data Storage

    Point to emphasize: once data are de-identified, they can be stored in any publicly accessible way you like.

    • What kind of protocols should be considered? Only HTTP?

      Either; any protocol that makes the data publicly accessible is fine.

    • If the machines were virtualized, users might want different access points, for instance mounting the storage via NFS or CIFS.

      Sure.

    • Moreover, could another API be used, for instance mounting the storage as a volume?

      Sure.
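
The storage points above can be sketched concretely. Below is a minimal, hypothetical example (the URL and digest in any real call would come from the data provider; nothing here is a real project endpoint) of fetching publicly hosted, de-identified data over HTTP(S) with only the Python standard library, and verifying integrity with a checksum:

```python
import hashlib
import urllib.request


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def fetch_and_verify(url: str, expected_sha256: str) -> bytes:
    """Download a publicly hosted file over HTTP(S) and check its digest.

    The access pattern is identical whether the file sits behind a plain
    web server, an S3 bucket URL, or a gateway in front of an NFS/CIFS
    share -- any publicly accessible protocol works.
    """
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    if sha256_of(data) != expected_sha256:
        raise ValueError("checksum mismatch: corrupted or wrong file")
    return data
```
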

  • Cloud environments

    Point to emphasize: middleware provides flexibility for deployment across varied compute resources.

    • Do you consider using API middleware to solve the problem of different providers? There are libraries that allow machines to be run across multiple clouds.

      Middleware can definitely solve the problem of multiple providers. Within a single cloud (i.e. Amazon or Google, but not both), such middleware can be used if one chooses, but it is not necessary.
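
As a sketch of what such middleware does, the toy function below translates one generic launch call into provider-specific requests. The provider and field names are purely illustrative and not any real library's API; Apache Libcloud is one real example of this translation pattern:

```python
def launch_request(provider: str, image: str, count: int) -> dict:
    """Build a provider-specific launch request from one generic call.

    Real middleware libraries hide exactly this kind of per-provider
    translation behind a single interface; the field names below are
    hypothetical.
    """
    templates = {
        "aws":    {"service": "ec2",            "ami": image,   "instances": count},
        "google": {"service": "compute-engine", "image": image, "vm_count": count},
    }
    if provider not in templates:
        raise ValueError(f"unsupported provider: {provider}")
    return templates[provider]
```
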

  • Docker

    Point to emphasize: the cloud and Docker together enable scalable resources and consistent performance across them; prebuilt images and packages make such deployment relatively easy compared to managing a local cluster or compute resource.

    • The case study proposes running on AWS EC2. But what are the differences compared with running in a local datacenter?

      In the cloud, compute is "infinitely" scalable, machines are isolated, and hardware is consistent; local datacenters guarantee none of these.

    • Moreover, AWS already has a service dedicated to Docker containers. Could you consider using this kind of tool in your approach?

      Yup, ECS is awesome and we will update our deployment strategy to use it.

    • On the other hand, there are already tools like Tutum that may facilitate the deployment of Docker containers. Could a pre-installed machine help to deploy new containers?

      Sure; pick a machine image with Docker preinstalled or install Docker yourself, it makes no difference.
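
To illustrate the scalability point, here is a sketch of fanning one Docker image out over many subjects, one identical container per unit of work. The image name, mount path, and flags are hypothetical, not the paper's actual deployment:

```python
from typing import List


def docker_commands(image: str, subjects: List[str]) -> List[List[str]]:
    """Build one `docker run` invocation per subject.

    Because every container starts from the same image, each machine
    (EC2 instance, ECS task, or local box) provides an identical
    environment, so adding capacity is just adding machines.
    """
    return [
        ["docker", "run", "--rm",
         "-v", "/data:/data",        # hypothetical shared input/output mount
         image, "--subject", subj]
        for subj in subjects
    ]
```

Handing each command list to a different cloud machine is all the "scheduler" a small study needs; larger studies would let a service such as ECS do this dispatch.
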

  • Open standards for data

    Point to emphasize: data standards make tools interoperable; data should be anonymized or equivalently de-identified so that security is never an issue.

    • What are the standards and how are they used? This should be clarified in the manuscript.

      This question doesn't entirely make sense to me, but my best answer: standards are documented, community-accepted schemas for organizing data. When one's data comply with a standard, general tools can be applied out-of-the-box to a wider range of datasets.

    • Did you consider several levels of security? For instance, only allowing reviewers to access the container while it is available online?

      Again, I don't quite follow this sentence. Our general security policy is that data should be anonymized or de-identified, after which there is nothing to worry about.
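
As a toy illustration of how a shared schema lets tools work out-of-the-box, the parser below handles BIDS-style neuroimaging filenames. The pattern covers only a sliver of the real BIDS specification and is purely illustrative:

```python
import re

# BIDS-style filename: sub-<label>[_ses-<label>]_<suffix>.<ext>
# This toy pattern is a small subset of the actual BIDS specification.
_BIDS_RE = re.compile(
    r"sub-(?P<subject>[A-Za-z0-9]+)"
    r"(?:_ses-(?P<session>[A-Za-z0-9]+))?"
    r"_(?P<suffix>[A-Za-z0-9]+)"
    r"\.(?P<ext>nii(?:\.gz)?)$"
)


def parse_bids_name(filename: str) -> dict:
    """Extract subject/session/suffix from a BIDS-style filename.

    Any tool that understands the schema can consume any compliant
    dataset without per-dataset glue code -- the interoperability
    the standard buys.
    """
    m = _BIDS_RE.match(filename)
    if m is None:
        raise ValueError(f"not a recognized BIDS-style name: {filename}")
    return m.groupdict()
```
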

  • What are the differences between this architecture and simply publishing a README with instructions? The former is easy for the end user but complex for the developer/researcher.

    Creating a Docker container is not significantly harder for the developer/researcher: they already had to install all of the dependencies for their tool to run, and to write them down in a README for documentation. A Dockerfile simply records those same steps in a script, which a virtualization engine interprets to perform the installation for you.
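
To make the comparison concrete, here is a hypothetical README's install instructions ("install Python 3 and numpy, then run pipeline.py") translated into a Dockerfile; the base image, package names, and entrypoint are illustrative only:

```dockerfile
# Each RUN line is exactly what a README would tell the user to type;
# Docker executes them once at build time to produce a reusable image.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install numpy
COPY pipeline.py /opt/pipeline.py
ENTRYPOINT ["python3", "/opt/pipeline.py"]
```

Building once (`docker build -t pipeline .`) then replaces the manual install steps for every subsequent user.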

  • Docker vs Vagrant?

    This one belongs in the discussion rather than the methods. Vagrant is a layer on top of virtualization, and can even sit on top of Docker. The two are not really comparable in execution; they are alike only in that both document a set of installation requirements.

    • Could a virtual machine do the same? What are the differences for the proposed pipeline? These technical details should be addressed in the discussion, because the manuscript is ultimately positioned as a technical research paper.

      This one belongs in the discussion rather than the methods. Virtual machines could do the same, but carry considerably more overhead, including "hard-drive" image files that can bloat the system. The benefit of Docker is that running a pipeline ultimately means running a set of scripts and then exiting the environment; all else being equal, less overhead leaves more resources available to the pipeline.

gkiar self-assigned this Dec 22, 2016

gkiar commented Dec 22, 2016

methods section of #44

gkiar mentioned this issue Dec 22, 2016

gkiar commented Jan 4, 2017

changes addressed in recent push and tag to overleaf. :)

gkiar closed this as completed Jan 4, 2017