Develop a Cluster Computing Framework for Dynamical Modeling #77

0u812 · 2017-01-15T22:49:27Z

Introduction

Data growth will be a major factor in systems biology in the near future. However, most academic software in the field is not written with explosive growth in mind. This is unfortunate, as related fields have made great gains in scalability simply by leveraging the tools of big data, as evidenced by the success of startups like H2O.ai.

Our group at the University of Washington is developing a Python-based framework for biological modeling. The core of this framework is a high-speed ODE/stochastic biochemical network simulator, libRoadRunner, which pushes the limits of single-threaded computing. This summer, we would like to mentor a student in developing a cluster computing framework for running simulations at scale.

Goal

The overall goal is to scale up common types of tasks in dynamical modeling. These tasks usually involve 1) loading a model (usually SBML), 2) making some perturbation to the model (changing parameter values), 3) simulating the modified model, and 4) collecting some metrics from the results. In order to make this project tractable for a single summer, I suggest breaking it down into smaller tasks which can be used as milestones. For the initial phase of the project, we should ideally focus on feasibility and figuring out how to implement cluster computing in a uniform way. For example,

- Using a Python-compatible cluster framework (e.g. Spark), interface with our pre-existing Python framework for modeling
- Using the cluster framework, generate a large number of random networks in parallel (using e.g. the Erdős-Rényi method)
- Simulate these networks and compute some aggregate statistics as a proof of concept
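As a rough illustration of the proof-of-concept steps above, here is a minimal sketch of the map/aggregate structure. It makes several assumptions that are not part of the proposal: a local thread pool from the Python standard library stands in for the cluster (in PySpark the same shape would typically be `sc.parallelize(seeds).map(network_task).collect()`), the "network" is a plain undirected G(n, p) edge list rather than a reaction network, and the metric (mean node degree) is only a placeholder for a real simulation result. All function names are hypothetical.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def erdos_renyi_edges(n_nodes, p_edge, seed):
    """Return the edge list of one G(n, p) random graph (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [(i, j)
            for i in range(n_nodes)
            for j in range(i + 1, n_nodes)
            if rng.random() < p_edge]


def mean_degree(n_nodes, edges):
    """Average node degree; each undirected edge contributes to two nodes."""
    return 2.0 * len(edges) / n_nodes


def network_task(seed, n_nodes=50, p_edge=0.1):
    """One independent unit of work: build a network and return a metric.

    Placeholder: a real task would simulate the network instead.
    """
    edges = erdos_renyi_edges(n_nodes, p_edge, seed)
    return mean_degree(n_nodes, edges)


def run_ensemble(seeds, max_workers=4):
    """Map the task over all seeds, then aggregate the results on the driver."""
    # Assumption: the thread pool only mimics a cluster's map step.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        degrees = list(pool.map(network_task, seeds))
    return {"n": len(degrees),
            "mean": statistics.mean(degrees),
            "stdev": statistics.stdev(degrees)}


if __name__ == "__main__":
    print(run_ensemble(range(100)))
```

Because every task depends only on its seed, the tasks can run in any order on any worker, which is what makes this kind of workload a natural fit for a cluster framework.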

From here, the next step would be to construct a more general API that can handle the common analysis types in dynamical modeling. The common types of analyses that can be parallelized include parameter scans, parameter fitting, sensitivity analysis and parameter identifiability. If we can implement at least some of these during the summer, that would be great.
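To make the parameter-scan case concrete, here is a minimal sketch of the load, perturb, simulate, collect-metric cycle applied to one scanned parameter. Everything in it is an assumption for illustration: the model is a toy one-reaction system (S1 -> S2 with rate k1*S1) held in a dict rather than an SBML model, a forward-Euler loop stands in for the real simulator, and a local thread pool stands in for the cluster. The names `BASE_MODEL`, `simulate`, `scan_point`, and `parameter_scan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a loaded model: S1 -> S2 with rate k1 * S1.
BASE_MODEL = {"k1": 0.1, "S1": 10.0, "S2": 0.0}


def simulate(model, t_end=50.0, n_steps=5000):
    """Forward-Euler integration of the toy model; returns the final state."""
    dt = t_end / n_steps
    s1, s2 = model["S1"], model["S2"]
    for _ in range(n_steps):
        flux = model["k1"] * s1
        s1 -= flux * dt
        s2 += flux * dt
    return {"S1": s1, "S2": s2}


def scan_point(k1):
    """Perturb one parameter, simulate, and collect a single metric."""
    model = dict(BASE_MODEL, k1=k1)            # perturbed copy; base model untouched
    final = simulate(model)
    fraction_converted = final["S2"] / BASE_MODEL["S1"]
    return k1, fraction_converted


def parameter_scan(values, max_workers=4):
    """Evaluate every scan point independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scan_point, values))


if __name__ == "__main__":
    for k1, fraction in parameter_scan([0.01, 0.05, 0.1, 0.5, 1.0]):
        print(f"k1={k1:<5} fraction converted={fraction:.4f}")
```

Parameter fitting and sensitivity analysis could plausibly reuse the same pattern, since each candidate parameter set is again an independent simulation; only the aggregation step would differ.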

Skills Required

Familiarity with cluster computing, such as Spark or Hadoop (though Spark is preferred due to its lower overhead), would be ideal. Experience with Python and Linux would also be helpful. Above all, we want students who are self-driven, eager to learn, and excited about research. This is a largely unexplored application of cluster computing, and would likely lead to a peer-reviewed paper if successful.

Possible Mentors

Main Contact

Kyle Medley

References

Somogyi, E. T., Bouteiller, J. M., Glazier, J. A., König, M., Medley, J. K., Swat, M. H., & Sauro, H. M. (2015). libRoadRunner: a high performance SBML simulation and analysis library. Bioinformatics, btv363.

Sauro, H. M., Choi, K., Medley, J. K., Cannistra, C., König, M., Smith, L., & Stocking, K. (2016). Tellurium: A Python Based Modeling and Reproducibility Platform for Systems Biology. bioRxiv, 054601.

0u812 · 2017-01-17T05:20:48Z

Added the Java label because most modern cluster frameworks are Java- or Scala-based, so knowing one of these languages beforehand would be helpful.

matthiaskoenig · 2017-01-18T10:55:30Z

+1 I second this proposal.

Just for clarification: in a first implemented version there will be no synchronization between the different distributed models/simulations, i.e. the simulation tasks are completely independent of each other and every single simulation is its own task?

108krohan · 2017-02-15T17:55:01Z

Hello Everyone,

My Masters degree (pursuing) in Biological Sciences should be of interest to a rapidly growing organisation like yours. My sound knowledge of Python, Java, C, C++, SQL matches the project description. Primary OS: Linux Ubuntu 16.04 LTS.

I'll be honest I'm new to high-performance computing. And you can expect nothing but eagerness for the research paper. You can expect S.O.L.I.D. programming principles followed rigorously because that would help the organisation in the long run.

I do have 3 questions in mind:

1. Do I have to mail alex.pico [at] gladstone.ucsf.edu or the mentors, in order to get in touch?
2. What steps should I take in order to be a strong candidate?
3. Do I start from the NRNB GSoC Google Doc template?

Thank you for reading.
Hoping for a fast and positive response.

0u812 · 2017-02-15T18:55:07Z

Hi Rohan,

Thanks for your interest. I will try to answer each of your questions:

1. At this stage, you're basically getting to know the mentors and bouncing ideas off of us, so posting here is fine.
2. I think having a solid proposal is the most important thing. You can use the Google Doc that you linked and start filling it out (use File -> Make a copy). Once you have the content basically filled in you can share it with us for feedback. Feel free to reach out to us, especially for the parts you may not be familiar with such as parameter sweeps and parameter fitting. Google has some guidelines for selecting students. In addition to those, I would also pay specific attention to:

   - Does the student's plan have enough detail and does it lead to a useful feature of the software (such as the ability to perform parameter sweeps and parameter fitting on a cluster)?
   - Is the proposed work realistic for GSoC?
   - Does the student have the skills necessary to carry out the proposed work?

   I think having all of these things would lead to a high chance of the project being successful, which is good for both us and the student.

3. That is correct. You can make a copy for your own editing (use File -> Make a copy).

Regards,
Kyle

108krohan · 2017-02-16T13:03:26Z

Thanks for such a prompt response! As instructed, I've mailed a preliminary Document, awaiting suggestions.

Meanwhile, I've set up Tellurium and the tutorials from the Tellurium page are quite helpful. Could you please confirm if that's the right way to proceed?

This page has lots of relevant links, I just wanted to know which are the most important so I can dig more deeply for the project.

Thank you for taking the time to read and promptly reply :)

0u812 · 2017-02-20T05:39:21Z

Hi Rohan,

The tutorials you linked to should be helpful. You can also find more helpful tutorials at http://tellurium.readthedocs.io/en/stable/index.html, especially the Models & Model Building section. I can't provide feedback on the document you sent because the project proposal isn't filled in yet, but I assume you are trying to learn how to use tellurium first. Can you tell me how far along you are in the process? For example, if I gave you a description of a reaction network, could you encode it and simulate it in tellurium?

108krohan · 2017-02-20T15:57:59Z

Thanks for the tutorial link (http://tellurium.readthedocs.io/en/stable/index.html)

Sorry, I've been busy with college tests (4 tomorrow). I'm trying to slip in an hour or two for Tellurium tutorials each day though. And I'll let you know when I'm through with encoding and simulation.

Regarding Reaction Network, does it entail Antimony usage?

You are busy, please don't trouble to reply if that's correct.

108krohan · 2017-02-26T11:59:16Z

Finished executing examples from documentation.
Where should one ideally go from here?

Okay, while going through the documentation I noticed certain things:

Bioservices needs to be installed separately.
Tellurium build installer for Linux? Initial setup via conda mentioned here wasn't enough. Had trouble with SED-ML and Combine examples because they need pygraphviz, and sbml2matlab.
te.plotArray() used where r.plot() produced same results. Any reasons for this?

matthiaskoenig · 2017-02-27T07:40:52Z

Hi Rohan, if you have any feedback on the tutorials please let me know. I will update these within the next few days. If you found any errors, unclear information or missing information please let me know so I can update the respective pages. The best Matthias

108krohan · 2017-02-27T08:10:42Z

Hi Matthias,

The tutorial is pretty accurate. But I'll try to go over the documentation again today and list out whichever errors, unclear information or missing information I find here.

One more thing: although pygraphviz (plus sbml2matlab, required for SED-ML and Combine) started working after some head-scratching, I wanted to confirm whether more dependencies/libraries than just these conda installs are required, because I needed extra ones (for example pandas and bioservices).
conda install -c sys-bio tellurium
conda install jinja2 ipython
conda install -c SBMLTeam python-libsbml
More specifically, is there no way to enable IDE plugins and SBOL functionality via the conda install method? Or are they optional?

Regards,
Rohan

0u812 · 2017-03-01T08:21:39Z

Hi Rohan, how is the application coming? What questions do you have? Do you think you need more info on modeling/Tellurium/cluster computing?

108krohan · 2017-03-03T05:19:58Z

Hi Kyle, really sorry for the late response. I figured it would be better to learn Spark before posting here or updating the application (I'll share the updated doc latest by day after tomorrow morning, EST for feedback).

Our overall goal is to scale up model 1) loading 2) perturbation 3) simulation and 4) metric generation through HPC via Spark, yes? You've already done a fantastic job of breaking our project into tasks. What kind of subtasks are you expecting? Can you meanwhile suggest names of other materials you might want me familiarised with?

Regards,
Rohan

0u812 · 2017-03-03T05:30:02Z

No worries 😄
I think you've got the right idea for scaling up. Now that you've finished the tellurium tutorials, I can give you more specific examples of the types of analysis we can parallelize. It might help to talk face-to-face. Are you free next week or during the weekend to Skype?

108krohan · 2017-03-04T09:38:12Z

Yes! Are you free between 7:30PM and 11:59PM Monday night EST? (Schedule EST/IST here)

matthiaskoenig · 2017-03-04T14:56:36Z

Let me know what time. If I am free I would like to join.

ShaikAsifullah · 2017-03-06T19:38:26Z

Hi, I have been a little late. Can others join this meeting if it is not scheduled yet? Or if it is already done, may I get updates please? I am also planning to contribute to it.

0u812 · 2017-03-06T20:33:17Z

Hi all, for the meeting it looks like the best time for all three time zones (PST/IST/CET) is 8 am PST / 9:30 pm IST / 5 pm CET. Would it work to Skype Wednesday at that time for about an hour? If that doesn't work, I can set up a survey.

hsauro · 2017-03-06T20:51:45Z

8 pst is Ok with me. Herbert

108krohan · 2017-03-06T20:55:17Z

Hi Kyle,
Yes! Awesome :D 👍 You'll receive an updated doc within the next 3-4 hours. Your feedback would be incredibly valuable. I noticed you had mentioned parameter sweeps and fitting in an earlier comment, and I've been trying to learn as much as I can. I'd like to be prepared when we Skype. Please tell me anything you'd like me to be completely thorough with.

Regards,
Rohan

108krohan · 2017-03-06T20:55:23Z

Hi Matthias @matthiaskoenig,

Been adding feedback comments to example codes from documentation in a private repository because 'Edit on Github' link on the tutorials page doesn't seem to be working for me. (returns a 404 error)
You'll also find a list of all the libraries I needed as an entire noob while getting the examples to work properly on my Linux Ubuntu 16.04. This could be great for new contributors not only to our project but to tellurium as a whole! Thanks Kyle for forking and improving the sbml2matlab at sys-bio.
One more thing, you might notice the cmake-config links for sbml2matlab are broken in the README.md. I finally got it to work, and you can find the screenshot on the repo so you don't have to do it again.

Hope it helps!

Regards,
Rohan

0u812 · 2017-03-07T19:37:51Z

It looks like we can have our first Skype meeting tomorrow at 8 am PST / 9:30 pm IST / 5 pm CET. Anyone who can make it is welcome. This meeting should be pretty informal. I just mainly want to get a sense of where the students are at and try to fill in any gaps in your knowledge of Tellurium.

My Skype user id is jkylemedley. If @108krohan and @ShaikAsifullah could please send me a contact request on Skype that would be great.

hsauro · 2017-03-08T04:09:38Z

This is my contact name hsauro, if I attend I'll just observe. Herbert

hsauro · 2017-03-08T04:10:23Z

PS What's the procedure for attending the skype call, do you just call us all? Herbert

matthiaskoenig · 2017-03-08T10:11:54Z

My Skype name is `konigmatt`.

khanspers · 2017-05-08T20:13:22Z

GSoC 2017 selected project

hsauro added Python High-performance computing labels Jan 15, 2017

0u812 added Difficulty: Medium Java labels Jan 16, 2017

0u812 self-assigned this Jan 18, 2017

khanspers closed this as completed May 8, 2017
