capture basic performance data from jobs #283

Open
levinas opened this issue Jan 28, 2015 · 6 comments

@levinas (Contributor) commented Jan 28, 2015

Minimally a four-tuple for each assembly job:

  1. Input data size
  2. Assembler/recipe
  3. Peak memory usage
  4. Execution time

This data will be used to plan for regular worker nodes dedicated to small jobs.
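
A minimal sketch of what such a record could look like, with field names and example values that are purely illustrative rather than an agreed-on schema:

```python
from collections import namedtuple

# Hypothetical record for the proposed four-tuple; the field names are
# placeholders, not an agreed-on schema.
JobPerf = namedtuple("JobPerf", [
    "input_size",   # e.g. number of bases, or (filetype, bytes)
    "recipe",       # assembler/recipe string, e.g. "-a velvet"
    "peak_mem_kb",  # peak resident set size, in kilobytes
    "wall_time_s",  # elapsed wall-clock time, in seconds
])

record = JobPerf(input_size=("fastq.bz2", 300000000),
                 recipe="-r smart",
                 peak_mem_kb=24000000,
                 wall_time_s=5400.0)
```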

@sebhtml (Contributor) commented Feb 5, 2015

@levinas For the input data size, is this measured in bytes or in number of reads?

For the time, I suppose this can be captured by the Python code, using the elapsed time between the moment the job starts and the moment it ends.

For the assembler/recipe, this is obviously already available in the Python code. Is it a string?

For memory usage: GNU time and tstime can report peak memory usage and other related metrics, but I don't know whether they capture the information for the children of the main process.
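
A minimal sketch of the elapsed-time part, assuming the assembler is launched with subprocess (the example command is illustrative):

```python
import subprocess
import time

def run_with_timing(cmd):
    """Run an assembler command, returning (elapsed_seconds, return_code).

    This captures wall-clock time only; peak memory is a separate problem.
    """
    start = time.time()
    rc = subprocess.call(cmd)
    return time.time() - start, rc

# Example call (the command itself is illustrative):
# elapsed, rc = run_with_timing(["velveth", "out", "31", "-fastq", "reads.fq"])
```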

@levinas (Contributor, Author) commented Feb 5, 2015

> For the input data size, is this measured in bytes or in number of reads?

Ideally, this should be measured in the number of bases. We have talked about running FastQC on all assembly input; maybe we should just extract this number from there. Otherwise, the file type and the raw file size could be a good proxy (e.g. (fasta, 1G) or (fastq.bz2, 300M)).

> For the time, I suppose this can be captured by the Python code, using the elapsed time between the moment the job starts and the moment it ends.

> For the assembler/recipe, this is obviously already available in the Python code. Is it a string?

Yes. We could probably just capture the method string including the "assembler/recipe/pipeline/wasp" prefix, so something like "-a velvet" or "-r smart". We could postprocess/cluster these strings later.

> For memory usage: GNU time and tstime can report peak memory usage and other related metrics, but I don't know whether they capture the information for the children of the main process.

I don't know how to do that either.
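
One possibility, assuming the assembler is launched as a waited-for child process: resource.getrusage(RUSAGE_CHILDREN) does include waited-for descendants, though its ru_maxrss reflects the single largest child rather than the sum over the whole process tree, so it can understate multi-process assemblers. A rough sketch:

```python
import resource
import subprocess

def run_and_peak_child_rss(cmd):
    """Run a command; report peak RSS among waited-for child processes.

    Caveat: for RUSAGE_CHILDREN, ru_maxrss is the largest single waited-for
    descendant, not the sum over the process tree.
    """
    rc = subprocess.call(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return rc, usage.ru_maxrss  # kilobytes on Linux

# rc, peak_kb = run_and_peak_child_rss(["velvetg", "out"])  # illustrative
```

Another caveat: RUSAGE_CHILDREN accumulates over every child the interpreter has waited on, so a long-running worker would see the maximum over all jobs so far; running each job from a fresh helper process would keep the numbers per-job.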

@cbun (Contributor) commented Feb 6, 2015

Can we grab the PIDs of the subprocesses and poll their memory usage? Not sure if this is the best way.
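
A rough sketch of that polling idea, assuming the third-party psutil package were available; note that sampling can miss short-lived spikes between polls:

```python
import time

import psutil  # third-party package; assumed available for this sketch

def poll_peak_rss(pid, interval=1.0):
    """Poll a process and its children; return the peak summed RSS in bytes."""
    peak = 0
    try:
        proc = psutil.Process(pid)
        while proc.is_running():
            total = 0
            for p in [proc] + proc.children(recursive=True):
                try:
                    total += p.memory_info().rss
                except psutil.NoSuchProcess:
                    pass  # child exited between listing and sampling
            peak = max(peak, total)
            time.sleep(interval)
    except psutil.NoSuchProcess:
        pass  # main process already gone
    return peak
```

The poller would typically run in a background thread while the main thread waits on the subprocess.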

@levinas (Contributor, Author) commented Feb 12, 2015

Can we implement something like a conditional pull for the compute nodes? If the data set is small, for example, the control node can tag it "small", and it could be consumed by a regular VM with 24 GB of memory. This is what Chris envisioned in the original architectural diagram.
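
A sketch of how the control node might do the tagging; the size threshold and queue names are purely illustrative:

```python
# Threshold and queue names are illustrative, not an agreed configuration.
SMALL_JOB_BYTES = 2 * 1024 ** 3  # e.g. treat inputs under 2 GB as "small"

def routing_queue(input_size_bytes):
    """Pick the work queue for a job based on its input size."""
    if input_size_bytes < SMALL_JOB_BYTES:
        return "jobs.small"  # consumed by regular 24 GB VMs
    return "jobs.large"      # consumed by big-memory nodes
```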

@cbun (Contributor) commented Feb 12, 2015

Yes, I'll have to double check, but the idea is that nodes can subscribe to multiple queues, and the control server would route to the correct ones.


@sebhtml (Contributor) commented Feb 12, 2015

In the callback method in consume.py, the JSON payload is received. Does the tag need to be specified in channel.basic_consume?
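
For reference, a minimal sketch of a worker subscribing to more than one queue, assuming a recent pika API (the existing consume.py may use an older signature). In this scheme the "tag" is expressed by which queue the control server publishes a job to, not by an argument to basic_consume:

```python
import pika  # assuming the existing consumer is built on pika

def start_worker(queues, callback, host="localhost"):
    """Subscribe one worker to several queues and start consuming."""
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    for q in queues:
        channel.queue_declare(queue=q, durable=True)
        channel.basic_consume(queue=q, on_message_callback=callback)
    channel.start_consuming()

# A regular 24 GB VM might subscribe only to the small-job queue
# (queue names illustrative; handle_job stands in for the existing callback):
# start_worker(["jobs.small"], handle_job)
```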
