Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New module: Seqera Platform CLI #2151

Merged
merged 33 commits into from Nov 2, 2023
Merged

New module: Seqera Platform CLI #2151

merged 33 commits into from Nov 2, 2023

Conversation

vladsavelyev
Copy link
Member

@vladsavelyev vladsavelyev commented Oct 25, 2023

Parses runs_*.tar.gz dumps containing logs and stats from a Seqera Platform run. To be triggered in the https://github.com/seqeralabs/nf-aggregate workflow, or can parse the output of tw runs dump ... --output=runs_SmUkr43Nul49G.tar.gz directly.

Note that binary files by default are ignored by MultiQC, so you need to set ignore_images: false in the MultiQC config to make it parse the input tar-gz file. This can be done via the CLI:

multiqc -m seqera_cli test_data/data/modules/seqera_cli -f -v --cl-config "ignore_images: false"
  • Think what general stats cols to hide?
  • Figure out how to better present start date/end date
  • Test with many runs
  • Add a bar plot with a category per status of workflows/processs?
Screenshot 2023-10-25 at 22 45 10

@multiqc-bot multiqc-bot changed the title New module: seqera_cli New module: Seqera Platform CLI Oct 25, 2023
@vladsavelyev vladsavelyev marked this pull request as draft October 25, 2023 17:03
@ewels
Copy link
Member

ewels commented Oct 25, 2023

Given that the tar step is from the workflow and not from the Seqera CLI itself, I think we should also cater for uncompressed files. Then can handle the custom stuff in the pipeline but it stays as general use as possible.

@drpatelh
Copy link
Contributor

The Seqera CLI natively dumps tar files. I was going to uncompress them in the same process that dumps the run info but if we don't have to then that would be a bonus. Supporting both would be great. The JSON files all have standard names so will be easy to scrape if we stage each run in its individual directory / archive.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 25, 2023

🚀 Deployed on https://mqc-pr-2151--multiqc.netlify.app

@vladsavelyev
Copy link
Member Author

Added support for both compressed and uncompressed dumps. Also added a barplot and parsing of the platform version.

@ewels
Copy link
Member

ewels commented Oct 27, 2023

Figure out how to better present start date/end date

Maybe show a duration as a column, keep start + end dates but hide by default.

@ewels
Copy link
Member

ewels commented Oct 27, 2023

👉🏻 Should specify the task status order + colours in the bar plot to match Tower:

  • Pending: #8f4199
  • Submitted: #e68642
  • Running: #4256e7
  • Cached: #939598
  • Succeeded: #28ae61
  • Failed: #e7363e

@vladsavelyev
Copy link
Member Author

vladsavelyev commented Oct 30, 2023

Updates:

  • Using a shorter repositroy names to display,
  • Using the Tower color code for process status, as Phil proposed,
  • Hiding start/end and showing duration by default.
  • For time stamps, dates, and gigabytes, we want a human-readable representation, but still sorting abilities. I added support for the format key to be a lambda function, so we can use arbitrary code there instead of a format string. I also modified the plotting code to add a val= attribute into dt table cells containing the original sortable floating value (i.e. for time stamps, that would be the number of sections), and modified the JavaScript table sorting code to take that val into account before using innerText.
Screenshot 2023-11-01 at 16 25 49

@vladsavelyev vladsavelyev marked this pull request as ready for review November 1, 2023 10:02
@vladsavelyev vladsavelyev added the awaits-review Awaiting final review and merge. label Nov 1, 2023
Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise LGTM!

docs/core/development/modules.md Outdated Show resolved Hide resolved
docs/core/development/modules.md Outdated Show resolved Hide resolved
docs/core/development/modules.md Outdated Show resolved Hide resolved
@ewels
Copy link
Member

ewels commented Nov 1, 2023

LGTM to me to merge, waiting on review from @drpatelh before hitting the button.

@ewels
Copy link
Member

ewels commented Nov 1, 2023

Suggestion: Move CPU time and Cost columns up, alongside pipeline duration. These 3 columns are likely typically the most interesting.

Copy link
Contributor

@robsyme robsyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know that cpuEfficiency and memoryEfficiency will always be percentages, so should format as such.

multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
vladsavelyev and others added 2 commits November 1, 2023 13:56
Co-authored-by: Robert Syme <rob.syme@gmail.com>
Co-authored-by: Robert Syme <rob.syme@gmail.com>
@drpatelh
Copy link
Contributor

drpatelh commented Nov 2, 2023

Great! Can we put the Nextflow entry before the Platform one please?

@vladsavelyev
Copy link
Member Author

Done! (updated the comment above)

@drpatelh
Copy link
Contributor

drpatelh commented Nov 2, 2023

Beautiful! Last main thing to add would be the Plots I outlined in #2151 (comment)

Copied again below:

Additional Plots

It would be great to at least have bar plots for the more pertinent metrics so they can be dumped directly into presentations if required rather than having to generate them via another tool:

  • CPU Time
  • Wall Time
  • Estimated cost ($)

I have created issues on the Seqera CLI for things that would be good to add in the future. For now, I can try to find a workaround for using run names in the plots via a mapping file.

@vladsavelyev
Copy link
Member Author

It would be great to at least have bar plots for the more pertinent metrics so they can be dumped directly into presentations if required rather than having to generate them via another tool:

I'm not sure I understand though what would be the categories in such bar plots. In the dump, there is only one value per run, which fits a table column - which translates into a simple bar plot by the means of the background color bar:

Screenshot 2023-11-02 at 15 44 40

But that doesn't fit a bar plot, where you want multiple non-overlapping categories that can sum up.

If we had per-process numbers, that might work - we could use the N heaviest processes as categories, plus "the remaining" as the last tail category.

@drpatelh
Copy link
Contributor

drpatelh commented Nov 2, 2023

We would have a separate plot for each metric which should be relatively simple? Definitely don't want to put those all in the same plot.

@vladsavelyev
Copy link
Member Author

vladsavelyev commented Nov 2, 2023

Such a plot for a single-value metric would just duplicate the corresponding table column.

Is this something that you have in mind?

Screenshot 2023-11-02 at 15 55 37

We do have this plot in the table already:

Screenshot 2023-11-02 at 15 57 20

@drpatelh
Copy link
Contributor

drpatelh commented Nov 2, 2023

Yes, exactly! One for each of the metrics I listed. The problem with having that in the table is you can't export it as a plot. Also, it will be nice to bring a larger focus on the core metrics if we have individual plots.

@vladsavelyev
Copy link
Member Author

Alright, let's add those. It would be nice to add per-process data into the run dump though, to make the bars more informative.

multiqc_report.html.zip

Screenshot 2023-11-02 at 18 26 24

Copy link
Contributor

@drpatelh drpatelh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome @vladsavelyev ! 🤩 Thank you! Some minor comments for consistency but otherwise LGTM! Be great to add in some other things in the future but this is definitely a good start.

multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
multiqc/modules/seqera_cli/seqera_cli.py Outdated Show resolved Hide resolved
vladsavelyev and others added 10 commits November 2, 2023 19:07
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
@vladsavelyev vladsavelyev merged commit 17bc033 into master Nov 2, 2023
4 checks passed
@ewels ewels deleted the seqera-cli branch November 3, 2023 13:55
@ewels
Copy link
Member

ewels commented Nov 3, 2023

Looks great! 👏🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaits-review Awaiting final review and merge. module: new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants