A first look into some of the automated Biocontainers recipes #86

Closed
marcodelapierre opened this issue Feb 6, 2023 · 8 comments · Fixed by #87

@marcodelapierre
Contributor

marcodelapierre commented Feb 6, 2023

Hi @vsoch @audreystott,

so today I checked the list of aliases for the 31 Biocontainers recipes we currently support on Setonix, our new supercomputer.
The check has a dual purpose:

  • Pawsey: to make sure we're providing the needed aliases to researchers
  • SHPC hub: to provide feedback on the automated alias generation process

So, out of 31 recipes, 13 had missing aliases: 10 were missing them completely and 3 were missing only a few.
Notably, for several of the other recipes the automated process identified more useful aliases than I had manually, which is great! :-)

[EDIT] The reference, baseline 31 packages are from SHPC release 0.0.57, under registry/quay.io/biocontainers/: https://github.com/singularityhub/singularity-hpc/tree/0.0.57/registry/quay.io/biocontainers

[EDIT] I have carried out the check on the latest release of the registry, 2023-02: https://github.com/singularityhub/shpc-registry/tree/2023-02

Here are the recipes with missing aliases: bbmap, bcftools, blast, bwa, fastp, gatk, multiqc, mummer, samtools (only some), spades (only some), star, trinity (only one), velvet

Funny fact: cutadapt and spades have python aliases.. I thought those would have been filtered out.

Now, how do we tackle these in general? I am not sure how to improve the algorithm right now, but @vsoch, can we provide a CLI to add/edit aliases, and make it so that these edits are retained, both for final use and eventually to help the algorithm do better?

Thoughts?

@vsoch
Member

vsoch commented Feb 6, 2023

hey @marcodelapierre !

> The 31 packages are from SHPC release 0.0.57, under registry/quay.io/biocontainers

Do you know why you are referencing a version of shpc from before we moved the registry out of shpc and into this repo? That is the first thing to check: the newer releases of shpc should not have a registry directory, but instead use the remote registry (the repository whose issue board we are chatting on right now!)

For a spot check, here is the current samtools recipe: e0ed2ea. In that particular PR, Matthieu added back previous tags that were missed by the new parser (since they were older).

> Funny fact: cutadapt and spades have python aliases.. I thought those would have been filtered out.

The algorithm selects for the aliases under a threshold, and then the next N up to 10. So if python pops up, it just means the container didn't have other unique aliases. I actually think this is ok - someone might want to interact directly with python here. We could explicitly filter out python executables, but this seems like it's more an opinion than a reflection of what a user wants. As soon as we filter out, someone might want them back.
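As a rough illustration of the selection rule described above, here is a minimal Python sketch. It is not the actual shpc-registry code, and the details are assumptions: it assumes each executable comes with a count of how many containers it was seen in, that "under a threshold" means that count is below a cutoff, and that "the next N up to 10" means padding with at most 10 of the next-rarest executables. The function name, the default values, and the counts are all hypothetical.

```python
def select_aliases(counts, threshold=5, max_extra=10):
    """Pick alias candidates from {executable: number of containers seen in}.

    Assumed reading of the rule: keep everything rarer than `threshold`,
    then add at most `max_extra` of the next-rarest executables.
    """
    # Rarest first; ties broken by name so the result is deterministic.
    ranked = sorted(counts, key=lambda name: (counts[name], name))
    unique = [name for name in ranked if counts[name] < threshold]
    common = [name for name in ranked if counts[name] >= threshold]
    return unique + common[:max_extra]


# Hypothetical counts, for illustration only.
counts = {"cutadapt": 1, "python": 900, "pip": 850, "ls": 1000}
selected = select_aliases(counts, threshold=5, max_extra=2)
print(selected)
```

With counts like these, a container that has only one rare executable still ends up with common ones such as python in its alias list, which matches the behaviour described above.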

> Now, how do we tackle these in general? I am not sure how to improve the algorithm right now, but @vsoch, can we provide a CLI to add/edit aliases, and make it so that these edits are retained, both for final use and eventually to help the algorithm do better?

I don't think the algorithm can ever be perfect for everyone's needs. Right now it's optimized to be forgiving and err on the side of providing too many aliases. If aliases are missing (as in the list you gave), that is usually because we couldn't extract the container guts, and indeed a command line client to add them would be a great idea. How should that look? Would the user need to clone shpc-registry and then maybe run a script that lives there to add an alias? E.g.,

$ git clone https://github.com/singularityhub/shpc-registry
$ cd shpc-registry
$ python add_alias quay.io/biocontainers/samtools samtools /opt/bin/samtools

Would that be simple enough?
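To make the proposal a bit more concrete, here is a minimal Python sketch of the core step such a script might perform. It assumes the recipe has already been loaded into a dictionary with an `aliases` mapping of alias name to path inside the container; the function name and the in-memory recipe are hypothetical, and reading or writing the recipe file is deliberately left out.

```python
def add_alias(recipe, name, path):
    """Add an alias to a recipe dict, overwriting any existing entry.

    `recipe` is assumed to hold an "aliases" mapping of name -> path;
    the mapping is created if the recipe has none yet.
    """
    recipe.setdefault("aliases", {})[name] = path
    return recipe


# Hypothetical recipe with no aliases, as for a container whose
# contents could not be extracted.
recipe = {"docker": "quay.io/biocontainers/samtools"}
add_alias(recipe, "samtools", "/opt/bin/samtools")
print(recipe["aliases"])
```

Keeping the update as a plain dictionary operation means a manual addition is stored alongside the generated aliases, so it can be retained on later updates.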

@marcodelapierre
Contributor Author

Hi Vanessa, I did not explain myself clearly.

Version 0.0.57 is only the reference "blessed" list of packages.
I carried out the check itself on the latest release of the registry, 2023-02: https://github.com/singularityhub/shpc-registry/tree/2023-02

@marcodelapierre
Contributor Author

I love the CLI snippet!!

@marcodelapierre
Contributor Author

And I agree about your thoughts re: missing/extra aliases and the algorithm.
I think we might briefly mention the new CLI to add missing aliases in the paper, and that would be enough.

@vsoch
Member

vsoch commented Feb 6, 2023

> I think we might briefly mention the new CLI to add missing aliases in the paper, and that would be enough.

It won't officially be a part of shpc, just of the repo here, since the user would need to clone it as a base requirement. Can you give me a few one-off examples of aliases / containers to add and remove? I'll write it right now and PR!

@marcodelapierre
Contributor Author

quay.io/biocontainers/spades has both extra and missing aliases when comparing shpc 0.0.57 and shpc-registry 2023-02,
although for testing I reckon any recipe would do.

Once it's merged into main, I can run it through those packages myself!
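For reference, the kind of comparison described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure actually used for the check: the function name is hypothetical and the alias lists are made-up placeholders rather than the real contents of the spades recipe.

```python
def compare_aliases(reference, candidate):
    """Return (missing, extra) alias names, each sorted alphabetically."""
    ref, cand = set(reference), set(candidate)
    missing = sorted(ref - cand)  # in the reference but not the candidate
    extra = sorted(cand - ref)    # in the candidate but not the reference
    return missing, extra


# Placeholder alias lists, for illustration only.
reference = ["tool_a", "tool_b", "tool_c"]  # e.g. a manually curated list
candidate = ["tool_a", "python", "tool_b"]  # e.g. an automatically generated list
missing, extra = compare_aliases(reference, candidate)
print("missing:", missing)
print("extra:", extra)
```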

@marcodelapierre
Contributor Author

marcodelapierre commented Feb 6, 2023

so you would add add_alias and remove_alias (or maybe rm_alias) interfaces?
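One way such an interface could look is sketched below with Python's standard `argparse` module. This is an assumption for illustration, not the helper script that ended up in the repository: the program name, the subcommand layout, and the argument names are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with add_alias and remove_alias (or rm_alias) subcommands."""
    parser = argparse.ArgumentParser(prog="alias-helper")
    sub = parser.add_subparsers(dest="command", required=True)

    add = sub.add_parser("add_alias", help="add or update an alias")
    add.add_argument("container")  # e.g. quay.io/biocontainers/samtools
    add.add_argument("name")       # alias name
    add.add_argument("path")       # path of the executable in the container

    # rm_alias is accepted as a shorter spelling of remove_alias.
    rm = sub.add_parser("remove_alias", aliases=["rm_alias"], help="remove an alias")
    rm.add_argument("container")
    rm.add_argument("name")
    return parser


args = build_parser().parse_args(
    ["add_alias", "quay.io/biocontainers/samtools", "samtools", "/opt/bin/samtools"]
)
print(args.command, args.container, args.name, args.path)
```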

@vsoch
Member

vsoch commented Feb 6, 2023

I made a small helper script in this repository here, see #87.

Instructions for usage in the updated README of that PR: https://github.com/singularityhub/shpc-registry/pull/87/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R26-R51
