A first look into some of the automated Biocontainers recipes #86
hey @marcodelapierre !
Do you know why you are referencing a version of shpc from before we removed the registry from shpc and moved it to this repo? That is the first thing to check: the newer releases of shpc should not have a registry directory, but instead use the remote registry (the repository whose issue board we are chatting on right now!). For a spot check, here is the current samtools recipe: e0ed2ea. In that particular PR, Matthieu added back previous tags that were missed by the new parser (since they were older).
The algorithm selects the aliases under a threshold, and then the next N up to 10. So if python pops up, it just means the container didn't have other unique aliases. I actually think this is ok - someone might want to interact directly with python here. We could explicitly filter out python executables, but that seems more like an opinion than a reflection of what a user wants. As soon as we filter them out, someone might want them back.
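To make the selection rule above concrete, here is a minimal sketch in Python. It is not the registry's actual code: the function name `select_aliases`, the idea that the threshold is a count of how many containers share an executable name, and the reading that the total is capped at 10 are all assumptions made for illustration.

```python
def select_aliases(candidates, global_counts, threshold=10, max_total=10):
    """Pick aliases for one container (hypothetical sketch, not shpc's code).

    candidates:    executable names found in the container
    global_counts: ASSUMED metric, how many containers ship each name
    threshold:     names seen in fewer containers than this count as "unique"
    max_total:     pad the selection with the next-rarest names up to this size
    """
    # Rarest names first; ties broken alphabetically for stable output.
    ranked = sorted(candidates, key=lambda name: (global_counts.get(name, 0), name))
    unique = [n for n in ranked if global_counts.get(n, 0) < threshold]
    common = [n for n in ranked if global_counts.get(n, 0) >= threshold]
    # Fill the remaining slots with the next N names, if any slots are left.
    fill = common[: max(0, max_total - len(unique))]
    return unique + fill


# Illustrative counts only, not measured data.
counts = {"samtools": 1, "perl": 4000, "python": 5000}
selected = select_aliases(["python", "samtools", "perl"], counts)
print(selected)
```

Under these assumptions, a container with few unique executables still gets common names such as python as fill, which matches the behaviour described above.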
I don't think the algorithm can ever be perfect for everyone's needs - right now it is optimized to be more forgiving and to err on the side of providing too many. If we are missing aliases (the list you gave), that is usually because we couldn't extract the container guts, and indeed a command line client to add them would be a great idea. How should that look? Would the user need to clone shpc-registry and then maybe run a script that lives there locally to add an alias? E.g.,
$ git clone https://github.com/singularityhub/shpc-registry
$ cd shpc-registry
$ python add_alias quay.io/biocontainers/samtools samtools /opt/bin/samtools
Would that be simple enough? |
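As a rough idea of what such a script might do internally, here is a minimal Python sketch. It is not the helper that was later added to the repository; it assumes the recipe has already been loaded into a dictionary with an `aliases` mapping of name to path, and the function name `add_alias`, the `overwrite` option, and the bgzip path are hypothetical.

```python
def add_alias(recipe, name, path, overwrite=False):
    """Return a copy of `recipe` with `name -> path` added under "aliases".

    Hypothetical sketch: `recipe` is assumed to be a dict loaded from a
    container recipe file, with aliases stored as a name -> path mapping.
    """
    aliases = dict(recipe.get("aliases") or {})
    # Refuse to silently repoint an existing alias unless asked to.
    if name in aliases and aliases[name] != path and not overwrite:
        raise ValueError(f"alias {name!r} already points to {aliases[name]!r}")
    aliases[name] = path
    updated = dict(recipe)
    # Keep aliases sorted by name so diffs in the recipe stay small.
    updated["aliases"] = dict(sorted(aliases.items()))
    return updated


# Illustrative recipe content; the bgzip path is made up for the example.
recipe = {
    "docker": "quay.io/biocontainers/samtools",
    "aliases": {"bgzip": "/usr/local/bin/bgzip"},
}
updated = add_alias(recipe, "samtools", "/opt/bin/samtools")
print(updated["aliases"])
```

Returning a copy instead of editing in place keeps the original recipe available for a before/after diff prior to writing anything back to disk.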
Hi Vanessa, I did not explain myself clearly. So version |
I love the CLI snippet!! |
And I agree about your thoughts re: missing/extra aliases and the algorithm. |
It won't officially be a part of shpc, just of the repo here, since the user would need to clone it as a base requirement. Can you give me a few one-off examples of aliases / containers to add and remove? I'll write it right now and PR! |
when it's back into the |
so you would add |
I made a small helper script in this repository here, see #87. Instructions for usage in the updated README of that PR: https://github.com/singularityhub/shpc-registry/pull/87/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R26-R51 |
Hi @vsoch @audreystott,
so today I have checked the list of aliases for the 31 Biocontainers recipes we currently support on Setonix, our new supercomputer.
The goal is to check for aliases, with dual purpose:
So, out of 31 recipes, only 13 had missing aliases: 10 were missing them completely, and 3 were missing only a few of them.
Notably, out of the other recipes, several identified more useful aliases than I had manually, which is great! :-)
[EDIT] The reference, baseline 31 packages are from SHPC release 0.0.57, under registry/quay.io/biocontainers/: https://github.com/singularityhub/singularity-hpc/tree/0.0.57/registry/quay.io/biocontainers
[EDIT] I have carried out the check on the latest release of the registry, 2023-02: https://github.com/singularityhub/shpc-registry/tree/2023-02
Here are the recipes with missing aliases: bbmap, bcftools, blast, bwa, fastp, gatk, multiqc, mummer, samtools (only some), spades (only some), star, trinity (only one), velvet
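A check like the one described here can be sketched in a few lines of Python. This is only an illustration of the comparison, not the procedure actually used: the function name `classify_missing` is hypothetical, and the alias lists below are made-up sample data rather than the real recipe contents.

```python
def classify_missing(baseline, current):
    """Compare expected aliases (baseline) with those found in the registry.

    Both arguments map a recipe name to a list of alias names. Returns a
    dict of recipe -> (status, sorted list of missing aliases).
    """
    report = {}
    for recipe, expected in baseline.items():
        expected = set(expected)
        found = set(current.get(recipe, ()))
        missing = expected - found
        if not missing:
            status = "complete"
        elif missing == expected:
            status = "missing all"
        else:
            status = "missing some"
        report[recipe] = (status, sorted(missing))
    return report


# Made-up sample data for illustration only.
baseline = {"samtools": ["samtools", "bgzip", "tabix"], "bwa": ["bwa"], "fastqc": ["fastqc"]}
current = {"samtools": ["samtools"], "bwa": [], "fastqc": ["fastqc", "python"]}
report = classify_missing(baseline, current)
for recipe, (status, missing) in report.items():
    print(recipe, status, missing)
```

Extra aliases in the registry (python in the sample data) are deliberately ignored, since the comparison only asks whether the manually curated ones survived.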
Fun fact: cutadapt and spades have python aliases. I thought those would have been filtered out.
Now, how do we tackle these in general? I am not sure how to improve the algorithm right now, but @vsoch, can we provide a CLI to add/edit aliases, and make it so that these are retained, both for final use and eventually also to help the algorithm perform better?
Thoughts?