Add functionality to configure structure/hierarchy of modules base directory #456

Closed
jacquikeane opened this issue Nov 18, 2021 · 26 comments

Comments

@jacquikeane

I am managing the software on an HPC for a large research group and intend to do this via modules and singularity, so the shpc application looks ideal for me (thanks!). However, I would like to have my modules organised as groupname/software/version regardless of where I take the container from (I intend to take from quay.io/biocontainers where possible and not build and maintain my own registry/containers). Having consistent (and shorter) module names when typing module load will make it easier for our users.

As far as I understand the application creates a directory structure under the modules base directory according to registry/software/version which is taken from the container.yaml file so I cannot organise via groupname/software/version? It would be really useful if this data hierarchy was configurable.

@vsoch
Member

vsoch commented Nov 18, 2021

@jacquikeane one thing to try: this morning I merged a change that adds a .version file, which sets the level of the organization to the version directory (so it does not go down to the level of the module file). I don't remember if this is for lua or tcl, but it's worth a pull and a try to see if that simple tweak resolves your issue!

@jacquikeane
Author

jacquikeane commented Nov 18, 2021

Mmm, I think I may not be explaining my requirement very well! This is a separate request to the version issue. Perhaps a screenshot will help. When I use shpc at the moment to install software, running module avail gives me:

[Screenshot: privatemodules1]

Ideally I want to configure the directory hierarchy so when I do the install and run module avail I get a list like this:

[Screenshot: privatemodules2]

Basically I only want the name of the software without biocontainers in the name. But perhaps this is not a fix for shpc but with how I am using modules?

@vsoch
Member

vsoch commented Nov 18, 2021

That’s what I am talking about too! When we added the .version file it tells the module software that the level shown should be at the version directory. But I forget if that is for lmod or tcl. Anyway let’s get feedback from others about this - I agree we need a better solution.

@jacquikeane
Author

Ok, I will checkout the latest version and give it a try. Thanks!

@vsoch
Member

vsoch commented Nov 18, 2021

Oh just a quick note (sorry didn't mention this before!) the current version doesn't have a release yet, so it's just the main branch here.

@jacquikeane
Author

I am using tcl so I don't think this works for me.

@marcodelapierre
Contributor

marcodelapierre commented Nov 25, 2021

I get Jacqui's point on conventions. Let me start from a couple of aspects:

  1. ability to request a default version (ideally, this would be a switch that can be turned on and off)
  2. readability of the module list by users (the extra "module" bit reduces readability -- however note this seems the only way to disable default versions)

On the 1st one above, the recent change in #451 (no SHPC release yet) tackles it, through the setting "default_version" true/false, by adding that .version file as described in the PR (or related issue).

However, I get point 2. may be of interest, too.
One way to tackle both at once would be to maintain that new setting "default_version" above, but change the implementation.
In particular:

  • case "false": the original shpc module tree: /......./<tool>/<version>/module.{tcl,lua} ; I hope this option remains in SHPC, it's an interesting super-power in the hands of HPC support staff :-)
  • case "true": the module tree proposed by Jacqui: /......./<tool>/<version>.{tcl,lua} ; It is to be checked whether the .version file would still be needed or not.
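
To make the two cases concrete, here is a minimal shell sketch of how the modulefile path could be derived from the setting. The helper name modulefile_path and the "modules" base directory are assumptions for illustration only, not part of SHPC.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print where a modulefile would live under each layout.
# Arguments: <default_version: true|false> <base> <tool> <version> <ext: tcl|lua>
modulefile_path() {
    local default_version=$1 base=$2 tool=$3 version=$4 ext=$5
    if [ "$default_version" = "true" ]; then
        # Proposed layout: the version itself is the modulefile name.
        printf '%s/%s/%s.%s\n' "$base" "$tool" "$version" "$ext"
    else
        # Original shpc layout: one directory per version holding module.<ext>.
        printf '%s/%s/%s/module.%s\n' "$base" "$tool" "$version" "$ext"
    fi
}

modulefile_path false modules samtools 1.14 lua   # modules/samtools/1.14/module.lua
modulefile_path true  modules samtools 1.14 lua   # modules/samtools/1.14.lua
```

With several versions installed, the "true" layout puts all version files side by side in the <tool> directory, whereas the "false" layout keeps one directory per version.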

Hope this can help :)
Happy to chat further, these days I am just flattened by work...!!

I have another reflection to share, in a following comment below.

@marcodelapierre
Contributor

On shortening the module structure, eg
from quay.io/biocontainers/samtools/<version>
to samtools/<version>

I have 2 ideas

  1. the lazy one: this only works if all of your containers come from the same sub-repo (e.g. quay.io/biocontainers). You could just use module use to include that path, for instance: module use <path-to-shpc>/modules/quay.io/biocontainers, which would give short <tool>/<version> listings in the module user interface
    @jacquikeane do you think this would do it for you?
    (I am also going to use shpc for biocontainers deployments on our supercomputing centre here in Perth)

  2. otherwise, one might try to make the module tree configurable, with one caveat I can see: in this way there's the potential for conflicting modulefiles, for instance two or more samtools/<version> from different providers.
    How could the SHPC implementation handle this? A warning to users?
    eg SHPC could retain the full tree for containers, and the shortened one for modules only, and provide warnings when conflicting installation attempts are detected? (what happens at uninstall time?)
    This might be not too bad?

These comments on the tree depth could eventually merge up with the one above on modulefile naming, to become something like the highly configurable module naming scheme in Spack - see "Projections" here: https://spack.readthedocs.io/en/latest/module_file_support.html#customize-the-naming-of-modules .
However, this flexibility also increases the barrier to its use (SHPC is a simpler tool compared to Spack), so I am wondering whether this would be an overkill for SHPC.
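
As a rough illustration of the first ("lazy") idea above, the snippet below adds the biocontainers sub-tree to the module search path. The SHPC_ROOT variable and the /opt/shpc default are assumptions standing in for <path-to-shpc>; the fallback branch only exists so the sketch also runs in a shell where no module command is defined.

```shell
#!/usr/bin/env bash
# Assumed install location; replace with your own <path-to-shpc>.
shpc_root=${SHPC_ROOT:-/opt/shpc}
short_tree="$shpc_root/modules/quay.io/biocontainers"

if command -v module >/dev/null 2>&1; then
    # Environment Modules and Lmod both provide "module use".
    module use "$short_tree"
else
    # Fallback when no module command is defined in this shell:
    # prepend the directory to MODULEPATH, the variable "module use" manages.
    export MODULEPATH="$short_tree${MODULEPATH:+:$MODULEPATH}"
fi

echo "$MODULEPATH"
```

Module names are then resolved relative to that directory, so the registry and organisation prefix no longer appears in what users type.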

@marcodelapierre
Contributor

I am digging into the functionality of EnvModules and Lmod. I think I will soon be able to share some suggested implementations around default versions and module file conventions.
On leave for a few days, will aim to comment further on this next week.

@vsoch
Member

vsoch commented Nov 25, 2021

Thanks @marcodelapierre! With respect to making the paths shorter, we do have the namespace command, but that's just for management of the modules (they install to the same long paths). The issue with modifying the install namespace is what you stated: if there are two containers with the same name under different top level folders, there is a clash. I thought about allowing the user to specify a custom name, but then it would be much harder to programmatically discover those that are installed. E.g., let's say you install ghcr.io/autamus/clingo to just be clingo, and let's say there are a few different clingos. How would you reliably link back?

I think what I can do now is prepare a PR to test that makes the changes to the structure to support the version named files, and then we can discuss further! And I'm going to merge and release #455 because it does fix several real issues. I'll ping here when I have a second PR.

@vsoch
Member

vsoch commented Nov 25, 2021

okay, so I gave that a shot in #458 and it didn't work out great - a lot of detail work and debugging that I don't think I have the attention for today (kind of want to work on fun things!) but I did some tweaks to fix a bug I found with adding a singularity container, and also the CircleCI tests, so we are better off than where we started! #459. If the version file doesn't fix this issue, we at least have a good start and I'll have the bandwidth to return after the holiday at some point. Happy 🦃 day!

@marcodelapierre
Contributor

marcodelapierre commented Nov 26, 2021

thanks for your thoughts Vanessa!
update on my last message yesterday - disregard my eureka, feats do not work as I expected.

I agree it is good to start with tackling the default version thing, by trying the modulefile as a version approach.

As we're considering user interface changes in this space anyway, a couple of suggestions to make the modulefile more concise:

  • tcl modules do not need the .tcl extension, you can get rid of it
  • to reduce the character length for the original naming, could the module filename module be shortened, e.g. to just m or mod? this would improve readability of the module avail output for the case default_version=false ; or would it be confusing for people to see that mysterious m or mod?

@vsoch
Member

vsoch commented Nov 26, 2021

tcl modules do not need the .tcl extension, you can get rid of it

we can definitely do that, but what extension should they have?

to reduce the character length for the original naming, could the module filename module be shortened, e.g. to just m or mod? this would improve readability of the module avail output for the case default_version=false ; or would it be confusing for people to see that mysterious m or mod?

Hmm, so this is just my opinion, but my preference is to name things clearly and understandably, so if mod has multiple meanings (and m isn't clear) I wouldn't go in that direction. module.(lua|tcl) is longer but it's clear what it is!

@marcodelapierre
Contributor

marcodelapierre commented Nov 26, 2021

on the latter point, thanks for discussing this, I agree with your point on clarity!

on the 1st one, tcl modules do not need any extension at all.

We have both tcl and lmod in various clusters here at the moment, let me show you 2 examples.

Tcl:

md@magnus-2:modulefiles$ pwd
/group/pawsey0001/mdelapierre/software/cle60up05/modulefiles
md@magnus-2:modulefiles$ ls meep/
1.11.0
md@magnus-2:modulefiles$ module av meep

---------------------------------------------------------- /group/pawsey0001/mdelapierre/software/cle60up05/modulefiles ----------------------------------------------------------
meep/1.11.0

Lmod:

md@zeus-1:modulefiles$ pwd
/group/pawsey0001/mdelapierre/software/sles12sp3/modulefiles
md@zeus-1:modulefiles$ ls laplace_2d/
1.0.1.lua
md@zeus-1:modulefiles$ module av laplace_2d

---------------------------------------------------------- /group/pawsey0001/mdelapierre/software/sles12sp3/modulefiles ----------------------------------------------------------
   laplace_2d/1.0.1

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

See how lmod takes out the .lua extension when looking for modules; that one is needed.
On the other hand, in tcl example the modulefiles don't need an extension to be picked up by the module system; if you add .tcl, it still works, but all you get in addition is a longer output from module commands:

md@magnus-2:modulefiles$ mv meep/1.11.0 meep/1.11.0.tcl
md@magnus-2:modulefiles$ module av meep

---------------------------------------------------------- /group/pawsey0001/mdelapierre/software/cle60up05/modulefiles ----------------------------------------------------------
meep/1.11.0.tcl

I think the EnvModules docs never state it explicitly, but in the examples modulefiles typically don't have any extension (https://modules.readthedocs.io/)
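
A small sketch of the naming difference, using a throwaway directory: the Tcl modulefile carries no extension and is identified by its "#%Module" first line, while the Lua modulefile for Lmod keeps its .lua extension. The install prefixes under /opt are made up for the example.

```shell
#!/usr/bin/env bash
# Throwaway module tree; all paths and install prefixes below are made up.
tree_root=$(mktemp -d)
mkdir -p "$tree_root/meep" "$tree_root/laplace_2d"

# Tcl modulefile: no extension; the first line starts with the "#%Module" cookie.
cat > "$tree_root/meep/1.11.0" <<'EOF'
#%Module1.0
prepend-path PATH /opt/meep/1.11.0/bin
EOF

# Lua modulefile for Lmod: the .lua extension is needed and is hidden in listings.
cat > "$tree_root/laplace_2d/1.0.1.lua" <<'EOF'
prepend_path("PATH", "/opt/laplace_2d/1.0.1/bin")
EOF

ls "$tree_root/meep" "$tree_root/laplace_2d"
```

Pointing MODULEPATH at such a directory on a system with the matching module tool should reproduce listings like the ones shown above.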

@vsoch
Member

vsoch commented Nov 26, 2021

Ah, newbie mistake on my part! I’ll prepare a PR to remove the extension.

@vsoch
Member

vsoch commented Nov 26, 2021

okay here is a test! #460 Before I was using the extension to distinguish the types, and this will still work as long as we don't add another module system without an extension.

@marcodelapierre
Contributor

marcodelapierre commented Nov 26, 2021

Cool! I am off without computer for a long weekend, will try and give it a go next week.

One thing that comes to mind around the proposed change to the default_version=true implementation (i.e. version as module filename): as in this case there may be multiple version files in the same directory, it would probably be good if the script 99-shpc.sh were moved into the container directory.
Oh! and another one: I realise this proposed implementation would not work if modules and containers are in the same directory for the same reason - this is not common practice in typical software+module setups, but at the moment is the default in SHPC I think.

@vsoch
Member

vsoch commented Mar 3, 2022

@marcodelapierre @jacquikeane I think we investigated removing the extension and it didn't work - are there other ideas to brainstorm or other ways we can help?

@muffato
Contributor

muffato commented Mar 3, 2022

Hi. What about using symlinks to maintain an alternate, simplified module tree? Essentially, there would still be a module directory with the full namespaces and no risk of version clashes. In parallel there would be another directory tree structured as <software-name>/<version>, where <version> is a symlink to <full-module-dir>/docker/registry/org/<software-name>/<version>/module.tcl. And if there are some version conflicts between two Docker registries, shpc would arbitrarily keep only one through the symlink, and warn the user at installation time.

I've given it a try, and it works :) !

$ tree
.
├── bedtools2
│   └── 2.30.0 -> /software/treeoflife/shpc/modules/ghcr.io/autamus/bedtools2/2.30.0/module.tcl
├── bwa
│   ├── 0.7.17 -> /software/treeoflife/shpc/modules/ghcr.io/autamus/bwa/0.7.17/module.tcl
│   └── 0.7.17--h84994c4_4 -> /software/treeoflife/shpc/modules/quay.io/biocontainers/bwa/0.7.17--h84994c4_4/module.tcl
├── bwa-mem2
│   └── 2.2.1--h9a82719_1 -> /software/treeoflife/shpc/modules/quay.io/biocontainers/bwa-mem2/2.2.1--h9a82719_1/module.tcl
├── cooler
│   ├── 0.8.11--pyh3252c3a_0 -> /software/treeoflife/shpc/modules/quay.io/biocontainers/cooler/0.8.11--pyh3252c3a_0/module.tcl
│   └── 0.8.6--py_0 -> /software/treeoflife/shpc/modules/quay.io/biocontainers/cooler/0.8.6--py_0/module.tcl
├── fasta
│   └── 36.3.8h -> /software/treeoflife/shpc/modules/gitlab-registry.internal.sanger.ac.uk/sanger-pathogens/docker-images/fasta/36.3.8h/module.tcl
├── picard
│   └── 2.26.5 -> /software/treeoflife/shpc/modules/ghcr.io/autamus/picard/2.26.5/module.tcl
├── rust
│   └── 1.54.0 -> /software/treeoflife/shpc/modules/ghcr.io/autamus/rust/1.54.0/module.tcl
└── samtools
    └── 1.14--hb421002_0 -> /software/treeoflife/shpc/modules/quay.io/biocontainers/samtools/1.14--hb421002_0/module.tcl

$ export MODULEPATH=$PWD
$ module avail
------------------------------------------------------------------ /nfs/users/nfs_m/mm49/mymod ------------------------------------------------------------------
bedtools2/2.30.0            bwa/0.7.17              cooler/0.8.6--py_0           fasta/36.3.8h  rust/1.54.0                
bwa-mem2/2.2.1--h9a82719_1  bwa/0.7.17--h84994c4_4  cooler/0.8.11--pyh3252c3a_0  picard/2.26.5  samtools/1.14--hb421002_0  
$ module load samtools/1.14--hb421002_0
$ samtools version
samtools 1.14
Using htslib 1.14
Copyright (C) 2021 Genome Research Ltd.
(...)
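
For anyone wanting to reproduce a tree like the one above, here is a minimal sketch of how it could be generated. build_view is a hypothetical helper, not an shpc command; it assumes every modulefile is named module.tcl and sits in a <name>/<version>/ directory, and on a clash it keeps the first link it created and prints a warning. The demo tree, including the samtools/1.14 conflict, is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a flat <name>/<version> symlink tree
# from a full registry/org/<name>/<version>/module.tcl tree.
build_view() {
    local modules_root view_root=$2
    local modulefile version_dir version name link
    modules_root=$(cd "$1" && pwd)   # absolute path, so the links stay valid
    mkdir -p "$view_root"
    find "$modules_root" -type f -name module.tcl | sort |
    while IFS= read -r modulefile; do
        version_dir=$(dirname "$modulefile")
        version=$(basename "$version_dir")
        name=$(basename "$(dirname "$version_dir")")
        link="$view_root/$name/$version"
        mkdir -p "$view_root/$name"
        if [ -e "$link" ] || [ -L "$link" ]; then
            # Same <name>/<version> from another registry: keep the first one.
            echo "warning: $name/$version already linked to $(readlink "$link")" >&2
            continue
        fi
        ln -s "$modulefile" "$link"
    done
}

# Demo on a throwaway tree with one made-up conflict (samtools/1.14 twice).
demo=$(mktemp -d)
for path in ghcr.io/autamus/bwa/0.7.17 \
            quay.io/biocontainers/bwa/0.7.17--h84994c4_4 \
            ghcr.io/autamus/samtools/1.14 \
            quay.io/biocontainers/samtools/1.14; do
    mkdir -p "$demo/modules/$path"
    printf '#%%Module\n' > "$demo/modules/$path/module.tcl"
done

build_view "$demo/modules" "$demo/view"
find "$demo/view" -type l | sort
```

Because the find output is sorted, the winner of a clash is deterministic in this sketch (the registry that sorts first); a real implementation might prefer to ask the user instead.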

@vsoch
Member

vsoch commented Mar 3, 2022

oh that's a neat idea - so essentially someone might be able to do:

shpc install ghcr.io/singularityhub/github-ci --symlink

and given some symlink_home designated in their settings.yml (default is null, not set) that would create

github-ci/
   latest/
       module.tcl

And then the user would just load github-ci, but instead of:

module load ghcr.io/singularityhub/github-ci

it would be

module load github-ci

and we'd only run into trouble given an equivalent container name, e.g.,:

biotools/samtools
ghcr.io/autamus/samtools

In which case if the user had the first installed (and symlinked) samtools it would say:

shpc install ghcr.io/autamus/samtools --symlink

You've already installed the "samtools" namespace. Would you like to:
1. Stop and abort.
2. Remove previous install and replace
3. Install at a different namespace (will be prompted to enter after number):
Enter your choice: 

@marcodelapierre would this work with our newly refactored module_dir - or did you already test using that @muffato ?
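
The clash check described above could look roughly like the following. Both functions are hypothetical illustrations rather than shpc code, and symlink_home here is just a temporary directory standing in for the proposed setting.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; symlink_home mirrors the setting name proposed above.

# Print the last path component of a container URI, without any :tag.
short_name() {
    local name=${1##*/}            # drop registry and organisation
    printf '%s\n' "${name%%:*}"    # drop ":tag" if present
}

# Succeed (exit status 0) if the short name already exists under symlink_home.
name_taken() {
    local symlink_home=$1 uri=$2
    [ -e "$symlink_home/$(short_name "$uri")" ]
}

symlink_home=$(mktemp -d)
# Pretend biotools/samtools was installed and symlinked first.
mkdir -p "$symlink_home/$(short_name biotools/samtools)"

if name_taken "$symlink_home" ghcr.io/autamus/samtools; then
    echo 'You have already installed the "samtools" namespace.'
fi
```

The interactive choices (abort, replace, or install under a different namespace) would hang off the branch where name_taken succeeds.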

@muffato
Contributor

muffato commented Mar 3, 2022

🤔 Not sure I understand. What are the latest/ and module/ directories?

github-ci/
   latest/
       module/

I had in mind that there could still be multiple versions of the same software in different symlinks, cf cooler in my example. A default version can be indicated with a <software-name>/.version file.
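
As an illustration of the .version idea, the snippet below writes such a file in the format used by Environment Modules, where ModulesVersion names the default. The directory is a throwaway one and the chosen version is simply taken from the cooler example above.

```shell
#!/usr/bin/env bash
# Sketch: mark a default version with a .version file (Environment Modules format).
# The directory is temporary and the chosen default is illustrative.
view_root=$(mktemp -d)
mkdir -p "$view_root/cooler"

cat > "$view_root/cooler/.version" <<'EOF'
#%Module
set ModulesVersion "0.8.11--pyh3252c3a_0"
EOF

cat "$view_root/cooler/.version"
```

With this file in place, a bare module load cooler would be expected to pick 0.8.11--pyh3252c3a_0 rather than whichever version the module tool sorts highest.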

@vsoch
Member

vsoch commented Mar 3, 2022

Sorry bad example and a typo! module/ should have been the module file, and latest is akin to a version tag.

And we do have the .version tag, although it's just created by default to specify the level of the directory to define versions (right now it's an empty file) but that could be customized further. I haven't used it beyond that use case so let me know your thoughts for how we could change it / make it better.

And I was thinking we could have a fourth option, "install to same namespace" (with a "proceed at your own risk" sort of deal). If the user is installing a particular tag they could largely avoid the conflicts. Anyway I'm throwing together an implementation for us to test, back in a bit!

@vsoch
Member

vsoch commented Mar 3, 2022

Okay testing PR is in! #502 I'm about to head off for the evening (almost 11pm here) but I'm excited to pick up on this!

@marcodelapierre
Contributor

This sounds like a great idea indeed!
Certainly handy if SHPC can handle the creation of a more compact moduletree ... adding more thoughts in #502

@muffato
Contributor

muffato commented Jun 22, 2022

I think this issue has been addressed now that views are in

@vsoch
Member

vsoch commented Jun 22, 2022

Agree! @georgiastuart please try this out, and it’s the most basic version so open a new issue to discuss additional features you’d like!

@vsoch vsoch closed this as completed Jun 22, 2022