Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor config parameters into UEP opt and read tools #128 #130

Merged
merged 11 commits into from
Jul 18, 2023

Conversation

patrick-austin
Copy link
Contributor

NB: This change is larger in scope than #126, but builds on the decision there to add the clustering_save parameters to the read tool rather than the config tool. If this is merged, that PR can be closed without merging. If this PR is rejected/changed significantly, #126 should still be considered as an active PR.

Closes #99, closes #128

Context

Motivation was to make the stopping site calculations using UEP (the most likely calculator for users to want to use) more streamlined, by removing options which are not relevant to UEP calculations from the UI.

Our tools wrappers for the stopping site workflow were split into 4 stages/tools:

  1. pm_yaml_config
    • Required options relating to all three calculators, as was defining config required in the write and opt steps
    • Some options accounted for things which do not make sense in a Galaxy context (e.g. relative paths to files which would only be needed if downloading the output of stage 2 and running stage 3 elsewhere)
    • Ordering and purpose of the parameters was unclear - some are relevant for all subsequent stages, some only for individual stages
  2. pm_muairrs_write
    • Requires knowledge of the calculator (which comes from the yaml generated in stage 1) as the file format of the generated structure files is dependent on the calculator
  3. pm_uep_opt/pm_dftb_opt
    • CASTEP calculator did not have a galaxy wrapper, so to run CASTEP one would need to do this outside of Galaxy
    • Optimsers have different requirements (i.e. DFTB+ needs to be installed) and parameters (additional files only needed for UEP
  4. pm_muairss_read
    • At this point the process is calculator independent (as we can simply read the files provided and rely on the yaml file present in the zipped output from stage 3

Impact of change

As of this change, the core UEP muAIRSS workflow only requires:

  1. pm_uep_opt
    • This now does the work of the first three stages
    • Parameters in the UI are split into "Generation" of the muonated structures and "Optimisation" of the muon positions
    • All options for choosing the calculator, or not relevant to the UEP method, or not relevant for running in Galaxy have been removed
  2. pm_muairss_read
    • This remains calculator agnostic
    • Parameters relating to clustering have been moved here from the previous yaml tool

Having the parameters only feature at the point where they are relevant (rather than in a dedicated "config" tool) is more similar to the other PyMuonSuite tools (pm_nq, pm_asephonons), and has other benefits (and drawbacks) as described in #99, #126. Hopefully this change also makes the UI more intuitive for a user, however this is speculation at this point.

It's also worth noting that these (or other significant changes to the stopping site tools) will mean changes are required to the WIP training materials.

Alternative approaches

Other possible approaches/changes I considered that might be preferable instead of / in addition to this change:

  • Make the yaml config an output dataset of the tools and/or make it possible to provide an existing yaml parameter input instead of filling out the form.
    • Might make re-running a tool, or running outside of Galaxy easier
    • Re-running can be achieved with the "run tool again" option in your history, which uses the same config options as before
    • Would either have to perform validation on the provided file, or be prepared to add default arguments/overwrite if options are missing/incompatible (which is additional logic to write and potentially unclear to the user)
  • Keep the write and opt tools separate
    • Would be more similar to the operation on the command line, where the write and opt operations are run as two commands
    • Would make it possible to run a different optimisation on the same generated files (but generating files should be quick compared to the optimisation, so could just re-run both?)
    • Due to the fact that write needs to know the right file format for the calculator, you would either need to re-introduce a selector for the calculator (the main motivation behind this change) or have the write tool implicitly be tied to only producing a UEP compatible output. If we wanted to maintain/add support for DFTB+/CASTEP, we would then need 3 separate write tools and 3 separate opt tools, as opposed to just one write/opt tool for each calculator.
  • Changes to existing dftb_opt/yaml_config tools:
    • Did not alter these as part of this change, with intent being to avoid allocate development effort to them and focus on UEP going forward.
    • In principle could apply the same changes to the dftb_opt tool
    • Intend to preserve these in the repo as legacy/deprecated, but with the option to develop them in the future if priorities change
    • Within MuonGalaxy, we can move them to a section in the tool panel to reflect this, while maintaining them for compatability with old, existing workflows etc.

Comments and opinions on this change are welcome, as it was not fully discussed in the project meeting.

Have deployed these updated versions under the Tools Under Development section on the dev instance.

Copy link
Contributor

@leandro-liborio leandro-liborio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there,

Following our meeting of 14/03/2023, here is a summary of my comments.

  • I think we can remove the keyword "supercell" from the generation parameters
  • We need to fix the den_fmt data type, as Galaxy does not recognise this data type. The problem seems to be that, when we developed the data type, we used a non-polarised calculation and the header of the file: 'END header: data is " charge spin" in units of electrons/grid_point * number of grid_points' lacks the word 'spin' in it.
  • We should be able to have H:mu as the default symbol for the muon
  • in the PyMuonSuite AIRSS Cluster tool we need to modify the options for the 'Clustering Save Type' parameter. The option 'input' is unnecessary.
  • In the same PyMuonSuite AIRSS Cluster tool we need to add an extra option. If we choose the option 'Clustering Save Type' =structure, it would be useful to provide the capacity to create a .cell supercell file.

@patrick-austin
Copy link
Contributor Author

We need to fix the den_fmt data type, as Galaxy does not recognise this data type. The problem seems to be that, when we developed the data type, we used a non-polarised calculation and the header of the file: 'END header: data is " charge spin" in units of electrons/grid_point * number of grid_points' lacks the word 'spin' in it.

Have created galaxyproject/galaxy#15796 and galaxyproject/galaxy#15797 to address this.

@leandro-liborio
Copy link
Contributor

leandro-liborio commented Mar 15, 2023

In my previous comment about the PyMuonSuite AIRSS Cluster tool, I mentioned that we need to modify the options for the 'Clustering Save Type' parameter, so that the option 'input' is removed.

I also mentioned that, in the same PyMuonSuite AIRSS Cluster tool we need to add an extra option. If we choose the option 'Clustering Save Type' =structure, it would be useful to provide the capacity to create a .cell supercell file. Furthermore, that supercell file should look like this:

#######################################################
#CASTEP cell file: struct_clusters/uep/struct_min_cluster_1/struct_min_cluster_1.cell
#Created using the Atomic Simulation Environment (ASE)#
#######################################################

%BLOCK LATTICE_CART
3.842000 0.000000 0.000000
0.000000 3.842000 0.000000
0.000000 0.000000 9.743000
%ENDBLOCK LATTICE_CART

%BLOCK POSITIONS_ABS
Cu 2.881500 0.960500 4.871500
Cu 0.960500 2.881500 4.871500
As 2.881500 0.960500 0.000000
As 0.960500 2.881500 0.000000
As 2.881500 2.881500 3.329622
As 0.960500 0.960500 6.413378
Yb 0.960500 0.960500 2.303070 SPIN=0.01
Yb 2.881500 2.881500 7.439930 SPIN=-0.01
H:mu 0.960536 0.960446 8.422762
%ENDBLOCK POSITIONS_ABS

%BLOCK SPECIES_GAMMA
radsectesla
H:mu 851615456.5978916
%ENDBLOCK SPECIES_GAMMA

%BLOCK SPECIES_MASS
AMU
H:mu 0.1134289259
%ENDBLOCK SPECIES_MASS

FIX_ALL_CELL: True

i.e.: it must have the %BLOCK SPECIES_GAMMA, %BLOCK SPECIES_MASS and FIX_ALL_CELL options included in it.

Finally: the spin in this lines is wrong. It should be 1 and -1
Yb 0.960500 0.960500 2.303070 SPIN=0.01
Yb 2.881500 2.881500 7.439930 SPIN=-0.01

@patrick-austin
Copy link
Contributor Author

We should be able to have H:mu as the default symbol for the muon

Furthermore, that supercell file should look like this:

This is already mentioned in passing on muon-spectroscopy-computational-project/pymuon-suite#61:

When clustering_save_type and clustering_save_format are set, pm-muairss read does not write the correct mu_symbol to the cell file, or include the SPECIES_MASS block with the custom particle_mass.

It's an upstream issue, PyMuonSuite is using H:mu as the symbol for the muon it's just only using it:

  • When writing the allpos file
  • When writing cell files for optimisation with CASTEP

And it is NOT using it:

  • When writing xyz/gen files for UEP/DFTB optimisation (H is hardcoded)
  • When writing the clustered structures (I suspect it just inherits the H from the files written above - so it might show as H:mu if you clustered a CASTEP calculation but that won't generally be the case)

Any changes will need to happen in PyMuonSuite (I'd recommend any further discussion on the existing issue linked above).

@patrick-austin
Copy link
Contributor Author

For supercell, this is silently unused by the UEP method (it would normally be used when generating the structures but no structure files are created and instead just the yaml needed for the UEP optimise step).

The other methods will then be clustering supercells anyway. To output supercells for UEP we would need to extend PyMuonSuite to use the supercell parameter at this stage of the process, iff the calculator is UEP (otherwise we would expand the cell twice). Once again this is an upstream issue, tracked by muon-spectroscopy-computational-project/pymuon-suite#78

@patrick-austin
Copy link
Contributor Author

With changes to PyMuonSuite (muon-spectroscopy-computational-project/pymuon-suite/pull/79 muon-spectroscopy-computational-project/pymuon-suite/pull/81), and some minor changes in the xml to account for them, it's possible to generate the following for the As4Cu2Yb2 example:

#######################################################
#CASTEP cell file: struct_clusters/uep/struct_uep_min_cluster_1.cell
#Created using the Atomic Simulation Environment (ASE)#
#######################################################

%BLOCK LATTICE_CART
 7.684000  0.000000  0.000000
 0.000000  7.684000  0.000000
 0.000000  0.000000 19.486000
%ENDBLOCK LATTICE_CART

%BLOCK POSITIONS_ABS
Cu  2.881500  0.960500  4.871500
Cu  0.960500  2.881500  4.871500
As  2.881500  0.960500  0.000000
As  0.960500  2.881500  0.000000
As  2.881500  2.881500  3.329622
As  0.960500  0.960500  6.413378
Yb  0.960500  0.960500  2.303070 SPIN=1.0 
Yb  2.881500  2.881500  7.439930 SPIN=-1.0 
Cu  2.881500  0.960500 14.614500
Cu  0.960500  2.881500 14.614500
As  2.881500  0.960500  9.743000
As  0.960500  2.881500  9.743000
As  2.881500  2.881500 13.072622
As  0.960500  0.960500 16.156378
Yb  0.960500  0.960500 12.046070 SPIN=1.0 
Yb  2.881500  2.881500 17.182930 SPIN=-1.0 
Cu  2.881500  4.802500  4.871500
Cu  0.960500  6.723500  4.871500
As  2.881500  4.802500  0.000000
As  0.960500  6.723500  0.000000
As  2.881500  6.723500  3.329622
As  0.960500  4.802500  6.413378
Yb  0.960500  4.802500  2.303070 SPIN=1.0 
Yb  2.881500  6.723500  7.439930 SPIN=-1.0 
Cu  2.881500  4.802500 14.614500
Cu  0.960500  6.723500 14.614500
As  2.881500  4.802500  9.743000
As  0.960500  6.723500  9.743000
As  2.881500  6.723500 13.072622
As  0.960500  4.802500 16.156378
Yb  0.960500  4.802500 12.046070 SPIN=1.0 
Yb  2.881500  6.723500 17.182930 SPIN=-1.0 
Cu  6.723500  0.960500  4.871500
Cu  4.802500  2.881500  4.871500
As  6.723500  0.960500  0.000000
As  4.802500  2.881500  0.000000
As  6.723500  2.881500  3.329622
As  4.802500  0.960500  6.413378
Yb  4.802500  0.960500  2.303070 SPIN=1.0 
Yb  6.723500  2.881500  7.439930 SPIN=-1.0 
Cu  6.723500  0.960500 14.614500
Cu  4.802500  2.881500 14.614500
As  6.723500  0.960500  9.743000
As  4.802500  2.881500  9.743000
As  6.723500  2.881500 13.072622
As  4.802500  0.960500 16.156378
Yb  4.802500  0.960500 12.046070 SPIN=1.0 
Yb  6.723500  2.881500 17.182930 SPIN=-1.0 
Cu  6.723500  4.802500  4.871500
Cu  4.802500  6.723500  4.871500
As  6.723500  4.802500  0.000000
As  4.802500  6.723500  0.000000
As  6.723500  6.723500  3.329622
As  4.802500  4.802500  6.413378
Yb  4.802500  4.802500  2.303070 SPIN=1.0 
Yb  6.723500  6.723500  7.439930 SPIN=-1.0 
Cu  6.723500  4.802500 14.614500
Cu  4.802500  6.723500 14.614500
As  6.723500  4.802500  9.743000
As  4.802500  6.723500  9.743000
As  6.723500  6.723500 13.072622
As  4.802500  4.802500 16.156378
Yb  4.802500  4.802500 12.046070 SPIN=1.0 
Yb  6.723500  6.723500 17.182930 SPIN=-1.0 
H:mu  0.960289  0.959916  8.423218
%ENDBLOCK POSITIONS_ABS

%BLOCK SPECIES_GAMMA
radsectesla
H:mu 851615456.5978916
%ENDBLOCK SPECIES_GAMMA

%BLOCK SPECIES_MASS
AMU
H:mu 0.1134289259
%ENDBLOCK SPECIES_MASS

KPOINT_MP_GRID: 1 1 1
FIX_ALL_CELL: True

I believe this addresses all the above concerns, but if not it's likely any further changes will need to be in PyMuonSuite rather than here. It's also worth saying that until we release the changes to PyMuonSuite in a new version this PR will be blocked.

@patrick-austin
Copy link
Contributor Author

Have updated the tools to use the new 0.3.0 version of PyMuonSuite, which should enable the tools to produce clustered cell files with supercells and supplementary muon information, e.g. https://github.com/muon-spectroscopy-computational-project/muon-galaxy-tools/blob/patrick/128_stopping_site_changes/pm_muairss_read/test-data/Si_uep_min_cluster_1.cell

Following discussion with @leandro-liborio, have also altered how we run the UEP optimisation so that all sites are calculated in parallel rather than sequentially. On my development instance this resulted in a speedup from around 3 seconds per site to 1 second per site for a simple Si example. The exact impact will depend on how we configure our job submission in production (i.e. how many cores we allocate to UEP jobs).

Attempted to bump the MSS version as well, but ran into muon-spectroscopy-computational-project/muspinsim/issues/86 so left it at 2.2.1 until that can be fixed.

@patrick-austin patrick-austin merged commit 66e3c8e into dev Jul 18, 2023
18 checks passed
@patrick-austin patrick-austin deleted the patrick/128_stopping_site_changes branch July 18, 2023 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove support for non-UEP muAIRSS methods Consider combining pm_yaml_config and pm_muairss_write
3 participants