@jan-janssen So far I have only parallelized at the analysis level, but since the viscosity analysis is too slow, we also need to parallelize inside it. In this PR I special-case the viscosity so that I can still submit SLURM jobs for its internal steps, but it is not very elegant. What is the suggested approach with executorlib for cases like this?
@ltalirz There are two constraints from my side:
Unless there is a strong preference for very small jobs from the job-manager perspective, or a drastic change in resource requirements, I would recommend packaging one user request in one SLURM job and then, within this SLURM job, using a nested executor to use the available resources efficiently.
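To make the recommended pattern concrete, here is a minimal sketch of "one SLURM job, nested executor inside". It uses the stdlib `concurrent.futures.ThreadPoolExecutor` as a stand-in for the nested executor that would run inside the allocation; the function names (`analysis_step`, `run_workflow`) are hypothetical and this is not executorlib's actual API.

```python
# Sketch: everything below runs inside a single SLURM allocation.
# ThreadPoolExecutor stands in for a nested executor (an assumption,
# not the executorlib API).
from concurrent.futures import ThreadPoolExecutor, as_completed

def analysis_step(x):
    # hypothetical placeholder for one internal step of an analysis
    return x * x

def run_workflow(inputs):
    # fan out over the resources reserved by the single SLURM job
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(analysis_step, x) for x in inputs]
        # collect results as steps complete, then sort for determinism
        return sorted(f.result() for f in as_completed(futures))
```

The key design point is that the scheduler sees one job; the internal fan-out and fan-in happen entirely inside the allocation.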
Thanks for the detailed feedback, Jan! My use case here is: at the top level, it makes sense to launch one workflow per analysis, such as CTE, viscosity, or elastic constants. Within each workflow, however, it can be necessary to parallelize again. This is the case for the viscosity workflow: we are talking about many minutes to hours of runtime for each parallel job and, importantly, the parallel jobs may have significantly different runtimes.

Now, I could submit one "viscosity" SLURM job and then parallelize inside that job, but that would (a) require me to know in advance how many parallel jobs the viscosity workflow will want to spawn, so that I know how many computational resources to allocate (doable, but not elegant), and (b) be inefficient, because the entire reserved allocation is blocked until the slowest of the parallel jobs has completed. Alternatively, I could submit one SLURM job per parallel job (which is what I do in this PR), but then I lose the "workflow wrapper" around the viscosity analysis (no caching, and I have to use a different codepath than non-parallel workflows that run directly inside executors; see the code in this PR).

Ideally, I would like to do something like this: submit a 1-core job for the outer viscosity workflow that essentially just does the steering and data merging (in AiiDA, this comes for free; it is the AiiDA daemon process), which then dynamically launches as many parallel computation jobs as needed.

Re 1.: We do currently have some jobs that run for just a few seconds, where we could probably gain a bit by not running them through SLURM, but they are few and the few seconds we lose here are not relevant in the big picture (on the upside, we get the caching), see below.
Re 2.: In this particular case, a cache of the intermediate jobs would be nice to have, but not mandatory.
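The ideal setup described above (a lightweight steering process that only determines the number of parallel jobs at run time and merges results as they finish) can be sketched as follows. This is an illustration under assumptions: `viscosity_shear_run` and `steer_viscosity_workflow` are hypothetical names, and the stdlib executor stands in for per-job SLURM submission.

```python
# Sketch: a 1-core "steering" process that decides the number of parallel
# jobs at run time, submits each one independently, and merges results as
# they complete, so fast jobs are not blocked by slow ones.
from concurrent.futures import ThreadPoolExecutor, as_completed

def viscosity_shear_run(rate):
    # hypothetical placeholder for one long-running MD job at a shear rate
    return {"rate": rate, "viscosity": 1.0 / rate}

def steer_viscosity_workflow(user_input):
    # the number of parallel jobs is only known here, at run time
    rates = [1.0 * (i + 1) for i in range(user_input["n_rates"])]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(viscosity_shear_run, r) for r in rates]
        # merge in completion order; each job's resources free up
        # independently of the others
        results = [f.result() for f in as_completed(futures)]
    return sorted(results, key=lambda r: r["rate"])
```

With real SLURM submission in place of the thread pool, each inner job would get exactly the resources it needs and release them on completion, which is the point of this pattern.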
I am a bit confused: based on the user input, do you know how many tasks you are going to create in the workflow, or is the number of tasks only determined at run time? If the number of tasks is only determined at run time, then this requires hierarchical scheduling, because job schedulers like SLURM do not support this.

In terms of the comparison to AiiDA, you can create a local flux server and use it for short-running tasks. In this case you would submit the whole workflow to this flux scheduler and, inside this flux job, submit the individual tasks to SLURM. The question for me is whether you want to keep these flux jobs running while the SLURM jobs are waiting, or whether the task is stopped after the individual steps have been submitted to SLURM.

In general, to me this comes down to the difference between caching and data management. Executorlib does caching, so when you submit the same task we can reload the result; still, we do not claim to provide a research data management solution. So I would split the workflow into individual steps, but a hierarchy of caching is not something we provide at the moment.
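To illustrate the kind of task-level caching being discussed (reloading a stored result when the same task is submitted again), here is a minimal sketch. The key derivation, file layout, and the `cached_submit` helper are assumptions for illustration, not executorlib's actual cache implementation.

```python
# Sketch of task-level result caching: hash the function name and
# arguments, store the result on disk, and reload it on resubmission.
# This is an illustrative stand-in, not executorlib's cache format.
import hashlib
import json
import os
import pickle

CACHE_DIR = "task_cache"  # hypothetical cache location

def cached_submit(fn, *args, **kwargs):
    # derive a cache key from the task definition (an assumption:
    # real systems must also account for the function body and env)
    key = hashlib.sha256(
        json.dumps([fn.__name__, args, kwargs],
                   sort_keys=True, default=str).encode()
    ).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".pkl")
    if os.path.exists(path):          # cache hit: reload the stored result
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = fn(*args, **kwargs)      # cache miss: actually run the task
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result
```

This is exactly the non-hierarchical, per-task caching described above: each step can be cached individually, but nothing here caches a whole nested workflow as one unit.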
