
Worker Job Class #497

Merged
merged 13 commits into master from worker Nov 10, 2021
Conversation

jan-janssen
Member

Follow up to #474

Implementing the worker as a separate job class using multiprocessing.

@jan-janssen
Member Author

jan-janssen commented Nov 5, 2021

Example:

```python
from pyiron import Project
pr_worker = Project("worker")
pr_calc = Project("calc")

# Setup Worker
job_worker = pr_worker.create.job.WorkerJob("runner")
job_worker.server.run_mode.non_modal = True
job_worker.server.cores = 4
job_worker.run()

# Submit Calculation
structure = pr_calc.create.structure.ase.bulk("Al", cubic=True)
structure.set_repeat([10, 10, 10])
for i in range(10):
    job = pr_calc.create.job.Lammps("lmp_" + str(i))
    job.structure = structure
    job.server.run_mode.worker = True
    job.master_id = job_worker.job_id
    job.run()

# Monitoring
pr_calc.job_table()
```

```python
class WorkerJob(PythonTemplateJob):
    def __init__(self, project, job_name):
        super(WorkerJob, self).__init__(project, job_name)
        self.input['project'] = None
```
Member
PythonTemplateJob facilitates dot notation for input now
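
That is, the assignment in the snippet above could presumably be shortened to:

```python
self.input.project = None  # dot notation instead of self.input['project'] = None
```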

@jan-janssen
Member Author

A mybinder example for this class is available at https://github.com/jan-janssen/pyiron-worker

@jan-janssen
Member Author

Waiting for conda-forge/staged-recipes#16822

jan-janssen marked this pull request as draft November 6, 2021 20:27
```python
    for pp, p, job_id in path_lst
]
active_job_ids += [j[2] for j in job_lst]
_ = [pool.put(j) for j in job_lst]
```
Contributor
What's the reason not to use a plain multiprocessing.Pool & map_async here? I'm a bit wary because I see that aproc uses inspect.getsource to transfer the function to the worker instead of pickle or dill.

Member Author

The list of jobs grows while the worker is already executing it, so a simple map does not work. Behind the scenes, the current implementation combines a multiprocessing pool with a queue. In addition, I also tried dill and cloudpickle, but both failed; this might be related to the use of decorators.
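
For illustration only, a minimal sketch of the pool-plus-queue pattern described here; the helper `run_job` and the worker count are hypothetical stand-ins, not the actual aproc internals:

```python
import multiprocessing as mp

def run_job(job_id):
    # hypothetical stand-in for executing a single pyiron job
    print(f"running job {job_id}")

def worker_loop(queue):
    # each worker process pulls job ids from the shared queue;
    # the queue can keep growing while the workers are busy
    while True:
        job_id = queue.get()
        if job_id is None:  # sentinel: shut this worker down
            break
        run_job(job_id)

if __name__ == "__main__":
    queue = mp.Queue()
    workers = [mp.Process(target=worker_loop, args=(queue,)) for _ in range(4)]
    for w in workers:
        w.start()
    # the producer keeps submitting new job ids while the workers run
    for job_id in range(10):
        queue.put(job_id)
    for _ in workers:
        queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
```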

Contributor

I don't understand why the growing list is a problem.

Inside the while loop the worker checks the job table for submitted jobs that are assigned to it via the master_id and puts them in a list. It could then map_async the list and mark the job ids as 'running' or 'queued' or whatever. In the next iteration of the while loop there may be new submitted jobs, which can be handled the same way, or there may be submitted jobs that were already found in the last iteration; but since we marked those in the last iteration, we can exclude them from the map in this iteration.
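
A rough sketch of that loop, with hypothetical helpers `run_job` and `fetch_submitted_job_ids` standing in for the actual job execution and job-table query:

```python
import multiprocessing as mp
import time

def run_job(job_id):
    # hypothetical stand-in for executing one pyiron job
    print(f"running job {job_id}")

def fetch_submitted_job_ids(iteration):
    # hypothetical stand-in for querying the job table by master_id;
    # returns a growing list to mimic jobs being submitted over time
    return list(range(2 * iteration))

if __name__ == "__main__":
    seen = set()
    results = []
    with mp.Pool(4) as pool:
        for iteration in range(5):  # would be `while True` in a real worker
            # dispatch only the jobs not already handled in earlier iterations
            new_ids = [j for j in fetch_submitted_job_ids(iteration) if j not in seen]
            seen.update(new_ids)
            if new_ids:
                results.append(pool.map_async(run_job, new_ids))
            time.sleep(1)  # poll interval before re-checking the job table
        for r in results:
            r.wait()
```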

Member Author

The limitation of multiple map_async calls is that they cannot be executed concurrently, so the second call to map_async is only executed once the first one has finished. The async only means you regain control of the main process; it does not mean that the workers are asynchronous. That is why I created the aproc interface.

Contributor
pmrv commented Nov 8, 2021

From my testing this is not the case. I can call map_async multiple times just fine as long as there are available workers in the pool. Once the pool is fully occupied you naturally have to wait, but I assume aproc is the same in this regard.

Here's my test script:

```python
import multiprocessing as mp

def pow2(x):
    import time
    print(x)
    time.sleep(10)
    return x**2

with mp.Pool(4) as p:
    r1 = p.map_async(pow2, range(2))
    r2 = p.map_async(pow2, range(4, 8))

    r1.wait()
    print(r1.get())
    r2.wait()
    print(r2.get())
```

with output:

```
0
1
4
5
<nothing happens here for 10s because we're blocked on r1.wait()>
6
7
[0, 1]
[16, 25, 36, 49]
```

That shows that the first two jobs from the second map call run straight away, and the last two run once the jobs from the first map are finished. This sounds like exactly what we need.

Member Author

I thought I had tried that before, but it is working now, so I reverted to map_async. Thanks again for pointing me in this direction.

jan-janssen marked this pull request as ready for review November 9, 2021 05:12
@jan-janssen
Member Author

@pmrv and @liamhuber Any other feedback? Otherwise I would like to merge this soon to be included in the next pyiron_base release.

@pmrv
Contributor

pmrv commented Nov 9, 2021

I would like to have a small example in the class or module docstring on how to use the worker. The snippet above would already do it. Otherwise it looks good to me.

@liamhuber
Member

> @pmrv and @liamhuber Any other feedback? Otherwise I would like to merge this soon to be included in the next pyiron_base release.

I would like to see the class appear in the unit tests. If I understand correctly, as long as the test inherits from pyiron_base._tests.PyironTestCase and the docstring_module attribute is set, then the example in the docstring will get run; this is probably already sufficient (see the sketch below). @pmrv let me know if I misspoke or if you think any additional test is necessary.
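
For illustration, a minimal sketch of such a test, assuming the mechanism works as described; the import path for the worker module is hypothetical:

```python
from pyiron_base._tests import PyironTestCase

# hypothetical import path; wherever WorkerJob actually lives
from pyiron_base.jobs import worker as worker_module


class TestWorkerJob(PyironTestCase):
    # with docstring_module set, PyironTestCase runs the examples
    # in this module's docstrings as part of the unit tests
    docstring_module = worker_module
```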

@jan-janssen
Member Author

> I would like to see the class appear in the unit tests. If I understand correctly, as long as the test inherits from pyiron_base._tests.PyironTestCase and the docstring_module attribute is set, then the example in the docstring will get run; this is probably already sufficient. @pmrv let me know if I misspoke or if you think any additional test is necessary.

As we no longer have the example job, I am not exactly sure which job to test it with. The ScriptJob, which is used for many other tests, is not really sufficient.

@liamhuber
Member

Why would ScriptJob be insufficient?

@niklassiemer
Member

> As we no longer have the example job, I am not exactly sure which job to test it with. The ScriptJob, which is used for many other tests, is not really sufficient.

However, we have

```python
class ToyJob(PythonTemplateJob):
    def __init__(self, project, job_name):
        """A toyjob to test export/import functionalities."""
        super(ToyJob, self).__init__(project, job_name)
        self.input.data_in = 100

    # This function is executed
    def run_static(self):
        self.status.running = True
        self.output.data_out = self.input.data_in + 1
        self.status.finished = True
        self.to_hdf()
```

Maybe this is sufficient?

@jan-janssen
Member Author

> Maybe this is sufficient?

No, because the class defined in the test cannot be reloaded in the subprocess on the worker. But I got it working with the ScriptJob instead.

jan-janssen merged commit 2e193e2 into master Nov 10, 2021
delete-merged-branch bot deleted the worker branch November 10, 2021 14:28