Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processor multiprocessing #233

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mjtadema
Copy link
Contributor

@mjtadema mjtadema commented Mar 4, 2020

This is what i came up with for multiprocessing in the base run_system method.
It will always fall back to the old method if nproc is not set.
So far it seems to work fine in combination with my changed make_bonds function in #234 .

@mjtadema mjtadema mentioned this pull request Mar 4, 2020
@mjtadema
Copy link
Contributor Author

mjtadema commented Mar 4, 2020

It seems we need to add a test for run_system with multiprocessing, but I am not entirely sure where to put it..

Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments. The downside of this approach is that every Processor being run creates it's own Pool of processes, which might create a lot of overhead. I'm not sure how bad it is though.
Possible alternative is to have the entry point (/bin/martinize2, the CLI) create a Pool that's given to every Processor (None default).
Still missing in this PR is the CLI flag.

@jbarnoud I need a second opinion on this. What's the best way of passing the multiprocessing option/choices to all processors? My current idea (that I conveyed to @mjtadema) is to simply set a vermouth.processors.processor.Processor class attribute, rather than muck around with __init__ and instance attributes.


class Processor:
"""
An abstract base class for processors. Subclasses must implement a
`run_molecule` method.
Has nproc attribute that is changed by a CLI flag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attributes
------------
nproc: int
    Describes how many CPUs this processor can use.

No need to mention the CLI here. It could also come from different code, a web interface, whatever.

for molecule in system.molecules:
mols.append(self.run_molecule(molecule))
system.molecules = mols
if hasattr(self, 'nproc') and self.nproc > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasattr check is not needed, since every object is guaranteed to have one.
Also, add an extra check around here that if nproc==-1 (or 0, or None, or whatever), None will be passed to Pool, making it use all CPUs.

@mjtadema
Copy link
Contributor Author

mjtadema commented Mar 4, 2020

Some small comments. The downside of this approach is that every Processor being run creates it's own Pool of processes, which might create a lot of overhead. I'm not sure how bad it is though.

Instead of setting the class attribute nproc and creating the pool in run_system, maybe we can set the attribute to be the actual pool itself. That way the same pool should be shared among all processors, no?

@pckroon
Copy link
Member

pckroon commented Mar 4, 2020

Yes, that would be an option.

@fgrunewald
Copy link
Member

stale PR; also it's so far behind it will probably be easier to make a new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants