New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add in place and multithread support for more of the MolStandardize code #6970
Add in place and multithread support for more of the MolStandardize code #6970
Conversation
multithreaded inplace support still needs more looking at pre-PR
should have noticed this a while ago
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left in a few comments.
double weight = Descriptors::calcExactMW(*frag); | ||
if (l.Fragment != nullptr && | ||
(!this->useAtomCount || numatoms == l.NumAtoms) && | ||
double weight = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is not required, but I think it looks tidier:
double weight = 0; | |
double weight = 0.0; |
const auto atom = mol.getAtomWithIdx(idx); | ||
// it's not important to be perfect here | ||
weight += 100 * atom->getAtomicNum() + atom->getIsotope() - | ||
atom->getFormalCharge() * .1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making weight depend on formal charge looks a bit of a hack, and could lead to unintentional ties (10 formal charges count like one neutron). Wouldn't it be better to leave formal charge out of the weight computation and rather add it as an additional compare criterion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is clearly a heuristic (sounds nicer than "hack") and it doesn't have to be exactly correct, just reproducible.
Plus, how often are you going to have a molecule with two fragments which have the same number of atoms and the same atoms but which differ by one neutron and a net formal charge difference of 10?
I think it's a theoretical risk. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a hack :-) but I am fine with it.
void tautomerParentInPlace(std::vector<RWMol *> &mols, int numThreads, | ||
const CleanupParameters ¶ms, | ||
bool skip_standardize) { | ||
auto sfunc = [&](RWMol &m, const CleanupParameters &ps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You actually only need to capture skip_standardize
.
void fragmentParentInPlace(std::vector<RWMol *> &mols, int numThreads, | ||
const CleanupParameters ¶ms, | ||
bool skip_standardize) { | ||
auto sfunc = [&](RWMol &m, const CleanupParameters &ps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You actually only need to capture skip_standardize
.
void stereoParentInPlace(std::vector<RWMol *> &mols, int numThreads, | ||
const CleanupParameters ¶ms, | ||
bool skip_standardize) { | ||
auto sfunc = [&](RWMol &m, const CleanupParameters &ps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You actually only need to capture skip_standardize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your keyboard got stuck! ;-)
@@ -291,7 +366,7 @@ void reionizeInPlace(RWMol &mol, const CleanupParameters ¶ms) { | |||
std::unique_ptr<Reionizer> reionizer{reionizerFromParams(params)}; | |||
reionizer->reionizeInPlace(mol); | |||
} | |||
void reionizeInPlace(std::vector<RWMol *> &mols,int numThreads, | |||
void reionizeInPlace(std::vector<RWMol *> &mols, int numThreads, | |||
const CleanupParameters ¶ms) { | |||
std::unique_ptr<Reionizer> reionizer{reionizerFromParams(params)}; | |||
auto sfunc = [&](RWMol &m, const CleanupParameters &) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's there is no need to capture anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, it needs to capture reionizer
, but only that
@@ -311,7 +386,7 @@ void removeFragmentsInPlace(RWMol &mol, const CleanupParameters ¶ms) { | |||
remover->removeInPlace(mol); | |||
} | |||
|
|||
void removeFragmentsInPlace(std::vector<RWMol *> &mols,int numThreads, | |||
void removeFragmentsInPlace(std::vector<RWMol *> &mols, int numThreads, | |||
const CleanupParameters ¶ms) { | |||
std::unique_ptr<FragmentRemover> remover{fragmentRemoverFromParams(params)}; | |||
auto sfunc = [&](RWMol &m, const CleanupParameters &) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's there is no need to capture anything.
if (params) { | ||
ps = python::extract<RDKit::MolStandardize::CleanupParameters *>(params); | ||
} | ||
func(*static_cast<RDKit::RWMol *>(mol), *ps, skip_standardize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So one can cast a ROMol*
to RWMol*
without Bad Consequences? I'll keep that in mind if I do not need a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it is officially UB. But in the RDKit the only difference between RWMol and ROMol is that an RWMol has a few more methods. There are no additional data members, so I think that doing that cast is as safe as UB ever can be.
void mtinPlaceHelper2(python::object pymols, int numThreads, | ||
python::object params, bool skip_standardize, | ||
FUNCTYPE func) { | ||
const RDKit::MolStandardize::CleanupParameters *ps = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const RDKit::MolStandardize::CleanupParameters *ps = | |
const auto ps = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one actually isn't so easy since we plan to reassign ps
later. It needs to be const auto *ps
unsigned int nmols = python::extract<unsigned int>(pymols.attr("__len__")()); | ||
std::vector<RDKit::RWMol *> mols(nmols); | ||
for (auto i = 0u; i < nmols; ++i) { | ||
RDKit::RWMol *mol = static_cast<RDKit::RWMol *>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RDKit::RWMol *mol = static_cast<RDKit::RWMol *>( | |
auto mol = static_cast<RDKit::RWMol *>( |
This PR is primarily aimed at getting the various parent operations working in place and to provide multi-threaded versions of them.