Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in place and multithread support for more of the MolStandardize code #6970

Merged
merged 8 commits into from Dec 12, 2023

Conversation

greglandrum
Copy link
Member

This PR is primarily aimed at getting the various parent operations working in place and to provide multi-threaded versions of them.

@greglandrum greglandrum added this to the 2023_09_4 milestone Dec 8, 2023
Copy link
Contributor

@ptosco ptosco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left in a few comments.

double weight = Descriptors::calcExactMW(*frag);
if (l.Fragment != nullptr &&
(!this->useAtomCount || numatoms == l.NumAtoms) &&
double weight = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not required, but I think it looks tidier:

Suggested change
double weight = 0;
double weight = 0.0;

const auto atom = mol.getAtomWithIdx(idx);
// it's not important to be perfect here
weight += 100 * atom->getAtomicNum() + atom->getIsotope() -
atom->getFormalCharge() * .1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making weight depend on formal charge looks a bit of a hack, and could lead to unintentional ties (10 formal charges count like one neutron). Wouldn't it be better to leave formal charge out of the weight computation and rather add it as an additional compare criterion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is clearly a heuristic (sounds nicer than "hack") and it doesn't have to be exactly correct, just reproducible.

Plus, how often are you going to have a molecule with two fragments which have the same number of atoms and the same atoms but which differ by one neutron and a net formal charge difference of 10?

I think it's a theoretical risk. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a hack :-) but I am fine with it.

void tautomerParentInPlace(std::vector<RWMol *> &mols, int numThreads,
const CleanupParameters &params,
bool skip_standardize) {
auto sfunc = [&](RWMol &m, const CleanupParameters &ps) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You actually only need to capture skip_standardize.

void fragmentParentInPlace(std::vector<RWMol *> &mols, int numThreads,
const CleanupParameters &params,
bool skip_standardize) {
auto sfunc = [&](RWMol &m, const CleanupParameters &ps) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You actually only need to capture skip_standardize.

void stereoParentInPlace(std::vector<RWMol *> &mols, int numThreads,
const CleanupParameters &params,
bool skip_standardize) {
auto sfunc = [&](RWMol &m, const CleanupParameters &ps) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You actually only need to capture skip_standardize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your keyboard got stuck! ;-)

@@ -291,7 +366,7 @@ void reionizeInPlace(RWMol &mol, const CleanupParameters &params) {
std::unique_ptr<Reionizer> reionizer{reionizerFromParams(params)};
reionizer->reionizeInPlace(mol);
}
void reionizeInPlace(std::vector<RWMol *> &mols,int numThreads,
void reionizeInPlace(std::vector<RWMol *> &mols, int numThreads,
const CleanupParameters &params) {
std::unique_ptr<Reionizer> reionizer{reionizerFromParams(params)};
auto sfunc = [&](RWMol &m, const CleanupParameters &) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's there is no need to capture anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, it needs to capture reionizer, but only that

@@ -311,7 +386,7 @@ void removeFragmentsInPlace(RWMol &mol, const CleanupParameters &params) {
remover->removeInPlace(mol);
}

void removeFragmentsInPlace(std::vector<RWMol *> &mols,int numThreads,
void removeFragmentsInPlace(std::vector<RWMol *> &mols, int numThreads,
const CleanupParameters &params) {
std::unique_ptr<FragmentRemover> remover{fragmentRemoverFromParams(params)};
auto sfunc = [&](RWMol &m, const CleanupParameters &) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's there is no need to capture anything.

if (params) {
ps = python::extract<RDKit::MolStandardize::CleanupParameters *>(params);
}
func(*static_cast<RDKit::RWMol *>(mol), *ps, skip_standardize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one can cast a ROMol* to RWMol* without Bad Consequences? I'll keep that in mind if I do not need a copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is officially UB. But in the RDKit the only difference between RWMol and ROMol is that an RWMol has a few more methods. There are no additional data members, so I think that doing that cast is as safe as UB ever can be.

void mtinPlaceHelper2(python::object pymols, int numThreads,
python::object params, bool skip_standardize,
FUNCTYPE func) {
const RDKit::MolStandardize::CleanupParameters *ps =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const RDKit::MolStandardize::CleanupParameters *ps =
const auto ps =

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one actually isn't so easy since we plan to reassign ps later. It needs to be const auto *ps

unsigned int nmols = python::extract<unsigned int>(pymols.attr("__len__")());
std::vector<RDKit::RWMol *> mols(nmols);
for (auto i = 0u; i < nmols; ++i) {
RDKit::RWMol *mol = static_cast<RDKit::RWMol *>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RDKit::RWMol *mol = static_cast<RDKit::RWMol *>(
auto mol = static_cast<RDKit::RWMol *>(

@greglandrum greglandrum merged commit c7c9ad3 into rdkit:master Dec 12, 2023
10 checks passed
@greglandrum greglandrum deleted the feat/mt_fragmentchooser branch December 12, 2023 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants