cleanup of the SMILES/SMARTS parsing and writing code #2912
Nice tidying.
}

std::string res;
unsigned int nAtoms = tmol->getNumAtoms();
UINT_VECT ranks(nAtoms);
std::vector<unsigned int> ranks(nAtoms);
nice. I personally find this much more readable.
yeah, but back in the day when we were chiseling the code into stone tablets it was important to save those characters!
;-)
Dan asked me to have a look at this, too. Looks good; I just added a couple suggestions :)
There's still one thing that I don't really like, which is the alternation of `unsigned int` and `int` for atom indexes, but that's not going to go away in one commit. Still, it would be good to decide which should be used, and start updating it.
@@ -241,14 +243,15 @@ void AUTOCORR3D(const ROMol& mol, std::vector<double>& res, int confId,
double* dist3D =
MolOps::get3DDistanceMat(mol, confId, false, true); // 3D distance matrix
if (customAtomPropName != "") {
Maybe if (!customAtomPropName.empty()) {
yep. I did a few of those, but no doubt missed a few.
I think it'd make sense to get these in one big sweep through the code in a different PR though
std::vector<unsigned int> atomOrdering;

if (canonical) {
if (tmol->hasProp("_canonicalRankingNumbers")) {
for (unsigned int i = 0; i < tmol->getNumAtoms(); ++i) {
for (const auto atom : tmol->atoms()) {
Would it be worth moving to detail::common_properties in types.h?
Also, I can't find where this property is set. Is it some legacy property from older code?
that's an undocumented way to allow you to provide the atom ranking to be used in canonicalization
bool *numSwapsChiralAtoms = (bool *)malloc(nAtoms * sizeof(bool));
CHECK_INVARIANT(numSwapsChiralAtoms, "failed to allocate memory");
memset(numSwapsChiralAtoms, 0, nAtoms * sizeof(bool));
This one had been making me uncomfortable for some time...
yeah, I wasn't sad to remove that
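The thread does not show what replaced the `malloc`/`memset` pair, so the following is only a sketch of one common replacement: a standard container that owns and zero-initializes the per-atom flags. The helper name `makeAtomFlags` is hypothetical, and the choice of `std::vector<char>` over `std::vector<bool>` is an assumption made so that each flag stays an addressable byte.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): a zero-initialized array of
// per-atom flags. The vector frees its storage automatically, so there is
// no allocation check and no matching free() to forget.
std::vector<char> makeAtomFlags(std::size_t nAtoms) {
  return std::vector<char>(nAtoms, 0);
}
```

With this shape, the flags are released on every exit path, including early returns and exceptions.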
Agreed. That would be a nice one to clean up/rationalize. It's a non-zero amount of effort, but it might be a good one for the next time I have a chunk of time to devote to cleaning up code.
Code/GraphMol/Canon.cpp
Outdated
boost::make_iterator_range(mol.getAtomBonds(atom))) {
// can't just check for single bonds, because dative bonds also have an
// order of 1
if (mol[bndItr]->getBondTypeAsDouble() > 1.001) {
1.0 is perfectly representable in floating point. I think you can just do getBondTypeAsDouble() > 1
agreed
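A minimal sketch of the point about exact representability: because 1.0 has an exact binary floating-point representation, a plain comparison separates an order of exactly 1 from anything larger, with no tolerance such as 1.001. The function `isAboveSingle` is a hypothetical stand-in, and the example bond orders (1.0, 1.5, 2.0) are assumed values for illustration.

```cpp
// Hypothetical stand-in for the check in the diff: takes a bond order
// that has already been converted to double.
bool isAboveSingle(double bondOrder) {
  // 1.0 is exactly representable, so an order stored as exactly 1.0
  // never compares greater than 1.0.
  return bondOrder > 1.0;
}
```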
Code/GraphMol/Canon.cpp
Outdated
ROMol &mol = dblBond->getOwningMol();
int firstVisitOrder = mol.getNumBonds() + 1;
This should probably be unsigned int
fixing this, along with a number of other signed/unsigned things
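As a rough illustration of the suggestion, keeping a "one past the count" sentinel in the same unsigned type as the count avoids an implicit unsigned-to-signed conversion. Both functions below are hypothetical stand-ins; only the variable's role (a value of `getNumBonds() + 1`) comes from the diff.

```cpp
#include <vector>

// Hypothetical stand-in for a bond count that is returned as unsigned int.
unsigned int numBonds(const std::vector<int> &bonds) {
  return static_cast<unsigned int>(bonds.size());
}

// The sentinel stays unsigned, matching the type of the count it is
// derived from, so later comparisons against bond indices do not mix
// signed and unsigned operands.
unsigned int firstVisitSentinel(const std::vector<int> &bonds) {
  return numBonds(bonds) + 1u;
}
```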
double dtmp;

for (int i = 0; i < 10; i++) {
double* Bimat = GetGeodesicMatrix(topologicaldistance, i + 1, numAtoms);
unique_ptr?
Doing a general refactoring pass through the 3D descriptors at some point would be a really good idea, but I'd rather not get too deeply into that in this PR. I just did mechanical things (the replacement of boost::math functionality) this time through
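For reference, a sketch of what the `unique_ptr` suggestion could look like when a function hands back an owning raw pointer. `makeMatrix` and `traceOfFilled` are hypothetical; the sketch assumes the buffer is allocated with `new[]`. A buffer obtained from `malloc` would instead need a custom deleter that calls `free()`.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a function returning an owning raw pointer to
// an n x n matrix. Assumption: the buffer comes from new[].
double *makeMatrix(std::size_t n, double fill) {
  double *m = new double[n * n];
  for (std::size_t i = 0; i < n * n; ++i) {
    m[i] = fill;
  }
  return m;
}

double traceOfFilled(std::size_t n, double fill) {
  // Taking ownership immediately means delete[] runs on every exit path.
  std::unique_ptr<double[]> mat(makeMatrix(n, fill));
  double tr = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    tr += mat[i * n + i];
  }
  return tr;
}
```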
std::vector<double> customAtomarray(numAtoms, 0.0);
for (int i = 0; i < numAtoms; ++i) {
if (mol.getAtomWithIdx(i)->hasProp(customAtomPropName)) {
Potentially could use
std::vector<double> customAtomarray(numAtoms, 1.0);
for (const auto atom : mol.atoms()) {
  atom->getPropIfPresent(customAtomPropName, customAtomarray[atom->getIdx()]);
}
nice
Code/GraphMol/Descriptors/WHIM.cpp
Outdated
@@ -172,7 +172,8 @@ std::vector<double> getWhimD(std::vector<double> weightvector,
}
if (std::fabs(Scores(j, i) + Scores(k, i)) <= th) {
// those that are close opposite & not close to the axis!
ns += 1; // check only once the symmetric none null we need to add +2!
ns +=
odd formatting
yeah, that was clang-format. I just moved the comments around and that cleared up
auto start_tok = static_cast<int>(START_BOND);
std::vector<RWMol *> molVect;
Atom *atom = nullptr;
auto res = smarts_parse_helper(inp, molVect, atom, bond, start_tok);
Can simplify to
return smarts_parse_helper(...)
I often leave the local variable there because it makes debugging easier. That's not particularly relevant here
mol->setBondBookmark(b, atIt->first);
frag->clearBondBookmark(atIt->first, b);
mol->setBondBookmark(b, atIt.first);
frag->clearBondBookmark(atIt.first, b);
Does/Can this invalidate the iterator?
I don't think so. The only range/iterator loops here are lines 177 and 181 and those both are over atom bookmarks. Here we're removing bond bookmarks.
@@ -447,8 +432,8 @@ void CloseMolRings(RWMol *mol, bool toleratePartials) {
Bond *bond2 = *bondIt;

// remove those bonds from the bookmarks:
mol->clearBondBookmark(bookmarkIt->first, bond1);
mol->clearBondBookmark(bookmarkIt->first, bond2);
mol->clearBondBookmark(bookmark.first, bond1);
Does this invalidate the iterator?
same story here, the loop is over atom bookmarks
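The reasoning in both replies can be sketched in isolation: erasing from one container does not invalidate iterators into a different container. The sketch assumes that atom bookmarks and bond bookmarks live in separate map-like containers; the `Bookmarks` alias and the function name are hypothetical stand-ins, not the library's types.

```cpp
#include <cstddef>
#include <list>
#include <map>

// Hypothetical stand-in: bookmark id -> list of bookmarked indices.
using Bookmarks = std::map<int, std::list<int>>;

// Iterates over atomMarks while erasing from bondMarks. Only iterators
// into bondMarks are affected by the erase, so the range-for over
// atomMarks remains valid throughout.
std::size_t clearMatchingBondMarks(const Bookmarks &atomMarks,
                                   Bookmarks &bondMarks) {
  std::size_t erased = 0;
  for (const auto &mark : atomMarks) {
    erased += bondMarks.erase(mark.first);
  }
  return erased;
}
```

If the loop were over the same container being erased from, the range-for form would not be safe, and an explicit iterator advanced from the return value of `erase` would be needed instead.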
switch (mSE.type) {
case Canon::MOL_STACK_ATOM:
if (!ringClosuresToErase.empty()) {
BOOST_FOREACH (unsigned int rclosure, ringClosuresToErase) {
for (auto rclosure : ringClosuresToErase) {
The ringClosuresToErase.empty() call is not needed.
correct
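A small sketch of why the guard is redundant: a range-based `for` over an empty container executes its body zero times, so wrapping it in an `empty()` check changes nothing. The function name is hypothetical.

```cpp
#include <vector>

// Counts how many times the loop body runs; for an empty vector the body
// is never entered, so no separate empty() check is needed.
unsigned int countVisited(const std::vector<unsigned int> &ringClosuresToErase) {
  unsigned int visited = 0;
  for (auto rclosure : ringClosuresToErase) {
    (void)rclosure;  // the value itself is not needed in this sketch
    ++visited;
  }
  return visited;
}
```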
@@ -137,9 +137,8 @@ class RDKIT_RDGENERAL_EXPORT Dict {
*/
STR_VECT keys() const {
STR_VECT res;
Small optimization maybe
res.reserve(_data.size());
yep
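A sketch of the `reserve` suggestion on a simplified stand-in. `MiniDict` and its `data` member are hypothetical and only mimic the shape of a key/value store; the point is that reserving the final size up front lets the `push_back` calls proceed without reallocating.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a small key/value store.
struct MiniDict {
  std::vector<std::pair<std::string, int>> data;

  std::vector<std::string> keys() const {
    std::vector<std::string> res;
    res.reserve(data.size());  // one allocation instead of repeated growth
    for (const auto &item : data) {
      res.push_back(item.first);
    }
    return res;
  }
};
```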
Code/RDGeneral/Dict.h
Outdated
@@ -170,10 +170,18 @@ class RDKIT_RDGENERAL_EXPORT Dict {
}
}
throw KeyErrorException(what);
}
};
The semicolon isn't necessary. Mildly surprised this isn't an error.
Also in the function above here.
That's legal syntax (and is used pretty regularly in the RDKit), but it's not needed.
https://stackoverflow.com/questions/9997895/semicolon-after-function
I'll remove it here and will try to remember to remove those trailing semicolons in the future.
A semicolon after a function (non class) only became legal in C++11 I think. It doesn't really matter except for consistency and as long as we don't start seeing:
void foo() {
};;;;;;;;; // Hi Brian!
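To make the two cases concrete, here is a short sketch with hypothetical names. Inside a class, the grammar for member declarations permits an optional semicolon after an in-class function definition. At namespace scope, the trailing semicolon is a separate empty declaration, which is what C++11 made well-formed. Some compilers may still flag either form under extra-semicolon warnings.

```cpp
struct Widget {
  // Optional ';' after an in-class member function definition.
  int value() const { return 42; };
};

// At namespace scope the trailing ';' is its own empty declaration.
int freeValue() { return 7; };
```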
@greglandrum just a few minor comments, otherwise looks good. There is one question about a possibly suspicious iterator invalidation.
…landrum/rdkit into dev/refactor_smiles_writing
If you want to make the 1 -> 1.0 change go for it. Otherwise I approve.
@@ -23,7 +23,7 @@ bool isUnsaturated(const Atom *atom, const ROMol &mol) {
boost::make_iterator_range(mol.getAtomBonds(atom))) {
// can't just check for single bonds, because dative bonds also have an
// order of 1
if (mol[bndItr]->getBondTypeAsDouble() > 1.001) {
if (mol[bndItr]->getBondTypeAsDouble() > 1) {
Still should be a double value if only to silence warning :)
That actually doesn't produce a warning on any of the platforms where we build.
There's really no reason to warn on `double > int` or `double > long int`.
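The conversion behind this can be checked at compile time: in a mixed `double`/`int` expression, the usual arithmetic conversions turn the `int` operand into a `double` before the operation, and the literal 1 converts to exactly 1.0. The function name `exceedsOne` is hypothetical.

```cpp
#include <type_traits>

// A mixed double/int expression has type double: the int operand is
// converted before the operation is carried out.
static_assert(std::is_same<decltype(1.5 + 1), double>::value,
              "int operand is converted to double");

// Comparing against the integer literal behaves like comparing against
// 1.0, because 1 converts to exactly 1.0.
bool exceedsOne(double x) { return x > 1; }
```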
Not a major refactoring, but a bunch of cleanup work.
Along the way some bugs/new ideas came up and I dealt with those here too:
RDKit::round in favor of std::round