New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dev/reaction enumeration #1111
Dev/reaction enumeration #1111
Conversation
Let me know if I should just squash all this stuff btw and resubmit. |
I have github setup to squash things on merge, so no worries about that. I just re-triggered the appveyor build to see if that was a random failure (like, I suspect, the python3.4 failure was). I'm traveling and don't have my windows machine with me, so me looking at that would have to wait for the weekend. |
There is an infinite iterator bug in python3 anyway, that will take me a while to sort out. |
@greglandrum - I'm going to add a proper options class to the enumerator and a constructor that takes a serialization string, so expect at least one more update. I'm also going to allow setting a new enumeration strategy after the fact, I've already been annoyed that it doesn't exist ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not done yet, but I wanted to get something submitted in the near term while I finish reading through
@@ -1,21 +1,42 @@ | |||
find_package(Boost 1.39.0 COMPONENTS serialization REQUIRED) | |||
message("==> ${Boost_SERIALIZATION_LIBRARY}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't need this message
|
||
rdkit_test(testReaction testReaction.cpp LINK_LIBRARIES | ||
ChemReactions FilterCatalog ChemTransforms Descriptors Fingerprints Subgraphs DataStructs Depictor FileParsers SmilesParse SubstructMatch | ||
${Boost_SERIALIZATION_LIBRARY} ChemReactions ChemTransforms Descriptors Fingerprints Subgraphs DataStructs Depictor FileParsers SmilesParse SubstructMatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boost serialization library should probably be at the end of the list of libraries. Otherwise I think we'll get undefined symbols if statically linking on (at least) windows.
namespace RDKit { | ||
//! BBS - Helper typedef for holding buliding blocks for reactions | ||
//! holds vectors of reagents for each reactant in a Reaction | ||
typedef std::vector<MOL_SPTR_VECT> BBS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how generic the names BBS and RGROUPS are, I think they should be in their own namespace. How about RDKit::EnumerationTypes::BBS and RDKit::EnumerationTypes::RGROUPS ?
|
||
operator bool() const { return hasNext(); } | ||
|
||
EnumerationStrategyBase *Clone() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An aside for us to try to remember later when we do API cleanups:
This method is called copy()
on atoms and bonds.
You've used Clone()
here and in the FilterCatalog.
We should probably make that consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll deprecate clone in FilterCatalog and switch over to copy.
return m_permutation; | ||
} | ||
|
||
size_t getPermutationIdx() const { return m_numPermutationsProcessed; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another consistency thing to think about (though I won't insist pre-merge, I am pretty certaion to change it post-merge): size_t isn't used anywhere else in the RDKit API (at least it should be). We tend to use unsigned int or one of the boost::uint
typedefs when the size is important
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint64_t is more appropriate here since the number of permutations can be really large here. I'll make the changes.
} | ||
} | ||
|
||
void next(size_t rowToIncrement) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method ought to have a PRECONDITION
ensuring that rowToIncrement
is valid
setState(m_initialState); | ||
} | ||
|
||
std::vector<std::vector<std::string> > EnumerateLibraryBase::nextSmiles() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the rest of the code would take the vector to be returned as a reference parameter. As long as we aren't using C++11, doesn't returning a vector of vectors of strings do rather a lot of copying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: I made it this far in the first part of the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is return value optimized optimized by copy elision.
http://en.cppreference.com/w/cpp/language/copy_elision
It can also be a move constructor in c++11. In any-case if there is a copy, it is really low cost for standard returns and will not be a copy in c++11 and 14.
//! building_blocks.push_back( BBS[0][groups[0] ); | ||
//! building_blocks.push_back( BBS[1][groups[1] ); | ||
//! rxn.runReactants( building_blocks ); | ||
typedef std::vector<size_t> RGROUPS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that these are going to be serialized at some point, right?
If you serialize a word-size dependent value like size_t on a 64bit system and then deserialize on a 32bit system, is boost::serialize smart enough to deal with that?
(This is part of my general unhappiness with size_t being used in the public API).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RGRoups aren't serialized, they are just the index into the bb list so they need to be of vector::size_t type or larger. I agree completely that size_t should only be used for vector types and not part of the api. I'll make these uint64_t.
As an aside, I'm using a text archive which is the most portable (if slower) version. There is a portable binary archive, but since most of the data is in the molecule pickle, I don't think it is necessary.
@bp-kelley : looks like that broke the travis build |
get the tests working on windows
That did it. I think this is ready to merge (and then to tag as "beta" for this release). I'll do that as soon as the travis build pass. |
I'm going to make sure the tutorial still works, that will take about 10-20 minutes, but we are ready to go. Where should I put Glare by the way? Contrib? |
yeah, I think we should start with GLARE in contrib. We can move it to somewhere in rdkit.Chem once we have a bit more experience with it? |
Ok, let me know when to merge. I should be around for about another hour |
@greglandrum - Ready to go! Hold on, moving Glare... |
Ok! Thanks for spending the time reviewing, it was quite useful and necessary :) |
Thanks for the code... it's cool stuff. And I think we'll probably do a bit more tweaking still. :-) |
@greglandrum I think I already have done some tweaking, I had to stop and pull the trigger at some point :) |
@bp-kelley yeah, but I want to play too! ;-) |
Finally in a mergable state, an enumeration toolkit! See the ipython notebook in the ChemReactions/tutorial directory for more details.
Hopefully it builds on windows :)