Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev/reaction enumeration #1111

Merged
merged 57 commits into from Nov 5, 2016
Merged

Conversation

bp-kelley
Copy link
Contributor

Finally in a mergable state, an enumeration toolkit! See the ipython notebook in the ChemReactions/tutorial directory for more details.

Hopefully it builds on windows :)

@bp-kelley
Copy link
Contributor Author

Let me know if I should just squash all this stuff btw and resubmit.

@greglandrum
Copy link
Member

I have github setup to squash things on merge, so no worries about that.

I just re-triggered the appveyor build to see if that was a random failure (like, I suspect, the python3.4 failure was). I'm traveling and don't have my windows machine with me, so me looking at that would have to wait for the weekend.

@bp-kelley
Copy link
Contributor Author

There is an infinite iterator bug in python3 anyway, that will take me a while to sort out.

@bp-kelley
Copy link
Contributor Author

@greglandrum - I'm going to add a proper options class to the enumerator and a constructor that takes a serialization string, so expect at least one more update.

I'm also going to allow setting a new enumeration strategy after the fact, I've already been annoyed that it doesn't exist ;)

Copy link
Member

@greglandrum greglandrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done yet, but I wanted to get something submitted in the near term while I finish reading through

@@ -1,21 +1,42 @@
find_package(Boost 1.39.0 COMPONENTS serialization REQUIRED)
message("==> ${Boost_SERIALIZATION_LIBRARY}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need this message


rdkit_test(testReaction testReaction.cpp LINK_LIBRARIES
ChemReactions FilterCatalog ChemTransforms Descriptors Fingerprints Subgraphs DataStructs Depictor FileParsers SmilesParse SubstructMatch
${Boost_SERIALIZATION_LIBRARY} ChemReactions ChemTransforms Descriptors Fingerprints Subgraphs DataStructs Depictor FileParsers SmilesParse SubstructMatch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boost serialization library should probably be at the end of the list of libraries. Otherwise I think we'll get undefined symbols if statically linking on (at least) windows.

namespace RDKit {
//! BBS - Helper typedef for holding buliding blocks for reactions
//! holds vectors of reagents for each reactant in a Reaction
typedef std::vector<MOL_SPTR_VECT> BBS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how generic the names BBS and RGROUPS are, I think they should be in their own namespace. How about RDKit::EnumerationTypes::BBS and RDKit::EnumerationTypes::RGROUPS ?


operator bool() const { return hasNext(); }

EnumerationStrategyBase *Clone() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An aside for us to try to remember later when we do API cleanups:
This method is called copy() on atoms and bonds.
You've used Clone() here and in the FilterCatalog.
We should probably make that consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll deprecate clone in FilterCatalog and switch over to copy.

return m_permutation;
}

size_t getPermutationIdx() const { return m_numPermutationsProcessed; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another consistency thing to think about (though I won't insist pre-merge, I am pretty certaion to change it post-merge): size_t isn't used anywhere else in the RDKit API (at least it should be). We tend to use unsigned int or one of the boost::uint typedefs when the size is important

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint64_t is more appropriate here since the number of permutations can be really large here. I'll make the changes.

}
}

void next(size_t rowToIncrement) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method ought to have a PRECONDITION ensuring that rowToIncrement is valid

setState(m_initialState);
}

std::vector<std::vector<std::string> > EnumerateLibraryBase::nextSmiles() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the rest of the code would take the vector to be returned as a reference parameter. As long as we aren't using C++11, doesn't returning a vector of vectors of strings do rather a lot of copying?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: I made it this far in the first part of the review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is return value optimized optimized by copy elision.

http://en.cppreference.com/w/cpp/language/copy_elision

It can also be a move constructor in c++11. In any-case if there is a copy, it is really low cost for standard returns and will not be a copy in c++11 and 14.

//! building_blocks.push_back( BBS[0][groups[0] );
//! building_blocks.push_back( BBS[1][groups[1] );
//! rxn.runReactants( building_blocks );
typedef std::vector<size_t> RGROUPS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that these are going to be serialized at some point, right?
If you serialize a word-size dependent value like size_t on a 64bit system and then deserialize on a 32bit system, is boost::serialize smart enough to deal with that?
(This is part of my general unhappiness with size_t being used in the public API).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RGRoups aren't serialized, they are just the index into the bb list so they need to be of vector::size_t type or larger. I agree completely that size_t should only be used for vector types and not part of the api. I'll make these uint64_t.

As an aside, I'm using a text archive which is the most portable (if slower) version. There is a portable binary archive, but since most of the data is in the molecule pickle, I don't think it is necessary.

@greglandrum
Copy link
Member

@bp-kelley : looks like that broke the travis build

@greglandrum
Copy link
Member

That did it. I think this is ready to merge (and then to tag as "beta" for this release). I'll do that as soon as the travis build pass.

@bp-kelley
Copy link
Contributor Author

I'm going to make sure the tutorial still works, that will take about 10-20 minutes, but we are ready to go.

Where should I put Glare by the way? Contrib?

@greglandrum
Copy link
Member

yeah, I think we should start with GLARE in contrib. We can move it to somewhere in rdkit.Chem once we have a bit more experience with it?

@greglandrum
Copy link
Member

Ok, let me know when to merge. I should be around for about another hour

@bp-kelley
Copy link
Contributor Author

bp-kelley commented Nov 5, 2016

@greglandrum - Ready to go!

Hold on, moving Glare...

@bp-kelley
Copy link
Contributor Author

Ok! Thanks for spending the time reviewing, it was quite useful and necessary :)

@greglandrum
Copy link
Member

Thanks for the code... it's cool stuff.

And I think we'll probably do a bit more tweaking still. :-)

@greglandrum greglandrum added this to the 2016_09_1 milestone Nov 5, 2016
@greglandrum greglandrum merged commit fa89438 into rdkit:master Nov 5, 2016
@bp-kelley
Copy link
Contributor Author

@greglandrum I think I already have done some tweaking, I had to stop and pull the trigger at some point :)

@greglandrum
Copy link
Member

@bp-kelley yeah, but I want to play too! ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants