Dictionary access is sanitized and optimized. #414
Merged
o rdkit gains an RDKit::common_properties namespace that contains the std::string keys of commonly used properties
o Dict.h and its users (Atom.h) gain getPropIfPresent, which attempts to retrieve a property and returns
true/false on success or failure. This is used to optimize access.
o rdkit learns how to pass property keys by reference, not by value.
A new namespace, common_properties, has been added to RDKit. It contains the std::string values for commonly used properties. This helps to avoid typos in string values and also avoids constructing a std::string from a character literal on each call. All accessors (has/get/clear and getPropIfPresent) now take the key by reference to avoid copying.
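As a rough illustration of why shared std::string keys passed by reference help, here is a minimal sketch. It does not use RDKit's code: `example_properties`, `KeyDict`, and `demoKeyByReference` are hypothetical names invented for this example, and the container choice is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a namespace of shared property keys.
// Each key is built once, so call sites cannot misspell the literal.
namespace example_properties {
const std::string isAromatic = "_exampleIsAromatic";
}

// Toy dictionary for illustration only; this is not RDKit's Dict.
class KeyDict {
 public:
  // Keys are taken by const reference, so an existing std::string
  // is not copied when it is passed in.
  void setProp(const std::string &key, const std::string &value) {
    data_[key] = value;
  }
  bool hasProp(const std::string &key) const {
    return data_.find(key) != data_.end();
  }

 private:
  std::map<std::string, std::string> data_;
};

// Hypothetical helper: stores and queries a property using the shared key.
inline bool demoKeyByReference() {
  KeyDict d;
  d.setProp(example_properties::isAromatic, "1");
  // Passing the shared constant binds the reference directly.
  // Passing a literal such as "_exampleIsAromatic" here would instead
  // build a temporary std::string on every call.
  return d.hasProp(example_properties::isAromatic);
}
```

With a literal at the call site, the compiler has to materialize a temporary std::string to bind to the reference parameter; the named constant avoids that work in hot paths such as a parser loop.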
Additionally, getPropIfPresent removes the double lookup of hasProp followed by getProp, which can be a significant speedup in the SMILES and SMARTS parsers (10-20%).
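The difference between the two access patterns can be sketched as follows. This is a simplified model under stated assumptions, not the actual Dict.h implementation: `LookupDict`, `lookupTwice`, and `lookupOnce` are hypothetical names, values are restricted to `int`, and only the name `getPropIfPresent` and its true/false return convention come from the description above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy int-valued dictionary for illustration only; not RDKit's Dict.
class LookupDict {
 public:
  void setProp(const std::string &key, int value) { data_[key] = value; }

  bool hasProp(const std::string &key) const { return data_.count(key) != 0; }

  // Throws std::out_of_range if the key is missing.
  int getProp(const std::string &key) const { return data_.at(key); }

  // Single search: on success writes the value to `out` and returns true;
  // on failure leaves `out` untouched and returns false.
  bool getPropIfPresent(const std::string &key, int &out) const {
    std::map<std::string, int>::const_iterator it = data_.find(key);
    if (it == data_.end()) return false;
    out = it->second;
    return true;
  }

 private:
  std::map<std::string, int> data_;
};

// Old pattern: hasProp and getProp each search the container.
inline int lookupTwice(const LookupDict &d, const std::string &key) {
  if (d.hasProp(key)) return d.getProp(key);
  return -1;  // hypothetical "missing" sentinel for this sketch
}

// New pattern: one search covers both the presence check and the read.
inline int lookupOnce(const LookupDict &d, const std::string &key) {
  int value = -1;  // same hypothetical sentinel
  d.getPropIfPresent(key, value);
  return value;
}
```

Both helpers return the same result for present and missing keys; the second simply halves the number of container searches per access.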
The original goal was to see if the speed of SMILES parsing could be improved. The following table is the result of running SmiToMol on ~50K compounds with and without the dictionary optimizations. For kicks, we also compare usage of JEMALLOC (OSX) and TCMALLOC (CentOS 5) to see if small-object memory allocation could also be improved:
OSX 10.9.5 - 2.3 GHz Core i7
CentOS 5 - Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz