
Dictionary access is sanitized and optimized. #414

Merged (1 commit) on Jan 16, 2015


bp-kelley
Contributor

o rdkit gains an RDKit::common_properties namespace that contains the string values of commonly used properties

o Dict.h and its users (Atom.h) gain getPropIfPresent, which attempts to retrieve a property and returns true on success or false on failure. This is used to optimize access.

o rdkit learns how to pass property keys by reference, not value.

A new namespace, common_properties, has been added to RDKit. It contains the std::string values for commonly used properties. This helps to avoid typos in string values and also avoids constructing a std::string from a character literal on every call. All accessors (has/get/clear and getPropIfPresent) now take the key by reference to avoid copying.
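As a rough illustration of the idea, here is a minimal sketch of shared key constants combined with by-reference accessors. Everything in it is a hypothetical stand-in: the `keys_sketch` namespace, the `PropertyHolder` class, and the key names `atomLabel` and `atomCharge` are invented for this example and are not taken from the RDKit sources.

```cpp
#include <map>
#include <string>

namespace keys_sketch {

// Hypothetical key constants, modeled on the idea of RDKit::common_properties.
// Each std::string is constructed once, at program start-up.
namespace common_properties {
const std::string atomLabel = "atomLabel";
const std::string atomCharge = "atomCharge";
}  // namespace common_properties

// Toy property holder with int values only (an assumption made for brevity).
class PropertyHolder {
 public:
  // Keys are taken by const reference, so a std::string argument is not copied.
  void setProp(const std::string &key, int value) { props_[key] = value; }

  bool hasProp(const std::string &key) const {
    return props_.find(key) != props_.end();
  }

  void clearProp(const std::string &key) { props_.erase(key); }

 private:
  std::map<std::string, int> props_;
};

// Usage: set, check, and clear a property through a shared constant.
inline bool labelRoundTrip() {
  PropertyHolder atom;
  // Writing atom.setProp("atomLabel", 7) would also compile, but it would
  // build a temporary std::string from the literal on every call, and a
  // misspelled literal would go unnoticed by the compiler.
  atom.setProp(common_properties::atomLabel, 7);
  const bool present = atom.hasProp(common_properties::atomLabel);
  atom.clearProp(common_properties::atomLabel);
  return present && !atom.hasProp(common_properties::atomLabel);
}

}  // namespace keys_sketch
```

With named constants, a misspelled key becomes a compile error rather than a silent lookup miss at run time.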

Additionally, getPropIfPresent removes the double lookup of hasProp followed by getProp, which can give a significant speedup (10-20%) in the SMILES and SMARTS parsers.
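The difference between the two access patterns can be sketched with a toy dictionary that counts its own lookups. This is only an illustration under stated assumptions: `CountingDict`, `twoStepRead`, and `oneStepRead` are hypothetical names, values are restricted to `int`, and the real Dict implementation may look quite different.

```cpp
#include <map>
#include <string>

namespace lookup_sketch {

// Toy dictionary that records how many times the underlying map is searched.
class CountingDict {
 public:
  void setVal(const std::string &key, int value) { data_[key] = value; }

  bool hasVal(const std::string &key) const {
    ++lookups_;
    return data_.find(key) != data_.end();
  }

  // Throws std::out_of_range if the key is missing.
  int getVal(const std::string &key) const {
    ++lookups_;
    return data_.at(key);
  }

  // Single search: writes to result and returns true only if the key exists.
  bool getValIfPresent(const std::string &key, int &result) const {
    ++lookups_;
    std::map<std::string, int>::const_iterator it = data_.find(key);
    if (it == data_.end()) return false;
    result = it->second;
    return true;
  }

  unsigned lookups() const { return lookups_; }

 private:
  std::map<std::string, int> data_;
  mutable unsigned lookups_ = 0;
};

// Check-then-get pattern: two searches when the key is present.
inline int twoStepRead(const CountingDict &d, const std::string &key,
                       int fallback) {
  if (d.hasVal(key)) return d.getVal(key);
  return fallback;
}

// Combined pattern: always exactly one search.
inline int oneStepRead(const CountingDict &d, const std::string &key,
                       int fallback) {
  int value = fallback;
  d.getValIfPresent(key, value);
  return value;
}

}  // namespace lookup_sketch
```

In this sketch a successful `twoStepRead` searches the map twice, while `oneStepRead` searches it once whether or not the key is present.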

The original goal was to see if the speed of SMILES parsing could be improved. The following table shows the result of running SmiToMol on ~50K compounds with and without the dictionary optimizations. For kicks, we also compare usage of jemalloc (OSX) and tcmalloc (Centos5) to see if small-object memory allocation could also be improved:

OSX 10.9.5 - 2.3 GHz Core i7
Centos5 - Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz

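| OSX | original | optimized | original+jemalloc | optimized+jemalloc |
|---|---|---|---|---|
| secs | 2.25 | 2.08 | 1.5 | 1.302 |
| compounds/sec | 22218 | 24034 | 33327 | 38395 |
| speedup | 1 | 1.08 | 1.5 | 1.73 |

| Centos5 | original | optimized | original+tcmalloc | optimized+tcmalloc |
|---|---|---|---|---|
| secs | 3.04 | 2.551 | 2.22 | 1.87 |
| compounds/sec | 16444 | 19596 | 22518 | 26733 |
| speedup | 1 | 1.19 | 1.37 | 1.63 |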
