hash coding API

1 parent 489aaaf, commit 07fb90eb1f73939389f23aff9b20dc44f57a1ea9. @johnmay committed Feb 3, 2013
@@ -0,0 +1,44 @@
+/* Copyright (c) 2013. John May <jwmay@users.sf.net>
@egonw
egonw Mar 24, 2013

Remove the period?

+ *
+ * Contact: cdk-devel@lists.sourceforge.net
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1
+ * of the License, or (at your option) any later version.
+ * All we ask is that proper credit is given for our work, which includes
+ * - but is not limited to - adding the above copyright notice to the beginning
+ * of your source code files, and to any copyright notice that you may distribute
+ * with programs based on this work.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U
+ */
+package org.openscience.cdk.hash;
+
+import org.openscience.cdk.interfaces.IAtomContainer;
+
+/**
+ * A hash function which generates 64-bit hash codes for the atoms of a
+ * molecule.
+ *
+ * @author John May
+ * @cdk.module interfaces
@egonw
egonw Mar 24, 2013

Add @cdk.githash

+ */
+public interface AtomHashGenerator {
+
+ /**
+ * Generate invariant 64-bit hash codes for the atoms of the molecule.
+ *
+ * @param container a molecule
+ * @return atomic hash codes
+ */
+ public long[] generate(IAtomContainer container);
+
+}
@@ -0,0 +1,47 @@
+/* Copyright (c) 2013. John May <jwmay@users.sf.net>
+ *
+ * Contact: cdk-devel@lists.sourceforge.net
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1
+ * of the License, or (at your option) any later version.
+ * All we ask is that proper credit is given for our work, which includes
+ * - but is not limited to - adding the above copyright notice to the beginning
+ * of your source code files, and to any copyright notice that you may distribute
+ * with programs based on this work.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U
+ */
+package org.openscience.cdk.hash;
+
+import org.openscience.cdk.interfaces.IAtomContainer;
+
+import java.util.Set;
+
+/**
+ * A hash function which generates a single 64-bit hash code for a set of
+ * molecules (ensemble).
+ *
+ * @author John May
+ * @cdk.module interfaces
+ */
+public interface EnsembleHashGenerator {
+
+ /**
+ * Generate invariant 64-bit hash code for an ensemble of molecules.
+ *
+ * @param ensemble an ensemble molecule
+ * @return hash code for the ensemble
+ */
+ public long generate(Set<IAtomContainer> ensemble);
+
+
+}
@@ -0,0 +1,43 @@
+/* Copyright (c) 2013. John May <jwmay@users.sf.net>
+ *
+ * Contact: cdk-devel@lists.sourceforge.net
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1
+ * of the License, or (at your option) any later version.
+ * All we ask is that proper credit is given for our work, which includes
+ * - but is not limited to - adding the above copyright notice to the beginning
+ * of your source code files, and to any copyright notice that you may distribute
+ * with programs based on this work.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U
+ */
+package org.openscience.cdk.hash;
+
+import org.openscience.cdk.interfaces.IAtomContainer;
+
+/**
+ * A hash function which generates a single 64-bit hash code for a molecule.
+ *
+ * @author John May
+ * @cdk.module interfaces
@egonw
egonw Mar 24, 2013
  • @cdk.githash.
@egonw
egonw Mar 24, 2013

What is the idea of having it in interfaces? I prefer to keep only things there that are needed by some or many modules. This interface seems specific to a hash module. Can you say something about how extensively this hash functionality will be used in the CDK?

@johnmay
johnmay Mar 24, 2013 owner

I think these should go in interfaces. I tend to always put every interface at the bottom of the module hierarchy. This could be used for fingerprints, isomorphism, atom typing... anything where you need fast lookup or caching of invariants.
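To make the "fast lookup" case concrete, here is a minimal sketch of an index that buckets molecules by their hash code. The `HashIndex` class is hypothetical and not part of this commit, and `IAtomContainer` is reduced to an empty stand-in so the snippet compiles on its own; only the shape of `MoleculeHashGenerator` is taken from the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for org.openscience.cdk.interfaces.IAtomContainer (assumption, for this sketch only).
interface IAtomContainer {}

// Same shape as the interface added in this commit.
interface MoleculeHashGenerator {
    long generate(IAtomContainer container);
}

// Hypothetical helper: groups molecules by hash code so that lookups only
// need to inspect molecules that share a hash with the query.
final class HashIndex {
    private final MoleculeHashGenerator generator;
    private final Map<Long, List<IAtomContainer>> buckets = new HashMap<>();

    HashIndex(MoleculeHashGenerator generator) {
        this.generator = generator;
    }

    // Store the molecule in the bucket for its hash code.
    void add(IAtomContainer container) {
        long hash = generator.generate(container);
        buckets.computeIfAbsent(hash, k -> new ArrayList<>()).add(container);
    }

    // Molecules with the same hash as the query. Equal hashes do not prove
    // equal molecules, so a caller would still confirm each candidate.
    List<IAtomContainer> candidates(IAtomContainer query) {
        List<IAtomContainer> bucket = buckets.get(generator.generate(query));
        return bucket == null ? Collections.<IAtomContainer>emptyList() : bucket;
    }
}
```

The index depends only on the interface, so any implementation of `MoleculeHashGenerator` could be plugged in without the caller's module depending on the module that provides it.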

Another example is probably better to show why. Let's say I have a SMILES generator, which needs a canonicalization method.

SmilesGenerator generator = new SmilesGenerator(Canonical canonical);

Now I have a module label which has a MorganNumbers() tool. I put the interface in MorganNumbers, as that's where it's needed.

SmilesGenerator generator = new SmilesGenerator(new MorganNumbers());

Then I want one for InChINumbers... I now need to make InChI depend on my MorganNumbers module or move the interface.

SmilesGenerator generator = new SmilesGenerator(new InChINumbering());

With the interface here I can write a simple wrapping class in the InChI module which also provides a hash using the InChIKey, without having to depend on this hashcode module.
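A rough sketch of what such a wrapping class could look like. Everything except the shape of `MoleculeHashGenerator` is an assumption made for illustration: `IAtomContainer` is an empty stand-in, `InChIKeySource` is a hypothetical abstraction over whatever the InChI module uses to produce a key, and FNV-1a is swapped in only as a simple way to turn the key string into 64 bits.

```java
import java.nio.charset.StandardCharsets;

// Stand-in for org.openscience.cdk.interfaces.IAtomContainer (assumption, for this sketch only).
interface IAtomContainer {}

// Same shape as the interface added in this commit; it would live in 'interfaces'.
interface MoleculeHashGenerator {
    long generate(IAtomContainer container);
}

// Hypothetical: hides how the InChI module actually computes an InChIKey.
interface InChIKeySource {
    String inchiKey(IAtomContainer container);
}

// Would live in the InChI module. It implements the shared interface and
// needs no dependency on the hash module.
final class InChIKeyHashGenerator implements MoleculeHashGenerator {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    private final InChIKeySource source;

    InChIKeyHashGenerator(InChIKeySource source) {
        this.source = source;
    }

    @Override
    public long generate(IAtomContainer container) {
        // 64-bit FNV-1a over the key's bytes: XOR each byte in, then multiply by the prime.
        long hash = FNV_OFFSET_BASIS;
        byte[] bytes = source.inchiKey(container).getBytes(StandardCharsets.US_ASCII);
        for (byte b : bytes) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }
}
```

Code that accepts a `MoleculeHashGenerator` could then be handed an `InChIKeyHashGenerator` or a generator from the hash module interchangeably, since both depend only on the interface at the bottom of the hierarchy.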

@egonw
egonw via email Mar 24, 2013
+ */
+public interface MoleculeHashGenerator {
+
+ /**
+ * Generate invariant 64-bit hash code for a molecule.
+ *
+ * @param container a molecule
+ * @return hash code for the molecule
+ */
+ public long generate(IAtomContainer container);
+
+}
