
Add MolFromInchiAndAuxInfo to restore original atom order from AuxInfo #9158

Merged
greglandrum merged 2 commits into rdkit:master from rodyarantes:add-mol-from-inchi-and-auxinfo on Mar 13, 2026

@rodyarantes (Contributor) commented Mar 6, 2026

Summary

Adds MolFromInchiAndAuxInfo(inchi, auxinfo, ...) to inchi.py — the inverse of the existing MolToInchiAndAuxInfo. It reconstructs a molecule from an InChI string and its AuxInfo, restoring:

  • Original atom ordering via the /N: (atom numbering) layer
  • 2D/3D coordinates via the /rC: (coordinate) layer
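For readers unfamiliar with AuxInfo: it is a `/`-separated string in which most fields carry a tag prefix such as `N:` or `rC:`. A minimal sketch of pulling those layers out by tag is shown below. `split_auxinfo_layers` is a hypothetical helper (not part of RDKit or this PR), the sample string is hand-written for illustration rather than captured from `MolToInchiAndAuxInfo`, and keeping the first occurrence of a repeated tag is an assumption.

```python
def split_auxinfo_layers(auxinfo):
    """Return {tag: content} for each 'tag:content' field of an AuxInfo string.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if not auxinfo:
        return {}  # None or '' -> no layers
    prefix = "AuxInfo="
    body = auxinfo[len(prefix):] if auxinfo.startswith(prefix) else auxinfo
    layers = {}
    for token in body.split("/"):
        tag, sep, content = token.partition(":")
        if not sep:
            continue  # leading version/normalization fields carry no tag
        # Assumption: if a tag repeats later in the string, keep the first one
        layers.setdefault(tag, content)
    return layers


# Hand-written, illustrative AuxInfo for phenol (input SMILES 'Oc1ccccc1')
aux = "AuxInfo=1/1/N:5,4,6,3,7,2,1/E:(2,3)(4,5)/rA:7nOCCCCCC/rB:s1;d2;s3;d4;s5;s2d6;/rC:;;;;;;;"
layers = split_auxinfo_layers(aux)
print(layers["N"])   # 5,4,6,3,7,2,1
print(layers["rC"])  # ;;;;;;;
```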

This lets a molecule round-trip through InChI without losing its atom order or coordinates, provided the AuxInfo is kept alongside the InChI. That is useful for workflows where atom identity and spatial layout must survive serialization.

New public API

  • MolFromInchiAndAuxInfo(inchi, auxinfo, sanitize=True, removeHs=True, logLevel=None, treatWarningAsError=False) → Chem.rdchem.Mol

Internal helpers added

  • _parse_auxinfo_coordinates(auxinfo) — parses /rC: layer into [(x,y,z), ...]
  • _parse_auxinfo_atom_order(auxinfo) — parses /N: layer into 0-based index list
  • _attach_conformer(mol, coords, is_3d) — adds a conformer from parsed coordinates
  • _build_inverse_permutation(atom_order, size) — builds the permutation array for RenumberAtoms
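The two index helpers can be sketched roughly as follows. This is a hypothetical re-implementation, not the code merged here, and it rests on assumptions worth checking against the InChI Technical Manual: the i-th entry of `/N:` is the 1-based original number of the atom that InChI numbers i, fragments are separated by `;` and concatenated in order, and the molecule built from the InChI has its atoms in InChI order. The output follows the argument convention of RDKit's `Chem.RenumberAtoms(mol, newOrder)`, where `newOrder[p]` is the current index of the atom that moves to position `p`.

```python
def parse_atom_order(n_layer):
    """'/N:' content -> 0-based original atom indices, listed in InChI atom order.

    Hypothetical sketch; ';'-separated fragments are concatenated in order.
    """
    order = []
    for fragment in n_layer.split(";"):
        order.extend(int(tok) - 1 for tok in fragment.split(",") if tok)
    return order


def build_inverse_permutation(atom_order, size):
    """Return new_order where new_order[p] is the current index of the atom
    whose original index is p (the layout RenumberAtoms expects)."""
    if sorted(atom_order) != list(range(size)):
        raise ValueError("atom order is not a permutation of 0..size-1")
    new_order = [0] * size
    for current_idx, original_idx in enumerate(atom_order):
        new_order[original_idx] = current_idx
    return new_order


# Phenol: InChI numbers the ring carbons 1..6 and the oxygen 7;
# /N: maps them back to the atom order of the input SMILES 'Oc1ccccc1'.
order = parse_atom_order("5,4,6,3,7,2,1")
perm = build_inverse_permutation(order, 7)
inchi_symbols = ["C"] * 6 + ["O"]        # assumed atom order of a mol built from the InChI
print([inchi_symbols[i] for i in perm])  # ['O', 'C', 'C', 'C', 'C', 'C', 'C']
```

Building the inverse once and handing it to a single renumbering call keeps the reordering O(n) and leaves validation (the permutation check) in one place.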

Edge cases handled

  • None or empty AuxInfo → falls back to MolFromInchi behavior
  • Invalid InChI → returns None
  • Multi-fragment molecules (semicolons in /N: layer)
  • Trailing semicolons in /rC: (standard InChI output format)
  • All-zero coordinates → no conformer attached
  • Atom-count mismatches that arise with removeHs=True vs. removeHs=False (hydrogen atoms absent from or present in the reconstructed molecule)
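The coordinate edge cases in this list (a trailing `;` in `/rC:`, and empty or all-zero coordinates) can be illustrated with a small parser. `parse_coordinates` is hypothetical: it assumes each `;`-separated entry is `x,y,z` in original atom order, pads missing values with 0.0, and calls the result 3D when any z is non-zero. The PR's `_parse_auxinfo_coordinates` may make different choices.

```python
def parse_coordinates(rc_layer):
    """'/rC:' content -> (coords, is_3d), or None if there is nothing usable.

    Hypothetical sketch: entries are 'x,y,z' separated by ';'.
    """
    entries = rc_layer.split(";") if rc_layer else []
    if entries and entries[-1] == "":
        entries.pop()  # drop the empty item produced by a trailing ';'
    coords = []
    for entry in entries:
        values = [float(v) if v else 0.0 for v in entry.split(",")] if entry else []
        values = (values + [0.0, 0.0, 0.0])[:3]  # assumption: pad/truncate to x, y, z
        coords.append(tuple(values))
    if not coords or all(c == (0.0, 0.0, 0.0) for c in coords):
        return None  # empty or all-zero layer: attach no conformer
    is_3d = any(z != 0.0 for _, _, z in coords)
    return coords, is_3d


print(parse_coordinates(";;;"))                # None
print(parse_coordinates("1.5,0,0;0,2.25,0;"))  # ([(1.5, 0.0, 0.0), (0.0, 2.25, 0.0)], False)
```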

Usage examples

Basic round-trip preserving atom order

```python
from rdkit.Chem import MolFromSmiles
from rdkit.Chem import MolToInchiAndAuxInfo, MolFromInchiAndAuxInfo

mol = MolFromSmiles('c1cc(O)ccc1N')
inchi, aux = MolToInchiAndAuxInfo(mol)

# Reconstruct with original atom ordering restored
mol2 = MolFromInchiAndAuxInfo(inchi, aux)

# Atom order is preserved
orig_atoms = [a.GetSymbol() for a in mol.GetAtoms()]
new_atoms = [a.GetSymbol() for a in mol2.GetAtoms()]
assert orig_atoms == new_atoms
```

Round-trip with coordinate preservation

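```python
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Chem import MolToInchiAndAuxInfo, MolFromInchiAndAuxInfo

# Stand-in molfile with 2D coordinates; substitute your own mol_block here
template = Chem.MolFromSmiles('c1cc(O)ccc1N')
rdDepictor.Compute2DCoords(template)
mol_block = Chem.MolToMolBlock(template)

mol = Chem.MolFromMolBlock(mol_block)  # molecule with 2D coords
inchi, aux = MolToInchiAndAuxInfo(mol)

mol2 = MolFromInchiAndAuxInfo(inchi, aux)
assert mol2.GetNumConformers() == 1  # coordinates restored

# Coordinates match the original, atom by atom
conf, conf2 = mol.GetConformer(), mol2.GetConformer()
for i in range(mol.GetNumAtoms()):
    orig = conf.GetAtomPosition(i)
    restored = conf2.GetAtomPosition(i)
    assert abs(orig.x - restored.x) < 1e-4
    assert abs(orig.y - restored.y) < 1e-4
```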

Graceful degradation

```python
from rdkit.Chem import MolFromInchiAndAuxInfo

# Works fine with no AuxInfo: just returns the MolFromInchi result
mol = MolFromInchiAndAuxInfo('InChI=1S/CH4/h1H4', None)
assert mol is not None

# Returns None for invalid InChI
mol = MolFromInchiAndAuxInfo('not_an_inchi', '')
assert mol is None
```

Test plan

Tests are in rdkit/Chem/UnitTestInchi.py under TestMolFromInchiAndAuxInfo:

  • test0RoundTripAtomOrder — atom ordering preserved for multiple SMILES
  • test1StereoPreservation — stereochemistry preserved through round-trip
  • test2NoneAuxInfo / test3EmptyAuxInfo — graceful fallback with missing AuxInfo
  • test4InvalidInchi — returns None for invalid InChI
  • test5MultiFragmentAuxInfo — handles semicolon-separated fragments
  • test6CoordinateRestoration — 2D conformer restored, trailing semicolons handled
  • test7EmptyCoordinates / test7bAllZeroCoordinates — no conformer for empty/zero coords
  • test8CoordinateAtomOrderMatch — per-atom symbol + coordinate verification

To run:

```shell
python -m pytest rdkit/Chem/UnitTestInchi.py::TestMolFromInchiAndAuxInfo -v
```

Add a new function that reconstructs molecules from InChI + AuxInfo strings,
restoring the original atom ordering and 2D/3D coordinates from the /N: and
/rC: AuxInfo layers. Includes comprehensive tests for round-tripping, stereo
preservation, coordinate restoration, edge cases, and multi-fragment molecules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@greglandrum (Member) left a comment


Thanks for the contribution @rodyarantes!

I don't really know the auxinfo format, and I didn't dig into the regexes (:shudder:) but it looks like your tests are covering things nicely.

One small suggestion to simplify a test, but otherwise this looks good!

Review comment on rdkit/Chem/UnitTestInchi.py (outdated)
@greglandrum added this to the 2026_03_1 milestone Mar 12, 2026
Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
@rodyarantes requested a review from greglandrum March 12, 2026 17:11

@greglandrum (Member) left a comment


LGTM

@greglandrum merged commit 6f58d21 into rdkit:master Mar 13, 2026
12 checks passed

@greglandrum (Member) commented

Thanks @rodyarantes!
