A fast TypeScript / JavaScript chemistry toolkit for working with molecular structures: parsing & generation (SMILES, MOL, SDF), canonicalization, pattern matching (SMARTS), 2D rendering, molecular descriptors, and structural analysis.
Production-ready, TypeScript-first library for cheminformatics — works in both browser and Node.js. openchem keeps a small runtime footprint.
- SMILES — Parse and generate canonical SMILES with full stereochemistry
- MOL files — V2000/V3000 format support with 2D coordinate generation
- SDF files — Multi-molecule files with property data
- InChI — Generate InChI and InChIKey identifiers
- IUPAC names — Bidirectional IUPAC ↔ SMILES conversion
- Pattern matching — SMARTS substructure search
- Fingerprints — Morgan (ECFP) fingerprints with Tanimoto similarity
- Murcko scaffolds — Extract core scaffolds, generic frameworks, scaffold trees
- Tautomers — Complete enumeration (25 rules, 100% RDKit coverage) with RDKit-compatible scoring
- Ring systems — SSSR detection, fused/spiro/bridged classification
- Aromaticity — Hückel rule perception and kekulization
- Symmetry — Canonical ordering via modified Morgan algorithm
- Stereochemistry — Full support for tetrahedral centers, E/Z bonds, extended chirality
- Basic — Formula, mass, atom/bond counts
- Structural — Valence electrons, amide bonds, spiro/bridgehead atoms, ring classifications
- Stereochemistry — Specified and unspecified stereocenter counting
- Drug-likeness — Lipinski's Rule of Five, Veber rules, BBB penetration
- Descriptors — TPSA, LogP, rotatable bonds, H-bond donors/acceptors
- Ring analysis — Saturated/aliphatic/heterocyclic ring counts
- 2D rendering — Publication-quality SVG with automatic layout
- Smart positioning — Overlap-aware fused ring placement
- Stereochemistry display — Wedge/hash bonds for chirality
- Customizable — Element colors, bond styles, canvas size
- ⚡ Fast — Optimized coordinate generation, CSR graph for O(1) lookups
- 🔬 Accurate — 100% RDKit agreement on canonical SMILES (325/325 molecules)
- ✅ Well-tested — 2,093 passing tests including bulk RDKit comparisons
- 🎯 Production-ready — Used with real drugs, natural products, edge cases
- 📦 Lightweight — Minimal dependencies, works in browser and Node.js
- 🔒 TypeScript-first — Full type safety with excellent IDE support
```bash
npm install openchem
# or: bun add openchem
```

```typescript
import { parseSMILES, renderSVG, Descriptors } from 'openchem';

// Parse a molecule
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];

// Render as SVG
const svg = renderSVG(aspirin);
console.log(svg.svg); // SVG markup ready for display

// Get all molecular properties at once
const props = Descriptors.all(aspirin);
console.log(props.formula); // "C9H8O4"
console.log(props.mass); // 180.16
console.log(props.logP); // 1.19
console.log(props.lipinskiPass); // true - aspirin is drug-like!

// Or get specific categories
const drugLike = Descriptors.drugLikeness(aspirin);
console.log(drugLike.lipinski.passes); // true
console.log(drugLike.lipinski.violations); // []
```

openchem includes an interactive HTML playground for testing SMILES parsing, molecular visualization, and descriptor calculation:
```bash
# Build the browser bundle and start a local server
bun run serve
# Then open http://localhost:3000/smiles-playground.html in your browser
```

The playground provides:
- 2D Structure Visualization — Clean SVG rendering of molecular structures
- Molecular Descriptors — Formula, mass, TPSA, rotatable bonds, etc.
- Drug-Likeness Checks — Lipinski's Rule of Five, Veber rules, BBB penetration
- Interactive Examples — Pre-loaded molecules like aspirin, caffeine, ibuprofen
The playground automatically detects if the full openchem library is available and falls back to approximate calculations if needed.
Note: The playground must be served over HTTP rather than opened directly from disk, because browsers' ES module security restrictions prevent loading the openchem library from a file:// URL.
The MCP server for AI assistant integration is now available as a separate package: @openchem/mcp
```bash
# Install MCP server
npm install -g @openchem/mcp
# Start server
openchem-mcp
# Server runs on http://localhost:3000
```

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

```json
{
  "mcpServers": {
    "openchem": {
      "url": "http://localhost:3000/mcp"
    }
  }
}
```

Restart Claude Desktop and try: "Analyze aspirin using SMILES CC(=O)Oc1ccccc1C(=O)O"
- analyze — Complete molecular analysis (40+ descriptors, drug-likeness, IUPAC name, optional rendering)
- compare — Molecular similarity (Morgan fingerprints, Tanimoto similarity, property comparison)
- search — Substructure matching (SMARTS patterns with match counts and indices)
- render — 2D structure visualization (publication-quality SVG)
- convert — Format conversion (canonical SMILES, IUPAC names, Murcko scaffolds)
- @openchem/mcp Package — Full MCP server documentation
- MCP Integration Guide — Complete integration guide (Claude Desktop, custom clients, deployment)
- MCP Server Reference — API documentation, tool schemas, examples
```typescript
import {
  parseSMILES,
  generateSMILES,
  parseMolfile,
  generateMolfile,
  parseSDF,
  writeSDF,
  generateInChI,
  generateInChIKey,
} from 'openchem';

// Parse SMILES into molecule structure
const result = parseSMILES('CC(=O)O'); // acetic acid
console.log(result.molecules[0].atoms.length); // 4 atoms
console.log(result.molecules[0].bonds.length); // 3 bonds

// Generate canonical SMILES
const canonical = generateSMILES(result.molecules[0]);
console.log(canonical); // "CC(=O)O"

// Parse MOL file (V2000 is fixed-width, so column spacing matters)
const molContent = `acetic acid
  openchem

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500   -1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
M  END
`;
const molResult = parseMolfile(molContent);
console.log(generateSMILES(molResult.molecule!)); // "CC(=O)O"

// Generate MOL file from SMILES
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const molfile = generateMolfile(aspirin.molecules[0], { title: 'aspirin' });
console.log(molfile); // Full MOL file with coordinates

// Parse SDF file (each data item ends with a blank line; records end with $$$$)
const sdfContent = `Ethanol
  Mrv2311 02102409422D

  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
> <ID>
MOL001

> <NAME>
Ethanol

$$$$
`;
const sdfResult = parseSDF(sdfContent);
console.log(sdfResult.records[0].molecule?.atoms.length); // 3
console.log(sdfResult.records[0].properties.NAME); // "Ethanol"

// Generate InChI from molecule (top-level await requires an ES module)
const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

// Generate InChIKey
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
```
### Morgan Fingerprints and Similarity
```typescript
import { parseSMILES, computeMorganFingerprint, tanimotoSimilarity } from 'openchem';
// Generate fingerprints for similarity comparison
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O');
const fp1 = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
const fp2 = computeMorganFingerprint(ibuprofen.molecules[0], 2, 512);
// Calculate structural similarity
const similarity = tanimotoSimilarity(fp1, fp2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`); // ~45.2%
```
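Tanimoto similarity is the number of on-bits two fingerprints share divided by the number of bits set in either one. The sketch below shows that formula over plain 0/1 arrays; the `Bits` type and `tanimoto` function are hypothetical illustrations, not openchem's fingerprint representation or its `tanimotoSimilarity` implementation.

```typescript
// Hypothetical fingerprint representation: one 0/1 entry per bit position.
type Bits = ReadonlyArray<0 | 1>;

// Tanimoto (Jaccard) similarity: |A AND B| / |A OR B|
function tanimoto(a: Bits, b: Bits): number {
  if (a.length !== b.length) {
    throw new Error('Fingerprints must have the same length');
  }
  let both = 0; // bits set in both fingerprints
  let either = 0; // bits set in at least one fingerprint
  for (let i = 0; i < a.length; i++) {
    if (a[i] === 1 && b[i] === 1) both++;
    if (a[i] === 1 || b[i] === 1) either++;
  }
  // Edge case: if neither fingerprint has any bits set, this sketch returns 0
  return either === 0 ? 0 : both / either;
}

console.log(tanimoto([1, 1, 0, 1, 0], [1, 0, 0, 1, 1])); // 0.5
```

Production fingerprints are usually packed into machine words and compared with bitwise AND/OR plus a population count, but the ratio computed is the same.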
Extract core molecular scaffolds for drug discovery and compound classification:
```typescript
import {
  parseSMILES,
  getMurckoScaffold,
  getBemisMurckoFramework,
  generateSMILES,
  haveSameScaffold,
} from 'openchem';

// Extract scaffold (rings + linkers, remove side chains)
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core

// Get generic framework (all atoms → carbon, all bonds → single)
const framework = getBemisMurckoFramework(ibuprofen);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane

// Compare scaffolds of similar drugs
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
console.log(haveSameScaffold(ibuprofen, aspirin)); // true - both have a benzene scaffold
```

Applications:
- Compound library classification
- Lead series identification
- Scaffold hopping strategies
- Fragment-based drug design
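The core idea behind a Murcko scaffold can be shown on a bare graph: keep deleting atoms that have at most one remaining neighbor, and what survives is the ring systems plus the linkers between them. This is a simplified sketch on a hypothetical adjacency-list `Graph`, not openchem's `getMurckoScaffold`; in particular it also strips exocyclic double-bonded atoms (such as a ring C=O), which some scaffold definitions retain.

```typescript
// Hypothetical molecular graph: adjacency lists indexed by heavy-atom number.
type Graph = number[][];

// Repeatedly delete atoms with at most one remaining neighbor.
// The survivors are ring atoms plus the linker atoms that connect rings.
function murckoAtoms(graph: Graph): Set<number> {
  const alive = new Set(graph.map((_, i) => i));
  const degree = (atom: number) => graph[atom].filter(n => alive.has(n)).length;
  let changed = true;
  while (changed) {
    changed = false;
    for (const atom of [...alive]) {
      if (degree(atom) <= 1) {
        alive.delete(atom);
        changed = true;
      }
    }
  }
  return alive;
}

// Ethylbenzene (CCc1ccccc1): atoms 0-1 are the ethyl group, 2-7 the ring
const ethylbenzene: Graph = [[1], [0, 2], [1, 3, 7], [2, 4], [3, 5], [4, 6], [5, 7], [6, 2]];
console.log([...murckoAtoms(ethylbenzene)]); // [2, 3, 4, 5, 6, 7]
```

For diphenylmethane the CH2 keeps two ring neighbors and survives as a linker, while for an acyclic chain such as butane every atom is eventually removed, leaving an empty scaffold.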
Enumerate and score tautomers (keto-enol, imine-enamine, amide-imidol, etc.) with RDKit-compatible scoring:
```typescript
import { parseSMILES, enumerateTautomers, canonicalTautomer, generateSMILES } from 'openchem';

// Enumerate tautomers for acetylacetone (pentane-2,4-dione)
const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0];
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });
console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});

// Get canonical tautomer (highest scoring)
const canonical = canonicalTautomer(mol);
console.log(`Canonical: ${generateSMILES(canonical)}`);
```

Supported tautomer types (26 rules, 100% RDKit coverage):
- 1,3 and 1,5 keto-enol (carbonyl ↔ enol, conjugated systems)
- Imine-enamine (C=N ↔ C-NH, including aromatic special cases)
- 1,5/1,7/1,9/1,11 aromatic heteroatom H shift (pyrrole, indole, large heterocycles)
- Furanone (lactone tautomerism in 5-membered rings)
- Amide-imidol (N-C=O ↔ N=C-OH)
- Lactam-lactim (cyclic amide ↔ cyclic imidate)
- Nitro-aci-nitro, nitroso-oxime, oxim/nitroso via phenol
- Thione-thiol (C=S ↔ C-SH)
- Guanidine, tetrazole, imidazole (heterocycle tautomerism)
- Phosphonic acid, sulfoxide (P/S heteroatom shifts)
- Edge cases: keten/ynol, cyano/isocyanic acid, formamidinesulfinic acid, isocyanide
Scoring system (RDKit-compatible):
- +250 per all-carbon aromatic ring (benzene)
- +100 per heteroaromatic ring (pyridine)
- +25 for benzoquinone patterns
- +4 for oximes (C=N-OH)
- +2 for carbonyls (C=O, N=O, P=O)
- -10 per formal charge
- -4 for aci-nitro forms
- -1 per hydrogen on P, S, Se, Te
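Applying these weights amounts to a weighted sum over feature counts. The sketch below assumes the counts have already been found (by ring perception and substructure matching, which it does not attempt); `TautomerFeatures` and `tautomerScore` are hypothetical names, and only the terms listed above are included, so it is not a complete reimplementation of RDKit's or openchem's scorer.

```typescript
// Hypothetical per-tautomer feature counts (detecting them is not shown).
interface TautomerFeatures {
  carbocyclicAromaticRings: number; // all-carbon aromatic rings, e.g. benzene
  heteroaromaticRings: number; // aromatic rings containing a heteroatom, e.g. pyridine
  benzoquinones: number;
  oximes: number; // C=N-OH
  carbonyls: number; // C=O, N=O, P=O
  chargedAtoms: number; // assumption: one penalty per charged atom
  aciNitro: number;
  hOnPSSeTe: number; // hydrogens on P, S, Se, Te
}

// Weighted sum using the terms listed above
function tautomerScore(f: TautomerFeatures): number {
  return (
    250 * f.carbocyclicAromaticRings +
    100 * f.heteroaromaticRings +
    25 * f.benzoquinones +
    4 * f.oximes +
    2 * f.carbonyls -
    10 * f.chargedAtoms -
    4 * f.aciNitro -
    1 * f.hOnPSSeTe
  );
}

const none: TautomerFeatures = {
  carbocyclicAromaticRings: 0, heteroaromaticRings: 0, benzoquinones: 0, oximes: 0,
  carbonyls: 0, chargedAtoms: 0, aciNitro: 0, hOnPSSeTe: 0,
};

// Acetylacetone: diketo form (two C=O) vs. a mono-enol form (one C=O)
console.log(tautomerScore({ ...none, carbonyls: 2 })); // 4
console.log(tautomerScore({ ...none, carbonyls: 1 })); // 2
```

Under these terms alone the diketo form of acetylacetone outranks the mono-enol form, and any tautomer that gains an all-carbon aromatic ring (phenol over its keto form, for example) wins by a wide margin.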
Applications:
- Compound standardization for databases
- Virtual screening preparation
- pKa prediction support
- Tautomer-aware structure searching
```typescript
import { parseSMILES, renderSVG } from 'openchem';

// Render molecule as SVG
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const svgResult = renderSVG(caffeine.molecules[0], {
  width: 300,
  height: 200,
  showCarbonLabels: false,
  bondLength: 30,
});
console.log(svgResult.svg); // Complete SVG markup
console.log(`Canvas: ${svgResult.width}x${svgResult.height}`); // "300x200"
```

openchem has an extensive test suite (unit, integration, and RDKit comparison tests) that exercises parsing, generation, file round-trips, stereochemistry, aromatic perception, and molecular properties. Rather than rely on fragile hard-coded counts in the README, the project keeps comprehensive automated tests in the test/ folder and runs RDKit parity checks as part of the comparison test suite when RDKit is available.
Highlights:
- Broad unit and integration coverage across parsers, generators, utils, and validators
- RDKit comparison tests for canonical SMILES and round-trip fidelity (these run when RDKit is available in the test environment)
- Tests are designed to be self-contained and to skip RDKit-specific checks when RDKit isn't present in the environment
For maintainers: update and run the test suite with bun test. Use RUN_RDKIT_BULK=1 to enable the heavier RDKit bulk comparisons when you have RDKit available.
```bash
npm install openchem
bun add openchem
pnpm add openchem
```

For comprehensive working examples, see:

- docs/examples/comprehensive-example.ts - All major features (SMILES, properties, IUPAC, InChI, SVG, SMARTS, fingerprints)
- docs/examples/example-iupac.ts - IUPAC name generation and parsing (both directions)
- docs/examples/example-aromaticity.ts - Aromaticity perception using Hückel's rule
- docs/examples/example-drug-likeness.ts - Drug-likeness assessment (Lipinski, Veber, BBB)
- docs/examples/example-murcko-scaffolds.ts - Murcko scaffold extraction and analysis
- docs/examples/example-tautomers.ts - Tautomer enumeration and canonical selection
- docs/examples/example-sdf-export.ts - SDF file generation

Run any example:

```bash
bun run docs/examples/comprehensive-example.ts
```

The repository contains two long-running RDKit comparison tests (the 10k SMILES suite and the bulk 300-SMILES suite). These tests are skipped by default to keep regular test runs fast.

To run them, set the RUN_RDKIT_BULK environment variable:

```bash
# Run heavy RDKit comparisons (rdkit-10k and rdkit-bulk)
RUN_RDKIT_BULK=1 bun test
```

Add RUN_VERBOSE=1 for more detailed RDKit reporting during the run.
```typescript
import { parseSMILES } from 'openchem';

// Simple molecule
const ethanol = parseSMILES('CCO');
console.log(ethanol.molecules[0].atoms.length); // 3

// Check for errors
const result = parseSMILES('invalid');
if (result.errors.length > 0) {
  console.error('Parse errors:', result.errors);
}

// Complex molecule with stereochemistry
const lAlanine = parseSMILES('C[C@H](N)C(=O)O');
const chiralCenter = lAlanine.molecules[0].atoms.find(a => a.chiral);
console.log(chiralCenter?.chiral); // '@'
```

openchem provides comprehensive molecular property calculations for drug discovery and cheminformatics applications.
```typescript
import {
  parseSMILES,
  getMolecularFormula,
  getMolecularMass,
  getExactMass
} from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const mol = aspirin.molecules[0];

// Get molecular formula (Hill notation)
const formula = getMolecularFormula(mol);
console.log(formula); // "C9H8O4"

// Get molecular mass (average atomic masses)
const mass = getMolecularMass(mol);
console.log(mass); // ~180.16

// Get exact mass (monoisotopic: most abundant isotope of each element)
const exactMass = getExactMass(mol);
console.log(exactMass); // ~180.042
```

```typescript
import {
  parseSMILES,
  getHeavyAtomCount,
  getHeteroAtomCount,
  getRingCount,
  getAromaticRingCount,
  getRingInfo
} from 'openchem';

const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O');
const mol = ibuprofen.molecules[0];

// Count heavy atoms (non-hydrogen): 13 C + 2 O
console.log(getHeavyAtomCount(mol)); // 15
// Count heteroatoms (N, O, S, P, halogens, etc.)
console.log(getHeteroAtomCount(mol)); // 2
// Count total rings
console.log(getRingCount(mol)); // 1
// Count aromatic rings
console.log(getAromaticRingCount(mol)); // 1

// Get comprehensive ring information
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 1
console.log(ringInfo.rings()); // the six benzene atoms, e.g. [[4, 5, 6, 7, 8, 9]] with 0-based IDs
```

```typescript
import {
  parseSMILES,
  getFractionCSP3,
  getHBondDonorCount,
  getHBondAcceptorCount,
  getTPSA
} from 'openchem';

const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const mol = caffeine.molecules[0];

// Fraction of sp3 carbons (structural complexity): 3 methyl carbons out of 8 carbons
console.log(getFractionCSP3(mol)); // 0.375
// H-bond donors (N-H, O-H)
console.log(getHBondDonorCount(mol)); // 0
// H-bond acceptors (N, O atoms)
console.log(getHBondAcceptorCount(mol)); // 6
// Topological polar surface area (Å²)
// Critical for predicting oral bioavailability and BBB penetration
console.log(getTPSA(mol)); // 61.82
```

TPSA (Topological Polar Surface Area) is essential for predicting drug properties:
```typescript
import { parseSMILES, getTPSA } from 'openchem';

// Oral bioavailability: TPSA < 140 Å²
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
console.log(getTPSA(aspirin.molecules[0])); // 63.60 ✓ Good oral availability

// Blood-brain barrier penetration: TPSA < 90 Å²
const morphine = parseSMILES('CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O');
console.log(getTPSA(morphine.molecules[0])); // 52.93 ✓ CNS-active
```

```typescript
import {
  parseSMILES,
  checkLipinskiRuleOfFive,
  checkVeberRules,
  checkBBBPenetration
} from 'openchem';

// Lipinski's Rule of Five (oral drug-likeness)
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const lipinski = checkLipinskiRuleOfFive(aspirin.molecules[0]);
console.log(lipinski.passes); // true
console.log(lipinski.properties);
// { molecularWeight: 180.04, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 }

// Veber Rules (oral bioavailability)
const veber = checkVeberRules(aspirin.molecules[0]);
console.log(veber.passes); // true
console.log(veber.properties);
// { rotatableBonds: 3, tpsa: 63.60 }

// Blood-brain barrier penetration prediction
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const bbb = checkBBBPenetration(caffeine.molecules[0]);
console.log(bbb.likelyPenetration); // true (TPSA: 61.82 < 90)
```

```typescript
import { parseSMILES, generateSMILES } from 'openchem';

// Generate canonical SMILES (default)
const input = 'CC(C)CC';
const parsed = parseSMILES(input);
const canonical = generateSMILES(parsed.molecules[0]);
console.log(canonical); // "CCC(C)C" - canonicalized

// Stereo normalization matches RDKit
const trans1 = parseSMILES('C\\C=C\\C'); // trans (down markers)
console.log(generateSMILES(trans1.molecules[0])); // "C/C=C/C" - normalized to up markers
const trans2 = parseSMILES('C/C=C/C'); // trans (up markers)
console.log(generateSMILES(trans2.molecules[0])); // "C/C=C/C" - already normalized

// Generate simple (non-canonical) SMILES
const simple = generateSMILES(parsed.molecules[0], false);
console.log(simple); // "CC(C)CC" - preserves input order

// Explicit canonical generation
const explicitCanonical = generateSMILES(parsed.molecules[0], true);
console.log(explicitCanonical); // "CCC(C)C"

// Handle multiple disconnected molecules
const mixture = parseSMILES('CCO.O'); // ethanol + water
const output = generateSMILES(mixture.molecules);
console.log(output); // "CCO.O"
```

Render molecules as 2D SVG structures with automatic coordinate generation. openchem provides deterministic layouts, fast performance, and excellent handling of rings, branches, and terminal atoms.
```typescript
import { parseSMILES, renderSVG } from 'openchem';

// Render from parsed molecule
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const result = renderSVG(aspirin.molecules[0]);
console.log(result.svg); // SVG string ready for display
console.log(result.width); // Canvas width
console.log(result.height); // Canvas height

// Or render directly from a SMILES string
const renderResult = renderSVG('CCO');
if (renderResult.errors.length === 0) {
  console.log(renderResult.svg);
}

// Render multiple molecules in a grid
const molecules = [
  parseSMILES('CC(=O)O').molecules[0],
  parseSMILES('CCO').molecules[0],
  parseSMILES('CC(C)C').molecules[0]
];
const gridResult = renderSVG(molecules);
console.log(gridResult.svg); // Multi-molecule grid
```

```typescript
import { parseSMILES, renderSVG } from 'openchem';
import type { SVGRendererOptions } from 'openchem';

const benzene = parseSMILES('c1ccccc1');
const mol = benzene.molecules[0];

const options: SVGRendererOptions = {
  // Canvas sizing
  width: 400,
  height: 400,
  padding: 20,
  // Bond styling
  bondLineWidth: 2,
  bondLength: 40,
  bondColor: '#000000',
  // Atom & text styling
  fontSize: 14,
  fontFamily: 'Arial, sans-serif',
  showCarbonLabels: false, // Hide C labels for cleaner appearance
  showImplicitHydrogens: false, // Hide implicit hydrogens
  // Color mapping by element
  atomColors: {
    C: '#222222',
    N: '#3050F8',
    O: '#FF0D0D',
    S: '#E6C200',
    F: '#50FF50',
    Cl: '#1FF01F',
    Br: '#A62929',
    I: '#940094'
  },
  // Background
  backgroundColor: '#FFFFFF',
  // Stereochemistry display
  showStereoBonds: true,
  // Layout & coordinate generation
  kekulize: true, // Convert aromatic to alternating single/double bonds (default: true)
  moleculeSpacing: 60 // Spacing between molecules in grid layouts
};

const result = renderSVG(mol, options);
console.log(result.svg); // Custom-styled SVG
```

```typescript
import { parseSMILES, renderSVG } from 'openchem';

const ethanol = parseSMILES('CCO');
const mol = ethanol.molecules[0];

// Provide your own atom coordinates (useful for custom layouts)
const customCoords = [
  { x: 0, y: 0 }, // C
  { x: 40, y: 0 }, // C
  { x: 80, y: 0 } // O
];

const result = renderSVG(mol, {
  atomCoordinates: customCoords,
  width: 200,
  height: 100
});
console.log(result.svg);
```

openchem's coordinate generator provides:
- Deterministic layouts — Same molecule always produces same coordinates
- Fast performance — Optimized for speed and quality
- Perfect terminal atom placement — OH, NH₂, and other terminal groups extend radially
- Ring system detection — Automatically detects and regularizes 5/6-membered rings, fused rings, spiro, and bridged systems
- Zero atom overlaps — Intelligent substituent placement prevents collisions
- Publication-quality output — Clean, chemically accurate 2D structures
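Regularizing an isolated ring usually means placing its atoms on a regular polygon whose sides equal the target bond length; the circumradius follows from R = s / (2 sin(π/n)). The function below is a generic geometric sketch with a hypothetical name (`regularRingCoords`), not openchem's coordinate generator, and it ignores fused, spiro, and bridged systems.

```typescript
interface Point {
  x: number;
  y: number;
}

// Place an n-membered ring on a regular polygon with edge length `bondLength`.
function regularRingCoords(n: number, bondLength: number, center: Point = { x: 0, y: 0 }): Point[] {
  if (n < 3) throw new Error('A ring needs at least 3 atoms');
  // Circumradius of a regular n-gon with side length s: R = s / (2 * sin(pi / n))
  const radius = bondLength / (2 * Math.sin(Math.PI / n));
  const coords: Point[] = [];
  for (let k = 0; k < n; k++) {
    const angle = (2 * Math.PI * k) / n; // evenly spaced around the center
    coords.push({
      x: center.x + radius * Math.cos(angle),
      y: center.y + radius * Math.sin(angle),
    });
  }
  return coords;
}

// A six-membered ring with 40-unit bonds; for n = 6 the radius equals the bond length
const hexagon = regularRingCoords(6, 40);
console.log(hexagon.length); // 6
```

Because consecutive vertices are 2π/n apart on the circle, each edge has length 2R sin(π/n), which is exactly the requested bond length.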
```typescript
import { parseSMILES, renderSVG } from 'openchem';

// Complex fused ring system
const naphthalene = parseSMILES('c1ccc2ccccc2c1');
const result = renderSVG(naphthalene.molecules[0], {
  width: 300,
  height: 300,
  bondLength: 35
});
console.log(result.svg);
```

```typescript
import { renderSVG } from 'openchem';

const result = renderSVG('C');
if (result.errors.length > 0) {
  console.error('SVG rendering errors:', result.errors);
} else {
  console.log(result.svg);
}
```

Match molecular patterns using SMARTS (SMILES Arbitrary Target Specification) notation.
```typescript
import { parseSMILES, parseSMARTS, matchSMARTS } from 'openchem';

// Parse molecule and SMARTS pattern
const molecule = parseSMILES('CC(=O)Oc1ccccc1C(=O)O'); // aspirin
const pattern = parseSMARTS('[O;D1]'); // Oxygen with exactly one explicit connection (C=O or O-H)

// Find matching atoms
const matches = matchSMARTS(molecule.molecules[0], pattern);
console.log(matches.length); // 3 (two carbonyl oxygens and the hydroxyl oxygen)
console.log(matches); // e.g. [[2], [11], [12]] (0-based atom indices)

// Example: Find aromatic rings
const aromaticRing = parseSMARTS('c1ccccc1'); // benzene pattern
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const ringMatches = matchSMARTS(aspirin.molecules[0], aromaticRing);
console.log(ringMatches.length); // 1 (one benzene ring)

// Example: Find carboxylic acid groups
const carboxylPattern = parseSMARTS('[C](=O)[O;H1]'); // COOH
const matches2 = matchSMARTS(aspirin.molecules[0], carboxylPattern);
console.log(matches2.length); // 1 (one carboxylic acid)

// Example: Find all heteroatoms
const heteroPattern = parseSMARTS('[!#6;!#1]'); // Any atom that is neither carbon nor hydrogen
const heteroMatches = matchSMARTS(aspirin.molecules[0], heteroPattern);
console.log(heteroMatches.length); // 4 (the four oxygens in aspirin)
```

Convert aromatic molecules to alternating single/double bond representations (Kekulé structures).
```typescript
import { parseSMILES, kekulize, generateSMILES, renderSVG } from 'openchem';

// Parse aromatic molecule
const benzene = parseSMILES('c1ccccc1');
const mol = benzene.molecules[0];

// Convert to Kekulé structure
const kekuleMol = kekulize(mol);

// Generate SMILES from Kekulé form
const kekuleSMILES = generateSMILES(kekuleMol);
console.log(kekuleSMILES); // "C1=CC=CC=C1" or similar alternating structure

// SVG rendering automatically kekulizes (unless disabled)
const result = renderSVG(mol, {
  kekulize: true // default: true
});
// Rendered SVG shows alternating single/double bonds
```

Calculate LogP (the octanol-water partition coefficient) for predicting lipophilicity and membrane permeability.
```typescript
import { parseSMILES, computeLogP, crippenLogP } from 'openchem';

const molecules = [
  'CC(=O)Oc1ccccc1C(=O)O', // aspirin
  'CC(C)Cc1ccc(cc1)C(C)C(=O)O', // ibuprofen
  'CC(=O)Nc1ccc(O)cc1' // acetaminophen
];

molecules.forEach(smiles => {
  const mol = parseSMILES(smiles).molecules[0];
  // Wildman-Crippen atom-contribution method
  const logP = computeLogP(mol);
  console.log(`${smiles.substring(0, 10)}... LogP: ${logP.toFixed(2)}`);
  // crippenLogP is an alias for computeLogP, so it prints the same value
  const logP2 = crippenLogP(mol);
  console.log(`  Crippen LogP: ${logP2.toFixed(2)}`);
});

// LogP guidelines for drug design
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const caffeineMol = caffeine.molecules[0];
const logpValue = computeLogP(caffeineMol);
console.log(`Caffeine LogP: ${logpValue.toFixed(2)}`);
if (logpValue > 5) {
  console.log('⚠️ High LogP - may have poor water solubility');
} else if (logpValue < 0) {
  console.log('⚠️ Negative LogP - hydrophilic, may have poor membrane permeability');
} else {
  console.log('✓ LogP between 0 and 5 - balanced lipophilicity and hydrophilicity');
}
```

```typescript
import { parseSMILES, BondType } from 'openchem';

const result = parseSMILES('C=C');
const mol = result.molecules[0];

// Access atoms
mol.atoms.forEach(atom => {
  console.log(`${atom.symbol} (id: ${atom.id})`);
  console.log(`  Aromatic: ${atom.aromatic}`);
  console.log(`  Charge: ${atom.charge}`);
  console.log(`  Hydrogens: ${atom.hydrogens}`);
});

// Access bonds
mol.bonds.forEach(bond => {
  console.log(`Bond ${bond.atom1}-${bond.atom2}`);
  console.log(`  Type: ${bond.type === BondType.DOUBLE ? 'DOUBLE' : 'SINGLE'}`);
});
```

```bash
# Run all tests (RDKit comparisons run when RDKit is available; the heavy bulk suites also need RUN_RDKIT_BULK=1)
bun test
# Run with Node.js
npm test
# Run specific test file
bun test test/parser.test.ts
```

Note: RDKit comparison tests require the @rdkit/rdkit package. Tests automatically skip RDKit validations if the package is unavailable. For full validation, run the tests with Node.js (RDKit's WebAssembly build may not work in all Bun versions).
openchem provides 40 functions organized into 8 categories:
Parsing & Generation (8)
- `parseSMILES` - Parse SMILES strings
- `generateSMILES` - Generate canonical/non-canonical SMILES
- `parseMolfile` - Parse MOL files (V2000/V3000)
- `generateMolfile` - Generate MOL files (V2000)
- `parseSDF` - Parse SDF files with properties
- `writeSDF` - Write SDF files with properties
- `generateInChI` - Generate InChI strings from molecules
- `generateInChIKey` - Generate InChIKey strings from molecules
Pattern Matching & Rendering (6)
- `renderSVG` - Render molecules as 2D SVG structures
- `parseSMARTS` - Parse SMARTS pattern strings
- `matchSMARTS` - Find SMARTS pattern matches in molecules
- `kekulize` - Convert aromatic to Kekulé structures
- `computeMorganFingerprint` - Generate Morgan fingerprints from molecules
- `tanimotoSimilarity` - Calculate Tanimoto similarity between fingerprints
Scaffold Analysis (5)
- `getMurckoScaffold` - Extract Murcko scaffold (rings + linkers)
- `getBemisMurckoFramework` - Generic scaffold (all C, single bonds)
- `getScaffoldTree` - Hierarchical scaffold decomposition
- `getGraphFramework` - Pure topology (all atoms → wildcard)
- `haveSameScaffold` - Compare two molecules' scaffolds
Tautomer Analysis (2)
- `enumerateTautomers` - Generate all tautomers with RDKit-compatible scoring
- `canonicalTautomer` - Select highest-scoring canonical tautomer
Basic Properties (3)
- `getMolecularFormula` - Hill notation formula
- `getMolecularMass` - Average molecular mass
- `getExactMass` - Exact mass (monoisotopic)
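Hill notation puts carbon first and hydrogen second when carbon is present, then every other element alphabetically; without carbon, all elements (hydrogen included) are alphabetical. A minimal sketch from element counts, where `hillFormula` is a hypothetical helper rather than openchem's `getMolecularFormula`, and charges and isotopes are ignored:

```typescript
// Build a Hill-notation formula from element counts.
function hillFormula(counts: Record<string, number>): string {
  const present = Object.entries(counts).filter(([, n]) => n > 0);
  const hasCarbon = present.some(([el]) => el === 'C');
  // Sort key: C, then H (only when carbon is present), then alphabetical
  const key = (el: string): string =>
    hasCarbon && el === 'C' ? '0' : hasCarbon && el === 'H' ? '1' : `2${el}`;
  return present
    .sort(([a], [b]) => (key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0))
    // Omit the count when it is 1 (e.g. "O", not "O1")
    .map(([el, n]) => (n === 1 ? el : `${el}${n}`))
    .join('');
}

console.log(hillFormula({ O: 4, C: 9, H: 8 })); // "C9H8O4" (aspirin)
console.log(hillFormula({ O: 1, H: 2 })); // "H2O"
```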
Lipophilicity (3)
- `computeLogP` - Wildman-Crippen partition coefficient
- `crippenLogP` - Alias for computeLogP
- `logP` - Alternative LogP calculation
Structural Properties (8)
- `getHeavyAtomCount` - Non-hydrogen atom count
- `getHeteroAtomCount` - Heteroatom count (N, O, S, etc.)
- `getRingCount` - Total ring count
- `getAromaticRingCount` - Aromatic ring count
- `getRingInfo` - Comprehensive ring information object
- `getFractionCSP3` - sp³ carbon fraction
- `getHBondDonorCount` - H-bond donor count
- `getHBondAcceptorCount` - H-bond acceptor count
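Fraction Csp3 is the number of sp3-hybridized carbons divided by the total number of carbons. The sketch below assumes hybridization has already been assigned; `SimpleAtom` and `fractionCSP3` are hypothetical, not openchem's `getFractionCSP3`. Caffeine (C8H10N4O2) has three sp3 methyl carbons among its eight carbons, giving 3/8 = 0.375.

```typescript
// Hypothetical minimal atom record with hybridization already assigned.
interface SimpleAtom {
  symbol: string;
  hybridization: 'sp' | 'sp2' | 'sp3';
}

// Fraction Csp3 = (number of sp3 carbons) / (number of carbons)
function fractionCSP3(atoms: SimpleAtom[]): number {
  const carbons = atoms.filter(a => a.symbol === 'C');
  if (carbons.length === 0) return 0; // this sketch returns 0 when there are no carbons
  return carbons.filter(a => a.hybridization === 'sp3').length / carbons.length;
}

// Caffeine heavy atoms: 3 methyl C (sp3), 5 ring C (sp2), 4 N, 2 O
// (the N and O entries do not affect the result)
const caffeineAtoms: SimpleAtom[] = [
  ...Array.from({ length: 3 }, (): SimpleAtom => ({ symbol: 'C', hybridization: 'sp3' })),
  ...Array.from({ length: 5 }, (): SimpleAtom => ({ symbol: 'C', hybridization: 'sp2' })),
  ...Array.from({ length: 4 }, (): SimpleAtom => ({ symbol: 'N', hybridization: 'sp2' })),
  ...Array.from({ length: 2 }, (): SimpleAtom => ({ symbol: 'O', hybridization: 'sp2' })),
];
console.log(fractionCSP3(caffeineAtoms)); // 0.375
```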
Drug-Likeness (5)
- `getTPSA` - Topological polar surface area
- `getRotatableBondCount` - Rotatable bond count
- `checkLipinskiRuleOfFive` - Lipinski's Rule of Five
- `checkVeberRules` - Veber rules for bioavailability
- `checkBBBPenetration` - Blood-brain barrier prediction
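The thresholds behind the Lipinski and Veber checks are short enough to state as code: the Rule of Five flags molecular weight above 500, more than 5 H-bond donors, more than 10 H-bond acceptors, or logP above 5, while Veber's criteria ask for at most 10 rotatable bonds and a polar surface area of at most 140 Å². This sketch takes precomputed values through a hypothetical `DrugProps` interface; how openchem turns violations into `passes` (a common convention tolerates one Lipinski violation) is not specified here.

```typescript
// Hypothetical bag of precomputed descriptor values.
interface DrugProps {
  molecularWeight: number;
  hbondDonors: number;
  hbondAcceptors: number;
  logP: number;
  rotatableBonds: number;
  tpsa: number; // Å²
}

// Lipinski's Rule of Five: list which of the four limits are exceeded
function lipinskiViolations(p: DrugProps): string[] {
  const violations: string[] = [];
  if (p.molecularWeight > 500) violations.push('molecularWeight > 500');
  if (p.hbondDonors > 5) violations.push('hbondDonors > 5');
  if (p.hbondAcceptors > 10) violations.push('hbondAcceptors > 10');
  if (p.logP > 5) violations.push('logP > 5');
  return violations;
}

// Veber: at most 10 rotatable bonds and TPSA of at most 140 Å²
function passesVeber(p: DrugProps): boolean {
  return p.rotatableBonds <= 10 && p.tpsa <= 140;
}

// Aspirin, using the values shown in the drug-likeness example earlier in this README
const aspirinProps: DrugProps = {
  molecularWeight: 180.04, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31, rotatableBonds: 3, tpsa: 63.6,
};
console.log(lipinskiViolations(aspirinProps)); // []
console.log(passesVeber(aspirinProps)); // true
```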
`parseSMILES` parses a SMILES string into molecule structures.
Returns: ParseResult containing:
- `molecules: Molecule[]` - Array of parsed molecules
- `errors: string[]` - Parse/validation errors (empty if successful)
`generateSMILES` generates SMILES from molecule structure(s).
Parameters:
- `input` - Single molecule or array of molecules
- `canonical` - Generate canonical SMILES (default: `true`)
Returns: SMILES string (disconnected molecules are separated by `.`)
Canonical SMILES features:
- RDKit-compatible atom ordering using modified Morgan algorithm
- Automatic E/Z double bond stereo normalization
- Deterministic output for identical molecules
- Preserves tetrahedral and double bond stereochemistry
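The iterative idea behind Morgan-style canonical ordering can be sketched as rank refinement: start from per-atom invariants, then repeatedly re-rank each atom by its own rank plus the sorted ranks of its neighbors until the number of distinct ranks stops growing. This is a generic illustration with a hypothetical `refineRanks` function, not openchem's canonicalizer; it uses only the element symbol as the initial invariant, and atoms that remain tied (because they are symmetry-equivalent) would still need tie-breaking before a SMILES string can be written.

```typescript
// Refine atom ranks from initial invariants (here: element symbols).
// adjacency[i] lists the neighbor indices of atom i.
function refineRanks(adjacency: number[][], initial: string[]): number[] {
  // Map arbitrary string keys to dense ranks 0..k-1 in sorted key order
  const toRanks = (keys: string[]): number[] => {
    const distinct = [...new Set(keys)].sort();
    return keys.map(k => distinct.indexOf(k));
  };
  // Zero-pad so that string order agrees with numeric order
  const pad = (n: number): string => n.toString().padStart(6, '0');

  let ranks = toRanks(initial);
  for (;;) {
    // New key: current rank first (so existing order is kept), then sorted neighbor ranks
    const keys = ranks.map((rank, i) => {
      const neighborRanks = adjacency[i].map(j => ranks[j]).sort((a, b) => a - b);
      return `${pad(rank)}|${neighborRanks.map(pad).join(',')}`;
    });
    const next = toRanks(keys);
    const stable = new Set(next).size === new Set(ranks).size;
    ranks = next;
    if (stable) return ranks;
  }
}

// Ethanol (C-C-O) and propane (C-C-C), hydrogens omitted
console.log(refineRanks([[1], [0, 2], [1]], ['C', 'C', 'O'])); // [0, 1, 2]
console.log(refineRanks([[1], [0, 2], [1]], ['C', 'C', 'C'])); // [0, 1, 0]
```

In propane the two terminal carbons end with the same rank, which is exactly the symmetry a canonicalizer must then break deterministically.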
`generateMolfile` generates a MOL file (V2000 format) from a molecule structure. Its output follows RDKit's layout for compatibility with other cheminformatics tools.
Parameters:
- `molecule` - Molecule structure to convert
- `options` - Optional configuration:
  - `title?: string` - Molecule title (default: empty)
  - `programName?: string` - Program name in header (default: "openchem")
  - `dimensionality?: '2D' | '3D'` - Coordinate system (default: "2D")
  - `comment?: string` - Comment line (default: empty)
Returns: MOL file content as string with V2000 format
Features:
- V2000 MOL format compatible with RDKit and other tools
- 2D coordinate generation using circular layout
- Proper atom/bond type mapping (aromatic, charged, isotopic)
- Stereochemistry support (chiral centers, E/Z double bonds)
- Fixed-width formatting matching RDKit output
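V2000 blocks are fixed-width: coordinates occupy 10-character columns and most integer fields 3 characters, which is why column spacing in MOL text matters. The helpers below are a hypothetical sketch of writing the counts, atom, and bond lines with every optional field set to zero; they are not openchem's `generateMolfile`, and they omit charges, isotopes, and stereo flags.

```typescript
// Right-align an integer in a 3-character V2000 field
const int3 = (n: number): string => String(n).padStart(3);

// Counts line: atom count, bond count, eight zeroed fields, then "999 V2000"
function formatCountsLine(atomCount: number, bondCount: number): string {
  return `${int3(atomCount)}${int3(bondCount)}${'  0'.repeat(8)}999 V2000`;
}

// Atom line: x, y, z as 10-character fixed-point (4 decimals), a 3-character
// left-aligned symbol, then 12 zeroed fields (one 2-wide, eleven 3-wide)
function formatAtomLine(x: number, y: number, z: number, symbol: string): string {
  const coord = (v: number): string => v.toFixed(4).padStart(10);
  return `${coord(x)}${coord(y)}${coord(z)} ${symbol.padEnd(3)} 0${'  0'.repeat(11)}`;
}

// Bond line: first atom, second atom (1-based), bond order, then 4 zeroed fields
function formatBondLine(from: number, to: number, order: number): string {
  return `${int3(from)}${int3(to)}${int3(order)}${'  0'.repeat(4)}`;
}

const block = [
  '', // title line
  '  sketch', // program line
  '', // comment line
  formatCountsLine(2, 1),
  formatAtomLine(0, 0, 0, 'C'),
  formatAtomLine(1.5, 0, 0, 'O'),
  formatBondLine(1, 2, 1),
  'M  END',
].join('\n');
console.log(block);
```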
Example:
```typescript
import { parseSMILES, generateMolfile } from 'openchem';

const result = parseSMILES('CCO');
const molfile = generateMolfile(result.molecules[0]);
console.log(molfile);
// Output: MOL file with header, atom coordinates, bond connectivity, etc.
```

`parseMolfile` parses a MOL file (MDL Molfile format) into a molecule structure. It supports both V2000 and V3000 formats with comprehensive validation.
Parameters:
- `input` - MOL file content as a string
Returns: MolfileParseResult containing:
- `molfile: MolfileData | null` - Raw MOL file data structure (or null on critical errors)
- `molecule: Molecule | null` - Parsed molecule with enriched properties (or null on errors)
- `errors: ParseError[]` - Array of parse/validation errors (empty if successful)
Supported formats:
- V2000: Classic fixed-width format (most common)
- V3000: Extended format with additional features
Validation features:
- Validates atom/bond counts match declared values
- Checks bond references point to valid atoms
- Validates numeric fields (coordinates, counts, bond types)
- Detects malformed data (NaN, negative counts, invalid types)
- Returns errors without throwing exceptions
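Two of these checks, reading the declared counts as validated numbers and confirming that the block is long enough to hold them, can be sketched as follows. `parseCountsLine` and `checkBlockLength` are hypothetical helpers, not openchem's `parseMolfile` internals; they follow the same pattern of collecting errors instead of throwing.

```typescript
interface CountsLine {
  atoms: number;
  bonds: number;
  version: 'V2000' | 'V3000' | null;
  errors: string[];
}

// Parse the fixed-width counts line (line 4 of a MOL block) without throwing.
// In V3000 files these counts are usually 0; the real counts are in "M  V30 COUNTS".
function parseCountsLine(line: string): CountsLine {
  const errors: string[] = [];
  const atoms = Number.parseInt(line.slice(0, 3), 10);
  const bonds = Number.parseInt(line.slice(3, 6), 10);
  if (!Number.isInteger(atoms) || atoms < 0) errors.push(`Invalid atom count: "${line.slice(0, 3)}"`);
  if (!Number.isInteger(bonds) || bonds < 0) errors.push(`Invalid bond count: "${line.slice(3, 6)}"`);
  const version = line.includes('V3000') ? 'V3000' : line.includes('V2000') ? 'V2000' : null;
  if (version === null) errors.push('Missing V2000/V3000 version tag');
  return { atoms, bonds, version, errors };
}

// Check that enough atom and bond lines follow the counts line
function checkBlockLength(lines: string[], counts: CountsLine, countsLineIndex = 3): string[] {
  const needed = countsLineIndex + 1 + counts.atoms + counts.bonds;
  return lines.length >= needed ? [] : [`Expected at least ${needed} lines, found ${lines.length}`];
}

console.log(parseCountsLine('  4  3  0  0  0  0  0  0  0  0999 V2000'));
console.log(parseCountsLine('garbage').errors.length); // 3
```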
Parsed features:
- Atom coordinates (2D/3D)
- Element symbols (organic and periodic table)
- Charges (both atom block and M CHG property)
- Isotopes (both mass diff and M ISO property)
- Bond types (single, double, triple, aromatic)
- Stereochemistry (bond wedges, chiral centers)
- Atom mapping (reaction mapping)
Limitations:
- SGroups are parsed but not converted to molecule structure
- Query atoms/bonds not supported
Example:
```typescript
import { parseMolfile, generateSMILES } from 'openchem';

const molContent = `ethanol
  openchem

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
`;

const result = parseMolfile(molContent);
if (result.errors.length === 0) {
  console.log(result.molecule?.atoms.length); // 3
  console.log(result.molecule?.bonds.length); // 2
  // Convert to SMILES
  const smiles = generateSMILES(result.molecule!);
  console.log(smiles); // "CCO"
}

// Error handling
const invalid = parseMolfile('invalid content');
if (invalid.errors.length > 0) {
  console.error('Parse errors:', invalid.errors);
}
```

Round-trip workflow:

```typescript
import { parseSMILES, generateMolfile, parseMolfile, generateSMILES } from 'openchem';

// SMILES → MOL → SMILES round-trip
const original = 'CC(=O)O'; // acetic acid
const mol = parseSMILES(original).molecules[0];
const molfile = generateMolfile(mol);
const parsed = parseMolfile(molfile);
const roundtrip = generateSMILES(parsed.molecule!);
console.log(roundtrip); // "CC(=O)O"
```

`parseSDF` parses an SDF (Structure-Data File) into molecule structures with associated properties. SDF files can contain multiple molecules, each with a MOL block and optional property fields.
Parameters:
input— SDF file content as a string
Returns: SDFParseResult containing:
- records: SDFRecord[] - Array of parsed records
- errors: ParseError[] - Global parse errors (empty if successful)
Record structure (SDFRecord):
- molecule: Molecule | null - Parsed molecule (null on parse errors)
- molfile: MolfileData | null - Raw MOL file data (null on parse errors)
- properties: Record<string, string> - Property name-value pairs
- errors: ParseError[] - Record-specific errors (empty if successful)
Features:
- Multi-record parsing (splits on the $$$$ delimiter)
- Property block parsing (> <NAME> format)
- Multi-line property values with blank line handling
- Empty property names and values
- Windows (CRLF) and Unix (LF) line endings
- Tolerant parsing: continues after invalid records
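To make the record and data-item layout concrete, here is a minimal sketch of the splitting step, assuming a hypothetical `splitSdf` function (not openchem's parser). It normalizes CRLF, splits on `$$$$` lines, and treats a blank line as the end of a property value:

```typescript
// Minimal sketch (not openchem's parser): split SDF text into records and
// collect "> <NAME>" data items whose values may span several lines.
interface RawSdfRecord {
  molBlock: string;
  properties: Record<string, string>;
}

function splitSdf(sdf: string): RawSdfRecord[] {
  const records: RawSdfRecord[] = [];
  // Normalize CRLF to LF, then split on "$$$$" delimiter lines (consuming the newline).
  const chunks = sdf.replace(/\r\n/g, "\n").split(/^\$\$\$\$[ \t]*(?:\n|$)/m);
  for (const chunk of chunks) {
    if (chunk.trim() === "") continue; // nothing after the final delimiter
    const lines = chunk.split("\n");
    // The MOL block runs up to and including "M  END"; data items follow it.
    const endIdx = lines.findIndex((l) => l.startsWith("M  END"));
    const molEnd = endIdx === -1 ? lines.length : endIdx + 1;
    const properties: Record<string, string> = {};
    let name: string | null = null;
    let value: string[] = [];
    for (const line of lines.slice(molEnd)) {
      const header = /^>.*?<([^>]*)>/.exec(line);
      if (header) {
        if (name !== null) properties[name] = value.join("\n"); // value with no blank line
        name = header[1] ?? ""; // empty names are allowed
        value = [];
      } else if (line.trim() === "") {
        // A blank line ends the current value.
        if (name !== null) properties[name] = value.join("\n");
        name = null;
      } else if (name !== null) {
        value.push(line);
      }
    }
    if (name !== null) properties[name] = value.join("\n");
    records.push({ molBlock: lines.slice(0, molEnd).join("\n"), properties });
  }
  return records;
}
```

Note that the first line of each MOL block is the title and may legitimately be blank, so the sketch never trims leading lines.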
Example (single record):
import { parseSDF, generateSMILES } from 'openchem';
const sdfContent = `
  Mrv2311 02102409422D

  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
> <ID>
MOL001

> <NAME>
Ethanol

> <FORMULA>
C2H6O

$$$$
`;
const result = parseSDF(sdfContent);
if (result.errors.length === 0) {
const record = result.records[0];
console.log(record.molecule?.atoms.length); // 3
console.log(record.properties.ID); // "MOL001"
console.log(record.properties.NAME); // "Ethanol"
console.log(record.properties.FORMULA); // "C2H6O"
// Convert to SMILES
const smiles = generateSMILES(record.molecule!);
console.log(smiles); // "CCO"
}
// Error handling
if (result.records[0].errors.length > 0) {
console.error('Record errors:', result.records[0].errors);
}
Example (multiple records):
import { parseSDF } from 'openchem';
const multiRecordSDF = `
  Mrv2311 02102409422D

  1  0  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ID>
1

> <NAME>
Methane

$$$$

  Mrv2311 02102409422D

  2  1  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
> <ID>
2

> <NAME>
Ethane

$$$$
`;
const result = parseSDF(multiRecordSDF);
console.log(result.records.length); // 2
console.log(result.records[0].properties.NAME); // "Methane"
console.log(result.records[1].properties.NAME); // "Ethane"
Round-trip workflow:
import { parseSMILES, writeSDF, parseSDF, generateSMILES } from 'openchem';
// SMILES → SDF → SMILES round-trip
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const sdfResult = writeSDF({
molecule: aspirin,
properties: { NAME: 'aspirin', FORMULA: 'C9H8O4' }
});
const parsed = parseSDF(sdfResult.sdf);
const roundtrip = generateSMILES(parsed.records[0].molecule!);
console.log(roundtrip); // "CC(=O)Oc1ccccc1C(=O)O"
console.log(parsed.records[0].properties.NAME); // "aspirin"
Generates an InChI (International Chemical Identifier) string from a molecule structure. InChI provides a unique, canonical representation of chemical structures that can be used for database lookups and structure comparison.
Returns: Promise resolving to InChI string
Example:
import { parseSMILES, generateInChI } from 'openchem';
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
Generates an InChIKey (a hashed, fixed-length version of InChI) from an InChI string. InChIKeys are commonly used for database indexing and fast lookups.
Parameters:
inchi— InChI string to convert
Returns: Promise resolving to InChIKey string (27 characters)
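The 27 characters follow a fixed layout: a 14-letter connectivity hash, a hyphen, an 8-letter hash followed by a standard/non-standard flag letter and a version letter, a hyphen, and a single protonation letter. A small format check, written as a hypothetical helper rather than an openchem function, could look like this:

```typescript
// Sketch (not openchem's API): structural check of a standard InChIKey.
// 14 letters, "-", 10 letters (8-letter hash + flag + version), "-", 1 letter.
function looksLikeInChIKey(key: string): boolean {
  return key.length === 27 && /^[A-Z]{14}-[A-Z]{10}-[A-Z]$/.test(key);
}
```

This only validates the shape of the string; it cannot tell whether the key belongs to a real structure.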
Example:
import { generateInChIKey } from 'openchem';
// `inchi` is the aspirin InChI from the previous example
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
Writes molecules to SDF (Structure-Data File) format. Supports single or multiple records with optional property data. SDF files are commonly used for storing chemical databases and transferring molecular data between cheminformatics tools.
Parameters:
- records - Single record or array of records to write
- options - Optional configuration (same as MolGeneratorOptions):
  - title?: string - Default title for records (default: empty)
  - programName?: string - Program name in headers (default: "openchem")
  - dimensionality?: '2D' | '3D' - Coordinate system (default: "2D")
  - comment?: string - Default comment (default: empty)
Returns: SDFWriterResult containing:
- sdf: string - Complete SDF file content
- errors: string[] - Any errors encountered (empty if successful)
Record format:
interface SDFRecord {
molecule: Molecule;
properties?: Record<string, string | number | boolean>;
}
SDF structure:
- MOL block (V2000 format) for each molecule
- Property fields (> <NAME>, value, blank line)
- Record separator ($$$$)
Example (single molecule):
import { parseSMILES, writeSDF } from 'openchem';
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const result = writeSDF({
molecule: aspirin.molecules[0],
properties: {
NAME: 'aspirin',
MOLECULAR_FORMULA: 'C9H8O4',
MOLECULAR_WEIGHT: 180.16
}
});
console.log(result.sdf);
// Output: SDF file with MOL block + properties + $$$$
Example (multiple molecules):
import { parseSMILES, writeSDF } from 'openchem';
const drugs = [
{ smiles: 'CC(=O)Oc1ccccc1C(=O)O', name: 'aspirin' },
{ smiles: 'CC(C)Cc1ccc(cc1)C(C)C(=O)O', name: 'ibuprofen' },
{ smiles: 'CC(=O)Nc1ccc(O)cc1', name: 'acetaminophen' }
];
const records = drugs.map(drug => {
const mol = parseSMILES(drug.smiles).molecules[0];
return {
molecule: mol,
properties: {
NAME: drug.name,
SMILES: drug.smiles
}
};
});
const result = writeSDF(records, { programName: 'my-drug-tool' });
console.log(result.sdf);
// Output: Multi-record SDF with all 3 molecules
Property formatting:
- Strings: Written as-is
- Numbers: Converted to strings
- Booleans: "true" or "false"
- Property names are case-sensitive
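The formatting rules above can be illustrated with a short sketch that emits one data item per property (header line, value, blank line). `formatProperties` is a hypothetical name, not part of openchem's writer:

```typescript
// Sketch of the SDF data-item layout (not openchem's writer):
// "> <NAME>", then the value, then a blank line.
type PropertyValue = string | number | boolean;

function formatProperties(properties: Record<string, PropertyValue>): string {
  let out = "";
  for (const [name, value] of Object.entries(properties)) {
    // String() leaves strings as-is, turns numbers into their default
    // string form, and booleans into "true"/"false". Names keep their case.
    out += `> <${name}>\n${String(value)}\n\n`;
  }
  return out;
}
```

For example, `formatProperties({ NAME: 'aspirin', ACTIVE: true })` yields two items, the second with the value line `true`.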
Compatibility:
- Output compatible with RDKit, OpenBabel, ChemDraw, and other tools
- Standard SDF format (V2000 MOL blocks)
- Properties follow MDL SDF specification
renderSVG(input: string | Molecule | Molecule[] | ParseResult, options?: SVGRendererOptions): SVGRenderResult
Renders molecules as 2D SVG structures with automatic coordinate generation using webcola collision prevention.
Parameters:
- input - SMILES string, single molecule, array of molecules, or ParseResult
- options - Optional rendering configuration (see SVGRendererOptions below)
Returns: SVGRenderResult containing:
- svg: string - SVG markup ready for display
- width: number - Canvas width in pixels
- height: number - Canvas height in pixels
- errors: string[] - Any rendering errors (empty if successful)
SVGRendererOptions:
- width?: number - Canvas width (default: 300)
- height?: number - Canvas height (default: 300)
- bondLineWidth?: number - Bond line thickness (default: 2)
- bondLength?: number - Target bond length in pixels (default: 40)
- fontSize?: number - Atom label font size (default: 12)
- fontFamily?: string - Font family (default: "Arial, sans-serif")
- padding?: number - Canvas padding (default: 20)
- showCarbonLabels?: boolean - Show C atom labels (default: false)
- showImplicitHydrogens?: boolean - Show implicit hydrogens (default: false)
- kekulize?: boolean - Convert aromatic to Kekulé (default: true)
- atomColors?: Record<string, string> - Element-specific colors
- backgroundColor?: string - Background color (default: "#FFFFFF")
- bondColor?: string - Bond color (default: "#000000")
- showStereoBonds?: boolean - Show wedge/hash bonds (default: true)
- atomCoordinates?: AtomCoordinates[] - Pre-computed coordinates
- webcolaIterations?: number - Collision prevention iterations (default: 100)
- deterministicChainPlacement?: boolean - Deterministic layouts (default: false)
- moleculeSpacing?: number - Space between molecules in grid (default: 60)
Features:
- Automatic 2D coordinate generation with collision prevention
- Ring regularization for 5 and 6-membered rings
- Fused ring system handling
- Stereochemistry display (wedge/hash bonds)
- Element-specific atom coloring
- Publication-quality output
Parses a SMARTS pattern string into a pattern molecule structure.
Returns: ParseResult containing:
- molecules: Molecule[] - Array containing the pattern molecule
- errors: string[] - Parse errors (empty if successful)
SMARTS support:
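- Logical operators: ! (not), & (and), , (or)
- Atom properties: [D1] (explicit connections, i.e. degree), [H1] (total hydrogen count), [v3] (total valence)
- Connectivity: [#6X4] (carbon with 4 total connections, including implicit hydrogens)
- Aromatic matching: [c] (aromatic carbon) or [a] (any aromatic atom)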
Finds all matches of a SMARTS pattern in a molecule.
Parameters:
- molecule - Target molecule to search
- pattern - SMARTS pattern (from parseSMARTS())
Returns: Array of matches, where each match is an array of atom indices
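Conceptually, each match maps every pattern atom to a distinct target atom so that every pattern bond is also present in the target. The following backtracking sketch illustrates that idea on a simplified graph type; it compares only element labels and bond orders (SMARTS primitives such as D, H, v, X and the logical operators are out of scope), and `SimpleGraph` and `findMatches` are hypothetical names, not openchem's implementation:

```typescript
// Simplified molecular graph for this sketch (not openchem's Molecule type).
interface SimpleGraph {
  labels: string[]; // element symbol per atom (lowercase = aromatic)
  bonds: Array<[number, number, number]>; // [atom1, atom2, order]
}

function bondOrder(g: SimpleGraph, a: number, b: number): number | undefined {
  for (const [x, y, order] of g.bonds) {
    if ((x === a && y === b) || (x === b && y === a)) return order;
  }
  return undefined;
}

// Returns every mapping as [targetIndexForPatternAtom0, targetIndexForPatternAtom1, ...].
function findMatches(target: SimpleGraph, pattern: SimpleGraph): number[][] {
  const matches: number[][] = [];
  const mapping: number[] = [];
  const used = new Set<number>();

  const extend = (p: number): void => {
    if (p === pattern.labels.length) {
      matches.push([...mapping]);
      return;
    }
    for (let t = 0; t < target.labels.length; t++) {
      if (used.has(t) || target.labels[t] !== pattern.labels[p]) continue;
      // Every pattern bond to an already-mapped atom must exist in the target with the same order.
      let ok = true;
      for (let q = 0; q < p && ok; q++) {
        const want = bondOrder(pattern, p, q);
        if (want !== undefined && bondOrder(target, t, mapping[q]!) !== want) ok = false;
      }
      if (!ok) continue;
      mapping.push(t);
      used.add(t);
      extend(p + 1);
      mapping.pop();
      used.delete(t);
    }
  };

  extend(0);
  return matches;
}
```

Because it reports every mapping, a symmetric pattern (for example C-C) would appear once per atom permutation; production matchers typically add smarter candidate ordering and pruning.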
Example:
import { parseSMILES, parseSMARTS, matchSMARTS } from 'openchem';
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const carbonyl = parseSMARTS('[C](=O)').molecules[0];
const matches = matchSMARTS(aspirin, carbonyl);
// matches: [[1, 2], [10, 11]] (the two C=O pairs: ester carbonyl and acid carbonyl)
Converts aromatic molecules to alternating single/double bond (Kekulé) representation.
Returns: New molecule with aromatic bonds replaced by alternating single/double bonds
Example:
import { parseSMILES, kekulize, generateSMILES } from 'openchem';
const benzene = parseSMILES('c1ccccc1');
const kek = kekulize(benzene.molecules[0]);
console.log(generateSMILES(kek)); // "C1=CC=CC=C1"
Generates a Morgan fingerprint (ECFP-like) for molecular similarity searching and compound classification. Uses a modified Morgan algorithm with atom typing and circular neighborhoods.
Parameters:
- molecule - Molecule to fingerprint
- radius - Fingerprint radius (default: 2, equivalent to ECFP4)
- fpSize - Fingerprint size in bits (default: 2048, RDKit standard)
Returns: Uint8Array containing the fingerprint bits
Example:
import { parseSMILES, computeMorganFingerprint } from 'openchem';
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const fingerprint = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
console.log(fingerprint.length); // 64 (512 bits packed 8 bits per byte)
Calculates the Tanimoto similarity coefficient between two Morgan fingerprints. Measures structural similarity on a scale from 0 (no similarity) to 1 (identical).
Parameters:
- fp1 - First fingerprint
- fp2 - Second fingerprint
Returns: Similarity score between 0 and 1
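The coefficient is the number of bits set in both fingerprints divided by the number set in either. A sketch of that computation, assuming the bit-packed Uint8Array layout implied by the 512-bit / 64-byte example above (`tanimoto` here is a hypothetical stand-in, not openchem's function):

```typescript
// Sketch: Tanimoto similarity |A ∩ B| / |A ∪ B| over bit-packed fingerprints
// (8 bits per byte). Both fingerprints must have the same length.
function popcount8(byte: number): number {
  let count = 0;
  for (let b = byte & 0xff; b !== 0; b &= b - 1) count++; // clear the lowest set bit
  return count;
}

function tanimoto(fp1: Uint8Array, fp2: Uint8Array): number {
  if (fp1.length !== fp2.length) throw new Error("Fingerprint sizes differ");
  let both = 0;
  let either = 0;
  for (let i = 0; i < fp1.length; i++) {
    const a = fp1[i]!;
    const b = fp2[i]!;
    both += popcount8(a & b);
    either += popcount8(a | b);
  }
  // Convention chosen for this sketch: two all-zero fingerprints score 0.
  return either === 0 ? 0 : both / either;
}
```

How a library scores two empty fingerprints is a convention; check openchem's behavior before relying on that edge case.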
Example:
const similarity = tanimotoSimilarity(fingerprint1, fingerprint2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`);
Extracts the Murcko scaffold from a molecule: the core ring systems and the linkers connecting them, with all terminal side chains removed. This is the standard scaffold used in medicinal chemistry for compound classification.
Parameters:
- molecule - Molecule to analyze
- options.includeLinkers - Include linker atoms between rings (default: true)
Returns: New Molecule containing only the scaffold
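One common way to obtain this scaffold is to repeatedly delete terminal atoms (degree 0 or 1) until none remain; ring atoms and linker chains between rings are never terminal, so they survive. A sketch on a plain adjacency list, with the hypothetical name `murckoAtoms` (not openchem's implementation):

```typescript
// Sketch (not openchem's implementation): returns the indices of atoms that
// survive repeated pruning of terminal atoms, i.e. rings plus linkers.
function murckoAtoms(atomCount: number, bonds: Array<[number, number]>): number[] {
  const neighbors: Set<number>[] = Array.from({ length: atomCount }, () => new Set<number>());
  for (const [a, b] of bonds) {
    neighbors[a]!.add(b);
    neighbors[b]!.add(a);
  }
  const removed = new Set<number>();
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i < atomCount; i++) {
      if (!removed.has(i) && neighbors[i]!.size <= 1) {
        // Terminal atom: detach it from its neighbor and mark it removed.
        removed.add(i);
        for (const n of neighbors[i]!) neighbors[n]!.delete(i);
        neighbors[i]!.clear();
        changed = true;
      }
    }
  }
  return Array.from({ length: atomCount }, (_, i) => i).filter((i) => !removed.has(i));
}
```

This simplified version also strips exocyclic double-bonded atoms (such as a ring C=O), which some scaffold definitions keep; an acyclic molecule yields an empty scaffold.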
Example:
import { parseSMILES, getMurckoScaffold, generateSMILES } from 'openchem';
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core
Generates a generic Bemis-Murcko framework: the scaffold with all atoms converted to carbon and all bonds converted to single bonds. Useful for identifying compounds with similar topology but different heteroatom patterns.
Returns: New Molecule with generic framework
Example:
import { parseSMILES, getBemisMurckoFramework, generateSMILES } from 'openchem';
const pyridine = parseSMILES('c1ccncc1').molecules[0];
const framework = getBemisMurckoFramework(pyridine);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane
Generates a hierarchical scaffold tree by iteratively removing rings from the Murcko scaffold. Returns scaffolds ordered from most specific (full scaffold) to least specific (single ring).
Returns: Array of Molecule objects representing scaffolds at different levels
Example:
import { parseSMILES, getScaffoldTree, generateSMILES } from 'openchem';
const mol = parseSMILES('c1ccc2ccccc2c1').molecules[0]; // Naphthalene
const tree = getScaffoldTree(mol);
console.log(tree.length); // 2 levels: full naphthalene, then single benzene
tree.forEach((scaffold, idx) => {
console.log(`Level ${idx}: ${generateSMILES(scaffold)}`);
});
Generates a pure topological framework with all atoms converted to wildcard atoms (*). This represents the molecular graph structure without any atom type information.
Returns: New Molecule with graph framework
Example:
import { parseSMILES, getGraphFramework, generateSMILES } from 'openchem';
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C').molecules[0];
const graph = getGraphFramework(caffeine);
console.log(generateSMILES(graph)); // "*1*=**2=*1*(*)*(*)*2*" - pure topology
Compares two molecules to determine if they share the same Murcko scaffold. Useful for compound series analysis and lead identification.
Returns: true if scaffolds match, false otherwise
Example:
import { parseSMILES, haveSameScaffold } from 'openchem';
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
console.log(haveSameScaffold(aspirin, ibuprofen)); // true - both have a benzene scaffold
Enumerates all tautomers for a molecule using transform-based enumeration with RDKit-compatible scoring.
Options:
- maxTautomers?: number - Maximum tautomers to generate (default: 256)
- maxTransforms?: number - Maximum transform operations (default: 1024)
- phases?: number[] - Rule phases to apply (default: [1, 2, 3])
- useFingerprintDedup?: boolean - Use fingerprint deduplication (default: true)
Returns: Array of TautomerResult objects with:
- smiles: string - SMILES representation
- molecule: Molecule - Molecule object
- score: number - Stability score (higher = more stable)
- ruleIds: string[] - Applied transformation rules
Scoring system (RDKit-inspired):
- +250 per all-carbon aromatic ring
- +100 per heteroaromatic ring
- +25 for benzoquinone
- +4 for oximes (C=N-OH)
- +2 for carbonyls (C=O, N=O, P=O)
- -10 per formal charge
- -4 for aci-nitro
- -1 per H on P, S, Se, Te
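Because the score is a weighted sum, it can be sketched directly from feature counts. In this sketch the `TautomerFeatures` interface and `tautomerScore` function are hypothetical, the counts are assumed to be computed elsewhere, and each "for" rule above is assumed to apply once per occurrence:

```typescript
// Sketch of the additive scoring rules listed above (not openchem's code).
interface TautomerFeatures {
  carbocyclicAromaticRings: number; // all-carbon aromatic rings
  heteroaromaticRings: number;
  benzoquinones: number;
  oximes: number; // C=N-OH
  carbonyls: number; // C=O, N=O, P=O
  formalCharges: number;
  aciNitro: number;
  hOnPSSeTe: number; // hydrogens on P, S, Se, Te
}

function tautomerScore(f: TautomerFeatures): number {
  return (
    250 * f.carbocyclicAromaticRings +
    100 * f.heteroaromaticRings +
    25 * f.benzoquinones +
    4 * f.oximes +
    2 * f.carbonyls -
    10 * f.formalCharges -
    4 * f.aciNitro -
    1 * f.hOnPSSeTe
  );
}
```

With two carbonyls the diketo form of acetylacetone scores 4, and an enol with one carbonyl scores 2, matching the example output.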
Example:
import { parseSMILES, enumerateTautomers } from 'openchem';
const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0]; // acetylacetone
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });
console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});
// 1. CC(=O)CC(=O)C (score: 4) - diketo form
// 2. CC(=O)C=C(C)O (score: 2) - enol form
// Note: CC(O)=CC(=O)C is the same enol written from the other end, so deduplication lists it only once
Selects the canonical (most stable) tautomer based on scoring.
Returns: The highest-scoring tautomer as a Molecule
Example:
import { parseSMILES, canonicalTautomer, generateSMILES } from 'openchem';
const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0];
const canonical = canonicalTautomer(mol);
console.log(generateSMILES(canonical)); // "CC(=O)CC(=O)C" - diketo form preferred
Calculates the LogP (partition coefficient) using the Wildman-Crippen method. LogP predicts lipophilicity and membrane permeability.
Returns: LogP value as a number
Interpretation:
- LogP < 0: Hydrophilic (water-loving)
- 0 ≤ LogP ≤ 5: Optimal range for most drugs
- LogP > 5: Lipophilic (fat-loving), may have poor water solubility
Example:
import { parseSMILES, computeLogP } from 'openchem';
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
console.log(computeLogP(aspirin)); // 1.31 (good bioavailability)
Alias for computeLogP(). Alternative name for the Wildman-Crippen LogP calculation.
Alternative LogP calculation method. May use different fragment contributions than Crippen.
Returns the molecular formula in Hill notation (C first, then H, then alphabetical).
Example: C9H8O4 for aspirin
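A sketch of Hill ordering from element counts, using a hypothetical `hillFormula` helper. It also applies the standard Hill convention for carbon-free compounds (all elements alphabetically, H included), which the description above does not cover, so treat that branch as an assumption about openchem's behavior:

```typescript
// Sketch: Hill-order formula from element counts (not openchem's API).
// With carbon: C, then H, then the remaining elements alphabetically.
// Without carbon: all elements, including H, alphabetically.
function hillFormula(counts: Record<string, number>): string {
  const part = (el: string): string => {
    const n = counts[el] ?? 0;
    return n === 0 ? "" : n === 1 ? el : `${el}${n}`;
  };
  const elements = Object.keys(counts).sort();
  if ((counts["C"] ?? 0) > 0) {
    const rest = elements.filter((el) => el !== "C" && el !== "H");
    return part("C") + part("H") + rest.map(part).join("");
  }
  return elements.map(part).join("");
}
```

For example, `hillFormula({ O: 4, C: 9, H: 8 })` gives "C9H8O4" regardless of key order.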
Returns the molecular mass using average atomic masses from the periodic table.
Example: 180.16 for aspirin
Returns the exact mass using the most abundant isotope for each element.
Example: 180.042 for aspirin
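The two values differ only in the per-element mass table. A worked sketch for aspirin (C9H8O4), using conventional standard atomic weights and most-abundant-isotope masses rounded to a few decimals; `massFrom` and the tables are illustrative, not openchem's data:

```typescript
// Sketch comparing the two mass definitions (not openchem's element table).
const AVERAGE: Record<string, number> = { C: 12.011, H: 1.008, O: 15.999 };
const MONOISOTOPIC: Record<string, number> = { C: 12.0, H: 1.00782503, O: 15.99491462 };

function massFrom(counts: Record<string, number>, table: Record<string, number>): number {
  let total = 0;
  for (const [el, n] of Object.entries(counts)) {
    const m = table[el];
    if (m === undefined) throw new Error(`No mass for ${el}`);
    total += n * m;
  }
  return total;
}

const aspirinCounts = { C: 9, H: 8, O: 4 };
console.log(massFrom(aspirinCounts, AVERAGE).toFixed(3)); // "180.159"
console.log(massFrom(aspirinCounts, MONOISOTOPIC).toFixed(3)); // "180.042"
```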
Returns the count of non-hydrogen atoms.
Example: 15 for ibuprofen (C13H18O2)
Returns the count of heteroatoms (any atom except C and H). Includes N, O, S, P, halogens, etc.
Example: 4 for aspirin (all four oxygen atoms of C9H8O4)
Returns the total number of rings in the molecule using cycle detection.
Example: 2 for naphthalene (2 fused rings)
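For a molecular graph with one edge per bonded pair, the number of rings in the SSSR equals the cycle rank: bonds − atoms + connected components. A sketch of that count, with the hypothetical helper name `cycleRank`:

```typescript
// Sketch: ring count as the cycle rank (bonds - atoms + connected components),
// which equals the size of the SSSR for a simple graph.
function cycleRank(atomCount: number, bonds: Array<[number, number]>): number {
  // Union-find to count connected components.
  const parent = Array.from({ length: atomCount }, (_, i) => i);
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]!]!; // path halving
      x = parent[x]!;
    }
    return x;
  };
  let components = atomCount;
  for (const [a, b] of bonds) {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) {
      parent[ra] = rb;
      components--;
    }
  }
  return bonds.length - atomCount + components;
}
```

For naphthalene (10 atoms, 11 bonds, one component) this gives 11 − 10 + 1 = 2.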
Returns the number of aromatic rings.
Example: 1 for benzene, 2 for naphthalene
Returns a comprehensive ring information object providing access to SSSR (Smallest Set of Smallest Rings) and ring membership queries. Similar to RDKit's GetRingInfo() functionality.
Methods:
- numRings() - Number of rings in SSSR
- rings() - Array of rings (each ring is an array of atom IDs)
- isAtomInRing(atomIdx) - Check if atom is in any ring
- isBondInRing(atom1, atom2) - Check if bond is in any ring
- atomRingMembership(atomIdx) - Ring membership count for atom ([Rn] in SMARTS)
- atomRings(atomIdx) - All rings containing specific atom
- ringAtoms(ringIdx) - Atoms in specific ring
- ringBonds(ringIdx) - Bonds in specific ring
Example:
import { parseSMILES, getRingInfo } from 'openchem';
const mol = parseSMILES('c1ccc2ccccc2c1').molecules[0]; // naphthalene
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 2
console.log(ringInfo.isAtomInRing(5)); // true
console.log(ringInfo.atomRingMembership(3)); // 2 (ring-fusion atom shared by both rings)
Returns the fraction of sp³-hybridized carbons (saturated carbons) relative to total carbons. Higher values indicate greater structural complexity and 3D character. Range: 0.0 to 1.0.
Example: 0.375 for caffeine (3 of 8 carbons), 0.46 for ibuprofen (6 of 13 carbons)
Returns the count of hydrogen bond donors (N-H and O-H groups).
Example: 1 for aspirin (carboxylic acid O-H), 0 for caffeine
Returns the count of hydrogen bond acceptors (N and O atoms).
Example: 4 for aspirin, 6 for caffeine
Returns the Topological Polar Surface Area in Ų (square Ångströms) using the Ertl et al. fragment-based algorithm. TPSA is a key descriptor for predicting drug absorption and bioavailability.
Guidelines:
- TPSA < 140 Ų: Good oral bioavailability
- TPSA < 90 Ų: Likely blood-brain barrier penetration
- TPSA > 140 Ų: Poor membrane permeability
Example: 63.60 for aspirin (good oral availability), 52.93 for morphine (CNS-active)
Returns the count of rotatable bonds (single non-ring bonds between non-terminal heavy atoms). Used in Veber rules for predicting oral bioavailability.
Example: 3 for aspirin, 4 for ibuprofen
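The definition above can be applied bond by bond once ring membership and heavy-atom degrees are known. A sketch with a hypothetical `BondInfo` record (other toolkits may apply additional exclusions beyond this rule):

```typescript
// Sketch of the stated rule (not openchem's code): single, not in a ring,
// and both atoms are non-terminal heavy atoms.
interface BondInfo {
  order: number; // 1 = single
  isInRing: boolean;
  heavyDegree1: number; // heavy-atom neighbors of the first atom
  heavyDegree2: number; // heavy-atom neighbors of the second atom
}

function isRotatable(b: BondInfo): boolean {
  return b.order === 1 && !b.isInRing && b.heavyDegree1 > 1 && b.heavyDegree2 > 1;
}

function countRotatable(bonds: BondInfo[]): number {
  return bonds.filter(isRotatable).length;
}
```

Applied to aspirin's non-ring bonds, only C(=O)-O, O-c(ring) and c(ring)-C(=O)OH qualify, giving 3.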
Evaluates Lipinski's Rule of Five for oral drug-likeness. Returns result object with:
- passes: boolean indicating if all rules pass
- violations: array of violation messages
- properties: { molecularWeight, hbondDonors, hbondAcceptors, logP }
Rules:
- Molecular weight ≤ 500 Da
- H-bond donors ≤ 5
- H-bond acceptors ≤ 10
- LogP ≤ 5
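Given precomputed descriptors, the four checks reduce to threshold comparisons. A sketch with a hypothetical `lipinskiViolations` function; "passes" in the result described above corresponds to an empty violation list here (the original Rule of Five tolerates one violation, so some tools use a looser criterion):

```typescript
// Sketch (not openchem's implementation) of the Rule-of-Five checks.
interface LipinskiInput {
  molecularWeight: number;
  hbondDonors: number;
  hbondAcceptors: number;
  logP: number;
}

function lipinskiViolations(p: LipinskiInput): string[] {
  const violations: string[] = [];
  if (p.molecularWeight > 500) violations.push(`Molecular weight ${p.molecularWeight} > 500`);
  if (p.hbondDonors > 5) violations.push(`H-bond donors ${p.hbondDonors} > 5`);
  if (p.hbondAcceptors > 10) violations.push(`H-bond acceptors ${p.hbondAcceptors} > 10`);
  if (p.logP > 5) violations.push(`LogP ${p.logP} > 5`);
  return violations;
}
```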
Evaluates Veber rules for oral bioavailability. Returns result object with:
- passes: boolean indicating if all rules pass
- violations: array of violation messages
- properties: { rotatableBonds, tpsa }
Rules:
- Rotatable bonds ≤ 10
- TPSA ≤ 140 Ų
Predicts blood-brain barrier penetration. Returns result object with:
- likelyPenetration: boolean (true if TPSA < 90 Ų)
- tpsa: TPSA value
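The Veber and BBB checks are likewise plain thresholds. A sketch with hypothetical function names, using the limits stated above (note the BBB test is a strict inequality while the Veber limits are inclusive):

```typescript
// Sketch of the Veber and BBB thresholds (not openchem's code).
function passesVeber(rotatableBonds: number, tpsa: number): boolean {
  return rotatableBonds <= 10 && tpsa <= 140;
}

function likelyBBBPenetration(tpsa: number): boolean {
  return tpsa < 90;
}
```

For aspirin (3 rotatable bonds, TPSA 63.60 Ų) both return true.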
interface Molecule {
atoms: Atom[];
bonds: Bond[];
}
interface Atom {
id: number;
symbol: string;
aromatic: boolean;
hydrogens: number;
charge: number;
isotope: number | null;
chiral: string | null;
atomClass: number | null;
isBracket: boolean;
atomicNumber: number;
}
interface Bond {
atom1: number;
atom2: number;
type: BondType;
stereo: StereoType;
}
enum BondType {
SINGLE = 1,
DOUBLE = 2,
TRIPLE = 3,
QUADRUPLE = 4,
AROMATIC = 5
}
openchem is designed for production use with real-world performance:
- Parsing: ~1-10ms per molecule (depending on complexity)
- Generation: ~1-5ms per molecule
- Memory: Minimal overhead, compact AST representation
- Small footprint: Few runtime dependencies (2D layout uses webcola; InChI generation uses a WebAssembly build of the official InChI library)
Benchmark with 325 diverse molecules including commercial drugs: Average parse + generate round-trip < 5ms
openchem uses a post-processing enrichment system that pre-computes expensive molecular properties during parsing. This design significantly improves performance for downstream property queries while maintaining code simplicity.
Molecular property calculations like ring finding, hybridization determination, and rotatable bond classification are computationally expensive (O(n²) complexity). Without pre-computation:
- Redundant calculations: Ring finding would run every time you query ring count, aromatic rings, or check if atoms/bonds are in rings
- Performance penalty: Property queries would dominate runtime, especially for drug-likeness checks that need multiple properties
- Code complexity: Every property function would need to duplicate expensive logic
The Solution: Compute once during parsing, cache results, use everywhere.
- types.ts - Extended with optional cached properties on Atom, Bond, and Molecule interfaces
- src/utils/molecule-enrichment.ts - Post-processing module that enriches molecules after parsing
- src/parser.ts - Calls enrichMolecule() after the validation phase (at line 451)
- src/utils/molecular-properties.ts - Uses cached properties when available, falls back to computation
- Atom: degree (neighbor count), isInRing, ringIds[], hybridization (sp/sp²/sp³)
- Bond: isInRing, ringIds[], isRotatable
- Molecule: rings[][] (all rings as atom IDs), ringInfo (lookup maps)
Benchmark Results (10,000 molecules, 7 properties each):
- Parse time: 1.22 ms/molecule (includes enrichment)
- Property query time: 0.006 ms/molecule (0.5% of parse time)
- Rotatable bond queries: ~3.1 million ops/second (simple array filter vs 47-line calculation)
Complexity Improvements:
- Ring finding: Once per molecule (O(n²)) → subsequent queries O(1)
- Rotatable bonds: O(n×m) nested loops → O(n) array filter
- Property queries: 200× faster on average
Important: Molecules are immutable after parsing. All enriched properties remain valid for the lifetime of the molecule object. This design:
- Prevents stale cached properties (no mutation = no invalidation needed)
- Enables safe sharing across threads/workers
- Simplifies reasoning about molecule state
If you need to modify a molecule, create a new one by parsing updated SMILES.
- Ring analysis (analyzeRings()) runs only during enrichment
- Downstream property functions check cached values first and fall back to computation if missing
- Backward compatible: cached properties are optional (?:) with defensive fallbacks
- New code should always use cached properties when available
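The "cached value first, compute as fallback" pattern can be sketched with simplified stand-in types. `AtomLike`, `atomDegree` and `enrich` are hypothetical names, not openchem's real interfaces; the enrichment step computes every degree once and returns frozen objects, in line with the immutability rule above:

```typescript
// Simplified stand-ins for the real Atom/Bond/Molecule types.
interface AtomLike {
  id: number;
  degree?: number; // filled in by enrichment, may be absent
}
interface BondLike {
  atom1: number;
  atom2: number;
}
interface MoleculeLike {
  atoms: readonly AtomLike[];
  bonds: readonly BondLike[];
}

function atomDegree(mol: MoleculeLike, atom: AtomLike): number {
  if (atom.degree !== undefined) return atom.degree; // O(1) cached path
  // Fallback: O(bonds) scan when the molecule was not enriched.
  return mol.bonds.filter((b) => b.atom1 === atom.id || b.atom2 === atom.id).length;
}

// Compute every degree once and return a frozen (immutable) copy.
function enrich(mol: MoleculeLike): MoleculeLike {
  const degree = new Map<number, number>();
  for (const b of mol.bonds) {
    degree.set(b.atom1, (degree.get(b.atom1) ?? 0) + 1);
    degree.set(b.atom2, (degree.get(b.atom2) ?? 0) + 1);
  }
  const atoms = mol.atoms.map((a) => Object.freeze({ ...a, degree: degree.get(a.id) ?? 0 }));
  return Object.freeze({ atoms: Object.freeze(atoms), bonds: mol.bonds });
}
```

Freezing is what makes caching safe: since nothing can mutate the molecule, a cached degree can never go stale.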
openchem handles 100% of tested SMILES correctly (325/325 in bulk validation).
Key implementation details:
- Stereo normalization: Trans alkenes are automatically normalized to use / (up) markers on both ends to match RDKit's canonical form. For example, C\C=C\C and C/C=C/C both represent the trans configuration and canonicalize to C/C=C/C.
- Canonical ordering: Atoms are ordered using a modified Morgan algorithm matching RDKit's approach, with tie-breaking by atomic number, degree, and other properties.
- Aromatic validation: Aromatic notation (lowercase letters) is accepted as specified in SMILES. The parser validates that aromatic atoms are in rings but accepts aromatic notation without strict Hückel rule enforcement, matching RDKit's behavior for broader compatibility.
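The core idea behind Morgan-style canonical ordering is iterative refinement of atom classes. A minimal sketch follows; it is not openchem's or RDKit's canonicalization (both add further tie-breaking and stereo handling), and `morganClasses` and `rankOf` are hypothetical names:

```typescript
// Sketch: start from per-atom invariants, then repeatedly replace each atom's
// class with (own class, sorted neighbor classes) until the number of distinct
// classes stops growing. Atoms with equal final classes are equivalent under
// this invariant.
function morganClasses(initial: number[], neighbors: number[][]): number[] {
  let classes = rankOf(initial.map(String));
  for (;;) {
    const current = classes;
    const keys = current.map(
      (c, i) => `${c}|${neighbors[i]!.map((n) => current[n]!).sort((a, b) => a - b).join(",")}`,
    );
    const next = rankOf(keys);
    if (new Set(next).size === new Set(current).size) return current;
    classes = next;
  }
}

// Dense rank: equal keys share a rank; ranks follow lexicographic key order.
function rankOf(keys: string[]): number[] {
  const sorted = [...new Set(keys)].sort();
  const index = new Map(sorted.map((k, i) => [k, i] as const));
  return keys.map((k) => index.get(k)!);
}
```

For propane the two terminal carbons end in the same class, which is why a canonicalizer needs tie-breaking rules to pick a single starting atom.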
This implementation has been validated against RDKit's canonical SMILES output for diverse molecule sets including stereocenters, complex rings, heteroatoms, and 25 commercial pharmaceutical drugs.
openchem implements the OpenSMILES specification with high fidelity while prioritizing RDKit compatibility for real-world interoperability. In specific areas where the OpenSMILES specification provides recommendations rather than strict requirements, openchem follows RDKit's implementation choices to ensure 100% parity with the industry-standard cheminformatics toolkit.
OpenSMILES Recommendation: Start traversal on heteroatoms first, then terminals.
- Example preference: OCCC over CCCO for propanol
- Rationale: Heteroatoms are "more interesting" chemically
openchem Implementation: Canonical labels first, heteroatoms as tie-breaker.
- Example: Both OCCC and CCCO canonicalize to CCCO
- Rationale: Ensures 100% deterministic output for identical molecules
Why RDKit's Approach:
- Determinism: Canonical labels guarantee the same molecule always produces identical output, regardless of input order
- Interoperability: 100% agreement with RDKit enables seamless integration with existing cheminformatics pipelines and databases
- Real-world usage: Major chemical databases (PubChem, ChEMBL) prioritize canonical determinism over heteroatom preference
- Chemical equivalence: Both OCCC and CCCO represent the same molecule; the output difference is purely cosmetic
Impact: Minimal — affects only the order atoms appear in canonical output, not chemical meaning or validity. All SMILES remain valid OpenSMILES syntax.
OpenSMILES Specification: Recommends strict Hückel rule enforcement (4n+2 π-electrons).
openchem Implementation: Accepts aromatic notation as specified in input; validates aromatic atoms are in rings but does not enforce strict Hückel rules during parsing.
Why RDKit's Approach: Broader compatibility with real-world chemical data where aromaticity may be empirically determined or context-dependent rather than purely theoretical.
| Feature | OpenSMILES Spec | openchem Implementation | Rationale |
|---|---|---|---|
| Starting atom | Heteroatom preference | Canonical labels first | Deterministic output, RDKit parity |
| Aromatic validation | Strict Hückel (4n+2) | Permissive ring validation | Real-world compatibility |
| Stereo normalization | Not specified | Canonical E/Z form | Deterministic stereo representation |
| Canonical ordering | Modified Morgan recommended | Modified Morgan (RDKit-compatible) | 100% RDKit agreement |
All deviations are deliberate choices to maximize real-world interoperability while maintaining full compliance with OpenSMILES syntax and semantics. openchem produces valid OpenSMILES that can be read by any compliant parser.
openchem/
├── src/
│ ├── generators/
│ │ ├── mol-generator.ts # MOL file generation
│ │ ├── sdf-writer.ts # SDF file writing
│ │ └── smiles-generator.ts # Canonical SMILES generation
│ ├── parsers/
│ │ ├── bracket-parser.ts # Bracket notation parser
│ │ ├── molfile-parser.ts # MOL file parser
│ │ ├── sdf-parser.ts # SDF file parser
│ │ └── smiles-parser.ts # SMILES parser
│ ├── utils/
│ │ ├── aromaticity-perceiver.ts # Aromaticity detection
│ │ ├── atom-utils.ts # Atom helper functions
│ │ ├── bond-utils.ts # Bond helper functions
│ │ ├── molecular-properties.ts # Property calculations
│ │ ├── molecule-enrichment.ts # Post-processing enrichment
│ │ ├── ring-finder.ts # Ring detection algorithm
│ │ ├── ring-utils.ts # Ring utilities
│ │ ├── symmetry-detector.ts # Symmetry analysis
│ │ └── valence-calculator.ts # Valence validation
│ ├── validators/
│ │ ├── aromaticity-validator.ts # Aromaticity validation
│ │ ├── stereo-validator.ts # Stereochemistry validation
│ │ └── valence-validator.ts # Valence checking
│ └── constants.ts # Element data and constants
├── test/
│ ├── smiles/ # SMILES tests (213 tests)
│ │ ├── stereo/ # Stereo tests (59 tests)
│ │ │ ├── stereo-advanced.test.ts
│ │ │ ├── stereo-extra.test.ts
│ │ │ └── stereo-rings.test.ts
│ │ ├── rdkit-comparison/ # RDKit validation (229 tests)
│ │ │ ├── bond-mismatch-debug.test.ts
│ │ │ ├── failing-cases.test.ts
│ │ │ ├── rdkit-10k.test.ts
│ │ │ ├── rdkit-bulk.test.ts
│ │ │ ├── rdkit-canonical.test.ts
│ │ │ ├── rdkit-comparison.test.ts
│ │ │ ├── rdkit-stereo.test.ts
│ │ │ ├── rdkit-symmetry.test.ts
│ │ │ └── smiles-10k.txt
│ │ ├── smiles-bracket-parser.test.ts
│ │ ├── smiles-extended-stereo.test.ts
│ │ ├── smiles-isotope.test.ts
│ │ ├── smiles-parser-advanced.test.ts
│ │ ├── smiles-parser-basic.test.ts
│ │ ├── smiles-parser-edge-cases.test.ts
│ │ ├── smiles-round-trip.test.ts
│ │ └── smiles-standard-form.test.ts
│ ├── molfile/ # MOL file tests (57 tests)
│ │ ├── mol-generator.test.ts
│ │ ├── molfile-parser.test.ts
│ │ ├── molfile-roundtrip.test.ts
│ │ ├── rdkit-mol-comparison.test.ts
│ │ └── rdkit-molfile.test.ts
│ ├── sdf/ # SDF tests (62 tests)
│ │ ├── sdf-parser-integration.test.ts
│ │ ├── sdf-parser-unit.test.ts
│ │ ├── sdf-writer-integration.test.ts
│ │ └── sdf-writer-unit.test.ts
│ ├── unit/
│ │ ├── utils/ # Utility tests (101 tests)
│ │ │ ├── aromaticity-perceiver.test.ts
│ │ │ ├── atom-utils.test.ts
│ │ │ ├── molecular-properties.test.ts
│ │ │ ├── ring-finder.test.ts
│ │ │ ├── symmetry-detector.test.ts
│ │ │ └── valence-calculator.test.ts
│ │ └── validators/ # Validator tests (not yet created)
│ └── rdkit-comparison/
│ └── rdkit-api-inspect.test.ts # RDKit API inspection (1 test)
├── types.ts # TypeScript type definitions
├── index.ts # Public API exports
├── package.json
├── tsconfig.json
├── AGENTS.md # Agent guidelines
└── README.md
openchem implements RDKit-compatible canonical SMILES generation:
- Modified Morgan Algorithm: Atoms are canonically ordered using iterative refinement based on:
  - Canonical rank (connectivity signature)
  - Atomic number (tie-breaker)
  - Degree, isotope, charge
  - Neighbor properties
- Starting Atom Selection (RDKit-compatible):
  - Primary criterion: Canonical label (lowest rank wins)
  - Tie-breakers (in order): Heteroatom preference → Terminal atom → Lower degree → Lower charge
  - Design choice: Prioritizes canonical labels over heteroatom preference for deterministic output
  - Note: The OpenSMILES specification (Section 4.3.4) recommends starting on heteroatoms first (e.g., OCCC over CCCO), but RDKit prioritizes canonical ordering for deterministic behavior
  - Result: Both approaches are chemically equivalent; openchem follows RDKit for maximum interoperability
- Stereo Normalization: E/Z double bond stereochemistry is normalized to a canonical form:
  - Trans (E) alkenes: Both markers pointing up (/), e.g., C/C=C/C
  - Cis (Z) alkenes: Opposing markers (/ and \), e.g., C/C=C\C
  - Ensures equivalent stereo representations canonicalize identically
- Deterministic Output: Same molecule always produces the same canonical SMILES, enabling reliable structure comparison and database storage.
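For the simple form X<m1>C=C<m2>Y, where the first marker is on the bond written just before the first alkene carbon and the second on the bond written just after the second, the relative configuration follows from comparing the two symbols. A sketch under that assumption (it does not handle markers on branch or ring-closure bonds written in other positions; the function names are hypothetical):

```typescript
type BondMarker = "/" | "\\";

// Identical markers put the substituents on opposite sides (trans);
// different markers put them on the same side (cis).
function relativeConfig(before: BondMarker, after: BondMarker): "trans" | "cis" {
  return before === after ? "trans" : "cis";
}

// Normalize trans to "/ ... /" and cis to "/ ... \", as described above.
function normalizeMarkers(before: BondMarker, after: BondMarker): [BondMarker, BondMarker] {
  return relativeConfig(before, after) === "trans" ? ["/", "/"] : ["/", "\\"];
}
```

So C\C=C\C normalizes to C/C=C/C (trans), and C\C=C/C normalizes to C/C=C\C (cis).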
Example of RDKit-compatible behavior:
import { parseSMILES, generateSMILES } from 'openchem';
const canonical = (smiles: string) => generateSMILES(parseSMILES(smiles).molecules[0]);
// Both inputs represent the same molecule (hydrogen cyanide)
canonical('C#N'); // → "C#N" (carbon first)
canonical('N#C'); // → "C#N" (canonical labels prioritized)
// Both inputs represent the same molecule (propanol)
canonical('OCCC'); // → "CCCO" (canonical labels prioritized)
canonical('CCCO'); // → "CCCO"
This implementation achieves 100% agreement with RDKit's canonical output across 325 diverse test molecules including 25 commercial pharmaceutical drugs.
All debug logging (e.g., console.log, console.warn, etc.) must be gated behind the VERBOSE environment variable. This ensures that test and production output remains clean unless explicitly requested. Use:
if (process.env.VERBOSE) {
console.log('Debug info...');
}
This applies to all source and test files. Never leave direct logging statements that print during normal runs.
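If the check appears in many places, it can be wrapped in a small helper. This is a hypothetical convenience, not something openchem ships; it reads `process.env` defensively so it also compiles without Node type definitions, and it keeps the same truthiness rule as `if (process.env.VERBOSE)`:

```typescript
type Env = Record<string, string | undefined>;

// Pure check, easy to unit test: any non-empty VERBOSE value enables logging.
function isVerbose(env: Env | undefined): boolean {
  return Boolean(env?.VERBOSE);
}

function debugLog(...args: unknown[]): void {
  const env = (globalThis as unknown as { process?: { env?: Env } }).process?.env;
  if (isVerbose(env)) console.log(...args);
}
```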
We welcome contributions! openchem maintains strict quality standards:
- All tests must pass — 610/610 required
- RDKit parity required — Canonical SMILES must match RDKit output exactly
- Add tests for new features — Test coverage is mandatory
- Follow TypeScript conventions - See AGENTS.md for guidelines
To contribute:
# Clone and install
git clone https://github.com/rajeshg/openchem.git
cd openchem
bun install
# Make changes and test
bun test
# Type check
bun run tsc
# Submit PR with tests
openchem is perfect for:
- Cheminformatics web applications — Client-side molecule parsing and visualization
- Chemical databases — Canonical SMILES, InChI, and fingerprint-based storage and comparison
- Molecule editors — Import/export SMILES, MOL, SDF with 2D rendering
- Drug discovery tools — Structure representation, property calculation, and similarity searching
- Educational software — Teaching chemical notation with interactive 2D visualization
- API services — Fast molecule processing, fingerprinting, and property calculation in Node.js
MIT
openchem builds on the work of several excellent open-source projects:
- RDKit - Validated against RDKit for accuracy. Several molecular descriptor algorithms (LabuteASA, LogP, Morgan fingerprints, etc.) are derived from RDKit's C++ implementation. See THIRD-PARTY-LICENSES.md for details.
- OPSIN by Daniel Lowe - IUPAC nomenclature data (XML lexicon files in opsin-iupac-data/ and compiled rules in opsin-rules.json) used for parsing and generating IUPAC chemical names. Licensed under MIT.
- InChI Trust - InChI identifier generation via a WebAssembly build of the official InChI library.
- Daylight SMILES & SMARTS - Specification for molecular line notation and pattern matching.
For complete license information and attribution requirements, see THIRD-PARTY-LICENSES.md.