openchem

A fast TypeScript / JavaScript chemistry toolkit for working with molecular structures: parsing & generation (SMILES, MOL, SDF), canonicalization, pattern matching (SMARTS), 2D rendering, molecular descriptors, and structural analysis.

Production-ready, TypeScript-first library for cheminformatics — works in both browser and Node.js. openchem keeps a small runtime footprint.

Features

File Format Support

SMILES — Parse and generate canonical SMILES with full stereochemistry
MOL files — V2000/V3000 format support with 2D coordinate generation
SDF files — Multi-molecule files with property data
InChI — Generate InChI and InChIKey identifiers
IUPAC names — Bidirectional IUPAC ↔ SMILES conversion

Structure Analysis

Pattern matching — SMARTS substructure search
Fingerprints — Morgan (ECFP) fingerprints with Tanimoto similarity
Murcko scaffolds — Extract core scaffolds, generic frameworks, scaffold trees
Tautomers — Complete enumeration (25 rules, 100% RDKit coverage) with RDKit-compatible scoring
Ring systems — SSSR detection, fused/spiro/bridged classification
Aromaticity — Hückel rule perception and kekulization
Symmetry — Canonical ordering via modified Morgan algorithm
Stereochemistry — Full support for tetrahedral centers, E/Z bonds, extended chirality

Molecular Properties

Basic — Formula, mass, atom/bond counts
Structural — Valence electrons, amide bonds, spiro/bridgehead atoms, ring classifications
Stereochemistry — Specified and unspecified stereocenter counting
Drug-likeness — Lipinski's Rule of Five, Veber rules, BBB penetration
Descriptors — TPSA, LogP, rotatable bonds, H-bond donors/acceptors
Ring analysis — Saturated/aliphatic/heterocyclic ring counts

Visualization

2D rendering — Publication-quality SVG with automatic layout
Smart positioning — Overlap-aware fused ring placement
Stereochemistry display — Wedge/hash bonds for chirality
Customizable — Element colors, bond styles, canvas size

Performance & Quality

⚡ Fast — Optimized coordinate generation, CSR graph for O(1) lookups
🔬 Accurate — 100% RDKit agreement on canonical SMILES (325/325 molecules)
✅ Well-tested — 2,093 passing tests including bulk RDKit comparisons
🎯 Production-ready — Used with real drugs, natural products, edge cases
📦 Lightweight — Minimal dependencies, works in browser and Node.js
🔒 TypeScript-first — Full type safety with excellent IDE support

Quick Start

npm install openchem
# or: bun add openchem

import { parseSMILES, renderSVG, Descriptors } from 'openchem';

// Parse a molecule
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];

// Render as SVG
const svg = renderSVG(aspirin);
console.log(svg.svg); // SVG markup ready for display

// Get all molecular properties at once
const props = Descriptors.all(aspirin);
console.log(props.formula);        // "C9H8O4"
console.log(props.mass);           // 180.16
console.log(props.logP);           // 1.19
console.log(props.lipinskiPass);   // true - aspirin is drug-like!

// Or get specific categories
const drugLike = Descriptors.drugLikeness(aspirin);
console.log(drugLike.lipinski.passes);     // true
console.log(drugLike.lipinski.violations); // []

HTML Playground

openchem includes an interactive HTML playground for testing SMILES parsing, molecular visualization, and descriptor calculation:

# Build the browser bundle and start a local server
bun run serve

# Then open http://localhost:3000/smiles-playground.html in your browser

The playground provides:

2D Structure Visualization — Clean SVG rendering of molecular structures
Molecular Descriptors — Formula, mass, TPSA, rotatable bonds, etc.
Drug-Likeness Checks — Lipinski's Rule of Five, Veber rules, BBB penetration
Interactive Examples — Pre-loaded molecules like aspirin, caffeine, ibuprofen

The playground automatically detects if the full openchem library is available and falls back to approximate calculations if needed.

Note: The HTML playground requires a web server to load the openchem library due to ES module security restrictions. Use bun run serve to start a local server, then open http://localhost:3000/smiles-playground.html in your browser.

Model Context Protocol (MCP) Server

The MCP server for AI assistant integration is now available as a separate package: @openchem/mcp

Quick Start

# Install MCP server
npm install -g @openchem/mcp

# Start server
openchem-mcp

# Server runs on http://localhost:3000

Connect to Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "openchem": {
      "url": "http://localhost:3000/mcp"
    }
  }
}

Restart Claude Desktop and try: "Analyze aspirin using SMILES CC(=O)Oc1ccccc1C(=O)O"

Available MCP Tools

analyze — Complete molecular analysis (40+ descriptors, drug-likeness, IUPAC name, optional rendering)
compare — Molecular similarity (Morgan fingerprints, Tanimoto similarity, property comparison)
search — Substructure matching (SMARTS patterns with match counts and indices)
render — 2D structure visualization (publication-quality SVG)
convert — Format conversion (canonical SMILES, IUPAC names, Murcko scaffolds)

Documentation

@openchem/mcp Package — Full MCP server documentation
MCP Integration Guide — Complete integration guide (Claude Desktop, custom clients, deployment)
MCP Server Reference — API documentation, tool schemas, examples

Code Examples

import { parseSMILES, generateSMILES, parseMolfile, generateMolfile, parseSDF, writeSDF } from 'openchem';

// Parse SMILES into molecule structure
const result = parseSMILES('CC(=O)O'); // acetic acid
console.log(result.molecules[0].atoms.length); // 4 atoms
console.log(result.molecules[0].bonds.length); // 3 bonds

// Generate canonical SMILES
const canonical = generateSMILES(result.molecules[0]);
console.log(canonical); // "CC(=O)O"

// Parse MOL file
const molContent = `
acetic acid
  openchem

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500   -1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
M  END
`;
const molResult = parseMolfile(molContent);
console.log(generateSMILES(molResult.molecule!)); // "CC(=O)O"

// Generate MOL file from SMILES
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const molfile = generateMolfile(aspirin.molecules[0], { title: 'aspirin' });
console.log(molfile); // Full MOL file with coordinates

// Parse SDF file
const sdfContent = `
  Mrv2311 02102409422D          


  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
>  <ID>
MOL001

>  <NAME>
Ethanol

$$$$
`;
const sdfResult = parseSDF(sdfContent);
console.log(sdfResult.records[0].molecule?.atoms.length); // 3
console.log(sdfResult.records[0].properties.NAME); // "Ethanol"

// Generate InChI from molecule
import { generateInChI, generateInChIKey } from 'openchem';

const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

// Generate InChIKey from the InChI string
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"


Morgan Fingerprints and Similarity

import { parseSMILES, computeMorganFingerprint, tanimotoSimilarity } from 'openchem';

// Generate fingerprints for similarity comparison
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O');

const fp1 = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
const fp2 = computeMorganFingerprint(ibuprofen.molecules[0], 2, 512);

// Calculate structural similarity
const similarity = tanimotoSimilarity(fp1, fp2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`); // ~45.2%
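
For readers new to the metric: Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in either one. The sketch below illustrates that formula on plain 0/1 arrays. `tanimoto` is a hypothetical helper written for illustration, not openchem's `tanimotoSimilarity`, and the array representation of a fingerprint is an assumption.

```typescript
// Illustrative only: Tanimoto on 0/1 arrays (not openchem's internal representation).
function tanimoto(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Fingerprints must have the same length');
  }
  let both = 0; // bits set in a AND b
  let either = 0; // bits set in a OR b
  for (let i = 0; i < a.length; i++) {
    const x = a[i] !== 0;
    const y = b[i] !== 0;
    if (x && y) both++;
    if (x || y) either++;
  }
  // Two empty fingerprints are treated as identical by convention here.
  return either === 0 ? 1 : both / either;
}

const fpA = [1, 1, 0, 1, 0, 0, 1, 0];
const fpB = [1, 0, 0, 1, 1, 0, 1, 0];
console.log(tanimoto(fpA, fpB)); // 0.6 (3 shared bits / 5 bits set in either)
```

A score of 1 means the two bit patterns are identical and 0 means they share no set bits.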

Murcko Scaffolds

Extract core molecular scaffolds for drug discovery and compound classification:

import { parseSMILES, getMurckoScaffold, getBemisMurckoFramework, generateSMILES } from 'openchem';

// Extract scaffold (rings + linkers, remove side chains)
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core

// Get generic framework (all atoms → carbon, all bonds → single)
const framework = getBemisMurckoFramework(ibuprofen);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane

// Compare scaffolds of similar drugs
import { haveSameScaffold } from 'openchem';
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
console.log(haveSameScaffold(ibuprofen, aspirin)); // true - both have benzene scaffold

Applications:

Compound library classification
Lead series identification
Scaffold hopping strategies
Fragment-based drug design
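
The idea behind scaffold extraction (keep rings and linkers, drop side chains) can be sketched as repeated removal of terminal atoms from the molecular graph. The code below is a simplified illustration on a plain adjacency list; `pruneSideChains` and `fromBonds` are hypothetical helpers, not openchem functions, and the sketch ignores element types and bond orders, which a real implementation may treat specially.

```typescript
// Illustrative only: Murcko-style pruning on a plain adjacency list.
// Atoms with at most one remaining neighbor are stripped repeatedly;
// what is left is the ring systems plus the linkers between them.
function pruneSideChains(adjacency: Map<number, number[]>): Set<number> {
  const degree = new Map<number, number>();
  for (const [atom, nbrs] of adjacency) degree.set(atom, nbrs.length);

  const kept = new Set<number>(adjacency.keys());
  const queue = [...kept].filter(a => (degree.get(a) ?? 0) <= 1);

  while (queue.length > 0) {
    const atom = queue.pop()!;
    if (!kept.has(atom)) continue;
    kept.delete(atom);
    for (const nbr of adjacency.get(atom) ?? []) {
      if (!kept.has(nbr)) continue;
      const d = (degree.get(nbr) ?? 0) - 1;
      degree.set(nbr, d);
      if (d <= 1) queue.push(nbr);
    }
  }
  return kept;
}

// Build an undirected adjacency list from a bond list.
function fromBonds(bonds: [number, number][]): Map<number, number[]> {
  const adj = new Map<number, number[]>();
  for (const [a, b] of bonds) {
    adj.set(a, [...(adj.get(a) ?? []), b]);
    adj.set(b, [...(adj.get(b) ?? []), a]);
  }
  return adj;
}

// Six-membered ring (atoms 0-5) with a methyl (6) and an ethyl chain (7-8).
const bonds: [number, number][] = [
  [0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0],
  [0, 6], [3, 7], [7, 8],
];
const scaffoldAtoms = pruneSideChains(fromBonds(bonds));
console.log([...scaffoldAtoms].sort((a, b) => a - b)); // [0, 1, 2, 3, 4, 5]
```

An acyclic molecule is pruned away entirely under this scheme, which matches the intuition that it has no ring scaffold.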

Tautomer Enumeration

Enumerate and score tautomers (keto-enol, imine-enamine, amide-imidol, etc.) with RDKit-compatible scoring:

import { parseSMILES, enumerateTautomers, generateSMILES } from 'openchem';

// Enumerate tautomers for acetylacetone (pentane-2,4-dione)
const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0];
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });

console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});

// Get canonical tautomer (highest scoring)
import { canonicalTautomer } from 'openchem';
const canonical = canonicalTautomer(mol);
console.log(`Canonical: ${generateSMILES(canonical)}`);

Supported tautomer types (26 rules, 100% RDKit coverage):

1,3 and 1,5 keto-enol (carbonyl ↔ enol, conjugated systems)
Imine-enamine (C=N ↔ C-NH, including aromatic special cases)
1,5/1,7/1,9/1,11 aromatic heteroatom H shift (pyrrole, indole, large heterocycles)
Furanone (lactone tautomerism in 5-membered rings)
Amide-imidol (N-C=O ↔ N=C-OH)
Lactam-lactim (cyclic amide ↔ cyclic imidate)
Nitro-aci-nitro, nitroso-oxime, oxim/nitroso via phenol
Thione-thiol (C=S ↔ C-SH)
Guanidine, tetrazole, imidazole (heterocycle tautomerism)
Phosphonic acid, sulfoxide (P/S heteroatom shifts)
Edge cases: keten/ynol, cyano/isocyanic acid, formamidinesulfinic acid, isocyanide

Scoring system (RDKit-compatible):

+250 per all-carbon aromatic ring (benzene)
+100 per heteroaromatic ring (pyridine)
+25 for benzoquinone patterns
+4 for oximes (C=N-OH)
+2 for carbonyls (C=O, N=O, P=O)
-10 per formal charge
-4 for aci-nitro forms
-1 per hydrogen on P, S, Se, Te
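
As a rough illustration of how the weights above combine, the sketch below sums them over pre-counted features. `TautomerFeatures` and `scoreTautomer` are hypothetical names, the feature counts are assumed to come from a separate perception step, and this is not openchem's scoring code.

```typescript
// Illustrative only: weights copied from the scoring list above.
interface TautomerFeatures {
  carbocyclicAromaticRings: number; // all-carbon aromatic rings
  heteroaromaticRings: number;
  benzoquinones: number;
  oximes: number;
  carbonyls: number; // C=O, N=O, P=O
  formalCharges: number;
  aciNitroGroups: number;
  hydrogensOnPSSeTe: number;
}

function scoreTautomer(f: TautomerFeatures): number {
  return (
    250 * f.carbocyclicAromaticRings +
    100 * f.heteroaromaticRings +
    25 * f.benzoquinones +
    4 * f.oximes +
    2 * f.carbonyls -
    10 * f.formalCharges -
    4 * f.aciNitroGroups -
    1 * f.hydrogensOnPSSeTe
  );
}

const none: TautomerFeatures = {
  carbocyclicAromaticRings: 0,
  heteroaromaticRings: 0,
  benzoquinones: 0,
  oximes: 0,
  carbonyls: 0,
  formalCharges: 0,
  aciNitroGroups: 0,
  hydrogensOnPSSeTe: 0,
};

// Acetylacetone: the diketo form has two C=O groups, the enol form has one.
const diketoScore = scoreTautomer({ ...none, carbonyls: 2 });
const enolScore = scoreTautomer({ ...none, carbonyls: 1 });
console.log(diketoScore, enolScore); // 4 2
```

Under these weights alone the diketo form outscores the enol form, so it would be the one selected as canonical.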

Applications:

Compound standardization for databases
Virtual screening preparation
pKa prediction support
Tautomer-aware structure searching

SVG Rendering

import { parseSMILES, renderSVG } from 'openchem';

// Render molecule as SVG
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const svgResult = renderSVG(caffeine.molecules[0], {
  width: 300,
  height: 200,
  showCarbonLabels: false,
  bondLength: 30,
});

console.log(svgResult.svg); // Complete SVG markup
console.log(`Canvas: ${svgResult.width}x${svgResult.height}`); // "300x200"

Testing & RDKit comparison

openchem has an extensive test suite (unit, integration, and RDKit comparison tests) that exercises parsing, generation, file round-trips, stereochemistry, aromatic perception, and molecular properties. Rather than rely on fragile hard-coded counts in the README, the project keeps comprehensive automated tests in the test/ folder and runs RDKit parity checks as part of the comparison test suite when RDKit is available.

Highlights:

Broad unit and integration coverage across parsers, generators, utils, and validators
RDKit comparison tests for canonical SMILES and round-trip fidelity (these run when RDKit is available in the test environment)
Tests are designed to be self-contained and to skip RDKit-specific checks when RDKit isn't present in the environment

For maintainers: update and run the test suite with bun test. Use RUN_RDKIT_BULK=1 to enable the heavier RDKit bulk comparisons when you have RDKit available.

Validation

openchem maintains broad automated test coverage across unit, integration, and RDKit comparison tests. The test/ directory contains the authoritative suite; maintainers can run bun test locally and enable the heavier RDKit comparison runs with RUN_RDKIT_BULK=1 when RDKit is available. Tests are designed to validate parsing, generation, round-tripping, stereochemistry, aromatic perception, and molecular properties without requiring hard-coded counts in the README.

Installation

npm install openchem

bun add openchem

pnpm add openchem

Usage

Example Files

For comprehensive working examples, see:

docs/examples/comprehensive-example.ts — All major features (SMILES, properties, IUPAC, InChI, SVG, SMARTS, fingerprints)
docs/examples/example-iupac.ts — IUPAC name generation and parsing (both directions)
docs/examples/example-aromaticity.ts — Aromaticity perception using Hückel's rule
docs/examples/example-drug-likeness.ts — Drug-likeness assessment (Lipinski, Veber, BBB)
docs/examples/example-murcko-scaffolds.ts — Murcko scaffold extraction and analysis
docs/examples/example-tautomers.ts — Tautomer enumeration and canonical selection
docs/examples/example-sdf-export.ts — SDF file generation

Run any example:

bun run docs/examples/comprehensive-example.ts

Running heavy RDKit comparisons

The repository contains two long-running RDKit comparison tests (the 10k SMILES suite and the bulk 300-SMILES suite). These tests are skipped by default to keep regular test runs fast.

To run them, set the RUN_RDKIT_BULK environment variable:

# Run heavy RDKit comparisons (rdkit-10k and rdkit-bulk)
RUN_RDKIT_BULK=1 bun test

Add RUN_VERBOSE=1 for more detailed RDKit reporting during the run.

Basic Parsing

import { parseSMILES } from 'openchem';

// Simple molecule
const ethanol = parseSMILES('CCO');
console.log(ethanol.molecules[0].atoms.length); // 3

// Check for errors
const result = parseSMILES('invalid');
if (result.errors.length > 0) {
  console.error('Parse errors:', result.errors);
}

// Complex molecule with stereochemistry
const lAlanine = parseSMILES('C[C@H](N)C(=O)O');
const chiralCenter = lAlanine.molecules[0].atoms.find(a => a.chiral);
console.log(chiralCenter?.chiral); // '@'

Molecular Properties

openchem provides comprehensive molecular property calculations for drug discovery and cheminformatics applications.

Basic Properties

import { 
  parseSMILES, 
  getMolecularFormula, 
  getMolecularMass, 
  getExactMass 
} from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const mol = aspirin.molecules[0];

// Get molecular formula (Hill notation)
const formula = getMolecularFormula(mol);
console.log(formula); // "C9H8O4"

// Get molecular mass (average atomic masses)
const mass = getMolecularMass(mol);
console.log(mass); // ~180.16

// Get exact mass (most abundant isotope)
const exactMass = getExactMass(mol);
console.log(exactMass); // 180.042
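
The two mass functions differ only in which atomic-mass table they use: average mass takes standard atomic weights, while exact mass takes the most abundant isotope of each element. The sketch below reproduces the arithmetic for aspirin (C9H8O4) with hard-coded values for C, H, and O. `massFromCounts` and both tables are hypothetical and exist only for this illustration.

```typescript
// Illustrative only: masses for C, H, O hard-coded to keep the sketch short.
const AVERAGE_MASS: Record<string, number> = { C: 12.011, H: 1.008, O: 15.999 };
const MONOISOTOPIC_MASS: Record<string, number> = {
  C: 12.0, // carbon-12
  H: 1.00782503, // hydrogen-1
  O: 15.99491462, // oxygen-16
};

function massFromCounts(
  counts: Record<string, number>,
  table: Record<string, number>,
): number {
  let total = 0;
  for (const [element, n] of Object.entries(counts)) {
    const m = table[element];
    if (m === undefined) throw new Error(`No mass for element ${element}`);
    total += m * n;
  }
  return total;
}

const aspirinCounts = { C: 9, H: 8, O: 4 };
console.log(massFromCounts(aspirinCounts, AVERAGE_MASS).toFixed(2)); // "180.16"
console.log(massFromCounts(aspirinCounts, MONOISOTOPIC_MASS).toFixed(3)); // "180.042"
```

Average mass is the value to use for weighing out material; exact mass is what a mass spectrometer reports for the most abundant isotopologue.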

Atom Counts and Structure

import {
  parseSMILES,
  getHeavyAtomCount,
  getHeteroAtomCount,
  getRingCount,
  getAromaticRingCount,
  getRingInfo
} from 'openchem';

const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O');
const mol = ibuprofen.molecules[0];

// Count heavy atoms (non-hydrogen)
console.log(getHeavyAtomCount(mol)); // 15 (13 carbons + 2 oxygens)

// Count heteroatoms (N, O, S, P, halogens, etc.)
console.log(getHeteroAtomCount(mol)); // 2

// Count total rings
console.log(getRingCount(mol)); // 1

// Count aromatic rings
console.log(getAromaticRingCount(mol)); // 1

// Get comprehensive ring information
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 1
console.log(ringInfo.rings()); // e.g. [[4,5,6,7,8,9]] - atom IDs in the ring (assuming 0-based IDs in SMILES order)

Drug-Likeness Properties

import { 
  parseSMILES,
  getFractionCSP3,
  getHBondDonorCount,
  getHBondAcceptorCount,
  getTPSA
} from 'openchem';

const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const mol = caffeine.molecules[0];

// Fraction of sp3 carbons (structural complexity)
console.log(getFractionCSP3(mol)); // 0.375 (3 of 8 carbons are sp3)

// H-bond donors (N-H, O-H)
console.log(getHBondDonorCount(mol)); // 0

// H-bond acceptors (N, O atoms)
console.log(getHBondAcceptorCount(mol)); // 6

// Topological polar surface area (Ų)
// Critical for predicting oral bioavailability and BBB penetration
console.log(getTPSA(mol)); // 61.82

TPSA for Drug Design

TPSA (Topological Polar Surface Area) is essential for predicting drug properties:

import { parseSMILES, getTPSA } from 'openchem';

// Oral bioavailability: TPSA < 140 Ų
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
console.log(getTPSA(aspirin.molecules[0])); // 63.60 ✓ Good oral availability

// Blood-brain barrier penetration: TPSA < 90 Ų
const morphine = parseSMILES('CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O');
console.log(getTPSA(morphine.molecules[0])); // 52.93 ✓ CNS-active
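
The two cutoffs used in the comments above can be wrapped in a small helper. This is a sketch only: `assessTPSA` is a hypothetical function, it takes a TPSA value that has already been computed, and the thresholds are rules of thumb rather than guarantees.

```typescript
// Illustrative only: applies the TPSA rules of thumb quoted above.
interface TpsaAssessment {
  oralAbsorptionLikely: boolean; // TPSA < 140 Å²
  bbbPenetrationLikely: boolean; // TPSA < 90 Å²
}

function assessTPSA(tpsa: number): TpsaAssessment {
  return {
    oralAbsorptionLikely: tpsa < 140,
    bbbPenetrationLikely: tpsa < 90,
  };
}

console.log(assessTPSA(63.6)); // { oralAbsorptionLikely: true, bbbPenetrationLikely: true }
console.log(assessTPSA(120)); // { oralAbsorptionLikely: true, bbbPenetrationLikely: false }
```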

Drug-Likeness Rule Checkers

import { 
  parseSMILES, 
  checkLipinskiRuleOfFive, 
  checkVeberRules, 
  checkBBBPenetration 
} from 'openchem';

// Lipinski's Rule of Five (oral drug-likeness)
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const lipinski = checkLipinskiRuleOfFive(aspirin.molecules[0]);
console.log(lipinski.passes); // true
console.log(lipinski.properties);
// { molecularWeight: 180.04, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 }

// Veber Rules (oral bioavailability)
const veber = checkVeberRules(aspirin.molecules[0]);
console.log(veber.passes); // true
console.log(veber.properties);
// { rotatableBonds: 3, tpsa: 63.60 }

// Blood-brain barrier penetration prediction
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const bbb = checkBBBPenetration(caffeine.molecules[0]);
console.log(bbb.likelyPenetration); // true (TPSA: 61.82 < 90)
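
For reference, the Rule of Five itself is four threshold checks. The sketch below applies the commonly cited cutoffs (molecular weight 500, 5 donors, 10 acceptors, logP 5) to already-computed properties. `lipinskiViolations` is a hypothetical helper; how openchem handles boundary values, or whether it tolerates a single violation, is not specified here.

```typescript
// Illustrative only: Rule of Five thresholds applied to precomputed properties.
interface LipinskiInput {
  molecularWeight: number;
  hbondDonors: number;
  hbondAcceptors: number;
  logP: number;
}

function lipinskiViolations(p: LipinskiInput): string[] {
  const violations: string[] = [];
  if (p.molecularWeight > 500) violations.push('molecular weight > 500');
  if (p.hbondDonors > 5) violations.push('H-bond donors > 5');
  if (p.hbondAcceptors > 10) violations.push('H-bond acceptors > 10');
  if (p.logP > 5) violations.push('logP > 5');
  return violations;
}

// Aspirin values taken from the example output above.
const aspirinProps: LipinskiInput = {
  molecularWeight: 180.04,
  hbondDonors: 1,
  hbondAcceptors: 4,
  logP: 1.31,
};
console.log(lipinskiViolations(aspirinProps)); // []

// A made-up large, lipophilic compound for contrast.
const bulky: LipinskiInput = {
  molecularWeight: 850,
  hbondDonors: 3,
  hbondAcceptors: 12,
  logP: 6.2,
};
console.log(lipinskiViolations(bulky).length); // 3
```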

Generating SMILES

import { parseSMILES, generateSMILES } from 'openchem';

// Generate canonical SMILES (default)
const input = 'CC(C)CC';
const parsed = parseSMILES(input);
const canonical = generateSMILES(parsed.molecules[0]);
console.log(canonical); // "CCC(C)C" - canonicalized

// Stereo normalization matches RDKit
const trans1 = parseSMILES('C\\C=C\\C'); // trans (down markers)
console.log(generateSMILES(trans1.molecules[0])); // "C/C=C/C" - normalized to up markers

const trans2 = parseSMILES('C/C=C/C'); // trans (up markers)
console.log(generateSMILES(trans2.molecules[0])); // "C/C=C/C" - already normalized

// Generate simple (non-canonical) SMILES
const simple = generateSMILES(parsed.molecules[0], false);
console.log(simple); // "CC(C)CC" - preserves input order

// Explicit canonical generation
const explicitCanonical = generateSMILES(parsed.molecules[0], true);
console.log(explicitCanonical); // "CCC(C)C"

// Handle multiple disconnected molecules
const mixture = parseSMILES('CCO.O'); // ethanol + water
const output = generateSMILES(mixture.molecules);
console.log(output); // "CCO.O"

SVG Rendering

Render molecules as 2D SVG structures with automatic coordinate generation. openchem provides deterministic layouts, fast performance, and excellent handling of rings, branches, and terminal atoms.

Basic SVG Rendering

import { parseSMILES, renderSVG } from 'openchem';

// Render from parsed molecule
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const result = renderSVG(aspirin.molecules[0]);
console.log(result.svg); // SVG string ready for display
console.log(result.width); // Canvas width
console.log(result.height); // Canvas height

// Or render directly from SMILES (if parsing is included)
const renderResult = renderSVG('CCO');
if (renderResult.errors.length === 0) {
  console.log(renderResult.svg);
}

// Render multiple molecules in a grid
const molecules = [
  parseSMILES('CC(=O)O').molecules[0],
  parseSMILES('CCO').molecules[0],
  parseSMILES('CC(C)C').molecules[0]
];
const gridResult = renderSVG(molecules);
console.log(gridResult.svg); // Multi-molecule grid

SVG Rendering Options

import { parseSMILES, renderSVG } from 'openchem';
import type { SVGRendererOptions } from 'openchem';

const benzene = parseSMILES('c1ccccc1');
const mol = benzene.molecules[0];

const options: SVGRendererOptions = {
  // Canvas sizing
  width: 400,
  height: 400,
  padding: 20,
  
  // Bond styling
  bondLineWidth: 2,
  bondLength: 40,
  bondColor: '#000000',
  
  // Atom & text styling
  fontSize: 14,
  fontFamily: 'Arial, sans-serif',
  showCarbonLabels: false, // Hide C labels for cleaner appearance
  showImplicitHydrogens: false, // Hide implicit hydrogens
  
  // Color mapping by element
  atomColors: {
    C: '#222222',
    N: '#3050F8',
    O: '#FF0D0D',
    S: '#E6C200',
    F: '#50FF50',
    Cl: '#1FF01F',
    Br: '#A62929',
    I: '#940094'
  },
  
  // Background
  backgroundColor: '#FFFFFF',
  
  // Stereochemistry display
  showStereoBonds: true,
  
  // Layout & coordinate generation
  kekulize: true, // Convert aromatic to alternating single/double bonds (default: true)
  moleculeSpacing: 60 // Spacing between molecules in grid layouts
};

const result = renderSVG(mol, options);
console.log(result.svg); // Custom-styled SVG

Using Pre-computed Coordinates

import { parseSMILES, renderSVG } from 'openchem';

const ethanol = parseSMILES('CCO');
const mol = ethanol.molecules[0];

// Provide your own atom coordinates (useful for custom layouts)
const customCoords = [
  { x: 0, y: 0 },    // C
  { x: 40, y: 0 },   // C
  { x: 80, y: 0 }    // O
];

const result = renderSVG(mol, {
  atomCoordinates: customCoords,
  width: 200,
  height: 100
});

console.log(result.svg);

Coordinate Generation Features

openchem's coordinate generator provides:

Deterministic layouts — Same molecule always produces same coordinates
Fast performance — Optimized for speed and quality
Perfect terminal atom placement — OH, NH₂, and other terminal groups extend radially
Ring system detection — Automatically detects and regularizes 5/6-membered rings, fused rings, spiro, and bridged systems
Zero atom overlaps — Intelligent substituent placement prevents collisions
Publication-quality output — Clean, chemically accurate 2D structures

import { parseSMILES, renderSVG } from 'openchem';

// Complex fused ring system
const naphthalene = parseSMILES('c1ccc2ccccc2c1');
const result = renderSVG(naphthalene.molecules[0], {
  width: 300,
  height: 300,
  bondLength: 35
});

console.log(result.svg);
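
To give a feel for what "regularizing" a ring involves, the sketch below places the atoms of an isolated ring on a regular polygon whose side equals the requested bond length. `regularRing` is a hypothetical helper written for this illustration; openchem's actual layout code also has to handle fused, spiro, and bridged systems, which this does not attempt.

```typescript
// Illustrative only: coordinates for one isolated ring as a regular polygon.
interface Point {
  x: number;
  y: number;
}

function regularRing(size: number, bondLength: number): Point[] {
  // Circumradius of a regular polygon with the given side length.
  const radius = bondLength / (2 * Math.sin(Math.PI / size));
  const points: Point[] = [];
  for (let i = 0; i < size; i++) {
    const angle = (2 * Math.PI * i) / size;
    points.push({ x: radius * Math.cos(angle), y: radius * Math.sin(angle) });
  }
  return points;
}

const hexagon = regularRing(6, 35);
const first = hexagon[0]!;
const second = hexagon[1]!;
const side = Math.hypot(second.x - first.x, second.y - first.y);
console.log(side.toFixed(2)); // "35.00"
```

Because the positions depend only on ring size and bond length, the same input always yields the same coordinates, which is the property behind deterministic layouts.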

Error Handling

import { renderSVG } from 'openchem';

const result = renderSVG('C');
if (result.errors.length > 0) {
  console.error('SVG rendering errors:', result.errors);
} else {
  console.log(result.svg);
}

SMARTS Matching

Match molecular patterns using SMARTS (SMILES Arbitrary Target Specification) notation.

import { parseSMILES, parseSMARTS, matchSMARTS } from 'openchem';

// Parse molecule and SMARTS pattern
const molecule = parseSMILES('CC(=O)Oc1ccccc1C(=O)O'); // aspirin
const pattern = parseSMARTS('[O;D1;H0]'); // Terminal oxygen with no attached hydrogens (carbonyl O)

// Find matching atoms
const matches = matchSMARTS(molecule.molecules[0], pattern);
console.log(matches.length); // 2 (two carbonyl oxygens)
console.log(matches); // One single-atom match per carbonyl oxygen, e.g. [[2], [11]] (atom indices)

// Example: Find aromatic rings
const aromaticRing = parseSMARTS('c1ccccc1'); // benzene pattern
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const ringMatches = matchSMARTS(aspirin.molecules[0], aromaticRing);
console.log(ringMatches.length); // 1 (one benzene ring)

// Example: Find carboxylic acid groups
const carboxylPattern = parseSMARTS('[C](=O)[O;H1]'); // COOH
const matches2 = matchSMARTS(aspirin.molecules[0], carboxylPattern);
console.log(matches2.length); // 1 (one carboxylic acid)

// Example: Find all heteroatoms
const heteroPattern = parseSMARTS('[!#6;!#1]'); // Any atom that is neither carbon nor hydrogen
const heteroMatches = matchSMARTS(aspirin.molecules[0], heteroPattern);
console.log(heteroMatches.length); // Number of heteroatoms

Kekulization

Convert aromatic molecules to alternating single/double bond representations (Kekulé structures).

import { parseSMILES, kekulize, generateSMILES } from 'openchem';

// Parse aromatic molecule
const benzene = parseSMILES('c1ccccc1');
const mol = benzene.molecules[0];

// Convert to Kekulé structure
const kekuleMol = kekulize(mol);

// Generate SMILES from Kekulé form
const kekuleSMILES = generateSMILES(kekuleMol);
console.log(kekuleSMILES); // "C1=CC=CC=C1" or similar alternating structure

// SVG rendering automatically kekulizes (unless disabled)
import { renderSVG } from 'openchem';

const result = renderSVG(mol, {
  kekulize: true  // default: true
});
// Rendered SVG shows alternating single/double bonds

LogP Calculation

Calculate LogP (partition coefficient) for predicting lipophilicity and membrane permeability.

import { parseSMILES, computeLogP, crippenLogP } from 'openchem';

const molecules = [
  'CC(=O)Oc1ccccc1C(=O)O',  // aspirin
  'CC(C)Cc1ccc(cc1)C(C)C(=O)O', // ibuprofen
  'CC(=O)Nc1ccc(O)cc1'    // acetaminophen
];

molecules.forEach(smiles => {
  const mol = parseSMILES(smiles).molecules[0];
  
  // Wildman-Crippen atom-contribution method
  const logP = computeLogP(mol);
  console.log(`${smiles.substring(0, 10)}... LogP: ${logP.toFixed(2)}`);
  
  // Alternative: crippenLogP (alias)
  const logP2 = crippenLogP(mol);
  console.log(`  Crippen LogP: ${logP2.toFixed(2)}`);
});

// LogP guidelines for drug design
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const caffeineMol = caffeine.molecules[0];
const logpValue = computeLogP(caffeineMol);

console.log(`Caffeine LogP: ${logpValue.toFixed(2)}`);
if (logpValue > 5) {
  console.log('⚠️ High LogP - may have poor water solubility');
} else if (logpValue < 0) {
  console.log('ℹ️ Low LogP - hydrophilic, may limit passive membrane permeability');
} else {
  console.log('✓ Optimal LogP - good balance of lipophilicity and hydrophilicity');
}

Molecule Structure

import { parseSMILES } from 'openchem';
import { BondType } from 'openchem';

const result = parseSMILES('C=C');
const mol = result.molecules[0];

// Access atoms
mol.atoms.forEach(atom => {
  console.log(`${atom.symbol} (id: ${atom.id})`);
  console.log(`  Aromatic: ${atom.aromatic}`);
  console.log(`  Charge: ${atom.charge}`);
  console.log(`  Hydrogens: ${atom.hydrogens}`);
});

// Access bonds
mol.bonds.forEach(bond => {
  console.log(`Bond ${bond.atom1}-${bond.atom2}`);
  console.log(`  Type: ${bond.type === BondType.DOUBLE ? 'DOUBLE' : 'SINGLE'}`);
});

Running Tests

# Run all tests (includes RDKit comparisons)
bun test

# Run with Node.js
npm test

# Run specific test file
bun test test/parser.test.ts

Note: RDKit comparison tests require @rdkit/rdkit package. Tests will automatically skip RDKit validations if the package is unavailable. For full validation, ensure you're running tests with Node.js (RDKit's WebAssembly may not work in all Bun versions).

API Reference

Quick Reference

openchem provides 40 functions organized into 8 categories:

Parsing & Generation (8)

parseSMILES - Parse SMILES strings
generateSMILES - Generate canonical/non-canonical SMILES
parseMolfile - Parse MOL files (V2000/V3000)
generateMolfile - Generate MOL files (V2000)
parseSDF - Parse SDF files with properties
writeSDF - Write SDF files with properties
generateInChI - Generate InChI strings from molecules
generateInChIKey - Generate InChIKey strings from InChI strings

Pattern Matching & Rendering (6)

renderSVG - Render molecules as 2D SVG structures
parseSMARTS - Parse SMARTS pattern strings
matchSMARTS - Find SMARTS pattern matches in molecules
kekulize - Convert aromatic to Kekulé structures
computeMorganFingerprint - Generate Morgan fingerprints from molecules
tanimotoSimilarity - Calculate Tanimoto similarity between fingerprints

Scaffold Analysis (5)

getMurckoScaffold - Extract Murcko scaffold (rings + linkers)
getBemisMurckoFramework - Generic scaffold (all C, single bonds)
getScaffoldTree - Hierarchical scaffold decomposition
getGraphFramework - Pure topology (all atoms → wildcard)
haveSameScaffold - Compare two molecules' scaffolds

Tautomer Analysis (2)

enumerateTautomers - Generate all tautomers with RDKit scoring
canonicalTautomer - Select highest-scoring canonical tautomer

Basic Properties (3)

getMolecularFormula - Hill notation formula
getMolecularMass - Average molecular mass
getExactMass - Exact mass (monoisotopic)

Lipophilicity (3)

computeLogP - Wildman-Crippen partition coefficient
crippenLogP - Alias for computeLogP
logP - Alternative LogP calculation

Structural Properties (8)

getHeavyAtomCount - Non-hydrogen atom count
getHeteroAtomCount - Heteroatom count (N, O, S, etc.)
getRingCount - Total ring count
getAromaticRingCount - Aromatic ring count
getRingInfo - Comprehensive ring information object
getFractionCSP3 - sp³ carbon fraction
getHBondDonorCount - H-bond donor count
getHBondAcceptorCount - H-bond acceptor count

Drug-Likeness (5)

getTPSA - Topological polar surface area
getRotatableBondCount - Rotatable bond count
checkLipinskiRuleOfFive - Lipinski's Rule of Five
checkVeberRules - Veber rules for bioavailability
checkBBBPenetration - Blood-brain barrier prediction

Detailed API Documentation

Parsing & Generation (6 functions)

`parseSMILES(smiles: string): ParseResult`

Parses a SMILES string into molecule structures.

Returns: ParseResult containing:

molecules: Molecule[] — Array of parsed molecules
errors: string[] — Parse/validation errors (empty if successful)

`generateSMILES(input: Molecule | Molecule[], canonical?: boolean): string`

Generates SMILES from molecule structure(s).

Parameters:

input — Single molecule or array of molecules
canonical — Generate canonical SMILES (default: true)

Returns: SMILES string (uses . to separate disconnected molecules)

Canonical SMILES features:

RDKit-compatible atom ordering using modified Morgan algorithm
Automatic E/Z double bond stereo normalization
Deterministic output for identical molecules
Preserves tetrahedral and double bond stereochemistry

`generateMolfile(molecule: Molecule, options?: MolGeneratorOptions): string`

Generates a MOL file (V2000 format) from a molecule structure. Matches RDKit's output structure for compatibility with cheminformatics tools.

Parameters:

molecule — Molecule structure to convert
options — Optional configuration:
- title?: string — Molecule title (default: empty)
- programName?: string — Program name in header (default: "openchem")
- dimensionality?: '2D' | '3D' — Coordinate system (default: "2D")
- comment?: string — Comment line (default: empty)

Returns: MOL file content as string with V2000 format

Features:

V2000 MOL format compatible with RDKit and other tools
2D coordinate generation using circular layout
Proper atom/bond type mapping (aromatic, charged, isotopic)
Stereochemistry support (chiral centers, E/Z double bonds)
Fixed-width formatting matching RDKit output

Example:

import { parseSMILES, generateMolfile } from 'openchem';

const result = parseSMILES('CCO');
const molfile = generateMolfile(result.molecules[0]);
console.log(molfile);
// Output: MOL file with header, atom coordinates, bond connectivity, etc.

`parseMolfile(input: string): MolfileParseResult`

Parses a MOL file (MDL Molfile format) into a molecule structure. Supports both V2000 and V3000 formats with comprehensive validation.

Parameters:

input — MOL file content as a string

Returns: MolfileParseResult containing:

molfile: MolfileData | null — Raw MOL file data structure (or null on critical errors)
molecule: Molecule | null — Parsed molecule with enriched properties (or null on errors)
errors: ParseError[] — Array of parse/validation errors (empty if successful)

Supported formats:

V2000: Classic fixed-width format (most common)
V3000: Extended format with additional features

Validation features:

Validates atom/bond counts match declared values
Checks bond references point to valid atoms
Validates numeric fields (coordinates, counts, bond types)
Detects malformed data (NaN, negative counts, invalid types)
Returns errors without throwing exceptions
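
To make the "collect errors instead of throwing" approach concrete, here is a minimal sketch of validating a V2000 counts line and bond references. `ParsedCounts`, `parseCountsLine`, and `checkBondRefs` are hypothetical names, and the error messages are invented for illustration; the real parser's checks and messages may differ.

```typescript
// Sketch only: hypothetical names, not openchem's parser.
interface ParsedCounts {
  atomCount: number;
  bondCount: number;
  version: string;
  errors: string[];
}

function parseCountsLine(line: string): ParsedCounts {
  const errors: string[] = [];
  // V2000 stores the atom and bond counts in the first two 3-character columns.
  const atomCount = Number.parseInt(line.slice(0, 3), 10);
  const bondCount = Number.parseInt(line.slice(3, 6), 10);
  const version = line.trim().split(/\s+/).pop() ?? "";

  if (Number.isNaN(atomCount) || atomCount < 0) errors.push("invalid atom count");
  if (Number.isNaN(bondCount) || bondCount < 0) errors.push("invalid bond count");
  if (version !== "V2000" && version !== "V3000") errors.push("missing version tag");

  return { atomCount, bondCount, version, errors };
}

/** Report bonds whose 1-based atom numbers fall outside the atom block. */
function checkBondRefs(atomCount: number, bonds: Array<[number, number]>): string[] {
  return bonds
    .filter(([a, b]) => a < 1 || b < 1 || a > atomCount || b > atomCount)
    .map(([a, b]) => `bond ${a}-${b} references a missing atom`);
}

const counts = parseCountsLine("  3  2  0  0  0  0  0  0  0  0999 V2000");
console.log(counts.atomCount, counts.bondCount, counts.errors.length); // 3 2 0
```

Returning an `errors` array lets callers decide whether a problem is fatal, which matches the result shape described above.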

Parsed features:

Atom coordinates (2D/3D)
Element symbols (organic subset and the rest of the periodic table)
Charges (both atom block and M CHG property)
Isotopes (both mass diff and M ISO property)
Bond types (single, double, triple, aromatic)
Stereochemistry (bond wedges, chiral centers)
Atom mapping (reaction mapping)

Limitations:

SGroups are parsed but not converted to molecule structure
Query atoms/bonds not supported

Example:

import { parseMolfile, generateSMILES } from 'openchem';

// The title must be the first line: a MOL header is exactly three lines (title, program, comment)
const molContent = `ethanol
  openchem

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
`;

const result = parseMolfile(molContent);
if (result.errors.length === 0) {
  console.log(result.molecule?.atoms.length); // 3
  console.log(result.molecule?.bonds.length); // 2
  
  // Convert to SMILES
  const smiles = generateSMILES(result.molecule!);
  console.log(smiles); // "CCO"
}

// Error handling
const invalid = parseMolfile('invalid content');
if (invalid.errors.length > 0) {
  console.error('Parse errors:', invalid.errors);
}

Round-trip workflow:

import { parseSMILES, generateMolfile, parseMolfile, generateSMILES } from 'openchem';

// SMILES → MOL → SMILES round-trip
const original = 'CC(=O)O'; // acetic acid
const mol = parseSMILES(original).molecules[0];
const molfile = generateMolfile(mol);
const parsed = parseMolfile(molfile);
const roundtrip = generateSMILES(parsed.molecule!);
console.log(roundtrip); // "CC(=O)O"

`parseSDF(input: string): SDFParseResult`

Parses an SDF (Structure-Data File) into molecule structures with associated properties. SDF files can contain multiple molecules, each with a MOL block and optional property fields.

Parameters:

input — SDF file content as a string

Returns: SDFParseResult containing:

records: SDFRecord[] — Array of parsed records
errors: ParseError[] — Global parse errors (empty if successful)

Record structure (SDFRecord):

molecule: Molecule | null — Parsed molecule (null on parse errors)
molfile: MolfileData | null — Raw MOL file data (null on parse errors)
properties: Record<string, string> — Property name-value pairs
errors: ParseError[] — Record-specific errors (empty if successful)

Features:

Multi-record parsing (splits on $$$$ delimiter)
Property block parsing (> <NAME> format)
Multi-line property values with blank line handling
Empty property names and values
Windows (CRLF) and Unix (LF) line endings
Tolerant parsing: continues after invalid records
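
The record and property handling listed above can be sketched in a few lines. `RawSDFRecord`, `splitSDF`, and `parseRecord` are hypothetical names for illustration only; the sketch leaves the MOL block unparsed and assumes a blank line ends a property value, which is a simplification of what a full parser does.

```typescript
// Sketch only: hypothetical names, not openchem exports.
interface RawSDFRecord {
  molBlock: string;
  properties: Record<string, string>;
}

function parseRecord(lines: string[]): RawSDFRecord {
  const end = lines.findIndex((l) => l.startsWith("M  END"));
  const molLines = end === -1 ? lines : lines.slice(0, end + 1);
  const rest = end === -1 ? [] : lines.slice(end + 1);
  const properties: Record<string, string> = {};
  let name: string | null = null;

  for (const line of rest) {
    const header = /^>\s*<([^>]*)>/.exec(line);
    if (header) {
      name = header[1] ?? ""; // empty names are accepted
      properties[name] = "";
    } else if (line.trim() === "") {
      name = null; // a blank line ends the current value
    } else if (name !== null) {
      properties[name] = properties[name] ? `${properties[name]}\n${line}` : line;
    }
  }
  return { molBlock: molLines.join("\n"), properties };
}

function splitSDF(input: string): RawSDFRecord[] {
  // Normalize CRLF to LF so both line-ending styles parse the same way.
  const lines = input.replace(/\r\n/g, "\n").split("\n");
  const records: RawSDFRecord[] = [];
  let current: string[] = [];

  for (const line of lines) {
    if (line.trim() === "$$$$") {
      records.push(parseRecord(current));
      current = [];
    } else {
      current.push(line);
    }
  }
  // Keep a trailing record that lacks a final $$$$ delimiter.
  if (current.some((l) => l.trim() !== "")) records.push(parseRecord(current));
  return records;
}
```

Splitting on the delimiter first means one malformed record cannot prevent the remaining records from being read.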

Example (single record):

import { parseSDF, generateSMILES } from 'openchem';

const sdfContent = `
  Mrv2311 02102409422D          


  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
>  <ID>
MOL001

>  <NAME>
Ethanol

>  <FORMULA>
C2H6O

$$$$
`;

const result = parseSDF(sdfContent);
if (result.errors.length === 0) {
  const record = result.records[0];
  console.log(record.molecule?.atoms.length); // 3
  console.log(record.properties.ID); // "MOL001"
  console.log(record.properties.NAME); // "Ethanol"
  console.log(record.properties.FORMULA); // "C2H6O"
  
  // Convert to SMILES
  const smiles = generateSMILES(record.molecule!);
  console.log(smiles); // "CCO"
}

// Error handling
if (result.records[0].errors.length > 0) {
  console.error('Record errors:', result.records[0].errors);
}

Example (multiple records):

import { parseSDF } from 'openchem';

const multiRecordSDF = `
  Mrv2311 02102409422D          


  1  0  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <ID>
1

>  <NAME>
Methane

$$$$

  Mrv2311 02102409422D          


  2  1  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
>  <ID>
2

>  <NAME>
Ethane

$$$$
`;

const result = parseSDF(multiRecordSDF);
console.log(result.records.length); // 2
console.log(result.records[0].properties.NAME); // "Methane"
console.log(result.records[1].properties.NAME); // "Ethane"

Round-trip workflow:

import { parseSMILES, writeSDF, parseSDF, generateSMILES } from 'openchem';

// SMILES → SDF → SMILES round-trip
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const sdfResult = writeSDF({
  molecule: aspirin,
  properties: { NAME: 'aspirin', FORMULA: 'C9H8O4' }
});

const parsed = parseSDF(sdfResult.sdf);
const roundtrip = generateSMILES(parsed.records[0].molecule!);
console.log(roundtrip); // "CC(=O)Oc1ccccc1C(=O)O"
console.log(parsed.records[0].properties.NAME); // "aspirin"

`generateInChI(molecule: Molecule): Promise<string>`

Generates an InChI (International Chemical Identifier) string from a molecule structure. InChI provides a unique, canonical representation of chemical structures that can be used for database lookups and structure comparison.

Returns: Promise resolving to InChI string

Example:

import { parseSMILES, generateInChI } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

`generateInChIKey(inchi: string): Promise<string>`

Generates an InChIKey (a hashed, fixed-length version of InChI) from an InChI string. InChIKeys are commonly used for database indexing and fast lookups.

Parameters:

inchi — InChI string to convert

Returns: Promise resolving to InChIKey string (27 characters)

Example:

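import { parseSMILES, generateInChI, generateInChIKey } from 'openchem';
const inchi = await generateInChI(parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0]); // aspirin
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"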

`writeSDF(records: SDFRecord | SDFRecord[], options?: SDFWriterOptions): SDFWriterResult`

Writes molecules to SDF (Structure-Data File) format. Supports single or multiple records with optional property data. SDF files are commonly used for storing chemical databases and transferring molecular data between cheminformatics tools.

Parameters:

records — Single record or array of records to write
options — Optional configuration (same as MolGeneratorOptions):
- title?: string — Default title for records (default: empty)
- programName?: string — Program name in headers (default: "openchem")
- dimensionality?: '2D' | '3D' — Coordinate system (default: "2D")
- comment?: string — Default comment (default: empty)

Returns: SDFWriterResult containing:

sdf: string — Complete SDF file content
errors: string[] — Any errors encountered (empty if successful)

Record format:

interface SDFRecord {
  molecule: Molecule;
  properties?: Record<string, string | number | boolean>;
}

SDF structure:

MOL block (V2000 format) for each molecule
Property fields (> <NAME>, value, blank line)
Record separator ($$$$)

Example (single molecule):

import { parseSMILES, writeSDF } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const result = writeSDF({
  molecule: aspirin.molecules[0],
  properties: {
    NAME: 'aspirin',
    MOLECULAR_FORMULA: 'C9H8O4',
    MOLECULAR_WEIGHT: 180.16
  }
});

console.log(result.sdf);
// Output: SDF file with MOL block + properties + $$$$

Example (multiple molecules):

import { parseSMILES, writeSDF } from 'openchem';

const drugs = [
  { smiles: 'CC(=O)Oc1ccccc1C(=O)O', name: 'aspirin' },
  { smiles: 'CC(C)Cc1ccc(cc1)C(C)C(=O)O', name: 'ibuprofen' },
  { smiles: 'CC(=O)Nc1ccc(O)cc1', name: 'acetaminophen' }
];

const records = drugs.map(drug => {
  const mol = parseSMILES(drug.smiles).molecules[0];
  return {
    molecule: mol,
    properties: {
      NAME: drug.name,
      SMILES: drug.smiles
    }
  };
});

const result = writeSDF(records, { programName: 'my-drug-tool' });
console.log(result.sdf);
// Output: Multi-record SDF with all 3 molecules

Property formatting:

Strings: Written as-is
Numbers: Converted to strings
Booleans: "true" or "false"
Property names are case-sensitive
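
As an illustration of these formatting rules, the sketch below serializes a property map into SDF data items followed by the record separator. `formatDataItems` is a hypothetical helper, not part of the openchem API, and the exact spacing of the `>  <NAME>` header is an assumption based on the parser examples in this document.

```typescript
// Sketch only: hypothetical helper, not openchem's writer.
type PropertyValue = string | number | boolean;

function formatDataItems(properties: Record<string, PropertyValue>): string {
  const lines: string[] = [];
  for (const [name, value] of Object.entries(properties)) {
    lines.push(`>  <${name}>`); // property header, name kept case-sensitive
    lines.push(String(value)); // numbers and booleans become strings
    lines.push(""); // a blank line terminates the value
  }
  lines.push("$$$$"); // record separator
  return lines.join("\n") + "\n";
}

console.log(formatDataItems({ NAME: "aspirin", ACTIVE: true }));
```

In a complete record this text would follow the MOL block's `M  END` line.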

Compatibility:

Output compatible with RDKit, OpenBabel, ChemDraw, and other tools
Standard SDF format (V2000 MOL blocks)
Properties follow MDL SDF specification

Pattern Matching & Rendering (6 functions)

`renderSVG(input: string | Molecule | Molecule[] | ParseResult, options?: SVGRendererOptions): SVGRenderResult`

Renders molecules as 2D SVG structures, generating coordinates automatically and using webcola to prevent atom collisions.

Parameters:

input — SMILES string, single molecule, array of molecules, or ParseResult
options — Optional rendering configuration (see SVGRendererOptions below)

Returns: SVGRenderResult containing:

svg: string — SVG markup ready for display
width: number — Canvas width in pixels
height: number — Canvas height in pixels
errors: string[] — Any rendering errors (empty if successful)

SVGRendererOptions:

width?: number — Canvas width (default: 300)
height?: number — Canvas height (default: 300)
bondLineWidth?: number — Bond line thickness (default: 2)
bondLength?: number — Target bond length in pixels (default: 40)
fontSize?: number — Atom label font size (default: 12)
fontFamily?: string — Font family (default: "Arial, sans-serif")
padding?: number — Canvas padding (default: 20)
showCarbonLabels?: boolean — Show C atom labels (default: false)
showImplicitHydrogens?: boolean — Show implicit hydrogens (default: false)
kekulize?: boolean — Convert aromatic to Kekulé (default: true)
atomColors?: Record<string, string> — Element-specific colors
backgroundColor?: string — Background color (default: "#FFFFFF")
bondColor?: string — Bond color (default: "#000000")
showStereoBonds?: boolean — Show wedge/hash bonds (default: true)
atomCoordinates?: AtomCoordinates[] — Pre-computed coordinates
webcolaIterations?: number — Collision prevention iterations (default: 100)
deterministicChainPlacement?: boolean — Deterministic layouts (default: false)
moleculeSpacing?: number — Space between molecules in grid (default: 60)

Features:

Automatic 2D coordinate generation with collision prevention
Ring regularization for 5- and 6-membered rings
Fused ring system handling
Stereochemistry display (wedge/hash bonds)
Element-specific atom coloring
Publication-quality output

`parseSMARTS(smarts: string): ParseResult`

Parses a SMARTS pattern string into a pattern molecule structure.

Returns: ParseResult containing:

molecules: Molecule[] — Array with pattern molecule
errors: string[] — Parse errors (empty if successful)

SMARTS support:

Logical operators: ! (not), & (and), , (or)
Atom properties: [D1] (degree), [H1] (total hydrogen count), [v3] (valence)
Connectivity: [#6X4] (carbon with four total connections, implicit hydrogens included)
Aromatic matching: [c] (aromatic carbon), [a] (any aromatic atom)

`matchSMARTS(molecule: Molecule, pattern: ParseResult): number[][]`

Finds all matches of a SMARTS pattern in a molecule.

Parameters:

molecule — Target molecule to search
pattern — SMARTS pattern (from parseSMARTS())

Returns: Array of matches, where each match is an array of atom indices

Example:

import { parseSMILES, parseSMARTS, matchSMARTS } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const carbonyl = parseSMARTS('[C](=O)').molecules[0];

const matches = matchSMARTS(aspirin, carbonyl);
// matches: [[1, 2], [10, 11]] (two carbonyl groups, 0-based atom indices in SMILES order)

`kekulize(molecule: Molecule): Molecule`

Converts aromatic molecules to alternating single/double bond (Kekulé) representation.

Returns: New molecule with aromatic bonds replaced by alternating single/double bonds

Example:

import { parseSMILES, kekulize, generateSMILES } from 'openchem';

const benzene = parseSMILES('c1ccccc1');
const kek = kekulize(benzene.molecules[0]);
console.log(generateSMILES(kek)); // "C1=CC=CC=C1"

`computeMorganFingerprint(molecule: Molecule, radius?: number, fpSize?: number): Uint8Array`

Generates a Morgan fingerprint (ECFP-like) for molecular similarity searching and compound classification. Uses a modified Morgan algorithm with atom typing and circular neighborhoods.

Parameters:

molecule — Molecule to fingerprint
radius — Fingerprint radius (default: 2, equivalent to ECFP4)
fpSize — Fingerprint size in bits (default: 2048, RDKit standard)

Returns: Uint8Array containing the fingerprint bits

Example:

import { parseSMILES, computeMorganFingerprint } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const fingerprint = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
console.log(fingerprint.length); // 64 bytes (512 bits / 8 bits per byte)

`tanimotoSimilarity(fp1: Uint8Array, fp2: Uint8Array): number`

Calculates the Tanimoto similarity coefficient between two Morgan fingerprints. Measures structural similarity on a scale from 0 (no similarity) to 1 (identical).

Parameters:

fp1 — First fingerprint
fp2 — Second fingerprint

Returns: Similarity score between 0 and 1

Example:

const similarity = tanimotoSimilarity(fingerprint1, fingerprint2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`);
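
For reference, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below computes it over bit-packed `Uint8Array` values. `popcount` and `tanimotoPacked` are hypothetical names, the 8-bits-per-byte packing is an assumption, and returning 0 for two empty fingerprints is a convention chosen here; none of this is claimed to be openchem's implementation.

```typescript
// Sketch only: hypothetical helpers, assuming 8 fingerprint bits per byte.

/** Count the set bits in one byte. */
function popcount(byte: number): number {
  let count = 0;
  for (let b = byte & 0xff; b !== 0; b >>= 1) count += b & 1;
  return count;
}

/** Tanimoto = |A AND B| / |A OR B| over all bits. */
function tanimotoPacked(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("fingerprints must have equal length");
  let both = 0;
  let either = 0;
  for (let i = 0; i < a.length; i++) {
    both += popcount(a[i] & b[i]);
    either += popcount(a[i] | b[i]);
  }
  return either === 0 ? 0 : both / either; // convention: two empty fingerprints score 0
}

// One shared bit out of three distinct bits gives 1/3.
console.log(tanimotoPacked(Uint8Array.of(0b1100), Uint8Array.of(0b0110)));
```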

Scaffold Analysis (5 functions)

`getMurckoScaffold(molecule: Molecule, options?: MurckoOptions): Molecule`

Extracts the Murcko scaffold from a molecule — the core ring systems and linkers connecting them, with all terminal side chains removed. This is the standard scaffold used in medicinal chemistry for compound classification.

Parameters:

molecule — Molecule to analyze
options.includeLinkers — Include linker atoms between rings (default: true)

Returns: New Molecule containing only the scaffold

Example:

import { parseSMILES, getMurckoScaffold, generateSMILES } from 'openchem';

const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core

`getBemisMurckoFramework(molecule: Molecule): Molecule`

Generates a generic Bemis-Murcko framework — the scaffold with all atoms converted to carbon and all bonds converted to single bonds. Useful for identifying compounds with similar topology but different heteroatom patterns.

Returns: New Molecule with generic framework

Example:

import { parseSMILES, getBemisMurckoFramework, generateSMILES } from 'openchem';

const pyridine = parseSMILES('c1ccncc1').molecules[0];
const framework = getBemisMurckoFramework(pyridine);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane

`getScaffoldTree(molecule: Molecule): Molecule[]`

Generates a hierarchical scaffold tree by iteratively removing rings from the Murcko scaffold. Returns scaffolds ordered from most specific (full scaffold) to least specific (single ring).

Returns: Array of Molecule objects representing scaffolds at different levels

Example:

import { parseSMILES, getScaffoldTree, generateSMILES } from 'openchem';

const mol = parseSMILES('c1ccc2ccccc2c1').molecules[0]; // Naphthalene
const tree = getScaffoldTree(mol);
console.log(tree.length); // 2 levels: full naphthalene, then single benzene
tree.forEach((scaffold, idx) => {
  console.log(`Level ${idx}: ${generateSMILES(scaffold)}`);
});

`getGraphFramework(molecule: Molecule): Molecule`

Generates a pure topological framework with all atoms converted to wildcard atoms (*). This represents the molecular graph structure without any atom type information.

Returns: New Molecule with graph framework

Example:

import { parseSMILES, getGraphFramework, generateSMILES } from 'openchem';

const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C').molecules[0];
const graph = getGraphFramework(caffeine);
console.log(generateSMILES(graph)); // "*1*=**2=*1*(*)*(*)*2*" - pure topology

`haveSameScaffold(mol1: Molecule, mol2: Molecule): boolean`

Compares two molecules to determine if they share the same Murcko scaffold. Useful for compound series analysis and lead identification.

Returns: true if scaffolds match, false otherwise

Example:

import { parseSMILES, haveSameScaffold } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
console.log(haveSameScaffold(aspirin, ibuprofen)); // true - both benzene scaffold

Tautomer Analysis (2 functions)

`enumerateTautomers(molecule: Molecule, options?: TautomerOptions): TautomerResult[]`

Enumerates all tautomers for a molecule using transform-based enumeration with RDKit-compatible scoring.

Options:

maxTautomers?: number — Maximum tautomers to generate (default: 256)
maxTransforms?: number — Maximum transform operations (default: 1024)
phases?: number[] — Rule phases to apply (default: [1, 2, 3])
useFingerprintDedup?: boolean — Use fingerprint deduplication (default: true)

Returns: Array of TautomerResult objects with:

smiles: string — SMILES representation
molecule: Molecule — Molecule object
score: number — Stability score (higher = more stable)
ruleIds: string[] — Applied transformation rules

Scoring system (RDKit-inspired):

+250 per all-carbon aromatic ring
+100 per heteroaromatic ring
+25 for benzoquinone
+4 for oximes (C=N-OH)
+2 for carbonyls (C=O, N=O, P=O)
-10 per formal charge
-4 for aci-nitro
-1 per H on P, S, Se, Te
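
The weights above combine as a simple linear sum. The sketch below assumes the structural features have already been counted; `TautomerFeatures` and `scoreTautomer` are hypothetical names, and counting the features (which needs substructure matching) is left out. It shows the arithmetic only and is not the library's scoring code.

```typescript
// Sketch only: hypothetical names; feature counting is assumed to happen elsewhere.
interface TautomerFeatures {
  carbocyclicAromaticRings: number;
  heteroaromaticRings: number;
  benzoquinones: number;
  oximes: number;
  carbonyls: number; // C=O, N=O, P=O
  formalCharges: number;
  aciNitroGroups: number;
  hydrogensOnPSSeTe: number;
}

const NO_FEATURES: TautomerFeatures = {
  carbocyclicAromaticRings: 0,
  heteroaromaticRings: 0,
  benzoquinones: 0,
  oximes: 0,
  carbonyls: 0,
  formalCharges: 0,
  aciNitroGroups: 0,
  hydrogensOnPSSeTe: 0,
};

function scoreTautomer(f: TautomerFeatures): number {
  return (
    250 * f.carbocyclicAromaticRings +
    100 * f.heteroaromaticRings +
    25 * f.benzoquinones +
    4 * f.oximes +
    2 * f.carbonyls -
    10 * f.formalCharges -
    4 * f.aciNitroGroups -
    1 * f.hydrogensOnPSSeTe
  );
}

// A diketone (two C=O) outscores its mono-enol (one C=O).
console.log(scoreTautomer({ ...NO_FEATURES, carbonyls: 2 })); // 4
console.log(scoreTautomer({ ...NO_FEATURES, carbonyls: 1 })); // 2
```

The large aromatic-ring weights explain why, for example, a phenol form (one all-carbon aromatic ring, 250) would outrank a keto form that breaks aromaticity but gains a carbonyl (2).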

Example:

import { parseSMILES, enumerateTautomers } from 'openchem';

const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0]; // acetylacetone
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });

console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});
// 1. CC(=O)CC(=O)C (score: 4) - diketo form
// 2. CC(=O)C=C(C)O (score: 2) - monoenol form
// 3. CC(O)=CC(=O)C (score: 2) - monoenol form

`canonicalTautomer(molecule: Molecule): Molecule`

Selects the canonical (most stable) tautomer based on scoring.

Returns: The highest-scoring tautomer as a Molecule

Example:

import { parseSMILES, canonicalTautomer, generateSMILES } from 'openchem';

const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0];
const canonical = canonicalTautomer(mol);
console.log(generateSMILES(canonical)); // "CC(=O)CC(=O)C" - diketo form preferred

Lipophilicity (3 functions)

`computeLogP(molecule: Molecule): number`

Calculates LogP (the octanol-water partition coefficient) using the Wildman-Crippen method. LogP is a measure of lipophilicity and is widely used to estimate membrane permeability.

Returns: LogP value as a number

Interpretation:

LogP < 0: Hydrophilic (water-loving)
0 ≤ LogP ≤ 5: Optimal range for most drugs
LogP > 5: Lipophilic (fat-loving), may have poor water solubility

Example:

import { parseSMILES, computeLogP } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
console.log(computeLogP(aspirin)); // 1.31 (good bioavailability)

`crippenLogP(molecule: Molecule): number`

Alias for computeLogP(). Alternative name for the Wildman-Crippen LogP calculation.

`logP(molecule: Molecule): number`

Alternative LogP calculation method. May use different fragment contributions than Crippen.

Basic Properties (3 functions)

`getMolecularFormula(molecule: Molecule): string`

Returns the molecular formula in Hill notation (C first, then H, then alphabetical).

Example: C9H8O4 for aspirin
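
Hill ordering can be sketched as follows, starting from element counts. `hillFormula` is a hypothetical helper, not an openchem export. It also applies the standard Hill convention that, when no carbon is present, every element (hydrogen included) is listed alphabetically.

```typescript
// Sketch only: hypothetical helper that orders element counts in Hill notation.
function hillFormula(counts: Record<string, number>): string {
  const symbols = Object.keys(counts).filter((s) => counts[s] > 0);
  const hasCarbon = symbols.includes("C");
  // With carbon present: C first, H second, everything else alphabetical.
  // Without carbon: plain alphabetical order.
  const rank = (s: string): number => (hasCarbon ? (s === "C" ? 0 : s === "H" ? 1 : 2) : 2);
  symbols.sort((a, b) => rank(a) - rank(b) || (a < b ? -1 : a > b ? 1 : 0));
  return symbols.map((s) => (counts[s] === 1 ? s : `${s}${counts[s]}`)).join("");
}

console.log(hillFormula({ O: 4, H: 8, C: 9 })); // "C9H8O4"
console.log(hillFormula({ O: 1, H: 2 })); // "H2O"
```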

`getMolecularMass(molecule: Molecule): number`

Returns the molecular mass using average atomic masses from the periodic table.

Example: 180.16 for aspirin

`getExactMass(molecule: Molecule): number`

Returns the exact (monoisotopic) mass, computed from the most abundant isotope of each element.

Example: 180.042 for aspirin
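
The difference between average molecular mass and exact (monoisotopic) mass is easiest to see with a worked calculation. The sketch below uses small hand-written mass tables for C, H, and O only; `AVERAGE_MASS`, `MONOISOTOPIC_MASS`, and `sumMass` are hypothetical names and are not taken from openchem's element data.

```typescript
// Sketch only: tiny hand-written mass tables for illustration.
const AVERAGE_MASS: Record<string, number> = { C: 12.011, H: 1.008, O: 15.999 };
const MONOISOTOPIC_MASS: Record<string, number> = { C: 12.0, H: 1.00782503, O: 15.99491462 };

function sumMass(counts: Record<string, number>, table: Record<string, number>): number {
  let total = 0;
  for (const [symbol, n] of Object.entries(counts)) {
    const mass = table[symbol];
    if (mass === undefined) throw new Error(`no mass for element ${symbol}`);
    total += n * mass;
  }
  return total;
}

const aspirin = { C: 9, H: 8, O: 4 };
console.log(sumMass(aspirin, AVERAGE_MASS).toFixed(3)); // "180.159"
console.log(sumMass(aspirin, MONOISOTOPIC_MASS).toFixed(4)); // "180.0423"
```

The average mass is the value used for weighing out a bulk sample, while the monoisotopic mass is what a mass spectrometer reports for the most abundant isotopologue.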

Structural Properties (8 functions)

`getHeavyAtomCount(molecule: Molecule): number`

Returns the count of non-hydrogen atoms.

Example: 13 for aspirin (C9H8O4), 15 for ibuprofen (C13H18O2)

`getHeteroAtomCount(molecule: Molecule): number`

Returns the count of heteroatoms (any atom except C and H). Includes N, O, S, P, halogens, etc.

Example: 4 for aspirin (four oxygen atoms: two in the ester group, two in the carboxylic acid)

`getRingCount(molecule: Molecule): number`

Returns the total number of rings in the molecule using cycle detection.

Example: 2 for naphthalene (2 fused rings)

`getAromaticRingCount(molecule: Molecule): number`

Returns the number of aromatic rings.

Example: 1 for benzene, 2 for naphthalene

`getRingInfo(molecule: Molecule): RingInformation`

Returns a comprehensive ring information object providing access to SSSR (Smallest Set of Smallest Rings) and ring membership queries. Similar to RDKit's GetRingInfo() functionality.

Methods:

numRings() - Number of rings in SSSR
rings() - Array of rings (each ring is atom ID array)
isAtomInRing(atomIdx) - Check if atom is in any ring
isBondInRing(atom1, atom2) - Check if bond is in any ring
atomRingMembership(atomIdx) - Ring membership count for atom ([Rn] in SMARTS)
atomRings(atomIdx) - All rings containing specific atom
ringAtoms(ringIdx) - Atoms in specific ring
ringBonds(ringIdx) - Bonds in specific ring

Example:

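import { parseSMILES, getRingInfo } from 'openchem';

const mol = parseSMILES('c1ccc2ccccc2c1').molecules[0]; // naphthalene
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 2
console.log(ringInfo.isAtomInRing(5)); // true
console.log(ringInfo.atomRingMembership(3)); // 2 (ring-fusion atom shared by both rings)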

`getFractionCSP3(molecule: Molecule): number`

Returns the fraction of sp³-hybridized carbons (saturated carbons) relative to total carbons. Higher values indicate greater structural complexity and 3D character. Range: 0.0 to 1.0.

Example: 0.375 for caffeine (3 of 8 carbons), 0.46 for ibuprofen (6 of 13 carbons)

`getHBondDonorCount(molecule: Molecule): number`

Returns the count of hydrogen bond donors (N-H and O-H groups).

Example: 1 for aspirin (carboxylic acid O-H), 0 for caffeine

`getHBondAcceptorCount(molecule: Molecule): number`

Returns the count of hydrogen bond acceptors (N and O atoms).

Example: 4 for aspirin, 6 for caffeine

Drug-Likeness Properties (5 functions)

`getTPSA(molecule: Molecule): number`

Returns the Topological Polar Surface Area in Ų (square Ångströms) using the Ertl et al. fragment-based algorithm. TPSA is a key descriptor for predicting drug absorption and bioavailability.

Guidelines:

TPSA < 140 Ų: Good oral bioavailability
TPSA < 90 Ų: Likely blood-brain barrier penetration
TPSA > 140 Ų: Poor membrane permeability

Example: 63.60 for aspirin (good oral availability), 52.93 for morphine (CNS-active)

`getRotatableBondCount(molecule: Molecule): number`

Returns the count of rotatable bonds (single non-ring bonds between non-terminal heavy atoms). Used in Veber rules for predicting oral bioavailability.

Example: 3 for aspirin, 4 for ibuprofen

`checkLipinskiRuleOfFive(molecule: Molecule): LipinskiResult`

Evaluates Lipinski's Rule of Five for oral drug-likeness. Returns result object with:

passes: boolean indicating if all rules pass
violations: array of violation messages
properties: { molecularWeight, hbondDonors, hbondAcceptors, logP }

Rules:

Molecular weight ≤ 500 Da
H-bond donors ≤ 5
H-bond acceptors ≤ 10
LogP ≤ 5
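
The four thresholds translate directly into code. The sketch below takes precomputed property values rather than a molecule; `LipinskiInput` and `checkRuleOfFive` are hypothetical names, and the violation messages are invented for illustration rather than copied from the library.

```typescript
// Sketch only: hypothetical names; property values are assumed to be precomputed.
interface LipinskiInput {
  molecularWeight: number;
  hbondDonors: number;
  hbondAcceptors: number;
  logP: number;
}

function checkRuleOfFive(p: LipinskiInput): { passes: boolean; violations: string[] } {
  const violations: string[] = [];
  if (p.molecularWeight > 500) violations.push(`molecular weight ${p.molecularWeight} > 500 Da`);
  if (p.hbondDonors > 5) violations.push(`H-bond donors ${p.hbondDonors} > 5`);
  if (p.hbondAcceptors > 10) violations.push(`H-bond acceptors ${p.hbondAcceptors} > 10`);
  if (p.logP > 5) violations.push(`LogP ${p.logP} > 5`);
  return { passes: violations.length === 0, violations };
}

// Aspirin values as quoted elsewhere in this document.
const aspirinCheck = checkRuleOfFive({ molecularWeight: 180.16, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 });
console.log(aspirinCheck.passes); // true
```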

`checkVeberRules(molecule: Molecule): VeberResult`

Evaluates Veber rules for oral bioavailability. Returns result object with:

passes: boolean indicating if all rules pass
violations: array of violation messages
properties: { rotatableBonds, tpsa }

Rules:

Rotatable bonds ≤ 10
TPSA ≤ 140 Ų

`checkBBBPenetration(molecule: Molecule): BBBResult`

Predicts blood-brain barrier penetration. Returns result object with:

likelyPenetration: boolean (true if TPSA < 90 Ų)
tpsa: TPSA value
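
The Veber thresholds and the TPSA-based blood-brain barrier heuristic can be sketched the same way. `checkVeber` and `likelyCrossesBBB` are hypothetical helpers working on precomputed values; they mirror the rules stated above and are not openchem's own functions.

```typescript
// Sketch only: hypothetical helpers operating on precomputed descriptors.
function checkVeber(rotatableBonds: number, tpsa: number): { passes: boolean; violations: string[] } {
  const violations: string[] = [];
  if (rotatableBonds > 10) violations.push(`rotatable bonds ${rotatableBonds} > 10`);
  if (tpsa > 140) violations.push(`TPSA ${tpsa} > 140`);
  return { passes: violations.length === 0, violations };
}

/** Heuristic from the text: TPSA below 90 square angstroms suggests BBB penetration. */
function likelyCrossesBBB(tpsa: number): boolean {
  return tpsa < 90;
}

// Aspirin values as quoted elsewhere in this document: 3 rotatable bonds, TPSA 63.60.
console.log(checkVeber(3, 63.6).passes); // true
console.log(likelyCrossesBBB(63.6)); // true
```

Both checks are coarse filters on a single descriptor or two, so they are best read as screening hints rather than predictions.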

TypeScript Types

interface Molecule {
  atoms: Atom[];
  bonds: Bond[];
}

interface Atom {
  id: number;
  symbol: string;
  aromatic: boolean;
  hydrogens: number;
  charge: number;
  isotope: number | null;
  chiral: string | null;
  atomClass: number | null;
  isBracket: boolean;
  atomicNumber: number;
}

interface Bond {
  atom1: number;
  atom2: number;
  type: BondType;
  stereo: StereoType;
}

enum BondType {
  SINGLE = 1,
  DOUBLE = 2,
  TRIPLE = 3,
  QUADRUPLE = 4,
  AROMATIC = 5
}

Performance

openchem is designed for production use with real-world performance:

Parsing: ~1-10ms per molecule (depending on complexity)
Generation: ~1-5ms per molecule
Memory: Minimal overhead, compact AST representation
Zero dependencies: No external runtime dependencies

Benchmark with 325 diverse molecules including commercial drugs: Average parse + generate round-trip < 5ms

Architecture

Molecule Enrichment System

openchem uses a post-processing enrichment system that pre-computes expensive molecular properties during parsing. This design significantly improves performance for downstream property queries while maintaining code simplicity.

Why Pre-compute Properties?

Molecular property calculations like ring finding, hybridization determination, and rotatable bond classification are computationally expensive (O(n²) complexity). Without pre-computation:

Redundant calculations: Ring finding would run every time you query ring count, aromatic rings, or check if atoms/bonds are in rings
Performance penalty: Property queries would dominate runtime, especially for drug-likeness checks that need multiple properties
Code complexity: Every property function would need to duplicate expensive logic

The Solution: Compute once during parsing, cache results, use everywhere.

Key Components

types.ts — Extended with optional cached properties on Atom, Bond, and Molecule interfaces
src/utils/molecule-enrichment.ts — Post-processing module that enriches molecules after parsing
src/parser.ts — Calls enrichMolecule() after validation phase at line 451
src/utils/molecular-properties.ts — Uses cached properties when available, falls back to computation

Cached Properties

Atom: degree (neighbor count), isInRing, ringIds[], hybridization (sp/sp²/sp³)
Bond: isInRing, ringIds[], isRotatable
Molecule: rings[][] (all rings as atom IDs), ringInfo (lookup maps)

Performance Impact

Benchmark Results (10,000 molecules, 7 properties each):

Parse time: 1.22 ms/molecule (includes enrichment)
Property query time: 0.006 ms/molecule (0.5% of parse time)
Rotatable bond queries: ~3.1 million ops/second (simple array filter vs 47-line calculation)

Complexity Improvements:

Ring finding: Once per molecule (O(n²)) → subsequent queries O(1)
Rotatable bonds: O(n×m) nested loops → O(n) array filter
Property queries: 200× faster on average

Immutability Contract

Important: Molecules are immutable after parsing. All enriched properties remain valid for the lifetime of the molecule object. This design:

Prevents stale cached properties (no mutation = no invalidation needed)
Enables safe sharing across threads/workers
Simplifies reasoning about molecule state

If you need to modify a molecule, create a new one by parsing updated SMILES.

Design Notes

Ring analysis (analyzeRings()) runs only during enrichment
Downstream property functions check cached values first, fall back to computation if missing
Backward compatible: cached properties are optional (?:) with defensive fallbacks
New code should always use cached properties when available
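
The "check the cache, then fall back" pattern from these notes might look like the sketch below. The `Sketch*` interfaces and helper functions are hypothetical stand-ins for the real types, and the fallback is deliberately simplified: it treats a missing `isInRing` flag as "not in a ring" and assumes hydrogens are implicit.

```typescript
// Sketch only: hypothetical types and helpers, with a simplified fallback.
interface SketchAtom { id: number; symbol: string; degree?: number }
interface SketchBond { atom1: number; atom2: number; type: number; isInRing?: boolean; isRotatable?: boolean }
interface SketchMolecule { atoms: SketchAtom[]; bonds: SketchBond[] }

function degreeOf(mol: SketchMolecule, atomId: number): number {
  const cached = mol.atoms.find((a) => a.id === atomId)?.degree;
  if (cached !== undefined) return cached; // cached by enrichment
  return mol.bonds.filter((b) => b.atom1 === atomId || b.atom2 === atomId).length;
}

function isRotatable(mol: SketchMolecule, bond: SketchBond): boolean {
  if (bond.isRotatable !== undefined) return bond.isRotatable; // cached by enrichment
  // Fallback: single, non-ring bond between two non-terminal atoms.
  return (
    bond.type === 1 &&
    bond.isInRing !== true &&
    degreeOf(mol, bond.atom1) > 1 &&
    degreeOf(mol, bond.atom2) > 1
  );
}

function countRotatableBonds(mol: SketchMolecule): number {
  return mol.bonds.filter((b) => isRotatable(mol, b)).length;
}

// Butane without cached properties: only the central C-C bond is rotatable.
const butane: SketchMolecule = {
  atoms: [0, 1, 2, 3].map((id) => ({ id, symbol: "C" })),
  bonds: [
    { atom1: 0, atom2: 1, type: 1 },
    { atom1: 1, atom2: 2, type: 1 },
    { atom1: 2, atom2: 3, type: 1 },
  ],
};
console.log(countRotatableBonds(butane)); // 1
```

When the flags are present the count reduces to a single array filter, which is the source of the speedup described in the benchmark section.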

Edge Cases & Limitations

openchem handles 100% of tested SMILES correctly (325/325 in bulk validation).

Key implementation details:

Stereo normalization: Trans alkenes are automatically normalized to use / (up) markers on both ends to match RDKit's canonical form. For example, C\C=C\C and C/C=C/C both represent trans configuration and canonicalize to C/C=C/C.
Canonical ordering: Atoms are ordered using a modified Morgan algorithm matching RDKit's approach, with tie-breaking by atomic number, degree, and other properties.
Aromatic validation: Aromatic notation (lowercase letters) is accepted as specified in SMILES. The parser validates that aromatic atoms are in rings but accepts aromatic notation without strict Hückel rule enforcement, matching RDKit's behavior for broader compatibility.

This implementation has been validated against RDKit's canonical SMILES output for diverse molecule sets including stereocenters, complex rings, heteroatoms, and 25 commercial pharmaceutical drugs.

OpenSMILES Specification Compliance

openchem implements the OpenSMILES specification with high fidelity while prioritizing RDKit compatibility for real-world interoperability. In specific areas where the OpenSMILES specification provides recommendations rather than strict requirements, openchem follows RDKit's implementation choices to ensure 100% parity with the industry-standard cheminformatics toolkit.

Starting Atom Selection (OpenSMILES Section 4.3.4)

OpenSMILES Recommendation: Start traversal on heteroatoms first, then terminals.

Example preference: OCCC over CCCO for propanol
Rationale: Heteroatoms are "more interesting" chemically

openchem Implementation: Canonical labels first, heteroatoms as tie-breaker.

Example: Both OCCC and CCCO canonicalize to CCCO
Rationale: Ensures 100% deterministic output for identical molecules

Why RDKit's Approach:

Determinism: Canonical labels guarantee the same molecule always produces identical output, regardless of input order
Interoperability: 100% agreement with RDKit enables seamless integration with existing cheminformatics pipelines and databases
Real-world usage: Major chemical databases (PubChem, ChEMBL) prioritize canonical determinism over heteroatom preference
Chemical equivalence: Both OCCC and CCCO represent the same molecule; the output difference is purely cosmetic

Impact: Minimal — affects only the order atoms appear in canonical output, not chemical meaning or validity. All SMILES remain valid OpenSMILES syntax.

Aromatic Perception

OpenSMILES Specification: Recommends strict Hückel rule enforcement (4n+2 π-electrons).

openchem Implementation: Accepts aromatic notation as specified in input; validates aromatic atoms are in rings but does not enforce strict Hückel rules during parsing.

Why RDKit's Approach: Broader compatibility with real-world chemical data where aromaticity may be empirically determined or context-dependent rather than purely theoretical.

Standards Compliance Summary

| Feature | OpenSMILES Spec | openchem Implementation | Rationale |
| --- | --- | --- | --- |
| Starting atom | Heteroatom preference | Canonical labels first | Deterministic output, RDKit parity |
| Aromatic validation | Strict Hückel (4n+2) | Permissive ring validation | Real-world compatibility |
| Stereo normalization | Not specified | Canonical E/Z form | Deterministic stereo representation |
| Canonical ordering | Modified Morgan recommended | Modified Morgan (RDKit-compatible) | 100% RDKit agreement |

All deviations are deliberate choices to maximize real-world interoperability while maintaining full compliance with OpenSMILES syntax and semantics. openchem produces valid OpenSMILES that can be read by any compliant parser.

Project Structure

openchem/
├── src/
│   ├── generators/
│   │   ├── mol-generator.ts         # MOL file generation
│   │   ├── sdf-writer.ts            # SDF file writing
│   │   └── smiles-generator.ts      # Canonical SMILES generation
│   ├── parsers/
│   │   ├── bracket-parser.ts        # Bracket notation parser
│   │   ├── molfile-parser.ts        # MOL file parser
│   │   ├── sdf-parser.ts            # SDF file parser
│   │   └── smiles-parser.ts         # SMILES parser
│   ├── utils/
│   │   ├── aromaticity-perceiver.ts # Aromaticity detection
│   │   ├── atom-utils.ts            # Atom helper functions
│   │   ├── bond-utils.ts            # Bond helper functions
│   │   ├── molecular-properties.ts  # Property calculations
│   │   ├── molecule-enrichment.ts   # Post-processing enrichment
│   │   ├── ring-finder.ts           # Ring detection algorithm
│   │   ├── ring-utils.ts            # Ring utilities
│   │   ├── symmetry-detector.ts     # Symmetry analysis
│   │   └── valence-calculator.ts    # Valence validation
│   ├── validators/
│   │   ├── aromaticity-validator.ts # Aromaticity validation
│   │   ├── stereo-validator.ts      # Stereochemistry validation
│   │   └── valence-validator.ts     # Valence checking
│   └── constants.ts                 # Element data and constants
├── test/
│   ├── smiles/                      # SMILES tests (213 tests)
│   │   ├── stereo/                  # Stereo tests (59 tests)
│   │   │   ├── stereo-advanced.test.ts
│   │   │   ├── stereo-extra.test.ts
│   │   │   └── stereo-rings.test.ts
│   │   ├── rdkit-comparison/        # RDKit validation (229 tests)
│   │   │   ├── bond-mismatch-debug.test.ts
│   │   │   ├── failing-cases.test.ts
│   │   │   ├── rdkit-10k.test.ts
│   │   │   ├── rdkit-bulk.test.ts
│   │   │   ├── rdkit-canonical.test.ts
│   │   │   ├── rdkit-comparison.test.ts
│   │   │   ├── rdkit-stereo.test.ts
│   │   │   ├── rdkit-symmetry.test.ts
│   │   │   └── smiles-10k.txt
│   │   ├── smiles-bracket-parser.test.ts
│   │   ├── smiles-extended-stereo.test.ts
│   │   ├── smiles-isotope.test.ts
│   │   ├── smiles-parser-advanced.test.ts
│   │   ├── smiles-parser-basic.test.ts
│   │   ├── smiles-parser-edge-cases.test.ts
│   │   ├── smiles-round-trip.test.ts
│   │   └── smiles-standard-form.test.ts
│   ├── molfile/                     # MOL file tests (57 tests)
│   │   ├── mol-generator.test.ts
│   │   ├── molfile-parser.test.ts
│   │   ├── molfile-roundtrip.test.ts
│   │   ├── rdkit-mol-comparison.test.ts
│   │   └── rdkit-molfile.test.ts
│   ├── sdf/                         # SDF tests (62 tests)
│   │   ├── sdf-parser-integration.test.ts
│   │   ├── sdf-parser-unit.test.ts
│   │   ├── sdf-writer-integration.test.ts
│   │   └── sdf-writer-unit.test.ts
│   ├── unit/
│   │   ├── utils/                   # Utility tests (101 tests)
│   │   │   ├── aromaticity-perceiver.test.ts
│   │   │   ├── atom-utils.test.ts
│   │   │   ├── molecular-properties.test.ts
│   │   │   ├── ring-finder.test.ts
│   │   │   ├── symmetry-detector.test.ts
│   │   │   └── valence-calculator.test.ts
│   │   └── validators/              # Validator tests (not yet created)
│   └── rdkit-comparison/
│       └── rdkit-api-inspect.test.ts # RDKit API inspection (1 test)
├── types.ts                         # TypeScript type definitions
├── index.ts                         # Public API exports
├── package.json
├── tsconfig.json
├── AGENTS.md                        # Agent guidelines
└── README.md

Key Implementation Features

Canonical SMILES Generation

openchem implements RDKit-compatible canonical SMILES generation:

Modified Morgan Algorithm: Atoms are canonically ordered using iterative refinement based on:
- Canonical rank (connectivity signature)
- Atomic number (tie-breaker)
- Degree, isotope, charge
- Neighbor properties
Starting Atom Selection (RDKit-compatible):
- Primary criterion: Canonical label (lowest rank wins)
- Tie-breakers (in order): Heteroatom preference → Terminal atom → Lower degree → Lower charge
- Design choice: Prioritizes canonical labels over heteroatom preference for deterministic output
- Note: The OpenSMILES specification (Section 4.3.4) recommends starting on heteroatoms first (e.g., OCCC over CCCO), but RDKit prioritizes canonical ordering for deterministic behavior
- Result: Both approaches are chemically equivalent; openchem follows RDKit for maximum interoperability
Stereo Normalization: E/Z double bond stereochemistry is normalized to a canonical form:
- Trans (E) alkenes: Matching markers (/ and /) - e.g., C/C=C/C
- Cis (Z) alkenes: Opposing markers (/ and \) - e.g., C/C=C\C
- Ensures equivalent stereo representations canonicalize identically
Deterministic Output: Same molecule always produces the same canonical SMILES, enabling reliable structure comparison and database storage.
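The iterative refinement behind a Morgan-style ordering can be sketched as follows. This is a simplified illustration under stated assumptions, not openchem's or RDKit's actual implementation: RankAtom, denseRanks, and refineRanks are hypothetical names, and the sketch stops once the partition stops refining, without breaking any remaining ties between symmetric atoms.

```typescript
// Hypothetical per-atom invariants used as the initial signature.
interface RankAtom {
  atomicNumber: number;
  degree: number;
  charge: number;
  isotope: number;
}

// Zero-pad numbers so that string comparison matches numeric order.
const pad = (n: number): string => String(n).padStart(4, "0");

// Convert arbitrary string keys into dense ranks (0 = smallest key).
function denseRanks(keys: string[]): number[] {
  const sorted = Array.from(new Set(keys)).sort();
  const rankOf = new Map(sorted.map((k, i) => [k, i] as [string, number]));
  return keys.map((k) => rankOf.get(k) as number);
}

// neighbors[i] lists the indices of atoms bonded to atom i.
function refineRanks(atoms: RankAtom[], neighbors: number[][]): number[] {
  // Initial ranks from local invariants (charge is offset to stay non-negative).
  let ranks = denseRanks(
    atoms.map((a) => [a.atomicNumber, a.degree, a.isotope, a.charge + 100].map(pad).join("."))
  );
  let classCount = new Set(ranks).size;
  for (;;) {
    // New key: own rank followed by the sorted ranks of the neighbors.
    const keys = ranks.map((r, i) => {
      const nb = neighbors[i].map((j) => ranks[j]).sort((x, y) => x - y);
      return [r, ...nb].map(pad).join(".");
    });
    const next = denseRanks(keys);
    const nextCount = new Set(next).size;
    if (nextCount === classCount) return ranks; // no further refinement
    ranks = next;
    classCount = nextCount;
  }
}
```

Because each new key starts with the atom's current rank, classes can only split, never merge, so the loop terminates. Atoms that still share a rank at the end are topologically equivalent under this signature, and a full canonicalizer would need an extra tie-breaking step.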

Example of RDKit-compatible behavior:

// Both inputs represent the same molecule (hydrogen cyanide)
parseSMILES('C#N');  // → canonical: "C#N" (carbon first)
parseSMILES('N#C');  // → canonical: "C#N" (canonical labels prioritized)

// Both inputs represent the same molecule (propanol)
parseSMILES('OCCC'); // → canonical: "CCCO" (canonical labels prioritized)
parseSMILES('CCCO'); // → canonical: "CCCO"

This implementation achieves 100% agreement with RDKit's canonical output across 325 diverse test molecules including 25 commercial pharmaceutical drugs.
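The E/Z normalization rule described in this section can be illustrated with a small sketch. It assumes a deliberately simplified model: a single double bond with one direction marker on each side, both written left to right as in F/C=C/F. The names BondMarker, normalizeMarkers, and isTrans are illustrative and not part of openchem's API.

```typescript
// Simplified model: one marker before the double bond, one after it.
type BondMarker = "/" | "\\";

const flip = (m: BondMarker): BondMarker => (m === "/" ? "\\" : "/");

// Flipping both markers leaves the geometry unchanged, so pick the
// equivalent form whose first marker is "/".
function normalizeMarkers(first: BondMarker, second: BondMarker): [BondMarker, BondMarker] {
  return first === "/" ? [first, second] : [flip(first), flip(second)];
}

// Matching markers mean trans (E); opposing markers mean cis (Z).
function isTrans(first: BondMarker, second: BondMarker): boolean {
  return first === second;
}

// C\C=C\C and C/C=C/C describe the same trans alkene.
console.log(normalizeMarkers("\\", "\\")); // → ["/", "/"]
```

Real SMILES strings need more care (branches, ring closures, and markers shared between conjugated double bonds), which this sketch does not attempt to handle.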

Contributing

Debug Logging Convention

All debug logging (console.log, console.warn, and similar) must be gated behind the VERBOSE environment variable, so that test and production output stays clean unless explicitly requested. Use:

if (process.env.VERBOSE) {
  console.log('Debug info...');
}

This applies to all source and test files. Never leave direct logging statements that print during normal runs.
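One way to avoid repeating the guard is a small wrapper. The sketch below is only a suggestion: isVerbose and debugLog are hypothetical helpers, not functions that exist in openchem. It reads VERBOSE defensively so the check also works in environments where `process` is undefined, such as browsers.

```typescript
// Hypothetical helpers for illustration; not part of openchem.
type EnvHolder = { process?: { env?: Record<string, string | undefined> } };

// True when VERBOSE is set to a non-empty value; false when `process` is missing.
function isVerbose(): boolean {
  const proc = (globalThis as unknown as EnvHolder).process;
  return Boolean(proc?.env?.VERBOSE);
}

// Forward to console.log only when VERBOSE is enabled.
function debugLog(...args: unknown[]): void {
  if (isVerbose()) {
    console.log(...args);
  }
}

debugLog("Debug info..."); // prints only when VERBOSE is set
```

Running with `VERBOSE=1 bun test` would then enable the output, in the same way as the inline guard.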

We welcome contributions! openchem maintains strict quality standards:

All tests must pass — 610/610 required
RDKit parity required — Canonical SMILES must match RDKit output exactly
Add tests for new features — Test coverage is mandatory
Follow TypeScript conventions — See AGENTS.md for guidelines

To contribute:

# Clone and install
git clone https://github.com/rajeshg/openchem.git
cd openchem
bun install

# Make changes and test
bun test

# Type check
bun run tsc

# Submit PR with tests

Use Cases

openchem is well suited to:

Cheminformatics web applications — Client-side molecule parsing and visualization
Chemical databases — Canonical SMILES, InChI, and fingerprint-based storage and comparison
Molecule editors — Import/export SMILES, MOL, SDF with 2D rendering
Drug discovery tools — Structure representation, property calculation, and similarity searching
Educational software — Teaching chemical notation with interactive 2D visualization
API services — Fast molecule processing, fingerprinting, and property calculation in Node.js

License

MIT

Acknowledgments

openchem builds on the work of several excellent open-source projects:

RDKit — Validated against RDKit for accuracy. Several molecular descriptor algorithms (LabuteASA, LogP, Morgan fingerprints, etc.) are derived from RDKit's C++ implementation. See THIRD-PARTY-LICENSES.md for details.
OPSIN by Daniel Lowe — IUPAC nomenclature data (XML lexicon files in opsin-iupac-data/ and compiled rules in opsin-rules.json) used for parsing and generating IUPAC chemical names. Licensed under MIT.
InChI Trust — InChI identifier generation via WebAssembly build of the official InChI library.
Daylight SMILES & SMARTS — Specification for molecular line notation and pattern matching.

For complete license information and attribution requirements, see THIRD-PARTY-LICENSES.md.

