Skip to content

rajeshg/openchem

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

openchem

A fast TypeScript / JavaScript chemistry toolkit for working with molecular structures: parsing & generation (SMILES, MOL, SDF), canonicalization, pattern matching (SMARTS), 2D rendering, molecular descriptors, and structural analysis.

Production-ready, TypeScript-first library for cheminformatics — works in both browser and Node.js. openchem keeps a small runtime footprint.

Features

File Format Support

  • SMILES — Parse and generate canonical SMILES with full stereochemistry
  • MOL files — V2000/V3000 format support with 2D coordinate generation
  • SDF files — Multi-molecule files with property data
  • InChI — Generate InChI and InChIKey identifiers
  • IUPAC names — Bidirectional IUPAC ↔ SMILES conversion

Structure Analysis

  • Pattern matching — SMARTS substructure search
  • Fingerprints — Morgan (ECFP) fingerprints with Tanimoto similarity
  • Murcko scaffolds — Extract core scaffolds, generic frameworks, scaffold trees
  • Tautomers — Complete enumeration (25 rules, 100% RDKit coverage) with RDKit-compatible scoring
  • Ring systems — SSSR detection, fused/spiro/bridged classification
  • Aromaticity — Hückel rule perception and kekulization
  • Symmetry — Canonical ordering via modified Morgan algorithm
  • Stereochemistry — Full support for tetrahedral centers, E/Z bonds, extended chirality

Molecular Properties

  • Basic — Formula, mass, atom/bond counts
  • Structural — Valence electrons, amide bonds, spiro/bridgehead atoms, ring classifications
  • Stereochemistry — Specified and unspecified stereocenter counting
  • Drug-likeness — Lipinski's Rule of Five, Veber rules, BBB penetration
  • Descriptors — TPSA, LogP, rotatable bonds, H-bond donors/acceptors
  • Ring analysis — Saturated/aliphatic/heterocyclic ring counts

Visualization

  • 2D rendering — Publication-quality SVG with automatic layout
  • Smart positioning — Overlap-aware fused ring placement
  • Stereochemistry display — Wedge/hash bonds for chirality
  • Customizable — Element colors, bond styles, canvas size

Performance & Quality

  • Fast — Optimized coordinate generation, CSR graph for O(1) lookups
  • 🔬 Accurate — 100% RDKit agreement on canonical SMILES (325/325 molecules)
  • Well-tested — 2,093 passing tests including bulk RDKit comparisons
  • 🎯 Production-ready — Used with real drugs, natural products, edge cases
  • 📦 Lightweight — Minimal dependencies, works in browser and Node.js
  • 🔒 TypeScript-first — Full type safety with excellent IDE support

Quick Start

npm install openchem
# or: bun add openchem
import { parseSMILES, renderSVG, Descriptors } from 'openchem';

// Parse a molecule
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];

// Render as SVG
const svg = renderSVG(aspirin);
console.log(svg.svg); // SVG markup ready for display

// Get all molecular properties at once
const props = Descriptors.all(aspirin);
console.log(props.formula);        // "C9H8O4"
console.log(props.mass);           // 180.16
console.log(props.logP);           // 1.19
console.log(props.lipinskiPass);   // true - aspirin is drug-like!

// Or get specific categories
const drugLike = Descriptors.drugLikeness(aspirin);
console.log(drugLike.lipinski.passes);     // true
console.log(drugLike.lipinski.violations); // []

HTML Playground

openchem includes an interactive HTML playground for testing SMILES parsing, molecular visualization, and descriptor calculation:

# Build the browser bundle and start a local server
bun run serve

# Then open http://localhost:3000/smiles-playground.html in your browser

The playground provides:

  • 2D Structure Visualization — Clean SVG rendering of molecular structures
  • Molecular Descriptors — Formula, mass, TPSA, rotatable bonds, etc.
  • Drug-Likeness Checks — Lipinski's Rule of Five, Veber rules, BBB penetration
  • Interactive Examples — Pre-loaded molecules like aspirin, caffeine, ibuprofen

The playground automatically detects if the full openchem library is available and falls back to approximate calculations if needed.

Note: The HTML playground requires a web server to load the openchem library due to ES module security restrictions. Use bun run serve to start a local server, then open http://localhost:3000/smiles-playground.html in your browser.

Model Context Protocol (MCP) Server

The MCP server for AI assistant integration is now available as a separate package: @openchem/mcp

Quick Start

# Install MCP server
npm install -g @openchem/mcp

# Start server
openchem-mcp

# Server runs on http://localhost:3000

Connect to Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "openchem": {
      "url": "http://localhost:3000/mcp"
    }
  }
}

Restart Claude Desktop and try: "Analyze aspirin using SMILES CC(=O)Oc1ccccc1C(=O)O"

Available MCP Tools

  • analyze — Complete molecular analysis (40+ descriptors, drug-likeness, IUPAC name, optional rendering)
  • compare — Molecular similarity (Morgan fingerprints, Tanimoto similarity, property comparison)
  • search — Substructure matching (SMARTS patterns with match counts and indices)
  • render — 2D structure visualization (publication-quality SVG)
  • convert — Format conversion (canonical SMILES, IUPAC names, Murcko scaffolds)

Documentation

Code Examples

import { parseSMILES, generateSMILES, parseMolfile, generateMolfile, parseSDF, writeSDF } from 'openchem';

// Parse SMILES into molecule structure
const result = parseSMILES('CC(=O)O'); // acetic acid
console.log(result.molecules[0].atoms.length); // 4 atoms
console.log(result.molecules[0].bonds.length); // 3 bonds

// Generate canonical SMILES
const canonical = generateSMILES(result.molecules[0]);
console.log(canonical); // "CC(=O)O"

// Parse MOL file
const molContent = `
acetic acid
  openchem

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500   -1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
M  END
`;
const molResult = parseMolfile(molContent);
console.log(generateSMILES(molResult.molecule!)); // "CC(=O)O"

// Generate MOL file from SMILES
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const molfile = generateMolfile(aspirin.molecules[0], { title: 'aspirin' });
console.log(molfile); // Full MOL file with coordinates

// Parse SDF file
const sdfContent = `
  Mrv2311 02102409422D          


  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
>  <ID>
MOL001

>  <NAME>
Ethanol

$$$$
`;
const sdfResult = parseSDF(sdfContent);
console.log(sdfResult.records[0].molecule?.atoms.length); // 3
console.log(sdfResult.records[0].properties.NAME); // "Ethanol"

// Generate InChI from molecule const inchi = await generateInChI(aspirin.molecules[0]); console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

// Generate InChIKey const inchikey = await generateInChIKey(inchi); console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"


### Morgan Fingerprints and Similarity

```typescript
import { parseSMILES, computeMorganFingerprint, tanimotoSimilarity } from 'openchem';

// Generate fingerprints for similarity comparison
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O');

const fp1 = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
const fp2 = computeMorganFingerprint(ibuprofen.molecules[0], 2, 512);

// Calculate structural similarity
const similarity = tanimotoSimilarity(fp1, fp2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`); // ~45.2%

Murcko Scaffolds

Extract core molecular scaffolds for drug discovery and compound classification:

import { parseSMILES, getMurckoScaffold, getBemisMurckoFramework, generateSMILES } from 'openchem';

// Extract scaffold (rings + linkers, remove side chains)
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccc(cc1)" - benzene core

// Get generic framework (all atoms → carbon, all bonds → single)
const framework = getBemisMurckoFramework(ibuprofen);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane

// Compare scaffolds of similar drugs
import { haveSameScaffold } from 'openchem';
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
console.log(haveSameScaffold(ibuprofen, aspirin)); // true - both have benzene scaffold

Applications:

  • Compound library classification
  • Lead series identification
  • Scaffold hopping strategies
  • Fragment-based drug design

Tautomer Enumeration

Enumerate and score tautomers (keto-enol, imine-enamine, amide-imidol, etc.) with RDKit-compatible scoring:

import { parseSMILES, enumerateTautomers, generateSMILES } from 'openchem';

// Enumerate tautomers for acetylacetone (pentane-2,4-dione)
const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0];
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });

console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});

// Get canonical tautomer (highest scoring)
import { canonicalTautomer } from 'openchem';
const canonical = canonicalTautomer(mol);
console.log(`Canonical: ${generateSMILES(canonical)}`);

Supported tautomer types (26 rules, 100% RDKit coverage):

  • 1,3 and 1,5 keto-enol (carbonyl ↔ enol, conjugated systems)
  • Imine-enamine (C=N ↔ C-NH, including aromatic special cases)
  • 1,5/1,7/1,9/1,11 aromatic heteroatom H shift (pyrrole, indole, large heterocycles)
  • Furanone (lactone tautomerism in 5-membered rings)
  • Amide-imidol (N-C=O ↔ N=C-OH)
  • Lactam-lactim (cyclic amide ↔ cyclic imidate)
  • Nitro-aci-nitro, nitroso-oxime, oxim/nitroso via phenol
  • Thione-thiol (C=S ↔ C-SH)
  • Guanidine, tetrazole, imidazole (heterocycle tautomerism)
  • Phosphonic acid, sulfoxide (P/S heteroatom shifts)
  • Edge cases: keten/ynol, cyano/isocyanic acid, formamidinesulfinic acid, isocyanide

Scoring system (RDKit-compatible):

  • +250 per all-carbon aromatic ring (benzene)
  • +100 per heteroaromatic ring (pyridine)
  • +25 for benzoquinone patterns
  • +4 for oximes (C=N-OH)
  • +2 for carbonyls (C=O, N=O, P=O)
  • -10 per formal charge
  • -4 for aci-nitro forms
  • -1 per hydrogen on P, S, Se, Te

Applications:

  • Compound standardization for databases
  • Virtual screening preparation
  • pKa prediction support
  • Tautomer-aware structure searching

SVG Rendering

import { parseSMILES, renderSVG } from 'openchem';

// Render molecule as SVG
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const svgResult = renderSVG(caffeine.molecules[0], {
  width: 300,
  height: 200,
  showCarbonLabels: false,
  bondLength: 30,
});

console.log(svgResult.svg); // Complete SVG markup
console.log(`Canvas: ${svgResult.width}x${svgResult.height}`); // "300x200"

Testing & RDKit comparison

openchem has an extensive test suite (unit, integration, and RDKit comparison tests) that exercises parsing, generation, file round-trips, stereochemistry, aromatic perception, and molecular properties. Rather than rely on fragile hard-coded counts in the README, the project keeps comprehensive automated tests in the test/ folder and runs RDKit parity checks as part of the comparison test suite when RDKit is available.

Highlights:

  • Broad unit and integration coverage across parsers, generators, utils, and validators
  • RDKit comparison tests for canonical SMILES and round-trip fidelity (these run when RDKit is available in the test environment)
  • Tests are designed to be self-contained and to skip RDKit-specific checks when RDKit isn't present in the environment

For maintainers: update and run the test suite with bun test. Use RUN_RDKIT_BULK=1 to enable the heavier RDKit bulk comparisons when you have RDKit available.

Validation

openchem maintains broad automated test coverage across unit, integration, and RDKit comparison tests. The test/ directory contains the authoritative suite; maintainers can run bun test locally and enable the heavier RDKit comparison runs with RUN_RDKIT_BULK=1 when RDKit is available. Tests are designed to validate parsing, generation, round-tripping, stereochemistry, aromatic perception, and molecular properties without requiring hard-coded counts in the README.

Installation

npm install openchem

bun add openchem

pnpm add openchem

Usage

Example Files

For comprehensive working examples, see:

Run any example:

bun run docs/examples/comprehensive-example.ts

Basic Parsing

Running heavy RDKit comparisons

The repository contains two long-running RDKit comparison tests (the 10k SMILES suite and the bulk 300-SMILES suite). These tests are skipped by default to keep regular test runs fast.

To run them set the RUN_RDKIT_BULK environment variable:

# Run heavy RDKit comparisons (rdkit-10k and rdkit-bulk)
RUN_RDKIT_BULK=1 bun test

Add RUN_VERBOSE=1 for more detailed RDKit reporting during the run.

import { parseSMILES } from 'openchem';

// Simple molecule
const ethanol = parseSMILES('CCO');
console.log(ethanol.molecules[0].atoms.length); // 3

// Check for errors
const result = parseSMILES('invalid');
if (result.errors.length > 0) {
  console.error('Parse errors:', result.errors);
}

// Complex molecule with stereochemistry
const lAlanine = parseSMILES('C[C@H](N)C(=O)O');
const chiralCenter = lAlanine.molecules[0].atoms.find(a => a.chiral);
console.log(chiralCenter?.chiral); // '@'

Molecular Properties

openchem provides comprehensive molecular property calculations for drug discovery and cheminformatics applications.

Basic Properties

import { 
  parseSMILES, 
  getMolecularFormula, 
  getMolecularMass, 
  getExactMass 
} from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const mol = aspirin.molecules[0];

// Get molecular formula (Hill notation)
const formula = getMolecularFormula(mol);
console.log(formula); // "C9H8O4"

// Get molecular mass (average atomic masses)
const mass = getMolecularMass(mol);
console.log(mass); // 180.042

// Get exact mass (most abundant isotope)
const exactMass = getExactMass(mol);
console.log(exactMass); // 180.042

Atom Counts and Structure

import {
  parseSMILES,
  getHeavyAtomCount,
  getHeteroAtomCount,
  getRingCount,
  getAromaticRingCount,
  getRingInfo
} from 'openchem';

const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O');
const mol = ibuprofen.molecules[0];

// Count heavy atoms (non-hydrogen)
console.log(getHeavyAtomCount(mol)); // 13

// Count heteroatoms (N, O, S, P, halogens, etc.)
console.log(getHeteroAtomCount(mol)); // 2

// Count total rings
console.log(getRingCount(mol)); // 1

// Count aromatic rings
console.log(getAromaticRingCount(mol)); // 1

// Get comprehensive ring information
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 1
console.log(ringInfo.rings()); // [[6,7,8,9,10,11]] - atom IDs in the ring

Drug-Likeness Properties

import { 
  parseSMILES,
  getFractionCSP3,
  getHBondDonorCount,
  getHBondAcceptorCount,
  getTPSA
} from 'openchem';

const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const mol = caffeine.molecules[0];

// Fraction of sp3 carbons (structural complexity)
console.log(getFractionCSP3(mol)); // 0.25

// H-bond donors (N-H, O-H)
console.log(getHBondDonorCount(mol)); // 0

// H-bond acceptors (N, O atoms)
console.log(getHBondAcceptorCount(mol)); // 6

// Topological polar surface area (Ų)
// Critical for predicting oral bioavailability and BBB penetration
console.log(getTPSA(mol)); // 61.82

TPSA for Drug Design

TPSA (Topological Polar Surface Area) is essential for predicting drug properties:

import { parseSMILES, getTPSA } from 'openchem';

// Oral bioavailability: TPSA < 140 Ų
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
console.log(getTPSA(aspirin.molecules[0])); // 63.60 ✓ Good oral availability

// Blood-brain barrier penetration: TPSA < 90 Ų
const morphine = parseSMILES('CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O');
console.log(getTPSA(morphine.molecules[0])); // 52.93 ✓ CNS-active

Drug-Likeness Rule Checkers

import { 
  parseSMILES, 
  checkLipinskiRuleOfFive, 
  checkVeberRules, 
  checkBBBPenetration 
} from 'openchem';

// Lipinski's Rule of Five (oral drug-likeness)
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const lipinski = checkLipinskiRuleOfFive(aspirin.molecules[0]);
console.log(lipinski.passes); // true
console.log(lipinski.properties);
// { molecularWeight: 180.04, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 }

// Veber Rules (oral bioavailability)
const veber = checkVeberRules(aspirin.molecules[0]);
console.log(veber.passes); // true
console.log(veber.properties);
// { rotatableBonds: 3, tpsa: 63.60 }

// Blood-brain barrier penetration prediction
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const bbb = checkBBBPenetration(caffeine.molecules[0]);
console.log(bbb.likelyPenetration); // true (TPSA: 61.82 < 90)

Generating SMILES

import { parseSMILES, generateSMILES } from 'openchem';

// Generate canonical SMILES (default)
const input = 'CC(C)CC';
const parsed = parseSMILES(input);
const canonical = generateSMILES(parsed.molecules[0]);
console.log(canonical); // "CCC(C)C" - canonicalized

// Stereo normalization matches RDKit
const trans1 = parseSMILES('C\\C=C\\C'); // trans (down markers)
console.log(generateSMILES(trans1.molecules[0])); // "C/C=C/C" - normalized to up markers

const trans2 = parseSMILES('C/C=C/C'); // trans (up markers)
console.log(generateSMILES(trans2.molecules[0])); // "C/C=C/C" - already normalized

// Generate simple (non-canonical) SMILES
const simple = generateSMILES(parsed.molecules[0], false);
console.log(simple); // "CC(C)CC" - preserves input order

// Explicit canonical generation
const explicitCanonical = generateSMILES(parsed.molecules[0], true);
console.log(explicitCanonical); // "CCC(C)C"

// Handle multiple disconnected molecules
const mixture = parseSMILES('CCO.O'); // ethanol + water
const output = generateSMILES(mixture.molecules);
console.log(output); // "CCO.O"

SVG Rendering

Render molecules as 2D SVG structures with automatic coordinate generation. openchem provides deterministic layouts, fast performance, and excellent handling of rings, branches, and terminal atoms.

Basic SVG Rendering

import { parseSMILES, renderSVG } from 'openchem';

// Render from parsed molecule
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const result = renderSVG(aspirin.molecules[0]);
console.log(result.svg); // SVG string ready for display
console.log(result.width); // Canvas width
console.log(result.height); // Canvas height

// Or render directly from SMILES (if parsing is included)
const renderResult = renderSVG('CCO');
if (renderResult.errors.length === 0) {
  console.log(renderResult.svg);
}

// Render multiple molecules in a grid
const molecules = [
  parseSMILES('CC(=O)O').molecules[0],
  parseSMILES('CCO').molecules[0],
  parseSMILES('CC(C)C').molecules[0]
];
const gridResult = renderSVG(molecules);
console.log(gridResult.svg); // Multi-molecule grid

SVG Rendering Options

import { parseSMILES, renderSVG } from 'openchem';
import type { SVGRendererOptions } from 'openchem';

const benzene = parseSMILES('c1ccccc1');
const mol = benzene.molecules[0];

const options: SVGRendererOptions = {
  // Canvas sizing
  width: 400,
  height: 400,
  padding: 20,
  
  // Bond styling
  bondLineWidth: 2,
  bondLength: 40,
  bondColor: '#000000',
  
  // Atom & text styling
  fontSize: 14,
  fontFamily: 'Arial, sans-serif',
  showCarbonLabels: false, // Hide C labels for cleaner appearance
  showImplicitHydrogens: false, // Hide implicit hydrogens
  
  // Color mapping by element
  atomColors: {
    C: '#222222',
    N: '#3050F8',
    O: '#FF0D0D',
    S: '#E6C200',
    F: '#50FF50',
    Cl: '#1FF01F',
    Br: '#A62929',
    I: '#940094'
  },
  
  // Background
  backgroundColor: '#FFFFFF',
  
  // Stereochemistry display
  showStereoBonds: true,
  
  // Layout & coordinate generation
  kekulize: true, // Convert aromatic to alternating single/double bonds (default: true)
  moleculeSpacing: 60 // Spacing between molecules in grid layouts
};

const result = renderSVG(mol, options);
console.log(result.svg); // Custom-styled SVG

Using Pre-computed Coordinates

import { parseSMILES, renderSVG } from 'openchem';

const ethanol = parseSMILES('CCO');
const mol = ethanol.molecules[0];

// Provide your own atom coordinates (useful for custom layouts)
const customCoords = [
  { x: 0, y: 0 },    // C
  { x: 40, y: 0 },   // C
  { x: 80, y: 0 }    // O
];

const result = renderSVG(mol, {
  atomCoordinates: customCoords,
  width: 200,
  height: 100
});

console.log(result.svg);

Coordinate Generation Features

openchem's coordinate generator provides:

  • Deterministic layouts — Same molecule always produces same coordinates
  • Fast performance — Optimized for speed and quality
  • Perfect terminal atom placement — OH, NH₂, and other terminal groups extend radially
  • Ring system detection — Automatically detects and regularizes 5/6-membered rings, fused rings, spiro, and bridged systems
  • Zero atom overlaps — Intelligent substituent placement prevents collisions
  • Publication-quality output — Clean, chemically accurate 2D structures
import { parseSMILES, renderSVG } from 'openchem';

// Complex fused ring system
const naphthalene = parseSMILES('c1ccc2ccccc2c1');
const result = renderSVG(naphthalene.molecules[0], {
  width: 300,
  height: 300,
  bondLength: 35
});

console.log(result.svg);

Error Handling

import { renderSVG } from 'openchem';

const result = renderSVG('C');
if (result.errors.length > 0) {
  console.error('SVG rendering errors:', result.errors);
} else {
  console.log(result.svg);
}

SMARTS Matching

Match molecular patterns using SMARTS (SMILES Arbitrary Target Specification) notation.

import { parseSMILES, parseSMARTS, matchSMARTS } from 'openchem';

// Parse molecule and SMARTS pattern
const molecule = parseSMILES('CC(=O)Oc1ccccc1C(=O)O'); // aspirin
const pattern = parseSMARTS('[O;D1]'); // Single-bonded oxygen (carbonyl)

// Find matching atoms
const matches = matchSMARTS(molecule.molecules[0], pattern);
console.log(matches.length); // 2 (two carbonyl oxygens)
console.log(matches); // [[2], [7]] (atom indices)

// Example: Find aromatic rings
const aromaticRing = parseSMARTS('c1ccccc1'); // benzene pattern
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const ringMatches = matchSMARTS(aspirin.molecules[0], aromaticRing);
console.log(ringMatches.length); // 1 (one benzene ring)

// Example: Find carboxylic acid groups
const carboxylPattern = parseSMARTS('[C](=O)[O;H1]'); // COOH
const matches2 = matchSMARTS(aspirin.molecules[0], carboxylPattern);
console.log(matches2.length); // 1 (one carboxylic acid)

// Example: Find all heteroatoms
const heteroPattern = parseSMARTS('[!C;!H]'); // Any non-carbon, non-hydrogen
const heteroMatches = matchSMARTS(aspirin.molecules[0], heteroPattern);
console.log(heteroMatches.length); // Number of heteroatoms

Kekulization

Convert aromatic molecules to alternating single/double bond representations (Kekulé structures).

import { parseSMILES, kekulize, generateSMILES } from 'openchem';

// Parse aromatic molecule
const benzene = parseSMILES('c1ccccc1');
const mol = benzene.molecules[0];

// Convert to Kekulé structure
const kekuleMol = kekulize(mol);

// Generate SMILES from Kekulé form
const kekuleSMILES = generateSMILES(kekuleMol);
console.log(kekuleSMILES); // "C1=CC=CC=C1" or similar alternating structure

// SVG rendering automatically kekulizes (unless disabled)
import { renderSVG } from 'openchem';

const result = renderSVG(mol, {
  kekulize: true  // default: true
});
// Rendered SVG shows alternating single/double bonds

LogP Calculation

Calculate LogP (partition coefficient) for predicting lipophilicity and membrane permeability.

import { parseSMILES, computeLogP, crippenLogP } from 'openchem';

const molecules = [
  'CC(=O)Oc1ccccc1C(=O)O',  // aspirin
  'CC(C)Cc1ccc(cc1)C(C)C(=O)O', // ibuprofen
  'CC(=O)Nc1ccc(O)cc1'    // acetaminophen
];

molecules.forEach(smiles => {
  const mol = parseSMILES(smiles).molecules[0];
  
  // Wildman-Crippen method (more accurate)
  const logP = computeLogP(mol);
  console.log(`${smiles.substring(0, 10)}... LogP: ${logP.toFixed(2)}`);
  
  // Alternative: crippenLogP (alias)
  const logP2 = crippenLogP(mol);
  console.log(`  Crippen LogP: ${logP2.toFixed(2)}`);
});

// LogP guidelines for drug design
const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
const caffeineMol = caffeine.molecules[0];
const logpValue = computeLogP(caffeineMol);

console.log(`Caffeine LogP: ${logpValue.toFixed(2)}`);
if (logpValue > 5) {
  console.log('⚠️ High LogP - may have poor water solubility');
} else if (logpValue < 0) {
  console.log('✓ Good LogP - hydrophilic, good bioavailability');
} else {
  console.log('✓ Optimal LogP - good balance of lipophilicity and hydrophilicity');
}

Molecule Structure

import { parseSMILES } from 'openchem';
import { BondType } from 'openchem';

const result = parseSMILES('C=C');
const mol = result.molecules[0];

// Access atoms
mol.atoms.forEach(atom => {
  console.log(`${atom.symbol} (id: ${atom.id})`);
  console.log(`  Aromatic: ${atom.aromatic}`);
  console.log(`  Charge: ${atom.charge}`);
  console.log(`  Hydrogens: ${atom.hydrogens}`);
});

// Access bonds
mol.bonds.forEach(bond => {
  console.log(`Bond ${bond.atom1}-${bond.atom2}`);
  console.log(`  Type: ${bond.type === BondType.DOUBLE ? 'DOUBLE' : 'SINGLE'}`);
});

Running Tests

# Run all tests (includes RDKit comparisons)
bun test

# Run with Node.js
npm test

# Run specific test file
bun test test/parser.test.ts

Note: RDKit comparison tests require @rdkit/rdkit package. Tests will automatically skip RDKit validations if the package is unavailable. For full validation, ensure you're running tests with Node.js (RDKit's WebAssembly may not work in all Bun versions).

API Reference

Quick Reference

openchem provides 38 functions organized into 8 categories:

Parsing & Generation (8)

  • parseSMILES - Parse SMILES strings
  • generateSMILES - Generate canonical/non-canonical SMILES
  • parseMolfile - Parse MOL files (V2000/V3000)
  • generateMolfile - Generate MOL files (V2000)
  • parseSDF - Parse SDF files with properties
  • writeSDF - Write SDF files with properties
  • generateInChI - Generate InChI strings from molecules
  • generateInChIKey - Generate InChIKey strings from molecules

Pattern Matching & Rendering (6)

  • renderSVG - Render molecules as 2D SVG structures
  • parseSMARTS - Parse SMARTS pattern strings
  • matchSMARTS - Find SMARTS pattern matches in molecules
  • kekulize - Convert aromatic to Kekulé structures
  • computeMorganFingerprint - Generate Morgan fingerprints from molecules
  • tanimotoSimilarity - Calculate Tanimoto similarity between fingerprints

Scaffold Analysis (5)

  • getMurckoScaffold - Extract Murcko scaffold (rings + linkers)
  • getBemisMurckoFramework - Generic scaffold (all C, single bonds)
  • getScaffoldTree - Hierarchical scaffold decomposition
  • getGraphFramework - Pure topology (all atoms → wildcard)
  • haveSameScaffold - Compare two molecules' scaffolds

Tautomer Analysis (2)

  • enumerateTautomers - Generate all tautomers with RDKit scoring
  • canonicalTautomer - Select highest-scoring canonical tautomer

Basic Properties (3)

  • getMolecularFormula - Hill notation formula
  • getMolecularMass - Average molecular mass
  • getExactMass - Exact mass (monoisotopic)

Lipophilicity (3)

  • computeLogP - Wildman-Crippen partition coefficient
  • crippenLogP - Alias for computeLogP
  • logP - Alternative LogP calculation

Structural Properties (8)

  • getHeavyAtomCount - Non-hydrogen atom count
  • getHeteroAtomCount - Heteroatom count (N, O, S, etc.)
  • getRingCount - Total ring count
  • getAromaticRingCount - Aromatic ring count
  • getRingInfo - Comprehensive ring information object
  • getFractionCSP3 - sp³ carbon fraction
  • getHBondDonorCount - H-bond donor count
  • getHBondAcceptorCount - H-bond acceptor count

Drug-Likeness (5)

  • getTPSA - Topological polar surface area
  • getRotatableBondCount - Rotatable bond count
  • checkLipinskiRuleOfFive - Lipinski's Rule of Five
  • checkVeberRules - Veber rules for bioavailability
  • checkBBBPenetration - Blood-brain barrier prediction

Detailed API Documentation

Parsing & Generation (6 functions)

parseSMILES(smiles: string): ParseResult

Parses a SMILES string into molecule structures.

Returns: ParseResult containing:

  • molecules: Molecule[] — Array of parsed molecules
  • errors: string[] — Parse/validation errors (empty if successful)
generateSMILES(input: Molecule | Molecule[], canonical?: boolean): string

Generates SMILES from molecule structure(s).

Parameters:

  • input — Single molecule or array of molecules
  • canonical — Generate canonical SMILES (default: true)

Returns: SMILES string (uses . to separate disconnected molecules)

Canonical SMILES features:

  • RDKit-compatible atom ordering using modified Morgan algorithm
  • Automatic E/Z double bond stereo normalization
  • Deterministic output for identical molecules
  • Preserves tetrahedral and double bond stereochemistry
generateMolfile(molecule: Molecule, options?: MolGeneratorOptions): string

Generates a MOL file (V2000 format) from a molecule structure. Matches RDKit's output structure for compatibility with cheminformatics tools.

Parameters:

  • molecule — Molecule structure to convert
  • options — Optional configuration:
    • title?: string — Molecule title (default: empty)
    • programName?: string — Program name in header (default: "openchem")
    • dimensionality?: '2D' | '3D' — Coordinate system (default: "2D")
    • comment?: string — Comment line (default: empty)

Returns: MOL file content as string with V2000 format

Features:

  • V2000 MOL format compatible with RDKit and other tools
  • 2D coordinate generation using circular layout
  • Proper atom/bond type mapping (aromatic, charged, isotopic)
  • Stereochemistry support (chiral centers, E/Z double bonds)
  • Fixed-width formatting matching RDKit output

Example:

import { parseSMILES, generateMolfile } from 'openchem';

const result = parseSMILES('CCO');
const molfile = generateMolfile(result.molecules[0]);
console.log(molfile);
// Output: MOL file with header, atom coordinates, bond connectivity, etc.
parseMolfile(input: string): MolfileParseResult

Parses a MOL file (MDL Molfile format) into a molecule structure. Supports both V2000 and V3000 formats with comprehensive validation.

Parameters:

  • input — MOL file content as a string

Returns: MolfileParseResult containing:

  • molfile: MolfileData | null — Raw MOL file data structure (or null on critical errors)
  • molecule: Molecule | null — Parsed molecule with enriched properties (or null on errors)
  • errors: ParseError[] — Array of parse/validation errors (empty if successful)

Supported formats:

  • V2000: Classic fixed-width format (most common)
  • V3000: Extended format with additional features

Validation features:

  • Validates atom/bond counts match declared values
  • Checks bond references point to valid atoms
  • Validates numeric fields (coordinates, counts, bond types)
  • Detects malformed data (NaN, negative counts, invalid types)
  • Returns errors without throwing exceptions

Parsed features:

  • Atom coordinates (2D/3D)
  • Element symbols (organic and periodic table)
  • Charges (both atom block and M CHG property)
  • Isotopes (both mass diff and M ISO property)
  • Bond types (single, double, triple, aromatic)
  • Stereochemistry (bond wedges, chiral centers)
  • Atom mapping (reaction mapping)

Limitations:

  • SGroups are parsed but not converted to molecule structure
  • Query atoms/bonds not supported

Example:

import { parseMolfile, generateSMILES } from 'openchem';

const molContent = `
ethanol
  openchem

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
`;

const result = parseMolfile(molContent);
if (result.errors.length === 0) {
  console.log(result.molecule?.atoms.length); // 3
  console.log(result.molecule?.bonds.length); // 2
  
  // Convert to SMILES
  const smiles = generateSMILES(result.molecule!);
  console.log(smiles); // "CCO"
}

// Error handling
const invalid = parseMolfile('invalid content');
if (invalid.errors.length > 0) {
  console.error('Parse errors:', invalid.errors);
}

Round-trip workflow:

import { parseSMILES, generateMolfile, parseMolfile, generateSMILES } from 'openchem';

// SMILES → MOL → SMILES round-trip
const original = 'CC(=O)O'; // acetic acid
const mol = parseSMILES(original).molecules[0];
const molfile = generateMolfile(mol);
const parsed = parseMolfile(molfile);
const roundtrip = generateSMILES(parsed.molecule!);
console.log(roundtrip); // "CC(=O)O"
parseSDF(input: string): SDFParseResult

Parses an SDF (Structure-Data File) into molecule structures with associated properties. SDF files can contain multiple molecules, each with a MOL block and optional property fields.

Parameters:

  • input — SDF file content as a string

Returns: SDFParseResult containing:

  • records: SDFRecord[] — Array of parsed records
  • errors: ParseError[] — Global parse errors (empty if successful)

Record structure (SDFRecord):

  • molecule: Molecule | null — Parsed molecule (null on parse errors)
  • molfile: MolfileData | null — Raw MOL file data (null on parse errors)
  • properties: Record<string, string> — Property name-value pairs
  • errors: ParseError[] — Record-specific errors (empty if successful)

Features:

  • Multi-record parsing (splits on $$$$ delimiter)
  • Property block parsing (> <NAME> format)
  • Multi-line property values with blank line handling
  • Empty property names and values
  • Windows (CRLF) and Unix (LF) line endings
  • Tolerant parsing: continues after invalid records

Example (single record):

import { parseSDF, generateSMILES } from 'openchem';

const sdfContent = `
  Mrv2311 02102409422D          


  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
>  <ID>
MOL001

>  <NAME>
Ethanol

>  <FORMULA>
C2H6O

$$$$
`;

const result = parseSDF(sdfContent);
if (result.errors.length === 0) {
  const record = result.records[0];
  console.log(record.molecule?.atoms.length); // 3
  console.log(record.properties.ID); // "MOL001"
  console.log(record.properties.NAME); // "Ethanol"
  console.log(record.properties.FORMULA); // "C2H6O"
  
  // Convert to SMILES
  const smiles = generateSMILES(record.molecule!);
  console.log(smiles); // "CCO"
}

// Error handling
if (result.records[0].errors.length > 0) {
  console.error('Record errors:', result.records[0].errors);
}

Example (multiple records):

import { parseSDF } from 'openchem';

const multiRecordSDF = `
  Mrv2311 02102409422D          


  1  0  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <ID>
1

>  <NAME>
Methane

$$$$

  Mrv2311 02102409422D          


  2  1  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
>  <ID>
2

>  <NAME>
Ethane

$$$$
`;

const result = parseSDF(multiRecordSDF);
console.log(result.records.length); // 2
console.log(result.records[0].properties.NAME); // "Methane"
console.log(result.records[1].properties.NAME); // "Ethane"

Round-trip workflow:

import { parseSMILES, writeSDF, parseSDF, generateSMILES } from 'openchem';

// SMILES → SDF → SMILES round-trip
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const sdfResult = writeSDF({
  molecule: aspirin,
  properties: { NAME: 'aspirin', FORMULA: 'C9H8O4' }
});

const parsed = parseSDF(sdfResult.sdf);
const roundtrip = generateSMILES(parsed.records[0].molecule!);
console.log(roundtrip); // "CC(=O)Oc1ccccc1C(=O)O"
console.log(parsed.records[0].properties.NAME); // "aspirin"
generateInChI(molecule: Molecule): Promise<string>

Generates an InChI (International Chemical Identifier) string from a molecule structure. InChI provides a unique, canonical representation of chemical structures that can be used for database lookups and structure comparison.

Returns: Promise resolving to InChI string

Example:

import { parseSMILES, generateInChI } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
generateInChIKey(inchi: string): Promise<string>

Generates an InChIKey (a hashed, fixed-length version of InChI) from an InChI string. InChIKeys are commonly used for database indexing and fast lookups.

Parameters:

  • inchi — InChI string to convert

Returns: Promise resolving to InChIKey string (27 characters)

Example:

const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
writeSDF(records: SDFRecord | SDFRecord[], options?: SDFWriterOptions): SDFWriterResult

Writes molecules to SDF (Structure-Data File) format. Supports single or multiple records with optional property data. SDF files are commonly used for storing chemical databases and transferring molecular data between cheminformatics tools.

Parameters:

  • records — Single record or array of records to write
  • options — Optional configuration (same as MolGeneratorOptions):
    • title?: string — Default title for records (default: empty)
    • programName?: string — Program name in headers (default: "openchem")
    • dimensionality?: '2D' | '3D' — Coordinate system (default: "2D")
    • comment?: string — Default comment (default: empty)

Returns: SDFWriterResult containing:

  • sdf: string — Complete SDF file content
  • errors: string[] — Any errors encountered (empty if successful)

Record format:

interface SDFRecord {
  molecule: Molecule;
  properties?: Record<string, string | number | boolean>;
}

SDF structure:

  • MOL block (V2000 format) for each molecule
  • Property fields (> <NAME>, value, blank line)
  • Record separator ($$$$)

Example (single molecule):

import { parseSMILES, writeSDF } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const result = writeSDF({
  molecule: aspirin.molecules[0],
  properties: {
    NAME: 'aspirin',
    MOLECULAR_FORMULA: 'C9H8O4',
    MOLECULAR_WEIGHT: 180.042
  }
});

console.log(result.sdf);
// Output: SDF file with MOL block + properties + $$$$

Example (multiple molecules):

import { parseSMILES, writeSDF } from 'openchem';

const drugs = [
  { smiles: 'CC(=O)Oc1ccccc1C(=O)O', name: 'aspirin' },
  { smiles: 'CC(C)Cc1ccc(cc1)C(C)C(=O)O', name: 'ibuprofen' },
  { smiles: 'CC(=O)Nc1ccc(O)cc1', name: 'acetaminophen' }
];

const records = drugs.map(drug => {
  const mol = parseSMILES(drug.smiles).molecules[0];
  return {
    molecule: mol,
    properties: {
      NAME: drug.name,
      SMILES: drug.smiles
    }
  };
});

const result = writeSDF(records, { programName: 'my-drug-tool' });
console.log(result.sdf);
// Output: Multi-record SDF with all 3 molecules

Property formatting:

  • Strings: Written as-is
  • Numbers: Converted to strings
  • Booleans: "true" or "false"
  • Property names are case-sensitive

Compatibility:

  • Output compatible with RDKit, OpenBabel, ChemDraw, and other tools
  • Standard SDF format (V2000 MOL blocks)
  • Properties follow MDL SDF specification

Pattern Matching & Rendering (4 functions)

renderSVG(input: string | Molecule | Molecule[] | ParseResult, options?: SVGRendererOptions): SVGRenderResult

Renders molecules as 2D SVG structures with automatic coordinate generation using webcola collision prevention.

Parameters:

  • input — SMILES string, single molecule, array of molecules, or ParseResult
  • options — Optional rendering configuration (see SVGRendererOptions below)

Returns: SVGRenderResult containing:

  • svg: string — SVG markup ready for display
  • width: number — Canvas width in pixels
  • height: number — Canvas height in pixels
  • errors: string[] — Any rendering errors (empty if successful)

SVGRendererOptions:

  • width?: number — Canvas width (default: 300)
  • height?: number — Canvas height (default: 300)
  • bondLineWidth?: number — Bond line thickness (default: 2)
  • bondLength?: number — Target bond length in pixels (default: 40)
  • fontSize?: number — Atom label font size (default: 12)
  • fontFamily?: string — Font family (default: "Arial, sans-serif")
  • padding?: number — Canvas padding (default: 20)
  • showCarbonLabels?: boolean — Show C atom labels (default: false)
  • showImplicitHydrogens?: boolean — Show implicit hydrogens (default: false)
  • kekulize?: boolean — Convert aromatic to Kekulé (default: true)
  • atomColors?: Record<string, string> — Element-specific colors
  • backgroundColor?: string — Background color (default: "#FFFFFF")
  • bondColor?: string — Bond color (default: "#000000")
  • showStereoBonds?: boolean — Show wedge/hash bonds (default: true)
  • atomCoordinates?: AtomCoordinates[] — Pre-computed coordinates
  • webcolaIterations?: number — Collision prevention iterations (default: 100)
  • deterministicChainPlacement?: boolean — Deterministic layouts (default: false)
  • moleculeSpacing?: number — Space between molecules in grid (default: 60)

Features:

  • Automatic 2D coordinate generation with collision prevention
  • Ring regularization for 5 and 6-membered rings
  • Fused ring system handling
  • Stereochemistry display (wedge/hash bonds)
  • Element-specific atom coloring
  • Publication-quality output
parseSMARTS(smarts: string): ParseResult

Parses a SMARTS pattern string into a pattern molecule structure.

Returns: ParseResult containing:

  • molecules: Molecule[] — Array with pattern molecule
  • errors: string[] — Parse errors (empty if successful)

SMARTS support:

  • Logical operators: ! (not), & (and), , (or)
  • Atom properties: [D1] (degree), [H1] (explicit H), [v3] (valence)
  • Connectivity: [#6X4] (carbon with degree 4)
  • Aromatic matching: [c] or [a] (aromatic carbon)
matchSMARTS(molecule: Molecule, pattern: ParseResult): number[][]

Finds all matches of a SMARTS pattern in a molecule.

Parameters:

  • molecule — Target molecule to search
  • pattern — SMARTS pattern (from parseSMARTS())

Returns: Array of matches, where each match is an array of atom indices

Example:

import { parseSMILES, parseSMARTS, matchSMARTS } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const carbonyl = parseSMARTS('[C](=O)').molecules[0];

const matches = matchSMARTS(aspirin, carbonyl);
// matches: [[1, 2], [7, 8]] (two carbonyl groups)
kekulize(molecule: Molecule): Molecule

Converts aromatic molecules to alternating single/double bond (Kekulé) representation.

Returns: New molecule with aromatic bonds replaced by alternating single/double bonds

Example:

import { parseSMILES, kekulize, generateSMILES } from 'openchem';

const benzene = parseSMILES('c1ccccc1');
const kek = kekulize(benzene.molecules[0]);
console.log(generateSMILES(kek)); // "C1=CC=CC=C1"
computeMorganFingerprint(molecule: Molecule, radius?: number, fpSize?: number): Uint8Array

Generates a Morgan fingerprint (ECFP-like) for molecular similarity searching and compound classification. Uses a modified Morgan algorithm with atom typing and circular neighborhoods.

Parameters:

  • molecule — Molecule to fingerprint
  • radius — Fingerprint radius (default: 2, equivalent to ECFP4)
  • fpSize — Fingerprint size in bits (default: 2048, RDKit standard)

Returns: Uint8Array containing the fingerprint bits

Example:

import { parseSMILES, computeMorganFingerprint } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const fingerprint = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
console.log(fingerprint.length); // 64 (512 bits / 8 bytes)
tanimotoSimilarity(fp1: Uint8Array, fp2: Uint8Array): number

Calculates the Tanimoto similarity coefficient between two Morgan fingerprints. Measures structural similarity on a scale from 0 (no similarity) to 1 (identical).

Parameters:

  • fp1 — First fingerprint
  • fp2 — Second fingerprint

Returns: Similarity score between 0 and 1

Example:

const similarity = tanimotoSimilarity(fingerprint1, fingerprint2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`);

Scaffold Analysis (5 functions)

getMurckoScaffold(molecule: Molecule, options?: MurckoOptions): Molecule

Extracts the Murcko scaffold from a molecule — the core ring systems and linkers connecting them, with all terminal side chains removed. This is the standard scaffold used in medicinal chemistry for compound classification.

Parameters:

  • molecule — Molecule to analyze
  • options.includeLinkers — Include linker atoms between rings (default: true)

Returns: New Molecule containing only the scaffold

Example:

import { parseSMILES, getMurckoScaffold, generateSMILES } from 'openchem';

const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core
getBemisMurckoFramework(molecule: Molecule): Molecule

Generates a generic Bemis-Murcko framework — the scaffold with all atoms converted to carbon and all bonds converted to single bonds. Useful for identifying compounds with similar topology but different heteroatom patterns.

Returns: New Molecule with generic framework

Example:

import { parseSMILES, getBemisMurckoFramework, generateSMILES } from 'openchem';

const pyridine = parseSMILES('c1ccncc1').molecules[0];
const framework = getBemisMurckoFramework(pyridine);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane
getScaffoldTree(molecule: Molecule): Molecule[]

Generates a hierarchical scaffold tree by iteratively removing rings from the Murcko scaffold. Returns scaffolds ordered from most specific (full scaffold) to least specific (single ring).

Returns: Array of Molecule objects representing scaffolds at different levels

Example:

import { parseSMILES, getScaffoldTree, generateSMILES } from 'openchem';

const mol = parseSMILES('c1ccc2ccccc2c1').molecules[0]; // Naphthalene
const tree = getScaffoldTree(mol);
console.log(tree.length); // 2 levels: full naphthalene, then single benzene
tree.forEach((scaffold, idx) => {
  console.log(`Level ${idx}: ${generateSMILES(scaffold)}`);
});
getGraphFramework(molecule: Molecule): Molecule

Generates a pure topological framework with all atoms converted to wildcard atoms (*). This represents the molecular graph structure without any atom type information.

Returns: New Molecule with graph framework

Example:

import { parseSMILES, getGraphFramework, generateSMILES } from 'openchem';

const caffeine = parseSMILES('CN1C=NC2=C1C(=O)N(C(=O)N2C)C').molecules[0];
const graph = getGraphFramework(caffeine);
console.log(generateSMILES(graph)); // "*1*=**2=*1*(*)*(*)*2*" - pure topology
haveSameScaffold(mol1: Molecule, mol2: Molecule): boolean

Compares two molecules to determine if they share the same Murcko scaffold. Useful for compound series analysis and lead identification.

Returns: true if scaffolds match, false otherwise

Example:

import { parseSMILES, haveSameScaffold } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O').molecules[0];
console.log(haveSameScaffold(aspirin, ibuprofen)); // true - both benzene scaffold

Tautomer Analysis (2 functions)

enumerateTautomers(molecule: Molecule, options?: TautomerOptions): TautomerResult[]

Enumerates all tautomers for a molecule using transform-based enumeration with RDKit-compatible scoring.

Options:

  • maxTautomers?: number — Maximum tautomers to generate (default: 256)
  • maxTransforms?: number — Maximum transform operations (default: 1024)
  • phases?: number[] — Rule phases to apply (default: [1, 2, 3])
  • useFingerprintDedup?: boolean — Use fingerprint deduplication (default: true)

Returns: Array of TautomerResult objects with:

  • smiles: string — SMILES representation
  • molecule: Molecule — Molecule object
  • score: number — Stability score (higher = more stable)
  • ruleIds: string[] — Applied transformation rules

Scoring system (RDKit-inspired):

  • +250 per all-carbon aromatic ring
  • +100 per heteroaromatic ring
  • +25 for benzoquinone
  • +4 for oximes (C=N-OH)
  • +2 for carbonyls (C=O, N=O, P=O)
  • -10 per formal charge
  • -4 for aci-nitro
  • -1 per H on P, S, Se, Te

Example:

import { parseSMILES, enumerateTautomers } from 'openchem';

const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0]; // acetylacetone
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });

console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});
// 1. CC(=O)CC(=O)C (score: 4) - diketo form
// 2. CC(=O)C=C(C)O (score: 2) - monoenol form
// 3. CC(O)=CC(=O)C (score: 2) - monoenol form
canonicalTautomer(molecule: Molecule): Molecule

Selects the canonical (most stable) tautomer based on scoring.

Returns: The highest-scoring tautomer as a Molecule

Example:

import { parseSMILES, canonicalTautomer, generateSMILES } from 'openchem';

const mol = parseSMILES('CC(=O)CC(=O)C').molecules[0];
const canonical = canonicalTautomer(mol);
console.log(generateSMILES(canonical)); // "CC(=O)CC(=O)C" - diketo form preferred

Lipophilicity (3 functions)

computeLogP(molecule: Molecule): number

Calculates the LogP (partition coefficient) using the Wildman-Crippen method. LogP predicts lipophilicity and membrane permeability.

Returns: LogP value as a number

Interpretation:

  • LogP < 0: Hydrophilic (water-loving)
  • 0 ≤ LogP ≤ 5: Optimal range for most drugs
  • LogP > 5: Lipophilic (fat-loving), may have poor water solubility

Example:

import { parseSMILES, computeLogP } from 'openchem';

const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O').molecules[0];
console.log(computeLogP(aspirin)); // 1.31 (good bioavailability)
crippenLogP(molecule: Molecule): number

Alias for computeLogP(). Alternative name for the Wildman-Crippen LogP calculation.

logP(molecule: Molecule): number

Alternative LogP calculation method. May use different fragment contributions than Crippen.


Basic Properties (3 functions)

getMolecularFormula(molecule: Molecule): string

Returns the molecular formula in Hill notation (C first, then H, then alphabetical).

Example: C9H8O4 for aspirin

getMolecularMass(molecule: Molecule): number

Returns the molecular mass using average atomic masses from the periodic table.

Example: 180.042 for aspirin

getExactMass(molecule: Molecule): number

Returns the exact mass using the most abundant isotope for each element.

Example: 180.042 for aspirin


Structural Properties (7 functions)

getHeavyAtomCount(molecule: Molecule): number

Returns the count of non-hydrogen atoms.

Example: 13 for ibuprofen

getHeteroAtomCount(molecule: Molecule): number

Returns the count of heteroatoms (any atom except C and H). Includes N, O, S, P, halogens, etc.

Example: 2 for aspirin (2 oxygen atoms in COOH group)

getRingCount(molecule: Molecule): number

Returns the total number of rings in the molecule using cycle detection.

Example: 2 for naphthalene (2 fused rings)

getAromaticRingCount(molecule: Molecule): number

Returns the number of aromatic rings.

Example: 1 for benzene, 2 for naphthalene

getRingInfo(molecule: Molecule): RingInformation

Returns a comprehensive ring information object providing access to SSSR (Smallest Set of Smallest Rings) and ring membership queries. Similar to RDKit's GetRingInfo() functionality.

Methods:

  • numRings() - Number of rings in SSSR
  • rings() - Array of rings (each ring is atom ID array)
  • isAtomInRing(atomIdx) - Check if atom is in any ring
  • isBondInRing(atom1, atom2) - Check if bond is in any ring
  • atomRingMembership(atomIdx) - Ring membership count for atom ([Rn] in SMARTS)
  • atomRings(atomIdx) - All rings containing specific atom
  • ringAtoms(ringIdx) - Atoms in specific ring
  • ringBonds(ringIdx) - Bonds in specific ring

Example:

const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 2
console.log(ringInfo.isAtomInRing(5)); // true
console.log(ringInfo.atomRingMembership(3)); // 2 (bridgehead atom)
getFractionCSP3(molecule: Molecule): number

Returns the fraction of sp³-hybridized carbons (saturated carbons) relative to total carbons. Higher values indicate greater structural complexity and 3D character. Range: 0.0 to 1.0.

Example: 0.25 for caffeine, 0.67 for ibuprofen

getHBondDonorCount(molecule: Molecule): number

Returns the count of hydrogen bond donors (N-H and O-H groups).

Example: 1 for aspirin (carboxylic acid O-H), 0 for caffeine

getHBondAcceptorCount(molecule: Molecule): number

Returns the count of hydrogen bond acceptors (N and O atoms).

Example: 4 for aspirin, 6 for caffeine


Drug-Likeness Properties (5 functions)

getTPSA(molecule: Molecule): number

Returns the Topological Polar Surface Area in Ų (square Ångströms) using the Ertl et al. fragment-based algorithm. TPSA is a key descriptor for predicting drug absorption and bioavailability.

Guidelines:

  • TPSA < 140 Ų: Good oral bioavailability
  • TPSA < 90 Ų: Likely blood-brain barrier penetration
  • TPSA > 140 Ų: Poor membrane permeability

Example: 63.60 for aspirin (good oral availability), 52.93 for morphine (CNS-active)

getRotatableBondCount(molecule: Molecule): number

Returns the count of rotatable bonds (single non-ring bonds between non-terminal heavy atoms). Used in Veber rules for predicting oral bioavailability.

Example: 3 for aspirin, 4 for ibuprofen

checkLipinskiRuleOfFive(molecule: Molecule): LipinskiResult

Evaluates Lipinski's Rule of Five for oral drug-likeness. Returns result object with:

  • passes: boolean indicating if all rules pass
  • violations: array of violation messages
  • properties: { molecularWeight, hbondDonors, hbondAcceptors, logP }

Rules:

  • Molecular weight ≤ 500 Da
  • H-bond donors ≤ 5
  • H-bond acceptors ≤ 10
  • LogP ≤ 5
checkVeberRules(molecule: Molecule): VeberResult

Evaluates Veber rules for oral bioavailability. Returns result object with:

  • passes: boolean indicating if all rules pass
  • violations: array of violation messages
  • properties: { rotatableBonds, tpsa }

Rules:

  • Rotatable bonds ≤ 10
  • TPSA ≤ 140 Ų
checkBBBPenetration(molecule: Molecule): BBBResult

Predicts blood-brain barrier penetration. Returns result object with:

  • likelyPenetration: boolean (true if TPSA < 90 Ų)
  • tpsa: TPSA value

TypeScript Types

interface Molecule {
  atoms: Atom[];
  bonds: Bond[];
}

interface Atom {
  id: number;
  symbol: string;
  aromatic: boolean;
  hydrogens: number;
  charge: number;
  isotope: number | null;
  chiral: string | null;
  atomClass: number | null;
  isBracket: boolean;
  atomicNumber: number;
}

interface Bond {
  atom1: number;
  atom2: number;
  type: BondType;
  stereo: StereoType;
}

enum BondType {
  SINGLE = 1,
  DOUBLE = 2,
  TRIPLE = 3,
  QUADRUPLE = 4,
  AROMATIC = 5
}

Performance

openchem is designed for production use with real-world performance:

  • Parsing: ~1-10ms per molecule (depending on complexity)
  • Generation: ~1-5ms per molecule
  • Memory: Minimal overhead, compact AST representation
  • Zero dependencies: No external runtime dependencies

Benchmark with 325 diverse molecules including commercial drugs: Average parse + generate round-trip < 5ms

Architecture

Molecule Enrichment System

openchem uses a post-processing enrichment system that pre-computes expensive molecular properties during parsing. This design significantly improves performance for downstream property queries while maintaining code simplicity.

Why Pre-compute Properties?

Molecular property calculations like ring finding, hybridization determination, and rotatable bond classification are computationally expensive (O(n²) complexity). Without pre-computation:

  1. Redundant calculations: Ring finding would run every time you query ring count, aromatic rings, or check if atoms/bonds are in rings
  2. Performance penalty: Property queries would dominate runtime, especially for drug-likeness checks that need multiple properties
  3. Code complexity: Every property function would need to duplicate expensive logic

The Solution: Compute once during parsing, cache results, use everywhere.

Key Components

  • types.ts — Extended with optional cached properties on Atom, Bond, and Molecule interfaces
  • src/utils/molecule-enrichment.ts — Post-processing module that enriches molecules after parsing
  • src/parser.ts — Calls enrichMolecule() after validation phase at line 451
  • src/utils/molecular-properties.ts — Uses cached properties when available, falls back to computation

Cached Properties

  • Atom: degree (neighbor count), isInRing, ringIds[], hybridization (sp/sp²/sp³)
  • Bond: isInRing, ringIds[], isRotatable
  • Molecule: rings[][] (all rings as atom IDs), ringInfo (lookup maps)

Performance Impact

Benchmark Results (10,000 molecules, 7 properties each):

  • Parse time: 1.22 ms/molecule (includes enrichment)
  • Property query time: 0.006 ms/molecule (0.5% of parse time)
  • Rotatable bond queries: ~3.1 million ops/second (simple array filter vs 47-line calculation)

Complexity Improvements:

  • Ring finding: Once per molecule (O(n²)) → subsequent queries O(1)
  • Rotatable bonds: O(n×m) nested loops → O(n) array filter
  • Property queries: 200× faster on average

Immutability Contract

Important: Molecules are immutable after parsing. All enriched properties remain valid for the lifetime of the molecule object. This design:

  • Prevents stale cached properties (no mutation = no invalidation needed)
  • Enables safe sharing across threads/workers
  • Simplifies reasoning about molecule state

If you need to modify a molecule, create a new one by parsing updated SMILES.

Design Notes

  • Ring analysis (analyzeRings()) runs only during enrichment
  • Downstream property functions check cached values first, fall back to computation if missing
  • Backward compatible: cached properties are optional (?:) with defensive fallbacks
  • New code should always use cached properties when available

Edge Cases & Limitations

openchem handles 100% of tested SMILES correctly (325/325 in bulk validation).

Key implementation details:

  • Stereo normalization: Trans alkenes are automatically normalized to use / (up) markers on both ends to match RDKit's canonical form. For example, C\C=C\C and C/C=C/C both represent trans configuration and canonicalize to C/C=C/C.

  • Canonical ordering: Atoms are ordered using a modified Morgan algorithm matching RDKit's approach, with tie-breaking by atomic number, degree, and other properties.

  • Aromatic validation: Aromatic notation (lowercase letters) is accepted as specified in SMILES. The parser validates that aromatic atoms are in rings but accepts aromatic notation without strict Hückel rule enforcement, matching RDKit's behavior for broader compatibility.

This implementation has been validated against RDKit's canonical SMILES output for diverse molecule sets including stereocenters, complex rings, heteroatoms, and 25 commercial pharmaceutical drugs.

OpenSMILES Specification Compliance

openchem implements the OpenSMILES specification with high fidelity while prioritizing RDKit compatibility for real-world interoperability. In specific areas where the OpenSMILES specification provides recommendations rather than strict requirements, openchem follows RDKit's implementation choices to ensure 100% parity with the industry-standard cheminformatics toolkit.

Starting Atom Selection (OpenSMILES Section 4.3.4)

OpenSMILES Recommendation: Start traversal on heteroatoms first, then terminals.

  • Example preference: OCCC over CCCO for propanol
  • Rationale: Heteroatoms are "more interesting" chemically

openchem Implementation: Canonical labels first, heteroatoms as tie-breaker.

  • Example: Both OCCC and CCCO canonicalize to CCCO
  • Rationale: Ensures 100% deterministic output for identical molecules

Why RDKit's Approach:

  1. Determinism: Canonical labels guarantee the same molecule always produces identical output, regardless of input order
  2. Interoperability: 100% agreement with RDKit enables seamless integration with existing cheminformatics pipelines and databases
  3. Real-world usage: Major chemical databases (PubChem, ChEMBL) prioritize canonical determinism over heteroatom preference
  4. Chemical equivalence: Both OCCC and CCCO represent the same molecule; the output difference is purely cosmetic

Impact: Minimal — affects only the order atoms appear in canonical output, not chemical meaning or validity. All SMILES remain valid OpenSMILES syntax.

Aromatic Perception

OpenSMILES Specification: Recommends strict Hückel rule enforcement (4n+2 π-electrons).

openchem Implementation: Accepts aromatic notation as specified in input; validates aromatic atoms are in rings but does not enforce strict Hückel rules during parsing.

Why RDKit's Approach: Broader compatibility with real-world chemical data where aromaticity may be empirically determined or context-dependent rather than purely theoretical.

Standards Compliance Summary

Feature OpenSMILES Spec openchem Implementation Rationale
Starting atom Heteroatom preference Canonical labels first Deterministic output, RDKit parity
Aromatic validation Strict Hückel (4n+2) Permissive ring validation Real-world compatibility
Stereo normalization Not specified Canonical E/Z form Deterministic stereo representation
Canonical ordering Modified Morgan recommended Modified Morgan (RDKit-compatible) 100% RDKit agreement

All deviations are deliberate choices to maximize real-world interoperability while maintaining full compliance with OpenSMILES syntax and semantics. openchem produces valid OpenSMILES that can be read by any compliant parser.

Project Structure

openchem/
├── src/
│   ├── generators/
│   │   ├── mol-generator.ts         # MOL file generation
│   │   ├── sdf-writer.ts            # SDF file writing
│   │   └── smiles-generator.ts      # Canonical SMILES generation
│   ├── parsers/
│   │   ├── bracket-parser.ts        # Bracket notation parser
│   │   ├── molfile-parser.ts        # MOL file parser
│   │   ├── sdf-parser.ts            # SDF file parser
│   │   └── smiles-parser.ts         # SMILES parser
│   ├── utils/
│   │   ├── aromaticity-perceiver.ts # Aromaticity detection
│   │   ├── atom-utils.ts            # Atom helper functions
│   │   ├── bond-utils.ts            # Bond helper functions
│   │   ├── molecular-properties.ts  # Property calculations
│   │   ├── molecule-enrichment.ts   # Post-processing enrichment
│   │   ├── ring-finder.ts           # Ring detection algorithm
│   │   ├── ring-utils.ts            # Ring utilities
│   │   ├── symmetry-detector.ts     # Symmetry analysis
│   │   └── valence-calculator.ts    # Valence validation
│   ├── validators/
│   │   ├── aromaticity-validator.ts # Aromaticity validation
│   │   ├── stereo-validator.ts      # Stereochemistry validation
│   │   └── valence-validator.ts     # Valence checking
│   └── constants.ts                 # Element data and constants
├── test/
│   ├── smiles/                      # SMILES tests (213 tests)
│   │   ├── stereo/                  # Stereo tests (59 tests)
│   │   │   ├── stereo-advanced.test.ts
│   │   │   ├── stereo-extra.test.ts
│   │   │   └── stereo-rings.test.ts
│   │   ├── rdkit-comparison/        # RDKit validation (229 tests)
│   │   │   ├── bond-mismatch-debug.test.ts
│   │   │   ├── failing-cases.test.ts
│   │   │   ├── rdkit-10k.test.ts
│   │   │   ├── rdkit-bulk.test.ts
│   │   │   ├── rdkit-canonical.test.ts
│   │   │   ├── rdkit-comparison.test.ts
│   │   │   ├── rdkit-stereo.test.ts
│   │   │   ├── rdkit-symmetry.test.ts
│   │   │   └── smiles-10k.txt
│   │   ├── smiles-bracket-parser.test.ts
│   │   ├── smiles-extended-stereo.test.ts
│   │   ├── smiles-isotope.test.ts
│   │   ├── smiles-parser-advanced.test.ts
│   │   ├── smiles-parser-basic.test.ts
│   │   ├── smiles-parser-edge-cases.test.ts
│   │   ├── smiles-round-trip.test.ts
│   │   └── smiles-standard-form.test.ts
│   ├── molfile/                     # MOL file tests (57 tests)
│   │   ├── mol-generator.test.ts
│   │   ├── molfile-parser.test.ts
│   │   ├── molfile-roundtrip.test.ts
│   │   ├── rdkit-mol-comparison.test.ts
│   │   └── rdkit-molfile.test.ts
│   ├── sdf/                         # SDF tests (62 tests)
│   │   ├── sdf-parser-integration.test.ts
│   │   ├── sdf-parser-unit.test.ts
│   │   ├── sdf-writer-integration.test.ts
│   │   └── sdf-writer-unit.test.ts
│   ├── unit/
│   │   ├── utils/                   # Utility tests (101 tests)
│   │   │   ├── aromaticity-perceiver.test.ts
│   │   │   ├── atom-utils.test.ts
│   │   │   ├── molecular-properties.test.ts
│   │   │   ├── ring-finder.test.ts
│   │   │   ├── symmetry-detector.test.ts
│   │   │   └── valence-calculator.test.ts
│   │   └── validators/              # Validator tests (not yet created)
│   └── rdkit-comparison/
│       └── rdkit-api-inspect.test.ts # RDKit API inspection (1 test)
├── types.ts                         # TypeScript type definitions
├── index.ts                         # Public API exports
├── package.json
├── tsconfig.json
├── AGENTS.md                        # Agent guidelines
└── README.md

Key Implementation Features

Canonical SMILES Generation

openchem implements RDKit-compatible canonical SMILES generation:

  1. Modified Morgan Algorithm: Atoms are canonically ordered using iterative refinement based on:

    • Canonical rank (connectivity signature)
    • Atomic number (tie-breaker)
    • Degree, isotope, charge
    • Neighbor properties
  2. Starting Atom Selection (RDKit-compatible):

    • Primary criterion: Canonical label (lowest rank wins)
    • Tie-breakers (in order): Heteroatom preference → Terminal atom → Lower degree → Lower charge
    • Design choice: Prioritizes canonical labels over heteroatom preference for deterministic output
    • Note: The OpenSMILES specification (Section 4.3.4) recommends starting on heteroatoms first (e.g., OCCC over CCCO), but RDKit prioritizes canonical ordering for deterministic behavior
    • Result: Both approaches are chemically equivalent; openchem follows RDKit for maximum interoperability
  3. Stereo Normalization: E/Z double bond stereochemistry is normalized to a canonical form:

    • Trans (E) alkenes: Both markers pointing up (/) - e.g., C/C=C/C
    • Cis (Z) alkenes: Opposing markers (/ and \) - e.g., C/C=C\C
    • Ensures equivalent stereo representations canonicalize identically
  4. Deterministic Output: Same molecule always produces the same canonical SMILES, enabling reliable structure comparison and database storage.

Example of RDKit-compatible behavior:

// Both inputs represent the same molecule (hydrogen cyanide)
parseSMILES('C#N');  // → canonical: "C#N" (carbon first)
parseSMILES('N#C');  // → canonical: "C#N" (canonical labels prioritized)

// Both inputs represent the same molecule (propanol)
parseSMILES('OCCC'); // → canonical: "CCCO" (canonical labels prioritized)
parseSMILES('CCCO'); // → canonical: "CCCO"

This implementation achieves 100% agreement with RDKit's canonical output across 325 diverse test molecules including 25 commercial pharmaceutical drugs.

Contributing

Debug Logging Convention

All debug logging (e.g., console.log, console.warn, etc.) must be gated behind the VERBOSE environment variable. This ensures that test and production output remains clean unless explicitly requested. Use:

if (process.env.VERBOSE) {
  console.log('Debug info...');
}

This applies to all source and test files. Never leave direct logging statements that print during normal runs.

We welcome contributions! openchem maintains strict quality standards:

  1. All tests must pass — 610/610 required
  2. RDKit parity required — Canonical SMILES must match RDKit output exactly
  3. Add tests for new features — Test coverage is mandatory
  4. Follow TypeScript conventions — See AGENTS.md for guidelines

To contribute:

# Clone and install
git clone https://github.com/rajeshg/openchem.git
cd openchem
bun install

# Make changes and test
bun test

# Type check
bun run tsc

# Submit PR with tests

Use Cases

openchem is perfect for:

  • Cheminformatics web applications — Client-side molecule parsing and visualization
  • Chemical databases — Canonical SMILES, InChI, and fingerprint-based storage and comparison
  • Molecule editors — Import/export SMILES, MOL, SDF with 2D rendering
  • Drug discovery tools — Structure representation, property calculation, and similarity searching
  • Educational software — Teaching chemical notation with interactive 2D visualization
  • API services — Fast molecule processing, fingerprinting, and property calculation in Node.js

License

MIT

Acknowledgments

openchem builds on the work of several excellent open-source projects:

  • RDKit — Validated against RDKit for accuracy. Several molecular descriptor algorithms (LabuteASA, LogP, Morgan fingerprints, etc.) are derived from RDKit's C++ implementation. See THIRD-PARTY-LICENSES.md for details.

  • OPSIN by Daniel Lowe — IUPAC nomenclature data (XML lexicon files in opsin-iupac-data/ and compiled rules in opsin-rules.json) used for parsing and generating IUPAC chemical names. Licensed under MIT.

  • InChI Trust — InChI identifier generation via WebAssembly build of the official InChI library.

  • Daylight SMILES & SMARTS — Specification for molecular line notation and pattern matching.

For complete license information and attribution requirements, see THIRD-PARTY-LICENSES.md.

About

Fast, modern cheminformatics library in TypeScript for web and JS runtimes like node, bun

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages