Open-source biomedical terminology retrieval and semantic search infrastructure.
Biomedical terminology is fragmented across a dozen incompatible standards — ICD-10, ATC, LOINC, CPT, and more. Looking up a code across systems requires expensive licensed tools, proprietary APIs, or brittle scripts.
MedCodeTranslator is a transparent, offline-capable, open-source retrieval engine that works across all major coding schemes without sending data to a third party.
Medical disclaimer: This is an informational reference tool only. Not for diagnosis, treatment decisions, prescribing, or clinical recommendations. Never enter patient-identifiable (PHI) data.
| Problem | This project |
|---|---|
| Terminology lookup requires costly licensed APIs | Bundled offline-capable datasets |
| Black-box matching — no explanation of why a result matched | Transparent scoring + matchMethod label per result |
| Mobile/offline-unfriendly tools | React Native app, SQLite on-device |
| Hard to reproduce or evaluate search quality | Benchmark suite with precision@1, precision@5, MRR |
| No programmatic access | Shared TypeScript + Python packages |
- Layered retrieval — exact → prefix → substring → fuzzy, with per-result `score` and `matchMethod`
- 8 medical code schemes — ATC-5, ICD-10, ICD-9-CM, ICD-11, LOINC, CPT, HCPCS, CVX
- Offline-first — SQLite on-device via expo-sqlite; no network calls for search
- Transparent ranking — every result exposes its score (0–1) and how it was matched
- Match highlighting — character-level match spans returned for all result types
- Did you mean — fuzzy fallback suggestions when no exact/substring result found
- Multilingual UI — English, Hebrew (RTL), Spanish, French, Portuguese, Russian, Chinese, German, Arabic
- Cross-platform — iOS, Android, Web (PWA via GitHub Pages)
- Reusable packages — `packages/search/` (TypeScript) + `packages/python-client/` (Python)
- Benchmark suite — reproducible evaluation with precision@1, precision@5, MRR per scheme
| Scheme | Authority | Coverage |
|---|---|---|
| ATC-5 | WHO Collaborating Centre (WHOCC) | Drug classification (level 5) |
| ICD-10 | CMS / WHO | Diagnosis codes |
| ICD-9-CM | NBER / CMS (historical) | Legacy diagnosis codes |
| ICD-11 | WHO | Latest international classification |
| LOINC | Regenstrief Institute | Lab & clinical observations |
| CPT | Curated demo subset | Procedure codes |
| HCPCS | CMS | Supplies & non-physician services |
| CVX | CDC | Vaccine codes |
Data files live in data/vocabularies/. See DATA_SOURCES.md and data/vocabularies/source-metadata.json for provenance.
Note: Bundled datasets are curated demo subsets. For production use, replace them with full official releases; see `scripts/refresh_medical_db.py`.
Screenshots: mobile ICD-10 search, mobile Hebrew UI, web LOINC search.
```
User query
    │
    ▼
┌─────────────────────────────────────────────────────────┐
│  Layered Retrieval                                      │
│                                                         │
│  1. Exact match      score = 1.000   matchMethod=exact  │
│  2. Prefix match     score = 0.900   matchMethod=prefix │
│  3. Substring match  score = 0.700   matchMethod=substr │
│  4. Fuzzy (Fuse.js)  score = 0–0.65  matchMethod=fuzzy  │
│  5. Alias expansion  score = parent  matchMethod=alias  │
│                                                         │
│  → deduplicate by code (keep highest score)             │
│  → sort descending by score                             │
│  → return ScoredEntry[] with highlights[]               │
└─────────────────────────────────────────────────────────┘
    │
    ▼
SQLite (expo-sqlite) ←→ Fuse.js in-memory index
```
All retrieval logic lives in packages/search/src/:
| File | Responsibility |
|---|---|
| `exact.ts` | Exact, prefix, substring match + highlight spans |
| `fuzzy.ts` | Fuse.js index management + inverted score mapping |
| `layered.ts` | Merge + deduplicate across all layers |
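The merge step described for `layered.ts` (deduplicate by code, keep the highest score, sort descending) can be sketched roughly as below. This is a minimal illustration and not the project's actual implementation: the `Hit` type and the `mergeLayers` function are hypothetical names, and only the `code`, `score`, and `matchMethod` fields are taken from this README.

```typescript
// Hypothetical result shape; the real ScoredEntry type lives in @medcode/core.
type MatchMethod = 'exact' | 'prefix' | 'substr' | 'fuzzy' | 'alias';

interface Hit {
  code: string;
  score: number; // normalised relevance in [0, 1]
  matchMethod: MatchMethod;
}

// Merge the hits from every layer, keep the best hit per code, sort by score.
function mergeLayers(layers: Hit[][], limit = 10): Hit[] {
  const best: Record<string, Hit> = {};
  for (const layer of layers) {
    for (const hit of layer) {
      const seen = best[hit.code];
      // Strict ">" means the earlier layer wins when scores tie.
      if (seen === undefined || hit.score > seen.score) {
        best[hit.code] = hit;
      }
    }
  }
  return Object.keys(best)
    .map((code) => best[code])
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const merged = mergeLayers([
  [{ code: 'E11', score: 0.9, matchMethod: 'prefix' }],
  [
    { code: 'E11', score: 0.7, matchMethod: 'substr' },
    { code: 'E10', score: 0.7, matchMethod: 'substr' },
  ],
]);
console.log(merged.map((h) => `${h.code}:${h.matchMethod}`).join(' '));
// E11:prefix E10:substr
```

Keeping the highest-scoring hit per code means a code found by several layers is reported once, labelled with the strongest match method.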
```typescript
import { layeredSearch } from '@medcode/search';
import type { ScoredEntry } from '@medcode/core';

const results = layeredSearch(entries, 'diabetes', 'icd10', { limit: 10 });
results.forEach((r: ScoredEntry) => {
  console.log(r.code, r.name_en);
  console.log(' score:', r.score, ' via:', r.matchMethod);
  console.log(' highlights:', r.highlights); // [[0, 7]] character spans
});
```

```python
from medcodetranslator import MedCodeTranslator

client = MedCodeTranslator()
results = client.search('icd10', 'diabetes', fuzzy=True, limit=5)
for r in results:
    print(f"{r.code} {r.name_en:<45} score={r.score:.3f} via={r.match_method}")
```

```
E11 Type 2 diabetes mellitus                      score=1.000 via=substring
E10 Type 1 diabetes mellitus                      score=1.000 via=substring
E13 Other specified diabetes mellitus             score=1.000 via=substring
```
| Method | When it fires | Score range | Explainability |
|---|---|---|---|
| Exact | Query === code or name (case-insensitive) | 1.0 | ✅ |
| Prefix | Query is a leading substring | 0.9 | ✅ |
| Substring | Query appears anywhere in code/name | 0.70–0.75 | ✅ |
| Fuzzy (Fuse.js) | No exact/prefix/substring match | 0–0.65 | ✅ score shown |
| Alias | Query matches synonym/abbreviation table | parent score | ✅ |
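The "inverted score mapping" for the fuzzy layer exists because Fuse.js reports 0 for a perfect match and 1 for a complete mismatch, the opposite of the relevance scale used here. A minimal sketch of one possible mapping follows. The linear scaling, the `FUZZY_MAX` constant, and the `toRelevance` name are assumptions for illustration; only the 0–0.65 range comes from the table above.

```typescript
// Assumed ceiling for the fuzzy layer, taken from the 0–0.65 range in the table.
const FUZZY_MAX = 0.65;

// Fuse.js scores run from 0 (perfect match) to 1 (no match), so invert first,
// then scale into [0, FUZZY_MAX]. The linear mapping is an assumption.
function toRelevance(fuseScore: number): number {
  const clamped = Math.min(1, Math.max(0, fuseScore));
  return (1 - clamped) * FUZZY_MAX;
}

console.log(toRelevance(0)); // 0.65
console.log(toRelevance(1)); // 0
console.log(toRelevance(0.5).toFixed(3)); // 0.325
```

Capping fuzzy results below the substring score keeps any exact, prefix, or substring hit ranked above every fuzzy hit after the merge.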
Results always include:
- `score` — normalised relevance in `[0, 1]`
- `matchMethod` — which layer produced the result
- `highlights` — `[start, end]` character ranges in `name_en`
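To make these three fields concrete, here is a rough sketch of how the exact, prefix, and substring layers could score a single name and report a highlight span. It is not the code in `exact.ts`: `matchName` and `NameMatch` are hypothetical names, the scores are the ones listed for each layer in this README, and the inclusive end offset is inferred from the `[[0, 7]]` example for `diabetes`.

```typescript
type Span = [number, number]; // [start, end], end inclusive (assumption)

interface NameMatch {
  score: number;
  matchMethod: 'exact' | 'prefix' | 'substr';
  highlights: Span[];
}

// Case-insensitive exact → prefix → substring check against a single name.
function matchName(nameEn: string, query: string): NameMatch | null {
  const haystack = nameEn.toLowerCase();
  const needle = query.trim().toLowerCase();
  if (needle.length === 0) return null;

  const start = haystack.indexOf(needle);
  if (start === -1) return null; // left to the fuzzy layer

  const highlights: Span[] = [[start, start + needle.length - 1]];
  if (haystack === needle) return { score: 1.0, matchMethod: 'exact', highlights };
  if (start === 0) return { score: 0.9, matchMethod: 'prefix', highlights };
  return { score: 0.7, matchMethod: 'substr', highlights };
}

console.log(JSON.stringify(matchName('Type 2 diabetes mellitus', 'diabetes')));
// {"score":0.7,"matchMethod":"substr","highlights":[[7,14]]}
```

Returning `null` when nothing matches lets the caller decide whether to fall through to the fuzzy layer.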
| Scheme | Precision@1 | Precision@5 |
|---|---|---|
| ATC-5 | — | — |
| ICD-10 | — | — |
| ICD-11 | — | — |
| LOINC | — | — |
| CVX | — | — |
See data/benchmarks/ for query sets and evaluation methodology.
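For readers unfamiliar with the metrics, the sketch below shows the standard definitions of precision@k and mean reciprocal rank (MRR). It assumes a hypothetical `BenchmarkCase` shape and may differ from how the project's benchmark suite stores queries or computes its numbers.

```typescript
// Hypothetical benchmark case: the codes returned for a query, in rank order,
// plus the codes considered relevant for that query.
interface BenchmarkCase {
  returned: string[];
  relevant: string[];
}

// Fraction of the top-k returned codes that are relevant.
function precisionAtK(c: BenchmarkCase, k: number): number {
  const topK = c.returned.slice(0, k);
  const hits = topK.filter((code) => c.relevant.indexOf(code) !== -1).length;
  return hits / k;
}

// 1 / rank of the first relevant code, or 0 when none is returned.
function reciprocalRank(c: BenchmarkCase): number {
  for (let i = 0; i < c.returned.length; i++) {
    if (c.relevant.indexOf(c.returned[i]) !== -1) return 1 / (i + 1);
  }
  return 0;
}

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

const cases: BenchmarkCase[] = [
  { returned: ['E11', 'E10', 'E13'], relevant: ['E11'] },
  { returned: ['E10', 'E11', 'E13'], relevant: ['E11'] },
];

console.log(mean(cases.map((c) => precisionAtK(c, 1)))); // 0.5
console.log(mean(cases.map((c) => reciprocalRank(c)))); // 0.75
```

Precision@1 only rewards a correct top result, while MRR also gives partial credit when the right code appears lower in the list.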
```
packages/
  core/src/types.ts     Shared TypeScript types (SchemeKey, CodeEntry, ScoredEntry)
  search/src/           Retrieval engine (exact / fuzzy / layered)
  python-client/        Python client (medcodetranslator package)
data/
  vocabularies/         JSON vocabulary files (one per scheme)
  benchmarks/           Benchmark query sets per scheme
  aliases/common.json   Abbreviation / brand-name alias table
app/                    Expo Router screens + components
db/                     expo-sqlite init + query layer
i18n/                   i18next locale files (9 languages)
docs/                   Architecture, API, and retrieval docs
.github/workflows/      CI + deploy pipelines
```
See CONTRIBUTING.md. In brief:
- Fork and create a feature branch
- Run `npm test` and ensure all 54 tests pass
- For data changes, update `data/vocabularies/` and `data/vocabularies/source-metadata.json`
- For search logic changes, add benchmark queries to `data/benchmarks/`
- Open a PR against `master`
Scope constraint: This project is a retrieval and reference tool. PRs that add diagnosis generation, clinical recommendations, or LLM inference will not be merged. See docs/SAFE_SCOPE.md.
Live PWA: https://nadavweisler.github.io/MedCodeTranslator/ — auto-deployed on every push to master.
MIT — vocabulary data is subject to individual upstream licenses; see DATA_SOURCES.md and DEPENDENCY_LICENSE_AUDIT.md.


