Synonymicon

A multi-source synonym discovery tool with frequency-band filtering. Combines WordNet and fastText to surface candidates across the full Zipf range. Pick a corpus, pick a frequency band, get synonyms.
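As a rough illustration of the idea above, the sketch below merges primary and secondary candidates, removes duplicates, and keeps only words whose Zipf score falls inside the chosen band. The band boundaries, the `filter_by_band` helper, and the scores are all hypothetical; this README does not state the project's real tier cut-offs.

```python
# Hypothetical Zipf bands (low inclusive, high exclusive). The tier names come
# from the API section; the numeric boundaries are assumptions for illustration.
BANDS = {
    "common": (5.0, 9.0),
    "uncommon": (4.0, 5.0),
    "rare": (3.0, 4.0),
    "exotic": (2.0, 3.0),
    "absurd": (0.0, 2.0),
}


def filter_by_band(primary, secondary, zipf, tier):
    """Merge two candidate lists in order, dedupe, keep words inside the band."""
    low, high = BANDS[tier]
    seen = set()
    kept = []
    for word in [*primary, *secondary]:
        if word in seen:
            continue  # primary candidates win over secondary duplicates
        seen.add(word)
        score = zipf.get(word, 0.0)  # unknown words are treated as Zipf 0
        if low <= score < high:
            kept.append(word)
    return kept


# Made-up scores, not values from any real corpus.
zipf_scores = {"large": 5.6, "huge": 5.1, "colossal": 3.4, "brobdingnagian": 1.2}
primary = ["large", "huge"]                          # e.g. WordNet candidates
secondary = ["huge", "colossal", "brobdingnagian"]   # e.g. fastText neighbours

print(filter_by_band(primary, secondary, zipf_scores, "rare"))  # ['colossal']
```

Swapping the `zipf` mapping for a different corpus changes which band a word lands in, which is the point of letting the user pick a corpus.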

Live at synonymicon.xyz.

Built with assistance from

Claude Opus 4.5
Claude Opus 4.6
Claude Opus 4.7
Perplexity Computer
Xiaomi MiMo-V2-Pro
Xiaomi MiMo-V2.5-Pro
MiniMax M2.7

Stack

Python 3.12 + Flask (synchronous, single-process, no database)
wordfreq for default frequency
NLTK WordNet for primary synonyms
fastText (fasttext-wiki-news-subwords-300 via gensim) for secondary candidates
Included frequency corpora: wordfreq, SUBTLEX-US, BNC, Google 1-grams, Wikipedia, Kaggle, OpenSubtitles, Project Gutenberg, Leipzig News 2025, Leipzig Web COM 2018, Leipzig Web UK 2018
Definition fallback chain: Wiktionary REST API → Webster's 1913 (local) → WordNet gloss → [undefined]
Vanilla single-page frontend (no build step, no framework)
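The definition fallback chain listed above could look roughly like the following: try each source in order and return the first non-empty result, falling back to `[undefined]`. The `define` function and the three stand-in lookups are hypothetical and use stub data; the real lookups in this project are not shown in the README.

```python
from typing import Callable, List, Optional

# A lookup takes a word and returns a definition, or None if it has none.
Lookup = Callable[[str], Optional[str]]


def define(word: str, sources: List[Lookup]) -> str:
    """Return the first definition any source provides, else '[undefined]'."""
    for lookup in sources:
        try:
            text = lookup(word)
        except Exception:
            continue  # a failing source (e.g. network error) falls through
        if text:
            return text
    return "[undefined]"


# Stand-in sources with stub data, for illustration only.
def wiktionary(word: str) -> Optional[str]:
    raise TimeoutError("network unavailable")  # simulate a failed REST call


def webster_1913(word: str) -> Optional[str]:
    return {"lucid": "(stub Webster's 1913 entry)"}.get(word)


def wordnet_gloss(word: str) -> Optional[str]:
    return {"pellucid": "(stub WordNet gloss)"}.get(word)


chain = [wiktionary, webster_1913, wordnet_gloss]
print(define("lucid", chain))     # (stub Webster's 1913 entry)
print(define("pellucid", chain))  # (stub WordNet gloss)
print(define("qwzx", chain))      # [undefined]
```

Ordering the remote source first and the local ones after it means a network failure degrades the result rather than breaking the response.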

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python scripts/setup_nltk.py

The fastText model (~1 GB) downloads on first run via gensim and is cached under ~/gensim-data/.
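For reference, a minimal sketch of how the model can be fetched through gensim's downloader. `gensim.downloader.load` is the real gensim call; the `load_vectors` and `cache_path` helpers are hypothetical, and the cache layout (one directory per model under `~/gensim-data/`) is an assumption based on gensim's default behaviour.

```python
import os

MODEL_NAME = "fasttext-wiki-news-subwords-300"


def cache_path(name: str = MODEL_NAME) -> str:
    """Where the model is expected to be cached (assumed default layout)."""
    return os.path.join(os.path.expanduser("~"), "gensim-data", name)


def load_vectors(name: str = MODEL_NAME):
    """Download the model on first call (~1 GB), then load it from the cache."""
    # Imported lazily so that importing this module does not require gensim.
    import gensim.downloader as api

    return api.load(name)


if __name__ == "__main__":
    print("cached:", os.path.isdir(cache_path()))
    # Uncomment to actually download and load the model (slow, ~1 GB):
    # vectors = load_vectors()
    # print(vectors.most_similar("happy", topn=5))
```

Running the download once ahead of deployment avoids paying for it during the first server start.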

Run (development)

flask run --no-reload

Use --no-reload because fastText loads at module scope and the reloader would spawn two processes that both load it. Startup takes ~2.5–3 minutes.

Server on localhost:5000.

Run (production)

gunicorn -w 1 -t 120 -b 127.0.0.1:5000 app:app

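-w 1 (one worker) is intentional; each worker loads its own ~1.5–2 GB copy of the model and corpus data.
-t 120 raises the worker timeout from gunicorn's 30-second default to 120 seconds so the worker is not killed during the long startup; increase it if model loading on your host takes longer.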
Run behind a reverse proxy (nginx, Caddy) for TLS.
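The same flags can also be kept in a gunicorn configuration file, which is plain Python. The file below is a hypothetical equivalent of the command above, not a file that ships with this repository; `bind`, `workers`, and `timeout` are standard gunicorn settings.

```python
# gunicorn.conf.py (hypothetical; mirrors the command-line flags above)

bind = "127.0.0.1:5000"  # -b 127.0.0.1:5000, loopback only; the proxy handles TLS
workers = 1              # -w 1, one worker because each loads the full model
timeout = 120            # -t 120, seconds before a silent worker is killed
```

With this file in place the server would be started with `gunicorn -c gunicorn.conf.py app:app`.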

Memory & startup

Resident memory: ~1.5–2 GB (fastText ~1 GB, corpora ~200 MB, runtime).
Cold start: ~2.5–3 minutes.
Not compatible with serverless or sleep-on-idle hosting.

API

GET /synonyms?word=<x>&tier=<t>&pos=<p>&corpus=<c>

Returns JSON: [{word, zipf, definition, band}, ...].

| Param | Values |
| --- | --- |
| `word` | required; up to 2 words for phrase queries |
| `tier` | `all`, `common`, `uncommon`, `rare`, `exotic`, `absurd` (or comma-separated) |
| `pos` | `all`, `noun`, `verb`, `adj`, `adv` (or comma-separated) |
| `corpus` | `wordfreq` (default), `subtlex`, `bnc`, `google_1grams`, `wikipedia`, `kaggle`, `opensubtitles`, `gutenberg`, `leipzig_news`, `leipzig_web_com`, `leipzig_web_uk` |
| `min`, `max` | optional Zipf floats (advanced mode; overrides `tier`) |
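A small client sketch for the endpoint above, using only the standard library. The `synonyms_url` and `fetch_synonyms` helpers and the `BASE_URL` value are assumptions (a development server on localhost:5000); the parameter names come from the table.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumed local development server


def synonyms_url(word, tier="all", pos="all", corpus="wordfreq",
                 zipf_min=None, zipf_max=None):
    """Build the /synonyms query URL; min/max are sent only when given."""
    params = {"word": word, "tier": tier, "pos": pos, "corpus": corpus}
    if zipf_min is not None:
        params["min"] = zipf_min
    if zipf_max is not None:
        params["max"] = zipf_max
    return f"{BASE_URL}/synonyms?{urlencode(params)}"


def fetch_synonyms(word, **kwargs):
    """Call the API and return the decoded JSON list of result objects."""
    with urlopen(synonyms_url(word, **kwargs), timeout=30) as response:
        return json.load(response)


# Building the URL needs no running server; commas are percent-encoded.
print(synonyms_url("happy", tier="rare,exotic", pos="adj", corpus="bnc"))

# With the server running:
# for item in fetch_synonyms("happy", tier="rare", pos="adj"):
#     print(item["word"], item["zipf"], item["band"])
```

Passing `zipf_min` and `zipf_max` corresponds to the advanced mode, in which the server ignores `tier`.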

Layout

app.py                  Flask app (all backend logic)
data/                   Corpus files + Webster's 1913
static/index.html       Single-page frontend (HTML + inline CSS + inline JS)
scripts/setup_nltk.py   One-time NLTK data download
requirements.txt        Pinned dependencies
CLAUDE.md               Architecture and design rationale

License

MIT — see LICENSE.

Credits

Frequency corpora are credited in-app under the "corpora" link in the footer.
