sen-ltd/wordcloud-api

wordcloud-api

A tiny FastAPI service that wraps the wordcloud package and exposes it over HTTP. Two endpoints: one that tokenizes plain text for you, one that accepts pre-computed frequencies so you can plug in your own tokenizer for languages wordcloud can't handle natively (Japanese, Chinese, ...).

The argument for factoring this out: wordcloud is a great library but embedding it in every app that needs a word cloud pulls in numpy, matplotlib, and Pillow — about 300 MB of dependencies. Running it as a small service keeps that footprint out of your main containers.

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /wordcloud | JSON body {text, width, height, ...} → PNG |
| POST | /wordcloud/frequencies | JSON body {frequencies: {word: weight}, ...} |
| GET | /health | {status, version, default_colormap} |
| GET | / | HTML explainer with an in-browser try-it form |
| GET | /docs | Swagger UI / OpenAPI |

Request fields (shared)

| Field | Type | Default | Limits |
| --- | --- | --- | --- |
| width | int | 800 | 50–2000 |
| height | int | 400 | 50–2000 |
| background | string | "white" | any CSS color name |
| colormap | string | "viridis" | any matplotlib name |
| max_words | int | 200 | 1–500 |
| stopwords | string[] (text) | null | optional |

Request body size is capped at 1 MB. text is capped at 50 KB.
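
The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: `check_payload` and the constant names are hypothetical, and it assumes "1 MB" and "50 KB" mean 1 MiB and 50 KiB measured in UTF-8 bytes.

```python
import json

# Numeric limits copied from the request-fields table; names are illustrative.
INT_LIMITS = {
    "width": (50, 2000),
    "height": (50, 2000),
    "max_words": (1, 500),
}
MAX_BODY_BYTES = 1024 * 1024  # assumption: 1 MB == 1 MiB
MAX_TEXT_BYTES = 50 * 1024    # assumption: 50 KB == 50 KiB of UTF-8


def check_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload fits the limits."""
    problems = []
    for field, (low, high) in INT_LIMITS.items():
        if field not in payload:
            continue  # omitted fields fall back to the server defaults
        value = payload[field]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            problems.append(f"{field} must be an integer in {low}..{high}")
    text = payload.get("text")
    if text is not None and len(text.encode("utf-8")) > MAX_TEXT_BYTES:
        problems.append("text exceeds 50 KB")
    if len(json.dumps(payload).encode("utf-8")) > MAX_BODY_BYTES:
        problems.append("request body exceeds 1 MB")
    return problems
```

The server remains the source of truth; a local check like this only saves a round trip for obviously invalid requests.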

Try it locally

pip install -e ".[dev]"
python -m wordcloud_api
curl -X POST http://localhost:8000/wordcloud \
  -H "Content-Type: application/json" \
  -d '{"text": "the quick brown fox jumps over the lazy dog"}' \
  -o wc.png
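
The same request can be made from Python with only the standard library. This is a sketch under the same assumptions as the curl example (service on localhost:8000); `build_request` and `save_wordcloud` are hypothetical helper names, not part of the package.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same address as the curl example


def build_request(text: str, **options) -> urllib.request.Request:
    """Build the POST /wordcloud request; options are shared fields such as width."""
    body = json.dumps({"text": text, **options}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/wordcloud",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_wordcloud(text: str, path: str = "wc.png", **options) -> None:
    """Send the request and write the returned PNG bytes to disk."""
    with urllib.request.urlopen(build_request(text, **options), timeout=30) as resp:
        png = resp.read()
    with open(path, "wb") as fh:
        fh.write(png)


# Usage, with the service running locally:
# save_wordcloud("the quick brown fox jumps over the lazy dog", width=800, height=400)
```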

Frequencies mode (e.g. for Japanese)

wordcloud tokenizes on whitespace, which doesn't work for languages without word boundaries. Tokenize on the client side (pair with furigana-api or slug-jp) and pass counts directly:

curl -X POST http://localhost:8000/wordcloud/frequencies \
  -H "Content-Type: application/json" \
  -d '{"frequencies": {"apple": 10, "banana": 5, "cherry": 3}}' \
  -o wc.png
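
The client-side counting step can be sketched as follows. The tokens are assumed to come from whatever tokenizer you use; `frequencies_payload` is a hypothetical helper, and trimming to the most common words locally is an optional choice, not something the service requires.

```python
import json
from collections import Counter


def frequencies_payload(tokens, limit=200):
    """Count pre-tokenized words and build the body for POST /wordcloud/frequencies."""
    # Skip empty or whitespace-only tokens that some tokenizers emit.
    counts = Counter(t for t in tokens if t.strip())
    # Keep only the most frequent words to keep the request body small.
    return {"frequencies": dict(counts.most_common(limit))}


# Tokens as they might come out of a Japanese tokenizer (illustrative data).
tokens = ["東京", "大阪", "東京", " ", "京都", "東京", "大阪"]
body = json.dumps(frequencies_payload(tokens), ensure_ascii=False)
print(body)
```

The resulting JSON string can be sent as the request body in place of the inline `-d` argument in the curl command.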

Note that the default font bundled with wordcloud (DroidSansMono) does not contain CJK glyphs, so Japanese or Chinese text will render as tofu (empty boxes). Mount your own TTF if you need those scripts; extending the service for font upload is left as an exercise.

Docker

docker build -t wordcloud-api .
docker run --rm -p 8000:8000 wordcloud-api

The resulting image is around 300 MB. matplotlib alone is about 150 MB and is a hard dependency of wordcloud, which uses its cm module for colormaps. That's the tradeoff you take for not reinventing the algorithm.

Tests

pytest -q

16 tests covering the pure generator plus the API layer: PNG magic-byte verification, frequencies mode, custom colormap / background, stopwords filtering, 413 / 422 size limits, content-type validation.
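
For reference, a PNG magic-byte check of the kind mentioned above might look like the sketch below. It is not taken from the test suite; `png_size` is a hypothetical helper that relies only on the PNG file layout (8-byte signature followed by an IHDR chunk holding width and height as big-endian 32-bit integers).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte signature at the start of every PNG


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from a PNG header, or raise ValueError if it is not a PNG."""
    # Bytes 8-11 hold the IHDR chunk length, bytes 12-15 the chunk type.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    # Width and height are the first two fields of IHDR.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

In a test, calling `png_size` on the response body would confirm both that the bytes are a PNG and that the image matches the requested width and height.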

License

MIT. See LICENSE.

About

A FastAPI service that renders PNG word clouds from arbitrary text. The design is deliberately thin: a small HTTP wrapper around the excellent `wordcloud` package, a canonical example of the "when the library IS the product" pattern. Two endpoints.
