
frane/udecide


udecide

Six small text classifiers and a train() primitive. Everything runs in the browser or in Node, and every score comes back as a calibrated probability you can compare and threshold like any other number.

import { spam, intent, grade, train } from 'udecide'

await spam('BUY NOW!!! click here')        // 0.96
await intent('my card was charged twice')  // 'billing'
await grade('positive', 'increase')        // 0.89

Install

npm install udecide

The library runs in Node 22 and higher, in modern browsers, in Bun, in Deno, and inside a Cloudflare Worker, with @huggingface/transformers as the only runtime dependency.

The catalog

Six pre-trained tools, each one a single named import.

tool        what it does                                          first call
spam        comment-spam probability                              23 MB, shared
intent      route into billing, support, sales, shipping, other   23 MB, shared
sentiment   positive, neutral, negative                           23 MB, shared
toxicity    abuse probability                                     23 MB, shared
pii         personal-information detector                         23 MB, shared
grade       "do these two answers mean the same thing"            65 MB

The first five tools share one sentence-encoder model that downloads about 23 MB on the first call and stays cached afterwards. grade loads its own cross-encoder of about 65 MB, because direction questions and antonym discrimination need a model that scores the pair jointly rather than computing a cosine between two embeddings. A reader who imports and exercises every tool downloads about 88 MB once and runs locally for the rest of the visit.
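The bi-encoder versus cross-encoder distinction can be made concrete. A bi-encoder embeds each text separately and compares the vectors with a cosine, which is cheap but blind to anything the two embeddings happen to agree on. This is an illustration with made-up three-dimensional vectors, not the library's internals:

```javascript
// Cosine similarity between two embedding vectors, as a bi-encoder computes it.
// A cross-encoder like the one behind grade reads both texts together instead,
// which is how it can separate "increase" from "decrease" even when their
// standalone embeddings sit close to each other.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical embeddings for two near-synonymous phrases: cosine is close to 1.
console.log(cosine([0.9, 0.1, 0.2], [0.8, 0.2, 0.1]));
```

Antonym pairs often land near each other in embedding space because they occur in the same contexts, which is exactly the failure mode the dedicated cross-encoder for grade avoids.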

Train your own

import { train, load } from 'udecide'

const classify = await train([
  { text: 'great product', label: 'positive' },
  { text: 'broke immediately', label: 'negative' },
  // around thirty more
])

await classify('works as expected')   // 'positive'

const head = classify.export()
const reloaded = await load(head)

train() takes around thirty labeled examples for a binary task, or around fifty for a multiclass one. It fits a head on top of the sentence encoder using an 80/20 stratified split for the holdout, calibrates the scores so that a 0.7 actually means around seventy percent confidence on the held-out test set, and returns a callable closure you can save to disk and reload later. When the classes do not separate, the trainer throws a TrainingError that lists the misclassified examples and the likely causes, which is the only honest thing to do when the underlying signal is not there.
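"Stratified" means the holdout is sampled per label, so every class appears in both halves even with only a few dozen examples. A minimal sketch of that idea, not the library's actual splitter:

```javascript
// Split examples 80/20 while preserving label proportions: group by label,
// then take the holdout fraction from each group separately. With a plain
// random split, a rare class can vanish from the test set entirely.
function stratifiedSplit(examples, holdout = 0.2) {
  const byLabel = new Map();
  for (const ex of examples) {
    if (!byLabel.has(ex.label)) byLabel.set(ex.label, []);
    byLabel.get(ex.label).push(ex);
  }
  const train = [], test = [];
  for (const group of byLabel.values()) {
    const nTest = Math.max(1, Math.round(group.length * holdout));
    test.push(...group.slice(0, nTest));
    train.push(...group.slice(nTest));
  }
  return { train, test };
}

// 30 toy examples, 15 per label, in the same shape train() accepts.
const data = Array.from({ length: 30 }, (_, i) => ({
  text: `example ${i}`,
  label: i % 2 ? 'positive' : 'negative',
}));
const { train, test } = stratifiedSplit(data);
console.log(train.length, test.length); // 24 6
```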

Scores

Every score this library returns is a real probability in [0, 1], not a raw model output. That means a 0.7 from any tool can be compared with a 0.7 from any other tool without remembering which one came out of which sigmoid, and the standard pattern is a single threshold.

const isSpam = (await spam(text)) > 0.7
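Turning a raw model score into a probability like this is commonly done with Platt scaling: fit a logistic curve on held-out data so the output tracks observed accuracy. A sketch of the mechanism with made-up coefficients, not the library's fitted values:

```javascript
// Platt-style calibration: squash a raw score through a fitted logistic curve.
// The coefficients a and b are learned on a holdout set; the ones here are
// arbitrary, for illustration only.
const calibrate = (raw, a = 1.7, b = 0) => 1 / (1 + Math.exp(-(a * raw + b)));

console.log(calibrate(0));       // 0.5 — an uninformative raw score
console.log(calibrate(2) > 0.7); // true — confident enough to cross the threshold
```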

What it cannot do

The library is the right tool when the rule you are trying to encode is "this looks like that" and the alternative is a regular expression, a switch on keyword presence, or a string-equality check that has started lying. It is the wrong tool for problems that require parsing, such as deciding whether a string is valid Python, and for problems that require step-by-step reasoning, such as working out whether a proof is correct: a sentence encoder collapses both into a single vector and loses the information that mattered. The default models are also tuned for English, so any application that needs to classify other languages should swap the encoder with setEmbedder to one of the multilingual variants documented on the embedders concept page before expecting calibration to hold on its own corpus.

CLI

udecide train ./examples.jsonl --out ./my-head.json
udecide test  ./my-head.json --input "..." --expected "..."
udecide info  ./my-head.json
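The on-disk format for the training input is not shown above. Assuming the CLI shares the record shape that train() accepts, examples.jsonl would be one JSON object per line:

```jsonl
{"text": "great product", "label": "positive"}
{"text": "broke immediately", "label": "negative"}
```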

More

The full documentation lives under docs/. The seven runnable examples under examples/ cover the patterns most applications actually need, including the screenshot grader at examples/grader-screenshot/, which demonstrates the fix for the specific bug this library was originally written to address.

License

MIT.
