mwpcheung/captcha


captcha

Chinese version: README.zh-CN.md

A lightweight 4-digit numeric CAPTCHA renderer. ~500 lines, zero native dependencies — no FreeType, Skia, or GDI+. Runs anywhere a language runtime runs.

The motivation is plain: common CAPTCHA-rendering libraries ship close to 10 MB of native binaries and have brittle cross-platform support — different Linux distributions need different system fonts and shared libraries, and slim container images often miss one of them and silently render blanks. So this is a from-scratch lightweight replacement.

The goal is "keep scripts out", not "win an adversarial fight". It's meant to confirm the form was submitted by a human rather than a casually-written bot. It makes no claim of OCR resistance, no claim against a determined attacker, no claim against modern vision models. Real fraud prevention needs sliders, Turnstile, behavioral risk scoring — not this.

Implementation-wise: vector polyline glyphs + signed distance field anti-aliasing + 2× supersampling + a hand-rolled PNG encoder. The C# version is in production. Out of curiosity I later ported the same algorithm to C / Go / Java / Rust and ran a cross-language benchmark — see below.

Visual samples

The same five 4-digit codes rendered by each of the five implementations. Algorithm and parameters are identical; visual differences come only from each language's RNG (which drives background color, jitter, line positions, noise). 80×30 PNG, ~1 KB each.

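[Image grid, not reproduced here: one row per implementation (C#, Rust, C, Java, Go) and one column per code (0123, 4567, 8901, 2468, 1357). The source PNGs are in samples/.]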

Each render: light random background → 5 random Bresenham interference lines → 4 digits with random color, ±15° rotation, ±2 px x/y jitter → 50 noise pixels.
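As an illustration of the interference-line step, here is a minimal sketch of an integer-only Bresenham line in C. The buffer layout (packed 8-bit RGB, row-major) and the names `put_pixel` and `draw_line` are assumptions for this example, not taken from the repository; in the real renderer the endpoints and color would come from the RNG.

```c
#include <stdint.h>
#include <stdlib.h>

#define IMG_W 80
#define IMG_H 30

/* Set one RGB pixel; coordinates outside the image are ignored. */
static void put_pixel(uint8_t *rgb, int x, int y,
                      uint8_t r, uint8_t g, uint8_t b) {
    if (x < 0 || x >= IMG_W || y < 0 || y >= IMG_H) return;
    uint8_t *p = rgb + 3 * (y * IMG_W + x);
    p[0] = r; p[1] = g; p[2] = b;
}

/* Bresenham line from (x0, y0) to (x1, y1), both endpoints included.
   Works in all octants using only integer adds and compares. */
static void draw_line(uint8_t *rgb, int x0, int y0, int x1, int y1,
                      uint8_t r, uint8_t g, uint8_t b) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                 /* combined error term */
    for (;;) {
        put_pixel(rgb, x0, y0, r, g, b);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}
```

Clipping per pixel rather than per line keeps the sketch short; at 80×30 the wasted iterations are negligible.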

Generate any of these yourself:

cd csharp && dotnet run -c Release bench.cs sample 4567 out.png
cd rust   && ./target/release/captcha sample 4567 out.png
cd c      && ./captcha sample 4567 out.png
cd java   && java Captcha sample 4567 out.png
cd go     && go run captcha.go sample 4567 out.png

Benchmark

Apple Silicon arm64, macOS Darwin 25.4, single thread. Each language: 1000-iteration warmup + 20000-iteration measurement. Each iteration renders one complete 80×30 PNG.

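| # | Language | Toolchain | mean (μs) | p50 (μs) | p95 (μs) | p99 (μs) | min (μs) | req/s |
|---|----------|-----------|-----------|----------|----------|----------|----------|-------|
| 1 | C | clang -O3, zlib-ng | 680.5 | 682.0 | 868.0 | 931.0 | 325.0 | 1469 |
| 2 | Rust | 1.91, lto=fat, zlib-ng | 722.0 | 723.1 | 928.4 | 995.0 | 299.6 | 1385 |
| 3 | Java | OpenJDK 25, FFM + zlib-ng | 766.7 | 770.3 | 972.2 | 1037.4 | 371.8 | 1304 |
| 4 | C# | .NET 10 Release, zlib-ng | 770.8 | 774.8 | 999.3 | 1069.9 | 328.2 | 1297 |
| 5 | Go | 1.25, cgo + zlib-ng | 832.7 | 836.6 | 1059.9 | 1133.1 | 382.8 | 1201 |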

The first run had C# unexpectedly in first place at 758 μs. Investigation showed that .NET 9+ ships zlib-ng as its default deflate implementation, while the other four did not use it. After wiring everyone up to zlib-ng (C and Rust link it directly, Java calls it via FFM in JDK 22+, Go via cgo), the spread between fastest and slowest collapsed to ~150 μs. C and Rust edge out the garbage-collected runtimes on a tight numeric loop, which is roughly what you'd expect.

This benchmark is a numeric + compression workload. Different workload shapes (IO-bound services, concurrent requests, GC-sensitive paths) would order these languages very differently.

Algorithm

Pipeline

1. Fill 80×30 byte buffer with light random background (RGB 220-255)
2. Draw 5 random Bresenham interference lines (dark, RGB 100-200)
3. For each of 4 digits:
   a. Pick random rotation ±15°, x jitter ±2 px, y jitter ±2 px, dark color
   b. For each pixel inside the digit's bounding box:
      i.   Take 4 sub-pixel samples on a 2×2 grid
      ii.  Reverse-rotate each sub-sample to digit-local coordinates
      iii. Compute distance from sub-sample to nearest stroke segment
      iv.  Convert distance to coverage:
              dist <  STROKE_R          → 1.0 (fully inside stroke)
              STROKE_R..STROKE_R+AA     → linear falloff (anti-alias band)
              dist >= STROKE_R+AA       → 0.0 (outside)
      v.   Average the 4 sub-sample coverages
      vi.  Alpha-blend the pixel with the digit color, using the averaged coverage as alpha
4. Plot 50 random noise pixels
5. Encode PNG: signature + IHDR + IDAT (zlib-compressed scanlines) + IEND
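
Step 5 can be sketched as follows. Every PNG chunk is a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over type plus data. The helper names (`crc32_update`, `write_chunk`, `write_ihdr`) and the choice of 8-bit truecolor RGB are assumptions for illustration; the zlib-compressed IDAT payload is left out here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The fixed 8-byte signature that starts every PNG file. */
static const uint8_t PNG_SIG[8] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the CRC PNG uses.
   Pass 0 as the initial crc; calls can be chained over several buffers. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Store a 32-bit value in big-endian (network) byte order. */
static void put_be32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Write length + type + data + CRC into out; returns 12 + len bytes. */
static size_t write_chunk(uint8_t *out, const char type[4],
                          const uint8_t *data, uint32_t len) {
    put_be32(out, len);
    memcpy(out + 4, type, 4);
    if (len) memcpy(out + 8, data, len);
    /* The CRC covers the type and data fields, not the length. */
    put_be32(out + 8 + len, crc32_update(0, out + 4, 4 + (size_t)len));
    return 12 + (size_t)len;
}

/* IHDR for an 8-bit, non-interlaced truecolor image (assumed format). */
static size_t write_ihdr(uint8_t *out, uint32_t w, uint32_t h) {
    uint8_t d[13];
    put_be32(d, w);
    put_be32(d + 4, h);
    d[8] = 8;    /* bit depth */
    d[9] = 2;    /* color type 2 = RGB */
    d[10] = 0;   /* compression method: deflate */
    d[11] = 0;   /* filter method */
    d[12] = 0;   /* no interlace */
    return write_chunk(out, "IHDR", d, 13);
}
```

A bitwise CRC is slower than a table-driven one, but for a ~1 KB file the difference is unlikely to matter next to deflate.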

Why a signed distance field

Bitmap fonts produce visible staircase edges when rotated by arbitrary angles. Vector glyph rasterizers (FreeType, Skia, GDI+) handle rotation cleanly but are heavyweight, native-dependent, and a deployment headache — native libs, font files, platform-specific ABIs.

A simpler middle path: define each digit as a polyline, and at each output pixel compute the exact distance to the nearest line segment. Distance below the stroke radius gives full coverage; distance within a narrow band beyond it gives a smooth gradient; further out, zero. With 2×2 supersampling, the result at 80×30 is visually indistinguishable from real vector rasterization — in ~500 lines, with zero dependencies.

Font data format

Each digit is an array of polylines. Each polyline is a flat array of (x, y) doubles in the [0, 1]² normalized digit box. Coordinates are scaled to actual pixel size at render time.

// digit "1": three polylines (hat, body, base)
static const double D1_S0[] = {0.30, 0.20, 0.50, 0.05};   // hat
static const double D1_S1[] = {0.50, 0.05, 0.50, 0.95};   // body
static const double D1_S2[] = {0.25, 0.95, 0.75, 0.95};   // base

// digit "0": single closed oval, 13 vertices
static const double D0_S0[] = {
    0.50, 0.05, 0.78, 0.10, 0.95, 0.28, 1.00, 0.50,
    0.95, 0.72, 0.78, 0.90, 0.50, 0.95, 0.22, 0.90,
    0.05, 0.72, 0.00, 0.50, 0.05, 0.28, 0.22, 0.10,
    0.50, 0.05
};
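
To show how the flat arrays might be consumed, here is a sketch that scales a polyline to pixel size and finds the nearest segment. It assumes the normalized box maps directly onto `DIGIT_W` × `DIGIT_H`, and it compares squared distances so that a caller only needs one square root per sample; `seg_dist2` and `polyline_dist2` are illustrative names, not the repository's.

```c
#include <stddef.h>

#define DIGIT_W 11.0
#define DIGIT_H 18.0

/* Body stroke of "1", copied from the font data above. */
static const double D1_S1[] = {0.50, 0.05, 0.50, 0.95};

/* Squared distance from point P to segment A-B (pixel units). */
static double seg_dist2(double px, double py,
                        double ax, double ay, double bx, double by) {
    double dx = bx - ax, dy = by - ay;
    double len2 = dx * dx + dy * dy;
    double t = 0.0;
    if (len2 > 0.0) {
        t = ((px - ax) * dx + (py - ay) * dy) / len2;
        if (t < 0.0) t = 0.0; else if (t > 1.0) t = 1.0;
    }
    double cx = ax + t * dx - px, cy = ay + t * dy - py;
    return cx * cx + cy * cy;
}

/* Minimum squared distance from a digit-local pixel position to a
   polyline of n_pts vertices stored as x0, y0, x1, y1, ...
   Returns -1.0 if the polyline has fewer than two vertices. */
static double polyline_dist2(const double *pts, size_t n_pts,
                             double px, double py) {
    double best = -1.0;
    for (size_t i = 0; i + 1 < n_pts; i++) {
        /* Scale normalized [0, 1] coordinates to pixel size. */
        double ax = pts[2 * i]     * DIGIT_W, ay = pts[2 * i + 1] * DIGIT_H;
        double bx = pts[2 * i + 2] * DIGIT_W, by = pts[2 * i + 3] * DIGIT_H;
        double d2 = seg_dist2(px, py, ax, ay, bx, by);
        if (best < 0.0 || d2 < best) best = d2;
    }
    return best;
}
```

With the defaults, the body of "1" runs down x = 5.5 px, so a sample at (7.5, 9.0) is 2 px away and `polyline_dist2(D1_S1, 2, 7.5, 9.0)` comes out at about 4.0.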

Adding new characters: append the polyline definition to the FONT table. The renderer is character-agnostic — it rasterizes whatever polyline data you hand it. Letters A–Z would be roughly 26 entries × 10–15 vertices each, ~50 lines of data.
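
As a rough illustration of what a new entry could look like, here is a letter "A" built from two polylines. The `Polyline` and `Glyph` structs, the `LA_*` names, and the vertex coordinates are all hypothetical; the actual FONT table layout in the repository may differ.

```c
#include <stddef.h>

/* Hypothetical containers; the real FONT table may be laid out differently. */
typedef struct { const double *pts; size_t n_pts; } Polyline;
typedef struct { char ch; const Polyline *lines; size_t n_lines; } Glyph;

/* Letter "A": one open polyline for the two legs, one for the crossbar.
   The crossbar endpoints are chosen to lie on the legs at y = 0.55. */
static const double LA_S0[] = {0.05, 0.95, 0.50, 0.05, 0.95, 0.95};  /* legs */
static const double LA_S1[] = {0.25, 0.55, 0.75, 0.55};              /* crossbar */

static const Polyline LA_LINES[] = { {LA_S0, 3}, {LA_S1, 2} };
static const Glyph GLYPH_A = { 'A', LA_LINES, 2 };

/* Sanity check: returns 1 if every vertex lies in the [0, 1] x [0, 1] box. */
static int glyph_in_unit_box(const Glyph *g) {
    for (size_t l = 0; l < g->n_lines; l++) {
        const Polyline *p = &g->lines[l];
        for (size_t i = 0; i < 2 * p->n_pts; i++) {
            if (p->pts[i] < 0.0 || p->pts[i] > 1.0) return 0;
        }
    }
    return 1;
}
```

A check like `glyph_in_unit_box` is cheap to run once at startup and catches typos in hand-entered coordinates.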

Tunable parameters

| Parameter | Default | Effect |
|-----------|---------|--------|
| DIGIT_W × DIGIT_H | 11×18 px | Display size of one digit |
| STROKE_R | 1.0 px | Stroke half-width: larger means bolder digits |
| AA | 1.0 px | Anti-alias falloff width: larger means softer edges |
| SS | 2 | Supersampling factor (2 = 2×2 subsamples per pixel; cost grows quadratically) |
| X_STEP | 17 px | Horizontal spacing between digit centers |
| X_JITTER, Y_JITTER | ±2 px | Per-digit position randomization |
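
To make the SS parameter concrete, here is a sketch of per-pixel supersampling with reverse rotation and the final blend. It assumes sub-samples sit at the centers of an SS×SS grid inside each pixel and that the rotation's cosine and sine are computed once per digit; `coverage_fn`, `left_half`, `pixel_coverage`, and `blend` are names invented for this example.

```c
#include <stdint.h>

#define SS 2   /* supersampling factor: SS x SS sub-samples per pixel */

/* Hypothetical callback: coverage in [0, 1] at a digit-local position. */
typedef double (*coverage_fn)(double lx, double ly);

/* Stand-in shape for demonstration: covers the half-plane lx < 0. */
static double left_half(double lx, double ly) {
    (void)ly;
    return lx < 0.0 ? 1.0 : 0.0;
}

/* Average coverage of pixel (x, y). (cx, cy) is the digit center and
   cos_a / sin_a belong to the digit's rotation angle. */
static double pixel_coverage(int x, int y, double cx, double cy,
                             double cos_a, double sin_a, coverage_fn cov) {
    double sum = 0.0;
    for (int sy = 0; sy < SS; sy++) {
        for (int sx = 0; sx < SS; sx++) {
            /* Sub-sample centers: offsets 0.25 and 0.75 when SS == 2. */
            double px = x + (sx + 0.5) / SS - cx;
            double py = y + (sy + 0.5) / SS - cy;
            /* Rotate by the negative angle to reach digit-local space. */
            double lx =  px * cos_a + py * sin_a;
            double ly = -px * sin_a + py * cos_a;
            sum += cov(lx, ly);
        }
    }
    return sum / (SS * SS);
}

/* Blend one 8-bit channel toward the digit color; alpha in [0, 1]. */
static uint8_t blend(uint8_t bg, uint8_t fg, double alpha) {
    return (uint8_t)(bg + (fg - bg) * alpha + 0.5);
}
```

Raising SS from 2 to 3 would take the inner loop from 4 to 9 coverage evaluations per pixel, which is the quadratic cost the table mentions.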

Running the benchmarks

All five at once:

make all

Individual languages:

make csharp    # requires .NET 10
make rust      # requires Rust 1.70+
make c         # requires clang + zlib-ng-compat (brew install zlib-ng-compat)
make java      # requires JDK 22+ + zlib-ng-compat (FFM API binding)
make go        # requires Go 1.22+ + zlib-ng-compat (cgo binding)

Direct invocation:

cd csharp && dotnet run -c Release bench.cs
cd rust   && cargo run --release --quiet
make c                                           # easier than typing the include/lib paths
cd java   && javac Captcha.java && java Captcha
cd go     && go run captcha.go

Each binary prints the same format:

<lang> captcha benchmark
  warmup=1000, runs=20000
  wall  = 17899 ms total (1117 req/s)
  mean  =    894.9 us
  p50   =    893.0 us
  p95   =   1084.0 us
  p99   =   1157.0 us
  min   =    496.0 us
  max   =  12636.0 us
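
For reference, statistics like these can be derived from the per-iteration timings roughly as below. This is a sketch using the nearest-rank percentile definition, which is an assumption; the five implementations may compute percentiles differently. The function names are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n samples (n must be > 0). */
static double mean(const double *v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Nearest-rank percentile of an ascending-sorted array:
   the value at rank ceil(p / 100 * n), with p in (0, 100]. */
static double percentile(const double *sorted, size_t n, double p) {
    double r = p * (double)n / 100.0;
    size_t k = (size_t)r;
    if ((double)k < r) k++;        /* round up without needing ceil() */
    if (k < 1) k = 1;
    if (k > n) k = n;
    return sorted[k - 1];
}
```

Typical use: collect one timing per iteration into an array, sort it once with `qsort(samples, n, sizeof samples[0], cmp_double)`, then read min from index 0, max from index n - 1, and p50/p95/p99 through `percentile`.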

Project layout

captcha/
├── README.md          (English)
├── README.zh-CN.md    (Chinese)
├── LICENSE            (MIT)
├── Makefile
├── samples/           # PNG samples used in the visual table above
├── csharp/
│   └── bench.cs       # .NET 10 single-file program
├── rust/
│   ├── Cargo.toml
│   └── src/main.rs
├── c/
│   └── captcha.c
├── java/
│   └── Captcha.java
└── go/
    ├── go.mod
    └── captcha.go

License

MIT. See LICENSE.
