Release v0.4.0 · shaik-abdul-thouhid/ezi-code

[0.4.0] - 2026-06-11

Changed

The CodePoint contract is now documented as such, on the type itself
(encoding.CodePoint) and in the README: values of this type are presumed
valid Unicode scalars; producers uphold the contract, consumers rely on it
and skip decoding/validation (which is what makes the []const CodePoint
API variants the cheap path for already-decoded text). The README quick-look
examples were also brought back in sync with the real signatures
(initUTF8View out-param, view.iter(), nfc over code points).
Breaking: "unchecked" now means one thing everywhere — the caller
guarantees the documented preconditions; violations are asserted /
safety-checked (trap in Debug/ReleaseSafe, undefined in
ReleaseFast/ReleaseSmall); unchecked functions never return errors or
panic. Accordingly:
- encoding.utf8.codePointLenReverseUnchecked returns plain u3
  (was UTF8ValidationError!u3).
- encoding.utf16.utf16SequenceLenReverseUnchecked returns plain u2
  (was UTF16ValidationError!u2) and asserts end_index < buf.len
  instead of returning ZeroLengthUnits.
- encoding.utf8.decodeCodePointReverseUnchecked no longer documents (or
  contains) a panic path; its preconditions are asserted.

Removed

Breaking: the Undefined member is gone from all six UTF-16/UTF-32
error sets (UTF16ValidationError, UTF16ValidationLossyError,
UTF16EncodeError, UTF32ValidationError, UTF32ValidationLossyError,
UTF32EncodeError). It was never returned by any code path; exhaustive
switches over these error sets can drop their dead error.Undefined arm.

Fixed

unicode.segmentation byte-level iterators (grapheme / word / sentence /
line) and step helpers no longer contain catch @panic shims around lossy
decoding. They decode through encoding.utf8.decodeCodePointLossy, so the
"lossy never errors" promise now holds structurally: malformed UTF-8 yields
U+FFFD segments and can never trap. lineStepBytes documents its
byte_pos < bytes.len contract (asserted, safety-checked).

Added

Early-exit collation compare (@stable-since: v0.4.0):
Collator.compareCodePointsIncremental / compareUtf8Incremental generate
collation elements lazily and stop at the first differing weight of the
shallowest differing level — no sort keys are materialized. Identical
results to compareCodePoints / compareUtf8 for every input and option
set (verified across the strength × variable-weighting matrix). Strings that
differ early at the primary level — the common case — pay for only a few
collation elements; the weighting logic is shared with buildKey via an
internal resolveCE so the two paths cannot diverge.
BOM utilities — new encoding.bom module (@stable-since: v0.4.0):
Bom enum (utf8 / utf16_le / utf16_be / utf32_le / utf32_be) with
bytes / len / endian / match, plus detect (longest-match: the
ambiguous FF FE 00 00 reports UTF-32 LE) and zero-copy strip. The
codecs themselves still never consume or produce BOMs; this is the
explicit seam, re-exported at the package root as bom.
unicode.normalization.nfkcCaseFold — sequence-level NFKC_Casefold
(@stable-since: v0.4.0), per the UCD definition
NFKC_CF(X) = NFC(toNFKC_Casefold(X)), built on the already-shipped
per-scalar nfkcCaseFoldMap table. The identifier-caseless form used by
UAX #31 and security profiles: fullwidth compatibility variants fold
(ＡＢＣ → abc), Default_Ignorables map away, and the result is
idempotent (verified by a BMP-wide sweep).
String-level titlecase in unicode.casing
(@stable-since: v0.4.0): titlecaseAlloc ([]const CodePoint) and
titlecaseUtf8Alloc (UTF-8), implementing the Unicode default algorithm
(R3) — UAX #29 word segmentation, full titlecase mapping on each word's
first cased scalar, full lowercase on the rest, with Final_Sigma context
for U+03A3 ("ΜΕΓΑΣ" → "Μεγας"). Default root-locale mappings; no
Turkic/Lithuanian tailoring.
Case-insensitive search in unicode.casing
(@stable-since: v0.4.0): indexOfFold / containsFold over UTF-8 bytes
and indexOfFoldCodePoints / containsFoldCodePoints over scalar slices.
Both sides fold lazily during the scan — no allocation — honoring expanding
folds in .full mode ("STRASSE" matches "Straße"), with a whole-scalar
boundary rule (needle "s" never matches inside "ß"'s expansion). The
CodePoint variants skip decoding/validation entirely per the CodePoint
contract.
SIMD chunked scanners for UTF-16, mirroring the v0.3.0 UTF-8 set
(@stable-since: v0.4.0, portable @Vector compares with scalar tails, no
target intrinsics): utf16.nonSurrogateRunLength (length of the leading
run of standalone scalars — the UTF-16 analogue of asciiRunLength) and
utf16.countScalarsSimd (unchecked scalar count via the high-surrogate
rule). utf16.validate now skips surrogate-free runs in SIMD strides and
falls back to scalar pair checks only at actual surrogates.
Bulk encode-direction APIs taking []const CodePoint
(@stable-since: v0.4.0): encodeCodePoints{Len,Buffer,Alloc} on all three
codecs, the inverse of bytesToUTF8String / bufToUTF16String /
bufToUTF32String. Callers holding already-decoded scalars encode without
any decoding or validation, per the CodePoint contract.
encoding.utf8.StreamingValidator — incremental, resumable validation over
arbitrarily-chunked input (@stable-since: v0.4.0). The Höhrmann DFA state
carries across chunk boundaries (no buffering, no copies); update reports
the absolute offset of the first malformed sequence, finish distinguishes
a truncated trailing scalar from valid end-of-input, and ASCII runs are
skipped in SIMD strides.
Error position reporting (@stable-since: v0.4.0): invalidIndex on
all three codecs returns the unit/byte offset where the first malformed
sequence starts (null when valid), so diagnostics no longer require a
re-scan. Decoding strictly at the reported offset recovers the fine-grained
error. The UTF-8 variant skips ASCII runs in SIMD strides.
Unchecked forward decode entry points, completing the strict / unchecked
/ lossy matrix in the forward direction (@stable-since: v0.4.0): callers
holding already-validated text can decode without re-validating and without
wrapping the input in a View.
- encoding.utf8.decodeCodePointUnchecked
- encoding.utf16.decodeU16CodePointUnchecked
- encoding.utf32.decodeU32CodePointUnchecked
encoding.utf8.decodeCodePointLossy — infallible lossy decode primitive
(@stable-since: v0.4.0). Malformed sequences yield U+FFFD and are never
reported as errors; the only precondition (offset < bytes.len) is asserted
(safety-checked), not error-returned, and the returned len is always >= 1
so forward scans are guaranteed to make progress. UTF8LossyIterator and
UTF8SimdLossyIterator now decode through it, removing their internal
catch unreachable/catch break shims.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.4.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

[0.4.0] - 2026-06-11

Changed

Removed

Fixed

Added

Uh oh!