# str: Unicode-aware string utilities for Gleam

A production-ready Gleam library providing Unicode-aware string operations, with a focus on grapheme-cluster correctness, pragmatic ASCII transliteration, and URL-friendly slug generation.
## Features

| Category | Highlights |
|---|---|
| Grapheme-aware | All operations correctly handle Unicode grapheme clusters (emoji, ZWJ sequences, combining marks) |
| Case conversions | snake_case, camelCase, kebab-case, PascalCase, Title Case, capitalize |
| Slug generation | Configurable slugify with token limits, custom separators, and Unicode preservation |
| Search & replace | index_of, last_index_of, replace_first, replace_last, contains_any/all |
| Validation | is_uppercase, is_lowercase, is_title_case, is_ascii, is_hex, is_numeric, is_alpha |
| Escaping | escape_html, unescape_html, escape_regex |
| Similarity | Levenshtein distance, percentage similarity, hamming_distance |
| Splitting | splitn, partition, rpartition, chunk, lines, words |
| Padding | pad_left, pad_right, center, fill |
| Zero dependencies | Pure Gleam implementation with no OTP requirement |
## Installation

```sh
gleam add str
```

## Quick start

```gleam
import str/core
import str/extra

pub fn main() {
  // Grapheme-safe truncation preserves emoji
  let text = "Hello 👩‍👩‍👧‍👦 World"
  core.truncate(text, 10, "...")
  // -> "Hello 👩‍👩‍👧‍👦..."

  // ASCII transliteration and slugification
  extra.slugify("Crème Brûlée – Recipe 2025!")
  // -> "creme-brulee-recipe-2025"

  // Case conversions
  extra.to_camel_case("hello world") // -> "helloWorld"
  extra.to_snake_case("Hello World") // -> "hello_world"
  core.capitalize("hELLO wORLD")     // -> "Hello world"

  // Grapheme-aware search
  core.index_of("👨‍👩‍👧‍👦 family test", "family")
  // -> Ok(2) - counts grapheme clusters, not bytes!

  // String similarity
  core.similarity("hello", "hallo")
  // -> 0.8 (80% similar)

  // HTML escaping
  core.escape_html("<script>alert('xss')</script>")
  // -> "&lt;script&gt;alert('xss')&lt;/script&gt;"
}
```

## API overview

### Case

| Function | Example | Result |
|---|---|---|
| capitalize(text) | "hELLO wORLD" | "Hello world" |
| swapcase(text) | "Hello World" | "hELLO wORLD" |
| is_uppercase(text) | "HELLO123" | True |
| is_lowercase(text) | "hello_world" | True |
| is_title_case(text) | "Hello World" | True |
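A minimal usage sketch of these helpers; `capitalize` is shown under `str/core` in the quick start, and the remaining functions are assumed to live alongside it. The expected values mirror the table above.

```gleam
import str/core

pub fn case_examples() {
  core.capitalize("hELLO wORLD")    // -> "Hello world"
  core.swapcase("Hello World")      // -> "hELLO wORLD"
  core.is_uppercase("HELLO123")     // -> True
  core.is_lowercase("hello_world")  // -> True
  core.is_title_case("Hello World") // -> True
}
```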
### Slicing

| Function | Example | Result |
|---|---|---|
| take(text, n) | take("👨‍👩‍👧‍👦abc", 2) | "👨‍👩‍👧‍👦a" |
| drop(text, n) | drop("hello", 2) | "llo" |
| take_right(text, n) | take_right("hello", 3) | "llo" |
| drop_right(text, n) | drop_right("hello", 2) | "hel" |
| at(text, index) | at("hello", 1) | Ok("e") |
| chunk(text, size) | chunk("abcdef", 2) | ["ab", "cd", "ef"] |
### Search & replace

| Function | Example | Result |
|---|---|---|
| index_of(text, needle) | "hello world", "world" | Ok(6) |
| last_index_of(text, needle) | "hello hello", "hello" | Ok(6) |
| contains_any(text, needles) | "hello", ["x", "e", "z"] | True |
| contains_all(text, needles) | "hello", ["h", "e"] | True |
| replace_first(text, old, new) | "aaa", "a", "b" | "baa" |
| replace_last(text, old, new) | "aaa", "a", "b" | "aab" |
### Splitting

| Function | Example | Result |
|---|---|---|
| partition(text, sep) | "a-b-c", "-" | #("a", "-", "b-c") |
| rpartition(text, sep) | "a-b-c", "-" | #("a-b", "-", "c") |
| splitn(text, sep, n) | "a-b-c-d", "-", 2 | ["a", "b-c-d"] |
| words(text) | "hello world" | ["hello", "world"] |
| lines(text) | "a\nb\nc" | ["a", "b", "c"] |
### Padding

| Function | Example | Result |
|---|---|---|
| pad_left(text, width, pad) | "42", 5, "0" | "00042" |
| pad_right(text, width, pad) | "hi", 5, "*" | "hi***" |
| center(text, width, pad) | "hi", 6, "-" | "--hi--" |
| fill(text, width, pad, pos) | "x", 5, "-", "both" | "--x--" |
### Validation

| Function | Description |
|---|---|
| is_numeric(text) | Digits only (0-9) |
| is_alpha(text) | Letters only (a-z, A-Z) |
| is_alphanumeric(text) | Letters and digits |
| is_ascii(text) | ASCII only (0x00-0x7F) |
| is_printable(text) | Printable ASCII (0x20-0x7E) |
| is_hex(text) | Hexadecimal (0-9, a-f, A-F) |
| is_blank(text) | Whitespace only |
| is_title_case(text) | Title Case format |
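An illustrative sketch of the predicates; the inputs below are examples chosen to match the descriptions above (they are not taken from the library's own docs), and the `str/core` prefix is assumed.

```gleam
import str/core

pub fn validation_examples() {
  core.is_numeric("12345")       // digits only -> True
  core.is_alpha("Hello")         // letters only -> True
  core.is_alphanumeric("abc123") // letters and digits -> True
  core.is_ascii("héllo")         // contains a non-ASCII character -> False
  core.is_hex("DEADbeef")        // hexadecimal digits -> True
  core.is_blank("   ")           // whitespace only -> True
}
```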
### Prefix & suffix

| Function | Example | Result |
|---|---|---|
| remove_prefix(text, prefix) | "hello world", "hello " | "world" |
| remove_suffix(text, suffix) | "file.txt", ".txt" | "file" |
| ensure_prefix(text, prefix) | "world", "hello " | "hello world" |
| ensure_suffix(text, suffix) | "file", ".txt" | "file.txt" |
| starts_with_any(text, list) | "hello", ["hi", "he"] | True |
| ends_with_any(text, list) | "file.txt", [".txt", ".md"] | True |
| common_prefix(strings) | ["abc", "abd"] | "ab" |
| common_suffix(strings) | ["abc", "xbc"] | "bc" |
### Escaping

| Function | Example | Result |
|---|---|---|
| escape_html(text) | "<div>" | "&lt;div&gt;" |
| unescape_html(text) | "&lt;div&gt;" | "<div>" |
| escape_regex(text) | "a.b*c" | "a\\.b\\*c" |
### Similarity

| Function | Example | Result |
|---|---|---|
| distance(a, b) | "kitten", "sitting" | 3 |
| similarity(a, b) | "hello", "hallo" | 0.8 |
| hamming_distance(a, b) | "karolin", "kathrin" | Ok(3) |
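The similarity helpers as calls, with the `str/core` prefix assumed from the quick start; expected values are from the table.

```gleam
import str/core

pub fn similarity_examples() {
  core.distance("kitten", "sitting")          // Levenshtein distance -> 3
  core.similarity("hello", "hallo")           // -> 0.8 (80% similar)
  core.hamming_distance("karolin", "kathrin") // -> Ok(3), requires equal lengths
}
```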
### Transformations

| Function | Description |
|---|---|
| truncate(text, len, suffix) | Truncate with emoji preservation |
| ellipsis(text, len) | Truncate with … |
| reverse(text) | Grapheme-aware reversal |
| reverse_words(text) | Reverse word order |
| initials(text) | Extract initials ("John Doe" → "JD") |
| normalize_whitespace(text) | Collapse whitespace |
| strip(text, chars) | Remove chars from ends |
| squeeze(text, char) | Collapse consecutive chars |
| chomp(text) | Remove trailing newline |
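A sketch of a few transformations under the assumed `str/core` prefix. The `truncate` and `initials` results come from the quick start and the table; the other inputs are illustrative and their exact outputs are only described in comments.

```gleam
import str/core

pub fn transform_examples() {
  core.truncate("Hello 👩‍👩‍👧‍👦 World", 10, "...") // -> "Hello 👩‍👩‍👧‍👦..."
  core.reverse("abc")                       // grapheme-aware reversal -> "cba"
  core.reverse_words("hello world")         // -> "world hello"
  core.initials("John Doe")                 // -> "JD"
  core.normalize_whitespace("a   b  c")     // collapses runs of whitespace
  core.strip("xxhixx", "x")                 // removes "x" from both ends
  core.chomp("line\n")                      // -> "line"
}
```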
### Multiline

| Function | Description |
|---|---|
| lines(text) | Split into lines |
| dedent(text) | Remove common indentation |
| indent(text, spaces) | Add indentation |
| wrap_at(text, width) | Word wrap |
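An illustrative sketch of the multiline helpers, assuming the `str/core` prefix and that `indent` counts spaces; exact return values are not documented above, so they are only described in comments.

```gleam
import str/core

pub fn multiline_examples() {
  core.dedent("  a\n  b")        // strips the common two-space indentation
  core.indent("a\nb", 2)         // indents each line by two spaces
  core.wrap_at("the quick brown fox jumps", 10) // word-wraps at width 10
}
```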
## str/extra: case conversion, ASCII folding, and slugs

### Case conversions

```gleam
import str/extra

extra.to_snake_case("Hello World")  // -> "hello_world"
extra.to_camel_case("hello world")  // -> "helloWorld"
extra.to_pascal_case("hello world") // -> "HelloWorld"
extra.to_kebab_case("Hello World")  // -> "hello-world"
extra.to_title_case("hello world")  // -> "Hello World"
```

### ASCII folding

```gleam
extra.ascii_fold("Crème Brûlée") // -> "Creme Brulee"
extra.ascii_fold("straße")       // -> "strasse"
extra.ascii_fold("æon")          // -> "aeon"
```

### Slugify

```gleam
extra.slugify("Hello, World!")                     // -> "hello-world"
extra.slugify_opts("one two three", 2, "-", False) // -> "one-two"
extra.slugify_opts("Hello World", 0, "_", False)   // -> "hello_world"
```

## Module layout

```
str/
├── core        # Grapheme-aware core utilities
├── extra       # ASCII folding, slugs, case conversions
├── tokenize    # Pure-Gleam tokenizer (reference)
└── internal_*  # Character tables (internal)
```
## Documentation

| Document | Description |
|---|---|
| Core API | Grapheme-aware string operations |
| Extra API | ASCII folding and slug generation |
| Tokenizer | Pure-Gleam tokenizer reference |
| Examples | Integration examples and OTP patterns |
| Character Tables | Machine-readable transliteration data |
## OTP integration (optional)

The library core is OTP-free by design. For production Unicode normalization (NFC/NFD), supply your own normalizer, for example an external binding to Erlang's unicode module:

```gleam
import str/extra

// In your application code: bind OTP's unicode module for NFD normalization
@external(erlang, "unicode", "characters_to_nfd_binary")
pub fn otp_nfd(s: String) -> String

// Use with str:
extra.ascii_fold_with_normalizer("Crème", otp_nfd)
extra.slugify_with_normalizer("Café", otp_nfd)
```

## Development

```sh
# Run the test suite
gleam test

# Regenerate character tables documentation
python3 scripts/generate_character_tables.py
```

The test suite includes:

- Tests covering all public functions
- Unicode edge cases (emoji, ZWJ, combining marks)
- Grapheme cluster boundary handling
- Cross-module integration tests
## Contributing

Contributions are welcome! Areas for improvement:

- Expanding character transliteration tables
- Additional test cases for edge cases
- Documentation improvements
- Performance optimizations

```sh
gleam test  # Ensure tests pass before submitting PRs
```

## License

MIT License. See LICENSE for details.
Made with 💜 for the Gleam community
