johnstonskj/racket-codepoint

codepoint

A library for operations on Unicode codepoints and Unicode Character Database (UCD) properties.

Install with raco pkg install codepoint. Released under the MIT License.

This package provides types that describe individual Unicode codepoints, codepoint ranges, and character properties. The following example queries the character properties that explain what happens when the letter #\a is followed by the combining character U+0304.

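; Module names as listed under Modules; adjust if your installation differs.
(require codepoint codepoint/properties codepoint/enums)
(codepoint? #x0304)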
; -> #t
(ucd-name #x0304)
; -> "COMBINING MACRON"
(ucd-general-category #x0304)
; -> 'Mn
(cdr (assoc (ucd-general-category #x0304) *general-categories*))
; -> "Non-spacing mark"
(ucd-canonical-combining-class #x0304)
; -> 230
(cdr (assoc (ucd-canonical-combining-class #x0304) *combining-classes*))
; -> "Distinct marks directly above"
(string #\a (codepoint->char #x0304))
; -> "ā"
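
As a cross-check, the same combining behavior can be observed with Racket's built-in character and string functions alone. This is a minimal sketch that does not use the codepoint package; the names macron, decomposed, and composed are chosen here for illustration. Note that the built-in char-general-category reports categories as lowercase symbols.

```racket
#lang racket/base
;; Cross-check using only Racket built-ins (not the codepoint package).

(define macron (integer->char #x0304)) ; U+0304 COMBINING MACRON

;; Built-in general category; Racket reports it as a lowercase symbol.
(char-general-category macron)
; -> 'mn

;; #\a followed by the macron is two characters until normalized.
(define decomposed (string #\a macron))
(string-length decomposed)
; -> 2

;; NFC composes the pair into the single codepoint U+0101.
(define composed (string-normalize-nfc decomposed))
(string-length composed)
; -> 1
(char->integer (string-ref composed 0))
; -> 257
```

The non-spacing mark renders on top of the preceding letter whether or not the string is normalized; normalization only changes how many codepoints represent it.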

The functions in codepoint/properties return values as extracted from the Unicode Character Database; typically the only conversion applied is string to number or string to symbol. Human-readable descriptions of those values are collected in codepoint/enums for display purposes.
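
The example above looks descriptions up with (cdr (assoc ...)), which raises an error when a value has no entry, because assoc returns #f. The sketch below shows one way to make that lookup total. It assumes the tables in codepoint/enums are association lists, as the use of assoc suggests. The helper describe and the two sample tables are hypothetical and contain only the entries shown in the example above.

```racket
#lang racket/base
;; Hypothetical helper, not part of the codepoint package.
;; The two tables below are stand-ins holding only the entries shown in the
;; example above; in real use, pass *general-categories* or
;; *combining-classes* from codepoint/enums instead.

(define sample-general-categories
  '((Mn . "Non-spacing mark")))

(define sample-combining-classes
  '((230 . "Distinct marks directly above")))

;; Return the description paired with `value`, or `default` when absent.
;; A bare (cdr (assoc ...)) raises an error when the key is missing,
;; because assoc returns #f.
(define (describe value table [default "(no description)"])
  (define entry (assoc value table))
  (if entry (cdr entry) default))

(describe 'Mn sample-general-categories)
; -> "Non-spacing mark"
(describe 230 sample-combining-classes)
; -> "Distinct marks directly above"
(describe 'Zz sample-general-categories)
; -> "(no description)"
```

Returning a default string keeps display code simple; a caller that needs to distinguish missing entries could pass #f as the default instead.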

Modules

  • codepoint -- functions on the type codepoint?.
  • codepoint/range -- functions on an inclusive range of codepoint? values.
  • codepoint/range-dict -- a dictionary keyed by codepoint-range? values.
  • codepoint/properties -- Unicode Character Database (UCD) properties for codepoint? values.
  • codepoint/enums -- enumeration values found in UCD properties.
  • codepoint/fold -- implementation of case-folding based on UCD properties.

Data Generation

TBD

Version History

Version 0.2

  • Most documentation is now complete; renamed the scribbling root file from index to codepoint.
  • Removed the shell script for fetching UCD files and rewrote it as the ucd module.

Version 0.1

  • Initial upload.