[Feature Request] Extend set of Unicode property tests in racket/base #3634

tail-reversion · 2021-01-18T17:35:44Z

Analyzing Unicode properties is critical to text-processing algorithms and Unicode-compatible parsing. Currently, Racket supports testing the general category of a character, using char-general-category and related predicates such as char-alphabetic?. However, there are many other Unicode properties (see UAX 44), and there are no procedures in Racket for inspecting them.
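For reference, here is a minimal sketch of the kind of inspection racket/base offers today. The helper name `describe-char` is hypothetical and only for illustration; the procedures it calls are the existing racket/base ones.

```racket
#lang racket/base

;; Hypothetical helper: collect what racket/base can currently report
;; about a character, i.e. its general category plus a few predicates.
(define (describe-char c)
  (list c
        (char-general-category c) ; a symbol such as 'lu, 'nd, or 'zs
        (char-alphabetic? c)
        (char-numeric? c)
        (char-whitespace? c)))

(describe-char #\A)
(describe-char #\5)
(describe-char #\space)
```

Other UAX 44 properties would need analogous procedures, which is the gap this request is about.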

Currently, analysis of other properties can be added to programs using the Unicode Chars package. But given that Racket has Unicode chars and strings as primitive data types, such fundamental analysis tools really belong in the core language, not in an external dependency. Additionally, relying on an external library to extend the set of text operations risks ending up with a combined set of operations that are out of sync with respect to the version of the Unicode Standard to which they adhere, which can introduce subtle bugs and incompatibilities.

If it would be helpful, I’d be happy to develop a draft of the extended char/string API, for discussion, and help investigate how best to implement the API.
