julp edited this page Dec 12, 2011 · 19 revisions

Features

  • (almost) Unicode-aware (thanks to ICU)
    • (partial) automatic charset detection and conversion
    • regular expressions (ugrep)
    • grapheme awareness and normalization (implies the --form=[c|d] option)
  • handling of overlapping matches when several patterns match (e.g.: echo eleve | grep --color=always -e el -e le)
  • optional bzip2 and/or gzip stream support
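
To illustrate the overlapping-match case from the list above: in "eleve", the pattern "el" matches at offset 0 and "le" at offset 1, so the two highlighted regions share a character. The sketch below shows one possible way to deal with this, by merging overlapping spans before highlighting. It is an assumption made for illustration, not ugrep's actual implementation; the function names (find_spans, merge_spans, highlight) and the bracket markers are hypothetical.

```python
def find_spans(text, patterns):
    """Return (start, end) for every occurrence of every literal pattern."""
    spans = []
    for pat in patterns:
        start = text.find(pat)
        while start != -1:
            spans.append((start, start + len(pat)))
            # Resume one position later so self-overlapping hits are found too.
            start = text.find(pat, start + 1)
    return spans


def merge_spans(spans):
    """Merge spans that overlap or touch into single highlighted regions."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def highlight(text, patterns, on="[", off="]"):
    """Wrap each merged region in markers (stand-ins for color escapes)."""
    out, pos = [], 0
    for start, end in merge_spans(find_spans(text, patterns)):
        out.append(text[pos:start])
        out.append(on + text[start:end] + off)
        pos = end
    out.append(text[pos:])
    return "".join(out)


print(highlight("eleve", ["el", "le"]))  # prints "[ele]ve"
```

Merging first means each character is wrapped at most once, so the output never contains nested or interleaved markers.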

Documentation

Manual pages

Tools:

  • ugrep (grep) (spec)
  • ucat (cat) (spec)
  • ucut (cut) (spec) (status: work in progress, unstable)
  • utr (tr) (spec) (status: first draft)
  • usort (sort) (spec) (status: first implementation; only a few features are available)

Appendix:

Planned (or not):

  • uwc (wc) (graphemes/code points)
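
To show why a wc that distinguishes graphemes from code points would report different numbers, here is a minimal sketch using only the Python standard library. The grapheme count is a rough approximation (it counts every code point that is not a combining mark) and is not the grapheme cluster segmentation that ICU provides; the function name count_units is hypothetical.

```python
import unicodedata


def count_units(text):
    """Count UTF-8 bytes, code points and (approximate) graphemes."""
    return {
        "bytes": len(text.encode("utf-8")),
        "code_points": len(text),
        # Approximation: a grapheme starts at every non-mark code point.
        # Real segmentation follows the Unicode grapheme cluster rules.
        "approx_graphemes": sum(
            1 for ch in text if not unicodedata.category(ch).startswith("M")
        ),
    }


# "élève" written with combining accents (decomposed form)
word = "e\u0301le\u0300ve"
print(count_units(word))
# prints {'bytes': 9, 'code_points': 7, 'approx_graphemes': 5}
```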

For developers

Branch details:

  • master: current development branch
  • normalized_stream (pushed): abandoned; kept as a reference on how to work with the NF[C|D]_QC properties
  • i18n_test (pushed): experiments with an i18n implementation based on ResourceBundles
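
For readers unfamiliar with the normalization forms mentioned above (NFC and NFD, selected by --form=[c|d]), the sketch below shows the problem they address: the same visible text can be stored as different code point sequences, and a plain comparison then fails. It uses Python's standard unicodedata module rather than ICU, so it only illustrates the concept. unicodedata.is_normalized requires Python 3.8 or later and answers the kind of "is this already normalized?" question that the NFC_QC/NFD_QC quick-check properties are meant to speed up. The helper same_text is hypothetical.

```python
import unicodedata

composed = "\u00e9l\u00e8ve"  # "élève" with precomposed letters (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # base letters + accents


def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(composed == decomposed)            # False: different code points
print(same_text(composed, decomposed))   # True: equal once normalized
print(len(composed), len(decomposed))    # 5 7

# Check whether a string is already in a given form (Python 3.8+).
print(unicodedata.is_normalized("NFC", composed))    # True
print(unicodedata.is_normalized("NFC", decomposed))  # False
```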