Commit

Mention the problem with supplementary planes
Close #4.
mrkkrp committed Sep 10, 2016
1 parent 2e0fce0 commit 4e28554
Showing 2 changed files with 27 additions and 0 deletions.
14 changes: 14 additions & 0 deletions Data/Text/Metrics.hs
@@ -11,6 +11,20 @@
-- It works with strict 'Text' values and returns either 'Natural' numbers
-- (because the metrics cannot be negative), or @'Ratio' 'Natural'@ values
-- because returned values are rational non-negative numbers by definition.
--
-- The functions provided here are the fastest implementations available for
-- use in Haskell programs. In fact the functions are implemented in C for
-- maximal efficiency, but this leads to a minor flaw. When we work with
-- 'Text' values in C, they are represented as UTF-16 encoded strings of
-- 16-bit code units. The algorithms treat every code unit as one character,
-- which is true for almost all modern text data. However, characters
-- outside the Basic Multilingual Plane are represented by a pair of adjacent
-- code units (a surrogate pair) in UTF-16: emoji, historic scripts, less
-- commonly used Chinese ideographs, and some more. If the input 'Text'
-- contains such characters, the functions may return slightly incorrect
-- results. Decide for yourself whether this is acceptable for your use
-- case, but chances are you will never run into a situation where the
-- functions produce incorrect results.

{-# LANGUAGE CPP #-}
{-# LANGUAGE ForeignFunctionInterface #-}
13 changes: 13 additions & 0 deletions README.md
@@ -12,6 +12,19 @@
It works with strict `Text` values and returns either `Natural` numbers
(because the metrics cannot be negative), or `Ratio Natural` values because
returned values are rational non-negative numbers by definition.

The functions provided here are the fastest implementations available for
use in Haskell programs. In fact the functions are implemented in C for
maximal efficiency, but this leads to a minor flaw. When we work with `Text`
values in C, they are represented as UTF-16 encoded strings of 16-bit code
units. The algorithms treat every code unit as one character, which is true
for almost all modern text data. However, characters outside the Basic
Multilingual Plane are represented by a pair of adjacent code units (a
surrogate pair) in UTF-16: emoji, historic scripts, less commonly used
Chinese ideographs, and some more. If the input `Text` contains such
characters, the functions may return slightly incorrect results. Decide for
yourself whether this is acceptable for your use case, but chances are you
will never run into a situation where the functions produce incorrect results.
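
To make the caveat concrete, here is a minimal sketch of how a caller might
check whether a `Text` value contains characters that UTF-16 stores as two
elements (code points above U+FFFF). The names `isSupplementary`,
`utf16Length`, and `bmpOnly` are hypothetical helpers written for this
illustration; they are not part of this package.

```haskell
import Data.Char (ord)
import qualified Data.Text as T

-- | True for characters above U+FFFF, which UTF-16 encodes as two
-- 16-bit elements instead of one. (Hypothetical helper.)
isSupplementary :: Char -> Bool
isSupplementary c = ord c > 0xFFFF

-- | Number of 16-bit elements needed to encode the text in UTF-16.
-- This differs from 'T.length' exactly when the text contains
-- characters for which 'isSupplementary' holds. (Hypothetical helper.)
utf16Length :: T.Text -> Int
utf16Length = T.foldl' step 0
  where
    step n c = n + (if isSupplementary c then 2 else 1)

-- | True when every character fits into a single UTF-16 element, so
-- the caveat described above does not apply. (Hypothetical helper.)
bmpOnly :: T.Text -> Bool
bmpOnly = not . T.any isSupplementary
```

A caller that needs exact results could test both inputs with `bmpOnly`
before calling a metric and decide how to handle the rare inputs for which it
returns `False`.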

The current version of the package implements:

* [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)