
Add proposed Ukrainian wordlist (bitcoin/bips#442)
Notice:  This is hidden behind the -W flag; see 8aaa6f3.

This is not exactly the wordlist proposed in the pull request.  The file
ukrainian.txt from Bohdat/bips@152fc59 has a bug, in addition to the
usual normalization and sorting concerns:  the word at original index
1393 (1-based line number 1394) is followed by a trailing space (0x20)
and tab (0x09, '\t') before the newline '\n'.  The problem was first
identified when easyseed's extensive internal self-tests failed; I then
examined it with cmp(1) and hex dumps to diagnose the difference between
the wordlist in my source tree and the wordlist printed on stdout by
`easyseed -W -P -l uk`.
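A check of the kind that catches this bug can be sketched in C as follows.  The sketch scans a newline-separated buffer and reports the 1-based number of the first line that ends in whitespace.  The function names are hypothetical and not part of easyseed.  In the C locale, isspace() matches the same bytes as the POSIX `[[:space:]]` class used with grep(1) below.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if the line (len bytes, newline excluded) ends in
 * whitespace.  The cast keeps UTF-8 bytes >= 0x80 from reaching isspace()
 * as negative values; in the C locale they are never whitespace. */
int line_has_trailing_space(const char *line, size_t len)
{
    assert(line != NULL);
    return len > 0 && isspace((unsigned char)line[len - 1]);
}

/* Return the 1-based number of the first line with trailing whitespace,
 * or 0 if there is none.  Like grep, it excludes the '\n' from each line. */
size_t first_trailing_space_line(const char *buf)
{
    size_t lineno = 1;
    const char *start = buf;

    assert(buf != NULL);
    for (;;) {
        const char *nl = strchr(start, '\n');
        size_t len = nl ? (size_t)(nl - start) : strlen(start);

        if (line_has_trailing_space(start, len))
            return lineno;
        if (nl == NULL)
            return 0;
        start = nl + 1;
        lineno++;
    }
}
```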

The following output is edited to fit line-length limits in the git log,
but it adequately shows the problem:

$ grep -E '[[:space:]]$' ukrainian.txt | hd
00000000  d0 bf d1 96 d1 81 d0 bd  d1 8f 20 09 0a
$ grep -En '[[:space:]]$' ukrainian.txt
1394:пісня 	<*end of line is here*>
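Reading the dump as UTF-8 makes the problem visible.  The 13 bytes decode to eight code points: U+043F U+0456 U+0441 U+043D U+044F, which spell пісня, followed by U+0020 (space), U+0009 (tab) and U+000A (newline).  The following is a simplified decoder sketch with hypothetical names.  It does not reject overlong forms or malformed continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 on an invalid lead byte
 * or a truncated sequence.  Continuation bytes are not validated. */
size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && n >= 2) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && n >= 3) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && n >= 4) {
        *cp = ((unsigned long)(s[0] & 0x07) << 18) |
              ((unsigned long)(s[1] & 0x3F) << 12) |
              ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Decode up to max code points from buf[0..n); returns how many were
 * decoded, stopping early at the first invalid sequence. */
size_t utf8_decode_all(const unsigned char *buf, size_t n,
                       unsigned long *out, size_t max)
{
    size_t i = 0, count = 0;

    while (i < n && count < max) {
        size_t k = utf8_decode(buf + i, n - i, &out[count]);
        if (k == 0)
            break;
        i += k;
        count++;
    }
    return count;
}
```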

It is fixed with the following command:

$ sed -E -e 's/[[:space:]]+$//' < ukrainian.txt > ukfix1/uk_fixed0.txt
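The sed(1) substitution deletes any run of whitespace at the end of each line.  The following is a minimal C equivalent with a hypothetical function name.  It trims a single NUL-terminated line in place.  Because sed sees each line without its newline, the sketch assumes the line contains none.  On line 1394 it removes exactly two bytes, the space and the tab, which matches the two-byte size difference in the `ls -l` output.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Strip trailing whitespace from a NUL-terminated line in place, like
 * sed 's/[[:space:]]+$//'.  Returns the number of bytes removed. */
size_t rstrip_ws(char *line)
{
    size_t len, orig;

    assert(line != NULL);
    len = orig = strlen(line);
    /* Cast to unsigned char so UTF-8 bytes >= 0x80 are valid isspace()
     * arguments; in the C locale they are not whitespace. */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;
    line[len] = '\0';
    return orig - len;
}
```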

After verifying that this command made no other changes, I normalized
and sorted the result:

$ ls -l ukrainian.txt ukfix1/uk_fixed0.txt
-rw-r--r-- 1 user user 24550 Jan  7 21:26 ukfix1/uk_fixed0.txt
-rw-r--r-- 1 user user 24552 Jan  7 20:31 ukrainian.txt
$ diff -u3 ukrainian.txt ukfix1/uk_fixed0.txt
[...showing only the desired line changed...]
$ uconv -f utf-8 -t utf-8 -x '::nfkd;' < uk_fixed0.txt | \
	LC_ALL=C LANG=C sort -s > uk_fixed1.txt
$ mv -i uk_fixed1.txt ../../easyseed/wordlist/ukrainian.txt
mv: overwrite '../../easyseed/wordlist/ukrainian.txt'? y
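`LC_ALL=C sort` orders lines by unsigned byte value.  C's strcmp() compares strings the same way, so qsort() with a strcmp() comparator produces the same order for distinct lines.  The following sketch uses hypothetical names and covers only the sorting step.  It assumes the input is already NFKD-normalized, which the commit does with uconv(1).  Byte order shows why the locale matters here: і (U+0456, UTF-8 d1 96) sorts after я (U+044F, UTF-8 d1 8f), but the Ukrainian alphabet places і before я.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of string pointers.  strcmp() compares
 * bytes as unsigned char, which matches the C/POSIX locale collation. */
static int cmp_bytes(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort words in byte order (sketch: NFKD normalization is not done here). */
void sort_c_locale(const char **words, size_t n)
{
    assert(words != NULL || n == 0);
    qsort(words, n, sizeof *words, cmp_bytes);
}

/* Return 1 if words[0..n) is strictly ascending in byte order. */
int is_c_sorted(const char *const *words, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (strcmp(words[i - 1], words[i]) >= 0)
            return 0;
    return 1;
}
```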

(Note regarding 234c66c:  When normalizing and sorting the russian.txt
list, I forgot to force the C locale for `sort(1)`.  I verified that this
makes no difference there, and the 234c66c russian.txt is correct.  It
*does* make a very large difference for the Ukrainian wordlist.)

SHA-256 hash for the resulting ukrainian.txt:
612ee29e1fa13dc38c9e1b31c7ef980db8f3c8dd30f1c9377170d1b10e895dc9
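The published digest can be checked with a standard tool such as `sha256sum ukrainian.txt`.  For a dependency-free check from C, the following is a minimal one-shot SHA-256 sketch based on FIPS 180-4.  It hashes an in-memory buffer, so the caller must read the file into memory.  The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4, section 4.2.2). */
static const uint32_t sha256_k[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

#define ROTR32(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Process one 64-byte block, updating the eight-word state h. */
static void sha256_block(uint32_t h[8], const unsigned char *p)
{
    uint32_t w[64], a, b, c, d, e, f, g, hh;
    int i;

    for (i = 0; i < 16; i++)  /* message words are big-endian */
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (i = 16; i < 64; i++) {
        uint32_t s0 = ROTR32(w[i - 15], 7) ^ ROTR32(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = ROTR32(w[i - 2], 17) ^ ROTR32(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    a = h[0]; b = h[1]; c = h[2]; d = h[3];
    e = h[4]; f = h[5]; g = h[6]; hh = h[7];
    for (i = 0; i < 64; i++) {
        uint32_t t1 = hh + (ROTR32(e, 6) ^ ROTR32(e, 11) ^ ROTR32(e, 25)) +
                      ((e & f) ^ (~e & g)) + sha256_k[i] + w[i];
        uint32_t t2 = (ROTR32(a, 2) ^ ROTR32(a, 13) ^ ROTR32(a, 22)) +
                      ((a & b) ^ (a & c) ^ (b & c));
        hh = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

/* Hash len bytes of msg into the 32-byte digest out. */
void sha256(const unsigned char *msg, size_t len, unsigned char out[32])
{
    uint32_t h[8] = {
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
    };
    unsigned char tail[128] = {0};
    size_t rest = len % 64, full = len - rest, tlen, i;
    uint64_t bits = (uint64_t)len * 8;

    assert(msg != NULL && out != NULL);
    for (i = 0; i < full; i += 64)
        sha256_block(h, msg + i);
    /* Padding: 0x80, zeros, then the bit length as a 64-bit big-endian value. */
    memcpy(tail, msg + full, rest);
    tail[rest] = 0x80;
    tlen = rest < 56 ? 64 : 128;
    for (i = 0; i < 8; i++)
        tail[tlen - 1 - i] = (unsigned char)(bits >> (8 * i));
    for (i = 0; i < tlen; i += 64)
        sha256_block(h, tail + i);
    for (i = 0; i < 8; i++) {
        out[4 * i]     = (unsigned char)(h[i] >> 24);
        out[4 * i + 1] = (unsigned char)(h[i] >> 16);
        out[4 * i + 2] = (unsigned char)(h[i] >> 8);
        out[4 * i + 3] = (unsigned char)h[i];
    }
}

/* Write the digest as 64 lowercase hex digits plus a NUL into hex. */
void sha256_hex(const unsigned char *msg, size_t len, char hex[65])
{
    unsigned char d[32];
    int i;

    sha256(msg, len, d);
    for (i = 0; i < 32; i++)
        sprintf(hex + 2 * i, "%02x", d[i]);
    hex[64] = '\0';
}
```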
nym-zone committed Jan 7, 2018
1 parent 234c66c commit 08a05b4
Showing 3 changed files with 2,052 additions and 2 deletions.
3 changes: 2 additions & 1 deletion Makefile.inc
@@ -12,7 +12,8 @@ WORDLISTS= chinese_simplified \
 japanese \
 korean \
 russian \
-spanish
+spanish \
+ukrainian

 VECTORSRC= mkvectors.sh \
 vectors.h \
3 changes: 2 additions & 1 deletion easyseed.c
@@ -117,7 +117,8 @@ static const struct wordlist wordlists[] =
 LANG(japanese, u8"日本語", "ja", u8"\u3000", 1),
 LANG(korean, u8"한국어", "ko", ascii_space, 1),
 LANG(russian, u8"Русский", "ru", ascii_space, 0),
-LANG(spanish, u8"Español", "es", ascii_space, 1)
+LANG(spanish, u8"Español", "es", ascii_space, 1),
+LANG(ukrainian, u8"Українська", "uk", ascii_space, 0),
 };

 #undef LANG
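Each LANG() entry above pairs a wordlist with a native language name, a language code (apparently the one passed to `-l`, as in `easyseed -W -P -l uk`), a word separator, and a final 0/1 argument.  This diff does not include the real definitions of LANG() and struct wordlist.  The following is therefore a hypothetical, simplified stand-in that shows how such a table can be searched by language code.  It omits the separator and treats the last argument as opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for easyseed's struct wordlist.  The real field
 * names, and the meaning of LANG()'s last argument, are not shown here. */
struct wl_entry {
    const char *name;   /* wordlist stem, e.g. "ukrainian" */
    const char *native; /* language name in that language (UTF-8) */
    const char *code;   /* language code, e.g. "uk" */
    int flag;           /* last LANG() argument, kept opaque */
};

/* Hypothetical macro: stringize the first argument to get the name. */
#define WL_ENTRY(n, nat, cd, fl) { #n, nat, cd, fl }

static const struct wl_entry wl_table[] = {
    WL_ENTRY(russian,   "Русский",    "ru", 0),
    WL_ENTRY(spanish,   "Español",    "es", 1),
    WL_ENTRY(ukrainian, "Українська", "uk", 0),
};

/* Linear search by language code; returns NULL if the code is unknown. */
const struct wl_entry *wl_find_by_code(const char *code)
{
    assert(code != NULL);
    for (size_t i = 0; i < sizeof wl_table / sizeof wl_table[0]; i++)
        if (strcmp(wl_table[i].code, code) == 0)
            return &wl_table[i];
    return NULL;
}
```

A linear scan is adequate for a table this small, and it keeps adding a language to a one-line change, as this commit does.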