#   An Optimized Deseret Keyboard

A Deseret alphabet keyboard can be designed to follow principles of efficient typing by examining the frequency of letters and digraphs.

The primary corpus for Deseret is the text of _The Book of Mormon_, _𐐜 𐐒𐐳𐐿 𐐱𐑂 𐐣𐐫𐑉𐑋𐐲𐑌_.  Professor Ryan Shosted of the University of Illinois has manually transcribed the 1869 typesetting by Orson Pratt.

There may be small differences in English pronunciation, particularly vowel usage, between the 1869 standard pronunciation and the 2023 “standard” American dialect.  In particular, Pratt hewed to the 1864 Webster's Dictionary for pronunciation.

By taking into account grapheme and digraph frequency, we can intelligently design a keyboard that is easy to use for a “native” Deseret typist.  We follow [the design principles laid down by August Dvorak](https://en.wikipedia.org/wiki/Dvorak_keyboard_layout#Research_on_efficiency), and feel no need to adhere to the `QWERTY` layout since it is already an ill match for the Deseret alphabet.

##  Letter Count

The Deseret alphabet is presented in 38-letter and 40-letter variations.  The latter differs by the addition of two new diphthong letters, `𐐦` Oi /ɔɪ/ and `𐐧` Ew /juː/.  Since a standard Roman keyboard provides only 26 letter keys, some pruning is necessary to fit Deseret onto it.

With six short vowels and six long vowels, we can reduce the 40-character alphabet by six if we support long-press for the long vowels.  (These are not all of the vowels, but all of the short/long pairs.)

| Short Vowel | Long Vowel |
| ----------- | ---------- |
| `𐐆` Short I /ɪ/ | `𐐀` Long I /iː/ |
| `𐐇` Short E /ɛ/ | `𐐁` Long E /eɪ/ |
| `𐐈` Short A /æ/ | `𐐂` Long A /ɑː/ |
| `𐐉` Short Ah /ɒ/ | `𐐃` Long Ah /ɔː/ |
| `𐐊` Short O /ʌ/ | `𐐄` Long O /oʊ/ |
| `𐐋` Short Oo /ʊ/ | `𐐅` Long Oo /uː/ |
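
The pairing in this table lines up with the Unicode code chart: the six long vowels occupy U+10400 through U+10405 and the six short vowels U+10406 through U+1040B, in the same order.  A long-press handler could therefore derive the long vowel arithmetically.  The following is only a minimal sketch of that idea; the name `long_press` is hypothetical and not part of any keyboard framework.

```python
# Hypothetical helper; the name `long_press` is an assumption for illustration.
SHORT_VOWELS = ''.join(chr(cp) for cp in range(0x10406, 0x1040C))  # 𐐆 𐐇 𐐈 𐐉 𐐊 𐐋
LONG_VOWELS = ''.join(chr(cp) for cp in range(0x10400, 0x10406))   # 𐐀 𐐁 𐐂 𐐃 𐐄 𐐅

def long_press(key):
    """Return the long vowel for a short-vowel key; any other key is returned unchanged."""
    if len(key) == 1 and key in SHORT_VOWELS:
        # Each long vowel sits six code points below its short partner.
        return chr(ord(key) - 6)
    return key

for short in SHORT_VOWELS:
    print(short, '->', long_press(short))
```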

We can also automatically generate certain digraphs; leaning on English-language conventions, we can reduce the 40-character alphabet by an additional eight characters.

| Digraph | Reduction |
| ------- | ------ |
| `𐐤𐐘` NG /ŋ/ | `𐐥` Eng /ŋ/ |
| `𐐝𐐐` SH /ʃ/ | `𐐟` Esh /ʃ/ |
| `𐐗𐐐` CH /tʃ/ | `𐐕` Chee /tʃ/ |
| `𐐓𐐐` TH /θ/ | `𐐛` Eth /θ/ |
| `𐐔𐐐` DH /ð/ | `𐐜` Thee /ð/ |
| `𐐞𐐐` ZH /ʒ/ | `𐐠` Zhee /ʒ/ |
| `𐐉𐐆` OI /ɔɪ/ | `𐐦` Oi /ɔɪ/ |
| `𐐆𐐋` IU /juː/ | `𐐧` Ew /juː/ |
| `𐐏𐐋` YU /juː/ | `𐐧` Ew /juː/ |

(Note that the last two reductions are typed with the short vowel `𐐋` but correspond to sequences with the long vowel `𐐅` in Deseret text.)
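
One way to picture the automatic generation is a greedy left-to-right pass over the typed keys that replaces each listed pair with its single letter.  This is a sketch under that assumption, not a description of any particular input method; `REDUCTIONS` and `compose` are hypothetical names, and the pairs are transcribed from the table above.

```python
# Typed key pair -> generated letter, transcribed from the table above.
REDUCTIONS = {
    '𐐤𐐘': '𐐥',  # NG -> Eng
    '𐐝𐐐': '𐐟',  # SH -> Esh
    '𐐗𐐐': '𐐕',  # CH -> Chee
    '𐐓𐐐': '𐐛',  # TH -> Eth
    '𐐔𐐐': '𐐜',  # DH -> Thee
    '𐐞𐐐': '𐐠',  # ZH -> Zhee
    '𐐉𐐆': '𐐦',  # OI -> Oi
    '𐐆𐐋': '𐐧',  # IU -> Ew
    '𐐏𐐋': '𐐧',  # YU -> Ew
}

def compose(keys):
    """Greedily replace each listed key pair, scanning left to right."""
    out = []
    i = 0
    while i < len(keys):
        pair = keys[i:i + 2]
        if pair in REDUCTIONS:
            out.append(REDUCTIONS[pair])
            i += 2
        else:
            out.append(keys[i])
            i += 1
    return ''.join(out)

# Typing T, H, short I, N, G yields Eth, short I, Eng.
print(compose('𐐓𐐐𐐆𐐤𐐘'))
```

A real input method would also need some way to type the two letters of a pair separately when they genuinely occur side by side; that escape mechanism is left out of this sketch.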

##  Letter Layout

The naïve approach taken by early fonts was to simply map Deseret letters to their “closest” Roman analogue, such as `𐐃`→`O` or `𐐓`→`T`.  This works reasonably well for a conventional `QWERTY` keyboard, but still requires many exceptions to be memorized.

A more sophisticated approach, and the one which we employ here, is to use the known corpus frequency data and some typing design principles to produce a reasonably comfortable keyboard, one that a native Deseret typist might plausibly design and enjoy.

To this end, we first obtain grapheme and digraph frequency from _𐐜 𐐒𐐳𐐿 𐐱𐑂 𐐣𐐫𐑉𐑋𐐲𐑌_.

In [68]:
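# Open The Book of Mormon corpus, through p. 153 of the 1869 edition.
# The file is assumed to be UTF-8 encoded; Deseret letters lie outside the Basic Multilingual Plane.
with open('DES-BOM-DES.txt', 'r', encoding='utf-8') as workfile:
    data = workfile.readlines()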

In [69]:
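# Count the graphemes and digraphs in the text.  Omit punctuation, whitespace, and Roman characters.
graph = {}
digraph = {}

def is_deseret(char):
    # True for code points in the Deseret block (U+10400 through U+1044F).
    return '𐐀' <= char <= '𐑏'

for row in data:
    # Fold to uppercase so each letter is counted once regardless of case.
    row = row.upper()
    for idx, char1 in enumerate(row):
        if not is_deseret(char1):
            continue
        graph[char1] = graph.get(char1, 0) + 1
        # Slicing returns '' at the end of the row, so no exception handling is needed.
        char2 = row[idx+1:idx+2]
        # A digraph is counted only when the next character is also a Deseret letter.
        if char2 and is_deseret(char2):
            char12 = char1 + char2
            digraph[char12] = digraph.get(char12, 0) + 1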

In [70]:
# Sanity check the results against a direct count over the whole text.
text = ''.join(data).upper()
for letter in '𐐠𐐄𐐣𐐓𐐤':
    assert text.count(letter) == graph[letter]

Pratt did not utilize `𐐦` and `𐐧`, but we can infer their frequency from the appropriate digraphs.  (Since we will have the keyboard generate these, they are not strictly necessary for our analysis.)

In [71]:
graph['𐐦'] = digraph['𐐱𐐮'.upper()]
print(graph['𐐦'])

351


In [72]:
graph['𐐧'] = digraph['𐐷𐐭'.upper()] + digraph['𐐮𐐭'.upper()]
print(graph['𐐧'])

1172


In [73]:
# Produce a list of graphemes sorted by ascending frequency.
import operator
graphlist = sorted(graph.items(), key=operator.itemgetter(1))

In [74]:
graphlist

[('𐐠', 45),
 ('𐐦', 351),
 ('𐐧', 1172),
 ('𐐖', 1345),
 ('𐐋', 1476),
 ('𐐕', 1898),
 ('𐐍', 1967),
 ('𐐏', 2021),
 ('𐐛', 2362),
 ('𐐘', 2667),
 ('𐐥', 2826),
 ('𐐟', 3075),
 ('𐐂', 3538),
 ('𐐃', 4742),
 ('𐐄', 4894),
 ('𐐑', 5942),
 ('𐐅', 6010),
 ('𐐙', 6025),
 ('𐐒', 6076),
 ('𐐎', 6154),
 ('𐐌', 6559),
 ('𐐗', 6837),
 ('𐐞', 7378),
 ('𐐚', 7995),
 ('𐐐', 8246),
 ('𐐁', 8400),
 ('𐐉', 8729),
 ('𐐀', 8904),
 ('𐐣', 10008),
 ('𐐊', 11388),
 ('𐐝', 12084),
 ('𐐢', 12145),
 ('𐐇', 13224),
 ('𐐜', 15963),
 ('𐐈', 16783),
 ('𐐆', 17016),
 ('𐐓', 18405),
 ('𐐔', 19004),
 ('𐐡', 20424),
 ('𐐤', 23737)]

In [75]:
# Produce a list by digraph frequency.
digraphlist = sorted(digraph.items(), key=operator.itemgetter(1))

In [76]:
digraphlist[-40:]

[('𐐙𐐃', 1161),
 ('𐐐𐐀', 1166),
 ('𐐆𐐢', 1187),
 ('𐐓𐐇', 1208),
 ('𐐃𐐢', 1217),
 ('𐐄𐐡', 1231),
 ('𐐎𐐊', 1235),
 ('𐐣𐐌', 1263),
 ('𐐇𐐣', 1302),
 ('𐐡𐐇', 1321),
 ('𐐟𐐈', 1321),
 ('𐐐𐐆', 1348),
 ('𐐢𐐔', 1404),
 ('𐐐𐐎', 1483),
 ('𐐒𐐀', 1510),
 ('𐐈𐐢', 1567),
 ('𐐗𐐊', 1589),
 ('𐐆𐐞', 1673),
 ('𐐆𐐓', 1733),
 ('𐐜𐐇', 1743),
 ('𐐁𐐡', 1831),
 ('𐐐𐐈', 1920),
 ('𐐡𐐔', 2221),
 ('𐐎𐐆', 2290),
 ('𐐆𐐥', 2314),
 ('𐐝𐐓', 2342),
 ('𐐜𐐈', 2409),
 ('𐐃𐐡', 2481),
 ('𐐜𐐁', 2526),
 ('𐐆𐐤', 2644),
 ('𐐈𐐓', 2660),
 ('𐐇𐐤', 2709),
 ('𐐤𐐓', 2831),
 ('𐐊𐐡', 2902),
 ('𐐇𐐡', 3174),
 ('𐐓𐐅', 3565),
 ('𐐊𐐤', 3817),
 ('𐐉𐐚', 4487),
 ('𐐈𐐤', 7476),
 ('𐐤𐐔', 7563)]

August Dvorak laid down some intelligent principles for a keyboard (regardless of what one thinks of the actual Dvorak layout).

- “Letters should be typed by alternating between hands (which makes typing more rhythmic, increases speed, reduces error, and reduces fatigue). On a Dvorak keyboard, vowels and the most used symbol characters are on the left (with the vowels on the home row), while the most used consonants are on the right.
- “For maximum speed and efficiency, the most common letters and bigrams should be typed on the home row, where the fingers rest, and under the strongest fingers (Thus, about 70% of letter keyboard strokes on Dvorak are done on the home row and only 22% and 8% on the top and bottom rows respectively).
- “The least common letters should be on the bottom row which is the hardest row to reach.
- “The right hand should do more of the typing because most people are right-handed.
- “Digraphs should not be typed with adjacent fingers.
- “Stroking should generally move from the edges of the board to the middle.”

We utilize the letter and digraph frequency data to tune a keyboard layout towards these features.  However, we will not follow Dvorak in moving the punctuation keys, which tends to frustrate typists.
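
Two of these principles, home-row usage and hand alternation, are easy to quantify once a candidate layout exists.  The sketch below shows one way to do so.  It assumes a layout is given as a list of row strings, that the first five columns of every row belong to the left hand, and that frequency dictionaries shaped like `graph` and `digraph` above are available.  The function names are hypothetical, and the demonstration uses toy Roman data rather than corpus counts.

```python
def hand_of(layout, split=5):
    """Map each letter to 'L' or 'R'; columns before `split` count as left hand (an assumption)."""
    return {ch: ('L' if col < split else 'R')
            for row in layout for col, ch in enumerate(row)}

def home_row_share(layout, graph, home=1):
    """Fraction of counted keystrokes that fall on the row at index `home`."""
    on_board = set(''.join(layout))
    total = sum(n for ch, n in graph.items() if ch in on_board)
    on_home = sum(n for ch, n in graph.items() if ch in layout[home])
    return on_home / total if total else 0.0

def alternation_rate(layout, digraph):
    """Fraction of counted digraphs whose two letters are typed by different hands."""
    hands = hand_of(layout)
    total = switched = 0
    for pair, n in digraph.items():
        first, second = pair[0], pair[1]
        if first in hands and second in hands:
            total += n
            if hands[first] != hands[second]:
                switched += n
    return switched / total if total else 0.0

# Toy data on a Roman layout, purely to show the call shape.
toy_layout = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm']
toy_graph = {'a': 6, 'e': 3, 'n': 1}
toy_digraph = {'an': 4, 'as': 2, 'ne': 2}
print(home_row_share(toy_layout, toy_graph))
print(alternation_rate(toy_layout, toy_digraph))
```

Passing the corpus-derived `graph` and `digraph` together with a list of Deseret rows would give comparable figures for any candidate arrangement.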

### Alternating Hands

Dvorak preferred to situate vowels at left and consonants at right.  This yields two buckets for us:

Vowels:

```
𐐀𐐁𐐂𐐃𐐄𐐅𐐆𐐇𐐈𐐉𐐊𐐋𐐌𐐍𐐦𐐧
```

Consonants:

```
𐐎𐐏𐐐𐐑𐐒𐐓𐐔𐐕𐐖𐐗𐐘𐐙𐐚𐐛𐐜𐐝𐐞𐐟𐐠𐐡𐐢𐐣𐐤𐐥
```

Applying the long-press and digraph reductions described under Letter Count:

Vowels:

```
𐐆𐐇𐐈𐐉𐐊𐐋𐐌𐐍
```

Consonants:

```
𐐎𐐏𐐐𐐑𐐒𐐓𐐔𐐕𐐖𐐗𐐘𐐙𐐚𐐝𐐞𐐡𐐢𐐤
```

### Digraphs on Home Row

The 100 most common digraphs include:

`𐐤𐐔`, `𐐈𐐤`, `𐐉𐐚`, `𐐊𐐤`, `𐐓𐐅`, `𐐇𐐡`, `𐐊𐐡`, `𐐤𐐓`, `𐐇𐐤`, `𐐈𐐓`, `𐐆𐐤`, `𐐜𐐁`, `𐐃𐐡`, `𐐜𐐈`, `𐐝𐐓`, `𐐆𐐥`, `𐐎𐐆`, `𐐡𐐔`, `𐐐𐐈`, `𐐁𐐡`, `𐐜𐐇`, `𐐆𐐓`, `𐐆𐐞`, `𐐗𐐊`, `𐐈𐐢`, `𐐒𐐀`, `𐐐𐐎`, `𐐢𐐔`, `𐐐𐐆`, `𐐟𐐈`, `𐐡𐐇`, `𐐇𐐣`, `𐐣𐐌`, `𐐎𐐊`, `𐐄𐐡`, `𐐃𐐢`, `𐐓𐐇`, `𐐆𐐢`, `𐐐𐐀`, `𐐙𐐃`, `𐐡𐐆`, `𐐂𐐡`, `𐐇𐐔`, `𐐔𐐆`, `𐐡𐐀`, `𐐊𐐣`, `𐐇𐐝`, `𐐝𐐇`, `𐐣𐐇`, `𐐢𐐆`, `𐐓𐐝`, `𐐡𐐉`, `𐐇𐐢`, `𐐄𐐢`, `𐐣𐐊`, `𐐊𐐝`, `𐐢𐐃`, `𐐑𐐢`, `𐐁𐐣`, `𐐉𐐓`, `𐐈𐐚`, `𐐆𐐕`, `𐐙𐐄`, `𐐓𐐆`, `𐐔𐐇`, `𐐤𐐆`, `𐐋𐐔`, `𐐤𐐉`, `𐐑𐐡`, `𐐤𐐝`, `𐐢𐐁`, `𐐤𐐇`, `𐐀𐐑`, `𐐐𐐄`, `𐐑𐐀`, `𐐆𐐝`, `𐐊𐐑`, `𐐝𐐑`, `𐐂𐐝`, `𐐉𐐔`, `𐐎𐐁`, `𐐚𐐇`, `𐐓𐐡`, `𐐝𐐄`, `𐐏𐐅`, `𐐝𐐊`, `𐐘𐐉`, `𐐢𐐇`, `𐐗𐐃`, `𐐒𐐡`, `𐐆𐐣`, `𐐔𐐞`, `𐐆𐐔`, `𐐟𐐊`, `𐐆𐐜`, `𐐌𐐓`, `𐐀𐐐`, `𐐀𐐗`, `𐐣𐐈`, `𐐝𐐀`

In [88]:
for d,c in digraphlist[::-1][:100]:
    print(f'`{d}`, ', end='')

`𐐤𐐔`, `𐐈𐐤`, `𐐉𐐚`, `𐐊𐐤`, `𐐓𐐅`, `𐐇𐐡`, `𐐊𐐡`, `𐐤𐐓`, `𐐇𐐤`, `𐐈𐐓`, `𐐆𐐤`, `𐐜𐐁`, `𐐃𐐡`, `𐐜𐐈`, `𐐝𐐓`, `𐐆𐐥`, `𐐎𐐆`, `𐐡𐐔`, `𐐐𐐈`, `𐐁𐐡`, `𐐜𐐇`, `𐐆𐐓`, `𐐆𐐞`, `𐐗𐐊`, `𐐈𐐢`, `𐐒𐐀`, `𐐐𐐎`, `𐐢𐐔`, `𐐐𐐆`, `𐐟𐐈`, `𐐡𐐇`, `𐐇𐐣`, `𐐣𐐌`, `𐐎𐐊`, `𐐄𐐡`, `𐐃𐐢`, `𐐓𐐇`, `𐐆𐐢`, `𐐐𐐀`, `𐐙𐐃`, `𐐡𐐆`, `𐐂𐐡`, `𐐇𐐔`, `𐐔𐐆`, `𐐡𐐀`, `𐐊𐐣`, `𐐇𐐝`, `𐐝𐐇`, `𐐣𐐇`, `𐐢𐐆`, `𐐓𐐝`, `𐐡𐐉`, `𐐇𐐢`, `𐐄𐐢`, `𐐣𐐊`, `𐐊𐐝`, `𐐢𐐃`, `𐐑𐐢`, `𐐁𐐣`, `𐐉𐐓`, `𐐈𐐚`, `𐐆𐐕`, `𐐙𐐄`, `𐐓𐐆`, `𐐔𐐇`, `𐐤𐐆`, `𐐋𐐔`, `𐐤𐐉`, `𐐑𐐡`, `𐐤𐐝`, `𐐢𐐁`, `𐐤𐐇`, `𐐀𐐑`, `𐐐𐐄`, `𐐑𐐀`, `𐐆𐐝`, `𐐊𐐑`, `𐐝𐐑`, `𐐂𐐝`, `𐐉𐐔`, `𐐎𐐁`, `𐐚𐐇`, `𐐓𐐡`, `𐐝𐐄`, `𐐏𐐅`, `𐐝𐐊`, `𐐘𐐉`, `𐐢𐐇`, `𐐗𐐃`, `𐐒𐐡`, `𐐆𐐣`, `𐐔𐐞`, `𐐆𐐔`, `𐐟𐐊`, `𐐆𐐜`, `𐐌𐐓`, `𐐀𐐐`, `𐐀𐐗`, `𐐣𐐈`, `𐐝𐐀`, 

### Least Common Letters on Bottom Row

The eighteen least common letters are:

`𐐠`, `𐐦`, `𐐧`, `𐐖`, `𐐋`, `𐐕`, `𐐍`, `𐐏`, `𐐛`, `𐐘`, `𐐥`, `𐐟`, `𐐂`, `𐐃`, `𐐄`, `𐐑`, `𐐅`, `𐐙`

Of these, `𐐦`, `𐐧`, `𐐕`, `𐐛`, `𐐥`, and `𐐟` are generated from digraphs and can be excluded, as can the long vowels `𐐂`, `𐐃`, `𐐄`, and `𐐅`.  Holding `𐐘` back for the top row, this leaves us a seven-character bottom row:

`𐐠`, `𐐖`, `𐐋`, `𐐍`, `𐐏`, `𐐑`, `𐐙`

In [101]:
for d,c in graphlist[:18]:
    print(f'`{d}`, ', end='')

`𐐠`, `𐐦`, `𐐧`, `𐐖`, `𐐋`, `𐐕`, `𐐍`, `𐐏`, `𐐛`, `𐐘`, `𐐥`, `𐐟`, `𐐂`, `𐐃`, `𐐄`, `𐐑`, `𐐅`, `𐐙`, 
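
The exclusion step can be reproduced directly from the counts.  The sketch below copies the eighteen lowest entries from the sorted `graphlist` output above and filters out the letters the text excludes; the variable names are illustrative only.  Note that the filter also leaves `𐐘`, which the layout places on the top row rather than the bottom row.

```python
# The eighteen lowest counts, copied from the sorted `graphlist` output above.
least_common = [
    ('𐐠', 45), ('𐐦', 351), ('𐐧', 1172), ('𐐖', 1345), ('𐐋', 1476), ('𐐕', 1898),
    ('𐐍', 1967), ('𐐏', 2021), ('𐐛', 2362), ('𐐘', 2667), ('𐐥', 2826), ('𐐟', 3075),
    ('𐐂', 3538), ('𐐃', 4742), ('𐐄', 4894), ('𐐑', 5942), ('𐐅', 6010), ('𐐙', 6025),
]
generated = set('𐐦𐐧𐐕𐐛𐐥𐐟')   # letters the text excludes as digraph-generated
long_vowels = set('𐐂𐐃𐐄𐐅')     # long vowels reached by long-press

# Keep the remaining letters in ascending order of frequency.
candidates = [ch for ch, _ in least_common
              if ch not in generated and ch not in long_vowels]
print(candidates)
```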

### Right Hand Preferred

No action.

### Digraphs Not Adjacent

Since we know the common digraphs, we can separate the letters by applying a weight against the pairs.  While not all digraph/key pairs may be avoidable, it should be possible to avoid all common instances.
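
A weight of this kind might be as simple as summing the counts of every digraph whose two letters land on neighbouring keys.  The sketch below makes a deliberate simplification: it treats any two neighbouring keys in one row as adjacent fingers and ignores both the hand split and the two columns shared by each index finger.  The function name and the toy data are hypothetical.

```python
def adjacency_penalty(row, digraph):
    """Sum the counts of digraphs whose two letters sit on neighbouring keys of `row`."""
    position = {ch: i for i, ch in enumerate(row)}
    penalty = 0
    for pair, count in digraph.items():
        first, second = pair[0], pair[1]
        if first in position and second in position:
            if abs(position[first] - position[second]) == 1:
                penalty += count
    return penalty

# Toy Roman data: 'as' and 'ds' are neighbours on this row, 'af' is not.
toy_row = 'asdfg'
toy_digraph = {'as': 5, 'af': 2, 'ds': 1}
print(adjacency_penalty(toy_row, toy_digraph))
```

Comparing this penalty across candidate orderings of a row gives a rough way to prefer arrangements that keep common pairs apart.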

### Inboard Stroke Flow

Common key pairs and word structure should move from the edges of the keyboard towards the center, generally speaking.

It seems to be of similar importance that long-press vowels not sit on weaker fingers.
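
The inward-flow principle can also be estimated from digraph counts.  The sketch below considers only digraphs typed by one hand on a single row and reports the share whose second key lies closer to the centre than the first.  The five-column hand split and the name `inward_share` are assumptions made for illustration.

```python
def inward_share(row, digraph, split=5):
    """Among same-hand digraphs on `row`, the fraction that move toward the centre."""
    position = {ch: i for i, ch in enumerate(row)}

    def distance_from_centre(i):
        # Left-hand columns approach the centre at split - 1, right-hand columns at split.
        return (split - 1 - i) if i < split else (i - split)

    inward = total = 0
    for pair, count in digraph.items():
        first, second = pair[0], pair[1]
        if first not in position or second not in position or first == second:
            continue
        i, j = position[first], position[second]
        if (i < split) != (j < split):
            continue  # typed by different hands, so not a roll
        total += count
        if distance_from_centre(j) < distance_from_centre(i):
            inward += count
    return inward / total if total else 0.0

# Toy Roman data: 'as' and 'lk' roll inward, 'sa' rolls outward, 'aj' switches hands.
toy_row = 'asdfghjkl;'
toy_digraph = {'as': 3, 'sa': 1, 'lk': 2, 'aj': 5}
print(inward_share(toy_row, toy_digraph))
```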

### Bottom Row

We can situate the letters on the keyboard following the above principles.

The easiest is the bottom row.  Per the above priorities, we suggest:

`𐐖`, `𐐍`, `𐐋`, `𐐏`, `𐐑`, `𐐙`, `𐐠`

The least common letters are located at the lower left, which is hardest for many typists to reach.  The long-press vowel and semi-vowel are at center.  Consonants are at right.

### Home Row

Home row is favored for common letters and common digraphs.  At this point, our field of possible keys includes:

Vowels:

```
𐐆𐐇𐐈𐐉𐐊𐐌
```

Consonants:

```
𐐎𐐐𐐒𐐓𐐔𐐕𐐗𐐘𐐚𐐝𐐞𐐡𐐢𐐤
```

Common digraphs include:

`𐐤𐐔`, `𐐈𐐤`, `𐐉𐐚`, `𐐊𐐤`, `𐐓𐐅`, `𐐇𐐡`, `𐐊𐐡`, `𐐤𐐓`, `𐐇𐐤`, `𐐈𐐓`, `𐐆𐐤`, `𐐜𐐁`, `𐐃𐐡`, `𐐜𐐈`, `𐐝𐐓`, `𐐆𐐥`, `𐐎𐐆`, `𐐡𐐔`, `𐐐𐐈`, `𐐁𐐡`, `𐐜𐐇`, `𐐆𐐓`, `𐐆𐐞`, `𐐗𐐊`, `𐐈𐐢`, `𐐒𐐀`, `𐐐𐐎`, `𐐢𐐔`, `𐐐𐐆`

If we prefer vowels at left and long-press vowels towards center, and take into account digraph data for consonants, we arrive at the subset:

```
𐐆𐐇𐐈𐐉𐐊𐐔𐐤𐐓𐐡
```

We situate these as:

`𐐓`, `𐐡`, `𐐆`, `𐐇`, `𐐈`, `𐐉`, `𐐊`, `𐐔`, `𐐤`

### Top Row

The remaining keys are:

Vowels:

```
𐐌
```

Consonants:

```
𐐎𐐐𐐒𐐕𐐗𐐘𐐚𐐝𐐞𐐢
```

These are mostly consonants, so we weigh the digraph data heavily.

`𐐗`, `𐐎`, `𐐌`, `𐐐`, `𐐒`, `𐐕`, `𐐘`, `𐐚`, `𐐝`, `𐐞`, `𐐢`

(We permit ourselves a small joke in the arrangement of the first four letters.)

##  Preliminary Keyboard Layout

`𐐗`, `𐐎`, `𐐌`, `𐐐`, `𐐒`, `𐐕`, `𐐘`, `𐐚`, `𐐝`, `𐐞`, `𐐢`

`𐐓`, `𐐡`, `𐐆`, `𐐇`, `𐐈`, `𐐉`, `𐐊`, `𐐔`, `𐐤`

`𐐖`, `𐐍`, `𐐋`, `𐐏`, `𐐑`, `𐐙`, `𐐠`
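
Because the layout is preliminary, it can be useful to check a set of rows against the reduced alphabet mechanically.  The sketch below assumes the reduced alphabet is the 40 letters minus the six long vowels and minus the eight letters in the right-hand column of the digraph table; `reduced_alphabet` and `check_layout` are hypothetical names.

```python
def reduced_alphabet():
    """The 40 letters minus the six long vowels and the eight generated letters."""
    full = {chr(cp) for cp in range(0x10400, 0x10428)}          # 𐐀 .. 𐐧
    long_vowels = {chr(cp) for cp in range(0x10400, 0x10406)}   # 𐐀 .. 𐐅
    # Chee, Eth, Thee, Esh, Zhee, Eng, Oi, Ew: the letters the digraph table generates.
    generated = {chr(cp) for cp in (0x10415, 0x1041B, 0x1041C, 0x1041F,
                                    0x10420, 0x10425, 0x10426, 0x10427)}
    return full - long_vowels - generated

def check_layout(rows):
    """Return (unplaced, unexpected, duplicated) letters for a list of row strings."""
    keys = [ch for row in rows for ch in row]
    placed = set(keys)
    expected = reduced_alphabet()
    duplicated = {ch for ch in placed if keys.count(ch) > 1}
    return expected - placed, placed - expected, duplicated

# The three preliminary rows, top to bottom.
rows = ['𐐗𐐎𐐌𐐐𐐒𐐕𐐘𐐚𐐝𐐞𐐢', '𐐓𐐡𐐆𐐇𐐈𐐉𐐊𐐔𐐤', '𐐖𐐍𐐋𐐏𐐑𐐙𐐠']
unplaced, unexpected, duplicated = check_layout(rows)
print(sorted(unplaced), sorted(unexpected), sorted(duplicated))
```

For the preliminary rows this reports `𐐣` as unplaced, and `𐐕` and `𐐠` as placed even though the digraph table also generates them, so those keys may merit another look in a later revision.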