<a href="https://colab.research.google.com/github/k2-fsa/colab/blob/master/sherpa-onnx/piper/convert_de_DE_thorsten_low.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

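This Colab notebook shows how to convert the Piper voice at

https://huggingface.co/rhasspy/piper-voices/tree/main/de/de_DE/thorsten/low

for use with [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx). It downloads the model, adds metadata to the ONNX file, and generates `tokens.txt` and `lexicon.txt`.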

In [4]:
%%shell

wget https://huggingface.co/rhasspy/piper-voices/resolve/main/de/de_DE/thorsten/low/de_DE-thorsten-low.onnx

wget https://huggingface.co/rhasspy/piper-voices/resolve/main/de/de_DE/thorsten/low/de_DE-thorsten-low.onnx.json
ls -lh

--2023-10-27 03:22:48--  https://huggingface.co/rhasspy/piper-voices/resolve/main/de/de_DE/thorsten/low/de_DE-thorsten-low.onnx
Resolving huggingface.co (huggingface.co)... 18.244.202.60, 18.244.202.73, 18.244.202.118, ...
Connecting to huggingface.co (huggingface.co)|18.244.202.60|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/ed/06/ed062eb100d7bd80d78f61252cd190fca48cbda97eb1753fb827ff3339a6b11c/9ac27fad17cec5c1a791161976a64f026f16fc058b400b1fea62565b8b2cf375?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27de_DE-thorsten-low.onnx%3B+filename%3D%22de_DE-thorsten-low.onnx%22%3B&Expires=1698636168&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5ODYzNjE2OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9lZC8wNi9lZDA2MmViMTAwZDdiZDgwZDc4ZjYxMjUyY2QxOTBmY2E0OGNiZGE5N2ViMTc1M2ZiODI3ZmYzMzM5YTZiMTFjLzlhYzI3ZmFkMTdjZWM1YzFhNzkxMTYxOTc2YTY0ZjAyNm



In [2]:
%%shell

pip install piper-phonemize onnx onnxruntime==1.16.0

Collecting piper-phonemize
  Downloading piper_phonemize-1.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (25.0 MB)
Collecting onnx
  Downloading onnx-1.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.7 MB)
Collecting onnxruntime==1.16.0
  Downloading onnxruntime-1.16.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.2 MB)
Collecting coloredlogs (from onnxruntime==1.16.0)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Collecting humanfriendly>=9.1 (fro



In [5]:
from typing import Dict, Any
import onnx
import json

def add_meta_data(filename: str, meta_data: Dict[str, Any]):
    """Add meta data to an ONNX model. It is changed in-place.

    Args:
      filename:
        Filename of the ONNX model to be changed.
      meta_data:
        Key-value pairs.
    """
    model = onnx.load(filename)
    for key, value in meta_data.items():
        meta = model.metadata_props.add()
        meta.key = key
        meta.value = str(value)

    onnx.save(model, filename)

def load_config(model):
    """Load the Piper JSON config stored next to the model as <model>.json."""
    with open(f"{model}.json", "r", encoding="utf-8") as file:
        config = json.load(file)
    return config

def generate_tokens(config):
    id_map = config["phoneme_id_map"]
    with open("tokens.txt", "w", encoding="utf-8") as f:
        for s, i in id_map.items():
            f.write(f"{s} {i[0]}\n")
    print("Generated tokens.txt")

def main():
    config = load_config('de_DE-thorsten-low.onnx')
    generate_tokens(config)
    _punctuation = ';:,.!?¡¿—…"«»“” '
    meta_data = {
        "model_type": "vits",
        "comment": "piper",
        "language": "German",
        "add_blank": 1,
        "n_speakers": config["num_speakers"],
        "sample_rate": config["audio"]["sample_rate"],
        "punctuation": " ".join(list(_punctuation)),
    }
    print(meta_data)
    add_meta_data("de_DE-thorsten-low.onnx", meta_data)

main()

Generated tokens.txt
{'model_type': 'vits', 'comment': 'piper', 'language': 'German', 'add_blank': 1, 'n_speakers': 1, 'sample_rate': 16000, 'punctuation': '; : , . ! ? ¡ ¿ — … " « » “ ”  '}
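
Each line that `generate_tokens` writes has the form `<symbol> <id>`. Piper's `phoneme_id_map` typically contains the space character itself as a symbol, so code that reads `tokens.txt` back by splitting on whitespace would lose that entry. The sketch below splits on the last space only. The helper names are illustrative, the ids for `f`, `y`, `ɾ`, `ˈ`, and `ː` are the ones printed later in this notebook, and the id for the space symbol is an assumption.

```python
import io


def write_tokens(id_map, f):
    # Same line format as generate_tokens: "<symbol> <first id>".
    for symbol, ids in id_map.items():
        f.write(f"{symbol} {ids[0]}\n")


def read_tokens(f):
    token2id = {}
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last space only, so a symbol that is itself a space survives.
        symbol, idx = line.rsplit(" ", 1)
        token2id[symbol] = int(idx)
    return token2id


# Small hypothetical subset of a phoneme_id_map (the id of " " is assumed).
sample_map = {" ": [3], "f": [19], "y": [37], "ɾ": [92], "ˈ": [120], "ː": [122]}

buf = io.StringIO()
write_tokens(sample_map, buf)
buf.seek(0)
token2id = read_tokens(buf)
print(token2id)
```

An in-memory buffer stands in for `tokens.txt` here so that the snippet does not touch the files produced above.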


In [6]:
%%shell

wget -O german.7z "https://downloads.sourceforge.net/project/germandict/german.7z?ts=gAAAAABlOnvekATxh2d2zse53x7JN4MUscbvCW073dv6CQrbQS-ekmrejSGcey1_MeJhNss6IKtI7BgpEH9ao1CIi4v2zMLULg%3D%3D&use_mirror=deac-riga&r=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fgermandict%2Ffiles%2F"


sudo apt-get install -y p7zip
7z x ./german.7z
ls

--2023-10-27 03:23:05--  https://downloads.sourceforge.net/project/germandict/german.7z?ts=gAAAAABlOnvekATxh2d2zse53x7JN4MUscbvCW073dv6CQrbQS-ekmrejSGcey1_MeJhNss6IKtI7BgpEH9ao1CIi4v2zMLULg%3D%3D&use_mirror=deac-riga&r=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fgermandict%2Ffiles%2F
Resolving downloads.sourceforge.net (downloads.sourceforge.net)... 204.68.111.105
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|204.68.111.105|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://deac-riga.dl.sourceforge.net/project/germandict/german.7z [following]
--2023-10-27 03:23:06--  https://deac-riga.dl.sourceforge.net/project/germandict/german.7z
Resolving deac-riga.dl.sourceforge.net (deac-riga.dl.sourceforge.net)... 89.111.52.100
Connecting to deac-riga.dl.sourceforge.net (deac-riga.dl.sourceforge.net)|89.111.52.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6271782 (6.0M) [application/x-7z-compressed]
Saving to



In [7]:
%%shell

head austriazismen.txt LiesMich.txt autocomplete.txt german.dic helvetismen.txt variants.dic
file austriazismen.txt LiesMich.txt autocomplete.txt german.dic helvetismen.txt variants.dic

==> austriazismen.txt <==
abbeutele
abbeuteln
abbeutelnd
abbeutelnde
abbeutelndem
abbeutelnden
abbeutelnder
abbeutelndes
Abbeutelns
abbeutelst

==> LiesMich.txt <==
Servus und danke für Ihr Interesse an diesem Projekt.

Falsch geschriebene, seltsame und fehlende Wörter bitte per Mail an Jan Schreiber unter
	jan.schreiber (at) languagetool.org
Vielen Dank im Voraus!

BenutzerInnen aus der Schweiz mit Grundkenntnissen in Python interessieren sich vielleicht auch für das Python-Skript
	https://sourceforge.net/p/germandict/code/HEAD/tree/tools/separateDEandCH.py
Es wandelt 'ß' zu 'ss' um, ersetzt 'Müsli' durch 'Müesli' und erzeugt aus 'Handy...' die Alternativen 'Natel...'.


==> autocomplete.txt <==
Aachen
Aachener
Aachenerin
Aachenerinnen
Aachenern
Aacheners
Aachens
Abbau
abbaue
abbauen

==> german.dic <==
�l
�m
AA
AAA
Aachen
Aachener
Aachenerin
Aachenerinnen
Aachenern
Aacheners

==> helvetismen.txt <==
Älplermagronen
Äufnens
Übermittlungstruppe



In [8]:
from piper_phonemize import phonemize_espeak
import re

def read_lexicon():
    in_files = [
        "./austriazismen.txt",
        "./autocomplete.txt",
        "./german.dic",
        "./helvetismen.txt",
        "./variants.dic",
    ]

    words = set()
    words.add('Liliana')
    # Keep only entries made of ASCII letters, apostrophes, dots, and hyphens.
    # Words containing umlauts or ß do not match and are skipped.
    pattern = re.compile(r"^[a-zA-Z'.-]+$")
    for in_file in in_files:
        print(in_file)
        # The word lists are read as ISO-8859-1 (Latin-1).
        with open(in_file, encoding='iso-8859-1') as f:
            for line in f:
                word = line.strip().lower()
                if not pattern.match(word):
                    continue
                # Duplicates across the input files are merged by the set.
                words.add(word)
    return list(words)



def generate_lexicon():
    config = load_config('de_DE-thorsten-low.onnx')
    words = read_lexicon()
    num_words = len(words)
    print(num_words)

    # Walk through the words in batches so that progress is printed every 5000 words.
    batch = 5000
    i = 0
    word2phones = dict()
    while i < num_words:
        print(f"{i}/{num_words}, {i/num_words*100:.3f}%")
        this_batch = words[i : i + batch]
        i += batch
        for w in this_batch:
            # phonemize_espeak returns a list of phoneme lists; take the first one.
            phonemes = phonemize_espeak(w, config["espeak"]["voice"])[0]
            word2phones[w] = ' '.join(phonemes)

    # One "<word> <phoneme> <phoneme> ..." line per word.
    with open("lexicon.txt", "w", encoding="utf-8") as f:
        for w, p in word2phones.items():
            f.write(f"{w} {p}\n")

generate_lexicon()

./austriazismen.txt
./autocomplete.txt
./german.dic
./helvetismen.txt
./variants.dic
1663258
0/1663258, 0.000%
5000/1663258, 0.301%
10000/1663258, 0.601%
15000/1663258, 0.902%
20000/1663258, 1.202%
25000/1663258, 1.503%
30000/1663258, 1.804%
35000/1663258, 2.104%
40000/1663258, 2.405%
45000/1663258, 2.706%
50000/1663258, 3.006%
55000/1663258, 3.307%
60000/1663258, 3.607%
65000/1663258, 3.908%
70000/1663258, 4.209%
75000/1663258, 4.509%
80000/1663258, 4.810%
85000/1663258, 5.110%
90000/1663258, 5.411%
95000/1663258, 5.712%
100000/1663258, 6.012%
105000/1663258, 6.313%
110000/1663258, 6.614%
115000/1663258, 6.914%
120000/1663258, 7.215%
125000/1663258, 7.515%
130000/1663258, 7.816%
135000/1663258, 8.117%
140000/1663258, 8.417%
145000/1663258, 8.718%
150000/1663258, 9.018%
155000/1663258, 9.319%
160000/1663258, 9.620%
165000/1663258, 9.920%
170000/1663258, 10.221%
175000/1663258, 10.522%
180000/1663258, 10.822%
185000/1663258, 11.123%
190000/1663258, 11.423%
195000/1663258, 11.724%
200000

In [9]:
%%shell

head lexicon.txt

sternwind ʃ t ˈ ɛ ɾ n v ɪ n t
rezitatorin r ˌ e ː t s i ː t ˈ ɑ ː t o ː r ˌ ɪ n
erwerbstitels ɛ ɾ v ˈ ɛ ɾ p s t i ː t ə l s
vergleichseinheiten f ɛ ɾ ɡ l ˈ a ɪ c ̧ z a ɪ n h ˌ a ɪ t ə n
reliefblock r ˈ e ː l i ː f b l ˌ ɔ k
massenverhaftungen m ˈ a s ə n f ɜ h ˌ a f t ʊ ŋ ə n
nahrungsmittelkosten n ˈ ɑ ː r ʊ ŋ s m ˌ ɪ t ə l k ˌ ɔ s t ə n
kupferoxyden k ˈ ʊ p f e ː r ˌ ɔ k s y ː d ə n
grundschliff ɡ ɾ ˈ ʊ n t ʃ l ɪ f
bogenarchitektur b ˌ o ː ɡ ə n ˌ a ɾ c ̧ i ː t ɛ k t ˈ u ː ɾ




In [10]:
%%shell

grep -i -w ende ./lexicon.txt

ende ˈ ɛ n d ə




In [11]:
%%shell

file tokens.txt lexicon.txt

tokens.txt:  Unicode text, UTF-8 text
lexicon.txt: Unicode text, UTF-8 text




## Add a new word to lexicon.txt
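
A word such as `für` has to be added by hand because the filter in `read_lexicon` accepts only ASCII letters (plus a few punctuation characters), so dictionary entries containing an umlaut or `ß` never reach `lexicon.txt`. The snippet below illustrates the effect with an ASCII-only pattern of the same kind. The sample words are chosen for illustration only.

```python
import re

# ASCII letters, apostrophe, dot, and hyphen only (same idea as the filter in read_lexicon).
ascii_only = re.compile(r"^[a-zA-Z'.-]+$")

results = {}
for word in ["ende", "für", "straße", "müsli", "e-mail"]:
    results[word] = bool(ascii_only.match(word))
    print(f"{word}: {'kept' if results[word] else 'skipped'}")
```

Only `ende` and `e-mail` are kept. The other three words contain non-ASCII letters and are skipped.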

In [12]:
config = load_config('de_DE-thorsten-low.onnx')
phonemes = phonemize_espeak("für", config["espeak"]["voice"])[0]
print(phonemes)
print('für', ' '.join(phonemes))
id_map = config["phoneme_id_map"]
for p in phonemes:
  print(id_map[p][0])

['f', 'ˈ', 'y', 'ː', 'ɾ']
für f ˈ y ː ɾ
19
120
37
122
92
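
The ids printed above are the values that `phoneme_id_map` assigns to each phoneme of `für`. As a rough illustration of how a line of `lexicon.txt` relates to those ids, the sketch below splits an entry into its word and phonemes and looks each phoneme up. `lexicon_line_to_ids` is a hypothetical helper, and the small `token2id` table is copied from the output above. How sherpa-onnx itself consumes the two files is not shown here.

```python
def lexicon_line_to_ids(line, token2id):
    # A lexicon line is "<word> <phoneme> <phoneme> ...", separated by spaces.
    word, *phonemes = line.split()
    return word, [token2id[p] for p in phonemes]


# Ids taken from the cell output above.
token2id = {"f": 19, "ˈ": 120, "y": 37, "ː": 122, "ɾ": 92}

word, ids = lexicon_line_to_ids("für f ˈ y ː ɾ", token2id)
print(word, ids)  # für [19, 120, 37, 122, 92]
```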


In [13]:
%%shell

(echo "für f ˈ y ː ɾ"; cat lexicon.txt) > a.txt
mv a.txt lexicon.txt
head lexicon.txt

für f ˈ y ː ɾ
sternwind ʃ t ˈ ɛ ɾ n v ɪ n t
rezitatorin r ˌ e ː t s i ː t ˈ ɑ ː t o ː r ˌ ɪ n
erwerbstitels ɛ ɾ v ˈ ɛ ɾ p s t i ː t ə l s
vergleichseinheiten f ɛ ɾ ɡ l ˈ a ɪ c ̧ z a ɪ n h ˌ a ɪ t ə n
reliefblock r ˈ e ː l i ː f b l ˌ ɔ k
massenverhaftungen m ˈ a s ə n f ɜ h ˌ a f t ʊ ŋ ə n
nahrungsmittelkosten n ˈ ɑ ː r ʊ ŋ s m ˌ ɪ t ə l k ˌ ɔ s t ə n
kupferoxyden k ˈ ʊ p f e ː r ˌ ɔ k s y ː d ə n
grundschliff ɡ ɾ ˈ ʊ n t ʃ l ɪ f
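
The shell cell above rewrites the whole file so that the new entry comes first. If the same step has to run outside a `%%shell` cell, a rough Python equivalent could look like the sketch below. `prepend_entry` is a hypothetical helper, and the demo works on a temporary file rather than on the real `lexicon.txt`.

```python
import os
import tempfile


def prepend_entry(path, entry):
    # Write the new entry first, then copy the existing lines after it.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as out, open(path, encoding="utf-8") as src:
        out.write(entry + "\n")
        for line in src:
            out.write(line)
    # Replace the original file with the new one.
    os.replace(tmp_path, path)


with tempfile.TemporaryDirectory() as d:
    demo = os.path.join(d, "lexicon.txt")
    with open(demo, "w", encoding="utf-8") as f:
        f.write("ende ˈ ɛ n d ə\n")
    prepend_entry(demo, "für f ˈ y ː ɾ")
    with open(demo, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```

Copying line by line avoids loading the 90M lexicon into memory at once.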




In [14]:
%%shell

wget https://huggingface.co/rhasspy/piper-voices/resolve/main/de/de_DE/thorsten/low/MODEL_CARD
ls -lh

--2023-10-27 03:35:12--  https://huggingface.co/rhasspy/piper-voices/resolve/main/de/de_DE/thorsten/low/MODEL_CARD
Resolving huggingface.co (huggingface.co)... 65.8.178.12, 65.8.178.27, 65.8.178.93, ...
Connecting to huggingface.co (huggingface.co)|65.8.178.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 274 [text/plain]
Saving to: ‘MODEL_CARD’


2023-10-27 03:35:12 (87.7 MB/s) - ‘MODEL_CARD’ saved [274/274]

total 190M
-rw-r--r-- 1 root root  73K Dec 27  2017 austriazismen.txt
-rw-r--r-- 1 root root 143K Sep 25  2021 autocomplete.txt
-rw-r--r-- 1 root root  61M Oct 27 03:23 de_DE-thorsten-low.onnx
-rw-r--r-- 1 root root 4.1K Oct 27 03:22 de_DE-thorsten-low.onnx.json
-rw-r--r-- 1 root root 6.0M Oct  1  2021 german.7z
-rw-r--r-- 1 root root  35M Oct  1  2021 german.dic
-rw-r--r-- 1 root root  48K Jul  8  2021 helvetismen.txt
-rw-r--r-- 1 root root  90M Oct 27 03:33 lexicon.txt
-rw-r--r-- 1 root root  827 Oct 27  2016 LiesMich.txt
-rw-r--r-- 1 root root  274 

