# Proof of Concept Testing - Pre-trained Tacotron2

This notebook is a proof of concept for using Coqui-AI to convert ebooks into audiobooks. It uses tools from the epub-parser notebook to extract text from an ebook, and the Coqui-AI / TTS engine tools to convert the parsed text into audio files.

**Note:** this notebook is intended for use with the `ttsenv` conda environment.

## Outline of steps

1. Convert .epub file to text string
2. Run text through TTS engine
3. Combine audio files

In [20]:
import ebooklib
import torch
import time
from ebooklib import epub
from bs4 import BeautifulSoup
from IPython.display import Audio
from scipy.io.wavfile import write

In [2]:
# ?torch.hub.load

In [3]:
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# map_location=torch.device('cpu')
tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')
tacotron2 = tacotron2.to('cuda')
tacotron2.eval()

Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub
  "pytorch_quantization module not found, quantization will not be available"


Tacotron2(
  (embedding): Embedding(148, 512)
  (encoder): Encoder(
    (convolutions): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (lstm): LSTM(512, 256, batch_first=True, bidirectional=True)
  )
  (decoder): Decoder(
    (prenet): Prenet(
      (layers): ModuleList(
        (0): LinearNorm(
          (lin

## Step 1 - Epub to text

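Load the ebook with `ebooklib`. The sample file, `pg2554.epub`, is the Project Gutenberg edition of *Crime and Punishment*.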

In [4]:
book = epub.read_epub('pg2554.epub')
# book = epub.read_epub('epub-test.epub')

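Iterate over the items in the book and print the text of each document item, using BeautifulSoup to strip the HTML markup.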

In [5]:
for item in book.get_items():
    if item.get_type() == ebooklib.ITEM_DOCUMENT:
        print('==================================')
        print('NAME : ', item.get_name())
        print('----------------------------------')
        print(BeautifulSoup(item.get_content(), "html.parser").text)
        print('==================================')

NAME :  5044171427629287690_2554-h-0.htm.html
----------------------------------




The Project Gutenberg eBook of Crime and Punishment, by Fyodor Dostoevsky
This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.
Title: Crime and Punishment
Author: Fyodor Dostoevsky
Translator: Constance Garnett
Release Date: March, 2001 [eBook #2554]
[Most recently updated: August 6, 2021]
Language: English
Character set encoding: UTF-8
Produced by: John Bickers, Dagny and David Widger
*** START OF THE PROJECT GUTENBERG EBOOK CRIME AND PUNISHMENT ***
CRIME AND PUNISHMENT
By Fyodor Dostoevsky
Translated By Constance Garn






 CHAPTER V
Raskolnikov was already entering the room. He came in looking as though he had the utmost difficulty not to burst out laughing again. Behind him Razumihin strode in gawky and awkward, shamefaced and red as a peony, with an utterly crestfallen and ferocious expression. His face and whole figure really were ridiculous at that moment and amply justified Raskolnikov’s laughter. Raskolnikov, not waiting for an introduction, bowed to Porfiry Petrovitch, who stood in the middle of the room looking inquiringly at them. He held out his hand and shook hands, still apparently making desperate efforts to subdue his mirth and utter a few words to introduce himself. But he had no sooner succeeded in assuming a serious air and muttering something when he suddenly glanced again as though accidentally at Razumihin, and could no longer control himself: his stifled laughter broke out the more irresistibly the more he tried to restrain it. The extraordinary ferocity with which Razumihin re






 CHAPTER I
The morning that followed the fateful interview with Dounia and her mother brought sobering influences to bear on Pyotr Petrovitch. Intensely unpleasant as it was, he was forced little by little to accept as a fact beyond recall what had seemed to him only the day before fantastic and incredible. The black snake of wounded vanity had been gnawing at his heart all night. When he got out of bed, Pyotr Petrovitch immediately looked in the looking-glass. He was afraid that he had jaundice. However his health seemed unimpaired so far, and looking at his noble, clear-skinned countenance which had grown fattish of late, Pyotr Petrovitch for an instant was positively comforted in the conviction that he would find another bride and, perhaps, even a better one. But coming back to the sense of his present position, he turned aside and spat vigorously, which excited a sarcastic smile in Andrey Semyonovitch Lebeziatnikov, the young friend with whom he was staying. That smile Pyotr P
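The first document printed above opens with the Project Gutenberg license header, which a TTS engine would read aloud along with the novel. A minimal sketch of one way to trim it, assuming the header ends at a `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` line like the one in the output and that a matching `*** END OF ...` line precedes the trailing license text. The function name and regular expressions are illustrative, not part of `ebooklib`. Because the book is split across several documents, either marker may be missing from a given piece of text, so each one is treated as optional.

```python
import re

# Assumed marker patterns, modelled on the START line printed in the output;
# the END pattern is assumed to mirror it.
START_RE = re.compile(r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*")
END_RE = re.compile(r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*")


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END markers, when they are present."""
    start = START_RE.search(text)
    if start:
        # drop everything up to and including the START marker
        text = text[start.end():]
    end = END_RE.search(text)
    if end:
        # drop the END marker and everything after it
        text = text[:end.start()]
    return text.strip()


sample = (
    "Title: Crime and Punishment\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK CRIME AND PUNISHMENT ***\n"
    "CRIME AND PUNISHMENT\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK CRIME AND PUNISHMENT ***\n"
    "License text"
)
print(strip_gutenberg_boilerplate(sample))
```

Text without either marker passes through unchanged apart from whitespace trimming, so the function can be applied to every document item.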

## Step 2 - Tacotron2 from TorchHub
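Load the pre-trained WaveGlow vocoder from Torch Hub, then synthesize a short test sentence: Tacotron2 converts the text into a mel spectrogram, and WaveGlow converts the spectrogram into audio.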

In [6]:
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp16')
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to('cuda')
waveglow.eval()

Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


WaveGlow(
  (upsample): ConvTranspose1d(80, 80, kernel_size=(1024,), stride=(256,))
  (WN): ModuleList(
    (0): WN(
      (in_layers): ModuleList(
        (0): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(2,), dilation=(2,))
        (2): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(4,))
        (3): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(8,))
        (4): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(16,), dilation=(16,))
        (5): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(32,), dilation=(32,))
        (6): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(64,), dilation=(64,))
        (7): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(128,), dilation=(128,))
      )
      (res_skip_layers): ModuleList(
        (0): Conv1d(512, 1024, kernel_size=(1,), stride=(1,))
        (1): Conv1d(51

In [7]:
text = "Hello world, I missed you so much."

In [8]:
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
sequences, lengths = utils.prepare_input_sequence([text])

Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


In [9]:
with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)
    audio = waveglow.infer(mel)
audio_numpy = audio[0].data.cpu().numpy()
rate = 22050

In [10]:
write("audio.wav", rate, audio_numpy)

In [11]:
Audio(audio_numpy, rate=rate)

## Step 3 - Text from epub to .wav file

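Extract the text of a single document from the ebook (the translator's preface), split it into lines, and synthesize one of those lines.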

In [13]:
for item in book.get_items():
    if item.get_name() == "5044171427629287690_2554-h-1.htm.html":
        # set up text sample and path
        input_text = BeautifulSoup(item.get_content(), "html.parser").text
        input_text = input_text.replace('—', '')
        text_list = input_text.split('\n')
        
# print(input_text)
# print(text_list)
print(text_list[6])

A few words about Dostoevsky himself may help the English reader to understand his work.
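One side effect of the cell above is that deleting an em dash outright joins the words on either side of it, so a phrase such as "a journal" followed by a dash and a quoted title loses its separator. A minimal sketch of an alternative, assuming a comma pause is an acceptable spoken stand-in for the dash; `normalize_for_tts` is a hypothetical helper name, not part of any library used here.

```python
import re

EM_DASH = "\u2014"


def normalize_for_tts(text):
    """Replace em dashes with a comma pause instead of deleting them."""
    # swap each dash, plus any whitespace around it, for ", "
    text = re.sub(r"\s*" + EM_DASH + r"\s*", ", ", text)
    # a dash that followed punctuation leaves e.g. ",, " behind; keep the first mark only
    text = re.sub(r"([,.;:!?]),\s", r"\1 ", text)
    return text.strip()


line = "He started a journal" + EM_DASH + "\u201cVremya,\u201d which was forbidden"
print(normalize_for_tts(line))
```

This could replace the `input_text.replace(...)` call before the text is split into lines. A dash at the very start of a line would still leave a leading comma, which this sketch does not handle.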


In [14]:
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
sequences, lengths = utils.prepare_input_sequence([text_list[6]])

Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


In [15]:
with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)
    audio = waveglow.infer(mel)
audio_numpy = audio[0].data.cpu().numpy()
rate = 22050

In [16]:
Audio(audio_numpy, rate=rate)

## Step 4 - Convert paragraph

Convert every non-empty line of the document to its own .wav file, and time the whole run.

In [21]:
start = time.time()

# enumerate() gives each line its true position, even when the same text
# appears more than once (text_list.index() only ever returns the first match)
for sample_index, sample in enumerate(text_list):
    if sample:  # skip empty lines
        sample_path = "output" + str(sample_index) + ".wav"

        print("Sample to convert:" + "\n" + sample)
        print("Index of sample:" + str(sample_index))
        print("Output path:" + str(sample_path))
        print(str(len(sample)) + "\n\n")

        # encode the text as a tensor of symbol IDs
        utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
        sequences, lengths = utils.prepare_input_sequence([sample])

        # text -> mel spectrogram -> waveform
        with torch.no_grad():
            mel, _, _ = tacotron2.infer(sequences, lengths)
            audio = waveglow.infer(mel)
        audio_numpy = audio[0].data.cpu().numpy()
        rate = 22050

        write(sample_path, rate, audio_numpy)

end = time.time()
print("The time of execution of above program is :", end-start)

Sample to convert:
 TRANSLATOR’S PREFACE
Index of sample:5
Output path:output5.wav
21




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
A few words about Dostoevsky himself may help the English reader to understand his work.
Index of sample:6
Output path:output6.wav
88




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
Dostoevsky was the son of a doctor. His parents were very hard-working and deeply religious people, but so poor that they lived with their five children in only two rooms. The father and mother spent their evenings in reading aloud to their children, generally from books of a serious character.
Index of sample:7
Output path:output7.wav
295




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
Though always sickly and delicate Dostoevsky came out third in the final examination of the Petersburg school of Engineering. There he had already begun his first work, “Poor Folk.”
Index of sample:8
Output path:output8.wav
181




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
This story was published by the poet Nekrassov in his review and was received with acclamations. The shy, unknown youth found himself instantly something of a celebrity. A brilliant and successful career seemed to open before him, but those hopes were soon dashed. In 1849 he was arrested.
Index of sample:9
Output path:output9.wav
289




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
Though neither by temperament nor conviction a revolutionist, Dostoevsky was one of a little group of young men who met together to read Fourier and Proudhon. He was accused of “taking part in conversations against the censorship, of reading a letter from Byelinsky to Gogol, and of knowing of the intention to set up a printing press.” Under Nicholas I. (that “stern and just man,” as Maurice Baring calls him) this was enough, and he was condemned to death. After eight months’ imprisonment he was with twenty-one others taken out to the Semyonovsky Square to be shot. Writing to his brother Mihail, Dostoevsky says: “They snapped words over our heads, and they made us put on the white shirts worn by persons condemned to death. Thereupon we were bound in threes to stakes, to suffer execution. Being the third in the row, I concluded I had only a few minutes of life before me. I thought of you and your dear ones and I contrived to kiss Plestcheiev and Dourov, who were next t

Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
One of the prisoners, Grigoryev, went mad as soon as he was untied, and never regained his sanity.
Index of sample:11
Output path:output11.wav
98




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
The intense suffering of this experience left a lasting stamp on Dostoevsky’s mind. Though his religious temper led him in the end to accept every suffering with resignation and to regard it as a blessing in his own case, he constantly recurs to the subject in his writings. He describes the awful agony of the condemned man and insists on the cruelty of inflicting such torture. Then followed four years of penal servitude, spent in the company of common criminals in Siberia, where he began the “Dead House,” and some years of service in a disciplinary battalion.
Index of sample:12
Output path:output12.wav
565




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
He had shown signs of some obscure nervous disease before his arrest and this now developed into violent attacks of epilepsy, from which he suffered for the rest of his life. The fits occurred three or four times a year and were more frequent in periods of great strain. In 1859 he was allowed to return to Russia. He started a journal“Vremya,” which was forbidden by the Censorship through a misunderstanding. In 1864 he lost his first wife and his brother Mihail. He was in terrible poverty, yet he took upon himself the payment of his brother’s debts. He started another journal“The Epoch,” which within a few months was also prohibited. He was weighed down by debt, his brother’s family was dependent on him, he was forced to write at heart-breaking speed, and is said never to have corrected his work. The later years of his life were much softened by the tenderness and devotion of his second wife.
Index of sample:13
Output path:output13.wav
904




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
In June 1880 he made his famous speech at the unveiling of the monument to Pushkin in Moscow and he was received with extraordinary demonstrations of love and honour.
Index of sample:14
Output path:output14.wav
166




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
A few months later Dostoevsky died. He was followed to the grave by a vast multitude of mourners, who “gave the hapless man the funeral of a king.” He is still probably the most widely read writer in Russia.
Index of sample:15
Output path:output15.wav
207




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Sample to convert:
In the words of a Russian critic, who seeks to explain the feeling inspired by Dostoevsky: “He was one of ourselves, a man of our blood and our bone, but one who has suffered and has seen so much more deeply than we have his insight impresses us as wisdom... that wisdom of the heart which we seek that we may learn from it how to live. All his other gifts came to him from nature, this he won for himself and through it he became great.”
Index of sample:16
Output path:output16.wav
438




Using cache found in /home/paperspace/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


The time of execution of above program is : 82.51393866539001
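Several of the samples above are long (565 and 904 characters, for example). Tacotron2 implementations stop decoding after a fixed maximum number of steps, so audio for long paragraphs may be cut short. A minimal sketch of splitting a paragraph into sentence-sized chunks before synthesis; the 300-character default is an arbitrary assumption rather than a measured limit, the sentence split is deliberately naive, and `split_into_chunks` is a hypothetical helper.

```python
import re


def split_into_chunks(paragraph, max_chars=300):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # naive split: break after ., ! or ? when followed by whitespace
    # (a sentence ending in a closing quote mark is not detected)
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = current + " " + sentence if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # the next sentence would overflow the chunk, so start a new one
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


paragraph = (
    "Dostoevsky was the son of a doctor. "
    "His parents were very poor. "
    "The father read aloud."
)
for chunk_index, chunk in enumerate(split_into_chunks(paragraph, max_chars=40)):
    print(chunk_index, len(chunk), chunk)
```

Each chunk could then go through the same `prepare_input_sequence` / `infer` steps as a full line, with the chunk index added to the output file name so that the pieces keep their order.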


In [27]:
Audio("output7.wav")  # the sample rate is read from the file itself
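The last step in the outline, combining the audio files, is not implemented above. A minimal sketch using only the standard library, under two assumptions: the per-sample files follow the `output<N>.wav` naming used in Step 4, and they hold integer PCM audio (for example, 16-bit samples). The `wave` module does not read floating-point WAV files, which is what `scipy.io.wavfile.write` produces from a float array, so the audio would need converting to an integer type before writing for this approach to apply. `numeric_key` and `combine_wavs` are hypothetical helper names.

```python
import re
import wave


def numeric_key(path):
    """Sort key that orders output5.wav before output10.wav."""
    match = re.search(r"(\d+)\.wav$", path)
    return int(match.group(1)) if match else -1


def combine_wavs(paths, out_path):
    """Concatenate integer-PCM WAV files that share the same audio format."""
    if not paths:
        raise ValueError("no input files given")
    params = None
    with wave.open(out_path, "wb") as out:
        for path in sorted(paths, key=numeric_key):
            with wave.open(path, "rb") as part:
                fmt = (part.getnchannels(), part.getsampwidth(), part.getframerate())
                if params is None:
                    # the first file decides the format of the combined file
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(part.readframes(part.getnframes()))
```

A call such as `combine_wavs(glob.glob("output*.wav"), "book.wav")` would join the files in numeric order. A plain alphabetical sort would place `output10.wav` before `output5.wav`, which is why the numeric key is used.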