# Audiobook Generator - Proof of Concept

This notebook is a proof of concept for the end-to-end process of generating an audiobook file from an ebook. This includes converting `.epub` book files into raw Python text strings, splitting them into items and sentences, then tokenizing the sentences and wrapping them into chunks of at most `max_char_len` characters to run through the Silero text-to-speech model.

In [1]:
import torch
import torchaudio
from omegaconf import OmegaConf
from tqdm.notebook import tqdm

torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml',
                               'latest_silero_models.yml',
                               progress=False)
models = OmegaConf.load('latest_silero_models.yml')

seed = 1337
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)

In [2]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [3]:
# pg2554.epub = Crime and Punishment
# pg174.epub = The Picture of Dorian Gray
# pg1342.epub = Pride and Prejudice
ebook_path = 'pg174.epub'
sample_rate = 24000
max_char_len = 150
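Each chunk handed to the model is capped at `max_char_len` characters. Inside `read_ebook`, sentences longer than that are split with the standard-library `textwrap.TextWrapper`. This standalone sketch applies the same settings to a made-up over-long sentence to show that every chunk stays within the limit and that no words are dropped:

```python
from textwrap import TextWrapper

max_char_len = 150

# Same settings as read_ebook: wrap at max_char_len, normalise sentence-ending spacing
wrapper = TextWrapper(max_char_len, fix_sentence_endings=True)

# Hypothetical over-long "sentence" of roughly 400 characters
long_sentence = " ".join(["The picture had altered in some strange way."] * 9)

chunks = wrapper.wrap(long_sentence)
for chunk in chunks:
    # Print each chunk's length and its opening words
    print(len(chunk), chunk[:40])
```

`TextWrapper` breaks at whitespace, so a chunk can end mid-sentence; this may show up as an unnatural pause where two chunks are joined in the audio.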

In [4]:
language = 'en'
model_id = 'v3_en'
speaker = 'en_0'

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

Using cache found in /home/matthew/.cache/torch/hub/snakers4_silero-models_master


In [5]:
# max_char_len defaults to the value set in the configuration cell above
def read_ebook(ebook_path, max_char_len=max_char_len):
    """Return one list per EPUB document, each holding text chunks of at most max_char_len characters."""
    import ebooklib
    from ebooklib import epub
    from bs4 import BeautifulSoup
    from tqdm.notebook import tqdm
    from nltk import tokenize, download
    from textwrap import TextWrapper

    # Sentence tokenizer model used by sent_tokenize (recent NLTK releases use 'punkt_tab')
    download('punkt')
    # Splits over-long sentences into chunks no longer than max_char_len
    wrapper = TextWrapper(max_char_len, fix_sentence_endings=True)

    book = epub.read_epub(ebook_path)

    corpus = []
    for item in tqdm(list(book.get_items())):
        # Only XHTML documents hold the book text (skip images, CSS, etc.)
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            input_text = BeautifulSoup(item.get_content(), "html.parser").text
            text_list = []
            for paragraph in input_text.split('\n'):
                # Replace em-dashes with plain hyphens before synthesis
                paragraph = paragraph.replace('—', '-')
                for sentence in tokenize.sent_tokenize(paragraph):
                    # Wrap each sentence into chunks and append them in order
                    text_list.extend(wrapper.wrap(sentence))
            corpus.append(text_list)

    return corpus
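`read_ebook` relies on `ebooklib` and BeautifulSoup. As a rough, dependency-free illustration of what they do here: an EPUB is a ZIP archive whose chapters are XHTML files, so paragraph text can be pulled out with `zipfile` and `html.parser`. This is a simplified sketch, not the notebook's method: it reads `.xhtml`/`.html` members in archive order instead of following the reading order declared in the package's OPF spine, assumes UTF-8 content, and `paragraphs_from_epub` and `ParagraphCollector` are hypothetical names.

```python
import io
import zipfile
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element as one paragraph."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:  # skip empty or whitespace-only paragraphs
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def paragraphs_from_epub(epub_file):
    """Hypothetical helper: return one list of paragraphs per XHTML document."""
    chapters = []
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            # Simplification: archive order, not the OPF spine reading order
            if name.endswith((".xhtml", ".html", ".htm")):
                parser = ParagraphCollector()
                parser.feed(archive.read(name).decode("utf-8"))
                parser.close()
                chapters.append(parser.paragraphs)
    return chapters


# Toy in-memory EPUB-like archive with one chapter
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/ch1.xhtml",
                     "<html><body><p>The studio was filled\n with the rich odour of roses.</p>"
                     "<p>  </p></body></html>")

chapters = paragraphs_from_epub(buffer)
print(chapters)
```

Collecting text per `<p>` element avoids depending on where the markup happens to contain newlines, which is what splitting BeautifulSoup's `.text` on `'\n'` relies on.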

In [6]:
ebook = read_ebook(ebook_path)

[nltk_data] Downloading package punkt to /home/matthew/nltk_data...
[nltk_data]   Package punkt is already up-to-date!



In [7]:
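import os

# Ensure the output directory exists before saving
os.makedirs("outputs/silero", exist_ok=True)

# enumerate gives each chapter its own index; ebook.index(chapter) returns the first
# equal chapter, so identical chapters (e.g. two empty ones) would share a filename
for chapter_index, chapter in enumerate(tqdm(ebook)):
    audio_list = []
    for sentence in tqdm(chapter):
        audio = model.apply_tts(text=sentence,
                                speaker=speaker,
                                sample_rate=sample_rate)
        # Keep only non-empty tensors (check the type before calling len)
        if isinstance(audio, torch.Tensor) and len(audio) > 0:
            audio_list.append(audio)
        else:
            print(f'Tensor for sentence is not valid: \n {sentence}')

    sample_path = f"outputs/silero/chapter{chapter_index:03}.wav"

    if len(audio_list) > 0:
        # Join the sentence clips into one (1, samples) tensor; torchaudio.save expects CPU tensors
        audio_file = torch.cat(audio_list).reshape(1, -1).cpu()
        torchaudio.save(sample_path, audio_file, sample_rate)
    else:
        print(f'Chapter {chapter_index:03} is empty.')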

Chapter 022 is empty.
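The loop above concatenates the per-sentence clips and writes one WAV file per chapter with `torchaudio.save`. For readers without torchaudio, the same idea can be sketched with the standard-library `wave` module: concatenate float samples (assumed to lie in [-1, 1]) and store them as 16-bit mono PCM at `sample_rate`. `write_chapter_wav` and `tone` are hypothetical helpers, and `torchaudio.save` may choose a different sample encoding for float tensors, so the files will not be byte-identical.

```python
import array
import math
import os
import sys
import tempfile
import wave


def write_chapter_wav(path, chunks, sample_rate=24000):
    """Concatenate float sample chunks in [-1, 1] and save them as 16-bit mono PCM WAV."""
    pcm = array.array("h")  # signed 16-bit integers
    for chunk in chunks:
        for sample in chunk:
            clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
            pcm.append(int(round(clipped * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)    # mono
        wav.setsampwidth(2)    # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return len(pcm)


sample_rate = 24000


def tone(seconds, freq=440.0):
    """Hypothetical stand-in for one sentence of TTS output: a quiet sine tone."""
    return [0.3 * math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(int(seconds * sample_rate))]


# Two synthetic "sentences" of 0.5 s and 0.25 s written as one chapter file
path = os.path.join(tempfile.mkdtemp(), "chapter000.wav")
n_samples = write_chapter_wav(path, [tone(0.5), tone(0.25)], sample_rate)

with wave.open(path, "rb") as wav:
    print(wav.getnframes() / wav.getframerate(), "seconds")
```

The chapter's duration in seconds is simply the total sample count divided by `sample_rate`.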


### Results

##### CPU (i7-4790K)

Running "Pride and Prejudice" through the Silero model on the CPU took **34m42s** to convert. This book is a good proxy for a typical book's length: the average audiobook on Audible runs between 10 and 12 hours, and the Pride and Prejudice audiobook runs 11h20m.

This is approximately a 20:1 ratio of audio length to processing time.

Pride and Prejudice: **34m42s** (1h39m33s on an i7-4650U)

The Picture of Dorian Gray: **18m18s** (18m50s with output)

Crime and Punishment: **Unknown** (error during conversion at chapter 7/50, sentence 19/368)
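The notebook does not record which exception stopped the Crime and Punishment run, so the cause is unknown. One way to keep a single problematic chunk from aborting a long conversion is to catch errors per chunk, log them, and carry on. This sketch uses a stand-in `fake_tts` in place of `model.apply_tts`; `synthesize_chapter` and `fake_tts` are hypothetical names, and catching a broad `Exception` is a deliberate trade-off for a proof of concept.

```python
def synthesize_chapter(chunks, tts):
    """Run tts on each chunk; return (audio_pieces, failures) instead of stopping on the first error."""
    audio_pieces, failures = [], []
    for position, text in enumerate(chunks):
        try:
            audio = tts(text)
        except Exception as error:  # broad on purpose: log and keep going
            failures.append((position, text, repr(error)))
            continue
        if len(audio) == 0:
            failures.append((position, text, "empty audio"))
            continue
        audio_pieces.append(audio)
    return audio_pieces, failures


def fake_tts(text):
    """Stand-in for model.apply_tts: fails on blank input, returns one sample per character."""
    if not text.strip():
        raise ValueError("no text to synthesize")
    return [0.0] * len(text)


pieces, failures = synthesize_chapter(["Rodion Romanovitch", "   ", "went out."], fake_tts)
for position, text, reason in failures:
    print(f"chunk {position} skipped ({reason}): {text!r}")
```

Logging the chunk index and text alongside the error also makes it easier to see what kind of input triggers a failure.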

##### GPU (P4000)

Running Pride and Prejudice through the Silero model on the GPU took **5m39s** to convert.

This is approximately a 122:1 ratio of audio length to processing time.

Pride and Prejudice: **5m39s**

The Picture of Dorian Gray: **4m26s**

Crime and Punishment: **Unknown** - error converting ebook
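Both ratios above are the audiobook length divided by the conversion time. A small helper for the `XhYmZs` durations used in these results makes the arithmetic explicit; `parse_duration` and `realtime_ratio` are hypothetical helpers, and the reference length is assumed to be the 11h20m Audible narration rather than the length of the generated audio.

```python
import re


def parse_duration(text):
    """Convert strings like '11h20m' or '34m42s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or not match:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def realtime_ratio(audio_length, processing_time):
    """Seconds of audio produced per second of processing."""
    return parse_duration(audio_length) / parse_duration(processing_time)


audio_length = "11h20m"  # Pride and Prejudice on Audible
for label, elapsed in [("CPU i7-4790K", "34m42s"), ("GPU P4000", "5m39s")]:
    print(f"{label}: {realtime_ratio(audio_length, elapsed):.1f}:1")
```

With 11h20m as the reference this gives roughly 19.6:1 for the CPU run and 120.4:1 for the GPU run; the exact figure shifts slightly depending on which audio length is used as the numerator.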