<a href="https://colab.research.google.com/github/marcosfelt/latex2speech/blob/main/tts_latex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text to speech for LaTeX

This notebook converts LaTeX into speech. It's useful for having your papers read back to you while editing or proofreading.

How to use:


1. Click the play button on the "Setup" cell to install the required packages.
2. From the Google Colab menu, select "Runtime" -> "Restart Runtime". This is necessary to make sure the correct versions of certain packages are used.
3. Paste your LaTeX code into the text box of the "Generate speech" cell and click its play button.
4. The notebook prints the cleaned text and shows an audio player that reads it out to you.

FAQ:

- **Does this remove citation and reference commands?** Yes, `\cite`, `\citep`, and `\ref` commands are stripped automatically.
- **How long does it take to generate speech?** The full generation pipeline runs at roughly 4x real time, so 1 minute of speech takes about 15 seconds to generate. The first run takes longer because the model has to be downloaded.
- **Can I change the playback speed?** Click on the three dots in the audio player and select "Playback speed."
- **What model does this use?** It uses the Tacotron 2 model trained with [Double Decoder Consistency (DDC)](https://coqui.ai/blog/tts/solving-attention-problems-of-tts-models-with-double-decoder-consistency), `tts_models/en/ljspeech/tacotron2-DDC`, from [Coqui-AI](https://github.com/coqui-ai/TTS).
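
The citation and reference stripping mentioned in the FAQ can be tried in isolation. The sketch below is not the notebook's exact code: the function name `strip_citations` is hypothetical, the pattern accepts anything between the braces rather than a fixed character set, and the whitespace tidy-up afterwards is an added assumption about what reads well aloud.

```python
import re

# Hypothetical helper, not part of the notebook.
# Matches \cite{...}, \citep{...} and \ref{...} with any non-brace content.
CITE_OR_REF = re.compile(r"\\(?:citep?|ref)\{[^{}]*\}")


def strip_citations(text: str) -> str:
    """Remove citation/reference commands and tidy the leftover whitespace."""
    text = CITE_OR_REF.sub("", text)
    # Assumption: drop the space left in front of punctuation, e.g. "well ." -> "well."
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    # Assumption: collapse runs of spaces left behind by removed commands.
    text = re.sub(r" {2,}", " ", text)
    return text


sample = r"Attention works well \citep{vaswani2017}. See Figure \ref{fig:arch} for details."
print(strip_citations(sample))
# prints: Attention works well. See Figure for details.
```

Removing a `\ref` leaves phrases such as "See Figure for details", which matches what the notebook does; replacing the command with a spoken placeholder would be a different design choice.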

In [None]:
#@title Setup - Click the play icon

# Needed for inflect
import locale
locale.getpreferredencoding = lambda: "UTF-8"

# Install packages
!pip install TTS inflect pydub

from TTS.api import TTS
from pydub import AudioSegment
from pydub.effects import speedup
import re
import textwrap
import inflect
import string
import random
from IPython.display import display, clear_output, HTML, Audio
from google.colab import files
from pathlib import Path

# Conversion of numbers to words
p = inflect.engine()

def convert_numbers(matchobj):
    """Spell out the number captured by a regex match, e.g. "12" -> "twelve"."""
    return p.number_to_words(matchobj.group(0))

clear_output(wait=True)

In [2]:
#@title Generate speech

text = "In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs." #@param {type:"string"}
# Clean up LaTeX
# Strip LaTeX citations and references
text = re.sub(r"\\cite\{[A-Za-z\d,\s\-_:]+\}", "", text)
text = re.sub(r"\\citep\{[A-Za-z\d,\s\-_:]+\}", "", text)
text = re.sub(r"\\ref\{[A-Za-z\d,\s\-_:]+\}", "", text)
# Convert numbers to words
text = re.sub(r"\d+(\.\d+)?", convert_numbers, text)
# Remove leftover LaTeX symbols
for s in ["$", "\\", "{", "}"]:
    text = text.replace(s, "")
# Replace underscores with hyphens
text = text.replace("_", "-")


# Load the pretrained model and synthesize the cleaned text
model_name = "tts_models/en/ljspeech/tacotron2-DDC"
tts = TTS(model_name, gpu=True, progress_bar=False)
wav = tts.tts(text)
clear_output(wait=True)
print(" \n".join(textwrap.wrap(text, width=70)))
print()
display(Audio(wav, rate=22050))

In this work we propose the Transformer, a model architecture 
eschewing recurrence and instead relying entirely on an attention 
mechanism to draw global dependencies between input and output. The 
Transformer allows for significantly more parallelization and can 
reach a new state of the art in translation quality after being 
trained for as little as twelve hours on eight Pone hundred GPUs.
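
The sample output above reads "P100" as "Pone hundred" because the number pattern also matches digits that sit directly after a letter. One possible workaround, sketched here under assumptions, is to insert a space between a letter and a following digit before spelling numbers out. The names `separate_letters_and_digits` and `spell_out_numbers` are hypothetical, and the stand-in converter exists only so the sketch runs without `inflect`; in the notebook, `p.number_to_words` could be passed instead. Whether the model pronounces the result better has not been checked here.

```python
import re


def separate_letters_and_digits(text: str) -> str:
    """Insert a space wherever a letter is immediately followed by a digit."""
    return re.sub(r"(?<=[A-Za-z])(?=\d)", " ", text)


def spell_out_numbers(text: str, number_to_words) -> str:
    """Spell out every number in `text` using the supplied converter.

    `number_to_words` is any callable mapping a numeric string to words.
    """
    text = separate_letters_and_digits(text)
    return re.sub(r"\d+(\.\d+)?", lambda m: number_to_words(m.group(0)), text)


# Stand-in converter for illustration only (assumption, not inflect).
demo_words = {"100": "one hundred"}
print(spell_out_numbers("eight P100 GPUs", lambda s: demo_words.get(s, s)))
# prints: eight P one hundred GPUs
```

The trade-off is that every letter-digit pair is split, so a variable name such as `x2` would be read as "x two".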

