# Controllable TalkNet
To run TalkNet, click on Runtime -> Run all. The interface will appear at the bottom of the page when it's ready.

## Instructions

*   Once the notebook is running, click on Files (the folder icon on the left edge). 
*   Upload audio clips of a singing or speaking voice by dragging and dropping them onto the sidebar.
*   Click on "Update file list" in the TalkNet interface. Select an audio file from the dropdown, and type what it says into the Transcript box.
*   Select a character, and press Generate. The first line will take a little longer to generate.

## Tips and tricks
*   If you want to use TalkNet as a regular text-to-speech system, without any reference audio, tick the "Disable reference audio" checkbox.
*   You can use [ARPABET](http://www.speech.cs.cmu.edu/cgi-bin/cmudict) to override the pronunciation of words, like this: *She made a little bow, then she picked up her {B OW}.*
*   If you run out of memory while generating lines, try working with shorter clips.
*   The singing models are trained on very little data, and can have a hard time pronouncing certain words. Try experimenting with ARPABET and punctuation.
*   If the voice is off-key, the problem is usually with the extracted pitch. Press "Debug pitch" to listen to it. Reference audio with lots of echo/reverb or background noise, or a singer with a very high vocal range, can cause issues.
*   If the singing voice sounds strained, try enabling "Change input pitch" and adjusting it up or down a few semitones. If you're remixing a song, remember to pitch-shift your background track as well.
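
Two of the tips above can be illustrated with a small sketch. The first function shows how a transcript containing curly-brace ARPABET overrides could be split into plain text and phoneme lists. The second converts a pitch shift in semitones into a frequency ratio, assuming standard 12-tone equal temperament. The names `split_transcript` and `semitones_to_ratio` are hypothetical and are not part of TalkNet; this is not how the notebook itself parses transcripts or shifts pitch.

```python
import re

# Matches an inline ARPABET override such as "{B OW}".
ARPABET_OVERRIDE = re.compile(r"\{([^{}]+)\}")


def split_transcript(text):
    """Split a transcript into ("text", str) and ("arpabet", [phonemes]) parts.

    Hypothetical helper for illustration only.
    """
    parts = []
    pos = 0
    for match in ARPABET_OVERRIDE.finditer(text):
        if match.start() > pos:
            parts.append(("text", text[pos:match.start()]))
        # Phonemes inside the braces are separated by whitespace.
        parts.append(("arpabet", match.group(1).split()))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts


def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    line = "She made a little bow, then she picked up her {B OW}."
    for kind, value in split_transcript(line):
        print(kind, value)
    # Shifting a background track up by 3 semitones scales every frequency by this ratio.
    print(semitones_to_ratio(3))
```

If the vocal is shifted by some number of semitones with "Change input pitch", the same number of semitones (and so the same ratio) applies to the background track.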

In [1]:
#@markdown **Step 1:** Check which GPU you've been allocated.

!nvidia-smi -L
!nvidia-smi

GPU 0: Tesla T4 (UUID: GPU-7e4e6f2b-cd49-00ad-fab7-058c9722d035)
Thu Apr 13 04:20:18 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   51C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+----------------------
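
The same check can be done from Python, which may be handy if a later cell should react to the allocated GPU. The sketch below parses lines in the `nvidia-smi -L` format shown above; `parse_gpu_list` and `list_gpus` are hypothetical helpers, not part of the notebook, and the regular expression assumes the `GPU <index>: <name> (UUID: <uuid>)` layout seen in this output.

```python
import re
import shutil
import subprocess

# Matches one line of `nvidia-smi -L` output, for example:
# "GPU 0: Tesla T4 (UUID: GPU-7e4e6f2b-cd49-00ad-fab7-058c9722d035)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+)\)$")


def parse_gpu_list(output):
    """Turn `nvidia-smi -L` text into a list of dicts (hypothetical helper)."""
    gpus = []
    for line in output.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append(
                {
                    "index": int(match.group(1)),
                    "name": match.group(2),
                    "uuid": match.group(3),
                }
            )
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No GPU detected; check the runtime type.")
    for gpu in gpus:
        print(f"GPU {gpu['index']}: {gpu['name']}")
```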

In [None]:
#@markdown **Step 2:** Download dependencies.
%tensorflow_version 2.x
import os

custom_lists = [
    #"https://gist.github.com/emmi-01/5c76aaa76d0e1cd5e09b7f9bb0333a79#file-example_models-json",
]

!apt-get install sox libsndfile1 ffmpeg
!pip install torch==1.8.1
!pip install tensorflow dash==1.21.0 dash-bootstrap-components==0.13.0 jupyter-dash==0.4.0 psola wget unidecode pysptk frozendict torchvision==0.9.1 torchaudio==0.8.1 torchtext torch_stft kaldiio pydub pyannote.audio g2p_en pesq pystoi crepe resampy ffmpeg-python torchcrepe einops taming-transformers-rom1504==0.0.6 tensorflow-hub
!pip install --upgrade --no-cache-dir gdown
!python -m pip install git+https://github.com/emmi-01/NeMo.git
if not os.path.exists("hifi-gan"):
    !git clone -q --recursive https://github.com/emmi-01/hifi-gan
!git clone -q https://github.com/emmi-01/ControllableTalkNet
os.chdir("/content/ControllableTalkNet")
!git archive --output=./files.tar --format=tar HEAD
os.chdir("/content")
!tar xf ControllableTalkNet/files.tar
!rm -rf ControllableTalkNet

# PESQ fix
!python -m pip uninstall -y pesq
!python -m pip uninstall -y numpy
!python -m pip install numpy==1.19.5
!python -m pip --no-cache-dir install --no-build-isolation --no-binary :all: pesq==0.0.2

# werkzeug fix
!python -m pip install werkzeug==2.0.0 flask==2.1.3

# 3.9 fix
!pip install torch==1.8.1 torchmetrics==0.6.0 pytorch-lightning==1.3.8
!pip uninstall numpy -y
!pip install -U numpy numba --no-cache-dir

os.chdir("/content/model_lists")
for c in custom_lists:
    !wget "{c}"
os.chdir("/content")


Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libsndfile1 is already the newest version (1.0.28-7ubuntu0.1).
ffmpeg is already the newest version (7:4.2.7-0ubuntu0.1).
The following additional packages will be installed:
  libopencore-amrnb0 libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base
  libsox3
Suggested packages:
  libsox-fmt-all
The following NEW packages will be installed:
  libopencore-amrnb0 libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base
  libsox3 sox
0 upgraded, 6 newly installed, 0 to remove and 24 not upgraded.
Need to get 513 kB of archives.
After this operation, 1,564 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal/universe amd64 libopencore-amrnb0 amd64 0.1.5-1 [94.8 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal/universe amd64 libopencore-amrwb0 amd64 0.1.5-1 [49.1 kB]
Get:3 http://archive.ubuntu.com/ubu
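
Because Step 2 installs, uninstalls, and reinstalls several packages, it can be useful to confirm which versions actually ended up installed before moving on. The following is a minimal sketch using the standard-library `importlib.metadata`; the pins in `EXPECTED` are copied from the install commands above, while `check_pins` is a hypothetical helper that is not part of the notebook.

```python
from importlib import metadata

# Version pins taken from the Step 2 install commands.
EXPECTED = {
    "torch": "1.8.1",
    "torchvision": "0.9.1",
    "torchaudio": "0.8.1",
    "dash": "1.21.0",
    "werkzeug": "2.0.0",
    "flask": "2.1.3",
}


def check_pins(expected):
    """Return {package: (expected, installed)} for every pin that does not match.

    `installed` is None when the package is missing. Hypothetical helper.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            # Drop any local version suffix such as "+cu102" before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(EXPECTED)
    for name, (wanted, installed) in problems.items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Run it after restarting the runtime, since packages imported before the restart may still be loaded in their old versions.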

In [None]:
# @markdown **Step 3:** **Restart the runtime before running this step!** Then run this cell.
using_inline = True
import pkg_resources
from pkg_resources import DistributionNotFound, VersionConflict
"""dependencies = [
"tensorflow==2.4.1", 
"dash", 
"jupyter-dash", 
"psola", 
"wget", 
"unidecode", 
"pysptk", 
"frozendict", 
"torchvision==0.9.1", 
"torchaudio==0.8.1", 
"torchtext==0.9.1", 
"torch_stft", 
"kaldiio", 
"pydub", 
"pyannote.audio", 
"g2p_en", 
"pesq", 
"pystoi", 
"crepe", 
"resampy", 
"ffmpeg-python",
"numpy",
"scipy",
"nemo_toolkit",
"tqdm",
"gdown",
]
pkg_resources.require(dependencies)"""

from controllable_talknet import *
app.run_server(
    mode="inline",
    #dev_tools_ui=True,
    #dev_tools_hot_reload=True,
    threaded=True,
)

In [None]:
# @markdown **Step 3B:** If the above fails with a 403 error, do the following:
# @markdown * Go to Runtime -> Restart runtime
# @markdown * Run this cell (click the play button)
# @markdown * Click on the googleusercontent.com link to use TalkNet in a separate tab
try:
    using_inline
except NameError:
    # Step 3 has not been run in this session, so use external mode.
    using_inline = False
if not using_inline:
    from controllable_talknet import *
    from google.colab.output import eval_js

    print(eval_js("google.colab.kernel.proxyPort(8050)"))
    app.run_server(
        mode="external",
        debug=False,
        #dev_tools_ui=True,
        #dev_tools_hot_reload=True,
        threaded=True,
    )

[NeMo W 2023-03-14 20:22:40 optimizers:47] Apex was not found. Using the lamb optimizer will error out.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to /root/nltk_data...
[nltk_data]   Unzipping corpora/cmudict.zip.
[NeMo W 2023-03-14 20:22:43 experimental:27] Module <class 'nemo.collections.asr.data.audio_to_text_dali.AudioToCharDALIDataset'> is experimental, not ready for production and is not fully supported. Use at your own risk.


[NeMo I 2023-03-14 20:22:43 cloud:66] Downloading from: https://api.ngc.nvidia.com/v2/models/nvidia/nemo/asr_talknet_aligner/versions/1.0.0rc1/files/qn5x5_libri_tts_phonemes.nemo to /root/.cache/torch/NeMo/NeMo_1.0.2/qn5x5_libri_tts_phonemes/656c7439dd3a0d614978529371be498b/qn5x5_libri_tts_phonemes.nemo
[NeMo I 2023-03-14 20:22:46 common:676] Instantiating model from pre-trained checkpoint


[NeMo W 2023-03-14 20:22:47 features:229] Using torch_stft is deprecated and will be removed in 1.1.0. Please set stft_conv and stft_exact_pad to False for FilterbankFeatures and AudioToMelSpectrogramPreprocessor. Please set exact_pad to True as needed.


[NeMo I 2023-03-14 20:22:47 features:252] PADDING: 1
[NeMo I 2023-03-14 20:22:47 features:262] STFT using conv


      fft_window = pad_center(fft_window, filter_length)
    
      librosa.filters.mel(sample_rate, self.n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq), dtype=torch.float
    


[NeMo I 2023-03-14 20:22:53 modelPT:439] Model EncDecCTCModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.0.2/qn5x5_libri_tts_phonemes/656c7439dd3a0d614978529371be498b/qn5x5_libri_tts_phonemes.nemo.


INFO:werkzeug: * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
INFO:werkzeug:127.0.0.1 - - [14/Mar/2023 20:22:54] "GET /_alive_364386a2-76e0-45d6-b780-7887cb53bb58 HTTP/1.1" 200 -


https://3xzy7jiqaef-496ff2e9c6d22116-8050-colab.googleusercontent.com/
Dash app running on:

