# SDialog dependencies

In [1]:
# Set up the environment depending on whether we are running in Google Colab or Jupyter Notebook
import os
from IPython import get_ipython

if "google.colab" in str(get_ipython()):
    print("Running on CoLab")

    # Installing Ollama (if you are not planning to use Ollama, you can just comment these lines to speed up the installation)
    !curl -fsSL https://ollama.com/install.sh | sh

    # Installing sdialog
    !git clone https://github.com/qanastek/sdialog.git
    %cd sdialog
    %pip install -e .
    %cd ..
else:
    print("Running in Jupyter Notebook")
    # Little hack to avoid the "OSError: Background processes not supported." error in Jupyter notebooks
    get_ipython().system = os.system

Running in Jupyter Notebook


## Locally

Create a `.venv` virtual environment with Python `3.11.14` and install the dependencies from the root `requirement.txt` file.

# Tutorial 13: Voices database

## Instantiate the voices database from the Hugging Face Hub

In [2]:
from sdialog.audio.voice_database import HuggingfaceVoiceDatabase
voices_libritts = HuggingfaceVoiceDatabase("sdialog/voices-libritts")

Downloading data: 100%|██████████| 2456/2456 [00:00<00:00, 5483.18files/s]
Generating train split: 100%|██████████| 2455/2455 [00:00<00:00, 28860.89 examples/s]
[2025-10-14 02:39:21] INFO:root:Voice database populated with 2455 voices


If the download fails because of a timeout, you can fetch the dataset manually with the Hugging Face CLI instead:

In [3]:
%%script false --no-raise-error
!hf download sdialog/voices-libritts --repo-type dataset

If you encounter the error `We had to rate limit your IP (<your IP address>). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API.`, log in with your Hugging Face account through the CLI (`hf auth login`) by following the [Hugging Face CLI documentation](https://huggingface.co/docs/huggingface_hub/guides/cli#hf-auth-login).
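Since the rate-limit message mentions `HF_TOKEN`, a quick check of the environment can tell you whether a token is exported before you retry. The helper below is a minimal sketch: the function name `hf_token_configured` is hypothetical, and it only inspects the environment variable, so a token stored by `hf auth login` would not be detected by it.

```python
import os


def hf_token_configured(env=None):
    """Return True if a non-empty HF_TOKEN is present in the given mapping.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if not hf_token_configured():
    # Only the environment variable is checked here (assumption of this sketch).
    print("HF_TOKEN is not set in the environment; "
          "log in with `hf auth login` or export a token before retrying.")
```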

Once the voice database has been downloaded and created in the local cache, we can select a voice for a `20`-year-old `female` speaker.

In [4]:
voices_libritts.get_voice(gender="female", age=20)

{'identifier': 3926,
 'voice': '/Users/yanislabrak/.cache/huggingface/hub/datasets--sdialog--voices-libritts/snapshots/86e8c47af749a36c479e6a24c264c5c8beb3563c/audio/3926_Denise_Lacey.wav'}
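The returned dictionary holds the speaker `identifier` and the path to the `voice` file. To sanity-check the selected file, the sketch below reads its sample rate, channel count, and duration with Python's standard `wave` module. The helper name `describe_wav` is hypothetical, and the sketch assumes the file is plain PCM WAV, which is the only encoding `wave` can read.

```python
import wave


def describe_wav(path):
    """Return sample rate (Hz), channel count and duration (s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


# Usage with the dictionary returned by get_voice (not executed here):
# voice = voices_libritts.get_voice(gender="female", age=20)
# print(describe_wav(voice["voice"]))
```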

You can also prevent the same voice from being selected twice by explicitly setting the `keep_duplicate` parameter to `False`:

In [5]:
voices_libritts.get_voice(gender="female", age=20, keep_duplicate=False)

{'identifier': 1731,
 'voice': '/Users/yanislabrak/.cache/huggingface/hub/datasets--sdialog--voices-libritts/snapshots/86e8c47af749a36c479e6a24c264c5c8beb3563c/audio/1731_Dani.wav'}

When you want to reset the list of used voices, call:

In [6]:
voices_libritts.reset_used_voices()
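
To make the interplay between `keep_duplicate` and `reset_used_voices()` concrete, here is a toy model of the bookkeeping. It is not sdialog's implementation: the class `ToyVoicePool`, its constructor, and the choice to record every pick as "used" are assumptions made purely for illustration.

```python
import random


class ToyVoicePool:
    """Hypothetical stand-in that mimics duplicate tracking, not sdialog's code."""

    def __init__(self, identifiers, seed=0):
        self._identifiers = list(identifiers)
        self._used = set()
        self._rng = random.Random(seed)

    def get_voice(self, keep_duplicate=True):
        # With keep_duplicate=False, only voices not handed out before are eligible.
        candidates = [
            identifier for identifier in self._identifiers
            if keep_duplicate or identifier not in self._used
        ]
        if not candidates:
            raise ValueError("No unused voice left; call reset_used_voices()")
        choice = self._rng.choice(candidates)
        self._used.add(choice)  # assumption: every pick is recorded as used
        return choice

    def reset_used_voices(self):
        # Forget all previous picks so every voice becomes eligible again.
        self._used.clear()


pool = ToyVoicePool(["voice_a", "voice_b"])
first = pool.get_voice(keep_duplicate=False)
second = pool.get_voice(keep_duplicate=False)
print(first != second)
```

With only two voices in the pool, a third call with `keep_duplicate=False` raises `ValueError` in this toy model until `reset_used_voices()` is called.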

## Custom local voice database

Download voices from our `demo` repository.

In [7]:
import os

# If directory my_custom_voices is not present, download it
if os.path.exists("my_custom_voices"):
    print("my_custom_voices already exists")
else:
    !wget https://raw.githubusercontent.com/qanastek/sdialog/refs/heads/main/tests/data/my_custom_voices.zip
    !unzip my_custom_voices.zip
    !rm my_custom_voices.zip

--2025-10-14 02:39:22--  https://raw.githubusercontent.com/qanastek/sdialog/refs/heads/main/tests/data/my_custom_voices.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8003::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6636196 (6.3M) [application/zip]
Saving to: ‘my_custom_voices.zip’


Archive:  my_custom_voices.zip
   creating: my_custom_voices
  inflating: my_custom_voices/yanis.wav  
  inflating: my_custom_voices/metadata.tsv  
  inflating: my_custom_voices/metadata.json  
  inflating: my_custom_voices/thomas.wav  
  inflating: my_custom_voices/metadata.csv  


Once the voices are downloaded to the `./my_custom_voices/` directory, we build the database from a metadata file that lists the age, gender, and corresponding voice file of each speaker. The archive ships this metadata in three equivalent formats: CSV, TSV, and JSON.
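
Before loading a metadata file, it can help to check which columns it carries. The sketch below reads the first row of a CSV or TSV file with the standard `csv` module. The helper name `metadata_columns` is hypothetical, and the sketch assumes the first row is a header and that the delimiter follows the file extension.

```python
import csv
from pathlib import Path


def metadata_columns(path):
    """Return the first row of a CSV or TSV metadata file (assumed to be a header)."""
    path = Path(path)
    # Assumption: .tsv files are tab-separated, everything else comma-separated.
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter))


# Usage (not executed here):
# print(metadata_columns("./my_custom_voices/metadata.csv"))
# print(metadata_columns("./my_custom_voices/metadata.tsv"))
```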

In [8]:
from sdialog.audio.voice_database import LocalVoiceDatabase

With CSV metadata file:

In [9]:
voice_database = LocalVoiceDatabase(
    directory_audios="./my_custom_voices/",
    metadata_file="./my_custom_voices/metadata.csv"
)
voice_database.get_voice(gender="female", age=20)

[2025-10-14 02:39:22] INFO:root:Voice database populated with 4 voices


{'identifier': 4,
 'voice': '/Users/yanislabrak/Desktop/HUB/PostJSALT/sdialog/tutorials/my_custom_voices/yanis.wav'}

With TSV metadata file:

In [10]:
voice_database = LocalVoiceDatabase(
    directory_audios="./my_custom_voices/",
    metadata_file="./my_custom_voices/metadata.tsv"
)
voice_database.get_voice(gender="female", age=21)

[2025-10-14 02:39:22] INFO:root:Voice database populated with 4 voices


{'identifier': 3,
 'voice': '/Users/yanislabrak/Desktop/HUB/PostJSALT/sdialog/tutorials/my_custom_voices/thomas.wav'}

With JSON metadata file:

In [11]:
voice_database = LocalVoiceDatabase(
    directory_audios="./my_custom_voices/",
    metadata_file="./my_custom_voices/metadata.json"
)

[2025-10-14 02:39:22] INFO:root:Voice database populated with 4 voices


In [12]:
voice_database.get_voice(gender="female", age=20)

{'identifier': 4,
 'voice': '/Users/yanislabrak/Desktop/HUB/PostJSALT/sdialog/tutorials/my_custom_voices/yanis.wav'}

# Language-specific voices

By default, voices are added to and fetched from the database as `english` when no language is specified.

Otherwise, you can specify the language you want to work with when you add or get a voice, as shown in the following code snippet:

In [13]:
voice_database = LocalVoiceDatabase(
    directory_audios="./my_custom_voices/",
    metadata_file="./my_custom_voices/metadata.json"
)

[2025-10-14 02:39:22] INFO:root:Voice database populated with 4 voices


In [14]:
voice_database.add_voice(
    gender="female",
    age=42,
    identifier="french_female_42",
    path="./my_custom_voices/french_female_42.wav",
    lang="french"
)

Now that a French voice is available in the database, we can retrieve it.

In [15]:
voice_database.get_voice(gender="female", age=20, lang="french")

{'identifier': 'french_female_42',
 'voice': './my_custom_voices/french_female_42.wav'}

But if no voice is available in the targeted language, a `ValueError` is raised:

In [16]:
try:
    voice_database.get_voice(gender="female", age=20, lang="hindi")
except ValueError as e:
    print("Normal error in this case:", e)

Normal error in this case: Language hindi not found in the database
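
If you would rather fall back to another language than stop on this error, you can wrap the lookup. The sketch below is one possible approach, not part of sdialog: the helper `get_voice_with_fallback` is hypothetical and relies only on the `get_voice(gender=..., age=..., lang=...)` call and the `ValueError` shown above. Note that it catches any `ValueError`, so it cannot tell a missing language apart from another lookup failure.

```python
def get_voice_with_fallback(database, gender, age, lang, fallback_lang="english"):
    """Try the requested language first, then retry with `fallback_lang`.

    `database` is any object exposing get_voice(gender=..., age=..., lang=...).
    """
    try:
        return database.get_voice(gender=gender, age=age, lang=lang)
    except ValueError:
        # Assumption: the ValueError means the language is missing from the database.
        return database.get_voice(gender=gender, age=age, lang=fallback_lang)


# Usage (not executed here):
# get_voice_with_fallback(voice_database, gender="female", age=20, lang="hindi")
```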
