# SDialog dependencies

In [1]:
# Set up the environment depending on whether we are running in Google Colab or Jupyter Notebook
import os
from IPython import get_ipython

if "google.colab" in str(get_ipython()):
    print("Running on CoLab")

    # Installing Ollama (if you are not planning to use Ollama, you can just comment these lines to speed up the installation)
    !curl -fsSL https://ollama.com/install.sh | sh

    # Installing sdialog
    !git clone https://github.com/qanastek/sdialog.git
    %cd sdialog
    %pip install -e .
    %cd ..
else:
    print("Running in Jupyter Notebook")
    # Little hack to avoid the "OSError: Background processes not supported." error in Jupyter notebooks
    get_ipython().system = os.system

Running in Jupyter Notebook


## Local installation

Create a virtual environment (`.venv`) with Python `3.11.14` and install the dependencies from the root `requirement.txt` file.

# Tutorial 13: Voices database

## Instantiate a voice database from the Hugging Face Hub

In [2]:
from sdialog.audio.voice_database import HuggingfaceVoiceDatabase
voices_libritts = HuggingfaceVoiceDatabase("sdialog/voices-libritts")

[2025-10-17 23:16:42] INFO:root:Voice database populated with 2455 voices


If the download fails because of a timeout, you can fetch the dataset manually with the Hugging Face CLI:

In [3]:
%%script false --no-raise-error
!hf download sdialog/voices-libritts --repo-type dataset

If you encounter the error `We had to rate limit your IP (<your IP>). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API.`, log in to your Hugging Face account from the command line with `hf auth login`, as described in the [Hugging Face CLI documentation](https://huggingface.co/docs/huggingface_hub/guides/cli#hf-auth-login).
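Before retrying the download, it can help to check whether a token is visible to the current process. The following is a minimal sketch, assuming the token is supplied through the `HF_TOKEN` environment variable named in the error message. The helper `hf_token_is_set` is hypothetical and is not part of `sdialog` or `huggingface_hub`.

```python
import os


def hf_token_is_set(environ=None):
    """Return True if a non-empty HF_TOKEN exists in the given environment mapping."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if environ is None else environ
    return bool(env.get("HF_TOKEN", "").strip())


if hf_token_is_set():
    print("HF_TOKEN found in the environment.")
else:
    print("HF_TOKEN not set: run `hf auth login` or export HF_TOKEN before retrying.")
```

This check only covers the environment-variable route. A token saved by `hf auth login` is typically stored on disk, so it would not be detected here.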

Once the voice database is downloaded and cached locally, we can select a voice for a `20`-year-old `female` speaker.

In [4]:
voices_libritts.get_voice(gender="female", age=20)

Voice(gender='female', age=-1, identifier='8197', voice='/Users/yanislabrak/.cache/huggingface/hub/datasets--sdialog--voices-libritts/snapshots/9bc6e21fcd960dc8e0843fd840cd7ced0ed5feb5/audio/8197_Victoria_P.wav', language='english', language_code='e')

You can also prevent a voice from being selected twice by setting the `keep_duplicate` parameter to `False`:

In [5]:
voices_libritts.get_voice(gender="female", age=20, keep_duplicate=False)

Voice(gender='female', age=-1, identifier='4116', voice='/Users/yanislabrak/.cache/huggingface/hub/datasets--sdialog--voices-libritts/snapshots/9bc6e21fcd960dc8e0843fd840cd7ced0ed5feb5/audio/4116_Amy_Benton.wav', language='english', language_code='e')

To reset the list of used voices, call:

In [6]:
voices_libritts.reset_used_voices()
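To make the idea behind `keep_duplicate` and `reset_used_voices()` concrete, here is a toy sketch of selection with a "used" set. The class `ToyVoicePool` is hypothetical and only illustrates the concept. It is not how `sdialog` implements its voice databases, and details such as when a voice is marked as used are assumptions.

```python
import random


class ToyVoicePool:
    """Hypothetical stand-in for a voice database; NOT sdialog's implementation."""

    def __init__(self, identifiers, seed=None):
        self._identifiers = list(identifiers)
        self._used = set()
        self._rng = random.Random(seed)

    def get_voice(self, keep_duplicate=True):
        # With keep_duplicate=True, any voice may be returned again.
        candidates = self._identifiers
        if not keep_duplicate:
            # Otherwise, exclude voices that were already handed out.
            candidates = [v for v in self._identifiers if v not in self._used]
            if not candidates:
                raise ValueError("All voices have been used; call reset_used_voices().")
        choice = self._rng.choice(candidates)
        if not keep_duplicate:
            self._used.add(choice)
        return choice

    def reset_used_voices(self):
        # Forget which voices were handed out so they become selectable again.
        self._used.clear()


pool = ToyVoicePool(["8197", "4116"], seed=0)
first = pool.get_voice(keep_duplicate=False)
second = pool.get_voice(keep_duplicate=False)
print(first != second)  # the two draws are guaranteed to be distinct
pool.reset_used_voices()
```

In this sketch the pool raises an error once every voice has been used; whether the real database does the same is not covered by this tutorial.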

### You can also use Hugging Face datasets that store the names of predefined voices, such as those for Kokoro

In [7]:
from sdialog.audio.voice_database import HuggingfaceVoiceDatabase
voices_kokoro = HuggingfaceVoiceDatabase("sdialog/voices-kokoro")

[2025-10-17 23:16:44] INFO:root:Voice database populated with 54 voices


In [8]:
print(voices_kokoro.get_statistics(pretty=True))

### Voice Database Statistics



#### Overall

Number of languages: 8
Total voices: 54
By gender: male: 25, female: 29
Ages (first 10 bins sorted): 0:54



#### By Language (summary)

| dataset    |   total |   male |   female |   unique_speakers |   age_min |   age_mean |   age_max | codes   |
|:-----------|--------:|-------:|---------:|------------------:|----------:|-----------:|----------:|:--------|
| english    |      28 |     13 |       15 |                28 |         0 |       0.00 |         0 | a,b     |
| chinese    |       8 |      4 |        4 |                 8 |         0 |       0.00 |         0 | z       |
| japanese   |       5 |      1 |        4 |                 5 |         0 |       0.00 |         0 | j       |
| hindi      |       4 |      2 |        2 |                 4 |         0 |       0.00 |         0 | h       |
| spanish    |       3 |      2 |        1 |                 3 |         0 |       0.00 |         0 | e       |
| portuguese |       3 |      2 

## Custom local voice database

Download voices from our `demo` repository.

In [9]:
import os

# If directory my_custom_voices is not present, download it
if os.path.exists("my_custom_voices"):
    print("my_custom_voices already exists")
else:
    !wget https://raw.githubusercontent.com/qanastek/sdialog/refs/heads/main/tests/data/my_custom_voices.zip
    !unzip my_custom_voices.zip
    !rm my_custom_voices.zip

my_custom_voices already exists


Once the voices are downloaded to the `./my_custom_voices/` directory, the database needs a metadata file that lists the age, gender, and corresponding voice file for each speaker.
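As an illustration of what such a metadata file might contain, the sketch below builds CSV text with Python's standard `csv` module. The column names are an assumption inferred from the fields of the `Voice` objects printed in this tutorial, and the speakers and file names are made up. Check the `sdialog` documentation for the exact column names that `LocalVoiceDatabase` expects.

```python
import csv
import io

# Column names are ASSUMED from the fields of the Voice objects shown in this
# tutorial; they may differ from what LocalVoiceDatabase actually requires.
FIELDS = ["identifier", "gender", "age", "voice", "language", "language_code"]

# Hypothetical speakers and file names, for illustration only.
rows = [
    {"identifier": "1", "gender": "male", "age": 20, "voice": "speaker_1.wav",
     "language": "english", "language_code": "e"},
    {"identifier": "2", "gender": "female", "age": 21, "voice": "speaker_2.wav",
     "language": "english", "language_code": "e"},
]


def to_csv_text(records, fieldnames=FIELDS):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv_text(rows))
```

Passing `delimiter="\t"` to `csv.DictWriter` would produce the tab-separated variant of the same table.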

In [10]:
from sdialog.audio.voice_database import LocalVoiceDatabase

With a CSV metadata file:

In [11]:
voice_database = LocalVoiceDatabase(
    directory_audios="./my_custom_voices/",
    metadata_file="./my_custom_voices/metadata.csv"
)
voice_database.get_voice(gender="female", age=20)

[2025-10-17 23:16:44] INFO:root:Voice database populated with 4 voices


Voice(gender='female', age=20, identifier='4', voice='/Users/yanislabrak/Desktop/HUB/PostJSALT/sdialog/tutorials/my_custom_voices/yanis.wav', language='english', language_code='e')

With a TSV metadata file:

In [12]:
voice_database = LocalVoiceDatabase(
    directory_audios="./my_custom_voices/",
    metadata_file="./my_custom_voices/metadata.tsv"
)
voice_database.get_voice(gender="female", age=21)

[2025-10-17 23:16:45] INFO:root:Voice database populated with 4 voices


Voice(gender='female', age=21, identifier='3', voice='/Users/yanislabrak/Desktop/HUB/PostJSALT/sdialog/tutorials/my_custom_voices/thomas.wav', language='english', language_code='e')

With a JSON metadata file:

In [13]:
voice_database = LocalVoiceDatabase(
    directory_audios="./my_custom_voices/",
    metadata_file="./my_custom_voices/metadata.json"
)

[2025-10-17 23:16:45] INFO:root:Voice database populated with 4 voices


In [14]:
voice_database.get_voice(gender="female", age=20)

Voice(gender='female', age=20, identifier='4', voice='/Users/yanislabrak/Desktop/HUB/PostJSALT/sdialog/tutorials/my_custom_voices/yanis.wav', language='english', language_code='e')

# Language-specific voices

By default, voices are added to and fetched from the database as `english` when no language is specified.

Otherwise, you can specify the language when you add or get a voice, as shown in the following code snippets:

In [15]:
voice_database = LocalVoiceDatabase(
    directory_audios="./my_custom_voices/",
    metadata_file="./my_custom_voices/metadata.json"
)

[2025-10-17 23:16:45] INFO:root:Voice database populated with 4 voices


In [16]:
print(voice_database.get_statistics(pretty=True))

### Voice Database Statistics



#### Overall

Number of languages: 1
Total voices: 4
By gender: male: 2, female: 2
Ages (first 10 bins sorted): 20:2, 21:2



#### By Language (summary)

| dataset   |   total |   male |   female |   unique_speakers |   age_min |   age_mean |   age_max | codes   |
|:----------|--------:|-------:|---------:|------------------:|----------:|-----------:|----------:|:--------|
| english   |       4 |      2 |        2 |                 4 |        20 |      20.50 |        21 | e       |



#### english — gender/age distribution

|   dataset |   female |   male |
|----------:|---------:|-------:|
|        20 |     1.00 |   1.00 |
|        21 |     1.00 |   1.00 |


In [17]:
voice_database.add_voice(
    gender="female",
    age=42,
    identifier="french_female_42",
    voice="./my_custom_voices/french_female_42.wav",
    lang="french",
    language_code="f"
)

Now that a French voice is available in the database, we can retrieve it.

In [18]:
voice_database.get_voice(gender="female", age=20, lang="french")

Voice(gender='female', age=42, identifier='french_female_42', voice='./my_custom_voices/french_female_42.wav', language='french', language_code='f')

But if no voice is available in the target language, a `ValueError` is raised:

In [19]:
try:
    voice_database.get_voice(gender="female", age=20, lang="hindi")
except ValueError as e:
    print("Normal error in this case:", e)

Normal error in this case: Language hindi not found in the database
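The behaviour shown above (a default language, per-language lookup, and an error for unknown languages) can be pictured with a small toy index. This is a sketch under stated assumptions: `ToyLanguageIndex` is hypothetical, it ignores gender and age entirely, and it is not `sdialog`'s implementation. Only the error message is copied from the output above.

```python
class ToyLanguageIndex:
    """Hypothetical per-language voice index; NOT sdialog's implementation."""

    def __init__(self, default_lang="english"):
        self._default_lang = default_lang
        self._by_lang = {}

    def add_voice(self, identifier, lang=None):
        # Fall back to the default language when none is given.
        key = lang or self._default_lang
        self._by_lang.setdefault(key, []).append(identifier)

    def get_voice(self, lang=None):
        key = lang or self._default_lang
        if key not in self._by_lang:
            raise ValueError(f"Language {key} not found in the database")
        # Return the first registered voice; a real database would also
        # filter on attributes such as gender and age.
        return self._by_lang[key][0]


index = ToyLanguageIndex()
index.add_voice("4")  # stored under the default language
index.add_voice("french_female_42", lang="french")
print(index.get_voice(lang="french"))

try:
    index.get_voice(lang="hindi")
except ValueError as error:
    print("Expected error:", error)
```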


In [20]:
print(voice_database.get_statistics(pretty=True))

### Voice Database Statistics



#### Overall

Number of languages: 2
Total voices: 5
By gender: male: 2, female: 3
Ages (first 10 bins sorted): 20:2, 21:2, 42:1



#### By Language (summary)

| dataset   |   total |   male |   female |   unique_speakers |   age_min |   age_mean |   age_max | codes   |
|:----------|--------:|-------:|---------:|------------------:|----------:|-----------:|----------:|:--------|
| english   |       4 |      2 |        2 |                 4 |        20 |      20.50 |        21 | e       |
| french    |       1 |      0 |        1 |                 1 |        42 |      42.00 |        42 | f       |



#### english — gender/age distribution

|   dataset |   female |   male |
|----------:|---------:|-------:|
|        20 |     1.00 |   1.00 |
|        21 |     1.00 |   1.00 |



#### french — gender/age distribution

|   dataset |   female |
|----------:|---------:|
|        42 |     1.00 |
