# Voice cloning
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sensein/senselab/blob/main/tutorials/audiovoice_cloning.ipynb)

This tutorial shows how to use the `clone_voices` function from the `senselab` library to convert one speaker's speech into another speaker's voice. `senselab` currently includes the `KNNVC` model for voice cloning, which is the model used throughout this tutorial.

## Importing necessary classes and methods
First, we need to import the necessary modules and classes from the `senselab` package.

In [1]:
from senselab.audio.data_structures import Audio
from senselab.audio.tasks.plotting.plotting import play_audio
from senselab.audio.tasks.preprocessing import extract_segments, resample_audios
from senselab.audio.tasks.voice_cloning import clone_voices
from senselab.utils.data_structures import DeviceType, TorchModel

## Initializations

In [2]:
# Specify the device type for model inference
device = DeviceType.CPU

# Specify the model
model = TorchModel(path_or_uri="bshall/knn-vc", revision="master")

## Loading and preparing the source and target audio clips
We will load an audio file and resample it to 16 kHz, which ensures compatibility with the voice cloning model.
We will then extract two segments from the audio: one to serve as the source voice and one as the target voice.

In [3]:
audio = Audio.from_filepath("../../src/tests/data_for_testing/audio_48khz_mono_16bits.wav")

# Resample the audio to 16 kHz
audio = resample_audios([audio], 16000)[0]

# Extract segments from the audio (example segments: 0.0-1.0s and 3.2-4.9s)
chunks = extract_segments([(audio, [(0.0, 1.0), (3.2, 4.9)])])[0]
audio1 = chunks[0]
audio2 = chunks[1]

# Play the extracted audio segments
play_audio(audio1)
play_audio(audio2)
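
To make the segment boundaries above concrete, the following sketch converts the two time ranges into sample indices at 16 kHz. It is plain arithmetic for illustration only. `segment_to_sample_range` is a hypothetical helper, not part of `senselab`, and `extract_segments` may round or validate boundaries differently.

```python
from typing import Tuple


def segment_to_sample_range(start_s: float, end_s: float, sampling_rate: int) -> Tuple[int, int]:
    """Convert a (start, end) time range in seconds to (start, end) sample indices."""
    if start_s < 0 or end_s <= start_s:
        raise ValueError("expected 0 <= start_s < end_s")
    # Rounding to the nearest sample is an assumption made for this sketch
    return round(start_s * sampling_rate), round(end_s * sampling_rate)


SAMPLING_RATE = 16_000  # target rate used in this tutorial

for start_s, end_s in [(0.0, 1.0), (3.2, 4.9)]:
    first, last = segment_to_sample_range(start_s, end_s, SAMPLING_RATE)
    print(f"{start_s}-{end_s}s -> samples [{first}, {last}), {last - first} samples")
```

Under these assumptions the source clip is 1.0 s long (16,000 samples) and the target clip is 1.7 s long (27,200 samples).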


## Cloning the voices
Now we perform the voice cloning by passing the source and target audio clips to `clone_voices`.

In [4]:
source_audios = [audio1]
target_audios = [audio2]

cloned_output = clone_voices(
    source_audios=source_audios,
    target_audios=target_audios,
    model=model,
    device=device,
    topk=5,
    prematched_vocoder=False,
)

# Play the cloned output
play_audio(cloned_output[0])

DeviceType.CPU <enum 'DeviceType'>


Using cache found in /Users/fabiocat/.cache/torch/hub/bshall_knn-vc_master


Removing weight norm...
[HiFiGAN] Generator loaded with 16,523,393 parameters.
WavLM-Large loaded with 315,453,120 parameters.
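
The log above shows the two components kNN-VC relies on: WavLM-Large to extract frame-level features and a HiFi-GAN vocoder to turn features back into audio. Between those two steps, each source frame is replaced by the average of its nearest target frames. The toy sketch below illustrates that matching step in pure Python. It is not the `senselab` or kNN-VC implementation, the 2-dimensional "features" are made up, and it assumes that `topk` is the number of neighbors averaged per frame and that similarity is measured with cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_convert(source_frames, target_frames, topk):
    """Replace each source frame with the mean of its topk most similar target frames."""
    converted = []
    for frame in source_frames:
        # Rank target frames from most to least similar to this source frame
        ranked = sorted(target_frames, key=lambda t: cosine_similarity(frame, t), reverse=True)
        neighbors = ranked[:topk]
        # Average the selected neighbors dimension by dimension
        converted.append([sum(dim) / len(neighbors) for dim in zip(*neighbors)])
    return converted


# Made-up 2-dimensional features; real WavLM features are high-dimensional
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

print(knn_convert(source, target, topk=2))
```

In this sketch a larger `topk` averages over more target frames, which smooths the converted features, while a smaller value stays closer to individual target frames.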


## Objective evaluation
To assess the quality and effectiveness of the voice cloning, we can perform several evaluations:
- Speaker Verification: Use an automatic speaker verification tool to determine if the original speaker, the target speaker, and the cloned speaker can be distinguished from each other.
- Speech Intelligibility: Use an automatic speech recognition system to verify that the content remains unchanged and intelligible.
- Emotion Preservation: Assess if the emotion in the original speech is preserved in the cloned voice.
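
As an illustration of the speaker verification check, the sketch below compares speaker embeddings with cosine similarity and applies a decision threshold. The embeddings and the threshold are hypothetical values chosen for the example. In practice the embeddings would come from a speaker verification model and the threshold would be calibrated for that model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(embedding_a, embedding_b, threshold):
    """Decide whether two embeddings belong to the same speaker."""
    return cosine_similarity(embedding_a, embedding_b) >= threshold


# Hypothetical 3-dimensional speaker embeddings (real ones are much larger)
source_emb = [1.0, 0.1, 0.0]
target_emb = [0.0, 1.0, 0.2]
cloned_emb = [0.1, 0.9, 0.3]
THRESHOLD = 0.7  # hypothetical decision threshold

print("cloned vs target:", same_speaker(cloned_emb, target_emb, THRESHOLD))
print("cloned vs source:", same_speaker(cloned_emb, source_emb, THRESHOLD))
```

A successful conversion would be expected to score as the same speaker against the target and as a different speaker against the source.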

To run all these analyses, you can use `senselab`.
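
For the speech intelligibility check, a common metric is the word error rate (WER) between the transcript of the source audio and the transcript of the cloned audio. The sketch below is a minimal, dependency-free WER computation based on word-level edit distance. The transcripts are invented for the example, and `word_error_rate` is a hypothetical helper rather than a `senselab` function.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Invented transcripts standing in for ASR output on the source and cloned audio
source_transcript = "the cat sat on the mat"
cloned_transcript = "the cat sat on a mat"

print(f"WER: {word_error_rate(source_transcript, cloned_transcript):.3f}")
```

A WER close to 0 suggests that the linguistic content survived the conversion, although the score also reflects any errors made by the speech recognition system itself.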