Voice Cloning #40

Merged
@howardbaik merged 10 commits into dev
Feb 15, 2024

@howardbaik (Contributor) commented Jan 3, 2024

Purpose/implementation Section

What changes are being implemented in this Pull Request?

  • tts_coqui_vc(), a function that takes the text to convert to speech, a WAV audio file of the speaker whose voice is to be cloned, the speaker's language, whether to use the GPU, the version of Python to use, etc.
  • system_open(), a utility function that uses the system command to open audio files in a private folder. Useful for quickly opening the output from tts().

What was your approach?

tts_coqui_vc() is very similar to tts_coqui(), except that it requires you to specify the version of Python to be used by reticulate, provide a WAV audio file of the speaker, and decide whether to use the GPU. It calls the Python API of Coqui TTS through reticulate, following the sample Python code in #39.

Tell potential reviewers what kind of feedback you are soliciting.

I realize that tts(), which is a wrapper around tts_coqui_vc(), and tts_auth(), a function that checks whether the Python API of TTS is properly installed, are not complete. I will loop back to this once I integrate tts_coqui_vc() into loqui-vc.

@howardbaik linked an issue Jan 3, 2024 that may be closed by this pull request
@howardbaik changed the base branch from main to dev January 3, 2024 19:33
@howardbaik
Contributor Author

For this PR, I changed the base branch from main to dev after remembering these notes: jhudsl/ari#54

Contributor

@cansavvy left a comment

Thanks for working on this @howardbaik! It seems good but I don't necessarily know much about what's happening. My comments are mainly asking for some clarity. Thanks!


#' @export
#' @rdname tts
tts_coqui_vc <- function(
Contributor

@howardbaik Can you add some more comments here so I can understand what is going on? Thanks!!

@@ -82,22 +103,17 @@ tts = function(
bind_audio = bind_audio,
...)
}
if (service == "microsoft") {
res = tts_microsoft(
if (service == "google") {
Contributor

What's this change about?

Contributor Author

@howardbaik Jan 10, 2024

This change reorders the TTS services so that the two free ones, Coqui and Coqui Voice Cloning, come first and are highlighted. I also expect Coqui to be used much more than the paid services, so I moved both Coqui services to the front.
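Service order can also matter beyond readability. In R, when a function declares its choices as a character vector and resolves them with match.arg(), the first element is the default. The sketch below illustrates that behavior only; pick_service() is a hypothetical helper, the service name "coqui" is an assumption, and the diff does not show whether tts() relies on this mechanism.

```r
# Hypothetical helper (not part of the package): resolve a service name.
# match.arg() returns the first choice when the caller supplies nothing,
# so listing the free Coqui services first would make "coqui" the default.
pick_service <- function(service = c("coqui", "coqui-vc", "google", "microsoft")) {
  match.arg(service)
}

pick_service()            # falls back to the first listed choice
pick_service("coqui-vc")  # an explicit choice is returned unchanged
```

An unknown name such as pick_service("unknown") raises an error instead of silently falling through a chain of if blocks.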

}
if (service == "coqui-vc") {
cli::cli_alert_info("This service does not support MP3 format; will produce a WAV audio output.")
# TODO: Specify Python version, just as we specify path to coqui above
Contributor

May want to make an issue for this as well.

Contributor Author

Good idea! Issue is created: #41

@howardbaik
Contributor Author

howardbaik commented Jan 10, 2024

> Thanks for working on this @howardbaik! It seems good but I don't necessarily know much about what's happening. My comments are mainly asking for some clarity. Thanks!

Sorry if this was confusing and overwhelming! I have left detailed comments for you, but let me know if you have further questions!

Also, tagging @seankross to keep him in the loop.

@@ -167,3 +167,10 @@ coqui_path_missing <- paste(
"If you've already downloaded the software, use function",
"'set_coqui_path(path = \"path/to/coqui/tts\")' to point R to your local coqui tts Executable File"
)

# Open private audio files
system_open <- function(path) {
Contributor Author

This helper function takes a file path as input and invokes the open system command to pull up the audio file (or video file) at that path.
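The body of system_open() is not shown in this diff. A minimal sketch of such a helper, assuming it shells out through system2(), might look like the following. The names opener_command() and system_open_sketch() are hypothetical, and the Linux branch (xdg-open) is an assumption; the comment above only mentions the open command.

```r
# Hypothetical helper: choose the opener command for a platform.
# "open" is the command described above (macOS); "xdg-open" on Linux
# is an assumption added for illustration.
opener_command <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Darwin = "open",
    Linux = "xdg-open",
    stop("No opener command known for platform: ", sysname)
  )
}

# Sketch of a system_open()-style helper: hand the file at `path`
# to the platform's default application.
system_open_sketch <- function(path) {
  stopifnot(is.character(path), length(path) == 1L, file.exists(path))
  # shQuote() protects paths that contain spaces or shell metacharacters
  system2(opener_command(), shQuote(path))
}
```

Keeping the command lookup separate from the system2() call makes the platform logic easy to check without actually launching an audio player.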

save_local_dest = NULL,
...) {
# Specify version of Python to be used by reticulate
reticulate::use_python(python_version)
Contributor Author

Select the version of Python to be used by reticulate.

# Specify version of Python to be used by reticulate
reticulate::use_python(python_version)
# Import TTS
TTS_api <- reticulate::import("TTS.api")
Contributor Author

@howardbaik Jan 10, 2024

Imports the api module of the Python TTS package so that it can be used from R.

# Import TTS
TTS_api <- reticulate::import("TTS.api")
# Model name
model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
Contributor Author

@howardbaik Jan 10, 2024

Specify the name of the model. For voice cloning, we use the xtts_v2 model.

# Model name
model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
# TTS
tts <- TTS_api$TTS(model_name, gpu = gpu)
Contributor Author

@howardbaik Jan 10, 2024

Create an instance of the TTS class from the TTS.api module (https://github.com/coqui-ai/TTS/blob/dev/TTS/api.py).


res = vapply(string_processed, function(tt) {
output_path = tts_temp_audio("wav")
tts$tts_to_file(text = tt,
Contributor Author

@howardbaik Jan 10, 2024

In this comment, I'll link the underlying Python code inside the TTS.api module.

Use the tts_to_file method (https://github.com/coqui-ai/TTS/blob/dev/TTS/api.py#L290) within the TTS class (https://github.com/coqui-ai/TTS/blob/dev/TTS/api.py#L15).

# Output file path
output_path
}, FUN.VALUE = character(1L), USE.NAMES = FALSE)
out = lapply(res, tts_audio_read,
Contributor Author

The rest of these lines are the same as the code inside tts_coqui().
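The steps annotated in the review comments above (select Python, import TTS.api, create the TTS object, call tts_to_file for each piece of text) can be assembled into a rough sketch. This is not the merged implementation: load_xtts() and synthesize_chunks() are hypothetical names, tempfile() stands in for the package's tts_temp_audio(), and the speaker_wav, language, and file_path argument names are assumed from the Coqui Python API rather than taken from this diff.

```r
# Hypothetical helper: create the Coqui TTS object through reticulate.
# Needs the reticulate R package and the Python TTS package installed.
load_xtts <- function(python, gpu = FALSE) {
  # Select the Python installation reticulate should use
  reticulate::use_python(python)
  # Import the api module of the Python TTS package
  TTS_api <- reticulate::import("TTS.api")
  # Voice cloning uses the xtts_v2 model
  TTS_api$TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu = gpu)
}

# Hypothetical helper: synthesize each text chunk into its own WAV file
# and return the output paths as a character vector.
synthesize_chunks <- function(tts, texts, speaker_wav, language = "en") {
  vapply(texts, function(tt) {
    output_path <- tempfile(fileext = ".wav")
    tts$tts_to_file(
      text = tt,
      speaker_wav = speaker_wav,  # audio sample of the voice to clone
      language = language,
      file_path = output_path
    )
    output_path
  }, FUN.VALUE = character(1L), USE.NAMES = FALSE)
}
```

Passing the TTS object in as an argument means the loop can be exercised with a stand-in object that has a tts_to_file element, without loading the model.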

@howardbaik
Contributor Author

@cansavvy Can I merge this PR?

Contributor

@cansavvy left a comment

Yes! Sorry about that!

@howardbaik howardbaik merged commit dc2c169 into dev Feb 15, 2024
Development

Successfully merging this pull request may close these issues.

Voice Cloning
2 participants