Voice Cloning #40
For this PR, I changed the base branch from |
Thanks for working on this, @howardbaek! It seems good, but I don't know much about what's happening here. My comments are mainly asking for some clarity. Thanks!
#' @export
#' @rdname tts
tts_coqui_vc <- function(
@howardbaek Can you add some more comments here so I can understand what is going on? Thanks!!
@@ -82,22 +103,17 @@ tts = function(
bind_audio = bind_audio,
...)
}
if (service == "microsoft") {
res = tts_microsoft(
if (service == "google") {
What's this change about?
The purpose of this change is to reorder the TTS services so that the free ones (Coqui and Coqui Voice Cloning) come first and are highlighted. Also, since I imagine Coqui will be used much more than the paid services, I moved the two Coqui services to the front.
}
if (service == "coqui-vc") {
cli::cli_alert_info("This service does not support MP3 format; will produce a WAV audio output.")
# TODO: Specify Python version, just as we specify path to coqui above
May want to make an issue for this as well.
Good idea! Issue is created: #41
Sorry if this was confusing and overwhelming! I have left detailed comments for you, but let me know if you have further questions! Also, tagging @seankross to keep him in the loop.
@@ -167,3 +167,10 @@ coqui_path_missing <- paste(
"If you've already downloaded the software, use function",
"'set_coqui_path(path = \"path/to/coqui/tts\")' to point R to your local coqui tts Executable File"
)

# Open private audio files
system_open <- function(path) {
This helper function takes a file path as input and invokes the system's open command to pull up the audio file (or video file) at that path.
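A minimal sketch of how such a helper could look, assuming `open` on macOS, `xdg-open` on other Unix-like systems, and `shell.exec()` on Windows. The names `open_command()` and `system_open_sketch()` are hypothetical and are not the PR's actual implementation.

```r
# Hypothetical helper: pick an opener command for a Unix-like platform.
# `sysname` is a parameter so the choice can be checked without opening anything.
open_command <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Darwin = "open",   # macOS
    "xdg-open"         # assumed default for Linux and other Unix-like systems
  )
}

# Hypothetical sketch: open the file at `path` with the platform's default application.
system_open_sketch <- function(path) {
  if (!file.exists(path)) {
    stop("File does not exist: ", path)
  }
  path <- normalizePath(path)
  if (.Platform$OS.type == "windows") {
    # shell.exec() is only available in R on Windows
    shell.exec(path)
  } else {
    system2(open_command(), shQuote(path))
  }
  invisible(path)
}
```

Calling `system_open_sketch("output.wav")` would hand the file to whatever application the operating system associates with WAV files; the existence check avoids passing a bad path to the shell.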
save_local_dest = NULL,
...) {
# Specify version of Python to be used by reticulate
reticulate::use_python(python_version)
Select the version of Python to be used by reticulate.
# Specify version of Python to be used by reticulate
reticulate::use_python(python_version)
# Import TTS
TTS_api <- reticulate::import("TTS.api")
Imports the api module within the TTS package to make it available for use within R.
# Import TTS
TTS_api <- reticulate::import("TTS.api")
# Model name
model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
Specify the name of the model. For voice cloning, we use the xtts_v2 model.
# Model name
model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
# TTS
tts <- TTS_api$TTS(model_name, gpu = gpu)
Using the TTS class from the TTS.api module (https://github.com/coqui-ai/TTS/blob/dev/TTS/api.py), create an instance
res = vapply(string_processed, function(tt) {
  output_path = tts_temp_audio("wav")
  tts$tts_to_file(text = tt,
In this comment, I'll link the underlying Python code inside the TTS.api module.
Use the tts_to_file method (https://github.com/coqui-ai/TTS/blob/dev/TTS/api.py#L290) within the TTS class (https://github.com/coqui-ai/TTS/blob/dev/TTS/api.py#L15).
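Putting the steps from these comments together, a rough end-to-end sketch might look like the following. The function name `clone_voice_sketch()` and its arguments are hypothetical. The `speaker_wav`, `language`, and `file_path` arguments to `tts_to_file` are assumptions based on the Coqui TTS Python API, since the call shown in the diff is cut off after `text = tt,`. Running it requires the Python TTS package to be installed for the chosen interpreter.

```r
# Sketch only: mirrors the steps discussed in this review, not the PR's exact code.
# Assumes the Python package TTS (Coqui) is installed for the interpreter at `python_path`.
clone_voice_sketch <- function(text,
                               speaker_wav,
                               language = "en",
                               gpu = FALSE,
                               python_path = Sys.which("python3"),
                               output_path = tempfile(fileext = ".wav")) {
  # Select the Python interpreter that reticulate should use
  reticulate::use_python(python_path, required = TRUE)
  # Import the api module of the TTS package
  TTS_api <- reticulate::import("TTS.api")
  # Create an instance of the TTS class for the xtts_v2 model
  tts <- TTS_api$TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu = gpu)
  # Synthesize `text` in the voice of `speaker_wav` and write a WAV file
  # (argument names other than `text` are assumptions, see the note above)
  tts$tts_to_file(
    text = text,
    speaker_wav = speaker_wav,
    language = language,
    file_path = output_path
  )
  output_path
}
```

A call such as `clone_voice_sketch("Hello", speaker_wav = "speaker.wav")` would return the path of the generated WAV file; `speaker.wav` here is a placeholder file name.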
  # Output file path
  output_path
}, FUN.VALUE = character(1L), USE.NAMES = FALSE)
out = lapply(res, tts_audio_read,
The rest of these lines are the same as the code inside tts_coqui()
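For readers unfamiliar with the pattern in the diff, here is a small self-contained illustration of the `vapply()` loop that produces one output path per input string. `fake_synthesize()` is a hypothetical stand-in for the real synthesis call and writes a placeholder text file instead of audio.

```r
# Hypothetical stand-in for the synthesis step: writes the text to a file
# rather than producing audio, so the example runs without Python.
fake_synthesize <- function(text, file_path) {
  writeLines(text, file_path)
  invisible(file_path)
}

texts <- c("Hello world.", "This is a second sentence.")

# One temporary .wav path per input string; FUN.VALUE = character(1L)
# makes vapply() fail loudly if any iteration returns something else.
paths <- vapply(texts, function(tt) {
  output_path <- tempfile(fileext = ".wav")
  fake_synthesize(tt, output_path)
  output_path
}, FUN.VALUE = character(1L), USE.NAMES = FALSE)
```

Using `vapply()` rather than `sapply()` guarantees that the result is an unnamed character vector, which is what a later `lapply()` over the paths expects.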
@cansavvy Can I merge this PR?
Yes! Sorry bout that!
Purpose/implementation Section

What changes are being implemented in this Pull Request?
tts_coqui_vc(), a function that takes as input the text to convert to speech, a WAV audio file of the speaker whose voice to clone, the language of the speaker, whether to use the GPU, the version of Python to be used, etc.
system_open(), a utility function that uses the system command to open audio files in a private folder. Useful for quickly opening the output from tts().

What was your approach?
tts_coqui_vc() is very similar to tts_coqui(), except that it requires you to specify the version of Python to be used by reticulate, provide a WAV audio file of the speaker, and decide whether to use the GPU. It interacts with the Python API of Coqui TTS through reticulate, using the sample Python code in #39.

Tell potential reviewers what kind of feedback you are soliciting.
I realize that tts(), which is a wrapper around tts_coqui_vc(), and tts_auth(), a function to check if the Python API of TTS is properly installed, are not complete. I will loop back to this once I integrate tts_coqui_vc() into loqui-vc.