AI Dubbing allows you to create localized videos from the same base video, adding translated voice-overs generated with the Google AI-powered Text-to-Speech API.
- Google Cloud
- Google Workspace (Google Spreadsheets)
- A Google Cloud user with privileges over all the APIs listed in the config (ideally the Owner role), so that the required privileges can be granted to the Service Account automatically
- Latest version of Terraform installed
- Python version >= 3.8.1 installed
- Python Virtualenv installed
Roles that will be automatically granted to the service account during the installation process:

- `roles/iam.serviceAccountShortTermTokenMinter`
- `roles/storage.objectAdmin`
- `roles/pubsub.publisher`
- Open a shell
- Clone the git repository
- Open a text editor and configure the following installation variables in the file `variables.tf`:
```hcl
variable "gcp_project" {
  type        = string
  description = "Google Cloud Project ID where the artifacts will be deployed"
  default     = "my-project"
}

variable "gcp_region" {
  type        = string
  description = "Google Cloud Region"
  default     = "my-gcp-region"
}

variable "ai_dubbing_sa" {
  type        = string
  description = "Service Account for the deployment"
  default     = "ai-dubbing"
}

variable "ai_dubbing_bucket_name" {
  type        = string
  description = "GCS bucket used for deployment tasks"
  default     = "my-bucket"
}

variable "config_spreadsheet_id" {
  type        = string
  description = "The ID of the config spreadsheet"
  default     = "my-google-sheets-sheet-id"
}

# Do not set this value to a high frequency, since executions might overlap
variable "execution_schedule" {
  type        = string
  description = "The schedule to execute the process (every 30 min by default)"
  default     = "*/30 * * * *"
}
```
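Alternatively, instead of editing `variables.tf` directly, the same values can be supplied in a `terraform.tfvars` file, which `terraform apply` picks up automatically. A sketch with placeholder values (the region shown is only an example):

```hcl
gcp_project            = "my-project"
gcp_region             = "europe-west1"
ai_dubbing_sa          = "ai-dubbing"
ai_dubbing_bucket_name = "my-bucket"
config_spreadsheet_id  = "my-google-sheets-sheet-id"
execution_schedule     = "*/30 * * * *"
```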
- Now execute `terraform apply`
- Type “yes” and hit return when the system asks for confirmation
- Service Account: if it does not exist, it will be created as per the values in the configuration
- Cloud Scheduler: ai-dubbing-trigger
- Cloud Functions: generate_tts_file, generate_video_file. Both triggered by pub/sub
- Cloud Pub/Sub topics: generate_tts_files_trigger, generate_video_file_trigger
Every generated artifact will be stored in the supplied GCS bucket under the `output/YYYYMMDD` folder, where YYYY represents the year, MM the month, and DD the day of the processing date.

The generated TTS audio files will be stored as MP3:

`{campaign}-{topic}-{voice_id}.mp3`

Audio URL: `gs://{gcs_bucket}/output/{YYYYMMDD}/{campaign}-{topic}-{voice_id}.mp3`

The generated video files will be stored as MP4:

`{campaign}-{topic}-{voice_id}.mp4`

Video URL: `gs://{gcs_bucket}/output/{YYYYMMDD}/{campaign}-{topic}-{voice_id}.mp4`
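The naming convention above can be reproduced in code when you need to locate artifacts programmatically. A minimal sketch (the helper name and sample values are illustrative, not part of the tool):

```python
from datetime import date

def output_uri(gcs_bucket: str, campaign: str, topic: str,
               voice_id: str, ext: str, run_date: date) -> str:
    """Build the GCS URI of a generated artifact under output/YYYYMMDD."""
    folder = run_date.strftime("%Y%m%d")  # e.g. 20230420
    return f"gs://{gcs_bucket}/output/{folder}/{campaign}-{topic}-{voice_id}.{ext}"

# Example: the audio and video artifacts for one config row.
audio = output_uri("videodub_tester", "summer", "outdoor",
                   "en-gb-wavenet-c##female", "mp3", date(2023, 4, 20))
video = output_uri("videodub_tester", "summer", "outdoor",
                   "en-gb-wavenet-c##female", "mp4", date(2023, 4, 20))
print(audio)  # gs://videodub_tester/output/20230420/summer-outdoor-en-gb-wavenet-c##female.mp3
```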
Most of the effort will go into writing the first SSML text and adapting the timings to the video. Once that task is mastered, video creation will be a breeze!
You can use the web-based SSML-Editor for this purpose, and then export each SSML file.
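One common way to adapt timings is to insert SSML `<break>` tags so the speech lines up with scene changes. A minimal sketch of assembling such a document in Python, assuming you already know the pause before each sentence (text and timings below are placeholders):

```python
def build_ssml(segments):
    """Join (pause_ms, sentence) pairs into one SSML document,
    inserting a <break> before each sentence to match video timings."""
    parts = ["<speak>"]
    for pause_ms, sentence in segments:
        if pause_ms > 0:
            parts.append(f'<break time="{pause_ms}ms"/>')
        parts.append(f"<s>{sentence}</s>")
    parts.append("</speak>")
    return "".join(parts)

ssml = build_ssml([
    (0, "Find your own style in our constantly renewed catalog."),
    (1200, "Design what you love."),
])
print(ssml)
```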
- A file containing a base video without music
- A file containing the music for the video
- Configure the fields in the sheet “config” following the instructions below:
| Field Name | Type | Mandatory | Description | Sample Value | Notes |
| --- | --- | --- | --- | --- | --- |
| campaign | Input | Yes | A string used to generate the name of the video | summer | |
| topic | Input | Yes | A string used to generate the name of the video | outdoor | |
| gcs_bucket | Input | Yes | The bucket where video_file and base_audio_file are located (the service account must be granted access). We recommend using the same gcs_bucket as for the output | videodub_test_input | |
| video_file | Input | Yes | The location of the master video file within the gcs_bucket | input/videos/bumper_base_video.mp4 | |
| base_audio_file | Input | No | The location of the base audio file within the gcs_bucket | input/audios/bumper_base_audio.mp3 | |
| text | Input | Yes | The SSML text to convert to speech | `Find your own style in the constantly renewed catalog of the <emphasis level="strong">somewhere.com online shop</emphasis></prosody> Design what you love</prosody>` | Check SSML supported syntax |
| voice_id | Input | Yes | The id of the voice to use | en-GB-Wavenet-C##FEMALE | Check voices here |
| millisecond_start_audio | Input | No | Millisecond of the video at which the audio must start. This could also be accomplished using TTS | 0 | |
| audio_encoding | Input | Yes | The audio encoding to use | MP3 | At the moment only MP3 is supported |
| base_audio_vol_percent | Input | Yes | Modifies the volume of the base audio (whether in the base video or in the base audio file) | 0.6 | |
| final_video_file_url | Output | N/A | The location of the generated video file with the base audio and speech | gs://videodub_tester/output/20230420/summer-outdoor-en-gb-wavenet-c##female.mp4 | |
| status | Output | N/A | The status of the process | Video OK | |
| last_update | Output | N/A | The last time the row was modified by the automatic process | 2023/04/20, 12:25:16 | |
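Before the scheduled run picks a row up, it can help to sanity-check it against the mandatory columns above. A hedged sketch (field names follow the table; the helper itself is illustrative, not part of the tool):

```python
MANDATORY_FIELDS = ["campaign", "topic", "gcs_bucket", "video_file",
                    "text", "voice_id", "audio_encoding", "base_audio_vol_percent"]

def missing_fields(row: dict) -> list:
    """Return the mandatory config columns that are empty or absent."""
    return [f for f in MANDATORY_FIELDS if not str(row.get(f, "")).strip()]

row = {
    "campaign": "summer", "topic": "outdoor",
    "gcs_bucket": "videodub_test_input",
    "video_file": "input/videos/bumper_base_video.mp4",
    "text": "<speak>Design what you love</speak>",
    "voice_id": "en-GB-Wavenet-C##FEMALE",
    "audio_encoding": "MP3", "base_audio_vol_percent": "0.6",
}
print(missing_fields(row))  # [] -> row is ready to process
```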
Once all the configuration is set in the spreadsheet, the process will run every X minutes, as defined by execution_schedule.

The “Status” column will change its contents; the possible values are:

- “TTS OK”: audio file generated correctly
- “Video OK”: video file generated correctly
- Any other value: an error occurred

When every cell in the Status column displays “Video OK”, the process has completed successfully. When every cell displays a value other than “TTS OK” (either “Video OK” or an error), the process has completed, but some rows may have failed.
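The completion rule above can be expressed as a small check over the Status column values. A sketch, with statuses taken from the list above (the function name and error string are illustrative):

```python
def run_state(statuses):
    """Classify the overall run from the per-row Status values."""
    if any(s == "TTS OK" for s in statuses):
        return "in progress"          # some rows still await video generation
    if all(s == "Video OK" for s in statuses):
        return "completed"
    return "completed with errors"    # no row pending, but some rows failed

print(run_state(["Video OK", "Video OK"]))      # completed
print(run_state(["Video OK", "TTS OK"]))        # in progress
print(run_state(["Video OK", "Error: quota"]))  # completed with errors
```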
Just download the videos from gs://{gcs_bucket}/output/{YYYYMMDD} and make the best use of them.
Note:
For the initial tests, the scheduled execution period might be too long. In that case, the recommendation is to disable the schedule and run the job on demand. To do that:

- Go to the Cloud Scheduler tab in your Google Cloud project
- Check the box next to “ai-dubbing-trigger”
- Click on “Force Run”