A python script to create a .txt transcript from an .m4a audio file
-
Uses ffmpeg to convert an .m4a file into a .flac file (single channel, 16000 Hz)
-
Uploads the .flac file to a Google Cloud Storage bucket (required by Speech-to-Text for audio files over 60 seconds)
-
Deletes the .flac file from the local machine
-
Transcribes the uploaded .flac file, saving the transcript as a .txt file, while also building a mysql INSERT statement and appending it to the vsearch.sql file.
-
Deletes the .flac file from the Google Cloud Storage bucket
-
Simply log into the database and run the updated vsearch.sql file to add additional transcripts into the database.
- Sign in to Google Cloud Console
https://console.cloud.google.com/
- Go to the project selector page and create a project
https://console.cloud.google.com/projectselector2/home
- Enable the Speech-to-Text API
https://cloud.google.com/speech-to-text
- Create a service account for your project
https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts/create
note: the account needs to have at least Storage Admin access in order to upload to the bucket on Google Cloud.
- Create a JSON key for your service account
Select your service account, navigate to the KEYS tab, and select ADD KEY (Create new key)
This will download a JSON file to your machine, you'll need to know where this is later.
- Create a Google Cloud Storage bucket
note: do not use a nested bucket.
https://console.cloud.google.com/storage/browser
-
Pull the transcribe repository to the same machine as your audio files.
-
Edit transcribe.py and set BUCKET_NAME to the name of your Google Cloud Storage bucket.
-
Using a linux command line, set your authentication environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
setting KEY_PATH with the path to your JSON key file, for example:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
- Call transcribe from the command line with the audio file as an argument:
python3 transcribe.py path/to/file_name.m4a
- There may be a few installs required:
sudo apt install python3-pip
pip install --upgrade google
pip install --upgrade google-cloud-storage
pip install google-cloud-speech
sudo apt install ffmpeg
If you wish to loop over multiple .m4a audio files for processing, a shell script vloop.sh was created to facilitate using pattern matching to transcribe batches with a single command.
Edit vloop.sh to declare the following:
- Set the starting pattern
starting_pattern="abc"
(this will match all files starting with the string “abc”)
- Set the KEY_PATH with the path to your JSON file, just as before
cred_path="/home/user/Downloads/service-account-file.json"
- Set the FILE_PATH with the path to your audio files to process
FILES="audio/m4a"
Call vloop from the command line:
./vloop.sh
vloop.sh does 2 passes, first transcribing all the scoped exhibits matching your specified starting pattern, then matching the remaining exhibits. Duplicates are avoided by checking the txt folder for an existing transcript, scoped or unscoped.
Google Cloud's Speech-to-Text API can be tricky to get set up correctly.
-
If the process fails during audio conversion, make sure the original audio files are in .m4a format. Conversion from other formats is not currently supported.
-
If the process fails during upload to your Google Cloud Storage bucket, make sure your service account was granted at least Storage Admin access.
-
If the process fails during transcription, make sure the file path to your JSON key is correctly set and that Google Cloud's Speech-to-Text API is activated. Wait a few minutes before trying again.
ffmpeg:
Upload objects to a Google Cloud Storage bucket:
https://cloud.google.com/storage/docs/uploading-objects
Transcribe short audio files:
https://cloud.google.com/speech-to-text/docs/sync-recognize
Transcribe long audio files:
https://cloud.google.com/speech-to-text/docs/async-recognize#speech_transcribe_async_gcs-python
Delete objects from a Google Cloud Storage bucket: