FFmpeg script to convert VoxCeleb 2 dataset to wav files #10

mogwai · 2020-09-25T13:45:26Z

An explicit script for this would help people who download the dataset convert the files so they are ready to use in pyannote audio.

hbredin · 2020-09-28T08:45:56Z

Or maybe find a way to avoid this conversion (though it would probably result in slower training because of on-the-fly m4a decoding).

sangeetaghangam · 2020-10-02T04:33:05Z

I used the following commands in bash. For some reason it did not let me use the file name substring, so I ended up renaming the files afterwards.

Command to convert from m4a to wav (output is named *.m4a.wav):

find ./ -iname '*.m4a' -exec bash -c 'ffmpeg -i "$1" "$1.wav"' _ {} \;

Command to rename from .m4a.wav to .wav:

find . -name "*.m4a.wav" -exec rename 's/\.m4a\.wav$/.wav/' '{}' +
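The "file name substring" step can probably be done with bash parameter expansion, which would avoid the separate rename pass. The following is only a sketch: `wav_name` and `convert_tree` are hypothetical helper names, and it assumes bash, find and ffmpeg are installed and that each .wav should be written next to its source file.

```shell
#!/bin/bash
# Hypothetical helper: map "dir/clip.m4a" to "dir/clip.wav".
wav_name() {
    # ${1%.*} removes the shortest trailing ".ext" from the path
    printf '%s\n' "${1%.*}.wav"
}

# Hypothetical helper: convert every .m4a under a directory
# (default: current directory), writing each .wav next to its source.
convert_tree() {
    find "${1:-.}" -type f -iname '*.m4a' -print0 |
    while IFS= read -r -d '' src; do
        # -nostdin keeps ffmpeg from consuming the file list on stdin,
        # -n skips outputs that already exist instead of overwriting them
        ffmpeg -nostdin -n -i "$src" "$(wav_name "$src")"
    done
}
```

Calling `convert_tree ./` from the dataset root would then run the conversion. Nothing is converted until the function is called.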

mogwai · 2020-10-02T09:57:07Z

I dug into this rabbit hole a while ago and found a nice script that we could adapt to do this:

#!/bin/bash

# In order to use this script, you must install parallel and
# ffmpeg

export SOURCE_DIR="/home/$USER/audiom4a"
export TARGET_DIR="/home/$USER/audiowav"

doone() {
    audioFile="$1"
    tmpVar="${audioFile%.*}.wav"
    wavFile="${tmpVar/$SOURCE_DIR/$TARGET_DIR}"
    # Create the output directory under TARGET_DIR (not the source directory)
    wavFilePath=$(dirname "${wavFile}")
    mkdir -p "${wavFilePath}"
    if [ ! -f "$wavFile" ]; then # Skip files that have already been converted
        echo "Input: $audioFile"
        echo "Output: $wavFile"
        ffmpeg -i "$audioFile" "$wavFile" < /dev/null
    fi
}

export -f doone

find "${SOURCE_DIR}" -type f \( -iname "*.m4a" -or -iname "*.mp3" \) -print0 |
  parallel -0 doone
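After a long parallel run it can be useful to check that nothing was skipped. The sketch below lists source files that have no matching .wav in the target tree. `list_missing` is a hypothetical helper name, and it assumes the same directory mirroring as the script above (same relative path, extension replaced by .wav).

```shell
#!/bin/bash
# Hypothetical helper: print every source audio file that has no
# matching .wav under the target tree.
# Usage: list_missing SOURCE_DIR TARGET_DIR
list_missing() {
    find "${1%/}" -type f \( -iname '*.m4a' -o -iname '*.mp3' \) -exec sh -c '
        src_dir=$1; dst_dir=$2; shift 2
        for f; do
            rel=${f#"$src_dir"/}   # path relative to the source dir
            # report the file if the mirrored .wav does not exist
            [ -f "$dst_dir/${rel%.*}.wav" ] || printf "%s\n" "$f"
        done' _ "${1%/}" "${2%/}" {} +
}
```

For example, `list_missing "$SOURCE_DIR" "$TARGET_DIR"` prints nothing once every file has been converted.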
