FFmpeg script to convert VoxCeleb 2 dataset to wav files #10

Open
mogwai opened this issue Sep 25, 2020 · 3 comments

Comments

@mogwai

mogwai commented Sep 25, 2020

An explicit script for this would help people who download the dataset convert the files so they are ready to use in pyannote.audio.

@hbredin
Member

hbredin commented Sep 28, 2020

Or maybe find a way to avoid this conversion (though it would probably result in slower training because of on-the-fly m4a decoding).

@sangeetaghangam

I used the following commands in bash. For some reason I could not strip the extension from the file name with a substring expansion, so I ended up renaming the files in a second step.

Command to convert m4a to wav:

find ./ -iname '*.m4a' -exec bash -c 'ffmpeg -i "$1" "$1.wav"' _ {} \;

Command to rename .m4a.wav to .wav:

find . -name "*.m4a.wav" -exec rename 's/\.m4a\.wav$/.wav/' '{}' +
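The two steps above can probably be merged by passing each file name to the inner shell as `$1`, where the suffix-removal expansion `${src%.*}` is available. The following is only a sketch under that assumption: `convert_tree` is a hypothetical helper name, and `-nostdin` is used so ffmpeg does not read from the terminal.

```shell
# Hypothetical helper: convert every .m4a under a directory to a .wav
# stored next to it, without the intermediate .m4a.wav name.
convert_tree() {
    find "${1:-.}" -type f -iname '*.m4a' -exec sh -c '
        for src; do
            dst="${src%.*}.wav"   # strip the last extension, then append .wav
            # Skip files that were already converted
            [ -f "$dst" ] || ffmpeg -nostdin -i "$src" "$dst"
        done' _ {} +
}
```

Calling `convert_tree .` from the dataset root would then write `00001.wav` next to `00001.m4a` directly, so no rename step is needed.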

@mogwai
Author

mogwai commented Oct 2, 2020

I dug into this rabbit hole a while ago and found a nice script that we could adapt to do this:

#!/bin/bash

# In order to use this script, you must install parallel and
# ffmpeg

export SOURCE_DIR="/home/$USER/audiom4a"
export TARGET_DIR="/home/$USER/audiowav"

doone() {
    audioFile="$1"
    tmpVar="${audioFile%.*}.wav"
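    # Swap the SOURCE_DIR prefix for TARGET_DIR, keeping the sub-directory layout
    wavFile="${TARGET_DIR}${tmpVar#"$SOURCE_DIR"}"
    # Create the output directory (not the input one) before converting
    wavFilePath=$(dirname "${wavFile}")
    mkdir -p "${wavFilePath}"
    if [ ! -f "$wavFile" ]; then # Skip files that have already been converted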
        echo "Input: $audioFile"
        echo "Output: $wavFile"
        ffmpeg -i "$audioFile" "$wavFile" < /dev/null
    fi
}

export -f doone

find "${SOURCE_DIR}" -type f \( -iname "*.m4a" -or -iname "*.mp3" \) -print0 |
  parallel -0 doone
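After a long parallel run it may be worth checking that every source file actually produced an output. The sketch below assumes the same SOURCE_DIR to TARGET_DIR path mapping as the script above; `missing_wavs` is a hypothetical helper name, not part of any existing tool.

```shell
# Hypothetical check: print each source file whose converted .wav is missing.
# $1 = source directory, $2 = target directory (trailing slashes are trimmed).
missing_wavs() {
    find "${1%/}" -type f \( -iname '*.m4a' -o -iname '*.mp3' \) -exec sh -c '
        src_dir="$1"; dst_dir="$2"; shift 2
        for f; do
            wav="${dst_dir}${f#"$src_dir"}"   # swap the directory prefix
            wav="${wav%.*}.wav"               # swap the extension
            [ -f "$wav" ] || printf "%s\n" "$f"
        done' _ "${1%/}" "${2%/}" {} +
}
```

Running `missing_wavs "$SOURCE_DIR" "$TARGET_DIR"` prints nothing when the conversion is complete; any paths it does print can be fed back into the conversion script.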


3 participants