splitting a file with silence #117

Open
shakfu opened this issue Sep 19, 2020 · 10 comments

@shakfu

shakfu commented Sep 19, 2020

One typical use case for me is to split a file using the silence effect as outlined in this excellent sox tutorial https://madskjeldgaard.dk/posts/sox-tutorial-split-by-silence/

sox input.wav clip.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart

gives "clip001.wav", "clip002.wav" etc. with only the audio in clip and without the silence in between.

Is it possible to do this in pysox (and apply a fade-in and fade-out to each resulting clip)?

@lostanlen
Member

@shakfu
Author

shakfu commented Sep 21, 2020

I should clarify my question: I'm aware that the 'silence' effect is implemented. My question was more about how I could translate the ": newfile : restart" idiom to pysox.

@lostanlen
Member

lostanlen commented Sep 21, 2020

I'm not sure whether this is supported in pysox. It might be good to bring this up with @rabitt.

lostanlen reopened this Sep 21, 2020
@shakfu
Author

shakfu commented Sep 21, 2020

Thanks for your response, @lostanlen. I will post a more clearly specified feature request then.

@shakfu
Author

shakfu commented Sep 21, 2020

I have a small bash script that uses sox to split an input file into a number of clips based on a silence threshold and then applies a fade (in/out) to the resulting output files.

# splits input file into clip files based on a silence threshold
sox --show-progress "$1" clip.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart

# applies fade-in fade-out to each output file
# (quoting keeps file names with spaces intact)
for f in clip*.wav
do
    name=$(basename -s .wav "$f")
    newname="$name-f.wav"
    sox "$f" "$newname" fade 0.1 0
done

My question to @rabitt is whether it is possible to translate this script's functionality (in particular the ": newfile : restart" idiom) into a pure-python pysox solution.
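
For reference, the bash script above can be driven from Python by calling the sox command-line tool through `subprocess`. This is only a minimal sketch, not a pysox feature: the helper names `split_command`, `fade_command`, and `split_and_fade` are hypothetical, and it assumes the `sox` executable is on the PATH and that the clips are written to the current directory.

```python
import glob
import shutil
import subprocess


def split_command(input_path, out_path="clip.wav", duration=0.1, threshold="1%"):
    """Build the sox argument list for the split step of the bash script."""
    silence = ["silence", "1", str(duration), threshold,
               "1", str(duration), threshold]
    return ["sox", input_path, out_path] + silence + [":", "newfile", ":", "restart"]


def fade_command(clip_path, fade_len=0.1):
    """Build the sox argument list for the fade step; output is '<name>-f.wav'."""
    stem = clip_path[:-len(".wav")] if clip_path.endswith(".wav") else clip_path
    return ["sox", clip_path, stem + "-f.wav", "fade", str(fade_len), "0"]


def split_and_fade(input_path):
    """Run both steps and return the list of unfaded clip files."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox executable not found on PATH")
    subprocess.run(split_command(input_path), check=True)
    # sox numbers the outputs clip001.wav, clip002.wav, ...
    clips = sorted(glob.glob("clip[0-9]*.wav"))
    # skip faded files left over from a previous run
    clips = [c for c in clips if not c.endswith("-f.wav")]
    for clip in clips:
        subprocess.run(fade_command(clip), check=True)
    return clips
```

Calling `split_and_fade("input.wav")` would run the same two sox invocations as the script; the two `*_command` helpers can be inspected on their own without sox installed.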

@shakfu
Author

shakfu commented Sep 24, 2020

Trying to convert the splitting part of the above sox call into current pysox, I got as far as the following:

import sox
t = sox.Transformer()
t.silence(1, 1.0, 0.1)
t.silence(-1, 1.0, 0.1)
t.build('input.wav', 'clip.wav', extra_args=[':', 'newfile', ':', 'restart'])

# pysox converts this into the following sox args:

args = ['sox', '-D', '-V2', '-c', '1', 's104.wav', 'clip.wav', 'silence', '1', '0.100000', '1.000000%', 'reverse', 'silence', '1', '0.100000', '1.000000%', 'reverse', ':', 'newfile', ':', 'restart']

The problem is that this doesn't split the input file. I presume the culprit is the default translation to 'reverse', which always outputs one file. Also, the argument order for pysox's silence didn't match the command-line silence arguments.

@rabitt
Collaborator

rabitt commented Feb 18, 2021

Hey @shakfu

> My question to @rabitt is whether it is possible to translate this script's functionality (in particular the ": newfile : restart" idiom) into a pure-python pysox solution.

The short answer is no: pysox doesn't currently support the ": newfile : restart" idiom, but we could extend the API to support it.

> Also the arg order for pysox silence didn't match with the command line silence args.

Yes, this is the case for several of the transforms - the documentation should describe what each argument is doing, but it's true that it may not exactly match the command line tool's ordering. Note that for the silence command in particular, the "location" argument in pysox is there to support removing silence from the end of the file, hence the reverse.
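
To make the argument mapping concrete, here is a small sketch that reproduces the sox arguments shown in the earlier post for `silence(1, 1.0, 0.1)` followed by `silence(-1, 1.0, 0.1)`. The function `silence_args` is hypothetical and is reconstructed only from that posted argument list, not from the pysox source, so it covers just `location=1` and `location=-1`.

```python
def silence_args(location, silence_threshold, min_silence_duration):
    """Approximate the sox arguments for one pysox-style silence() call.

    Reconstructed from the argument list posted in this thread; only
    location=1 (start of file) and location=-1 (end of file) are covered.
    """
    if location not in (1, -1):
        raise ValueError("this sketch only covers location=1 and location=-1")
    # Note the order swap: the CLI takes duration first, then threshold.
    core = [
        "silence", "1",
        "{:f}".format(min_silence_duration),
        "{:f}%".format(silence_threshold),
    ]
    if location == -1:
        # Trailing silence is removed by reversing, trimming the start,
        # and reversing back.
        return ["reverse"] + core + ["reverse"]
    return core


combined = silence_args(1, 1.0, 0.1) + silence_args(-1, 1.0, 0.1)
print(combined)
```

The `reverse ... reverse` wrapper is why the appended ": newfile : restart" arguments end up after a `reverse` effect rather than directly after a `silence` effect.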

@StuartIanNaylor

StuartIanNaylor commented Feb 22, 2021

@rabitt I do love pysox, it rocks, guys, and thanks.

I am going to steal @shakfu's bash script, but I would also love to do this in pysox (I'm creating model datasets for KWS).
VAD would also work, but VAD results can sometimes be confusing (the spectral representation doesn't always work out well), so I often use silence instead: the result is more logical, and I don't end up wondering about a curious VAD result.

Thanks @shakfu for the script. I was just about to ask about silence splitting when I saw your post.

PS: an option to take no action and just output the split points to a text file would also be useful. It might help with an ASR aligner, as I have yet to find one that extracts words satisfactorily.
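
As an illustration of the "output split points only" idea, the sketch below finds non-silent regions in a list of samples with a plain amplitude threshold and formats them as text lines. It is an assumption-laden stand-in, not sox's silence algorithm and not a pysox API: `find_split_points` and `format_split_points` are hypothetical names, and the samples are assumed to be already loaded as a sequence of floats.

```python
def find_split_points(samples, sample_rate, threshold, min_silence):
    """Return (start, end) times in seconds of regions louder than threshold.

    A region is closed once at least `min_silence` seconds of consecutive
    samples with |x| <= threshold have been seen.
    """
    min_gap = max(1, int(round(min_silence * sample_rate)))
    segments = []
    start = None    # index where the current loud region began
    quiet_run = 0   # consecutive quiet samples inside the current region
    for i, x in enumerate(samples):
        if abs(x) > threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                end = i - quiet_run + 1  # index of the first quiet sample
                segments.append((start / sample_rate, end / sample_rate))
                start = None
                quiet_run = 0
    if start is not None:
        # File ended before the silence was long enough to close the region.
        end = len(samples) - quiet_run
        segments.append((start / sample_rate, end / sample_rate))
    return segments


def format_split_points(segments):
    """Format segments as tab-separated 'start<TAB>end' lines."""
    return "\n".join("{:.3f}\t{:.3f}".format(s, e) for s, e in segments)
```

The returned string can be written to a text file, and the same (start, end) pairs could be passed to a trimming step later instead of splitting immediately.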

@shakfu
Author

shakfu commented Feb 22, 2021

@rabitt Thanks for your response. It would be great if the pysox API could be extended to accommodate this use-case. Naturally, it would be great to accomplish this in python (-:

@StuartIanNaylor Thanks, glad that my little script can be of use. Incidentally, I was curious about the answer to your last question and found some possible solutions in this stack overflow exchange.

@StuartIanNaylor

@shakfu

import sox
import tempfile

def get_voice_params(file, silence_maximum_amplitude, file_min_silence_duration=0.2):
  stat = sox.file_info.stat(file)
  file_maximum_amplitude = stat['Maximum amplitude']
  file_duration = stat['Length (seconds)']

  percent_silence_threshold = (silence_maximum_amplitude / file_maximum_amplitude) * 100

  tmp1 = tempfile.NamedTemporaryFile(suffix='.wav')
  tmp2 = tempfile.NamedTemporaryFile(suffix='.wav')

  tfm1 = sox.Transformer()
  tfm1.silence(location=-1, min_silence_duration=file_min_silence_duration, silence_threshold=percent_silence_threshold, buffer_around_silence=True)
  tfm1.build(file, tmp1.name)
  tfm1.clear_effects()

  stat = sox.file_info.stat(tmp1.name)
  voice_end = stat['Length (seconds)']

  tfm1.silence(location=1, min_silence_duration=file_min_silence_duration, silence_threshold=percent_silence_threshold, buffer_around_silence=True)
  tfm1.build(tmp1.name, tmp2.name)
  tfm1.clear_effects()

  stat = sox.file_info.stat(tmp2.name)
  print(stat)
  voice_start = voice_end - stat['Length (seconds)']
  voice_duration = voice_end - voice_start
  return file_maximum_amplitude, file_duration, voice_start, voice_end, voice_duration


# kw_file and silence_maximum_amplitude are assumed to be defined earlier in the script
file_maximum_amplitude, file_duration, voice_start, voice_end, voice_duration = get_voice_params(kw_file, silence_maximum_amplitude)

print(file_maximum_amplitude, file_duration, voice_start, voice_end, voice_duration)

It just suddenly clicked: I didn't like writing out to the hard drive all the time, so this uses temporary files.
I still need to change the Python logging to stop the warnings, but that is an easy addition.
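
The arithmetic in `get_voice_params` can be checked in isolation from sox. The sketch below separates it into two hypothetical helpers, `silence_threshold_percent` and `voice_bounds`, and also shows one way to quiet the warnings, assuming pysox logs through the standard `logging` module under the logger name "sox".

```python
import logging

# Assumption: pysox emits its warnings via the logger named "sox".
logging.getLogger("sox").setLevel(logging.ERROR)


def silence_threshold_percent(silence_max_amplitude, file_max_amplitude):
    """Express an absolute silence amplitude as a percentage of the file peak."""
    if file_max_amplitude <= 0:
        raise ValueError("file_max_amplitude must be positive")
    return silence_max_amplitude / file_max_amplitude * 100


def voice_bounds(length_end_trimmed, length_both_trimmed):
    """Derive voice start, end, and duration from two trimmed lengths.

    length_end_trimmed:  file length after removing trailing silence
    length_both_trimmed: length after also removing leading silence
    """
    voice_end = length_end_trimmed
    voice_start = voice_end - length_both_trimmed
    return voice_start, voice_end, voice_end - voice_start


print(silence_threshold_percent(0.25, 0.5))
print(voice_bounds(3.0, 2.0))
```

For example, if trimming trailing silence leaves 3.0 seconds and trimming leading silence as well leaves 2.0 seconds, the voice is taken to start at 1.0 seconds and end at 3.0 seconds.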
