scipywavread error #194

Closed
spirali opened this Issue Jan 10, 2018 · 5 comments

spirali commented Jan 10, 2018

Hello,

When I run the following command on a Czech text and audio:

python3 -m aeneas.tools.execute_task \
                   audio.mp3 \
                   text.txt \
                   "task_language=ces|os_task_file_format=json|is_text_type=plain" \
                   map.json

I get the following error:

[INFO] Validating config string (specify --skip-validator to bypass)...
[INFO] Validating config string... done
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[ERRO] An unexpected error occurred while executing the task:
[ERRO] Unexpected error while executing task : Audio format not supported by scipywavread

From the error message I would guess that the installation of a decoding library is broken; however, if I use task_language=eng (all other arguments the same) on the same data, it passes without any error.

Version: aeneas (1.7.3.0) (from pip)
OS: Ubuntu 17.10 and 16.04

Thank you

readbeyond commented Jan 10, 2018

spirali commented Jan 11, 2018

Hi,

Thank you for your reply. This is the log obtained with the "-l" argument when running the command that produces the error: output.log

When I use python3 -m aeneas.tools.synthesize_text plain text.txt ces output.wav, I obtain a valid synthesis of the text without any error during the process.

readbeyond commented Jan 11, 2018

If I read the log correctly, your audio.mp3 file is ~1120 s long:

[DEBU] 2018-01-11 10:49:01.517832 FFPROBEWrapper: Duration found in stdout: '1120.604812'

However, the WAV file generated synthesizing the input text appears to be ~90310s long:

 [DEBU] 2018-01-11 10:50:34.913667 ESPEAKTTSWrapper: Current time 90310.414

This is already quite strange, because the duration of the synthesized wave is usually within a factor of 2 of the duration of the real audio. Can you double check that

python3 -m aeneas.tools.synthesize_text plain text.txt ces output.wav

produces an output.wav with a reasonable duration?

$ ffmpeg -i output.wav 

If not, then there is a problem with either eSpeak(ng) or cew, although that would be strange, because aeneas.tools.synthesize_text uses the same code.

If so, then perhaps the failing step is the next one, where ffmpeg converts the synthesized PCM16LE mono 22050 Hz WAVE file into a PCM16LE mono 16000 Hz WAVE file:

'ffmpeg', '-i', '/tmp/tmptz59_bc9.wav', '-ac', '1', '-ar', '16000', '-y', '-map_metadata', '-1', '-flags', '+bitexact', '-f', 'wav', '/tmp/tmp4805rjom.wav'

Again, it would help to check whether the conv16.wav you obtain from your output.wav with:

$ ffmpeg -i output.wav -ac 1 -ar 16000 -y -map_metadata -1 -flags +bitexact -f wav conv16.wav

has the right duration. If not, there is a problem with your version of ffmpeg.
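
The duration checks above can also be scripted. The following is only a minimal sketch using Python's standard `wave` module; the helper names `wav_duration_seconds` and `duration_is_plausible` are hypothetical and are not part of aeneas, and the factor-of-2 threshold is simply the rule of thumb mentioned above, not a limit enforced by the library. It assumes the file is an uncompressed PCM WAVE file with a valid header.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration in seconds of a PCM WAVE file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def duration_is_plausible(synth_seconds, real_seconds, max_factor=2.0):
    """Rule of thumb: the two durations should be within max_factor of each other."""
    if synth_seconds <= 0 or real_seconds <= 0:
        return False
    ratio = synth_seconds / real_seconds
    return 1.0 / max_factor <= ratio <= max_factor
```

For example, `wav_duration_seconds("output.wav")` and `wav_duration_seconds("conv16.wav")` should return nearly the same value, and passing the two durations from the log (90310.414 and 1120.604812) to `duration_is_plausible` returns False.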

Also, what's the output of your:

$ ffmpeg -version

? For reference, mine is:

ffmpeg version 3.4.1-1+b1 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 7 (Debian 7.2.0-18)

spirali commented Jan 12, 2018

I have found the source of the problem. A bug in my export script caused the transcript in "text.txt" to be repeated many times for this particular example. At first glance, the beginning, middle, and end of text.txt therefore looked fine, and the same was true when I checked the synthesized audio.

I have fixed the text.txt and everything works as expected.
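
A quick check for this kind of duplication might look like the sketch below. It is only an illustration: the helper names `repeated_lines` and `repetition_ratio` are hypothetical, and it assumes a plain-text transcript with one fragment per line, as used with is_text_type=plain. Some texts legitimately repeat lines (refrains, for instance), so a high ratio is a hint rather than proof of a broken export.

```python
from collections import Counter


def repeated_lines(text, min_count=2):
    """Map each non-empty line that occurs at least min_count times to its count."""
    counts = Counter(line.strip() for line in text.splitlines() if line.strip())
    return {line: n for line, n in counts.items() if n >= min_count}


def repetition_ratio(text):
    """Fraction of non-empty lines that duplicate another line (0.0 = all unique)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    return 1.0 - len(set(lines)) / len(lines)
```

Reading text.txt into a string and calling `repetition_ratio` on it gives a value close to 1.0 when the same transcript has been written out many times, and 0.0 when every line is unique.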

Thank you for your help.

spirali closed this Jan 12, 2018

readbeyond commented Jan 12, 2018
