Checking sample width, is it necessary? #57

Closed
keighrim opened this issue Mar 30, 2021 · 6 comments

Comments

@keighrim

keighrim commented Mar 30, 2021

Hi, thanks for the useful tool.

The tool fails with "error: <class 'AssertionError'>" when I run it on my audio file. I managed to narrow the source of the error down to here:


where the sample width of the temporarily generated wave file is 2 instead of 4.

When I commented that line out, the tool ran fine and the output looked like a reasonable segmentation. My question is whether the check above is necessary, and if so, how I can pre-process my audio file to have a sample width of 4.
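For readers who want to confirm the sample width of a WAV file themselves, here is a minimal sketch using Python's standard `wave` module. The helper names `wav_sample_width` and `write_silent_wav` are hypothetical and are not part of the segmenter. The sample width is reported in bytes, so a 16-bit PCM file (the `pcm_s16le` format named in the ffmpeg log in this report) gives 2 and a 32-bit PCM file gives 4.

```python
import os
import tempfile
import wave


def wav_sample_width(path):
    """Return the sample width, in bytes, stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getsampwidth()


def write_silent_wav(path, sample_width=2, sample_rate=16000, n_frames=160):
    """Write a short mono WAV made of zero bytes (hypothetical demo helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        # One frame of mono audio is sample_width bytes long.
        wav_file.writeframes(b"\x00" * (sample_width * n_frames))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = os.path.join(tmp_dir, "demo.wav")
        write_silent_wav(demo_path, sample_width=2)
        print(wav_sample_width(demo_path))
```

Running `wav_sample_width` on the temporary file that the segmenter generates should show which width the assertion actually sees.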

File information

stderr captured from the ffmpeg subprocess call (the value of error in these lines):

p = Popen(args, stdout=PIPE, stderr=PIPE)
output, error = p.communicate()

ffmpeg version 4.3.2 Copyright (c) 2000-2021 the FFmpeg developers
  built with Apple clang version 11.0.0 (clang-1100.0.33.17)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.2_3 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, mp3, from '---.mp3':
  Metadata:
    comment         : Cubase LE AI Elements
    encoded_by      : Cubase LE AI Elements
    originator_reference: CCOOONNNNNNNNNNNN160932RRRRRRRRR
    date            : 2019-05-30
    coding_history  :
    time_reference  : 104069664
    umid            : 0x5604883FA9ED40B6B1D3B24AF8D0AD5D00000000000000000000000000000000
    encoder         : Lavf57.66.102
  Duration: 00:01:47.76, start: 0.023021, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 192 kb/s
    Metadata:
      encoder         : Lavc57.81
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '/var/folders/38/h8q_t4sj0jz90v5cbv2w4l4h0000gn/T/tmpkpn96hjd/---.wav':
  Metadata:
    ICMT            : Cubase LE AI Elements
    ITCH            : Cubase LE AI Elements
    originator_reference: CCOOONNNNNNNNNNNN160932RRRRRRRRR
    ICRD            : 2019-05-30
    coding_history  :
    time_reference  : 104069664
    umid            : 0x5604883FA9ED40B6B1D3B24AF8D0AD5D00000000000000000000000000000000
    ISFT            : Lavf58.45.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder         : Lavc58.91.100 pcm_s16le
size=    3367kB time=00:01:47.72 bitrate= 256.0kbits/s speed= 733x
video:0kB audio:3367kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.004583%

and arguments used in the call

['/usr/local/bin/ffmpeg', '-y', '-i', '---.mp3', '-ar', '16000', '-ac', '1', '/var/folders/38/h8q_t4sj0jz90v5cbv2w4l4h0000gn/T/tmpd3mtpgzg/c---.wav']

(File names are redacted)
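The conversion call above can be reproduced outside the tool with a short sketch like the following. The function names `build_ffmpeg_args` and `run_ffmpeg` are hypothetical. The optional `codec` parameter is an assumption that is not confirmed anywhere in this thread: passing `pcm_s32le` through ffmpeg's `-acodec` option requests 32-bit PCM output, which would give a sample width of 4 bytes, whereas the default WAV output shown in the log is `pcm_s16le` (2 bytes).

```python
from subprocess import PIPE, Popen


def build_ffmpeg_args(src, dst, ffmpeg="ffmpeg", sample_rate=16000,
                      channels=1, codec=None):
    """Build an argument list shaped like the one reported above."""
    args = [ffmpeg, "-y", "-i", src, "-ar", str(sample_rate), "-ac", str(channels)]
    if codec is not None:
        # Assumption: e.g. codec="pcm_s32le" asks ffmpeg for 32-bit PCM output.
        args += ["-acodec", codec]
    args.append(dst)
    return args


def run_ffmpeg(args):
    """Run ffmpeg and return (return code, decoded stderr text)."""
    p = Popen(args, stdout=PIPE, stderr=PIPE)
    output, error = p.communicate()
    return p.returncode, error.decode("utf-8", errors="replace")
```

Checking the return code in addition to stderr helps separate a failed conversion from a successful one, since ffmpeg writes its normal progress log to stderr in both cases.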

@r-uro
Contributor

r-uro commented Apr 12, 2021

Hi,

This value is actually hard-coded in sidekit >= 1.3.6 (which is used to load the audio file). Can you make sure you have sidekit 1.3.6 or 1.3.7 installed?
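One way to check the installed version from Python rather than from `pip list` is sketched below. It assumes Python 3.8 or later, where `importlib.metadata` is in the standard library. The helper names are hypothetical, and the "expected series" set simply encodes the 1.3.6 and 1.3.7 releases mentioned above.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version, parts=3):
    """Turn a purely numeric version such as '1.3.6.9' into (1, 3, 6)."""
    return tuple(int(piece) for piece in version.split(".")[:parts])


def in_expected_series(version, expected=((1, 3, 6), (1, 3, 7))):
    """True if the version belongs to one of the expected release series."""
    return version_tuple(version) in expected


if __name__ == "__main__":
    found = installed_version("SIDEKIT")
    print(found, None if found is None else in_expected_series(found))
```

Note that `version_tuple` only handles purely numeric version strings; a suffix such as `rc1` would need a real version parser.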

@keighrim
Author

Yes, I installed the segmenter via PyPI, and it came with sidekit 1.3.6.9 (I installed it a while ago).

$ pip list  | grep -i sidekit
SIDEKIT                1.3.6.9

Then I tried upgrading sidekit but got the same error.

$ pip install --upgrade sidekit
Requirement already satisfied: sidekit in /usr/local/anaconda3/envs/some-env/lib/python3.6/site-packages (1.3.6.9)
Collecting sidekit
  Using cached SIDEKIT-1.3.8.5.2.tar.gz (169 kB)
Requirement already satisfied: numpy in /...
...

$ ina_speech_segmenter.py -i /input -o /output
2021-04-12 07:11:07.762836: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-12 07:11:07.763054: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
batch_processing 1 files
1/1 [('/output/some-file.csv', 2, "error: <class 'AssertionError'>")]

Finally, I also tried the same input files on a Linux machine (Ubuntu 20.04, ffmpeg 4.2.4), with no luck.

@keighrim
Author

#56 also reports an AssertionError and suggests using Python >= 3.8. So I tried Python 3.8.8, but it didn't fix the problem.

@DavidDoukhan
Member

Dear @keighrim,

We experienced some issues due to recent updates of the SIDEKIT dependency.
The issue is now solved: the SIDEKIT-related code has been copied directly into the inaSpeechSegmenter source code.
In other words, the dependency on SIDEKIT has been removed.

It has been tested with Python 3.6.9 and Python 3.8.5.
It is expected to work with any Python >= 3.6.

Could you clone the latest version available on GitHub, install it in a new virtual environment, and let me know whether your issue is now solved?

I'm looking forward to hearing from you before updating the pip repository as well.

Kind regards, sorry for the inconvenience, and thanks a lot for this feedback!

@keighrim
Author

> #56 also reports an AssertionError and suggests using Python >= 3.8. So I tried Python 3.8.8, but it didn't fix the problem.

This isn't true: when I tested segmenter 0.6.6 on Python 3.8.8, I mistakenly passed a directory instead of a file to the --input argument. With the parameters set properly, the segmenter ran without an AssertionError. (FileNotFoundError might be a better message in this case, though.)
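As a rough illustration of the clearer error message suggested here, the sketch below validates an input path before any processing starts. The function name `check_input_file` is hypothetical and this is not how the segmenter currently behaves; it only shows how a missing path and a directory could be reported with more specific built-in exceptions than AssertionError.

```python
import os
import tempfile


def check_input_file(path):
    """Raise a descriptive error unless path points to an existing file."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"input does not exist: {path}")
    if os.path.isdir(path):
        raise IsADirectoryError(f"expected a media file but got a directory: {path}")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        try:
            check_input_file(tmp_dir)
        except IsADirectoryError as exc:
            print(exc)
```

Both exceptions are subclasses of OSError, so a caller that wants a single handler can catch OSError.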

> Could you clone the latest version available on GitHub, install it in a new virtual environment, and let me know whether your issue is now solved?

I tested with Python 3.7 and segmenter 0.6.7 (from the master branch), and it ran fine. I'm also very glad to see that PyTorch is no longer installed as a dependency of sidekit.

I think the issue is resolved. Please feel free to close the issue whenever you think so.

@DavidDoukhan
Member

Thanks a lot for this prompt answer and for the constructive feedback!
This issue is now closed!
