Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some file sizes will crash CLI version #1067

Closed
Oortone opened this issue Nov 7, 2021 · 6 comments
Closed

Some file sizes will crash CLI version #1067

Oortone opened this issue Nov 7, 2021 · 6 comments
Labels

Comments

@Oortone
Copy link

Oortone commented Nov 7, 2021

Describe the bug
When using the CLI version to analyze files produced from the same source (produced by scipy.io wavfile) but of different length, some lengths will crash Meyda.

To Reproduce
Use attached files and issue following commands, this is what I get:

meyda --bs=1024 --mfcc=13 --o=22025.csv 22025.wav mfcc rms
>> internal/buffer.js:53
  throw new ERR_OUT_OF_RANGE(type || 'offset',

meyda --bs=1024 --mfcc=13 --o=11000.csv 11000.wav mfcc rms
>> OK

File length of 11000 samples is ok but 22025 is not. None of these are multiples of buffer size, so that's not the issue.

Expected behavior
All files analyzed regardless of length, at least for unextreme sizes.

Environment configuration
Node on macOS 10.14 using Terminal.
Node: v10.15.0
Meyda: sorry I don't get how to list the version number (new to node and npm, npm list meyda says (empty)) but I think it's 5.3.0. I installed it oct 2021.

Additional context
filesize_issue.zip

@Oortone Oortone added the bug label Nov 7, 2021
@Oortone
Copy link
Author

Oortone commented Nov 7, 2021

Addition: With the file type provided and command as above it seems all files equal to or bigger than 16370 samples will crash Meyda.

@Oortone
Copy link
Author

Oortone commented Nov 7, 2021

Complete error trace:

Starting extraction...
|Sample rate recognized as: 22025
internal/buffer.js:53
  throw new ERR_OUT_OF_RANGE(type || 'offset',
  ^

RangeError [ERR_OUT_OF_RANGE]: The value of "offset" is out of range. It must be >= 0 and <= 65474. Received 65476
    at boundsError (internal/buffer.js:53:9)
    at Buffer.readInt32LE (internal/buffer.js:275:5)
    at Buffer.readIntLE (internal/buffer.js:232:17)
    at Reader.<anonymous> (/usr/local/lib/node_modules/meyda/bin/wav-loader.js:56:23)
    at Reader.emit (events.js:182:13)
    at readableAddChunk (/usr/local/lib/node_modules/meyda/node_modules/readable-stream/lib/_stream_readable.js:195:16)
    at Reader.Readable.push (/usr/local/lib/node_modules/meyda/node_modules/readable-stream/lib/_stream_readable.js:162:10)
    at Reader.Transform.push (/usr/local/lib/node_modules/meyda/node_modules/readable-stream/lib/_stream_transform.js:145:32)
    at process (/usr/local/lib/node_modules/meyda/node_modules/stream-parser/index.js:235:5)
    at /usr/local/lib/node_modules/meyda/node_modules/stream-parser/index.js:198:14

I notice a method called Buffer.readInt32LE is used. I do believe the samples are stored in a float format, an inspection application called AudioFinder says 32 bit and when read to and from Python they are float32. But even this doesn't really explain why the file size matters.

@Oortone
Copy link
Author

Oortone commented Nov 7, 2021

Yet another observation.
I do believe both this issue and #1066 has to do with floating point wav formats not being read properly. Converting a problematic file to 16 bit (int I guess?) using Sox (a Terminal program similar to ffmpeg) gives a proper csv-file with RMS-levels that make sense when analyzed with CLI version.

So my proposal is merging #1066 with this issue and that the solution is to throw an error for the file format in question since it is not read properly.

Worth noting is that the files I provided are playable in all audio applications I've tried so the file format is not incorrect, but since they are probably not compatible with Meyda better to throw an error.

(Will be away for a few days now...)

@hughrawlinson
Copy link
Member

Yes, the CLI only supports integer encoded WAVs. See #118. I hope to address that as soon as we have a stateless extract method, which I'm working on in #931, but running into typescript issues with. We should definitely throw a better error in the existing CLI, but that's some fairly old code, and adding stuff there would take time away from finishing #931, which would make working on the CLI much smoother.

In the initial issue though, you described a size limit - do you still think that's the case? or was it just that the shorter file was float encoded?

@hughrawlinson
Copy link
Member

I wrote this quick little CLI that should work for whatever encoding that ffmpeg supports: let me know if this suits your purposes better?. I imagine you'll need to make changes (like having it call out to sox instead of ffmpeg?), but it might be a good start. In its current state, you need to have ffmpeg installed on your machine, and you can edit it to change the config for Meyda at the top of the file.

@Oortone
Copy link
Author

Oortone commented Nov 10, 2021

Thanks for the feedback. I solved this by another extraction strategy in Python prior to calling the Meyda script so I produce integer wav files, seems to work, but I haven't done any deep tests.

I'm pretty sure the size thing was just som strange side effect from floats getting misread. Now with integer wavs I can Meyda-analyze longer files.

@Oortone Oortone closed this as completed Nov 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

2 participants