Use Transformer in-memory with stdin/stdout #102

pseeth · 2020-05-16T01:17:52Z

This PR tries to address #6. I mostly followed the approach @carlthome used in pysndfx here: https://github.com/carlthome/python-audio-effects/blob/master/pysndfx/dsp.py#L472. Following the discussion in the issue, I implemented a new Transformer method, build_array, which takes a numpy array as input. The main difficulty is keeping track of everything that would normally live in the audio file's header. Raw audio on stdin/stdout has no header, so that information has to be passed to build_array as arguments, for both the input and the output.
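For illustration, here is a minimal sketch of the header information a raw stdin stream needs, assuming a NumPy array shaped (n_samples, n_channels). The helper name raw_input_args and its dtype table are hypothetical, not pysox's API; the flags themselves (-t raw, -r, -b, -c, -e, and '-' for stdin) are standard SoX command-line options.

```python
import numpy as np

# Hypothetical dtype -> (SoX encoding, bit depth) table; covers only a few
# common dtypes and assumes native byte order.
_DTYPE_TO_SOX = {
    np.dtype(np.int16): ("signed-integer", 16),
    np.dtype(np.int32): ("signed-integer", 32),
    np.dtype(np.float32): ("floating-point", 32),
    np.dtype(np.float64): ("floating-point", 64),
}


def raw_input_args(array, sample_rate):
    """Return SoX input-format flags for a headerless stream on stdin.

    Channels, bit depth and encoding are inferred from the array; the
    sample rate cannot be, so the caller must supply it.
    """
    if array.ndim == 1:
        channels = 1
    elif array.ndim == 2:
        channels = array.shape[1]  # assumes (n_samples, n_channels)
    else:
        raise ValueError("expected a 1-D or 2-D array")
    if array.dtype not in _DTYPE_TO_SOX:
        raise ValueError(f"unsupported dtype: {array.dtype}")
    encoding, bits = _DTYPE_TO_SOX[array.dtype]
    return [
        "-t", "raw",
        "-r", str(sample_rate),
        "-b", str(bits),
        "-c", str(channels),
        "-e", encoding,
        "-",  # '-' makes SoX read the audio from stdin
    ]
```

In a full command these flags would be followed by the effects and an output specification; writing raw audio to stdout needs the same kind of explicit format on the output side.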

I modified all of the tests. Each Transformer effect is tested by passing an input file through sox and writing an output file. In the tests, I now load both the input file and the output file as numpy arrays, pass the input array into tfm.build_array, and collect the output array (the second element of the returned tuple, matching the API of tfm.build). I then compare the array loaded from the output file with the returned array using np.allclose. Every test was modified this way except one, which I couldn't get working in the time I had today: test_bitdepth_valid, here: https://github.com/pseeth/pysox/blob/master/tests/test_transform.py#L1347.

This change to the tests is encapsulated in a single function, tfm_assert_array_to_file_output, and you'll see calls to it sprinkled throughout the tests.
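A rough sketch of what such a helper might look like, assuming build_array returns a tuple whose second element is the output array. The function name and signature here are hypothetical; the real tfm_assert_array_to_file_output may differ. One detail worth guarding against: mono audio loaded from a file can come back 1-D while the array path returns 2-D, and np.allclose would silently broadcast the two.

```python
import numpy as np


def assert_array_matches_file_output(build_array, input_array, expected_array,
                                     **build_kwargs):
    """Hypothetical sketch of a tfm_assert_array_to_file_output-style check.

    build_array: callable returning a tuple whose second element is the
        output array.
    expected_array: reference output, e.g. SoX's output file loaded back
        as an array.
    """
    actual = np.asarray(build_array(input_array, **build_kwargs)[1])
    expected = np.asarray(expected_array)
    # A mono file may load as shape (n,) while the array path returns (n, 1).
    # Comparing those directly would broadcast to (n, n), so align shapes first.
    assert actual.size == expected.size, (actual.shape, expected.shape)
    assert np.allclose(actual.reshape(expected.shape), expected)
```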

To do all this, I modified the sox function in a backwards-compatible way. I probably still need to increase coverage a bit, but hopefully this is okay to open now for feedback.

Let me know if this is a good start and how to get this merged, if possible! It would go a long way toward making pysox super efficient for Scaper, which is why I'm here. :)

Thanks!

coveralls · 2020-05-16T01:19:33Z

Coverage decreased (-0.08%) to 98.678% when pulling de89d97 on pseeth:master into 99de814 on rabitt:master.

hadware · 2020-05-16T12:52:07Z

The import of numpy could probably be made optional (if it's deemed to be too "heavy" of a dependency for this package).

Great work though!

pseeth · 2020-05-18T04:48:11Z

I got transform.py up to 100% coverage, but coverage is still reported as having dropped for some reason. As far as I can tell, all the lines I wrote are covered; the missing lines are in core, as they were before. Let me know if I missed something...

I also added docstrings for the new function.

I can easily make numpy optional by checking for it or importing it dynamically, but I'm guessing anyone using pysox probably already has numpy installed anyway?
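If numpy were made optional, one common pattern is a lazy import that raises a clear error only when an array feature is actually used. This is a hypothetical sketch (require_numpy is not a pysox function); the setup.py diff in this PR instead adds numpy to install_requires.

```python
import importlib


def require_numpy():
    """Import numpy on first use (hypothetical helper, not part of pysox).

    Only the array code paths would call this, so file-only users would not
    need numpy installed.
    """
    try:
        return importlib.import_module("numpy")
    except ImportError as exc:
        raise ImportError(
            "Array input/output requires numpy: pip install numpy"
        ) from exc
```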

rabitt · 2020-05-19T11:25:11Z

@pseeth this is great, thank you! 🚀 I'm reviewing now.

rabitt

Thanks for all the work on this @pseeth ! Take a look at my comments and let me know what you think. Seeing the PR, I'm reconsidering whether it's better as part of build directly or as a separate function. Something else to consider: does it make sense to support the same functionality now in the Combiner and in the preview functions?

If you need/want, I'm happy to help with any of this; just let me know and I'll add some commits to this PR.

rabitt · 2020-05-19T11:28:34Z

setup.py

@@ -20,12 +20,14 @@
        keywords='audio effects SoX',
        license='BSD-3-Clause',
        install_requires=[
+            'numpy',


what's the minimum version we can get away with here?

I think it's 1.9.0. According to https://numpy.org/doc/1.18/reference/generated/numpy.ndarray.tobytes.html, ndarray.tobytes is new in 1.9.0.
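For context, tobytes is what turns an array into the bytes written to SoX's stdin, and np.frombuffer reverses it on the way back. A minimal round trip, assuming interleaved int16 audio shaped (n_samples, n_channels):

```python
import numpy as np

# Interleaved int16 audio, shape (n_samples, n_channels).
audio = np.array([[0, 1], [2, 3], [-4, 5]], dtype=np.int16)

# Bytes that would be written to SoX's stdin (C order, so channels interleave).
payload = audio.tobytes()

# Reading raw bytes back (e.g. from SoX's stdout): dtype and channel count
# must be known in advance, because raw audio carries no header.
decoded = np.frombuffer(payload, dtype=np.int16).reshape(-1, audio.shape[1])
```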

rabitt · 2020-05-19T11:29:27Z

setup.py

        ],
        extras_require={
            'tests': [
                'pytest',
                'pytest-cov',
                'pytest-pep8',
+                'pysoundfile',


which version?

This is just for tests, but I think we need at least 0.9.0 to get the dtype functionality:

https://pysoundfile.readthedocs.io/en/latest/#breaking-changes

rabitt · 2020-05-19T11:33:38Z

sox/core.py

-        process_handle = subprocess.Popen(
-            args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
-        )
+        if src_array is not None and isinstance(src_array, np.ndarray):


Raise a TypeError if not isinstance(src_array, np.ndarray) - otherwise the behavior will be confusing if a user passes e.g. a list and it gets ignored.

nit - swap the order of this if/else to start with the "standard" case - if src_array is None: ..., elif isinstance(src_array, np.ndarray): ... else ... raise TypeError

Yeah makes sense! I updated this.
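A minimal sketch of the suggested ordering, combined with writing the array to stdin. The function name run_with_optional_array is hypothetical and the real core.sox signature may differ; it assumes the caller has already put '-' and the raw-format flags into args.

```python
import subprocess

import numpy as np


def run_with_optional_array(args, src_array=None):
    """Hypothetical sketch of the core.sox change discussed above.

    Runs `args` as a subprocess. If src_array is given, its raw bytes are
    written to the process's stdin. Returns (returncode, stdout, stderr)
    with stdout/stderr as bytes.
    """
    if src_array is None:
        # Standard case: input comes from a file named in args.
        process = subprocess.Popen(
            args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        out, err = process.communicate()
    elif isinstance(src_array, np.ndarray):
        process = subprocess.Popen(
            args, stdin=subprocess.PIPE,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        )
        # communicate() writes stdin and drains stdout/stderr together,
        # which avoids deadlocking on large buffers.
        out, err = process.communicate(input=src_array.tobytes())
    else:
        raise TypeError("src_array must be an np.ndarray or None")
    return process.returncode, out, err
```

Passing a list (or anything else) now fails loudly instead of being silently ignored.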


rabitt · 2020-05-19T12:11:06Z

sox/transform.py

+        extra_args : list or None, default=None
+            If a list is given, these additional arguments are passed to SoX
+            at the end of the list of effects.
+            Don't use this argument unless you know exactly what you're doing!


Document the returns

rabitt · 2020-05-19T12:11:31Z

sox/transform.py

+        if encoding_out is None:
+            encoding_out = encoding
+
+        self.set_input_format(


I'm not sure we want to call these functions inside build_array - they change the internal state of the Transformer. The way it's written now, calling build_array will silently change the behavior of build.

Brainstorming how to get around this, we could:

Write a function that builds the args list for input/output formats, which gets called by set_input_format/set_output_format and by this function.

Reduce the number of inputs to this function. As far as I can tell, the only input format argument needed is sample_rate_in; the rest can (should!?) be inferred from the input array. Output format support is already built into sox.build in several ways, either by calling set_output_format or through transformer effects like rate and channels. The only one I'd leave optionally specified here is encoding_out.
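A minimal sketch of the first option, assuming a simplified class that models only sample rate and channels. The helper name mirrors the _input_format_args function proposed in this thread, but FormatArgsSketch and this body are hypothetical, not pysox's code.

```python
class FormatArgsSketch:
    """Hypothetical, heavily simplified stand-in for Transformer.

    A pure function builds the argument list; only set_input_format
    stores it on the object.
    """

    def __init__(self):
        self.input_format = []

    @staticmethod
    def _input_format_args(rate=None, channels=None):
        # Pure: builds SoX input-format flags without touching any state.
        args = []
        if rate is not None:
            args.extend(["-r", str(rate)])
        if channels is not None:
            args.extend(["-c", str(channels)])
        return args

    def set_input_format(self, rate=None, channels=None):
        # Persistent: the caller explicitly asked to change the Transformer.
        self.input_format = self._input_format_args(rate, channels)
        return self

    def build_args(self, input_spec, rate=None, channels=None):
        # Per-call override: build a local list, leave self.input_format alone.
        if rate is None and channels is None:
            fmt = list(self.input_format)
        else:
            fmt = self._input_format_args(rate, channels)
        return fmt + [input_spec]
```

Because build_args never assigns to self, calling it with array-specific settings cannot change what a later file-based build does.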

so, in short this function would become

build_array(self, input_array, sample_rate_in, encoding_out=None, extra_args=None, return_output=True)

Thoughts/objections?

So the argument list for build_array was initially what you said, but it had to be expanded quite a bit to get through all of the test cases. The ones I can get rid of are channels_in and bits_in, since they can be inferred from the numpy array. But the output format arguments can't be inferred.

What if we had set_output_format and set_input_format return (optionally) the arg list rather than set it as a variable in the object? Then we could extend it dynamically in build_array instead without side effects.

I modified set_input_format and set_output_format with a return_only flag, so the arg list is built without being saved to self. I had to keep some of the arguments to build_array, but as you'll see further down, build_array is now merged with build.

rabitt · 2020-05-19T12:14:52Z

sox/transform.py

+            at the end of the list of effects.
+            Don't use this argument unless you know exactly what you're doing!
+        '''
+        output_filepath = '-'


Thinking out loud... does it also make sense to support array in --> file out, and file in --> array out? My thinking is yes, in which case maybe all of this function should be part of build? What do you think @pseeth ?

Totally possible. Do you have a suggested function signature for build? It seems like a lot of the arguments will become optional, or do type-checking on input/output (in which case the arguments will be renamed). Then we'll have some sort of merge between the build_array and build functions, which I think would look reasonable. Let me know your thoughts!

This is supported now! I merged build and build_array to support it. It needed some finicky type checking, plus a flag that tells the core.sox function whether to decode the output as UTF-8, but it now works! I expanded the test cases to cover file -> array, array -> array, and array -> file, all checked against the original file -> file output.
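A hypothetical sketch of the dispatch this implies: arrays go through '-' as headerless raw audio, files keep their headers, and the output side decides whether stdout should be decoded as text. endpoint_args is an illustrative name, not pysox's API, and the raw flags are deliberately incomplete.

```python
def endpoint_args(input_filepath=None, input_array=None,
                  output_filepath=None, sample_rate_in=None):
    """Hypothetical sketch: choose SoX's input/output endpoints.

    Returns (input_part, output_part, decode_text). Arrays travel through
    '-' (stdin/stdout) as headerless raw audio; files carry their own headers.
    """
    if (input_filepath is None) == (input_array is None):
        raise ValueError("pass exactly one of input_filepath or input_array")

    if input_array is not None:
        if sample_rate_in is None:
            raise ValueError("sample_rate_in is required for array input")
        # Incomplete on purpose: real raw input also needs -b, -c and -e.
        input_part = ["-t", "raw", "-r", str(sample_rate_in), "-"]
    else:
        input_part = [input_filepath]

    if output_filepath is None:
        # Array out: stdout holds raw audio bytes, so don't decode as UTF-8.
        output_part = ["-t", "raw", "-"]
        decode_text = False
    else:
        # File out: the audio goes to the file; any stdout is treated as text.
        output_part = [output_filepath]
        decode_text = True
    return input_part, output_part, decode_text
```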

rabitt · 2020-05-19T12:23:39Z

by the way, I'm totally OK with adding numpy as a dependency.

pseeth · 2020-05-19T17:54:21Z

I'm not sure I can get this to work for the Combiner, since there are multiple input arrays and I'm not sure how that would work with a single stdin. It seems like it might require some black magic... But if a user is working with numpy arrays after the Transformer, they should really just use np.stack or a sum to get Combiner-like functionality, since that's much faster than an exec call to sox.
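For reference, a sketch of the NumPy equivalents, assuming equal-length mono arrays. These are only roughly analogous to the Combiner's mix, merge and concatenate combine types; SoX's own gain handling may differ.

```python
import numpy as np

# Two mono signals of equal length, shape (n_samples,).
a = np.array([0.1, 0.2, 0.3])
b = np.array([0.0, -0.1, 0.4])

# Mix into one channel by summing. SoX's mixing may rescale inputs to avoid
# clipping, so this is not guaranteed to match the Combiner sample-for-sample.
mixed = a + b

# Merge into a 2-channel signal, shape (n_samples, 2).
merged = np.stack([a, b], axis=1)

# Concatenate in time.
concatenated = np.concatenate([a, b])
```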

Thanks for the review, @rabitt! Quick question - do you know why the coverage test is failing?


pseeth · 2020-05-19T22:35:24Z

Updated the PR in response to comments! And it seems coverage checks are now passing. :)

pseeth · 2020-05-20T00:43:50Z

I think this will require a version bump and associated text in the changelog. Let me know what to do there, or if you'd like, you can add it yourself once this PR is ready to be merged!

rabitt · 2020-05-20T12:17:54Z

I was reviewing and had some comments requiring nontrivial changes so I'm going to push to this branch -

create separate functions _input_format_args(...) and _output_format_args(...) called by set_input/output_format and by build.
rework build to have input_filepath and input_array as separate inputs
fix some problems all of this causes with a few special Transformer functions like .stat

working on it now!

rabitt · 2020-05-20T17:54:05Z

@pseeth just opened a PR on your fork with those changes.

refactor build

pseeth · 2020-05-20T18:17:16Z

Looks good to me! Thanks! I merged it into my PR, so now we wait on tests...

rabitt · 2020-05-20T20:12:50Z

It looks like the docs build is failing because we added numpy, and there are a few lines I added that aren't tested. Going to make one more PR to your PR to fix!

Could you add the changelog notes and maybe add an example to the docs for how to use the new functionality?

patch 2

pseeth · 2020-05-20T20:25:52Z

Sounds good! I'll update the changelog.

pseeth · 2020-05-21T02:45:29Z

Actually, some guidance on this would be helpful. What should the version bump to?

rabitt · 2020-05-21T10:17:33Z

I found some issues with the docs build (the returns aren't properly documented so they're not showing up in the docs) - I'm going ahead and merging this without bumping the version.

I'll fix the docs and do the version bump in a separate PR.

Thanks!

pseeth added 3 commits May 16, 2020 01:06

added in-memory build_array for sox transformer

a8d65b4

adding soundfile to travis

faa56fa

should be pysooundfile

fc79fa8

pseeth added 2 commits May 18, 2020 04:29

upping coverage and adding docstrings

35f4fdf

boosting coverage back up

8c9082c

rabitt requested changes May 19, 2020


merging build_array into build, handling cases for both inputs and outputs to be either arrays or filepaths.

9f99d1c

cleaning up a bit

74da89a

refactor build

2d9a30b

Merge pull request #1 from rabitt/pseeth-pr-patch

453c6f6

refactor build

rabitt added 2 commits May 20, 2020 22:11

add numpy to docs requirements

140fd2d

increase test coverage

2aca170

Merge pull request #2 from rabitt/pseeth-pr2

de89d97

patch 2

rabitt approved these changes May 21, 2020


rabitt merged commit bdb1058 into marl:master May 21, 2020

rabitt mentioned this pull request May 21, 2020

Use directly in memory, instead of via files, possible? #6

Closed
