Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TorchScript-able "load" func to sox_io backend #731

Merged
merged 1 commit into from Jun 25, 2020

Conversation

mthrok
Copy link
Collaborator

@mthrok mthrok commented Jun 18, 2020

This is a part of PRs to add new "sox_io" backend. #726 and depends on #718 and #728 .

This PR adds load function to "sox_io" backend, which is tested on the following audio formats;

  • wav
  • mp3
  • flac
  • ogg/vorbis *

By default, "sox_io" backend returns Tensor with float32 dtype and the shape of [channel, time]. The samples are normalized to fit in the range of [-1.0, 1.0].

Unlike existing "sox" backend, the new load function can handle WAV file natively, when the input format is WAV with integer type, (such as 32-bit signed integer, 16-bit signed integer and 8-bit unsigned integer) by providing normalize=False, this function can return integer Tensor, where the samples are expressed within the whole range of the corresponding dtype, that is, int32 tensor for 32-bit PCM, int16 for 16-bit PCM and uint8 for 8-bit PCM. This behavior follows scipy.io.wavfile.read. normalize parameter has no effect for other formats and the load function always return normalized value with float32 Tensor.

* Note The current binary distribution of torchaudio does not contain ogg/vorbis and opus codecs. To handle these files, one needs to build torchaudio from the source with proper codecs in the system.

Note 2 Since this PR, scipy becomes required module for running test.

@codecov
Copy link

codecov bot commented Jun 19, 2020

Codecov Report

Merging #731 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #731      +/-   ##
==========================================
+ Coverage   89.21%   89.24%   +0.02%     
==========================================
  Files          32       32              
  Lines        2513     2519       +6     
==========================================
+ Hits         2242     2248       +6     
  Misses        271      271              
Impacted Files Coverage Δ
torchaudio/backend/sox_io_backend.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f0d0af...036bf8a. Read the comment docs.

@mthrok mthrok force-pushed the load branch 20 times, most recently from 99462ec to 500733f Compare June 23, 2020 18:02
@mthrok mthrok force-pushed the load branch 5 times, most recently from 6b1134b to 1c7af65 Compare June 23, 2020 19:25

script_path = self.get_temp_path('load_func')
torch.jit.script(py_load_func).save(script_path)
ts_load_func = torch.jit.load(script_path)
Copy link
Contributor

@vincentqb vincentqb Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woohoo!! :D

@mthrok mthrok marked this pull request as ready for review June 23, 2020 19:45
return tensor, signal_info.get_sample_rate()


load_wav = load
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woohoo!! :D

pass
elif tensor.dtype == torch.int32:
tensor = tensor.to(torch.float32)
tensor[tensor > 0] /= 2147483647.
Copy link
Contributor

@vincentqb vincentqb Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is there a way of fetching these numbers instead of hard coding them? It's in the tests, so I don't worry much about it though.

// So make sure to create a new copy after processing samples.
if (normalize || dtype == torch::kFloat32) {
t = t.to(torch::kFloat32);
t *= (t > 0) / 2147483647. + (t < 0) / 2147483648.;
Copy link
Contributor

@vincentqb vincentqb Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this one is not in the tests though :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? The combination of float32 and normalize=true|false are tested everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the 2147483647 :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is tested.

Copy link
Contributor

@vincentqb vincentqb Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something different: I have seen this hardcoded values appear in a few places in this PR (e.g. above). Most of them are in tests, so I don't worry much about them. This hardcoded value is not in a test file though, so I would have preferred if it were not hardcoded.

Copy link
Contributor

@vincentqb vincentqb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM!

nit: there's a C++ lint error :)

@@ -51,4 +54,17 @@ def gen_audio_file(
command += ['vol', f'-{attenuation}dB']
print(' '.join(command))
subprocess.run(command, check=True)
subprocess.run(['soxi', path], check=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why is this removed at this time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command above it, sox -V prints the same information, so it turned out redundant.

@mthrok
Copy link
Collaborator Author

mthrok commented Jun 25, 2020

nit: there's a C++ lint error :)

Yeah I was waiting for the approval first as I have six branches depend on this and had to rebase all of them.

@mthrok
Copy link
Collaborator Author

mthrok commented Jun 25, 2020

thanks!

@mthrok mthrok merged commit 793eeab into pytorch:master Jun 25, 2020
@mthrok mthrok deleted the load branch June 25, 2020 23:17
mthrok added a commit that referenced this pull request Jul 1, 2020
This is a part of PRs to add new "sox_io" backend. #726 and depends on #718, #728 and #731.

This PR adds `save` function to "sox_io" backend, which can save Tensor to a file with the following audio formats;
 - `wav`
 - `mp3`
 - `flac`
 - `ogg/vorbis`
mpc001 pushed a commit to mpc001/audio that referenced this pull request Aug 4, 2023
mpc001 pushed a commit to mpc001/audio that referenced this pull request Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants