Add model Wav2Letter #462

tomassosorio · 2020-03-11T17:58:38Z

Add documentation for the models module
Add documentation for the Wav2Letter model
Add a unit test for Wav2Letter
Add the Wav2Letter model, following the wav2letter paper

Relates to #446

cpuhrsch · 2020-03-12T01:12:40Z

test/test_models.py

+
+
+class ModelTester(unittest.TestCase):
+    def test_wav2letter(self):


Should we extend these tests further (i.e. specific input / output tests for fixed weights) or is there little value in it?

From what I saw in torchvision, they had a problem with extensive tests slowing down Travis.

The test I added checks that the model emits one letter every 20 ms. Since the sample rate they use is 16000 Hz, 20 ms corresponds to 320 samples. However, because of the padding at the 10th layer, the model produces 1 extra letter as output.
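The frame arithmetic behind this test can be sketched as follows. The helper name `expected_num_letters` and its `extra` parameter are illustrative and not part of the PR; the sketch assumes the input length is a multiple of the frame size and takes the single extra output from the description above.

```python
SAMPLE_RATE_HZ = 16000  # sample rate mentioned above
FRAME_MS = 20           # one letter per 20 ms

# Number of waveform samples covered by one output letter.
samples_per_letter = SAMPLE_RATE_HZ * FRAME_MS // 1000


def expected_num_letters(input_length: int, extra: int = 1) -> int:
    """Hypothetical helper: letters expected for `input_length` samples.

    `extra` models the one additional output attributed to padding above.
    """
    return input_length // samples_per_letter + extra


print(samples_per_letter)         # 320
print(expected_num_letters(320))  # 2
```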

I used "same" padding with the formula padding = ceil((kernel - stride) / 2), due to a wav2letter++ issue.

I used "same" padding with the formula padding = ceil((kernel - stride) / 2), due to a wav2letter++ issue.

I feel like this should be mentioned in the documentation.
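As a minimal sketch of the padding formula discussed in this thread: the function names below are hypothetical, and the output-length expression is the standard one for a 1-D convolution with dilation 1. The pair kernel_size=48, stride=2 appears in this PR's failing-test log; the other kernel sizes are illustrative assumptions.

```python
import math


def same_padding(kernel_size: int, stride: int) -> int:
    """Padding from the formula above: ceil((kernel - stride) / 2)."""
    return math.ceil((kernel_size - stride) / 2)


def conv1d_output_length(length: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output length of a 1-D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


# kernel_size=48, stride=2: padding 23, and the length is halved.
print(same_padding(48, 2))                      # 23
print(conv1d_output_length(320, 48, 2, 23))     # 160

# Odd kernel with stride 1: the length is preserved.
print(conv1d_output_length(100, 7, 1, same_padding(7, 1)))    # 100

# Even kernel with stride 1: ceil rounds the padding up, adding one output.
print(conv1d_output_length(100, 32, 1, same_padding(32, 1)))  # 101
```

The last case illustrates how an even kernel with stride 1 can yield one extra output position under this formula.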

cpuhrsch · 2020-03-12T01:13:30Z

As a generic reminder: Since this creates a whole new top-level folder and design (collection of models), I think it's worth pausing to make sure we're not setting ourselves up for some hard-to-reverse decisions down the road.

tomassosorio

As a generic reminder: Since this creates a whole new top-level folder and design (collection of models), I think it's worth pausing to make sure we're not setting ourselves up for some hard-to-reverse decisions down the road.

I totally agree. I suggested this in issue #446, and since there were no blockers I went ahead with a PR, but we can discuss further whether it would be beneficial and, if so, how it should be implemented.

vincentqb · 2020-03-13T14:41:00Z

As a generic reminder: Since this creates a whole new top-level folder and design (collection of models), I think it's worth pausing to make sure we're not setting ourselves up for some hard-to-reverse decisions down the road.

Yes, we need to make sure we think through the interface correctly. This provides a nice forcing function for us to do so. :)

vincentqb · 2020-04-02T15:43:54Z

test/test_models.py

+
+
+class ModelTester(unittest.TestCase):
+    def test_wav2letter(self):


I used "same" padding with the formula padding = ceil((kernel - stride) / 2), due to a wav2letter++ issue.

I feel like this should be mentioned in the documentation.

vincentqb

Overall, this looks good to me. Do we have other models we could consider implementing? This would help thinking about whether the interface is general enough while still being simple.

One open point is to ensure correctness when someone suggests a new model. The tests added here are very limited, and testing that the implementation is right is not obvious.

vincentqb · 2020-04-02T15:51:26Z

torchaudio/models/wav2letter.py

+            x (torch.Tensor): Tensor of dimension (batch_size, n_features, input_length).
+
+        Returns:
+            torch.Tensor: Predictor tensor of dimension (input_length, batch_size, number_of_classes).


If we were initializing the module lazily, we could infer the number of features. Maybe a factory function could help with that? I don't see it done with torchvision though, so I won't be advocating this for this PR. :)
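One way to picture the factory-function idea, without relying on any torchaudio API, is sketched below. `ModelConfig` and `config_from_example` are hypothetical names introduced only for illustration, and `num_classes=40` is an arbitrary value; the sketch only shows how the number of features could be read from an example input shape instead of being passed explicitly.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModelConfig:
    """Hypothetical stand-in for a model's constructor arguments."""
    n_features: int
    num_classes: int


def config_from_example(example_shape: Sequence[int], num_classes: int) -> ModelConfig:
    """Hypothetical factory: infer n_features from an example input shape.

    The shape is expected to be (batch_size, n_features, input_length).
    """
    if len(example_shape) != 3:
        raise ValueError("expected (batch_size, n_features, input_length)")
    _, n_features, _ = example_shape
    return ModelConfig(n_features=n_features, num_classes=num_classes)


config = config_from_example((2, 1, 320), num_classes=40)
print(config.n_features)  # 1
```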

vincentqb · 2020-04-02T16:22:23Z

One open point is to ensure correctness when someone suggests a new model. The tests added here are very limited, and testing that the implementation is right is not obvious.

Is there an invariant that could be verified? Say, that the shape remains the same at each layer?

We could also think of an optional test that runs a simple convergence test on a standard dataset. This could get expensive though.
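The shape invariant suggested above could be sketched along these lines. The layer list is made up for illustration and is not the architecture in this PR, and both function names are hypothetical; the check simply propagates a length through (kernel_size, stride, padding) triples using the standard 1-D convolution length formula with dilation 1.

```python
from typing import List, Tuple

Layer = Tuple[int, int, int]  # (kernel_size, stride, padding)

# Illustrative layers only, not the PR's architecture.
LAYERS: List[Layer] = [(7, 1, 3), (5, 1, 2), (3, 1, 1)]


def output_lengths(input_length: int, layers: List[Layer]) -> List[int]:
    """Length after each 1-D convolution (dilation 1), in order."""
    lengths = []
    length = input_length
    for kernel_size, stride, padding in layers:
        length = (length + 2 * padding - kernel_size) // stride + 1
        lengths.append(length)
    return lengths


def check_length_preserved(input_length: int, layers: List[Layer]) -> bool:
    """Invariant: every layer returns the same length it received."""
    return all(n == input_length for n in output_lengths(input_length, layers))


print(check_length_preserved(100, LAYERS))                  # True
print(check_length_preserved(100, LAYERS + [(32, 1, 16)]))  # False
```

A check like this is cheap to run, but it only verifies shapes; it says nothing about whether the learned function is correct.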

vincentqb · 2020-04-20T23:14:41Z

The test is failing:

_________________________ ModelTester.test_wav2letter __________________________
self = <test_models.ModelTester testMethod=test_wav2letter>
    def test_wav2letter(self):
        batch_size = 2
        n_features = 1
        input_length = 320
    
        model = Wav2Letter()
        x = torch.rand(batch_size, n_features, input_length)
>       out = model(x)
test/test_models.py:15: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/testenv/lib/python3.7/site-packages/torch/nn/modules/module.py:532: in __call__
    result = self.forward(*input, **kwargs)
torchaudio/models/wav2letter.py:71: in forward
    x = self.acoustic_model(x)
../../../miniconda3/envs/testenv/lib/python3.7/site-packages/torch/nn/modules/module.py:532: in __call__
    result = self.forward(*input, **kwargs)
../../../miniconda3/envs/testenv/lib/python3.7/site-packages/torch/nn/modules/container.py:100: in forward
    input = module(input)
../../../miniconda3/envs/testenv/lib/python3.7/site-packages/torch/nn/modules/module.py:532: in __call__
    result = self.forward(*input, **kwargs)
../../../miniconda3/envs/testenv/lib/python3.7/site-packages/torch/nn/modules/container.py:100: in forward
    input = module(input)
../../../miniconda3/envs/testenv/lib/python3.7/site-packages/torch/nn/modules/module.py:532: in __call__
    result = self.forward(*input, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = Conv1d(1, 250, kernel_size=(48,), stride=(2,), padding=(23,))
input = tensor([[[0.0000, 0.0000],
         [0.0000, 0.0000],
         [0.0000, 0.0000],
         [0.0000, 0.0000],
         [....0000],
         [0.0000, 0.0973],
         [0.3981, 0.2518],
         [0.4468, 0.1599]]], grad_fn=<AsStridedBackward>)
    def forward(self, input):
        if self.padding_mode == 'circular':
            expanded_padding = ((self.padding[0] + 1) // 2, self.padding[0] // 2)
            return F.conv1d(F.pad(input, expanded_padding, mode='circular'),
                            self.weight, self.bias, self.stride,
                            _single(0), self.dilation, self.groups)
        return F.conv1d(input, self.weight, self.bias, self.stride,
>                       self.padding, self.dilation, self.groups)
E       RuntimeError: Given groups=1, weight of size 250 1 48, expected input[2, 250, 2] to have 1 channels, but got 250 channels instead
../../../miniconda3/envs/testenv/lib/python3.7/site-packages/torch/nn/modules/conv.py:202: RuntimeError

vincentqb · 2020-04-20T23:27:04Z

Overall, this looks very good to me, and serves as a great template for other models to come :)

tomassosorio · 2020-04-21T12:31:54Z

Overall, this looks very good to me, and serves as a great template for other models to come :)

Thanks! I will try to take a look today. I have not had the time yet :/

tomassosorio · 2020-04-27T15:44:52Z

@vincentqb Sorry for the delay, I was a bit busy.
If you could take a look, I would appreciate it :)

I used pytest instead of unittest since, from what I saw, torchaudio seems to be moving towards pytest.

I also renamed the parameter version to input_type, since the same model can have different versions. However, might wave_type or wave_shape be a better name than input_type?

vincentqb

LGTM. Thanks for setting this up, and looking forward to adding more models :)

We haven't yet made a final decision about pytest. Since the tests here are easy to convert either way, I will merge anyway.

tomassosorio added 5 commits March 11, 2020 15:55

add wav2letter model

c7d6ece

add unit_test to model

17b7078

add docstrings

48c312b

add documentation

aea6fad

fix minor error, change logic on forward

5b68bab

cpuhrsch reviewed Mar 12, 2020

cpuhrsch reviewed Mar 12, 2020

update padding same with ceil

064e923

tomassosorio commented Mar 12, 2020

vincentqb reviewed Apr 2, 2020

vincentqb suggested changes Apr 2, 2020

tomassosorio added 4 commits April 3, 2020 14:54

Merge branch 'master' into addModelWav2Letter

0a7f0bf

add inline typing and minor fixes to docstrings

390e176

remove python2

48df4fb

add formula do docstrings, change param name

2fc356b

vincentqb reviewed Apr 20, 2020

tomassosorio added 4 commits April 21, 2020 14:05

Merge branch 'master' into addModelWav2Letter

895a2e6

Merge branch 'master' into addModelWav2Letter

d4fb114

add test with mfcc, add pytest

34785ec

fix bug, update docstrings

63def38

tomassosorio requested a review from vincentqb April 27, 2020 15:44

change parameter name

9f9e79d

vincentqb approved these changes Apr 28, 2020

vincentqb merged commit d678357 into pytorch:master Apr 28, 2020

tomassosorio deleted the addModelWav2Letter branch April 28, 2020 21:34

discort mentioned this pull request Mar 19, 2021

Add vanilla DeepSpeech model #1399

Merged
