Add model Wav2Letter #462
Conversation
test/test_models.py (Outdated)

    class ModelTester(unittest.TestCase):
        def test_wav2letter(self):
Should we extend these tests further (i.e. specific input / output tests for fixed weights) or is there little value in it?
From what I saw in torchvision, they had problems doing extensive tests because it slowed down Travis.
The test I added checks that the model gives one letter every 20 ms: since the sample rate they use is 16000 Hz, 20 ms corresponds to 320 samples. However, because of the padding at the 10th layer, the model adds one extra letter to the output.
I used 'same' padding with the formula padding = ceil((kernel - stride) / 2), following a wav2letter++ issue.
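The off-by-one described above can be sketched with the standard Conv1d output-length formula. The layer shapes below are hypothetical, not the actual Wav2Letter configuration, and the padding rule is my reading of the formula quoted in this comment:

```python
import math

def same_padding(kernel, stride):
    # The padding rule quoted above, read as ceil((kernel - stride) / 2).
    return math.ceil((kernel - stride) / 2)

def conv1d_out_len(l_in, kernel, stride):
    # Standard Conv1d output-length formula with that padding applied.
    padding = same_padding(kernel, stride)
    return (l_in + 2 * padding - kernel) // stride + 1

# Hypothetical layer shapes: a strided layer roughly halves the length ...
print(conv1d_out_len(320, 48, 2))  # 160
# ... while an even kernel with stride 1 yields one extra output frame,
# which is the kind of off-by-one extra letter the comment describes.
print(conv1d_out_len(320, 32, 1))  # 321
```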
> I used 'same' padding with the formula padding = ceil((kernel - stride) / 2), following a wav2letter++ issue.

I feel like this should be mentioned in the documentation.
As a generic reminder: since this creates a whole new top-level folder and design (a collection of models), I think it's worth pausing to make sure we're not setting ourselves up for some hard-to-reverse decisions down the road.
> As a generic reminder: since this creates a whole new top-level folder and design (a collection of models), I think it's worth pausing to make sure we're not setting ourselves up for some hard-to-reverse decisions down the road.

I totally agree. I suggested this in issue #446, and since there were no blockers I proceeded with a PR, but we can discuss further whether it would be beneficial and, if so, how it should be implemented.
Yes, we need to make sure we think through the interface correctly. This provides a nice forcing function for us to do so. :)
Overall, this looks good to me. Do we have other models we could consider implementing? That would help us think about whether the interface is general enough while still being simple.
One open point is ensuring correctness when someone suggests a new model. The tests added here are very limited, and testing that the implementation is right is not obvious.
torchaudio/models/wav2letter.py (Outdated)

        x (torch.Tensor): Tensor of dimension (batch_size, n_features, input_length).

    Returns:
        torch.Tensor: Predictor tensor of dimension (input_length, batch_size, number_of_classes).
If we were initializing the module lazily, we could infer the number of features. Maybe a factory function could help with that? I don't see it done in torchvision though, so I won't advocate for it in this PR. :)
Is there an invariant that could be verified? Say, that the shape remains the same at each layer? We could also think of an optional test that runs a simple convergence test on a standard dataset. This could get expensive, though.
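One cheap invariant along these lines, sketched here under assumed layer shapes (not the real Wav2Letter configuration): with the ceil((kernel - stride) / 2) padding rule discussed in this PR, every stride-1 layer with an odd kernel preserves the sequence length exactly, so the length can only change at strided or even-kernel layers.

```python
import math

def out_len(l_in, kernel, stride):
    # Conv1d output length under the padding rule discussed in this PR,
    # read as padding = ceil((kernel - stride) / 2).
    padding = math.ceil((kernel - stride) / 2)
    return (l_in + 2 * padding - kernel) // stride + 1

# Invariant: odd-kernel, stride-1 layers keep the length unchanged.
for kernel in (3, 5, 7, 25):        # hypothetical kernel sizes
    for l_in in (100, 320, 1000):   # arbitrary input lengths
        assert out_len(l_in, kernel, 1) == l_in
```

A test like this is essentially free compared to a convergence run, and it localizes shape bugs to the specific layer that breaks the invariant.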
Test is failing:
Overall, this looks very good to me, and serves as a great template for other models to come :)
Thanks! I will try to take a look today! Did not have the time yet :/
@vincentqb Sorry for the delay, I was a bit busy. I used pytest instead of unittest since, from what I saw, torchaudio is moving towards pytest. I also changed the parameter
LGTM. Thanks for setting this up, and looking forward to adding more models :)
We haven't yet made a final decision about pytest. Since the tests here are easy to convert either way, I will merge anyway.
Relates to #446