Questions About RNN Pool #216

@snakers4

Description

Hi,

My name is Alexander; my team mostly works on high-quality, fast speech recognition. We mostly use strided 1D convolutions and Transformer modules, with progressive strides (i.e. 2 - 2 - 2) in our models. I have tried overall strides of 2, 4, and 8, and 8 was better (on a limited compute budget, of course), but I have not looked into other configurations, like 2 - 4 for example (very similar to an example you show in your blog post).
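For context, here is a minimal sketch of what I mean by progressive strides: three strided 1D convolutions with stride 2 each, giving an overall stride of 2 * 2 * 2 = 8 along the time axis. The channel sizes and kernel sizes are just placeholders, not our actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: progressive strides 2 - 2 - 2, overall stride 8.
encoder = nn.Sequential(
    nn.Conv1d(64, 128, kernel_size=3, stride=2, padding=1),
    nn.Conv1d(128, 256, kernel_size=3, stride=2, padding=1),
    nn.Conv1d(256, 512, kernel_size=3, stride=2, padding=1),
)

x = torch.randn(8, 64, 1600)   # (batch, channels, time)
y = encoder(x)
print(y.shape)                 # time axis shrinks 8x: 1600 -> 200
```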

As you noted in your blog post, having an expressive stride module is of utmost importance. Your post made me think about improving our models further, and I took the liberty of briefly looking through your code.

Though the idea itself is fairly simple (just apply an RNN to a batched input; in the 1D case it is even easier), a series of questions arose after looking at your code, because it departs heavily from standard PyTorch practices. So I would like to know whether each of the following is a bug or a feature, so to speak:
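To make sure we are talking about the same idea: in the 1D case I picture something like the sketch below, where a window of time steps is pooled into a single vector by running a GRU over it and keeping the last hidden state. This is only my reading of the concept, not your implementation.

```python
import torch
import torch.nn as nn

class RNNPool1d(nn.Module):
    """Hypothetical sketch: pool a window of T time steps into one
    vector via the final hidden state of a GRU run over the window."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, time, features) -> (batch, hidden_dim)
        _, h_n = self.rnn(x)
        return h_n.squeeze(0)

pool = RNNPool1d(32, 16)
x = torch.randn(4, 8, 32)      # 8 time steps per window
out = pool(x)
print(out.shape)               # torch.Size([4, 16])
```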

(0)
The module contains .to(torch.device("cuda")), but it does not inherit from FastGRNNCUDA (it inherits from FastGRNN).
This seems a bit strange, so why is that the case?

(1)
You use .to(torch.device("cuda")), though it is standard practice in PyTorch to write device-agnostic code. Does this imply that:

  • This code is NOT meant for multi-node (or multi-device) parallelization (e.g. DP, DDP)?
  • This code is NOT meant to be run later for x86 inference (quantized or pruned)?
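By device-agnostic code I mean the usual idiom of deriving the device from the input (or from the module's own parameters) rather than hard-coding torch.device("cuda"). A toy module as illustration (the class here is my own, purely for the example):

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Toy module illustrating device-agnostic style: no hard-coded
    device anywhere; tensors created in forward follow the input."""
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # New tensors are placed on the input's device explicitly.
        bias = torch.zeros(x.shape[-1], device=x.device)
        return x * self.weight + bias

m = Scale(4)                   # works unchanged on CPU, CUDA, DP/DDP
x = torch.randn(2, 4)
out = m(x)
print(out.shape)               # torch.Size([2, 4])
```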

(2)
I saw some pruning and low-rank snippets in your code, but low-rank does not seem to be used in RNNPool.
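By low-rank I mean the usual trick of factorizing a full weight matrix W (h x h) into two thin factors U (h x r) and V (r x h), cutting parameters from h*h to 2*h*r when r << h. A quick illustration of the parameter savings (my own toy numbers, not taken from your code):

```python
import torch
import torch.nn as nn

# Full hidden-to-hidden map vs. a rank-r factorization of it.
h, r = 64, 8
full = nn.Linear(h, h, bias=False)          # 64 * 64 = 4096 params
low_rank = nn.Sequential(
    nn.Linear(h, r, bias=False),            # V: 64 * 8 = 512
    nn.Linear(r, h, bias=False),            # U: 8 * 64 = 512
)

n_full = sum(p.numel() for p in full.parameters())
n_lr = sum(p.numel() for p in low_rank.parameters())
print(n_full, n_lr)                         # 4096 1024
```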

(3)
I took a quick look at RNNCell and FastGRNNCell. Apart from some utilities for model-size estimation, seemingly unused low-rank options and pruning, and parameter initialization, I cannot really see why you implemented these classes from scratch instead of just using the standard PyTorch ones. Is there some reasoning behind it?
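For instance, custom parameter initialization does not by itself require a from-scratch cell; one can step the stock nn.GRUCell manually over time and re-initialize its weights. A minimal sketch of what I would have expected (assuming a plain GRU stands in for the custom cell):

```python
import torch
import torch.nn as nn

# Stock PyTorch cell with custom init, stepped manually over time.
cell = nn.GRUCell(input_size=32, hidden_size=16)
nn.init.orthogonal_(cell.weight_hh)   # custom param init on a stock cell

x = torch.randn(4, 10, 32)            # (batch, time, features)
h = torch.zeros(4, 16)
for t in range(x.shape[1]):
    h = cell(x[:, t], h)
print(h.shape)                        # torch.Size([4, 16])
```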

(4)

The network is composed of a base VGG network followed by the
    added multibox conv layers.  Each multibox layer branches into

You seem to apply RNNPool to a VGG encoder. VGG is usually slow and large; is there any reason you do not apply this scheme to a MobileNet?
