Add NeuralBlock replication functionality #79
Conversation
Nice work! I saw that by default you combine the connections by summing them up. This can work only if the dimensionalities are the same. A more general combination function would be one that: 1- if the tensors have the same dimensionality, sums them directly; 2- if the tensors don't have the same dimensionality, performs a simple linear transformation before combining them.
I'm not sure it's best to do this automatically. Perhaps we could provide an interface where the user could specify what happens, e.g. an additional parameter of a function that is applied to the shortcut.
Yes, makes sense
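The interface suggested above could look something like the following minimal sketch. The names `apply_shortcut` and `shortcut_fn` are hypothetical, chosen for illustration; they are not SpeechBrain's actual API.

```python
def apply_shortcut(x, shortcut, shortcut_fn=None):
    """Combine a block's output with its shortcut connection.

    shortcut_fn is the user-specified hook discussed above: a function
    applied to (x, shortcut) that decides how the two are combined.
    Name and signature are illustrative assumptions, not the real API.
    """
    if shortcut_fn is not None:
        return shortcut_fn(x, shortcut)
    # Default behavior: plain summation, which assumes the two
    # tensors already have the same dimensionality.
    return x + shortcut
```

A user who needs a projection for mismatched dimensionalities could then pass e.g. `shortcut_fn=lambda x, s: x + proj(s)`, where `proj` is a linear layer they construct and own, rather than having the framework insert one silently.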
…On Thu, 14 May 2020 at 17:46, Peter Massey-Plantinga wrote:
> 2- If the tensors don't have the same dimensionality, we perform a simple linear transformation before combining them
> I'm not sure it's best to do this automatically. Perhaps we could provide an interface where the user could specify what happens, e.g. an additional parameter of a function that is applied to the shortcut.
Just a quick comment: I found NeuralBlock way more explicit than Nerve. Every single deep learning researcher will know what a NeuralBlock is expected to do... but Nerve? :s
Hey thanks @TParcollet, this is valuable feedback. Of course we want to make SpeechBrain as clear and explicit as possible. I think for the things that have creative names (i.e. Brain, Lobe, Nerve), the reason to use these rather than a more "explicit" name is that they may not have an easy equivalent in the literature. This does mean more work for the user to understand how they work, but in these cases it may be necessary. In the case of Nerve, it currently reads the yaml to construct a block, which is pretty distinct from most modern machine learning systems. Perhaps one way to make Nerve more explicit is to actually keep NeuralBlock as a separate module, and only call Nerve when we want a sequence of NeuralBlocks of the same type. We could also rename it "Replicate" or something similar. Would love to hear your feedback, Titouan, as to whether we need this replicate functionality and if you think there's a better way to do it.
I understand better now. Nonetheless, I agree that this should have a more explicit name. Remaining in the "brain" concept is fun, but it shouldn't come at the cost of ease of use. If this function will often be called, then why not build its usage into its name? It replicates a block from a yaml file? Then ReplicateBlock?
It seems this functionality will be useful, as it's already used in two lobes.
I have one minor docstring comment that should be addressed, but otherwise I support merging.
We should also note that this establishes "Block" as a term meaning a logical unit of neural computation (between a layer and a network). That terminology has been used elsewhere too, so that's fine by me; we should just be sure we use the term consistently. I've tried to highlight this in another comment.
Also, you may consider "ReplicatedBlock" or "ReplicatingBlock" as well. "ReplicaBlock" for that Blade Runner feel. EDIT: Peter has pointed out to me that it is ReplicantBlock that I want.
Peter, did you try adding some kind of shortcut connection and seeing what happens to the performance of our TIMIT_CTC example?
I would not expect to see much difference when there's only two blocks, but I guess I can try. |
@pplantinga do you want me to help to do some tests with shortcuts connections? |
I did some full TIMIT experiments with residual, skip, and dense connections (with 4 DNN blocks rather than just two), and, as expected, the performance doesn't change. |
In general, I think it's much easier for users to just specify the combination with a string, e.g. combine_type="sum", "add", "diff", "avg", "concat", or "linear". These are the types users will use most frequently. If they want to specify their own combination function, we can still support a user-customized function as we do now. It's a bit annoying to add a combination function in the lobe just to switch from addition to difference.
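A minimal sketch of this string-based interface; the name `get_combine_fn` and the exact dispatch mechanism are assumptions for illustration, not the proposed implementation:

```python
def get_combine_fn(combine_type):
    """Map a combine_type string to a combination function.

    A callable passes through untouched, preserving support for
    user-customized combination functions as suggested above.
    (Sketch only; "concat" and "linear" would need tensor-aware
    implementations and are omitted here.)
    """
    if callable(combine_type):
        return combine_type
    combine_fns = {
        "sum": lambda a, b: a + b,
        "diff": lambda a, b: a - b,
        "avg": lambda a, b: (a + b) / 2,
    }
    if combine_type not in combine_fns:
        raise ValueError(f"Unsupported combine_type: {combine_type!r}")
    return combine_fns[combine_type]
```

Because the lambdas use only overloaded operators, the same dispatcher would work unchanged on torch tensors.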
Also, we should raise an error if the shortcut names are not "residual", "skip", or "dense". In the current version, if you add a random name it doesn't complain. |
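A hedged sketch of that check; the parameter name `shortcut_type` and the helper name are assumptions, not the existing code:

```python
VALID_SHORTCUT_TYPES = ("residual", "skip", "dense")

def check_shortcut_type(shortcut_type):
    # Fail loudly on a typo instead of silently building a model
    # with no shortcut connections at all.
    if shortcut_type not in VALID_SHORTCUT_TYPES:
        raise ValueError(
            f"shortcut type must be one of {VALID_SHORTCUT_TYPES}, "
            f"got {shortcut_type!r}"
        )
```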
When I run residual connections with "dnn_blocks=1" it raises an error:
This is probably due to the fact that we decided to not support shortcut connections in the first block. |
This replaces the `NeuralBlock` class with a slightly cleaned-up version, as well as an additional `ReplicateBlock` class for replication. Remaining tasks: `difference` and `average` combine functions.