This is a draft of future code that would assemble modules to create our deep mixture of experts model. It consists of three parametrized layers of gated experts, with two gaters, one for each layer of hidden neurons.
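Hyper parameters

The draft does not specify its hyper parameters, so the values below are placeholder assumptions, included only to make the snippets self-contained:

```lua
require 'nnx' -- assuming the experimental modules (BlockSparse, NoisyReLU, SortFilter) would live here

-- placeholder sizes (assumptions, not from the draft)
local nInput = 784           -- size of the dense input vector
local nOutput = 10           -- size of the dense output vector
local nExpertA = 32          -- number of experts in the first hidden layer
local nExpertB = 32          -- number of experts in the second hidden layer
local hiddenBlockSizeA = 64  -- hidden units per first-layer expert
local hiddenBlockSizeB = 64  -- hidden units per second-layer expert
local sparsityFactor = 0.1   -- fraction of experts each gater keeps active
```

First Layer
Input is dense, output is sparse.

Gater A
The draft omits the A gater. Its input is dense, so by analogy with gater B below it could plausibly use an nn.Linear in place of the nn.BlockSparse; this is a sketch, not code from the draft:

```lua
local gaterA = nn.Sequential()
gaterA:add(nn.Linear(nInput, nExpertA)) -- dense to dense
gaterA:add(nn.NoisyReLU(sparsityFactor))
gaterA:add(nn.SortFilter(sparsityFactor)) -- selects the top experts: {indices, scales}
```

Mixture of experts A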
```lua
local concatA = nn.ConcatTable() -- outputs a table of tensors
concatA:add(nn.Identity()) -- forwards input as is
concatA:add(gaterA) -- forwards gated expert indices

local mixtureA = nn.Sequential()
mixtureA:add(concatA)
mixtureA:add(nn.BlockSparse(nInput, hiddenBlockSizeA, 1, nExpertA)) -- experts
mixtureA:add(nn.Tanh())
```
The input and output of nn.BlockSparse are each a table of 3 tensors: {activation, {indices, scales}}
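To make that structure concrete, here is one plausible reading of the table for a batch; the exact tensor shapes are an assumption, since the draft does not specify them:

```lua
-- hypothetical shapes for a batch of 128 examples, where the gater keeps
-- k = sparsityFactor * nExpertA experts active per example:
-- outputA[1]    : activation, 128 x k x hiddenBlockSizeA (active blocks only)
-- outputA[2][1] : indices, 128 x k, the experts selected by the gater
-- outputA[2][2] : scales, 128 x k, the mixture weight of each selected expert
local outputA = mixtureA:forward(torch.randn(128, nInput))
```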
Second Layer
Input and output are sparse.
Gater B
The input to the B gater is sparse, so we use an nn.BlockSparse instead of an nn.Linear:
```lua
local gaterB = nn.Sequential()
gaterB:add(nn.BlockSparse(hiddenBlockSizeA, nExpertB, nExpertA, 1)) -- sparse to dense
gaterB:add(nn.NoisyReLU(sparsityFactor))
gaterB:add(nn.SortFilter(sparsityFactor))
```
Mixture of experts B
```lua
local concatB = nn.ConcatTable() -- outputs a table of tensors
concatB:add(nn.Identity()) -- forwards the input to the output
concatB:add(gaterB) -- forwards gated expert indices

local mixtureB = nn.Sequential()
mixtureB:add(concatB)
mixtureB:add(nn.BlockSparse(hiddenBlockSizeA, hiddenBlockSizeB, nExpertA, nExpertB))
mixtureB:add(nn.Tanh())
```
Input of nn.BlockSparse is a multi-table of 5 tensors: {{activation, {indices, scales}}, {indices, scales}}
Output is again a multi-table of 3 tensors: {activation, {indices, scales}}
Third Layer
Input is sparse, output is dense.
Mixture of experts C
Input of the next nn.BlockSparse is a table of 3 tensors: {activation, {indices, scales}}
```lua
local mixtureC = nn.Sequential()
mixtureC:add(nn.BlockSparse(hiddenBlockSizeB, nOutput, nExpertB, 1)) -- sparse to dense
mixtureC:add(nn.Tanh())
```
Output is a dense tensor of activations.
Stack Mixtures
Stack the 3 mixtures of experts layers.
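The draft stops before the stacking code. A minimal sketch, assuming the three mixtures defined above chain directly, since each layer's output table is exactly what the next layer's nn.ConcatTable expects:

```lua
-- stack the three mixtures of experts into one deep model
local dmoe = nn.Sequential()
dmoe:add(mixtureA) -- dense in, sparse out
dmoe:add(mixtureB) -- sparse in, sparse out
dmoe:add(mixtureC) -- sparse in, dense out

-- hypothetical usage: forward a batch of 128 dense inputs
local input = torch.randn(128, nInput)
local output = dmoe:forward(input) -- 128 x nOutput tensor of dense activations
```

No glue modules should be needed because each mixture already bundles its gater via nn.ConcatTable, so the {indices, scales} bookkeeping travels through the stack alongside the activations.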