This is a draft of future code that would assemble modules to create our deep mixture of experts model. It consists of three parametrized layers of gated experts, with two gaters, one for each layer of hidden neurons.
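Hyper parameters

The draft does not specify its hyper parameters, so the values below are placeholder assumptions, included only to make the snippets self-contained:

```lua
require 'nnx' -- assuming the experimental modules (BlockSparse, NoisyReLU, SortFilter) would live here

-- placeholder sizes (assumptions, not from the draft)
local nInput = 784           -- size of the dense input vector
local nOutput = 10           -- size of the dense output vector
local nExpertA = 32          -- number of experts in the first hidden layer
local nExpertB = 32          -- number of experts in the second hidden layer
local hiddenBlockSizeA = 64  -- hidden units per first-layer expert
local hiddenBlockSizeB = 64  -- hidden units per second-layer expert
local sparsityFactor = 0.1   -- fraction of experts each gater keeps active
```

First Layer
Input is dense, output is sparse.

Gater A
The draft omits the A gater. Its input is dense, so by analogy with gater B below it could plausibly use an nn.Linear in place of the nn.BlockSparse; this is a sketch, not code from the draft:

```lua
local gaterA = nn.Sequential()
gaterA:add(nn.Linear(nInput, nExpertA)) -- dense to dense
gaterA:add(nn.NoisyReLU(sparsityFactor))
gaterA:add(nn.SortFilter(sparsityFactor)) -- selects the top experts: {indices, scales}
```

Mixture of experts A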
```lua
local concatA = nn.ConcatTable() -- outputs a table of tensors
concatA:add(nn.Identity()) -- forwards input as is
concatA:add(gaterA) -- forwards gated expert indices

local mixtureA = nn.Sequential()
mixtureA:add(concatA)
mixtureA:add(nn.BlockSparse(nInput, hiddenBlockSizeA, 1, nExpertA)) -- experts
mixtureA:add(nn.Tanh())
```
The input and output of nn.BlockSparse are each a table of 3 tensors: {activation, {indices, scales}}
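To make that structure concrete, here is one plausible reading of the table for a batch; the exact tensor shapes are an assumption, since the draft does not specify them:

```lua
-- hypothetical shapes for a batch of 128 examples, where the gater keeps
-- k = sparsityFactor * nExpertA experts active per example:
-- outputA[1]    : activation, 128 x k x hiddenBlockSizeA (active blocks only)
-- outputA[2][1] : indices, 128 x k, the experts selected by the gater
-- outputA[2][2] : scales, 128 x k, the mixture weight of each selected expert
local outputA = mixtureA:forward(torch.randn(128, nInput))
```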
Second Layer
Input and output are sparse.
Gater B
The input to the B gater is sparse, so we use an nn.BlockSparse instead of an nn.Linear:
```lua
local gaterB = nn.Sequential()
gaterB:add(nn.BlockSparse(hiddenBlockSizeA, nExpertB, nExpertA, 1)) -- sparse to dense
gaterB:add(nn.NoisyReLU(sparsityFactor))
gaterB:add(nn.SortFilter(sparsityFactor))
```
Mixture of experts B
```lua
local concatB = nn.ConcatTable() -- outputs a table of tensors
concatB:add(nn.Identity()) -- forwards the input to the output
concatB:add(gaterB) -- forwards gated expert indices

local mixtureB = nn.Sequential()
mixtureB:add(concatB)
mixtureB:add(nn.BlockSparse(hiddenBlockSizeA, hiddenBlockSizeB, nExpertA, nExpertB))
mixtureB:add(nn.Tanh())
```
Input of nn.BlockSparse is a multi-table of 5 tensors: {{activation, {indices, scales}}, {indices, scales}}
Output is again a multi-table of 3 tensors: {activation, {indices, scales}}
Third Layer
Input is sparse, output is dense.
Mixture of experts C
Input of the next nn.BlockSparse is a table of 3 tensors: {activation, {indices, scales}}
```lua
local mixtureC = nn.Sequential()
mixtureC:add(nn.BlockSparse(hiddenBlockSizeB, nOutput, nExpertB, 1)) -- sparse to dense
mixtureC:add(nn.Tanh())
```
Output is a dense tensor of activations.
Stack Mixtures
Stack the 3 mixtures of experts layers.
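The draft stops before the stacking code. A minimal sketch, assuming the three mixtures defined above chain directly, since each layer's output table is exactly what the next layer's nn.ConcatTable expects:

```lua
-- stack the three mixtures of experts into one deep model
local dmoe = nn.Sequential()
dmoe:add(mixtureA) -- dense in, sparse out
dmoe:add(mixtureB) -- sparse in, sparse out
dmoe:add(mixtureC) -- sparse in, dense out

-- hypothetical usage: forward a batch of 128 dense inputs
local input = torch.randn(128, nInput)
local output = dmoe:forward(input) -- 128 x nOutput tensor of dense activations
```

No glue modules should be needed because each mixture already bundles its gater via nn.ConcatTable, so the {indices, scales} bookkeeping travels through the stack alongside the activations.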