Assemble Modules #8

Closed
nicholas-leonard opened this issue Jun 13, 2014 · 2 comments

This is a draft of future code that would assemble modules to create our deep mixture of experts model. It consists of three parametrized layers of gated experts, with two gaters, one for each layer of hidden neurons.

Hyperparameters

local nInput = 50
local hiddenBlockSizeA = 70
local nExpertA = 100           -- number of expert blocks in layer A
local hiddenBlockSizeB = 70
local nExpertB = 100           -- number of expert blocks in layer B
local nOutput = 50
local sparsityFactor = 0.1     -- fraction of expert blocks active per example

First Layer

Input is dense, output is sparse.

Gater A

local gaterA = nn.Sequential()
gaterA:add(nn.Linear(nInput, nExpertA))    -- one gate activation per expert block
gaterA:add(nn.NoisyReLU(sparsityFactor))
gaterA:add(nn.SortFilter(sparsityFactor))  -- keep only the top sparsityFactor fraction of experts
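
As a quick sanity check, a forward pass through this gater on a dense batch might look as follows. This is only a sketch: it assumes the draft modules above exist and that nn.SortFilter outputs a table {indices, scales} identifying the selected expert blocks, as described further down.

local batchSize = 32
local input = torch.randn(batchSize, nInput)  -- dense input batch
local gated = gaterA:forward(input)           -- assumed to be {indices, scales}
local indices, scales = gated[1], gated[2]    -- roughly batchSize x (sparsityFactor * nExpertA) each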

Mixture of experts A

local concatA = nn.ConcatTable() -- outputs a table of tensors
concatA:add(nn.Identity()) -- forwards input as is
concatA:add(gaterA) -- forwards gated expert indices

local mixtureA = nn.Sequential()
mixtureA:add(concatA)
mixtureA:add(nn.BlockSparse(nInput, hiddenBlockSizeA, 1, nExpertA)) -- experts
mixtureA:add(nn.Tanh())

The input and output of nn.BlockSparse are each a table of 3 tensors:
{activation, {indices, scales}}
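
For concreteness, the sparse table coming out of mixtureA might be laid out as below. The shapes are an assumption about the draft nn.BlockSparse, with k = sparsityFactor * nExpertA active expert blocks per example:

local batchSize = 32
local k = sparsityFactor * nExpertA                             -- 0.1 * 100 = 10 active blocks
local activation = torch.randn(batchSize, k, hiddenBlockSizeA)  -- one hidden block per active expert
local indices = torch.LongTensor(batchSize, k)                  -- which expert blocks are active
local scales = torch.rand(batchSize, k)                         -- gater weights for those blocks
local sparseOutput = {activation, {indices, scales}}            -- the table passed between layers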

Second Layer

Input and output are sparse.

Gater B
The input to gater B is sparse, so we use an nn.BlockSparse instead of an nn.Linear:

local gaterB = nn.Sequential()
gaterB:add(nn.BlockSparse(hiddenBlockSizeA, nExpertB, nExpertA, 1)) -- sparse to dense
gaterB:add(nn.NoisyReLU(sparsityFactor))
gaterB:add(nn.SortFilter(sparsityFactor))

Mixture of experts B

local concatB = nn.ConcatTable() -- outputs a table of tensors
concatB:add(nn.Identity()) -- forwards the input to the output
concatB:add(gaterB)

local mixtureB = nn.Sequential()
mixtureB:add(concatB)
mixtureB:add(nn.BlockSparse(hiddenBlockSizeA, hiddenBlockSizeB, nExpertA, nExpertB))
mixtureB:add(nn.Tanh())

The input of this nn.BlockSparse is a nested table of 5 tensors:
{{activation, {indices, scales}}, {indices, scales}}
The output is again a table of 3 tensors:
{activation, {indices, scales}}
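
Continuing the shape sketch from the first layer (and reusing batchSize and sparseOutput from it), the nested table handed to this nn.BlockSparse would pair layer A's sparse output with gater B's selection. Again, the shapes are only an assumption about the draft interface:

local kB = sparsityFactor * nExpertB               -- active expert blocks chosen by gater B
local indicesB = torch.LongTensor(batchSize, kB)
local scalesB = torch.rand(batchSize, kB)
local inputB = {sparseOutput, {indicesB, scalesB}} -- {{activation, {indices, scales}}, {indices, scales}}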

Third Layer

Input is sparse, output is dense.

Mixture of experts C
The input of the next nn.BlockSparse is a table of 3 tensors:
{activation, {indices, scales}}

local mixtureC = nn.Sequential()
mixtureC:add(nn.BlockSparse(hiddenBlockSizeB, nOutput, nExpertB, 1)) -- sparse to dense
mixtureC:add(nn.Tanh())

The output of this last nn.BlockSparse is a dense tensor of activations.

Stack Mixtures

Stack the 3 mixture-of-experts layers.

local mlp = nn.Sequential()
mlp:add(mixtureA)
mlp:add(mixtureB)
mlp:add(mixtureC)
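
Assuming all of the draft modules above fall into place, using the stacked model would then look like ordinary nn training code. The criterion, target and learning rate below are just illustrative placeholders:

local batchSize = 32
local input = torch.randn(batchSize, nInput)   -- dense input batch
local output = mlp:forward(input)              -- dense batchSize x nOutput activations
local criterion = nn.MSECriterion()            -- any criterion would do here
local target = torch.randn(batchSize, nOutput)
local err = criterion:forward(output, target)
mlp:zeroGradParameters()
mlp:backward(input, criterion:backward(output, target))
mlp:updateParameters(0.01)                     -- plain SGD step with an assumed learning rate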
@Alireza-Chakeri

Is there any way to stack two mixtures of experts?
