
[WIP]Restricted Boltzmann Machine #1027

Closed
wants to merge 26 commits

Conversation

kris-singh (Contributor)

I have implemented:

  1. Base layer (visible and hidden layers)
  2. RBM (wrapper)
  3. CD-k algorithm (in progress)

I will add tests after the CD-k algorithm is added.

@kris-singh kris-singh changed the title [WIP}Restricted Boltzmann Machine [WIP]Restricted Boltzmann Machine Jun 14, 2017
@kris-singh (Contributor Author)

I am also confused about how to handle the biases of the hidden and visible layers. The visible layer uses the biases of the hidden layer. Right now every layer has two biases, ownBias and otherBias; the problem is that when we initialize the parameters of the layer, we have to swap how the parameters are initialized by the initializer.

@lozhnikov (Contributor) left a comment

I commented on the main issues. There are some other minor issues, but I think they'll be fixed once you solve the main ones.

typename InputDataType = arma::mat,
typename OutputDataType = arma::mat
>
class VisibleLayer
Contributor

I guess it is better to call the class BinaryRBMVisibleLayer or BinaryRBMLayer since the implementation assumes that the class is intended only for the binary RBM.

* @param output samples from the parameters
*/
template<typename eT>
void Sample(arma::Mat<eT>&& input, arma::Mat<eT>&& output);
Contributor

What is the difference between Sample() and Forward()? Why don't you want to merge these functions into one?

Contributor Author

If we look at the deeplearning.net example, the Forward() function computes the activation from the sigmoid layer, and Sample() uses the activations to sample the visible/hidden layer. I think we need both.

Contributor (@lozhnikov, Jun 15, 2017)

I guess Forward() can compute both the activation and samples.

Contributor Author

Well, in many cases you need the pre-activation outputs. I think we need both functions.

OutputDataType ownBias;

//! Locally-stored bias of the other layer.
OutputDataType otherBias;
Contributor

Hidden variables are sampled from P(h=1|v) = Sigm(W^T * v + c). The weight matrix W is shared between the visible and the hidden layers; in your notation it is denoted by weight. The bias c is denoted by ownBias (I suggest just bias). So, you need to store only two matrices: weight and bias.

Contributor Author

Okay... so you mean bias = (ownBias + otherBias)? Because c actually always denotes the other bias.

Contributor

No, c denotes ownBias; otherBias is not needed at all. So, for the visible layer bias means the visible biases, and for the hidden layer bias means the hidden biases.

Contributor Author

But the forward pass of the visible layer requires us to have the hidden bias. Where do we get that from? Should I include it as an input to the Forward function? Then I would have to concatenate the data point and the hidden bias, since we have to maintain the signature of the Forward function.

Contributor

Hmm... I looked again through the article; the forward pass of the visible layer doesn't need the hidden biases. Hidden variables are sampled from P(h=1|v) = Sigm(W^T * v + c), where W denotes the shared weight matrix and c denotes the visible biases.

Contributor Author (@kris-singh, Jun 16, 2017)

I think you can look at equation 7; c represents the hidden biases: http://deeplearning.net/tutorial/rbm.html#equation-rbm_propup

Contributor

Oh, you are right. Nevertheless, you can use the technique that I suggested below.

Contributor Author (@kris-singh, Jun 16, 2017)

In your solution we have only one bias for the BinaryRBMLayer. The problem is that the forward function uses the bias of the other layer (the hidden layer uses the bias of the visible layer), so how can we code that up? Are you suggesting we still keep both biases and do something like this?

if(visible)
{
// Parameters for the visible layer
weight = arma::mat(weights.memptr(), outSize, inSize, false, false);
ownBias = arma::mat(weights.memptr()+weight.n_elem, inSize, 1, false, false);
otherBias = arma::mat(weights.memptr()+weight.n_elem+inSize, outSize, 1, false, false);
}
else
{
// Parameters for the hidden layer
weight = arma::mat(weights.memptr(), outSize, inSize, false, false);
ownBias = arma::mat(weights.memptr()+weight.n_elem+inSize, outSize, 1, false, false);
otherBias = arma::mat(weights.memptr()+weight.n_elem, inSize, 1, false, false);
}

Contributor

Yeah, of course.

inSize(inSize),
outSize(outSize)
{
weights.set_size(outSize * inSize + inSize, 1);
Contributor

I guess it is better to allocate memory inside the main RBM class.

Contributor Author

I did not understand your comment. Are you saying that when we declare a new layer we do not allocate memory for it? Other layers allocate memory when they are created. Also, we cannot do this in the vanilla_rbm, as we do not have access to the inSize and outSize there.

if(Parameters().empty())
Reset();
else
LogisticFunction::Fn(weight * input + otherBias, output);
Contributor

This is just a forward pass through Linear and then through SigmoidLayer.
We discussed that it is better to use the existing API.
So this line should look like

linear.Forward(std::move(input), std::move(linear.OutputParameters()));
sigmoid.Forward(std::move(linear.OutputParameters()), std::move(output));

Contributor Author

The reason for not doing so is that we would have to declare the linear and sigmoid layers, which means we would have to store their parameters and reset them as well (overhead). Also, all layers in mlpack are independent; that's why I decided to go with this.

void VisibleLayer<InputDataType, OutputDataType>::Sample(arma::Mat<eT>&& input, arma::Mat<eT>&& output)
{
Forward(std::move(input), std::move(output));
for(size_t i = 0; i < output.n_elem; i++)
Contributor

We also discussed that this is just a forward pass through some layer; I sent an example.
Moreover, the binomial sampling should look like the math::Random() functions (see core/math/random.hpp).

Contributor Author

math::Random() generates a number between 0 and 1. The way the RBM uses it is to compare the generated number against the forward-pass activation of the layer to decide whether output(i) = 1.
If we use math::Random() directly, it does not take the input into account.

Contributor

I mean you should add a function to core/math/random.hpp:) for example RandBinomial

template<typename InitializationRuleType,
typename VisibleLayerType,
typename HiddenLayerType>
class VanillaRBM
Contributor

I guess if you pass layer types as template parameters these parameters completely define the type of the RBM. So, for example RBM<... , BinaryRBMLayer, BinaryRBMLayer> represents the binary RBM and RBM<..., SSRBMVisibleLayer, SSRBMHiddenLayer> represents the spike and slab RBM. In such a way it is better to call the class just RBM.

Contributor Author

Yes, I think we can do that.

* @param persistence pcdk / not
* @param output store the gradients
*/
void Gradient(arma::mat input,
Contributor

You can pass k and persistence at the constructor. If you do so, the method will look like other Gradient() methods.

//! The matrix of data points (predictors).
arma::mat predictors;
// Network
std::vector<LayerTypes> network;
Contributor

The network consists of only two layers: visible and hidden. So, this variable is not needed.

/**
* Simple add module test.
*/
BOOST_AUTO_TEST_CASE(SimpleBinaryRbmLayerTest)
Contributor

It seems you forgot to remove this test.

Contributor Author

I think we need to test the rbm_binary layer, so this test should stay.

lozhnikov (Contributor) commented Jun 15, 2017

Regarding your question about shared parameters:
I suggest the following:

  1. The main RBM class allocates memory:
template<.........>
void RBM<VisibleLayerType, HiddenLayerType, .........>::ResetParameters()
{
  size_t weights = visible.Parameters().n_elem;

  parameter.set_size(weights, 1);
  initializeRule.Initialize(parameter, parameter.n_elem, 1);

  visible.Parameters() = arma::mat(parameters.memptr(),
      visible.Parameters().n_rows, visible.Parameters().n_cols, false, false);

  // And the same for the hidden layer
  hidden.Parameters() = arma::mat(parameters.memptr(),
      visible.Parameters().n_rows, visible.Parameters().n_cols, false, false);

  // The Reset() function has to set up the links properly
  visible.Reset();
  hidden.Reset();
}
  2. Implementation of the Reset() method:
    The implementation assumes that you have to define the visible variable (in the constructor, for example).
template<typename InputDataType, typename OutputDataType>
void BinaryRBMLayer<InputDataType, OutputDataType>::Reset()
{
  if (visible)
  {
    // The number of input (visible) variables is equal to inSize
    // The number of output (hidden) variables is equal to outSize
    weight = arma::mat(weights.memptr(), outSize, inSize, false, false);

    // The offset is equal to the number of elements of the weight matrix
    bias = arma::mat(weights.memptr() + weight.n_elem, outSize, 1, false, false);
  }
  else
  {
    // The number of input (hidden) variables is equal to inSize
    // The number of output (visible) variables is equal to outSize
    weight = arma::mat(weights.memptr(), inSize, outSize, false, false);

    // The offset is equal to the number of elements of the weight matrix and the size of visible biases
    bias = arma::mat(weights.memptr() + weight.n_elem + inSize, outSize, 1, false, false);
  }
}

So, the weights matrix contains all parameters. The weight matrix W is denoted by weight. And each layer has only its own biases.

@kris-singh (Contributor Author)

Just a few minor things, I think:
visible.Parameters() = arma::mat(parameters.memptr(), visible.Parameters().n_elem, 1, false, false);
since parameters is a 1-d vector. Also here:
// The offset is equal to the number of elements of the weight matrix
bias = arma::mat(weights.memptr() + weight.n_elem, inSize, 1, false, false);

@@ -66,6 +66,31 @@ inline double Random(const double lo, const double hi)
}

/**
* Generates a binomial random number in the specified range.
*/
inline double BinomialRandom(const int input)
Contributor

I guess the comment is invalid, since input is equal to the success probability (and the mean). Moreover, input is a real variable and belongs to [0, 1].
Actually, this is the Bernoulli distribution, since the binomial distribution depends on the number of trials. And the name of the function should be similar to the others (RandNormal, RandInt, etc.).
So, I suggest RandBernoulli.

/**
* Generates a binomial random number in the specified range.
*/
template<typename InputDatatype, typename OutputDatatype>
Contributor

I think this function is not needed (e.g. since arma::mat::size() and others return the dimensions).

Member

I agree, you can just use for_each or imbue to do the same without introducing another function.

Contributor Author

Okay, I get it. I will remove this now.

if (shuffle) // Determine order of visitation.
visitationOrder = arma::shuffle(visitationOrder);
}

Contributor

Well, right now the implementation of the CDk optimizer is similar to the SGD optimizer. I guess there is no need to duplicate code.

Contributor Author

Hmmm, I agree for the most part. But the problem remains that the SGD function uses an Evaluate function as the stopping criterion. Maybe we can set the tolerance so low that the difference between the previous and current Evaluate values is never less than the tolerance level, but we cannot attach any guarantees to that. Also, I will look at the ssRBM and let you know whether we need the CDk optimizer or not.

Contributor

If I remember right, the ssRBM paper doesn't mention a stopping criterion. You can use the SGD optimizer with tolerance < 0.

ownBias = arma::mat(weights.memptr() + weight.n_elem, inSize, false, false);
otherBias = arma::mat(weight.memptr() + weight.n_elem + inSize, outSize, false, false);

if(typeVisible)
Contributor

I think there are some errors since visible.inSize = hidden.outSize and visible.outSize = hidden.inSize.

Contributor Author

Do you mean that when we are initializing, (1) hidden layer(outSize, inSize, 0)?
In that case I think you might be correct, but I wrote it with the idea of (2) hidden layer(inSize, outSize, 0). Should I change this keeping (1) in mind?

Contributor

I think inSize should denote the number of input units. I guess if the meaning of inSize depends on the type of the layer, it is confusing.

VisibleLayerType,
HiddenLayerType>::Evaluate(const arma::mat& /* parameters*/, const size_t i)
{
arma::mat freeEnergy, freeEnergyCorrupted, output;
Contributor

Could you send me the link to the source?
P.S. FreeEnergy() returns double.

Contributor Author

Ohhh, sorry, I will change this. There are two sources I am following:

  1. https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/neural_network/rbm.py#L28
  2. http://deeplearning.net/tutorial/code/rbm.py (have a look at the pseudo-likelihood cost function)

@lozhnikov (Contributor) left a comment

I commented on some issues.
I didn't point out style-guide issues (check spaces, variable names and so on).

{
if(Parameters().empty())
Reset();
else
Contributor

I guess you forgot braces.

*/
void CalcGradient(arma::mat&& input, arma::mat&& output)
{
arma::mat temp1, temp2;
Contributor

Could you reduce memory allocations here? As for me it is better to define these temporary variables as class members.

for(size_t j = 0; j < num_steps - 1; j++)
{
// Use probabilities for the updates until the last step (section 3, Hinton)
ForwardVisible(std::move(temp), std::move(temp1));
Contributor

I think you forgot to sample variables after each Forward() call.

Contributor Author

I think you can read about it here: https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf.
But earlier I was doing just SampleVisible and SampleHidden for numSteps, and the result was still not converging.

@kris-singh (Contributor Author)

How can I fix the AppVeyor check? I want to add a new test to ann_layer_test.cpp. Also, all the style checks on my files pass; I don't understand why the test is failing.

zoq (Member) commented Jun 19, 2017

If you rebase the PR against the latest master branch the style checks should pass and fix the appveyor and travis build issue. For more information: https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request


{
std::cout << rbm.Evaluate(iterate, orderFunction) << std::endl;
// Update the step
iterate += stepSize * cumgradient;
Member

I haven't checked the Gradient function yet, but should we subtract or add here? Also, should we divide cumgradient by batchSize?

Contributor Author

I did subtract and divide the cumgradient by batchSize, but it didn't help with the training. Though I agree with the scaling by batch size; if you look at the update for the RBM, the gradients are added to the parameters.

gaussian, visible, hidden);
CDK<RBM<GaussianInitialization, BinaryLayer<>, BinaryLayer<> >> cdk(model,
100, 0.1, 15 * trainData.n_cols, 20, true, true);
model.Train(trainData, cdk);
Member

Since you used std::move() inside the RBM constructor you can't reuse trainData here. You could remove std::move from the RBM constructor or pass a new copy to the Train function.

Contributor Author

I have removed the reference in the model constructor now.

freeEnergy = FreeEnergy(std::move(currentInput));
currentInput(corruptIdx) = 1 - predictors(corruptIdx);
freeEnergyCorrupted = FreeEnergy(std::move(currentInput));
output(i) = SoftplusFunction::Fn(freeEnergyCorrupted - freeEnergy + 1);
Member

orderFunction holds the visitation order of the current batch and is of size batchSize, right? So let's say orderFunction is [120, 121, 122, ...]. I guess I run into an issue if I do output(i) with i = 120, since output has the same size as orderFunction, which is batchSize. Maybe I missed something?

Member

So I guess, you meant to use the index of the current iteration instead of using the value?

Member

Also, we should avoid auto here, I don't think it makes it easier for someone trying to understand the code if you have to lookup the type.

Contributor Author

Yes, I meant to use the iteration index here for the output. But funnily, the code does not actually give an error on this, which is strange.

Member

What test case do you use? It will not fail if numFunctions = batchSize which I think is the case for the SGDFunction.

@kris-singh (Contributor Author)

The training error / Evaluate function now only increases rather than going up and down.
The samples are also always one.

@lozhnikov (Contributor) left a comment

I think I found the fix. Try the following changes.

// Now iterate!
arma::mat gradient(iterate.n_rows, iterate.n_cols);
arma::mat cumgradient(iterate.n_rows, iterate.n_cols);

Contributor

You forgot to add

cumgradient.zeros();

here, or just add arma::fill::zeros to the constructor.


if (i % batchSize == 0)
{
std::cout << overallCost << std::endl;
Contributor

There is no need to print overallCost each time. I think it is reasonable to print it after each training epoch (in order to debug, of course).
I suggest the following:

    if (shuffle)
      rbm.Gradient(visitationOrder[currentFunction], gradient);
    else
      rbm.Gradient(currentFunction, gradient);

    cumgradient += gradient;

    if (shuffle)
      overallCost += rbm.Evaluate(iterate, visitationOrder[currentFunction]);
    else
      overallCost += rbm.Evaluate(iterate, currentFunction);

    if (i % batchSize == 0)
    {
      // Update the step
      iterate -= stepSize * (cumgradient / batchSize);
      cumgradient.zeros();
    }

    if (i % numFunctions == 0)
    {
      overallCost /= numFunctions;
      std::cout << overallCost << std::endl;
      overallCost = 0;
    }

// Collect the negative gradients
CalcGradient(std::move(negativeSamples), std::move(negativeGradient));

output = positiveGradient - negativeGradient;
Contributor (@lozhnikov, Jun 21, 2017)

This is the main issue: the sign is invalid.
So write the following

output = -(positiveGradient - negativeGradient);

Or change the signs of positiveGradient and negativeGradient.
If you calculate the derivatives of the free-energy function, you will see that they have a minus sign at the beginning.

BOOST_AUTO_TEST_CASE(VanillaNetworkTest)
{
arma::mat dataset;
dataset.load("/Users/kris/Desktop/GsoC/mlpack/src/mlpack/tests/data/mnist_first250_training_4s_and_9s.arm");
Contributor

Of course here should be just dataset.load("mnist_first250_training_4s_and_9s.arm");

@lozhnikov (Contributor)

In case I forgot to comment on something, here is a patch that seems to work. Don't commit this patch (there are a lot of debug prints and so on).
patch.zip

lozhnikov (Contributor) commented Jun 29, 2017

Try the following:

arma::mat("1.0 0.0 1.0;"
          "1.0 0.0 0.0;"
          /*etc*/);

model.Train(trainData, msgd);
model.Reset();
// Set the parameters from a learned RBM
model.Parameters() = {-0.23224054, -0.23000632,
Contributor

Could you also fix the initializer list here?

model.HiddenLayer().Weight()), 0, 1e-2);

// Check free energy
arma::vec freeEnergy = {-0.87523715, 0.50615066, 0.46923476, 1.21509084};
Contributor

And here.

@lozhnikov (Contributor)

Regarding the serialization test:
You couldn't compile the test because the BinaryLayer class contains constants such as inSize, outSize and typeVisible. So, you should remove the const modifiers.

@kris-singh (Contributor Author)

Only the krann test is failing, so you can review this.

@lozhnikov (Contributor) left a comment

Sorry for the slow response. I noticed that the regression accuracy is too low (about 10%). So, I looked through the test and found a critical error in the data-loading code (see below). I fixed it and got the following results:
the regression accuracy is 67.3267 and
the RBM accuracy is 26.2695.

I tried the digits dataset (the dataset from the scikit example) and got 76.1825 and 27.6155 for the regression and the RBM respectively. So, I think you made an error somewhere. I didn't dig into it, but I pointed out some issues below.


for (size_t i = 0; i < testLabels.n_rows; ++i)
{
testLabels(i) = arma::as_scalar(testLabels.row(i));
Contributor

I found a critical error: that's equivalent to testLabels = testLabels.

Contributor Author

I did not understand your point here; can you elaborate a bit?
I have used arma::as_scalar for converting the 1x1 matrix to a scalar quantity.

Contributor

You don't initialize testLabels properly: you initialize testLabelsTemp but you don't copy it to testLabels, so testLabels is equal to zero. Moreover, arma::as_scalar(testLabels.row(i)) is equal to testLabels(i), since testLabels is a vector.

Contributor Author (@kris-singh, Jul 4, 2017)

Okay, got your point, so
testLabels(i) = arma::as_scalar(testLabelsTemp.row(i));
Even with this I am getting a classification accuracy of 10% for softmax
and 84% for the RBM.
Are you using the same dataset that I committed?

Contributor

Yes, I use the same dataset, and I build the project from scratch.
If I change
testLabels(i) = arma::as_scalar(testLabels.row(i));
to
testLabels(i) = testLabelsTemp(i);
the accuracy becomes 27% for the RBM.
But I think you didn't notice another error:
for (size_t i = 0; i < testLabels.n_rows; ++i)
testLabels is a row, so testLabels.n_rows is equal to 1.
In such a way, you should write for (size_t i = 0; i < testLabelsTemp.n_rows; ++i).

When I fixed that, I got 67% and 27% for the regression and the RBM respectively.

arma::mat trainData, testData, dataset;
arma::vec trainLabelsTemp, testLabelsTemp;
data::Load("mnist_data_small.csv", dataset, true);
trainData = dataset.submat(0, 0,
Contributor

As for me, it is much better to divide the dataset into four parts, e.g. test.csv, train.csv, test_labels.csv, train_labels.csv, since your way doesn't allow anyone else to use the dataset, as the splitting algorithm is not documented anywhere.

weightNegativeGrad = hiddenBiasNegativeGrad * negativeSamples.t();
visibleBiasNegativeGrad = negativeSamples;

output = (positiveGradient - negativeGradient);
Contributor

I guess the sign is invalid since you switched to the mini-batch SGD optimizer.


// Train the model.
Timer::Start("rbm_optimization");
const double out = optimizer.Optimize(*this, parameter);
Contributor

The out variable is not used.

"-0.23224054, -0.23000632, -0.25701271, -0.25122418, -0.20716651"
"-0.20962217, -0.59922456, -0.60003836, -0.6, -0.625, -0.475;");
// Check Weight Shared
BOOST_REQUIRE_CLOSE(arma::accu(model.VisibleLayer().Weight() -
Contributor

I think the weights should be equal. Why did you choose 1e-2 accuracy? The precision of double around 1 is about 1e-16, so you can set something like 1e-14.

RBM<GaussianInitialization, BinaryLayer<>, BinaryLayer<>> model(trainData,
gaussian, visible, hidden, 1, true);
MiniBatchSGD msgd(10, 0.06, trainData.n_cols * 20, 0, true);
model.Train(trainData, msgd);
Contributor

I don't understand why you train the model if you don't use the parameters.

calcultedFreeEnergy(i) = model.FreeEnergy(std::move(trainData.col(i)));
}

BOOST_REQUIRE_LE(arma::accu(calcultedFreeEnergy - freeEnergy), 1);
Contributor

I think this should be BOOST_REQUIRE_CLOSE().

@kris-singh (Contributor Author)

Strange, the accuracy I was getting was around 90%. I will check again and let you know.

testData = testData.t();

arma::mat output, XRbm(hiddenLayerSize, trainData.n_cols),
YRbm(hiddenLayerSize, trainLabels.n_cols);
@lozhnikov lozhnikov (Contributor), Jul 4, 2017

I think I have found the fix. You don't define the dimensions of YRbm properly.
Try the following:
YRbm(hiddenLayerSize, testData.n_cols)

@kris-singh force-pushed the Binary_RBM branch 2 times, most recently from 1425a40 to c2bd07c, on July 6, 2017 at 07:57.
@kris-singh (Contributor Author)

Ready to merge.

int hiddenLayerSize = 100;
arma::mat trainData, testData, dataset;
arma::vec trainLabelsTemp, testLabelsTemp;
data::Load("mnisttrain.txt", trainData, true);
Contributor

  1. If I am not mistaken, you didn't upload the MNIST dataset; you uploaded the Digit dataset that I sent you. So the file names should look like digit_train.txt, digit_train_labels.txt, etc.
  2. Moreover you uploaded the same binary files two times: in 646c6c6 and in c2bd07c. So, I guess you should do git-rebase.
  3. The official webpage of the dataset (http://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits) points out the citation policy (http://archive.ics.uci.edu/ml/citation_policy.html). So, I think we have to refer to the repo in the source code.

Member

I'm curious, have you tried storing the datasets as arma_binary? I think it would be great if we could keep the size of the repo and the datasets low.

{
Forward(std::move(input), std::move(output));
for (size_t i = 0; i < output.n_elem; i++)
output(i) = math::RandBernoulli(output(i));
Member

I'm wondering if using arma::randi<arma::mat>(10, 10, arma::distr_param(0, 1)); is faster.

Contributor Author

I tested this, and RandBernoulli is faster.

Member

Okay, thanks for checking.

@kris-singh (Contributor Author)

If there are no further comments, I think you can merge this.

@lozhnikov (Contributor)

Could you add binary data in a separate commit?

arma::mat trainLabelsTemp, testLabelsTemp;
trainData.load("digits_train.arm");
testData.load("digits_test.arm");
trainLabelsTemp.load("digits_train_label.arm");
@lozhnikov lozhnikov (Contributor), Jul 8, 2017

It seems I overlooked that. There is no sense in binary labels, since each double takes 8 bytes and each label is a single digit; plain-text labels should take only 2*numLabels bytes (digit + separator). In my opinion it is better to save an arma::Row&lt;size_t&gt; in plain format. That should also simplify the code, since it allows avoiding the temporary matrices (such as trainLabelsTemp and testLabelsTemp).

P.S. On second thoughts, that is not critical, so I think you can leave that as is since the dataset is added in a separate commit.

trainLabelsTemp.load("digits_train_label.arm");
testLabelsTemp.load("digits_test_label.arm");

std::cout << arma::size(testLabelsTemp) << std::endl;
Contributor

Could you remove all unnecessary output?

@lozhnikov (Contributor)

I'll close this PR since the code was merged into PR #1046 (SpikeSlabRBM) and was heavily refactored there. The binary RBM will therefore be merged together with the spike and slab RBM.

@lozhnikov lozhnikov closed this Jul 12, 2017
3 participants