
Implementation - Find and Fill Method for Dropout Layer #3684

Open

wants to merge 7 commits into master

Conversation

MarkFischinger
Contributor

Following our discussion in issue #3662, I've implemented the new algorithm in the dropout layer.

Member

@rcurtin left a comment

Should we split this one into two implementations, like #3683? The numbers in #3662 show that this implementation is actually a slowdown, although the implementation here is general enough to be used with Bandicoot too.

(Requesting changes just because I don't want to merge a slowdown without a bit of discussion)

@shrit
Member

shrit commented Apr 11, 2024

It is up to you. To my understanding, dropout usually happens once per epoch, so while there is a slowdown, it is not going to affect the overall speed of training.
We can do the split, but I would weigh the extra code maintenance and size against the number of nanoseconds we would lose 😄

@rcurtin
Member

rcurtin commented Apr 11, 2024

It will happen on every forward pass (i.e., every training step), so the slowdown may be noticeable. One quick test would be to run the mnist_simple and/or mnist_cnn examples with both implementations and see what the overall time per epoch is.

@shrit
Member

shrit commented Apr 11, 2024

Okay, then it makes sense to do the benchmark.
@MarkFischinger I hope you have some time for this, if you do not mind.

@MarkFischinger
Contributor Author

@rcurtin @shrit Absolutely, I'm on the same page. I was actually already leaning towards running additional benchmarks for exactly that reason. I have a few ideas for potentially optimizing the algorithm, and I'm eager to test them in some upcoming benchmarks. Fingers crossed, I should be able to run them later today :)

@MarkFischinger
Contributor Author

@rcurtin @shrit The new algorithm appears slightly faster than the old one when processing small matrices, but significantly slower with larger matrices (also visible in the graph in the earlier discussion). If it's not necessary to replace it, I wouldn't. However, if we need to eliminate the transform function, this could be a viable alternative.

For mnist_simple:

New Algorithm:
    Duration: 78910ms
    Loss: 0.0331713
Old Algorithm:
    Duration: 79709ms
    Loss: 0.0331713

For mnist_cnn:

Old Algorithm:
    Duration: 403866ms
    Loss: 0.919632 (Epoch: 32.9s, Step: 170ms)
    Validation loss: 2894.57
New Algorithm:
    Duration: 415430ms
    Loss: 0.919632 (Epoch: 31.93s, Step: 174ms)
    Validation loss: 2894.57
    Accuracy: Train = 86.9947%, Valid = 86.5476%

@shrit
Member

shrit commented Apr 14, 2024

@MarkFischinger Okay, I see the benchmark. In this case it would make sense to have both implementations. Please update this PR to follow a structure similar to the one in #3683.

@MarkFischinger
Contributor Author

@shrit In the latest commit, I've implemented both algorithms using a structure similar to what you did in #3683. I have a question about that: why do we wrap the entire function in the #ifdef directive? Could it be placed inside the function instead? Is there a specific reason for this that I might not be aware of yet? I'd appreciate understanding your reasoning behind that :)

@shrit
Member

shrit commented Apr 14, 2024

@MarkFischinger The other implementation that you added is for the Bandicoot library. It is a linear algebra library that we are developing (with the same API as Armadillo) to provide GPU acceleration for matrix operations.
The idea of putting all of that inside an #ifdef is to avoid a hard dependency on Bandicoot; otherwise, you would not be able to build mlpack or run the tests without including the Bandicoot header, which is something we are trying to avoid.
Technically, in the future, the user will only need to define MLPACK_HAS_COOT, and the build will then look for Bandicoot.
Our objective is to not add dependencies to mlpack, since they add a massive maintenance and compilation burden; when necessary, we only add single-file, header-only dependencies.
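For illustration only, here is a minimal, self-contained sketch of that guard pattern, using a hypothetical DEMO_HAS_COOT macro and a toy Scale() function rather than the actual mlpack sources. The backend-specific overload and its header are only seen by the compiler when the macro is defined, so a default build carries no Bandicoot dependency.

// guard_demo.cpp: hypothetical sketch, not the mlpack sources.
#include <armadillo>
#include <iostream>

// General implementation: only uses operations that both Armadillo and
// Bandicoot provide, so it never names coot:: directly.
template<typename MatType>
void Scale(MatType& m)
{
  m *= 2.0;
}

#ifdef DEMO_HAS_COOT
// Compiled only when the optional GPU backend is requested; the Bandicoot
// header is not included anywhere else, so default builds never need it.
#include <bandicoot>

void Scale(coot::mat& m)
{
  m *= 2.0;
}
#endif

int main()
{
  arma::mat a(3, 3, arma::fill::ones);
  Scale(a);  // In a default (CPU-only) build, the general template is used.
  std::cout << a;
  return 0;
}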

@MarkFischinger
Contributor Author

MarkFischinger commented Apr 14, 2024

@shrit Thanks for your response! :) Sorry if my initial question wasn't as clear as it could have been. I understand the general concept, but I'm specifically curious about the decision to place the #ifdef outside the function. Is this meant to make it easier to distinguish between GPU and CPU operations, thereby improving maintainability?

Because my first thought was to implement it directly inside the function, like this (using #3683 as an example):

template<typename MatType>
void LogSoftMaxType<MatType>::ForwardImpl(const MatType& input,
                                          MatType& output)
{
  MatType maxInput = repmat(max(input), input.n_rows, 1);
  output = (maxInput - input);

  ApplyExponentialFunction(output);

  maxInput.each_row() += log(sum(output));
  output = input - maxInput;
}

template<typename MatType>
void LogSoftMaxType<MatType>::ApplyExponentialFunction(MatType& output)
{
#ifdef MLPACK_HAS_COOT // <-- Inside the function like this 
  if constexpr (coot::is_coot_type<MatType>::value)
  {
    output = exp(output * -1);
  }
  else
#endif
  {
    // Approximation of the base-e exponential function. Credits to Leon Bottou.
    output.transform([](double x)
    {
      static constexpr double A0 = 1.0, A1 = 0.125, A2 = 0.0078125,
                              A3 = 0.00032552083, A4 = 1.0172526e-5;
      if (x < 13.0)
      {
        double y = A0 + x * (A1 + x * (A2 + x * (A3 + x * A4)));
        y *= y; y *= y; y *= y;
        return 1 / y;
      }
      }
      return 0.0;
    });
  }
}

I couldn't find any best practices in C++ on this topic on the web. Do you know which is the preferred implementation?

@shrit
Member

shrit commented Apr 15, 2024

@MarkFischinger We cannot use if constexpr because we do not yet support C++17.
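For context, a pre-C++17 way to get a compile-time split inside the function is tag dispatch. The sketch below is purely illustrative and uses hypothetical free functions rather than mlpack's actual declarations; only the overload selected for the given matrix type is ever instantiated.

// tag_dispatch_demo.cpp: hypothetical pre-C++17 alternative to if constexpr.
#include <armadillo>
#include <type_traits>

// Overload chosen for Armadillo types: Bottou's approximation of exp(-x).
template<typename MatType>
void ApplyExponentialFunction(MatType& output, std::true_type /* isArma */)
{
  output.transform([](double x)
  {
    static constexpr double A0 = 1.0, A1 = 0.125, A2 = 0.0078125,
                            A3 = 0.00032552083, A4 = 1.0172526e-5;
    if (x < 13.0)
    {
      double y = A0 + x * (A1 + x * (A2 + x * (A3 + x * A4)));
      y *= y; y *= y; y *= y;
      return 1 / y;
    }
    return 0.0;
  });
}

// Overload for any other matrix type: rely on the library's own exp().
template<typename MatType>
void ApplyExponentialFunction(MatType& output, std::false_type /* isArma */)
{
  output = exp(output * -1);
}

// Dispatcher: selects one of the overloads above at compile time.
template<typename MatType>
void ApplyExponentialFunction(MatType& output)
{
  ApplyExponentialFunction(output,
      std::integral_constant<bool, arma::is_arma_type<MatType>::value>());
}

int main()
{
  arma::mat m(3, 3, arma::fill::randu);
  ApplyExponentialFunction(m);  // Resolves to the Armadillo overload.
  return 0;
}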

Member

@rcurtin left a comment

Looking good, just some comments on the multiple implementations whenever you have a chance. 👍

* @param input Input data used for evaluating the specified function.
* @param output Resulting output activation.
*/
void ForwardImpl(const MatType& input, MatType& output);
Member

You'll have to use SFINAE here, e.g. add an argument like const typename std::enable_if_t<arma::is_arma_type<MatType>::value>::type* = 0 (I just did that from memory, it may not be exactly right).
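As a standalone illustration of that kind of SFINAE dispatch, with hypothetical free functions rather than mlpack's actual declarations, and using the plain std::enable_if<...>::type spelling:

// sfinae_demo.cpp: illustrative sketch only.
#include <armadillo>
#include <iostream>
#include <type_traits>

// Chosen when MatType is an Armadillo type (the optimized path).
template<typename MatType,
         typename std::enable_if<
             arma::is_arma_type<MatType>::value>::type* = nullptr>
void ForwardImpl(const MatType& input, MatType& output)
{
  std::cout << "Armadillo-specific implementation" << std::endl;
  output = input;
}

// Chosen for any other matrix type (the general path, e.g. Bandicoot).
template<typename MatType,
         typename std::enable_if<
             !arma::is_arma_type<MatType>::value>::type* = nullptr>
void ForwardImpl(const MatType& input, MatType& output)
{
  std::cout << "general implementation" << std::endl;
  output = input;
}

int main()
{
  arma::mat in(2, 2, arma::fill::randu), out;
  ForwardImpl(in, out);  // Resolves to the Armadillo-specific overload.
  return 0;
}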

@@ -74,10 +74,18 @@ DropoutType<MatType>::operator=(DropoutType&& other)
return *this;
}


Member

Suggested change

No need for an extra line 👍

Member

Also, this one is still missing.

*/
void ForwardImpl(const MatType& input, MatType& output);

#ifdef MLPACK_HAS_COOT
Member

I don't think we actually need this guard here: we want to use the optimized implementation when we have an Armadillo matrix, and the general implementation otherwise. So long as the second implementation of ForwardImpl() doesn't use the name coot:: inside of it (and it is possible to avoid that), we should be able to avoid the need for MLPACK_HAS_COOT, and just use SFINAE here too with a negated condition (i.e. std::enable_if_t<!arma::is_arma_type< ...).

mask.randu(input.n_rows, input.n_cols);
arma::uvec indices = arma::find(mask > ratio);
mask.zeros();
mask.elem(indices).fill(1);
Member

Suggested change
mask.elem(indices).fill(1);
mask.elem(find(mask > ratio)).fill(1);

Would this work? It avoids the use of arma:: here, and would make this function fully general. But I think then we would need to find a way to get rid of the .zeros() call... maybe can we just do mask = (mask > ratio)? Or something along those lines? (It would be interesting to see the speed of that approach.)
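To make the comparison concrete, here is a small standalone Armadillo sketch of the three mask-building variants discussed in this thread; the sizes and names are hypothetical, not the PR's actual code. Note that mask > ratio yields an unsigned integer matrix, so assigning it back into a floating-point mask likely needs an explicit conv_to.

// mask_demo.cpp: illustrative sketch of the mask variants, not the PR code.
#include <armadillo>

int main()
{
  const double ratio = 0.2;                  // Dropout probability.
  const double scale = 1.0 / (1.0 - ratio);  // Inverted-dropout scaling.

  arma::mat input(1000, 1000, arma::fill::randu);
  arma::mat mask;

  // 1) transform()-based mask (the existing approach).
  mask.randu(input.n_rows, input.n_cols);
  mask.transform([ratio](double v) { return (v > ratio) ? 1.0 : 0.0; });

  // 2) Find-and-fill mask (this PR's approach).
  mask.randu(input.n_rows, input.n_cols);
  arma::uvec kept = arma::find(mask > ratio);
  mask.zeros();
  mask.elem(kept).fill(1.0);

  // 3) Relational-expression mask (the suggestion above): no find()/zeros().
  mask.randu(input.n_rows, input.n_cols);
  mask = arma::conv_to<arma::mat>::from(mask > ratio);

  // Apply the mask with inverted-dropout scaling.
  arma::mat output = input % mask * scale;
  return (output.n_elem > 0) ? 0 : 1;
}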

Contributor Author

@rcurtin You can take a look at my implementation in the latest commit. It should be a bit faster and is fully functional. I avoided using arma:: to keep it more general.

Member

Nice, it looks good to me! Thanks for doing that.

@@ -74,10 +74,18 @@ DropoutType<MatType>::operator=(DropoutType&& other)
return *this;
}


Member

Also, this one is still missing.

Comment on lines +86 to +96
template<typename T = MatType, typename std::enable_if_t<arma::is_arma_type<T>::value, int> = 0>
void ForwardImpl(const T& input, T& output);

/**
* General implementation of the forward pass of the dropout layer.
*
* @param input Input data used for evaluating the specified function.
* @param output Resulting output activation.
*/
template<typename T = MatType, typename std::enable_if_t<!arma::is_arma_type<T>::value, int> = 0>
void ForwardImpl(const T& input, T& output);
Member

@MarkFischinger
Looks better; just use MatType directly instead of T. It is better for readability, because T does not mean a lot.
Also, could you break the line? The mlpack code base usually keeps lines under 80 characters.
Also, the syntax is correct, but I would remove int and instead add a * after >, so the signature looks as follows:

typename std::enable_if_t<!arma::is_arma_type<T>::value>* = 0>

The reason for this is to have the same signature style across the whole code base, making it easier for everyone.
