
Recurrent neural nets #4227

Closed
wants to merge 3 commits into from

Conversation

@grig-guz (Contributor) commented Apr 3, 2018

Ongoing work, comments are greatly appreciated. I created another pull request since merging into a different fork every time would probably be a pain.
#3966

@grig-guz (Contributor Author) commented Apr 3, 2018

The overall structure seems to be very different from other NN classes that are already implemented here.

  • Since we need to backprop through the entire sequence to get gradients, we have to store the hidden states and outputs, so let me know whether storing them as a vector of SGMatrix'es is the best idea (see the sketch after this list).

  • We are forced to apply tanh inside compute_activations of NeuralRecurrentLayer, since it is required to transition to the next state. I don't really see a way to use NeuralTanhLayer here.
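
A minimal sketch of the storage idea from the first bullet, keeping one matrix per time step so backprop through time can revisit them (member names are illustrative, not final):

// one activation matrix per time step, kept for backprop through time
std::vector<SGMatrix<float64_t>> m_hidden_states;
std::vector<SGMatrix<float64_t>> m_outputs;

m_hidden_states.resize(m_time_series_length);
m_outputs.resize(m_time_series_length);
for (int32_t t = 0; t < m_time_series_length; t++)
{
    m_hidden_states[t] = SGMatrix<float64_t>(m_num_neurons, m_batch_size);
    m_outputs[t] = SGMatrix<float64_t>(m_output_dim, m_batch_size);
}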

@grig-guz (Contributor Author) commented Apr 3, 2018

Oops, no tests ran here:
https://travis-ci.org/shogun-toolbox/shogun/jobs/361568597#L1951
@karlnapf @vigsterkr could you please check what's wrong with the build?

EDIT: I pushed the changes again, let's see what CI says. Nope, same error again, something with gtest. No clue how to fix it, and I don't get these errors locally :(

@karlnapf (Member) commented Apr 5, 2018

The CI errors might come from your test, which fails to compile: https://travis-ci.org/shogun-toolbox/shogun/jobs/362010308#L1948

It might be the order of the includes in your test. Try including the googletest headers first (or last, I don't remember; check other tests).

@karlnapf (Member) commented Apr 5, 2018

Next thing: could you chunk the PR into multiple smaller ones? I.e. make a copy of your changes locally and update this PR with a smaller number of changes. It is better to send 5 small PRs than one big one (if possible). Try to separate individual additions; it makes things so much easier to review (the reason this wasn't reviewed yet is that it is so big).


@@ -188,7 +188,7 @@ float64_t CNeuralLinearLayer::compute_error(SGMatrix<float64_t> targets)
float64_t sum = 0;
int32_t length = m_num_neurons*m_batch_size;
for (int32_t i=0; i<length; i++)
sum += (targets[i]-m_activations[i])*(targets[i]-m_activations[i]);
Member:
Better!
Even better would be to use a single linalg call to compute this, rather than the loop.
The advantage is also that it would use SIMD and therefore be faster.
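
For illustration, a hedged sketch of what a single-call version could look like, assuming the matrices can be viewed as flat vectors and that linalg::add and linalg::dot from shogun/mathematics/linalg/LinalgNamespace.h behave as expected:

int32_t length = m_num_neurons*m_batch_size;
// non-owning flat views over the matrices used in the loop above
SGVector<float64_t> t(targets.matrix, length, false);
SGVector<float64_t> a(m_activations.matrix, length, false);
// diff = t - a in one call, then the sum of squares as a dot product
auto diff = linalg::add(t, a, 1.0, -1.0);
float64_t sum = linalg::dot(diff, diff);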

namespace shogun
{
/** @brief Recurrent Neural layer with linear neurons, with an identity activation
* function. can be used as a hidden layer or an output layer
Member:
Typo, Can with capital C

* Each neuron in the layer is connected to all the neurons in all the input
* and hidden layers that connect into this layer.
*
* The hidden state at time step t is computed according to the previous hidden state
Member:
Name the hidden state, i.e. "The hidden state $h_t$ at time step $t$ is computed ..."
(also pls use proper math mode for all math variables)

* The hidden state at time step t is computed according to the previous hidden state
* and the input at the current state as:
*
* \f$ h_t = f(W x_t + U s_t_-_1 + b_h) \f$ where \f$ b_h \f$ is the bias vector of
Member:
The subscript s_t_-1 is wrong; it should be $s_{t-1}$.
Pls take extra care when writing math in the doxygen; you can render it locally via make doc and then check index.html in doc/.
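
For reference, the doxygen line with the subscript fixed would read something like:

* \f$ h_t = f(W x_t + U s_{t-1} + b_h) \f$ where \f$ b_h \f$ is the bias vector of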

* the weights matrix between this hidden layer and the output layer
*
*/
class CNeuralRecurrentLayer : public CNeuralLinearLayer
Member:
Could you explain to me why the recurrent layer inherits from the linear layer?
I am not too familiar with the NN architecture we have, but this puzzles me a bit.

Contributor Author:
Oops, you're right. That one should inherit from NeuralLayer.

class CNeuralRecurrentLayer : public CNeuralLinearLayer
{
public:
/** default constructor */
Member:
"Default"

@grig-guz (Contributor Author) commented Apr 5, 2018

@karlnapf Sure, thanks, I'll submit PRs for every big function in these classes. I'm finishing the tests for the activation function of RecurrentLayer; it's quite buggy on the remote right now.

virtual void initialize_neural_layer(CDynamicObjectArray* layers,
SGVector<int32_t> input_indices);

/** Sets the batch_size and allocates memory for m_activations and
Member:
The new Shogun has generic set and get methods that can change any parameter that was registered using SG_ADD. Therefore we don't need to add any getters or setters ...

Pls give those variables default values, and avoid "must be called before ...".
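
As an illustration of the registration route (the exact SG_ADD signature has changed between Shogun versions, so treat this as a sketch rather than the definitive form), the constructor could register the members and give them defaults instead of exposing setters:

// in the constructor: default value plus registration, so the generic
// get/set machinery (and serialization) can reach the parameter
m_time_series_length = 0;
SG_ADD(&m_time_series_length, "time_series_length",
    "Length of the input time series", MS_NOT_AVAILABLE);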

*/
virtual void set_batch_size(int32_t batch_size);

/** Initializes the layer's parameters. The layer should fill the given
Member:
What do you mean by "The layer should fill the given arrays ..."? This is a function call, and the comment should describe what it does.

@grig-guz (Contributor Author), Apr 5, 2018:
Haha, not sure, these comments date back to 2014, taken from the LinearLayer class. Will fix that!

*
* @param parameters Vector of size get_num_parameters()
*
* @param parameter_regularizable Vector of size get_num_parameters().
Member:
I would call this "Boolean mask"

* i, set parameter_regularizable[i] = false. This is usally used to turn
* off regularization for bias parameters.
*
* @param sigma standard deviation of the gaussian used to random the
Member:
Capital S

Member:
A few comments

  • "random" is not a verb. Should be be explicit about what actually is done
  • What if I want non-Gaussian noise to be added? Shouldnt this be more modular?

SGVector<bool> parameter_regularizable,
float64_t sigma);

/** Computes the activations of the neurons in this layer, results should
Member:
"results should be stored" should be "results are stored"
However, this is public user API? Then there is not need to talk about internal states (this are hidden from the user)
If the method is an internal helper, it should be protected/private and then the docs can talk about internal states ...

Member:
I realise many of these comments concern the general design of the NN classes, which is not your fault.
However, we can still discuss these things a bit to make it better.

SGMatrix<float64_t> A_ref(layer.get_num_neurons(), x.num_cols);

float64_t* biases = params.vector;
float64_t* weights = biases + layer.get_num_neurons();
Member:
your formatting here is all over the place

A_ref(i,j) = biases[i];

for (int32_t k=0; k<x.num_rows; k++)
A_ref(i,j) += weights[i+k*A_ref.num_rows]*x(k,j);
Member:
Pls don't do such a vanilla implementation of linear algebra. Use linalg calls and no loops (see the sketch below).
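
A possible linalg-based version of this reference computation, assuming the raw weights pointer can be wrapped in a non-owning SGMatrix and that linalg::matrix_prod is available (a dedicated broadcast call, if one exists, could also replace the bias loop):

// non-owning view over the weight block extracted from params above
SGMatrix<float64_t> W(weights, layer.get_num_neurons(), x.num_rows, false);
SGMatrix<float64_t> A_ref = linalg::matrix_prod(W, x);
// add the bias to every column
for (auto j : range(A_ref.num_cols))
    for (auto i : range(A_ref.num_rows))
        A_ref(i, j) += biases[i];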


for (int32_t i=0; i<A_ref.num_rows; i++)
{
for (int32_t j=0; j<A_ref.num_cols; j++)
Member:
for (auto j : range(A_ref.num_cols)) is the modern way to do this

}

// compare
EXPECT_EQ(A_ref.num_rows, A.num_rows);
Member:
This should be ASSERT_EQ; comparing the contents will segfault if this is false.
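
I.e. something like:

// abort before the element-wise comparison can run out of bounds
ASSERT_EQ(A_ref.num_rows, A.num_rows);
ASSERT_EQ(A_ref.num_cols, A.num_cols);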

// initialize the layer
layer.initialize_neural_layer(layers, input_indices);
SGVector<float64_t> params(layer.get_num_parameters());
SGVector<bool> param_regularizable(layer.get_num_parameters());
Member:
There is a lot of copy-paste code in those tests.
Use fixtures and templates to avoid this (see e.g. the recently merged PR that refactored the LibLinear tests).
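
A rough sketch of the fixture idea; class and member names here are hypothetical and only show the shape:

class NeuralRecurrentLayerTest : public ::testing::Test
{
protected:
    void SetUp() override
    {
        // shared setup done once per test instead of being copy-pasted:
        // build the input layers, indices and parameter vectors here
    }

    CDynamicObjectArray* m_layers;
    SGVector<int32_t> m_input_indices;
};

TEST_F(NeuralRecurrentLayerTest, compute_activations)
{
    // the body only contains what is specific to this test
}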

@karlnapf (Member) left a comment:
I gave a minimal set of comments.

Before I review in more detail, pls split the PR into smaller units (even if very small) and address a few of the things. Then we can do a second iteration.

Thanks for the efforts, much appreciated

for (int i = 0; i < m_time_series_length; i++) {
m_outputs[i] = SGMatrix<float64_t>(m_output_dim, batch_size);
}
// TODO finish this
Member:
pls avoid TODOs in PRs

// m_dropout_mask = SGMatrix<bool>(m_num_neurons, m_batch_size);

m_outputs = std::vector<SGMatrix<float64_t>>(m_time_series_length);
for (int i = 0; i < m_time_series_length; i++) {
Member:
auto and range pls

}
}

void CNeuralRecurrentLayer::set_batch_size(int32_t batch_size)
Member:
I commented that we want to remove any getters and setters and use the generic ones.
This means that the code that pre-allocates memory in here needs to be moved somewhere else, preferably into a helper method that is always called before training (see the sketch below). I am sure the other NNs have similar concepts, and this would be a nice PR that improves the general design.
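
A hedged sketch of that direction; the helper name allocate_buffers() is hypothetical, while the members are taken from the PR:

// called by the training code right before the first forward pass,
// instead of being triggered by a setter
void CNeuralRecurrentLayer::allocate_buffers(int32_t batch_size)
{
    m_batch_size = batch_size;
    m_outputs = std::vector<SGMatrix<float64_t>>(m_time_series_length);
    for (auto i : range(m_time_series_length))
        m_outputs[i] = SGMatrix<float64_t>(m_output_dim, m_batch_size);
}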

for (int32_t i=0; i<m_num_parameters; i++)
{
// random the parameters
parameters[i] = CMath::normal_random(0.0, sigma);
Member:
I think there is a way to generate a whole vector of random numbers.

Also, it would be cool to have this modular and use other random streams (separate PR).
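
One way to fill the whole vector at once, shown with the standard library as a stand-in for whatever random stream Shogun settles on (so the generator choice here is an assumption, not Shogun's API):

#include <algorithm>
#include <random>

std::mt19937_64 prng(m_rng_seed); // m_rng_seed is hypothetical
std::normal_distribution<float64_t> gauss(0.0, sigma);
std::generate(parameters.vector, parameters.vector + m_num_parameters,
    [&]() { return gauss(prng); });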

// parameter_regularizable[i] = (i >= m_num_neurons);
}
}
// VERY BIG TODO: Figure out how to manage input from multiple layers
Member:
indeed! :)

typedef Eigen::Map<Eigen::VectorXd> EMappedVector;

EMappedMatrix A(m_activations.matrix, m_output_dim, m_batch_size);
EMappedMatrix H(m_hidden_activation.matrix, m_num_neurons, m_batch_size);
Member:
Pls use linalg and not Eigen (see the sketch below).
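
A hedged sketch of the same update with linalg instead of the Eigen maps; the names mirror the maps in the PR but are assumed to be SGMatrix/SGVector here, matrix_prod and add are taken from LinalgNamespace.h, and the bias broadcast plus tanh are written as plain loops since I am not sure a dedicated elementwise call exists:

// pre = Wxh * X + Whh * H_prev, computed with linalg
auto pre = linalg::add(linalg::matrix_prod(Wxh, X),
    linalg::matrix_prod(Whh, H_prev));
// add the bias to every column and apply tanh elementwise
for (auto j : range(pre.num_cols))
    for (auto i : range(pre.num_rows))
        H(i, j) = std::tanh(pre(i, j) + Bh[i]);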

void CNeuralRecurrentLayer::compute_activations(SGVector<float64_t> parameters,
CDynamicObjectArray* layers)
{
float64_t* hidden_biases = parameters.vector;
Member:
pls no raw pointers


int32_t weights_index_offset = m_num_neurons * 2;

for (int i = 0; i < m_time_series_length; i++) {
Member:
auto

// but ReLU kill too much gradient when unrolling the recursion
// in backprop
H += (Wxh * X + Whh * H).colwise() + Bh;
H = H.unaryExpr<float64_t(*)(float64_t)>(&std::tanh);
Member:
This should definitely be modular! (see the sketch below)
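
A possible way to make the nonlinearity pluggable instead of hard-coding tanh; the member name m_activation_function is hypothetical (needs <functional> and <cmath>):

// defaults to tanh, but can be swapped for another activation by the user
std::function<float64_t(float64_t)> m_activation_function =
    [](float64_t x) { return std::tanh(x); };

// inside compute_activations, per time step (H shown as an SGMatrix here):
for (auto j : range(H.num_cols))
    for (auto i : range(H.num_rows))
        H(i, j) = m_activation_function(H(i, j));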

SGMatrix<float64_t> targets,
CDynamicObjectArray* layers,
SGVector<float64_t> parameter_gradients)
{
Member:
?

@grig-guz (Contributor Author) commented Apr 5, 2018

@karlnapf Thank you very much for the reviews!
P.S. I tried to follow the style/structure of the already implemented layers (linear, ReLU, etc.), and this TanhLayer test is a slightly tweaked copy of the ReLU test. It seems like, given your comments, the old implementations require A LOT of updates. I think we could create an issue for somebody to pretty much follow your comments in this PR and update LinearLayer, NeuralLayer and the tests.

@karlnapf (Member) commented Apr 5, 2018

Yes, definitely a lot of work. But hey -- the code is old, and old code requires love :)
I realise you followed the existing structure and that is totally fine. However, let's not make the mess bigger. If we are going to extend this, let's at least get some improvements into place as well, to keep this maintainable and extendable. A lot of Shogun is like that: you want to do something new, you first have to fix the mess (at least some parts of it).
I agree it is a good idea to isolate some of the comments in issues. Do you want to go ahead with some of that?

@karlnapf (Member) commented Apr 5, 2018

We can discuss more details in IRC.

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 26, 2020
stale bot commented Mar 4, 2020

This issue is now being closed due to a lack of activity. Feel free to reopen it.

@stale stale bot closed this Mar 4, 2020