
Recurrent neural nets #4227

Closed
wants to merge 3 commits into from

Conversation

@grig-guz (Contributor) commented Apr 3, 2018

Ongoing work, comments are greatly appreciated. I created another pull request since merging into a different fork every time would probably be a pain.
#3966

@grig-guz (Contributor Author) commented Apr 3, 2018

The overall structure seems to be very different from other NN classes that are already implemented here.

  • Since we need to backprop through the entire sequence to get gradients, we have to store the hidden states and outputs, so let me know whether storing them as a vector of SGMatrix'es is the best idea (see the sketch after this list).

  • We are forced to apply tanh inside compute_activations of NeuralRecurrentLayer, since it is required to transition to the next state. I don't really see a way to use NeuralTanhLayer here.
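
A minimal sketch of the storage idea from the first bullet, keeping one matrix per time step so backprop through time can revisit them (member names are illustrative, not final):

// one activation matrix per time step, kept for backprop through time
std::vector<SGMatrix<float64_t>> m_hidden_states;
std::vector<SGMatrix<float64_t>> m_outputs;

m_hidden_states.resize(m_time_series_length);
m_outputs.resize(m_time_series_length);
for (int32_t t = 0; t < m_time_series_length; t++)
{
    m_hidden_states[t] = SGMatrix<float64_t>(m_num_neurons, m_batch_size);
    m_outputs[t] = SGMatrix<float64_t>(m_output_dim, m_batch_size);
}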

@grig-guz (Contributor Author) commented Apr 3, 2018

Oops, no tests ran here:
https://travis-ci.org/shogun-toolbox/shogun/jobs/361568597#L1951
@karlnapf @vigsterkr could you please check what's wrong with the build?

EDIT: I pushed the changes again, let's see what CI says. Nope, same error again, something with gtest. No clue how to fix it, and I don't get these errors locally :(

@karlnapf (Member) commented Apr 5, 2018

The CI errors might come from your test, which fails to compile: https://travis-ci.org/shogun-toolbox/shogun/jobs/362010308#L1948

It might be the order of the includes in your test. Try including the googletest headers first (or last, I don't remember; check other tests).

@karlnapf (Member) commented Apr 5, 2018

Next thing: could you chunk the PR into multiple smaller ones? I.e. make a copy of your changes locally and update this PR with a smaller number of changes. It is better to send 5 small PRs than one big one (if possible). Try to separate individual additions; it makes things so much easier to review (the reason this wasn't reviewed yet is that it is so big).


@@ -188,7 +188,7 @@ float64_t CNeuralLinearLayer::compute_error(SGMatrix<float64_t> targets)
float64_t sum = 0;
int32_t length = m_num_neurons*m_batch_size;
for (int32_t i=0; i<length; i++)
sum += (targets[i]-m_activations[i])*(targets[i]-m_activations[i]);
Member:
Better!
Even better would be to use a single linalg call to compute this, rather than the loop.
The advantage is also that it would use SIMD and therefore be faster.
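
For illustration, a hedged sketch of what a single-call version could look like, assuming the matrices can be viewed as flat vectors and that linalg::add and linalg::dot from shogun/mathematics/linalg/LinalgNamespace.h behave as expected:

int32_t length = m_num_neurons*m_batch_size;
// non-owning flat views over the matrices used in the loop above
SGVector<float64_t> t(targets.matrix, length, false);
SGVector<float64_t> a(m_activations.matrix, length, false);
// diff = t - a in one call, then the sum of squares as a dot product
auto diff = linalg::add(t, a, 1.0, -1.0);
float64_t sum = linalg::dot(diff, diff);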

namespace shogun
{
/** @brief Recurrent Neural layer with linear neurons, with an identity activation
* function. can be used as a hidden layer or an output layer
Member:
Typo, Can with capital C

* Each neuron in the layer is connected to all the neurons in all the input
* and hidden layers that connect into this layer.
*
* The hidden state at time step t is computed according to the previous hidden state
Member:
Name the hidden state, i.e. "The hidden state $h_t$ at time step $t$ is computed ..."
(also pls use proper math mode for all math variables)

* The hidden state at time step t is computed according to the previous hidden state
* and the input at the current state as:
*
* \f$ h_t = f(W x_t + U s_t_-_1 + b_h) \f$ where \f$ b_h \f$ is the bias vector of
Member:
The subscript s_t_-1 is wrong; it should be $s_{t-1}$.
Pls take extra care when writing math in the doxygen; you can render it locally via make doc and then check index.html in doc/.
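
For reference, the doxygen line with the subscript fixed would read something like:

* \f$ h_t = f(W x_t + U s_{t-1} + b_h) \f$ where \f$ b_h \f$ is the bias vector of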

* the weights matrix between this hidden layer and the output layer
*
*/
class CNeuralRecurrentLayer : public CNeuralLinearLayer
Member:
Could you explain to me why the recurrent layer inherits from the linear layer?
I am not too familiar with the NN architecture we have, but this puzzles me a bit.

Contributor Author:
Oops, you're right. That one should inherit from NeuralLayer.

class CNeuralRecurrentLayer : public CNeuralLinearLayer
{
public:
/** default constructor */
Member:
"Default"

@grig-guz (Contributor Author) commented Apr 5, 2018

@karlnapf Sure, thanks, I'll submit PRs for every big function in these classes. I'm finishing the tests for the activation function of RecurrentLayer; it's quite buggy on the remote right now.

virtual void initialize_neural_layer(CDynamicObjectArray* layers,
SGVector<int32_t> input_indices);

/** Sets the batch_size and allocates memory for m_activations and
Member:
The new Shogun has generic set and get methods that can change any parameter that was registered using SG_ADD. Therefore we don't need to add any getters or setters ...

Pls give those variables default values, and avoid "must be called before ...".
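
As an illustration of the registration route (the exact SG_ADD signature has changed between Shogun versions, so treat this as a sketch rather than the definitive form), the constructor could register the members and give them defaults instead of exposing setters:

// in the constructor: default value plus registration, so the generic
// get/set machinery (and serialization) can reach the parameter
m_time_series_length = 0;
SG_ADD(&m_time_series_length, "time_series_length",
    "Length of the input time series", MS_NOT_AVAILABLE);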

*/
virtual void set_batch_size(int32_t batch_size);

/** Initializes the layer's parameters. The layer should fill the given
Member:
What do you mean by "The layer should fill the given arrays ..."? This is a function call, and the comment should describe what it does.

@grig-guz (Contributor Author), Apr 5, 2018:
Haha, not sure, these comments date back to 2014, taken from the LinearLayer class. Will fix that!

*
* @param parameters Vector of size get_num_parameters()
*
* @param parameter_regularizable Vector of size get_num_parameters().
Member:
I would call this "Boolean mask"

* i, set parameter_regularizable[i] = false. This is usally used to turn
* off regularization for bias parameters.
*
* @param sigma standard deviation of the gaussian used to random the
Member:
Capital S

Member:
A few comments

  • "random" is not a verb. Should be be explicit about what actually is done
  • What if I want non-Gaussian noise to be added? Shouldnt this be more modular?

SGVector<bool> parameter_regularizable,
float64_t sigma);

/** Computes the activations of the neurons in this layer, results should
Member:
"results should be stored" should be "results are stored"
However, this is public user API? Then there is not need to talk about internal states (this are hidden from the user)
If the method is an internal helper, it should be protected/private and then the docs can talk about internal states ...

Member:
I realise many of these comments concern the general design of the NN classes, which is not your fault.
However, we can still discuss these things a bit to make it better.

SGMatrix<float64_t> A_ref(layer.get_num_neurons(), x.num_cols);

float64_t* biases = params.vector;
float64_t* weights = biases + layer.get_num_neurons();
Member:
your formatting here is all over the place

A_ref(i,j) = biases[i];

for (int32_t k=0; k<x.num_rows; k++)
A_ref(i,j) += weights[i+k*A_ref.num_rows]*x(k,j);
Member:
Pls don't do such a vanilla implementation of linear algebra. Use linalg calls and no loops (see the sketch below).
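
A possible linalg-based version of this reference computation, assuming the raw weights pointer can be wrapped in a non-owning SGMatrix and that linalg::matrix_prod is available (a dedicated broadcast call, if one exists, could also replace the bias loop):

// non-owning view over the weight block extracted from params above
SGMatrix<float64_t> W(weights, layer.get_num_neurons(), x.num_rows, false);
SGMatrix<float64_t> A_ref = linalg::matrix_prod(W, x);
// add the bias to every column
for (auto j : range(A_ref.num_cols))
    for (auto i : range(A_ref.num_rows))
        A_ref(i, j) += biases[i];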


for (int32_t i=0; i<A_ref.num_rows; i++)
{
for (int32_t j=0; j<A_ref.num_cols; j++)
Member:
for (auto j : range(A_ref.num_cols)) is the modern way to do this

}

// compare
EXPECT_EQ(A_ref.num_rows, A.num_rows);
Member:
This should be ASSERT_EQ; comparing the contents will segfault if this is false.
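
I.e. something like:

// abort before the element-wise comparison can run out of bounds
ASSERT_EQ(A_ref.num_rows, A.num_rows);
ASSERT_EQ(A_ref.num_cols, A.num_cols);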

// initialize the layer
layer.initialize_neural_layer(layers, input_indices);
SGVector<float64_t> params(layer.get_num_parameters());
SGVector<bool> param_regularizable(layer.get_num_parameters());
Member:
There is a lot of copy-paste code in those tests.
Use fixtures and templates to avoid this (see e.g. the recently merged PR that refactored the LibLinear tests).
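
A rough sketch of the fixture idea; class and member names here are hypothetical and only show the shape:

class NeuralRecurrentLayerTest : public ::testing::Test
{
protected:
    void SetUp() override
    {
        // shared setup done once per test instead of being copy-pasted:
        // build the input layers, indices and parameter vectors here
    }

    CDynamicObjectArray* m_layers;
    SGVector<int32_t> m_input_indices;
};

TEST_F(NeuralRecurrentLayerTest, compute_activations)
{
    // the body only contains what is specific to this test
}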

@karlnapf (Member) left a comment:
I gave a minimal set of comments.

Before I review in more detail, pls split the PR into smaller units (even if very small) and address a few of the things. Then we can do a second iteration.

Thanks for the efforts, much appreciated

for (int i = 0; i < m_time_series_length; i++) {
m_outputs[i] = SGMatrix<float64_t>(m_output_dim, batch_size);
}
// TODO finish this
Member:
pls avoid TODOs in PRs

// m_dropout_mask = SGMatrix<bool>(m_num_neurons, m_batch_size);

m_outputs = std::vector<SGMatrix<float64_t>>(m_time_series_length);
for (int i = 0; i < m_time_series_length; i++) {
Member:
auto and range pls

}
}

void CNeuralRecurrentLayer::set_batch_size(int32_t batch_size)
Member:
I commented that we want to remove any getters and setters and use the generic ones.
This means that the code that pre-allocates memory in here needs to be moved somewhere else, preferably into a helper method that is always called before training (see the sketch below). I am sure the other NNs have similar concepts, and this would be a nice PR that improves the general design.
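
A hedged sketch of that direction; the helper name allocate_buffers() is hypothetical, while the members are taken from the PR:

// called by the training code right before the first forward pass,
// instead of being triggered by a setter
void CNeuralRecurrentLayer::allocate_buffers(int32_t batch_size)
{
    m_batch_size = batch_size;
    m_outputs = std::vector<SGMatrix<float64_t>>(m_time_series_length);
    for (auto i : range(m_time_series_length))
        m_outputs[i] = SGMatrix<float64_t>(m_output_dim, m_batch_size);
}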

for (int32_t i=0; i<m_num_parameters; i++)
{
// random the parameters
parameters[i] = CMath::normal_random(0.0, sigma);
Member:
I think there is a way to generate a whole vector of random numbers.

Also, it would be cool to have this modular and use other random streams (separate PR).
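
One way to fill the whole vector at once, shown with the standard library as a stand-in for whatever random stream Shogun settles on (so the generator choice here is an assumption, not Shogun's API):

#include <algorithm>
#include <random>

std::mt19937_64 prng(m_rng_seed); // m_rng_seed is hypothetical
std::normal_distribution<float64_t> gauss(0.0, sigma);
std::generate(parameters.vector, parameters.vector + m_num_parameters,
    [&]() { return gauss(prng); });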

// parameter_regularizable[i] = (i >= m_num_neurons);
}
}
// VERY BIG TODO: Figure out how to manage input from multiple layers
Member:
indeed! :)

typedef Eigen::Map<Eigen::VectorXd> EMappedVector;

EMappedMatrix A(m_activations.matrix, m_output_dim, m_batch_size);
EMappedMatrix H(m_hidden_activation.matrix, m_num_neurons, m_batch_size);
Member:
Pls use linalg and not Eigen (see the sketch below).
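
A hedged sketch of the same update with linalg instead of the Eigen maps; the names mirror the maps in the PR but are assumed to be SGMatrix/SGVector here, matrix_prod and add are taken from LinalgNamespace.h, and the bias broadcast plus tanh are written as plain loops since I am not sure a dedicated elementwise call exists:

// pre = Wxh * X + Whh * H_prev, computed with linalg
auto pre = linalg::add(linalg::matrix_prod(Wxh, X),
    linalg::matrix_prod(Whh, H_prev));
// add the bias to every column and apply tanh elementwise
for (auto j : range(pre.num_cols))
    for (auto i : range(pre.num_rows))
        H(i, j) = std::tanh(pre(i, j) + Bh[i]);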

void CNeuralRecurrentLayer::compute_activations(SGVector<float64_t> parameters,
CDynamicObjectArray* layers)
{
float64_t* hidden_biases = parameters.vector;
Member:
pls no raw pointers


int32_t weights_index_offset = m_num_neurons * 2;

for (int i = 0; i < m_time_series_length; i++) {
Member:
auto

// but ReLU kill too much gradient when unrolling the recursion
// in backprop
H += (Wxh * X + Whh * H).colwise() + Bh;
H = H.unaryExpr<float64_t(*)(float64_t)>(&std::tanh);
Member:
This should definitely be modular! (see the sketch below)
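
A possible way to make the nonlinearity pluggable instead of hard-coding tanh; the member name m_activation_function is hypothetical (needs <functional> and <cmath>):

// defaults to tanh, but can be swapped for another activation by the user
std::function<float64_t(float64_t)> m_activation_function =
    [](float64_t x) { return std::tanh(x); };

// inside compute_activations, per time step (H shown as an SGMatrix here):
for (auto j : range(H.num_cols))
    for (auto i : range(H.num_rows))
        H(i, j) = m_activation_function(H(i, j));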

SGMatrix<float64_t> targets,
CDynamicObjectArray* layers,
SGVector<float64_t> parameter_gradients)
{
Member:
?

@grig-guz (Contributor Author) commented Apr 5, 2018

@karlnapf Thank you very much for the reviews!
P.S. I tried to follow the style/structure of the already implemented layers (linear, ReLU, etc.), and this TanhLayer test is a slightly tweaked copy of the ReLU test. It seems like, given your comments, the old implementations require A LOT of updates. I think we could create an issue for somebody to pretty much follow your comments in this PR and update LinearLayer, NeuralLayer and the tests.

@karlnapf (Member) commented Apr 5, 2018

Yes, definitely a lot of work. But hey -- the code is old, and old code requires love :)
I realise you followed the existing structure and that is totally fine. However, let's not make the mess bigger. If we are going to extend this, let's at least get some improvements into place as well, to keep this maintainable and extendable. A lot of Shogun is like that: you want to do something new, you first have to fix the mess (at least some parts of it).
I agree it is a good idea to isolate some of the comments in issues. Do you want to go ahead with some of that?

@karlnapf (Member) commented Apr 5, 2018

We can discuss more details in IRC.

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 26, 2020
stale bot commented Mar 4, 2020

This issue is now being closed due to a lack of activity. Feel free to reopen it.

@stale stale bot closed this Mar 4, 2020