Use Ensmallen Callbacks To train LSTM and Fit only on training data. #56

Merged: 4 commits merged into mlpack:master on Mar 26, 2020

Conversation

@kartikdutt18 (Member) commented Feb 27, 2020

Hi everyone,
This closes #41, #42, and #43.
I have used ensmallen callbacks instead of a for loop to train the LSTMs.
Another good idea, suggested by @shrit, is to use mlpack's CLI functionality to improve access to the models from the command line.
This has been built and locally tested.
Thanks.

@kartikdutt18 kartikdutt18 changed the title Use Ensmallen Callbacks To train LSTM Use Ensmallen Callbacks To train LSTM and Fit only on training data. Feb 28, 2020
@kartikdutt18 (Member Author) commented Feb 28, 2020

LSTM Univariate Results:
loss: 0.00605504
Finished training.
Saving Model
Model saved in lstm_univar.bin
Loading model ...
The predicted energy consumption for the next hour is : 
 0.383799

LSTM Multivariate:
Finished training. 
loss: 0.0245781
 Saving Model
Model saved in lstm_multi.bin
Loading model ...
The predicted Google stock (high, low) for the last day is: 
  (1115.57, 1078.62)

The earlier predicted values were:

 The predicted energy consumption for the next hour is :
 0.410681

and

The predicted Google stock (high, low) for the last day is the following:
1116.4, 1094.65

Nearly identical values were obtained (these may vary from machine to machine), so I think the changes made so far (after limiting the number of epochs) haven't really changed the results.

@kartikdutt18 (Member Author)

After simplifying the save-results function, the new results are:

Mean Squared Error on Prediction data points: 0.00525148
The predicted energy consumption for the next hour is : 
 0.383751

and

The predicted Google stock (high, low) for the last day is: 
  (956.784, 934.122)

These are similar to earlier values.

@kartikdutt18 (Member Author)

Hi @zoq, could you take a look at this? I think it is ready; I have repeated the tests twice to ensure that it works.

@rcurtin (Member) left a comment

Hey @kartikdutt18, thanks for taking the time to work on this one. The changes look good to me. I have a handful of comments but they're all pretty minor. Let me know if I can clarify any of them. 👍

ens::PrintLoss(),
ens::ProgressBar(),
ens::EarlyStopAtMinLoss(),
ens::StoreBestCoordinates<arma::mat>());
Member

Does StoreBestCoordinates do anything here? It looks like you are creating a temporary StoreBestCoordinates() object to pass to the optimizer, but then there is no way to get the result out since it is temporary. Take a look at the ensmallen callback documentation; I think it has a nice example of using an instantiated StoreBestCoordinates callback.
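For reference, a minimal sketch (not code from this PR; the model, trainX, trainY, and optimizer names are assumed) of using an instantiated StoreBestCoordinates callback so that the best parameters can be read back after training:

// Keep the callback as a named object so its result stays accessible.
ens::StoreBestCoordinates<arma::mat> bestCoordinates;

model.Train(trainX, trainY, optimizer,
            ens::PrintLoss(),
            ens::ProgressBar(),
            ens::EarlyStopAtMinLoss(),
            bestCoordinates);

// Restore the parameters with the lowest loss seen during optimization.
model.Parameters() = bestCoordinates.BestCoordinates();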

Member Author

Hmm, I am removing it for now; I don't think this issue warrants the StoreBestCoordinates() function. However, I would like to use it locally to understand it a bit more clearly.

Member

Sounds good, I think it's fine with or without StoreBestCoordinates().

@kartikdutt18 (Member Author)

Hi @rcurtin, thanks for the review. I have made the changes that you suggested.

dataset.n_cols - 1));

// Number of epochs for training.
const int EPOCHS = 500;
Member

So, just a minor note, and whether or not you handle it is up to you. Previously, we were doing 500 epochs, but they weren't really epochs because we were only looking at 1000 points in each "epoch". Now we're using the whole dataset, and since one epoch now is equivalent to an entire pass (not just 1000 points), 500 epochs is a lot more training. I wonder if maybe 100 is a better number here. 👍

Member Author

I can make the change, but I would have to test it first. Once I do, I'll post a comment with the results. I hope that's okay.

Member

Sure, no problem. If 500 epochs are actually necessary we can go with that, I just figured it would generally terminate way before then.

Member Author

On my laptop, it took 107 epochs to stop, so I have changed it to 150 (a nice round number). If needed, I can reduce it further. Thanks for the suggestion.

@rcurtin (Member) left a comment

Thanks @kartikdutt18! I only have a few comments left. If you want to handle them before merge (or let me know what you think), that would be awesome. 👍

// Progressbar Callback prints progress bar for each epoch.
ens::ProgressBar(),
// Stops the optimization process if the loss stops decreasing
// or no improvement has been made. Useful in preventing overfitting.
Member

This is a really pedantic comment but I think it's worth making---actually this is not useful for preventing overfitting in the way that people might expect. Normally you might terminate the optimization when the loss on a validation set starts increasing (but the training set loss will keep going down). However, EarlyStopAtMinLoss() only considers the training set. So basically EarlyStopAtMinLoss() will terminate if the training error starts going up. In my opinion the argument to say this is useful in preventing overfitting is a little weak, and I think a more accurate thing to say might be that this will terminate the optimization once we hit a minimum.

Member Author

Hmm, you are absolutely right; this will only reach a minimum on the training data. Nice, I never looked at it that way. Thanks.


// Don't reset optimizer's parameters between cycles.
optimizer.ResetPolicy() = false;
// Use Early Stopping criteria to stop training.
Member

Another pedantic comment---this line itself doesn't specify that early stopping should be used, that's the callback. I think a more effective comment might be:

// Instead of terminating based on the tolerance of the objective function, we'll depend on
// the maximum number of iterations, and terminate early using the EarlyStopAtMinLoss
// callback.

(I didn't check if those are 80 characters, so you might need to reflow it if you want to use that text directly. Feel free to adapt it if you want to improve the wording or anything. 👍)
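As an illustration only, a hedged sketch (assuming an ens::Adam optimizer and names such as STEP_SIZE, BATCH_SIZE, EPOCHS, and trainData from the surrounding example) of letting the iteration budget and the callback control termination rather than the objective tolerance:

ens::Adam optimizer(STEP_SIZE, BATCH_SIZE);
// Instead of terminating based on the tolerance of the objective function,
// we'll depend on the maximum number of iterations, and terminate early
// using the EarlyStopAtMinLoss callback.
optimizer.Tolerance() = -1;
optimizer.MaxIterations() = EPOCHS * trainData.n_cols;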

Member Author

Great, I will make those changes.

@mlpack-bot (bot) left a comment

Second approval provided automatically after 24 hours. 👍

@kartikdutt18 (Member Author)

Hi @rcurtin, since I made the changes that you suggested, I tested it again and got the following results:

Model saved in lstm_multi.bin
Loading model ...
Mean Squared Error on Prediction data points:= 0.116839
The predicted Google stock (high, low) for the last day is: 
  (1000.61, 978.512)

Similarly, the univariate results were:

Mean Squared Error on Prediction data points: 0.00492178
The predicted energy consumption for the next hour is : 
0.38489

@kartikdutt18 (Member Author) left a comment

Hi, I have made some changes that I think might be better than what we were doing. Kindly have a look; I'll be more than happy to learn more about them or revert them.
Also, sorry for the delay.
Thanks for all the help.

@@ -57,7 +58,7 @@ double MSE(arma::cube& pred, arma::cube& Y)
arma::mat temp = diff.slice(i);
err_sum += accu(temp%temp);
}
return (err_sum) / (diff.n_elem + 1e-50);
Member Author

Hi @rcurtin, is there a reason why we need to do this? I don't think diff.n_elem can ever be zero.

Member Author

mlpack has the L2 distance in core/metrics; would it be okay to switch to that rather than using a for loop to calculate it? I think writing our own L2 is a bit redundant. What do you think?
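For concreteness, a possible sketch (not necessarily the exact diff) of the MSE computed via mlpack's squared L2 metric instead of the slice-wise loop, using the pred and Y cubes from the existing MSE() signature:

// Mean squared error between predictions and targets, computed with
// mlpack's SquaredEuclideanDistance from core/metrics.
double MSE(arma::cube& pred, arma::cube& Y)
{
  return mlpack::metric::SquaredEuclideanDistance::Evaluate(pred, Y) / Y.n_elem;
}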

Member

You're right, this is a lot cleaner. Thanks! 👍

Member

And yeah, I agree, diff.n_elem should never be 0, so the + 1e-50 shouldn't ever be necessary.

y.set_size(outputSize, dataset.n_cols - rho + 1, rho);
// Split the dataset into training and validation sets.
arma::mat trainData, testData;
data::Split(dataset, trainData, testData, RATIO);
Member Author

I have also switched to the Split() function from mlpack. I hope that's okay.

Member

Hmm, actually I'm not sure on this one. The Split() function shuffles the data during splitting, but for a time series problem like this it makes more sense to keep the first part of the data as the training set, and then the last part of the data as the test set. Unfortunately I don't see any option to pass to Split() to avoid shuffling (maybe it would be good to add one? I don't know how important it would be), so I think we should revert this bit, and then I can go ahead and merge it. 👍
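For comparison, a minimal sketch (assuming RATIO is the test fraction, as it is for data::Split(), and that dataset stores one observation per column) of a chronological split that keeps the earliest columns for training and the most recent ones for testing:

// The oldest (1 - RATIO) fraction of the columns becomes the training set;
// the remaining, most recent columns become the test set.
const size_t trainCols = (1.0 - RATIO) * dataset.n_cols;
arma::mat trainData = dataset.cols(0, trainCols - 1);
arma::mat testData = dataset.cols(trainCols, dataset.n_cols - 1);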

Member Author

Yes, you are right; I recently learned about this when I went through the data code base while creating a DataLoader. This should definitely be reverted. Thanks for pointing it out.

Member Author

I have made those changes and re-tested the number of epochs. Thanks for all the help.

Member

Awesome, thanks. I'll go ahead and merge it once it passes the tests. 👍

Member Author

Could you also help me with this question: while creating a DataLoader for time-series analysis, since the input is transformed into the output, does it make sense to fit only on the training input rather than on all of the training data?
We would, however, transform both the training input and the output.

Member Author

> Unfortunately I don't see any option to pass to Split() to avoid shuffling (maybe it would be good to add one? I don't know how important it would be)

I think an optional shuffle would be a good idea, and it should be simple to implement. Would it be okay for me to open a PR for this?
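To sketch the intent only, a purely hypothetical overload (this signature did not exist in mlpack at the time and is shown just to illustrate the idea of an optional shuffle):

// Hypothetical: when shuffleData is false, the columns are split in their
// original order instead of being shuffled first.
template<typename T>
void Split(const arma::Mat<T>& input,
           arma::Mat<T>& trainData,
           arma::Mat<T>& testData,
           const double testRatio,
           const bool shuffleData = true);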

@kartikdutt18 (Member Author)

Hmm, I think this test failure is related to the CMake changes added in mlpack and unrelated to the models repo. I might be missing something, though; I'll take a closer look just in case it is related to this PR.
Thanks.

@rcurtin (Member) commented Mar 12, 2020

Yes, I think that due to mlpack#2247, the patch mlpack-cmake.patch is no longer necessary. I think the build might succeed if you remove that file and remove the associated line from .appveyor.yml. Do you want to try and see what happens? :)

@rcurtin (Member) commented Mar 12, 2020

Oh, wait, sorry, I didn't see #58 which solves that issue. Maybe we can just merge that, rebase this branch, and it should work. :)

@kartikdutt18 (Member Author)

> Oh, wait, sorry, I didn't see #58 which solves that issue. Maybe we can just merge that, rebase this branch, and it should work. :)

Agreed, that would be great. Thanks.

Simplify Save Results

Changed Path for data, removed unnecc header and added comment for callbacks

Removed optimizer reset

Changed epochs, better comments

Changed to internal split, reduced epochs

Changed path

Switch to L2 in MSE
@kartikdutt18 (Member Author) commented Mar 14, 2020

Hi, I have rebased this. I think this should pass the tests now. Thanks.

@birm (Member) commented Mar 19, 2020

I think this fits well with the other changes from models->examples, or at least doesn't conflict with them. If we're reworking the LSTM examples, I think it would make more sense to do so after this is merged, rather than before.

@kartikdutt18 (Member Author)

Agreed, that makes sense. This is ready from my side. Thanks a lot.

@rcurtin rcurtin merged commit f4631d9 into mlpack:master Mar 26, 2020
@rcurtin (Member) commented Mar 26, 2020

Oops, I didn't realize that this wasn't merged before the repository split! So I merged it here then cherry-picked the commits into the models repository.

@kartikdutt18 (Member Author)

No worries, I think the models repo will be restructured with mlpack/models#3, so I'll try to take care of anything that is needed there. Thanks a lot @rcurtin and @birm for the helpful reviews and comments.
