
Nonlinear regression using Keras #1874

Closed
polarlight1994 opened this issue Mar 2, 2016 · 13 comments

@polarlight1994

I was trying to do nonlinear regression using Keras, but the result is far from satisfying. I was wondering how I should choose the layers to build the NN and how to tune parameters like the activations, objectives and others. Are there any principles or guide materials that address this problem? I am a newcomer to deep learning and really need help here. The NN I built is as follows:
model = Sequential()
model.add(Dense(input_dim = 4, output_dim = 500))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(input_dim = 500, output_dim = 1))
model.add(Activation('tanh'))
model.compile(loss='mean_absolute_error', optimizer='rmsprop')
Thanks~

@pasky
Contributor

pasky commented Mar 2, 2016

It's hard to give generic advice, especially without knowing the specifics of your data. For example, is your data labelled -1/+1? Is it better if you try without a hidden layer first? Etc. - the Google group is probably a better place to ask for advice like this.

@mrwns

mrwns commented Mar 2, 2016

  1. It is more common to have a linear layer as the output of the net in regression tasks (see the sketch after this list).
  2. Did you try normalizing to zero mean/unit variance, or scaling your input to [0, 1]?
  3. It is more common to use MSE instead of MAE, even though that should not change much.
  4. Can you overfit the net with your training data?
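
A minimal sketch of points 1 and 2, assuming the original 4-feature input; X_train and y_train are synthetic stand-ins for the real data, and the layer sizes just mirror the model posted above:

import numpy as np
from sklearn import preprocessing
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

X_train = np.random.rand(100, 4) * 10          # hypothetical raw inputs
y_train = X_train.sum(axis=1, keepdims=True)   # hypothetical regression targets

X_train_scale = preprocessing.scale(X_train)   # zero mean, unit variance

model = Sequential()
model.add(Dense(input_dim=4, output_dim=500))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(input_dim=500, output_dim=1))
model.add(Activation('linear'))                # linear output for regression
model.compile(loss='mean_squared_error', optimizer='rmsprop')
model.fit(X_train_scale, y_train, nb_epoch=10, verbose=2)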

@hlin117

hlin117 commented Mar 2, 2016

I think this question is better suited for the Keras Google group here:
https://groups.google.com/forum/#!forum/keras-users

The question isn't specifically a package-related question.

@philipperemy

Unfortunately the answer is no. There is no magical tool that does what you want.
As pointed out before, try to overfit your data. Even though such a model will not generalize, it at least confirms that the network has the capacity to approximate this function.
Remember that the fundamental result about neural networks (the universal approximation theorem) is that a NN can theoretically approximate any nonlinear function (given enough parameters and data).

You can try:

  • Tune the number of hidden layers and the related number of neurons (funnel rule: more neurons in the first layers and fewer in the final layers as you go higher in abstraction) - see the sketch after this list.
  • Sigmoid is usually a good activation function. You can also try ReLU.
  • You can look at other optimizers (Adam, Adagrad, ...).
  • You may not want such a heavy dropout layer (p=0.5) between them.
  • Your output layer is also important (you may have a look at the cross-entropy error).
  • Normalize your inputs (if it's a financial time series, compute the returns. If it's a time series, be sure that it's stationary, i.e. its first and second moments don't change over time).
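
As a rough illustration of the funnel rule and the optimizer point above (the layer sizes and the choice of 'adam' are only example values, not a recommendation for this specific dataset):

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(input_dim=4, output_dim=256))  # widest layer first
model.add(Activation('relu'))
model.add(Dense(output_dim=64))                # narrower
model.add(Activation('relu'))
model.add(Dense(output_dim=16))                # narrower still
model.add(Activation('relu'))
model.add(Dense(output_dim=1))                 # single regression output
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='adam')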

@polarlight1994
Author

@pasky @hlin117 Thanks for your advice. I will move my issues to the Google group later. @mrwns Thank you for your concern. I have formatted my input using sklearn's API, and I was wondering whether Keras provides any methods to format data. @philipperemy Really appreciate your answer; the regression result is now quite a bit better. However, there is still a problem: some negative numbers come out, which is not expected. Is that related to the data format? I scale the inputs into [-1, 1] with a mean of 0. How should I constrain the regression result to be all positive in this circumstance? Really, thanks for all your help! The NN I created is as follows:

`X_train_scale = preprocessing.scale(X_train)
X_test_scale = preprocessing.scale(X_test)

model = Sequential()
model.add(Dense(input_dim = 4, output_dim = 1000))
model.add(Activation('sigmoid'))
model.add(Dense(input_dim = 1000, output_dim = 1000))
model.add(Activation('sigmoid'))
model.add(Dense(input_dim = 1000, output_dim = 1000))
model.add(Activation('sigmoid'))
model.add(Dense(input_dim = 1000, output_dim = 1))
model.add(Activation('linear'))`

@philipperemy

@polarlight1994 Yes, it is partly related to your inputs, but you can also modify your model to handle it. I see two ways to fix your problem:

First, you can normalize your data in a different way: http://stats.stackexchange.com/a/70808
This transformation, (x - min(X)) / (max(X) - min(X)), will give you values between 0 and 1. The fact that you don't have a mean of 0 and a variance of 1 shouldn't matter much.
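
For reference, a minimal sketch of that rescaling; sklearn's MinMaxScaler implements exactly (x - min(X)) / (max(X) - min(X)) per column:

import numpy as np
from sklearn import preprocessing

X = np.array([[-10.0], [0.0], [5.0]])
scaler = preprocessing.MinMaxScaler()
print(scaler.fit_transform(X))  # [[0.], [0.6666...], [1.]] - all values now in [0, 1]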

Secondly, if you want to stick with your current normalization, you might want to change your final Activation layer from linear to ReLU. https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
A ReLU layer is a linear layer that converts all negative values to 0. Think of it as max(0, x).

So you can replace your last layer by:

model.add(Activation('relu'))

Finally, if you want only 0 or 1 as output and no intermediate values like 0.123, you may have a look at the softmax layer (+ argmax). This turns it into a classification problem.
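
A hedged sketch of that softmax + argmax idea, assuming a strictly binary 0/1 target (the hidden-layer size is arbitrary):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(input_dim=4, output_dim=64))
model.add(Activation('relu'))
model.add(Dense(output_dim=2))            # one unit per class (0 and 1)
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# After training on one-hot labels, hard 0/1 predictions come from:
# predictions = np.argmax(model.predict(X), axis=1)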

@polarlight1994
Author

@philipperemy I have tried the second way you mentioned, but the output is always 0. Since the expected output data in my problem is not constrained to (0, 1), I was wondering whether the ReLU fails to work because of this? Also, I was told that if I want to do a nonlinear regression, I should use a linear output layer. Is that right? By the way, if I use the ReLU in the second-to-last layer instead, will it solve my problem?

@philipperemy

No, using a linear activation layer as your final output in a nonlinear regression is not a prerequisite. It depends on where the values of your output data lie. The ReLU will output values in [0, +infinity), the sigmoid in (0, 1) and the linear in (-infinity, +infinity). The linear obviously gives you negative values. What is the interval of your expected data?
Changing all your sigmoids to ReLUs will speed up the training, since ReLU is much cheaper to backpropagate than sigmoid, but I don't think you will see a drastic change.

@polarlight1994
Author

@philipperemy My expected data should lie in the range (0, +infinity). So, as you explained, I should set the output layer to ReLU. But I get an all-zero output and the loss does not decrease in any epoch. Is it because the input is constrained to (-1, 1), so after the first three sigmoid layers the output of the ReLU is almost always 0?
Again, thank you for your patient explanation!

@philipperemy

If you always get 0 as output, it means that all the features at the previous layer are negative. I don't think the problem comes from your model or from your input data. You can always try testing with positive data, but I don't think it will solve your problem.
What optimizer are you using? Maybe you're using an optimizer that is not suited to this particular problem.
Or maybe you need to run more epochs. You have more than 2 million weights, so it may take time for the optimizer to find a minimum.
Also, try to split your problem into smaller problems: drop all the superfluous layers and try to get past the issue; when that's done, add them back one by one and see.
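
For reference, a quick check of the "more than 2 million weights" figure, assuming the 4 -> 1000 -> 1000 -> 1000 -> 1 architecture posted earlier (each Dense layer has (n_in + 1) * n_out parameters, counting biases):

layers = [(4, 1000), (1000, 1000), (1000, 1000), (1000, 1)]
total = sum((n_in + 1) * n_out for n_in, n_out in layers)
print(total)  # 2008001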

@polarlight1994
Author

@philipperemy I followed your advice. Now the output is not always zero and the loss decreases in every epoch; however, there are still lots of negative values in the output. I can't figure out why... Here is my latest NN. I used the MinMaxScaler to scale my input data into the range (0, 1).

`min_max_scaler = preprocessing.MinMaxScaler()
X_train_scale = min_max_scaler.fit_transform(XX_train)
X_test_scale = min_max_scaler.transform(XX_test)

model = Sequential()
model.add(Dense(input_dim = 4, output_dim = 500))
model.add(Activation('relu'))
model.add(Dense(input_dim = 500, output_dim = 1))
model.add(Activation('relu'))

model.compile(loss='mean_squared_error', optimizer='rmsprop')`

@philipperemy

It seems very weird that you still have negative data in your output.

I tried a very simple example with negative and positive values in your XX_train and XX_test (before applying the MinMaxScaler between 0 and 1).

My expected values were set to -1. I wanted to see whether, despite the ReLU layers, the NN could output negative values. If you execute this code, you will see that all the predicted values are 0. The ReLU layer prevents negative values.

3s - loss: 1.0000 - acc: 1.0000 - val_loss: 1.0000 - val_acc: 1.0000
[[ 0.]
 [ 0.]
 [ 0.]
 [ 0.]

Set the expected values to 1, and you will see all values very close to 1 (0.98, 0.99, 1.01, ...). Once again, the network could figure out this simple function. Values slightly above 1 are consistent with the definition of the final ReLU layer.

Epoch 10/10
3s - loss: 2.2876e-04 - acc: 1.0000 - val_loss: 0.0014 - val_acc: 1.0000
[[ 1.01424801]
 [ 1.01220787]
 [ 1.00581753]
 [ 1.01019406]

Source code is here:

from __future__ import print_function
from keras.layers import Dense, Activation
from keras.models import Sequential
from sklearn import preprocessing
import numpy as np

N = 1000
XX_train = np.ones((N, 4)) * np.random.rand(N, 4) * (-10)
XX_test = np.ones((N, 4)) * np.random.rand(N, 4) * 10

YY_train_labels = - np.ones((N, 1))
YY_test_labels = - np.ones((N, 1))

min_max_scaler = preprocessing.MinMaxScaler()
min_max_scaler.fit(np.concatenate((XX_train, XX_test)))

X_train_scale = min_max_scaler.transform(XX_train)
X_test_scale = min_max_scaler.transform(XX_test)

model = Sequential()
model.add(Dense(input_dim=4, output_dim=500))
model.add(Activation('relu'))
model.add(Dense(input_dim=500, output_dim=1))
model.add(Activation('relu'))

model.compile(loss='mean_squared_error', optimizer='rmsprop')

model.fit(X_train_scale, YY_train_labels,
          batch_size=1, nb_epoch=10,
          show_accuracy=True, verbose=2,
          validation_data=(X_test_scale, YY_test_labels))

print(model.predict(X_train_scale, batch_size=1))

@stale stale bot added the stale label May 23, 2017
@stale

stale bot commented May 23, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot closed this as completed Jun 22, 2017