New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Gaussian process for arbitrary-dimensional output spaces #1611
Conversation
…sional, and an (n,m) array if the output is y dimensional. This makes it backwards compatible
That's a useful contribution! Thanks. New tests are indeed useful. If multi-output tests were not around for the other models that support multi-output, I would have broken this support more than once when making changes. |
y = np.asarray(y).ravel()[:, np.newaxis] | ||
y = array2d(y) | ||
|
||
if y.shape[1] == X.shape[0] and y.shape[0] == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That test seems a bit too general. I think that you are only trying to capture the situation when the original 'y' vector was 1D. You can do that by storing the number of dimension of 'y' before the call to array2d
I fixed the tests. The original shape of y is now stored and all the checks are to whether or not the length of the shape is 1 or not. If the original y is 1-d, it will have a shape = (n_samples,) and len(shape) = 1. I apologize that it has taken an eternity for me to get around to fixing this; I am working on my doctorate and this code is at best tangentially related to some of the analysis code. It's been low on the to-do list. |
thanks @JohnTheBear |
…nd incorrectly transposed it will be fixed, also the return values will be given in the same form as the training values
@agramfort But I don't think that is what you are asking about. Do you want me to write something to go in the build tests? I don't quite understand what you are asking for. I do know that this code works when y.shape = (n_samples, 15) because that is what I have been using it for. |
have a look at the test file : test_gaussian_process.py in gaussian_process/tests folder. that is the file that should be edited to test the new features. |
I need to use this feature, and I would like to know if it will be merged soon? |
I'm not dead, and I haven't forgotten about this. My linux box died, I'm in grad school, and this hasn't been super high on my to do list. Sorry all. |
I think this is good now |
Could you add a '[MRG]' in the title of your PR, so that reviewers know |
@@ -271,11 +271,19 @@ def fit(self, X, y): | |||
|
|||
# Force data to 2D numpy.array | |||
X = array2d(X) | |||
y = np.asarray(y).ravel()[:, np.newaxis] | |||
self.y_shape = np.array(y).shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every attribute that is data dependent should end with _
if you need keep the number of targets at attribute I'd recommend to name it self.n_targets_ or self.n_outputs_
We discussed this and I think we rather not want to merge this in a hurry. We actually wanted to do a feature freeze yesterday. It will probably be merged soon after the release. Sorry. |
That is just fine. Delayed is better than forgotten. It looks like there are new comments for me to address as well. |
Certainly not forgotten! |
Is any chance this is going to be merged any time soon? I'm thinking about adding some extra stuff related to GPs and would like to base it on the multi-input/output version. |
I misunderstood about the trailing underscores. I thought you meant data dependent variables. I also realize now that there were a few other pep8 problems. I have addressed them. Thanks for the feedback, I do most of my coding in a vacuum, so I rarely get helpful criticism. |
""" | ||
|
||
# Check input shapes | ||
X = array2d(X) | ||
n_eval, n_features_X = X.shape | ||
n_eval, n_featuresX = X.shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too quick find / replace :)
besides looks good to me. Could it be illustrated in an example? |
I have used this code myself for multi-dimensional interpolation. But... I could probably put together some sort of multidimensional interpolation example: 2D -> 2D (or something similar). It just gets very hard to visualize pretty quickly. |
An array with shape (n_eval, ) with the Best Linear Unbiased | ||
y : array_like, shape (n_samples, ) or (n_samples, n_targets) | ||
An array with shape (n_eval, ) if the Gaussian Process was trained | ||
on an array of shape (n_samples, ) or and array with shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and array -> an array
fix the typo then +1 for merge on my side. For example I wanted more an illustration but forget it. |
I was trying to think of a good example, and while fiddling with the old example I realized that the MSE calculation had gotten broken. I've fixed it, but now I'm a little perplexed how it passed the build test while clearly not giving reasonable MSE values... |
I've got it figured out. The build tests only check that the MSE is small at the training points, it never tests that it is sane away from the training points. I'll add something to the build tests to check that the values are sane away from the training points as well. |
I added another test to the 1d case. it isn't the most rigorous, but it would have caught the error I introduced. Unfortunately, the MSE values will depend on things like the kernel, so you can't make blanket statements about what is necessarily "reasonable", but now if someone screws up like what I did, it should catch it. I don't know what the system is for adding examples, but I have something in mind for 2d->2d interpolation. I'll see if I can get it working. |
|
||
# If the y vales are given to fit() as an array, but transposed wrong | ||
if y.shape == (1, X.shape[0]): | ||
y = y.T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that we should be supporting this. We should be raising a meaningful error rather than silently transposing an input.
The meaningful array can be raised by using sklearn.utils.validation.check_arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit inelegant I agree but this case happens due to the array2d when y.ndim == 1 as input. The code next now only works with 2d y. Makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit inelegant I agree but this case happens due to the array2d when y.ndim == 1 as input.
OK, but then we need to test and store the shape of y before the call to
array2d.
Appart from the transposition issues, this seems pretty much good to go to me. |
The Gaussian Process will not accept matrices which are transposed wrong, and I think I have removed anything that hints at the possibility that it might. There were a few if: ... else:... checks where the "else" would never happen, so they have been pruned. There may be a few ".T"s still in there, but that arrises from when a list is turned into a 2d array and it has the wrong dimensions. All checks for that sort of behavior should check for len(self.y_shape_) == 1, where self.y_shape_ is stored before y forced to be an array. |
I've rebased on master for a cleaner history and tried to simplify input checking. See: if you're happy I'll update the whats_new page and ask for a merge. |
feel free to close this one if you want. |
Awesome! Thank you all for your patience! |
I have fixed the Gaussian Process submodule so that Gaussian Processes can be trained on data which is vector like in y. Now when training a GP each point in both X and Y can be of arbitrary dimension. Previously, the values of Y were required to be scalars, which is a requirement which is not necessary for GPs.
The changes are fully backwards compatible, and the old tests work fine.
I have not written new tests to test the new functionality, but I can if it is necessary.