[MRG] Gaussian process for arbitrary-dimensional output spaces #1611

JohnFNovak · 2013-01-23T01:09:23Z

I have fixed the Gaussian Process submodule so that Gaussian Processes can be trained on data which is vector like in y. Now when training a GP each point in both X and Y can be of arbitrary dimension. Previously, the values of Y were required to be scalars, which is a requirement which is not necessary for GPs.

The changes are fully backwards compatible, and the old tests work fine. ~~I have not written new tests to test the new functionality, but I can if it is necessary.~~

…y 1)

…sional, and an (n,m) array if the output is y dimensional. This makes it backwards compatible

GaelVaroquaux · 2013-01-23T06:49:49Z

That's a useful contribution! Thanks.

New tests are indeed useful. If multi-output tests were not around for the other models that support multi-output, I would have broken this support more than once when making changes.

GaelVaroquaux · 2013-01-23T06:52:41Z

sklearn/gaussian_process/gaussian_process.py

-        y = np.asarray(y).ravel()[:, np.newaxis]
+        y = array2d(y)
+
+        if y.shape[1] == X.shape[0] and y.shape[0] == 1:


That test seems a bit too general. I think that you are only trying to capture the situation when the original 'y' vector was 1D. You can do that by storing the number of dimension of 'y' before the call to array2d

JohnFNovak · 2013-03-31T23:44:37Z

I fixed the tests. The original shape of y is now stored and all the checks are to whether or not the length of the shape is 1 or not. If the original y is 1-d, it will have a shape = (n_samples,) and len(shape) = 1.

I apologize that it has taken an eternity for me to get around to fixing this; I am working on my doctorate and this code is at best tangentially related to some of the analysis code. It's been low on the to-do list.

agramfort · 2013-04-01T07:38:22Z

thanks @JohnTheBear
can you add a test with y of shape (n_samples, 1) and check that it gives the same as if (n_samples, )
and add a test when y.shape = (n_samples, 2) ?

…nd incorrectly transposed it will be fixed, also the return values will be given in the same form as the training values

JohnFNovak · 2013-04-02T01:32:51Z

@agramfort
I didn't think of the case where the training y was 1-d but given as a 2d array and transposed incorrectly, that would have slipped past my test and failed a few seconds later. I have addressed that, and changed the test when the values are returned so that the return vales for y and MSE are given in the same form as the training y. Although it may be safest in general that if the input values are given in a non-sensical way the module fails instead of pressing on. The only reason I was including shape tests was 1) so the return values are given in the same form as the training values, 2) it works out that array2d(np.array([list of length n_samples])) returns an array of shape == (1, n_samples), which will fail.

But I don't think that is what you are asking about. Do you want me to write something to go in the build tests? I don't quite understand what you are asking for.

I do know that this code works when y.shape = (n_samples, 15) because that is what I have been using it for.

agramfort · 2013-04-02T05:54:43Z

have a look at the test file : test_gaussian_process.py in gaussian_process/tests folder.

that is the file that should be edited to test the new features.

agramfort · 2013-04-02T05:55:22Z

see : http://scikit-learn.org/dev/developers/#testing-and-improving-test-coverage

andyli · 2013-04-18T15:12:30Z

I need to use this feature, and I would like to know if it will be merged soon?
Is that only unit test is missing?

JohnFNovak · 2013-07-16T15:53:51Z

I'm not dead, and I haven't forgotten about this. My linux box died, I'm in grad school, and this hasn't been super high on my to do list. Sorry all.

JohnFNovak · 2013-07-18T15:20:04Z

I think this is good now

GaelVaroquaux · 2013-07-19T10:15:45Z

I think this is good now

Could you add a '[MRG]' in the title of your PR, so that reviewers know
that they should prioritize it.

agramfort · 2013-07-27T07:10:05Z

sklearn/gaussian_process/gaussian_process.py

@@ -271,11 +271,19 @@ def fit(self, X, y):

        # Force data to 2D numpy.array
        X = array2d(X)
-        y = np.asarray(y).ravel()[:, np.newaxis]
+        self.y_shape = np.array(y).shape


every attribute that is data dependent should end with _

if you need keep the number of targets at attribute I'd recommend to name it self.n_targets_ or self.n_outputs_

amueller · 2013-07-27T10:37:08Z

We discussed this and I think we rather not want to merge this in a hurry. We actually wanted to do a feature freeze yesterday. It will probably be merged soon after the release. Sorry.

JohnFNovak · 2013-07-27T13:17:04Z

That is just fine. Delayed is better than forgotten. It looks like there are new comments for me to address as well.

GaelVaroquaux · 2013-07-27T14:17:21Z

Delayed is better than forgotten.

Certainly not forgotten!

colincsl · 2013-08-27T20:39:10Z

Is any chance this is going to be merged any time soon? I'm thinking about adding some extra stuff related to GPs and would like to base it on the multi-input/output version.

JohnFNovak · 2013-09-03T11:34:08Z

I misunderstood about the trailing underscores. I thought you meant data dependent variables. I also realize now that there were a few other pep8 problems. I have addressed them. Thanks for the feedback, I do most of my coding in a vacuum, so I rarely get helpful criticism.

agramfort · 2013-09-03T14:20:45Z

sklearn/gaussian_process/gaussian_process.py

        """

        # Check input shapes
        X = array2d(X)
-        n_eval, n_features_X = X.shape
+        n_eval, n_featuresX = X.shape


too quick find / replace :)

agramfort · 2013-09-03T14:22:54Z

besides looks good to me. Could it be illustrated in an example?

JohnFNovak · 2013-09-04T01:06:41Z

I have used this code myself for multi-dimensional interpolation. In my research we have a relativistic blast-wave hydrodynamic heavy-ion nuclear physics model, but it is very slow. So, we ran it at a bunch of points in parameter space, then used the GP as a proxy/pseudo model while optimizing against experimental data because the GP is fast. [Edit: Sorry, that was jargon heavy. We have a model and it's slow. So, we interpolate model points with the GP because the GP is fast] Paper: http://arxiv.org/pdf/1303.5769v2.pdf (GP stuff kicks in around page 9-10). I doubt that this was what you had in mind when you asked for an example....

But... I could probably put together some sort of multidimensional interpolation example: 2D -> 2D (or something similar). It just gets very hard to visualize pretty quickly.

agramfort · 2013-09-04T08:16:22Z

sklearn/gaussian_process/gaussian_process.py

-            An array with shape (n_eval, ) with the Best Linear Unbiased
+        y : array_like, shape (n_samples, ) or (n_samples, n_targets)
+            An array with shape (n_eval, ) if the Gaussian Process was trained
+            on an array of shape (n_samples, ) or and array with shape


and array -> an array

agramfort · 2013-09-04T08:18:41Z

fix the typo then +1 for merge on my side. For example I wanted more an illustration but forget it.

JohnFNovak · 2013-09-05T00:40:28Z

I was trying to think of a good example, and while fiddling with the old example I realized that the MSE calculation had gotten broken. I've fixed it, but now I'm a little perplexed how it passed the build test while clearly not giving reasonable MSE values...

JohnFNovak · 2013-09-05T00:49:22Z

I've got it figured out. The build tests only check that the MSE is small at the training points, it never tests that it is sane away from the training points. I'll add something to the build tests to check that the values are sane away from the training points as well.

JohnFNovak · 2013-09-05T01:12:19Z

I added another test to the 1d case. it isn't the most rigorous, but it would have caught the error I introduced. Unfortunately, the MSE values will depend on things like the kernel, so you can't make blanket statements about what is necessarily "reasonable", but now if someone screws up like what I did, it should catch it.

I don't know what the system is for adding examples, but I have something in mind for 2d->2d interpolation. I'll see if I can get it working.

GaelVaroquaux · 2013-09-05T10:35:58Z

sklearn/gaussian_process/gaussian_process.py

+
+        # If the y vales are given to fit() as an array, but transposed wrong
+        if y.shape == (1, X.shape[0]):
+            y = y.T


I don't think that we should be supporting this. We should be raising a meaningful error rather than silently transposing an input.

The meaningful array can be raised by using sklearn.utils.validation.check_arrays.

it's a bit inelegant I agree but this case happens due to the array2d when y.ndim == 1 as input. The code next now only works with 2d y. Makes sense?

it's a bit inelegant I agree but this case happens due to the array2d when y.ndim == 1 as input.

OK, but then we need to test and store the shape of y before the call to
array2d.

GaelVaroquaux · 2013-09-05T13:10:50Z

Appart from the transposition issues, this seems pretty much good to go to me.

JohnFNovak · 2013-09-26T23:19:15Z

The Gaussian Process will not accept matrices which are transposed wrong, and I think I have removed anything that hints at the possibility that it might. There were a few if: ... else:... checks where the "else" would never happen, so they have been pruned. There may be a few ".T"s still in there, but that arrises from when a list is turned into a 2d array and it has the wrong dimensions. All checks for that sort of behavior should check for len(self.y_shape_) == 1, where self.y_shape_ is stored before y forced to be an array.

agramfort · 2013-09-28T16:49:41Z

I've rebased on master for a cleaner history and tried to simplify input checking. See:

#2482

if you're happy I'll update the whats_new page and ask for a merge.

agramfort · 2013-09-28T16:50:01Z

feel free to close this one if you want.

JohnFNovak · 2013-09-29T23:31:49Z

Awesome! Thank you all for your patience!

JohnFNovak added 4 commits January 7, 2013 20:50

Allowed n by m matrices in Gaussian Processes (as opposed to only n b…

c62ae87

…y 1)

Gaussian Process will return an (n,) array is the output is one dimen…

c13fc94

…sional, and an (n,m) array if the output is y dimensional. This makes it backwards compatible

Merge remote-tracking branch 'upstream/master' into GaussianProcessWork

16f3a20

adding documentation regarding the changes to the Gaussian Processes

d0db5d5

GaelVaroquaux reviewed Jan 23, 2013
View reviewed changes

fixed the gaussian process test to determine training y shape

be2984a

Added test of y_training shape so that if it is given as a 2d array a…

0a1e6f7

…nd incorrectly transposed it will be fixed, also the return values will be given in the same form as the training values

created test for 2D output space

6cbc43a

Fixed the MSE calculation for multi-D interpolation

161c45f

agramfort reviewed Jul 27, 2013
View reviewed changes

Edited to conform with pep8

99b705f

agramfort reviewed Sep 3, 2013
View reviewed changes

I was a little overzealous with my search/replaces

69f739d

agramfort reviewed Sep 4, 2013
View reviewed changes

JohnFNovak added 2 commits September 4, 2013 20:32

found an error in the MSE calculation

e287fb6

spelling

171f3ea

JohnFNovak added 2 commits September 4, 2013 21:07

Added test of MSE away from training points

4552dbb

made it pep8 comliant

a3cf204

GaelVaroquaux reviewed Sep 5, 2013
View reviewed changes

JohnFNovak added 3 commits September 24, 2013 22:07

Cleaned up some of the shape references

195b4de

removed a redundant conditional

5937ac8

made shape of u more explicit

22087f9

agramfort mentioned this pull request Sep 28, 2013

Gaussian process for arbitrary-dimensional output spaces #2482

Closed

JohnFNovak closed this Sep 29, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MRG] Gaussian process for arbitrary-dimensional output spaces #1611

[MRG] Gaussian process for arbitrary-dimensional output spaces #1611

JohnFNovak commented Jan 23, 2013

GaelVaroquaux commented Jan 23, 2013

GaelVaroquaux Jan 23, 2013

JohnFNovak commented Mar 31, 2013

agramfort commented Apr 1, 2013

JohnFNovak commented Apr 2, 2013

agramfort commented Apr 2, 2013

agramfort commented Apr 2, 2013

andyli commented Apr 18, 2013

JohnFNovak commented Jul 16, 2013

JohnFNovak commented Jul 18, 2013

GaelVaroquaux commented Jul 19, 2013

agramfort Jul 27, 2013

amueller commented Jul 27, 2013

JohnFNovak commented Jul 27, 2013

GaelVaroquaux commented Jul 27, 2013

colincsl commented Aug 27, 2013

JohnFNovak commented Sep 3, 2013

agramfort Sep 3, 2013

agramfort commented Sep 3, 2013

JohnFNovak commented Sep 4, 2013

agramfort Sep 4, 2013

agramfort commented Sep 4, 2013

JohnFNovak commented Sep 5, 2013

JohnFNovak commented Sep 5, 2013

JohnFNovak commented Sep 5, 2013

GaelVaroquaux Sep 5, 2013

agramfort Sep 5, 2013

GaelVaroquaux Sep 5, 2013

GaelVaroquaux commented Sep 5, 2013

JohnFNovak commented Sep 26, 2013

agramfort commented Sep 28, 2013

agramfort commented Sep 28, 2013

JohnFNovak commented Sep 29, 2013

[MRG] Gaussian process for arbitrary-dimensional output spaces #1611

[MRG] Gaussian process for arbitrary-dimensional output spaces #1611

Conversation

JohnFNovak commented Jan 23, 2013

GaelVaroquaux commented Jan 23, 2013

GaelVaroquaux Jan 23, 2013

Choose a reason for hiding this comment

JohnFNovak commented Mar 31, 2013

agramfort commented Apr 1, 2013

JohnFNovak commented Apr 2, 2013

agramfort commented Apr 2, 2013

agramfort commented Apr 2, 2013

andyli commented Apr 18, 2013

JohnFNovak commented Jul 16, 2013

JohnFNovak commented Jul 18, 2013

GaelVaroquaux commented Jul 19, 2013

agramfort Jul 27, 2013

Choose a reason for hiding this comment

amueller commented Jul 27, 2013

JohnFNovak commented Jul 27, 2013

GaelVaroquaux commented Jul 27, 2013

colincsl commented Aug 27, 2013

JohnFNovak commented Sep 3, 2013

agramfort Sep 3, 2013

Choose a reason for hiding this comment

agramfort commented Sep 3, 2013

JohnFNovak commented Sep 4, 2013

agramfort Sep 4, 2013

Choose a reason for hiding this comment

agramfort commented Sep 4, 2013

JohnFNovak commented Sep 5, 2013

JohnFNovak commented Sep 5, 2013

JohnFNovak commented Sep 5, 2013

GaelVaroquaux Sep 5, 2013

Choose a reason for hiding this comment

agramfort Sep 5, 2013

Choose a reason for hiding this comment

GaelVaroquaux Sep 5, 2013

Choose a reason for hiding this comment

GaelVaroquaux commented Sep 5, 2013

JohnFNovak commented Sep 26, 2013

agramfort commented Sep 28, 2013

agramfort commented Sep 28, 2013

JohnFNovak commented Sep 29, 2013