Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Allowing Gaussian process kernels on structured data - updated #15557

Merged

Conversation

@yhtang
Copy link
Contributor

yhtang commented Nov 7, 2019

Fixes #13936.

This PR is a from-scratch refresh of #13954 (with all previous modifications incorporated) based on the latest master branch.

Made the Gaussian process regressor and classifier compatible with generic kernels on arbitrary data. Specifically:

  1. Added a requires_vector_input property to all the built-in GP kernels in gaussian_process/kernels.py.
  2. Added a GenericKernelMixin base classes to overload the requires_vector_input property for kernels that can work on structured and/or generic data.
  3. Let the GP regressor and classifier use different check_array and check_X_y parameters for vector/generic input. I.e., do not force the X and Y array to be at least 2D and numeric if the kernel can operate on generic input. Updated docstrings accordingly.
  4. Provided a minimal example of GP regression and classification using a sequence kernel on variable-length strings in plot_gpr_on_structured_data.py.
yhtang added 12 commits Nov 7, 2019
…een kernels that operate on explicit feature vectors or structured objects.
…base `Kernel` class.

Removed `VectorKernelMixin` which is redundant in presence of a default `requires_vector_input()` method.
@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Nov 7, 2019

Why do you regard this as WIP? What work remains other than review?

@yhtang

This comment has been minimized.

Copy link
Contributor Author

yhtang commented Nov 7, 2019

Why do you regard this as WIP? What work remains other than review?

I have one more commit that updates the changelog that has not yet been pushed : ) Should be there in a few minutes.

Copy link
Member

jnothman left a comment

this is looking good to me

yhtang and others added 6 commits Nov 7, 2019
Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
@yhtang yhtang changed the title [WIP] Allowing Gaussian process kernels on structured data - updated [MRG] Allowing Gaussian process kernels on structured data - updated Nov 7, 2019
@yhtang

This comment has been minimized.

Copy link
Contributor Author

yhtang commented Nov 7, 2019

@jnothman How could I restart a failed pipeline caused by network error?

yhtang added 2 commits Nov 7, 2019
@yhtang

This comment has been minimized.

Copy link
Contributor Author

yhtang commented Nov 7, 2019

@jnothman Could you please let me know the next steps to do?

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Nov 7, 2019

Await another reviewer... unfortunately, this may miss the cutoff for 0.22.

@yhtang

This comment has been minimized.

Copy link
Contributor Author

yhtang commented Nov 8, 2019

Await another reviewer... unfortunately, this may miss the cutoff for 0.22.

I see... Do I need to update the PR name to MRG + 1 etc.?
Thanks very much anyway!

@TomDLT
TomDLT approved these changes Nov 15, 2019
Copy link
Member

TomDLT left a comment

Only nitpicks, thanks for this nice contribution !

sklearn/base.py Outdated Show resolved Hide resolved
sklearn/gaussian_process/_gpc.py Outdated Show resolved Hide resolved
sklearn/gaussian_process/_gpr.py Outdated Show resolved Hide resolved
sklearn/gaussian_process/kernels.py Outdated Show resolved Hide resolved
sklearn/gaussian_process/tests/test_kernels.py Outdated Show resolved Hide resolved
sklearn/gaussian_process/tests/test_kernels.py Outdated Show resolved Hide resolved
yhtang and others added 4 commits Nov 15, 2019
Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
@yhtang

This comment has been minimized.

Copy link
Contributor Author

yhtang commented Nov 15, 2019

Only nitpicks, thanks for this nice contribution !

Thanks for reviewing!

Co-Authored-By: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
@yhtang yhtang changed the title [MRG] Allowing Gaussian process kernels on structured data - updated [MRG+2] Allowing Gaussian process kernels on structured data - updated Nov 16, 2019
@TomDLT TomDLT merged commit 8360786 into scikit-learn:master Nov 16, 2019
21 checks passed
21 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python 4 new alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc artifact Link to 0/doc/_changed.html
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 99.01% of diff hit (target 97.16%)
Details
codecov/project Absolute coverage decreased by -3.07% but relative coverage increased by +1.85% compared to c96e095
Details
scikit-learn.scikit-learn Build #20191115.46 succeeded
Details
scikit-learn.scikit-learn (Linting) Linting succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
Details
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
Details
scikit-learn.scikit-learn (Linux_Runs pylatest_conda_mkl) Linux_Runs pylatest_conda_mkl succeeded
Details
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl_no_openmp) macOS pylatest_conda_mkl_no_openmp succeeded
Details
@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Nov 17, 2019

@adrinjalali are you keeping track of things that will either need their change log entries moved to 0.23 or will need to be picked into 0.22 final? (You don't necessarily need to keep track, just to be aware when selecting commits for release)

@NicolasHug

This comment has been minimized.

Copy link
Contributor

NicolasHug commented Jan 14, 2020

@jnothman @TomDLT , this PR allows passing sequences of different lengths to the GP estimators.

The HMM module was deprecated because supporting sequences was considered out of scope for scikit-learn (API compatibility issues, etc).

Can you explain why you think it's worth supporting this for GPs?

@yhtang

This comment has been minimized.

Copy link
Contributor Author

yhtang commented Jan 15, 2020

@NicolasHug Hi Nicolas, the primary motivation that I brought up the PR was due to a need to perform Gaussian process regressions on an ensemble of graphs. While the PR intends to bridge scikit-learn's GPR module to data including and beyond variable-length sequences, the actual changes involves no more than allowing non-vectorial data to be passed through the GP regressors/classifiers --- without being touched at all.

Thanks to the kernel trick, as discussed in the original issue, the logic of computing the kernel matrix from samples is delegated to a kernel, which will be user-supplied for sequence and generic data. As such, I would argue that this introduces a very minimal amount of burden to the development and maintenance of the GP module and does not disrupt the API and/or future development, while at the same time greatly extends the applicability of the module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.