
[FEATURE] GMMRegressor and BayesianGMMRegressor #454

Closed
mralbu opened this issue Apr 4, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@mralbu

mralbu commented Apr 4, 2021

Hi!

Would you consider adding the GaussianMixture regressors GMMRegressor and BayesianGMMRegressor? GMM regressors have interesting applications as multi-output probabilistic regression models. I've used different configurations of them via the flexmix and condMVNorm R packages.

In Python, the gmr package implements GMM regression and several interesting conditioning and sampling methods, though it is not yet fully compatible with other scikit-learn tools. I've recently submitted a pull request to address this.

I've developed a prototype for a more tightly integrated scikit-learn mixin for scikit-lego. An example of its proposed usage is shown below:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_validate

from sklego.mixture import GMMRegressor, BayesianGMMRegressor

X, y = load_boston(return_X_y=True)

np.random.seed(42)

cross_validate(GMMRegressor(n_components=3), X, y)
>> {'fit_time': array([0.03378463, 0.01123667, 0.00915265, 0.01136422, 0.01201081]),
>>  'score_time': array([0.00116134, 0.00073004, 0.00071645, 0.00070834, 0.00071263]),
>>  'test_score': array([ 0.78339552,  0.75591155,  0.87107641,  0.33350924, -4.41958711])}

cross_validate(BayesianGMMRegressor(n_components=3), X, y)
>> {'fit_time': array([0.01575422, 0.02638698, 0.01391101, 0.02322984, 0.02250314]),
>>  'score_time': array([0.00081778, 0.00081086, 0.00084043, 0.00081563, 0.00081253]),
>>  'test_score': array([ 0.8126109 ,  0.7476735 ,  0.84270689,  0.3453768 , -2.27448196])}

Do you think it would be a good addition?

@mralbu mralbu added the enhancement New feature or request label Apr 4, 2021
@koaning
Owner

koaning commented Apr 4, 2021

If your tools will already be added to the gmr package, wouldn't it be overkill to also add them here? I think there's certainly utility in your method, but it seems preferable to host the tool in only one package.

@mralbu
Author

mralbu commented Apr 4, 2021

I guess it might. The gmr package uses its own internal numerical method for fitting GMMs. That's why I thought it might be interesting to have a less specialized implementation that stays closer to scikit-learn, using the same parameters/options and making it easy to incorporate other scikit-learn methods, such as Bayesian GMMs.
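To make the conditioning idea concrete, here is a minimal sketch of how such a regressor could be built directly on sklearn.mixture.GaussianMixture: fit the mixture on the joint [X, y] data, then predict E[y | x] by conditioning each Gaussian component on x and averaging the per-component conditional means with the component responsibilities. The class name SketchGMMRegressor and its internals are hypothetical, not the actual scikit-lego or gmr implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.mixture import GaussianMixture


class SketchGMMRegressor(BaseEstimator, RegressorMixin):
    """Hypothetical sketch: GMM regression via conditioning a joint mixture."""

    def __init__(self, n_components=1, random_state=None):
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X, y):
        # Fit a full-covariance GMM on the joint [X, y] space.
        Xy = np.column_stack([X, y])
        self.gmm_ = GaussianMixture(
            n_components=self.n_components,
            covariance_type="full",
            random_state=self.random_state,
        ).fit(Xy)
        self.n_features_ = X.shape[1]
        return self

    def predict(self, X):
        d = self.n_features_
        means = self.gmm_.means_
        covs = self.gmm_.covariances_
        weights = self.gmm_.weights_
        n_comp = len(weights)

        cond_means = np.empty((X.shape[0], n_comp))
        log_resp = np.empty_like(cond_means)
        for k in range(n_comp):
            mu_x, mu_y = means[k, :d], means[k, d:]
            S_xx = covs[k][:d, :d]
            S_yx = covs[k][d:, :d]
            # Conditional mean of component k: mu_y + S_yx S_xx^{-1} (x - mu_x)
            gain = S_yx @ np.linalg.inv(S_xx)
            cond_means[:, k] = (mu_y + (X - mu_x) @ gain.T).ravel()
            # Responsibility of component k given x (up to normalization)
            log_resp[:, k] = np.log(weights[k]) + multivariate_normal(
                mu_x, S_xx, allow_singular=True
            ).logpdf(X)

        # Normalize responsibilities and mix the conditional means.
        resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        return (resp * cond_means).sum(axis=1)
```

With a single component this reduces to ordinary linear-Gaussian regression; with several components it gives a locally linear, probabilistically weighted predictor.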

@mralbu mralbu closed this as completed Apr 6, 2021