(wish)list of probabilistic regressors to implement or to interface #32

Closed
2 of 34 tasks
fkiraly opened this issue Oct 31, 2019 · 9 comments

Comments

@fkiraly
Contributor

fkiraly commented Oct 31, 2019

A wishlist for probabilistic regression methods to implement or interface.
Number of stars at the end is estimated difficulty or time investment.

GLM

  • generalized linear model(s) with regression link, e.g., Gaussian *
  • generalized linear model(s) with count link, e.g., Poisson *
  • heteroscedastic linear regression ***
  • Bayesian GLM where conjugate priors are available, e.g., GLM with Gaussian link ***
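The first one-star item can be sketched in a few lines: an ordinary least-squares fit whose output is a predictive Gaussian (mean and plug-in noise variance) rather than a point forecast. A minimal numpy sketch under assumed synthetic data; all names are illustrative, not mlr3proba API:

```python
import numpy as np

# Illustrative data: linear signal plus homoscedastic Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=200)

# Fit by least squares; the plug-in residual variance gives the
# homoscedastic predictive Gaussian N(x' beta_hat, sigma2_hat).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (len(y) - X.shape[1])

x_new = np.array([1.0, 0.0, -1.0])
pred_mean = x_new @ beta_hat
pred_var = sigma2_hat  # ignores parameter uncertainty, for simplicity
```

The heteroscedastic and Bayesian variants differ only in how `pred_var` is produced (a second model, or a posterior).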

KRR aka Gaussian process regression

  • vanilla kernel ridge regression with fixed kernel parameters and variance *
  • kernel ridge regression with MLE for kernel parameters and regularization parameter **
  • heteroscedastic KRR or Gaussian processes ***
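The vanilla one-star KRR/GP item amounts to the standard Gaussian process predictive equations with fixed kernel parameters and noise variance. A numpy sketch, assuming an RBF kernel and synthetic data (illustrative names throughout):

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential kernel matrix between row sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=50)

noise = 0.1**2  # fixed, not estimated (the one-star setting)
K = rbf(X, X) + noise * np.eye(len(y))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

# Predictive Gaussian at a test point: mean k*' alpha,
# variance k** - k*' K^{-1} k* (add `noise` for a noisy observation).
Xs = np.array([[0.5]])
Ks = rbf(Xs, X)
mean = (Ks @ alpha)[0]
v = np.linalg.solve(L, Ks.T)
var = rbf(Xs, Xs)[0, 0] - (v * v).sum()
```

The two-star MLE variant would optimize `lengthscale` and `noise` against the marginal likelihood instead of fixing them.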

CDE

  • variants of conditional density estimation (Nadaraya-Watson type) **
  • reduction to density estimation by binning of input variables, then applying unconditional density estimation **
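A Nadaraya-Watson type conditional density estimate weights a kernel density in y by kernel proximity in x: p(y|x) ≈ Σᵢ K_x(x − xᵢ) K_y(y − yᵢ) / Σᵢ K_x(x − xᵢ). A minimal sketch with Gaussian kernels and fixed bandwidths (all names and data illustrative):

```python
import numpy as np

def gauss(u, h):
    # Gaussian kernel with bandwidth h.
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 500)
y = x + rng.normal(scale=0.1, size=500)

def cond_density(y0, x0, hx=0.1, hy=0.1):
    # Weighted kernel density of y, weights from proximity in x.
    wx = gauss(x0 - x, hx)
    return float((wx * gauss(y0 - y, hy)).sum() / wx.sum())
```

Bandwidth selection (cross-validation or plug-in rules) is where most of the two-star effort would go.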

Tree-based

  • probabilistic regression trees **

Neural networks

  • interface tensorflow probability ***

Bayesian toolboxes

  • generic Stan interface ****
  • generic JAGS interface ****
  • generic BUGS interface ****
  • generic Bayesian interface - prior-valued hyperparameters *****

Pipeline elements for target transformation

  • distr fixed target transformation **
  • distr predictive target calibration **

Composite techniques, reduction to deterministic regression

  • stick mean and sd from a deterministic regressor which already returns these into some location/scale distr family (Gaussian, Laplace) *
  • use model 1 for the mean, model 2 fit to residuals (squared, absolute, or log), put this in some location/scale distr family (Gaussian, Laplace) **
  • upper/lower thresholder for a regression prediction, to use as a pipeline element for a forced lower variance bound **
  • generic parameter prediction by elicitation, output being plugged into parameters of a distr object not necessarily scale/location ****
  • reduction via bootstrapped sampling of a deterministic regressor **
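The model-1/model-2 residual item above can be sketched concretely: fit any deterministic regressor for the mean, fit a second one to the absolute residuals, and plug both into a location/scale family. Least squares stands in for both component models here; all names and data are illustrative:

```python
import numpy as np

# Illustrative heteroscedastic data: noise sd grows with x.
rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=300)
y = 2 * x + rng.normal(scale=0.2 + 0.5 * x)

X = np.column_stack([np.ones_like(x), x])
beta1, *_ = np.linalg.lstsq(X, y, rcond=None)          # model 1: mean
abs_resid = np.abs(y - X @ beta1)
beta2, *_ = np.linalg.lstsq(X, abs_resid, rcond=None)  # model 2: scale

def predict_gaussian(x_new):
    """Return (mean, sd) of the plug-in predictive Gaussian."""
    f = np.array([1.0, x_new])
    # E|Z| = sd * sqrt(2/pi) for Gaussian Z, so rescale model 2's output.
    return f @ beta1, max(f @ beta2, 1e-6) * np.sqrt(np.pi / 2)
```

Squared or log residuals in model 2 correspond to the other variants listed, with the matching back-transformation.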

Ensembling type pipeline elements and compositors

  • simple bagging, averaging of pdf/cdf **
  • probabilistic boosting ***
  • probabilistic stacking ***
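Simple bagging of probabilistic predictions is just cdf averaging: fit the same learner on bootstrap resamples and return the mixture. A sketch where each ensemble member is the featureless Gaussian fit to a resample (illustrative names and data):

```python
import numpy as np
from math import erf

def norm_cdf(x, mu, sd):
    # Gaussian cdf via the error function.
    return 0.5 * (1 + erf((x - mu) / (sd * np.sqrt(2))))

rng = np.random.default_rng(5)
y = rng.normal(3.0, 1.0, 200)

members = []
for _ in range(25):
    b = rng.choice(y, size=len(y), replace=True)  # bootstrap resample
    members.append((b.mean(), b.std(ddof=1)))

def bagged_cdf(x):
    # The bagged predictive distribution: mean of the member cdfs.
    return float(np.mean([norm_cdf(x, m, s) for m, s in members]))
```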

Baselines

  • always predict a Gaussian with mean = training mean, var = training var *
  • IMPORTANT as featureless baseline: reduction to distr/density estimation to produce an unconditional probabilistic regressor **
  • IMPORTANT as deterministic style baseline: reduction to deterministic regression, mean = prediction by det.regressor, var = training sample var, distr type = Gaussian (or Laplace) **
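The first baseline above fits in a dozen lines: ignore the features entirely and always predict N(training mean, training variance). A minimal sketch; the class name and fit/predict shape are illustrative, not mlr3proba's interface:

```python
import numpy as np

class FeaturelessGaussian:
    """Always predict N(training mean, training variance)."""

    def fit(self, X, y):
        self.mu = float(np.mean(y))
        self.sigma = float(np.std(y, ddof=1))
        return self

    def predict(self, X):
        # One identical (mean, sd) pair per test row.
        n = len(X)
        return np.full(n, self.mu), np.full(n, self.sigma)

rng = np.random.default_rng(3)
y_train = rng.normal(loc=5.0, scale=2.0, size=1000)
model = FeaturelessGaussian().fit(None, y_train)
means, sds = model.predict(np.zeros((4, 2)))
```

The deterministic-style baseline differs only in taking `mu` from a fitted point regressor instead of the training mean.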

Other reduction from/to probabilistic regression

  • reducing deterministic regression to probabilistic regression - take mean, median or mode **
  • reduction(s) to quantile regression, use predictive quantiles to make a distr ***
  • reducing deterministic (quantile) regression to probabilistic regression - take quantile(s) **
  • reducing interval regression to probabilistic regression - take mean/sd, or take quantile(s) **
  • reduction to survival, as the sub-case of no censoring **
  • reduction to classification, by binning ***
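The quantile-regression reduction above can be sketched as interpolation: given predicted quantiles at a few levels, build an approximate predictive cdf through the (quantile, level) pairs. The quantile values below are stand-ins for the output of any quantile regressor:

```python
import numpy as np

# Illustrative predicted quantiles (here those of N(0,1)).
levels = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
quantiles = np.array([-1.28, -0.67, 0.0, 0.67, 1.28])

def approx_cdf(x):
    """Piecewise-linear cdf through the predicted (quantile, level) pairs."""
    return float(np.interp(x, quantiles, levels, left=0.0, right=1.0))
```

A smoother distr object would fit a parametric family to the same pairs, or extend the tails beyond the outermost quantiles.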
@RaphaelS1
Collaborator

To note:

Anyone can implement these; however, the high-level interface for composition/reduction needs to be discussed, as does the interfacing of Bayesian toolboxes.

@fkiraly
Contributor Author

fkiraly commented Oct 31, 2019

yes, indeed:

  • priors ought to be hyper-parameters, and we haven't agreed on a representation, especially in the context of the sets6 discussion
  • composition/reduction should be compatible with mlr3pipelines, though we agreed today with @mllg that the best way to integrate these is to see components as incoming arrows, and have compositors as special pipeline network nodes

@RaphaelS1
Collaborator

Oh and can I suggest adding some baselines? e.g. Gaussian with mean = sample mean, variance = sample var?

@fkiraly
Contributor Author

fkiraly commented Oct 31, 2019

Regarding Bayes, perhaps it's premature to look at this at all, without thinking carefully about a Bayesian mlr interface - since the issue with priors is potentially also of relevance in Bayesian classifiers, or Bayesian [any method].

@fkiraly
Contributor Author

fkiraly commented Oct 31, 2019

and, obviously, any suggestions for the wishlist are welcome too

@fkiraly
Contributor Author

fkiraly commented Nov 2, 2019

> Oh and can I suggest adding some baselines? e.g. Gaussian with mean = sample mean, variance = sample var?

That's a special case of two methods already there:

  • the model1/model2 residual fitter, where both components are the featureless regressor
  • the reduction-to-distr-estimation featureless baseline, where the density estimator is just Gaussian MLE

Though I agree it probably should be a "special" baseline with its own name, perhaps "the" baseline.

I made a special "baseline" section.

@fkiraly
Contributor Author

fkiraly commented Dec 16, 2019

In line with "one feature, one issue" principle (which @RaphaelS1 mentioned in communication elsewhere) - should this be split in individual issues, and the list moved to wiki?
Issues can be collected in projects.

@RaphaelS1
Collaborator

If we split this into "one feature, one issue" now, it will bloat the issue tracker. Let's split it once we actually finish the design and start implementing learners.

@fkiraly
Contributor Author

fkiraly commented Dec 16, 2019

ok, let me know when. Just trying to comply with local best practice conventions.

@RaphaelS1 closed this as not planned on Dec 8, 2023.