Refactor tests for sample weights #11316

jnothman · 2018-06-19T00:36:36Z

In various parts of the code, we have tests for sample_weight support, including in metrics and for individual estimators. We have some common estimator checks for class_weight, but not really for sample_weight functionality (only for weight type invariance).

Recent implementations of sample_weight include #10933 (KMeans) and #10803 (density estimation). Beyond estimators, we also have things like common tests for evaluation metrics.

Invariance testing for sample weights should include:

sample_weight=np.ones(len(X)) makes the same model as sample_weight=None
sample_weight=random can make a different model to sample_weight=None
sample_weight=s for integer array s makes the same model as X=np.repeat(X, s, axis=0), y=np.repeat(y, s, axis=0) (although there may be exceptions to this depending on how the estimator defines iteration, convergence, etc., as in #11236, "Test test_weighted_vs_repeated is somehow flaky")
sample_weight=s * k for array s and positive constant k makes the same model as sample_weight=s

I wonder if it is possible to establish a generic test for this, e.g. something like:

def check_sample_weight_invariance(data_args, fit, is_equal):
    """
    Parameters
    ----------
    data_args : dict
        Keyword arguments to pass to fit, and which would need to be repeated
        to test equivalence to integer sample weights.
    fit : callable
        Passed data args, returns a model that can be compared with is_equal
    is_equal : callable
        Passed two models returned from fit, returns a bool to indicate equality
        between models
    """
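One way the body of such a check might look is sketched below. This is only an illustration under stated assumptions, not the scikit-learn implementation: it assumes fit accepts a sample_weight keyword alongside the data arguments, it uses fixed integer weights instead of random ones so the outcome is reproducible, and it returns a dict of booleans (the key names are made up for this example). A weighted mean stands in for an estimator.

```python
import numpy as np


def check_sample_weight_invariance(data_args, fit, is_equal):
    """Sketch: evaluate the four properties listed above, one bool each."""
    n_samples = len(next(iter(data_args.values())))
    # Assumption: deterministic integer weights 1, 2, 3, 1, 2, ... so that
    # repeating samples is well defined and the result is reproducible.
    s = 1 + np.arange(n_samples) % 3

    unweighted = fit(sample_weight=None, **data_args)
    weighted = fit(sample_weight=s, **data_args)
    # Repeat every data argument row-wise according to the integer weights.
    repeated_args = {k: np.repeat(v, s, axis=0) for k, v in data_args.items()}

    return {
        "ones_equal_none": is_equal(
            fit(sample_weight=np.ones(n_samples), **data_args), unweighted),
        # Non-uniform weights "can" change the model; informational only.
        "weights_can_matter": not is_equal(weighted, unweighted),
        "integer_equal_repeated": is_equal(
            weighted, fit(sample_weight=None, **repeated_args)),
        "scale_invariant": is_equal(
            weighted, fit(sample_weight=3.0 * s, **data_args)),
    }


def weighted_mean(X, sample_weight=None):
    # Hypothetical stand-in for an estimator: the "model" is the weighted mean.
    return np.average(X, axis=0, weights=sample_weight)


X = np.arange(10, dtype=float).reshape(5, 2)
results = check_sample_weight_invariance({"X": X}, weighted_mean, np.allclose)
```

Returning one flag per property, rather than asserting inside the function, would let callers skip properties that an estimator is known not to satisfy (for example because of how it defines iteration or convergence).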


sergulaydore · 2018-07-14T14:46:23Z

I want to check that I understand this correctly. The 4 tests you mentioned are already in check_sample_weight_invariance. But you are suggesting changing the input parameters of that function, right? Basically, instead of taking metric, y1 and y2, it would take fit and is_equal, right?

jnothman · 2018-07-16T08:26:58Z

What I am suggesting is that we should have the same testing code used for weighted metrics as for weighted model fitting, insofar as this is possible.
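As a rough illustration of sharing one check between metrics and model fitting, the sketch below applies a single helper to both. Everything here is hypothetical and kept to plain NumPy: weighted_equals_repeated, accuracy, and least_squares are names invented for this example, and only the "integer weights behave like repeated samples" property is tested.

```python
import numpy as np


def weighted_equals_repeated(data_args, fit, is_equal, s):
    """True if integer weights s act like repeating each sample s times."""
    repeated = {k: np.repeat(v, s, axis=0) for k, v in data_args.items()}
    return is_equal(fit(sample_weight=s, **data_args),
                    fit(sample_weight=None, **repeated))


def accuracy(y_true, y_pred, sample_weight=None):
    # Metric case: "fit" simply returns a score.
    correct = (y_true == y_pred).astype(float)
    return np.average(correct, weights=sample_weight)


def least_squares(X, y, sample_weight=None):
    # Model case: "fit" returns coefficients of a weighted least-squares fit,
    # obtained by scaling rows with the square root of the weights.
    w = np.ones(len(X)) if sample_weight is None else np.asarray(sample_weight)
    sqrt_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef


s = np.array([1, 2, 3, 1])
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.2, 2.8])

# The same helper handles a metric and a model; only data_args, fit and
# is_equal change between the two calls.
metric_ok = weighted_equals_repeated(
    {"y_true": y_true, "y_pred": y_pred}, accuracy, np.isclose, s)
model_ok = weighted_equals_repeated(
    {"X": X, "y": y}, least_squares, np.allclose, s)
```

The only parts specific to each case are which arguments get repeated and how two results are compared, which is what the data_args and is_equal parameters are meant to capture.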

Higgs32584 · 2024-01-18T18:24:58Z

Is this resolved?

jnothman added the Moderate and help wanted labels Jun 19, 2018

glemaitre added the Sprint label Jul 6, 2018

sergulaydore mentioned this issue Jul 16, 2018

[MRG+2] Add a test for sample weights for estimators #11558

Merged

jbschiratti mentioned this issue Jul 17, 2018

Ensure that the shape of sample_weight is checked in all the functions #9926

Closed

rth mentioned this issue Feb 5, 2020

RFC Sample weight invariance properties #15657

Open

rth mentioned this issue Feb 20, 2020

Common check for sample weight invariance with removed samples #16507

Merged

lorentzenchr mentioned this issue May 10, 2020

TST Add tests for LinearRegression that sample weights act consistently #15554

Merged

cmarmo removed the Sprint label Apr 1, 2021

cmarmo added the module:test-suite label Jan 15, 2022
