## scikit-learn sample_weight compliance report

This notebook runs compliance tests on all scikit-learn estimators. Each estimator is inspected to check whether its fit is expected to be stochastic. If it is, a dedicated statistical test is performed; otherwise a deterministic estimator check is run instead.
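
The property under test can be illustrated on a toy model: fitting with integer `sample_weight` should give the same result as fitting on a dataset where each row is repeated that many times. The sketch below assumes this weighted/repeated equivalence is what the audit checks (as the name `check_weighted_repeated_estimator_fit_equivalence` suggests). It uses a hypothetical `fit_line` helper written in plain Python, not a scikit-learn estimator.

```python
import math


def fit_line(x, y, sample_weight=None):
    """Closed-form weighted least-squares fit of y = slope * x + intercept.

    Hypothetical helper used only for illustration.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(x)
    w_sum = sum(sample_weight)
    x_mean = sum(w * xi for w, xi in zip(sample_weight, x)) / w_sum
    y_mean = sum(w * yi for w, yi in zip(sample_weight, y)) / w_sum
    cov = sum(
        w * (xi - x_mean) * (yi - y_mean)
        for w, xi, yi in zip(sample_weight, x, y)
    )
    var = sum(w * (xi - x_mean) ** 2 for w, xi in zip(sample_weight, x))
    slope = cov / var
    return slope, y_mean - slope * x_mean


x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 0.9, 2.2, 2.8]
weights = [1, 3, 0, 2]  # a zero weight should behave like dropping the row

# Build the "repeated" dataset: row i appears weights[i] times.
x_repeated = [xi for xi, w in zip(x, weights) for _ in range(w)]
y_repeated = [yi for yi, w in zip(y, weights) for _ in range(w)]

weighted = fit_line(x, y, sample_weight=weights)
repeated = fit_line(x_repeated, y_repeated)

# For a deterministic fit, both parameter sets should agree up to rounding.
equivalent = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(weighted, repeated)
)
print(weighted, repeated, equivalent)
```

For a stochastic fit the two results cannot be compared value by value, which is why the notebook compares distributions over many fits instead.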

In [1]:
import sklearn

sklearn.show_versions()


System:
    python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ]
executable: /Users/shrutinath/micromamba/envs/scikit-learn/bin/python
   machine: macOS-14.3-arm64-arm-64bit

Python dependencies:
      sklearn: 1.7.dev0
          pip: 24.0
   setuptools: 75.8.0
        numpy: 2.0.0
        scipy: 1.14.0
       Cython: 3.0.10
       pandas: 2.2.2
   matplotlib: 3.9.0
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/shrutinath/micromamba/envs/scikit-learn/lib/libopenblas.0.dylib
        version: 0.3.27
threading_layer: openmp
   architecture: VORTEX

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/shrutinath/micromamba/envs/scikit-learn/lib/libomp.dylib
        version: None


In [3]:
from inspect import signature
import traceback
import warnings
import pandas as pd
from sklearn.utils import all_estimators
from sklearn.exceptions import ConvergenceWarning
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.linear_model import Ridge
import threadpoolctl

from sample_weight_audit import check_weighted_repeated_estimator_fit_equivalence
from sample_weight_audit.sklearn_stochastic_params import STOCHASTIC_FIT_PARAMS

# HistGradientBoostingClassifier thrashes the OpenMP thread pool on repeated
# small fits, so limit OpenMP to a single thread.
threadpoolctl.threadpool_limits(limits=1, user_api="openmp")
warnings.filterwarnings("ignore", category=RuntimeWarning)  # division by zero in AdaBoost
warnings.filterwarnings("ignore", category=ConvergenceWarning)  # liblinear can fail to converge
warnings.filterwarnings("ignore", category=UserWarning)  # KBinsDiscretizer with collapsed bins

In [4]:
from sklearn.linear_model import LogisticRegressionCV

ESTIMATORS_TO_SKIP = [
    LogisticRegressionCV,  # too slow and already somewhat tested by LogisticRegression
]

In [5]:
N_STOCHASTIC_FITS = 100
N_STATISTICAL_TESTS = 58  # measured by running the script
BONFERRONI_CORRECTION = 1 / N_STATISTICAL_TESTS
TEST_THRESHOLD = 0.05 * BONFERRONI_CORRECTION


statistical_test_results = []
missing_sample_weight_support = []
errors = []


for est_name, est_class in all_estimators(
    type_filter=["classifier", "regressor", "cluster", "transformer"]
):
    if est_class in ESTIMATORS_TO_SKIP:
        print(f"Skipping {est_name}")
        continue

    if "sample_weight" not in signature(est_class.fit).parameters:
        print(f"⚠ {est_name} does not support sample_weight")
        missing_sample_weight_support.append(est_name)
        continue

    try:
        if est_name == "CategoricalNB":
            # This estimator expects ordinal inputs so we need to discretize the input
            # features. This is not really valid but it's the best we can do while
            # keeping the dataset generation process common to all estimators of a
            # given type.
            est = Pipeline(
                steps=[
                    (
                        "kbinsdiscretizer",
                        KBinsDiscretizer(
                            encode="ordinal", quantile_method="averaged_inverted_cdf"
                        ),
                    ),
                    ("est", est_class(**STOCHASTIC_FIT_PARAMS.get(est_class, {}))),
                ]
            )
        else:
            est = est_class(**STOCHASTIC_FIT_PARAMS.get(est_class, {}))
    except TypeError as e:
        print(f"⚠ {est_name} failed to instantiate: {e}")
        continue

    print(f"Evaluating {est}")
    try:
        result = check_weighted_repeated_estimator_fit_equivalence(
            est,
            est_name,
            test_name="kstest",
            n_stochastic_fits=N_STOCHASTIC_FITS,
            random_state=0,
        )
        pass_or_fail = "✅" if result.pvalue > TEST_THRESHOLD else "❌"
        print(f"{pass_or_fail} {est_name}: (pvalue: {result.pvalue:.3f})")
        statistical_test_results.append(result)
        # Extra check for the KMeans variants: pipeline the clusterer into a
        # Ridge regressor and rerun the same test on the pipeline.
        if est_name in ["MiniBatchKMeans", "KMeans", "BisectingKMeans"]:
            est = Pipeline(
                steps=[
                    ("est", est),
                    ("ridge", Ridge(fit_intercept=False)),
                ]
            )
            est_name = est_name + "_pipelined"
            result = check_weighted_repeated_estimator_fit_equivalence(
                est,
                est_name,
                test_name="kstest",
                n_stochastic_fits=N_STOCHASTIC_FITS,
                random_state=0,
            )
            pass_or_fail = "✅" if result.pvalue > TEST_THRESHOLD else "❌"
            print(f"{pass_or_fail} {est_name}: (pvalue: {result.pvalue:.3f})")
            statistical_test_results.append(result)
    except Exception as e:
        print(f"❌ {est} error with: {e}")
        errors.append((est, e))

results_df = pd.DataFrame([r.to_dict() for r in statistical_test_results])

⚠ ARDRegression does not support sample_weight
Evaluating AdaBoostClassifier(estimator=DecisionTreeClassifier(max_features=0.5,
                                                    min_weight_fraction_leaf=0.1))


100%|██████████| 100/100 [00:05<00:00, 16.90it/s]


✅ AdaBoostClassifier: (pvalue: 0.815)


100%|██████████| 100/100 [00:05<00:00, 17.27it/s]


✅ AdaBoostClassifier_pipelined: (pvalue: 0.815)
Evaluating AdaBoostRegressor(estimator=DecisionTreeRegressor(max_features=0.5,
                                                  min_weight_fraction_leaf=0.1))


100%|██████████| 100/100 [00:04<00:00, 21.06it/s]


✅ AdaBoostRegressor: (pvalue: 0.078)


100%|██████████| 100/100 [00:04<00:00, 20.72it/s]


✅ AdaBoostRegressor_pipelined: (pvalue: 0.078)
⚠ AdditiveChi2Sampler does not support sample_weight
⚠ AffinityPropagation does not support sample_weight
⚠ AgglomerativeClustering does not support sample_weight
Evaluating BaggingClassifier(estimator=LogisticRegression())


100%|██████████| 100/100 [00:03<00:00, 32.10it/s]


❌ BaggingClassifier: (pvalue: 0.000)


100%|██████████| 100/100 [00:03<00:00, 31.44it/s]


❌ BaggingClassifier_pipelined: (pvalue: 0.000)
Evaluating BaggingRegressor(estimator=Ridge())


100%|██████████| 100/100 [00:01<00:00, 83.93it/s]


❌ BaggingRegressor: (pvalue: 0.000)


100%|██████████| 100/100 [00:01<00:00, 83.07it/s]


❌ BaggingRegressor_pipelined: (pvalue: 0.000)
Evaluating BayesianRidge()


100%|██████████| 1/1 [00:00<00:00, 762.18it/s]


✅ BayesianRidge: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 709.34it/s]


✅ BayesianRidge_pipelined: (pvalue: 1.000)
Evaluating BernoulliNB()


100%|██████████| 1/1 [00:00<00:00, 332.22it/s]


✅ BernoulliNB: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 341.19it/s]


✅ BernoulliNB_pipelined: (pvalue: 1.000)
⚠ BernoulliRBM does not support sample_weight
⚠ Binarizer does not support sample_weight
⚠ Birch does not support sample_weight
Evaluating BisectingKMeans(n_clusters=10)


100%|██████████| 100/100 [00:00<00:00, 180.71it/s]


✅ BisectingKMeans: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 166.20it/s]


✅ BisectingKMeans_pipelined: (pvalue: 1.000)
⚠ CCA does not support sample_weight
Evaluating CalibratedClassifierCV()


100%|██████████| 1/1 [00:00<00:00, 61.11it/s]


✅ CalibratedClassifierCV: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 69.95it/s]


✅ CalibratedClassifierCV_pipelined: (pvalue: 1.000)
Evaluating Pipeline(steps=[('kbinsdiscretizer',
                 KBinsDiscretizer(encode='ordinal',
                                  quantile_method='averaged_inverted_cdf')),
                ('est', CategoricalNB())])


100%|██████████| 1/1 [00:00<00:00, 194.55it/s]


✅ CategoricalNB: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 132.54it/s]


✅ CategoricalNB_pipelined: (pvalue: 1.000)
⚠ ClassifierChain does not support sample_weight
⚠ ColumnTransformer does not support sample_weight
Evaluating ComplementNB()


100%|██████████| 1/1 [00:00<00:00, 211.19it/s]


✅ ComplementNB: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 379.09it/s]


✅ ComplementNB_pipelined: (pvalue: 1.000)
Evaluating DBSCAN()


100%|██████████| 1/1 [00:00<00:00, 33.24it/s]


✅ DBSCAN: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00,  8.14it/s]


✅ DBSCAN_pipelined: (pvalue: 1.000)
Evaluating DecisionTreeClassifier(max_features=0.5, min_weight_fraction_leaf=0.1)


100%|██████████| 100/100 [00:00<00:00, 532.56it/s]


✅ DecisionTreeClassifier: (pvalue: 0.583)


100%|██████████| 100/100 [00:00<00:00, 603.18it/s]


✅ DecisionTreeClassifier_pipelined: (pvalue: 0.583)
Evaluating DecisionTreeRegressor(max_features=0.5)


100%|██████████| 100/100 [00:00<00:00, 870.50it/s]


✅ DecisionTreeRegressor: (pvalue: 0.470)


100%|██████████| 100/100 [00:00<00:00, 882.52it/s]


✅ DecisionTreeRegressor_pipelined: (pvalue: 0.470)
⚠ DictVectorizer does not support sample_weight
⚠ DictionaryLearning does not support sample_weight
Evaluating DummyClassifier(strategy='stratified')


100%|██████████| 100/100 [00:00<00:00, 685.34it/s]


✅ DummyClassifier: (pvalue: 0.155)


100%|██████████| 100/100 [00:00<00:00, 831.73it/s]


✅ DummyClassifier_pipelined: (pvalue: 0.155)
Evaluating DummyRegressor()


100%|██████████| 1/1 [00:00<00:00, 1279.14it/s]


✅ DummyRegressor: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 1798.59it/s]


✅ DummyRegressor_pipelined: (pvalue: 1.000)
Evaluating ElasticNet(selection='random')


100%|██████████| 100/100 [00:00<00:00, 936.68it/s]


✅ ElasticNet: (pvalue: 0.111)


100%|██████████| 100/100 [00:00<00:00, 947.87it/s]


✅ ElasticNet_pipelined: (pvalue: 0.111)
Evaluating ElasticNetCV(selection='random')


100%|██████████| 100/100 [00:01<00:00, 60.52it/s]


✅ ElasticNetCV: (pvalue: 0.282)


100%|██████████| 100/100 [00:01<00:00, 62.22it/s]


✅ ElasticNetCV_pipelined: (pvalue: 0.282)
Evaluating ExtraTreeClassifier()


100%|██████████| 100/100 [00:00<00:00, 605.52it/s]


✅ ExtraTreeClassifier: (pvalue: 0.470)


100%|██████████| 100/100 [00:00<00:00, 608.02it/s]


✅ ExtraTreeClassifier_pipelined: (pvalue: 0.470)
Evaluating ExtraTreeRegressor()


100%|██████████| 100/100 [00:00<00:00, 883.67it/s]


✅ ExtraTreeRegressor: (pvalue: 0.908)


100%|██████████| 100/100 [00:00<00:00, 883.60it/s]


✅ ExtraTreeRegressor_pipelined: (pvalue: 0.908)
Evaluating ExtraTreesClassifier()


100%|██████████| 100/100 [00:07<00:00, 13.13it/s]


✅ ExtraTreesClassifier: (pvalue: 0.815)


100%|██████████| 100/100 [00:07<00:00, 13.00it/s]


✅ ExtraTreesClassifier_pipelined: (pvalue: 0.815)
Evaluating ExtraTreesRegressor()


100%|██████████| 100/100 [00:07<00:00, 14.09it/s]


❌ ExtraTreesRegressor: (pvalue: 0.000)


100%|██████████| 100/100 [00:06<00:00, 14.53it/s]


❌ ExtraTreesRegressor_pipelined: (pvalue: 0.000)
⚠ FactorAnalysis does not support sample_weight
⚠ FastICA does not support sample_weight
⚠ FeatureAgglomeration does not support sample_weight
⚠ FeatureHasher does not support sample_weight
⚠ FeatureUnion does not support sample_weight
⚠ FixedThresholdClassifier does not support sample_weight
⚠ FunctionTransformer does not support sample_weight
Evaluating GammaRegressor()


100%|██████████| 1/1 [00:00<00:00, 424.65it/s]


✅ GammaRegressor: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 309.66it/s]


✅ GammaRegressor_pipelined: (pvalue: 1.000)
Evaluating GaussianNB()


100%|██████████| 1/1 [00:00<00:00, 434.01it/s]


✅ GaussianNB: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 454.77it/s]


✅ GaussianNB_pipelined: (pvalue: 1.000)
⚠ GaussianProcessClassifier does not support sample_weight
⚠ GaussianProcessRegressor does not support sample_weight
⚠ GaussianRandomProjection does not support sample_weight
⚠ GenericUnivariateSelect does not support sample_weight
Evaluating GradientBoostingClassifier(max_features=0.5)


100%|██████████| 100/100 [00:17<00:00,  5.78it/s]


✅ GradientBoostingClassifier: (pvalue: 0.155)


100%|██████████| 100/100 [00:16<00:00,  5.94it/s]


✅ GradientBoostingClassifier_pipelined: (pvalue: 0.155)
Evaluating GradientBoostingRegressor(max_features=0.5)


100%|██████████| 100/100 [00:03<00:00, 27.83it/s]


✅ GradientBoostingRegressor: (pvalue: 0.211)


100%|██████████| 100/100 [00:03<00:00, 27.57it/s]


✅ GradientBoostingRegressor_pipelined: (pvalue: 0.211)
⚠ HDBSCAN does not support sample_weight
⚠ HashingVectorizer does not support sample_weight
Evaluating HistGradientBoostingClassifier(max_features=0.5)


100%|██████████| 100/100 [00:38<00:00,  2.60it/s]


❌ HistGradientBoostingClassifier: (pvalue: 0.000)


100%|██████████| 100/100 [00:38<00:00,  2.63it/s]


❌ HistGradientBoostingClassifier_pipelined: (pvalue: 0.000)
Evaluating HistGradientBoostingRegressor(max_features=0.5)


100%|██████████| 100/100 [00:15<00:00,  6.44it/s]


❌ HistGradientBoostingRegressor: (pvalue: 0.000)


100%|██████████| 100/100 [00:15<00:00,  6.32it/s]


❌ HistGradientBoostingRegressor_pipelined: (pvalue: 0.000)
Evaluating HuberRegressor()


100%|██████████| 1/1 [00:00<00:00, 59.07it/s]


✅ HuberRegressor: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 53.03it/s]


✅ HuberRegressor_pipelined: (pvalue: 1.000)
⚠ IncrementalPCA does not support sample_weight
⚠ Isomap does not support sample_weight
Evaluating IsotonicRegression()


100%|██████████| 1/1 [00:00<00:00, 393.54it/s]


✅ IsotonicRegression: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 602.72it/s]


✅ IsotonicRegression_pipelined: (pvalue: 1.000)
Evaluating KBinsDiscretizer(encode='ordinal', quantile_method='averaged_inverted_cdf',
                 subsample=50)


100%|██████████| 100/100 [00:00<00:00, 355.27it/s]


✅ KBinsDiscretizer: (pvalue: 0.815)


100%|██████████| 100/100 [00:00<00:00, 388.23it/s]


✅ KBinsDiscretizer_pipelined: (pvalue: 0.815)
Evaluating KMeans(n_clusters=10)


100%|██████████| 100/100 [00:00<00:00, 418.86it/s]


✅ KMeans: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 269.04it/s]


✅ KMeans_pipelined: (pvalue: 1.000)
⚠ KNNImputer does not support sample_weight
⚠ KNeighborsClassifier does not support sample_weight
⚠ KNeighborsRegressor does not support sample_weight
⚠ KNeighborsTransformer does not support sample_weight
⚠ KernelCenterer does not support sample_weight
⚠ KernelPCA does not support sample_weight
Evaluating KernelRidge()


100%|██████████| 1/1 [00:00<00:00, 232.51it/s]


✅ KernelRidge: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 529.65it/s]


✅ KernelRidge_pipelined: (pvalue: 1.000)
⚠ LabelBinarizer does not support sample_weight
⚠ LabelEncoder does not support sample_weight
⚠ LabelPropagation does not support sample_weight
⚠ LabelSpreading does not support sample_weight
⚠ Lars does not support sample_weight
⚠ LarsCV does not support sample_weight
Evaluating Lasso(selection='random')


100%|██████████| 100/100 [00:00<00:00, 950.01it/s]


✅ Lasso: (pvalue: 1.000)


100%|██████████| 100/100 [00:00<00:00, 955.57it/s]


✅ Lasso_pipelined: (pvalue: 1.000)
Evaluating LassoCV(selection='random')


100%|██████████| 100/100 [00:01<00:00, 62.12it/s]


✅ LassoCV: (pvalue: 0.368)


100%|██████████| 100/100 [00:01<00:00, 61.09it/s]


✅ LassoCV_pipelined: (pvalue: 0.368)
⚠ LassoLars does not support sample_weight
⚠ LassoLarsCV does not support sample_weight
⚠ LassoLarsIC does not support sample_weight
⚠ LatentDirichletAllocation does not support sample_weight
⚠ LinearDiscriminantAnalysis does not support sample_weight
Evaluating LinearRegression()


100%|██████████| 1/1 [00:00<00:00, 889.75it/s]


✅ LinearRegression: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 992.03it/s]


✅ LinearRegression_pipelined: (pvalue: 1.000)
Evaluating LinearSVC(dual=True)


100%|██████████| 100/100 [00:00<00:00, 139.57it/s]


❌ LinearSVC: (pvalue: 0.000)


100%|██████████| 100/100 [00:00<00:00, 140.55it/s]


❌ LinearSVC_pipelined: (pvalue: 0.000)
Evaluating LinearSVR(dual=True)


100%|██████████| 100/100 [00:00<00:00, 195.40it/s]


❌ LinearSVR: (pvalue: 0.000)


100%|██████████| 100/100 [00:00<00:00, 159.65it/s]


❌ LinearSVR_pipelined: (pvalue: 0.000)
⚠ LocallyLinearEmbedding does not support sample_weight
Evaluating LogisticRegression(dual=True, max_iter=100000, solver='liblinear')


100%|██████████| 100/100 [00:00<00:00, 355.45it/s]


❌ LogisticRegression: (pvalue: 0.000)


100%|██████████| 100/100 [00:00<00:00, 357.43it/s]


❌ LogisticRegression_pipelined: (pvalue: 0.000)
Skipping LogisticRegressionCV
Evaluating MLPClassifier()


100%|██████████| 100/100 [00:06<00:00, 15.97it/s]


✅ MLPClassifier: (pvalue: 0.211)


100%|██████████| 100/100 [00:06<00:00, 15.78it/s]


✅ MLPClassifier_pipelined: (pvalue: 0.211)
Evaluating MLPRegressor()


100%|██████████| 100/100 [00:05<00:00, 17.87it/s]


✅ MLPRegressor: (pvalue: 0.583)


100%|██████████| 100/100 [00:05<00:00, 19.57it/s]


✅ MLPRegressor_pipelined: (pvalue: 0.583)
⚠ MaxAbsScaler does not support sample_weight
⚠ MeanShift does not support sample_weight
⚠ MinMaxScaler does not support sample_weight
⚠ MiniBatchDictionaryLearning does not support sample_weight
Evaluating MiniBatchKMeans(n_clusters=10)


100%|██████████| 100/100 [00:00<00:00, 171.44it/s]


✅ MiniBatchKMeans: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 127.98it/s]


✅ MiniBatchKMeans_pipelined: (pvalue: 1.000)
⚠ MiniBatchNMF does not support sample_weight
⚠ MiniBatchSparsePCA does not support sample_weight
⚠ MissingIndicator does not support sample_weight
⚠ MultiLabelBinarizer does not support sample_weight
⚠ MultiOutputClassifier failed to instantiate: MultiOutputClassifier.__init__() missing 1 required positional argument: 'estimator'
⚠ MultiOutputRegressor failed to instantiate: MultiOutputRegressor.__init__() missing 1 required positional argument: 'estimator'
⚠ MultiTaskElasticNet does not support sample_weight
⚠ MultiTaskElasticNetCV does not support sample_weight
⚠ MultiTaskLasso does not support sample_weight
⚠ MultiTaskLassoCV does not support sample_weight
Evaluating MultinomialNB()


100%|██████████| 1/1 [00:00<00:00, 438.00it/s]


✅ MultinomialNB: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 357.78it/s]


✅ MultinomialNB_pipelined: (pvalue: 1.000)
⚠ NMF does not support sample_weight
⚠ NearestCentroid does not support sample_weight
⚠ NeighborhoodComponentsAnalysis does not support sample_weight
⚠ Normalizer does not support sample_weight
Evaluating NuSVC(probability=True)


100%|██████████| 100/100 [00:00<00:00, 108.93it/s]


❌ NuSVC: (pvalue: 0.000)


100%|██████████| 100/100 [00:00<00:00, 107.23it/s]


❌ NuSVC_pipelined: (pvalue: 0.000)
Evaluating NuSVR()


100%|██████████| 1/1 [00:00<00:00, 248.57it/s]


✅ NuSVR: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 162.16it/s]


✅ NuSVR_pipelined: (pvalue: 1.000)
⚠ Nystroem does not support sample_weight
⚠ OPTICS does not support sample_weight
⚠ OneHotEncoder does not support sample_weight
⚠ OneVsOneClassifier does not support sample_weight
⚠ OneVsRestClassifier does not support sample_weight
⚠ OrdinalEncoder does not support sample_weight
⚠ OrthogonalMatchingPursuit does not support sample_weight
⚠ OrthogonalMatchingPursuitCV does not support sample_weight
⚠ OutputCodeClassifier does not support sample_weight
⚠ PCA does not support sample_weight
⚠ PLSCanonical does not support sample_weight
⚠ PLSRegression does not support sample_weight
⚠ PLSSVD does not support sample_weight
⚠ PassiveAggressiveClassifier does not support sample_weight
⚠ PassiveAggressiveRegressor does not support sample_weight
⚠ PatchExtractor does not support sample_weight
Evaluating Perceptron(max_iter=100000)


100%|██████████| 100/100 [00:00<00:00, 270.91it/s]


✅ Perceptron: (pvalue: 0.016)


100%|██████████| 100/100 [00:00<00:00, 305.80it/s]


✅ Perceptron_pipelined: (pvalue: 0.016)
Evaluating PoissonRegressor()


100%|██████████| 1/1 [00:00<00:00, 383.78it/s]


✅ PoissonRegressor: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 192.82it/s]


✅ PoissonRegressor_pipelined: (pvalue: 1.000)
⚠ PolynomialCountSketch does not support sample_weight
⚠ PolynomialFeatures does not support sample_weight
⚠ PowerTransformer does not support sample_weight
⚠ QuadraticDiscriminantAnalysis does not support sample_weight
Evaluating QuantileRegressor()


100%|██████████| 1/1 [00:00<00:00, 154.49it/s]


✅ QuantileRegressor: (pvalue: 1.000)


100%|██████████| 1/1 [00:00<00:00, 130.42it/s]


✅ QuantileRegressor_pipelined: (pvalue: 1.000)
⚠ QuantileTransformer does not support sample_weight
Evaluating RANSACRegressor()


  3%|▎         | 3/100 [00:00<00:03, 28.10it/s]


❌ RANSACRegressor() error with: Weights sum to zero, can't be normalized
⚠ RBFSampler does not support sample_weight
⚠ RFE does not support sample_weight
⚠ RFECV does not support sample_weight
⚠ RadiusNeighborsClassifier does not support sample_weight
⚠ RadiusNeighborsRegressor does not support sample_weight
⚠ RadiusNeighborsTransformer does not support sample_weight
Evaluating RandomForestClassifier()


100%|██████████| 100/100 [00:09<00:00, 10.50it/s]


❌ RandomForestClassifier: (pvalue: 0.000)


 24%|██▍       | 24/100 [00:02<00:07,  9.85it/s]


KeyboardInterrupt: 

In [5]:
print(
    f"✅ {len([r for r in statistical_test_results if r.pvalue > TEST_THRESHOLD])} "
    "passed the statistical test"
)
print(
    f"❌ {len([r for r in statistical_test_results if r.pvalue <= TEST_THRESHOLD])} "
    "failed the statistical test"
)
print(f"❌ {len(errors)} other errors")
print(
    f"⚠ {len(missing_sample_weight_support)} estimators lack sample_weight "
    "support"
)
results_df = pd.DataFrame([r.to_dict() for r in statistical_test_results])

✅ 45 passed the statistical test
❌ 14 failed the statistical test
❌ 3 other errors
⚠ 112 estimators lack sample_weight support
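
The pass/fail counts above depend on the Bonferroni-corrected threshold defined earlier (`0.05 * 1 / N_STATISTICAL_TESTS`). The sketch below reuses the notebook's constants to show why the correction matters. The independence assumption used to estimate the family-wise error rate is a simplification made for illustration, and `passes` is a hypothetical helper. The example p-values are the rounded values printed in the log above.

```python
ALPHA = 0.05
N_STATISTICAL_TESTS = 58  # value used in the notebook
BONFERRONI_CORRECTION = 1 / N_STATISTICAL_TESTS
TEST_THRESHOLD = ALPHA * BONFERRONI_CORRECTION


def passes(pvalue, threshold=TEST_THRESHOLD):
    """Same decision rule as the notebook: pass when pvalue > threshold."""
    return pvalue > threshold


# Chance of at least one false alarm across all tests, assuming the tests
# are independent and every estimator is actually compliant.
fwer_uncorrected = 1 - (1 - ALPHA) ** N_STATISTICAL_TESTS
fwer_corrected = 1 - (1 - TEST_THRESHOLD) ** N_STATISTICAL_TESTS

print(f"threshold per test: {TEST_THRESHOLD:.6f}")
print(f"family-wise error rate without correction: {fwer_uncorrected:.3f}")
print(f"family-wise error rate with correction: {fwer_corrected:.3f}")

# A small p-value such as 0.016 still passes under the corrected threshold,
# while a p-value that is numerically zero does not.
for name, pvalue in [("Perceptron", 0.016), ("LinearSVC", 6.314162e-19)]:
    print(name, "pass" if passes(pvalue) else "fail")
```

This is why `Perceptron` is reported as passing with a p-value of 0.016, even though that value is below the uncorrected 0.05 level.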


## Details on the statistical test results

In [6]:
results_df.sort_values("pvalue")[["estimator_name", "pvalue", "deterministic_predictions"]]

| | estimator_name | pvalue | deterministic_predictions |
|---|---|---|---|
| 42 | NuSVC | 2.208761e-59 | False |
| 48 | RandomForestRegressor | 2.208761e-59 | False |
| 54 | SVC | 2.208761e-59 | False |
| 47 | RandomForestClassifier | 4.417521e-57 | False |
| 2 | BaggingClassifier | 4.395434e-55 | False |
| 25 | HistGradientBoostingClassifier | 5.600644e-50 | False |
| 53 | SGDRegressor | 2.596277e-44 | False |
| 35 | LinearSVC | 6.314162e-19 | False |
| 26 | HistGradientBoostingRegressor | 2.708443e-18 | False |
| 3 | BaggingRegressor | 4.528308e-17 | False |


## Details on errors

In [7]:
import sys

for est, e in errors:
    print(f"❌ {est}: {e}")
    traceback.print_exception(e, file=sys.stdout)
    print()

❌ RANSACRegressor(): Weights sum to zero, can't be normalized
Traceback (most recent call last):
  File "/var/folders/_y/lfnx34p13w3_sr2k12bjb05w0000gn/T/ipykernel_72266/2504969921.py", line 49, in <module>
    result = check_weighted_repeated_estimator_fit_equivalence(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ogrisel/code/sample-weight-audit-nondet/src/sample_weight_audit/estimator_check.py", line 90, in check_weighted_repeated_estimator_fit_equivalence
    multifit_over_weighted_and_repeated(
  File "/Users/ogrisel/code/sample-weight-audit-nondet/src/sample_weight_audit/estimator_check.py", line 312, in multifit_over_weighted_and_repeated
    est_weighted = check_pipeline_and_fit(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ogrisel/code/sample-weight-audit-nondet/src/sample_weight_audit/estimator_check.py", line 224, in check_pipeline_and_fit
    est = est.fit(X, y, sample_weight=sample_weight)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

## List of estimators with missing sample_weight support

In [8]:
for est_name in missing_sample_weight_support:
    print(est_name)

ARDRegression
AdditiveChi2Sampler
AffinityPropagation
AgglomerativeClustering
BernoulliRBM
Binarizer
Birch
CCA
ClassifierChain
ColumnTransformer
DictVectorizer
DictionaryLearning
FactorAnalysis
FastICA
FeatureAgglomeration
FeatureHasher
FeatureUnion
FixedThresholdClassifier
FunctionTransformer
GaussianProcessClassifier
GaussianProcessRegressor
GaussianRandomProjection
GenericUnivariateSelect
HDBSCAN
HashingVectorizer
IncrementalPCA
Isomap
KNNImputer
KNeighborsClassifier
KNeighborsRegressor
KNeighborsTransformer
KernelCenterer
KernelPCA
LabelBinarizer
LabelEncoder
LabelPropagation
LabelSpreading
Lars
LarsCV
LassoLars
LassoLarsCV
LassoLarsIC
LatentDirichletAllocation
LinearDiscriminantAnalysis
LocallyLinearEmbedding
MaxAbsScaler
MeanShift
MinMaxScaler
MiniBatchDictionaryLearning
MiniBatchNMF
MiniBatchSparsePCA
MissingIndicator
MultiLabelBinarizer
MultiTaskElasticNet
MultiTaskElasticNetCV
MultiTaskLasso
MultiTaskLassoCV
NMF
NearestCentroid
NeighborhoodComponentsAnalysis
Normalizer
Nystroe
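
The list above is built by checking whether `sample_weight` appears in the signature of each estimator's `fit` method. The sketch below reproduces that check on three hypothetical toy classes (none of them are scikit-learn estimators). It also shows one limitation of the approach: a `fit` method that only accepts extra arguments through `**fit_params` is reported as lacking support, even if it would forward `sample_weight` to an inner estimator.

```python
from inspect import signature


class WithWeights:
    """Toy class whose fit explicitly declares sample_weight."""

    def fit(self, X, y=None, sample_weight=None):
        return self


class WithoutWeights:
    """Toy class whose fit has no way to receive weights."""

    def fit(self, X, y=None):
        return self


class ForwardsKwargs:
    """Toy meta-estimator style class: weights could only arrive via **fit_params."""

    def fit(self, X, y=None, **fit_params):
        return self


def declares_sample_weight(est_class):
    """Same test as in the notebook loop: look for an explicit parameter name."""
    return "sample_weight" in signature(est_class.fit).parameters


for cls in (WithWeights, WithoutWeights, ForwardsKwargs):
    status = "supports" if declares_sample_weight(cls) else "does not declare"
    print(f"{cls.__name__} {status} sample_weight")
```

Under this check, only `WithWeights` is counted as supporting `sample_weight`, so some entries in the list may be meta-estimators rather than estimators that truly ignore weights.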

## Example interactive inspection of the predictions and scores of a given estimator

In [9]:
results = next(r for r in statistical_test_results if r.estimator_name == "GaussianNB")
results.predictions_repeated.shape

(1, 3000)

In [10]:
results.predictions_weighted.shape

(1, 3000)

In [11]:
import numpy as np
np.abs(results.predictions_repeated - results.predictions_weighted).max()

np.float64(3.9999992207384594e-10)

In [12]:
np.abs(results.scores_repeated - results.scores_weighted).max()

np.float64(3.2652280879119644e-10)

In [13]:
from scipy.stats import kstest
kstest(results.scores_repeated, results.scores_weighted)

KstestResult(statistic=np.float64(1.0), pvalue=np.float64(1.0), statistic_location=np.float64(3.262703314947884), statistic_sign=np.int8(1))
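
The result above combines a statistic of 1.0 with a p-value of 1.0, which can look contradictory. For a deterministic estimator only one fit per variant is run (the progress bars show 1/1), so each score sample presumably holds a single value. With one observation per sample, any difference, however tiny, gives a statistic of 1.0, and the test has no power to reject. The sketch below computes the two-sample Kolmogorov-Smirnov statistic in plain Python to illustrate this. `ks_2samp_statistic` is a hypothetical helper, not SciPy's implementation, and the score values are made up for the example.

```python
def ks_2samp_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of two samples."""

    def ecdf(sample, t):
        # Fraction of observations that are <= t.
        return sum(v <= t for v in sample) / len(sample)

    # The gap can only change at observed values, so checking those suffices.
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in points)


# One score per variant, differing only by rounding noise (illustrative values).
single_repeated = [3.2627033149]
single_weighted = [3.2627033152]
print(ks_2samp_statistic(single_repeated, single_weighted))

# With several fits per variant the statistic becomes informative.
interleaved = ks_2samp_statistic([1.0, 3.0], [2.0, 4.0])
separated = ks_2samp_statistic([1.0, 2.0], [3.0, 4.0])
identical = ks_2samp_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(interleaved, separated, identical)
```

For deterministic estimators, the maximum absolute differences computed in the previous cells are therefore the more meaningful check.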