## scikit-learn sample_weight compliance report

This notebook runs compliance tests on all scikit-learn estimators. Each estimator is inspected to check whether it is expected to have a stochastic fit. If the fit is stochastic, a dedicated statistical test is performed; otherwise, a deterministic estimator check is run instead.
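As a minimal sketch of the property being audited, the snippet below checks that fitting with integer `sample_weight` gives the same prediction as fitting on a dataset where each row is repeated that many times. `ToyMeanRegressor` is a hypothetical stand-in written for this illustration and is not a scikit-learn class. The `signature` lookup mirrors the rule the audit loop uses to decide whether an estimator supports `sample_weight`.

```python
from inspect import signature


class ToyMeanRegressor:
    """Hypothetical toy estimator: predicts the (weighted) mean of y."""

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            sample_weight = [1.0] * len(y)
        total = sum(sample_weight)
        self.mean_ = sum(w * t for w, t in zip(sample_weight, y)) / total
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


# Same detection rule as the audit loop: look for `sample_weight` in fit().
supports_weights = "sample_weight" in signature(ToyMeanRegressor.fit).parameters

X = [[0.0], [1.0], [2.0]]
y = [1.0, 2.0, 6.0]
weights = [2, 0, 1]  # repeat the first row twice, drop the second, keep the third

# Build the "repeated" dataset: each row appears `weight` times.
X_rep = [x for x, w in zip(X, weights) for _ in range(w)]
y_rep = [t for t, w in zip(y, weights) for _ in range(w)]

pred_weighted = ToyMeanRegressor().fit(X, y, sample_weight=weights).predict([[0.0]])
pred_repeated = ToyMeanRegressor().fit(X_rep, y_rep).predict([[0.0]])

print(supports_weights, pred_weighted, pred_repeated)
```

For a deterministic estimator such as this toy, the two predictions can be compared directly. For a stochastic fit, the comparison has to be made between distributions of predictions over many random seeds, which is the role of the statistical test.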

In [1]:
import os
os.environ["SCIPY_ARRAY_API"] = "1"

In [2]:
import sklearn

sklearn.show_versions()


System:
    python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ]
executable: /Users/shrutinath/micromamba/envs/scikit-learn/bin/python
   machine: macOS-14.3-arm64-arm-64bit

Python dependencies:
      sklearn: 1.7.dev0
          pip: 24.0
   setuptools: 75.8.0
        numpy: 2.0.0
        scipy: 1.14.0
       Cython: 3.0.10
       pandas: 2.2.2
   matplotlib: 3.9.0
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/shrutinath/micromamba/envs/scikit-learn/lib/libopenblas.0.dylib
        version: 0.3.27
threading_layer: openmp
   architecture: VORTEX

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/shrutinath/micromamba/envs/scikit-learn/lib/libomp.dylib
        version: None


In [3]:
from inspect import signature
import traceback
import warnings
import pandas as pd
from sklearn.base import is_clusterer
from sklearn.utils import all_estimators
from sklearn.utils.estimator_checks import check_sample_weight_equivalence_on_dense_data
from sklearn.exceptions import ConvergenceWarning
import threadpoolctl

import sys

# Make the local sample_weight_audit package importable from ../src/.
sys.path.insert(1, "../src/")
from sample_weight_audit import check_weighted_repeated_estimator_fit_equivalence
from sample_weight_audit.exceptions import UnexpectedDeterministicPredictions
from sample_weight_audit.sklearn_stochastic_params import STOCHASTIC_FIT_PARAMS

# HistGradientBoostingClassifier thrashes the OpenMP thread pool on repeated
# small fits, so limit OpenMP to a single thread.
threadpoolctl.threadpool_limits(limits=1, user_api="openmp")
warnings.filterwarnings("ignore", category=RuntimeWarning)  # division by zero in AdaBoost
warnings.filterwarnings("ignore", category=ConvergenceWarning)  # liblinear can fail to converge
warnings.filterwarnings("ignore", category=UserWarning)  # KBinsDiscretizer with collapsed bins

In [4]:
from sklearn.linear_model import LogisticRegressionCV

ESTIMATORS_TO_SKIP = [
    LogisticRegressionCV,  # too slow and already somewhat tested by LogisticRegression
]

In [6]:
STAT_TEST_DIM = 30
N_STOCHASTIC_FITS = 100
N_STOCHASTIC_ESTIMATORS = 36  # measured a posteriori
# BONFERRONI_CORRECTION = 1 / (N_STOCHASTIC_ESTIMATORS * STAT_TEST_DIM)
BONFERRONI_CORRECTION = 1 / STAT_TEST_DIM

TEST_THRESHOLD = 0.05 * BONFERRONI_CORRECTION


statistical_test_results = []
deterministic_test_results = []
missing_sample_weight_support = []
errors = []


for est_name, est_class in all_estimators(
    type_filter=["classifier", "regressor", "cluster", "transformer"]
):
    if est_class in ESTIMATORS_TO_SKIP:
        print(f"Skipping {est_name}")
        continue

    if "sample_weight" not in signature(est_class.fit).parameters:
        print(f"⚠ {est_name} does not support sample_weight")
        missing_sample_weight_support.append(est_name)
        continue

    try:
        est = est_class(**STOCHASTIC_FIT_PARAMS.get(est_class, {}))
    except TypeError as e:
        print(f"⚠ {est_name} failed to instantiate: {e}")
        continue
    
    print(f"Evaluating {est}")
    try:
        result = check_weighted_repeated_estimator_fit_equivalence(
            est,
            test_name="kstest",
            n_stochastic_fits=N_STOCHASTIC_FITS,
            random_state=0,
        )
        pass_or_fail = "✅" if result.p_value > TEST_THRESHOLD else "❌"
        print(
            f"{pass_or_fail} {est_name}: (p_value: {result.p_value:.3f})"
        )
        statistical_test_results.append(result)
    except UnexpectedDeterministicPredictions:
        # The estimator parametrization led to deterministic behavior, which is
        # unexpected. Run the deterministic check to investigate instead.
        print(f"⚠ {est_name} with different random states led to the same predictions")
        try:
            check_sample_weight_equivalence_on_dense_data(
                est_name, est.set_params(random_state=0)
            )
            print(f"✅ {est} passed the deterministic check")
            deterministic_test_results.append((est, None))
        except Exception as e:
            print(f"❌ {est} failed the deterministic check")
            deterministic_test_results.append((est, e))
    except Exception as e:
        print(f"❌ {est} error with: {e}")
        errors.append((est, e))

results_df = pd.DataFrame([r.to_dict() for r in statistical_test_results])

⚠ ARDRegression does not support sample_weight
Evaluating AdaBoostClassifier(estimator=DecisionTreeClassifier(max_features=0.5,
                                                    min_weight_fraction_leaf=0.1))


100%|██████████| 100/100 [00:05<00:00, 17.07it/s]


✅ AdaBoostClassifier: (p_value: 0.815)
Evaluating AdaBoostRegressor(estimator=DecisionTreeRegressor(max_features=0.5,
                                                  min_weight_fraction_leaf=0.1))


100%|██████████| 100/100 [00:04<00:00, 21.16it/s]


✅ AdaBoostRegressor: (p_value: 0.078)
⚠ AdditiveChi2Sampler does not support sample_weight
⚠ AffinityPropagation does not support sample_weight
⚠ AgglomerativeClustering does not support sample_weight
Evaluating BaggingClassifier()


100%|██████████| 100/100 [00:01<00:00, 65.39it/s]


❌ BaggingClassifier: (p_value: 0.000)
Evaluating BaggingRegressor()


100%|██████████| 100/100 [00:01<00:00, 69.24it/s]


❌ BaggingRegressor: (p_value: 0.000)
Evaluating BayesianRidge()


100%|██████████| 100/100 [00:00<00:00, 932.27it/s]


❌ BayesianRidge: (p_value: 0.000)
Evaluating BernoulliNB()


100%|██████████| 100/100 [00:00<00:00, 623.94it/s]


✅ BernoulliNB: (p_value: 1.000)
⚠ BernoulliRBM does not support sample_weight
⚠ Binarizer does not support sample_weight
⚠ Birch does not support sample_weight
Evaluating BisectingKMeans(n_clusters=10)


100%|██████████| 100/100 [00:00<00:00, 280.25it/s]


✅ BisectingKMeans: (p_value: 0.968)
⚠ CCA does not support sample_weight
Evaluating CalibratedClassifierCV()


100%|██████████| 100/100 [00:01<00:00, 89.86it/s]


❌ CalibratedClassifierCV: (p_value: 0.000)
Evaluating CategoricalNB()
❌ CategoricalNB() error with: Negative values in data passed to CategoricalNB (input X).
⚠ ClassifierChain does not support sample_weight
⚠ ColumnTransformer does not support sample_weight
Evaluating ComplementNB()
❌ ComplementNB() error with: Negative values in data passed to ComplementNB (input X).
Evaluating DBSCAN()


  0%|          | 0/100 [00:00<?, ?it/s]


❌ DBSCAN() error with: 'DBSCAN' object has no attribute 'predict'
Evaluating DecisionTreeClassifier(max_features=0.5, min_weight_fraction_leaf=0.1)


100%|██████████| 100/100 [00:00<00:00, 638.60it/s]


✅ DecisionTreeClassifier: (p_value: 0.583)
Evaluating DecisionTreeRegressor(max_features=0.5)


100%|██████████| 100/100 [00:00<00:00, 904.52it/s]


✅ DecisionTreeRegressor: (p_value: 0.211)
⚠ DictVectorizer does not support sample_weight
⚠ DictionaryLearning does not support sample_weight
Evaluating DummyClassifier(strategy='stratified')


100%|██████████| 100/100 [00:00<00:00, 892.40it/s]


✅ DummyClassifier: (p_value: 0.155)
Evaluating DummyRegressor()


100%|██████████| 100/100 [00:00<00:00, 2254.88it/s]


✅ DummyRegressor: (p_value: 1.000)
Evaluating ElasticNet(selection='random')


100%|██████████| 100/100 [00:00<00:00, 948.61it/s]


✅ ElasticNet: (p_value: 0.368)
Evaluating ElasticNetCV(selection='random')


100%|██████████| 100/100 [00:01<00:00, 56.28it/s]


✅ ElasticNetCV: (p_value: 0.702)
Evaluating ExtraTreeClassifier()


100%|██████████| 100/100 [00:00<00:00, 653.07it/s]


✅ ExtraTreeClassifier: (p_value: 0.470)
Evaluating ExtraTreeRegressor()


100%|██████████| 100/100 [00:00<00:00, 459.65it/s]


✅ ExtraTreeRegressor: (p_value: 0.470)
Evaluating ExtraTreesClassifier()


100%|██████████| 100/100 [00:07<00:00, 13.96it/s]


✅ ExtraTreesClassifier: (p_value: 0.815)
Evaluating ExtraTreesRegressor()


100%|██████████| 100/100 [00:06<00:00, 14.54it/s]


✅ ExtraTreesRegressor: (p_value: 0.155)
⚠ FactorAnalysis does not support sample_weight
⚠ FastICA does not support sample_weight
⚠ FeatureAgglomeration does not support sample_weight
⚠ FeatureHasher does not support sample_weight
⚠ FeatureUnion does not support sample_weight
⚠ FixedThresholdClassifier does not support sample_weight
⚠ FunctionTransformer does not support sample_weight
Evaluating GammaRegressor()
❌ GammaRegressor() error with: Some value(s) of y are out of the valid range of the loss 'HalfGammaLoss'.
Evaluating GaussianNB()


100%|██████████| 100/100 [00:00<00:00, 659.07it/s]


❌ GaussianNB: (p_value: 0.000)
⚠ GaussianProcessClassifier does not support sample_weight
⚠ GaussianProcessRegressor does not support sample_weight
⚠ GaussianRandomProjection does not support sample_weight
⚠ GenericUnivariateSelect does not support sample_weight
Evaluating GradientBoostingClassifier(max_features=0.5)


100%|██████████| 100/100 [00:17<00:00,  5.86it/s]


✅ GradientBoostingClassifier: (p_value: 0.155)
Evaluating GradientBoostingRegressor(max_features=0.5)


100%|██████████| 100/100 [00:03<00:00, 28.23it/s]


✅ GradientBoostingRegressor: (p_value: 0.111)
⚠ HDBSCAN does not support sample_weight
⚠ HashingVectorizer does not support sample_weight
Evaluating HistGradientBoostingClassifier(max_features=0.5)


100%|██████████| 100/100 [00:07<00:00, 12.79it/s]


❌ HistGradientBoostingClassifier: (p_value: 0.000)
Evaluating HistGradientBoostingRegressor(max_features=0.5)


100%|██████████| 100/100 [00:02<00:00, 34.86it/s]


❌ HistGradientBoostingRegressor: (p_value: 0.000)
Evaluating HuberRegressor()


100%|██████████| 100/100 [00:01<00:00, 76.35it/s]


❌ HuberRegressor: (p_value: 0.000)
⚠ IncrementalPCA does not support sample_weight
⚠ Isomap does not support sample_weight
Evaluating IsotonicRegression()
❌ IsotonicRegression() error with: Isotonic regression input X should be a 1d array or 2d array with 1 feature
Evaluating KBinsDiscretizer(encode='ordinal', quantile_method='averaged_inverted_cdf',
                 subsample=50)


100%|██████████| 100/100 [00:00<00:00, 330.12it/s]


✅ KBinsDiscretizer: (p_value: 0.908)
Evaluating KMeans(n_clusters=10)


100%|██████████| 100/100 [00:00<00:00, 344.11it/s]


✅ KMeans: (p_value: 0.211)
⚠ KNNImputer does not support sample_weight
⚠ KNeighborsClassifier does not support sample_weight
⚠ KNeighborsRegressor does not support sample_weight
⚠ KNeighborsTransformer does not support sample_weight
⚠ KernelCenterer does not support sample_weight
⚠ KernelPCA does not support sample_weight
Evaluating KernelRidge()


100%|██████████| 100/100 [00:00<00:00, 831.71it/s]


❌ KernelRidge: (p_value: 0.000)
⚠ LabelBinarizer does not support sample_weight
⚠ LabelEncoder does not support sample_weight
⚠ LabelPropagation does not support sample_weight
⚠ LabelSpreading does not support sample_weight
⚠ Lars does not support sample_weight
⚠ LarsCV does not support sample_weight
Evaluating Lasso(selection='random')


100%|██████████| 100/100 [00:00<00:00, 925.88it/s]


✅ Lasso: (p_value: 0.155)
Evaluating LassoCV(selection='random')


100%|██████████| 100/100 [00:01<00:00, 54.75it/s]


✅ LassoCV: (p_value: 0.155)
⚠ LassoLars does not support sample_weight
⚠ LassoLarsCV does not support sample_weight
⚠ LassoLarsIC does not support sample_weight
⚠ LatentDirichletAllocation does not support sample_weight
⚠ LinearDiscriminantAnalysis does not support sample_weight
Evaluating LinearRegression()


100%|██████████| 100/100 [00:00<00:00, 1349.65it/s]


❌ LinearRegression: (p_value: 0.000)
Evaluating LinearSVC(dual=True)


100%|██████████| 100/100 [00:01<00:00, 95.28it/s]


❌ LinearSVC: (p_value: 0.000)
Evaluating LinearSVR(dual=True)


100%|██████████| 100/100 [00:00<00:00, 1198.48it/s]


❌ LinearSVR: (p_value: 0.000)
⚠ LocallyLinearEmbedding does not support sample_weight
Evaluating LogisticRegression(dual=True, max_iter=100000, solver='liblinear')


100%|██████████| 100/100 [00:00<00:00, 174.68it/s]


❌ LogisticRegression: (p_value: 0.000)
Skipping LogisticRegressionCV
Evaluating MLPClassifier()


100%|██████████| 100/100 [00:06<00:00, 15.26it/s]


✅ MLPClassifier: (p_value: 0.282)
Evaluating MLPRegressor()


100%|██████████| 100/100 [00:05<00:00, 18.64it/s]


✅ MLPRegressor: (p_value: 0.282)
⚠ MaxAbsScaler does not support sample_weight
⚠ MeanShift does not support sample_weight
⚠ MinMaxScaler does not support sample_weight
⚠ MiniBatchDictionaryLearning does not support sample_weight
Evaluating MiniBatchKMeans(n_clusters=10)


100%|██████████| 100/100 [00:00<00:00, 253.33it/s]


✅ MiniBatchKMeans: (p_value: 0.111)
⚠ MiniBatchNMF does not support sample_weight
⚠ MiniBatchSparsePCA does not support sample_weight
⚠ MissingIndicator does not support sample_weight
⚠ MultiLabelBinarizer does not support sample_weight
⚠ MultiOutputClassifier failed to instantiate: MultiOutputClassifier.__init__() missing 1 required positional argument: 'estimator'
⚠ MultiOutputRegressor failed to instantiate: MultiOutputRegressor.__init__() missing 1 required positional argument: 'estimator'
⚠ MultiTaskElasticNet does not support sample_weight
⚠ MultiTaskElasticNetCV does not support sample_weight
⚠ MultiTaskLasso does not support sample_weight
⚠ MultiTaskLassoCV does not support sample_weight
Evaluating MultinomialNB()
❌ MultinomialNB() error with: Negative values in data passed to MultinomialNB (input X).
⚠ NMF does not support sample_weight
⚠ NearestCentroid does not support sample_weight
⚠ NeighborhoodComponentsAnalysis does not support sample_weight
⚠ Normalizer does not support sample_weight

100%|██████████| 100/100 [00:00<00:00, 118.07it/s]


❌ NuSVC: (p_value: 0.000)
Evaluating NuSVR()


100%|██████████| 100/100 [00:00<00:00, 293.31it/s]


❌ NuSVR: (p_value: 0.000)
⚠ Nystroem does not support sample_weight
⚠ OPTICS does not support sample_weight
⚠ OneHotEncoder does not support sample_weight
⚠ OneVsOneClassifier does not support sample_weight
⚠ OneVsRestClassifier does not support sample_weight
⚠ OrdinalEncoder does not support sample_weight
⚠ OrthogonalMatchingPursuit does not support sample_weight
⚠ OrthogonalMatchingPursuitCV does not support sample_weight
⚠ OutputCodeClassifier does not support sample_weight
⚠ PCA does not support sample_weight
⚠ PLSCanonical does not support sample_weight
⚠ PLSRegression does not support sample_weight
⚠ PLSSVD does not support sample_weight
⚠ PassiveAggressiveClassifier does not support sample_weight
⚠ PassiveAggressiveRegressor does not support sample_weight
⚠ PatchExtractor does not support sample_weight
Evaluating Perceptron(max_iter=100000)


100%|██████████| 100/100 [00:00<00:00, 334.73it/s]


✅ Perceptron: (p_value: 0.004)
Evaluating PoissonRegressor()
❌ PoissonRegressor() error with: Some value(s) of y are out of the valid range of the loss 'HalfPoissonLoss'.
⚠ PolynomialCountSketch does not support sample_weight
⚠ PolynomialFeatures does not support sample_weight
⚠ PowerTransformer does not support sample_weight
⚠ QuadraticDiscriminantAnalysis does not support sample_weight
Evaluating QuantileRegressor()


100%|██████████| 100/100 [00:00<00:00, 305.31it/s]


✅ QuantileRegressor: (p_value: 1.000)
⚠ QuantileTransformer does not support sample_weight
Evaluating RANSACRegressor()


  3%|▎         | 3/100 [00:00<00:03, 28.47it/s]


❌ RANSACRegressor() error with: Weights sum to zero, can't be normalized
⚠ RBFSampler does not support sample_weight
⚠ RFE does not support sample_weight
⚠ RFECV does not support sample_weight
⚠ RadiusNeighborsClassifier does not support sample_weight
⚠ RadiusNeighborsRegressor does not support sample_weight
⚠ RadiusNeighborsTransformer does not support sample_weight
Evaluating RandomForestClassifier()


100%|██████████| 100/100 [00:09<00:00, 10.45it/s]


❌ RandomForestClassifier: (p_value: 0.000)
Evaluating RandomForestRegressor(max_features=0.5)


100%|██████████| 100/100 [00:08<00:00, 11.35it/s]


❌ RandomForestRegressor: (p_value: 0.000)
Evaluating RandomTreesEmbedding(n_estimators=10)


100%|██████████| 100/100 [00:01<00:00, 83.12it/s]


✅ RandomTreesEmbedding: (p_value: 0.470)
⚠ RegressorChain does not support sample_weight
Evaluating Ridge(max_iter=100000, solver='sag')
❌ Ridge(max_iter=100000, solver='sag') error with: Floating-point under-/overflow occurred at epoch #973. Scaling input data with StandardScaler or MinMaxScaler might help.
Evaluating RidgeCV()


100%|██████████| 100/100 [00:00<00:00, 110.62it/s]


❌ RidgeCV: (p_value: 0.000)
Evaluating RidgeClassifier(max_iter=100000, solver='saga')
❌ RidgeClassifier(max_iter=100000, solver='saga') error with: Floating-point under-/overflow occurred at epoch #878. Scaling input data with StandardScaler or MinMaxScaler might help.
Evaluating RidgeClassifierCV()


100%|██████████| 100/100 [00:01<00:00, 61.72it/s]


✅ RidgeClassifierCV: (p_value: 1.000)
⚠ RobustScaler does not support sample_weight
Evaluating SGDClassifier(max_iter=100000)


100%|██████████| 100/100 [00:00<00:00, 294.59it/s]


❌ SGDClassifier: (p_value: 0.000)
Evaluating SGDRegressor(max_iter=100000)


100%|██████████| 100/100 [00:00<00:00, 696.89it/s]


❌ SGDRegressor: (p_value: 0.000)
Evaluating SVC(probability=True)


100%|██████████| 100/100 [00:00<00:00, 131.06it/s]


❌ SVC: (p_value: 0.000)
Evaluating SVR()


100%|██████████| 100/100 [00:00<00:00, 163.11it/s]


❌ SVR: (p_value: 0.000)
⚠ SelectFdr does not support sample_weight
⚠ SelectFpr does not support sample_weight
⚠ SelectFromModel does not support sample_weight
⚠ SelectFwe does not support sample_weight
⚠ SelectKBest does not support sample_weight
⚠ SelectPercentile does not support sample_weight
⚠ SelfTrainingClassifier does not support sample_weight
⚠ SequentialFeatureSelector does not support sample_weight
⚠ SimpleImputer does not support sample_weight
⚠ SkewedChi2Sampler does not support sample_weight
⚠ SparseCoder does not support sample_weight
⚠ SparsePCA does not support sample_weight
⚠ SparseRandomProjection does not support sample_weight
⚠ SpectralClustering does not support sample_weight
Evaluating SplineTransformer()


100%|██████████| 100/100 [00:00<00:00, 254.02it/s]


❌ SplineTransformer: (p_value: 0.000)
⚠ StackingClassifier failed to instantiate: StackingClassifier.__init__() missing 1 required positional argument: 'estimators'
⚠ StackingRegressor failed to instantiate: StackingRegressor.__init__() missing 1 required positional argument: 'estimators'
Evaluating StandardScaler()


100%|██████████| 100/100 [00:00<00:00, 777.05it/s]


❌ StandardScaler: (p_value: 0.000)
⚠ TSNE does not support sample_weight
⚠ TargetEncoder does not support sample_weight
⚠ TfidfTransformer does not support sample_weight
⚠ TheilSenRegressor does not support sample_weight
⚠ TransformedTargetRegressor does not support sample_weight
⚠ TruncatedSVD does not support sample_weight
⚠ TunedThresholdClassifierCV does not support sample_weight
Evaluating TweedieRegressor()


100%|██████████| 100/100 [00:00<00:00, 630.01it/s]

✅ TweedieRegressor: (p_value: 1.000)
⚠ VarianceThreshold does not support sample_weight
⚠ VotingClassifier failed to instantiate: VotingClassifier.__init__() missing 1 required positional argument: 'estimators'
⚠ VotingRegressor failed to instantiate: VotingRegressor.__init__() missing 1 required positional argument: 'estimators'





In [7]:
print(
    f"✅ {len([r for r in deterministic_test_results if r[1] is None])} "
    "passed the deterministic test"
)
print(
    f"❌ {len([r for r in deterministic_test_results if r[1] is not None])} "
    "failed the deterministic test"
)
print(
    f"✅ {len([r for r in statistical_test_results if r.p_value > TEST_THRESHOLD])} "
    "passed the statistical test"
)
print(
    f"❌ {len([r for r in statistical_test_results if r.p_value <= TEST_THRESHOLD])} "
    "failed the statistical test"
)
print(f"❌ {len(errors)} other errors")
print(
    f"⚠ {len(missing_sample_weight_support)} estimators lack sample_weight "
    "support"
)
results_df = pd.DataFrame([r.to_dict() for r in statistical_test_results])

✅ 0 passed the deterministic test
❌ 0 failed the deterministic test
✅ 28 passed the statistical test
❌ 24 failed the statistical test
❌ 10 other errors
⚠ 112 estimators lack sample_weight support
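
To make the pass/fail rule behind these counts concrete, the following sketch recomputes the threshold from the constants in the audit cell and applies it to a few p-values as printed in the log above. The `passes` helper is a hypothetical name introduced for illustration. The stricter threshold corresponds to the correction that is commented out in the notebook.

```python
# Constants copied from the audit cell.
STAT_TEST_DIM = 30
N_STOCHASTIC_ESTIMATORS = 36

# Correction actually used: divide the 5% level by the test dimension only.
threshold = 0.05 / STAT_TEST_DIM
# Stricter alternative that is commented out in the notebook.
strict_threshold = 0.05 / (N_STOCHASTIC_ESTIMATORS * STAT_TEST_DIM)


def passes(p_value, level=threshold):
    """Hypothetical helper mirroring `result.p_value > TEST_THRESHOLD`."""
    return p_value > level


# p-values as printed (rounded to three decimals) in the log.
logged = {"AdaBoostRegressor": 0.078, "Perceptron": 0.004, "BayesianRidge": 0.000}
for name, p in logged.items():
    print(f"{name}: {'pass' if passes(p) else 'fail'}")

print(f"threshold={threshold:.5f}, strict_threshold={strict_threshold:.2e}")
```

This is why Perceptron is reported as passing with a p-value of 0.004: the effective threshold is about 0.0017, not 0.05.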


## Details on the statistical test results

In [8]:
results_df.sort_values("p_value")[["estimator_name", "p_value", "deterministic_flag"]]

| | estimator_name | p_value | deterministic_flag |
|---|---|---|---|
| 46 | SGDRegressor | 2.208761e-59 | False |
| 29 | LinearRegression | 2.208761e-59 | True |
| 26 | KernelRidge | 2.208761e-59 | True |
| 50 | StandardScaler | 2.208761e-59 | True |
| 4 | BayesianRidge | 2.208761e-59 | True |
| 49 | SplineTransformer | 2.208761e-59 | True |
| 48 | SVR | 2.208761e-59 | True |
| 7 | CalibratedClassifierCV | 2.208761e-59 | True |
| 43 | RidgeCV | 2.208761e-59 | True |
| 23 | HuberRegressor | 2.208761e-59 | True |
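
Every row listed here has the same p-value, 2.208761e-59. One reading, offered as an assumption about the audit internals rather than a confirmed fact, is that this is the smallest p-value an exact two-sample Kolmogorov-Smirnov test can return for two samples of 100 values each (matching `N_STOCHASTIC_FITS = 100`). When the two samples do not overlap at all, the exact two-sided p-value is 2 / C(200, 100). The sketch below checks that arithmetic with the standard library only.

```python
from math import comb

# Assumed sample sizes, taken from N_STOCHASTIC_FITS = 100.
n_weighted = n_repeated = 100

# Under the null hypothesis every interleaving of the pooled 200 values is
# equally likely, and only 2 of the C(200, 100) interleavings are fully
# separated (all of one sample below all of the other).
p_min = 2 / comb(n_weighted + n_repeated, n_weighted)
print(f"{p_min:.6e}")
```

If that reading holds, a p-value at this floor means the weighted and repeated prediction samples were completely separated, so the test could not have rejected more strongly with 100 fits per side.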


## Details on deterministic test errors

In [8]:
import sys

for est, e in deterministic_test_results:
    if e is None:
        continue

    print(f"❌ {est}: {e}")
    traceback.print_exception(e, file=sys.stdout)
    print()

❌ HuberRegressor(): 
Not equal to tolerance rtol=1e-07, atol=1e-09
Comparing the output of HuberRegressor.predict revealed that fitting with `sample_weight` is not equivalent to fitting with removed or repeated data points.
Mismatched elements: 15 / 15 (100%)
Max absolute difference among violations: 0.00051052
Max relative difference among violations: 7.5912896
 ACTUAL: array([-2.323244e-05,  9.999910e-01,  2.000039e+00,  1.202210e+00,
        2.430282e+00,  9.999892e-01,  1.999998e+00,  2.000024e+00,
        1.656899e+00,  1.999991e+00,  1.576810e+00,  1.295011e+00,
        1.829514e+00,  1.000022e+00,  1.000070e+00])
 DESIRED: array([-2.704185e-06,  9.999971e-01,  2.000005e+00,  1.202141e+00,
        2.430112e+00,  9.999871e-01,  1.999991e+00,  2.000007e+00,
        1.656642e+00,  1.999968e+00,  1.576735e+00,  1.294886e+00,
        1.829381e+00,  1.000010e+00,  9.995597e-01])
Traceback (most recent call last):
  File "/var/folders/_y/lfnx34p13w3_sr2k12bjb05w0000gn/T/ipykernel_47673/
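
The summary figures in the HuberRegressor message above can be recomputed from the printed arrays. The sketch below applies the usual `allclose`-style criterion, |actual - desired| <= atol + rtol * |desired|, to the values as printed. Because the printed values are rounded, the last digits differ slightly from the report, which was computed at full precision.

```python
# Values copied from the ACTUAL and DESIRED arrays in the error message.
actual = [-2.323244e-05, 9.999910e-01, 2.000039e+00, 1.202210e+00,
          2.430282e+00, 9.999892e-01, 1.999998e+00, 2.000024e+00,
          1.656899e+00, 1.999991e+00, 1.576810e+00, 1.295011e+00,
          1.829514e+00, 1.000022e+00, 1.000070e+00]
desired = [-2.704185e-06, 9.999971e-01, 2.000005e+00, 1.202141e+00,
           2.430112e+00, 9.999871e-01, 1.999991e+00, 2.000007e+00,
           1.656642e+00, 1.999968e+00, 1.576735e+00, 1.294886e+00,
           1.829381e+00, 1.000010e+00, 9.995597e-01]
rtol, atol = 1e-07, 1e-09  # tolerances reported in the message

abs_diff = [abs(a - d) for a, d in zip(actual, desired)]
rel_diff = [diff / abs(d) for diff, d in zip(abs_diff, desired)]

# An element is a violation when it exceeds the combined tolerance.
mismatched = sum(
    diff > atol + rtol * abs(d) for diff, d in zip(abs_diff, desired)
)

print(f"Mismatched elements: {mismatched} / {len(actual)}")
print(f"Max absolute difference: {max(abs_diff):.3g}")
print(f"Max relative difference: {max(rel_diff):.3g}")
```

The large relative difference comes from the first element, where both values are close to zero, while the largest absolute difference comes from the last element.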

## Details on other errors

In [9]:
import sys

for est, e in errors:
    print(f"❌ {est}: {e}")
    traceback.print_exception(e, file=sys.stdout)
    print()

❌ RANSACRegressor(): Weights sum to zero, can't be normalized
Traceback (most recent call last):
  File "/var/folders/_y/lfnx34p13w3_sr2k12bjb05w0000gn/T/ipykernel_47673/2096092967.py", line 49, in <module>
    result = check_weighted_repeated_estimator_fit_equivalence(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ogrisel/code/sample-weight-audit-nondet/src/sample_weight_audit/estimator_check.py", line 83, in check_weighted_repeated_estimator_fit_equivalence
    predictions_weighted, predictions_repeated, _ = multifit_over_weighted_and_repeated(
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ogrisel/code/sample-weight-audit-nondet/src/sample_weight_audit/estimator_check.py", line 302, in multifit_over_weighted_and_repeated
    est_weighted = check_pipeline_and_fit(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ogrisel/code/sample-weight-audit-nondet/src/sample_weight_audit/estimator_c

## List of estimators with missing sample_weight support

In [10]:
for est_name in missing_sample_weight_support:
    print(est_name)

ARDRegression
AdditiveChi2Sampler
AffinityPropagation
AgglomerativeClustering
BernoulliRBM
Binarizer
Birch
CCA
ClassifierChain
ColumnTransformer
DictVectorizer
DictionaryLearning
FactorAnalysis
FastICA
FeatureAgglomeration
FeatureHasher
FeatureUnion
FixedThresholdClassifier
FunctionTransformer
GaussianProcessClassifier
GaussianProcessRegressor
GaussianRandomProjection
GenericUnivariateSelect
HDBSCAN
HashingVectorizer
IncrementalPCA
Isomap
KNNImputer
KNeighborsClassifier
KNeighborsRegressor
KNeighborsTransformer
KernelCenterer
KernelPCA
LabelBinarizer
LabelEncoder
LabelPropagation
LabelSpreading
Lars
LarsCV
LassoLars
LassoLarsCV
LassoLarsIC
LatentDirichletAllocation
LinearDiscriminantAnalysis
LocallyLinearEmbedding
MaxAbsScaler
MeanShift
MinMaxScaler
MiniBatchDictionaryLearning
MiniBatchNMF
MiniBatchSparsePCA
MissingIndicator
MultiLabelBinarizer
MultiTaskElasticNet
MultiTaskElasticNetCV
MultiTaskLasso
MultiTaskLassoCV
NMF
NearestCentroid
NeighborhoodComponentsAnalysis
Normalizer
Nystroe