In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import permutation_test_score
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import HistGradientBoostingClassifier

In [2]:
bunch = load_iris()
X, y = bunch.data, bunch.target
dummy = DummyClassifier()
histgrad = HistGradientBoostingClassifier()

In [3]:
score, perm_scores, p = permutation_test_score(dummy, X, y)
print(f"Accuracy for DummyClassifier: {score:.5f}")
print(f"p-value for DummyClassifier: {p:.5f}")

Accuracy for DummyClassifier: 0.33333
p-value for DummyClassifier: 1.00000


The high $p$-value indicates that either there is no dependence between the features and the labels, or that the `DummyClassifier` cannot make use of such dependence. Here the second explanation applies: with its default strategy, the `DummyClassifier` always predicts the majority class, so its predictions are **independent** of the features.
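The independence claim above can be checked directly. The following is a minimal sketch, assuming the default `DummyClassifier` strategy; the random feature matrix `X_random` and the seed are illustrative choices, not part of the original experiment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier

X, y = load_iris(return_X_y=True)
dummy = DummyClassifier().fit(X, y)  # default strategy, ignores the features

# Hypothetical stand-in features: pure noise with the same shape as X.
rng = np.random.default_rng(0)
X_random = rng.normal(size=X.shape)

pred_real = dummy.predict(X)
pred_random = dummy.predict(X_random)

# The predictions are a single constant class, whatever features are passed in.
print(np.unique(pred_real).size == 1)
print(np.array_equal(pred_real, pred_random))
```

Because the predictions do not change when the features are replaced by noise, permuting the labels cannot make this classifier score any worse, which is consistent with the $p$-value of 1.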

In [4]:
score, perm_scores, p = permutation_test_score(histgrad, X, y, n_jobs=-1)
print(f"Accuracy for HistGradientBoostingClassifier: {score:.5f}")
print(f"P-value for HistGradientBoostingClassifier: {p:.5f}")

Accuracy for HistGradientBoostingClassifier: 0.94667
P-value for HistGradientBoostingClassifier: 0.00990


`HistGradientBoostingClassifier` is an ensemble method that iteratively grows successive trees, each fitted to reduce the error of the preceding ones. Unlike the dummy, this method does take the features into account. The $p$-value is computed as `(np.sum(permutation_scores >= score) + 1.0) / (n_permutations + 1)`: approximately the fraction of permutations for which the classifier scores at least as well on label-permuted data as on the original data. With `n_permutations=100` (the default), the lowest $p$-value achievable with `permutation_test_score` is $\frac{0 + 1}{100 + 1} \approx 0.0099$, which is the result we observe. We can repeat the experiment with a larger number of permutations and observe a lower $p$-value:


In [5]:
score, perm_scores, p = permutation_test_score(
    histgrad, X, y, n_permutations=1000, n_jobs=-1
)
print(f"Accuracy for HistGradientBoostingClassifier: {score:.5f}")
print(f"P-value for HistGradientBoostingClassifier: {p:.5f}")

Accuracy for HistGradientBoostingClassifier: 0.94667
P-value for HistGradientBoostingClassifier: 0.00100


The small $p$-value allows us to reject the null hypothesis that the features and labels are independent. In other words, it suggests a statistically significant association between features and labels. The `HistGradientBoostingClassifier` successfully made use of this dependence to achieve a high mean accuracy.
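The $p$-value formula quoted earlier can also be reproduced by hand from the returned permutation scores. This is a minimal sketch: `empirical_p_value` is a hypothetical helper written for illustration, and the `DummyClassifier` is used only because it is cheap to refit many times.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import permutation_test_score


def empirical_p_value(score, perm_scores):
    """Hypothetical helper applying the formula quoted in the text."""
    perm_scores = np.asarray(perm_scores)
    return (np.sum(perm_scores >= score) + 1.0) / (perm_scores.size + 1)


X, y = load_iris(return_X_y=True)
score, perm_scores, p = permutation_test_score(
    DummyClassifier(), X, y, n_permutations=100, random_state=0
)

# The hand-computed value should agree with the one returned by scikit-learn.
p_manual = empirical_p_value(score, perm_scores)
print(np.isclose(p, p_manual))

# Smallest attainable p-value: no permuted score reaches the original score.
for n_permutations in (100, 1000):
    print(n_permutations, 1.0 / (n_permutations + 1))
```

The loop makes the resolution limit explicit: the reported $p$-value can never fall below $1 / (n_{\text{permutations}} + 1)$, so a value sitting exactly at that floor says only that no permutation matched the original score.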