# Uncertainty Estimates from Classifiers
Often, you are not only interested in which class a classifier predicts for a certain test point, but also how certain it is that this is the right class. There are two different functions in scikit-learn that can be used to obtain uncertainty estimates from classifiers: decision_function and predict_proba.

In [1]:
import pandas as pd
import numpy as np

In [2]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_blobs, make_circles
X, y = make_circles(noise=0.25, factor=0.5, random_state=1)
y

array([1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1,
       1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0,
       0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0,
       1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0])

In [3]:
# we rename the classes "blue" and "red" for illustration purposes
y_named = np.array(["blue", "red"])[y]

In [4]:
gbrt=GradientBoostingClassifier(random_state=0)

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train_named, y_test_named, y_train, y_test =train_test_split(X, y_named, y, random_state=0)
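Passing three arrays to train_test_split works because the function applies one shuffled split to all of them. The small sketch below rebuilds the same data and confirms that the named and integer labels stay aligned after the split:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

X, y = make_circles(noise=0.25, factor=0.5, random_state=1)
y_named = np.array(["blue", "red"])[y]

# train_test_split shuffles every array with the same permutation,
# so row i of each output refers to the same original sample
X_train, X_test, y_train_named, y_test_named, y_train, y_test = train_test_split(
    X, y_named, y, random_state=0)

# Re-deriving the names from the integer labels must give the same arrays
train_consistent = np.array_equal(np.array(["blue", "red"])[y_train], y_train_named)
test_consistent = np.array_equal(np.array(["blue", "red"])[y_test], y_test_named)
print(train_consistent, test_consistent)  # True True
print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)
```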

In [6]:
gbrt.fit(X_train,y_train_named)

GradientBoostingClassifier(random_state=0)

## Decision Function

In [7]:
print("X_test shape: {}".format(X_test.shape))
print("decision_function shape: {}".format(gbrt.decision_function(X_test).shape))


X_test shape: (25, 2)
decision_function shape: (25,)


In [8]:
print("Decision Function:{}".format(gbrt.decision_function(X_test)))

Decision Function:[ 4.13592603 -1.70169917 -3.95106099 -3.62609552  4.28986642  3.66166081
 -7.69097179  4.11001686  1.10753937  3.40782222 -6.46255955  4.28986642
  3.90156346 -1.20031247  3.66166081 -4.17231157 -1.23010079 -3.91576223
  4.03602783  4.11001686  4.11001686  0.65709014  2.69826265 -2.65673274
 -1.86776596]
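In the binary case decision_function returns one score per sample. The sign says which class is favoured, and, roughly, a larger absolute value means the model is more certain. As a sketch (the cut-off of five points is arbitrary), the test points closest to the decision boundary can be listed by sorting on the absolute score:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Same data and split as above
X, y = make_circles(noise=0.25, factor=0.5, random_state=1)
y_named = np.array(["blue", "red"])[y]
X_train, X_test, y_train_named, y_test_named = train_test_split(X, y_named, random_state=0)
gbrt = GradientBoostingClassifier(random_state=0).fit(X_train, y_train_named)

scores = gbrt.decision_function(X_test)  # shape (n_samples,) for binary problems
predictions = gbrt.predict(X_test)

# Sort by distance from the decision boundary (|score| near 0 = least certain)
least_certain = np.argsort(np.abs(scores))[:5]
for i in least_certain:
    print("sample {}: score {:+.3f}, predicted {}".format(i, scores[i], predictions[i]))
```

The scale of these scores depends on the model and its parameters, so they are best compared within one model rather than across models.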


#### Getting decision function along with predictions

In [9]:
print("Decision function:\n{}".format(gbrt.decision_function(X_test)))
print("Predictions:\n{}".format(gbrt.predict(X_test)))

Decision function:
[ 4.13592603 -1.70169917 -3.95106099 -3.62609552  4.28986642  3.66166081
 -7.69097179  4.11001686  1.10753937  3.40782222 -6.46255955  4.28986642
  3.90156346 -1.20031247  3.66166081 -4.17231157 -1.23010079 -3.91576223
  4.03602783  4.11001686  4.11001686  0.65709014  2.69826265 -2.65673274
 -1.86776596]
Predictions:
['red' 'blue' 'blue' 'blue' 'red' 'red' 'blue' 'red' 'red' 'red' 'blue'
 'red' 'red' 'blue' 'red' 'blue' 'blue' 'blue' 'red' 'red' 'red' 'red'
 'red' 'blue' 'blue']


**For binary classification, the first entry of the classes_ attribute is the 'negative' class and the second entry is the 'positive' class. A positive decision_function value means the model favours the positive class.**

In [10]:
gbrt.classes_
# Here blue is the negative class and red is the positive class.

array(['blue', 'red'],
      dtype='<U4')
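Because classes_[1] is the positive class, thresholding the decision function at 0 and using the resulting 0/1 values to index classes_ should reproduce predict. A sketch on the same data:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_circles(noise=0.25, factor=0.5, random_state=1)
y_named = np.array(["blue", "red"])[y]
X_train, X_test, y_train_named, y_test_named = train_test_split(X, y_named, random_state=0)
gbrt = GradientBoostingClassifier(random_state=0).fit(X_train, y_train_named)

dec = gbrt.decision_function(X_test)
# 1 where the score is positive (favours classes_[1]), 0 otherwise (classes_[0])
positive = (dec > 0).astype(int)
pred_from_dec = gbrt.classes_[positive]
print(gbrt.classes_)  # ['blue' 'red']
print(np.array_equal(pred_from_dec, gbrt.predict(X_test)))  # True
```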

## Predicting Probabilities
The output of predict_proba is a probability for each class. It is always of shape (n_samples, 2) for binary classification:

In [11]:
print("Predicted probabilities shape: {}".format(gbrt.predict_proba(X_test).shape))

Predicted probabilities shape: (25, 2)


In [12]:
print("Predicted probabilities:\n{}".format(gbrt.predict_proba(X_test)[:10]))

Predicted probabilities:
[[  1.57362639e-02   9.84263736e-01]
 [  8.45756526e-01   1.54243474e-01]
 [  9.81128693e-01   1.88713075e-02]
 [  9.74070327e-01   2.59296728e-02]
 [  1.35214212e-02   9.86478579e-01]
 [  2.50463747e-02   9.74953625e-01]
 [  9.99543275e-01   4.56725221e-04]
 [  1.61426376e-02   9.83857362e-01]
 [  2.48329911e-01   7.51670089e-01]
 [  3.20518935e-02   9.67948107e-01]]
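With its default log-loss, gradient boosting's binary decision function behaves like a log-odds score, so applying the logistic sigmoid to it should give the probability of the positive class (the second column of predict_proba). This is a sketch of that relationship under the default-loss assumption, not a guarantee for every model:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_circles(noise=0.25, factor=0.5, random_state=1)
y_named = np.array(["blue", "red"])[y]
X_train, X_test, y_train_named, y_test_named = train_test_split(X, y_named, random_state=0)
# Assumes the default loss of GradientBoostingClassifier (log-loss)
gbrt = GradientBoostingClassifier(random_state=0).fit(X_train, y_train_named)

dec = gbrt.decision_function(X_test)
proba = gbrt.predict_proba(X_test)

# Logistic sigmoid of the log-odds score -> probability of classes_[1] ("red")
sigmoid = 1.0 / (1.0 + np.exp(-dec))
print(np.allclose(sigmoid, proba[:, 1]))  # True
# The two columns are complementary, so every row sums to 1
print(np.allclose(proba.sum(axis=1), 1.0))  # True
```

For example, the first test point has a decision value of about 4.136, and 1 / (1 + e^-4.136) is about 0.984, matching the second column of the first row above.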


Because the probabilities for the two classes sum to 1, exactly one of the classes will be above 50% certainty. That class is the one that is predicted.

You can see in the previous output that the classifier is relatively certain for most points. How well the uncertainty actually reflects uncertainty in the data depends on the model and the parameters. A model that is more overfitted tends to make more certain predictions, even if they might be wrong. A model with less complexity usually has more uncertainty in its predictions. A model is called calibrated if the reported uncertainty actually matches how correct it is: in a calibrated model, a prediction made with 70% certainty would be correct 70% of the time.
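One way to check calibration in practice is sklearn.calibration.calibration_curve, which bins the predicted probabilities and reports the observed fraction of positives in each bin. The sketch below assumes a larger version of the circles data (2,000 samples rather than 100) so that each bin holds enough points; for a well-calibrated model the two numbers in each printed row should be close:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Larger sample than the 100-point example above (an assumption for this sketch)
X, y = make_circles(n_samples=2000, noise=0.25, factor=0.5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbrt = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
prob_pos = gbrt.predict_proba(X_test)[:, 1]  # probability of class 1

# Per bin: fraction of samples that really are class 1,
# and the mean predicted probability of the samples in that bin
frac_pos, mean_pred = calibration_curve(y_test, prob_pos, n_bins=5)
for p, f in zip(mean_pred, frac_pos):
    print("predicted {:.2f} -> observed {:.2f}".format(p, f))
```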

## Uncertainty in Multiclass Classification
The decision_function and predict_proba methods also work for multiclass classification.

In [13]:
from sklearn.datasets import load_iris
iris=load_iris()
train_X,val_X,train_y,val_y=train_test_split(iris.data,iris.target,random_state=0)


In [14]:
gbrt=GradientBoostingClassifier(random_state=0)
gbrt.fit(train_X,train_y)
gbrt.decision_function(val_X[:10])

array([[-8.0104411 , -6.98527486,  4.81705717],
       [-8.01819888,  3.77312674, -6.87620465],
       [ 6.24999284, -4.29928465, -6.91535308],
       [-8.01043891, -6.98023452,  4.73731973],
       [ 6.24943072, -5.1093528 , -6.91535386],
       [-8.01043891, -6.98002911,  4.73713144],
       [ 6.24943072, -5.1093528 , -6.91535386],
       [-8.02907841,  4.26196021, -6.37090699],
       [-8.01820178,  4.26196021, -7.50283163],
       [-8.01819888,  3.82570816, -6.92048631]])

In the multiclass case, the decision_function has the shape (n_samples, n_classes) and each column provides a “certainty score” for each class, where a large score means that a class is more likely and a small score means the class is less likely. You can recover the predictions from these scores by finding the maximum entry for each data point:

In [15]:
print("Argmax of decisionFunction:{}".format(np.argmax(gbrt.decision_function(val_X[:10]),axis=1)))
print("Predictions--------------> {}".format(gbrt.predict(val_X)[:10]))

Argmax of decisionFunction:[2 1 0 2 0 2 0 1 1 1]
Predictions--------------> [2 1 0 2 0 2 0 1 1 1]
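With the default log-loss, the multiclass probabilities of GradientBoostingClassifier can be read as a softmax over the per-class decision scores. The sketch below checks this on the iris split used above; the softmax is written out by hand for illustration, assuming the default loss:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
train_X, val_X, train_y, val_y = train_test_split(iris.data, iris.target, random_state=0)
gbrt = GradientBoostingClassifier(random_state=0).fit(train_X, train_y)

dec = gbrt.decision_function(val_X)  # shape (n_samples, n_classes)
# Softmax over the class axis; subtracting the row maximum avoids overflow in exp
exp_scores = np.exp(dec - dec.max(axis=1, keepdims=True))
softmax = exp_scores / exp_scores.sum(axis=1, keepdims=True)
print(np.allclose(softmax, gbrt.predict_proba(val_X)))  # True
```

Because softmax preserves the order of the scores within a row, the argmax of the probabilities and the argmax of the decision function always agree.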


In [16]:
p=gbrt.predict_proba(val_X)[:10]
pr=gbrt.predict(val_X)[:10]
print(list(zip(pr,p)))

[(2, array([  2.68586307e-06,   7.48700134e-06,   9.99989827e-01])), (1, array([  7.56970160e-06,   9.99968714e-01,   2.37159514e-05])), (0, array([  9.99971873e-01,   2.62116756e-05,   1.91580234e-06])), (2, array([  2.90880070e-06,   8.14940771e-06,   9.99988942e-01])), (0, array([  9.99986417e-01,   1.16664063e-05,   1.91690594e-06])), (2, array([  2.90934842e-06,   8.15261667e-06,   9.99988938e-01])), (0, array([  9.99986417e-01,   1.16664063e-05,   1.91690594e-06])), (1, array([  4.59258586e-06,   9.99971298e-01,   2.41097097e-05])), (1, array([  4.64288596e-06,   9.99987584e-01,   7.77338949e-06])), (1, array([  7.18197780e-06,   9.99971291e-01,   2.15265570e-05]))]
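Each row of predict_proba sums to 1, so the largest entry can be read as the model's confidence in its prediction, and the gap between the two largest entries as a measure of how ambiguous a sample is. A sketch (the names confidence and margin are only illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
train_X, val_X, train_y, val_y = train_test_split(iris.data, iris.target, random_state=0)
gbrt = GradientBoostingClassifier(random_state=0).fit(train_X, train_y)

proba = gbrt.predict_proba(val_X)
print(np.allclose(proba.sum(axis=1), 1.0))  # True

confidence = proba.max(axis=1)  # probability of the predicted class
top_two = np.sort(proba, axis=1)[:, -2:]  # two largest probabilities per row, ascending
margin = top_two[:, 1] - top_two[:, 0]
# The three samples where the two best classes are closest together
most_ambiguous = np.argsort(margin)[:3]
print(most_ambiguous, confidence[most_ambiguous].round(3))
```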


#### Recovering predictions from probabilities

In [17]:
print("Argmax of predicted probabilities:\n{}".format(np.argmax(gbrt.predict_proba(val_X), axis=1)))
print("Predictions:\n{}".format(gbrt.predict(val_X)))

Argmax of predicted probabilities:
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 2]
Predictions:
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 2]


In [18]:
from sklearn.linear_model import LogisticRegression
named_target=iris.target_names[train_y]
logrig=LogisticRegression(max_iter=1000)
logrig.fit(train_X,named_target)
print("Classes:{}".format(logrig.classes_))
print("Predictions:{}".format(logrig.predict(val_X)[:10]))
print("Decision Function:\n{}".format(logrig.decision_function(val_X)[:10]))
argmax_dec_func = np.argmax(logrig.decision_function(val_X), axis=1)
print("argmax of decision function: {}".format(argmax_dec_func[:10]))
# classes_ maps each column index back to its string label; see
# https://scikit-learn.org/stable/developers/develop.html#Specific_models
# and see also intro to ml, page 127
print('Predictions:{}'.format(logrig.classes_[argmax_dec_func][:10]))

Classes:['setosa' 'versicolor' 'virginica']
Predictions:['virginica' 'versicolor' 'setosa' 'virginica' 'setosa' 'virginica'
 'setosa' 'versicolor' 'versicolor' 'versicolor']
Decision Function:
[[-5.05581224  1.11481883  3.94099342]
 [-1.69801813  2.63320413 -0.935186  ]
 [ 7.06426802  2.91978845 -9.98405647]
 [-7.79851933  2.03462574  5.76389359]
 [ 6.3586709   2.87287553 -9.23154643]
 [-7.03663254  0.96167231  6.07496023]
 [ 6.81126858  2.81547474 -9.62674331]
 [-3.35096083  2.22449021  1.12647062]
 [-3.78163474  2.41444677  1.36718797]
 [-1.52644578  2.29477296 -0.76832718]]
argmax of decision function: [2 1 0 2 0 2 0 1 1 1]
Predictions:['virginica' 'versicolor' 'setosa' 'virginica' 'setosa' 'virginica'
 'setosa' 'versicolor' 'versicolor' 'versicolor']
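Since column j of predict_proba (like column j of decision_function) corresponds to classes_[j], a small helper can return both the predicted label and its probability. predict_with_confidence below is a hypothetical helper written for this sketch, not part of scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
train_X, val_X, train_y, val_y = train_test_split(iris.data, iris.target, random_state=0)
logreg = LogisticRegression(max_iter=1000).fit(train_X, iris.target_names[train_y])


def predict_with_confidence(clf, X):
    """Hypothetical helper: most likely label and its probability for each row of X."""
    proba = clf.predict_proba(X)
    best = np.argmax(proba, axis=1)
    # Column j of predict_proba belongs to clf.classes_[j]
    return clf.classes_[best], proba[np.arange(len(best)), best]


labels, conf = predict_with_confidence(logreg, val_X)
print(list(zip(labels[:3], conf[:3].round(3))))
print(np.array_equal(labels, logreg.predict(val_X)))  # True
```

Indexing classes_ rather than using the raw argmax matters here because the targets are strings, so a column index alone would not be a valid prediction.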
