# 07 Naive Bayes Tutorial
**Q4(b)**  
A ranking classifier is a classifier that can rank a test set in order of confidence for a given classification outcome.  
Naive Bayes is a ranking classifier because the ‘probability’ can be used as a confidence measure for ranking.
1. Train a Naive Bayes classifier from the `AthleteSelection` data. Use `GaussianNB`.
2. Load the test data from `AthleteTest.csv` and apply the classifier. 
3. Use the `predict_proba` method to find the probability of being selected. 
4. Rank the test set by probability of being selected.  
    4.1. Who is most likely to be selected?  
    4.2. Who is least likely?  


In [18]:
import pandas as pd
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
from sklearn.metrics import confusion_matrix 

In [19]:
athlete = pd.read_csv('data/AthleteSelection.csv',index_col = 'Athlete')
athlete.head()

         Speed  Agility Selected
Athlete                         
x1        2.50     6.00       No
x2        3.75     8.00       No
x3        2.25     5.50       No
x4        3.25     8.25       No
x5        2.75     7.50       No


In [20]:
y = athlete['Selected'].values
X = athlete[['Speed','Agility']].values

In [21]:
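gnb = GaussianNB()
bnb = BernoulliNB()    # not used in this question
mnb = MultinomialNB()  # not used in this question
ath_NB = gnb.fit(X,y)
# Predictions on the training data itself, not on held-out data
y_dash = ath_NB.predict(X)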

In [23]:
confusion = confusion_matrix(y, y_dash)
print("Confusion matrix:\n{}".format(confusion)) 

Confusion matrix:
[[12  0]
 [ 1  7]]
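
To read this matrix it helps to know that `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, with the labels in sorted order (here `No`, then `Yes`). The following is a minimal pure-Python sketch, not the notebook's own code, that rebuilds the same counts from the `y` and `y_dash` arrays printed in the next cell. The helper name `count_confusion` is hypothetical.

```python
# Labels as printed by the notebook: 12 'No' followed by 8 'Yes'.
y = ['No'] * 12 + ['Yes'] * 8
# Predictions as printed: identical except the 15th athlete is predicted 'No'.
y_dash = ['No'] * 12 + ['Yes', 'Yes', 'No'] + ['Yes'] * 5

def count_confusion(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

matrix = count_confusion(y, y_dash, labels=['No', 'Yes'])
print(matrix)

# Athletes are indexed x1..x20 in row order, so position i maps to "x{i+1}".
misclassified = ["x{}".format(i + 1)
                 for i, (t, p) in enumerate(zip(y, y_dash)) if t != p]
print(misclassified)
```

The single off-diagonal count in the bottom-left cell is a selected athlete who is predicted as not selected. Because the predictions are made on the training data, these counts are probably optimistic compared with performance on unseen athletes.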


In [24]:
print(y)
print(y_dash)

['No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'Yes' 'Yes'
 'Yes' 'Yes' 'Yes' 'Yes' 'Yes' 'Yes']
['No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'No' 'Yes' 'Yes'
 'No' 'Yes' 'Yes' 'Yes' 'Yes' 'Yes']


## Test Data 

In [28]:
ath_test = pd.read_csv('data/AthleteTest.csv',index_col = 'Athlete')
X_test = ath_test[['Speed','Agility']].values
X_test

array([[3.3, 8.2],
       [4.5, 4.5],
       [5.5, 7.2],
       [3.8, 8.8],
       [5.5, 5.2],
       [8.1, 7.8],
       [7.7, 5.2],
       [6.1, 5.5],
       [5.5, 6. ],
       [6.1, 5.5]])

In [29]:
ath_NB.predict(X_test)


array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes'],
      dtype='<U3')

In [71]:
probs_X_test = ath_NB.predict_proba(X_test)

# Column 1 of predict_proba is the 'Yes' class (classes are sorted: 'No', 'Yes')
for name, prob in zip(ath_test.index, probs_X_test[:, 1]):
    print("\nAthlete: {}. Chance of selection:{}".format(name, prob))



Athlete: t1. Chance of selection:0.04131362900029045

Athlete: t2. Chance of selection:0.12298278120340804

Athlete: t3. Chance of selection:0.9119328426426143

Athlete: t4. Chance of selection:0.1504776646262887

Athlete: t5. Chance of selection:0.7998328378161583

Athlete: t6. Chance of selection:0.999997356952902

Athlete: t7. Chance of selection:0.9999451907950627

Athlete: t8. Chance of selection:0.9729309178369229

Athlete: t9. Chance of selection:0.854282642557036

Athlete: t10. Chance of selection:0.9729309178369229
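
The question asks for a ranking, which the loop above does not produce directly. As a minimal sketch, the printed probabilities can be sorted in descending order. The values below are copied from the output above and rounded to six decimals; the dictionary name `chance` is only for illustration.

```python
# Probability of the 'Yes' class for each test athlete, copied from the
# notebook output and rounded to six decimals.
chance = {
    't1': 0.041314, 't2': 0.122983, 't3': 0.911933, 't4': 0.150478,
    't5': 0.799833, 't6': 0.999997, 't7': 0.999945, 't8': 0.972931,
    't9': 0.854283, 't10': 0.972931,
}

# Sort from most to least likely to be selected. Python's sort is stable,
# so tied athletes keep their original order.
ranking = sorted(chance.items(), key=lambda item: item[1], reverse=True)

for rank, (name, prob) in enumerate(ranking, start=1):
    print("{:2d}. {:<4} {:.6f}".format(rank, name, prob))

most_likely = ranking[0][0]
least_likely = ranking[-1][0]
print("Most likely:", most_likely)
print("Least likely:", least_likely)
```

On these figures t6 is the most likely to be selected and t1 the least likely. Athletes t8 and t10 tie because they have identical Speed and Agility values. Inside the notebook, something like `pd.Series(probs_X_test[:, 1], index=ath_test.index).sort_values(ascending=False)` should give the same ordering without retyping the numbers.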


**Q4(c)**

When a `GaussianNB` model is trained, the fitted model is stored mainly in two attributes: `theta_` (the mean of each feature per class) and `var_` (the variance of each feature per class).  
Train a `GaussianNB` model and check whether these parameters agree with your own estimates. 
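
To see how these parameters turn into a probability, the sketch below recomputes the chance of selection for test athlete t1 (Speed 3.3, Agility 8.2) by hand. It uses the `theta_` and `var_` values printed by the cell further down, and it assumes the class priors are the training frequencies (12 `No`, 8 `Yes`), which is the default behaviour when no priors are passed. The function names are hypothetical.

```python
import math

# Fitted parameters as printed by the notebook (columns: Speed, Agility).
theta = {'No': (3.39583333, 5.08333333), 'Yes': (6.40625, 6.96875)}
var = {'No': (0.80685764, 3.99305556), 'Yes': (1.37402344, 3.91308594)}
# Assumption: priors are the class frequencies in the training data.
prior = {'No': 12 / 20, 'Yes': 8 / 20}

def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return (math.exp(-(x - mean) ** 2 / (2 * variance))
            / math.sqrt(2 * math.pi * variance))

def posterior_yes(speed, agility):
    """Posterior probability of 'Yes' under the naive independence assumption."""
    joint = {}
    for label in ('No', 'Yes'):
        likelihood = (gaussian_pdf(speed, theta[label][0], var[label][0])
                      * gaussian_pdf(agility, theta[label][1], var[label][1]))
        joint[label] = prior[label] * likelihood
    return joint['Yes'] / (joint['No'] + joint['Yes'])

p_t1 = posterior_yes(3.3, 8.2)  # t1 is the first row of X_test
print(p_t1)
```

The result should be close to the value `predict_proba` reported for t1. Any small difference comes from using the rounded parameters as printed.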

Hint: this code will give you the estimates you need. 
`athlete[athlete['Selected']=='No']['Agility'].describe()`

You will find these figures do not agree exactly. 
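
A likely reason for the mismatch is that `describe()` reports the sample standard deviation (dividing by n - 1), while `var_` holds a variance computed by dividing by n. The sketch below checks this with the standard library only, using the Agility values of the 12 athletes with `Selected == 'No'` from the training data. `GaussianNB` also adds a very small smoothing term to each variance (controlled by `var_smoothing`), which at the default setting is far too small to show at this precision.

```python
import statistics

# Agility of the 12 athletes with Selected == 'No' (x1..x12).
agility_no = [6.00, 8.00, 5.50, 8.25, 7.50, 5.00,
              5.25, 3.25, 4.00, 3.75, 2.00, 2.50]

mean = statistics.mean(agility_no)
sample_std = statistics.stdev(agility_no)           # divides by n - 1
population_var = statistics.pvariance(agility_no)   # divides by n

print("mean:               ", mean)
print("sample std:         ", sample_std)
print("sample variance:    ", sample_std ** 2)
print("population variance:", population_var)
```

The mean matches the Agility entry in the `No` row of `theta_`, the sample standard deviation matches the `std` reported by `describe()`, and the population variance matches the corresponding entry of `var_`. Squaring the `describe()` figure therefore overshoots `var_` by a factor of n / (n - 1), here 12/11.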

In [79]:
print(ath_NB.theta_)
print(ath_NB.var_)
print(athlete[athlete['Selected']=='No']['Agility'].describe())
print(athlete)

[[3.39583333 5.08333333]
 [6.40625    6.96875   ]]
[[0.80685764 3.99305556]
 [1.37402344 3.91308594]]
count    12.000000
mean      5.083333
std       2.087118
min       2.000000
25%       3.625000
50%       5.125000
75%       6.375000
max       8.250000
Name: Agility, dtype: float64
         Speed  Agility Selected
Athlete                         
x1        2.50     6.00       No
x2        3.75     8.00       No
x3        2.25     5.50       No
x4        3.25     8.25       No
x5        2.75     7.50       No
x6        4.50     5.00       No
x7        3.50     5.25       No
x8        3.00     3.25       No
x9        4.00     4.00       No
x10       4.25     3.75       No
x11       2.00     2.00       No
x12       5.00     2.50       No
x13       8.25     8.50      Yes
x14       5.75     8.75      Yes
x15       4.75     6.25      Yes
x16       5.50     6.75      Yes
x17       5.25     9.50      Yes
x18       7.00     4.25      Yes
x19       7.50     8.00      Yes
x20       7.25     3.75