# How many wrong predictions on my submission?

This notebook shows a way to find out **how many of my submitted predictions are wrong** using only the score returned for a submission.  
I'm quite new to machine learning, so there may well be a function in some ML library that already does this, but I wasn't able to find one in sklearn.

## Log Loss
Submissions for the Spooky Author Identification Challenge are evaluated using multi-class logarithmic loss, which has the following formula:

$$log loss = -\frac{1}{N}\sum_{i=1}^N\sum_{j=1}^My_{ij}\log(p_{ij}),$$

$$y_{ij} = 1\ if\ observation\ i\ belongs\ to\ class\ j\ and\ 0\ otherwise$$
$$p_{ij} = my\ predicted\ probability\ that\ observation\ i\ belongs\ to\ class\ j$$
$$M = number\ of\ classes$$
$$N = number\ of\ observations$$


Moreover, to avoid undefined values (the logarithm of 0), all predicted probabilities are clipped to a maximum of 1-epsilon and a minimum of epsilon, with epsilon=1e-15.  
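As a minimal sketch of the formula and the clipping described above, the function below computes the multi-class log loss with plain numpy. The name `multiclass_log_loss` and the example numbers are my own illustration, not the competition's scoring code, and the sketch assumes no further rescaling of the rows, which a real scorer may apply.

```python
import numpy as np

EPS = 1e-15  # clipping value quoted in the text


def multiclass_log_loss(y_true, probs, eps=EPS):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class indices, shape (N,)
    probs:  predicted probabilities, shape (N, M)
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true)
    # y_ij is 1 only for the true class, so the inner sum keeps a single term
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(true_class_probs))


# Two observations, three classes (made-up numbers)
example_loss = multiclass_log_loss([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
print(example_loss)

# A probability of exactly 0 on the true class is clipped to eps before the log
clipped_loss = multiclass_log_loss([0], [[0.0, 1.0, 0.0]])
print(clipped_loss)
```

The second call shows why the clipping matters: without it the loss would be infinite.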

## The Idea
We can "extremize" our predictions by setting each probability directly to the maximum or minimum clipping value: the class with the highest predicted probability gets 1-epsilon and every other class gets epsilon.  
Example: [0.99199, 0.00811, 0.0000] -> [1-epsilon, epsilon, epsilon] with epsilon = 1e-15  
  
Submitting predictions in this form will give us the best possible result if all our predictions are right, but will also penalize us heavily for every wrong prediction. As an example, a submission that scores 0.44097 will score 3.93369 if "extremized" and resubmitted.  

The useful thing about this process is that when we submit these extreme predictions we know exactly how much each wrong prediction contributes to our score, so we can easily find the number of wrong predictions.
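To make those fixed contributions concrete, here is a small sketch of the per-row penalty of an extremized prediction. The variable names and the values of `N` and `w` are made up for illustration.

```python
import numpy as np

EPS = 1e-15

# Per-row penalty when the true class was given 1 - eps (right prediction)
penalty_right = -np.log(1 - EPS)
# Per-row penalty when the true class was given eps (wrong prediction)
penalty_wrong = -np.log(EPS)

print(penalty_right)  # on the order of 1e-15, practically zero
print(penalty_wrong)  # about 34.54

# Hypothetical submission: N rows, w of them wrong
N, w = 1000, 100
score = ((N - w) * penalty_right + w * penalty_wrong) / N
print(score)  # about 3.45
```

A right prediction costs practically nothing, so the score is almost entirely the fraction of wrong rows times the wrong-row penalty.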

## Calculating the number of wrong predictions

In log loss, both right and wrong predictions contribute to the score according to how far the probability given to the true class is from 1 (i.e. we are also penalized if we make a right prediction with 0.9 probability instead of 1).  
  
When submitting extremized predictions we have fixed contributions from right and wrong assignments as follows:  
  
$$log1 = log(1 - 1e-15)\ for\ right\ predictions$$  
  
$$log2 = log(1e-15)\ for\ wrong\ predictions$$  
  
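Having determined these quantities, we can now express our logloss score as follows:  

$$ logloss= -\frac{1}{N}[(N-w)log1 + wlog2] $$
  
Multiplying both sides by -N and collecting the terms in w gives w(log2 - log1) = -N(logloss + log1), and so we have
  
$$w = -N  \frac{logloss+log1}{log2-log1} $$

where w is the number of wrong predictions. Since log1 is practically zero and log2 is about -34.54, this is very nearly N * logloss / 34.54.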
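As a sanity check on the algebra, the sketch below builds the score from a known number of wrong rows and then inverts it. It works with the positive per-row penalties `a = -log1` and `b = -log2`, so the score is `((N - w)*a + w*b) / N` and the inversion is `w = N*(score - a) / (b - a)`. The helper names and the test values of `w` are assumptions for illustration only.

```python
import numpy as np

EPS = 1e-15
a = -np.log(1 - EPS)  # penalty of a right extremized prediction
b = -np.log(EPS)      # penalty of a wrong extremized prediction


def score_from_wrong(w, n):
    """Log loss of an extremized submission with w wrong rows out of n."""
    return ((n - w) * a + w * b) / n


def wrong_from_score(score, n):
    """Invert score_from_wrong: recover w from the score."""
    return n * (score - a) / (b - a)


n = 8392
for w in (0, 1, 500, n):
    recovered = wrong_from_score(score_from_wrong(w, n), n)
    print(w, round(recovered))

# A score reported with five decimals is off by at most 0.5e-5,
# which moves the recovered w by at most this much:
max_shift = n * 0.5e-5 / (b - a)
print(max_shift)
```

The last value is far below 1, so rounding of the reported score should not change the recovered count.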

## Let's put it in code!

In [None]:
import pandas as pd
import numpy as np

### First thing, we need to "extremize" our predictions as described above.

In [None]:
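def boolToExtreme(b):
    """Service function for predToExtremeValues"""
    # np.longdouble is the platform's extended-precision float;
    # np.float128 is an alias for it that only exists on some platforms
    eps = np.longdouble(1e-15)
    if b:
        return np.longdouble(1) - eps
    return eps

def predToExtremeValues(predictions):
    """Get the predictions in the form required for submission
    (probability columns only) as a pandas DataFrame and set every
    probability to exactly one of the extreme values 1e-15 and 1 - 1e-15"""
    
    # True where a value is the largest in its row (ties mark every maximum)
    is_max = predictions.eq(predictions.max(axis=1), axis=0)
    # On pandas >= 2.1 applymap is deprecated in favour of DataFrame.map
    predictions = is_max.applymap(boolToExtreme)
    return predictions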

### Then we build our calculating function

In [None]:
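def howManyWrong(logloss, N):
    """Returns the estimated number of wrong predictions based on the
    Spooky Author Challenge evaluation system. logloss is the score of
    an "extremized" submission, N is the number of observations"""
    
    # np.longdouble is the platform's extended-precision float;
    # np.float128 is an alias for it that only exists on some platforms
    log1 = np.log(np.longdouble(1) - np.longdouble(1e-15))  # right prediction
    log2 = np.log(np.longdouble(1e-15))                     # wrong prediction
    
    # Solve logloss = -(1/N) * ((N - w)*log1 + w*log2) for w
    w = -N * ((logloss + log1) / (log2 - log1))
    return w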


## How to use it
So here is the scenario I have in mind:  
suppose you have a prediction csv file in the format required by the challenge and you want to know how many wrong predictions it contains. You can find out by following these steps:

1. Load your predictions as a pandas dataframe;
2. "Extremize" them using `ext_pred = predToExtremeValues(your_predictions)`;
3. Submit your "extremized" predictions to Kaggle and get your "extremized" evaluation;
4. Call `howManyWrong(logloss, n_of_observations)` to get the number of wrong predictions.
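Steps 1 and 2 can be sketched end to end as follows. This is only an illustration under a few assumptions: the helper `extremize_submission`, the `CLASS_COLUMNS` list and the ids are made up, a small in-memory frame stands in for your real csv file, and plain float64 values are used instead of the extended-precision type of the functions above.

```python
import numpy as np
import pandas as pd

EPS = 1e-15
CLASS_COLUMNS = ["EAP", "HPL", "MWS"]  # probability columns of the submission


def extremize_submission(submission):
    """Return a copy with each row's largest probability set to 1 - EPS
    and every other probability set to EPS; the id column is kept."""
    probs = submission[CLASS_COLUMNS]
    is_max = probs.eq(probs.max(axis=1), axis=0)
    out = submission.copy()
    out[CLASS_COLUMNS] = np.where(is_max, 1 - EPS, EPS)
    return out


# Stand-in for step 1, i.e. pd.read_csv("my_submission.csv") (hypothetical name)
submission = pd.DataFrame({
    "id": ["id00001", "id00002"],
    "EAP": [0.80, 0.10],
    "HPL": [0.15, 0.30],
    "MWS": [0.05, 0.60],
})

# Step 2: extremize, then serialise in the submission layout
extreme = extremize_submission(submission)
csv_text = extreme.to_csv(index=False)
print(csv_text)
```

The resulting text is what you would save to a file and upload in step 3; step 4 then only needs the returned score and the number of scored rows.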

### Test
The following code is a test on the training data, where we know the true labels for our predictions.

In [None]:
# For test purposes let's use sklearn's log_loss evaluator
from sklearn.metrics import log_loss

dataset = pd.read_csv('../input/train.csv')

def yToInt(author):
    """Map an author label to the position of its probability column"""
    if author == "EAP":
        return 0
    elif author == "HPL":
        return 1
    else:
        return 2

# True labels from the training set
y_true = [yToInt(a) for a in dataset.author]

# Fake predictions for test purposes (always predict EAP)
fakePred = pd.DataFrame([["id1212", 1, 0, 0]]*len(y_true), columns=["id", "EAP", "HPL", "MWS"])

# Fake prediction values: leave out the id column
fpv = fakePred[["EAP", "HPL", "MWS"]]

# Count the actual number of wrong predictions
aw = 0
for i in range(len(y_true)):
    # iloc picks row i and the column of the true class by position
    if fpv.iloc[i, y_true[i]] != 1:
        aw += 1

print("Actual wrong predictions:", aw)

# logloss estimate. The probabilities are clipped by hand to the competition's
# epsilon so that the result does not depend on log_loss's own clipping default
# (as_matrix() is gone from recent pandas, to_numpy() replaces it)
clipped = np.clip(fpv.to_numpy(dtype=float), 1e-15, 1 - 1e-15)
logloss = log_loss(y_true, clipped)
w = howManyWrong(logloss, len(y_true))
print("Logloss:", logloss)
print("Calculated number of wrong predictions:", w)
print("Accuracy:", (len(y_true) - w) / len(y_true))

### Actual application

In [None]:
# On one submission I had a score of 0.44097
# Submitting the same predictions in the "extremized" form
# I got the following logloss value
logloss = 3.93369

# Number of predictions in the submission:
N = 8392

w = howManyWrong(logloss, N)
print("Calculated number of wrong predictions:", w)
print("Accuracy:", (N-w)/N)
