### Test the accuracy of the trained model on the final test set

> The test dataset can be downloaded from [this link](https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/language-detection/europarl-test.zip)

In [1]:
import pandas as pd
import numpy as np

In [2]:
def normalize_text(row):
    """make text exmaple """
    label = '__label__' + str(row['lang'])
    txt = str(row['text'])
    
    return ' '.join(( label + ' , ' + txt ).split())
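
To make the target format concrete, here is a minimal sketch that runs `normalize_text` on a single made-up row. The sample sentence and the plain `dict` standing in for a DataFrame row are assumptions for illustration only; they are not taken from the dataset.

```python
def normalize_text(row):
    """Build one fastText example line: '__label__<lang> , <text>'."""
    label = '__label__' + str(row['lang'])
    txt = str(row['text'])
    # split() + join() collapses tabs, newlines and repeated spaces
    return ' '.join((label + ' , ' + txt).split())


# Hypothetical row: a dict supports row['lang'] just like a pandas row does
sample_row = {'lang': 'en', 'text': 'Resumption  of the\tsession'}
normalized = normalize_text(sample_row)
print(normalized)  # __label__en , Resumption of the session
```

Note that the label comes first and all runs of whitespace are collapsed to single spaces, so every example ends up on exactly one line.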

In [3]:
# first, let's load the test dataset
test = pd.read_csv('data/europarl.test', sep='\t', names=['lang', 'text'])

# next, let's normalize the text in the test dataset so it conforms to the `fastText` format
test['normalized'] = test.apply(normalize_text, axis=1)

# finally, let's shuffle the examples and save the final test dataset
test = test.reindex(np.random.permutation(test.index)).reset_index(drop=True)
np.savetxt('data/europarl_normalized.test', test['normalized'].values, fmt='%s')
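
Before handing the file to `fastText`, it can be worth checking that every saved line really starts with a label and carries some text. The helper below is a rough sketch of such a check; `count_malformed` and the sample lines are hypothetical names and data introduced here, not part of the original notebook.

```python
def count_malformed(lines, prefix='__label__'):
    """Count lines that lack a '__label__<lang>' first token or have no text after it."""
    bad = 0
    for line in lines:
        parts = line.split()
        has_label = bool(parts) and parts[0].startswith(prefix) and parts[0] != prefix
        if not has_label or len(parts) < 2:
            bad += 1
    return bad


# Hypothetical sample lines; the last one is missing its label
sample_lines = [
    '__label__en , Resumption of the session',
    '__label__fr , Reprise de la session',
    'missing label here',
]
malformed = count_malformed(sample_lines)
print(malformed)  # 1
```

To run it against the real file, pass an open file handle for `data/europarl_normalized.test` instead of `sample_lines`; a result of 0 means every line has the expected shape.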

Next, let's load the trained model and test its accuracy on the test dataset we just prepared.

In [6]:
%%bash
MODEL=model/europarl.bin
TEST=data/europarl_normalized.test

./fastText/fasttext test $MODEL $TEST

P@1: 0.981
Number of examples: 20828


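Yay! The model scores 98.1% precision at 1 (P@1) on the test dataset, against 98.8% on the validation set. Since every example carries exactly one label, P@1 is the same as accuracy here, and the small gap between the two scores suggests our model is "good" enough to use in production :)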
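
For readers unfamiliar with the metric, here is a small sketch of how P@1 can be computed by hand when each example has a single gold label. It is an illustration of the definition, not `fastText`'s own implementation, and the function name and toy labels are assumptions.

```python
def precision_at_1(gold_labels, top_predictions):
    """Fraction of examples whose top predicted label equals the gold label."""
    if len(gold_labels) != len(top_predictions):
        raise ValueError('gold_labels and top_predictions must have the same length')
    if not gold_labels:
        raise ValueError('need at least one example')
    hits = sum(g == p for g, p in zip(gold_labels, top_predictions))
    return hits / len(gold_labels)


# Toy data: three of the four top predictions match the gold label
gold = ['en', 'fr', 'de', 'en']
pred = ['en', 'fr', 'nl', 'en']
score = precision_at_1(gold, pred)
print(score)  # 0.75
```

With one label per example and one prediction per example, this ratio is plain accuracy, which is why the `P@1: 0.981` line above can be read as 98.1% accuracy.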

> Big thanks to David Tedaldi for reminding me about this :)