# Lodbrok model evaluation
Lodbrok is a neural network built with the Keras library that detects MusicBrainz editors who create nothing but spam. The intent of these editors is often to improve SEO for other websites.

Lodbrok runs on a pre-processed dataset as described in the `dataset_generation.ipynb` notebook.

## Network layout
![network layout](lodbrok.png)
Lodbrok receives four different inputs, each a sub-array of the pre-processed input datum: the website, the email, the user biography, and a set of other features.
The website and email inputs have each been tokenized to their top 1024 entries and are embedded into 256-dimensional vectors. The user biography input, by contrast, is only reshaped into one 512-dimensional vector, as it is already quasi-embedded.
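
To illustrate what tokenizing an input "to its top 1024 entries" can look like, here is a minimal sketch in plain Python. It is not the preprocessing from `dataset_generation.ipynb`: the function names, the character-level tokens, and the reserved index 0 for everything outside the vocabulary are all assumptions made for illustration.

```python
from collections import Counter


def build_vocabulary(samples, size=1024):
    """Map the most frequent tokens to indices 1..size-1.

    Index 0 is reserved (an assumption) for tokens outside the vocabulary.
    """
    counts = Counter(token for sample in samples for token in sample)
    most_common = counts.most_common(size - 1)
    return {token: index for index, (token, _) in enumerate(most_common, start=1)}


def encode(sample, vocabulary):
    """Replace every token by its vocabulary index, or 0 if it is unknown."""
    return [vocabulary.get(token, 0) for token in sample]


# Hypothetical data: iterating over a string yields characters as tokens.
emails = ["a@example.com", "b@example.org"]
vocab = build_vocabulary(emails, size=8)
encoded = encode("c@example.net", vocab)
```

Each resulting index would then select one row of a 1024 x 256 embedding table, which is how an integer sequence becomes a sequence of 256-dimensional vectors.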

These three inputs (website, email, and biography) are then passed into LSTMs, where the bio LSTM has an output twice as large as the others.

The outputs of the LSTMs are then concatenated with the remaining inputs (area set, non-zero privs, bio length, etc.) and passed into a stack of two fully-connected layers, each with 64 neurons and 50% dropout.

The output layer consists of two neurons that represent the classification confidence for each category (spam and non-spam) and are activated using softmax so that their sum will always be one.
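
The softmax activation can be sketched in a few lines of plain Python to show why the two confidences always sum to one. The input values and the order of the two output neurons (spam first) are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw output values into probabilities that sum to one."""
    largest = max(logits)
    # Subtracting the maximum avoids overflow and does not change the result.
    exponentials = [math.exp(value - largest) for value in logits]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Hypothetical raw outputs of the two neurons, in the assumed order (spam, non-spam).
spam_confidence, nonspam_confidence = softmax([2.0, -1.0])
```

Because both values are divided by the same total, increasing the confidence in one category necessarily decreases it in the other.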

## Evaluation on different datasets

In [2]:
import sys
sys.path.append("..")

import keras
from utils.evaluation import evaluate, print_stats

ModuleNotFoundError: No module named 'keras'

### Accuracy on evaluation dataset
500 spam and 500 non-spam entries, none of which the model has seen before.

In [3]:
# Named eval_stats rather than eval to avoid shadowing Python's built-in eval()
eval_stats = evaluate("snapshots/lodbrok-2020-04-03T00/00/53.701711.h5py", "spambrainz_dataset.pickle")
print_stats(eval_stats)

NameError: name 'evaluate' is not defined

### Accuracy on entire spam dataset
About 150,000 spam editors, 5,000 of which the model was trained on.

In [3]:
spam = evaluate("snapshots/lodbrok-2018-08-08T17:10:00.403928.h5py", "../SENSITIVE/spambrainz_dataset_spam.pickle")
print_stats(spam)

Accuracy: 0.999, T1 error: 0.000, T2 error: 0.001

Confidence: Avg. 0.997, Min. 0.000, Max. 1.000, Standard dev. 0.034


### Accuracy on entire non-spam dataset
Around 8,000 non-spam editors, 5,000 of which the model was trained on.

In [4]:
nonspam = evaluate("snapshots/lodbrok-2018-08-08T17:10:00.403928.h5py", "../SENSITIVE/spambrainz_dataset_nonspam.pickle")
print_stats(nonspam)

Accuracy: 0.998, T1 error: 0.002, T2 error: 0.000

Confidence: Avg. 0.998, Min. 0.003, Max. 1.000, Standard dev. 0.037
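
The following sketch shows one way the printed statistics could be derived. It is not the code of `utils.evaluation.print_stats`, which is not shown here. It assumes that label 1 means spam, that the T1 error is the share of false positives (non-spam classified as spam), that the T2 error is the share of false negatives, and that "confidence" is the probability the model assigns to the true class. The reported figures are at least consistent with this reading, since in both runs accuracy equals 1 minus the T1 error minus the T2 error.

```python
from statistics import mean, pstdev


def summarize(true_labels, spam_probabilities, threshold=0.5):
    """Compute accuracy, error rates and confidence statistics.

    All definitions here are assumptions, see the paragraph above.
    """
    total = len(true_labels)
    false_positives = 0
    false_negatives = 0
    confidences = []
    for label, probability in zip(true_labels, spam_probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and label == 0:
            false_positives += 1
        elif predicted == 0 and label == 1:
            false_negatives += 1
        # Probability assigned to the class the entry really belongs to.
        confidences.append(probability if label == 1 else 1.0 - probability)
    return {
        "accuracy": (total - false_positives - false_negatives) / total,
        "t1_error": false_positives / total,
        "t2_error": false_negatives / total,
        "confidence_avg": mean(confidences),
        "confidence_min": min(confidences),
        "confidence_max": max(confidences),
        "confidence_std": pstdev(confidences),
    }


# Hypothetical toy data: two spam editors followed by two non-spam editors.
stats = summarize([1, 1, 0, 0], [0.9, 0.2, 0.1, 0.7])
```

Under these definitions a dataset that contains only spam cannot produce a T1 error, and a dataset that contains only non-spam cannot produce a T2 error, which matches the two zero entries in the outputs above.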


In summary, the Lodbrok model achieves a very high spam detection rate (accuracy of 0.999 on the spam dataset) while maintaining a low false positive rate (T1 error of 0.002 on the non-spam dataset). Entries misclassified by the model should be examined further to determine whether they really belong to the right dataset or whether a certain type of data is underrepresented.
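
As a starting point for that follow-up, misclassified entries could be collected as sketched below. The function, its parameters, and the toy data are hypothetical and not part of the Lodbrok code base. Sorting the most confident mistakes first is a design choice of this sketch, on the assumption that those entries are the likeliest candidates for a wrong dataset label.

```python
def misclassified(editor_ids, true_labels, spam_probabilities, threshold=0.5):
    """Return (editor_id, label, probability) for every wrong prediction.

    Label 1 is assumed to mean spam. The most confident mistakes come first.
    """
    mistakes = []
    for editor_id, label, probability in zip(editor_ids, true_labels, spam_probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted != label:
            mistakes.append((editor_id, label, probability))
    # The distance from the threshold measures how sure the model was.
    mistakes.sort(key=lambda entry: abs(entry[2] - threshold), reverse=True)
    return mistakes


# Hypothetical toy data: editors 11 and 12 are misclassified.
to_review = misclassified([10, 11, 12, 13], [1, 1, 0, 0], [0.9, 0.05, 0.6, 0.2])
```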

In the author's opinion, Lodbrok is ready for production use.