Nyaa torrent classifier. Uses a torrent's description text to predict its category.
Example input.
The predicted output for this example is Literature - English-translated.
Possible outputs.
Data located at:
nyaaCategorizer_database repo
Model 1
trainModel.zip, nyaaCategorizer.ipynb
accuracy: 0.5989, cr accuracy: 0.28
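Each model lists two numbers. Assume "cr accuracy" is the accuracy row of a scikit-learn-style classification report, since the evaluation figures at the end include one. In that case it is the share of samples whose predicted label equals the true label. The stdlib-only sketch below computes that value along with per-class precision, recall and F1 and their macro average. The README does not say which average its F1 figure uses. The helper name and example labels are hypothetical.

```python
from collections import Counter


def classification_summary(y_true, y_pred):
    """Return overall accuracy, macro F1 and per-class (precision, recall, F1).

    Hypothetical helper mirroring the numbers in a classification report.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)

    per_class = {}
    for label in labels:
        tp = true_positives[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / actual[label] if actual[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = (precision, recall, f1)

    # Report-style accuracy: exact label matches over all samples.
    accuracy = sum(true_positives.values()) / len(y_true)
    # Macro average: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    return accuracy, macro_f1, per_class


# Example with made-up labels.
y_true = ["Anime", "Anime", "Audio", "Literature"]
y_pred = ["Anime", "Audio", "Audio", "Anime"]
accuracy, macro_f1, per_class = classification_summary(y_true, y_pred)
print(accuracy)  # 0.5
```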
Model 2
trainModel_balanced.zip, nyaaCategorizer_balanced.ipynb
accuracy: 0.1159, cr accuracy: 0.03
- balanced classes (class_weights)
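The README does not say how the class weights were derived. A common choice is the "balanced" heuristic, which is also what scikit-learn's `compute_class_weight(class_weight="balanced", ...)` computes. It weights each class by `n_samples / (n_classes * class_count)`, so rare classes count more in the loss. The stdlib-only sketch below is an assumption about the approach, not the notebooks' code, and the helper name is hypothetical. The resulting dict maps class index to weight, which is the shape Keras' `Model.fit(class_weight=...)` accepts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class index to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up integer labels: class 0 is three times as common as class 1.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)  # {0: 0.6666666666666666, 1: 2.0}
# In Keras this dict would be passed as model.fit(..., class_weight=weights).
```

With these weights every class contributes the same total weight: 3 × 0.667 ≈ 1 × 2.0.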
Model 3
trainedModel_balanced_nosub_extra.zip, nyaaCategorizer_balanced_nosub_extra.ipynb
accuracy: 0.987, cr accuracy: 0.57
- balanced classes (class_weights)
- only look at the main category (subcategories converted to their main category)
- extra
extra = {
- shuffle the dataset at the beginning, //irrelevant
- removed fileAmount feature,
- removed more100File feature,
- increased epochs from 5 to 10,
- decreased batch size from 100 to 64,
- removed Dropout layers,
- learning_rate=0.0001
}
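Collapsing subcategories into main categories can be as simple as keeping the part of the label before the separator. The sketch below assumes labels are written like the example output at the top ("Literature - English-translated"), with " - " separating main category and subcategory. The helper names are hypothetical, and the notebooks may perform this conversion differently.

```python
def to_main_category(category):
    """Collapse a "Main - Sub" label to "Main" (labels without " - " pass through)."""
    return category.split(" - ", 1)[0].strip()


def encode_main_categories(categories):
    """Convert raw labels to integer main-category ids plus the sorted class list."""
    mains = [to_main_category(c) for c in categories]
    classes = sorted(set(mains))
    index = {name: i for i, name in enumerate(classes)}
    return [index[m] for m in mains], classes


ids, classes = encode_main_categories(
    ["Literature - English-translated", "Anime - Raw", "Literature - Raw"]
)
print(classes)  # ['Anime', 'Literature']
print(ids)      # [1, 0, 1]
```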
Model 4
trainedModel_balanced_nosub_extra_maincat.zip, nyaaCategorizer_balanced_nosub_maincats_extra.ipynb
accuracy: 0.9919, cr accuracy: 0.56
- balanced classes (class_weights)
- only look at the main category (subcategories converted to their main category)
- discard all data that is 'Software' or 'Pictures' since there isn't much of it
- extra:
- use only one mid layer instead of 3
- removed "from_logits=True"
- changed accuracy to categorical_accuracy
versions = {
- above + removed Software and Pictures data
- added back fileAmount: no change in results
- only one mid layer (instead of 3) -- best performance (f1: 0.59)
- removed "from_logits=True", changed accuracy to categorical_accuracy: no change in results
}
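Keras' `categorical_accuracy` counts a sample as correct when the arg-max of the prediction matches the arg-max of the one-hot label. An element-wise accuracy over one-hot vectors behaves differently. Keras' `binary_accuracy` is one example: it thresholds predictions at 0.5 and also credits every correctly predicted zero. With many classes, the element-wise number can be far higher than the per-sample one. The plain-Python sketch below reproduces both definitions on made-up scores. It only illustrates the difference and makes no claim about which metric produced the accuracy figures listed above.

```python
def categorical_accuracy(y_true_onehot, y_scores):
    """Share of rows whose highest-scoring class equals the true class."""
    hits = 0
    for true_row, score_row in zip(y_true_onehot, y_scores):
        true_idx = max(range(len(true_row)), key=true_row.__getitem__)
        pred_idx = max(range(len(score_row)), key=score_row.__getitem__)
        hits += true_idx == pred_idx
    return hits / len(y_true_onehot)


def elementwise_accuracy(y_true_onehot, y_scores, threshold=0.5):
    """Share of individual one-hot entries matched after thresholding the scores."""
    correct = total = 0
    for true_row, score_row in zip(y_true_onehot, y_scores):
        for t, s in zip(true_row, score_row):
            correct += int(s > threshold) == t
            total += 1
    return correct / total


# Two samples, five classes; the first sample is misclassified.
y_true = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]
y_scores = [[0.4, 0.5, 0.1, 0.0, 0.0], [0.1, 0.3, 0.2, 0.2, 0.2]]
print(categorical_accuracy(y_true, y_scores))  # 0.5
print(elementwise_accuracy(y_true, y_scores))  # 0.8
```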
Model 5
trainedModel_final.zip, nyaaCategorizer_final.ipynb
accuracy: 0.9907, cr accuracy: 0.58
- no class balancing
- only look at the main category (subcategories converted to their main category)
- discard all data that is 'Software' or 'Pictures' since there isn't much of it
- back to 3 Dense layers
- added metrics
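Dropping the 'Software' and 'Pictures' data amounts to filtering on the main category before training. The minimal sketch below assumes each example is a hypothetical `(description, category)` pair with Nyaa-style "Main - Sub" labels. The function name and sample rows are made up for illustration.

```python
EXCLUDED_MAIN_CATEGORIES = {"Software", "Pictures"}


def drop_excluded(rows, excluded=EXCLUDED_MAIN_CATEGORIES):
    """Keep (description, category) pairs whose main category is not excluded."""
    kept = []
    for description, category in rows:
        main = category.split(" - ", 1)[0].strip()
        if main not in excluded:
            kept.append((description, category))
    return kept


rows = [
    ("Some app installer", "Software - Applications"),
    ("[Group] Show - 01 [1080p]", "Anime - English-translated"),
    ("Wallpaper pack", "Pictures - Graphics"),
]
print(drop_excluded(rows))  # [('[Group] Show - 01 [1080p]', 'Anime - English-translated')]
```

Only the category label is split. A " - " inside the description text, as in the anime example, does not affect the filter.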
Model 6
trainedModel_LSTM.zip, nyaaCategorizer_lstm.ipynb
accuracy: 0.9860, cr accuracy: 0.57
- LSTM with 2 dense layers of 128 and 64 nodes
Model 7
nyaaCategorizer_final_allcats.zip, nyaaCategorizer_final_allcats.ipynb
accuracy: 0.9189, cr accuracy: 0.27
- same as final (Model 5), but trained on all categories, subcategories included, with class balancing
Model 8
nyaaCategorizer_final_allmaincats.zip, nyaaCategorizer_final_allmaincats.ipynb
accuracy: 0.9903, cr accuracy: 0.57
- same as final (Model 5), but trained on all main categories, including Software and Pictures
Logarithm of sorted file sizes:
Model1 evaluation:
Model1 classification report:
Model1 confusion matrix:
Refer to /images for other figures.
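For reference, a confusion matrix like the Model1 figure works as follows. Each row is a true class, each column a predicted class, and each cell counts how often that pair occurred. The diagonal holds the correct predictions, so the diagonal sum divided by the total equals the accuracy. The stdlib-only sketch below uses made-up labels and a hypothetical function name. The notebooks may build and plot theirs with a library instead.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes and columns predicted classes, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


labels = ["Anime", "Audio", "Literature"]
y_true = ["Anime", "Anime", "Audio", "Literature"]
y_pred = ["Anime", "Audio", "Audio", "Anime"]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(f"{label:<10} {row}")

# The diagonal holds the correct predictions.
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
print(accuracy)  # 0.5
```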