Project of my master's degree in Computer Science ("Study and Research in Anti-Spam Systems") - Weka (CLI) approach.

marcelovca90-unifei/anti-spam-weka-cli

anti-spam-weka-cli

Project of my master's degree in Computer Science ("Study and Research in Anti-Spam Systems").

For instructions on how to clone, build and run the project, please refer to this guide.


Machine learning library:

Data sets information:

  • There are five data sets - Ling Spam, Spam Assassin, TREC (2005, 2006 and 2007) and Unifei (2017 and 2018) - available here. Each was pre-processed with three feature extraction methods (CHI2, FD and MI) and eight different feature vector sizes (8, 16, 32, 64, 128, 256, 512 and 1024).
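
The pre-processing grid described above can be enumerated in a few lines. This is only an illustrative sketch: the names `METHODS`, `SIZES` and `configurations` are hypothetical and not part of the project, and nothing here reflects how the data set files are actually named or laid out.

```python
from itertools import product

# Values taken from the data set description above.
METHODS = ("CHI2", "FD", "MI")
SIZES = tuple(2 ** k for k in range(3, 11))  # 8, 16, ..., 1024


def configurations():
    """Return every (method, size) pair applied to a single data set."""
    return list(product(METHODS, SIZES))


if __name__ == "__main__":
    for method, size in configurations():
        print(f"{method}\t{size}")
```

Under this reading, each data set comes in 3 × 8 = 24 pre-processed variants.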

Classification methods:

  • A1DE - Averaged 1-Dependence Estimator
  • A2DE - Averaged 2-Dependence Estimator
  • ADTREE - Alternating Decision Trees
  • BFTREE - Best-First Tree
  • CART - Classification And Regression Trees
  • DTNB - Decision Table/Naive Bayes Hybrid Classifier
  • FURIA - Fuzzy Unordered Rule Induction Algorithm
  • FRF - Fast Random Forest
  • HP - Hyper Pipes Classifier
  • HT - Hoeffding Tree (VFDT)
  • IBK - K-Nearest Neighbours Classifier
  • J48 - C4.5 Decision Tree
  • J48C - C4.5 Consolidated Decision Tree
  • J48G - C4.5 Grafted Decision Tree
  • JRIP - Repeated Incremental Pruning to Produce Error Reduction
  • LIBLINEAR - Large Linear Classifier
  • LIBSVM - Support Vector Machine
  • LMT - Logistic Model Trees
  • MLP-BFGS - Multilayer Perceptron (custom, multi-thread, trained with BFGS)
  • MLP-BPROP - Multilayer Perceptron (stock, single-thread, trained with Backpropagation)
  • NB - Naive Bayes Classifier
  • NBTREE - Decision Tree with Naive Bayes Classifiers at the leaves
  • RBF - Radial Basis Function network
  • RANDTREE - Random Tree
  • REPTREE - Reduced-Error Pruning Tree
  • SGD - Stochastic Gradient Descent
  • SMO - Sequential Minimal Optimization Algorithm
  • SPEGASOS - Stochastic Primal Estimated sub-GrAdient SOlver for SVM
  • VP - Voted Perceptron
  • WRF - Weka Random Forest
  • ZERO-RULE - Zero Rule Algorithm
  • SLP-H - Single Layer Perceptron (Hebbian Learning) from wekaclassalgos
  • SLP_WH - Widrow-Hoff Learning from wekaclassalgos
  • MLP-BP - Multilayer Perceptron (Back Propagation) from wekaclassalgos
  • MLP-BDBP - Multilayer Perceptron (Bold Driver Back Propagation - Vogl's Method) from wekaclassalgos
  • WDL4J - WekaDeeplearning4J: Deep Learning using Weka
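
Several of the abbreviations above appear to correspond to classifiers that ship with stock Weka, which can be run from Weka's own command line. The sketch below builds such a command; it is not this project's launcher. The mapping `WEKA_CLASSES`, the helper `weka_command`, and the paths `weka.jar` and `train.arff` are assumptions made for illustration, and the link between this README's labels and the Weka class names is an inference rather than something the project states. The `-t` (training file) and `-x` (number of cross-validation folds) flags are general Weka classifier options.

```python
# Hypothetical mapping from a few README labels to stock Weka class names.
WEKA_CLASSES = {
    "IBK": "weka.classifiers.lazy.IBk",
    "J48": "weka.classifiers.trees.J48",
    "NB": "weka.classifiers.bayes.NaiveBayes",
    "SMO": "weka.classifiers.functions.SMO",
    "ZERO-RULE": "weka.classifiers.rules.ZeroR",
}


def weka_command(label, arff_path, folds=10, jar="weka.jar"):
    """Build the argument list for a cross-validated run of one classifier."""
    if label not in WEKA_CLASSES:
        raise KeyError(f"no Weka class known for label {label!r}")
    return ["java", "-cp", jar, WEKA_CLASSES[label],
            "-t", arff_path, "-x", str(folds)]


if __name__ == "__main__":
    # Print the command instead of running it; the jar path is an assumption.
    print(" ".join(weka_command("J48", "train.arff")))
```

Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting issues.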

Metrics:

This code also supports t-Distributed Stochastic Neighbor Embedding (t-SNE) for generating two-dimensional plots of the data sets. For more information, please refer to the author's page.
