ewk is a probabilistic webpage classification library that uses enlive selectors and clojure functions as feature definitions.
Classification is done using clj-ml (a wrapper around the Weka
library) and currently supports only logistic regression. This library
makes it simple to specify feature sets and training data, and simple to run classification on html.
This project is in a ‘proof-of-concept’ stage.
To load a sample data file and run cross-validation on it run:
./run ../data/amazon