attribution

Source folder -

pth.py: holds the paths to the training and test files. The repository stores only the tokenized dictionaries, not the full XML files provided on the website.

pan_util.py: contains utilities for mining the XML training files, for example, getting all the conversations for a given conversation ID.

pan_alg.py: builds a custom feature set controlled by 4 parameters. These parameters were cross-validated and are hard-coded in the file. The features are then used to train a naive Bayes classifier.
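As a rough illustration of the last step, here is a minimal sketch of a naive Bayes classifier trained on binary token-presence features. It is not the code in pan_alg.py: the custom feature set and its 4 parameters are not described here, so the feature choice, the function names `train_bernoulli_nb` and `predict`, and the smoothing value `alpha` are all assumptions.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Train on token lists using presence/absence features (hypothetical helper)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    class_docs = defaultdict(int)                     # documents per class
    present = defaultdict(lambda: defaultdict(int))   # docs per class containing a token
    for doc, label in zip(docs, labels):
        class_docs[label] += 1
        for tok in set(doc):                          # presence only, frequency ignored
            present[label][tok] += 1

    model = {"vocab": vocab, "prior": {}, "cond": {}}
    for label, n in class_docs.items():
        model["prior"][label] = math.log(n / len(docs))
        # Laplace-smoothed probability that a token appears in a document of this class.
        model["cond"][label] = {
            tok: (present[label][tok] + alpha) / (n + 2 * alpha) for tok in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior for one token list."""
    tokens = set(doc)
    best_label, best_score = None, -math.inf
    for label, prior in model["prior"].items():
        score = prior
        for tok in model["vocab"]:
            p = model["cond"][label][tok]
            score += math.log(p if tok in tokens else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A call such as `predict(train_bernoulli_nb(docs, labels), ["some", "tokens"])` returns one of the training labels. Tokens that were never seen in training are ignored at prediction time.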

Results folder -

     The best result so far is 93% true positives and 25% false positives.
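For reference, the sketch below shows one common way such figures are computed, assuming they are the true positive rate (TP / (TP + FN)) and the false positive rate (FP / (FP + TN)). That reading, the function name `tpr_fpr`, and the label encoding are assumptions, not something taken from the results folder.

```python
def tpr_fpr(y_true, y_pred, positive=1):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```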


     Scope for improvement: make the features take token frequency into account when the Bayes classifier is constructed.
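One possible way to do this is a multinomial naive Bayes model, which weights each token by how often it occurs instead of only whether it occurs. The sketch below is an assumption about what the improvement could look like, not part of the current code; the names `train_multinomial_nb` and `predict_multinomial` and the smoothing value `alpha` are hypothetical.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Train on token lists, keeping per-class token frequencies."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    counts = {}
    for doc, label in zip(docs, labels):
        counts.setdefault(label, Counter()).update(doc)

    model = {}
    for label, tok_counts in counts.items():
        # Laplace smoothing over the shared vocabulary.
        total = sum(tok_counts.values()) + alpha * len(vocab)
        model[label] = {
            "prior": math.log(class_docs[label] / len(docs)),
            "loglik": {
                tok: math.log((tok_counts[tok] + alpha) / total) for tok in vocab
            },
        }
    return model


def predict_multinomial(model, doc):
    """Score each class; every token contributes once per occurrence."""
    freqs = Counter(doc)

    def score(label):
        params = model[label]
        return params["prior"] + sum(
            n * params["loglik"][tok]
            for tok, n in freqs.items()
            if tok in params["loglik"]      # skip tokens unseen in training
        )

    return max(model, key=score)
```

With two training documents that contain the same tokens in different proportions, a presence-only model cannot tell them apart, while this frequency-aware version can.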

For any questions, contact elango at kth dot se
