
Document preprocessing for preparing formatted input data which is suitable for LibSVM tool.


shirdrn/document-processor


Introduction

Processes documents to prepare training and test data for the LibSVM tool. Terms are selected as features using the CHI (chi-square) statistic, and each selected term is then weighted with TF-IDF.
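To make the two steps concrete, here is a minimal Java sketch of a chi-square score for one term and one class, followed by a TF-IDF weight. It is an illustration only, not this project's implementation: the class name `ChiTfIdfSketch`, the method names, and the specific TF-IDF variant (length-normalized term frequency times the natural log of N/df) are all assumptions, and the project may use a different formula.

```java
// Hypothetical illustration; names and the TF-IDF variant are assumptions,
// not taken from this project's source code.
final class ChiTfIdfSketch {

    /**
     * Chi-square statistic of a term for one class, from a 2x2 contingency table:
     * a = documents in the class that contain the term
     * b = documents outside the class that contain the term
     * c = documents in the class that do not contain the term
     * d = documents outside the class that do not contain the term
     */
    static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denominator = (double) (a + c) * (b + d) * (a + b) * (c + d);
        if (denominator == 0.0) {
            return 0.0; // degenerate table: the term or class carries no information
        }
        double diff = (double) a * d - (double) c * b;
        return n * diff * diff / denominator;
    }

    /**
     * TF-IDF weight using tf = termCount / docLength and idf = ln(totalDocs / docsWithTerm).
     * Other variants (smoothed idf, raw counts) are equally common.
     */
    static double tfIdf(int termCount, int docLength, int totalDocs, int docsWithTerm) {
        if (docLength == 0 || docsWithTerm == 0) {
            return 0.0;
        }
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) totalDocs / docsWithTerm);
        return tf * idf;
    }

    public static void main(String[] args) {
        // A term that appears in every document of the class and nowhere else
        // scores high; a term spread evenly across classes scores zero.
        System.out.println("chi-square (discriminative term): " + chiSquare(10, 0, 0, 10));
        System.out.println("chi-square (uninformative term):  " + chiSquare(5, 5, 5, 5));
        System.out.println("tf-idf: " + tfIdf(3, 100, 1000, 10));
    }
}
```

In a pipeline of this kind, the terms with the highest chi-square scores per class would typically be kept as the feature vocabulary, and TF-IDF would then be computed only for those terms.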

How To

Preparing data for the LibSVM tool involves two phases: train and test.

  • For train

    Program entry class: org.shirdrn.document.processor.TrainDocumentProcessorDriver

    Configuration file: config-train.properties

  • For test

    Program entry class: org.shirdrn.document.processor.TestDocumentProcessorDriver

    Configuration file: config-test.properties
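Both phases produce data for LibSVM, whose input format is one line per document: a label followed by `index:value` pairs in ascending index order. The sketch below shows how a sparse vector of term weights could be written in that format. The class name `LibSvmLineSketch`, the method `formatLine`, the 1-based index check, and the six-decimal formatting are assumptions made for illustration; they are not taken from this project's drivers.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of writing one document as a LibSVM input line.
final class LibSvmLineSketch {

    /**
     * Formats a class label and a sparse feature vector as "label i1:v1 i2:v2 ...".
     * Zero-valued features are skipped because the format is sparse.
     */
    static String formatLine(int label, Map<Integer, Double> features) {
        StringBuilder sb = new StringBuilder();
        sb.append(label);
        // TreeMap iterates keys in ascending order, which LibSVM expects.
        for (Map.Entry<Integer, Double> e : new TreeMap<>(features).entrySet()) {
            int index = e.getKey();
            double value = e.getValue();
            if (index < 1) {
                // Assumption: feature indices are 1-based, as in typical LibSVM data files.
                throw new IllegalArgumentException("feature index must be >= 1: " + index);
            }
            if (value == 0.0) {
                continue;
            }
            sb.append(' ').append(index).append(':')
              .append(String.format(Locale.ROOT, "%.6f", value));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Double> weights = new TreeMap<>();
        weights.put(3, 0.5);
        weights.put(1, 0.25);
        System.out.println(formatLine(2, weights));
    }
}
```

The same term-to-index mapping has to be used for the train and test files, otherwise a model trained on one file cannot be applied to the other.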

FAQ

  • If you use the ICTCLAS Chinese analyzer, copy the file 'NLPIR_JNI.dll' to the directory 'C:\Windows\System32' on Windows 7 (64-bit Windows 7 is assumed by default). For more about ICTCLAS, see http://ictclas.nlpir.org/downloads.

Contact
