Read all text files under a directory, calculate the TF-IDF score of each term in each file, and write the results to an output file; you can also process the results however you like.
String indexPath = "./index"; // path to store the Lucene index files
String docsPath = "path/to/your/files/directory";
String outFilePath = "./tf_idf_output.txt";
String encoding = "UTF-8"; // encoding of your files, e.g. UTF-8 or ISO-8859-1
WordVector wordVector = new WordVector(indexPath, docsPath, true, encoding);
// get the result: document -> (term -> TF-IDF score)
Map<String, HashMap> documentsScores = wordVector.TFIDFScore();
// write to file
Utils.write2DMapToFile(outFilePath, documentsScores);
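To show what a TF-IDF score represents, here is a minimal, self-contained sketch in plain Java. It is only an illustration of the general tf * ln(N/df) weighting, not WordVector's or Lucene's actual implementation (Lucene's formula differs in smoothing and normalization details); `TfIdfSketch` and its tiny in-memory corpus are hypothetical.

```java
import java.util.*;

public class TfIdfSketch {
    // tf(t, d) = raw count of t in d; idf(t) = ln(N / df(t)), N = number of docs
    static Map<String, Map<String, Double>> tfIdf(Map<String, List<String>> docs) {
        int n = docs.size();
        // document frequency: in how many docs does each term appear?
        Map<String, Integer> df = new HashMap<>();
        for (List<String> terms : docs.values())
            for (String t : new HashSet<>(terms)) df.merge(t, 1, Integer::sum);

        Map<String, Map<String, Double>> scores = new HashMap<>();
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            // term frequency within this document
            Map<String, Integer> tf = new HashMap<>();
            for (String t : e.getValue()) tf.merge(t, 1, Integer::sum);

            Map<String, Double> docScores = new HashMap<>();
            for (Map.Entry<String, Integer> te : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(te.getKey()));
                docScores.put(te.getKey(), te.getValue() * idf);
            }
            scores.put(e.getKey(), docScores);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new HashMap<>();
        docs.put("a.txt", Arrays.asList("cat", "cat", "dog"));
        docs.put("b.txt", Arrays.asList("dog", "bird"));
        Map<String, Map<String, Double>> s = tfIdf(docs);
        // "dog" occurs in both documents, so idf = ln(2/2) = 0
        System.out.println(s.get("a.txt").get("dog")); // 0.0
        // "cat" occurs only in a.txt: tf = 2, score = 2 * ln(2) ≈ 1.386
        System.out.println(s.get("a.txt").get("cat"));
    }
}
```

A term common to every document gets score 0, while a term concentrated in one document scores highest there, which is exactly the "important terms" ranking the tool produces.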
You can also run it with Maven from the command line, passing arguments:
- Run command
mvn compile
in the ./WordVector/ directory, where pom.xml is located:
$ mvn compile
- Run command
mvn exec:java -Dexec.mainClass="GenTFIDF"
with no arguments to see the usage info:
$ mvn exec:java -Dexec.mainClass="GenTFIDF"
- Run
GenTFIDF
with command-line arguments like this:
$ mvn exec:java -Dexec.mainClass="GenTFIDF" -Dexec.args="-docs DOCS_PATH [-o OUTPUT_FILE] [-e TEXT_FILE_ENCODING]"
- Check the output file to see the TF-IDF score of each term in each document. The terms in each document are sorted by score in descending order, so the most important terms for that document in the collection (corpus) appear first.
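The descending sort mentioned above can be reproduced for any per-document score map with standard Java collections. This is a hypothetical helper for post-processing the output, not part of WordVector's API:

```java
import java.util.*;
import java.util.stream.*;

public class SortScores {
    // Return a term -> score map whose iteration order is descending by score.
    static LinkedHashMap<String, Double> sortDescending(Map<String, Double> scores) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a,            // merge function (keys are unique here)
                        LinkedHashMap::new));   // preserves the sorted order
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>();
        scores.put("lucene", 0.9);
        scores.put("the", 0.1);
        scores.put("index", 0.5);
        System.out.println(sortDescending(scores).keySet()); // [lucene, index, the]
    }
}
```

A `LinkedHashMap` is used because a plain `HashMap` would discard the insertion (sorted) order.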
- Fork it!
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -am 'Add some feature'
- Push to the branch:
git push origin my-new-feature
- Submit a pull request :D