Convert text to an inverted index
Latest commit 890b240 Jun 27, 2013 @espresse espresse add Polish stopwords

textiijs

Build Status

Text inverted index generator for node.

  • excludes "stop words"
  • normalizes words (stemming) with snowball-js
  • converts words to lowercase
  • excludes words shorter than a specified minimum length - default: 3
  • reports each word's position within the text, counting excluded words
  • supports section indexing
  • splits text with regexp word separator - default: /\W+/
  • supports text encoding - default: 'utf8'
  • supports multiple languages - currently English (the default) and Norwegian
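
To make the features above concrete, here is a minimal sketch of how such an inverted index could be built. It is not textiijs's actual implementation: the function name buildIndex, the tiny stop-word list, the zero-based positions, and the Map return type are all assumptions made for illustration, and stemming is left out.

```javascript
// Hypothetical stop-word list, for illustration only.
const STOP_WORDS = new Set(['and', 'or', 'the']);

// Build a Map from word to the list of positions where it occurs.
// Positions count every token, including the ones that get excluded.
function buildIndex(text, options = {}) {
  const separator = options.word_separator || /\W+/;
  const minLength = options.min_word_length || 3;
  const index = new Map();

  text
    .split(separator)
    .filter(Boolean) // drop empty tokens produced by leading/trailing separators
    .forEach((raw, position) => {
      const word = raw.toLowerCase();
      // Skip short words and stop words, but keep counting positions.
      if (word.length < minLength || STOP_WORDS.has(word)) return;
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(position);
    });

  return index;
}

const index = buildIndex('Zero, one and three or five, six, seven... seven...');
console.log(index.get('seven')); // [ 7, 8 ]
console.log(index.has('and'));   // false
```

Because "and" and "or" still occupy positions 2 and 4, "three" lands at position 3 and "five" at position 5, which is what "counting excluded words" means in this sketch.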

Installation

via npm:

$ npm install textiijs

Usage

Neither options nor section given

var textii = require('textiijs'),
    sample_text = "Zero, one and three or five, six, seven... seven...";

var pii = new textii(sample_text);

pii.get(null, function(data) {
  console.log(data);
});

With options and section given

// When running from inside this repository, use require('../index') instead.
var textii = require('textiijs'),
    sample_text = "Zero, one and three or five, six, seven... seven...",
    options = { "word_separator": /\W+/, "min_word_length": 3, "encoding": "utf8", "language": "Norwegian" },
    get_options = { "section": "page1" };

var pii = new textii(sample_text, options);

pii.get(get_options, function(data) {
  console.log(data);
});

Command line usage

$ npm install textiijs -g

Then either pipe in data or provide a filename:

$ echo "hello world" | textiijs
# or
$ textiijs text.txt
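
For readers curious how a command line tool can accept either piped data or a filename, the following is a rough sketch of the pattern in Node. It is not the code in this repository's bin directory; readInput is a hypothetical helper, and reading file descriptor 0 synchronously is an assumption that works for pipes but may fail on an interactive terminal on some platforms.

```javascript
const fs = require('fs');

// Hypothetical helper: return the text to index.
// With a filename, read that file; without one, read piped stdin (fd 0).
function readInput(filename, encoding = 'utf8') {
  const source = filename === undefined ? 0 : filename;
  return fs.readFileSync(source, encoding);
}

// Typical use in a CLI entry point (not executed here):
//   const text = readInput(process.argv[2]);
```

The returned text could then be passed to the textii constructor as in the usage examples above.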

Tests

$ make test

Coverage report

$ make test-cov