wordpos
=======

wordpos is a set of part-of-speech utilities for Node.js using [natural's](http://github.com/NaturalNode/natural) WordNet module.

## Usage

```js
var WordPOS = require('wordpos'),
    wordpos = new WordPOS();

wordpos.getAdjectives('The angry bear chased the frightened little squirrel.', function(result){
    console.log(result);
});
// [ 'little', 'angry', 'frightened' ]

wordpos.isAdjective('awesome', function(result){
    console.log(result);
});
// true 'awesome'
```

See `wordpos_spec.js` for full usage.

## Installation

    npm install wordpos

Note: `wordpos-bench.js` requires a [forked uubench](https://github.com/moos/uubench) module. To use the CLI (see below), it is recommended to install wordpos globally with the `-g` option.

To run spec:

    npm install jasmine-node -g
    jasmine-node wordpos_spec.js --verbose
    jasmine-node validate_spec.js --verbose

## API

Please note: all API methods are async since the underlying WordNet library is async. WordPOS is a subclass of natural's [WordNet class](https://github.com/NaturalNode/natural#wordnet) and inherits all its methods.


### getX()...

Get POS from text.

```
wordpos.getPOS(text, callback) -- callback receives a result object:
    {
      nouns:[],       Array of text words that are nouns
      verbs:[],       Array of text words that are verbs
      adjectives:[],  Array of text words that are adjectives
      adverbs:[],     Array of text words that are adverbs
      rest:[]         Array of text words that are not in dict or could not be categorized as a POS
    }
    Note: a word may appear in multiple POS (eg, 'great' is both a noun and an adjective)
wordpos.getNouns(text, callback) -- callback receives an array of nouns in text
wordpos.getVerbs(text, callback) -- callback receives an array of verbs in text
wordpos.getAdjectives(text, callback) -- callback receives an array of adjectives in text
wordpos.getAdverbs(text, callback) -- callback receives an array of adverbs in text
```

If you're only interested in a certain POS (say, adjectives), using the particular getX() is faster
than getPOS(), which looks up each word in all index files. [Stopwords](https://github.com/NaturalNode/natural/blob/master/lib/natural/util/stopwords.js)
are stripped out from text before lookup.

If text is an array, all words are looked up -- no deduplication, stopword filter or tokenization is applied.

getX() functions return the number of parsed words that will be looked up (less duplicates and stopwords).

Example:

```js
wordpos.getNouns('The angry bear chased the frightened little squirrel.', console.log)
// [ 'bear', 'squirrel', 'little', 'chased' ]

wordpos.getPOS('The angry bear chased the frightened little squirrel.', console.log)
// output:
// {
//   nouns: [ 'bear', 'squirrel', 'little', 'chased' ],
//   verbs: [ 'bear' ],
//   adjectives: [ 'little', 'angry', 'frightened' ],
//   adverbs: [ 'little' ],
//   rest: [ 'the' ]
// }
```
This has no relation to the correct grammar of the given sentence, in which only 'bear' and 'squirrel'
would be considered nouns. (see http://nltk.googlecode.com/svn/trunk/doc/book/ch08.html#ex-recnominals)

For example, [pos-js](https://github.com/fortnightlabs/pos-js) tags only 'squirrel' as a noun:

    The / DT
    angry / JJ
    bear / VB
    chased / VBN
    the / DT
    frightened / VBN
    little / JJ
    squirrel / NN

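Because a word can land in more than one POS list, it can help to invert the result object into a per-word view. The sketch below is not part of wordpos: `posByWord` is a hypothetical helper, and the input is the sample `getPOS()` output from the example above, hard-coded so the snippet runs without WordNet.

```js
// Hypothetical helper (not part of wordpos): map each word to the POS lists it appears in.
function posByWord(result) {
  var map = new Map();
  Object.keys(result).forEach(function (pos) {
    if (pos === 'rest') return;              // 'rest' holds uncategorized words
    result[pos].forEach(function (word) {
      if (!map.has(word)) map.set(word, []);
      map.get(word).push(pos);
    });
  });
  return map;
}

// Sample getPOS() result copied from the example above.
var sample = {
  nouns: ['bear', 'squirrel', 'little', 'chased'],
  verbs: ['bear'],
  adjectives: ['little', 'angry', 'frightened'],
  adverbs: ['little'],
  rest: ['the']
};

var byWord = posByWord(sample);
console.log(byWord.get('little')); // [ 'nouns', 'adjectives', 'adverbs' ]
console.log(byWord.get('bear'));   // [ 'nouns', 'verbs' ]
```
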
### isX()...

Determine if a word is a particular POS.

```
wordpos.isNoun(word, callback) -- callback receives result (true/false) if word is a noun.
wordpos.isVerb(word, callback) -- callback receives result (true/false) if word is a verb.
wordpos.isAdjective(word, callback) -- callback receives result (true/false) if word is an adjective.
wordpos.isAdverb(word, callback) -- callback receives result (true/false) if word is an adverb.
```

isX() methods pass the looked-up word as the second argument to the callback.

Examples:

```js
wordpos.isVerb('fish', console.log);
// true 'fish'

wordpos.isNoun('fish', console.log);
// true 'fish'

wordpos.isAdjective('fishy', console.log);
// true 'fishy'

wordpos.isAdverb('fishly', console.log);
// false 'fishly'
```

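The isX() callbacks receive `(result, word)` rather than an error-first pair, so a generic promisifier that expects `(err, value)` would misread `result` as an error. A minimal sketch of a hand-written adapter follows. `toPromise` is a hypothetical helper, and `fakeIsVerb` is a stand-in with the same callback shape so the snippet runs without WordNet; with a real instance one would presumably pass something like `wordpos.isVerb.bind(wordpos)` instead.

```js
// Hypothetical wrapper: adapts an isX()-style function, whose callback
// receives (result, word), into a Promise.
function toPromise(isX, word) {
  return new Promise(function (resolve) {
    isX(word, function (result, lookedUp) {
      resolve({ word: lookedUp, result: result });
    });
  });
}

// Stand-in for wordpos.isVerb (assumption: same callback shape).
// It only "knows" the word 'fish'.
function fakeIsVerb(word, callback) {
  setTimeout(function () { callback(word === 'fish', word); }, 0);
}

toPromise(fakeIsVerb, 'fish').then(function (r) {
  console.log(r.word, r.result); // fish true
});
```
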
### lookupX()...

These calls are similar to natural's [lookup()](https://github.com/NaturalNode/natural#wordnet) call, except they can be faster if you
already know the POS of the word.

```
wordpos.lookupNoun(word, callback) -- callback receives array of lookup objects for a noun
wordpos.lookupVerb(word, callback) -- callback receives array of lookup objects for a verb
wordpos.lookupAdjective(word, callback) -- callback receives array of lookup objects for an adjective
wordpos.lookupAdverb(word, callback) -- callback receives array of lookup objects for an adverb
```

lookupX() methods pass the looked-up word as the second argument to the callback.

Example:

```js
wordpos.lookupAdjective('awesome', console.log);
// output:
// [ { synsetOffset: 1282510,
//     lexFilenum: 0,
//     pos: 's',
//     wCnt: 5,
//     lemma: 'amazing',
//     synonyms: [ 'amazing', 'awe-inspiring', 'awesome', 'awful', 'awing' ],
//     lexId: '0',
//     ptrs: [],
//     gloss: 'inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden soul beneath"- Melville; "Westminster Hall\'s awing majesty, so vast, so high, so silent" ' } ], 'awesome'
```
In this case only one lookup was found, but there could be several.


Or use WordNet's inherited method:

```js
wordpos.lookup('great', console.log);
// ...
```

### Other methods/properties

```
WordPOS.WNdb -- access to the WNdb object
WordPOS.natural -- access to underlying 'natural' module
wordpos.parse(str) -- returns tokenized array of words, less duplicates and stopwords. This method is called on all getX() calls internally.
```
E.g., WordPOS.natural.stopwords is the list of stopwords.


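To make the three steps behind `parse()` concrete (tokenize, drop stopwords, drop duplicates), here is a rough standalone sketch. It is not the library's implementation: `parseSketch`, the regex tokenizer, the five-word `STOPWORDS` list, and the case-insensitive matching are all assumptions chosen for illustration.

```js
// Tiny illustrative stopword list (assumption); the real list lives in natural.
var STOPWORDS = ['the', 'a', 'an', 'of', 'and'];

// Hypothetical re-creation of the idea behind parse():
// tokenize, drop stopwords, drop duplicates.
function parseSketch(text, stopwords) {
  var tokens = text.split(/\W+/).filter(Boolean);     // crude tokenizer
  var seen = new Set();
  return tokens.filter(function (token) {
    var key = token.toLowerCase();
    if (stopwords.indexOf(key) !== -1) return false;  // stopword
    if (seen.has(key)) return false;                  // duplicate
    seen.add(key);
    return true;
  });
}

console.log(parseSketch('The angry bear chased the frightened little squirrel.', STOPWORDS));
// [ 'angry', 'bear', 'chased', 'frightened', 'little', 'squirrel' ]
```
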
### Options

```js
WordPOS.defaults = {
  /**
   * enable profiling, time in msec returned as last argument in callback
   */
  profile: false,

  /**
   * use fast index if available
   */
  fastIndex: true,

  /**
   * if true, exclude standard stopwords.
   * if array, stopwords to exclude, eg, ['all','of','this',...]
   * if false, do not filter any stopwords.
   */
  stopwords: true
};
```
To override, pass an options hash to the constructor. With the `profile` option, all callbacks receive the execution time of the call, in msec, as their last argument.

```js
wordpos = new WordPOS({profile: true});
wordpos.isAdjective('fast', console.log);
// true 'fast' 29
```

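One plausible reading of "pass an options hash to the constructor" is a shallow merge over `WordPOS.defaults`, followed by resolving the three-way `stopwords` setting. The sketch below illustrates that reading only. `mergeOptions`, `resolveStopwords`, and the short `standard` list are hypothetical and not taken from the wordpos source.

```js
// Defaults copied from the README; everything else here is an assumption.
var defaults = { profile: false, fastIndex: true, stopwords: true };

// Later sources win, and the defaults object itself is left untouched.
function mergeOptions(defaults, overrides) {
  return Object.assign({}, defaults, overrides);
}

// true -> standard list, array -> that array, false -> no filtering.
function resolveStopwords(option, standard) {
  if (option === true) return standard;
  if (Array.isArray(option)) return option;
  return [];
}

var standard = ['the', 'a', 'of'];   // placeholder for natural's stopword list
var options = mergeOptions(defaults, { profile: true, stopwords: ['all', 'of', 'this'] });

console.log(options.profile);    // true
console.log(options.fastIndex);  // true
console.log(defaults.profile);   // false
console.log(resolveStopwords(options.stopwords, standard)); // [ 'all', 'of', 'this' ]
```
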
### Fast Index

Version 0.1.4 introduces the `fastIndex` option. This uses a secondary index on the index files and is much faster. It is on by default. Secondary index files are generated at install time and placed in the same directory as WNdb.path. Details can be found in tools/stat.js.

See blog article [Optimizing WordPos](http://blog.42at.com/optimizing-wordpos).

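The README does not spell out the format of the secondary index, so the following is only a sketch of the general idea: bucket a sorted key list by prefix so that a lookup scans one small range instead of the whole file. The one-letter bucket key, the in-memory `lemmas` array, and the helper names are all assumptions, not what tools/stat.js actually does.

```js
// Sorted list standing in for the lemmas of one index file (illustrative data).
var lemmas = ['angry', 'awesome', 'bear', 'chase', 'fish', 'fishy', 'little', 'squirrel'];

// Build a secondary index: first letter -> [start, end) range in the sorted list.
function buildSecondaryIndex(sorted) {
  var index = new Map();
  sorted.forEach(function (lemma, i) {
    var key = lemma.charAt(0);
    if (!index.has(key)) index.set(key, { start: i, end: i });
    index.get(key).end = i + 1;
  });
  return index;
}

// Look a word up by scanning only the bucket its first letter points to.
function hasWord(sorted, index, word) {
  var range = index.get(word.charAt(0));
  if (!range) return false;
  for (var i = range.start; i < range.end; i++) {
    if (sorted[i] === word) return true;
  }
  return false;
}

var index = buildSecondaryIndex(lemmas);
console.log(index.get('f'));                    // { start: 4, end: 6 }
console.log(hasWord(lemmas, index, 'fishy'));   // true
console.log(hasWord(lemmas, index, 'fishly'));  // false
```
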
## CLI

Version 0.1.6 introduces the command-line interface (`./bin/wordpos-cli.js`). It is available as `wordpos` if installed globally
(`npm install wordpos -g`), otherwise as `node_modules/.bin/wordpos`.

```bash
$ wordpos get The angry bear chased the frightened little squirrel
# Noun 4:
bear
chased
little
squirrel

# Adjective 3:
angry
frightened
little

# Verb 1:
bear

# Adverb 1:
little
```
Just the nouns, brief output:
```bash
$ wordpos get --noun -b The angry bear chased the frightened little squirrel
bear chased little squirrel
```
Just the counts: (nouns, adjectives, verbs, adverbs, total parsed words)
```bash
$ wordpos get -c The angry bear chased the frightened little squirrel
4 3 1 1 7
```
Just the adjective count: (0, adjectives, 0, 0, total parsed words)
```bash
$ wordpos get --adj -c The angry bear chased the frightened little squirrel
0 3 0 0 7
```

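When the `-c` output is consumed by a script, the positional counts are easier to handle once named. A small sketch, assuming the field order stated above (nouns, adjectives, verbs, adverbs, total parsed words); `parseCounts` is a hypothetical helper, not part of the CLI.

```js
// Field order taken from the README text above.
var FIELDS = ['nouns', 'adjectives', 'verbs', 'adverbs', 'total'];

// Turn a line such as "4 3 1 1 7" into an object with named counts.
function parseCounts(line) {
  var numbers = line.trim().split(/\s+/).map(Number);
  var counts = {};
  FIELDS.forEach(function (name, i) { counts[name] = numbers[i]; });
  return counts;
}

console.log(parseCounts('4 3 1 1 7'));
// { nouns: 4, adjectives: 3, verbs: 1, adverbs: 1, total: 7 }
```
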
Get definitions:
```bash
$ wordpos def git
git
  n: a person who is deemed to be despicable or contemptible; "only a rotter would do that"; "kill the rat"; "throw the bum out"; "you cowardly little pukes!"; "the British call a contemptible person a `git'"
```
Get full result object:
```bash
$ wordpos def git -f
{ git:
   [ { synsetOffset: 10539715,
       lexFilenum: 18,
       pos: 'n',
       wCnt: 0,
       lemma: 'rotter',
       synonyms: [],
       lexId: '0',
       ptrs: [],
       gloss: 'a person who is deemed to be despicable or contemptible; "only a rotter would do that"; "kill the rat"; "throw the bum out"; "you cowardly little pukes!"; "the British call a contemptible person a `git\'" ' } ] }
```
As JSON:
```bash
$ wordpos def git -j
{"git":[{"synsetOffset":10539715,"lexFilenum":18,"pos":"n","wCnt":0,"lemma":"rotter","synonyms":[],"lexId":"0","ptrs":[],"gloss":"a person who is deemed to be despicable or contemptible; \"only a rotter would do that\"; \"kill the rat\"; \"throw the bum out\"; \"you cowardly little pukes!\"; \"the British call a contemptible person a `git'\" "}]}
```
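The JSON form is the easiest one to post-process. The sketch below parses a string shaped like the `-j` output above and prints one line per sense. The string is hard-coded with the gloss shortened for brevity, so it is not the CLI's verbatim output. Capturing the real output (for example via a pipe or `child_process`) is left out.

```js
// Shaped like the `wordpos def git -j` output above; gloss shortened for brevity.
var output = '{"git":[{"synsetOffset":10539715,"lexFilenum":18,"pos":"n","wCnt":0,' +
  '"lemma":"rotter","synonyms":[],"lexId":"0","ptrs":[],' +
  '"gloss":"a person who is deemed to be despicable or contemptible"}]}';

// Collect one summary line per word sense.
function summarize(json) {
  var parsed = JSON.parse(json);
  var lines = [];
  Object.keys(parsed).forEach(function (word) {
    parsed[word].forEach(function (sense) {
      lines.push(word + ' (' + sense.pos + '): ' + sense.gloss);
    });
  });
  return lines;
}

summarize(output).forEach(function (line) { console.log(line); });
// git (n): a person who is deemed to be despicable or contemptible
```
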
Usage:
```bash
$ wordpos

  Usage: wordpos-cli.js [options] <command> [word ... | -i <file> | <stdin>]

  Commands:

    get
    get list of words for particular POS

    def
    lookup definitions

    parse
    show parsed words, deduped and less stopwords

  Options:

    -h, --help         output usage information
    -V, --version      output the version number
    -n, --noun         Get nouns
    -a, --adj          Get adjectives
    -v, --verb         Get verbs
    -r, --adv          Get adverbs
    -c, --count        count only (noun, adj, verb, adv, total parsed words)
    -b, --brief        brief output (all on one line, no headers)
    -f, --full         full results object
    -j, --json         full results object as JSON
    -i, --file <file>  input file
    -s, --stopwords    include stopwords
```

## Benchmark

    node wordpos-bench.js


512-word corpus (< v0.1.4) :
```
  getPOS : 0 ops/s { iterations: 1, elapsed: 9039 }
  getNouns : 0 ops/s { iterations: 1, elapsed: 2347 }
  getVerbs : 0 ops/s { iterations: 1, elapsed: 2434 }
  getAdjectives : 1 ops/s { iterations: 1, elapsed: 1698 }
  getAdverbs : 0 ops/s { iterations: 1, elapsed: 2698 }
done in 20359 msecs
```

512-word corpus (as of v0.1.4, with fastIndex) :
```
  getPOS : 18 ops/s { iterations: 1, elapsed: 57 }
  getNouns : 48 ops/s { iterations: 1, elapsed: 21 }
  getVerbs : 125 ops/s { iterations: 1, elapsed: 8 }
  getAdjectives : 111 ops/s { iterations: 1, elapsed: 9 }
  getAdverbs : 143 ops/s { iterations: 1, elapsed: 7 }
done in 1375 msecs
```

Of the 512 words, 220 are looked up (less stopwords and duplicates). Timings are from a win7/64-bit/dual-core/3GHz machine. getPOS() is slowest as it searches through all four index files.

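The two tables can be compared directly. The snippet below recomputes ops/s from the elapsed times and derives the per-method speedup. The formula `1000 / elapsed`, rounded, is an assumption that happens to reproduce every ops/s figure listed above for a single iteration; it is not taken from the uubench source.

```js
// Elapsed times in msec, copied from the two benchmark runs above.
var before = { getPOS: 9039, getNouns: 2347, getVerbs: 2434, getAdjectives: 1698, getAdverbs: 2698 };
var after  = { getPOS: 57,   getNouns: 21,   getVerbs: 8,    getAdjectives: 9,    getAdverbs: 7 };

// Assumed formula for one iteration: ops/s = 1000 / elapsed msec, rounded.
function opsPerSec(elapsedMs) {
  return Math.round(1000 / elapsedMs);
}

// Ratio of old to new elapsed time, rounded to a whole factor.
function speedup(name) {
  return Math.round(before[name] / after[name]);
}

Object.keys(before).forEach(function (name) {
  console.log(name + ': ' + opsPerSec(after[name]) + ' ops/s, about ' + speedup(name) + 'x faster');
});
// getPOS: 18 ops/s, about 159x faster
// getNouns: 48 ops/s, about 112x faster
// getVerbs: 125 ops/s, about 304x faster
// getAdjectives: 111 ops/s, about 189x faster
// getAdverbs: 143 ops/s, about 385x faster
```
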
License
-------

(The MIT License)

Copyright (c) 2012, mooster@42at.com