corenlp

Clojure wrapper for the Stanford CoreNLP tools. It is still incomplete: it currently wraps the tokenizer, part-of-speech tagger, lemmatizer, named-entity recognizer, parser and dependencies.

Usage

TODO: document functionality apart from convenience functions

Tokenization

(use 'corenlp)
(def text "This is a simple sentence.")
(tokenize text)

Part-of-Speech Tagging

(use 'corenlp)
(pos-tag "Colorless green ideas sleep furiously.")
;; => ("JJ" "JJ" "NNS" "VBP" "RB" ".")

Lemmatization

(use 'corenlp)
(lemmatize "The men have four cars.")
;; => ("the" "man" "have" "four" "car" ".")

Named-Entity Recognition

(use 'corenlp)
(named-entities "Barack Obama is the president of the United States.")
;; => ("PERSON" "PERSON" "O" "O" "O" "O" "O" "LOCATION" "LOCATION" "O")
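
The per-token labels can be grouped into entity spans. The following is a minimal sketch and not part of this library: `entity-spans` is a hypothetical helper, and the token vector is written out by hand as an assumption about how the sentence is tokenized.

```clojure
(require '[clojure.string :as string])

;; Tokens are written by hand (an assumption); labels are copied from the example above.
(def tokens ["Barack" "Obama" "is" "the" "president" "of" "the" "United" "States" "."])
(def labels ["PERSON" "PERSON" "O" "O" "O" "O" "O" "LOCATION" "LOCATION" "O"])

(defn entity-spans
  "Hypothetical helper: joins runs of consecutive tokens that share a non-\"O\" label."
  [tokens labels]
  (->> (map vector tokens labels)            ; pair each token with its label
       (partition-by second)                 ; split into runs of identical labels
       (remove #(= "O" (second (first %))))  ; drop runs outside any entity
       (map (fn [run]
              [(string/join " " (map first run)) (second (first run))]))))

(entity-spans tokens labels)
;; => (["Barack Obama" "PERSON"] ["United States" "LOCATION"])
```

Because runs are split only where the label changes, two distinct entities of the same type that sit next to each other would be merged into one span.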

Parsing

To parse a sentence:

(use 'corenlp)
(parse (tokenize text))

This returns a LabeledScoredTreeNode, which you can pass to other Stanford CoreNLP functions or convert to a standard Treebank string with:

(str (parse (tokenize text)))
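
Since the result is a Stanford `Tree`, it can also be inspected through plain Java interop. The sketch below assumes the CoreNLP jar is on the classpath and builds a tree from a hand-written Treebank string with `Tree/valueOf`, so it runs without loading the parser models. The bracketing is illustrative only and is not actual output from `parse`.

```clojure
(import 'edu.stanford.nlp.trees.Tree)

;; Hand-written bracketing for illustration only; not actual parser output.
(def tree
  (Tree/valueOf
   "(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (JJ simple) (NN sentence))) (. .)))"))

;; Words at the leaves, left to right.
(def words (mapv #(.value %) (.getLeaves tree)))

;; Part-of-speech tags read off the nodes directly above the leaves.
(def tags (mapv #(.tag %) (.taggedYield tree)))

words ;; => ["This" "is" "a" "simple" "sentence" "."]
tags  ;; => ["DT" "VBZ" "DT" "JJ" "NN" "."]
```

The same calls should apply to the tree returned by `(parse (tokenize text))`, assuming it is a regular `edu.stanford.nlp.trees.Tree` instance.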

Stanford Dependencies

(dependency-graph "I like cheese.")

This parses the sentence and returns the dependency graph as a loom graph, which you can then traverse with standard graph algorithms such as shortest path. You can also view it:

(def graph (dependency-graph "I like cheese."))
(use 'loom.io)
(view graph)

This requires GraphViz to be installed.
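
As a rough sketch of graph traversal with loom: how `dependency-graph` represents its nodes is not documented here, so the example below builds a small directed graph by hand, with plain words as nodes and head-to-dependent edges. Both choices are assumptions made for illustration. It assumes a recent loom release that provides `loom.graph` and `loom.alg`.

```clojure
(require '[loom.graph :as graph]
         '[loom.alg :as alg])

;; Hand-built stand-in for a dependency graph (head -> dependent edges).
;; The real graph returned by dependency-graph may use different node values.
(def g
  (graph/digraph ["like" "I"]
                 ["like" "cheese"]
                 ["cheese" "old"]
                 ["old" "very"]))

;; Number of distinct nodes in the graph.
(count (graph/nodes g))
;; => 5

;; Unweighted shortest path, following edge direction.
(alg/shortest-path g "like" "very")
;; => ("like" "cheese" "old" "very")
```

The graph is directed, so a path exists only when following edges from head to dependent; there is no path from "I" to "cheese" here.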

License

Distributed under the Eclipse Public License, the same as Clojure.

Contributors

Cory Giles
Hans Engel
