Permalink
Browse files

Added readme.

  • Loading branch information...
1 parent 1a23fe0 commit a915e1535141573cab7fabac838c33044d95cc61 @schmmd schmmd committed May 23, 2012
Showing with 77 additions and 0 deletions.
  1. +77 −0 README.md
View
@@ -0,0 +1,77 @@
+# UW NLPTools
+
+This is a collection of natural language processing tools wrapped behind common
+interfaces. The client can easily use the wrapped libraries through an
+elegant scala interface. It's also a breeze to switch implementations since
+all implementations of a particular tool extend a common trait.
+
+This toolkit also aims to minimize the size of transitive dependencies. Each
+tool is broken into its own component so you can choose what you want to use.
+Each component contains the requisite modules but no more, saving you from
+searching for models but also protecting you from a dependencies with hundreds
+of megabytes of transitive dependencies. If you want to avoid the default
+models, that's OK too. They are a transitive dependency of the tool so you can
+exclude them with a maven exclusion.
+
+Licensing can be a nightmare. Each tool is split into its own component with
+the most permissive license allowable by the dependencies. Licenses are all
+clearly stated in the LICENSE file of the subcomponent.
+
+The largest NLP components are OpenNLP toolkit (Apache 2.0) and Stanford
+CoreNLP (GPL 2.0).
+
+The interfaces are defined in the `core` component.
+
+## Components
+
+### Stemmers
+
+* Morpha
+* Snowball
+ * Porter
+ * Porter2
+ * Lovins
+
+### Tokenizers
+
+* OpenNLP
+* Stanford
+
+### Part-of-Speech (POS) Taggers
+
+* OpenNLP
+* Stanford
+
+### Chunkers
+
+* OpenNLP
+
+### Constituency Parsers
+
+* OpenNLP
+* Stanford
+
+### Dependency Parsers
+
+* MaltParser
+* Stanford
+
+### Coreference
+
+* Stanford
+
+### Sentence Segmentation (sentencer)
+
+* OpenNLP
+* Piao
+
+## Usage
+
+Each component is usable through a java interface as well as on the command
+line. For example:
+
+ `echo 'You are a fool to believe that!' | mvn exec:java -Dexec.mainClass=edu.washington.cs.knowitall.tool.parse.StanfordParse`
+
+## Contributors
+
+* Michael Schmitz <http://www.schmitztech.com/>

0 comments on commit a915e15

Please sign in to comment.