Permalink
Browse files

Moved README and put a new one in its place with a little explanatory…

… blurb.
  • Loading branch information...
1 parent 1460c23 commit 5f8a0524cea61acae13a3ef6eca6466a62681c23 @jcwilk committed with John Wilkinson Jun 18, 2010
Showing with 132 additions and 115 deletions.
  1. +9 −115 README
  2. +123 −0 README.orig
View
@@ -1,123 +1,17 @@
-= Stanford Natural Language Parser Wrapper
+This is an upload of Bill McNeal's stanfordparser rubygem, check it out at its homepage (seems to be partially in french)
-This module is a wrapper for the {Stanford Natural Language Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml].
+http://rubyforge.org/projects/stanfordparser/
-The Stanford Natural Language Parser is a Java implementation of a probabilistic PCFG and dependency parser for English, German, Chinese, and Arabic. This module provides a thin wrapper around the Java code to make it accessible from Ruby along with pure Ruby objects that enable standoff parsing.
+or its rdocs
+http://stanfordparser.rubyforge.org/
-= Installation and Configuration
+I'll probably be extending this in alternate branches as I use it to prod at the stanford parser, I've decided to leave master totally stock except for this README explaining it, unless I come up with stuff that I think would compliment Bill's work and would be useful to others.
-In addition to the Ruby gems it requires, to run this module you must manually install the {Stanford Natural Language Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml].
+AFAIK there aren't other copies of this on github, please correct me if I'm mistaken. The only similar one I can see is http://github.com/tiendung/ruby-nlp which has much less code and I can only assume to be something else.
-This module expects the parser to be installed in the <tt>/usr/local/stanford-parser/current</tt> directory on UNIX platforms and in the <tt>C:\stanford-parser\current</tt> directory on Windows platforms. This is the directory that contains the <tt>stanford-parser.jar</tt> file. When the module is loaded, it adds this directory to the Java classpath and launches the Java VM with the arguments <tt>-server -Xmx150m</tt>.
+See README.orig for Bill's readme.
-These defaults can be overridden by creating the configuration file <tt>/etc/ruby_stanford_parser.yaml</tt> on UNIX platforms and <tt>C:\stanford-parser\ruby-stanford-parser.yaml</tt> on Windows platforms. This file is in the Ruby YAML[http://ruby-doc.org/stdlib/libdoc/yaml/rdoc/index.html] format, and may contain two values: <tt>root</tt> and <tt>jvmargs</tt>. For example, the file might look like the following:
- root: /usr/local/stanford-parser/other/location
- jvmargs: -Xmx100m -verbose
-
-
-=Tokenization and Parsing
-
-Use the StanfordParser::DocumentPreprocessor class to tokenize text and files into sentences and words.
-
- >> require "stanfordparser"
- => true
- >> preproc = StanfordParser::DocumentPreprocessor.new
- => <DocumentPreprocessor>
- >> puts preproc.getSentencesFromString("This is a sentence. So is this.")
- This is a sentence .
- So is this .
-
-Use the StanfordParser::LexicalizedParser class to parse sentences.
-
- >> parser = StanfordParser::LexicalizedParser.new
- Loading parser from serialized file /usr/local/stanford-parser/current/englishPCFG.ser.gz ... done [5.5 sec].
- => edu.stanford.nlp.parser.lexparser.LexicalizedParser
- >> puts parser.apply("This is a sentence.")
- (ROOT
- (S [24.917]
- (NP [6.139] (DT [2.300] This))
- (VP [17.636] (VBZ [0.144] is)
- (NP [12.299] (DT [1.419] a) (NN [8.897] sentence)))
- (. [0.002] .)))
-
-For complete details about the use of these classes, see the documentation on the Stanford Natural Language Parser website.
-
-
-=Standoff Tokenization and Parsing
-
-This module also contains support for standoff tokenization and parsing, in which the terminal nodes of parse trees contain information about the text that was used to generate them.
-
-Use StanfordParser::StandoffDocumentPreprocessor class to tokenize text and files into sentences and words.
-
- >> preproc = StanfordParser::StandoffDocumentPreprocessor.new
- => <StandoffDocumentPreprocessor>
- >> s = preproc.getSentencesFromString("This is a sentence. So is this.")
- => [This is a sentence., So is this.]
-
-The standoff preprocessor returns StanfordParser::StandoffToken objects, which contain character offsets into the original text along with information about spacing characters that came before and after the token.
-
- >> puts s
- This [0,4]
- is [5,7]
- a [8,9]
- sentence [10,18]
- . [18,19]
- So [21,23]
- is [24,26]
- this [27,31]
- . [31,32]
- >> "This is a sentence. So is this."[27..31]
- => "this."
-
-This is the same information contained in the <tt>edu.stanford.nlp.ling.FeatureLabel</tt> class in the Stanford Parser Java implementation.
-
-Similarly, use the StanfordParser::StandoffParsedText object to parse a block of text into StanfordParser::StandoffNode parse trees whose terminal nodes are StanfordParser::StandoffToken objects.
-
- >> t = StanfordParser::StandoffParsedText.new("This is a sentence. So is this.")
- Loading parser from serialized file /usr/local/stanford-parser/current/englishPCFG.ser.gz ... done [4.9 sec].
- => <StanfordParser::StandoffParsedText, 2 sentences>
- >> puts t.first
- (ROOT
- (S
- (NP (DT This [0,4]))
- (VP (VBZ is [5,7])
- (NP (DT a [8,9]) (NN sentence [10,18])))
- (. . [18,19])))
-
-Standoff parse trees can reproduce the text from which they were generated verbatim.
-
- >> t.first.to_original_string
- => "This is a sentence. "
-
-They can also reproduce the original text with brackets inserted around the yields of specified parse nodes.
-
- >> t.first.to_bracketed_string([[0,0,0], [0,1,1]])
- => "[This] is [a sentence]. "
-
-The format of the coordinates used to specify individual nodes is described in the documentation for the Ruby Treebank[http://rubyforge.org/projects/treebank/] gem.
-
-See the documentation of the individual classes in this module for more details.
-
-Unlike their parents StanfordParser::DocumentPreprocessor and StanfordParser::LexicalizedParser, which produce Ruby wrappers around Java objects, StanfordParser::StandoffDocumentPreprocessor and StanfordParser::StandoffParsedText produce pure Ruby objects. This is to facilitate serialization of these objects using tools like the Marshal module, which cannot serialize Java objects.
-
-= History
-
-1.0.0:: Initial release
-1.1.0:: Make module initialization function private. Add example code.
-1.2.0:: Read Java VM arguments from the configuration file. Add Word class.
-2.0.0:: Add support for standoff parsing. Change the way Rjb::JavaObjectWrapper wraps returned values: see wrap_java_object for details. Rjb::JavaObjectWrapper supports static members. Minor changes to stanford-sentence-parser script.
-2.1.0:: Different default paths for Windows machines; Minor changes to StandoffToken definition
-2.2.0:: Add parent information to StandoffNode
-
-= Copyright
-
-Copyright 2007-2008, William Patrick McNeill
-
-This program is distributed under the GNU General Public License.
-
-
-= Author
-
-W.P. McNeill mailto:billmcn@gmail.com
+Sincerely,
+John Wilkinson
View
@@ -0,0 +1,123 @@
+= Stanford Natural Language Parser Wrapper
+
+This module is a wrapper for the {Stanford Natural Language Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml].
+
+The Stanford Natural Language Parser is a Java implementation of a probabilistic PCFG and dependency parser for English, German, Chinese, and Arabic. This module provides a thin wrapper around the Java code to make it accessible from Ruby along with pure Ruby objects that enable standoff parsing.
+
+
+= Installation and Configuration
+
+In addition to the Ruby gems it requires, to run this module you must manually install the {Stanford Natural Language Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml].
+
+This module expects the parser to be installed in the <tt>/usr/local/stanford-parser/current</tt> directory on UNIX platforms and in the <tt>C:\stanford-parser\current</tt> directory on Windows platforms. This is the directory that contains the <tt>stanford-parser.jar</tt> file. When the module is loaded, it adds this directory to the Java classpath and launches the Java VM with the arguments <tt>-server -Xmx150m</tt>.
+
+These defaults can be overridden by creating the configuration file <tt>/etc/ruby_stanford_parser.yaml</tt> on UNIX platforms and <tt>C:\stanford-parser\ruby-stanford-parser.yaml</tt> on Windows platforms. This file is in the Ruby YAML[http://ruby-doc.org/stdlib/libdoc/yaml/rdoc/index.html] format, and may contain two values: <tt>root</tt> and <tt>jvmargs</tt>. For example, the file might look like the following:
+
+ root: /usr/local/stanford-parser/other/location
+ jvmargs: -Xmx100m -verbose
+
+
+=Tokenization and Parsing
+
+Use the StanfordParser::DocumentPreprocessor class to tokenize text and files into sentences and words.
+
+ >> require "stanfordparser"
+ => true
+ >> preproc = StanfordParser::DocumentPreprocessor.new
+ => <DocumentPreprocessor>
+ >> puts preproc.getSentencesFromString("This is a sentence. So is this.")
+ This is a sentence .
+ So is this .
+
+Use the StanfordParser::LexicalizedParser class to parse sentences.
+
+ >> parser = StanfordParser::LexicalizedParser.new
+ Loading parser from serialized file /usr/local/stanford-parser/current/englishPCFG.ser.gz ... done [5.5 sec].
+ => edu.stanford.nlp.parser.lexparser.LexicalizedParser
+ >> puts parser.apply("This is a sentence.")
+ (ROOT
+ (S [24.917]
+ (NP [6.139] (DT [2.300] This))
+ (VP [17.636] (VBZ [0.144] is)
+ (NP [12.299] (DT [1.419] a) (NN [8.897] sentence)))
+ (. [0.002] .)))
+
+For complete details about the use of these classes, see the documentation on the Stanford Natural Language Parser website.
+
+
+=Standoff Tokenization and Parsing
+
+This module also contains support for standoff tokenization and parsing, in which the terminal nodes of parse trees contain information about the text that was used to generate them.
+
+Use StanfordParser::StandoffDocumentPreprocessor class to tokenize text and files into sentences and words.
+
+ >> preproc = StanfordParser::StandoffDocumentPreprocessor.new
+ => <StandoffDocumentPreprocessor>
+ >> s = preproc.getSentencesFromString("This is a sentence. So is this.")
+ => [This is a sentence., So is this.]
+
+The standoff preprocessor returns StanfordParser::StandoffToken objects, which contain character offsets into the original text along with information about spacing characters that came before and after the token.
+
+ >> puts s
+ This [0,4]
+ is [5,7]
+ a [8,9]
+ sentence [10,18]
+ . [18,19]
+ So [21,23]
+ is [24,26]
+ this [27,31]
+ . [31,32]
+ >> "This is a sentence. So is this."[27..31]
+ => "this."
+
+This is the same information contained in the <tt>edu.stanford.nlp.ling.FeatureLabel</tt> class in the Stanford Parser Java implementation.
+
+Similarly, use the StanfordParser::StandoffParsedText object to parse a block of text into StanfordParser::StandoffNode parse trees whose terminal nodes are StanfordParser::StandoffToken objects.
+
+ >> t = StanfordParser::StandoffParsedText.new("This is a sentence. So is this.")
+ Loading parser from serialized file /usr/local/stanford-parser/current/englishPCFG.ser.gz ... done [4.9 sec].
+ => <StanfordParser::StandoffParsedText, 2 sentences>
+ >> puts t.first
+ (ROOT
+ (S
+ (NP (DT This [0,4]))
+ (VP (VBZ is [5,7])
+ (NP (DT a [8,9]) (NN sentence [10,18])))
+ (. . [18,19])))
+
+Standoff parse trees can reproduce the text from which they were generated verbatim.
+
+ >> t.first.to_original_string
+ => "This is a sentence. "
+
+They can also reproduce the original text with brackets inserted around the yields of specified parse nodes.
+
+ >> t.first.to_bracketed_string([[0,0,0], [0,1,1]])
+ => "[This] is [a sentence]. "
+
+The format of the coordinates used to specify individual nodes is described in the documentation for the Ruby Treebank[http://rubyforge.org/projects/treebank/] gem.
+
+See the documentation of the individual classes in this module for more details.
+
+Unlike their parents StanfordParser::DocumentPreprocessor and StanfordParser::LexicalizedParser, which produce Ruby wrappers around Java objects, StanfordParser::StandoffDocumentPreprocessor and StanfordParser::StandoffParsedText produce pure Ruby objects. This is to facilitate serialization of these objects using tools like the Marshal module, which cannot serialize Java objects.
+
+= History
+
+1.0.0:: Initial release
+1.1.0:: Make module initialization function private. Add example code.
+1.2.0:: Read Java VM arguments from the configuration file. Add Word class.
+2.0.0:: Add support for standoff parsing. Change the way Rjb::JavaObjectWrapper wraps returned values: see wrap_java_object for details. Rjb::JavaObjectWrapper supports static members. Minor changes to stanford-sentence-parser script.
+2.1.0:: Different default paths for Windows machines; Minor changes to StandoffToken definition
+2.2.0:: Add parent information to StandoffNode
+
+= Copyright
+
+Copyright 2007-2008, William Patrick McNeill
+
+This program is distributed under the GNU General Public License.
+
+
+= Author
+
+W.P. McNeill mailto:billmcn@gmail.com

0 comments on commit 5f8a052

Please sign in to comment.