Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
A parsed version of the 7-authors corpus in http://www.cic.ipn.mx/~sidorov/
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Type||Name||Latest commit message||Commit time|
|Failed to load latest commit information.|
INTRO: Thanks to Dr. Grigori Sidorov et al.  for sharing their evaluation corpus of 7 authors. This repository provides a parsed version of that corpus. As the process is computationally expensive, I found sharing it useful. For inquiries email m [ta] khonji [tod] org.  http://www.cic.ipn.mx/~sidorov/ PARSER: stanford-parser-full-2014-10-31 JAVA: oracle-jre-bin-184.108.40.206 COMMANDS: 'java' was executd with -xm14000m as some books contained very long sentences. To be specific, below is the modified version of Stanford's lexparser.sh script that we used to parse the books. #!/usr/bin/env bash # # Runs the English PCFG parser on one or more files, printing trees only if [ ! $# -ge 1 ]; then echo Usage: `basename $0` 'file(s)' echo exit fi scriptdir=`dirname $0` java -mx14000m -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser \ -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz $*