Skip to content

mmaakh/parsedcorpus7

master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
INTRO:
    Thanks to Dr. Grigori Sidorov et al. [1] for sharing their evaluation
    corpus of 7 authors.

    This repository provides a parsed version of that corpus. As the process is
    computationally expensive, I found sharing it useful.

    For inquiries email m [ta] khonji [tod] org.

    [1] http://www.cic.ipn.mx/~sidorov/


PARSER:
    stanford-parser-full-2014-10-31


JAVA:
    oracle-jre-bin-1.8.0.25


COMMANDS:
    'java' was executd with -xm14000m as some books contained very long
    sentences. To be specific, below is the modified version of Stanford's
    lexparser.sh script that we used to parse the books.

    #!/usr/bin/env bash
    #
    # Runs the English PCFG parser on one or more files, printing trees only
    
    if [ ! $# -ge 1 ]; then
      echo Usage: `basename $0` 'file(s)'
      echo
      exit
    fi
    
    scriptdir=`dirname $0`
    
    java -mx14000m -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser \
     -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz $*

About

A parsed version of the 7-authors corpus in http://www.cic.ipn.mx/~sidorov/

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published