Unofficial GENIA Sentence Splitter repository (no official website even exists anymore)
.gitignore
Classifying2Splitting.rb
EventExtracter.rb
LICENSE
Makefile
README
blmvm.cpp
blmvm.h
geniass-postproc.pl
makeMapping.C
maxent.cpp
maxent.h
model1-0.0
model1-1.0
model2-0.0
model2-0.5
model2-1.0
model2-1.5
model2-2.0
model2c.cpp
remapStandOff.C
run_geniass.sh
sample.cpp
sentence2standOff.rb

README

Sentence Splitter using SS MaxEnt

* How to use

1) make
2) ./geniass arg1 arg2

arg1 is a target file to split.
arg2 is an output file name.

Run geniass from the directory that contains
EventExtracter.rb, Classifying2Splitting.rb, and model1-1.0.

To get the output in stand-off format, run

3) ruby sentence2standOff.rb arg1 arg2 arg3

arg1 and arg2 are the same as in 2).
arg3 is the output stand-off file name.
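
For readers unfamiliar with stand-off annotation, here is a minimal C++
sketch of the general idea: each sentence is recorded as a pair of offsets
into the original text instead of as a copy of the text. This is not the
code in sentence2standOff.rb, and the layout of that script's output file
is not described here. The names Span and sentence_spans are hypothetical,
and offsets are counted in bytes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Half-open byte range [begin, end) into the original text.
// Hypothetical type, used only for this illustration.
struct Span {
    std::size_t begin;
    std::size_t end;
};

// Locate each sentence in `text`, searching left to right so that
// repeated sentences map to successive occurrences.
// Sentences that cannot be found verbatim are skipped.
std::vector<Span> sentence_spans(const std::string& text,
                                 const std::vector<std::string>& sentences) {
    std::vector<Span> spans;
    std::size_t cursor = 0;
    for (const std::string& s : sentences) {
        if (s.empty()) continue;
        const std::size_t pos = text.find(s, cursor);
        if (pos == std::string::npos) continue;  // not found: skip it
        spans.push_back({pos, pos + s.size()});
        cursor = pos + s.size();  // next search starts after this one
    }
    return spans;
}
```

Given the original text and the split sentences, the returned spans can be
written out one per line in whatever stand-off layout a downstream tool
expects.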

------------

SS MaxEnt

This is a simple C++ class library for maximum entropy classifiers.
If you are familiar with C++ and STL, you will easily understand how
to use the library by having a look at the sample code.
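
As background, the following sketch shows what a maximum entropy classifier
computes at prediction time: the probability of a label is proportional to
the exponential of the summed weights of the (label, feature) pairs that are
active. It is a standalone illustration, not the library's API or internal
data layout. The names Weights and maxent_probs are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical weight table: (label, feature) -> weight.
using Weights = std::map<std::pair<std::string, std::string>, double>;

// Returns p(label | features) for each label, in the order of `labels`.
std::vector<double> maxent_probs(const Weights& weights,
                                 const std::vector<std::string>& labels,
                                 const std::vector<std::string>& features) {
    std::vector<double> probs(labels.size(), 0.0);
    if (labels.empty()) return probs;

    // Score of a label = sum of the weights of its active features.
    for (std::size_t i = 0; i < labels.size(); ++i) {
        for (const std::string& f : features) {
            auto it = weights.find(std::make_pair(labels[i], f));
            if (it != weights.end()) probs[i] += it->second;
        }
    }

    // Subtract the largest score before exponentiating so that
    // exp() cannot overflow, then normalize to sum to one.
    const double max_score = *std::max_element(probs.begin(), probs.end());
    double z = 0.0;
    for (double& p : probs) {
        p = std::exp(p - max_score);
        z += p;
    }
    for (double& p : probs) p /= z;
    return probs;
}
```

Training consists of choosing the weights; see sample.cpp and maxent.h for
how the library itself is driven.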

The main features of this library are:
 - fast parameter estimation using the BLMVM algorithm (Benson and More, 2001)
 - smoothing with a Gaussian prior (Chen and Rosenfeld, 1999)
 - modelling with inequality constraints (Kazama and Tsujii, 2003)
 - saving/loading the model to/from a file
 - integrating the model data into your source code


* How to use

1) make
     - if you encounter errors with hash, try commenting out
         #define USE_HASH_MAP
       in "maxent.h".
2) ./a.out
3) see sample.cpp and maxent.h


* Tips

1) If you have many samples for training, use a portion of the data
   as held-out data to check whether overfitting is occurring.
     ex.) model.set_heldout(1000);

2) If you see overfitting, try one of the following:
     - feature cut-off         ex.) model.train(3);
     - Gaussian prior          ex.) model.train(0, 1000, 0);
     - inequality constraints  ex.) model.train(0, 0, 1.0);
   * I like the third one because it produces a compact model and
     performs as well as the Gaussian prior.

3) If you want to integrate the generated model file into your code,
   see model2c.cpp.
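
The ideas behind tips 1 and 2 can be sketched without the library. The code
below sets aside the last n samples as held-out data and drops features that
occur too rarely. Both functions are illustrations under stated assumptions,
not the library's implementation: which samples set_heldout() actually
selects, and whether the cut-off in train() counts features per label or
uses a strict threshold, are not described in this README. The names
split_heldout, apply_cutoff and FeatureList are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Split samples into (training, held-out), taking the LAST n_heldout
// samples as held-out data. Assumption: any fixed subset would do.
template <typename Sample>
std::pair<std::vector<Sample>, std::vector<Sample> >
split_heldout(const std::vector<Sample>& samples, std::size_t n_heldout) {
    const std::size_t n = std::min(n_heldout, samples.size());
    const std::size_t n_train = samples.size() - n;
    std::vector<Sample> train(samples.begin(), samples.begin() + n_train);
    std::vector<Sample> heldout(samples.begin() + n_train, samples.end());
    return std::make_pair(train, heldout);
}

using FeatureList = std::vector<std::string>;

// Feature cut-off: keep only features that occur more than `cutoff`
// times over all samples. Assumption: counts ignore the label.
std::vector<FeatureList> apply_cutoff(const std::vector<FeatureList>& samples,
                                      int cutoff) {
    std::map<std::string, int> count;
    for (const FeatureList& s : samples)
        for (const std::string& f : s) ++count[f];

    std::vector<FeatureList> out;
    out.reserve(samples.size());
    for (const FeatureList& s : samples) {
        FeatureList kept;
        for (const std::string& f : s)
            if (count[f] > cutoff) kept.push_back(f);
        out.push_back(kept);
    }
    return out;
}
```

Overfitting typically shows up as accuracy that keeps improving on the
training part while it stalls or drops on the held-out part.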


* References

[1] Jun'ichi Kazama and Jun'ichi Tsujii, Evaluation and Extension of
    Maximum Entropy Models with Inequality Constraints, In the
    Proceedings of EMNLP 2003, pp. 137-144.

[2] Steven J. Benson and Jorge J. More, A Limited-Memory Variable-Metric
    Method for Bound-Constrained Minimization, Preprint ANL/MCS-P909-0901
    http://www-unix.mcs.anl.gov/~benson/blmvm/

[3] Stanley F. Chen and Ronald Rosenfeld, A Gaussian Prior for Smoothing
    Maximum Entropy Models, Technical Report CMU-CS-99-108, Computer
    Science Department, Carnegie Mellon University, 1999.


* History

2005 Jul. 8  version 1.2.2
 - initial public release

2005 Sep. 13 version 1.3
 - requires less memory in training

2005 Sep. 13 version 1.3.1
 - update README

2005 Oct. 28 version 1.3.2
 - fix for overflow (thanks to Ming Li)

-------------------------------------------------------------------------
Yoshimasa Tsuruoka (tsuruoka@is.s.u-tokyo.ac.jp)