Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
C++ Ruby C Perl Shell
Fetching latest commit…
Cannot retrieve the latest commit at this time.
SS MaxEnt¤òÍÑ¤¤¤¿Sentence Splitter * How to use 1) make 2) ./geniass arg1 arg2 arg1 is a target file to split. arg2 is an output file name. You need to run geniass in the directory which has EventExtracter.rb, Classifying2Splitting.rb, model1-1.0. If you want to get stand-off format file, please run 3) ruby sentence2standOff.rb arg1 arg2 arg3 arg1 and arg2 are same with 2). arg3 is an output stand-off file name. ------------ SS MaxEnt This is a simple C++ class library for maximum entropy classifiers. If you are familiar with C++ and STL, you will easily understand how to use the library by having a look at the sample code. The main features of this library are: - fast parameter estimation using the BLMVM algorithm (Benson and More, 2001) - smoothing with Gausian prior (Chen and Rosenfeld, 1999) - modelling with inequality constraints (Kazama and Tsujii, 2003) - saving/loading the model to/from a file - can integrate the model data into your source code. * How to use 1) make - if you encounter errors with hash, try commenting out #define USE_HASH_MAP in "maxent.h". 2) ./a.out 3) see sample.cpp and maxent.h * Tips 1) If you have many samples for training, use a portion of the data as held-out data to see if overfitting is happening or not. ex.) model.set_heldout(1000); 2) If you see overfitting, try one of the followings: - feature cut-off ex.) model.train(3); - Gausian prior ex.) model.train(0, 1000, 0); - inequality constrains ex.) model.train(0, 0, 1.0); * I like the third one because it produces a compact model and gives equally good performance with gausian prior. 3) If you want to integrate the generated model file into your code, see model2c.cpp. * References  Jun'ichi Kazama and Jun'ichi Tsujii, Evaluation and Extension of Maximum Entropy Models with Inequality Constraints, In the Proceedings of EMNLP 2003, pp. 137-144.  Steven J. Benson and Jorge J. More, A Limited-Memory Variable-Metric Method for Bound-Constrained Minimization, Preprint ANL/MCS-P909-0901 http://www-unix.mcs.anl.gov/~benson/blmvm/  Stanley F. Chen and Ronald Rosenfeld, A Gaussian Prior for Smoothing Maximum Entropy Models, Technical Report CMU-CS-99-108, Computer Science Department, Carnegie Mellon University, 1999. * History 2005 Jul. 8 version 1.2.2 - initial public release 2005 Sep. 13 version 1.3 - requires less memory in training 2005 Sep. 13 version 1.3.1 - update README 2005 Oct. 28 version 1.3.2 - fix for overflow (thanks to Ming Li) ------------------------------------------------------------------------- Yoshimasa Tsuruoka (email@example.com)