Obsolete Perl code for non-parametric Bayesian text segmentation

This repository contains Perl code I no longer use, including

  • a group of Dirichlet/Pitman-Yor processes,
  • a character-bigram-based zerogram word model, and
  • unigram/bigram word models with token-based, block, and type-based sampling.

Requirements

The following CPAN modules are required:

  • Math::GSL
  • Math::Cephes
  • Regexp::Assemble
  • Carp::Assert
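
Before running anything, it can help to confirm that the local perl is able to load each of the modules listed above. The following is a minimal sketch, not part of the repository: `check_modules` is a hypothetical helper name, and the check relies only on the standard `perl -MModule -e 1` idiom, which exits with a non-zero status when a module cannot be loaded.

```shell
#!/bin/sh
# Report which of the given Perl modules the local perl cannot load.
# check_modules is a hypothetical helper, not something this repository ships.
check_modules() {
    missing=""
    for mod in "$@"; do
        # "perl -M<Module> -e 1" exits non-zero if the module fails to load
        # (or if perl itself is not installed).
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            missing="$missing $mod"
        fi
    done
    # Print the missing modules separated by single spaces (empty line if none).
    echo $missing
}

# Check the four modules named in the Requirements list.
check_modules Math::GSL Math::Cephes Regexp::Assemble Carp::Assert
```

An empty line means all four modules load. Any names printed can be installed with a CPAN client of your choice. Note that Math::GSL is a binding to the GNU Scientific Library, so the underlying C library most likely has to be installed first.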

Run a sample script

% perl -Ilib scripts/sample-token.pl --seed=1 --type=Dirichlet --input=samples/alice.unseg --iter=100 --nested --debug --randInit=0.1

About

non-parametric Bayesian text segmenter implemented in Perl
