murawaki/nbseg-perl

Obsolete Perl code for non-parametric Bayesian text segmentation

This repository contains Perl code I no longer use, including

  • a group of Dirichlet/Pitman-Yor processes,
  • a character-bigram-based zerogram word model, and
  • unigram/bigram word models with token-based, block and type-based sampling.
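For readers unfamiliar with the processes named above, the standard Chinese-restaurant form of the Pitman-Yor predictive distribution is sketched below. This is the textbook formulation, not taken from this repository's code; the symbols (discount d, strength θ, base distribution G_0, customer count c_w and table count t_w for word w, total customers n, total tables K) are notation assumed here for illustration and may not match the names used in lib.

```latex
% Predictive probability of word w given the current seating arrangement.
% Assumed notation: 0 \le d < 1, \theta > -d, n = \sum_w c_w, K = \sum_w t_w.
P(w \mid \text{seating})
  = \frac{c_w - d\, t_w}{\theta + n}
  + \frac{\theta + d\, K}{\theta + n}\, G_0(w)
```

Setting d = 0 recovers the Dirichlet process. In a segmentation setting of this kind, G_0 for the word model would typically be a character-level model such as the zerogram model listed above, though the README does not spell out how the models are wired together.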

Requirements

The following CPAN modules are required:

  • Math::GSL
  • Math::Cephes
  • Regexp::Assemble
  • Carp::Assert

Run a sample script

% perl -Ilib scripts/sample-token.pl --seed=1 --type=Dirichlet --input=samples/alice.unseg --iter=100 --nested --debug --randInit=0.1
