This repository was archived by the owner on Jul 23, 2024. It is now read-only.

mozilla/ssplit-cpp

ssplit-cpp

This is an approximate reimplementation of the sentence splitter from the Moses toolkit.

  • Currently doesn't support CJK character sets.
  • Requires the pcrecpp libraries. On Ubuntu, installing libpcre3 and libpcre3-cpp with sudo apt-get install should do the trick.

Build instructions

mkdir build
cd build
cmake ..
make -j

This produces an executable ssplit.

Usage

cat <text with one paragraph per line> | ssplit <path to nonbreaking_prefix file>
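
To illustrate what the nonbreaking_prefix file is for, here is a minimal sketch of prefix-aware sentence splitting. It is not the code in this repository: it uses only the standard library rather than pcrecpp, the names load_prefixes and split_sentences are hypothetical, and it assumes the Moses-style file layout of one prefix per line with '#' comment lines. Tags such as #NUMERIC_ONLY# are ignored here, which is a simplification.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of ssplit-cpp): read prefixes from a stream,
// one per line. Blank lines and lines starting with '#' are skipped.
std::set<std::string> load_prefixes(std::istream& in) {
    std::set<std::string> prefixes;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        // Keep only the first whitespace-delimited field; any trailing tag
        // on the line is dropped (assumption: simplified handling).
        std::istringstream fields(line);
        std::string prefix;
        if (fields >> prefix) prefixes.insert(prefix);
    }
    return prefixes;
}

// Hypothetical helper: split one paragraph into sentences. A token ending in
// '.', '?' or '!' closes a sentence unless it is a known prefix followed by
// a period, or the next token does not start with an upper-case letter.
std::vector<std::string> split_sentences(const std::string& paragraph,
                                         const std::set<std::string>& prefixes) {
    std::istringstream words(paragraph);
    std::vector<std::string> tokens;
    for (std::string w; words >> w;) tokens.push_back(w);

    std::vector<std::string> sentences;
    std::string current;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& tok = tokens[i];
        if (!current.empty()) current += ' ';
        current += tok;

        const char last = tok.back();
        bool boundary = (last == '.' || last == '?' || last == '!');
        // "Mr." and similar: the period belongs to the abbreviation.
        if (boundary && last == '.' &&
            prefixes.count(tok.substr(0, tok.size() - 1)) > 0) {
            boundary = false;
        }
        // Only break when the following token starts with an upper-case letter.
        if (boundary && i + 1 < tokens.size() &&
            !std::isupper(static_cast<unsigned char>(tokens[i + 1][0]))) {
            boundary = false;
        }
        // The last token always flushes whatever text remains.
        if (boundary || i + 1 == tokens.size()) {
            sentences.push_back(current);
            current.clear();
        }
    }
    return sentences;
}
```

With a prefix list containing Mr and Dr, the paragraph "Mr. Smith arrived. He met Dr. Jones at 5 p.m. sharp." is split after "arrived." only. The real splitter handles many more cases (quotes, brackets, numeric-only prefixes), so treat this as an explanation of the idea, not a description of ssplit's behaviour.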