seanghay/khmercut-rs

khmercut.rs

A Blazingly Fast Khmer Word Segmentation Tool written in Rust.

Build

cargo build --release
# binary file: ./target/release/khmercut

Usage in CLI

echo "ឃាត់ខ្លួនជនសង្ស័យ០៤នាក់ Hello, world ករណីលួចខ្សែភ្លើង នៅស្រុកព្រៃនប់។" | khmercut -d '|'

# => ឃាត់ខ្លួន|ជនសង្ស័យ|០៤|នាក់| |Hello,| |world| |ករណី|លួច|ខ្សែភ្លើង| |នៅ|ស្រុក|ព្រៃនប់|។|

# read input from a file

khmercut < file.txt
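
A program that consumes this output needs to split it back into tokens. The following is a minimal sketch in Rust, assuming the format shown in the example above, where every token (including the last) is followed by the delimiter. `split_tokens` is a hypothetical helper written for illustration and is not part of khmercut.

```rust
/// Hypothetical helper (not part of khmercut): splits one line of
/// delimiter-separated output back into its tokens.
fn split_tokens(line: &str, delimiter: char) -> Vec<&str> {
    // Drop the single trailing delimiter so it does not yield an empty token.
    let trimmed = line.strip_suffix(delimiter).unwrap_or(line);
    if trimmed.is_empty() {
        return Vec::new();
    }
    // Whitespace tokens such as " " are kept as-is.
    trimmed.split(delimiter).collect()
}

fn main() {
    let line = "នៅ|ស្រុក|ព្រៃនប់|។|";
    let tokens = split_tokens(line, '|');
    println!("{:?}", tokens);
}
```

This only round-trips cleanly when the delimiter does not occur in the input text itself, so a character that is unlikely to appear in the text is the safer choice for `-d`.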

Rust

use std::fs;
use crfs::Model;

fn main() {
    // Load the CRF model file into memory
    let buf = fs::read("src/crf_ner_10000.crfsuite").unwrap();
    let model = Model::new(&buf).unwrap();
    let input_str = "ឃាត់ខ្លួនជនសង្ស័យ០៤នាក់ Hello, world ករណីលួចខ្សែភ្លើង នៅស្រុកព្រៃនប់។".to_string();
    // Print each token followed by the "|" delimiter
    for token in khmercut::tokenize(&model, &input_str) {
        print!("{}|", token);
    }
}
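
To collect the segmented text into a `String` instead of printing it, the tokens can be joined in the same way. The sketch below is an assumption-laden illustration: the item type returned by `khmercut::tokenize` is not documented here, so the hypothetical helper `join_tokens` accepts any iterator of string-like values and is exercised with hard-coded tokens rather than a real model.

```rust
/// Hypothetical helper (not part of khmercut): appends the delimiter after
/// every token, matching the `print!("{}|", token)` loop above.
fn join_tokens<I, S>(tokens: I, delimiter: &str) -> String
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut out = String::new();
    for token in tokens {
        out.push_str(token.as_ref());
        out.push_str(delimiter);
    }
    out
}

fn main() {
    // Hard-coded tokens standing in for the tokenizer's output.
    let tokens = vec!["នៅ", "ស្រុក", "ព្រៃនប់", "។"];
    println!("{}", join_tokens(tokens, "|"));
}
```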
