nodirmcsd/WordNinjaSharp

Word Ninja C#

A C#/.NET library that splits concatenated words using NLP based on English Wikipedia unigram frequencies.


This project repackages the excellent work from this Stack Overflow answer: http://stackoverflow.com/a/11642687/2449774

It is a C#/.NET port of https://github.com/keredson/wordninja

Use this library to recover word boundaries in damaged data, for example text whose spaces have been lost.
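
For readers curious how unigram frequencies can drive the split, here is a minimal sketch of the dynamic-programming idea from the linked Stack Overflow answer. It is not the code shipped in this package: `WordSplitSketch`, `BuildCosts`, `Split`, and `UnknownCost` are hypothetical names chosen for illustration. The sketch assumes a word list ordered from most to least frequent and uses Zipf's law to turn each rank into a cost. Unlike the original answer, it charges unknown chunks a large finite penalty, and it has no special handling for digits or punctuation.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; WordSplitSketch is a hypothetical name and is
// not part of WordNinjaSharp.
public static class WordSplitSketch
{
    // Assumption: substrings missing from the list get a large fixed penalty.
    private const double UnknownCost = 1e6;

    // Builds a cost table from a word list ordered from most to least frequent.
    // Under Zipf's law the word at 1-based rank r has probability of roughly
    // 1 / (r * ln N), so its cost is ln(r * ln N).
    public static Dictionary<string, double> BuildCosts(IReadOnlyList<string> rankedWords)
    {
        if (rankedWords == null || rankedWords.Count < 2)
            throw new ArgumentException("At least two words are required.", nameof(rankedWords));

        var costs = new Dictionary<string, double>();
        double logN = Math.Log(rankedWords.Count);
        for (int i = 0; i < rankedWords.Count; i++)
        {
            // Keep the first (most frequent) occurrence of a duplicated word.
            if (!costs.ContainsKey(rankedWords[i]))
                costs[rankedWords[i]] = Math.Log((i + 1) * logN);
        }
        return costs;
    }

    // Finds the segmentation of text with the lowest total cost.
    public static List<string> Split(string text, Dictionary<string, double> costs)
    {
        int maxLen = 1;
        foreach (var word in costs.Keys) maxLen = Math.Max(maxLen, word.Length);

        int n = text.Length;
        var best = new double[n + 1];  // best[i]: lowest cost of splitting the first i characters
        var lastLen = new int[n + 1];  // lastLen[i]: length of the final word in that split

        for (int i = 1; i <= n; i++)
        {
            best[i] = double.PositiveInfinity;
            for (int len = 1; len <= Math.Min(maxLen, i); len++)
            {
                string candidate = text.Substring(i - len, len);
                double wordCost = costs.TryGetValue(candidate, out var c) ? c : UnknownCost;
                double total = best[i - len] + wordCost;
                if (total < best[i])
                {
                    best[i] = total;
                    lastLen[i] = len;
                }
            }
        }

        // Walk backwards through the recorded word lengths to recover the words.
        var words = new List<string>();
        for (int i = n; i > 0; i -= lastLen[i])
            words.Add(text.Substring(i - lastLen[i], lastLen[i]));
        words.Reverse();
        return words;
    }
}
```

With a toy list such as `the, and, quick, brown, fox, dog`, calling `WordSplitSketch.Split("thequickbrownfox", costs)` yields the four words `the`, `quick`, `brown`, `fox`. A real word list would be far larger, which is what lets the frequency ranks resolve ambiguous splits.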

Usage

using System;
using WordNinjaSharp.App;

// Split a run of concatenated words into individual tokens.
var result = WordNinja.Split("thequickbrownfoxjumpsover1978thelazydog");
Console.WriteLine(string.Join(" ", result));

// Output: the quick brown fox jumps over 1978 the lazy dog

Performance

Faster than the alternatives. A single call can be timed as follows:

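using System;
using System.Diagnostics;
using WordNinjaSharp.App;

var sw = Stopwatch.StartNew();
var res = WordNinja.Split("denythyfatherandrefusethyname");
sw.Stop();

Console.WriteLine($"{sw.ElapsedMilliseconds} ms");
Console.WriteLine(string.Join(" ", res));

// Sample output (timing varies by machine):
// 182 ms
// deny thy father and refuse thy name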
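
A single timed call can mix one-time work, such as loading a word list, with the split itself. Whether that applies to the figure above is an assumption, not something this README states. The sketch below reports the first call separately from the average of later calls. `SplitBenchmark` and `Measure` are hypothetical helper names and are not part of WordNinjaSharp.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper for illustration; not part of WordNinjaSharp.
public static class SplitBenchmark
{
    // Times the first invocation on its own, then averages the given number
    // of further invocations. Both results are in milliseconds.
    public static (double FirstCallMs, double AverageMs) Measure(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        var sw = Stopwatch.StartNew();
        action();                       // first call: may include one-time setup
        sw.Stop();
        double firstCallMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();                   // later calls: steady-state cost
        }
        sw.Stop();

        return (firstCallMs, sw.Elapsed.TotalMilliseconds / iterations);
    }
}
```

To time the library, pass the call as a lambda, for example `SplitBenchmark.Measure(() => WordNinja.Split("denythyfatherandrefusethyname"), 1000)`, and compare the two numbers it returns.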

Install

Using the .NET CLI:

dotnet add package WordNinjaSharp

or, from the Package Manager Console:

Install-Package WordNinjaSharp

Custom dictionary

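// Path to your own word list: either a gzip archive or a plain-text file.
var path = @"path/to/your/words/list/gzarchive/or/txtfile";
// Pass the path as the second argument to split against that list.
var res = WordNinja.Split("denythyfatherandrefusethyname", path);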
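
This README does not describe the layout of a custom word list. The sketch below assumes the layout used by the Python wordninja project: one lowercase word per line, ordered from most to least frequent. That assumption should be checked against the library before relying on it. `WordListFile` and `Write` are hypothetical names, and how WordNinjaSharp tells a gzip archive from a text file is also not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; not part of WordNinjaSharp.
public static class WordListFile
{
    // Writes one word per line, in the order given (assumed: most frequent first).
    // When gzip is true the file is written as a gzip archive, otherwise as plain text.
    public static void Write(string path, IEnumerable<string> rankedWords, bool gzip)
    {
        if (rankedWords == null) throw new ArgumentNullException(nameof(rankedWords));

        using (Stream file = File.Create(path))
        using (Stream target = gzip ? new GZipStream(file, CompressionMode.Compress) : file)
        using (var writer = new StreamWriter(target, new UTF8Encoding(false)))
        {
            writer.NewLine = "\n";  // same line endings on every platform
            foreach (var word in rankedWords)
            {
                writer.WriteLine(word.Trim().ToLowerInvariant());
            }
        }
    }
}
```

The resulting path can then be passed as the second argument to `WordNinja.Split`, as in the example above.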
