A C#/.NET library for splitting concatenated words, using NLP based on English Wikipedia unigram frequencies.
This project repackages the excellent work from http://stackoverflow.com/a/11642687/2449774
It is a C#/.NET port of https://github.com/keredson/wordninja
Use it to split text that has lost its spaces, such as damaged data.
using WordNinjaSharp.App;
var result = WordNinja.Split("thequickbrownfoxjumpsover1978thelazydog");
Console.WriteLine(string.Join(" ", result));
//the quick brown fox jumps over 1978 the lazy dog
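The linked Stack Overflow answer, and wordninja after it, describe the idea behind the split. Each dictionary word gets a cost derived from Zipf's law based on its frequency rank (the word at 0-based rank i costs log((i + 1) * log(N)) for a list of N words). Dynamic programming then picks the split with the lowest total cost. The following is a minimal, self-contained sketch of that idea. It is not WordNinjaSharp's actual implementation. The helper name `SplitSketch`, the tiny in-memory word list and the per-character penalty for unknown substrings are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tiny illustrative word list, ASSUMED to be ordered from most to least frequent.
string[] words = { "the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog" };
var result = SplitSketch("thequickbrownfoxjumpsoverthelazydog", words);
Console.WriteLine(string.Join(" ", result)); // the quick brown fox jumps over the lazy dog

// Hypothetical helper (not WordNinjaSharp's API): Zipf-cost dynamic programming.
static List<string> SplitSketch(string text, IReadOnlyList<string> wordsByFrequency)
{
    // The word at 0-based rank i costs log((i + 1) * log(N)); requires N >= 2 words.
    double logN = Math.Log(wordsByFrequency.Count);
    var wordCost = new Dictionary<string, double>();
    for (int i = 0; i < wordsByFrequency.Count; i++)
        wordCost.TryAdd(wordsByFrequency[i], Math.Log((i + 1) * logN)); // first (best) rank wins
    int maxWord = wordsByFrequency.Max(w => w.Length);
    // ASSUMED penalty per character of any substring not in the list, so every
    // position stays reachable and the number of unknown characters is minimized.
    const double UnknownCostPerChar = 1e9;

    string s = text.ToLowerInvariant();
    var best = new double[s.Length + 1]; // best[i]: cheapest cost of splitting s[0..i)
    var last = new int[s.Length + 1];    // last[i]: length of the final word in that split
    for (int i = 1; i <= s.Length; i++)
    {
        best[i] = double.PositiveInfinity;
        for (int k = 1; k <= Math.Min(i, maxWord); k++)
        {
            string candidate = s.Substring(i - k, k);
            double c = best[i - k] + (wordCost.TryGetValue(candidate, out var wc) ? wc : UnknownCostPerChar * k);
            if (c < best[i]) { best[i] = c; last[i] = k; }
        }
    }

    // Walk back from the end of the string, collecting the chosen words.
    var parts = new List<string>();
    for (int i = s.Length; i > 0; i -= last[i])
        parts.Add(s.Substring(i - last[i], last[i]));
    parts.Reverse();
    return parts;
}
```

Frequent words are cheap and rare words are expensive, so the cheapest split favors a few common words over many rare fragments. Looking back at most `maxWord` characters keeps the work linear in the input length for a fixed dictionary.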
The author reports it to be faster than other implementations. A simple timing example:
using System.Diagnostics;
using WordNinjaSharp.App;
var sw = Stopwatch.StartNew();
var res = WordNinja.Split("denythyfatherandrefusethyname");
sw.Stop();
Console.WriteLine($"{sw.ElapsedMilliseconds} ms");
Console.WriteLine(string.Join(" ", res));
//182 ms (example output; timings vary by machine)
//deny thy father and refuse thy name
Install from NuGet with the .NET CLI:
dotnet add package WordNinjaSharp
or from the Package Manager Console:
Install-Package WordNinjaSharp
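To use your own word list, pass the path to a gzip archive or a plain-text file as the second argument:
var path = @"path/to/your/words/list/gzarchive/or/txtfile";
var res = WordNinja.Split("denythyfatherandrefusethyname", path);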
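The expected layout of a custom word list is not documented here. Assuming it mirrors wordninja (one word per line, ordered from most to least frequent, optionally gzip-compressed), the following sketch writes such a list and reads it back. `WriteWordList`, `ReadWordList` and the temporary file name are hypothetical helpers for illustration, not part of WordNinjaSharp.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical file name, used only for this sketch.
string path = Path.Combine(Path.GetTempPath(), "my_words.txt.gz");
// ASSUMED format: one word per line, most frequent first.
string[] ordered = { "the", "of", "and", "custom", "jargon" };

WriteWordList(path, ordered);
Console.WriteLine(string.Join(",", ReadWordList(path))); // the,of,and,custom,jargon

// Hypothetical helper: writes the words, one per line, into a gzip file.
static void WriteWordList(string file, IEnumerable<string> words)
{
    using var fs = File.Create(file);
    using var gz = new GZipStream(fs, CompressionLevel.Optimal);
    using var writer = new StreamWriter(gz, new UTF8Encoding(false));
    foreach (var w in words) writer.WriteLine(w);
}

// Hypothetical helper: reads a gzip (".gz") or plain-text list, skipping blank lines.
static List<string> ReadWordList(string file)
{
    Stream stream = File.OpenRead(file);
    if (file.EndsWith(".gz", StringComparison.OrdinalIgnoreCase))
        stream = new GZipStream(stream, CompressionMode.Decompress);
    using var reader = new StreamReader(stream, Encoding.UTF8);
    var result = new List<string>();
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        string word = line.Trim();
        if (word.Length > 0) result.Add(word);
    }
    return result;
}
```

A file produced this way could then be passed as `path` to `WordNinja.Split(text, path)`. Check the library's bundled list to confirm the format before relying on it.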