ntextcat

Why NTextCat?

  • NTextCat identifies the language of a given text (e.g. it reads a sentence and reports that it is Italian).
  • NTextCat can also be used for text classification (e.g. it reads a paragraph and reports that it belongs to the Sports category).

Try it out yourself: ONLINE DEMO. Recommended input: a snippet of text with at least 5 words, though it works reasonably well with just a couple of words.

How to use

NTextCat supports .NET Standard 2.0. Install the NTextCat NuGet package:

dotnet add package NTextCat

Then we can use NTextCat to detect the language of a text.

using System;       // Console
using System.Linq;  // FirstOrDefault()
using NTextCat;
...
// Don't forget to deploy a language profile (e.g. Core14.profile.xml) with your application.
// (Take a look at the "content" folder inside the NTextCat nupkg and here: https://github.com/ivanakcheurov/ntextcat/tree/master/src/LanguageModels)
var factory = new RankedLanguageIdentifierFactory();
// The path can be absolute or relative. Beware of the 260-character path length limit on Windows; Linux allows 4096 characters.
var identifier = factory.Load("Core14.profile.xml");
var languages = identifier.Identify("your text to get its language identified");
var mostCertainLanguage = languages.FirstOrDefault();
if (mostCertainLanguage != null)
    Console.WriteLine("The language of the text is '{0}' (ISO639-3 code)", mostCertainLanguage.Item1.Iso639_3);
else
    Console.WriteLine("The language couldn’t be identified with an acceptable degree of certainty");

// outputs: The language of the text is 'eng' (ISO639-3 code)
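
The comment in the snippet above asks you to deploy a language profile with your application. One possible way to do that in an SDK-style project is to let the build copy the file to the output folder. The fragment below is only a sketch: it assumes you have already copied Core14.profile.xml into the project directory next to your .csproj file, and that location is an assumption of this example, not something NTextCat requires.

```xml
<!-- Sketch of a .csproj fragment (SDK-style project).
     Assumption: Core14.profile.xml sits next to the .csproj file. -->
<ItemGroup>
  <None Update="Core14.profile.xml">
    <!-- Copy the profile to the build output folder whenever it is newer than the copy there. -->
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```

With this in place the profile ends up beside the compiled application. A relative path such as "Core14.profile.xml" is resolved against the process working directory, so if the application may be started from a different folder, consider passing an absolute path to factory.Load instead.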
