A pure C# implementation of YAKE (Yet Another Keyword Extractor) — an unsupervised, lightweight, and language-agnostic keyword extraction algorithm for single documents.
Lower score = more relevant keyword.
- ✅ No external dependencies — 100% pure C#
- ✅ Unsupervised — no training data or models needed
- ✅ Language-agnostic — works with any language when paired with custom stopwords
- ✅ N-gram support — extracts single words and multi-word keyphrases
- ✅ Built-in English stopwords
- ✅ Deduplication of near-duplicate candidates
- ✅ Fully configurable via `YakeOptions`
- ✅ .NET 10

Install the package from NuGet:

```bash
dotnet add package Yake.NET
```

Basic usage:

```csharp
using Yake.NET;

var extractor = new YakeExtractor();
var keywords = extractor.Extract("Your document text goes here...");

foreach (var kw in keywords)
    Console.WriteLine(kw); // e.g. "keyword extraction (score: 0.001234)"
```

Custom options:

```csharp
var extractor = new YakeExtractor(new YakeOptions
{
    MaxNGramSize = 3,              // include up to trigrams
    TopN = 10,                     // return top 10 keywords
    DeduplicationThreshold = 0.9,  // similarity threshold for dedup (0–1)
    WindowSize = 2,                // co-occurrence window
    Stopwords = null               // null = use built-in English list
});
```

Custom stopwords:

```csharp
// Extend the built-in English stopwords
var customStopwords = StopwordsEn.Words.Concat(["custom", "words"]);

var extractor = new YakeExtractor(new YakeOptions
{
    Stopwords = customStopwords
});
```

YAKE scores each candidate keyword using 5 statistical features:

| Feature | Description |
|---|---|
| T_Casing | Ratio of capitalised (non-sentence-start) occurrences |
| T_Position | Inverse log of the sentence where the word first appears |
| T_Frequency | Normalised term frequency |
| T_Relatedness | Diversity of co-occurring words relative to frequency |
| T_DifferentSentence | Fraction of sentences containing the word |
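
To make two of the features in the table concrete, here is a minimal sketch that computes `T_DifferentSentence` and `T_Casing` as the table describes them. It is an illustration only, not the library's internal code: the `FeatureSketch` class, its method names, and the naive sentence and word splitting are assumptions made for this example.

```csharp
using System;
using System.Linq;

string text = "Keyword extraction finds key terms. " +
              "The YAKE algorithm is a keyword extractor. It needs no training.";

// Naive splitting on punctuation and spaces; a real tokenizer would be more careful.
string[][] sentences = text
    .Split(new[] { '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Split(new[] { ' ', ',', ';', ':' }, StringSplitOptions.RemoveEmptyEntries))
    .Where(words => words.Length > 0)
    .ToArray();

Console.WriteLine(FeatureSketch.DifferentSentence(sentences, "keyword"));
Console.WriteLine(FeatureSketch.Casing(sentences, "yake"));

// Hypothetical helper class, written only for this illustration.
static class FeatureSketch
{
    // Fraction of sentences that contain the word (case-insensitive match).
    public static double DifferentSentence(string[][] sentences, string word) =>
        sentences.Length == 0
            ? 0.0
            : (double)sentences.Count(s =>
                  s.Any(w => w.Equals(word, StringComparison.OrdinalIgnoreCase)))
              / sentences.Length;

    // Share of the word's occurrences that start with an upper-case letter,
    // not counting occurrences in sentence-initial position.
    public static double Casing(string[][] sentences, string word)
    {
        int total = 0, capitalised = 0;
        foreach (var sentence in sentences)
        {
            for (int i = 0; i < sentence.Length; i++)
            {
                if (!sentence[i].Equals(word, StringComparison.OrdinalIgnoreCase)) continue;
                total++;
                if (i > 0 && char.IsUpper(sentence[i][0])) capitalised++;
            }
        }
        return total == 0 ? 0.0 : (double)capitalised / total;
    }
}
```

In the sample text, "keyword" appears in two of the three sentences, and the single occurrence of "YAKE" is capitalised away from the sentence start.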

The final score formula for n-grams:

`score(candidate) = ∏(word_scores) / (n × (n + Σ word_scores))`
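
The formula above can be sketched directly in code. The text does not define `n`, so this sketch takes it as an explicit argument and, in the usage line, assumes it is the number of words in the candidate. `ScoreSketch` and `CandidateScore` are hypothetical names, not part of the package.

```csharp
using System;
using System.Linq;

// Two word scores for a two-word candidate (made-up values).
double[] wordScores = { 0.2, 0.5 };
Console.WriteLine(ScoreSketch.CandidateScore(wordScores, n: 2));

// Hypothetical helper class, written only for this illustration.
static class ScoreSketch
{
    // score = product(wordScores) / (n * (n + sum(wordScores)))
    public static double CandidateScore(double[] wordScores, int n)
    {
        if (wordScores.Length == 0)
            throw new ArgumentException("At least one word score is required.", nameof(wordScores));
        if (n <= 0)
            throw new ArgumentOutOfRangeException(nameof(n), "n must be positive.");

        double product = wordScores.Aggregate(1.0, (acc, s) => acc * s);
        double sum = wordScores.Sum();
        return product / (n * (n + sum));
    }
}
```

With these inputs the product is 0.1 and the sum is 0.7, so the score is 0.1 / (2 × 2.7) ≈ 0.0185. Because lower scores rank higher, a candidate made of low-scoring (relevant) words ends up near the top.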

`YakeExtractor` members:

| Member | Description |
|---|---|
| `YakeExtractor()` | Creates an extractor with the default English options |
| `YakeExtractor(YakeOptions)` | Creates an extractor with custom options |
| `IReadOnlyList<KeywordResult> Extract(string text)` | Extracts keywords from the text |


`KeywordResult` properties:

| Property | Type | Description |
|---|---|---|
| `Keyword` | `string` | The extracted keyword or keyphrase |
| `Score` | `double` | YAKE score (lower = more relevant) |

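
Since a lower `Score` means a more relevant keyword, results are typically sorted ascending and cut off at a maximum score. The sketch below shows that pattern with LINQ. To stay self-contained it declares its own stand-in `KeywordResult` record with the two documented properties; the real type ships with the package. `ResultSketch`, `BestFirst`, and the sample scores are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data with invented scores.
IReadOnlyList<KeywordResult> results = new List<KeywordResult>
{
    new("keyword extraction", 0.0012),
    new("document", 0.0450),
    new("lightweight", 0.0210),
};

foreach (var kw in ResultSketch.BestFirst(results, maxScore: 0.03))
    Console.WriteLine($"{kw.Keyword}: {kw.Score:F4}");

// Stand-in for the package's KeywordResult (assumption: only Keyword and Score matter here).
record KeywordResult(string Keyword, double Score);

// Hypothetical helper class, written only for this illustration.
static class ResultSketch
{
    // Keeps results at or below maxScore, most relevant (lowest score) first.
    public static List<KeywordResult> BestFirst(IEnumerable<KeywordResult> results, double maxScore) =>
        results.Where(r => r.Score <= maxScore)
               .OrderBy(r => r.Score)
               .ToList();
}
```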

`YakeOptions` properties:

| Property | Default | Description |
|---|---|---|
| `MaxNGramSize` | `3` | Maximum n-gram length |
| `TopN` | `10` | Number of keywords to return |
| `DeduplicationThreshold` | `0.9` | Similarity cutoff for deduplication |
| `WindowSize` | `2` | Co-occurrence context window size |
| `Stopwords` | `null` | Custom stopwords (`null` = English built-ins) |

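
To illustrate how a similarity cutoff such as `DeduplicationThreshold` can work, the sketch below walks a ranked candidate list and drops any candidate that is too similar to one already kept. The similarity measure here (1 minus the normalised Levenshtein distance) and the `>=` comparison are assumptions for this example; this README does not state which measure the library uses. `DedupSketch` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Candidates ordered best-first (lowest score first).
var ranked = new[] { "keyword extraction", "keyword extractions", "stopwords" };
foreach (var phrase in DedupSketch.Deduplicate(ranked, threshold: 0.9))
    Console.WriteLine(phrase);

// Hypothetical helper class, written only for this illustration.
static class DedupSketch
{
    // Keeps a candidate only if it is not too similar to any better-ranked one.
    public static List<string> Deduplicate(IEnumerable<string> rankedCandidates, double threshold)
    {
        var kept = new List<string>();
        foreach (var candidate in rankedCandidates)
        {
            bool nearDuplicate = kept.Exists(k => Similarity(k, candidate) >= threshold);
            if (!nearDuplicate) kept.Add(candidate);
        }
        return kept;
    }

    // 1.0 for identical strings, approaching 0.0 as the edit distance grows.
    public static double Similarity(string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        return maxLen == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / maxLen;
    }

    // Classic two-row dynamic-programming edit distance.
    static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            (previous, current) = (current, previous);
        }
        return previous[b.Length];
    }
}
```

Here "keyword extractions" differs from "keyword extraction" by one character out of 19, giving a similarity of about 0.947, so it is dropped at a threshold of 0.9 while "stopwords" is kept. Raising the threshold keeps more near-duplicates; lowering it removes more.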
- Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2020). YAKE! Keyword extraction from single documents using multiple local features. Information Sciences, 509, 257–289.

License: MIT