Helius01/Yake.NET

Yake.NET

A pure C# implementation of YAKE (Yet Another Keyword Extractor) — an unsupervised, lightweight, and language-agnostic keyword extraction algorithm for single documents.

Keywords are ranked by ascending score: the lower the score, the more relevant the keyword.


Features

  • ✅ No external dependencies — 100% pure C#
  • ✅ Unsupervised — no training data or models needed
  • ✅ Language-agnostic — works with any language when paired with custom stopwords
  • ✅ N-gram support — extracts single words and multi-word keyphrases
  • ✅ Built-in English stopwords
  • ✅ Deduplication of near-duplicate candidates
  • ✅ Fully configurable via YakeOptions
  • ✅ Targets .NET 10

Installation

dotnet add package Yake.NET

Quick Start

using Yake.NET;

var extractor = new YakeExtractor();
var keywords = extractor.Extract("Your document text goes here...");

foreach (var kw in keywords)
    Console.WriteLine(kw); // e.g. "keyword extraction (score: 0.001234)"

Configuration

var extractor = new YakeExtractor(new YakeOptions
{
    MaxNGramSize          = 3,    // include up to trigrams
    TopN                  = 10,   // return top 10 keywords
    DeduplicationThreshold = 0.9, // similarity threshold for dedup (0–1)
    WindowSize            = 2,    // co-occurrence window
    Stopwords             = null  // null = use built-in English list
});

Custom Stopwords

// Extend the built-in English stopwords
var customStopwords = StopwordsEn.Words.Concat(["custom", "words"]);

var extractor = new YakeExtractor(new YakeOptions
{
    Stopwords = customStopwords
});

How YAKE Works

YAKE scores each candidate keyword using 5 statistical features:

| Feature | Description |
|---|---|
| T_Casing | Ratio of capitalised (non-sentence-start) occurrences |
| T_Position | Inverse log of the sentence where the word first appears |
| T_Frequency | Normalised term frequency |
| T_Relatedness | Diversity of co-occurring words relative to frequency |
| T_DifferentSentence | Fraction of sentences containing the word |
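
As a rough illustration of how the five features can be folded into one per-word score, the sketch below follows the combination given in the cited YAKE paper: relatedness times position, divided by casing plus frequency and sentence spread (each scaled by relatedness). The function name `WordScore`, its parameter names, and the input values are hypothetical, and Yake.NET's internal code may differ in detail.

```csharp
using System;

// Per-word score as combined in the YAKE paper (assumed here, not taken
// from the Yake.NET source). Lower means more relevant.
static double WordScore(double casing, double position, double frequency,
                        double relatedness, double differentSentence)
{
    double numerator = relatedness * position;
    double denominator = casing
                         + (frequency / relatedness)
                         + (differentSentence / relatedness);
    return numerator / denominator;
}

// Illustrative feature values only; they do not come from a real document.
double score = WordScore(casing: 1.0, position: 0.5, frequency: 2.0,
                         relatedness: 1.0, differentSentence: 1.0);

// numerator = 1.0 * 0.5 = 0.5, denominator = 1.0 + 2.0 + 1.0 = 4.0
Console.WriteLine(score);
```

Frequent, capitalised words that appear in many sentences enlarge the denominator and so lower the score, which is consistent with lower scores marking more relevant keywords.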

The final score formula for n-grams:

score(candidate) = ∏(word_scores) / (n × (n + Σ word_scores))
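
A minimal sketch of the formula above, written exactly as stated. It assumes that `n` is the number of words in the candidate, which the formula does not spell out; `CandidateScore` is a hypothetical helper, not part of the Yake.NET API.

```csharp
using System;
using System.Linq;

// Candidate score as written above.
// Assumption: n is the number of words in the candidate.
static double CandidateScore(double[] wordScores)
{
    int n = wordScores.Length;
    double product = wordScores.Aggregate(1.0, (acc, s) => acc * s);
    double sum = wordScores.Sum();
    return product / (n * (n + sum));
}

// A bigram whose words score 0.2 and 0.5 (made-up values):
// product = 0.1, sum = 0.7, so the result is 0.1 / (2 * 2.7), roughly 0.0185.
double bigram = CandidateScore(new[] { 0.2, 0.5 });
Console.WriteLine(bigram);
```

Because the word scores are multiplied, a single highly relevant (low-scoring) word pulls the whole candidate's score down. An empty array is not handled here and would divide by zero.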

API Reference

YakeExtractor

| Member | Description |
|---|---|
| `YakeExtractor()` | Creates extractor with default English options |
| `YakeExtractor(YakeOptions)` | Creates extractor with custom options |
| `IReadOnlyList<KeywordResult> Extract(string text)` | Extracts keywords from text |

KeywordResult

| Property | Type | Description |
|---|---|---|
| Keyword | string | The extracted keyword or keyphrase |
| Score | double | YAKE score (lower = more relevant) |

YakeOptions

| Property | Default | Description |
|---|---|---|
| MaxNGramSize | 3 | Maximum n-gram length |
| TopN | 10 | Number of keywords to return |
| DeduplicationThreshold | 0.9 | Similarity cutoff for deduplication |
| WindowSize | 2 | Co-occurrence context window size |
| Stopwords | null | Custom stopwords (null = English built-ins) |
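
To give a feel for what a similarity cutoff such as `DeduplicationThreshold = 0.9` does, here is a sketch of threshold-based deduplication. This README does not say which similarity measure Yake.NET uses, so the normalised Levenshtein similarity below is an assumption made for illustration, and `Levenshtein`, `Similarity`, and `Deduplicate` are hypothetical helpers rather than library members.

```csharp
using System;
using System.Collections.Generic;

// Classic dynamic-programming edit distance using two rolling rows.
static int Levenshtein(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1),
                               prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // the finished row becomes "prev"
    }
    return prev[b.Length];
}

// Similarity in [0, 1]; 1.0 means the strings are identical.
static double Similarity(string a, string b)
{
    int longest = Math.Max(a.Length, b.Length);
    return longest == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / longest;
}

// Keep a candidate only if it is not too similar to one already kept.
// Assumes candidates arrive sorted by ascending score (best first).
static List<string> Deduplicate(IEnumerable<string> ranked, double threshold)
{
    var kept = new List<string>();
    foreach (var candidate in ranked)
    {
        bool isDuplicate = kept.Exists(k => Similarity(k, candidate) >= threshold);
        if (!isDuplicate) kept.Add(candidate);
    }
    return kept;
}

var result = Deduplicate(
    new[] { "keyword extraction", "keyword extractions", "stopwords" }, 0.9);
Console.WriteLine(string.Join(", ", result)); // keyword extraction, stopwords
```

In this sketch, raising the threshold toward 1 keeps more near-identical phrases, while lowering it prunes more aggressively.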

References

  • Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2020). YAKE! Keyword extraction from single documents using multiple local features. Information Sciences, 509, 257–289.

License

MIT
