In [1]:
@file:DependsOn("com.londogard:nlp:1.2.0-BETA2")

First, import the required classes. Then fetch a couple of word frequencies to compare.

In [2]:
import com.londogard.nlp.wordfreq.WordFrequencies
import com.londogard.nlp.utils.LanguageSupport

val hej = WordFrequencies.wordFrequency("hej", LanguageSupport.sv)
val och = WordFrequencies.wordFrequency("och", LanguageSupport.sv)

"WordFrequency of 'hej'=$hej and 'och'=$och"

WordFrequency of 'hej'=2.9512093E-4 and 'och'=0.025118863

What about using the Zipf Frequency instead?

In [3]:
val hej = WordFrequencies.zipfFrequency("hej", LanguageSupport.sv)
val och = WordFrequencies.zipfFrequency("och", LanguageSupport.sv)

"ZipfFrequency of 'hej'=$hej and 'och'=$och"

ZipfFrequency of 'hej'=5.4700003 and 'och'=7.4
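
The two outputs above are consistent with the usual definition of the Zipf scale: the base-10 logarithm of a word's occurrences per billion words, i.e. `log10(frequency) + 9`. The sketch below checks that relation against the printed values. `toZipf` is a hypothetical helper written for illustration, not a function of the library, and whether the library computes its values exactly this way is an assumption.

```kotlin
import kotlin.math.log10

// Hypothetical helper (not part of the library): converts a relative
// word frequency to the Zipf scale, log10(occurrences per billion words).
fun toZipf(frequency: Float): Double = log10(frequency.toDouble()) + 9.0

// Word frequencies copied from the output of the earlier cell.
val hejZipf = toZipf(2.9512093E-4f)  // roughly 5.47
val ochZipf = toZipf(0.025118863f)   // roughly 7.4
```

A difference of about 2 on the Zipf scale means 'och' is roughly 100 times more frequent than 'hej', which is easier to read off than the raw frequencies.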

Words that do not exist in the data default to a frequency of 0. With the `OrNull` suffix you get `null` instead.

In [4]:
val weird = WordFrequencies.wordFrequency("hraihaodjasmdiamo", LanguageSupport.sv)
val weirdOrNull = WordFrequencies.wordFrequencyOrNull("hraihaodjasmdiamo", LanguageSupport.sv)

"WordFrequency of 'hraihaodjasmdiamo' (non-word) using `wordFrequency` $weird and using `wordFrequencyOrNull` $weirdOrNull"

WordFrequency of 'hraihaodjasmdiamo' (non-word) using `wordFrequency` 0.0 and using `wordFrequencyOrNull` null
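
The difference matters when you need to tell "unknown word" apart from "extremely rare word". As a minimal sketch of the pattern, assuming a plain in-memory map as a stand-in for the library's data (`sampleFrequencies`, `lookupOrNull` and `lookupOrZero` are hypothetical names, not library API):

```kotlin
// Hypothetical stand-in for the library's frequency table.
val sampleFrequencies: Map<String, Float> = mapOf(
    "hej" to 2.9512093E-4f,
    "och" to 0.025118863f
)

// Nullable variant: null signals that the word is unknown.
fun lookupOrNull(word: String): Float? = sampleFrequencies[word]

// Defaulting variant: unknown words collapse to 0 via the elvis operator.
fun lookupOrZero(word: String): Float = lookupOrNull(word) ?: 0f

val unknownAsZero = lookupOrZero("hraihaodjasmdiamo")  // 0.0
val unknownAsNull = lookupOrNull("hraihaodjasmdiamo")  // null
```

The nullable variant leaves the choice of fallback to the caller, while the defaulting variant is convenient when 0 is an acceptable answer.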

Requesting frequencies for a `LanguageSupport` value that is not supported fails by throwing an exception.

In [None]:
runCatching { WordFrequencies.wordFrequency("hello", LanguageSupport.af) }.isFailure // WordFrequencies does not support af

Again, the `OrNull` variant returns `null` rather than throwing.

In [6]:
WordFrequencies.wordFrequencyOrNull("hello", LanguageSupport.af) == null

true
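
To make the two failure modes concrete, here is a small self-contained sketch of a throwing lookup next to an `OrNull` lookup. Everything in it is a hypothetical stand-in (`Lang`, `supportedTables`, `freqOrThrow`, `freqOrNull`); it only mirrors the behaviour observed above and is not how the library is implemented.

```kotlin
// Hypothetical stand-ins for LanguageSupport and the library's data.
enum class Lang { sv, af }

val supportedTables: Map<Lang, Map<String, Float>> = mapOf(
    Lang.sv to mapOf("och" to 0.025118863f)
)

// Returns null for an unsupported language or an unknown word.
fun freqOrNull(word: String, lang: Lang): Float? = supportedTables[lang]?.get(word)

// Throws for an unsupported language, defaults to 0 for an unknown word.
fun freqOrThrow(word: String, lang: Lang): Float {
    val table = supportedTables[lang]
        ?: throw IllegalArgumentException("$lang is not supported")
    return table[word] ?: 0f
}

// runCatching wraps the exception in a Result instead of propagating it.
val lookupFailed: Boolean = runCatching { freqOrThrow("hello", Lang.af) }.isFailure  // true
val lookupMissing: Float? = freqOrNull("hello", Lang.af)                             // null
```

`runCatching` suits code that wants to inspect the failure, while the `OrNull` form reads better when a missing language is an expected case.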

Finally, it is possible to retrieve all word frequencies for a language. The call returns a `Map<String, Float>`, which is nullable for the `OrNull` variant used here.

In [7]:
WordFrequencies.getAllWordFrequenciesOrNull(LanguageSupport.sv)?.entries?.take(3)

[är=0.037153527, det=0.031622775, att=0.026302677]
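
Once you have the full map, ordinary Kotlin collection functions apply. The sketch below ranks words by frequency using a tiny sample map built from the three entries printed above. `sampleSv` and `topWords` are hypothetical names for illustration, and the sketch does not assume the library returns entries in any particular order.

```kotlin
// Sample data copied from the output above, deliberately out of order.
val sampleSv: Map<String, Float> = mapOf(
    "att" to 0.026302677f,
    "är" to 0.037153527f,
    "det" to 0.031622775f
)

// Hypothetical helper: the n most frequent words, most frequent first.
fun topWords(frequencies: Map<String, Float>, n: Int): List<String> =
    frequencies.entries
        .sortedByDescending { it.value }
        .take(n)
        .map { it.key }

val topTwo = topWords(sampleSv, 2)  // [är, det]
```

With the real data you would pass the result of `getAllWordFrequenciesOrNull` after handling the `null` case, for example with `?: emptyMap()`.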