remusao/NGram.jl

NGram

Linear interpolation

This implementation uses linear interpolation to build the model. To see why, consider a plain trigram model, which estimates a conditional probability directly from counts:

p("book" | "the", "green") = count("the green book") / count("the green")

But this estimate has some limitations:

  • A trigram model needs a larger corpus to train reliably than a bigram or unigram model
  • count(trigram) is often zero, because many valid trigrams never appear in the training corpus
  • Bigram and unigram models are easier to estimate, but they capture less context
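The count-based estimate and the zero-count problem can be reproduced in a few lines. This is a minimal sketch, not the package's code: `count_ngrams` and `mle` are hypothetical helpers, tokens are assumed to be whitespace-separated, and returning 0 for an unseen context is an assumption made here.

```julia
# Hypothetical helpers for illustration; not part of NGram.jl's API.

# Count every n-gram of order `n`, keyed by its space-joined string.
function count_ngrams(texts::Vector{String}, n::Int)
    counts = Dict{String,Int}()
    for text in texts
        tokens = split(text)  # assumption: whitespace tokenization
        for i in 1:(length(tokens) - n + 1)
            gram = join(tokens[i:i+n-1], " ")
            counts[gram] = get(counts, gram, 0) + 1
        end
    end
    return counts
end

# Count-based estimate: count(w1 w2 w3) / count(w1 w2).
# Returns 0.0 when the context "w1 w2" was never seen (an assumption).
function mle(trigrams, bigrams, w1, w2, w3)
    context = get(bigrams, "$w1 $w2", 0)
    context == 0 && return 0.0
    return get(trigrams, "$w1 $w2 $w3", 0) / context
end

texts = ["the green book", "my blue book", "his green house", "book"]
trigrams = count_ngrams(texts, 3)
bigrams = count_ngrams(texts, 2)

p_seen = mle(trigrams, bigrams, "the", "green", "book")    # trigram occurs in the corpus
p_unseen = mle(trigrams, bigrams, "his", "green", "book")  # trigram never occurs
```

Here `p_unseen` is exactly zero even though "his green book" is a perfectly plausible phrase, which is the sparsity problem described in the list above.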

The idea is then to combine the trigram estimate with the bigram and unigram estimates. More generally, an n-gram estimate is combined with the (n-1)-gram, ..., bigram, and unigram estimates. Here is an example for a trigram model:

p("book" | "the", "green") = a * count("the green book") / count("the green")
                          +  b * count("green book") / count("green")
                          +  c * count("book") / count()
    where
        a + b + c = 1
        a >= 0
        b >= 0
        c >= 0

# For example: a = b = c = 1 / 3
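As a rough illustration of the weighted sum, the sketch below uses the standard form of linear interpolation, in which every term predicts the last word: the trigram term is count("w1 w2 w3") / count("w1 w2"), the bigram term is count("w2 w3") / count("w2"), and the unigram term is count("w3") / count(). It is not the package's code: `count_all`, `ratio`, and `interpolated` are hypothetical names. Two further assumptions are made here: count() is the total number of tokens, and a ratio with a zero denominator counts as 0.

```julia
# Hypothetical sketch; the names below are not NGram.jl's API.

# Count all n-grams of order 1..max_n, keyed by space-joined string,
# and return them together with the total number of tokens.
function count_all(texts::Vector{String}, max_n::Int)
    counts = Dict{String,Int}()
    total = 0  # plays the role of count() in the formula (assumption)
    for text in texts
        tokens = split(text)  # assumption: whitespace tokenization
        total += length(tokens)
        for n in 1:max_n, i in 1:(length(tokens) - n + 1)
            gram = join(tokens[i:i+n-1], " ")
            counts[gram] = get(counts, gram, 0) + 1
        end
    end
    return counts, total
end

# Ratio that falls back to 0.0 when the denominator is zero (assumption).
ratio(num, den) = den == 0 ? 0.0 : num / den

# Interpolated p(w3 | w1, w2) with weights a, b, c.
function interpolated(counts, total, w1, w2, w3; a=1/3, b=1/3, c=1/3)
    # Enforce the constraints: non-negative weights that sum to 1.
    @assert a >= 0 && b >= 0 && c >= 0 && isapprox(a + b + c, 1.0)
    cnt(s) = get(counts, s, 0)
    tri = ratio(cnt("$w1 $w2 $w3"), cnt("$w1 $w2"))
    bi = ratio(cnt("$w2 $w3"), cnt(w2))
    uni = ratio(cnt(w3), total)
    return a * tri + b * bi + c * uni
end

texts = ["the green book", "my blue book", "his green house", "book"]
counts, total = count_all(texts, 3)

# The trigram "his green book" never occurs, yet the estimate is non-zero.
p = interpolated(counts, total, "his", "green", "book")
```

With equal weights, the unseen trigram gets a probability of about 0.27, contributed entirely by the bigram and unigram terms. In practice the weights are usually tuned on held-out data rather than fixed at 1/3.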

Example

using NGram

texts = String["the green book", "my blue book", "his green house", "book"]

# Train a trigram model on the documents
model = NGramModel(texts, 3)

# Query on the model
# p(book | the, green)
model["the green book"]

About

An implementation of the n-gram model in Julia.
