This is an N-gram language model implemented in Go.
A language model is a probability distribution over sequences of words W, namely:

P(W) = P(w_1, w_2, ..., w_n)

According to the chain rule,

P(w_1, ..., w_n) = P(w_1) * P(w_2 | w_1) * ... * P(w_n | w_1, ..., w_{n-1})

An N-gram model approximates each conditional probability with a limited history; for a bigram model, P(w_i | w_1, ..., w_{i-1}) ≈ P(w_i | w_{i-1}). We can estimate these probabilities with Maximum Likelihood Estimation, using counts C(·) from the training corpus:

P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1})
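As a minimal sketch of the idea (not the repo's actual implementation), the following program estimates bigram probabilities P(next | prev) by Maximum Likelihood over a tiny hypothetical corpus and prints the candidates in descending probability order:

```go
package main

import (
	"fmt"
	"sort"
)

// cand is one candidate continuation with its estimated probability.
type cand struct {
	next rune
	p    float64
}

// bigramProbs estimates P(next | prev) by Maximum Likelihood:
// C(prev next) / C(prev ·), counting rune-level bigrams in corpus.
func bigramProbs(corpus string, prev rune) []cand {
	counts := map[rune]int{} // C(prev next) for each observed next
	total := 0               // C(prev ·): all bigrams starting with prev
	runes := []rune(corpus)
	for i := 0; i+1 < len(runes); i++ {
		if runes[i] == prev {
			counts[runes[i+1]]++
			total++
		}
	}
	out := make([]cand, 0, len(counts))
	for r, c := range counts {
		out = append(out, cand{r, float64(c) / float64(total)})
	}
	// Most probable continuation first.
	sort.Slice(out, func(i, j int) bool { return out[i].p > out[j].p })
	return out
}

func main() {
	// Tiny illustrative corpus (hypothetical; the repo trains on its own data).
	corpus := "中国人中国梦中国人"
	for _, c := range bigramProbs(corpus, '国') {
		fmt.Printf("The next word is %c, probability is %f\n", c.next, c.p)
	}
}
```

With this corpus, "国" is followed by "人" twice and "梦" once, so the MLE probabilities are 2/3 and 1/3.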
You can run it directly:

```shell
go run . -word "中国"
```

or build it first and run the binary:

```shell
go build .
./go-n-gram -word "中国"
```
The output is:

```
The next word is 人, probability is 0.071429
The next word is 扶, probability is 0.055556
The next word is 的, probability is 0.039683
The next word is 社, probability is 0.039683
The next word is ,, probability is 0.039683
......
```