Part of Speech Tagger

The tagger uses a Hidden Markov Model (HMM) to encode a language corpus in which each word is tagged with its part of speech. It uses the Viterbi algorithm to decode and tag sentences from the test data.
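As a rough illustration of the encoding step, the sketch below estimates HMM parameters by counting tag-to-tag transitions and tag-to-word emissions in a tagged corpus and converting the counts to log-probabilities. It is not taken from the repository: the function name `train_hmm`, the `<s>` start pseudo-tag, and the choice of add-one smoothing on transitions (with unsmoothed emissions) are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence pseudo-tag


def train_hmm(tagged_sentences):
    """Estimate log transition and emission probabilities from
    sentences given as lists of (word, tag) pairs."""
    transition_counts = defaultdict(Counter)  # prev_tag -> Counter of next tags
    emission_counts = defaultdict(Counter)    # tag -> Counter of words
    tags = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transition_counts[prev][tag] += 1
            emission_counts[tag][word] += 1
            tags.add(tag)
            prev = tag

    # Add-one (Laplace) smoothing: every tag pair, seen or not, gets a
    # non-zero transition probability.
    transition = {}
    for prev in [START, *sorted(tags)]:
        counts = transition_counts[prev]
        total = sum(counts.values()) + len(tags)
        transition[prev] = {t: math.log((counts[t] + 1) / total) for t in tags}

    # Emissions: unsmoothed relative frequencies of each word given its tag.
    emission = {}
    for tag in tags:
        counts = emission_counts[tag]
        total = sum(counts.values())
        emission[tag] = {w: math.log(c / total) for w, c in counts.items()}
    return transition, emission


# Tiny hand-made corpus, for illustration only.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN")],
]
transition, emission = train_hmm(corpus)
```

Working in log space keeps products of many small probabilities from underflowing when the decoder later multiplies them along a sentence.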

The encoder is generic: the same code works for any language, given a tagged training corpus.
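One way a generic encoder can stay language-independent is to treat words and tags as opaque strings. The hypothetical helper below assumes the training files hold one sentence per line as space-separated `word/TAG` tokens (the README does not specify the format) and splits each token on its last slash, so it handles English, Hindi, or pre-segmented Chinese text the same way.

```python
def parse_tagged_line(line):
    """Split a line of space-separated 'word/TAG' tokens into (word, tag) pairs.

    The tag is taken after the LAST slash, so words that themselves contain
    '/' (e.g. '1/2') survive intact. Nothing here inspects the script or
    language of the words.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs
```

For example, `parse_tagged_line("the/DT dog/NN")` returns `[("the", "DT"), ("dog", "NN")]`.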

The encoder models the corpus and writes the probabilities into hmmmodel.txt. The decoder reads this model, tags the test data, and writes the output into hmmoutput.txt.
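The decoding step can be sketched as a standard log-space Viterbi search. This is an illustrative version, not the repository's decoder: the model is passed as plain dictionaries of log-probabilities rather than read from hmmmodel.txt (whose format the README does not describe), the toy probabilities are invented, and giving unseen words a neutral emission score so that transitions alone choose their tag is one assumed heuristic among several.

```python
import math


def viterbi(words, tags, transition, emission, start="<s>"):
    """Return the most probable tag sequence for `words`.

    transition[prev][tag] and emission[tag][word] hold log-probabilities.
    A word unseen in training gets log-emission 0.0 for every tag (an
    assumed heuristic), so only the transitions decide its tag.
    """
    if not words:
        return []
    vocab = {w for t in tags for w in emission.get(t, {})}

    def log_emit(tag, word):
        if word not in vocab:
            return 0.0
        return emission.get(tag, {}).get(word, -math.inf)

    # best[i][t]: highest log-score of any tag path that ends in t at word i.
    best = [{t: transition[start][t] + log_emit(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + transition[p][t])
            scores[t] = best[i - 1][prev] + transition[prev][t] + log_emit(t, words[i])
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Start from the best final tag and follow back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model with invented probabilities, stored as log-probabilities.
TAGS = ["DT", "NN", "VBZ"]
transition = {
    prev: {t: math.log(p) for t, p in row.items()}
    for prev, row in {
        "<s>": {"DT": 0.8, "NN": 0.1, "VBZ": 0.1},
        "DT": {"DT": 0.05, "NN": 0.9, "VBZ": 0.05},
        "NN": {"DT": 0.1, "NN": 0.3, "VBZ": 0.6},
        "VBZ": {"DT": 0.5, "NN": 0.3, "VBZ": 0.2},
    }.items()
}
emission = {
    tag: {w: math.log(p) for w, p in row.items()}
    for tag, row in {
        "DT": {"the": 1.0},
        "NN": {"dog": 0.5, "barks": 0.1, "cat": 0.4},
        "VBZ": {"barks": 1.0},
    }.items()
}
print(viterbi(["the", "dog", "barks"], TAGS, transition, emission))  # ['DT', 'NN', 'VBZ']
```

Each word costs one pass over all tag pairs, so decoding a sentence of n words with T tags takes O(n·T²) time instead of enumerating all Tⁿ tag sequences.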

Accuracy of the model trained on the given corpora:

English - 88.93%
Chinese - 87.08%
Hindi - 92.34%

These accuracies were obtained with a single generic encoder for all three languages.
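The README does not state how accuracy is measured. A common choice for POS tagging is token-level accuracy, the fraction of tokens whose predicted tag matches the gold tag; the hypothetical function below computes it for output and reference data in the assumed `word/TAG` format, compared line by line.

```python
def tag_accuracy(predicted_lines, gold_lines):
    """Fraction of tokens whose predicted tag equals the gold tag.

    Both arguments are sequences of lines of space-separated 'word/TAG'
    tokens; line i of the prediction is compared with line i of the gold data.
    """
    predicted_lines, gold_lines = list(predicted_lines), list(gold_lines)
    if len(predicted_lines) != len(gold_lines):
        raise ValueError("predicted and gold data have different line counts")
    correct = total = 0
    for pred_line, gold_line in zip(predicted_lines, gold_lines):
        pred_tokens, gold_tokens = pred_line.split(), gold_line.split()
        if len(pred_tokens) != len(gold_tokens):
            raise ValueError("a predicted line and gold line differ in length")
        for pred, gold in zip(pred_tokens, gold_tokens):
            # Compare only the tag after the last slash.
            correct += pred.rpartition("/")[2] == gold.rpartition("/")[2]
            total += 1
    return correct / total if total else 0.0


gold = ["the/DT dog/NN barks/VBZ", "the/DT cat/NN"]
predicted = ["the/DT dog/NN barks/NN", "the/DT cat/NN"]
print(f"{100 * tag_accuracy(predicted, gold):.2f}%")  # 80.00%
```

Multiplying the result by 100 gives a percentage in the same form as the figures above.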

Folders and files
additional_corpa
eval
submission
submission_2
support_files
.gitignore
README.md
en_dev_raw.txt
en_dev_tagged.txt
en_train_tagged.txt
hi_test_raw.txt
hi_test_tagged.txt
hi_train_dev_tagged.txt
hmmdecode.py
hmmlearn.py
hmmmodel.txt
hmmoutput.txt
readme-en-zh.txt
test.py
utils.py
zh_dev_raw.txt
zh_dev_tagged.txt
zh_train_tagged.txt

Repository: gsriram7/POS