Trigram HMM

For both the English and Chinese part-of-speech tagging problems, I designed, implemented, and tuned a trigram HMM tagger.
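For readers new to the technique, the following is a minimal sketch of trigram HMM decoding with the Viterbi algorithm. It is not the code in English_trigram.py or Chinese_trigram.py: the function names, the add-k transition smoothing, and the fixed emission floor for unseen words are all assumptions chosen for brevity, and a tuned tagger would normally handle unknown words more carefully.

```python
import math
from collections import Counter

START, STOP = "<s>", "</s>"  # hypothetical boundary symbols

def train(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, tags, and (tag, word) pairs."""
    context, trigram, tag_count, emit = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            trigram[tuple(tags[i - 2:i + 1])] += 1
            context[tuple(tags[i - 2:i])] += 1
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
    return context, trigram, tag_count, emit

def log_q(model, u, v, t, k=0.1):
    """log P(t | u, v) with add-k smoothing (an assumed smoothing scheme)."""
    context, trigram, tag_count, _ = model
    outcomes = len(tag_count) + 1  # every tag plus STOP
    return math.log((trigram[(u, v, t)] + k) / (context[(u, v)] + k * outcomes))

def log_e(model, t, w, floor=1e-6):
    """log P(w | t), floored so unseen words do not produce log(0)."""
    _, _, tag_count, emit = model
    return math.log(max(emit[(t, w)] / tag_count[t], floor))

def viterbi(model, words):
    """Return the most probable tag sequence for words under the model."""
    if not words:
        return []
    tags = list(model[2])
    # pi[(u, v)]: best log-probability of any prefix whose last two tags are u, v
    pi = {(START, START): 0.0}
    back = []
    for w in words:
        new_pi, bp = {}, {}
        for (u, v), score in pi.items():
            for t in tags:
                s = score + log_q(model, u, v, t) + log_e(model, t, w)
                if s > new_pi.get((v, t), -math.inf):
                    new_pi[(v, t)] = s
                    bp[(v, t)] = u
        pi = new_pi
        back.append(bp)
    # Include the transition to STOP when choosing the final tag pair.
    u, v = max(pi, key=lambda uv: pi[uv] + log_q(model, uv[0], uv[1], STOP))
    seq = [u, v]
    # back[i] maps (tag[i-1], tag[i]) to tag[i-2]; walk it from the end.
    for bp in reversed(back[2:]):
        seq.insert(0, bp[(seq[0], seq[1])])
    return seq[-len(words):]
```

As a usage example, training on two toy sentences tagged DT NN VBZ and calling viterbi(model, ["the", "cat", "barks"]) returns the same three tags. The sketch explores every tag pair at each position, so its cost grows with the cube of the tag set size.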

English

For the English part, I used the Penn Treebank Wall Street Journal corpus.

WSJ_02-21.pos: the training file
WSJ_24.pos: the development file
WSJ_23.words and WSJ_23.pos: the test files
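A short sketch of how such files might be loaded is shown here. It assumes each non-blank line of a .pos file holds a token and its tag separated by a tab, and that a blank line ends a sentence; this layout is an assumption about the files, and read_pos is a hypothetical helper, not a function from this repository.

```python
def read_pos(lines):
    """Group 'token<TAB>tag' lines into sentences; a blank line ends a sentence.

    The tab-separated layout is an assumption about the .pos files.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")
        current.append((token, tag))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences
```

Because read_pos accepts any iterable of lines, it can be given an open file handle directly, for example inside `with open("WSJ_24.pos") as f:`.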

To run the program: python English_trigram.py

To evaluate the result: python score.py WSJ_23.pos english_output.txt. It should return an accuracy of 96.53%.
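The accuracy figure is presumably the percentage of tokens whose predicted tag matches the gold tag. The sketch below shows that calculation under this assumption; it is not the contents of score.py, and the function name accuracy is hypothetical.

```python
def accuracy(gold_tags, predicted_tags):
    """Return the percentage of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return 100.0 * correct / len(gold_tags)
```

For example, with gold tags DT, NN, VBZ, "." and predictions DT, NN, NN, ".", three of four tokens match and the function returns 75.0.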

Chinese

For the Chinese part, I used the Penn Chinese Treebank. I preprocessed the data into the following three parts:

chinese_training.txt
chinese_dev_pos.txt
chinese_test_words.txt and chinese_test_pos.txt

To run the program: python Chinese_trigram.py

To evaluate the result: python score.py chinese_test_pos.txt chinese_output.txt. It should return an accuracy of 90.84%.

A final report on the project is also included: final_report.pdf.
