Structured Perceptron and Viterbi-Based Part-of-Speech Tagger
Trains a structured perceptron linear classifier to tag parts of speech, using the Viterbi algorithm for decoding. The assignment code has been cleaned up and streamlined to make it easier to read and use. As a result, this is not the complete solution to the assignment, only the parts I considered most relevant to share.
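For readers unfamiliar with the decoding step, here is a minimal sketch of Viterbi decoding over per-token emission scores and tag-to-tag transition scores. It is not the code from this repository: the function name `viterbi_decode` and the dictionary-based score layout are assumptions chosen for readability.

```python
def viterbi_decode(emission_scores, transition_scores, tagset):
    """Return the highest-scoring tag sequence.

    emission_scores: list with one dict {tag: score} per token (assumed layout).
    transition_scores: dict {(prev_tag, tag): score}; missing pairs score 0.0.
    tagset: list of all possible tags.
    """
    n = len(emission_scores)
    if n == 0:
        return []

    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emission_scores[0][tag] for tag in tagset}]
    # backptr[t][tag] = previous tag on that best path
    backptr = [{}]

    for t in range(1, n):
        best.append({})
        backptr.append({})
        for tag in tagset:
            prev = max(
                tagset,
                key=lambda p: best[t - 1][p] + transition_scores.get((p, tag), 0.0),
            )
            best[t][tag] = (
                best[t - 1][prev]
                + transition_scores.get((prev, tag), 0.0)
                + emission_scores[t][tag]
            )
            backptr[t][tag] = prev

    # Follow the back-pointers from the best final tag to recover the path
    last = max(tagset, key=lambda tag: best[-1][tag])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path
```

The dynamic program considers every previous tag for every current tag, so it runs in time proportional to the sentence length times the square of the tagset size.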
Modifications to Instructor Implementations
local_emission_features: Added suffix features
train: Implemented the inner loop, which is the core of the training algorithm. The instructor code was just a skeleton.
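The two modifications above can be illustrated with a rough sketch. This is not the repository's code: `emission_features_with_suffixes` is a hypothetical stand-in for `local_emission_features`, and the feature names, the `max_suffix_len` parameter, and the `perceptron_update` helper are all assumptions. The update shown is the standard structured perceptron step: add the features of the gold tag sequence and subtract the features of the predicted one.

```python
from collections import defaultdict


def emission_features_with_suffixes(tokens, t, tag, max_suffix_len=3):
    """Hypothetical stand-in for local_emission_features, with suffix features."""
    word = tokens[t].lower()
    feats = {
        "bias_tag=%s" % tag: 1.0,
        "word=%s_tag=%s" % (word, tag): 1.0,
    }
    # Suffix features: the last 1..max_suffix_len characters, paired with the tag
    for k in range(1, max_suffix_len + 1):
        if len(word) > k:
            feats["suffix%d=%s_tag=%s" % (k, word[-k:], tag)] = 1.0
    return feats


def sequence_features(tokens, tags):
    """Sum emission and transition features over a whole tagged sentence."""
    feats = defaultdict(float)
    for t, tag in enumerate(tags):
        for name, value in emission_features_with_suffixes(tokens, t, tag).items():
            feats[name] += value
        if t > 0:
            feats["trans_%s_%s" % (tags[t - 1], tag)] += 1.0
    return feats


def perceptron_update(weights, tokens, gold_tags, predicted_tags, stepsize=1.0):
    """One inner-loop step: move weights toward gold features, away from predicted."""
    if gold_tags == predicted_tags:
        return  # correct prediction, no update
    gold = sequence_features(tokens, gold_tags)
    pred = sequence_features(tokens, predicted_tags)
    for name in set(gold) | set(pred):
        delta = gold.get(name, 0.0) - pred.get(name, 0.0)
        if delta != 0.0:
            weights[name] += stepsize * delta
```

In a full training loop, `predicted_tags` would come from Viterbi decoding under the current weights, and this update would run once per training sentence on each pass over the data.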
Implementations I provided
To train a tagger with 10 iterations of the structured perceptron, using Viterbi decoding:

    # Import the training routine and the reader for tagging files in the
    # format of oct27.train and oct27.dev (assumed here to be defined in structperc.py)
    from structperc import train, read_tagging_file

    # Train with averaging on the oct27.train data, evaluating on the oct27.dev data
    train(read_tagging_file('oct27.train'), do_averaging=True, devdata=read_tagging_file('oct27.dev'))

baseline.py checks the accuracy of assuming every word has the same tag. To check this baseline, run baseline.py.
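The single-tag baseline mentioned above can be sketched as follows. This is not the contents of baseline.py: the helper names are hypothetical, the examples are assumed to be (tokens, tags) pairs, and picking the most frequent training tag as the single tag is an assumption about which tag the baseline uses.

```python
from collections import Counter


def most_common_tag(train_examples):
    """Return the tag that occurs most often in the training data.

    train_examples: iterable of (tokens, tags) pairs (assumed format).
    """
    counts = Counter(tag for _tokens, tags in train_examples for tag in tags)
    return counts.most_common(1)[0][0]


def single_tag_accuracy(examples, tag):
    """Fraction of tokens whose gold tag equals `tag`."""
    total = sum(len(tags) for _tokens, tags in examples)
    correct = sum(1 for _tokens, tags in examples for gold in tags if gold == tag)
    return correct / total if total else 0.0
```

Comparing the trained tagger's dev accuracy against this number shows how much the model gains over ignoring the words entirely.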