How do we train this tagger on another dataset? #9

FatemehMashhadi · 2017-07-24T08:12:09Z

And can we use pre-trained word2vec word vectors?

guillaumegenthial · 2017-07-24T09:01:59Z

For the dataset, you'll just need to keep the data format described in the README (one line -> one word and its tag, new sentence = empty line). Optionally, you can change the way the Dataset class works.
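To make the format concrete, here is a minimal sketch of a reader for it. The function name `read_tagged_sentences` is hypothetical and is not part of the repository, and splitting each line on whitespace is an assumption about the separator; check the README and the Dataset class for the exact convention.

```python
def read_tagged_sentences(lines):
    """Group "word tag" lines into sentences; a blank line ends a sentence.

    Assumption: the word is the first whitespace-separated field on a line
    and the tag is the last one.
    """
    sentences, words, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Empty line: close the current sentence, if there is one.
            if words:
                sentences.append((words, tags))
                words, tags = [], []
            continue
        parts = line.split()
        words.append(parts[0])
        tags.append(parts[-1])
    # The file may not end with a blank line, so flush the last sentence.
    if words:
        sentences.append((words, tags))
    return sentences


sample = ["John B-PER", "lives O", "", "Paris B-LOC"]
sentences = read_tagged_sentences(sample)
```

If your new dataset can be read this way, it should be close to what the tagger expects as input.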
For the word vectors, only GloVe is supported for now, but you could easily write some code to load Word2Vec vectors. You would only need to adapt the export_trimmed_glove_vectors function defined in data_utils.py. This function takes the name of the word-vector file and the vocab (word -> id), and exports a np array E such that E[id] is the vector of the word with that id. There should be a lot of code available online to perform such a task; you can have a look at gensim.
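A rough sketch of what such an adaptation could look like is below. The names `build_trimmed_embeddings` and `load_word2vec` are hypothetical and do not exist in data_utils.py, and leaving words without a pre-trained vector as zero rows is an assumption. The only gensim call used is `KeyedVectors.load_word2vec_format`; the returned object supports `word in vectors` and `vectors[word]`, so a plain dict works as a stand-in.

```python
import numpy as np


def build_trimmed_embeddings(vocab, vectors, dim):
    """Return an array E such that E[id] is the vector of the word with that id.

    vocab:   dict mapping word -> id
    vectors: any mapping word -> vector (a dict or gensim KeyedVectors)
    dim:     dimension of the word vectors
    Words missing from `vectors` keep an all-zero row (an assumption).
    """
    embeddings = np.zeros((len(vocab), dim), dtype=np.float32)
    for word, idx in vocab.items():
        if word in vectors:
            embeddings[idx] = np.asarray(vectors[word], dtype=np.float32)
    return embeddings


def load_word2vec(path, binary=True):
    """Load vectors stored in the word2vec format; requires gensim."""
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=binary)


# Small stand-in for real pre-trained vectors.
vocab = {"cat": 0, "dog": 1, "zebra": 2}
toy_vectors = {"cat": [1.0, 2.0], "dog": [3.0, 4.0]}
E = build_trimmed_embeddings(vocab, toy_vectors, dim=2)
```

With real data you would pass the result of `load_word2vec(...)` instead of `toy_vectors`, then save E the same way the existing GloVe export does.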

FatemehMashhadi closed this as completed Jul 25, 2017
