How do we train this tagger on another dataset? #9

Closed
FatemehMashhadi opened this issue Jul 24, 2017 · 1 comment

Comments

@FatemehMashhadi

Also, can we use pre-trained word2vec word vectors?

@guillaumegenthial
Owner

  1. For the dataset, you just need to keep the data format described in the README: one word and its tag per line, with an empty line between sentences. Alternatively, you can change the way the Dataset class works.
  2. For the word vectors, only GloVe is supported for now, but you could easily write some code to load Word2Vec vectors. You would only need to adapt the export_trimmed_glove_vectors function defined in data_utils.py. This function takes the name of the word-vector file and the vocab (word -> id), and exports a np array E such that E[id] is the vector of the word with that id. There is plenty of code available online for this kind of task. You can have a look at gensim.
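
To make the two points above concrete, here is a rough sketch, not the repository's actual code. It assumes the word2vec vectors are in the plain-text format (a header line with the vector count and dimension, then one word and its values per line). The names read_tagged_sentences, build_vocab and export_trimmed_word2vec_vectors are hypothetical, and the sample data is made up. It uses plain Python lists so it runs without dependencies; wrapping the result in numpy.asarray would give the np array E described above.

```python
import io


def read_tagged_sentences(lines):
    """Parse 'word tag' lines; an empty line ends the current sentence."""
    sentences, current = [], []
    for line in lines:
        parts = line.split()
        if not parts:  # empty line -> sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((parts[0], parts[-1]))
    if current:  # last sentence may not be followed by an empty line
        sentences.append(current)
    return sentences


def build_vocab(sentences):
    """Map each distinct word to an id, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word, _tag in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


def export_trimmed_word2vec_vectors(vocab, lines):
    """Hypothetical word2vec counterpart of export_trimmed_glove_vectors.

    Returns E such that E[id] is the vector of the word with that id.
    Words absent from the vector file keep an all-zero vector.
    """
    rows = iter(lines)
    _count, dim = (int(x) for x in next(rows).split())  # header: "<count> <dim>"
    embeddings = [[0.0] * dim for _ in vocab]
    for row in rows:
        parts = row.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if word in vocab and len(values) == dim:
            embeddings[vocab[word]] = [float(v) for v in values]
    return embeddings


# Made-up sample data, for illustration only.
DATA = """John B-PER
lives O
in O
Paris B-LOC

Paris B-LOC
rocks O
"""

VECTORS = """3 2
John 0.1 0.2
Paris 0.3 0.4
unused 0.9 0.9
"""

sentences = read_tagged_sentences(io.StringIO(DATA))
vocab = build_vocab(sentences)
E = export_trimmed_word2vec_vectors(vocab, io.StringIO(VECTORS))
```

For the binary word2vec format, gensim's KeyedVectors.load_word2vec_format(path, binary=True) can do the parsing, and the same loop over the vocab would then fill E from the loaded vectors.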
