Randomly initialising word vectors #32

Closed

Henry-E opened this issue May 2, 2017 · 8 comments

@Henry-E

Henry-E commented May 2, 2017

There doesn't seem to be an option to initialise word vectors without using pretrained embeddings. There's an option to fill in vectors for tokens missing from the pretrained embeddings with normally distributed values. It would be cool if there were a built-in option to initialise embeddings from a uniform distribution without having to specify a word embedding file.

@jekbradbury
Contributor

You'd do that in your model class's __init__ by using nn.init or some other weight initializer rather than passing the weight matrix from torchtext.
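
For example, a minimal sketch (the class name, embedding dimension, and uniform range are arbitrary placeholders):

    import torch.nn as nn

    class Classifier(nn.Module):
        def __init__(self, vocab_size, embedding_dim):
            super().__init__()
            # Plain embedding layer, no pretrained weight matrix passed in.
            self.embedding = nn.Embedding(vocab_size, embedding_dim)
            # Initialise the weights from a uniform distribution via nn.init.
            nn.init.uniform_(self.embedding.weight, -0.1, 0.1)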

@Henry-E
Author

Henry-E commented May 3, 2017

Thanks for the recommendation; I'm a bit new to this. Would the initialised vectors still go into TEXT.vocab.vectors?

@jekbradbury
Contributor

No, you don’t need to put them there -- the only place where your embeddings actually need to be is in your model; TEXT.vocab.vectors offers a way to get pretrained vectors corresponding to your vocabulary and then use those to initialize your model's embeddings.
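
For comparison, the usual pretrained-vector pattern looks roughly like this (a sketch; the GloVe vector name and the Classifier model from the sketch above are just examples):

    TEXT.build_vocab(train_data, vectors="glove.6B.100d")
    model = Classifier(len(TEXT.vocab), 100)
    # Copy the pretrained vectors into the model's embedding weights.
    model.embedding.weight.data.copy_(TEXT.vocab.vectors)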

Henry-E closed this as completed May 3, 2017
@jeremy-rutman

....There's an option to fill in vectors for tokens missing from the pretrained embeddings with normally distributed values....

How do I find that option?

@zhangguanheng66
Contributor

zhangguanheng66 commented Jun 24, 2020

We have a new Vectors class in torchtext/experimental/vectors.py. You can use it to build a custom vector. See https://github.com/pytorch/text/blob/master/torchtext/experimental/vectors.py#L219

@jeremy-rutman

jeremy-rutman commented Jun 27, 2020

Is there an example of this somewhere? I've been looking at the build_vocab docs and can't find any description of its arguments.

@jeremy-rutman

jeremy-rutman commented Jun 27, 2020

Can I do the following:


    TEXT.build_vocab(train_data) 
    vocab_size = len(TEXT.vocab)
    embedding_vectors = torch.FloatTensor(np.random.rand(vocab_size, embedding_length))
    word_embeddings = nn.Embedding(vocab_size, embedding_length)
    word_embeddings.weight = nn.Parameter(embedding_vectors, requires_grad=True)

if I am after randomly-initialized embeddings?

@zhangguanheng66
Contributor

You don't need the last line.

word_embeddings = nn.Embedding(vocab_size, embedding_length)

already initialises the vectors randomly.
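
For reference, nn.Embedding draws its weights from a standard normal distribution by default, so no manual weight assignment is needed. If you want a uniform distribution instead, you can re-initialise in place (a sketch; the range is arbitrary):

    word_embeddings = nn.Embedding(vocab_size, embedding_length)
    # Weights are already drawn from N(0, 1) by default.
    # Optionally re-initialise from a uniform distribution instead:
    nn.init.uniform_(word_embeddings.weight, -0.1, 0.1)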
