
Randomly initialising word vectors #32

Closed
Henry-E opened this issue May 2, 2017 · 8 comments

Comments

@Henry-E commented May 2, 2017

There doesn't seem to be an option to initialise word vectors without using pretrained embeddings. There is an option to fill in vectors for tokens missing from the pretrained embeddings with normally distributed values. It would be cool if there were a built-in option to initialise embeddings from a uniform distribution without having to specify a word embedding file.

@jekbradbury (Collaborator) commented May 3, 2017

You'd do that in your model class's __init__ by using nn.init or some other weight initializer rather than passing the weight matrix from torchtext.
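
For concreteness, a minimal sketch of what that looks like, assuming a toy model where vocab_size, embedding_dim, and the uniform range are illustrative placeholders:

    import torch.nn as nn

    class Classifier(nn.Module):
        def __init__(self, vocab_size, embedding_dim):
            super().__init__()
            # Embedding table built without any pretrained weights;
            # nn.Embedding draws its weights from N(0, 1) by default.
            self.embedding = nn.Embedding(vocab_size, embedding_dim)
            # Overwrite the default with a uniform initialization.
            nn.init.uniform_(self.embedding.weight, -0.1, 0.1)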

@Henry-E (Author) commented May 3, 2017

Thanks for the recommendation; I'm a bit new to this. Would the initialised vectors still go into TEXT.vocab.vectors?

@jekbradbury (Collaborator) commented May 3, 2017

No, you don’t need to put them there -- the only place where your embeddings actually need to be is in your model; TEXT.vocab.vectors offers a way to get pretrained vectors corresponding to your vocabulary and then use those to initialize your model's embeddings.
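
Roughly, the two paths look like this; train_data, model.embedding, and the GloVe name are placeholders, and this sketch assumes the classic Field/build_vocab API from that era:

    # Pretrained path: torchtext builds TEXT.vocab.vectors, and you copy it
    # into the model's embedding weights.
    TEXT.build_vocab(train_data, vectors="glove.6B.100d")
    model.embedding.weight.data.copy_(TEXT.vocab.vectors)

    # Random path: no vectors argument, so TEXT.vocab.vectors is never used;
    # the nn.Embedding inside the model keeps whatever init you gave it.
    TEXT.build_vocab(train_data)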

@Henry-E closed this May 3, 2017
@jeremy-rutman commented Jun 24, 2020

...There's an option to fill in vectors for tokens missing from the pretrained embeddings with normally distributed values...

How do I find that option?

@zhangguanheng66 (Collaborator) commented Jun 24, 2020

We have a new Vector class in torchtext/experimental/vectors.py, so you can build a custom vector. See https://github.com/pytorch/text/blob/master/torchtext/experimental/vectors.py#L219

@jeremy-rutman commented Jun 27, 2020

Is there an example of this somewhere? I've been looking at the build_vocab docs and can't find any description of its arguments.

@jeremy-rutman commented Jun 27, 2020

Can I do the following:


    import numpy as np
    import torch
    import torch.nn as nn

    TEXT.build_vocab(train_data)
    vocab_size = len(TEXT.vocab)
    embedding_vectors = torch.FloatTensor(np.random.rand(vocab_size, embedding_length))
    word_embeddings = nn.Embedding(vocab_size, embedding_length)
    word_embeddings.weight = nn.Parameter(embedding_vectors, requires_grad=True)

if I am after randomly initialized embeddings?

@zhangguanheng66 (Collaborator) commented Jun 29, 2020

You don't need the last line;

word_embeddings = nn.Embedding(vocab_size, embedding_length)

already sets the vectors randomly.
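
As an aside, nn.Embedding's default initialization is a standard normal; if a uniform distribution is specifically wanted, as in the original request, the weights can be re-initialized in place afterwards, for example:

    nn.init.uniform_(word_embeddings.weight, -0.25, 0.25)  # range is illustrative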
