Why does skip-gram take the context word as input and predict the word itself? #2579

truythu169 · 2019-08-14T08:15:46Z

Problem description

I'm going to use the Python skip-gram (sg) code in my research, but I noticed a difference between the implementation and the original description in Mikolov's paper.
The details of the difference are described below.
Please let me know whether this difference is intentional or a bug.

Steps/code/corpus to reproduce

The code is in:

gensim/gensim/models/word2vec.py

https://github.com/RaRe-Technologies/gensim/blob/f97d0e793faa57877a2bbedc15c287835463eaa9/gensim/models/word2vec.py#L399-L414

We can see that the input word is treated as the output of the NN, while the context word is embedded through the matrix syn0 (the word-vector matrix).

...

https://github.com/RaRe-Technologies/gensim/blob/f97d0e793faa57877a2bbedc15c287835463eaa9/gensim/models/word2vec.py#L443-L456

As a result, the code optimizes P(input | context), whereas the skip-gram architecture in the original paper optimizes P(context | input).
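
For reference, the two directions can be written out as a rough sketch in the paper's notation, where T is the corpus length, c the window size, and w_t the word at position t. The first expression is the skip-gram objective as given in Mikolov et al.; the second is only this issue's reading of the linked code, with input and target swapped, and is an assumption rather than something stated in the paper or confirmed by the maintainers.

```latex
% Skip-gram objective as stated in the paper:
% predict each context word w_{t+j} from the center word w_t
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t)

% Reading of the linked code (assumption):
% predict the center word w_t from each context word w_{t+j}
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_t \mid w_{t+j})
```

Both sums run over the same positions t and offsets j; only the roles of the two words inside p(· | ·) differ.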

AMR-KELEG · 2019-08-14T13:10:01Z

@truythu169 I have a feeling you have found a bug but let's wait for a confirmation from the maintainers since this would be a major one conceptually 😅

mpenkov · 2019-09-07T05:23:16Z

@piskvorky WDYT? Does this look like a bug? I'm not familiar with the implementation (TBH, I don't think at this stage anyone is).

@truythu169 Are you able to make a PR? A good start would be a unit test that fails because of the suspected bug.

piskvorky · 2019-09-07T08:42:49Z

No, zero chance there's a bug in the word2vec algo.

@AMR-KELEG The question of context-vs-target direction comes up a lot, check the mailing list. I remember @gojomo answered it repeatedly, although I cannot find his great answers now. @gojomo can you add it to the Gensim FAQ?

gojomo · 2019-09-07T19:14:27Z

I recall answering this a few times, and though I can't find my answers at the moment, it was @piskvorky first at: #300 (comment)

While this has come up a few times – like confusion about the proper handling of averaging/dividing CBOW vectors/gradients – it's still very insider, for people obsessing over the source – I wouldn't assign it a slot in the overall FAQ. Maybe a new "implementation details FAQ"?
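
The context-vs-target question can be explored on a toy sentence. The sketch below is not gensim's code: `training_pairs` is a hypothetical helper, and it assumes a fixed symmetric window with no random window shrinking, no subsampling, and no negative sampling. Under those assumptions it counts the (input, target) pairs produced by each direction and compares them.

```python
from collections import Counter


def training_pairs(tokens, window):
    """Yield (center, context) word pairs for a fixed symmetric window.

    Hypothetical helper for illustration only; real implementations also
    shrink the window randomly and subsample frequent words.
    """
    for pos in range(len(tokens)):
        start = max(0, pos - window)
        end = min(len(tokens), pos + window + 1)
        for pos2 in range(start, end):
            if pos2 != pos:
                yield tokens[pos], tokens[pos2]


tokens = "the quick brown fox jumps over the lazy dog".split()
window = 2

# Direction as described in the paper: input = center word, target = context word.
center_to_context = Counter(
    (center, context) for center, context in training_pairs(tokens, window)
)

# Direction as described in this issue: input = context word, target = center word.
context_to_center = Counter(
    (context, center) for center, context in training_pairs(tokens, window)
)

# Compare the two multisets of (input, target) pairs over the whole sentence.
same_pairs = center_to_context == context_to_center
print(same_pairs)
```

Because every word takes a turn as the center, a symmetric window yields the same multiset of (input, target) pairs in both directions over a full pass. This only compares which pairs are trained; it says nothing about the order of updates or about the effect of the randomized window in the real code.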

mpenkov self-assigned this Sep 7, 2019

mpenkov added the bug label Sep 7, 2019

piskvorky closed this as completed Sep 7, 2019
