issue with fasttext model #10

peter-pogorelov · 2019-09-14T15:24:57Z

The following code throws an error (TypeError: Cannot convert numpy.float32 to numpy.ndarray):

fb = load_facebook_model(path_to_model)
model = SIF(fb, alpha=1e-7, components=1)
model.train([IndexedSentence(s, i) for i, s in enumerate(sentences)])
# The next line raises the TypeError:
model.sv.similar_by_sentence(['документы', 'бухгалтерия'], model=model, indexable=sentences)

However, if the full model is replaced with plain word vectors, everything seems to work.

ft = KeyedVectors.load_word2vec_format(path_to_vectors)
model = SIF(ft, alpha=1e-7, components=1)
model.train([IndexedSentence(s, i) for i, s in enumerate(sentences)])
model.sv.similar_by_sentence(['документы', 'бухгалтерия'], model=model, indexable=sentences)

This problem matters because the word counts (ft.wv.vocab) loaded from the vectors file are not the same as those in the full model. They look as if they were reconstructed automatically from the vectors, perhaps using cosine similarity, though I am not sure about that.
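
To illustrate why differing counts matter, here is a minimal sketch of the standard SIF weighting, alpha / (alpha + p(w)), where p(w) is a word's relative frequency. It is not fse's actual code: the helper `sif_weights` and both count tables are hypothetical, chosen only to show that two count sources give different weights for the same words.

```python
def sif_weights(counts, alpha=1e-3):
    """Return the SIF weight alpha / (alpha + p(w)) for every word."""
    total = sum(counts.values())
    return {word: alpha / (alpha + count / total) for word, count in counts.items()}


# Hypothetical counts: real corpus frequencies vs. placeholder counts.
model_counts = {"документы": 900, "бухгалтерия": 100}
vector_counts = {"документы": 2, "бухгалтерия": 1}

w_model = sif_weights(model_counts, alpha=1e-7)
w_vectors = sif_weights(vector_counts, alpha=1e-7)

# How much more the rarer word is weighted than the frequent one.
ratio_model = w_model["бухгалтерия"] / w_model["документы"]
ratio_vectors = w_vectors["бухгалтерия"] / w_vectors["документы"]
print(ratio_model, ratio_vectors)
```

With these made-up numbers the rarer word is weighted much more strongly under the model counts than under the placeholder counts, so the resulting sentence vectors would differ as well.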

oborchers · 2019-09-14T18:28:41Z

Hi, thank you for the issue. I was already contacted and the issue should now be resolved.

Make sure to upgrade to the latest version by pip install -U fse or by building from the master branch, as I've just released 0.1.15.
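
After upgrading, one way to confirm that the installed release is at least 0.1.15 is a small check like the sketch below. It relies only on the standard library (`importlib.metadata`, Python 3.8+); the helpers `installed_version` and `is_at_least` are hypothetical names, and the comparison assumes plain dotted numeric versions.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_at_least(installed, minimum):
    """Compare dotted versions such as '0.1.15', ignoring non-numeric parts."""
    def as_tuple(v):
        return tuple(int(part) for part in v.split(".") if part.isdigit())
    return as_tuple(installed) >= as_tuple(minimum)


fse_version = installed_version("fse")
if fse_version is None:
    print("fse is not installed")
else:
    print("fse", fse_version, "includes the fix:", is_at_least(fse_version, "0.1.15"))
```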

If the issue persists, please feel free to contact me again.

oborchers closed this as completed Sep 14, 2019
