This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

How does fastText predict words not in the training set? #475

Closed
bringtree opened this issue Apr 5, 2018 · 1 comment

@bringtree

python3

import fastText as fasttext
import os
import numpy as np

pwd = os.getcwd()
model_bin = "/home/bringtree/data/wiki.zh.bin"
model_vec = "/home/bringtree/data/wiki.zh.vec"
model = fasttext.load_model(model_bin)  # pretrained Chinese Wikipedia model
word_1 = model.get_word_vector('asdhasjhdkajshd')  # a made-up word, not in the vocabulary
print(word_1[:20])
[-0.10704836 -0.5085796  -0.05533567 -0.45416433  0.36912176 -0.04111901
 -0.3435909  -0.13083233  0.07110099 -0.23444724  0.26429185  0.31326798
  0.20615076 -0.23127083 -0.11359369  0.21303149 -0.19785886  0.32893217
 -0.14822693  0.02602408]

"asdhasjhdkajshd" is not in the train set. And i want to know how do the model predict it?


nixphix commented Apr 5, 2018

FT breaks each word down into a bag of character n-grams. The word is first padded with boundary markers < and >, so with minn = maxn = 3:

'awesome' => <aw, awe, wes, eso, som, ome, me>
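The n-gram extraction described above can be sketched in a few lines of Python. This is a simplified illustration, not fastText's actual implementation (the real library additionally hashes n-grams into a fixed number of buckets and keeps the full word as an extra token):

```python
def char_ngrams(word, minn=3, maxn=3):
    """Character n-grams of a word, padded with FT-style boundary markers."""
    padded = "<" + word + ">"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams

print(char_ngrams("awesome"))
# ['<aw', 'awe', 'wes', 'eso', 'som', 'ome', 'me>']
```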

Each subword n-gram is assigned a vector during training. When an OOV (out-of-vocabulary) word is encountered, FT builds a vector by combining the vectors of the subwords that make up the word. So if you ask for a vector for awme, you get a combination of its subword vectors, including <aw and me>, which were also seen during training.
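As a rough sketch of that composition step, assuming a hypothetical `ngram_vectors` lookup table in place of FT's trained (and hashed) subword matrix, and averaging the known n-gram vectors:

```python
import numpy as np

def oov_vector(word, ngram_vectors, dim, minn=3, maxn=3):
    """Average the vectors of the word's known character n-grams.

    ngram_vectors: dict mapping n-gram string -> np.ndarray of shape (dim,);
    a stand-in for fastText's trained subword embedding matrix.
    """
    padded = "<" + word + ">"
    vecs = [
        ngram_vectors[padded[i:i + n]]
        for n in range(minn, maxn + 1)
        for i in range(len(padded) - n + 1)
        if padded[i:i + n] in ngram_vectors
    ]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# 'awme' shares the trigrams '<aw' and 'me>' with 'awesome',
# so its OOV vector reuses those trained subword vectors.
table = {"<aw": np.array([1.0, 0.0]), "me>": np.array([0.0, 1.0])}
print(oov_vector("awme", table, dim=2))  # -> [0.5 0.5]
```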

This is what makes FT robust when dealing with misspelled words and internet slang.

Also, a subword vector is not the same as a word vector: <me> != me.

You can get the subwords of a word with model.get_subwords('asdhasjhdkajshd').

The FT unsupervised model is based on this paper: Enriching Word Vectors with Subword Information.
