This repository has been archived by the owner on Mar 19, 2024. It is now read-only.
FT breaks each word down into a bag of character n-grams. With minn = maxn = 3, for example:
'awesome' => <aw, awe, wes, eso, som, ome, me>
(the < and > mark the word boundaries).
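The decomposition above can be reproduced with a small sketch (this is an illustration of the scheme, not fastText's internal code):

```python
def char_ngrams(word, minn=3, maxn=3):
    """Return the character n-grams fastText would extract,
    with '<' and '>' marking the word boundaries."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(w) - n + 1)]

print(char_ngrams("awesome"))
# ['<aw', 'awe', 'wes', 'eso', 'som', 'ome', 'me>']
```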
Each subword n-gram is assigned its own vector. When an OOV (out-of-vocabulary) word is encountered, FT builds a vector for it by summing the vectors of the subwords that make up the word. So if you ask for a vector for 'awme', you get back a sum that includes the subword vectors for <aw and me>.
This is what makes FT robust in dealing with misspelled words and internet slang.
Also note that a subword vector is not the same as the word vector: <me> != me.
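The OOV summation can be sketched with toy vectors (the 2-d values and the `subword_vecs` table are hypothetical stand-ins for a trained model's subword embeddings, not real fastText output):

```python
import numpy as np

def char_ngrams(word, minn=3, maxn=3):
    """Character n-grams with '<' and '>' as boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(w) - n + 1)]

# Hypothetical subword vectors; a real model stores one per n-gram bucket.
subword_vecs = {
    "<aw": np.array([1.0, 0.0]),
    "me>": np.array([0.0, 1.0]),
}

def oov_vector(word, dim=2):
    """Sum the vectors of whichever subwords of `word` are in the table."""
    vecs = [subword_vecs[g] for g in char_ngrams(word) if g in subword_vecs]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

print(oov_vector("awme"))  # sum of the '<aw' and 'me>' vectors
```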
You can get the subwords of any string with:

```python
model.get_subwords('asdhasjhdkajshd')
```
"asdhasjhdkajshd" is not in the train set. And i want to know how do the model predict it?