Whenever I load a huge model trained with the C version of word2vec, I have to fix some lines that contain strange characters or are malformed. There are only a few such lines. Since finding and replacing them takes forever, is there any way to just ignore and skip them in the load_word2vec_format function?
If you can work with the develop branch, load_word2vec_format() now takes an optional unicode_errors argument (see #466). The value is passed to the native Python unicode() function, and using 'ignore' or 'replace' should help most reads survive any mangling from the word2vec.c files...
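To make the difference between the error modes concrete, here is a minimal sketch that uses only the Python standard library. It assumes the mangled token is a UTF-8 word cut off mid-character, which is one plausible way a word2vec.c output file ends up undecodable. The helper name decode_token and the file name in the trailing comment are hypothetical, and the commented-out gensim call is an assumed call shape based on the description above, not verified against a specific release.

```python
# A token whose UTF-8 encoding was cut mid-character (assumed failure mode).
mangled = "café".encode("utf-8")[:-1]  # b'caf\xc3', a dangling lead byte


def decode_token(raw: bytes, errors: str = "strict") -> str:
    """Decode one vocabulary token with the given error-handling mode.

    Hypothetical helper: it mirrors what a loader does when it forwards
    an errors value to the decoder.
    """
    return raw.decode("utf-8", errors=errors)


ignored = decode_token(mangled, errors="ignore")    # invalid byte is dropped
replaced = decode_token(mangled, errors="replace")  # invalid byte becomes U+FFFD

try:
    decode_token(mangled)  # default 'strict' mode raises on the bad byte
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(ignored), repr(replaced), strict_failed)

# Assumed usage against a real model file (path is a placeholder):
# from gensim.models import Word2Vec
# model = Word2Vec.load_word2vec_format("vectors.bin", binary=True,
#                                       unicode_errors="ignore")
```

Note that neither mode skips the offending line: the word is still loaded, just with the undecodable bytes dropped ('ignore') or replaced by U+FFFD ('replace').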
The unicode_errors option is in the 0.12.3 release and so is available to anyone who hits this problem. Closing. (If a lot of people hit this problem for reasons outside their control, perhaps 'ignore' should be the new default. TBD.)