python gensim cannot load the word vector file #8
Comments
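I used this function to load the Zhihu Q&A data, sgns.zhihu.char.
@sudazzk I load it with the same function and it runs fine: from gensim.models.keyedvectors import KeyedVectors
I have updated the word vectors, so there should no longer be a Unicode encoding problem.
Works well.
The line from gensim.models.keyedvectors import KeyedVectors is reliable.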
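For anyone landing here, a minimal sketch of the loading call discussed in the comments, assuming a text-format (non-binary) vector file. The names `load_vectors` and `tolerate_bad_bytes` are hypothetical helpers made up for illustration. `unicode_errors` is an existing parameter of `load_word2vec_format`. Setting it to `"ignore"` drops undecodable bytes silently, so a few tokens may come out altered. Treat it as a workaround, not a fix for the file itself.

```python
def load_vectors(vector_file, tolerate_bad_bytes=False):
    """Load a text-format word2vec file with gensim's KeyedVectors.

    `load_vectors` and `tolerate_bad_bytes` are illustrative names,
    not part of gensim.
    """
    # Imported here so this helper can be defined even where gensim is missing.
    from gensim.models.keyedvectors import KeyedVectors

    return KeyedVectors.load_word2vec_format(
        vector_file,
        binary=False,      # the sgns.* files are plain text
        encoding="utf8",
        # "strict" raises UnicodeDecodeError on bad bytes; "ignore" skips them.
        unicode_errors="ignore" if tolerate_bad_bytes else "strict",
    )
```

On Windows, passing the path as a raw string (for example `r"D:\data\..."`) avoids any accidental backslash escapes.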
D:\Program\Anaconda3\lib\site-packages\gensim\utils.py:860: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Traceback (most recent call last):
File ".\zzk_word2vec.py", line 101, in <module>
test_word_embedding('D:\data\pretrain_word2vec\Chinese-Word-Vectors\sgns.zhihu.char\sgns.zhihu.char')
File ".\zzk_word2vec.py", line 76, in test_word_embedding
model = gensim.models.KeyedVectors.load_word2vec_format(vector_file, binary=False, encoding='utf8')
File "D:\Program\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 250, in load_word2vec_format
parts = utils.to_unicode(line.rstrip(), encoding=encoding, errors=unicode_errors).split(" ")
File "D:\Program\Anaconda3\lib\site-packages\gensim\utils.py", line 242, in any2unicode
return unicode(text, encoding, errors=errors)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 96-97: invalid continuation byte
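To check whether the file itself contains invalid UTF-8 (as opposed to a gensim problem), one option is to scan it in binary mode and report the lines that fail to decode. This is a rough sketch using only the standard library. `find_undecodable_lines` is a hypothetical helper name, and the default `limit` of 10 is an arbitrary choice.

```python
def find_undecodable_lines(path, encoding="utf8", limit=10):
    """Return (line_number, error) pairs for lines that fail to decode."""
    bad = []
    # Read raw bytes so that decoding happens under our control, line by line.
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                bad.append((lineno, exc))
                if len(bad) >= limit:
                    break  # stop early; a few samples are enough to diagnose
    return bad
```

If this returns any entries, the file has damaged or non-UTF-8 bytes on those lines, and downloading the updated vectors again is probably the cleaner fix.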