MmCorpus.load --> UnpicklingError: invalid load key, '%'. #1889

obeavers · 2018-02-09T04:04:11Z

Description

I'm getting an error when using MmCorpus.load('file.mm'), even immediately after saving with MmCorpus.serialize('file.mm', corpus). I am using Windows 10.

Steps/Code/Corpus to Reproduce

Corpus created with:

corpus = [dictionary.doc2bow(text) for text in texts]
MmCorpus.serialize('file.mm', corpus)

corpus = MmCorpus.serialize('file.mm') #breaks here

Expected Results

Expecting corpus to load as called.

Actual Results

1 c = MmCorpus.load(str(path))

c:\users\user.virtualenvs\key_log-v5coq-ss\lib\site-packages\gensim\utils.py in load(cls, fname, mmap)
393 compress, subname = SaveLoad._adapt_by_suffix(fname)
394
--> 395 obj = unpickle(fname)
396 obj._load_specials(fname, mmap, compress, subname)
397 logger.info("loaded %s", fname)

c:\users\user.virtualenvs\key_log-v5coq-ss\lib\site-packages\gensim\utils.py in unpickle(fname)
1300 # Because of loading from S3 load can't be used (missing readline in smart_open)
1301 if sys.version_info > (3, 0):
-> 1302 return _pickle.load(f, encoding='latin1')
1303 else:
1304 return _pickle.loads(f.read())

UnpicklingError: invalid load key, '%'.

Versions

Windows-10-10.0.16299-SP0
Python 3.6.3 |Anaconda, Inc.| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
gensim 3.3.0
FAST_VERSION 0

menshikh-iv · 2018-02-09T04:05:27Z

Thanks for the report, @obeavers. Can you share your corpus? It is needed to reproduce the error.

arlenk · 2018-02-12T03:03:26Z

MmCorpus.load('file.mm')

Are you definitely calling MmCorpus.load('file.mm') or are you calling MmCorpus('file.mm')?
It should be the latter per https://radimrehurek.com/gensim/tut1.html#corpus-formats

menshikh-iv · 2018-02-16T06:45:20Z

@obeavers also, you said that

corpus = MmCorpus.serialize('file.mm') #breaks here

but in your stack trace I see a different line:

c = MmCorpus.load(str(path))

This looks strange. Can you fix your first message and share the file?

Also, as @arlenk suggested, if you call "serialize", you should load it as MmCorpus(path) (not MmCorpus.load)

menshikh-iv · 2018-02-28T12:59:19Z

I investigated this again. The problem really is the combination of serialize + load, which is incorrect usage.

You should call MmCorpus.serialize("file.mm", corpus) and afterwards load it as MmCorpus("file.mm"). Don't use save/load here; it makes no sense.
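
To illustrate why the load call fails: a Matrix Market file is plain text whose first line starts with '%%MatrixMarket', while load() goes through unpickle (as the traceback above shows), so the unpickler trips over the leading '%'. Below is a minimal sketch using only the standard library. The file content is a hand-written, hypothetical two-document corpus (an assumption for illustration, not output captured from gensim), and try_unpickle is a made-up helper name.

```python
import os
import pickle
import tempfile

# Hypothetical, hand-written Matrix Market content: 2 documents, 3 terms, 3 entries.
MM_TEXT = (
    "%%MatrixMarket matrix coordinate real general\n"
    "2 3 3\n"
    "1 1 1.0\n"
    "1 3 2.0\n"
    "2 2 1.0\n"
)


def try_unpickle(path):
    """Return the unpickling error message, or None if the file is a valid pickle."""
    with open(path, "rb") as f:
        try:
            pickle.load(f)
        except pickle.UnpicklingError as err:
            return str(err)
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.mm")
    with open(path, "w", encoding="ascii") as f:
        f.write(MM_TEXT)
    # The first byte of the text file is '%', which is not a pickle opcode.
    message = try_unpickle(path)

print(message)
```

The message should match the one reported in this issue, which suggests the file itself is fine and only the loading method is wrong.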

sreenathelloti · 2018-05-13T16:59:39Z

I get an error while executing the following command:
self.train_data = pickle.load(f,encoding='cp1252')
UnpicklingError: invalid load key, '\xd9'.

menshikh-iv · 2018-07-30T10:44:20Z

@sreenathelloti how is this related to the current thread? What is this code?

aristila · 2021-09-21T10:41:46Z

Here's my two cents: I had serialized a corpus on a Linux server and transferred the .mm and .mm.index files into my Windows 10 environment, then tried to load the corpus.

try #1:
corpus = corpora.MmCorpus(path_to_mm_file)
(NB! this worked fine in the original Linux environment)

resulting error:
Traceback (most recent call last):
File [path_to_code], line 178, in
corpus = corpora.MmCorpus(ser_path)
File ".......\gensim\corpora\mmcorpus.py", line 55, in init
matutils.MmReader.init(self, fname)
File "gensim/corpora/_mmreader.pyx", line 55, in gensim.corpora._mmreader.MmReader.init
self.input, self.transposed = input, transposed
File "gensim/corpora/_mmreader.pyx", line 70, in gensim.corpora._mmreader.MmReader.init
if not line.startswith('%'):
ValueError: need more than 0 values to unpack

try #2:
corpus = corpora.MmCorpus.load(ser_path)

resulting error:
Traceback (most recent call last):
File [path_to_code], line 178, in
corpus = corpora.MmCorpus.load(ser_path)
File "......\gensim\utils.py", line 486, in load
obj = unpickle(fname)
File ".......\gensim\utils.py", line 1458, in unpickle
return _pickle.load(f, encoding='latin1') # needed because loading from S3 doesn't support readline()
_pickle.UnpicklingError: invalid load key, '%'.

arlenk · 2021-09-21T15:37:25Z

Here's my two cents: I had serialized a corpus on a Linux server and transferred the .mm and .mm.index files into my Windows 10 environment, then tried to load the corpus.

The mmCorpus file (path_to_mm_file) should just be a plain text file. Have you tried looking at the file to make sure the transfer from linux didn't somehow corrupt the file?
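
One way to look at the file without opening an editor is to read just its header. This is a rough sketch using only the standard library; peek_mm_header is a hypothetical helper (not part of gensim), and the sample file written here is hand-made for illustration. It checks that the first line is the '%%MatrixMarket' banner and that the first non-comment line holds three integers (documents, terms, non-zero entries).

```python
import os
import tempfile


def peek_mm_header(path):
    """Return (banner_ok, dims); dims is (num_docs, num_terms, num_nnz) or None."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        first = f.readline()
        banner_ok = first.startswith("%%MatrixMarket")
        dims = None
        for line in f:
            # This sketch skips comment lines and blank lines before the size line.
            if line.startswith("%") or not line.strip():
                continue
            parts = line.split()
            if len(parts) == 3 and all(p.isdigit() for p in parts):
                dims = tuple(int(p) for p in parts)
            break  # only the first non-comment line is inspected
    return banner_ok, dims


# Hand-written sample file, used only to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.mm")
    with open(sample, "w", encoding="ascii") as f:
        f.write("%%MatrixMarket matrix coordinate real general\n2 3 3\n1 1 1.0\n")
    result = peek_mm_header(sample)

print(result)
```

If the banner check fails or the size line is missing, the file was probably damaged or is not a Matrix Market file at all.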

aristila · 2021-09-21T17:26:29Z

Have you tried looking at the file to make sure the transfer from linux didn't somehow corrupt the file?

I compared checksums and they match, at least.

One thing that caught my eye is encoding='latin1' in the second try. I have been using encoding='utf-8' everywhere I can, but here the code seems to just guess. That's probably not the cause of this problem though, just thought I'd mention.
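
On the encoding='latin1' point: according to the Python pickle documentation, the encoding argument only controls how 8-bit str objects pickled by Python 2 are decoded. It does not affect how the file itself is read, so it would not explain the '%' error. A minimal sketch follows; the pickle bytes are hand-crafted for illustration and are an assumption, not data from gensim.

```python
import pickle

# A Python 2 style pickle of the one-byte str '\xe9':
# SHORT_BINSTRING opcode 'U', length 1, payload 0xe9, then STOP '.'.
PY2_STR_PICKLE = b"U\x01\xe9."

# With latin1, every byte maps to a code point, so decoding cannot fail.
decoded = pickle.loads(PY2_STR_PICKLE, encoding="latin1")

try:
    pickle.loads(PY2_STR_PICKLE)  # default encoding is ASCII
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(repr(decoded), ascii_failed)
```

Since latin1 can decode any byte, it is a common choice for reading pickles written by Python 2, rather than a guess about the file's text encoding.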

piskvorky · 2021-09-21T18:43:30Z

The first way is correct and should work:

corpus = corpora.MmCorpus(path_to_mm_file)

The error ValueError: need more than 0 values to unpack is weird though, I cannot reproduce it. What Python are you using?

Seems unrelated to this ticket. Please open a new ticket, with the necessary info (incl. a minimal example, if possible), thanks.

menshikh-iv added the "need info" label (Not enough information for reproduce an issue, need more info from author) Feb 9, 2018

menshikh-iv closed this as completed Feb 28, 2018

Repository owner locked as resolved and limited conversation to collaborators Sep 21, 2021
