Fix mmio/fromfile on gzip on Python 3 (Trac #1627) #2152

Closed
scipy-gitbot opened this Issue Apr 25, 2013 · 4 comments

@scipy-gitbot

Original ticket http://projects.scipy.org/scipy/ticket/1627 on 2012-03-22 by @pv, assigned to unknown.

Consider this:

Python 3.2.2
>>> from scipy.io import mmread
>>> mmread('illc1033.mtx.gz')
Traceback (most recent call last):
  ...
  File ".../scipy/io/mmio.py", line 447, in _parse_body
    flat_data = flat_data.reshape(-1,3)
ValueError: total size of new array must be unchanged

with ftp://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/lsq/illc1033.mtx.gz

The cause is that a GzipFile handle is passed to numpy.fromfile. On Python 3, PyObject_AsFileDescriptor in fromfile succeeds on a GzipFile, although it should fail: there is no OS-level file handle that yields the uncompressed stream. As a consequence, the fromfile call in mmread apparently ends up reading the compressed data stream, which causes this error.

This should be worked around in scipy.io.mmio until Python 3 (and NumPy) are fixed.
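
To make the descriptor problem concrete, here is a minimal sketch using only the standard library. It assumes nothing about NumPy internals; the helper name `raw_vs_decompressed` and the sample payload are made up for illustration. It shows that the descriptor returned by `GzipFile.fileno()` belongs to the underlying compressed file, so anything reading at the OS level sees gzip bytes rather than the decompressed text.

```python
import gzip
import os
import tempfile


def raw_vs_decompressed(payload):
    """Return (first two bytes seen via the OS descriptor, decompressed data)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mtx.gz")  # hypothetical file name
        with gzip.open(path, "wb") as fh:
            fh.write(payload)

        # Reading through the GzipFile API yields the uncompressed stream.
        with gzip.open(path, "rb") as fh:
            decompressed = fh.read()

        # Reading through the descriptor bypasses decompression entirely.
        with gzip.open(path, "rb") as fh:
            fd = fh.fileno()  # descriptor of the underlying compressed file
            os.lseek(fd, 0, os.SEEK_SET)
            raw = os.read(fd, 2)
    return raw, decompressed


raw, decompressed = raw_vs_decompressed(b"1 1 2.5\n")
print(raw)           # the gzip magic number, not matrix data
print(decompressed)  # the original payload
```

The two leading bytes read via the descriptor are the gzip magic number, which is why a reader that trusts the descriptor ends up interpreting compressed bytes as numbers.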

@andreas-h

I just tried this, and np.fromfile does work on the gzip handle:

$ python3 
Python 3.2.5 (default, Oct 29 2013, 10:03:36) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import gzip
>>> G = gzip.open("illc1033.mtx.gz", "r")
>>> np.fromfile(G)
array([  1.02866359e-064,   1.50168922e-076,   1.31490163e+294, ...,
         1.15488103e+120,   1.31512507e+171,   6.63818119e-221])
>>> 
>>> from scipy.io import mmread
>>> mmread("illc1033.mtx.gz")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/site-packages/scipy/io/mmio.py", line 73, in mmread
    return MMFile().read(source)
  File "/usr/lib64/python3.2/site-packages/scipy/io/mmio.py", line 326, in read
    return self._parse_body(stream)
  File "/usr/lib64/python3.2/site-packages/scipy/io/mmio.py", line 475, in _parse_body
    flat_data = flat_data.reshape(-1,3)
ValueError: total size of new array must be unchanged

Did I misunderstand you, @pv? This is on scipy 0.13.0.

@pv
SciPy member
pv commented Feb 12, 2014

np.fromfile reads the data from the raw stream and not the uncompressed one, so it doesn't work.
The numbers you get from fromfile are garbage.
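
One way to sidestep the descriptor is to read through the Python file API and parse the lines directly. The sketch below is only an illustration of that idea, not the change made in the commits referenced in this thread. The function name `read_coordinate_entries` is hypothetical, and it assumes the stream is already positioned past the Matrix Market header and size line, with one `row col value` triple per line.

```python
import gzip
import io


def read_coordinate_entries(stream):
    """Parse whitespace-separated (row, col, value) lines from a binary stream."""
    entries = []
    for line in stream:
        line = line.strip()
        # Skip blank lines and '%' comment lines.
        if not line or line.startswith(b"%"):
            continue
        row, col, value = line.split()
        entries.append((int(row), int(col), float(value)))
    return entries


# Build a small gzip-compressed body in memory (made-up sample data).
body = b"1 1 2.5\n2 1 -1.0\n3 2 4.0\n"
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(body)
buffer.seek(0)

# Iterating over the GzipFile goes through its read API, so the lines
# are decompressed before they are parsed.
with gzip.GzipFile(fileobj=buffer, mode="rb") as gz:
    entries = read_coordinate_entries(gz)

print(entries)
```

Because every byte passes through `GzipFile`, the parser never touches the compressed stream; the trade-off is that parsing happens in Python rather than in C.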


@andreas-h added four commits to andreas-h/scipy that referenced this issue Feb 12, 2014:
BUG: fix scipy.io.mmread() of gzipped files under Python3
fixes #2152
4c62ad7, 68f969c, a5c4210, 8daf621

@juliantaylor

Hm, maybe numpy should cave and just use the Python API. That would avoid a lot of problems; it's only string reading that would be measurably slower (fgetc usage).

@pv
SciPy member
pv commented Feb 12, 2014

The Python file API AFAIK also has no ungetc, so the ability to parse stuff directly from streams is lost.

@andreas-h added four commits to andreas-h/scipy that referenced this issue, each titled "TST: test if fix for #2152 works":
14cb83d (Feb 19, 2014), 9cb3ae2 (Feb 19, 2014), ac31bd8 (Feb 20, 2014), 541dcc4 (Feb 23, 2014)
@rgommers rgommers added this to the 0.14.0 milestone Feb 23, 2014
@rgommers rgommers closed this in #3314 Feb 24, 2014