Reading empty (no data) arff file fails #5276

Closed
jacintoArias opened this Issue Sep 22, 2015 · 2 comments

@jacintoArias

commented Sep 22, 2015

Although many people would find it pointless to read an empty ARFF dataset, the format can serve as a standard way to store a dataset's metadata.

I came across this error while trying to replicate some code I developed in Weka, which does allow reading ARFF files that have no data rows.

Working example:

Trying to load this file:

@RELATION iris

@ATTRIBUTE sepallength  REAL
@ATTRIBUTE sepalwidth   REAL
@ATTRIBUTE petallength  REAL
@ATTRIBUTE petalwidth   REAL
@ATTRIBUTE class    {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA 

Results in the following error:

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-18-6c0254044d24> in <module>()
      1 f = open("/Volumes/DATA/datasets/big/poker/pokerhand_header.arff", "r")
      2 f = open("/tmp/iris-empty.arff", "r")
----> 3 [data, meta] = scipy.io.arff.loadarff(f)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/io/arff/arffread.pyc in loadarff(f)
    546         ofile = open(f, 'rt')
    547     try:
--> 548         return _loadarff(ofile)
    549     finally:
    550         if ofile is not f:  # only close what we opened

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/io/arff/arffread.pyc in _loadarff(ofile)
    614     try:
    615         try:
--> 616             dtline = next_data_line(ofile)
    617             delim = get_delim(dtline)
    618         except ValueError as e:

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/io/arff/arffread.pyc in next_data_line(row_iter)
    609         raw = next(row_iter)
    610         while r_empty.match(raw) or r_comment.match(raw):
--> 611             raw = next(row_iter)
    612         return raw
    613 

StopIteration: 
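
The traceback shows the loop in next_data_line calling next() on an iterator that is already exhausted, because every remaining line after @DATA is blank or a comment. The following is a minimal sketch of that failure mode and of one way to detect it, not the fix that went into SciPy. The two regular expressions are assumed stand-ins for r_empty and r_comment in arffread, and next_data_line_or_none is a hypothetical helper name.

```python
import re

# Assumed stand-ins for the patterns used in arffread; the real ones may differ.
r_empty = re.compile(r"^\s+$")
r_comment = re.compile(r"^%")


def next_data_line(row_iter):
    """Mirror of the loop in the traceback: skip blank and comment lines."""
    raw = next(row_iter)
    while r_empty.match(raw) or r_comment.match(raw):
        raw = next(row_iter)  # raises StopIteration when no data line exists
    return raw


def next_data_line_or_none(row_iter):
    """Hypothetical variant: return None instead of raising at end of input."""
    for raw in row_iter:
        if not (r_empty.match(raw) or r_comment.match(raw)):
            return raw
    return None


# Lines that follow "@DATA" in a file with an empty data section.
remaining = ["\n", "% no rows follow\n"]

try:
    next_data_line(iter(remaining))
except StopIteration:
    print("original loop: StopIteration")

print("variant:", next_data_line_or_none(iter(remaining)))
```

A caller that receives None can then build a zero-length result instead of trying to parse a first row.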

@jacintoArias jacintoArias changed the title Reading empty (no data) arff fails Reading empty (no data) arff file fails Sep 22, 2015

@WarrenWeckesser

Member

commented Sep 22, 2015

This is not an unreasonable request, but I don't know how soon someone will get around to fixing it.

If the code is changed to handle an empty data section, the returned array should be an array with length 0 and with the same data type that it would have had if there had been data. For example, with the following file (oneline.arff),

@RELATION iris

@ATTRIBUTE sepallength  REAL
@ATTRIBUTE sepalwidth   REAL
@ATTRIBUTE petallength  REAL
@ATTRIBUTE petalwidth   REAL
@ATTRIBUTE class    {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
1.0,2.0,3.0,4.0,Iris-setosa

we get

In [19]: data, meta = arffread.loadarff('oneline.arff')

In [20]: data
Out[20]: 
array([(1.0, 2.0, 3.0, 4.0, 'Iris-setosa')], 
      dtype=[('sepallength', '<f8'), ('sepalwidth', '<f8'), ('petallength', '<f8'), ('petalwidth', '<f8'), ('class', 'S15')])

In [21]: meta
Out[21]: 
Dataset: iris
    sepallength's type is numeric
    sepalwidth's type is numeric
    petallength's type is numeric
    petalwidth's type is numeric
    class's type is nominal, range is ('Iris-setosa', 'Iris-versicolor', 'Iris-virginica')

When the file with an empty data section is read, meta should be exactly the same as above, and data should have the same data type as above, but with length 0. That is, it should look like data[:0]:

In [22]: data[:0]
Out[22]: 
array([], 
      dtype=[('sepallength', '<f8'), ('sepalwidth', '<f8'), ('petallength', '<f8'), ('petalwidth', '<f8'), ('class', 'S15')])
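
As an illustration of the target described above, here is a small NumPy-only sketch. It assumes the dtype printed in Out[20] and shows that slicing with [:0] and calling np.empty(0, dtype=...) both give a length-0 structured array with the same fields. The variable names are illustrative and are not taken from arffread.

```python
import numpy as np

# Structured dtype copied from the Out[20] output above.
iris_dtype = np.dtype([
    ("sepallength", "<f8"),
    ("sepalwidth", "<f8"),
    ("petallength", "<f8"),
    ("petalwidth", "<f8"),
    ("class", "S15"),
])

# One-row array, as read from oneline.arff.
one_row = np.array([(1.0, 2.0, 3.0, 4.0, b"Iris-setosa")], dtype=iris_dtype)

# Two ways to get the empty result the reader should return.
empty_by_slice = one_row[:0]
empty_direct = np.empty(0, dtype=iris_dtype)

print(len(empty_by_slice), len(empty_direct))
print(empty_by_slice.dtype == empty_direct.dtype)
print(empty_direct.dtype.names)
```

Because the field names and types survive, code that indexes columns by name, such as data['class'], keeps working on the empty result.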
@WarrenWeckesser

Member

commented Sep 23, 2015

... but I don't know how soon someone will get around to fixing it.

Faster than I thought :) ... #5278

@rgommers rgommers added this to the 0.17.0 milestone Oct 9, 2015

sumitbinnani added a commit to sumitbinnani/scipy that referenced this issue Oct 9, 2015

BUG: io: Stop guessing the data delimiter in ARFF files.
In the ARFF reader, there were several dozen lines of code that
determined whether the delimiter in the @DATA section of the ARFF
file was a comma or a space.  This code has been removed.  The ARFF
file format specification says the delimiter must be a comma.

As a side effect, this closes gh-5276.  'loadarff' can now handle
a file with no data in the @DATA section.