ENH: io: support sparse format in `loadarff` #3535

fbenites · 2014-04-09T10:43:24Z

Extends loadarff to also read the sparse ARFF format, with support for Meka multi-label assignments.
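
For readers unfamiliar with the format: a sparse ARFF data row lists only the non-zero entries as `{index value, index value, ...}` with zero-based attribute indices. A minimal sketch of expanding such a row, assuming numeric attributes only; `parse_sparse_row` is a hypothetical helper for illustration, not the code in this PR:

```python
def parse_sparse_row(line, n_attributes, fill=0.0):
    """Expand one sparse ARFF data row into a dense list of floats.

    Hypothetical helper: handles numeric values only, no quoting,
    no missing values ('?'), no instance weights.
    """
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("not a sparse ARFF row: %r" % line)

    dense = [fill] * n_attributes
    inner = body[1:-1].strip()
    if not inner:
        # "{}" means every attribute takes the fill value
        return dense

    for pair in inner.split(","):
        index, value = pair.split()  # "3 7" -> ("3", "7")
        dense[int(index)] = float(value)
    return dense


row = parse_sparse_row("{1 2.5, 3 7}", n_attributes=5)
print(row)  # [0.0, 2.5, 0.0, 7.0, 0.0]
```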

rgommers · 2014-04-13T19:47:18Z

@cournape you added the arff reader, so it would be great if you could have a look at this enhancement.

rgommers · 2014-04-13T19:49:17Z

@fbenites there's a mix of tabs and spaces in your commits: https://travis-ci.org/scipy/scipy/jobs/22599231
Could you remove all the tabs please?

rgommers · 2014-04-13T19:50:23Z

There are also lots of pep8 issues: https://travis-ci.org/scipy/scipy/jobs/22599234

You can install pep8 on your own machine; running `pep8 scipy` in the root scipy dir should pass.

fbenites · 2014-04-14T10:56:04Z

See https://travis-ci.org/scipy/scipy/builds/22940084. The code now seems to conform to pep8, and the tests built successfully.

rgommers · 2014-04-14T20:36:20Z

Looks much better now. It may be good to ask on the scipy-dev mailing list if there's anyone who has a use for this functionality and wants to test / provide feedback.

WarrenWeckesser · 2014-04-14T21:14:52Z

Adding a third return value to loadarff is a backwards-incompatible API change. This change needs to be implemented in a way that won't break old code. Perhaps an additional argument that controls whether the new third value is returned? (It might get ugly, but that's part of the cost of API stability.) There are also incompatible API changes to read_header and MetaData. These parts of the code aren't "advertised" (e.g. http://docs.scipy.org/doc/scipy/reference/io.html#module-scipy.io.arff lists only loadarff), but they don't have leading underscores in the names and they are easily discoverable, so they are public. I haven't thought too much about it (I don't use this module "in anger"), but I guess the change to read_header could be handled the same as loadarff, and the new argument to MetaData.__init__ could be given a default value.
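
A minimal sketch of the flag-controlled return value described above. `load_records` and its `return_classes` argument are hypothetical stand-ins for illustration; this is not scipy's `loadarff` signature:

```python
def load_records(lines, return_classes=False):
    """Hypothetical loader illustrating an opt-in third return value.

    Old callers that unpack two values keep working because the
    default for `return_classes` preserves the original return shape.
    """
    data = [line.split(",") for line in lines]
    meta = {"n_columns": len(data[0]) if data else 0}

    if return_classes:
        # New behaviour, only when explicitly requested
        classes = [row[-1] for row in data]
        return data, meta, classes
    return data, meta


# Existing code: unchanged two-value unpacking
data, meta = load_records(["1,2,a", "3,4,b"])

# New code: opts in to the third value
data2, meta2, classes = load_records(["1,2,a", "3,4,b"], return_classes=True)
```

The cost is a return type that depends on an argument, which is the "ugly" part mentioned above.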

cournape · 2014-04-15T06:46:33Z

@WarrenWeckesser agreed, returning a 3rd value should be avoided.

Returning a 3rd value conditionally is even worse IMO. We should add a new API to use the new features, with a provision to be more extensible (looking at my original code is humbling :) ).

fbenites · 2014-04-15T07:17:30Z

I must admit that this class thing is pretty much a Meka thing. I could write the code for a conditional 3rd argument, since this is a special case. Mulan handles this problem with a separate XML file for the classes. Further, there is a feature in the spec that gives a weight to each instance; I have never seen it used in practice, and I did not implement it. So this does not cover the whole spec as in http://weka.wikispaces.com/ARFF+%28developer+version%29 . I also did not test for sparse and undefined.

coveralls · 2014-04-17T14:40:12Z

Coverage remained the same when pulling 7010687 on fbenites:master into 5b94656 on scipy:master.

pv · 2014-05-04T12:29:05Z

How about putting the class data in the MetaData object, e.g., by adding a def classes(self) method to it? This should be fully backward-compatible.
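
A rough sketch of that idea, using a hypothetical `MetaDataSketch` class; the constructor signature here is an assumption for illustration and does not mirror scipy's `MetaData`:

```python
class MetaDataSketch:
    """Hypothetical metadata container with an added classes() accessor."""

    def __init__(self, rel, attr_names, attr_types, class_names=None):
        self.name = rel
        self._names = list(attr_names)
        self._types = list(attr_types)
        # Defaulting to None keeps old constructor calls valid
        self._class_names = list(class_names) if class_names is not None else []

    def names(self):
        return list(self._names)

    def types(self):
        return list(self._types)

    def classes(self):
        """Return the class (label) names; empty if none were given."""
        return list(self._class_names)


old_style = MetaDataSketch("demo", ["x", "y"], ["numeric", "numeric"])
new_style = MetaDataSketch("demo", ["x", "y"], ["numeric", "numeric"],
                           class_names=["tag_a", "tag_b"])
```

Callers that never ask for `classes()` see no difference, so the loader can keep returning two values.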

fbenites · 2014-05-04T16:05:06Z

From the docs:
class MetaData(object):
Small container to keep useful informations on a ARFF dataset.

Knows about attributes names and types.

The classes are the classes for each object. In multi-label data, an object can have multiple classes assigned to it, like tags, so every instance has both attributes and classes. In normal Weka the classes are part of the data; I wanted to split them out. It is also possible to implement this so that the data still contains the classes, in which case we could pass the number of classes in the metadata. Meka uses the first x attributes as classes; MULAN (another multi-label library built on Weka) uses the last x as classes. That should then be made clear as well, if it is important for conformity. I had hoped to use it like this and later, if many people are interested in the functionality, change it according to what most users need.
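
To illustrate the two conventions mentioned here, a small sketch that splits one dense row into features and labels given the number of classes. `split_labels` and its `labels_first` flag are hypothetical names, not part of this PR:

```python
def split_labels(row, n_labels, labels_first=True):
    """Split a dense row into (features, labels).

    labels_first=True  -> first n_labels columns are labels (Meka style)
    labels_first=False -> last n_labels columns are labels (MULAN style)
    """
    if not 0 <= n_labels <= len(row):
        raise ValueError("n_labels out of range")
    if labels_first:
        return row[n_labels:], row[:n_labels]
    cut = len(row) - n_labels
    return row[:cut], row[cut:]


meka_features, meka_labels = split_labels([1, 0, 5.0, 6.0], 2)
mulan_features, mulan_labels = split_labels([5.0, 6.0, 1, 0], 2,
                                            labels_first=False)
```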

sparse arff

4501024

WarrenWeckesser added enhancement labels Apr 13, 2014

Fernando Benites and others added 6 commits April 14, 2014 09:28

scipy sparse

d129215

sparse

27e3a27

rm unneeded file

3c63c91

fixed var row missing

ae227f9

fixed a.next to next(a)

f04d433

fixed testdata

913a316

optional classes return for backwards compatibility

7010687

pv removed the PR label Aug 13, 2014

This was referenced Mar 3, 2019

ENH: loadarff now supports relational attributes. #9854

Merged

readarff sparse #3530

Closed

pv added the needs-work label (items that are pending response from the author) Aug 7, 2019

lucascolley changed the title ~~sparse arff~~ ENH: io: support sparse format in loadarff Mar 14, 2024
