ENH: make itertuples() return namedtuples #11269

Closed
mjoud opened this Issue Oct 9, 2015 · 7 comments

mjoud commented Oct 9, 2015

I propose that itertuples() should return collections.namedtuple objects, a drop-in replacement for the standard tuple but with the benefit of having named fields. I have tested the following with Python 3.4 (only slight changes compared to the current implementation).

import collections

def itertuples(self, index=True):
    arrays = []
    if index:
        arrays.append(self.index)
        fields = ["Index"] + list(self.columns)
    else:
        fields = list(self.columns)

    # rename=True replaces invalid or duplicate field names with _0, _1, ...
    itertuple = collections.namedtuple("Itertuple", fields, rename=True)

    # use integer indexing because of possible duplicate column names
    arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))

    return (itertuple(*row) for row in zip(*arrays))

Example

In [3]: df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])

In [4]: for row in df.itertuples():
   ...:     print(row)
   ...:     
Itertuple(Index='a', col1=1, col2=0.10000000000000001)
Itertuple(Index='b', col1=2, col2=0.20000000000000001)

In [5]: row.Index, row.col1, row.col2
Out[5]: ('b', 2, 0.20000000000000001)

There is no performance overhead. I'm not sure about the compatibility for older versions of Python, though. The rename parameter is needed for renaming disallowed field names and duplicate identifiers to standard position-based identifiers, and this feature was added in Python 2.7/3.1.
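To illustrate the renaming described above, here is a small sketch using only the standard library. The column labels are made up for the example (a duplicate, a keyword, and a name containing a space); they are assumptions, not taken from any real DataFrame.

```python
from collections import namedtuple

# Hypothetical column labels: a duplicate, a Python keyword, and a non-identifier
fields = ["Index", "col1", "col1", "class", "my col"]

# rename=True swaps each offending name for a positional one (_<position>)
Row = namedtuple("Itertuple", fields, rename=True)
row = Row("a", 1, 2, 3, 4)

print(Row._fields)             # ('Index', 'col1', '_2', '_3', '_4')
print(isinstance(row, tuple))  # True: still usable wherever a plain tuple is expected
print(row[1] == row.col1)      # True: positional and named access agree
```

Only the first occurrence of a duplicated name keeps its label; later ones are reachable by position or by the generated `_N` attribute.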

Contributor

jreback commented Oct 9, 2015

Sure, this would make sense. I would probably name the tuple Pandas or some such.

pull-requests welcome.

I believe this should work on all versions (inc 2.6) as namedtuple was back-ported.

jreback added this to the Next Major Release milestone Oct 9, 2015

mjoud commented Oct 9, 2015

Just checked Python 2.6.6, and there is no support for the rename parameter, so this would require some compatibility handling.

Contributor

jreback commented Oct 9, 2015

oh, why do you need rename?

mjoud commented Oct 9, 2015

rename=False will raise a ValueError for invalid field names, while rename=True will replace them with positional names of the form _N (here _0 and _1):

In [7]: itertuple = namedtuple("Pandas", ["def", "return"], rename=True)

In [8]: itertuple(1,2)
Out[8]: Pandas(_0=1, _1=2)

In [9]: itertuple = namedtuple("Pandas", ["def", "return"], rename=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-098cde3289d7> in <module>()
----> 1 itertuple = namedtuple("Pandas", ["def", "return"], rename=False)

/usr/lib/python3.5/collections/__init__.py in namedtuple(typename, field_names, verbose, rename)
    394         if _iskeyword(name):
    395             raise ValueError('Type names and field names cannot be a '
--> 396                              'keyword: %r' % name)
    397     seen = set()
    398     for name in field_names:

ValueError: Type names and field names cannot be a keyword: 'def'

Contributor

jreback commented Oct 9, 2015

ahh I see, ok, you can just do this conditionally and raise in py2.6 if it's not supported

mjoud commented Oct 14, 2015

This was also discussed in #7958.

I'm not sure how to handle the rename bit; if I just set rename=True, this will make itertuples() unusable for anyone still using Python 2.6, and rename=False will raise an error for any disallowed field name. Should we wait until 2.6 support is dropped?

Contributor

jreback commented Oct 14, 2015

@mjoud as I said above, you can pass the rename argument in a try/except. If it doesn't work (iow in py2.6), then try w/o the rename, if that blows up, then you give up.
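A minimal sketch of the try/except approach suggested above. The helper name `make_row_type` and the choice to return `None` when giving up are assumptions made for illustration; this is not necessarily how the merged pull request handles it.

```python
import collections


def make_row_type(fields, name="Pandas"):
    """Hypothetical helper: return a namedtuple type for rows, or None on failure."""
    try:
        # Python 2.7/3.1+: invalid or duplicate names are renamed to _0, _1, ...
        return collections.namedtuple(name, fields, rename=True)
    except TypeError:
        # Assumed Python 2.6 behaviour: the 'rename' keyword is not accepted
        pass
    try:
        # Retry without rename; this only works if every field name is valid
        return collections.namedtuple(name, fields)
    except ValueError:
        # Invalid field names and no rename support: give up
        return None


row_type = make_row_type(["Index", "col1", "col1", "def"])
print(row_type._fields)  # ('Index', 'col1', '_2', '_3')
```

A caller could fall back to plain tuples whenever the helper returns `None`, so that iteration still works even when named fields are unavailable.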

jreback modified the milestone: 0.17.1, Next Major Release Oct 20, 2015

jreback closed this in #11325 Oct 28, 2015

jreback added a commit that referenced this issue Oct 28, 2015

Merge pull request #11325 from mjoud/namedtuples
ENH: itertuples() returns namedtuples (closes #11269)
8a46de4