ENH: make itertuples() return namedtuples #11269

Closed
mjoud opened this issue Oct 9, 2015 · 7 comments · Fixed by #11325
Labels: Compat, Enhancement, Reshaping

Comments


mjoud commented Oct 9, 2015

I propose that itertuples() should return collections.namedtuple objects, a drop-in replacement for the standard tuple but with the benefit of having named fields. I have tested the following with Python 3.4 (only slight changes compared to the current implementation).

import collections

def itertuples(self, index=True):
    arrays = []
    if index:
        arrays.append(self.index)
        fields = ["Index"] + list(self.columns)
    else:
        fields = list(self.columns)

    # rename=True replaces invalid or duplicate field names with _0, _1, ...
    itertuple = collections.namedtuple("Itertuple", fields, rename=True)

    # use integer indexing because of possible duplicate column names
    arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))

    return (itertuple(*row) for row in zip(*arrays))

Example

In [3]: df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])

In [4]: for row in df.itertuples():
   ...:     print(row)
   ...:     
Itertuple(Index='a', col1=1, col2=0.10000000000000001)
Itertuple(Index='b', col1=2, col2=0.20000000000000001)

In [5]: row.Index, row.col1, row.col2
Out[5]: ('b', 2, 0.20000000000000001)
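
For readers without pandas at hand, the same idea can be sketched with plain Python lists standing in for the index and columns. This is only an illustration of the zip-and-namedtuple approach proposed above; the function name `iter_namedtuples` and its `(label, values)` input shape are assumptions made for this sketch, not part of pandas.

```python
import collections


def iter_namedtuples(index, columns, include_index=True):
    """Yield one namedtuple per row.

    index:   list of row labels (stand-in for DataFrame.index)
    columns: list of (label, values) pairs (stand-in for the DataFrame columns)
    """
    arrays = []
    fields = []
    if include_index:
        arrays.append(index)
        fields.append("Index")
    for label, values in columns:
        fields.append(str(label))
        arrays.append(values)

    # rename=True keeps construction from failing on awkward labels.
    row_type = collections.namedtuple("Itertuple", fields, rename=True)
    return (row_type(*row) for row in zip(*arrays))


rows = list(iter_namedtuples(["a", "b"], [("col1", [1, 2]), ("col2", [0.1, 0.2])]))
print(rows[-1])  # Itertuple(Index='b', col1=2, col2=0.2)
```

Because each row is still a tuple, existing code that unpacks rows positionally keeps working.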

There is no performance overhead. I'm not sure about the compatibility for older versions of Python, though. The rename parameter is needed for renaming disallowed field names and duplicate identifiers to standard position-based identifiers, and this feature was added in Python 2.7/3.1.
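
A small sketch of what rename=True does with awkward column labels, using only the standard library. The labels below are made up for illustration: one is a keyword, one is a duplicate, and one is not a valid identifier.

```python
import collections

# Hypothetical column labels chosen to trigger each renaming rule.
labels = ["Index", "class", "col1", "col1", "my col"]

Row = collections.namedtuple("Itertuple", labels, rename=True)
print(Row._fields)  # ('Index', '_1', 'col1', '_3', '_4')

row = Row("a", 1, 2, 3, 4)
# The first "col1" keeps its name; the duplicate is reachable by position name.
print(row.col1, row._3)  # 2 3
```

Renamed fields take the position of the offending label, so the tuple order still matches the column order.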

Contributor

jreback commented Oct 9, 2015

Sure, this would make sense. I would probably name the tuple Pandas or some such.

pull-requests welcome.

I believe this should work on all versions (inc 2.6) as namedtuple was back-ported.

jreback added the Enhancement, Reshaping, Difficulty Novice, and Compat labels on Oct 9, 2015
jreback added this to the Next Major Release milestone on Oct 9, 2015
Author

mjoud commented Oct 9, 2015

Just checked Python 2.6.6, and there is no support for the rename parameter, so this would require some compatibility handling.

Contributor

jreback commented Oct 9, 2015

oh, why do you need rename?

Author

mjoud commented Oct 9, 2015

rename=False will raise a ValueError for invalid field names, while rename=True will rename them to positional names of the form _N, where N is the field's index:

In [7]: itertuple = namedtuple("Pandas", ["def", "return"], rename=True)

In [8]: itertuple(1,2)
Out[8]: Pandas(_0=1, _1=2)

In [9]: itertuple = namedtuple("Pandas", ["def", "return"], rename=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-098cde3289d7> in <module>()
----> 1 itertuple = namedtuple("Pandas", ["def", "return"], rename=False)

/usr/lib/python3.5/collections/__init__.py in namedtuple(typename, field_names, verbose, rename)
    394         if _iskeyword(name):
    395             raise ValueError('Type names and field names cannot be a '
--> 396                              'keyword: %r' % name)
    397     seen = set()
    398     for name in field_names:

ValueError: Type names and field names cannot be a keyword: 'def'

Contributor

jreback commented Oct 9, 2015

Ahh I see, ok, you can just do this conditionally and raise in py2.6 if it's not supported.

Author

mjoud commented Oct 14, 2015

This was also discussed in #7958.

I'm not sure how to handle the rename bit; if I just set rename=True, this will make itertuples() unusable for anyone still using Python 2.6, and rename=False will raise an error for any disallowed field name. Should we wait until 2.6 support is dropped?

Contributor

jreback commented Oct 14, 2015

@mjoud as I said above, you can pass the rename argument in a try/except. If that doesn't work (in other words, on py2.6), then try without rename; if that blows up too, then you give up.
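
The try/except fallback described here could look roughly like the following. This is a sketch under assumptions, not the code that was merged: the helper names `make_row_type` and `iter_rows` are hypothetical, and "giving up" is interpreted here as falling back to plain tuples, which the thread does not specify.

```python
import collections


def make_row_type(fields, name="Pandas"):
    """Return a namedtuple class for `fields`, or None if none can be built."""
    fields = [str(f) for f in fields]
    try:
        # Python 2.7/3.1+: invalid or duplicate names become _0, _1, ...
        return collections.namedtuple(name, fields, rename=True)
    except TypeError:
        # Older interpreters do not accept the `rename` keyword at all.
        pass
    try:
        return collections.namedtuple(name, fields)
    except ValueError:
        # A field name is a keyword, a duplicate, or not an identifier.
        return None


def iter_rows(fields, rows, name="Pandas"):
    """Yield namedtuples when possible, plain tuples otherwise (assumed fallback)."""
    row_type = make_row_type(fields, name)
    for row in rows:
        yield row_type(*row) if row_type is not None else tuple(row)


for row in iter_rows(["Index", "def", "col1"], [("a", 1, 2)]):
    print(row)  # Pandas(Index='a', _1=1, col1=2)
```

On any interpreter that supports rename, the first branch always handles bad field names, so the second and third paths only matter on Python 2.6.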

jreback modified the milestones: 0.17.1, Next Major Release on Oct 20, 2015
jreback added a commit that referenced this issue Oct 28, 2015
ENH: itertuples() returns namedtuples (closes #11269)