ENH: make itertuples() return namedtuples #11269

Closed
mjoud opened this Issue Oct 9, 2015 · 7 comments

mjoud commented Oct 9, 2015

I propose that itertuples() should return collections.namedtuple objects, a drop-in replacement for the standard tuple but with the benefit of having named fields. I have tested the following with Python 3.4 (only slight changes compared to the current implementation).

import collections

def itertuples(self, index=True):
    arrays = []
    if index:
        arrays.append(self.index)
        fields = ["Index"] + list(self.columns)
    else:
        fields = list(self.columns)

    # rename=True replaces invalid or duplicate field names with positional ones
    itertuple = collections.namedtuple("Itertuple", fields, rename=True)

    # use integer indexing because of possible duplicate column names
    arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))

    return (itertuple(*row) for row in zip(*arrays))

Example

In [3]: df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])

In [4]: for row in df.itertuples():
   ...:     print(row)
   ...:     
Itertuple(Index='a', col1=1, col2=0.10000000000000001)
Itertuple(Index='b', col1=2, col2=0.20000000000000001)

In [5]: row.Index, row.col1, row.col2
Out[5]: ('b', 2, 0.20000000000000001)

There is no performance overhead. I'm not sure about the compatibility for older versions of Python, though. The rename parameter is needed for renaming disallowed field names and duplicate identifiers to standard position-based identifiers, and this feature was added in Python 2.7/3.1.
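
To make the "drop-in replacement" point concrete, here is a minimal sketch that uses only the standard library. It builds the same row type by hand rather than calling pandas, so the values are illustrative and not output from the proposed method.

```python
import collections

# Same field layout as the example above: the index plus two columns.
Itertuple = collections.namedtuple(
    "Itertuple", ["Index", "col1", "col2"], rename=True
)
row = Itertuple("a", 1, 0.1)

# Existing tuple-style code keeps working.
index, col1, col2 = row            # positional unpacking
assert row[1] == 1                 # integer indexing
assert row == ("a", 1, 0.1)        # compares equal to a plain tuple
assert isinstance(row, tuple)

# New: fields are also reachable by name.
assert row.Index == index
assert row.col1 == col1
```

Code that already unpacks or indexes the rows from itertuples() should therefore be unaffected, while new code can use attribute access.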

Contributor

jreback commented Oct 9, 2015

Sure, this would make sense. I would probably name the tuple Pandas or some such.

pull-requests welcome.

I believe this should work on all versions (inc 2.6) as namedtuple was back-ported.

jreback added this to the Next Major Release milestone Oct 9, 2015

mjoud commented Oct 9, 2015

Just checked Python 2.6.6, and there is no support for the rename parameter, so this would require some compatibility handling.

Contributor

jreback commented Oct 9, 2015

oh, why do you need rename?

mjoud commented Oct 9, 2015

rename=False will raise a ValueError for invalid field names, while rename=True will rename them to positional names of the form _N, where N is the field's index:

In [7]: itertuple = namedtuple("Pandas", ["def", "return"], rename=True)

In [8]: itertuple(1,2)
Out[8]: Pandas(_0=1, _1=2)

In [9]: itertuple = namedtuple("Pandas", ["def", "return"], rename=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-098cde3289d7> in <module>()
----> 1 itertuple = namedtuple("Pandas", ["def", "return"], rename=False)

/usr/lib/python3.5/collections/__init__.py in namedtuple(typename, field_names, verbose, rename)
    394         if _iskeyword(name):
    395             raise ValueError('Type names and field names cannot be a '
--> 396                              'keyword: %r' % name)
    397     seen = set()
    398     for name in field_names:

ValueError: Type names and field names cannot be a keyword: 'def'
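
Keywords are not the only names that get replaced. The sketch below, using only the standard library, shows the other cases that matter for DataFrame columns: duplicates, names that are not valid identifiers, and names starting with an underscore. The column list is made up for illustration.

```python
import collections

# Hypothetical column labels covering each renaming case.
columns = ["col1", "col1", "class", "col 2", "_private", "ok"]

Row = collections.namedtuple("Pandas", columns, rename=True)
print(Row._fields)
# ('col1', '_1', '_2', '_3', '_4', 'ok')

# Without rename, the first offending name raises ValueError.
try:
    collections.namedtuple("Pandas", columns)
    raised = False
except ValueError:
    raised = True
```

Each replaced field takes its position in the list as its name, so values stay reachable by index even when the original label is lost.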

Contributor

jreback commented Oct 9, 2015

ahh I see, ok, you can just do this conditionally and raise in py2.6 if it's not supported

mjoud commented Oct 14, 2015

This was also discussed in #7958.

I'm not sure how to handle the rename bit; if I just set rename=True, this will make itertuples() unusable for anyone still using Python 2.6, and rename=False will raise an error for any disallowed field name. Should we wait until 2.6 support is dropped?

Contributor

jreback commented Oct 14, 2015

@mjoud as I said above, you can pass the rename argument in a try/except. If it doesn't work (i.e. in py2.6), then try without rename; if that blows up, then you give up.
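
The try/except approach described above could look roughly like the sketch below. The helper name `make_row_type`, the `factory` parameter, and `legacy_factory` are hypothetical and exist only so the fallback path can be exercised on a modern interpreter. Returning `None` is one reading of "give up" (the caller would then yield plain tuples); raising an error would be another.

```python
import collections


def make_row_type(fields, factory=collections.namedtuple):
    """Build a row type, preferring rename=True when it is supported.

    Hypothetical helper, not pandas code. `factory` is injectable only
    for demonstration purposes.
    """
    fields = list(fields)
    try:
        # Python 2.7/3.1+: invalid or duplicate names become _0, _1, ...
        return factory("Pandas", fields, rename=True)
    except TypeError:
        # Older signature without a `rename` keyword (py2.6 in the thread).
        pass
    try:
        return factory("Pandas", fields)
    except ValueError:
        # Invalid names and no way to rename them: give up.
        return None


def legacy_factory(typename, field_names, **kwargs):
    """Stand-in for a namedtuple that lacks `rename` (assumption)."""
    if "rename" in kwargs:
        raise TypeError("unexpected keyword argument 'rename'")
    return collections.namedtuple(typename, field_names)


modern = make_row_type(["def", "col1"])
legacy_ok = make_row_type(["a", "b"], factory=legacy_factory)
legacy_bad = make_row_type(["def", "col1"], factory=legacy_factory)
```

With the default factory the keyword column is renamed; with the legacy stand-in, valid names still produce a namedtuple and invalid ones fall through to `None`.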

jreback modified the milestones: 0.17.1, Next Major Release Oct 20, 2015

jreback added a commit that referenced this issue Oct 28, 2015

Merge pull request #11325 from mjoud/namedtuples
ENH: itertuples() returns namedtuples (closes #11269)