iterrows: when upcasting to object, values are converted to python types #13468

jorisvandenbossche · 2016-06-16T22:46:45Z

I know iterrows is not the most recommended function, but I noticed a strange behaviour (triggered by a problem of a geopandas user: geopandas/geopandas#348). When using iterrows on a df with mixed dtypes (so the resulting series is of object dtype), the numeric values are converted to python types, while with loc/iloc the numpy types are preserved:

In [254]: df = pd.DataFrame({'int':[0,1], 'float':[0.1,0.2], 'str':['a','b']})

In [255]: df
Out[255]:
   float  int str
0    0.1    0   a
1    0.2    1   b

In [256]: row1 = df.iloc[0]

In [257]: i, row2 = next(df.iterrows())

In [258]: row3= next(df.itertuples())

In [260]: type(row1['float'])
Out[260]: numpy.float64

In [261]: type(row2['float'])
Out[261]: float

In [269]: type(row3.float)
Out[269]: numpy.float64

Is this intentional? (it's a consequence of using self.values in the implementation, and numpy does this conversion to python types in an object array) And if so, is this worth documenting?
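
The numpy conversion mentioned here can be reproduced without pandas. The following is a minimal sketch, assuming only that a mixed-dtype frame ends up as a single object array; it illustrates the boxing and is not the actual pandas implementation. The variable names are made up for illustration.

```python
import numpy as np

floats = np.array([0.1, 0.2])        # homogeneous float64 array
as_object = floats.astype(object)    # casting to object boxes every element

# Indexing the float64 array yields a numpy scalar ...
print(type(floats[0]))       # <class 'numpy.float64'>
# ... while the object array holds plain Python floats.
print(type(as_object[0]))    # <class 'float'>
```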

(Note: it was actually the numpy types in an object-dtyped series that caused the issue for the geopandas user, because fiona couldn't handle numpy scalars in an object-dtyped column, but pandas is not to blame for that.)


jreback · 2016-06-16T23:03:38Z

see discussion #13236

should be the same (eg Python types)

jreback · 2017-09-12T12:54:23Z

so after #17491
[269] is also float.

I think we could actually/should fix [260], but that's another item.

jorisvandenbossche · 2017-09-12T15:43:47Z

Yes, I think this can actually be closed now, apart from a doc update to iterrows / itertuples to make it clear that it boxes to python / custom pandas types.

jreback · 2017-09-12T16:05:08Z

i think ok to keep open for now

I want to fix scalar access as well (will repurpose this issue for that)

mitar · 2018-04-23T16:08:27Z

To fix [260], you can call item() on the values from the underlying numpy arrays, the same as I am doing in #20796. This seems to do for a single value what tolist() does for a whole array. So you could call item() for each cell in a row when constructing the result for df.iloc[0].
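
As a small illustration of that suggestion, here is a sketch using plain numpy only. It shows what item() and tolist() return, not how pandas would wire this into iloc; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int64)

scalar = arr[0]          # numpy scalar taken from the array
boxed = scalar.item()    # item() converts it to the matching Python type

# tolist() applies the same conversion to every element at once.
as_list = arr.tolist()

print(type(scalar), type(boxed), type(as_list[0]))
```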

mitar · 2018-04-23T16:18:47Z

There is something strange going on here. Taking an example from documentation:

>>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
>>> row = next(df.iterrows())[1]
>>> row
int      1.0
float    1.5
Name: 0, dtype: float64
>>> print(row['int'].dtype)
float64

But:

>>> df = pd.DataFrame([[1, 1.5, 'a']], columns=['int', 'float', 'str'])
>>> row = next(df.iterrows())[1]
>>> row
int        1
float    1.5
str        a
Name: 0, dtype: object
>>> print(row['int'].dtype)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'dtype'

So it seems conversion to Python types happens only if there is some object dtype present. Otherwise we get (and keep) numpy types, only upcast to a common dtype.
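
The two cases can be compared side by side by looking only at the dtype of the row that iterrows yields. This is a sketch built from the frames in the examples above; the variable names are chosen for illustration.

```python
import pandas as pd

numeric = pd.DataFrame([[1, 1.5]], columns=["int", "float"])
mixed = pd.DataFrame([[1, 1.5, "a"]], columns=["int", "float", "str"])

# iterrows yields (index, Series) pairs; keep only the Series.
numeric_row = next(numeric.iterrows())[1]
mixed_row = next(mixed.iterrows())[1]

print(numeric_row.dtype)   # float64: int and float share a numeric dtype
print(mixed_row.dtype)     # object: the string column forces object dtype
```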

bscheetz · 2018-05-08T17:21:28Z

@jreback In the first example posted by @mitar, python type int should be returned because we're iterating, correct?

It also sounds like we want to fix the type returned by iloc - should return python type int but instead returns numpy.int64

jorisvandenbossche · 2018-05-11T09:20:07Z

> @jreback In the first example posted by @mitar, python type int should be returned because we're iterating, correct?

I don't think so, as in that example there are only numeric dtypes, so it makes sense to keep the row / Series as float dtype.
And if we do that, this boils down to the fact that accessing a single element from a numerical Series gives a numpy scalar type (type(pd.Series([1.0, 2.0])[0]) == np.float64)

I agree it is a bit confusing that it depends on whether there is a string column or not. But I think the dtype of the resulting Series of float vs object makes sense.

TomAugspurger · 2019-12-30T14:19:20Z

@jorisvandenbossche is the only remaining issue here documenting the behavior?

MarioProjects · 2021-02-15T17:46:01Z


I ran into the same problem when printing a dataframe: an int column holding [1992, 1993, 1994] is printed as [1992.0, 1993.0, 1994.0]. I tried

wm["Year"] = wm["Year"].astype(int)
wm.astype(int)

but neither had any effect.
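
The comment does not say how the values were printed, so the following is only a sketch under the assumption that the rows were obtained through iterrows. In that case the upcast happens per row, and casting the column beforehand does not help; itertuples keeps the integer. The frame below is hypothetical, and its "Value" column is made up for illustration.

```python
import pandas as pd

# Hypothetical data: an integer "Year" column next to a float column.
wm = pd.DataFrame({"Year": [1992, 1993, 1994], "Value": [0.5, 1.5, 2.5]})

# iterrows builds one Series per row, so int and float are upcast to float.
_, first_row = next(wm.iterrows())
# itertuples builds a namedtuple per row and keeps each column's own type.
first_tuple = next(wm.itertuples(index=False))

print(first_row["Year"])    # 1992.0
print(first_tuple.Year)     # 1992
```

Note also that astype returns a new object, so a bare `wm.astype(int)` without assignment leaves `wm` unchanged.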

jorisvandenbossche mentioned this issue Jun 17, 2016

GeoDataFrame.to_file() raises ValueError: Invalid field type <class 'numpy.int64'> geopandas/geopandas#348

Closed

fmaussion mentioned this issue Jun 17, 2016

Writing np.int64 or np.int32 scalars raises an error Toblerity/Fiona#365

Open

jreback added Reshaping, Dtype Conversions, Difficulty Intermediate labels Jun 17, 2016

jreback added this to the Next Major Release milestone Jun 17, 2016

jreback modified the milestones: 0.21.0, Next Major Release Sep 12, 2017

jorisvandenbossche mentioned this issue Sep 15, 2017

COMPAT: Iteration should always yield a python scalar #17491

Merged

jreback modified the milestones: 0.21.0, 1.0 Oct 2, 2017

jreback mentioned this issue Apr 23, 2018

Surprising type conversion when iterating #20791

Open

jreback modified the milestones: 1.0, 0.24.0 Apr 23, 2018

jreback added the Indexing label Apr 23, 2018

chris-b1 mentioned this issue Sep 6, 2018

DataFrame.to_dict(orient='records') numeric inconsistency #22620

Closed

jreback modified the milestones: 0.24.0, 0.25.0 Oct 23, 2018

jorisvandenbossche mentioned this issue Nov 17, 2018

DataFrame.to_dict returning numpy scalars in certain cases #23753

Closed

jorisvandenbossche modified the milestones: 0.25.0, 1.0 Jun 30, 2019

jbrockmendel removed Effort Medium labels Oct 21, 2019

TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019

arw2019 added the Docs label Nov 5, 2020

mroeschke added Bug and removed Docs, Indexing, Reshaping labels May 1, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
