API/REGR: construction of Series with scalar-like / len-1 lists #20391

Closed
jorisvandenbossche opened this Issue Mar 17, 2018 · 6 comments

@jorisvandenbossche
Member

commented Mar 17, 2018

In geopandas, some tests started failing with pandas master:

In [8]: from geopandas import GeoSeries
   ...: from shapely.geometry import Point

In [9]: p = Point(1, 2)

In [10]: GeoSeries(p, index=['a', 'b', 'c', 'd'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-9a0cacdb2179> in <module>()
----> 1 GeoSeries(p, index=['a', 'b', 'c', 'd'])

/home/joris/scipy/geopandas/geopandas/geoseries.py in __new__(cls, data, index, crs, **kwargs)
     96                 name = kwargs.get('name', None)
     97             else:
---> 98                 s = pd.Series(data, index=index, **kwargs)
     99                 # prevent trying to convert non-geometry objects
    100                 if s.dtype != object and not s.empty:

/home/joris/scipy/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    253                             'Length of passed values is {val}, '
    254                             'index implies {ind}'
--> 255                             .format(val=len(data), ind=len(index)))
    256                 except TypeError:
    257                     pass

ValueError: Length of passed values is 1, index implies 4

Previously this replicated the single point multiple times, just as pd.Series(1, index=['a', 'b', 'c', 'd']) gives a Series with four 1's.

This is related to #19714, which removed the broadcasting of 1-length lists in the Series constructor (pd.Series([1], index=['a', 'b', 'c', 'd'])).
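To make the difference concrete, here is a minimal sketch contrasting the two calls. It assumes a pandas version that already includes the change from #19714; the variable names are only illustrative.

```python
import pandas as pd

index = ["a", "b", "c", "d"]

# A true scalar is broadcast to every label in the index.
broadcast = pd.Series(1, index=index)
print(broadcast.tolist())

# A length-1 list is treated as data of length 1, so it no longer
# matches a 4-element index and the constructor raises.
try:
    pd.Series([1], index=index)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```

The exact wording of the error message differs between pandas versions, so the sketch only checks that a ValueError is raised.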

The reason geopandas converted the geometry to a single-element list is that geometries are convertible to arrays (and some are also iterable), and hence are not seen as a 'scalar' by pandas (added 4 years ago: geopandas/geopandas#70).
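As a rough illustration of why such objects do not count as scalars, the sketch below uses the public helpers pandas.api.types.is_scalar and is_list_like on a hypothetical FakePoint class that is iterable over its coordinates. FakePoint is a stand-in, not a shapely geometry, and these helpers are not necessarily the exact checks the Series constructor performs internally.

```python
from pandas.api.types import is_list_like, is_scalar


class FakePoint:
    """Hypothetical stand-in for a geometry that iterates over its coordinates."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __iter__(self):
        return iter((self.x, self.y))


p = FakePoint(1, 2)

# A plain Python integer is recognised as a scalar.
int_is_scalar = is_scalar(1)

# An arbitrary iterable object is not a scalar and looks list-like instead.
point_is_scalar = is_scalar(p)
point_is_list_like = is_list_like(p)

print(int_is_scalar, point_is_scalar, point_is_list_like)
```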

It still works when you do not pass an index:

In [36]: GeoSeries(p)
Out[36]:
0    POINT (1 2)
dtype: object

Note there is also some inconsistency within pandas itself:

In [39]: pd.Series(p)
Out[39]: 
0    POINT (1 2)
dtype: object

In [40]: pd.Series(p, index=['a', 'b', 'c', 'd'])
...
ValueError: Wrong number of items passed 2, placement implies 4

(In the first case, when no index is specified, p is converted to [p] before being passed to _sanitize_array, so it works. In the second case, _sanitize_array converts the point p to np.array([1, 2]), the array of its coordinates.)

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Mar 17, 2018

@jorisvandenbossche

Member Author

commented Mar 17, 2018

So the question is: do we want to keep the special case of len-1 lists being broadcast (that is, add that behaviour back)?
But I agree this is somewhat strange behaviour.

And if not, do we have other ways to ensure pandas regards something as a scalar?

@jreback

Contributor

commented Mar 17, 2018

No, this was disallowed because it's wrong: you can broadcast a scalar but not a list (when you specify an index). You have to raise here, as you don't know what's correct.

@jorisvandenbossche

Member Author

commented Mar 17, 2018

Hmm, yeah, in geopandas I can of course easily create a list with the correct length if the index is specified, instead of always creating a list of 1 element.
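A minimal sketch of that workaround, assuming a hypothetical helper named as_object_list (this is not the actual geopandas code): repeat the single object once per index label, or once when no index is given.

```python
def as_object_list(value, index=None):
    """Hypothetical helper: wrap one scalar-like object in a list whose
    length matches the index (or length 1 when no index is given)."""
    n = 1 if index is None else len(index)
    return [value] * n


# Any object works as a stand-in for a geometry here.
p = object()
index = ["a", "b", "c", "d"]

data = as_object_list(p, index=index)
print(len(data))
```

The resulting list has the same length as the index, so it could then be passed on as pd.Series(data, index=index) without relying on broadcasting. Every element refers to the same object, which matches what broadcasting a scalar would do.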

@jorisvandenbossche

Member Author

commented Mar 17, 2018

Given we apparently relied on this behaviour on purpose for geopandas, I added a notice of this change to the API changes: #20392

@jreback

Contributor

commented Mar 17, 2018

@jorisvandenbossche

Member Author

commented Mar 17, 2018

That's another issue (specific to Categorical, and in the bug fixes section). At least, that was the actual bug being solved; the thing I raised here was a side effect of the fix.
