Value error using sjoin with pandas v0.23 #731

uvchik · 2018-05-24T17:27:46Z

I use the sjoin function to add the region name (polygons) to every point within the region. Some points are not in any region, so I filter these points out and buffer them step by step. The layer of points without an intersection therefore becomes smaller and smaller. If only one row is left, I get the following error with pandas v0.23, which I did not get before (pandas < v0.23). I am using geopandas v0.3.0.

My call:

new = gpd.sjoin(rest_points, polygons, how='left', op='intersects')

Error message:

ValueError: You are trying to merge on object and int64 columns.
If you wish to proceed you should use pd.concat

class: GeoDataFrame
method: merge(self, *args, **kwargs)
line: result = DataFrame.merge(self, *args, **kwargs)

I do not understand the error and why it happens only with the last point (last row) and only with the newest pandas version. I had a look at "What's New" but could not find anything.

Full message:

  File "virtualenv/lib/python3.5/site-packages/geopandas/tools/sjoin.py", line 140, in sjoin
    suffixes=('_%s' % lsuffix, '_%s' % rsuffix))
  File "virtualenv/lib/python3.5/site-packages/geopandas/geodataframe.py", line 418, in merge
    result = DataFrame.merge(self, *args, **kwargs)
  File "virtualenv/lib/python3.5/site-packages/pandas/core/frame.py", line 6379, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "virtualenv/lib/python3.5/site-packages/pandas/core/reshape/merge.py", line 60, in merge
    validate=validate)
  File "virtualenv/lib/python3.5/site-packages/pandas/core/reshape/merge.py", line 554, in __init__
    self._maybe_coerce_merge_keys()
  File "virtualenv/lib/python3.5/site-packages/pandas/core/reshape/merge.py", line 980, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns.
If you wish to proceed you should use pd.concat

jorisvandenbossche · 2018-05-24T20:21:50Z

@uvchik Thanks for the report!

This is related to a change in pandas to prevent merging on incompatible columns: pandas-dev/pandas#18352 (but the detection of those cases seems a bit too eager).

Would it be possible to make a small reproducible example for this case? (when you only have a single row). Does the row actually fall within any of the polygons?

jorisvandenbossche · 2018-05-24T20:39:06Z

OK, it seems that it gives this error if there is no matching row. Eg:

In [67]: from shapely.geometry import Point, Polygon

In [68]: import geopandas

In [74]: polygons = geopandas.GeoDataFrame({'col2': [1, 2], 'geometry': [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])]})

In [75]: rest_points = geopandas.GeoDataFrame({'col1': [1], 'geometry': [Point(0.5, 0.5)]})

In [76]: geopandas.sjoin(rest_points, polygons, how='left', op='intersects')
Out[76]: 
   col1         geometry  index_right  col2
0     1  POINT (0.5 0.5)            0     1

In [77]: rest_points = geopandas.GeoDataFrame({'col1': [1], 'geometry': [Point(-0.5, 0.5)]})

In [78]: geopandas.sjoin(rest_points, polygons, how='left', op='intersects')
...
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

The underlying reason is that the "key column" that gets created under the hood is of object dtype if it is empty. So we would need to ensure it is of float dtype.
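
The dtype difference can be seen with plain pandas. This is a minimal sketch, not the geopandas code itself; it only borrows the `_key_left`/`_key_right` column names for illustration:

```python
import pandas as pd

# An empty frame built from column names alone gets object-dtype columns.
empty_default = pd.DataFrame(columns=['_key_left', '_key_right'])

# Passing dtype=float keeps the frame empty but makes the columns numeric.
empty_float = pd.DataFrame(columns=['_key_left', '_key_right'], dtype=float)

print(empty_default.dtypes)
print(empty_float.dtypes)
```

A numeric key column can be merged against an integer index, whereas an object column is what triggers the check quoted in the error message.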

vedal · 2018-06-24T20:25:24Z

In case someone else encounters this error: I got the same error when my longitude/latitude coordinates were in the wrong order.

point_df = geopandas.GeoDataFrame({'geometry': [Point(69.905930,17.169982)]})
poly_df = geopandas.GeoDataFrame({'geometry': shapes})
pointInPolys = sjoin(point_df, poly_df, how='right', op='intersects')

This only worked after changing the order of the Point coordinates to Point(17.169982,69.905930). Here, shapes is a list of Polygon objects. A correct output for Point(69.905930,17.169982) would have been "out of range" or something similar, since it ends up far from all the Polygons in shapes.

uvchik · 2018-06-25T11:12:36Z

This confirms the bug because as @jorisvandenbossche already pointed out, the error occurs if there is no matching row. It does not matter for what reason no match can be found (thank you @jorisvandenbossche for pointing that out).

Both points are within the range. One is near Norway, the other is near India. You could expect geopandas to raise an out-of-range-error if one coordinate is greater than 90. In that case a wrong order could be automatically detected, otherwise it is not possible.

You could write your own test against your own bounding box, but this is not the topic of this issue.
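
For readers who do want such a check, here is a rough sketch in plain Python. The helper names and the bounding box values are hypothetical, chosen only to illustrate the point; they are not part of geopandas:

```python
def looks_like_lon_lat(x, y):
    """Return True if (x, y) is a plausible (longitude, latitude) pair in degrees."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


def inside_bbox(x, y, bbox):
    """Return True if (x, y) lies inside bbox = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    return minx <= x <= maxx and miny <= y <= maxy


# Hypothetical, very rough bounding box around Norway (assumption for this example).
norway_bbox = (4.0, 57.0, 32.0, 72.0)

correct = (17.169982, 69.905930)   # x = longitude, y = latitude
swapped = (69.905930, 17.169982)   # coordinates in the wrong order

# Both orders pass the generic range check, so the swap goes unnoticed there.
print(looks_like_lon_lat(*correct), looks_like_lon_lat(*swapped))

# Only a data-specific bounding box separates the two.
print(inside_bbox(*correct, norway_bbox), inside_bbox(*swapped, norway_bbox))
```

The generic range check only catches a swap when one value exceeds 90 in absolute terms, which is why a bounding box for your own data is the more useful test.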

bnaul · 2018-07-02T22:02:34Z

Does anyone have a proposed solution for this? I would say geopandas is basically incompatible with the latest pandas since this bug affects a common core use case.

One approach I can see is to just add a temporary column w/ the right dtype enforced to use for the join. Kinda gross but would get the job done:

result = result.set_index('_key_left')
joined = left_df.merge(result, left_index=True, right_index=True, how='left')
# temporary key column, cast to the dtype of the join key
right_df['_key'] = right_df.index.values.astype(joined._key_right.dtype)
joined = joined.merge(
    right_df.drop(right_df.geometry.name, axis=1),
    how='left', left_on='_key_right', right_on='_key',
    suffixes=('_%s' % lsuffix, '_%s' % rsuffix)
)
right_df.drop('_key', axis=1, inplace=True)
joined = joined.set_index(index_left).drop(['_key_right'], axis=1)

jorisvandenbossche · 2018-07-02T23:22:20Z

I think a fix would be:

--- a/geopandas/tools/sjoin.py
+++ b/geopandas/tools/sjoin.py
@@ -114,7 +114,7 @@ def sjoin(left_df, right_df, how='inner', op='intersects',
 
     else:
         # when output from the join has no overlapping geometries
-        result = pd.DataFrame(columns=['_key_left', '_key_right'])
+        result = pd.DataFrame(columns=['_key_left', '_key_right'], dtype=float)
 
     if op == "within":
         # within implemented as the inverse of contains; swap names

Can you check if that solves the issue for you?

bnaul · 2018-07-03T00:07:58Z

I think that does it! dtype=object does not since I guess it gets coerced back into an int at some subsequent step. But I can confirm the example you posted above works now w/ this patch and pandas 0.23.

I thought this might cause one of the columns to stay a float but it seems like in the case where everything can remain an int the behavior is the same. Any other downsides you can think of?
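
The no-match path can be imitated with plain pandas to see what the float key does in a left join. This is only a sketch under assumptions: `left_df` and `right_df` are small stand-ins without geometry, and the two merges mirror the snippet quoted earlier in this thread rather than the exact geopandas source:

```python
import pandas as pd

left_df = pd.DataFrame({'col1': [1]})        # stands in for the points
right_df = pd.DataFrame({'col2': [1, 2]})    # stands in for the polygons

# Empty "no overlapping geometries" result with float key columns.
result = pd.DataFrame(columns=['_key_left', '_key_right'], dtype=float)
result = result.set_index('_key_left')

# The first merge attaches the (missing) right-hand key to every left row.
joined = left_df.merge(result, left_index=True, right_index=True, how='left')

# The second merge looks the key up in the right frame; NaN matches nothing.
joined = joined.merge(right_df, how='left',
                      left_on='_key_right', right_index=True)

print(joined)
```

The left row survives with missing values for the right-hand columns. According to the comment in the diff, the float frame is only created when there are no overlapping geometries, so a join that does find matches does not go through this branch.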

uvchik · 2018-07-16T07:30:03Z

Thank you for fixing this bug 😄 🎉

grant-smittkamp · 2018-07-17T17:42:49Z

The new pandas update is very bad. This merge issue in version 0.23.0 sucks. It is breaking something to do with merging strings, floats and objects.

jorisvandenbossche · 2018-07-23T10:03:18Z

@grant-smittkamp This should be fixed with the new geopandas 0.4 release

uvchik · 2018-08-20T15:53:16Z

Is this fix backward compatible? Should it work with pandas v0.22 and geopandas v0.4.0? We have some problems with this combination but I am not sure if it is caused by this fix or not.

Malouke · 2018-09-06T19:17:48Z

I am sorry, but version 0.23 sucks.
I love to use pandas, but sorry guys, sometimes you decide to replace great features with bad ones.

Popebl · 2018-10-11T06:43:21Z

pd.merge is a great feature for processing data. I developed a tool on 0.22 using many pd.merge calls, but it does not work on 0.23. I love to use pandas, but sorry guys, sometimes you decide to replace great features with bad ones.

jorisvandenbossche added this to the 0.4 milestone May 24, 2018

jorisvandenbossche added the bug label May 24, 2018

ljwolf added a commit to ljwolf/geopandas that referenced this issue Jul 7, 2018

add patch for sjoin in geopandas#731 and test

b13ab98

ljwolf mentioned this issue Jul 7, 2018

BUG: fix empty sjoin #762

Merged

ljwolf added a commit to ljwolf/geopandas that referenced this issue Jul 7, 2018

add patch for sjoin in geopandas#731 and test

32d9254

jorisvandenbossche closed this as completed in #762 Jul 14, 2018
