Value error using sjoin with pandas v0.23 #731
Comments
@uvchik Thanks for the report! This is related to a change in pandas to prevent merging on incompatible columns: pandas-dev/pandas#18352 (but the detection of those cases seems a bit too eager). Would it be possible to make a small reproducible example for this case (when you only have a single row)? Does the row actually fall within any of the polygons? |
OK, it seems that it gives this error if there is no matching row. E.g.:
In [67]: from shapely.geometry import Point, Polygon
In [68]: import geopandas
In [74]: polygons = geopandas.GeoDataFrame({'col2': [1, 2], 'geometry': [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])]})
In [75]: rest_points = geopandas.GeoDataFrame({'col1': [1], 'geometry': [Point(0.5, 0.5)]})
In [76]: geopandas.sjoin(rest_points, polygons, how='left', op='intersects')
Out[76]:
   col1         geometry  index_right  col2
0     1  POINT (0.5 0.5)            0     1
In [77]: rest_points = geopandas.GeoDataFrame({'col1': [1], 'geometry': [Point(-0.5, 0.5)]})
In [78]: geopandas.sjoin(rest_points, polygons, how='left', op='intersects')
...
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
The underlying reason is that the "key column" that gets created under the hood is of object dtype if it is empty. So we would need to ensure it is of float dtype. |
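To illustrate the dtype point above, here is a minimal sketch using only pandas. The column names are taken from the sjoin source quoted later in this thread; the variable names are just for illustration.

```python
import pandas as pd

# An empty DataFrame built from column names alone gets object-dtype columns.
empty_default = pd.DataFrame(columns=["_key_left", "_key_right"])

# Passing dtype=float forces numeric (float64) columns even when empty.
empty_float = pd.DataFrame(columns=["_key_left", "_key_right"], dtype=float)

print(empty_default.dtypes)
print(empty_float.dtypes)
```

The object dtype of the first frame is what ends up being compared against the int64 keys of the other frame.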
In case someone else encounters this error: I got the same error when my longitude/latitude coordinates were in the wrong order.
point_df = geopandas.GeoDataFrame({'geometry': [Point(69.905930, 17.169982)]})
poly_df = geopandas.GeoDataFrame({'geometry': shapes})
pointInPolys = sjoin(point_df, poly_df, how='right', op='intersects')
This only worked after changing the order of the Point coordinates to |
This confirms the bug because as @jorisvandenbossche already pointed out, the error occurs if there is no matching row. It does not matter for what reason no match can be found (thank you @jorisvandenbossche for pointing that out). Both points are within the range. One is near Norway, the other is near India. You could expect geopandas to raise an out-of-range-error if one coordinate is greater than 90. In that case a wrong order could be automatically detected, otherwise it is not possible. You could write your own test against your own bounding box, but this is not the topic of this issue. |
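For anyone who wants such a bounding-box test, a rough sketch in plain Python could look like the following. The helper names and the bounding box are hypothetical: the box only roughly covers Norway and is chosen purely for illustration, so substitute the extent of your own data.

```python
def in_bbox(x, y, bbox):
    """Return True if (x, y) lies inside bbox = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


def looks_swapped(x, y, bbox):
    """Return True if (x, y) is outside bbox but the swapped pair (y, x) is inside."""
    return not in_bbox(x, y, bbox) and in_bbox(y, x, bbox)


# Hypothetical study area as (min_lon, min_lat, max_lon, max_lat); an assumption.
STUDY_AREA = (4.0, 57.0, 32.0, 72.0)

# Coordinates as given in the comment above: x=69.905930, y=17.169982
print(looks_swapped(69.905930, 17.169982, STUDY_AREA))
# Same coordinates with x and y exchanged
print(looks_swapped(17.169982, 69.905930, STUDY_AREA))
```

This only flags points that fall outside the expected area while their swapped counterpart falls inside it; it cannot detect a swap when both orders land inside the box.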
Does anyone have a proposed solution for this? I would say geopandas is basically incompatible with the latest pandas since this bug affects a common core use case. One approach I can see is to just add a temporary column w/ the right dtype enforced to use for the join. Kinda gross but would get the job done:
|
I think a fix would be:
--- a/geopandas/tools/sjoin.py
+++ b/geopandas/tools/sjoin.py
@@ -114,7 +114,7 @@ def sjoin(left_df, right_df, how='inner', op='intersects',
     else:
         # when output from the join has no overlapping geometries
-        result = pd.DataFrame(columns=['_key_left', '_key_right'])
+        result = pd.DataFrame(columns=['_key_left', '_key_right'], dtype=float)
     if op == "within":
         # within implemented as the inverse of contains; swap names
Can you check if that solves the issue for you? |
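As a rough, pandas-only illustration of why the float dtype helps (this is not the actual sjoin code path; the `points` frame and its `point_id` column are made up for the example), a left merge of integer keys against an empty float-typed key frame goes through and simply yields missing values:

```python
import pandas as pd

# Stand-in for the left frame; 'point_id' is a hypothetical int64 key column.
points = pd.DataFrame({"point_id": [0], "col1": [1]})

# Empty "no matches" frame with float columns, as in the proposed fix.
no_matches = pd.DataFrame(columns=["_key_left", "_key_right"], dtype=float)

# int64 and float64 are both numeric, so pandas accepts them as merge keys.
joined = points.merge(
    no_matches, left_on="point_id", right_on="_key_left", how="left"
)
print(joined)
```

The single input row is kept, and the key columns coming from the empty frame are filled with NaN.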
I think that does it! I thought this might cause one of the columns to stay a float but it seems like in the case where everything can remain an int the behavior is the same. Any other downsides you can think of? |
Thank you for fixing this bug 😄 🎉 |
The new pandas update is very bad. This merge issue in version 0.23.0 sucks. It is breaking something to do with merging strings, floats and objects. |
@grant-smittkamp This should be fixed with the new geopandas 0.4 release |
Is this fix backward compatible? Should it work with pandas v0.22 and geopandas v0.4.0? We have some problems with this combination but I am not sure if it is caused by this fix or not. |
I am sorry, but version 0.23 sucks. |
pd.merge is a great feature for processing data. I developed a tool on 0.22 using many pd.merge calls, but it does not work on 0.23. I love using pandas, but sorry guys, sometimes you decide to replace great features with bad ones. |
I use the sjoin function to add the region name (polygons) to every point within the region. Some points are not in any region, therefore I filter these points and buffer them step by step. So the points layer without intersection becomes smaller and smaller. If there is only one row left I get the following error in pandas v0.23 which I did not get before (pandas < v0.23). Using geopandas v0.3.0.
My call:
Error message:
class: GeoDataFrame
method: merge(self, *args, **kwargs)
line: result = DataFrame.merge(self, *args, **kwargs)
I do not understand the error and why it happens only with the last point (last row) and only with the newest pandas version. I had a look at "What's New" but could not find anything.
Full message: