BUG: set_index results in invalid dask GeoDataFrame (partitions are DataFrames) #59
Comments
Interestingly, this is not the case for all the partitions:
>>> [type(partition.compute()) for partition in ddf2.partitions]
[pandas.core.frame.DataFrame,
 geopandas.geodataframe.GeoDataFrame,
 pandas.core.frame.DataFrame,
 pandas.core.frame.DataFrame]
The change seems to happen in |
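The per-partition check above can be wrapped in a small reusable helper. This is only a sketch: `partition_type_names` and `all_partitions_are` are hypothetical names, and the code assumes nothing beyond the `.partitions` and `.compute()` attributes already used in the snippet above.

```python
def partition_type_names(ddf):
    """Return the fully qualified type name of each computed partition.

    Assumes `ddf.partitions` is iterable and that each partition has a
    `.compute()` method, as in the check above.
    """
    names = []
    for partition in ddf.partitions:
        cls = type(partition.compute())
        names.append(f"{cls.__module__}.{cls.__qualname__}")
    return names


def all_partitions_are(ddf, expected_type):
    """Return True if every computed partition is an instance of expected_type."""
    return all(
        isinstance(partition.compute(), expected_type)
        for partition in ddf.partitions
    )
```

With the frame from this report, `all_partitions_are(ddf2, geopandas.GeoDataFrame)` would be expected to return False, since three of the four partitions are plain DataFrames.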
Ah, interesting. Then it might also be a bug in GeoPandas (if some operation on a GeoDataFrame results in a pandas DataFrame where it could have preserved the GeoDataFrame type) |
I believe the problem is in the fact that Dask uses
import geopandas as gpd
from partd.pandas import serialize, deserialize
df = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
type(deserialize(serialize(df)))  # pandas.core.frame.DataFrame
The reason we sometimes get GeoDataFrame is that |
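To illustrate why a serialization round trip can drop a subclass, here is a toy model in plain Python. It is not partd's actual implementation: `Frame`, `GeoFrame`, and every function below are hypothetical stand-ins, chosen only to show the general mechanism of a deserializer that always rebuilds the base class.

```python
import json


class Frame:
    """Stand-in for a base table type (plays the role of pandas.DataFrame)."""

    def __init__(self, data):
        self.data = data


class GeoFrame(Frame):
    """Stand-in for a subclass (plays the role of geopandas.GeoDataFrame)."""


# Hypothetical registry used only by the type-preserving variant below.
_TYPES = {"Frame": Frame, "GeoFrame": GeoFrame}


def serialize(frame):
    # Only the underlying data is written; the concrete class is not recorded.
    return json.dumps(frame.data).encode()


def deserialize(payload):
    # The base class is hard-coded on the way back, so subclasses are lost.
    return Frame(json.loads(payload.decode()))


def serialize_with_type(frame):
    # Variant that records the class name alongside the data.
    return json.dumps({"type": type(frame).__name__, "data": frame.data}).encode()


def deserialize_with_type(payload):
    record = json.loads(payload.decode())
    return _TYPES[record["type"]](record["data"])


gdf = GeoFrame({"geometry": ["POINT (0 0)"]})
lossy = deserialize(serialize(gdf))
preserved = deserialize_with_type(serialize_with_type(gdf))
```

In this toy model `lossy` is a plain `Frame`, while `preserved` is a `GeoFrame` again. Whether partd should record the type in a similar way is a question for the upstream issue, not something this sketch settles.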
@DahnJ indeed, good catch. I opened an issue on the dask/partd side about this: dask/partd#52. An alternative for now is to specify |
Using the dask set_index method results in an "invalid" dask_geopandas.GeoDataFrame, where the partitions are no longer GeoDataFrames but DataFrames (which then results in errors when computing spatial operations).