[BUG]: Memory error running cuspatial.point_in_polygon
#1092
Comments
Hi, can you tell me what version fails? We don't have a branch called "master". We do have "main". I don't recognize this version number.
Sorry, I mean I built using Docker:
where
The version '0+untagged.1758.gc856fa0' is the one I tested that does not have the aforementioned issue. It is built from a cuspatial fork, where I was using a different version of GDAL.
I am seeing a couple of different errors with point_in_polygon.
which gives
which gives from line 91 in join.py. In other words, this attempt returned from cpp_point_in_python() without error, but with a null result. I also tried the above example with many thousands of data points, but got the same result.
Version:
Installation
CUDA Information
Note also, I had to make a small change in cuspatial.core.spatial.join.py at lines 77:80 to get the above results:
Hi @jeb2112, your first example does not fail for me with cuSpatial 23.10 (upcoming release, but this should be the same as 23.08). It returns
I do get the error for your second example, but your example is wrong: it creates a GeoSeries of points rather than a GeoSeries with a single polygon. Here's a corrected version:
OK Mark, thanks for the quick feedback. Based on that info, I concluded my conda env must be broken in some subtle way, so I went back to try another conda installation with CUDA 11.2, this time for the entire RAPIDS base package, and that appears to have worked. I went back over what I did and found the problem. I had started with the rapidsai installation matrix to come up with a conda command for cuda=11.8, python=3.10, cudf, cuspatial. This failed with some incompatibility errors. I then added --no-channel-priority, which had been mentioned on Stack Overflow in some different context, I think, and that installed the conda env, but then my point-in-polygon call didn't work. I now understand that the --no-channel-priority option somehow overshadowed and/or overrode the conflict messages, thus permitting an incorrect install of what appears to be a broken combination of specific packages in the installation matrix.
@epifanio I can explain the original out of memory (OOM) error. This is a regression from earlier versions (as you pointed out) because we added compatibility with GeoArrow GeoSeries. The problem is that for flat point arrays, we used to be able to just take the X and Y coordinates. But a GeoSeries is a DenseUnion type, which has an array of types (one per row) and an array of offsets. So for 1B points, you have 16GiB for positions, 4GiB for offsets, and 1GiB for types. However, because cuDF does not support Arrow Fixed-size List, we have to use a regular list, which requires an additional indices buffer (which is identical to the DenseUnion offsets!). This adds a redundant 4GiB. So in all we have 25GiB of storage for 1B points. On my 32GiB V100 I can create the GeoSeries, but I OOM in the
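To make the arithmetic in the comment above concrete, here is a rough sketch of the per-buffer estimate. It is not taken from cuSpatial; the function name is hypothetical, and the element sizes are assumptions inferred from the quoted totals (float64 x/y coordinates, int32 offsets and list indices, int8 type ids). "1B points" is rounded to 2**30 so the figures come out in whole GiB.

```python
# Back-of-the-envelope memory estimate for a GeoSeries of points.
# Assumptions (inferred from the figures in the thread, not from cuSpatial):
#   - x and y are stored as float64 (8 bytes each)
#   - DenseUnion offsets and list indices are int32 (4 bytes)
#   - DenseUnion type ids are int8 (1 byte)
#   - "1B points" is rounded to 2**30 so totals land on whole GiB

GIB = 2**30


def geoseries_point_bytes(n_points: int) -> dict:
    """Return the estimated size in bytes of each buffer (hypothetical helper)."""
    parts = {
        "positions": n_points * 2 * 8,   # x and y coordinates
        "union_offsets": n_points * 4,   # DenseUnion offsets, one per row
        "union_types": n_points * 1,     # DenseUnion type ids, one per row
        "list_indices": n_points * 4,    # redundant buffer from using a regular list
    }
    parts["total"] = sum(parts.values())
    return parts


estimate = geoseries_point_bytes(2**30)
for name, size in estimate.items():
    print(f"{name:>14}: {size / GIB:.0f} GiB")
```

Under these assumptions the flat X/Y layout needs only the 16GiB positions buffer, so the union and list bookkeeping accounts for the extra 9GiB.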
Version
master
On which installation method(s) does this occur?
Source
Describe the issue
Memory error running cuspatial.point_in_polygon. The same code returns no errors when running from cuspatial v23.02.
Minimum reproducible example
Returns a memory error
Equivalent code, except the new API syntax:
Relevant log output
Environment details
Other/Misc.
No response