Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix STRtree insert/query to avoid void nullptr #261

Closed

Conversation

jorisvandenbossche
Copy link
Member

xref #233 (comment) and the discussion the geos-devel mailing list (https://lists.osgeo.org/pipermail/geos-devel/2020-November/009870.html).

This change is what I understand from that discussion should be the solution, but this is not actually working (it's giving garbage / crashing).

One thing I am wondering: I am now passing the address of an integer, but this int is a loop variable, so that is only a local variable that I suppose is cleaned up at the end of STRtree_new.
So if we want to be able to access those integers in STRtree_query we might need to store them on the STRtree object? (the python wrapper object, not the GEOS STRtree).

@dbaston
Copy link

dbaston commented Nov 29, 2020

Right, you don't want to keep a pointer to a local variable that has gone out-of-scope. You can see how it's handled in sf here: https://github.com/r-spatial/sf/blob/master/src/geos.cpp#L398

Copy link
Contributor

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

What is the advantage to using resizable vectors here? Per #129, I've been meaning to replace those with allocating an array of known size in advance.

Both the geometry array and the index array should be the same size as the incoming array; this could be managed in the tree via malloc and free instead of delegating that to the resizable vector. That should also make passing the pointer to the values into GEOS a little cleaner. I'll add a few suggestions inline that may help show this (untested); probably also needs an update on usage in query callback but I didn't touch that.

(updating the geometry vector should probably be in a different PR; will also need to add a _size member to track the size of that array in that case, since right now that is managed by the vector).

@@ -172,7 +179,7 @@ static PyObject* STRtree_new(PyTypeObject* type, PyObject* args, PyObject* kwds)
Py_INCREF(obj);
kv_push(GeometryObject*, _geoms, obj);
count++;
GEOSSTRtree_insert_r(ctx, tree, geom, (void*)i);
GEOSSTRtree_insert_r(ctx, tree, geom, &kv_A(_tree_indexes, i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with array managed in tree

Suggested change
GEOSSTRtree_insert_r(ctx, tree, geom, &kv_A(_tree_indexes, i));
GEOSSTRtree_insert_r(ctx, tree, geom, &(_tree_indexes[i]);

@@ -115,6 +116,7 @@ static PyObject* STRtree_new(PyTypeObject* type, PyObject* args, PyObject* kwds)
npy_intp n, i, count = 0;
GEOSGeometry* geom;
pg_geom_obj_vec _geoms;
npy_intp_vec _tree_indexes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
npy_intp_vec _tree_indexes;
npy_intp* _tree_indexes;

@@ -145,7 +147,11 @@ static PyObject* STRtree_new(PyTypeObject* type, PyObject* args, PyObject* kwds)
n = PyArray_SIZE((PyArrayObject*)arr);

kv_init(_geoms);
kv_init(_tree_indexes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kv_init(_tree_indexes);
_tree_indexes = (npy_intp*)malloc(n * sizeof(npy_intp));

(and drop following line)

for (i = 0; i < n; i++) {
kv_push(npy_intp, _tree_indexes, i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kv_push(npy_intp, _tree_indexes, i);
_tree_indexes[i] = i;

@@ -159,6 +165,7 @@ static PyObject* STRtree_new(PyTypeObject* type, PyObject* args, PyObject* kwds)
Py_XDECREF(kv_A(_geoms, i));
}
kv_destroy(_geoms);
kv_destroy(_tree_indexes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kv_destroy(_tree_indexes);
free(_tree_indexes);

@@ -22,6 +22,7 @@ typedef struct {
PyObject_HEAD void* ptr;
npy_intp count;
pg_geom_obj_vec _geoms;
npy_intp_vec _tree_indexes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
npy_intp_vec _tree_indexes;
npy_intp* _tree_indexes;

@@ -104,6 +104,7 @@ static void STRtree_dealloc(STRtreeObject* self) {
Py_XDECREF(kv_A(self->_geoms, i));
}
kv_destroy(self->_geoms);
kv_destroy(self->_tree_indexes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kv_destroy(self->_tree_indexes);
free(self->_tree_indexes);

@dbaston
Copy link

dbaston commented Nov 30, 2020

If the geometries you're putting in the tree are stored in fixed-size array, and you need to get the index when you query the tree (instead of getting the geometry directly), you can still store the geometry pointer in the tree and recover the index via pointer arithmetic from the head of the geometry array.

@brendan-ward
Copy link
Contributor

store the geometry pointer in the tree and recover the index via pointer arithmetic

I think this might not work, since the incoming array we need to use for indexing may have geometries that we do not add to the tree (they may include Python None values for non-geometries, which we treat as NULL here and don't add to tree); thus the pointer offsets in the tree would be for only those geometries added to the tree.

But, if there is something I'm missing, this would be a clever way of avoiding a redundant array of indexes.

@dbaston
Copy link

dbaston commented Nov 30, 2020

Not missing anything. This would only work if all the geometries in the contiguous array are added to the tree.

@jorisvandenbossche
Copy link
Member Author

What is the advantage to using resizable vectors here?

Yeah, I did it mostly out of consistency (to not use a mixture of kvec and plain arrays). And, not having to store the size separately is nice (in general, for the tree_indexes we don't actually need it, that's true). But I know here it has a fixed size, that's also the reason I resized it up front.
But I agree that using a plain array here would be nice (now, using C++ instead of C would probably even be nicer for such things .. ;)).

I think this might not work, since the incoming array we need to use for indexing may have geometries that we do not add to the tree

We indeed don't add each geometry to the tree, but looking more closely, we are actually adding something to _geoms for each element (just a NULL in case of emtpy/missing geometries). So pointer arithmetic with _geoms could actually work I think.

I am only not sure how to do this in the query_callback function, as this does not have access to the full array. It receives the address of the item to be inserted, but to calculate its "index", I also need the address of the first element of the _geom array, which the query_callback function doesn't know?

@dbaston
Copy link

dbaston commented Dec 1, 2020

I also need the address of the first element of the _geom array, which the query_callback function doesn't know?

You'd have to pass both the start of the geom array and the result vector to the callback by wrapping them in a struct. Something like this:

struct callback_userdata {
  GEOSGeometry* geom_array;
  npy_intp_vec* query_indexes;
};

void query_callback(void* item, void* user_data) {
  struct callback_userdata* ud = (struct callback_userdata*) user_data;
  size_t item_index = ((GEOSGeometry*) item) - ud->geom_array;

  kv_push(npy_intp, ud->query_indexes, item_index);
}

Not sure if it's worth the complexity.

@brendan-ward
Copy link
Contributor

New idea (testing now):

  • store _geoms as an array of GeometryObjects that we malloc / free instead of resizable vector
  • store _size on the STRtreeObject struct so that we know the size of that array at various places (some potential confusion vs count that is already there, which is the count of geometries added to the tree, whereas _size is the size of _geoms, which is the size of the input array )
  • on insert to the tree, pass the address of the GeometryObject
  • in query_callback, push the address of a GeometryObject onto a resizeable vector of GeometryObject*s
  • use pointer offsets from the head of _geoms to calculate an index when we need it (either to return indexes or to get underlying GEOS geometry for predicate tests)

@brendan-ward
Copy link
Contributor

I'm getting some results from the pointer math that I don't completely understand (never used pointer math directly before).

Given _geoms is an array of GeometryObject* and that GeometryObject* has a size of 8 (on my machine), and given these addresses:

  • head of _geoms: 0x7f8f52c3fb00
  • first geometry: 0x7f8f52c3fb00
  • second geometry: 0x7f8f52c3fb08 (offset by 8 from head of _geoms)

When I calculate the offset of the second geometry like geom - (GeometryObject*)_geoms (otherwise compiler error comparing GeometryObject* to GeometryObject**), I get back 0 instead of 1.

When I calculate what the address would be by incrementing the pointer (GeometryObject*)_geoms + 1 => 0x7f8f52c3fb20 (offset by 32 from head of _geoms).

Perhaps I'm missing a type cast somewhere?

@dbaston
Copy link

dbaston commented Dec 6, 2020

I think int index = geom - _geoms[0] would work.

@brendan-ward
Copy link
Contributor

I think I figured it out, just needed to be more explicit about calculating the offsets ourselves and not relying on the compiler to infer the correct item size when using subtraction in this case:

char* head_ptr = (char*)_geoms;
char* geom_ptr = (char*)geom;  // e.g., 2nd geom in array
npy_intp geom_index = (npy_intp)((geom_ptr - head_ptr) / sizeof(GeometryObject*));

This now gives me the correct indices back out.

I think I might have fallen afoul of the C standard behavior around subtraction only being well defined when the elements are of the same array, and by the time I was running the subtraction (after first adding a geometry to a resizeable vector in query_callback) these were likely no longer considered part of the same array.

@jorisvandenbossche
Copy link
Member Author

Sorry for the slow follow up here, and thaks @brendan-ward for picking it up! I actually had a version locally with the suggestion of @dbaston, but only the minimal change, your PR is much more comprehensive change, so closing this PR as superseded by #265

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants