Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Fix typo in PyArray_RegisterDataType error #18303

Merged
merged 1 commit into from
Feb 2, 2021

Conversation

lpsinger
Copy link
Contributor

@lpsinger lpsinger commented Feb 2, 2021

No description provided.

@seberg
Copy link
Member

seberg commented Feb 2, 2021

Thanks, to be clear, are you actually running into this error?

@seberg seberg merged commit fd07632 into numpy:master Feb 2, 2021
@seberg
Copy link
Member

seberg commented Feb 2, 2021

The reason I ask is, that I do not really expect anyone hitting this. So if you do and it isn't completely obviously just an oversight I would like to know (even if you trivially work around or so).

@lpsinger lpsinger deleted the PyArray_RegisterDataType-typo branch February 2, 2021 20:15
@lpsinger
Copy link
Contributor Author

lpsinger commented Feb 2, 2021

Yes, I am, while debugging lscsoft/lalsuite!1526. What is the correct way in C to declare a new PyArray_Descr to wrap a specific kind of Python object?

@seberg
Copy link
Member

seberg commented Feb 2, 2021

Some people have used a trick by making structured descriptor instead. I am not sure how you managed to wrap a specific python object. As far as I could tell, you should never be able to delete such an array correctly, because you cannot implement reference counting for your custom dtype's items!

I really want API to wrap a custom Python kind of python object as a dtype, but I am slightly shocked you managed to do it; and it wasn't super high on my agenda. I have to dig in and see what exactly you are doing and what is working correctly and what may not be...

So there is the hack I have seen, which I hope will continue to work is something I created a test for here: #17320

@seberg
Copy link
Member

seberg commented Feb 2, 2021

@lpsinger I tried to understand what the code does to see if there is some clear different solution, or this is a problem I need to dig into. But I have to admit, the whole SWIG logic is making it hard for me to really follow things. The dtype doesn't really seem to do a whole lot?

For example, your copyswap function seems to simply copy memory around, so do you actually need the HAS_REF flag at all?

@lpsinger
Copy link
Contributor Author

lpsinger commented Feb 2, 2021

@lpsinger I tried to understand what the code does to see if there is some clear different solution, or this is a problem I need to dig into. But I have to admit, the whole SWIG logic is making it hard for me to really follow things. The dtype doesn't really seem to do a whole lot?

For example, your copyswap function seems to simply copy memory around, so do you actually need the HAS_REF flag at all?

I don't know the answers to those questions. @kwwette would.

Meanwhile, though, I'm trying the workaround that you suggested:

      // Numpy workaround for registering user-defined object types. See
      // https://github.com/numpy/numpy/pull/18303,
      // https://github.com/numpy/numpy/pull/17320
      (*pdescr)->names = Py_BuildValue("(s)", "swigobj");
      (*pdescr)->fields = Py_BuildValue("{s:(s,i)}", "swigobj", "O", 0);

It's working so far on my own machine; now I'm waiting to see the results of our CI pipeline.

@seberg
Copy link
Member

seberg commented Feb 2, 2021

OK, to be clear: I am happy to discuss and look into this more; it just may be useful to have a brief chat to jump into whats going on.
Even if nothing much comes out, it could be an interesting case study for when the new DType API has ripened enough.

I added the error thinking that nobody can (usefully) do this. I am still not sure whether you managed or not, but I never had the intention of breaking working code. The idea was to warn new users and remove one worry of strange corner cases that might be hard or impossible to keep supporting unmodified.

@lpsinger
Copy link
Contributor Author

lpsinger commented Feb 2, 2021

@kwwette, @duncanmmacleod, would you care to take over from here and follow up with @seberg?

@kwwette
Copy link

kwwette commented Feb 3, 2021

@seberg Thanks for helping us with this.

LALSuite is a large scientific library written in C, for which we generate Python wrappings using SWIG plus some custom code. LALSuite contains a number of data structures of the following form:

typedef struct {
   double value;
   ...
} MyType;

typedef struct {
   size_t length;
   MyType **data;   // array of length 'length' containing items of type 'MyType*'
} MyTypeVector;

We want to be able to view the data member of a MyTypeVector instance from Python, i.e. given an instance v of type MyTypeVector, the user can write v.data[0].value and access the first MyType array element and its fields.

To do this we've been using NumPy arrays as a view of the data member of MyTypeVector. When the user types v.data, they get a NumPy array with an object type, where each element is viewing a MyType* pointer. To then make each element accessible from Python when the use types v.data[0], we provide a custom getitem function which accesses the data vector, and returns the requested MyType* element wrapped in a class provided by SWIG. The SWIG wrapper class then makes the MyType* fields available, e.g. so that v.data[0].value works.

This has worked well so far. We've not noticed any problems with memory leaks with e.g. reference counting. In the statement val = v.data[0].value, a wrapper class to the v.data[0] would be created on the fly, exist temporarily while its .value attribute is stored in val, and then (I presume) would be destroyed as it's no longer referenced anywhere. The underlying memory of the array is owned by the MyTypeVector type, and we have a custom memory tracking system in place that e.g. prevents v from being destroyed as long as v.data, v.data[0], etc. are referenced anywhere.

It seems so far that for Numpy 1.20 we just need to work out a more portable way of creating the NumPy array descriptor that we've done previously. We'd appreciate though whether you're aware of any other changes to NumPy that might mean we can no longer using the NumPy array views as described above.

@seberg
Copy link
Member

seberg commented Feb 3, 2021

@kwwette OK, I honestly expect you can probably just delete those flags.

The point here is that data (the MyType **data field representation in Python) is the only place where memory gets allocated. I assume that MyType is really just a "naive" type (i.e. it has a fixed size and contains no pointers). So you have to create a dtype for MyType (which you are free to do). Issues would only arise if MyType included a "reference":

typedef struct {
    double value;
    char *dynamically_allocated_storage;  /* If I copy/delete this element, what would happen? */
} MyType;

or alternatively:

typedef struct {
    double value;
    PyObject *obj;  /* object field needs reference counting */
} MyType;

which for example is a use case that users who want to implement strings run into. To implement those correctly you do need NPY_ITEM_IS_POINTER and/or NPY_ITEM_REFCOUNT (but it won't really help you...). If the MyType struct does not have these properties, these flags should simply be unnecessary, the only thing I can think of that they may achieve is that data.view(...) might be forbidden.

Just be sure, if MyType is a trivial struct of C types (or more specifically NumPy dtypes), it might be easier to use a structured dtype, though?

@seberg
Copy link
Member

seberg commented Feb 3, 2021

Oh, one thing. If you want v.data[0].value to work, you would have to use np.record. That is a bit weirder, but np.dtype((np.record, np.dtype([("value", "d")]))) for example should do the trick if used as a dtype.

@kwwette
Copy link

kwwette commented Feb 3, 2021

@seberg Thanks for the pointers. We'll try dropping the flags you mentioned and see how that goes.

I don't think we'd be able to translate the C structs into NumPy structured types, at least not easily. LALSuite is a library of thousands of functions, structs, etc. so the point of using SWIG is to automatically generate the Python interface from the library C headers, without having to manually maintain/sync separate C and Python interface descriptions. So we'd have to find some way of automating the generation of NumPy structured types. That may be possible but would require a lot more work, and might break existing code.

There are also some array element types in LALSuite (i.e. MyType) which are more than just "naive" types, i.e. they themselves point to dynamic memory. In these cases the consequence of assignment, e.g. v.data[0] = elem should be that the MyType pointer data[0] is assigned to elem (clobbering any existing value), and elem is marked as no longer "owning" the MyType pointer it wraps, which is now owned by v. That's probably not entirely Pythonic, but there's limits to what you can do wrapping a C library ... It would be useful to keep this functionality e.g. for initialising arrays, but I guess it could be dropped - and v.data made read-only - if it can no longer be supported.

@seberg
Copy link
Member

seberg commented Feb 3, 2021

OK, so the point of those flags (internally to NumPy) is dynamic memory (really we only use Python Objects, which are also a type of dynamic memory). But the problem is that the additional functionality to handle dynamic memory correctly is not accessible for user dtypes. The problem I see is this:

arr = v.data
arr_copy = arr.copy()
del v, arr  # I assume `v` may "clean up"
arr_copy[0].value  # segfault?
# and/or:
del arr_copy  # may not clean up the elements correctly

At that point arr_copy does not know anything about v and cannot keep its memory alive (arr itself probably has v as its base. It does not own its memory, so these issues do not apply.).
So if you have a MyType with dynamically allocated data you may have a problem (with or without the NumPy change).

If this is a larger headache for you, I can also undo the change. Right now, all the flag probably achieves is disabling a few things like: v.data.view(some_dtype_with_correct_shape). But there whether clean or not, it probably will just stay identically working/broken...

I would like to get to a point where user DTypes can contain references (i.e. have their own memory management if necessary), but it is on the backburner list, simply because it is something that can be achieved later on.

@kwwette
Copy link

kwwette commented Feb 3, 2021

@seberg I'm not sure I've ever tried to copy one of our NumPy array views, as in your arr_copy = arr.copy() line. I suspect you're right in that the correct memory tracking would break down here. It's not a problem we've encountered so far, however, so I guess none of our users have tried it, at least not for these struct-type arrays. (I imagine it would fine for an array of double, however.)

Is there a way to disable copy() operations for a particular NumPy type, e.g. a flag, or a "not implemented" error we can raise. I think I'd be okay with just disallowing this behaviour in instances where it doesn't make sense or may be unsafe.

@seberg
Copy link
Member

seberg commented Feb 4, 2021

I honestly do not think there is currently a way to do that. It might be possible to add one. The only way I would have thought of is keeping copyswap and/or copyswapn to be NULL. But since you implemented copyswap, I assume you need it. If copying already doesn't work (for these datatypes containing flexible data), you would be lucky, and just dropping the flags seems perfect.
One thing I noticed glancing at the code is that it looks like the function probably registers a new dtype every time (NumPy appears to have a guard against this, but only if the passed pointer remains identical, the reason is that NumPy expects you to pass in a static struct as descr. If you call it more than once, it might make sense to make the descriptor static ... = NULL; and just return it after the first call?)

Let me know if I can help you out now or in the future. DTypes will be revamped hopefully by the 1.21 release, which should give you cleaner solutions in the future (if you are still using this approach), but I guess that would be a bit of a long haul.

For now: We are planning on a 1.20.1 release probably this weekend already to fix some simple, but annoying issues. If it will help you, I don't need much prodding to be convinced to just remove this error message again! :)

@kwwette
Copy link

kwwette commented Feb 5, 2021

@seberg We now have a fix for our issue; see the diff here. Basically we didn't need the NPY_ITEM_IS_POINTER flag, as you suggested - I must have put that in originally out of paranoia - so we just needed to adapt to using PyArray_DescrNewFromType() instead of a static struct.

I don't remember where copyswap is used, it may be this only works for basic types e.g. views of double or structs without pointers.

The dtypes are static in that they're stored in a lookup table against a static list of type information generated by SWIG. So when a user wants to view an array of elements MyType*, the SWIG wrapper code provides type information for MyType*, which is then used as a key to find a matching NumPy dtype - or create one if none exists yet.

Thanks again for your help!

@seberg
Copy link
Member

seberg commented Feb 5, 2021

OK, glad it seems to work for you. I do think that NPY_ITEM_IS_POINTER might have bought you some small things for the structs with dynamic data (and its techncially more correct there). Basically, it probably would disallow a few things that cannot work in any case. By no means would it protect you from all the ways it could go wrong, though.
Unfortunately that may just be impossible right now, but maybe we can give you new API in the future!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants