-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
Description
Describe the issue:
I have been running into issues with trying to use Numpy object arrays to contruct ND-Arrays of Python objects where I want to take advantage of the ND-indexing and other array features, but the values of the array are Python objects, which themselves have an __array__
method defined, but the __array__
method is inefficient and should be avoided.
In these cases I am not able to construct an object array from a nested list via slicing without invoking the objects __array__
method.
A simplified example of where this is a problem is as follows:
Consider a skeleton implementation of a Python class for N-qubit Pauli matrices. These objects have an efficient representation (for simplicity I will just use the length N string labels), but can also be represented inefficiently as shape (2 ** N, 2 **N) complex matrices. Suppose this class defines an __array__
method to convert to these inefficient complex arrays like shown in the bellow code example.
Now suppose I want to construct an object
ND-array of Paulis. The simplest case would be
arr1 = np.empty([1], dtype=object)
arr1[0] = Pauli("X")
This does not invoke the Pauli.__array__
method and returns array([Pauli(X)], dtype=object)
.
However if I try and set via slicing
arr2 = np.empty([1], dtype=object)
arr2[:] = [Pauli("X")]
# Prints: Pauli.__array__ called
This invokes the array method, even though the returned array is an object array with the original objects as values (np.all(arr1 == arr2)
is True).
Complete code shown bellow, where you can see where this becomes a problem for example by setting nq=20
which would require ~17TB memory per Pauli for each __array__
return.
Reproduce the code example:
import numpy as np
class Pauli:
"""Basic N-qubit Pauli"""
_mats = {
"I": np.eye(2, dtype=complex),
"X": np.array([[0, 1], [1, 0]], dtype=complex),
"Y": np.array([[0, -1j], [0, 1j]], dtype=complex),
"Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
def __init__(self, label):
"""Initialize a Pauli from a string label"""
self.label = label
def __repr__(self):
return f"Pauli({self.label})"
def to_matrix(self):
"""Convert to a (2**N, 2**N) complex matrix"""
ret = np.eye(1, dtype=complex)
for i in self.label:
ret = np.kron(self._mats[i], ret)
return ret
def __array__(self, dtype=None):
# Print to know when the `__array__` method is invoked
print("Pauli.__array__ called")
mat = self.to_matrix()
if dtype:
mat = mat.astype(dtype)
return mat
def __eq__(self, other):
if isinstance(other, Pauli):
return self.label == other.label
return False
# Create an array-like nested list of Pauli objects
nq = 1
paulis = [[Pauli(nq * "I"), Pauli(nq * "X")],
[Pauli(nq * "Y"), Pauli(nq * "Z")]]
# Initialize and empty array and set values to nested list elements
arr = np.empty([2, 2], dtype=object)
arr[:] = paulis
# This print `Pauli.__array__ called` 4 times, so invokes it for each Pauli
# Still returns an object array with the `Pauli` objects as values.
Error message:
No response
Runtime information:
My main numpy runtime is
1.23.5
3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:01:00)
[Clang 13.0.1 ]
I also tried on Numpy 1.25.2 and get same results.
Context for the issue:
This is probably quite a specific edge case, but it is something I've been struggling with for awhile and have had to come up with various messy work arounds to avoid the issue.