I have several 1D arrays of varying but comparable lengths to be merged (`vstack`) into a contiguous 2D array.
I merge them into a masked array where padding entries are masked out.
I simply run `np.unique(return_inverse=True)` on the masked array.
The output is two arrays: a masked `key` array with unique entries, which optionally includes a single masked padding entry, and a plain `inverse` array whose size matches the input.
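For illustration, the setup described above might look roughly like the following sketch. The input arrays `rows` and the name `PAD` are hypothetical; only the padding value 999999 is taken from the issue text. What `np.unique` returns for the masked array may depend on the NumPy version, so the result is only printed here.

```python
import numpy as np

# Hypothetical input: three 1D arrays of comparable but different lengths.
rows = [np.array([3, 1, 2]), np.array([1, 2]), np.array([2, 2, 5, 1])]

PAD = 999999  # padding value mentioned in the issue text
width = max(len(r) for r in rows)

# Start fully padded and fully masked, then fill in and unmask the real entries.
data = np.full((len(rows), width), PAD, dtype=np.int64)
mask = np.ones((len(rows), width), dtype=bool)
for i, r in enumerate(rows):
    data[i, :len(r)] = r
    mask[i, :len(r)] = False

padded = np.ma.masked_array(data, mask=mask)

# The call under discussion; inspect the types of both outputs.
key, inverse = np.unique(padded, return_inverse=True)
print(type(key).__name__, key)
print(type(inverse).__name__, inverse.shape)
```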
I would expect the other way around: the `key` array should be a plain 1D array while `inverse` should be masked. There are two separate issues here:
1. `len(key)` should represent the number of unique entries. Right now it does not: the masking element (`999999` in the example below) may or may not be present, depending on whether the mask is empty. This makes masking pretty much useless for `np.unique`: if I pass a masked array, I clearly want to keep masked entries out of `key`. Otherwise I could equally just do `np.unique(masked_array.data)`.
2. Given that `np.unique` is a transparent operation (i.e. I can run it on both plain arrays and masked arrays), I would expect transparent output. Even without knowing anything about what `unique` does or what it is for, `inverse` should definitely be a masked array because its elements correspond one-to-one to the input.
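The expected output described in the two points above can be sketched as follows. This is not the workaround from this issue but a simpler alternative that takes `np.unique` over the unmasked entries only. The helper name `unique_compressed` and its default `fill_value=-1` are assumptions made for illustration.

```python
import numpy as np

def unique_compressed(a, fill_value=-1):
    """Hypothetical helper: plain `key`, masked `inverse` (a sketch only)."""
    mask = np.ma.getmaskarray(a)          # always a full boolean array
    valid = np.ma.getdata(a)[~mask]       # unmasked entries, row-major order
    key, inv = np.unique(valid, return_inverse=True)
    # Scatter the inverse back to the input shape; masked slots get fill_value.
    full = np.full(a.shape, fill_value, dtype=np.intp)
    full[~mask] = inv.reshape(-1)
    return key, np.ma.masked_array(full, mask=mask)

a = np.ma.masked_array([[3, 1, 999999], [1, 2, 2]],
                       mask=[[False, False, True], [False, False, False]])
key, inverse = unique_compressed(a)

# len(key) counts only real entries, and the unmasked part of `inverse`
# reconstructs the unmasked part of the input.
print(len(key))
print(np.array_equal(key[inverse.compressed()], a.compressed()))
```

With this layout the mask of `inverse` is exactly the mask of the input, so the padding value never reaches `key`.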
As a result of this inconsistency I have to (a) check whether anything has been masked at all, (b) conditionally pick out the padding entry from the `key` output, and (c) apply a mask to the `inverse` output. Something like the following.
This is what I ended up with. It is easier to implement for numeric arrays because there, as far as I remember, the masked entry is always at the end. Unfortunately, I am working with char arrays, which behave differently (probably at the level of the masked sort).
import numpy as np


def masked_unique(a, return_inverse=False, fill_value=None):
    """
    A proper implementation of `np.unique` for masked arrays.

    Parameters
    ----------
    a : np.ma.masked_array
        The array to process.
    return_inverse : bool
        If True, returns the masked inverse.
    fill_value : int
        An optional value to fill the `return_inverse` array.

    Returns
    -------
    key : np.ndarray
        Unique entries.
    inverse : np.ma.masked_array, optional
        Integer masked array with the inverse.
    """
    key = np.unique(a, return_inverse=return_inverse)
    if return_inverse:
        key, inverse = key
        barrier = np.argwhere(key.mask)
        if len(barrier) > 0:
            barrier = barrier.squeeze()  # all indices after the barrier have to be shifted (char only?)
            inverse[inverse > barrier] -= 1  # shift everything after the barrier
        if fill_value is None:
            inverse[a.mask.reshape(-1)] = len(key) - 1  # shift masked stuff to the end
        else:
            inverse[a.mask.reshape(-1)] = fill_value
        inverse = np.ma.masked_array(data=inverse, mask=a.mask)
    key = key.data[np.logical_not(key.mask)]
    if return_inverse:
        return key, inverse
    else:
        return key
Strictly speaking, I could equally run `np.unique` on the raw `a.data` in the above example to fix this. I pretty much do the whole job myself.
Reproducing code example:
Numpy/Python version information: