
Fix several issues with relabel_sequential #3740

Merged
merged 6 commits on Apr 4, 2019
Changes from 3 commits
27 changes: 19 additions & 8 deletions skimage/segmentation/_join.py
@@ -62,7 +62,7 @@ def relabel_sequential(label_field, offset=1):
Parameters
----------
label_field : numpy array of int, arbitrary shape
-An array of labels.
+An array of labels, which must be non-negative integers.
offset : int, optional
The return labels will start at `offset`, which should be
strictly positive.
@@ -72,14 +72,16 @@ def relabel_sequential(label_field, offset=1):
relabeled : numpy array of int, same shape as `label_field`
The input label field with labels mapped to
{offset, ..., number_of_labels + offset - 1}.
The data type will be the same as `label_field`, except when
offset + number_of_labels causes overflow of the current data type.
forward_map : numpy array of int, shape ``(label_field.max() + 1,)``
The map from the original label space to the returned label
space. Can be used to re-apply the same mapping. See examples
-for usage.
+for usage. The data type will be the same as `relabeled`.
inverse_map : 1D numpy array of int, of length offset + number of labels
The map from the new label space to the original space. This
can be used to reconstruct the original label field from the
-relabeled one.
+relabeled one. The data type will be the same as `relabeled`.

Notes
-----
@@ -114,20 +116,29 @@ def relabel_sequential(label_field, offset=1):
>>> relab
array([5, 5, 6, 6, 7, 9, 8])
"""
offset = int(offset)
if offset <= 0:
raise ValueError("Offset must be strictly positive.")
Member:

Nice one, yeah the error message was nonsensical before:

In [14]: relabel_sequential(np.asarray([2, 3, 4]), -1)                                                  
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-e5025e3e4264> in <module>
----> 1 relabel_sequential(np.asarray([2, 3, 4]), -1)

~/miniconda3/envs/owl/lib/python3.7/site-packages/skimage/segmentation/_join.py in relabel_sequential(label_field, offset)
    129         labels = np.concatenate(([0], labels))
    130     inverse_map = np.zeros(offset - 1 + len(labels), dtype=np.intp)
--> 131     inverse_map[(offset - 1):] = labels
    132     relabeled = forward_map[label_field]
    133     return relabeled, forward_map, inverse_map

ValueError: could not broadcast input array from shape (4) into shape (2)
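With the new guard, the failure is immediate and the message explains the problem; a sketch, assuming a scikit-image build that contains this fix:

```python
import numpy as np
from skimage.segmentation import relabel_sequential

msg = ""
try:
    # A non-positive offset is rejected up front instead of failing later
    # with a broadcasting error:
    relabel_sequential(np.asarray([2, 3, 4]), offset=-1)
except ValueError as err:
    msg = str(err)
print(msg)
```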

Member:

👍

if int(label_field.min()) < 0:
raise ValueError("Cannot relabel array that contains negative values.")
m = label_field.max()
-if not np.issubdtype(label_field.dtype, np.signedinteger):
+if not np.issubdtype(label_field.dtype, np.integer):
new_type = np.min_scalar_type(int(m))
label_field = label_field.astype(new_type)
m = m.astype(new_type) # Ensures m is an integer
labels = np.unique(label_field)
labels0 = labels[labels != 0]
if m == len(labels0): # nothing to do, already 1...n labels
Contributor Author:

Btw, the previous line

if m == len(labels0):  # nothing to do, already 1...n labels

was only valid for offset = 1. This was only mentioned in the comment but not checked, i.e. it should've been

if offset == 1 and m == len(labels0):  # nothing to do, already 1...n labels

Anyway, I guess hardly anyone uses offset != 1. Why was the previous function relabel_from_one replaced with relabel_sequential? (Since the offset introduces quite a bit of (subtle) complexity.)

Also, relabel_sequential assumes that the background has value 0, but I noticed color.label2rgb assumes (by default) that the background label has value -1.
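With the corrected check, already-sequential labels are still shifted when `offset != 1`; a quick sketch, assuming a scikit-image build with this fix:

```python
import numpy as np
from skimage.segmentation import relabel_sequential

# Labels are already 1..5, but offset=5 must still move them to 5..9:
relab, _, _ = relabel_sequential(np.array([1, 3, 0, 2, 5, 4]), offset=5)
print(relab)  # [5 7 0 6 9 8]
```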

Member:

> Why was the previous function relabel_from_one replaced with relabel_sequential?

Someone requested the functionality and it seemed like the right thing to do and an easy fix. Oops. =)

> Also, relabel_sequential assumes that the background has value 0, but I noticed color.label2rgb assumes (by default) that the background label has value -1.

Yes, historically, skimage used -1 as the background label, but we have slowly started homogenising to 0. But this will take time.

Contributor Author:

Good to know, thanks!

required_type = np.min_scalar_type(offset + len(labels0))
Member:

min_scalar_type is a little annoying. It can return unsigned types. I think I've personally come around to not liking the use of "unsigned" unless you need specific "unsigned" behavior, that is, 255+2 == 1 is something you want to be true in your math.

Thoughts?
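The unsigned-type behaviour being discussed is easy to demonstrate with plain NumPy:

```python
import numpy as np

# np.min_scalar_type prefers unsigned types for non-negative scalars:
print(np.min_scalar_type(255))   # uint8
print(np.min_scalar_type(256))   # uint16

# Unsigned arithmetic wraps around -- the 255 + 2 == 1 behaviour the
# comment warns about:
wrapped = np.array([255], dtype=np.uint8) + np.uint8(2)
print(wrapped)  # [1]
```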

Contributor Author:

The whole required_type thing is really covering an edge case that I don't think will happen much in practice, unless someone purposefully uses restricted data types (especially uint8).

I frequently save label masks to disk as uint16 TIFF files. Hence, they have this type when loaded again from disk. I like working with 16-bit integers for space reasons, which can be important for large 3D arrays.

Member:

Want to allow a dtype parameter to ensure uint16???

Member:

I think the new logic (use the larger of min_dtype and input dtype) is sufficient here.

Member:

Sure.

if np.dtype(required_type).itemsize > np.dtype(label_field.dtype).itemsize:
label_field = label_field.astype(required_type)
Member:

Very nice.

new_labels0 = np.arange(offset, offset + len(labels0))
if np.all(labels0 == new_labels0):
    return label_field, labels, labels
-forward_map = np.zeros(m + 1, int)
-forward_map[labels0] = np.arange(offset, offset + len(labels0))
+forward_map = np.zeros(int(m + 1), dtype=label_field.dtype)
Contributor Author:

Btw, I replaced m + 1 with int(m + 1) to fix an issue when m is of type np.uint64.

Is this intended behavior or a numpy bug?

>>> (np.uint64(5) + 1).dtype
dtype('float64')
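The promotion is easy to reproduce; note that this is version-dependent, since NumPy 2.0's NEP 50 rules changed the result for scalar operands:

```python
import numpy as np

# On NumPy 1.x, uint64 mixed with a Python int promotes to float64,
# because no NumPy integer type can represent every value of both.
# Under NEP 50 (NumPy 2.x) the result stays uint64 instead.
print((np.uint64(5) + 1).dtype)

# Converting to a Python int first, as the fix int(m + 1) does, keeps the
# arithmetic as exact Python integers:
print(int(np.uint64(5)) + 1)  # 6
```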

Member:

yikes! I think actually I came across this and the answer was that it is indeed intended, because (a) NumPy's type promotion rules do not depend on the input values, only on the types, and (b) they are supposed to be safe in the sense of being able to represent the result, and float64 is the only thing that can represent anything with a Python int, since they are unbounded.

At least, that's my memory of it. Perhaps @stefanv has more details. But, either way, thanks for the fix!

Member:

What is this sorcery!!!!

Member:

Looks like the argument Juan outlines is being followed here. Counter-intuitive, but probably correct? Same as (np.uint8(3)/2).dtype.

+forward_map[labels0] = new_labels0
if not (labels == 0).any():
    labels = np.concatenate(([0], labels))
-inverse_map = np.zeros(offset - 1 + len(labels), dtype=np.intp)
+inverse_map = np.zeros(offset - 1 + len(labels), dtype=label_field.dtype)
inverse_map[(offset - 1):] = labels
relabeled = forward_map[label_field]
return relabeled, forward_map, inverse_map
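Taken together, the dtype changes mean a small input type is widened only when the new labels would not fit; a sketch mirroring the overflow scenario, assuming a scikit-image build with these fixes:

```python
import numpy as np
from skimage.segmentation import relabel_sequential

# uint8 input whose relabeled values exceed 255: the output is promoted
# to uint16 instead of silently overflowing.
ar = np.array([1, 3, 0, 2, 5, 4], dtype=np.uint8)
relab, fw, inv = relabel_sequential(ar, offset=255)
print(relab.dtype)  # uint16
print(relab)        # [255 257   0 256 259 258]
```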
55 changes: 55 additions & 0 deletions skimage/segmentation/tests/test_join.py
@@ -3,6 +3,7 @@

from skimage._shared import testing
from skimage._shared.testing import assert_array_equal
import pytest


def test_join_segmentations():
@@ -91,3 +92,57 @@ def test_relabel_sequential_dtype():
assert_array_equal(fw, fw_ref)
inv_ref = np.array([0, 0, 0, 0, 0, 1, 5, 8, 42, 99])
assert_array_equal(inv, inv_ref)


@pytest.mark.parametrize('dtype', (np.byte, np.short, np.intc, np.int_,
np.longlong, np.ubyte, np.ushort,
np.uintc, np.uint, np.ulonglong))
@pytest.mark.parametrize('data_already_sequential', (False, True))
def test_relabel_sequential_int_dtype_stability(data_already_sequential,
dtype):
if data_already_sequential:
ar = np.array([1, 3, 0, 2, 5, 4], dtype=dtype)
else:
ar = np.array([1, 1, 5, 5, 8, 99, 42, 0], dtype=dtype)
assert all(a.dtype == dtype for a in relabel_sequential(ar))


def test_relabel_sequential_int_dtype_overflow():
ar = np.array([1, 3, 0, 2, 5, 4], dtype=np.uint8)
offset = 255
ar_relab, fw, inv = relabel_sequential(ar, offset=offset)
assert all(a.dtype == np.uint16 for a in (ar_relab, fw, inv))
ar_relab_ref = ar.astype(int, copy=True)
ar_relab_ref[ar_relab_ref > 0] += offset - 1
assert_array_equal(ar_relab, ar_relab_ref)


def test_relabel_sequential_negative_values():
ar = np.array([1, 1, 5, -5, 8, 99, 42, 0])
with pytest.raises(ValueError):
relabel_sequential(ar)


@pytest.mark.parametrize('offset', (0, -3))
@pytest.mark.parametrize('data_already_sequential', (False, True))
def test_relabel_sequential_nonpositive_offset(data_already_sequential,
offset):
if data_already_sequential:
ar = np.array([1, 3, 0, 2, 5, 4])
else:
ar = np.array([1, 1, 5, 5, 8, 99, 42, 0])
with pytest.raises(ValueError):
relabel_sequential(ar, offset=offset)


@pytest.mark.parametrize('offset', (1, 5))
@pytest.mark.parametrize('with0', (False, True))
def test_relabel_sequential_already_sequential(offset, with0):
if with0:
ar = np.array([1, 3, 0, 2, 5, 4])
else:
ar = np.array([1, 3, 2, 5, 4])
ar_relab, fw, inv = relabel_sequential(ar, offset=offset)
ar_relab_ref = ar.copy()
ar_relab_ref[ar_relab_ref > 0] += offset - 1
Member:

alternatively, ar_relab_ref = np.where(ar > 0, ar + offset - 1, 0)

Contributor Author:

Sure, that's better.

Contributor Author:

Should I change and commit this?

Member:

Yeah go for it!

Contributor Author:

Done.

assert_array_equal(ar_relab, ar_relab_ref)