
Hierarchical Merging #1100

Merged
31 commits merged into scikit-image:master from vighneshbirodkar:ha on Jan 30, 2015

Conversation

@vighneshbirodkar
Copy link
Contributor

commented Aug 14, 2014

Examples

Example segmentation results (images): three at thresh=40 (40, 40_2, ball_40) and two at thresh=60 (60, car_60).

@@ -78,7 +78,7 @@ def merge_nodes(self, src, dst, weight_func=min_weight, extra_arguments=[],
"""
src_nbrs = set(self.neighbors(src))
dst_nbrs = set(self.neighbors(dst))
-    neighbors = (src_nbrs & dst_nbrs) - set([src, dst])
+    neighbors = (src_nbrs | dst_nbrs) - set([src, dst])

@vighneshbirodkar (Author) commented Aug 14, 2014

I don't know how this went through undetected till now.

@stefanv (Member) commented Aug 15, 2014

No code review is perfect :)
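For readers skimming the diff above, a minimal standalone illustration (toy node labels, not the actual RAG code) of why the fix switches from intersection to union: the merged node must inherit every neighbor of either endpoint, not only the neighbors they share.

```python
# Toy example: node 0 is being merged into node 9.
src, dst = 0, 9
src_nbrs = {1, 2, 9}   # neighbors of src (includes dst)
dst_nbrs = {2, 3, 0}   # neighbors of dst (includes src)

# Old behaviour: intersection keeps only shared neighbors and silently
# drops nodes 1 and 3 from the merged region's adjacency.
buggy = (src_nbrs & dst_nbrs) - {src, dst}    # {2}

# Fixed behaviour: union keeps every neighbor of both endpoints.
fixed = (src_nbrs | dst_nbrs) - {src, dst}    # {1, 2, 3}
```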

@stefanv (Member) commented Aug 15, 2014

I'd like to see an example of how this looks on one image with different thresholds.

===========
This example constructs a Region Adjacency Graph (RAG) and progressively merges
regions which are similar in color. Merging two adjacent regions produces

@stefanv (Member) commented Aug 15, 2014

which -> that, if you want to be pedantic

This example constructs a Region Adjacency Graph (RAG) and progressively merges
regions which are similar in color. Merging two adjacent regions produces
a new regions with all the pixels from the merged regions. Regions are merged
till no two adjacent regions are similar enough.

@stefanv (Member) commented Aug 15, 2014

till -> until

@stefanv (Member) commented Aug 15, 2014

Perhaps "until no highly similar regions remain" or "until no neighboring regions are above a given similarity"

The Region Adjacency Graph.
thresh : float
The threshold. Nodes are merged until the minimum edge weight in the
graph exceeds `thresh`.

@stefanv (Member) commented Aug 15, 2014

Please describe what this threshold should contain. E.g., how would the user know to compute a sensible value here? It needs to be obvious from the docstring.

min_wt = d['weight']

if min_wt < thresh:
total_color = (rag.node[y]['total color'] +

@stefanv (Member) commented Aug 15, 2014

What does "total color" refer to?

@vighneshbirodkar (Author) commented Aug 15, 2014

It is the sum of the colors of all pixels belonging to a region.
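A tiny sketch of how such a running sum can be used (the attribute keys here are illustrative, not necessarily the final ones): merging only has to add sums and counts, and the mean color needed for edge weights is a single division.

```python
import numpy as np

# Hypothetical per-region attributes kept on each RAG node.
node_a = {'total color': np.array([1200., 900., 300.]), 'pixel count': 10}
node_b = {'total color': np.array([300., 300., 300.]), 'pixel count': 5}

# Merging two regions just adds the running sums and pixel counts ...
merged = {'total color': node_a['total color'] + node_b['total color'],
          'pixel count': node_a['pixel count'] + node_b['pixel count']}

# ... and the mean color used to weight edges is recovered on demand.
mean_color = merged['total color'] / merged['pixel count']
```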


count = 0
arr = np.arange(labels.max() + 1)
for n, d in rag.nodes_iter(data=True):

@stefanv (Member) commented Aug 15, 2014

for ix, (n, d) in enumerate(rag.nodes_iter(data=True)):
    for label in d['labels']:
        arr[label] = ix

Please avoid the use of l as a variable name.

@vighneshbirodkar (Author) commented Aug 15, 2014

@stefanv I will include the effects of changing threshold in my blog post.

thresh : float
The threshold. Regions connected by an edges with smaller wegiht than
`thresh` are merged. A high value of `thresh` would mean that a lot of
regions are merged, and the output will contain fewer regions.

@vighneshbirodkar (Author) commented Aug 15, 2014

@Stefan I modified this a bit. But the input RAG may not always be a color distance graph, so I can't be more specific.

@coveralls commented Aug 15, 2014

Coverage Status

Coverage increased (+0.02%) when pulling ec9aad0 on vighneshbirodkar:ha into a363802 on scikit-image:master.

import numpy as np


def _hmerge(rag, x, y, n):

@jni (Contributor) commented Aug 15, 2014

Give an indication in the function name that the mean color is used — right now it could be any merging function

"""
min_wt = 0
while min_wt < thresh:

@jni (Contributor) commented Aug 15, 2014

remove this blank line

min_wt = 0
while min_wt < thresh:

valid_edges = ((x, y, d) for x, y, d in rag.edges(data=True) if x != y)

@jni (Contributor) commented Aug 15, 2014

This is extremely inefficient. You need to think about the efficiency of your algorithms. Because you are revisiting the entire edge list after each merge, this algorithm has complexity O(E^2). It can actually be done in O(ElogE), by maintaining a min-heap of all the edges.

@jni (Contributor) commented Aug 15, 2014

Ok "extremely inefficient" is a bit harsh. =) But it's certainly not optimal. =)

@jni (Contributor) commented Aug 15, 2014

It's a nice and simple algorithm, isn't it? =) If you add the heap queue, it should be ready to merge fairly quickly!

For the heap, there is an additional complication: you need to be able to update existing edges. These can then either bubble back to their correct positions, or they need to be "deleted" or marked as "invalid", and reinserted with the new values.
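A minimal sketch of that idea with Python's heapq (names are illustrative, not the final scikit-image API): heap entries are mutable lists carrying a validity flag, stale entries are left in the heap and skipped when popped, and a weight change is handled by invalidating the old entry and pushing a fresh one.

```python
import heapq

edge_heap = []

def push_edge(weight, src, dst):
    """Push an edge as [weight, src, dst, valid]; the list is mutable so the
    entry can later be marked invalid without searching the heap."""
    item = [weight, src, dst, True]
    heapq.heappush(edge_heap, item)
    return item

def pop_min_valid():
    """Pop entries until a still-valid one is found (or the heap is empty)."""
    while edge_heap:
        weight, src, dst, valid = heapq.heappop(edge_heap)
        if valid:
            return weight, src, dst
    return None

# When an edge's weight changes after a merge, mark the old entry stale and
# push a new one instead of updating it in place:
item = push_edge(50, 5, 7)
item[3] = False            # invalidate the old entry
push_edge(100, 5, 7)       # reinsert with the recomputed weight
print(pop_min_valid())     # -> (100, 5, 7)
```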

@vighneshbirodkar (Author) commented Aug 15, 2014

Once we delete a node, we will have some edges which will need invalidating/deleting. How do we look for these edges in the heap?
What might help us is a sorted set implementation, something that stores elements in sorted order while still allowing arbitrary access. I had implemented a similar data structure in C:
https://github.com/vighneshbirodkar/vigredis/blob/master/src/set.c

blist library has a sorted set implementation.

@vighneshbirodkar (Author) commented Aug 16, 2014

Just an update: I have almost figured it out using Python's heapq module and @jni's idea of invalidating edges.

@vighneshbirodkar (Author) commented Aug 16, 2014

@jni
The time has improved from 0.3s to 0.09s, thanks for pointing it out.

@coveralls commented Aug 16, 2014

Coverage Status

Coverage increased (+0.03%) when pulling eb5d3b7 on vighneshbirodkar:ha into a363802 on scikit-image:master.


rag : RAG
The Region Adjacency Graph.
thresh : float
The threshold. Regions connected by an edges with smaller wegiht than

@jni (Contributor) commented Aug 17, 2014

The threshold. Regions connected by an edge with weight smaller than thresh are merged. [Rest is unnecessary]

@jni (Contributor) commented Aug 18, 2014

This docstring still needs updating

def merge_hierarchical(labels, rag, thresh, in_place=True):
"""Perform hierarchical merging of a RAG.
Given an image's labels and its RAG, the method merges the similar nodes

@jni (Contributor) commented Aug 17, 2014

greedily merges the most similar pair of nodes until no edges lower than thresh remain.

# Validate all edges and push them in heap
data['valid'] = True
wt = data['weight']
heapq.heappush(edge_heap, (wt, x, y, data))

@jni (Contributor) commented Aug 17, 2014

I think this implementation might be buggy: data is the data on the edge, so when we re-insert an edge (after invalidating and recomputing), data will again point to a valid edge. Instead, it's the heap item itself that must be valid or invalid.

@vighneshbirodkar (Author) commented Aug 17, 2014

@jni
Consider an edge, say (5, 7), with weight 50.
Let's say we merge 3 and 7, and the weight of (5, 7) then becomes 100.

We don't know where in the heap (5, 7) might be, so we invalidate the data dictionary. At this point the dictionary in the graph and the heap, both point to the same object. Once data is invalidated, we insert a new tuple with the correct weight.

@jni (Contributor) commented Aug 17, 2014

@vighneshbirodkar but some edges repeat, so the new tuple will have to point back to the same data dictionary...?

You can track the tuples and mutate them (well, they would have to be lists), so that you can have g[x][y]['heap_item'] = tup. This allows you to invalidate on demand.
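A sketch of the heap_item variant described here (the 'heap_item' edge attribute is hypothetical): storing the mutable heap entry on the edge itself gives direct access for invalidation, without searching the heap or copying dictionaries.

```python
import heapq
import networkx as nx

g = nx.Graph()
g.add_edge(5, 7, weight=50)
edge_heap = []

def push_edge(g, heap, src, dst):
    # The entry is a list so it can be flagged invalid in place later.
    item = [g[src][dst]['weight'], src, dst, True]
    g[src][dst]['heap_item'] = item
    heapq.heappush(heap, item)

push_edge(g, edge_heap, 5, 7)

# After a merge changes the weight of (5, 7):
g[5][7]['weight'] = 100
g[5][7]['heap_item'][-1] = False   # invalidate the stale entry directly
push_edge(g, edge_heap, 5, 7)      # reinsert with the recomputed weight
```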

@vighneshbirodkar (Author) commented Aug 17, 2014

In that case, the dictionary is invalidated anyway, and a new tuple with the exact same values is inserted and marked valid.

@jni (Contributor) commented Aug 17, 2014

Oh, right, I see below that you copy the entire dictionary over. That seems expensive!


rag.merge_nodes(x, y, _hmerge_mean_color)
for n in rag.neighbors(y):
if n != y:

@jni (Contributor) commented Aug 17, 2014

Why is this if statement necessary? How could there be a self-edge in the graph?

@jni (Contributor) commented Aug 17, 2014

As above, please try to compress the below block into a function call.


# invalidates the edge in the heap, if it all it exists
data = rag[y][n]
data['valid'] = False

@jni (Contributor) commented Aug 17, 2014

I thought we already invalidated all necessary edges, above?

rag = rag.copy()

edge_heap = []
for x, y, data in rag.edges_iter(data=True):

@jni (Contributor) commented Aug 17, 2014

Can we use the src, dst nomenclature here also? It makes it much clearer what is happening, later in the function


edge_heap = []
for x, y, data in rag.edges_iter(data=True):
if x != y:

@jni (Contributor) commented Aug 17, 2014

Again, is this at all possible? Aren't we better off enforcing no self-edges elsewhere? I feel that this hurts readability.

@vighneshbirodkar (Author) commented Aug 17, 2014

Self edges are required during a normalized cut; it fails unless w[i, i] = 1.

@jni (Contributor) commented Aug 17, 2014

Interesting... Then I think the ncut functions should take care of that, rather than having these artifacts all over the place. How about making a separate pull request to fix up ncut so that self edges are assumed not to be in the rag, and are added just before export to sparse format? (along with the above explanation in a comment)
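A sketch of the suggested separation (the function name is hypothetical, and the 'weight' attribute is assumed): keep the RAG free of self-loops, and set the diagonal only when building the sparse matrix that the normalized cut consumes.

```python
import networkx as nx
import numpy as np

def rag_to_ncut_matrix(rag):
    """Export a self-loop-free RAG to a sparse weight matrix for ncut,
    setting w[i, i] = 1 only at this point."""
    w = nx.to_scipy_sparse_matrix(rag, weight='weight', format='lil')
    w.setdiag(np.ones(w.shape[0]))   # ncut needs a nonzero diagonal
    return w.tocsc()
```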

wt = data['weight']
heapq.heappush(edge_heap, (wt, x, y, data))

while min_wt < thresh:

@jni (Contributor) commented Aug 17, 2014

This is confusing in combination with the below if. How about: while edge_heap[0][0] < thresh?

@jni (Contributor) commented Aug 17, 2014

Then below you can just write if data['valid']

min_wt, x, y, data = heapq.heappop(edge_heap)

# Ensure popped edge is valid, if not, the edge is discarded
if min_wt < thresh and data['valid']:

@jni (Contributor) commented Aug 17, 2014

Can you put everything below (up to for n in rag.neighbors(y)) into a function? As possible... I think this function needs to be compressed a bit to aid comprehension.
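Putting those suggestions together, the main loop might compress to something like this sketch (helper names such as _merge_and_revalidate are placeholders, not the code in this PR):

```python
import heapq

def merge_until(rag, edge_heap, thresh, _merge_and_revalidate):
    """Greedily pop the cheapest valid edge and merge it until the smallest
    remaining weight reaches `thresh`."""
    while edge_heap and edge_heap[0][0] < thresh:
        weight, src, dst, valid = heapq.heappop(edge_heap)
        # Entries invalidated after an earlier merge are simply discarded.
        if valid:
            _merge_and_revalidate(rag, src, dst, edge_heap)
```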

@jni (Contributor) commented Aug 17, 2014

Hi @vighneshbirodkar! As I said before, great work here. I just left some additional comments but I think you should be able to address them pretty easily! =) Thanks!

rag[n][y] = data_copy

# validate this edge
rag.add_edge(y, n, valid=True)

@jni (Contributor) commented Aug 17, 2014

And this is confusing, since the edge has to exist based on the above statements. So rag[y][n]['valid'] = True would be clearer... Assuming you continue with the dictionary approach.

@vighneshbirodkar force-pushed the vighneshbirodkar:ha branch from 39fccc9 to c96fbe4 on Jan 27, 2015

@coveralls commented Jan 28, 2015

Coverage Status

Coverage decreased (-0.0%) to 92.56% when pulling 237b11f on vighneshbirodkar:ha into b6a1bcd on scikit-image:master.

"""
return graph.merge_hierarchical(labels, rag, thresh, rag_copy,
in_place_merge, _pre_merge_mean_color,
_weight_mean_color)

@jni (Contributor) commented Jan 28, 2015

Actually I don't think you need this definition here, it's too elaborate for an example. Just make the appropriate call below! ie:

labels2 = graph.merge_hierarchical(labels, g, thresh=40, 
                                   merge_func=_merge_mean_color,
                                   weight_func=_weight_mean_color)

And do change the function name to _merge_mean_color since the argument is merge_func.

2, 'non existant mode')

def _weight_mean_color(graph, src, dst, n):
#print 'merging

@jni (Contributor) commented Jan 28, 2015

delete this commented-out line

labels = np.zeros((8, 8), dtype='uint8')

img[:, :, :] = 128
labels[:,:] = 1

@jni (Contributor) commented Jan 28, 2015

PEP8

img[:, :, :] = 128
labels[:,:] = 1

img[0:4,0:4,:] = 255,255,255

@jni (Contributor) commented Jan 28, 2015

PEP8 again


g = graph.rag_mean_color(img, labels)
result = merge_hierarchical_mean_color(labels, g, 300)
assert len(np.unique(result)) == 1

@jni (Contributor) commented Jan 28, 2015

Thanks for adding this, but it's not a very good test unfortunately.

Here's what I would do: create an image with gray colours e.g. 0.1, 0.2, 0.30000001, 0.5. Set the threshold just above the difference between 0.1 and 0.2 (sqrt(0.01 + 0.01 + 0.01) + eps, i.e. the Euclidean distance over the three channels). Then show that merge_hierarchical will merge the 0.1 and 0.2 segments and stop, while merge_threshold will merge 0.1, 0.2, and 0.3 together. Does that make sense?

@jni (Contributor) commented Jan 28, 2015

Hey @vighneshbirodkar thanks for the rebase and new push! I think the changes I've suggested should be quick and this could be finalised by tomorrow!


img[4:, 0:4, :] = 20, 20, 20
labels[4:, 0:4] = 3

@vighneshbirodkar (Author) commented Jan 28, 2015

@jni
30.00001 is not required, since after 10 and 20 (or 20 and 30) are merged, the resulting node differs a lot from the remaining one and merging does not proceed.

@jni (Contributor) commented Jan 28, 2015

@vighneshbirodkar But I want you to make the test more stringent: len(unique(labels)) can hide many things going wrong. I want you to check that result is equal to an expected_result array. So I wanted to make sure that 10 and 20 merge before 20 and 30, reliably and logically. =) One alternative to stay with uint8 is to use (11 * sqrt(3) + eps) as the threshold and then use 10, 20, 31.
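A rough sketch of the image and threshold being proposed (the stripe layout and the far-away fourth gray value 80 are illustrative assumptions; the actual test would compare the merge results against explicit expected label arrays using the functions under review):

```python
import numpy as np

# Four vertical stripes with per-channel gray values 10, 20, 31 and 80.
img = np.zeros((8, 8, 3), dtype='uint8')
labels = np.zeros((8, 8), dtype='uint8')
for i, val in enumerate([10, 20, 31, 80]):
    img[:, 2 * i:2 * i + 2] = val
    labels[:, 2 * i:2 * i + 2] = i

thresh = 11 * np.sqrt(3) + 1e-3   # just above the (20, 31) edge weight

# Mean-color edge weights: (10, 20) -> 10*sqrt(3) ~ 17.3, below thresh;
# (20, 31) -> 11*sqrt(3) ~ 19.05, also below thresh.  A plain threshold cut
# therefore joins 10, 20 and 31, while hierarchical merging first fuses 10
# and 20 (mean 15), after which the recomputed weight 16*sqrt(3) ~ 27.7
# exceeds thresh and merging stops.
assert 10 * np.sqrt(3) < thresh < 16 * np.sqrt(3)
```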

@coveralls commented Jan 28, 2015

Coverage Status

Coverage increased (+0.03%) to 92.6% when pulling e79bbed on vighneshbirodkar:ha into b6a1bcd on scikit-image:master.

@stefanv (Member) commented Jan 28, 2015

Ready to roll?

@jni commented on doc/examples/plot_rag_merge.py in e79bbed, Jan 28, 2015

Please use the keyword arguments here (as I suggested), as they make this much more readable. This is supposed to be an example for someone who has never seen this functionality before. What will they make of "40, False, True"?

@jni (Contributor) commented Jan 28, 2015

@stefanv aallllmost! =)

@coveralls commented Jan 29, 2015

Coverage Status

Coverage increased (+0.04%) to 92.61% when pulling d3775f9 on vighneshbirodkar:ha into b6a1bcd on scikit-image:master.

@coveralls commented Jan 29, 2015

Coverage Status

Coverage increased (+0.04%) to 92.61% when pulling 2ffa484 on vighneshbirodkar:ha into b6a1bcd on scikit-image:master.

@jni (Contributor) commented Jan 30, 2015

@vighneshbirodkar Done! Brilliant! Thank you for your continued effort! =)

jni added a commit that referenced this pull request Jan 30, 2015

@jni merged commit 1992c35 into scikit-image:master on Jan 30, 2015

1 check passed: continuous-integration/travis-ci (The Travis CI build passed)
@vighneshbirodkar (Author) commented Jan 30, 2015

(image)

@stefanv (Member) commented Jan 30, 2015

Kudos, also for surviving @jni's relentless energy for reviewing. The code
quality shows!
