feat: replace forward pass with decision tree (#8)
* wip: replace forward pass with decision tree

Elaboration into 3D of the Wu et al 2005 paper

Wu, Otoo, and Suzuki. "Two Strategies to Speed up
Connected Component Labeling Algorithms". 2005. LBNL-59102.

https://crd-legacy.lbl.gov/~kewu/ps/LBNL-59102.html

* fix: upgrade int to int64_t in pyx

* fix: add top right element in 2D plane to mask

* fix: voxels E,B,H require some unify commands

Everything at Z = -1 needs to account for items at Z = 0
Voxel H doesn't account for D,E,F,J

* fix: E is already tested by the time we test H

Also, we need to test F regardless of D and J

* chore: ensure cc3d.cpp is updated

* docs: add citation for Wu et al 2005 to README

* chore: remove unused definition

* fix: remove unnecessary mask element

* fix: potential out of bound error + drop excess mask element

* fix: rename unify_cfi to unify_ch

Names of voxels changed, and removed one

* docs: improved comments

* test: added some manual testing with drawn pictures

* fix: properly decomposed problem into 2d and 3d components

* docs: N is the current location

* chore: remove prototype decision tree code

* chore: remove unused code from prototype decision tree

* test: automated diagonal test

* chore: whitespace

* docs: partial update to give flavor of decision tree

Needs more information and diagrams.

* test: compare with scipy

* fix+test: requirements_dev.txt

* perf: removed redundant invocations of unify2d

Both locations E and B are fully connected to the upper mask and
therefore we do not need to reprocess them.

* docs: describe the new 3D decision tree

* docs: format table

* docs: discussed multilabel and the potential DF optimization

* docs: fixed listing of coverings

Each covering should assume the previous first elements are background.
william-silversmith committed May 25, 2019
1 parent 974dce1 commit 0980366
Showing 12 changed files with 1,992 additions and 1,815 deletions.
2 changes: 2 additions & 0 deletions .travis.yml
@@ -15,6 +15,8 @@ install:
- if [[ $PYTHON_MAJOR_VERSION == 3 ]]; then pip3 install numpy; fi
- if [[ $PYTHON_MAJOR_VERSION == 2 ]]; then pip install -e .; fi
- if [[ $PYTHON_MAJOR_VERSION == 3 ]]; then pip3 install -e .; fi
- if [[ $PYTHON_MAJOR_VERSION == 2 ]]; then pip install -r requirements_dev.txt; fi
- if [[ $PYTHON_MAJOR_VERSION == 3 ]]; then pip3 install -r requirements_dev.txt; fi
script:
- if [[ $PYTHON_MAJOR_VERSION == 2 ]]; then py.test -v -x automated_test.py; fi
- if [[ $PYTHON_MAJOR_VERSION == 3 ]]; then python3 -m pytest -v -x automated_test.py; fi
56 changes: 47 additions & 9 deletions README.md
@@ -3,7 +3,7 @@
Connected Components 3D
=======================

Implementation of connected components in three dimensions using a 26-connected neighborhood. This package uses a 3D variant of the two-pass method by Rosenfeld and Pfaltz augmented with Union-Find and a decision tree based on the 2D 8-connected work of Wu, Otoo, and Suzuki. This implementation is compatible with images containing many different labels, not just binary images. It can be used with 2D or 3D images.

I wrote this package because I was working on densely labeled 3D biomedical images of brain tissue (e.g. 512x512x512 voxels). Other off the shelf implementations I reviewed were limited to binary images. This rendered these other packages too slow for my use case as it required masking each label and running the connected components algorithm once each time. For reference, there are often between hundreds to thousands of labels in a given volume. The benefit of this package is that it labels all connected components in one shot, improving performance by one or more orders of magnitude.

@@ -63,28 +63,66 @@ labels_out = connected_components(labels_in, max_labels=20000)
// 3d array represented as 1d array
int* labels = new int[512*512*512]();

uint32_t* cc_labels = cc3d::connected_components3d<int>(
labels, /*sx=*/512, /*sy=*/512, /*sz=*/512
);
```

## Algorithm Description

The algorithm contained in this package is an elaboration into 3D images of the 2D image connected components algorithm described by Rosenfeld and Pfaltz (RP) in 1966 [1] (well illustrated by [this YouTube video](https://www.youtube.com/watch?v=ticZclUYy88)), using an equivalency list implemented as Tarjan's Union-Find disjoint set with path compression and balancing [2] and augmented with a decision tree based on work by Wu, Otoo, and Suzuki (WOS). [3]

### First Principles in 2D

In RP's 4-connected two-pass method for binary 2D images, the algorithm raster scans the image and, each time it first encounters a foreground pixel (i.e. the pixels to its top and left are background), marks it with a new label. If a preexisting label is present in its neighborhood, that label is used instead. Whenever two labels are found to be adjacent, they are recorded as equivalent so that they can be relabeled consistently in the second pass. This equivalency table can be constructed in several ways, but two popular approaches are Union-Find with path compression and union by rank, and Selkow's algorithm (which can avoid pipeline stalls). [4] However, Selkow's algorithm is designed for two trees of depth two, which is appropriate for binary images. We would like to process multiple labels at the same time, making Union-Find preferable.

In the second pass, the pixels are relabeled using the equivalency table. Union-Find establishes one label as the root of each tree of equivalent labels, and the root is taken as the representative label; each pixel is then relabeled with its representative. Path compression and balancing radically reduce the height of the trees, which accelerates the second pass.
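
The two passes described above can be sketched in a few dozen lines. This is a toy 4-connected, binary-image version in pure Python for illustration only; the library itself implements the idea in templated C++/Cython, and the names here are invented for the sketch.

```python
import numpy as np

class UnionFind:
    """Equivalency table: Union-Find with path compression and union by rank [2]."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression: point x at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:  # union by rank keeps trees shallow
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

def two_pass_4connected(img):
    """Label a binary 2D image with 4-connectivity using the two-pass method."""
    h, w = img.shape
    provisional = np.zeros((h, w), dtype=np.int64)
    uf = UnionFind(h * w + 1)
    next_label = 1
    for y in range(h):
        for x in range(w):
            if not img[y, x]:
                continue
            up = provisional[y - 1, x] if y > 0 else 0
            left = provisional[y, x - 1] if x > 0 else 0
            if up and left:
                provisional[y, x] = up
                uf.union(up, left)           # record that the two labels touch
            elif up or left:
                provisional[y, x] = up or left
            else:
                provisional[y, x] = next_label  # first encounter: new label
                next_label += 1
    # second pass: relabel every pixel with its tree's representative
    remap = {}
    out = np.zeros((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            p = provisional[y, x]
            if p:
                out[y, x] = remap.setdefault(uf.find(p), len(remap) + 1)
    return out
```

The `remap` dictionary at the end simply renumbers the surviving representative labels consecutively, which is cosmetic; the essential step is `uf.find(p)`.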

WOS tackled the problem of accelerating 8-connected 2D connected components labeling on binary images. 8-connected labeling is achieved by extending RP's forward-pass mask with the top-left and top-right corner pixels. In Union-Find based connected components algorithms, the unify step in the first pass is the most expensive operation. WOS showed how to eliminate a large fraction of these calls using a decision tree that exploits local topology. For example, the top-center neighbor of the current pixel is adjacent to all the other mask elements, which have already been processed by virtue of the raster scan direction; if it is foreground, it suffices to copy its value and move on. If it is background, pick one of the remaining foreground mask pixels, copy its value, and unify with the mask element on the right, which is now known to be non-adjacent to the left-hand side. The algorithm continues in this fashion until a match is found or all mask elements are exhausted, at which point a new label is created.

For several years, this algorithm was the world's fastest, though it has since been superseded by newer work that, among other improvements, exchanges the static decision tree for a dynamic or precomputed one. However, WOS's work remains significant for both its simplicity and speed, and it serves as the inspiration for this library.

### Extending to 3D

To move to a 3D 26-connected neighborhood, the mask must be extended into three dimensions in order to connect neighboring planes. Observe that the 8-connected mask covers the trailing half of the neighborhood (the part that will have been already processed) such that the current pixel can rely on those labels. Thus the mask for the 26-connected neighborhood covers only two out of three potential planes: the entire lower plane (nine voxels), and a mask identical to WOS's (four voxels) on the current plane. While some further optimizations are possible, to begin, the problem can be conceptually decomposed into two parts: establishing a 9-connected link to the bottom plane and then an 8-connected link to the current plane. This works because the current pixel functions as a hub that transmits the connection information from the 9-connected step to the 8-connected step.

Fig. 1: Mask for an 8-connected plane. If `j`, `k`, `l`, and `m` are all eliminated, only `n` remains and a new label is assigned.

| j | k | l |
|---|---|---|
| m | n | . |
| . | . | . |
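
The WOS-style decision tree over this mask can be sketched as follows, using the Fig. 1 names. This is a simplified binary-image sketch, not the library's generated C++; `union` and `fresh` are hypothetical helpers standing in for the equivalency table and the label counter.

```python
def decide_8connected(j, k, l, m, union, fresh):
    """Choose a label for the current pixel n, given the provisional labels
    of mask pixels j, k, l, m (0 = background). `union(a, b)` records an
    equivalence; `fresh()` mints a new label."""
    if k:               # k touches every other mask pixel, so copying suffices
        return k
    if m:               # if j is present, it was unified with m when m was scanned
        if l:           # l is not adjacent to m, so one unify call is required
            union(m, l)
        return m
    if j:
        if l:
            union(j, l)
        return j
    if l:
        return l
    return fresh()      # entire mask is background: assign a brand new label
```

Only the branches pairing `m` or `j` with `l` ever call `union`; that is the saving the decision tree delivers over naively unifying with every foreground neighbor.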

The very first Z plane (Z=0) the algorithm runs against is special: the edge effect omits the bottom plane of the mask. Therefore, as the remaining mask is comprised only of the 2D 8-connected mask, after this pass the bottom of the image is 8-connected. At Z=1, the 9-connected part of the mask kicks in, forming connections to Z=0 and making the current plane (8 + 9) 17-connected. At Z=2, the 9-connected bottom mask forms connections from Z=1 to Z=2 on the top, making Z=1 (17 + 9) 26-connected. By induction, when this process proceeds to completion it results in a 26-connected labeling of the volume.

Following inspiration from WOS, we construct a decision tree on the densely labeled bottom plane that minimizes the number of unifications we need to perform.

Fig 2. The mask for the lower plane in 3D.

| a | b | c |
|---|---|---|
| d | e | f |
| g | h | i |

As `e` is connected to all other voxels in the mask, if it is present its label can simply be copied. If `e` is absent, `b` and `h` fully cover the mask. If `b` is also absent, `h`, `a`, `c` comprise a covering; if `h` is absent instead, `b`, `g`, `i` are one. Below is a list of coverings such that each entry assumes the first letter of every preceding entry is background.

1. `e`
2. `b`, (`h` | `g`, `i`)
3. `h`, `a`, `c`
4. `d`, (`f` | `c`, `i`)
5. `f`, `g`, `a`
6. `a`, `c`, `g`, `i`
7. `c`, `g`, `i`
8. `g`, `i`
9. `i`

The decision tree is then constructed such that each of these coverings is evaluated using the fewest unifications possible. It is possible to optimize further by noting that `e` and `b` are both fully connected to the upper 2D mask; therefore, if either is present, we can skip the 8-connected unification step. It is also possible to try the `d`, `f` covering first when `b` is background, which would save one unification versus `h`, `a`, `c` given even statistics, but it proved slightly slower on the dataset I attempted. To move from binary data to multilabel data, I simply replaced tests for foreground and background with tests for matching labels.
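
The covering list translates directly into branch logic. As an illustration (a readable sketch, not the library's generated C++ branches), the following hypothetical helper returns which Fig. 2 voxels to unify with, given a map from the letters `'a'` through `'i'` to foreground/background:

```python
def lower_plane_unifies(v):
    """Given a dict mapping the Fig. 2 mask voxels 'a'..'i' to True
    (foreground) or False (background), return the voxels to unify with,
    following the covering list above."""
    def pick(*names):
        # corners a, c, g, i are pairwise non-adjacent: unify with each present one
        return [n for n in names if v[n]]
    if v['e']:                                           # covering 1
        return ['e']
    if v['b']:                                           # covering 2
        return ['b'] + (['h'] if v['h'] else pick('g', 'i'))
    if v['h']:                                           # covering 3
        return ['h'] + pick('a', 'c')
    if v['d']:                                           # covering 4
        return ['d'] + (['f'] if v['f'] else pick('c', 'i'))
    if v['f']:                                           # covering 5
        return ['f'] + pick('g', 'a')
    return pick('a', 'c', 'g', 'i')                      # coverings 6-9
```

Note how coverings 6 through 9 collapse into a single case: once `b`, `d`, `e`, `f`, and `h` are background, the four corners are mutually disconnected within the mask, so each present corner needs its own unification.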

In order to make a reasonably fast implementation, I implemented union-find with union by rank and path compression. I conservatively used two arrays (a uint64 rank array, and the IDs as the image data type) equal to the size of the image for the union-find data structure instead of a sparse map. The union-find data structure plus the output labels means the memory consumption will be input + output + rank + equivalences. If your input labels are 32-bit, the memory usage will be 5x the input size. This becomes more problematic when 64-bit labels are used, but if you know something about your data, you can decrease the size of the union-find data structure.
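
The memory budget can be made concrete with a back-of-envelope helper. This is a hypothetical illustration under the stated layout (four full-size arrays); the exact multiple depends on the width of the rank array relative to the label dtype.

```python
def union_find_memory_bytes(voxels, label_bytes, rank_bytes=8):
    """Estimate peak memory for the scheme described above: input labels,
    output labels, a rank array, and an equivalence (ID) array, each
    allocated for every voxel. rank_bytes=8 corresponds to a uint64 rank
    array; the ID array uses the image's own dtype."""
    input_b = voxels * label_bytes    # the caller's image
    output_b = voxels * label_bytes   # relabeled output
    rank_b = voxels * rank_bytes      # union-by-rank bookkeeping
    equiv_b = voxels * label_bytes    # union-find parent IDs
    return input_b + output_b + rank_b + equiv_b

# e.g. a 512^3 volume of uint32 labels: about 2.7 GB peak
total = union_find_memory_bytes(512 ** 3, 4)
```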

## References

1. A. Rosenfeld and J. Pfaltz. "Sequential Operations in Digital Picture Processing". Journal of the ACM. Vol. 13, Issue 4, Oct. 1966, Pg. 471-494. doi: 10.1145/321356.321357 ([link](https://dl.acm.org/citation.cfm?id=321357))
2. R. E. Tarjan. "Efficiency of a good but not linear set union algorithm". Journal of the ACM, 22:215-225, 1975. ([link](https://dl.acm.org/citation.cfm?id=321884))
3. K. Wu, E. Otoo, K. Suzuki. "Two Strategies to Speed up Connected Component Labeling Algorithms". Lawrence Berkeley National Laboratory. LBNL-59102, 2005. ([link](https://crd-legacy.lbl.gov/~kewu/ps/LBNL-59102.html))
4. S. Selkow. "The Tree-to-Tree Editing Problem". Information Processing Letters. Vol. 6, No. 6. June 1977. doi: 10.1016/0020-0190(77)90064-3 ([link](http://www.grantjenks.com/wiki/_media/ideas:tree-to-tree_editing_problem.pdf))
48 changes: 48 additions & 0 deletions automated_test.py
@@ -135,6 +135,31 @@ def test(order, ground_truth):
test('F', gt_c2f(ground_truth))


def test_2d_diagonals():
input_labels = np.array([
[0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
], dtype=np.uint32)

ground_truth = np.array([
[0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[3, 0, 0, 1, 0, 0, 4, 0, 6, 6, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 1, 0, 0, 0, 6, 6, 0, 0, 0, 0, 0, 0, 0],
[3, 0, 3, 0, 0, 0, 6, 6, 0, 0, 6, 6, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 6, 6, 0, 0, 6, 6, 0, 0, 0, 0, 0],
[0, 0, 7, 0, 8, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0],
[0, 7, 0, 0, 0, 8, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0],
], dtype=np.uint32)

output_labels = cc3d.connected_components(input_labels)
print(output_labels)
assert np.all(output_labels == ground_truth)

def test_2d_cross_with_intruder():
def test(order, ground_truth):
input_labels = np.zeros( (5,5), dtype=np.uint8, order=order)
@@ -339,6 +364,29 @@ def test_max_labels_nonsensical():
assert np.all(real_labels == zero_labels)
assert np.all(real_labels == negative_labels)

def test_compare_scipy():
import scipy.ndimage.measurements
import fastremap

sx, sy, sz = 256, 256, 256
labels = np.random.randint(0, 2, (sx,sy,sz), dtype=bool)

structure = [
[[1,1,1], [1,1,1], [1,1,1]],
[[1,1,1], [1,1,1], [1,1,1]],
[[1,1,1], [1,1,1], [1,1,1]]
]

cc3d_labels = cc3d.connected_components(labels)
cc3d_labels, wow = fastremap.renumber(cc3d_labels)
scipy_labels, wow = scipy.ndimage.measurements.label(labels, structure=structure)

print(cc3d_labels)
print(scipy_labels)

assert np.all(cc3d_labels == scipy_labels)


# def test_sixty_four_bit():
# input_labels = np.ones((1626,1626,1626), dtype=np.uint8)
# cc3d.connected_components(input_labels, max_labels=0)
