feat: Added Detection post processor #24

charlesmindee · 2021-01-18T18:17:30Z

Postprocessor (meta class) + DBPostprocessor module added

fg-mindee

Thanks for the PR! I added a few comments to check. Could you switch your docstring convention?

Also, since the DBPostprocessor is specific to DBNet, I think we should place it in the same file as the model definition (so differentiable_binarization.py, if you defined the model there). That will avoid having files with implicit naming.

fg-mindee · 2021-01-19T13:43:48Z

doctr/models/detection/postprocessor.py

+    """
+    class to postprocess documents
+    a postprocessor takes the raw output from a model
+    a postprocessor return a list of tensor, each tensor N X 5
+    """


Could you switch to Google-style docstrings, like so: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

fg-mindee · 2021-01-19T13:47:46Z

doctr/models/detection/dbpostprocessor.py

+
+
+class DBPostprocessor(Postprocessor):
+


Docstring missing here.

You need to add the constructor args as well, like so: https://github.com/teamMindee/doctr/blob/main/doctr/documents/elements.py#L13-L18
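
For illustration, a Google-style class docstring that documents constructor arguments might look like the sketch below. The argument names are taken from attributes used elsewhere in this PR (`unclip_ratio`, `max_candidates`, `bin_thresh`); the default values, the descriptions, and the stand-in base class are assumptions, not the merged code.

```python
class Postprocessor:
    """Stand-in base class for this sketch (hypothetical, not the doctr one)."""


class DBPostprocessor(Postprocessor):
    """Turns the raw output of a DB-style detection model into boxes.

    Args:
        unclip_ratio (float): ratio used to expand (unclip) each detected polygon
        max_candidates (int): maximum number of contours considered per image
        bin_thresh (float): threshold applied to the probability map to binarize it
    """

    def __init__(
        self,
        unclip_ratio: float = 1.5,   # assumed default, for illustration only
        max_candidates: int = 1000,  # assumed default, for illustration only
        bin_thresh: float = 0.3,     # assumed default, for illustration only
    ) -> None:
        self.unclip_ratio = unclip_ratio
        self.max_candidates = max_candidates
        self.bin_thresh = bin_thresh
```

With the `Args:` section placed in the class docstring, Sphinx with the napoleon extension can render the constructor parameters alongside the class description.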

fg-mindee · 2021-01-19T13:50:10Z

doctr/models/detection/dbpostprocessor.py

+    def box_score(
+        self,
+        pred: np.ndarray,
+        _box: np.ndarray
+    ) -> float:
+        """
+        Compute the confidence score for a box : mean between p_map values on the box
+        :param pred: p_map (output of the model)
+        :param _box: box
+        """
+
+        h, w = pred.shape[:2]
+        box = _box.copy()
+        xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1)
+        xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1)
+        ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1)
+        ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1)
+
+        mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
+        box[:, 0] = box[:, 0] - xmin
+        box[:, 1] = box[:, 1] - ymin
+        cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
+
+        return cv2.mean(pred[ymin:ymax + 1, xmin:xmax + 1], mask)[0]


Can't we access the inside of the architecture to have the actual box objectness?
If we are to compute the objectness based on its geometry, in my opinion, we can remove this since it does not really bring value.

charlesmindee · Jan 20, 2021

So I need to remove this function? How do I compute the score of each box then?

fg-mindee · 2021-01-19T13:51:22Z

doctr/models/detection/dbpostprocessor.py

+        height, width = bitmap.shape[:2]
+        boxes = []
+        contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
+        for contour in contours[:self.max_candidates]:


This is dangerous. I understand we can limit the number of candidates but they are not ordered. So if we limit the number of candidates, perhaps they should be ordered beforehand (by objectness for instance?)
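
One way to address this is to rank the candidates before truncating the list. The sketch below is a minimal illustration using NumPy only: `top_candidates` is a hypothetical helper, the contours are placeholder strings, and the choice of score (objectness, contour area, or something else) is left open, as it is in the thread.

```python
import numpy as np


def top_candidates(contours, scores, max_candidates):
    """Keep the `max_candidates` highest-scoring contours.

    `contours` and `scores` are parallel sequences; how the score is
    computed is an assumption left to the caller.
    """
    order = np.argsort(scores)[::-1]  # indices sorted by decreasing score
    return [contours[i] for i in order[:max_candidates]]


contours = ["c0", "c1", "c2", "c3"]      # placeholders for real contours
scores = np.array([0.2, 0.9, 0.5, 0.7])  # one score per contour
kept = top_candidates(contours, scores, max_candidates=2)
print(kept)  # ['c1', 'c3']
```

Slicing after sorting guarantees that the candidates dropped by the limit are the lowest-scoring ones, rather than whichever ones the contour finder happened to return last.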

fg-mindee · 2021-01-19T13:53:25Z

doctr/models/detection/dbpostprocessor.py

+        for raw_batch in raw_pred:
+            p = tf.squeeze(raw_batch, axis=-1)  # remove last dim
+            bitmap = tf.cast(p > self.bin_thresh, tf.float32)
+
+            p = tf.unstack(p, axis=0)
+            bitmap = tf.unstack(bitmap, axis=0)
+
+            boxes_batch = []
+
+            for p_, bitmap_ in zip(p, bitmap):
+                p_ = p_.numpy()
+                bitmap_ = bitmap_.numpy()
+                boxes = self.bitmap_to_boxes(p_, bitmap_)
+                boxes_batch.append(np.array(boxes))
+
+            bounding_boxes.append(boxes_batch)


Same here, some comments would be nice 🙏
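
As a rough idea of what a commented version of this loop could look like, here is a sketch that mirrors the hunk with NumPy instead of TensorFlow so it runs on its own. The `(batch_size, H, W, 1)` input shape is inferred from the `squeeze`/`unstack` calls, and `binarize_batches` and the default threshold are hypothetical.

```python
import numpy as np


def binarize_batches(raw_pred, bin_thresh=0.3):
    """Yield one (probability map, binary map) pair per image.

    `raw_pred` is assumed to be a list of arrays of shape
    (batch_size, H, W, 1); `bin_thresh` is an assumed default.
    """
    for raw_batch in raw_pred:
        # Drop the trailing channel axis: (B, H, W, 1) -> (B, H, W)
        p = np.squeeze(raw_batch, axis=-1)
        # Pixels whose probability exceeds the threshold are marked as text
        bitmap = (p > bin_thresh).astype(np.float32)
        # Walk over the batch axis to process one image at a time
        for p_, bitmap_ in zip(p, bitmap):
            yield p_, bitmap_


raw_pred = [np.full((2, 4, 4, 1), 0.5), np.zeros((1, 4, 4, 1))]
pairs = list(binarize_batches(raw_pred))
print(len(pairs))  # 3 images in total across the two batches
```

In the PR, each pair would then be passed to `bitmap_to_boxes` and the resulting boxes collected per batch.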

fg-mindee · 2021-01-19T13:54:56Z

doctr/models/detection/dbpostprocessor.py

+        for raw_batch in raw_pred:
+            p = tf.squeeze(raw_batch, axis=-1)  # remove last dim
+            bitmap = tf.cast(p > self.bin_thresh, tf.float32)
+
+            p = tf.unstack(p, axis=0)
+            bitmap = tf.unstack(bitmap, axis=0)
+
+            boxes_batch = []
+
+            for p_, bitmap_ in zip(p, bitmap):
+                p_ = p_.numpy()
+                bitmap_ = bitmap_.numpy()
+                boxes = self.bitmap_to_boxes(p_, bitmap_)
+                boxes_batch.append(np.array(boxes))
+
+            bounding_boxes.append(boxes_batch)


Same here, some comments would be nice 🙏

fg-mindee · 2021-01-19T13:55:00Z

doctr/models/detection/dbpostprocessor.py

+        poly = Polygon(points)
+        distance = poly.area * self.unclip_ratio / poly.length
+        offset = pyclipper.PyclipperOffset()
+        offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
+        expanded_points = np.array(offset.Execute(distance))
+        x, y, w, h = cv2.boundingRect(expanded_points)


Missing some comments here
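
To make the offset computation in this hunk concrete, the sketch below reproduces only the `distance = area * unclip_ratio / length` step for a simple rectangle, using plain Python instead of Shapely and pyclipper. The helper name and the `unclip_ratio` value are assumptions chosen for the example.

```python
def polygon_area_and_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as (x, y) tuples."""
    twice_area, perimeter = 0.0, 0.0
    # Pair each vertex with the next one, wrapping around to close the polygon
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        twice_area += x0 * y1 - x1 * y0
        perimeter += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return abs(twice_area) / 2, perimeter


points = [(0, 0), (100, 0), (100, 20), (0, 20)]  # a 100 x 20 rectangle
area, length = polygon_area_and_perimeter(points)
unclip_ratio = 1.5  # assumed value, for illustration only
distance = area * unclip_ratio / length
print(area, length, distance)  # 2000.0 240.0 12.5
```

The polygon is then grown outward by `distance` (here 12.5 pixels) before its bounding rectangle is taken, which compensates for the shrunk text regions that DB-style models predict.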

fg-mindee · 2021-01-19T14:00:24Z

doctr/models/detection/dbpostprocessor.py

+        poly = Polygon(points)
+        distance = poly.area * self.unclip_ratio / poly.length
+        offset = pyclipper.PyclipperOffset()
+        offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
+        expanded_points = np.array(offset.Execute(distance))
+        x, y, w, h = cv2.boundingRect(expanded_points)


Missing some comments here

fg-mindee · 2021-01-19T14:00:30Z

doctr/models/detection/dbpostprocessor.py

+        for raw_batch in raw_pred:
+            p = tf.squeeze(raw_batch, axis=-1)  # remove last dim
+            bitmap = tf.cast(p > self.bin_thresh, tf.float32)
+
+            p = tf.unstack(p, axis=0)
+            bitmap = tf.unstack(bitmap, axis=0)
+
+            boxes_batch = []
+
+            for p_, bitmap_ in zip(p, bitmap):
+                p_ = p_.numpy()
+                bitmap_ = bitmap_.numpy()
+                boxes = self.bitmap_to_boxes(p_, bitmap_)
+                boxes_batch.append(np.array(boxes))
+
+            bounding_boxes.append(boxes_batch)


Same here, some comments would be nice 🙏

fg-mindee

Thanks for the PR! I added a few comments to check

codecov · 2021-01-20T11:00:17Z

Codecov Report

Merging #24 (dc1fd1d) into main (5e9df6e) will decrease coverage by 0.39%.
The diff coverage is 97.56%.

@@            Coverage Diff             @@
##             main      #24      +/-   ##
==========================================
- Coverage   98.72%   98.32%   -0.40%     
==========================================
  Files           9       12       +3     
  Lines         157      239      +82     
==========================================
+ Hits          155      235      +80     
- Misses          2        4       +2

Impacted Files	Coverage Δ
doctr/models/detection/postprocessor.py	`90.90% <90.90%> (ø)`
...tr/models/detection/differentiable_binarization.py	`98.52% <98.52%> (ø)`
doctr/models/__init__.py	`100.00% <100.00%> (ø)`
doctr/models/detection/__init__.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

fg-mindee

A few things to change and we're good to go!

fg-mindee · 2021-01-20T11:02:58Z

doctr/models/detection/dbpostprocessor.py

+
+
+class DBPostprocessor(Postprocessor):
+


You need to add the constructor args as well, like so: https://github.com/teamMindee/doctr/blob/main/doctr/documents/elements.py#L13-L18

fg-mindee · 2021-01-20T11:04:05Z

doctr/models/detection/dbpostprocessor.py


-        return cv2.mean(pred[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
+        return cv2.mean(pred[ymin:ymax + 1, xmin:xmax + 1])[0]


No need for cv2 here: NumPy ndarrays have a .mean() method that works just fine 👌
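
A minimal sketch of the NumPy-only version follows, on a made-up 4 x 4 probability map. Note that once `mask` is dropped, the mean covers the whole bounding rectangle; the second computation shows how a polygon mask could still be applied with boolean indexing if that were wanted. All values and the mask itself are illustrative assumptions.

```python
import numpy as np

# Made-up probability map with values 0/15, 1/15, ..., 15/15
pred = np.arange(16, dtype=np.float32).reshape(4, 4) / 15.0
ymin, ymax, xmin, xmax = 1, 2, 1, 2

# Mean over the bounding rectangle, as in the suggested change
crop = pred[ymin:ymax + 1, xmin:xmax + 1]
rect_score = float(crop.mean())

# Optional: restrict the mean to pixels inside a (hypothetical) polygon mask
mask = np.array([[1, 0], [1, 1]], dtype=bool)
masked_score = float(crop[mask].mean())

print(rect_score, masked_score)
```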

fg-mindee

Apart from the comments I made, you might want to add from .detection import * to models.__init__.py so that we can import detection models easily! But this is some nitpicking, the rest is fine!

fg-mindee · 2021-01-20T14:25:06Z

test/test_models.py

@@ -1,15 +1,21 @@
 import pytest
 from io import BytesIO
+
+from doctr import models


Let's keep "local deps" as last imports

fg-mindee · 2021-01-20T14:28:04Z

doctr/models/detection/postprocessor.py

+__all__ = ['Postprocessor']
+
+
+class Postprocessor:


Could you rename it "PostProcessor" please?

fg-mindee · 2021-01-20T14:28:26Z

test/test_models.py

+from doctr.models.detection.postprocessor import Postprocessor
+from doctr.models.detection.differentiable_binarization import DBPostprocessor


Since this module tests the "models" module, it's better to import it as before. With that import, you can then access the classes as models.PostProcessor and models.DBPostProcessor.

fg-mindee · 2021-01-20T14:28:52Z

doctr/models/detection/differentiable_binarization.py

+__all__ = ['DBPostprocessor']
+
+
+class DBPostprocessor(Postprocessor):


"DBPostProcessor" and "PostProcessor" ;)

fg-mindee · 2021-01-20T14:30:49Z

doctr/models/detection/__init__.py

+from . import postprocessor
+from . import dbpostprocessor


dbpostprocessor doesn't exist anymore.
I would suggest replacing all of this with:

from .postprocessor import *
from .differentiable_binarization import *

Since we added the __all__ values, the "*" will take care of this properly.

That way, we can import any object listed in __all__ with, for instance:
from doctr.models.detection import DBPostProcessor

fg-mindee · 2021-01-20T14:35:38Z

test/test_models.py

+@pytest.fixture(scope="module")
+def mock_db_output():


The fixture is not required here. We used fixtures elsewhere to pass the output of another test along (or to reuse an object multiple times).

Here, mock_db_output is used only once and is not a test. I would remove the function and simply assign the value to mock_db_output inside test_dbpostprocessor.

doctr/models/detection/differentiable_binarization.py

fg-mindee

Just a small adjustment left on the unittests 🙏

test/test_models.py

fg-mindee

Thanks a lot for the edit!

fg-mindee

Thanks a lot!

charlesmindee added 22 commits January 11, 2021 15:34

feat: ✨ pdf reader

5329fcf

feat: ✨ add doc_to_string function

d160e9d

Merge branch 'main' into detection_module

b3b69f4

feat ✨ add inference_utilities + inference for DBnet

60ee9e2

save: saving work before switching to doc reader

eb1c91a

feat: ✨ add model meta class

f48b84c

add: postprocessor

06512ee

feat ✨ preprocessor

a30d8f5

test: passed test

0cc5a06

Merge branch 'main' into detection_module

0f66142

test: passed all tests except unitest

85cd1c9

refacto: remove deprecated file

4485844

test: passed all tests

229acf7

test: passed all tests

8be1d9d

test: passed tests

45bed76

test: passed tests

40f128f

test: passed tests

e61db48

test: passed tests

d92067d

Merge branch 'main' into postprocess_detection

43a54b7

Merge branch 'main' into postprocess_detection

42d2c01

feat: ✨ add posprocessor

50e2ba0

feat: ✨ add DBPostprocessor + Postprocessor

89cb7b6

charlesmindee requested a review from fg-mindee January 18, 2021 18:17

charlesmindee added 3 commits January 18, 2021 19:32

add: dependencies

e05d534

merging

83cedf2

remove: utils init

2dc8797

fg-mindee assigned charlesmindee Jan 19, 2021

fg-mindee added the module: models Related to doctr.models label Jan 19, 2021

fg-mindee added this to the 0.1.0 milestone Jan 19, 2021

fg-mindee suggested changes Jan 19, 2021

fg-mindee changed the title ~~Postprocess detection~~ feat: Added Detection post processor Jan 19, 2021

doc: added docstrings sphinx

eb9c4ee

fg-mindee suggested changes Jan 20, 2021

charlesmindee added 4 commits January 20, 2021 14:58

refacto: ♻️ mean box_score, filename, docstring

1c27b68

merging

6dcd1ed

test: flake8 issue

26d9780

bug: change dependencies

cccf44e

fg-mindee suggested changes Jan 20, 2021

fg-mindee reviewed Jan 20, 2021

doctr/models/detection/differentiable_binarization.py (outdated)

fg-mindee reviewed Jan 20, 2021

doctr/models/detection/differentiable_binarization.py (outdated)

fg-mindee mentioned this pull request Jan 20, 2021

[models] Add detection module #3

Closed

4 tasks

charlesmindee added 2 commits January 20, 2021 16:27

refacto: renamed mdoule

e1eae55

refacto: deps

446e5f2

fg-mindee suggested changes Jan 20, 2021

test/test_models.py (outdated)

charlesmindee added 2 commits January 20, 2021 17:27

bug: test

bdcf8e9

bug: test_model

14677ad

fg-mindee previously approved these changes Jan 20, 2021

add: mupu.ini

1448cdb

charlesmindee dismissed fg-mindee’s stale review via 1448cdb January 20, 2021 16:45

changed batched size

dc1fd1d

fg-mindee approved these changes Jan 20, 2021

charlesmindee merged commit 1905d27 into main Jan 20, 2021

charlesmindee deleted the postprocess_detection branch January 20, 2021 17:07
