Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added Detection post processor #24

Merged
merged 36 commits into from
Jan 20, 2021
Merged

Conversation

charlesmindee
Copy link
Collaborator

Postprocessor (meta class) + DBPostprocessor module added

@fg-mindee fg-mindee added the module: models Related to doctr.models label Jan 19, 2021
@fg-mindee fg-mindee added this to the 0.1.0 milestone Jan 19, 2021
Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I added a few comments to check. Could you switch your docstring convention?

Also, the dbpostprocessor being specific to db net, I think we should place it in the same file as the model definition (so differentiable_binarization.py if you defined the model there). That will avoid having files with implicit naming

Comment on lines 14 to 18
"""
class to postprocess documents
a postprocessor takes the raw output from a model
a postprocessor return a list of tensor, each tensor N X 5
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you switch to Google style docstring like so : https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

doctr/models/detection/postprocessor.py Show resolved Hide resolved


class DBPostprocessor(Postprocessor):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring missing here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add the constructor args as well like so https://github.com/teamMindee/doctr/blob/main/doctr/documents/elements.py#L13-L18

Comment on lines 38 to 61
def box_score(
self,
pred: np.ndarray,
_box: np.ndarray
) -> float:
"""
Compute the confidence score for a box : mean between p_map values on the box
:param pred: p_map (output of the model)
:param _box: box
"""

h, w = pred.shape[:2]
box = _box.copy()
xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1)
xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1)
ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1)
ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1)

mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
box[:, 0] = box[:, 0] - xmin
box[:, 1] = box[:, 1] - ymin
cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)

return cv2.mean(pred[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we access the inside of the architecture to have the actual box objectness?
If we are to compute the objectness based on its geometry, in my opinion, we can remove this since it does not really bring value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I need to remove this function ? How do I compute the score of each box then ?

height, width = bitmap.shape[:2]
boxes = []
contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours[:self.max_candidates]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dangerous. I understand we can limit the number of candidates but they are not ordered. So if we limit the number of candidates, perhaps they should be ordered beforehand (by objectness for instance?)

Comment on lines 129 to 144
for raw_batch in raw_pred:
p = tf.squeeze(raw_batch, axis=-1) # remove last dim
bitmap = tf.cast(p > self.bin_thresh, tf.float32)

p = tf.unstack(p, axis=0)
bitmap = tf.unstack(bitmap, axis=0)

boxes_batch = []

for p_, bitmap_ in zip(p, bitmap):
p_ = p_.numpy()
bitmap_ = bitmap_.numpy()
boxes = self.bitmap_to_boxes(p_, bitmap_)
boxes_batch.append(np.array(boxes))

bounding_boxes.append(boxes_batch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, some comments would be nice 🙏

Comment on lines 129 to 144
for raw_batch in raw_pred:
p = tf.squeeze(raw_batch, axis=-1) # remove last dim
bitmap = tf.cast(p > self.bin_thresh, tf.float32)

p = tf.unstack(p, axis=0)
bitmap = tf.unstack(bitmap, axis=0)

boxes_batch = []

for p_, bitmap_ in zip(p, bitmap):
p_ = p_.numpy()
bitmap_ = bitmap_.numpy()
boxes = self.bitmap_to_boxes(p_, bitmap_)
boxes_batch.append(np.array(boxes))

bounding_boxes.append(boxes_batch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, some comments would be nice 🙏

Comment on lines 74 to 79
poly = Polygon(points)
distance = poly.area * self.unclip_ratio / poly.length
offset = pyclipper.PyclipperOffset()
offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded_points = np.array(offset.Execute(distance))
x, y, w, h = cv2.boundingRect(expanded_points)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing some comments here

Comment on lines 74 to 79
poly = Polygon(points)
distance = poly.area * self.unclip_ratio / poly.length
offset = pyclipper.PyclipperOffset()
offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded_points = np.array(offset.Execute(distance))
x, y, w, h = cv2.boundingRect(expanded_points)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing some comments here

Comment on lines 129 to 144
for raw_batch in raw_pred:
p = tf.squeeze(raw_batch, axis=-1) # remove last dim
bitmap = tf.cast(p > self.bin_thresh, tf.float32)

p = tf.unstack(p, axis=0)
bitmap = tf.unstack(bitmap, axis=0)

boxes_batch = []

for p_, bitmap_ in zip(p, bitmap):
p_ = p_.numpy()
bitmap_ = bitmap_.numpy()
boxes = self.bitmap_to_boxes(p_, bitmap_)
boxes_batch.append(np.array(boxes))

bounding_boxes.append(boxes_batch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, some comments would be nice 🙏

Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I added a few comments to check

@fg-mindee fg-mindee changed the title Postprocess detection feat: Added Detection post processor Jan 19, 2021
@codecov
Copy link

codecov bot commented Jan 20, 2021

Codecov Report

Merging #24 (dc1fd1d) into main (5e9df6e) will decrease coverage by 0.39%.
The diff coverage is 97.56%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #24      +/-   ##
==========================================
- Coverage   98.72%   98.32%   -0.40%     
==========================================
  Files           9       12       +3     
  Lines         157      239      +82     
==========================================
+ Hits          155      235      +80     
- Misses          2        4       +2     
Impacted Files Coverage Δ
doctr/models/detection/postprocessor.py 90.90% <90.90%> (ø)
...tr/models/detection/differentiable_binarization.py 98.52% <98.52%> (ø)
doctr/models/__init__.py 100.00% <100.00%> (ø)
doctr/models/detection/__init__.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e9df6e...dc1fd1d. Read the comment docs.

Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things to change and we're good to go!



class DBPostprocessor(Postprocessor):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add the constructor args as well like so https://github.com/teamMindee/doctr/blob/main/doctr/documents/elements.py#L13-L18


return cv2.mean(pred[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
return cv2.mean(pred[ymin:ymax + 1, xmin:xmax + 1])[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for cv2 here, numpy ndarray have a .mean() method that works just fine 👌

Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the comments I made, you might want to add from .detection import * to models.__init__.py so that we can import detection models easily! But this is some nitpicking, the rest is fine!

@@ -1,15 +1,21 @@
import pytest
from io import BytesIO

from doctr import models
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep "local deps" as last imports

Comment on lines 11 to 14
__all__ = ['Postprocessor']


class Postprocessor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename it "PostProcessor" please?

Comment on lines 17 to 18
from doctr.models.detection.postprocessor import Postprocessor
from doctr.models.detection.differentiable_binarization import DBPostprocessor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this module is to test the "models" module, it's better to import it as before. Using the imports, you can then access the classes with models.PostProcessor and models.DBPostProcessor

__all__ = ['DBPostprocessor']


class DBPostprocessor(Postprocessor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"DBPostProcessor" and "PostProcessor" ;)

Comment on lines 1 to 2
from . import postprocessor
from . import dbpostprocessor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dbpostprocessor doesn't exist anymore.
I would suggest replacing all of this with:

from .postprocessor import *
from .differentiable_binarization import *

Since we added the __all__ values, the "*" will take care of this properly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way we can import all objects in the __all__ by doing
from doctr.models.detection import DBPostProcessor for instance

Comment on lines 78 to 79
@pytest.fixture(scope="module")
def mock_db_output():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fixture is not required here. We did it elsewhere to use the output of another test (or to reuse an object multiple times).

Here the mock_db_output is used once, and is not a test. I would remove the function, and simply assign the value to mock_db_output in test_dbpostprocessor.

@fg-mindee fg-mindee mentioned this pull request Jan 20, 2021
4 tasks
Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small adjustment left on the unittests 🙏

test/test_models.py Outdated Show resolved Hide resolved
fg-mindee
fg-mindee previously approved these changes Jan 20, 2021
Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the edit!

Copy link
Contributor

@fg-mindee fg-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@charlesmindee charlesmindee merged commit 1905d27 into main Jan 20, 2021
@charlesmindee charlesmindee deleted the postprocess_detection branch January 20, 2021 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: models Related to doctr.models
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants