[Fix] unify recognition dataset parts return signature #1041

felixdittrich92 · 2022-09-02T12:59:53Z

This PR:

reverts a mistake in the recognition dataset parts: they return (img, label) again instead of (img, dict)
adds a target check in _read_sample
the library now has two cases for dataset targets:
  more than one annotation type: the target has to be a dict with "boxes" and "labels" keys
  only one annotation type: the target has to be a str (label) or an np.ndarray (boxes)

Hopefully this now matches everywhere and avoids mistakes in transformations.

Closes:
#935
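The two target conventions described above can be sketched as a small standalone check. This is a minimal sketch, not docTR's API: `check_target` is a hypothetical helper whose assertions mirror the ones this PR adds to `_read_sample`, and the box coordinates are illustrative.

```python
import numpy as np


def check_target(target):
    """Check a target against the two conventions (hypothetical helper).

    Mirrors the assertions added to _read_sample in this PR.
    """
    if isinstance(target, dict):
        # More than one annotation type: both keys are required
        assert "boxes" in target, "Target should contain 'boxes' key"
        assert "labels" in target, "Target should contain 'labels' key"
    else:
        # Only one annotation type: a word label (str) or the boxes (np.ndarray)
        assert isinstance(target, (str, np.ndarray)), "Target should be a string or a numpy array"
    return target


# Illustrative relative box (xmin, ymin, xmax, ymax)
boxes = np.array([[0.1, 0.2, 0.5, 0.3]], dtype=np.float32)

# Recognition part: (img, label), the target is a plain string
check_target("hello")
# Only boxes annotated: (img, boxes)
check_target(boxes)
# Several annotation types: (img, dict) with both keys
check_target({"boxes": boxes, "labels": ["hello"]})

# A dict missing one of the keys is rejected
try:
    check_target({"labels": ["hello"]})
except AssertionError as err:
    print(err)  # Target should contain 'boxes' key
```

Note that `assert` statements are skipped when Python runs with `-O`, so checks written this way act as a development-time guard rather than input validation.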

codecov · 2022-09-02T13:49:06Z

Codecov Report

Merging #1041 (e619807) into main (75aa42a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1041   +/-   ##
=======================================
  Coverage   94.93%   94.94%           
=======================================
  Files         135      135           
  Lines        5625     5633    +8     
=======================================
+ Hits         5340     5348    +8     
  Misses        285      285

Flag	Coverage Δ
unittests	`94.94% <100.00%> (+<0.01%)`	⬆️
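The figures in the Coverage Diff follow from coverage = hits / lines. A quick check (plain arithmetic on the numbers above, not Codecov's own code) shows why the flag reports `+<0.01%` even though the rounded totals differ by 0.01, and why the diff coverage is 100%: all 8 added lines are hits.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Figures from the Coverage Diff above
base = coverage_pct(5340, 5625)  # main (75aa42a)
head = coverage_pct(5348, 5633)  # #1041 (e619807)
delta = head - base

# Added lines and added hits: 8 of 8 new lines are covered
new_lines = 5633 - 5625
new_hits = 5348 - 5340

print(f"main:  {base:.2f}%")
print(f"#1041: {head:.2f}%")
print(f"delta: {delta:.4f} percentage points")
print(f"diff coverage: {coverage_pct(new_hits, new_lines):.2f}%")
```

This prints 94.93% and 94.94% for the two totals, a delta of 0.0072 percentage points, and 100.00% diff coverage.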

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
doctr/datasets/cord.py	`97.72% <100.00%> (ø)`
doctr/datasets/datasets/pytorch.py	`100.00% <100.00%> (ø)`
doctr/datasets/datasets/tensorflow.py	`100.00% <100.00%> (ø)`
doctr/datasets/funsd.py	`97.43% <100.00%> (ø)`
doctr/datasets/ic03.py	`97.43% <100.00%> (ø)`
doctr/datasets/ic13.py	`96.77% <100.00%> (ø)`
doctr/datasets/iiit5k.py	`96.96% <100.00%> (ø)`
doctr/datasets/imgur5k.py	`92.53% <100.00%> (ø)`
doctr/datasets/mjsynth.py	`95.83% <100.00%> (ø)`
doctr/datasets/sroie.py	`97.36% <100.00%> (ø)`
... and 4 more


charlesmindee

thanks, LGTM

frgfm

Thanks! Small comment though :)

frgfm · 2022-09-05T10:10:14Z

doctr/datasets/datasets/pytorch.py

+
+        # Check target
+        if isinstance(target, dict):
+            assert "boxes" in target, "Target should contain 'boxes' key"
+            assert "labels" in target, "Target should contain 'labels' key"
+        else:
+            assert isinstance(target, str) or isinstance(
+                target, np.ndarray
+            ), "Target should be a string or a numpy array"


Considering that the target is unchanged since the constructor was run, this check should be moved to the constructor (otherwise it increases data reading latency).

felixdittrich92: You mean checking it inside every dataset? Otherwise we only have access to the target inside _read_sample and __getitem__. And I don't think it changes the runtime noticeably.
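The reviewer's suggestion can be sketched with a toy dataset that validates every target once, at construction time, so that `__getitem__` does no extra work per sample. This is a minimal sketch under stated assumptions: `ToyDataset` and `validate_target` are hypothetical names, not docTR classes, and the data is held in memory instead of being read from disk. In the library each dataset builds its own list of samples, which is why the reply points out the check would otherwise have to be called from every dataset.

```python
import numpy as np


def validate_target(target):
    """Same checks as in _read_sample, factored into a helper (hypothetical)."""
    if isinstance(target, dict):
        assert "boxes" in target, "Target should contain 'boxes' key"
        assert "labels" in target, "Target should contain 'labels' key"
    else:
        assert isinstance(target, (str, np.ndarray)), "Target should be a string or a numpy array"


class ToyDataset:
    """Hypothetical dataset holding (image, target) pairs in memory."""

    def __init__(self, data):
        # Targets never change after construction, so validate them once here
        for _, target in data:
            validate_target(target)
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # No per-sample check: reading a sample only costs the lookup
        img, target = self.data[index]
        return img, target


# Recognition-style sample: (img, label)
ds = ToyDataset([(np.zeros((32, 128, 3), dtype=np.uint8), "hello")])
img, label = ds[0]
```

The trade-off is that a malformed target fails when the dataset is built rather than when the sample is first read.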

frgfm · 2022-09-05T10:10:25Z

doctr/datasets/datasets/tensorflow.py

+
+        # Check target
+        if isinstance(target, dict):
+            assert "boxes" in target, "Target should contain 'boxes' key"
+            assert "labels" in target, "Target should contain 'labels' key"
+        else:
+            assert isinstance(target, str) or isinstance(
+                target, np.ndarray
+            ), "Target should be a string or a numpy array"
+


unify targets

058de6f

felixdittrich92 self-assigned this Sep 2, 2022

felixdittrich92 linked an issue Sep 2, 2022 that may be closed by this pull request

[datasets] Filter currupted and wrong annotated files in ready to use datasets #935

Closed

23 tasks

felixdittrich92 added this to the 0.6.0 milestone Sep 2, 2022

felixdittrich92 added type: enhancement Improvement module: datasets Related to doctr.datasets framework: pytorch Related to PyTorch backend framework: tensorflow Related to TensorFlow backend labels Sep 2, 2022

felixdittrich92 requested review from aminemindee, charlesmindee, frgfm and odulcy-mindee September 2, 2022 13:01

felixdittrich92 added 3 commits September 2, 2022 15:26

update test

f6cde2f

fix mypy

0e7b169

mypy

e619807

charlesmindee approved these changes Sep 2, 2022


felixdittrich92 merged commit f9d3d78 into mindee:main Sep 2, 2022

felixdittrich92 deleted the unify-dataset-return branch September 2, 2022 15:39

frgfm reviewed Sep 5, 2022


felixdittrich92 mentioned this pull request Sep 5, 2022

update reco datasets and tests #954

Closed

felixdittrich92 mentioned this pull request Sep 26, 2022

Release tracker - v0.6.0 #791

Closed

85 tasks

felixdittrich92 commented Sep 2, 2022 • edited

codecov bot commented Sep 2, 2022 • edited

charlesmindee left a comment

frgfm left a comment

frgfm Sep 5, 2022

felixdittrich92 Sep 5, 2022

frgfm Sep 5, 2022
