feat: Split DocArtefacts into subsets and updated its class mapping #601

fg-mindee · 2021-11-09T17:46:21Z

This PR introduces the following modifications:

updates the URL of the zip archive, which now contains a train and a val subset
updates the label targets: rather than a list of strings, the dataset now returns a numpy.array of class indices
updates the docstring of the class
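
To illustrate the label change, here is a minimal sketch of mapping class-name strings to an array of class indices. The class names and the `encode_labels` helper are hypothetical placeholders, and reserving index 0 for a background class is an assumption; the actual mapping is the one defined in doctr/datasets/doc_artefacts.py.

```python
import numpy as np

# Hypothetical class names, for illustration only (not the dataset's real ones).
# Assumption: index 0 is reserved for a background class.
CLASSES = ["background", "class_a", "class_b", "class_c", "class_d"]
CLASS_TO_IDX = {name: idx for idx, name in enumerate(CLASSES)}


def encode_labels(names):
    """Map a list of class-name strings to an integer array of class indices."""
    return np.array([CLASS_TO_IDX[name] for name in names], dtype=np.int64)


labels = encode_labels(["class_a", "class_b", "class_a"])
print(labels)
```

Returning integer indices rather than strings lets the targets be fed directly to a detection loss without a separate encoding step.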

Any feedback is welcome!

charlesmindee

Thanks!

SiddhantBahuguna

Should we add the collate function here?
Also, we need to obtain absolute box coordinates. Right now we use relative ones.

fg-mindee · 2021-11-10T13:11:23Z

> Should we add the collate function here?

Oh yeah, you're right, I'll change this

> Also, we need to obtain absolute box coordinates. Right now we use relative ones.

That's OK: most of our datasets use relative coordinates, and we convert the boxes when they don't.
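
For reference, converting between the two conventions is a simple rescaling. The sketch below assumes boxes are stored as (xmin, ymin, xmax, ymax) and that the image shape is given as (height, width); `rel_to_abs` is a hypothetical helper, not a docTR function.

```python
import numpy as np


def rel_to_abs(boxes, img_shape):
    """Convert relative (xmin, ymin, xmax, ymax) boxes to absolute pixel coordinates."""
    height, width = img_shape
    # x coordinates scale with the width, y coordinates with the height
    scale = np.array([width, height, width, height], dtype=np.float32)
    return np.round(boxes * scale).astype(np.int32)


rel_boxes = np.array([[0.25, 0.5, 0.75, 1.0]], dtype=np.float32)
abs_boxes = rel_to_abs(rel_boxes, (1024, 800))
print(abs_boxes)
```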

fg-mindee · 2021-11-12T15:42:25Z

About the collate function: I just checked, and it's already working @SiddhantBahuguna:

from doctr.datasets import DocArtefacts
from doctr.datasets import DataLoader

ds = DocArtefacts(train=True, download=True)
train_loader = DataLoader(ds, batch_size=2)
train_iter = iter(train_loader)
x, targets = next(train_iter)

And the results look satisfactory to me:

# Check shape of input
print(x.shape)

TensorShape([2, 1024, 800, 3])

print(len(targets))
print(targets[0])

2
{'boxes': array([[0.94625   , 0.39746094, 0.99375   , 0.4345703 ],
       [0.38375   , 0.5957031 , 0.4275    , 0.6269531 ],
       [0.41875   , 0.28027344, 0.52875   , 0.36621094],
       [0.39625   , 0.36816406, 0.6675    , 0.57910156],
       [0.26625   , 0.5332031 , 0.2975    , 0.5576172 ],
       [0.20125   , 0.1953125 , 0.2575    , 0.23925781],
       [0.105     , 0.00976562, 0.17125   , 0.06152344],
       [0.15875   , 0.078125  , 0.2425    , 0.14355469],
       [0.10125   , 0.5292969 , 0.25875   , 0.6796875 ],
       [0.73625   , 0.7138672 , 0.85375   , 0.8261719 ]], dtype=float32), 'labels': array([1, 2, 1, 1, 1, 1, 3, 3, 4, 4])}
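
For context on why a collate step matters for detection data: each image has a different number of boxes, so targets cannot be stacked into a single array. Below is a minimal NumPy sketch of the idea, not docTR's actual implementation; `collate_fn` and `make_sample` are hypothetical names used only for illustration.

```python
import numpy as np


def collate_fn(samples):
    """Stack images into one batch array and keep the per-image targets in a list."""
    images, targets = zip(*samples)
    return np.stack(images, axis=0), list(targets)


def make_sample(num_boxes):
    """Build a dummy (image, target) pair with the given number of boxes."""
    img = np.zeros((1024, 800, 3), dtype=np.float32)
    target = {
        "boxes": np.zeros((num_boxes, 4), dtype=np.float32),
        "labels": np.zeros(num_boxes, dtype=np.int64),
    }
    return img, target


# Two samples with different box counts still collate into one batch
x, targets = collate_fn([make_sample(10), make_sample(3)])
print(x.shape, len(targets))
```

Keeping the targets as a list of dicts matches the structure shown in the output above, where `targets[0]` holds the boxes and labels of the first image only.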

codecov · 2021-11-12T16:33:08Z

Codecov Report

Merging #601 (bd74611) into main (f97e92b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head bd74611 differs from pull request most recent head 0c7c672. Consider uploading reports for the commit 0c7c672 to get more accurate results

@@           Coverage Diff           @@
##             main     #601   +/-   ##
=======================================
  Coverage   96.06%   96.06%           
=======================================
  Files         110      110           
  Lines        4265     4269    +4     
=======================================
+ Hits         4097     4101    +4     
  Misses        168      168

Flag	Coverage Δ
unittests	`96.06% <100.00%> (+<0.01%)`	⬆️

Impacted Files	Coverage Δ
doctr/datasets/doc_artefacts.py	`94.11% <100.00%> (+0.78%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

SiddhantBahuguna

The output matches our expectations. The collate problem is resolved now. Thanks :)

fg-mindee added 4 commits November 9, 2021 18:41

refactor: split train and val in DocArtefacts

13a41e4

feat: Added class mapping in DocArtefacts

b2ef68b

docs: Updated docstring

fb5c4dc

style: Fixed typing

bc031a5

fg-mindee added the "type: enhancement" (Improvement) and "module: datasets" (Related to doctr.datasets) labels Nov 9, 2021

fg-mindee added this to the 0.5.0 milestone Nov 9, 2021

fg-mindee requested a review from SiddhantBahuguna November 9, 2021 17:46

fg-mindee self-assigned this Nov 9, 2021

fg-mindee mentioned this pull request Nov 9, 2021

[models] Implement an Artefact object detector #223

Closed

charlesmindee previously approved these changes Nov 9, 2021

SiddhantBahuguna reviewed Nov 9, 2021

fg-mindee added 5 commits November 12, 2021 16:44

Merge branch 'main' into artefact-update

f189035

feat: Added back extra_repr

3af390e

test: Updated unittests

0ad1756

refactor: Refactored file naming

649bff2

fix: Fixed typo

bd74611

fg-mindee dismissed charlesmindee’s stale review via bd74611 November 12, 2021 15:52

fg-mindee requested a review from SiddhantBahuguna November 12, 2021 15:52

Merge branch 'main' into artefact-update

0c7c672

SiddhantBahuguna approved these changes Nov 12, 2021

fg-mindee merged commit 400aec0 into main Nov 12, 2021

fg-mindee deleted the artefact-update branch November 12, 2021 17:35
