feat: Adds GaussianBlur, random font for CharGenerator and improves training scripts #758

Merged
fg-mindee merged 8 commits into main from training-scripts
Dec 26, 2021

fg-mindee
Contributor

This PR introduces the following modifications:

  • adds an implementation of GaussianBlur for TensorFlow using TensorFlow Addons
  • adds the option to pick a random font from a list for CharGenerator
  • improves synthesize_txt_img to automatically resolve the font size and image size depending on the text
  • updates the character classification training scripts to add more augmentations
  • fixes the validation set for character classification: by default, without color inversion, every image had white text on a black background; I switched the split to 90% white background with dark text
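
For reviewers who want a concrete picture of the random font and background split changes, here is a minimal sketch using only the standard library. It is not the code from this PR: `pick_render_style`, `white_bg_prob`, and the font file names are hypothetical, and it assumes the remaining 10% of samples use the inverted scheme (white text on a black background).

```python
import random
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]

WHITE: Color = (255, 255, 255)
BLACK: Color = (0, 0, 0)


def pick_render_style(
    font_families: List[str],
    white_bg_prob: float = 0.9,
    rng: Optional[random.Random] = None,
) -> Tuple[str, Color, Color]:
    """Return (font, background_color, text_color) for one synthetic character.

    Hypothetical helper: the names and the default probability are
    illustrative and are not taken from the doctr code base.
    """
    rng = rng or random.Random()
    # Draw one font uniformly from the provided list
    font = rng.choice(font_families)
    # With probability `white_bg_prob`, render dark text on a white background
    if rng.random() < white_bg_prob:
        return font, WHITE, BLACK
    # Assumption: the remaining samples use white text on a black background
    return font, BLACK, WHITE


if __name__ == "__main__":
    # Hypothetical font file names, for illustration only
    fonts = ["FreeMono.ttf", "FreeSans.ttf"]
    print(pick_render_style(fonts, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps a validation set reproducible across runs, while the training set can use a fresh generator.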

This PR was used to retrain all PyTorch classification models successfully; retraining of the TensorFlow models is ongoing.

Any feedback is welcome!

@fg-mindee fg-mindee added the topic: documentation, ext: tests, module: datasets, ext: references, and module: transforms labels Dec 26, 2021
@fg-mindee fg-mindee added this to the 0.5.0 milestone Dec 26, 2021
@fg-mindee fg-mindee self-assigned this Dec 26, 2021
Collaborator

@charlesmindee charlesmindee left a comment

Thanks for the PR!

@codecov

codecov bot commented Dec 26, 2021

Codecov Report

Merging #758 (20bf498) into main (106a1e1) will decrease coverage by 0.22%.
The diff coverage is 78.26%.

@@            Coverage Diff             @@
##             main     #758      +/-   ##
==========================================
- Coverage   96.20%   95.98%   -0.23%     
==========================================
  Files         129      129              
  Lines        4799     4829      +30     
==========================================
+ Hits         4617     4635      +18     
- Misses        182      194      +12     
| Flag | Coverage Δ |
|---|---|
| unittests | 95.98% <78.26%> (-0.23%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| doctr/datasets/classification/pytorch.py | 100.00% <ø> (ø) |
| doctr/datasets/classification/tensorflow.py | 100.00% <ø> (ø) |
| doctr/datasets/classification/base.py | 85.41% <76.66%> (-8.34%) ⬇️ |
| doctr/transforms/modules/tensorflow.py | 92.17% <81.25%> (-1.89%) ⬇️ |
| doctr/models/_utils.py | 95.60% <0.00%> (-3.30%) ⬇️ |
| doctr/transforms/modules/base.py | 94.44% <0.00%> (-1.39%) ⬇️ |

@fg-mindee fg-mindee merged commit 56a5830 into main Dec 26, 2021
@fg-mindee fg-mindee deleted the training-scripts branch December 26, 2021 15:15
@fg-mindee fg-mindee added the type: new feature New feature label Dec 31, 2021