feat: Added Vietnamese entry in VOCAB #878

Merged
merged 2 commits into from
Apr 6, 2022


calibretaliation
Contributor

I added a Vietnamese entry to VOCABS for Vietnamese developers who, like me, want to use doctr for Vietnamese :)

@codecov

codecov bot commented Mar 31, 2022

Codecov Report

Merging #878 (9c8420a) into main (7f396ca) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 9c8420a differs from pull request most recent head dbfd583. Consider uploading reports for the commit dbfd583 to get more accurate results

@@            Coverage Diff             @@
##             main     #878      +/-   ##
==========================================
- Coverage   94.84%   94.82%   -0.02%     
==========================================
  Files         133      133              
  Lines        5200     5201       +1     
==========================================
  Hits         4932     4932              
- Misses        268      269       +1     
Flag        Coverage Δ
unittests   94.82% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                         Coverage Δ
doctr/datasets/vocabs.py               100.00% <100.00%> (ø)
doctr/transforms/functional/base.py     95.65% <0.00%> (-1.45%) ⬇️
doctr/transforms/modules/base.py        94.59% <0.00%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f396ca...dbfd583.

Collaborator

@charlesmindee charlesmindee left a comment

Thanks for the PR! Just one style check is not passing: you need to split line 34, which is too long, in two :)

@calibretaliation
Contributor Author

@charlesmindee Thanks for your comment, I have added a new commit with the style fix.
However, could you help me with how to use the library for Vietnamese OCR, please? Do I have to change something, or just apply the new vocabs?

@charlesmindee
Collaborator

Hi @calibretaliation, if you want to use the library for Vietnamese OCR, you need to apply your new vocab to a recognition model; you can keep the detection model as it is. However, you need to retrain the recognition model on your vocabulary with a labelled Vietnamese dataset of word crops. For your PR, it would indeed be nice to fix the indentation so that flake8 runs without raising any errors 🙏
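Retraining means every label in the word-crop dataset must be expressible with the new vocab. A minimal sketch, assuming a hypothetical `encode_label` helper and a shortened, illustrative character set (neither is doctr's actual API nor the real Vietnamese entry), shows the label-to-index step and why Unicode normalization matters for Vietnamese diacritics:

```python
import unicodedata
from typing import List

# Illustrative subset of Vietnamese characters; NOT the actual "vietnamese"
# entry of doctr's VOCABS. NFC-normalized so each accented letter is one code point.
VOCAB = unicodedata.normalize("NFC", "abcdeghiklmnopqrstuvxyăâđêôơưạảệớ")


def encode_label(label: str, vocab: str) -> List[int]:
    """Map a word-crop label to indices in `vocab` (hypothetical helper, not doctr's API)."""
    # Vietnamese diacritics may be stored precomposed (NFC) or as a base letter
    # followed by combining marks (NFD); normalize so labels match the vocab.
    label = unicodedata.normalize("NFC", label)
    missing = sorted({c for c in label if c not in vocab})
    if missing:
        raise ValueError(f"characters not in vocab: {missing}")
    return [vocab.index(c) for c in label]


# The same word given in NFC and in NFD form encodes to the same indices.
word = "vi\u1ec7t"  # "việt"
print(encode_label(word, VOCAB))
print(encode_label(unicodedata.normalize("NFD", word), VOCAB))
```

Raising on unknown characters, rather than silently dropping them, surfaces labels that the new vocab cannot represent before any training run starts.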

Collaborator

@charlesmindee charlesmindee left a comment

You need to align everything under the opening parenthesis; otherwise it is OK!
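The requested fix relies on Python's implicit concatenation of adjacent string literals. A minimal sketch, using a hypothetical, shortened stand-in for the dict in doctr/datasets/vocabs.py (the character strings are placeholders, not the merged entry), shows a long entry split over two lines with the continuation aligned under the opening parenthesis, the visual indent flake8 expects:

```python
# Hypothetical, shortened stand-in for the VOCABS dict in doctr/datasets/vocabs.py.
VOCABS = {}

# Adjacent string literals inside parentheses are joined at compile time, so a
# long entry can be split across lines without changing its value. The
# continuation line is aligned under the opening parenthesis.
VOCABS["vietnamese"] = ("aăâbcdđeêghiklmnoôơpqrstuưvxy"
                        "áàảãạéèẻẽẹ")

print(VOCABS["vietnamese"])
```

Because the split is purely lexical, the stored string is identical to the one-line version; only the source layout changes.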

Review thread on doctr/datasets/vocabs.py (outdated, resolved)
Collaborator

@charlesmindee charlesmindee left a comment

Thanks!

@charlesmindee charlesmindee merged commit 2c697ff into mindee:main Apr 6, 2022
felixdittrich92 pushed a commit to felixdittrich92/doctr that referenced this pull request Apr 7, 2022
* feat: Added Vietnamese entry in VOCAB - update style fix 2

* feat: Added Vietnamese entry in VOCAB - update style fix 3
@calibretaliation calibretaliation deleted the vietnamese-vocabs branch April 7, 2022 06:54
@frgfm
Collaborator

frgfm commented Apr 27, 2022

Missing PR labels here as well @charlesmindee :)

Also, perhaps we should add specific contribution guidelines for vocab additions? I remember that for Portuguese we had back-and-forth iterations, so perhaps we could ask contributors to add a reference for the character set in the PR or, better, as a comment in the code?
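Such a guideline could be backed by an automated check. A minimal sketch, assuming a hypothetical `check_vocab` function (not part of doctr) and a placeholder reference comment of the kind suggested above, flags duplicate characters and non-NFC input in a candidate vocab string:

```python
import unicodedata
from typing import List


def check_vocab(vocab: str) -> List[str]:
    """Return problems found in a candidate vocab string (hypothetical check, not part of doctr)."""
    problems = []
    # Each character should appear once, otherwise index lookups become ambiguous.
    duplicates = sorted({c for c in vocab if vocab.count(c) > 1})
    if duplicates:
        problems.append(f"duplicate characters: {duplicates}")
    # Accented letters should be single precomposed code points.
    if unicodedata.normalize("NFC", vocab) != vocab:
        problems.append("not NFC-normalized")
    if any(unicodedata.combining(c) for c in vocab):
        problems.append("contains standalone combining marks")
    return problems


# Reference comment of the kind suggested above (placeholder, filled in by the contributor):
# Source: <reference for the character set used>
CANDIDATE = "aăâbcdđeêghiklmnoôơpqrstuưvxy"  # Vietnamese base letters, no tone marks

print(check_vocab(CANDIDATE))
```

A check like this could run in CI, so reviewers only need to verify the cited reference rather than inspect the characters by hand.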

@felixdittrich92
Contributor

@charlesmindee @frgfm There are also missing additions to the documentation for the most recently added vocabs.

@frgfm frgfm added the labels "module: datasets" (Related to doctr.datasets) and "type: new feature" (New feature) May 2, 2022
@felixdittrich92 felixdittrich92 added this to the 0.6.0 milestone Jun 29, 2022
Labels: module: datasets (Related to doctr.datasets), type: new feature (New feature)
Projects: None yet
Linked issues: None yet
4 participants