How to tune the base model or when are improvements planned #85

Isaac-Leonard · 2024-06-03T02:23:22Z

I'm finding ocrs to be far better than Tesseract; however, I'm still getting lots of garbled results or words with missing letters.
I'm blind, and my phone's screen reader seems to detect text in photos extremely well, but I haven't been able to replicate that with any software I've found so far.
Is it currently possible to train the existing models further, or are improvements to the current model expected soon?

robertknight · 2024-06-03T06:26:33Z
Maintainer

There is still a lot that can be done to improve the models, mainly expanding, filtering and cleaning the datasets, and improving evaluation. In addition, there are preprocessing steps that will help, such as rectifying rotated words before attempting recognition.
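
The thread doesn't describe how Ocrs would rectify rotated words. As a generic illustration only (an assumption, not Ocrs or ocrs-models code), a word's orientation can be estimated from the coordinates of its foreground pixels using second-order moments (the principal axis), and the word can then be rotated by the opposite angle. The helper names below are hypothetical, and the sketch works on point lists with x-right, y-up axes; a real pipeline would apply the inverse rotation to the image crop itself.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of Ocrs or ocrs-models.
Point = Tuple[float, float]


def estimate_skew_degrees(points: Sequence[Point]) -> float:
    """Estimate a word's orientation from its foreground pixel coordinates.

    Returns the angle of the principal axis in degrees, in (-90, 90],
    measured counter-clockwise from the x axis (x right, y up).
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Second-order central moments of the point cloud.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # Principal-axis orientation: theta = 0.5 * atan2(2 * sxy, sxx - syy).
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


def rotate_points(points: Sequence[Point], degrees: float,
                  center: Point = (0.0, 0.0)) -> List[Point]:
    """Rotate points counter-clockwise by `degrees` around `center`."""
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = center
    return [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in points
    ]


# Usage: a synthetic 20x4-pixel "word", rotated 30 degrees.
rect = [(float(x), float(y)) for x in range(20) for y in range(4)]
word = rotate_points(rect, 30)
angle = estimate_skew_degrees(word)      # approximately 30.0
levelled = rotate_points(word, -angle)   # its estimated angle is now close to 0
```

Note that the principal axis only gives orientation modulo 180 degrees, so this approach cannot tell a level word from an upside-down one; that case would need a separate check.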

The models are trained in PyTorch and can be found in https://github.com/robertknight/ocrs-models. Model checkpoints are uploaded to Hugging Face. If you try to use the latest checkpoints with Ocrs, you may need to adjust the text detection threshold.
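
The thread doesn't say how the detection threshold is applied inside Ocrs. Assuming, as is typical for segmentation-based text detectors, that the detection model outputs a per-pixel probability that each pixel belongs to text, the threshold decides which pixels count as text before they are grouped into words. The hypothetical helper below (pure Python, not Ocrs's actual post-processing) illustrates why a new checkpoint may need a different threshold: if its probabilities are calibrated differently, the same threshold can merge, split or drop words.

```python
from collections import deque
from typing import List

# Hypothetical helper for illustration; not Ocrs's actual post-processing.


def count_text_regions(prob_map: List[List[float]], threshold: float) -> int:
    """Binarize a per-pixel text probability map and count 4-connected regions."""
    height = len(prob_map)
    width = len(prob_map[0]) if height else 0
    mask = [[p >= threshold for p in row] for row in prob_map]
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill a new region with breadth-first search.
            regions += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


# Two "words" separated by a column of weak (0.4) text probability.
prob_map = [
    [0.9, 0.9, 0.4, 0.9, 0.9],
    [0.9, 0.9, 0.4, 0.9, 0.9],
]
for t in (0.3, 0.5, 0.95):
    print(t, count_text_regions(prob_map, t))
```

With this map, a threshold of 0.3 merges the two words into one region, 0.5 keeps them separate, and 0.95 detects nothing.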

> I'm blind, and my phone's screen reader seems to detect text in photos extremely well, but I haven't been able to replicate that with any software I've found so far.

The iOS feature that detects text in photos is very cool, and I assume Android has something similar? It would be great to reach a similar level of accuracy with this project, but as you can see, there is still a lot of work to be done.

2 replies

igor-yusupov Jun 14, 2024

Apple's model is closed, but it seems possible to run it as an API: https://github.com/louisbrulenaudet/apple-ocr
I guess that requires a MacBook, though. Might it be worth using this API to generate a training dataset?

robertknight Jun 14, 2024
Maintainer

You definitely could use other OCR models to generate labels for a dataset. An important goal for Ocrs is that everything should be open source and liberally licensed (including for commercial use). If the training data is produced using labels generated by a closed model whose license doesn't explicitly allow its outputs to be used for this purpose, it gets into legally grey territory.

Users who are fine-tuning their own versions of the Ocrs models aren't constrained by this requirement, though; they can generate training data however they like.
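
As a rough sketch of the label-generation idea for someone fine-tuning their own models, results from another OCR engine can be filtered by confidence and written out as training records. The record schema, field names and the 0.9 cutoff below are assumptions for illustration; the actual dataset format used by ocrs-models isn't described in this thread, and the license of whichever engine produces the labels still applies.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical record schema: each result is a dict with "image", "text"
# and "confidence" keys, as produced by whatever OCR engine you label with.


def build_pseudo_labels(results: Iterable[Dict], min_confidence: float = 0.9) -> List[str]:
    """Convert another engine's OCR results into JSON Lines training records."""
    records = []
    for result in results:
        text = result["text"].strip()
        # Drop empty or low-confidence labels to reduce label noise.
        if not text or result["confidence"] < min_confidence:
            continue
        records.append(json.dumps({"image": result["image"], "text": text},
                                  ensure_ascii=False))
    return records


results = [
    {"image": "crop_001.png", "text": "Hello ", "confidence": 0.97},
    {"image": "crop_002.png", "text": "H3llo", "confidence": 0.41},
    {"image": "crop_003.png", "text": "   ", "confidence": 0.99},
]
for line in build_pseudo_labels(results):
    print(line)
```

Only the first result survives the default cutoff; the second is below it and the third has no text. Where to set the cutoff is a trade-off between dataset size and label quality.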
