About OCR_aligned and Lost or missing text #11

Open
USTCHJY opened this issue Jul 7, 2018 · 3 comments

USTCHJY commented Jul 7, 2018

Hi,
I'm working on OCR post-correction, and ochre has helped me a lot. I still have a couple of questions, though, and look forward to your reply.
When using ochre for OCR post-correction, we only have the OCR_input. How can I get OCR_aligned from OCR_input without a gold standard (gs)? And if that isn't possible, how should lost or missing text be handled without aligned text?
Thanks!

Collaborator

jvdzwaan commented Jul 9, 2018

The task ochre performs is a supervised machine learning task. So, without a gold standard, you can't create aligned data or train a (supervised) model.
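
To illustrate why aligned data needs both texts, here is a minimal sketch of character-level alignment using Python's standard `difflib`. It is not ochre's own alignment code; the function name `align` and the gap character `@` are assumptions made for this example. The point is only that the gaps in the aligned OCR text are positioned by comparing against the gold standard, so they cannot be derived from the OCR text alone.

```python
from difflib import SequenceMatcher


def align(ocr, gs, gap="@"):
    """Pad `ocr` and `gs` with `gap` so both have the same length.

    Hypothetical helper for illustration; assumes `gap` does not
    occur in either input text.
    """
    ocr_out, gs_out = [], []
    matcher = SequenceMatcher(None, ocr, gs, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ocr_seg, gs_seg = ocr[i1:i2], gs[j1:j2]
        # Pad the shorter segment so both sides stay in step.
        width = max(len(ocr_seg), len(gs_seg))
        ocr_out.append(ocr_seg.ljust(width, gap))
        gs_out.append(gs_seg.ljust(width, gap))
    return "".join(ocr_out), "".join(gs_out)


ocr_aligned, gs_aligned = align("Tbe cat", "The cat sat")
print(ocr_aligned)
print(gs_aligned)
```

Text that is missing from the OCR output shows up as gap characters in the aligned OCR string, which is only possible because the gold standard says what should have been there.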

Author

USTCHJY commented Jul 9, 2018

Sorry, maybe I didn't express myself clearly.
I mean: after supervised training (for the training data we must have a gold standard), how can I use the trained ochre model for actual OCR post-correction tasks? In practice we usually don't have a gold standard, and we want corrected text that is similar to what the gold standard would be. In that case, how can I get OCR_aligned from the raw OCR_input?
Thanks!

@jvdzwaan
Collaborator

The README describes how to use a trained model for post-correction: https://github.com/KBNLresearch/ochre#ocr-post-correction

If you want to calculate performance for this text, you'd still need to have ground truth/gold standard.
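
As a rough illustration of that last point, the sketch below computes a character error rate from the edit distance between a text and its gold standard. This is a generic metric written for this example, not ochre's evaluation code; the function names `edit_distance` and `character_error_rate` are assumptions. Note that the gold standard is a required argument, which is why performance cannot be measured without it.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(text, gold):
    """Edit distance to the gold standard, divided by its length."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    return edit_distance(text, gold) / len(gold)


# Compare the rate before and after correction to see the improvement.
print(character_error_rate("Tbe cat", "The cat"))
```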
