Skip when there is no file matching the pageId #34

Closed
mikegerber opened this issue Oct 16, 2020 · 6 comments

@mikegerber
Member

mikegerber commented Oct 16, 2020

ocrd-dinglehopper should issue a warning and skip a page if there is no matching GT or OCR file for a page.

Reported by @mnoelte in Gitter:
https://gitter.im/OCR-D/Lobby?at=5f76f0750dbbcf3dfa50648f

@mikegerber mikegerber added the bug Something isn't working label Oct 16, 2020
@mikegerber mikegerber self-assigned this Oct 16, 2020
@bertsky
Contributor

bertsky commented Oct 16, 2020

See here for a recipe. You can omit the fallback search via matching imageFilename for efficiency.
Then use something like this instead of the typical loop around self.input_files...
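
A minimal sketch of the per-page approach, not the linked recipe itself: iterate over page IDs, look up the GT and OCR file for each page, and warn and skip when either is missing. `FakeMets`, `pair_pages`, and the logger name are hypothetical stand-ins for illustration; only the idea of a list-returning `find_all_files` lookup by `fileGrp` and `pageId` comes from this thread.

```python
import logging

log = logging.getLogger("dinglehopper_sketch")  # hypothetical logger name


class FakeMets:
    """Hypothetical stand-in for a METS file index; not the OCR-D class."""

    def __init__(self, files):
        # files: list of (fileGrp, pageId, path) tuples
        self._files = files

    def find_all_files(self, fileGrp, pageId):
        # Returns a list (possibly empty), so indexing and truth tests are safe.
        return [f for f in self._files if f[0] == fileGrp and f[1] == pageId]


def pair_pages(mets, page_ids, gt_grp, ocr_grp):
    """Return (pageId, gt_file, ocr_file) triples, skipping incomplete pages."""
    pairs = []
    for page_id in page_ids:
        gt_files = mets.find_all_files(fileGrp=gt_grp, pageId=page_id)
        ocr_files = mets.find_all_files(fileGrp=ocr_grp, pageId=page_id)
        if not gt_files or not ocr_files:
            # Warn and move on instead of failing on the missing file.
            log.warning("No matching GT or OCR file for page %s, skipping", page_id)
            continue
        pairs.append((page_id, gt_files[0], ocr_files[0]))
    return pairs
```

In a real processor the lookup would go through the workspace's METS object rather than `FakeMets`; the skip logic stays the same.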

@mikegerber
Member Author

Side note: files[0] in that code might fail now that find_files() returns an iterator.

@bertsky
Contributor

bertsky commented Oct 16, 2020

> Side note: files[0] in that code might fail now that find_files() returns an iterator.

Hell yes, that broke all our multi-input-fileGrp processors!

Must replace with find_all_files ASAP
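
To illustrate why `files[0]` breaks, here is a small sketch using plain Python stand-ins. The two functions below are hypothetical and only mimic the return types discussed here (a lazy iterator versus a list); they are not the OCR-D implementations.

```python
def find_files(files, file_grp):
    """Stand-in that yields matches lazily, like an iterator-returning search."""
    return (f for f in files if f.startswith(file_grp + "/"))


def find_all_files(files, file_grp):
    """Stand-in that materializes the matches into a list."""
    return list(find_files(files, file_grp))


def first_match_old(files, file_grp):
    # Indexing a generator raises TypeError: it is not subscriptable.
    return find_files(files, file_grp)[0]


def first_match(files, file_grp):
    # A list can be indexed, and an empty list can be tested before indexing.
    matches = find_all_files(files, file_grp)
    return matches[0] if matches else None
```

With a list, the empty case can also be handled explicitly, which is what skipping a page without a matching file requires.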

@mikegerber
Member Author

> Must replace with find_all_files ASAP

Is that an API function? https://ocr-d.de/core/search.html?q=find_all_files returns nothing

@bertsky
Contributor

bertsky commented Oct 16, 2020

> Is that an API function? https://ocr-d.de/core/search.html?q=find_all_files returns nothing

It is. @kba I guess the apidoc must be regenerated?

@mikegerber ocrd_models.ocrd_mets.OcrdMets.find_all_files

@kba
Contributor

kba commented Oct 16, 2020

> It is. @kba I guess the apidoc must be regenerated?

yep 😊 on it.
