Text output files should use .txt file extension #11

Closed
stweil opened this issue Aug 15, 2020 · 3 comments

Comments

@stweil
Contributor

stweil commented Aug 15, 2020

Currently the conversion from PAGE XML to text ("page text") creates text files with the file extension .xml, which is unexpected and can cause problems with viewers that expect XML but get plain text.

@stweil
Contributor Author

stweil commented Sep 18, 2020

@kba, how should this issue be addressed? Do we require users to set the correct extension by using the ext parameter?

The ext parameter default is set in the JSON file. It is currently .xml, and I am afraid that automatically setting the right parameter value would not fit with this default value. What about changing the default to an empty string and using that as an indicator to set the file extension based on the MIME type? Do we have other examples of OCR-D processors which determine a parameter value at runtime if no explicit value was given?

@kba
Member

kba commented Sep 21, 2020

> @kba, how should this issue be addressed? Do we require users to set the correct extension by using the ext parameter?
>
> The ext parameter default is set in the JSON file. It is currently .xml, and I am afraid that automatically setting the right parameter value would not fit with this default value. What about changing the default to an empty string and using that as an indicator to set the file extension based on the MIME type?

That's a good idea.

> Do we have other examples of OCR-D processors which determine a parameter value at runtime if no explicit value was given?

Yes, e.g. gpu_id in ocrd-anybaseocr-dewarp, which defaults to -1, meaning that no GPU is used at all. We also have the MIME_TO_EXT mapping in ocrd_utils and in the ocrd bashlib constants, so this wouldn't be hard to implement.
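
For illustration only, here is a minimal sketch of the idea discussed in this thread: an empty ext value acts as a sentinel, and the extension is then looked up from the MIME type. The function name resolve_extension and the small dictionary are assumptions made for this example. The dictionary is a local stand-in for the MIME_TO_EXT mapping in ocrd_utils, whose actual contents may differ, and this is not necessarily how the fix was implemented.

```python
# Local stand-in for the MIME_TO_EXT mapping from ocrd_utils (assumption:
# the real mapping may hold different or additional entries).
MIME_TO_EXT = {
    "text/plain": ".txt",
    "application/vnd.prima.page+xml": ".xml",
}


def resolve_extension(ext: str, mimetype: str) -> str:
    """Return the explicit extension, or derive one from the MIME type.

    An empty string for `ext` means "no explicit value was given".
    """
    if ext:
        # The user set the parameter explicitly, so respect it.
        return ext
    try:
        return MIME_TO_EXT[mimetype]
    except KeyError:
        raise ValueError(f"no known file extension for MIME type {mimetype!r}")


if __name__ == "__main__":
    print(resolve_extension("", "text/plain"))       # derived from MIME type
    print(resolve_extension(".text", "text/plain"))  # explicit value wins
```

With an empty default, users who never set ext get an extension that matches the output MIME type, while an explicitly passed value is still used unchanged.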

@kba
Member

kba commented Sep 22, 2020

Fixed in v0.1.0.

kba closed this as completed Sep 22, 2020