correct MIME type for hOCR? #27

Open
jronallo opened this issue Sep 24, 2016 · 3 comments

Comments

@jronallo

I'm publishing some hOCR, but I'm uncertain which MIME type to give it. I'm using text/html, but that seems incomplete. Is there a standard way to convey that a file is hOCR?

@kba
Owner

kba commented Sep 26, 2016

There is no standard MIME type, since hOCR could be a subset of either HTML or XHTML (see #1, #2); the specs are not clear on this point.

I've used text/vnd.hocr+html for internal projects. To me this best conveys the nature of the document, but browsers won't display a page served with that type. Since most hOCR is XHTML and hence XML, a media type with a +xml suffix, e.g. text/[vnd.]hocr+xml, could make sense.
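To illustrate what the suffix buys a client, here is a minimal Python sketch (not part of any hOCR tooling) that splits a media type string into its top-level type, subtype, vendor-tree flag and structured-syntax suffix. Because none of these hOCR types is standardized, a client could only recognize them by string inspection along these lines; the function name and the shape of its return value are hypothetical.

```python
def parse_media_type(media_type: str) -> dict:
    """Split a media type such as 'text/vnd.hocr+html; charset=utf-8'."""
    # Drop parameters (e.g. '; charset=utf-8'); media types are case-insensitive
    essence = media_type.split(";", 1)[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    # A structured-syntax suffix is whatever follows the last '+' in the subtype
    _, plus, suffix = subtype.rpartition("+")
    return {
        "type": top_level,
        "subtype": subtype,
        # Subtypes in the vendor tree start with 'vnd.'
        "vendor_tree": subtype.startswith("vnd."),
        "suffix": suffix if plus else None,
    }


for candidate in ("text/html", "text/vnd.hocr+html", "text/vnd.hocr+xml; charset=utf-8"):
    print(candidate, "->", parse_media_type(candidate))
```

With this split, text/vnd.hocr+html reports an "html" suffix and text/vnd.hocr+xml reports "xml", so a generic client could fall back to an XML parser for the latter without knowing anything about hOCR itself.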

For most use cases, I don't see the need for a specific media type unless you have multiple HTML representations of the same document. Hence, text/html is probably the best solution: it degrades gracefully in a browser, and hOCR-compliant agents will need to parse the metadata for capabilities anyway.
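Such a metadata check could look roughly like the following minimal sketch, which uses only Python's standard html.parser. It assumes the document declares itself through the ocr-system and ocr-capabilities meta elements described in the hOCR spec; treating any document with a non-empty ocr-capabilities list as hOCR is a heuristic of this sketch, and the class and helper names are hypothetical.

```python
from html.parser import HTMLParser

# hOCR document-level metadata this sketch looks for
HOCR_META_NAMES = ("ocr-system", "ocr-capabilities")


class HocrMetaParser(HTMLParser):
    """Collect hOCR-related <meta> values from an HTML or XHTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; XHTML-style
        # self-closing <meta ... /> tags are routed here as well
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in HOCR_META_NAMES:
            self.meta[name] = attributes.get("content") or ""


def hocr_capabilities(document: str) -> set:
    """Return the whitespace-separated entries of ocr-capabilities, if any."""
    parser = HocrMetaParser()
    parser.feed(document)
    parser.close()
    return set(parser.meta.get("ocr-capabilities", "").split())


def looks_like_hocr(document: str) -> bool:
    # Heuristic: a declared, non-empty capability list marks the file as hOCR
    return bool(hocr_capabilities(document))
```

Because this inspects the document itself, it identifies hOCR whether the file was served as text/html or under a custom type, which is what lets text/html degrade gracefully here.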

I'm open to suggestions, particularly if you have a use case for a non-HTML media type.

@jronallo
Author

In my case this is for use within a JSON-LD manifest, where I want to link out to the hOCR and indicate to a client that this page image has an alternate resource specifically in hOCR format. Saying text/html isn't enough, because it doesn't tell the client what the resource is without looking for a .hocr extension on the filename, checking a label, or actually downloading and inspecting the document.

I think text/vnd.hocr+html is the best suggestion for my case, as I'm not concerned with how the web server delivers it or how the client displays it, only with signaling that the text over there is specifically hOCR. Thank you.
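For the manifest case, a link carrying an explicit format lets a client pick out the hOCR without fetching it. The sketch below assumes IIIF Presentation 2.x-style JSON-LD (a canvas whose seeAlso entries have @id, format and label); the thread doesn't name the manifest vocabulary, and the URLs and helper names are made up for illustration.

```python
import json

# Unregistered media type settled on in this thread
HOCR_MEDIA_TYPE = "text/vnd.hocr+html"


def canvas_with_hocr(canvas_id: str, hocr_url: str) -> dict:
    """Build a canvas that points at its hOCR alternate via seeAlso."""
    return {
        "@id": canvas_id,
        "@type": "sc:Canvas",
        "seeAlso": [
            {"@id": hocr_url, "format": HOCR_MEDIA_TYPE, "label": "hOCR"},
        ],
    }


def hocr_links(canvas: dict) -> list:
    """Return the URLs of seeAlso entries whose format marks them as hOCR."""
    see_also = canvas.get("seeAlso", [])
    if isinstance(see_also, dict):  # a single link may be given without a list
        see_also = [see_also]
    links = []
    for link in see_also:
        if not isinstance(link, dict):
            continue
        # Compare only the type/subtype part, ignoring parameters and case
        media_type = link.get("format", "").split(";", 1)[0].strip().lower()
        if media_type == HOCR_MEDIA_TYPE:
            links.append(link["@id"])
    return links


canvas = canvas_with_hocr(
    "https://example.org/iiif/book1/canvas/p1",
    "https://example.org/ocr/book1/p1.hocr",
)
print(json.dumps(canvas, indent=2))
print(hocr_links(canvas))
```

The client only reads the format string, so the server is free to deliver the file itself as text/html.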

@kba kba added this to the Version 2.0 milestone Sep 26, 2016
@kba
Owner

kba commented Sep 27, 2016

I'm reopening lest I forget it.

@kba kba reopened this Sep 27, 2016
@kba kba modified the milestones: Version 1.1, Version 2.0 Sep 27, 2016
2 participants