Description
Use case: a CSV file from a Dutch national authority is served with a Content-Type declaring UTF-8, but the actual body is UTF-16 LE (BOM FF FE).
Reproduction (based on eXist-db 6)
```xquery
declare namespace http = "http://expath.org/ns/http-client";

http:send-request(
  <http:request method="GET" href="https://publicaties.rvig.nl/media/13307/download">
    <http:header name="Accept" value="text/csv"/>
    <http:header name="Cache-Control" value="no-cache"/>
    <http:header name="Max-Forwards" value="1"/>
  </http:request>
)[2] (: [1] is the http:response element, [2] the body :)
```

The response comes back with a Content-Type header declaring UTF-8, but because the actual content is UTF-16, I get: "Failed to parse server's response: An invalid XML character (Unicode: 0x0) was found in the element content of the document."
I can override the server-provided Content-Type with override-media-type="text/csv; charset=utf-16", but this requires knowing the encoding beforehand. I have reported the mismatched Content-Type to the responsible party, but I doubt whether, or when, that will have any effect.
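A related, untested sketch: the same override-media-type attribute can ask for the raw bytes instead of text. It assumes, per the EXPath HTTP Client spec, that a non-text, non-XML media type such as application/octet-stream makes the body come back as xs:base64Binary. That would avoid the 0x0 parse error but leaves charset detection to the caller. The `result` element is purely illustrative.

```xquery
xquery version "3.1";

declare namespace http = "http://expath.org/ns/http-client";

(: Request the body as raw bytes. Assumption: per the EXPath HTTP Client
   spec, a binary media type yields the body as xs:base64Binary, so no
   charset decoding (and no 0x0 parse error) happens at this point. :)
let $response := http:send-request(
  <http:request method="GET"
                href="https://publicaties.rvig.nl/media/13307/download"
                override-media-type="application/octet-stream">
    <http:header name="Accept" value="text/csv"/>
  </http:request>
)
(: [1] is the http:response element, [2] the body :)
let $head := $response[1]
let $bytes := $response[2]
return
  <result status="{$head/@status}"
          content-type="{($head/http:header[lower-case(@name) = 'content-type']/@value)[1]}"
          binary="{$bytes instance of xs:base64Binary}"/>
```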
I would like to get to a place where I can always access the contents returned by send-request(), so that I can work out some fallback scheme.
Ideally:
- Always give me access to the body, as binary if all else fails, so that hard, uncatchable errors (e.g. about character 0x0) are avoided
- Process the body based on the BOM, if present, before relying on the Content-Type encoding
- Process the body based on the Content-Type encoding if no BOM is present
- Process the body as UTF-8 if neither a BOM nor a Content-Type encoding is present
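The detection order in the list above could be approximated in user code, assuming the raw body is already available as xs:base64Binary (for example, requested with a binary override-media-type). This is a minimal sketch: `local:detect-charset` and `local:decode` are hypothetical helpers, `util:binary-to-string` is eXist-specific, and the behaviour of its "UTF-16" charset (taking the byte order from the BOM, as Java's decoder does) is an assumption. A leading U+FEFF is stripped after decoding in case the BOM survives. UTF-32 BOMs are not handled.

```xquery
xquery version "3.1";

import module namespace util = "http://exist-db.org/xquery/util";

(: Hypothetical helper: choose a charset in the order proposed above:
   1. byte order mark, 2. charset parameter of Content-Type, 3. UTF-8.
   UTF-32 BOMs are not handled. :)
declare function local:detect-charset(
  $body as xs:base64Binary,
  $content-type as xs:string?
) as xs:string {
  (: canonical (uppercase) hex of the whole body, e.g. "FFFE4100..." :)
  let $hex := string(xs:hexBinary($body))
  let $declared :=
    if (matches($content-type, 'charset\s*=', 'i'))
    then replace($content-type, '^.*charset\s*=\s*"?([^";\s]+)"?.*$', '$1', 'i')
    else ()
  return
    if (starts-with($hex, 'EFBBBF')) then 'UTF-8'
    (: assuming a Java-style "UTF-16" decoder, which takes the byte order from the BOM :)
    else if (starts-with($hex, 'FFFE') or starts-with($hex, 'FEFF')) then 'UTF-16'
    else if (exists($declared)) then $declared
    else 'UTF-8'
};

(: Hypothetical helper, eXist-specific: decode with the detected charset,
   then drop a leading U+FEFF in case the decoder kept the BOM. :)
declare function local:decode(
  $body as xs:base64Binary,
  $content-type as xs:string?
) as xs:string? {
  let $text := util:binary-to-string($body, local:detect-charset($body, $content-type))
  return
    if (starts-with($text, '&#xFEFF;')) then substring($text, 2) else $text
};

(: Example: bytes FF FE 41 00 42 00 are "AB" in UTF-16LE with a BOM,
   labelled with the (wrong) UTF-8 charset from the server :)
local:decode(xs:base64Binary("//5BAEIA"), "text/csv; charset=utf-8")
```

Casting the whole body to hex just to inspect the first bytes is wasteful for large files; where the EXPath Binary module is available, reading only the first few bytes would be cheaper.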