
Handling Non-UTF-8 Encodings

Paul Crovella edited this page Mar 27, 2018 · 5 revisions

RFC 8259 specifies that:

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8.

While this allows for other encodings to be used within a closed ecosystem, this parser only supports UTF-8 and there are no plans to change that.

If you need to parse JSON that's in another encoding, you can use an iconv stream filter to transcode it to UTF-8 on the fly.

Either append the filter to an existing stream with stream_filter_append() and pass that stream to stream():

use pcrov\JsonReader\JsonReader;

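// Open the source file in binary mode.
$stream = fopen("UTF-32.json", "rb");
if ($stream === false) {
    throw new \RuntimeException("Unable to open UTF-32.json");
}

// Transcode from UTF-32 to UTF-8 as the stream is read.
stream_filter_append($stream, "convert.iconv.UTF-32/UTF-8");

$reader = new JsonReader();
$reader->stream($stream);

while ($reader->read()) { /* do stuff */ }

// Release the reader, then close the stream handle.
$reader->close();
fclose($stream);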

Or as part of the URI passed to open():

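$reader = new JsonReader();

// The filter name contains a slash, so URL-encode it to keep it from
// being read as a separator inside the php://filter URI.
$reader->open(
    "php://filter/read=" .
    urlencode("convert.iconv.UTF-32/UTF-8") .
    "/resource=UTF-32.json"
);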

Keep in mind that any output from the reader will be in UTF-8.
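
To check a filter string before wiring it into the reader, the transcoding step can be tried on its own. The following is a minimal sketch rather than part of the library: it assumes the iconv extension is available, uses a made-up sample document written to a temporary file, and picks UTF-16LE (an explicit byte order) so the result does not depend on how the local iconv treats byte order marks. It reads the file back through both approaches described above and compares the result with the original UTF-8 text.

```php
<?php
// Sample data (an assumption for this sketch): a small JSON document
// that is stored on disk as UTF-16LE.
$json = json_encode(['name' => "caf\u{00E9}", 'n' => 1], JSON_UNESCAPED_UNICODE);
$path = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($path, iconv('UTF-8', 'UTF-16LE', $json));

$filter = 'convert.iconv.UTF-16LE/UTF-8';

// Variant 1: append the filter to an already-open stream.
$stream = fopen($path, 'rb');
if ($stream === false) {
    throw new RuntimeException("Unable to open $path");
}
if (stream_filter_append($stream, $filter, STREAM_FILTER_READ) === false) {
    throw new RuntimeException("Unable to attach filter $filter");
}
$viaAppend = stream_get_contents($stream);
fclose($stream);

// Variant 2: name the filter inside a php://filter URI.
$uri = 'php://filter/read=' . urlencode($filter) . '/resource=' . $path;
$viaUri = file_get_contents($uri);

unlink($path);

// Both comparisons should hold if the filter transcoded the data correctly.
var_dump($viaAppend === $json);
var_dump($viaUri === $json);
```

If either comparison fails, the filter string or the assumed source encoding is the first thing to re-check; the same filter string can then be reused unchanged with stream() or open().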