encoding/xml: Support for alternate encodings #8937

gopherbot opened this Issue Oct 15, 2014 · 4 comments


gopherbot commented Oct 15, 2014

by pico303:

In Go 1.3.3, the XML parser for Go is locked into UTF-8 encodings.  In
encoding/xml/xml.go (around line 576), there's the line:

    enc := procInstEncoding(string(data))
    if enc != "" && enc != "utf-8" && enc != "UTF-8" {

For documents with:

    <?xml version="1.0" encoding="ISO-8859-1"?>

you get this error message:

    Invalid body content: xml: encoding "ISO-8859-1" declared but Decoder.CharsetReader is nil
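For reference, the failure can be reproduced with a few lines. This is only a minimal sketch: the `Message` type, the `parseDefault` helper, and the sample document are hypothetical and do not come from the report. The "Invalid body content:" prefix in the message above appears to come from the reporter's application, not from encoding/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Message is a hypothetical payload type used only for this illustration.
type Message struct {
	Body string `xml:"body"`
}

// parseDefault unmarshals doc with the package defaults, that is, with no
// CharsetReader installed on the underlying decoder.
func parseDefault(doc []byte) (Message, error) {
	var m Message
	err := xml.Unmarshal(doc, &m)
	return m, err
}

func main() {
	doc := []byte(`<?xml version="1.0" encoding="ISO-8859-1"?><message><body>hi</body></message>`)
	// The non-UTF-8 declaration is expected to produce an error like the
	// one quoted above.
	_, err := parseDefault(doc)
	fmt.Println(err)
}
```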

You can override the reader to support alternative encodings, but this means pre-parsing
the XML []byte yourself to find the encoding, setting up the reader, and then parsing the XML.

Could the package be adapted somehow so you could provide alternate readers ahead of
time, based on the encoding value?  Something like this (pseudocode):

    func init() {
        // Proposed API; xml.AddCharsetReader does not exist in encoding/xml.
        xml.AddCharsetReader("iso-8859-1", ISO8859Reader)
    }

    func Parse(doc []byte) (SomeStruct, error) {
        var myobj SomeStruct
        if err := xml.Unmarshal(doc, &myobj); err != nil {
            return SomeStruct{}, err
        }
        return myobj, nil
    }

ianlancetaylor commented Oct 15, 2014

Comment 1:

Labels changed: added repo-main, release-none.


bradfitz commented Oct 16, 2014

Comment 2:

This hook already exists.
Use xml.Decoder, not xml.Unmarshal, and set Decoder.CharsetReader, as the error message says.

Labels changed: added performance.

Status changed to WorkingAsIntended.


gopherbot commented Oct 16, 2014

Comment 3 by pico303:

Except that to do that, you have to know the encoding ahead of time. Our servers get
messages in either UTF-8 or ISO-8859-1. So we basically have to parse the incoming
stream for the encoding parameter, load the correct reader, and unmarshal.  Feels clunky.

bradfitz commented Oct 16, 2014

Comment 4:

Look at the docs:

    // CharsetReader, if non-nil, defines a function to generate
    // charset-conversion readers, converting from the provided
    // non-UTF-8 charset into UTF-8. If CharsetReader is nil or
    // returns an error, parsing stops with an error. One of the
    // CharsetReader's result values must be non-nil.
    CharsetReader func(charset string, input io.Reader) (io.Reader, error)

Your hook is passed the charset. You don't need to parse it yourself.
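To make this concrete, here is a minimal sketch of a single hook that handles both UTF-8 and ISO-8859-1 input without inspecting the document first. The names `Message`, `latin1Reader`, `charsetReader`, and `parse` are hypothetical and exist only for this illustration. The conversion relies on every ISO-8859-1 byte mapping to the Unicode code point with the same value.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Message is a hypothetical payload type used only for this illustration.
type Message struct {
	Body string `xml:"body"`
}

// latin1Reader converts an ISO-8859-1 byte stream to UTF-8.
type latin1Reader struct {
	src     io.Reader
	pending []byte // converted UTF-8 bytes not yet handed to the caller
	err     error  // sticky error from src, reported once pending is drained
}

func (l *latin1Reader) Read(p []byte) (int, error) {
	if len(l.pending) == 0 {
		if l.err != nil {
			return 0, l.err
		}
		var raw [512]byte
		n, err := l.src.Read(raw[:])
		l.err = err
		for _, b := range raw[:n] {
			// rune(b) is the code point; string() encodes it as UTF-8.
			l.pending = append(l.pending, string(rune(b))...)
		}
		if n == 0 {
			return 0, err
		}
	}
	n := copy(p, l.pending)
	l.pending = l.pending[n:]
	return n, nil
}

// charsetReader has the signature Decoder.CharsetReader expects. The decoder
// calls it with the declared encoding when that encoding is not UTF-8.
func charsetReader(charset string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(charset) {
	case "iso-8859-1", "latin1":
		return &latin1Reader{src: input}, nil
	}
	return nil, fmt.Errorf("unsupported charset %q", charset)
}

// parse decodes doc with the hook installed.
func parse(doc []byte) (Message, error) {
	var m Message
	dec := xml.NewDecoder(bytes.NewReader(doc))
	dec.CharsetReader = charsetReader
	err := dec.Decode(&m)
	return m, err
}

func main() {
	latin1 := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><message><body>caf\xe9</body></message>")
	utf8Doc := []byte(`<?xml version="1.0" encoding="UTF-8"?><message><body>café</body></message>`)
	for _, doc := range [][]byte{latin1, utf8Doc} {
		m, err := parse(doc)
		fmt.Println(m.Body, err)
	}
}
```

If an external dependency is acceptable, `charset.NewReaderLabel` from golang.org/x/net/html/charset has the same signature and can likely be assigned to `dec.CharsetReader` directly in place of the hand-written converter, covering many more encodings.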

@golang golang locked and limited conversation to collaborators Jun 25, 2016

This issue was closed.
