Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/marshal: marshal is extremely inefficient #38

Open
SamWhited opened this issue Mar 30, 2020 · 0 comments
Open

internal/marshal: marshal is extremely inefficient #38

SamWhited opened this issue Mar 30, 2020 · 0 comments

Comments

@SamWhited
Copy link
Member

@SamWhited SamWhited commented Mar 30, 2020

The code from internal/marshal is incredibly inefficient due to limitations in encoding/xml because it currently has to create a buffer, encode XML, then decode the XML it just created from the buffer:

// TokenReader returns a reader for the XML encoding of v.
func TokenReader(v interface{}) (xml.TokenReader, error) {
var b bytes.Buffer
err := xml.NewEncoder(&b).Encode(v)
if err != nil {
return nil, err
}
return xml.NewDecoder(&b), nil
}

To fix this we should either:

  1. Look into fixing encoding/xml to also support token encoders (see previous experiments at https://golang.org/cl/127476 and https://golang.org/cl/127415)
  2. or re-implement encoding in terms of the mellium.im/xmlstream module and run it against the Go standard library tests to make sure it outputs the same thing

If we do option 2, it should be developed in internal/ and then moved to xmlstream once it has been verified and used more extensively. If we do option 1 a temporary workaround that uses the token encoder on versions of Go that support it and falls back to the current hack should be added and an issue created reminding us to remove the hack when all supported versions of Go contain the token writer APIs.

Other suggestions for how this could be improved are also welcome.

Remove the following line from the docs when fixed:

// BUG(ssw): This package is very inefficient, see https://mellium.im/issue/38.

@SamWhited SamWhited changed the title Marshal is extremely inefficient internal/marshal: marshal is extremely inefficient May 7, 2020
MelliumBot pushed a commit that referenced this issue Nov 23, 2020
Short circuit from marshal.TokenReader if the input is already a
TokenReader to avoid the overhead of re-encoding everything in this
niche special case.

See #38

Signed-off-by: Sam Whited <sam@samwhited.com>
MelliumBot pushed a commit that referenced this issue Dec 5, 2020
Previously if we wanted to encode an arbitrary type to a token writer we
were forced to encode it to a byte buffer then decode that buffer and
copy it to the writer. This was extremely inefficient. To help us fix
this for at least the types in this module, short circuit if the type is
any of the various interfaces that can be copied directly to the writer
and use their methods for doing so. This patch handles short circuiting
on xml.TokenReader, xmlstream.Marshaler, and xmlstream.WriterTo.

See #38
Fixes #73

Signed-off-by: Sam Whited <sam@samwhited.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant