New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/xml: accepts invalid XML with multiple colons #20396

Open
santhosh-tekuri opened this Issue May 17, 2017 · 5 comments

Comments

Projects
None yet
8 participants
@santhosh-tekuri
Contributor

santhosh-tekuri commented May 17, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8.1 darwin/amd64

What operating system and processor architecture are you using (go env)?

darwin/amd64

What did you do?

<a:te:st xmlns:a="abcd"/>

parsing above xml with golang standard lib, does not report any error
go program used to test: (https://play.golang.org/p/j_VigKalfk)

package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

func main() {
	decoder := xml.NewDecoder(strings.NewReader(`<a:te:st xmlns:a="abcd"/>`))
	for {
		t, err := decoder.Token()
		if err != io.EOF && err != nil {
			fmt.Println("invalid xml", err)
			break
		}
		if t == nil {
			break
		}
	}
}

with xmllint linux program, it reports invalid xml

$xmllint test.xml
test.xml:1: namespace error : Failed to parse QName 'a:te:'
<a:te:st xmlns:a="abcd"/>
     ^
<?xml version="1.0"?>
<a:te:st xmlns:a="abcd"/>

What did you expect to see?

should report error in parsing xml

What did you see instead?

parses without any error

@ianlancetaylor ianlancetaylor changed the title from element qname should not have more than one colon to encoding/xml: accepts invalid XML with multiple colons May 17, 2017

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone May 17, 2017

@adams-sarah

This comment has been minimized.

Contributor

adams-sarah commented May 23, 2017

So looking at this, it seems as though the rune-validation table for NameChar (second in https://golang.org/src/encoding/xml/xml.go#L1431) was perhaps generated from an older spec. Or at least, a different one than matches other xml validators.

Where, as the name indicates, a NameChar is a character which may be included in a Name.

Our table was generated from http://www.xml.com/axml/testaxml.htm, which states that a NameChar may be:

[4] 	NameChar	::=	Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender

(under 2.3, Common Syntactic Constructs)

However, there is nothing in this document about qualified names (names with a namespace). Note this quote (below), which states that the use of colons is experimental and not defined by this document.

[Definition:] A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. [...]

Note: The colon character within XML names is reserved for experimentation with name spaces. Its meaning is expected to be standardized at some future point, at which point those documents using the colon for experimental purposes may need to be updated. (There is no guarantee that any name-space mechanism adopted for XML will in fact use the colon as a name-space delimiter.) In practice, this means that authors should not use the colon in XML names except as part of name-space experiments, but that XML processors should accept the colon as a name character.

However, if you look at the w3 docs https://www.w3.org/TR/1998/NOTE-xml-names-0119#dt-qname , qualified names are better defined. This document specifies that there must be at most one colon per name, and that QualifiedName s use a colon as delimiter.

(https://www.w3.org/TR/1998/NOTE-xml-names-0119#wfc-NSDeclared)

To enable the proper use of qualified names, it is necessary to banish colons from all Names which are not qualified

Where a qualified name (in the w3 doc) is composed as:

[4] 	QName	::=	(NSPart ':')? LocalPart
[5] 	NSPart	::=	Name
[6] 	LocalPart	::=	Name

Another note from w3 docs:

There is at most one colon per name. This working draft does not support the use of multiple colons to designate hierarchical namespaces, multiple inheritance or other semantics. It is anticipated that multi-part names will be discussed as mechanics to support implementing these features in a future version of the XML specification.

QNames use a single colon as a separator. The use of a double colon was discussed as a way to improve compatibility with the CSS specification even though most people preferred the single colon from a subjective and aesthetic perspective. After further investigation it was not shown that double colons would actually simplify the development of CSS compatible XML namespace aware applications. The single colon QName syntax used in this working draft is intended to support the use of CSS stylesheets in namespace aware XML applications.

So I would vote that we re-generate the tables based on the w3 docs (or another, more complete set) such that colons are not permitted as NameChars (in our second table). There is no need for a code change.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@andybons andybons added the NeedsFix label Mar 16, 2018

@gopherbot

This comment has been minimized.

gopherbot commented Mar 30, 2018

Change https://golang.org/cl/103735 mentions this issue: encoding/xml: fix absence of detection of another : in qualified names

@iWdGo

This comment has been minimized.

Contributor

iWdGo commented Mar 30, 2018

A qualified name (https://www.w3.org/TR/xml-names/#NT-PrefixedName) is treated separately to return prefix and local part. The allowed : is left by char scanning and the occurence of another one was not verified. I submitted a related fix including a specific test.

@gopherbot

This comment has been minimized.

gopherbot commented Mar 31, 2018

Change https://golang.org/cl/103875 mentions this issue: encoding/xml: fix absence of detection of another : in qualified names

@gopherbot

This comment has been minimized.

gopherbot commented Apr 27, 2018

Change https://golang.org/cl/109855 mentions this issue: encoding/xml : Fixes to enforce XML namespace standard

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 13, 2018

@ianlancetaylor ianlancetaylor modified the milestones: Go1.12, Go1.13 Dec 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment