Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xml with dtd entities makes yq print nothing #1155

Closed
balki opened this issue Mar 25, 2022 · 7 comments
Closed

xml with dtd entities makes yq print nothing #1155

balki opened this issue Mar 25, 2022 · 7 comments
Labels

Comments

@balki
Copy link

balki commented Mar 25, 2022

Describe the bug
When the input xml has dtd entities, nothing is printed in output

Version of yq: 4.22.1
Operating system: linux
Installed via: snap

Input XML
Working foo.xml:

<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY writer "Blah.">
<!ENTITY copyright "Blah">
]>
<root>
    <item>Blah.Blah</item>
</root>

Not Working foo.xml:

<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY writer "Blah.">
<!ENTITY copyright "Blah">
]>
<root>
    <item>&writer;&copyright;</item>
</root>

Command
The command I ran:

cat foo.xml| yq -p xml .

Actual behavior

Expected behavior

root:
  item: Blah.Blah
@balki
Copy link
Author

balki commented Mar 26, 2022

encoding/xml package throws an error but it is silently ignored by yq here.

package main

import (
    "encoding/xml"
    "fmt"
    "strings"
)

var xmlstr = `
<?xml version="1.0"?>
<!DOCTYPE root [

<!ELEMENT root (item)>
<!ELEMENT item (#PCDATA)>

<!ENTITY writer "Blah.">
<!ENTITY copyright "Blah">
]>
<root>
    <item>Hello &writer;&copyright;</item>
</root>
`

func main() {
    xdec := xml.NewDecoder(strings.NewReader(xmlstr))
    // xdec.Strict = false // This fixes the problem

    for {
            _, err := xdec.Token()
            if err != nil {
                    fmt.Printf("got error! %v", err)
                    break
            }
    }
}

Output

❯ go run main.go
got error! XML syntax error on line 12: invalid character entity &writer;

Setting strict to false solves the issue. I think having a --xml-no-strict flag will be good

@mikefarah
Copy link
Owner

Thanks for looking into this! I'll put that in the next release

@mikefarah
Copy link
Owner

Hey just tried catching the error and setting strict to false - it ends up doing the exact same thing as now - prints nothing :(

I'll still fix the unhandled error bug - not sure on adding the Strict=false option though as it doesn't seem particularly useful - unless I'm mistaken?

@balki
Copy link
Author

balki commented Mar 27, 2022

It looks like returning the error breaks all xml as EOF is an error. Below change works for me.

❯ git diff
diff --git a/pkg/yqlib/decoder_xml.go b/pkg/yqlib/decoder_xml.go
index 43487b9..232e554 100644
--- a/pkg/yqlib/decoder_xml.go
+++ b/pkg/yqlib/decoder_xml.go
@@ -189,6 +189,7 @@ type element struct {
 // of the map keys.
 func (dec *xmlDecoder) decodeXML(root *xmlNode) error {
        xmlDec := xml.NewDecoder(dec.reader)
+       xmlDec.Strict = false
 
        // That will convert the charset if the provided XML is non-UTF-8
        xmlDec.CharsetReader = charset.NewReaderLabel
@@ -202,7 +203,7 @@ func (dec *xmlDecoder) decodeXML(root *xmlNode) error {
        for {
                t, e := xmlDec.Token()
                if e != nil {
-                       return e
+                       log.Debug("got xml error %v", e)
                }
                if t == nil {
                        break

Output:

❯ cat foo.xml | tee /dev/stderr | go run yq.go -p xml '.' 
<?xml version="1.0"?>
<!DOCTYPE root [

<!ELEMENT root (item)>
<!ELEMENT item (#PCDATA)>

<!ENTITY writer "Blah.">
<!ENTITY copyright "Blah">
]>
<root>
    <item>Hello &writer;&copyright;</item>
</root>
root:
  item: Hello &writer;&copyright;

@mikefarah
Copy link
Owner

Got it cheers 👍🏼 will have a fix in the next release

@mikefarah
Copy link
Owner

Fixed in 4.24.2 - note that strict is false by default, thanks for all your help!

@balki
Copy link
Author

balki commented Mar 28, 2022

Awesome! Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants