Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add xml declaration to force utf-8 #71

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

b1rdex
Copy link

@b1rdex b1rdex commented May 8, 2019

See #70

@codecov
Copy link

codecov bot commented May 8, 2019

Codecov Report

Merging #71 into master will decrease coverage by 0.65%.
The diff coverage is 80.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master      #71      +/-   ##
============================================
- Coverage     99.30%   98.65%   -0.66%     
- Complexity      125      130       +5     
============================================
  Files            15       15              
  Lines           288      297       +9     
============================================
+ Hits            286      293       +7     
- Misses            2        4       +2     
Impacted Files Coverage Δ Complexity Δ
src/DOMNodeComparator.php 93.10% <80.00%> (-6.90%) 15.00 <5.00> (+5.00) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 988baa0...f315e2d. Read the comment docs.

@sebastianbergmann
Copy link
Owner

@theseer Does this make sense to you?

@theseer
Copy link

theseer commented May 8, 2019

First off: I'm not sure whether C14N serializes the DOM-Subtree at hand using UTF-8 because it being the internal encoding within libXML or using the original encoding as specified for the containing document.

If it is indeed ignoring the original ecoding, the patch is technically correct but should be superfluous as UTF-8 is the default. It's generally considered nice to specify it anyhow, though.

If C14N* is not using UTF-8 but honors the original document encoding, $node->ownerDocument->actualEncoding should be used instead of a hardcoded UTF-8. There is, to my knowledge, no way to specify the wanted target-encoding with C14N*.

@b1rdex
Copy link
Author

b1rdex commented May 8, 2019 via email

@b1rdex
Copy link
Author

b1rdex commented May 13, 2019

OK, I changed it to $node->ownerDocument->encoding. I used encoding as documentation states that actualEncoding is deprecated and it's a readonly equivalent to encoding.

Here is my original test example with ->encoding, it's working same as utf-8: https://3v4l.org/YfMoq

@sebastianbergmann we can decouple XML canonicalisation out of comparator to write test on it. Should I do it?

@b1rdex
Copy link
Author

b1rdex commented May 13, 2019

I had to add Elvis to fix tests. Apparently, encoding isn't always present.

@eclipxe13
Copy link

@theseer according to https://www.w3.org/TR/xml-c14n11/ it says: "The document is encoded in UTF-8".

@b1rdex:

  • may I suggest using $document = new DOMDocument('1.0', $node->ownerDocument->encoding ?: 'UTF-8'); ?
  • $node->ownerDocument can be null and the comparator also accepts DOMNode, not only DOMDocument, maybe $node->ownerDocument->encoding ?? false ?: 'UTF-8' ?
  • can you provide some testing? Adding a case to DOMNodeComparatorTest.php#L59 could work.

@b1rdex
Copy link
Author

b1rdex commented Jul 10, 2020

  • may I suggest using $document = new DOMDocument('1.0', $node->ownerDocument->encoding ?: 'UTF-8');

That doesn't work. I tried

    /**
     * Tries to get the encoding from DOMNode or falls back to UTF-8
     */
    private function getNodeEncoding(DOMNode $node): string
    {
        $encoding = '';
        if ($node->ownerDocument) {
            $encoding = $node->ownerDocument->xmlEncoding ?: $node->ownerDocument->encoding;
        } elseif ($node instanceof DOMEntity) {
            $encoding = $node->encoding;
        }

        return $encoding ?: 'UTF-8';
    }

I even tried just setting it to UTF-8 directly. Nothing changes until I add a <?xml version="1.0" encoding header.

@eclipxe13 you can play with it yourself https://3v4l.org/FvCAs

I'll update the PR w/ the test shortly.

@sebastianbergmann
Copy link
Owner

Please avoid useless commit messages such as Update DOMNodeComparator.php.

@b1rdex
Copy link
Author

b1rdex commented Jul 10, 2020

Please avoid useless commit messages such as Update DOMNodeComparator.php.

@sebastianbergmann I've force pushed the squashed branch. Also, there is an option on GitHub to squash commits before merging.

@b1rdex b1rdex force-pushed the fix-utf8 branch 3 times, most recently from 5fb0da9 to f6f39fd Compare July 10, 2020 08:22
@eclipxe13
Copy link

@b1rdex I see your point. Nothing changes until I add a <?xml version="1.0" encoding="ENCODING" ?> header.

That is because loadXml() reset the document encoding, even when it was set on construction.

This code also works:

$document = new DOMDocument();
@$document->loadXml(node->C14N());

// at this point $document->xmlEncoding is null, it is also read-only and has the undesired encoding

$document->encoding = $this->getNodeEncoding($node);

// at this point you have the output of `saveXml()` as you expect

@b1rdex
Copy link
Author

b1rdex commented Jul 11, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants