DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM #11404
For the record, it's equally wrong in Python 3:

from xml.dom.minidom import parseString

dom1 = parseString('<?xml version="1.0" ?><with xmlns="some:ns" />')
dom2 = parseString('<?xml version="1.0" ?><none />')
dom1.documentElement.appendChild(
    dom1.importNode(dom2.documentElement, True)
)
print(dom1.toxml())

I'm not sure if |
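The effect of the missing declaration can be checked by parsing the serialized string again. The following is a minimal sketch that extends the snippet above using only the standard-library `xml.dom.minidom`; the names `imported`, `serialized`, and `reparsed` are illustrative and not from the original comment.

```python
from xml.dom.minidom import parseString

dom1 = parseString('<?xml version="1.0" ?><with xmlns="some:ns" />')
dom2 = parseString('<?xml version="1.0" ?><none />')

# Import the no-namespace element into the document that has a default namespace.
imported = dom1.importNode(dom2.documentElement, True)
dom1.documentElement.appendChild(imported)

# In memory, the imported element is still in no namespace.
print(imported.namespaceURI)

# Serialize, then parse the result back into a fresh DOM.
serialized = dom1.toxml()
reparsed = parseString(serialized)

# After the round trip, the child has inherited the parent's default namespace.
print(reparsed.documentElement.firstChild.namespaceURI)
```

With CPython's minidom this should print `None` followed by `some:ns`, which is the same mismatch the issue describes for PHP.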
Could be a libxml issue, however PHP does have a non-trivial amount of code in its implementation of importNode - including some work with node namespaces - so something there could be responsible as well... |
Nice idea, but

<?php declare(strict_types = 1);

$dom1 = new DOMDocument;
$dom1->loadXML('<?xml version="1.0" ?><with xmlns="some:ns" />');
$nodeA = $dom1->createElement('none');
$nodeB = $dom1->createElementNS(null, 'none');
$dom1->documentElement->appendChild($nodeA);
$dom1->documentElement->appendChild($nodeB);
echo $dom1->saveXML();

<?xml version="1.0"?>
<with xmlns="some:ns"><none/><none/></with>

While it's of course questionable to use the DOM Level 1 API for creating

var_dump($nodeA->namespaceURI, $nodeB->namespaceURI);
A potential workaround when in control of creating the nodes: instead of specifying NULL as the namespace, one could use |
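For comparison, the same two creation paths (DOM Level 1 `createElement` and DOM Level 2 `createElementNS` with a null namespace) can be tried in Python's `xml.dom.minidom`. This is only an analogue of the PHP snippet above, not the PHP code path, and the variable names are illustrative.

```python
from xml.dom.minidom import parseString

dom = parseString('<?xml version="1.0" ?><with xmlns="some:ns" />')

node_a = dom.createElement('none')          # DOM Level 1 style, no namespace argument
node_b = dom.createElementNS(None, 'none')  # DOM Level 2 style, explicit null namespace
dom.documentElement.appendChild(node_a)
dom.documentElement.appendChild(node_b)

# Both nodes report no namespace in memory.
print(node_a.namespaceURI, node_b.namespaceURI)

# Neither node gets an xmlns="" declaration when serialized.
output = dom.toxml()
print(output)
```

Both elements are serialized as a bare `<none/>`, so in minidom the choice of creation API makes no difference to the output either.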
Going to assign @nielsdos to this, as they have the best understanding currently of the DOM extension. |
Here's what I found. Some more issues:

These two bullet points are bugs in PHP and can fortunately easily be resolved.

About libxml2's behavior. This is further confirmed by this C program:

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

// Return the first element node at or after `node`, or NULL if there is none
static xmlNodePtr firstElementNode(xmlNodePtr node) {
    while (node != NULL && node->type != XML_ELEMENT_NODE) node = node->next;
    return node;
}

int main(void) {
    // Parse document
    const char* xmlString = "<?xml version=\"1.0\" ?><with xmlns=\"some:ns\" />";
    xmlDocPtr doc = xmlReadMemory(xmlString, (int)strlen(xmlString), NULL, NULL, 0);

    // Add some children
    xmlNodePtr rootNode = xmlDocGetRootElement(doc);
    xmlNodePtr child1 = xmlNewDocNode(doc, NULL, (const xmlChar *)"child1", NULL);
    xmlAddChild(rootNode, child1);
    xmlNodePtr child2 = xmlNewDocNode(doc, NULL, (const xmlChar *)"child2", NULL);
    xmlAddChild(rootNode, child2);

    // Print out doc's string representation
    xmlChar* serializedXmlString;
    xmlDocDumpFormatMemory(doc, &serializedXmlString, NULL, 1);
    printf("%s\n", serializedXmlString);

    // Both ->ns pointers are NULL
    printf("child1=%p, child1->ns=%p\n", (void *)child1, (void *)child1->ns);
    printf("child2=%p, child2->ns=%p\n", (void *)child2, (void *)child2->ns);

    // Read back the serialized document into newDoc, and print out its string representation
    xmlDocPtr newDoc = xmlReadMemory((const char*)serializedXmlString, (int)strlen((const char*)serializedXmlString), NULL, NULL, 0);
    xmlFree(serializedXmlString);
    xmlDocDumpFormatMemory(newDoc, &serializedXmlString, NULL, 1);
    printf("%s\n", serializedXmlString);
    xmlFree(serializedXmlString);

    // Both ->ns pointers are no longer NULL, they point to the default namespace
    child1 = firstElementNode(xmlDocGetRootElement(newDoc)->children);
    printf("child1=%p, child1->ns=%p, ns href=%s\n", (void *)child1, (void *)child1->ns, child1->ns->href);
    child2 = firstElementNode(child1->next);
    printf("child2=%p, child2->ns=%p, ns href=%s\n", (void *)child2, (void *)child2->ns, child2->ns->href);

    // Cleanup
    xmlFreeDoc(newDoc);
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}

I'll create an issue report on the libxml2 project with some questions. Either this is all expected behaviour and we'll have to tweak something on PHP's side, or this is something wrong in libxml2.

EDIT: the libxml2 maintainer replied and said this behaviour is by design and the library user should perform the necessary checks. |
…aration for null namespace, creating incorrect xml representation of the DOM

The NULL namespace is only correct when there is no default namespace override. When there is, we need to manually set it to the empty string namespace.
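The rule in this commit message (a null namespace needs an explicit `xmlns=""` whenever a default namespace is in scope) can be sketched as a pass that runs before serialization. The following is an illustration in Python's `xml.dom.minidom`, not the code of the PHP fix. `declare_null_namespaces` is a hypothetical helper, and it assumes default namespaces are only ever declared through a literal `xmlns` attribute.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def declare_null_namespaces(element, default_ns=None):
    """Add xmlns="" to no-namespace elements that sit under a default namespace.

    Hypothetical helper for illustration; it only inspects the literal
    "xmlns" attribute and ignores prefixed declarations.
    """
    if element.hasAttribute("xmlns"):
        # The element declares (or resets) the default namespace itself.
        default_ns = element.getAttribute("xmlns") or None
    elif element.namespaceURI is None and default_ns is not None:
        # Null namespace under a non-empty default: undeclare it explicitly.
        element.setAttribute("xmlns", "")
        default_ns = None
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            declare_null_namespaces(child, default_ns)


dom = parseString('<?xml version="1.0" ?><with xmlns="some:ns" />')
dom.documentElement.appendChild(dom.createElementNS(None, "none"))

declare_null_namespaces(dom.documentElement)
fixed = dom.toxml()
print(fixed)

# After the fix-up, a round trip keeps the child in no namespace.
roundtrip_ns = parseString(fixed).documentElement.firstChild.namespaceURI
print(roundtrip_ns)
```

Elements that have no default namespace in scope are left untouched, which matches the first sentence of the commit message: NULL stays correct when nothing overrides the default namespace.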
* PHP-8.1: Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM
* PHP-8.2: Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM
… declaration for null namespace, creating incorrect xml representation of the DOM"

This reverts commit 7eb3e9c. Although the fix follows the spec, it causes issues because a lot of old code assumes the incorrect behaviour PHP has had for a long time. We cannot do this yet, especially not in a stable release. We revert this for the time being. See GH-11428.
* PHP-8.1: Revert "Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM"
Reopening because of #11428 (comment) and the following comments. |
For future reference, this works correctly in #13031 |
This is fixed in the new opt-in spec-compliance mode, which was merged in #13031. |
Description
The following code:
Resulted in this output:
That is incorrect. The imported node `none` is in no namespace, so the correct serialization to XML has to explicitly set `xmlns` to `""`. Otherwise, parsing the generated XML back into a DOM places the node `none` into the same namespace as its parent `with`. Adding the following to the above example demonstrates that:
Output:
The correct XML serialization of the DOM is:
A simple test in JavaScript shows the correct result at least in Firefox:
PHP Version
PHP 8.2.7 / PHP 8.1.20 / PHP 8.0.29 / PHP 7.4.33
Operating System
Fedora Linux 38 x86_64