I'm using the term "insignificant whitespace" as defined in What You Need to Know About Whitespace in XML:
Insignificant whitespace is used when editing XML documents for readability. These whitespaces are typically not intended for inclusion in the delivery of the document.
Consider XML that has been formatted for human eyeballs. xml2 can read it without error, and well-formed XPath expressions do what one expects. But once you use as_list() or try to write it back out with write_xml(), you learn there are problems with whitespace.
library(xml2)
cd <- read_xml("http://www.xmlfiles.com/examples/cd_catalog.xml")
Targeted queries work fine:
xml_find_all(cd, ".//TITLE")
#> {xml_nodeset (26)}
#> [1] <TITLE>Empire Burlesque</TITLE>
#> [2] <TITLE>Hide your heart</TITLE>
#> [3] <TITLE>Greatest Hits</TITLE>
#> [4] <TITLE>Still got the blues</TITLE>
#> [5] <TITLE>Eros</TITLE>
#> [6] <TITLE>One night only</TITLE>
#> [7] <TITLE>Sylvias Mother</TITLE>
#> [8] <TITLE>Maggie May</TITLE>
#> [9] <TITLE>Romanza</TITLE>
#> [10] <TITLE>When a man loves a woman</TITLE>
#> [11] <TITLE>Black angel</TITLE>
#> [12] <TITLE>1999 Grammy Nominees</TITLE>
#> [13] <TITLE>For the good times</TITLE>
#> [14] <TITLE>Big Willie style</TITLE>
#> [15] <TITLE>Tupelo Honey</TITLE>
#> [16] <TITLE>Soulsville</TITLE>
#> [17] <TITLE>The very best of</TITLE>
#> [18] <TITLE>Stop</TITLE>
#> [19] <TITLE>Bridge of Spies</TITLE>
#> [20] <TITLE>Private Dancer</TITLE>
#> ...
But as_list() reveals the problem: the indentation between elements comes back as character entries of its own:
str(as_list(cd), max.level = 1, list.len = 5)
#> List of 53
#> $ : chr "\n "
#> $ CD:List of 13
#> .. [list output truncated]
#> $ : chr "\n "
#> $ CD:List of 13
#> .. [list output truncated]
#> $ : chr "\n "
#> [list output truncated]
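One way to avoid the stray "\n " entries is to drop blank text nodes at parse time. Below is a minimal sketch, assuming a current xml2 in which read_xml() accepts an `options` argument of libxml2 parser flags. The small catalog string is made up for illustration and only mimics the shape of cd_catalog.xml; "RECOVER" is passed in the first call simply as a flag set that does not include NOBLANKS, so the indentation is kept for comparison.

```r
library(xml2)

# Hypothetical, hand-indented stand-in for cd_catalog.xml
txt <- "<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
  </CD>
  <CD>
    <TITLE>Eros</TITLE>
  </CD>
</CATALOG>"

# XPath for text nodes that contain nothing but whitespace
blank_xpath <- "//text()[normalize-space(.) = '']"

# Parse without NOBLANKS: the indentation survives as text nodes
kept <- read_xml(txt, options = "RECOVER")
length(xml_find_all(kept, blank_xpath))

# Parse with NOBLANKS: libxml2 drops the ignorable whitespace
dropped <- read_xml(txt, options = "NOBLANKS")
length(xml_find_all(dropped, blank_xpath))

# Every child of the root is now an element, so as_list() has no "\n " entries
xml_type(xml_contents(xml_root(dropped)))
```

NOBLANKS relies on libxml2's heuristic for "ignorable" whitespace, so it is a good fit for pretty-printed data files like this one but should be checked against documents with mixed content.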
And indeed, you can't simply write it back out:
write(cd, "cd_catalog.xml")
#> Error in cat(list(...), file, sep, fill, labels, append): argument 1 (type 'list') cannot be handled by 'cat'
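The call above uses base R's write(), which hands its argument to cat() and so cannot handle the list-like document; xml2's writer for documents is write_xml(). For a document that has already been parsed with its whitespace intact, a minimal sketch of post-hoc cleanup is to select whitespace-only text nodes with XPath, unlink them with xml_remove(), and then write the result with write_xml(). The inline document and the temporary file path are assumptions for illustration, and removing *every* whitespace-only text node is only safe when, as here, none of that whitespace is significant.

```r
library(xml2)

# Hypothetical indented input; "RECOVER" (without NOBLANKS) keeps the blank nodes
doc <- read_xml("<CATALOG>
  <CD>
    <TITLE>Eros</TITLE>
  </CD>
</CATALOG>", options = "RECOVER")

# Select and unlink every text node that is whitespace only
blanks <- xml_find_all(doc, "//text()[normalize-space(.) = '']")
xml_remove(blanks)

# The remaining children of the root are elements only
xml_type(xml_contents(xml_root(doc)))

# Write the cleaned document with xml2's writer rather than base write()
path <- tempfile(fileext = ".xml")
write_xml(doc, path)
```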