How to ignore "insignificant whitespace"? #49

I'm using the term "insignificant whitespace" as defined in [What You Need to Know About Whitespace in XML](http://www.oracle.com/technetwork/articles/wang-whitespace-092897.html):

> _Insignificant whitespace_ is used when editing XML documents for readability. These whitespaces are typically not intended for inclusion in the delivery of the document.

Consider XML that has been formatted for human eyeballs. `xml2` can read it without error, and well-formed XPath expressions do what one expects. But once you use `as_list()` or try to write the document back out with `write_xml()`, the whitespace causes problems.

``` r
library(xml2)
cd <- read_xml("http://www.xmlfiles.com/examples/cd_catalog.xml")
```

Targeted queries work fine:

``` r
xml_find_all(cd, ".//TITLE")
#> {xml_nodeset (26)}
#>  [1] <TITLE>Empire Burlesque</TITLE>
#>  [2] <TITLE>Hide your heart</TITLE>
#>  [3] <TITLE>Greatest Hits</TITLE>
#>  [4] <TITLE>Still got the blues</TITLE>
#>  [5] <TITLE>Eros</TITLE>
#>  [6] <TITLE>One night only</TITLE>
#>  [7] <TITLE>Sylvias Mother</TITLE>
#>  [8] <TITLE>Maggie May</TITLE>
#>  [9] <TITLE>Romanza</TITLE>
#> [10] <TITLE>When a man loves a woman</TITLE>
#> [11] <TITLE>Black angel</TITLE>
#> [12] <TITLE>1999 Grammy Nominees</TITLE>
#> [13] <TITLE>For the good times</TITLE>
#> [14] <TITLE>Big Willie style</TITLE>
#> [15] <TITLE>Tupelo Honey</TITLE>
#> [16] <TITLE>Soulsville</TITLE>
#> [17] <TITLE>The very best of</TITLE>
#> [18] <TITLE>Stop</TITLE>
#> [19] <TITLE>Bridge of Spies</TITLE>
#> [20] <TITLE>Private Dancer</TITLE>
#> ...
```

But `as_list()` reveals some problems:

``` r
str(as_list(cd), max.level = 1, list.len = 5)
#> List of 53
#>  $   : chr "\n  "
#>  $ CD:List of 13
#>   .. [list output truncated]
#>  $   : chr "\n  "
#>  $ CD:List of 13
#>   .. [list output truncated]
#>  $   : chr "\n  "
#>   [list output truncated]
```
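To make the desired result concrete, here is a minimal sketch of a post-processing workaround in base R. `drop_blank()` is a hypothetical helper, not part of `xml2`, and the hand-built list only imitates the shape of the `as_list()` output shown above. It recursively discards elements that are a single whitespace-only string, so it would also discard whitespace that is actually significant.

``` r
# Hypothetical helper (not an xml2 function): recursively drop
# whitespace-only strings from a nested list such as as_list() output.
drop_blank <- function(x) {
  if (!is.list(x)) return(x)
  is_blank <- vapply(
    x,
    function(el) is.character(el) && length(el) == 1 && !nzchar(trimws(el)),
    logical(1)
  )
  out <- lapply(x[!is_blank], drop_blank)
  # Subsetting keeps only names, so copy any other attributes back
  # (as_list() stores XML attributes as R attributes).
  extra <- attributes(x)[setdiff(names(attributes(x)), "names")]
  attributes(out) <- c(attributes(out), extra)
  out
}

# Hand-built stand-in for a small piece of the catalog (an assumption,
# not real as_list() output).
x <- list(
  "\n  ",
  CD = list("\n    ", TITLE = list("Empire Burlesque"), "\n  "),
  "\n"
)
res <- drop_blank(x)
str(res)
```

With the document loaded as above, the call would be `drop_blank(as_list(cd))`. This only cleans up the list representation; the whitespace nodes remain in the parsed document itself.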

and indeed you can't invert things with `write_xml()`:

``` r
write(cd, "cd_catalog.xml")
#> Error in cat(list(...), file, sep, fill, labels, append): argument 1 (type 'list') cannot be handled by 'cat'
```
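Note that the error above is raised by base R's `write()`, which hands the object to `cat()`. As a sketch of what ignoring the whitespace at parse time could look like, the following assumes an `xml2` release that provides both the `options` argument of `read_xml()` (with the `"NOBLANKS"` value) and `write_xml()`. The inline XML is a hypothetical stand-in for the catalog so that the example does not need network access.

``` r
library(xml2)

# Hypothetical inline stand-in for the indented CD catalog.
txt <- "<CATALOG>\n  <CD>\n    <TITLE>Empire Burlesque</TITLE>\n  </CD>\n</CATALOG>"

# Ask the parser to drop blank text nodes while reading.
doc <- read_xml(txt, options = "NOBLANKS")

# Collect text nodes that contain nothing but whitespace.
blank <- xml_find_all(doc, ".//text()[normalize-space(.) = '']")
length(blank)

# Serialize with xml2's writer rather than base write().
out <- tempfile(fileext = ".xml")
write_xml(doc, out)
```

If the blank nodes are dropped at parse time, `as_list(doc)` should no longer contain the `"\n  "` entries seen earlier.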

