---
title: "Node Modification"
author: "Jim Hester"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Node Modification}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(xml2)
library(magrittr)
```
# Modifying Existing XML
Existing XML is modified in xml2 with the replacement forms of the accessor
functions. Each has methods for both individual `xml_node` objects and
`xml_nodeset` objects. If a vector of values is provided, it is applied
element by element over the nodeset; otherwise the single value is recycled.
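
As a small sketch of the two cases just described, the chunk below assigns text to a nodeset first with a vector and then with a single value. The `<list>`/`<item>` document and the variable names are made up for illustration.

```{r}
library(xml2)

x <- read_xml("<list><item>a</item><item>b</item><item>c</item></list>")
items <- xml_find_all(x, "//item")

# A vector of the same length is matched to the nodes one by one.
xml_text(items) <- c("first", "second", "third")
piecewise <- xml_text(items)
piecewise

# A single value is recycled across every node in the nodeset.
xml_text(items) <- "same"
recycled <- xml_text(items)
recycled
```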
## Text Modification ##
Text modification only happens on text nodes. If a given node has more than one
text node, only the first will be affected. To modify additional text nodes,
select them explicitly with `/text()`.
```{r}
x <- read_xml("<p>This is some <b>text</b>. This is more.</p>")
xml_text(x)
xml_text(x) <- "This is some other text."
xml_text(x)
# You can avoid this by explicitly selecting the text node.
x <- read_xml("<p>This is some text. This is <b>bold!</b></p>")
text_only <- xml_find_all(x, "//text()")
xml_text(text_only) <- c("This is some other text. ", "Still bold!")
xml_text(x)
xml_structure(x)
```
## Attribute and Namespace Definition Modification ##
Attributes and namespace definitions are modified one at a time with
`xml_attr()` or all at once with `xml_attrs()`. In both cases using `NULL` as
the value will remove the attribute completely.
```{r}
x <- read_xml("<a href='invalid!'>xml2</a>")
xml_attr(x, "href")
xml_attr(x, "href") <- "https://github.com/r-lib/xml2"
xml_attr(x, "href")
xml_attrs(x) <- c(id = "xml2", href = "https://github.com/r-lib/xml2")
xml_attrs(x)
x
xml_attrs(x) <- NULL
x
# Namespaces are added as an xmlns or xmlns:prefix attribute
xml_attr(x, "xmlns") <- "http://foo"
x
xml_attr(x, "xmlns:bar") <- "http://bar"
x
```
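The chunk above removes every attribute at once. As a complementary sketch, assigning `NULL` through `xml_attr()` drops a single attribute and leaves the others alone. The starting document here is invented for the example.

```{r}
library(xml2)

x <- read_xml("<a id='xml2' href='https://github.com/r-lib/xml2'>xml2</a>")

# Remove only the id attribute; href is left untouched.
xml_attr(x, "id") <- NULL

xml_has_attr(x, "id")
xml_attrs(x)
```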
## Name Modification ##
Node names are modified with `xml_name()`.
```{r}
x <- read_xml("<a><b/></a>")
x
xml_name(x)
xml_name(x) <- "c"
x
```
# Node modification #
All of these functions have a `.copy` argument. If it is set to `FALSE`, the
node is removed from its current location before being inserted at the new
one, so it is moved. Otherwise a copy of the node is inserted and the original
stays in place.
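
A minimal sketch of the difference, using `xml_add_sibling()` as the example function. The `<parent>` document and the object names are hypothetical; counting the `<c>` elements afterwards shows whether the node was copied or moved.

```{r}
library(xml2)

doc <- "<parent><a/><b><c/></b></parent>"

# Default (.copy = TRUE): a copy is inserted, the original <c/> stays in <b>.
copied <- read_xml(doc)
xml_add_sibling(xml_find_first(copied, "//a"), xml_find_first(copied, "//c"))
n_copied <- length(xml_find_all(copied, "//c"))
n_copied

# .copy = FALSE: the same node is unlinked from <b> and inserted after <a/>.
moved <- read_xml(doc)
xml_add_sibling(xml_find_first(moved, "//a"), xml_find_first(moved, "//c"),
  .copy = FALSE)
n_moved <- length(xml_find_all(moved, "//c"))
n_moved
```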
## Replacing existing nodes ##
```{r}
x <- read_xml("<parent><child>1</child><child>2<child>3</child></child></parent>")
children <- xml_children(x)
t1 <- children[[1]]
t2 <- children[[2]]
t3 <- xml_children(children[[2]])[[1]]
xml_replace(t1, t3)
x
```
## Add a sibling ##
```{r}
x <- read_xml("<parent><child>1</child><child>2<child>3</child></child></parent>")
children <- xml_children(x)
t1 <- children[[1]]
t2 <- children[[2]]
t3 <- xml_children(children[[2]])[[1]]
xml_add_sibling(t1, t3)
x
xml_add_sibling(t3, t1, .where = "before")
x
```
## Add a child ##
```{r}
x <- read_xml("<parent><child>1</child><child>2<child>3</child></child></parent>")
children <- xml_children(x)
t1 <- children[[1]]
t2 <- children[[2]]
t3 <- xml_children(children[[2]])[[1]]
xml_add_child(t1, t3)
x
xml_add_child(t1, read_xml("<test/>"))
x
```
## Removing nodes ##
`xml_remove()` can be used to remove a node (and its children) from a
tree. By default it unlinks the node from the tree but does _not_ free the
memory for the node, so R objects pointing to the node are still valid.

This allows code like the following to work without crashing R:
```{r}
x <- read_xml("<foo><bar><baz/></bar></foo>")
x1 <- x %>% xml_children() %>% .[[1]]
x2 <- x1 %>% xml_children() %>% .[[1]]
xml_remove(x1)
rm(x1)
gc()
x2
```
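As a further sketch of the default behavior, the removed node itself can still be inspected after it has been unlinked. The document and the name `bar` are chosen only for illustration.

```{r}
library(xml2)

x <- read_xml("<foo><bar><baz/></bar><qux/></foo>")
bar <- xml_find_first(x, "//bar")

# Unlink <bar> from the tree; its memory is kept because free = FALSE.
xml_remove(bar)

# The tree no longer contains <bar>, but the R object still works.
length(xml_children(x))
xml_name(bar)
xml_name(xml_children(bar))
```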
If you are not planning on referencing these nodes again, this memory is wasted.
Calling `xml_remove(free = TRUE)` will remove the nodes _and_ free the memory
used to store them. **Note:** in this case _any_ object which previously pointed
to the node or its children will instead be pointing to freed memory and may
cause R to crash. xml2 can't figure this out for you, so it's your
responsibility to remove any objects which are no longer valid.

In particular, `xml_find_*()` results are easy to overlook, for example:
```{r}
x <- read_xml("<a><b /><b><b /></b></a>")
bees <- xml_find_all(x, "//b")
xml_remove(xml_child(x), free = TRUE)
# bees[[1]] is no longer valid!!!
rm(bees)
gc()
```
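One way to sidestep the problem, sketched here as a suggestion rather than a rule from xml2 itself, is to run the query only after the node has been freed, so that no stale result exists in the first place.

```{r}
library(xml2)

x <- read_xml("<a><b /><b><b /></b></a>")

# Free the first <b/> before collecting any references to it.
xml_remove(xml_child(x), free = TRUE)

# Every node in this result is still part of the tree.
bees <- xml_find_all(x, "//b")
length(bees)
```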
## Namespaces ##
We want to construct a document with the following namespace layout. (From
http://stackoverflow.com/questions/32939229/creating-xml-in-r-with-namespaces/32941524#32941524).
```xml
<?xml version = "1.0" encoding="UTF-8"?>
<sld xmlns="http://www.o.net/sld"
     xmlns:ogc="http://www.o.net/ogc"
     xmlns:se="http://www.o.net/se"
     version="1.1.0" >
  <layer>
    <se:Name>My Layer</se:Name>
  </layer>
</sld>
```
```{r}
d <- xml_new_root("sld",
    xmlns = "http://www.o.net/sld",
    "xmlns:ogc" = "http://www.o.net/ogc",
    "xmlns:se" = "http://www.o.net/se",
    version = "1.1.0") %>%
  xml_add_child("layer") %>%
  xml_add_child("se:Name", "My Layer") %>%
  xml_root()
d
```
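To check the result, one option is to read the declared namespaces back with `xml_ns()` and use them to look up the prefixed element. This is a sketch that rebuilds the same document step by step without pipes; the variable names `layer`, `ns` and `name_node` are illustrative.

```{r}
library(xml2)

d <- xml_new_root("sld",
  xmlns = "http://www.o.net/sld",
  "xmlns:ogc" = "http://www.o.net/ogc",
  "xmlns:se" = "http://www.o.net/se",
  version = "1.1.0")
layer <- xml_add_child(d, "layer")
xml_add_child(layer, "se:Name", "My Layer")

# Namespaces declared in the document, keyed by prefix.
ns <- xml_ns(d)
ns

# Use the se prefix from ns to find the element by its qualified name.
name_node <- xml_find_first(d, "//se:Name", ns)
xml_text(name_node)
```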