New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IN_PROGRESS: Mutating/Creating nodes #76

Merged
merged 74 commits into from May 18, 2016

Conversation

Projects
None yet
4 participants
@jimhester
Member

jimhester commented Feb 17, 2016

Clearly this work isn't done, but this PR will be a good place for discussion.

@jimhester jimhester referenced this pull request Feb 19, 2016

Closed

Use actual maps for NsMap #77

@jimhester jimhester changed the title from IN_PROGRESS: Very basic modification of single nodes to IN_PROGRESS: Mutating/Creating nodes Feb 22, 2016

@jimhester

This comment has been minimized.

Member

jimhester commented May 2, 2016

@hadley I think this is mostly in a good state now and modifying existing nodes has most of the features I think we want (with possible exception of setting the namespace of a node). I wrote a simple vignette to explain the functionality but you can also look at the tests to get a feel for the API.

Questions

  • What should the modification/replacement functions do with xml_missing objects, throw an error or return them unchanged? If the latter should we issue a warning?
  • How do we want to support creation of new nodes? Currently you can use read_xml() with character vector input and use the modification functions to insert the node into a tree. But this is clearly less efficient than creating the nodes directly.
  • xml_text()<- uses xmlNodeSetContentLen internally, which means that if it is called on a parent with non-text children they will be destroyed. This is the expected behavior for xmlNodeSetContentLen, but it could cause confusion or errors for users if they have objects pointing to the child nodes.
R/modify.R Outdated
#' then the node will be moved from it's current location.
#' @param value node or nodeset to replace with.
#' @export
`xml_replace<-` <- function(x, copy = TRUE, value) UseMethod("xml_replace<-")

This comment has been minimized.

@hadley

hadley May 2, 2016

Member

I think these look better with {}

R/modify.R Outdated
Map(xml_add_sibling, rev(x), rev(value), where, copy)
}
#' Add a child node

This comment has been minimized.

@hadley

hadley May 2, 2016

Member

I think all three replacement methods could be documented together

stop("`value` must be a named character vector or `NULL`", call. = FALSE)
}
attrs <- names(value)

This comment has been minimized.

@hadley

hadley May 2, 2016

Member

Need to check this is not NULL, and elements not "" or NA? (You can just throw an error if not)

has_names <- function(x) {
  nms <- names(x)
  if (is.null(nms)) {
    rep_along(x, FALSE)
  } else {
    !(is.na(nms) | nms == "")
  }
}
@hadley

This comment has been minimized.

Member

hadley commented May 2, 2016

For creation, I'm imagining something like:

xml_new_node("nodename", child1, child2, attr1 = "foo", attr3 = "bar")

I'm not sure how namespaces would work here. One option would be to have special name and attribute constructors:

xml_new_node(nsName("url", "nodename"), child1, child2, nsAttr("attr1", "url", "foo"))

And instead of url could you use a prefix, if you also supplied the ns map.

@hadley

This comment has been minimized.

Member

hadley commented May 2, 2016

@jimhester

This comment has been minimized.

Member

jimhester commented May 6, 2016

After a81f82c you can now do the following to get the desired XML from the SO post.

Maybe it would be nicer to have the second argument to xml_new_namespace() be a character vector of namespaces to add, with the names as the prefixes.

d <- xml_new_document(xml_new_node("sld", version = "1.1.0", xml_new_node("layer", xml_new_node("Name"))))
sld <- xml_find_one(d, "//sld")
name <- xml_find_one(d, "//Name")
xml_new_namespace(sld, "http://www.o.net/sld")
xml_new_namespace(sld, "http://www.o.net/ogc", "ogc")
xml_new_namespace(sld, "http://www.o.net/se", "se")
xml_set_namespace(name, "se")
cat(as.character(d))
#> <?xml version="1.0"?>
#> <sld xmlns="http://www.o.net/sld" xmlns:ogc="http://www.o.net/ogc" xmlns:se="http://www.o.net/se" version="1.1.0"><layer><se:Name/></layer></sld>
@hadley

This comment has been minimized.

Member

hadley commented May 6, 2016

Yeah, I like the idea to add multiple namespaces at once.

Could we make something like this work?

xml_new_document() %>% 
  xml_add_child("sld", version = "1.1.0") %>% 
  xml_add_namespace(
    "http://www.o.net/sld", 
    ogc = "http://www.o.net/ogc",
    se = "http://www.o.net/se"
  ) %>% 
  xml_add_child("layer") %>% 
  xml_add_child("se:Name")
@hadley

This comment has been minimized.

Member

hadley commented May 6, 2016

Or maybe even:

xml_new_document() %>% 
  xml_add_child("sld", 
    version = "1.1.0"
    ns = "http://www.o.net/sld", 
    "ns:ogc" = "http://www.o.net/ogc",
    "ns:se" = "http://www.o.net/se"
  ) %>% 
  xml_add_child("layer") %>% 
  xml_add_child("se:Name")

(That would require more work on the backend to parse the attribute names and create ns objects where necessary)

@jimhester

This comment has been minimized.

Member

jimhester commented May 6, 2016

Yeah I can get that working. The main issue is the current xml_add_* functions all expect the value argument to be a xml_node, so I either need to define a xml_add_new_* functions to create a new node with the specified name or have different behavior if value is a character(1). The current implementation was the simplest path forward with the current API, but I see the appeal of the proposed interface.

I think a format like the following will be possible (ns needs to be .ns to avoid classes with an attribute named ns).

xml_new_document() %>% 
  xml_add_child("sld", 
    version = "1.1.0"
    .ns = c("http://www.o.net/sld", 
      ogc = "http://www.o.net/ogc",
      se = "http://www.o.net/se")
  ) %>% 
  xml_add_child("layer") %>% 
  xml_add_child("se:Name")
@hadley

This comment has been minimized.

Member

hadley commented May 6, 2016

Doing S3 dispatch on character vs. xmlNode seems reasonable to me. But I'm not sure what using .ns buys us over ns, ns:ogc etc.

@jimhester

This comment has been minimized.

Member

jimhester commented May 6, 2016

How do you create an attribute named 'ns' otherwise? The argument would
have to be named 'xmlns' to avoid a conflict.
On May 6, 2016 6:14 PM, "Hadley Wickham" notifications@github.com wrote:

Doing S3 dispatch on character vs. xmlNode seems reasonable to me. But
I'm not sure what using .ns buys us over ns, ns:ogc etc.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#76 (comment)

@hadley

This comment has been minimized.

Member

hadley commented May 7, 2016

Oh yeah, I meant xmlns instead of ns everywhere (I forgot what ns attributes look like)

@jimhester

This comment has been minimized.

Member

jimhester commented May 9, 2016

as of c2fcf36 you can now do

d1 <- xml_new_document(
  xml_new_node("sld",
    version = "1.1.0",
    xmlns = "http://www.o.net/sld",
    "xmlns:ogc" = "http://www.o.net/ogc",
    "xmlns:se" = "http://www.o.net/se") %>%
  xml_add_child("layer") %>%
  xml_add_child("se:Name"))
as.character(d1)
#> [1] "<?xml version=\"1.0\"?>\n<sld xmlns=\"http://www.o.net/sld\" xmlns:ogc=\"http://www.o.net/ogc\" xmlns:se=\"http://www.o.net/se\" version=\"1.1.0\"><layer/><se:Name/></sld>\n"

Using the exact syntax from above for xml_new_document() is tricky as xml_document objects expect to have a root node defined. I am still working on a good way to implement it.

@jimhester

This comment has been minimized.

Member

jimhester commented May 9, 2016

As of 2c3c002 you can use the following

d1 <- xml_new_document()
xml_add_child(d1, "sld",
    version = "1.1.0",
    xmlns = "http://www.o.net/sld",
    "xmlns:ogc" = "http://www.o.net/ogc",
    "xmlns:se" = "http://www.o.net/se") %>%
  xml_add_child("layer") %>%
  xml_add_child("se:Name")
#> {xml_node}
#> <Name>
as.character(d1)
#> [1] "<?xml version=\"1.0\"?>\n<sld xmlns=\"http://www.o.net/sld\" xmlns:ogc=\"http://www.o.net/ogc\" xmlns:se=\"http://www.o.net/se\" version=\"1.1.0\"><layer><se:Name/></layer></sld>\n"

The only thing preventing you doing a full chain like the original example is that xml_add_child() returns the child node rather than the parent so you have way to access the whole document otherwise.

@hadley

This comment has been minimized.

Member

hadley commented May 9, 2016

Perfect!!

@jennybc how does this look to you?

@jennybc

This comment has been minimized.

Member

jennybc commented May 10, 2016

I will try this out in the next day or two.

@codecov-io

This comment has been minimized.

codecov-io commented May 11, 2016

Current coverage is 59.45%

Merging #76 into master will increase coverage by +9.45%

@@             master        #76   diff @@
==========================================
  Files            29         30     +1   
  Lines           852       1233   +381   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            426        733   +307   
- Misses          426        500    +74   
  Partials          0          0          
  1. 2 files in src were modified. more
    • Misses -14
    • Hits +14
  2. 4 files in R were modified. more
    • Misses -6
    • Hits +6

Powered by Codecov. Last updated by bbb035c...61735cd

@jennybc

This comment has been minimized.

Member

jennybc commented May 12, 2016

Is there an equivalent of XML::xmlNode(), i.e. a function to create a single new node de novo? I'm trying to translate something currently implemented with XML. I create a gazillion nodes as components of a list with purrr::pmap() and XML::xmlNode(), based on a data frame where each row provides info needed to populate one node. Then I add them to a (new) xml document en masse via XML::addChildren(kids = LIST_OF_GAZILLION_NODES).

It feels like xml2::xml_add_child() can play the role XML::addChildren(), once I figure out how to create my gazillion nodes as a nodeset (?). But how am I supposed to do that?

Am I overlooking some suitably wrapped version of the unexported function create_node()?

@jimhester

This comment has been minimized.

Member

jimhester commented May 12, 2016

@jennybc we removed support for creating free nodes, as it makes it impossible to assign a namespace to the nodes on creation.

For your use case can't you just pmap() directly with xml2::xml_add_child(), to create and add the nodes in one operation?

If you can point to your XML example code I can also take a look at how it would be converted.

@jimhester

This comment has been minimized.

Member

jimhester commented May 12, 2016

@jennybc Can you do something like the following?

``` r
mtcars$name <- gsub(" ", "-", row.names(mtcars))
d <- xml_add_child(xml_new_document(), "cars")
add_child <- function(...) {
  x <- purrr::map(list(...), as.character)

  purrr::invoke(xml_add_child,
    c(list(.x = d, .value = x$name), x[names(x) != 'name']))
}
purrr::pwalk(mtcars, add_child)
d
#> {xml_document}
#> <cars>
#>  [1] <Mazda-RX4 mpg="21" cyl="6" disp="160" hp="110" drat="3.9" wt="2.62 ...
#>  [2] <Mazda-RX4-Wag mpg="21" cyl="6" disp="160" hp="110" drat="3.9" wt=" ...
#>  [3] <Datsun-710 mpg="22.8" cyl="4" disp="108" hp="93" drat="3.85" wt="2 ...
#>  [4] <Hornet-4-Drive mpg="21.4" cyl="6" disp="258" hp="110" drat="3.08"  ...
#>  [5] <Hornet-Sportabout mpg="18.7" cyl="8" disp="360" hp="175" drat="3.1 ...
#>  [6] <Valiant mpg="18.1" cyl="6" disp="225" hp="105" drat="2.76" wt="3.4 ...
#>  [7] <Duster-360 mpg="14.3" cyl="8" disp="360" hp="245" drat="3.21" wt=" ...
#>  [8] <Merc-240D mpg="24.4" cyl="4" disp="146.7" hp="62" drat="3.69" wt=" ...
#>  [9] <Merc-230 mpg="22.8" cyl="4" disp="140.8" hp="95" drat="3.92" wt="3 ...
#> [10] <Merc-280 mpg="19.2" cyl="6" disp="167.6" hp="123" drat="3.92" wt=" ...
#> [11] <Merc-280C mpg="17.8" cyl="6" disp="167.6" hp="123" drat="3.92" wt= ...
#> [12] <Merc-450SE mpg="16.4" cyl="8" disp="275.8" hp="180" drat="3.07" wt ...
#> [13] <Merc-450SL mpg="17.3" cyl="8" disp="275.8" hp="180" drat="3.07" wt ...
#> [14] <Merc-450SLC mpg="15.2" cyl="8" disp="275.8" hp="180" drat="3.07" w ...
#> [15] <Cadillac-Fleetwood mpg="10.4" cyl="8" disp="472" hp="205" drat="2. ...
#> [16] <Lincoln-Continental mpg="10.4" cyl="8" disp="460" hp="215" drat="3 ...
#> [17] <Chrysler-Imperial mpg="14.7" cyl="8" disp="440" hp="230" drat="3.2 ...
#> [18] <Fiat-128 mpg="32.4" cyl="4" disp="78.7" hp="66" drat="4.08" wt="2. ...
#> [19] <Honda-Civic mpg="30.4" cyl="4" disp="75.7" hp="52" drat="4.93" wt= ...
#> [20] <Toyota-Corolla mpg="33.9" cyl="4" disp="71.1" hp="65" drat="4.22"  ...
#> ...
@jennybc

This comment has been minimized.

Member

jennybc commented May 12, 2016

I am trying to whittle my problem down to something I can either (a) solve or (b) inflict on you @jimhester.

@jennybc

This comment has been minimized.

Member

jennybc commented May 12, 2016

I choose (b) let @jimhester figure it out. Sorry this is not very minimal, but the examples above and the tests, etc., really don't get into the bits that feel hard about my actual problem. So I present in its full glory.

https://gist.github.com/jennybc/37398c989852aa0c7abd9c5cea75e4f6

There's an R script, corresponding md, input as csv, and XML output.

@jennybc

This comment has been minimized.

Member

jennybc commented May 13, 2016

Some features of the XML creation style of XML that seem to be missing in what's proposed for xml2:

  • The ability to create free nodes, which is actually encouraged in XML. I liked that. Then again, I didn't like XML's memory leak problems and I suppose those are probably related 😬.
  • The ability to add lots of children at once and to define children recursively.

I think anyone translating from XML code will need to know how they're supposed to do these things with xml2. Or what to do instead.

@jimhester

This comment has been minimized.

Member

jimhester commented May 13, 2016

@jennybc This is how I would reproduce the XML output with the current code (63e5c1c).

update_fodder <- read_csv("27_update-fodder.csv")
africa_cellsfeed <- "SOME_URL"

library(xml2)
d <- xml_new_document() %>%
  xml_add_child("feed",
    xmlns = "http://www.w3.org/2005/Atom",
    "xmlns:batch" = "http://schemas.google.com/gdata/batch",
    "xmlns:gs" = "http://schemas.google.com/spreadsheets/2006")

d %>% xml_add_child("id", africa_cellsfeed)
f_XML <- function(cell, cell_id, edit_link, row, col, update_value) {
  d %>%
    xml_add_child("entry") %>%
    xml_add_child("batch:id", cell) %>%
    xml_add_sibling("batch:operation", type = "update") %>%
    xml_add_sibling("id", cell_id) %>%
    xml_add_sibling("link", rel = "edit", type = "application/atom+xml", href = edit_link) %>%
    xml_add_sibling("gs:cell", row = as.character(row), col = as.character(col), inputValue = update_value)
}

update_fodder %>% pwalk(f_XML)

write_xml(d, "27_feed-node-xml2.xml")

The only difference from the XML output is the use of a XML declaration, which while optional is usually recommended, libXML includes it by default when you create a new document object.

diff -ub 27_feed-node-XML.xml 27_feed-node-xml2.xml
--- 27_feed-node-XML.xml        2016-05-13 09:17:26.000000000 -0500
+++ 27_feed-node-xml2.xml       2016-05-13 10:34:13.000000000 -0500
@@ -1,3 +1,4 @@
+<?xml version="1.0"?>
 <feed xmlns="http://www.w3.org/2005/Atom" xmlns:batch="http://schemas.google.co
m/gdata/batch" xmlns:gs="http://schemas.google.com/spreadsheets/2006">
  <id>SOME_URL</id>
  <entry>
@jennybc

This comment has been minimized.

Member

jennybc commented May 13, 2016

Yes that all works me (with a caveat below about whitespace).

As for the XML declaration, I will try sending it along. I have just been following these instructions to the letter and it does not include such a declaration (select "protocol", if it's not already selected, to see the target XML):

https://developers.google.com/google-apps/spreadsheets/#updating_multiple_cells_with_a_batch_request

Now that xml2 writes XML with line breaks, it needs to consume it as well. I updated my gist to show the problem and I opened an issue about this last year (#49).

@jennybc

This comment has been minimized.

Member

jennybc commented May 14, 2016

I used @jimhester's basic approach to swap xml2 for XML in all instances of XML creation in my actual package. Also switched to xml2 for XML mutation where I was previously using regular expressions. Was able to remove XML entirely and all tests pass.

jenga

@jimhester jimhester merged commit 7e980dd into r-lib:master May 18, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment