Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IN_PROGRESS: Mutating/Creating nodes #76

Merged
merged 74 commits into from May 18, 2016

Conversation

@jimhester
Copy link
Member

jimhester commented Feb 17, 2016

Clearly this work isn't done, but this PR will be a good place for discussion.

@jimhester jimhester changed the title IN_PROGRESS: Very basic modification of single nodes IN_PROGRESS: Mutating/Creating nodes Feb 22, 2016
@jimhester jimhester force-pushed the jimhester:modify-nodes branch 2 times, most recently from 19a94ca to 3a3b324 Apr 28, 2016
@jimhester

This comment has been minimized.

Copy link
Member Author

jimhester commented May 2, 2016

@hadley I think this is mostly in a good state now and modifying existing nodes has most of the features I think we want (with possible exception of setting the namespace of a node). I wrote a simple vignette to explain the functionality but you can also look at the tests to get a feel for the API.

Questions

  • What should the modification/replacement functions do with xml_missing objects, throw an error or return them unchanged? If the latter should we issue a warning?
  • How do we want to support creation of new nodes? Currently you can use read_xml() with character vector input and use the modification functions to insert the node into a tree. But this is clearly less efficient than creating the nodes directly.
  • xml_text()<- uses xmlNodeSetContentLen internally, which means that if it is called on a parent with non-text children they will be destroyed. This is the expected behavior for xmlNodeSetContentLen, but it could cause confusion or errors for users if they have objects pointing to the child nodes.
R/modify.R Outdated
#' then the node will be moved from it's current location.
#' @param value node or nodeset to replace with.
#' @export
`xml_replace<-` <- function(x, copy = TRUE, value) UseMethod("xml_replace<-")

This comment has been minimized.

Copy link
@hadley

hadley May 2, 2016

Member

I think these look better with {}

R/modify.R Outdated
Map(xml_add_sibling, rev(x), rev(value), where, copy)
}

#' Add a child node

This comment has been minimized.

Copy link
@hadley

hadley May 2, 2016

Member

I think all three replacement methods could be documented together

stop("`value` must be a named character vector or `NULL`", call. = FALSE)
}

attrs <- names(value)

This comment has been minimized.

Copy link
@hadley

hadley May 2, 2016

Member

Need to check this is not NULL, and elements not "" or NA? (You can just throw an error if not)

has_names <- function(x) {
  nms <- names(x)
  if (is.null(nms)) {
    rep_along(x, FALSE)
  } else {
    !(is.na(nms) | nms == "")
  }
}
@hadley

This comment has been minimized.

Copy link
Member

hadley commented May 2, 2016

For creation, I'm imagining something like:

xml_new_node("nodename", child1, child2, attr1 = "foo", attr3 = "bar")

I'm not sure how namespaces would work here. One option would be to have special name and attribute constructors:

xml_new_node(nsName("url", "nodename"), child1, child2, nsAttr("attr1", "url", "foo"))

And instead of url could you use a prefix, if you also supplied the ns map.

@hadley

This comment has been minimized.

Copy link
Member

hadley commented May 2, 2016

@jimhester

This comment has been minimized.

Copy link
Member Author

jimhester commented May 6, 2016

After a81f82c you can now do the following to get the desired XML from the SO post.

Maybe it would be nicer to have the second argument to xml_new_namespace() be a character vector of namespaces to add, with the names as the prefixes.

d <- xml_new_document(xml_new_node("sld", version = "1.1.0", xml_new_node("layer", xml_new_node("Name"))))
sld <- xml_find_one(d, "//sld")
name <- xml_find_one(d, "//Name")
xml_new_namespace(sld, "http://www.o.net/sld")
xml_new_namespace(sld, "http://www.o.net/ogc", "ogc")
xml_new_namespace(sld, "http://www.o.net/se", "se")
xml_set_namespace(name, "se")
cat(as.character(d))
#> <?xml version="1.0"?>
#> <sld xmlns="http://www.o.net/sld" xmlns:ogc="http://www.o.net/ogc" xmlns:se="http://www.o.net/se" version="1.1.0"><layer><se:Name/></layer></sld>
@hadley

This comment has been minimized.

Copy link
Member

hadley commented May 6, 2016

Yeah, I like the idea to add multiple namespaces at once.

Could we make something like this work?

xml_new_document() %>% 
  xml_add_child("sld", version = "1.1.0") %>% 
  xml_add_namespace(
    "http://www.o.net/sld", 
    ogc = "http://www.o.net/ogc",
    se = "http://www.o.net/se"
  ) %>% 
  xml_add_child("layer") %>% 
  xml_add_child("se:Name")
@hadley

This comment has been minimized.

Copy link
Member

hadley commented May 6, 2016

Or maybe even:

xml_new_document() %>% 
  xml_add_child("sld", 
    version = "1.1.0"
    ns = "http://www.o.net/sld", 
    "ns:ogc" = "http://www.o.net/ogc",
    "ns:se" = "http://www.o.net/se"
  ) %>% 
  xml_add_child("layer") %>% 
  xml_add_child("se:Name")

(That would require more work on the backend to parse the attribute names and create ns objects where necessary)

@jimhester

This comment has been minimized.

Copy link
Member Author

jimhester commented May 6, 2016

Yeah I can get that working. The main issue is the current xml_add_* functions all expect the value argument to be a xml_node, so I either need to define a xml_add_new_* functions to create a new node with the specified name or have different behavior if value is a character(1). The current implementation was the simplest path forward with the current API, but I see the appeal of the proposed interface.

I think a format like the following will be possible (ns needs to be .ns to avoid classes with an attribute named ns).

xml_new_document() %>% 
  xml_add_child("sld", 
    version = "1.1.0"
    .ns = c("http://www.o.net/sld", 
      ogc = "http://www.o.net/ogc",
      se = "http://www.o.net/se")
  ) %>% 
  xml_add_child("layer") %>% 
  xml_add_child("se:Name")
@hadley

This comment has been minimized.

Copy link
Member

hadley commented May 6, 2016

Doing S3 dispatch on character vs. xmlNode seems reasonable to me. But I'm not sure what using .ns buys us over ns, ns:ogc etc.

@jimhester

This comment has been minimized.

Copy link
Member Author

jimhester commented May 6, 2016

How do you create an attribute named 'ns' otherwise? The argument would
have to be named 'xmlns' to avoid a conflict.
On May 6, 2016 6:14 PM, "Hadley Wickham" notifications@github.com wrote:

Doing S3 dispatch on character vs. xmlNode seems reasonable to me. But
I'm not sure what using .ns buys us over ns, ns:ogc etc.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#76 (comment)

@hadley

This comment has been minimized.

Copy link
Member

hadley commented May 7, 2016

Oh yeah, I meant xmlns instead of ns everywhere (I forgot what ns attributes look like)

@jimhester

This comment has been minimized.

Copy link
Member Author

jimhester commented May 9, 2016

as of c2fcf36 you can now do

d1 <- xml_new_document(
  xml_new_node("sld",
    version = "1.1.0",
    xmlns = "http://www.o.net/sld",
    "xmlns:ogc" = "http://www.o.net/ogc",
    "xmlns:se" = "http://www.o.net/se") %>%
  xml_add_child("layer") %>%
  xml_add_child("se:Name"))
as.character(d1)
#> [1] "<?xml version=\"1.0\"?>\n<sld xmlns=\"http://www.o.net/sld\" xmlns:ogc=\"http://www.o.net/ogc\" xmlns:se=\"http://www.o.net/se\" version=\"1.1.0\"><layer/><se:Name/></sld>\n"

Using the exact syntax from above for xml_new_document() is tricky as xml_document objects expect to have a root node defined. I am still working on a good way to implement it.

@jimhester

This comment has been minimized.

Copy link
Member Author

jimhester commented May 9, 2016

As of 2c3c002 you can use the following

d1 <- xml_new_document()
xml_add_child(d1, "sld",
    version = "1.1.0",
    xmlns = "http://www.o.net/sld",
    "xmlns:ogc" = "http://www.o.net/ogc",
    "xmlns:se" = "http://www.o.net/se") %>%
  xml_add_child("layer") %>%
  xml_add_child("se:Name")
#> {xml_node}
#> <Name>
as.character(d1)
#> [1] "<?xml version=\"1.0\"?>\n<sld xmlns=\"http://www.o.net/sld\" xmlns:ogc=\"http://www.o.net/ogc\" xmlns:se=\"http://www.o.net/se\" version=\"1.1.0\"><layer><se:Name/></layer></sld>\n"

The only thing preventing you doing a full chain like the original example is that xml_add_child() returns the child node rather than the parent so you have way to access the whole document otherwise.

@hadley

This comment has been minimized.

Copy link
Member

hadley commented May 9, 2016

Perfect!!

@jennybc how does this look to you?

@jennybc

This comment has been minimized.

Copy link
Member

jennybc commented May 10, 2016

I will try this out in the next day or two.

@jimhester jimhester force-pushed the jimhester:modify-nodes branch from 3f6822a to 04a83fe May 11, 2016
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented May 11, 2016

Current coverage is 59.45%

Merging #76 into master will increase coverage by +9.45%

@@             master        #76   diff @@
==========================================
  Files            29         30     +1   
  Lines           852       1233   +381   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            426        733   +307   
- Misses          426        500    +74   
  Partials          0          0          
  1. 2 files in src were modified. more
    • Misses -14
    • Hits +14
  2. 4 files in R were modified. more
    • Misses -6
    • Hits +6

Powered by Codecov. Last updated by bbb035c...61735cd

@jennybc

This comment has been minimized.

Copy link
Member

jennybc commented May 12, 2016

Is there an equivalent of XML::xmlNode(), i.e. a function to create a single new node de novo? I'm trying to translate something currently implemented with XML. I create a gazillion nodes as components of a list with purrr::pmap() and XML::xmlNode(), based on a data frame where each row provides info needed to populate one node. Then I add them to a (new) xml document en masse via XML::addChildren(kids = LIST_OF_GAZILLION_NODES).

It feels like xml2::xml_add_child() can play the role XML::addChildren(), once I figure out how to create my gazillion nodes as a nodeset (?). But how am I supposed to do that?

Am I overlooking some suitably wrapped version of the unexported function create_node()?

@jimhester

This comment has been minimized.

Copy link
Member Author

jimhester commented May 12, 2016

@jennybc we removed support for creating free nodes, as it makes it impossible to assign a namespace to the nodes on creation.

For your use case can't you just pmap() directly with xml2::xml_add_child(), to create and add the nodes in one operation?

If you can point to your XML example code I can also take a look at how it would be converted.

@jimhester

This comment has been minimized.

Copy link
Member Author

jimhester commented May 12, 2016

@jennybc Can you do something like the following?

``` r
mtcars$name <- gsub(" ", "-", row.names(mtcars))
d <- xml_add_child(xml_new_document(), "cars")
add_child <- function(...) {
  x <- purrr::map(list(...), as.character)

  purrr::invoke(xml_add_child,
    c(list(.x = d, .value = x$name), x[names(x) != 'name']))
}
purrr::pwalk(mtcars, add_child)
d
#> {xml_document}
#> <cars>
#>  [1] <Mazda-RX4 mpg="21" cyl="6" disp="160" hp="110" drat="3.9" wt="2.62 ...
#>  [2] <Mazda-RX4-Wag mpg="21" cyl="6" disp="160" hp="110" drat="3.9" wt=" ...
#>  [3] <Datsun-710 mpg="22.8" cyl="4" disp="108" hp="93" drat="3.85" wt="2 ...
#>  [4] <Hornet-4-Drive mpg="21.4" cyl="6" disp="258" hp="110" drat="3.08"  ...
#>  [5] <Hornet-Sportabout mpg="18.7" cyl="8" disp="360" hp="175" drat="3.1 ...
#>  [6] <Valiant mpg="18.1" cyl="6" disp="225" hp="105" drat="2.76" wt="3.4 ...
#>  [7] <Duster-360 mpg="14.3" cyl="8" disp="360" hp="245" drat="3.21" wt=" ...
#>  [8] <Merc-240D mpg="24.4" cyl="4" disp="146.7" hp="62" drat="3.69" wt=" ...
#>  [9] <Merc-230 mpg="22.8" cyl="4" disp="140.8" hp="95" drat="3.92" wt="3 ...
#> [10] <Merc-280 mpg="19.2" cyl="6" disp="167.6" hp="123" drat="3.92" wt=" ...
#> [11] <Merc-280C mpg="17.8" cyl="6" disp="167.6" hp="123" drat="3.92" wt= ...
#> [12] <Merc-450SE mpg="16.4" cyl="8" disp="275.8" hp="180" drat="3.07" wt ...
#> [13] <Merc-450SL mpg="17.3" cyl="8" disp="275.8" hp="180" drat="3.07" wt ...
#> [14] <Merc-450SLC mpg="15.2" cyl="8" disp="275.8" hp="180" drat="3.07" w ...
#> [15] <Cadillac-Fleetwood mpg="10.4" cyl="8" disp="472" hp="205" drat="2. ...
#> [16] <Lincoln-Continental mpg="10.4" cyl="8" disp="460" hp="215" drat="3 ...
#> [17] <Chrysler-Imperial mpg="14.7" cyl="8" disp="440" hp="230" drat="3.2 ...
#> [18] <Fiat-128 mpg="32.4" cyl="4" disp="78.7" hp="66" drat="4.08" wt="2. ...
#> [19] <Honda-Civic mpg="30.4" cyl="4" disp="75.7" hp="52" drat="4.93" wt= ...
#> [20] <Toyota-Corolla mpg="33.9" cyl="4" disp="71.1" hp="65" drat="4.22"  ...
#> ...
@jennybc

This comment has been minimized.

Copy link
Member

jennybc commented May 12, 2016

I am trying to whittle my problem down to something I can either (a) solve or (b) inflict on you @jimhester.

@jennybc

This comment has been minimized.

Copy link
Member

jennybc commented May 12, 2016

I choose (b) let @jimhester figure it out. Sorry this is not very minimal, but the examples above and the tests, etc., really don't get into the bits that feel hard about my actual problem. So I present in its full glory.

https://gist.github.com/jennybc/37398c989852aa0c7abd9c5cea75e4f6

There's an R script, corresponding md, input as csv, and XML output.

@jennybc

This comment has been minimized.

Copy link
Member

jennybc commented May 13, 2016

Some features of the XML creation style of XML that seem to be missing in what's proposed for xml2:

  • The ability to create free nodes, which is actually encouraged in XML. I liked that. Then again, I didn't like XML's memory leak problems and I suppose those are probably related 😬.
  • The ability to add lots of children at once and to define children recursively.

I think anyone translating from XML code will need to know how they're supposed to do these things with xml2. Or what to do instead.

@jimhester jimhester force-pushed the jimhester:modify-nodes branch from 63e5c1c to e632d3c May 18, 2016
Relying on the finalizer was too error prone.
@jimhester jimhester merged commit 7e980dd into r-lib:master May 18, 2016
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.