Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

returning a list of nodesets #311

Closed
jakejh opened this issue Jul 11, 2020 · 3 comments · Fixed by #312
Closed

returning a list of nodesets #311

jakejh opened this issue Jul 11, 2020 · 3 comments · Fixed by #312

Comments

@jakejh
Copy link
Contributor

jakejh commented Jul 11, 2020

Hello, I have a use case in which I want to run xml_find_all on a nodeset, but have the result be a list of nodesets (one element for each node) instead of one flattened nodeset. In this use case, it's important that I know which result node came from which original node. In my testing it's way faster to have this functionality within the xml2 package, since it can directly call C in the lapply, than to call xml_find_all in a for or foreach loop. I can imagine a couple ways to do this.

  1. Extend the xml_find_all.xml_nodeset method by adding an unlist argument. This would break the convention that "xml_find_all always returns a nodeset", but seems cleaner to me. Something like below:
xml_find_all.xml_nodeset <- function(x, xpath, ns = xml_ns(x), unlist = TRUE) {
  if (length(x) == 0)
    return(xml_nodeset())

  if (isTRUE(unlist)) {
    nodes <- unlist(recursive = FALSE,
                    lapply(x, function(x)
                      .Call(xpath_search, x$node, x$doc, xpath, ns, Inf)))

    nodes <- xml_nodeset(nodes)

  } else {
    nodes <- lapply(x, function(x)
      xml_nodeset(.Call(xpath_search, x$node, x$doc, xpath, ns, Inf)))
  }

  return(nodes)
}
  1. Add a separate function called xml_find_all_list, something like below:
xml_find_all_list <- function(x, xpath, ns = xml_ns(x)) {
  if (length(x) == 0)
    return(xml_nodeset())

  lapply(x, function(x)
    xml_nodeset(.Call(xpath_search, x$node, x$doc, xpath, ns, Inf)))
}

I'm happy to make a PR, just let me know how to proceed. Thanks.

@jimhester
Copy link
Member

What about calling the argument flatten with an implementation like the following?

xml_find_all.xml_nodeset <- function (x, xpath, ns = xml_ns(x), flatten = TRUE) {
  if (length(x) == 0)
    return(xml_nodeset())

  res <- lapply(x, function(x) .Call(xpath_search, x$node, x$doc, xpath, ns, Inf))

  if (isTRUE(flatten)) {
    return(xml_nodeset(unlist(recursive = FALSE, res)))
  }

  res[] <- lapply(res, xml_nodeset)
  res
}

@jakejh
Copy link
Contributor Author

jakejh commented Jul 13, 2020

I like the minimal redundancy. I initially considered something similar, but didn't know if calling lapply twice could slow the function down. Now that I've actually tested it though, the timings are virtually identical, at least on my xml files.

Looks like you have this under control. I'm sure you can add documentation and unit tests faster than I can, but let me know how I can help.

@jimhester
Copy link
Member

If you want this feature please open a PR including the documentation and unit tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants