xml_find_first de-duplicates results -- regression? #194

Closed
nightroman opened this Issue Oct 19, 2017 · 3 comments


nightroman commented Oct 19, 2017

Issue Description and Expected Result

xml_find_first appears to de-duplicate its results, although the help says
"The output is always the same size as the input":

xml_find_first returns a node if applied to a node, and a nodeset if applied
to a nodeset. The output is always the same size as the input. If there are
no matches, xml_find_first will return a missing node; if there are multiple
matches, it will return the first only.

Reproducible Example

library(xml2)

x <- read_xml(
"
<root>
<p name='p1'>
  <v name='v1'/>
  <v name='v2'/>
</p>
<p name='p2'>
  <v name='v1'/>
  <v name='v2'/>
</p>
</root>
"
)

# List of 4, OK
v <- xml_find_all(x, "/root/p/v")

# List of 2, expected 4
p <- xml_find_first(v, "..")

p is expected to have 4 nodes, the same as the input v.
Instead it has 2, i.e. the result is de-duplicated.
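A possible workaround, sketched here under the assumption that only an attribute of each parent is needed: look the parent up one node at a time with xml_parent, so the result never passes through nodeset construction and keeps one entry per input node. The variable name parent_names is illustrative only.

```r
library(xml2)

x <- read_xml("
<root>
<p name='p1'>
  <v name='v1'/>
  <v name='v2'/>
</p>
<p name='p2'>
  <v name='v1'/>
  <v name='v2'/>
</p>
</root>
")

v <- xml_find_all(x, "/root/p/v")

# Resolve the parent of each node individually and read its name attribute.
# vapply returns exactly one value per element of v.
parent_names <- vapply(
  v,
  function(node) xml_attr(xml_parent(node), "name"),
  character(1)
)

print(parent_names)
```

This yields a character vector aligned with v rather than a nodeset, which is usually enough when the parents are only needed for their attributes.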


nightroman commented Oct 19, 2017

Here is the code that used to work with an older version of xml2:

library(xml2)

x <- read_xml("NuGet.xml")
logs <- xml_find_all(x, "/*/package/version/log")

day <- Sys.Date() - 60
logs <- logs[as.Date(xml_attr(logs, "d")) > day]
version <- xml_find_one(logs, "..")
package <- xml_find_one(version, "..")

ds <- data.frame(
    Name = xml_attr(package, "name"),
    Version = xml_attr(version, "name"),
    Date = as.Date(xml_attr(logs, "d")),
    Count = sapply(
        as.numeric(xml_attr(logs, "n")),
        function(x) { if (x <= 8) 3 else log(x, 2) }
    )
)

With the latest xml2 it fails:

Error in data.frame(Name = xml_attr(package, "name"), Version = xml_attr(version,  : 
  arguments imply differing number of rows: 20, 35, 465

because version and package are unexpectedly de-duplicated.
They used to contain the same number of items as logs.

The code uses the obsolete xml_find_one. Replacing it with the newer xml_find_first gives the same result.
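In the meantime, the data frame can be built with per-node ancestor lookups, so every column has one row per log entry. This is only a sketch: the inline XML is a hypothetical stand-in for NuGet.xml (its layout is guessed from the XPath and attribute names above), and ancestor_attr is a helper made up for this example, not part of xml2.

```r
library(xml2)

# Hypothetical stand-in for NuGet.xml; the real file's structure is assumed.
x <- read_xml("
<packages>
  <package name='A'>
    <version name='1.0'>
      <log d='2017-10-01' n='4'/>
      <log d='2017-10-02' n='16'/>
    </version>
  </package>
  <package name='B'>
    <version name='2.0'>
      <log d='2017-10-03' n='32'/>
    </version>
  </package>
</packages>
")

logs <- xml_find_all(x, "/*/package/version/log")

# Hypothetical helper: walk `levels` steps up from each node separately and
# read one attribute, returning exactly one value per input node.
ancestor_attr <- function(nodes, levels, attr) {
  vapply(nodes, function(node) {
    for (i in seq_len(levels)) node <- xml_parent(node)
    xml_attr(node, attr)
  }, character(1))
}

ds <- data.frame(
  Name = ancestor_attr(logs, 2, "name"),    # package is two levels up
  Version = ancestor_attr(logs, 1, "name"), # version is one level up
  Date = as.Date(xml_attr(logs, "d")),
  Count = vapply(
    as.numeric(xml_attr(logs, "n")),
    function(n) if (n <= 8) 3 else log(n, 2),
    numeric(1)
  )
)

print(ds)
```

The date filter from the original code is left out to keep the sketch short; it can be applied to logs before the data frame is built, exactly as before.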

@nightroman nightroman changed the title from xml_find_first de-duplicates results? to xml_find_first de-duplicates results -- regression? Oct 19, 2017


nightroman commented Nov 3, 2017

Somewhat related SO question: https://stackoverflow.com/q/46435629/323582
OP cannot find an effective way to get node parents without de-duplication.

@jimhester jimhester closed this in 9409d38 Jan 4, 2018


nightroman commented Jan 4, 2018

Thank you!

How soon is the package usually released after a fix?

Question: is xml_parent(x) considered equivalent to xml_find_first(x, "..")? If yes, a similar fix is probably needed there as well. If not, the difference is perhaps worth mentioning in the manual.
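For reference, a small check that compares the two calls on the same input. What it prints depends on the installed xml2 version, so no expected output is given here; the name lengths_by_call is illustrative only.

```r
library(xml2)

x <- read_xml(
  "<root><p name='p1'><v/><v/></p><p name='p2'><v/><v/></p></root>"
)
v <- xml_find_all(x, "/root/p/v")

# Compare the size of the input with the size of each result.
# If a call de-duplicates, its length will be smaller than the input's.
lengths_by_call <- c(
  input = length(v),
  find_first = length(xml_find_first(v, "..")),
  parent = length(xml_parent(v))
)

print(lengths_by_call)
```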
