New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attach the namespace to the xml_document when it is created #89

Merged
merged 14 commits into from May 26, 2016

Conversation

Projects
None yet
4 participants
@jimhester
Member

jimhester commented May 18, 2016

Fixes #28

Needs some tests still.

Also I feel this might be confusing if a XPath query works on the document, but then doesn't work on the xml_nodeset returned.

We could pass the namespace object down to the nodeset objects as well, which might be the way to go...

@hadley

This comment has been minimized.

Member

hadley commented May 18, 2016

We could also change the default to xml_ns(document_root(x)) - then you would be stuck using the default prefixes, but that might not be so bad when combined with #35.

@jennybc do you have any use cases where you don't want to use the default prefixes?

@jimhester

This comment has been minimized.

Member

jimhester commented May 18, 2016

Yeah I like a default of xml_ns(xml_root(x)), will change.

@jennybc

This comment has been minimized.

Member

jennybc commented May 18, 2016

No, I almost always use the default prefixes.

I see that I do make some use of xml2::xml_ns_rename() to give a new alias to the default namespace. I think that's due to something annoying about the Sheets API?

I don't understand why attaching the namespace to the xml document means we're stuck with the default prefixes? Can we not still rename them or add new prefixes?

I just know that this is what every call to xml_find_XXX() I have ever written looks like:

xml2::xml_find_XXX(x, xpath, ns = xml2::xml_ns(x))

and I wish it was implied that, until I say otherwise, I want to query x with its own namespace.

@hadley

This comment has been minimized.

Member

hadley commented May 21, 2016

@jennybc what I'm thinking is that xml_find_xxx would get a default value of ns = xml_ns(x). If you're happy with the default prefixes, that'd mean you could just do xml_find_XXX(x, xpath) everywhere. If you didn't like the default prefixes, you'd still need to do xml_find_XXX(x, xpath, ns = my_ns). (In other words, this will make your life much easier if you rely on the default prefixes, but it won't get better if you're customising them).

Does that sound reasonable? Or do you want a system for modifying the prefixes for a document once and for all? (that's possible but would be a decent amount of work so I'd rather avoid if the proposed compromise works for you)

@@ -62,23 +62,23 @@
#' ')
#' xml_find_all(x, ".//f:doc")
#' xml_find_all(x, ".//f:doc", xml_ns(x))
xml_find_all <- function(x, xpath, ns = character()) {
xml_find_all <- function(x, xpath, ns = xml_ns(xml_root(x))) {

This comment has been minimized.

@hadley

hadley May 21, 2016

Member

I think this would be better as ns = xml_ns(x), i.e. the logic for finding the root should be in xml_ns() (so we should probably make it generic)

@jennybc

This comment has been minimized.

Member

jennybc commented May 21, 2016

The proposed solution makes sense and would eliminate a great deal of fussiness. That alone would be awesome.

I'm pretty sure the stuff in this gist is not on the table here, but I'll link just in case. It definitely would require something along the lines of 'modifying the prefixes for a document once and for all'.

@hadley

This comment has been minimized.

Member

hadley commented May 22, 2016

Hmmmmm, @jimhester how hard would it be to write a function that simply removes all the default namespace specifications from the tree?

(@jennybc I think that gist is slightly too simple since you're talking about a "a" default namespace when in fact there might be multiple, but I suspect that's unlikely to arise very often in practice. Simply stripping all default namespaces seems like a nice hack that will help most people).

@jennybc

This comment has been minimized.

Member

jennybc commented May 22, 2016

Oh, I have never had the pleasure of multiple default namespaces. Something to look forward to! Does that change anything re: the safety or feasibility of removing them? One risk I thought of: people can puzzle themselves if they write this de-namespaced XML back out, when it should have the same namespaces as the original. E.g. mutating XML coming from / going back to an API.

That said, if there was a way to make default namespaces disappear, I would gladly sign some sort of waiver.

@hadley

This comment has been minimized.

Member

hadley commented May 22, 2016

Namespaces were designed to let you combine xml documents where (say) in one document <board> refers to a plank of wood, and in another <board> refers to the people running the company. If you have a document that where "board" means different things in different parts of the document, you must use namespaces as using the name of the node is not enough (that's why xpath works the way it does - it's being excessively cautious so that you never accidentally confuse same-named nodes). However, I don't think this actually crops up very often in practice (and you'd normally be able to work around it by specifying a parent node), so I think for 95% of uses, wiping out the namespaces would make life much easier with little risk of problems.

@jimhester

This comment has been minimized.

Member

jimhester commented May 23, 2016

Yes this would be possible. There are functions to remove the namespace declaration node and then reconcile the rest of the namespaces on the tree.

If you wanted to include the namespace at the end you would have to add it back manually.

I will work on it!

@jennybc

This comment has been minimized.

Member

jennybc commented May 23, 2016

@jimhester Pardon me if I misunderstand. You will remove all namespaces? Or just the default one? My frustration is focused only on the default one(s). The kind that has no associated prefix (yet requires user to create a prefix in order to write XPath queries).

@jimhester

This comment has been minimized.

Member

jimhester commented May 23, 2016

@jennybc Well xml2 isn't going to do anything by default. But we should support modifying namespace definitions including removing them, so you will be able to remove the default namespace from a node if you would prefer that behavior.

@jimhester

This comment has been minimized.

Member

jimhester commented May 24, 2016

As of c2e65a4 you can now add and remove namespaces.

x <- read_xml('
<catalog xmlns="http://www.edankert.com/examples/">
  <cd>
    <artist>Sufjan Stevens</artist>
    <title>Illinois</title>
    <src>http://www.sufjan.com/</src>
  </cd>
  <cd>
    <artist>Stoat</artist>
    <title>Future come and get me</title>
    <src>http://www.stoatmusic.com/</src>
  </cd>
  <cd>
    <artist>The White Stripes</artist>
    <title>Get behind me satan</title>
    <src>http://www.whitestripes.com/</src>
  </cd>
</catalog>')

old <- as.character(x)

xml_attr(x, "xmlns") <- NULL

xml_find_all(x, "//cd")
#> {xml_nodeset (3)}
#> [1] <cd>\n  <artist>Sufjan Stevens</artist>\n  <title>Illinois</title>\n ...
#> [2] <cd>\n  <artist>Stoat</artist>\n  <title>Future come and get me</tit ...
#> [3] <cd>\n  <artist>The White Stripes</artist>\n  <title>Get behind me s ...

xml_attr(x, "xmlns") <- "http://www.edankert.com/examples/"

new <- as.character(x)

identical(old, new)
#> [1] TRUE

There is an error on travis which only seems to occur when running R CMD check on a similar machine to travis. I can reproduce it using a vagrant VM but not when running the tests interactively or using R CMD BATCH.

@hadley

This comment has been minimized.

Member

hadley commented May 24, 2016

Could we maybe have xml_strip_ns() too?

@jimhester

This comment has been minimized.

Member

jimhester commented May 24, 2016

Does xml_string_ns() strip only default namespaces or all namespaces and does it work only on the root node or does it need to search all children for namespace declarations as well?

Assign substring to a variable
This should ensure it sticks around until it is used
@hadley

This comment has been minimized.

Member

hadley commented May 25, 2016

It would recursively strip default namespaces - I think that's the most useful case.

@jimhester

This comment has been minimized.

Member

jimhester commented May 25, 2016

As of 2a2ced5 you can now use xml_strip_ns() to strip all of the default namespaces from a document.

  x <- read_xml(system.file(package = "xml2", "tests/testthat/ns-multiple-default.xml"))
  x
#> {xml_document}
#> <root>
#> [1] <doc1 xmlns="http://foo.com">\n  <bar/>\n</doc1>
#> [2] <doc2 xmlns="http://bar.com">\n  <bar/>\n</doc2>
  ns <- unclass(xml_ns(x))
  expect_equal(ns, c(d1 = "http://foo.com", d2 = "http://bar.com"))
  expect_equal(length(xml_find_all(x, "//bar")), 0)

  xml_strip_ns(x)
  x
#> {xml_document}
#> <root>
#> [1] <doc1>\n  <bar/>\n</doc1>
#> [2] <doc2>\n  <bar/>\n</doc2>
  ns <- unclass(xml_ns(x))

  expect_equivalent(ns, character())
  expect_equal(length(xml_find_all(x, "//bar")), 2)
@codecov-io

This comment has been minimized.

codecov-io commented May 25, 2016

Current coverage is 59.48%

Merging #89 into master will increase coverage by 1.45%

@@             master        #89   diff @@
==========================================
  Files            30         30          
  Lines          1284       1345    +61   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            745        800    +55   
- Misses          539        545     +6   
  Partials          0          0          

Powered by Codecov. Last updated by 248e9fb...2e2c681

#' Strip the default namespaces from a document
#'
#' @inheritParams xml_name

This comment has been minimized.

@hadley

hadley May 25, 2016

Member

Missing @export

@hadley

This comment has been minimized.

Member

hadley commented May 25, 2016

  • Need to cross-reference xml_ns_remove() from the xpath functions
  • xml_ns_remove() needs a couple of examples
  • Needs bullet point in NEWS.md
@jennybc

This comment has been minimized.

Member

jennybc commented May 25, 2016

When I use xml2 from this PR, read_xml() prints all my XML to screen.

@jimhester

This comment has been minimized.

Member

jimhester commented May 26, 2016

@jennybc can you give an example of the output you are seeing and what you were expecting to see?

@jennybc

This comment has been minimized.

Member

jennybc commented May 26, 2016

@jimhester See this gist. I was expecting to see nothing, which is what happens when I install xml2 from this repo, but not this PR.

jimhester added some commits May 26, 2016

Remove str.xml_node
As it prints debugging output on Windows when reading files.
@jimhester

This comment has been minimized.

Member

jimhester commented May 26, 2016

@jennybc This behavior should be gone with 44c89cf. I had enabled a debug flag to include the debug file to dump output for str.xml_node, but doing so also turns on node dumping results when you read a file.

Interestingly I could only reproduce your results on Windows, perhaps because it uses a newer version of libxml2 by default than I have on my OS X machine.

Anyway I removed that code, so this should now be fixed.

@jimhester jimhester merged commit c93169a into r-lib:master May 26, 2016

3 checks passed

codecov/patch 77.34% of diff hit (target 58.02%)
Details
codecov/project 59.48% (+1.45%) compared to 248e9fb
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jennybc

This comment has been minimized.

Member

jennybc commented May 26, 2016

Yes, it's fixed! I upgraded to Xcode 7.3.1 earlier this month, in case that sheds any light.

@jennybc

This comment has been minimized.

Member

jennybc commented May 26, 2016

The default namespace now comes along when you use xml_attrs(), which breaks some things for me. Is that how this is supposed to work? Should I open a new issue about this?

@jennybc

This comment has been minimized.

Member

jennybc commented May 27, 2016

Example

library(xml2)
x <- read_xml('
  <root xmlns="http://default.com" count="80" uniqueCount="33">
    <item>one</item>
    <item>two</item>
  </root>')

With 248e9fb

xml_attrs(x)
#> count uniqueCount 
#> "80"        "33" 
as.integer(xml_attrs(x))
#> [1] 80 33

With c93169a

xml_attrs(x)
#>                count          uniqueCount                xmlns 
#>                 "80"                 "33" "http://default.com"
as.integer(xml_attrs(x))
#> Warning: NAs introduced by coercion
#> [1] 80 33 NA
@jimhester

This comment has been minimized.

Member

jimhester commented May 27, 2016

Returning the namespace definitions as an attribute is how it is supposed to be working.

@hadley

This comment has been minimized.

Member

hadley commented May 27, 2016

In other words, it was a bug that they weren't returned previously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment