encoding of XML created de novo #142

Closed
jennybc opened this issue Oct 27, 2016 · 2 comments

Comments

@jennybc
Member

jennybc commented Oct 27, 2016

When I create XML de novo, the encoding is not explicitly declared as UTF-8 when I serialize the document with as.character() or write_xml(), even though the text I'm putting into the nodes is UTF-8. Am I missing a way to set it? The declaration appears automatically when the XML is created by read_xml().

library(xml2)

xml <- xml_new_document() %>%
  xml_add_child("root", name = "people") %>%
  xml_root()
Encoding(gabor <- "Gábor Csárdi")
#> [1] "UTF-8"
Encoding(maelle <- "Maëlle Salmon")
#> [1] "UTF-8"
xml_add_child(xml, "person", "Gábor Csárdi")
#> {xml_node}
#> <person>
y <- xml_add_child(xml, "person")
xml_text(y) <- maelle
xml
#> {xml_document}
#> <root name="people">
#> [1] <person>Gábor Csárdi</person>
#> [2] <person>Maëlle Salmon</person>
as.character(xml)
#> [1] "<?xml version=\"1.0\"?>\n<root name=\"people\"><person>G&#xE1;bor Cs&#xE1;rdi</person><person>Ma&#xEB;lle Salmon</person></root>\n"
xml3 <- read_xml("<root name = 'people'><person>Gábor Csárdi</person><person>Maëlle Salmon</person></root>")
as.character(xml3)
#> [1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root name=\"people\"><person>Gábor Csárdi</person><person>Maëlle Salmon</person></root>\n"
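Note that the de novo output is escaped rather than corrupted: `&#xE1;` is a hexadecimal numeric character reference for U+00E1 (á), and `&#xEB;` is U+00EB (ë). As a rough illustration, here is a small base R sketch that turns such references back into characters; `decode_hex_refs()` is a hypothetical helper for this comment, not part of xml2:

```r
# Hypothetical helper (not an xml2 function): replace hexadecimal numeric
# character references such as "&#xE1;" with the characters they denote.
decode_hex_refs <- function(x) {
  # Locate every "&#x...;" reference in the (single) input string
  m <- gregexpr("&#x[0-9A-Fa-f]+;", x)
  refs <- regmatches(x, m)[[1]]
  # Strip the leading "&#x" and trailing ";" and convert the hex code point
  chars <- vapply(refs, function(r) {
    intToUtf8(strtoi(substr(r, 4, nchar(r) - 1), 16L))
  }, character(1))
  # Substitute each reference with its decoded character
  regmatches(x, m) <- list(chars)
  x
}

decode_hex_refs("<person>G&#xE1;bor Cs&#xE1;rdi</person>")
```

So the difference between the two outputs above is only whether the serializer declares UTF-8 and writes the characters directly, or omits the declaration and escapes non-ASCII characters.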
@jennybc
Member Author

jennybc commented Oct 27, 2016

A workaround is to create the document and root node with read_xml(), then add child nodes as above.

library(xml2)
xml <- read_xml("<people></people>")
xml_add_child(xml, "person", "Gábor Csárdi")
#> {xml_node}
#> <person>
xml_add_child(xml, "person", "Maëlle Salmon")
#> {xml_node}
#> <person>
as.character(xml)
#> [1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<people><person>Gábor Csárdi</person><person>Maëlle Salmon</person></people>\n"
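To check which serialization carries the explicit declaration, one could test the string returned by as.character() with a regular expression. This is a base R sketch; `has_utf8_declaration()` is a hypothetical helper, and the pattern only inspects a leading `<?xml ...?>` declaration:

```r
# Hypothetical helper: TRUE when a serialized XML string starts with an
# XML declaration whose encoding pseudo-attribute is UTF-8 (either quote style).
has_utf8_declaration <- function(s) {
  grepl("^<\\?xml[^>]*encoding=[\"']UTF-8[\"']", s, ignore.case = TRUE)
}

has_utf8_declaration(c(
  "<?xml version=\"1.0\"?>\n<root name=\"people\"/>",
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<people/>"
))
#> [1] FALSE  TRUE
```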

@hadley
Member

hadley commented Dec 12, 2016

The default encoding of an XML document is UTF-8, so I wouldn't have thought you needed to specify it. @jimhester maybe we just need to tell xml2? I'm fine with making it difficult to create XML files that are not UTF-8 encoded.
