
encoding of XML created de novo #142

Closed
jennybc opened this Issue Oct 27, 2016 · 2 comments

jennybc (Member) commented Oct 27, 2016

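When I create XML de novo, the encoding is not explicitly set to UTF-8 when I send the document through as.character() or write_xml(), even though the text I'm putting into the nodes is UTF-8. As a result, non-ASCII characters are written as character references (e.g. &#xE1;) instead of literally. Am I missing a way to set it? The encoding declaration is added automatically when the document is created with read_xml().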

```r
library(xml2)

xml <- xml_new_document() %>%
  xml_add_child("root", name = "people") %>%
  xml_root()
Encoding(gabor <- "Gábor Csárdi")
#> [1] "UTF-8"
Encoding(maelle <- "Maëlle Salmon")
#> [1] "UTF-8"
xml_add_child(xml, "person", "Gábor Csárdi")
#> {xml_node}
#> <person>
y <- xml_add_child(xml, "person")
xml_text(y) <- maelle
xml
#> {xml_document}
#> <root name="people">
#> [1] <person>Gábor Csárdi</person>
#> [2] <person>Maëlle Salmon</person>
as.character(xml)
#> [1] "<?xml version=\"1.0\"?>\n<root name=\"people\"><person>G&#xE1;bor Cs&#xE1;rdi</person><person>Ma&#xEB;lle Salmon</person></root>\n"
xml3 <- read_xml("<root name = 'people'><person>Gábor Csárdi</person><person>Maëlle Salmon</person></root>")
as.character(xml3)
#> [1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root name=\"people\"><person>Gábor Csárdi</person><person>Maëlle Salmon</person></root>\n"
```
jennybc (Member) commented Oct 27, 2016

A workaround is to create the document and root node with read_xml(), then add child nodes as above.

```r
library(xml2)
xml <- read_xml("<people></people>")
xml_add_child(xml, "person", "Gábor Csárdi")
#> {xml_node}
#> <person>
xml_add_child(xml, "person", "Maëlle Salmon")
#> {xml_node}
#> <person>
as.character(xml)
#> [1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<people><person>Gábor Csárdi</person><person>Maëlle Salmon</person></people>\n"
```

jennybc added a commit to jennybc/xml2 that referenced this issue Oct 27, 2016

hadley (Member) commented Dec 12, 2016

The default encoding of an XML document is UTF-8, so I wouldn't have thought you needed to specify it. @jimhester, maybe we just need to tell xml2? I'm fine with making it difficult to create XML files that are not UTF-8 encoded.
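For reference, here is a minimal sketch of building the same document de novo while stating the encoding explicitly. It assumes an xml2 release in which `xml_new_document()` accepts an `encoding` argument (check `?xml_new_document` in your installed version). It is not meant as a description of the change that closed this issue.

```r
library(xml2)

# Assumption: this xml2 version's xml_new_document() takes an `encoding`
# argument. Create an empty document that declares UTF-8.
doc <- xml_new_document(version = "1.0", encoding = "UTF-8")

# Adding a child to a document that has no root makes that child the root.
# The document is modified in place, so fetch the root node afterwards.
xml_add_child(doc, "people")
root <- xml_root(doc)

# Non-ASCII text; \u escapes keep this script itself ASCII-only.
xml_add_child(root, "person", "G\u00e1bor Cs\u00e1rdi")
xml_add_child(root, "person", "Ma\u00eblle Salmon")

# Serialise and print; the declaration should carry encoding="UTF-8".
out <- as.character(doc)
cat(out)
```

If `encoding = "UTF-8"` is already the default in your version, passing it explicitly is harmless and documents the intent.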

jimhester closed this in b04ab24 Dec 15, 2016
