Problem with nested entities in xml2 #241

Closed
HanOostdijk opened this issue Nov 26, 2018 · 0 comments

When I try to handle DTD entities in xml2, my R session (in both RGui and RStudio) consistently aborts. In RStudio (1.2.1114) a dialog window appears with the message "R Session Aborted: R encountered a fatal error. The session was terminated." and a "Start New Session" button.

Below is a session in which I:

  • define a DTD and an XML example file.
  • read the XML with package XML, which works as expected: field1 contains 'Han Oostdijk Quantitative Consultancy', i.e. the resolved entity hoqc, including the nested entity author defined in the DTD.
  • do the same with package xml2. I have commented out the line c1 = xml2::xml_find_all(w2,"//field1") because this is the line that causes the R session to abort.
  • use package xml2 but replace the XML line containing the entity hoqc with <field1>a none-entity</field1>. This works as expected and results in the string 'a none-entity'.
  • use package xml2 but replace the XML line containing the entity hoqc with <field1>&author;</field1>. This works as expected and results in the string 'Han Oostdijk'.

My tentative conclusion: xml2 cannot handle nested entities, whereas XML handles them without problems.
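
To make the expected result concrete, the nested expansion can be mimicked in base R. This is only an illustrative sketch: the helper `expand_entities()` is hypothetical and is not part of xml2 or XML. It does plain textual substitution and ignores the rest of the XML entity rules, so it shows what value field1 should have rather than how either package resolves entities.

```r
# Hypothetical helper (not part of xml2 or XML): repeatedly substitute
# "&name;" references until no known entity is left to expand.
expand_entities <- function(text, entities, max_depth = 10L) {
  for (i in seq_len(max_depth)) {
    previous <- text
    for (name in names(entities)) {
      text <- gsub(paste0("&", name, ";"), entities[[name]], text, fixed = TRUE)
    }
    if (identical(text, previous)) break  # nothing changed, so we are done
  }
  text
}

# The two entities from the DTD; hoqc refers to author (the nested case).
entities <- c(
  author = "Han Oostdijk",
  hoqc   = "&author; Quantitative Consultancy"
)

expand_entities("&hoqc;", entities)
#> [1] "Han Oostdijk Quantitative Consultancy"
```

The `max_depth` cap is an assumption added here so that a self-referencing entity cannot loop forever.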

getwd()
#> [1] "D:/data/R/default_working_directory"
dtd_info <- c(
  '<!ENTITY author "Han Oostdijk">',
  '<!ENTITY hoqc "&author; Quantitative Consultancy">',
  "<!ELEMENT records (record+)>",
  "<!ELEMENT record (field1)>",
  "<!ELEMENT field1 (#PCDATA)>"
)
xml_data <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<!DOCTYPE records SYSTEM "records.dtd">',
  "<records>",
  "<record>",
  "<field1>&hoqc;</field1>",
  "</record>",
  "</records>"
)
writeLines(dtd_info, "records.dtd")
xml_data1 <- paste(xml_data, collapse = "\n")
# read with package XML
w1 <- XML::xmlParse(xml_data1, options = c(XML::DTDVALID))
print(w1)
#> <?xml version="1.0" encoding="UTF-8"?>
#> <!DOCTYPE records SYSTEM "records.dtd">
#> <records>
#>   <record>
#>     <field1>&hoqc;</field1>
#>   </record>
#> </records>
#> 
unlist(XML::xpathApply(w1, "//field1", XML::xmlValue))
#> [1] "Han Oostdijk Quantitative Consultancy"
# read with package xml2
w2 <- xml2::read_xml(xml_data1, options = c("DTDVALID"))
print(w2)
#> {xml_document}
#> <records>
#> [1] <record>\n<field1>&hoqc;</field1>\n</record>
######## next line will cause abort of session
# c1 = xml2::xml_find_all(w2,"//field1")
######## previous line will cause abort of session
# replace '&hoqc;' by 'a none-entity'
xml_data2 <- xml_data
xml_data2[5] <- "<field1>a none-entity</field1>"
xml_data2 <- paste(xml_data2, collapse = "\n")
w3 <- xml2::read_xml(xml_data2, options = c("DTDVALID"))
print(w3)
#> {xml_document}
#> <records>
#> [1] <record>\n<field1>a none-entity</field1>\n</record>
c2 <- xml2::xml_find_all(w3, "//field1")
purrr::map_chr(c2, xml2::xml_text)
#> [1] "a none-entity"
# replace '&hoqc;' by '&author;'
xml_data3 <- xml_data
xml_data3[5] <- "<field1>&author;</field1>"
xml_data3 <- paste(xml_data3, collapse = "\n")
w4 <- xml2::read_xml(xml_data3, options = c("DTDVALID"))
print(w4)
#> {xml_document}
#> <records>
#> [1] <record>\n<field1>&author;</field1>\n</record>
c3 <- xml2::xml_find_all(w4, "//field1")
purrr::map_chr(c3, xml2::xml_text)
#> [1] "Han Oostdijk"

Created on 2018-11-26 by the reprex package (v0.2.1)

Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.0 (2018-04-23)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Europe/Berlin               
#>  date     2018-11-26                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib source                     
#>  assertthat    0.2.0      2017-04-11 [1] CRAN (R 3.5.0)             
#>  backports     1.1.2      2017-12-13 [1] CRAN (R 3.5.0)             
#>  callr         2.0.4      2018-05-15 [1] CRAN (R 3.5.1)             
#>  cli           1.0.1      2018-09-25 [1] CRAN (R 3.5.1)             
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.0)             
#>  debugme       1.1.0      2017-10-22 [1] CRAN (R 3.5.0)             
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.0)             
#>  devtools      2.0.1      2018-10-26 [1] CRAN (R 3.5.1)             
#>  digest        0.6.17     2018-09-12 [1] CRAN (R 3.5.0)             
#>  evaluate      0.11       2018-07-17 [1] CRAN (R 3.5.1)             
#>  fs            1.2.6      2018-08-23 [1] CRAN (R 3.5.1)             
#>  glue          1.3.0      2018-07-17 [1] CRAN (R 3.5.1)             
#>  highr         0.7        2018-06-09 [1] CRAN (R 3.5.1)             
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.0)             
#>  knitr         1.20.22    2018-11-25 [1] local                      
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.0)             
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.0)             
#>  pkgbuild      1.0.2      2018-10-16 [1] CRAN (R 3.5.1)             
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.1)             
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.0)             
#>  processx      3.1.0      2018-05-15 [1] CRAN (R 3.5.1)             
#>  purrr         0.2.5      2018-05-29 [1] CRAN (R 3.5.0)             
#>  R6            2.3.0      2018-10-04 [1] CRAN (R 3.5.1)             
#>  Rcpp          1.0.0      2018-11-07 [1] CRAN (R 3.5.1)             
#>  remotes       2.0.1      2018-10-19 [1] CRAN (R 3.5.1)             
#>  rlang         0.3.0.1    2018-10-25 [1] CRAN (R 3.5.1)             
#>  rmarkdown     1.10       2018-06-11 [1] CRAN (R 3.5.0)             
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.0)             
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.1)             
#>  stringi       1.2.4      2018-07-20 [1] CRAN (R 3.5.0)             
#>  stringr       1.3.1      2018-05-10 [1] CRAN (R 3.5.0)             
#>  testthat      2.0.0      2017-12-13 [1] CRAN (R 3.5.0)             
#>  usethis       1.4.0      2018-08-14 [1] CRAN (R 3.5.0)             
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.0)             
#>  xfun          0.3        2018-07-06 [1] CRAN (R 3.5.1)             
#>  XML           3.98-1.16  2018-08-19 [1] CRAN (R 3.5.1)             
#>  xml2          1.2.0.9000 2018-11-17 [1] Github (r-lib/xml2@de9781d)
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)             
#> 
#> [1] D:/tools/R/Packages
#> [2] D:/tools/R/R-3.5.0/library