Feature request: sep argument for xml_text #152

Closed
izahn opened this issue Dec 5, 2016 · 4 comments

izahn commented Dec 5, 2016

xml_text extracts the text from a node and all of its child nodes. That is very convenient. It would be even more convenient if there were a way to insert a separator between the text retrieved from sibling or child nodes.

Currently

library(xml2)

x <- read_xml("<ul><li>this is item one</li><li>here comes item two</li></ul>")
xml_text(x)

produces
# [1] "this is item onehere comes item two"

It would be really nice if xml_text had a sep argument, so we could say (e.g.)

xml_text(x, sep = "\n")

rather than

paste(sapply(xml_children(x), xml_text), collapse = "\n")
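
Until something like a sep argument exists, the workaround above can be wrapped in a small helper. The following is a minimal sketch, not part of xml2: the name xml_text_sep and its sep parameter are hypothetical. It relies on xml_text() being vectorised over a nodeset, so no explicit sapply() is needed. It only joins direct element children, since xml_children() skips bare text nodes; xml_contents() might be a better fit if text sitting directly inside the parent should also be kept.

```r
library(xml2)

# Hypothetical helper (not an xml2 function): join the text of each
# direct child element of `node`, separated by `sep`.
xml_text_sep <- function(node, sep = "\n") {
  # xml_text() accepts a nodeset and returns one string per node
  parts <- xml_text(xml_children(node))
  paste(parts, collapse = sep)
}

x <- read_xml("<ul><li>this is item one</li><li>here comes item two</li></ul>")
xml_text_sep(x)
# [1] "this is item one\nhere comes item two"
```
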

rentrop commented Dec 12, 2016

+1 This would be great. As a quick fix you could use my solution presented in tidyverse/rvest#175.

Using this on your example results in:

> html_text_collapse(x)
#[1] "this is item one\nhere comes item two"

Member

hadley commented Dec 12, 2016

This feels out of scope for xml2 to me. The job of xml2 is to give you low-level access to the xml tree, and this doesn't feel like that to me.

hadley closed this as completed Dec 12, 2016
Author

izahn commented Dec 20, 2016

It makes sense that this is out of scope for the xml2 package, but... Where is the high-level user-friendly xml parsing package? Would this feature request be welcomed for rvest::html_text?

Member

hadley commented Dec 20, 2016

Yes, it would be more suitable for rvest, but it seems quite special purpose to me, and doing the operation by hand is quite easy and looks natural in a pipe:

library(xml2)   # read_xml(), xml_find_all(), xml_text()
library(rvest)
library(purrr)  # map_chr()

x <- read_xml("<root>
  <a><b>1</b> <b>2</b></a>
  <a><b>3</b></a>
</root>")

x %>% 
  xml_find_all("a") %>% 
  map_chr(. %>% xml_find_all("b") %>% xml_text() %>% paste(collapse = ", "))
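
If the same pattern is needed in several places, the pipe above could be folded into a function. This is only a sketch under assumptions: collapse_text_by and its parameters group_xpath, item_xpath and sep are hypothetical names, not part of xml2, rvest or purrr. It returns one collapsed string per node matched by group_xpath.

```r
library(xml2)
library(purrr)

# Hypothetical helper (not part of xml2 or rvest): for every node matched
# by `group_xpath`, collapse the text of its `item_xpath` matches.
collapse_text_by <- function(doc, group_xpath, item_xpath, sep = ", ") {
  groups <- xml_find_all(doc, group_xpath)
  map_chr(groups, function(group) {
    items <- xml_find_all(group, item_xpath)
    paste(xml_text(items), collapse = sep)
  })
}

x <- read_xml("<root>
  <a><b>1</b> <b>2</b></a>
  <a><b>3</b></a>
</root>")

collapse_text_by(x, "a", "b")
# [1] "1, 2" "3"
```

Keeping the separator as an argument of the helper, rather than of xml_text itself, matches the view expressed in this thread that xml2 should stay a low-level interface to the tree.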
