xml_text does not trim "&nbsp" #151

Closed
rentrop opened this issue Nov 30, 2016 · 3 comments

@rentrop rentrop commented Nov 30, 2016

My HTML contains &nbsp; (i.e. a non-breaking space). If I try to get the text via xml_text(..., trim = TRUE), it returns the non-breaking space instead of an empty string.
Is this a feature? IMHO the expected behavior would be to return an empty string.

Minimal-Example:

require(xml2)
space <- rawToChar(as.raw(c(0xc2, 0xa0)))
doc <- read_xml(paste0('<td style="text-align:left;">', space, '</td>'))
xml_text(doc, trim = TRUE) == "" # FALSE
charToRaw(xml_text(doc, trim = TRUE)) #[1] c2 a0

Workaround:
stringi::stri_trim_both(xml_text(doc, trim = TRUE)) or stringr::str_trim
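A dependency-free variant of the same workaround, as a minimal sketch: base R's trimws() accepts a `whitespace` regular expression (assuming R >= 3.6.0), and "[\\h\\v]" covers horizontal and vertical whitespace including U+00A0. The sample vector `x` is made up for illustration and stands in for the output of xml_text().

```r
# Illustrative input: strings padded with non-breaking spaces (U+00A0),
# standing in for what xml_text() might return.
nbsp <- "\u00a0"
x <- c(paste0(nbsp, "31.12.2010", nbsp), nbsp, "\u20ac\n   ")

# The default trimws() pattern "[ \t\r\n]" does not cover U+00A0;
# "[\\h\\v]" matches horizontal and vertical whitespace, including it.
trimmed <- trimws(x, which = "both", whitespace = "[\\h\\v]")
print(trimmed)
```

This only post-processes the character vector; it does not change what xml_text(..., trim = TRUE) itself returns.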

@rentrop rentrop changed the title from "xml_text does not delete &nbsp" to "xml_text does not trim "&nbsp"" on Dec 1, 2016
@jimhester jimhester closed this in 13ec091 Dec 6, 2016

@rentrop rentrop commented Dec 15, 2016

@jimhester thanks for getting on this so fast. Unfortunately, this fix introduces another bug:

Take this example:

devtools::install_github("hadley/xml2") # Version 1.0.0.9002
require(xml2)
doc <- read_html('<td>31.12.2010&nbsp;<br>€
                 </td>')
text_nodes <- xml_find_all(doc, ".//text()[normalize-space()]")
xml_text(text_nodes, trim = TRUE) # "31.12.2010" ""

So xml_text now removes the €-sign.

The expected result would be:

stringi::stri_trim_both(xml_text(text_nodes)) # "31.12.2010" "€"
@jimhester jimhester reopened this Dec 15, 2016
@jimhester jimhester closed this in 0eaa61c Dec 15, 2016

@jimhester jimhester commented Dec 15, 2016

OK, I refactored how this is done. Thanks for the reproducible example.

@rentrop rentrop commented Dec 15, 2016

Perfect, thanks
