Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gt Word escaping potentially invalid chars #1016

Merged
merged 17 commits into from Aug 23, 2022

Conversation

thebioengineer
Copy link
Collaborator

Summary

Thank you for contributing to gt! To make this process easier for everyone, please explain the context and purpose of your contribution. Also, list the changes made to the existing code or documentation.

I found that if there were carrots (>,<) in the text of cells, it would cause the document not to render correctly. Adding this in escapes this and other potentially impactful symbols using htmltools::htmlEscape. If you can come up with something else I am all ears, because this will prevent people from passing their own valid ooxml code through gt to word.

Also realized the code to add a summary rows added the bottom row lines after the first row.

Related GitHub Issues and PRs

  • Ref: # N/A

Checklist

@rich-iannone rich-iannone self-requested a review August 22, 2022 18:45
@thebioengineer
Copy link
Collaborator Author

Hey @rich-iannone , I'd appreciate your thoughts on using htmlEscape vs a more targeted solution

@rich-iannone
Copy link
Member

@thebioengineer I think using htmlEscape() here is the right choice.

I've lightly re-styled the .R file and merged in the dev branch. However, from that, a test is failing. On my machine, anyway. Could you pull from the changes here, make some manual tests, and fix if necessary? Seems fine on CI so I wonder if one of my packages is out of date. (I'll keep trying to figure this out in the meantime.)

@thebioengineer
Copy link
Collaborator Author

I think you are getting a failing test locally but not on CI because there are skips of tests on CI and CRAN, but not locally.

There is now a bug in that because cells have footnote tags added before they are processed by xml_t(). Since the footnote tags are written in as a string representing xml code, the string is escaped as text. I need to think a moment on how to fix this

@thebioengineer
Copy link
Collaborator Author

I had an idea...there is the process_text() function that blindly returns text (or processes the markdown text to then normal text). I can apply the htmlEscape there and that is honestly probably a better location than in xml_t() - which makes more sense to not modify text

@rich-iannone
Copy link
Member

Nice! I'll check out the branch and test locally here as well.

@rich-iannone
Copy link
Member

Perhaps it hasn't been addressed yet (or maybe it's different on your system) but I get a test error with this statement (line 749 of test-as_word.R):

  expect_equal(
    docx_table_body_header %>%
      xml2::xml_find_all(".//w:p") %>%
      xml2::xml_text(),
    c(
      "numtrue1false", "chartrue2false", "fctr", "date", "time",
      "datetime", "currency", "row", "group"
    )
  )

Error output is:

Failure (test-as_word.R:749:3): tables with footnotes can be added to a word doc
docx_table_body_header %>% xml2::xml_find_all(".//w:p") %>% xml2::xml_text() (`actual`) not equal to c(...) (`expected`).

actual[1:5] vs expected[1:5]
- "num<w:rPr><w:vertAlign w:val=\"superscript\"></w:vertAlign><w:i>true</w:i><w:t xml:space=\"default\">1</w:t><w:i>false</w:i><w:vertAlign w:val=\"baseline\"></w:vertAlign></w:rPr>"
+ "numtrue1false"
- "char<w:rPr><w:vertAlign w:val=\"superscript\"></w:vertAlign><w:i>true</w:i><w:t xml:space=\"default\">2</w:t><w:i>false</w:i><w:vertAlign w:val=\"baseline\"></w:vertAlign></w:rPr>"
+ "chartrue2false"
  "fctr"
  "date"
  "time"

@thebioengineer
Copy link
Collaborator Author

Perhaps it hasn't been addressed yet (or maybe it's different on your system) but I get a test error with this statement (line 749 of test-as_word.R):

  expect_equal(
    docx_table_body_header %>%
      xml2::xml_find_all(".//w:p") %>%
      xml2::xml_text(),
    c(
      "numtrue1false", "chartrue2false", "fctr", "date", "time",
      "datetime", "currency", "row", "group"
    )
  )

Error output is:

Failure (test-as_word.R:749:3): tables with footnotes can be added to a word doc
docx_table_body_header %>% xml2::xml_find_all(".//w:p") %>% xml2::xml_text() (`actual`) not equal to c(...) (`expected`).

actual[1:5] vs expected[1:5]
- "num<w:rPr><w:vertAlign w:val=\"superscript\"></w:vertAlign><w:i>true</w:i><w:t xml:space=\"default\">1</w:t><w:i>false</w:i><w:vertAlign w:val=\"baseline\"></w:vertAlign></w:rPr>"
+ "numtrue1false"
- "char<w:rPr><w:vertAlign w:val=\"superscript\"></w:vertAlign><w:i>true</w:i><w:t xml:space=\"default\">2</w:t><w:i>false</w:i><w:vertAlign w:val=\"baseline\"></w:vertAlign></w:rPr>"
+ "chartrue2false"
  "fctr"
  "date"
  "time"

Yes, this should be addressed now...it was because of htmlEscape being applied at the xml_t() location, instead of escaping the text earlier.

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@rich-iannone rich-iannone merged commit 4a819ed into rstudio:master Aug 23, 2022
@thebioengineer thebioengineer deleted the gt_word_bug_fixes branch November 16, 2022 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants