Simplify argument tables in help pages #123
Thanks @jeanchristophe13v, this is definitely a good idea, and the extra table formatting elements are an unfortunate accident!
As you can see from the snapshot that changed, the intention was to have arguments presented as markdown tables, but the problem is that when arguments have descriptions that include more than one paragraph, pandoc can't convert the table into a simple markdown table and instead uses raw HTML.
My preference would be to re-format the arguments table to use headings for each argument, as that will naturally support argument descriptions regardless of content.
The final result should look something like this:
#### Arguments
##### `.x`
A list or atomic vector.
##### `.f`
A function, specified in one of the following ways: ...
(Note that `btw_tool_docs_help_page()` shifts heading levels up by one, so these are h3 and h4 headings in the HTML source.)
Also, I'd prefer that simplify_help_tables be named something like simplify_help_page_arguments() and live in R/tool-docs.R. In that vein, we should also be careful that we're only finding the arguments table in the arguments sub-section and we shouldn't modify any other tables.
Thanks for the feedback! I tested my original flat format with purrr::map's .f argument and found it was merging all the bullet points into one paragraph. I read some papers and confirmed presenting options as bullet points generally outperforms using plain descriptions[1].
I compared three approaches:
- Current flat format (~339 tokens) - but loses list structure
- Improved flat format (~341 tokens) - preserves paragraphs but still flattens lists
- Your heading format (~415 tokens) - preserves everything
The heading format uses about 22% more tokens, but I think it's worth it to keep those bullet points intact.
Changes made:
- Renamed `simplify_help_tables()` → `simplify_help_page_arguments()` and moved it into the `R/tool-docs.R` context
- Now only targets the Arguments section (uses `//h3[normalize-space(text())='Arguments']` since R help HTML doesn't set id attributes)
- Uses `<h3>` tags that become `####` after the shift
- Preserves full HTML structure including lists, multiple paragraphs, code blocks, etc.
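As a rough illustration of the changes listed above, here is a minimal sketch of the heading-based conversion. It is not the code from this PR: the sample HTML is a simplified, hypothetical stand-in for what `tools::Rd2HTML()` emits, and the use of `<h4>` for each argument name is an assumption based on the h3/h4 note earlier in the thread.

```r
library(xml2)

# Hypothetical, simplified stand-in for a help page; real pages carry more markup.
html <- '
<html><body>
<h3>Arguments</h3>
<table role="presentation">
<tr><td><code>.x</code></td><td><p>A list or atomic vector.</p></td></tr>
<tr><td><code>.f</code></td><td><p>A function, specified in one of the following ways:</p>
<ul><li><p>A named function.</p></li><li><p>An anonymous function.</p></li></ul></td></tr>
</table>
<h3>Value</h3>
<table role="presentation"><tr><td>other</td><td>left alone</td></tr></table>
</body></html>'

doc <- read_html(html)

# Select only the first table that follows the "Arguments" heading,
# so tables in other sections are never touched.
args_table <- xml_find_first(
  doc,
  "//h3[normalize-space(text())='Arguments']/following-sibling::table[1]"
)

# Turn each row into a heading (argument name) followed by the original
# description markup, so lists and paragraphs survive as HTML.
rows <- xml_find_all(args_table, ".//tr")
sections <- vapply(rows, function(row) {
  cells <- xml_find_all(row, "./td")
  name <- xml_text(cells[[1]])
  body <- paste(as.character(xml_contents(cells[[2]])), collapse = "")
  sprintf("<h4><code>%s</code></h4>%s", name, body)
}, character(1))

cat(sections, sep = "\n")
```

The sketch stops at building the replacement markup as strings; splicing it back into the document in place of the table is left out. Note that `following-sibling::table[1]` would pick up a later section's table if the Arguments section had none, so a real implementation would need a stricter check.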
All tests pass now. Let me know if there's anything else that needs adjusting.
[1] Han, Y., Wu, Y., & Willard, J. (2025). Effect of Selection Format on LLM Performance. arXiv preprint. https://doi.org/10.48550/arXiv.2503.06926
gadenbuie
left a comment
Thanks again @jeanchristophe13v!
Two notes for future reference:
- I appreciate the pull requests! But it's still useful to start with an issue so we can talk through the approach. I don't mind that you started with a PR here, as long as you're okay with me potentially asking for some larger changes or recommending an entirely different approach. Just something to keep in mind.
- It's best not to create pull requests from the `main` branch of your fork. usethis has some excellent helper functions for managing pull requests, and I highly recommend the usethis workflow. Using feature branches keeps your `main` branch clean and makes it easier to stay up to date, especially if the PR is squash-merged (squashed into a single commit when merged).
These are both minor things; I appreciate your contributions!
@gadenbuie I really appreciate your guidance on best practices. I’ll take time to learn more about contributing to R open-source projects and proper GitHub workflows to avoid these issues in the future. Thanks for your contributions and for being so understanding!
Issue & Why it matters
When retrieving help pages with `btw_tool_docs_help_page()`, the Arguments section contains verbose HTML table markup from `tools::Rd2HTML()`. This bloats token usage unnecessarily. For example, the Arguments section of the `purrr::map` help page consumes ~630 tokens, much of it presentational markup such as `<table role="presentation">`, `<colgroup>`, `<tr>`, `<td>`, etc.
Solving this is a small step toward context engineering, that is, using tokens more effectively. Although the change is minor and the tokens saved may be negligible, it still embodies the essence of context engineering and makes the help docs more readable for both humans and AI.
Solution
Added a helper function `simplify_help_tables()` that extracts semantic content from argument tables and converts them to a simple paragraph format before pandoc conversion. It:
- Parses the HTML with `xml2::read_html()`
- Finds the argument tables (`table[role="presentation"]`)
- Rewrites each argument as `` `param`: description ``
Result: ~27-30% token reduction in Arguments sections, while other sections (Description, Usage, Value, Examples) remain unchanged.
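For reference, the flat conversion described above could look roughly like the sketch below. This is an assumption-laden illustration rather than the actual `simplify_help_tables()` source: the sample table is hypothetical, and because xml2 works with XPath, the `table[role="presentation"]` selector is written as `//table[@role='presentation']`.

```r
library(xml2)

# Hypothetical, simplified arguments table; not real Rd2HTML output.
html <- '<table role="presentation">
<tr><td><code>.x</code></td><td><p>A list or atomic vector.</p></td></tr>
<tr><td><code>.f</code></td><td><p>A function.</p>
<ul><li>A named function.</li><li>An anonymous function.</li></ul></td></tr>
</table>'

doc <- read_html(html)
rows <- xml_find_all(doc, "//table[@role='presentation']//tr")

# Flatten each row to "`param`: description" using only the text content.
flat <- vapply(rows, function(row) {
  cells <- xml_find_all(row, "./td")
  desc <- gsub("\\s+", " ", trimws(xml_text(cells[[2]])))
  sprintf("`%s`: %s", xml_text(cells[[1]]), desc)
}, character(1))

writeLines(flat)
```

Because `xml_text()` keeps only text, the bullet list under `.f` is merged into the running description. That is the limitation raised in the review discussion, and it motivated the move to a heading-based format.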
Testing
Tested with 15 functions from popular packages (ggplot2, dplyr, tidyr, purrr, readr, base, stats, utils). A temporary test script is included in `inst/examples/demo_token_savings.R`. After running it to see the difference directly, just delete it :)
Results:
- `purrr::map` (46%), `dplyr::mutate` (42%)
Before:
After: