Simplify argument tables in help pages #123

jeanchristophe13v · 2025-11-17T03:54:10Z

Issue & Why it matters

When retrieving help pages with btw_tool_docs_help_page(), the Arguments section contains verbose HTML table markup from tools::Rd2HTML(), which inflates token usage unnecessarily. For example, the Arguments section of the purrr::map help page consumes ~630 tokens, much of it presentational markup such as <table role="presentation">, <colgroup>, <tr>, and <td>.

In the spirit of context engineering, we should fix this so that tokens are spent on content rather than markup. Although the change is minor and the tokens saved may be small, it makes the help docs more readable for both humans and AI.

Solution

Added a helper function simplify_help_tables() that extracts semantic content from argument tables and converts them to simple paragraph format before pandoc conversion. It:

- Parses HTML with xml2::read_html()
- Finds argument tables (table[role="presentation"])
- Extracts parameter names and descriptions
- Converts to: `param`: description
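
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name flatten_argument_tables() and the sample HTML are hypothetical, and it assumes each argument row has exactly two <td> cells (name, description).

```r
library(xml2)

# Hypothetical stand-in for the helper described above; not the PR's actual code.
flatten_argument_tables <- function(html) {
  doc <- read_html(html)
  # Rows of every table marked role="presentation"
  rows <- xml_find_all(doc, "//table[@role='presentation']//tr")
  out <- character()
  for (i in seq_along(rows)) {
    cells <- xml_find_all(rows[[i]], "./td")
    if (length(cells) < 2) next  # skip rows that are not name/description pairs
    name <- trimws(xml_text(cells[[1]]))
    desc <- trimws(xml_text(cells[[2]]))  # xml_text() drops all nested markup
    out <- c(out, sprintf("`%s`: %s", name, desc))
  }
  out
}

# Assumed sample input, modeled on the "Before" snippet in this PR
html <- '<table role="presentation">
<tr><td><code id=".x">.x</code></td><td><p>A list or atomic vector.</p></td></tr>
<tr><td><code id=".f">.f</code></td><td><p>A function.</p></td></tr>
</table>'

flat <- flatten_argument_tables(html)
print(flat)
```

Because xml_text() discards nested markup, a description containing several paragraphs or a list collapses into a single run of text in this sketch.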

Result: ~27-30% token reduction in Arguments sections, while other sections (Description, Usage, Value, Examples) remain unchanged.

Testing

Tested with 15 functions from popular packages (ggplot2, dplyr, tidyr, purrr, readr, base, stats, utils). A temporary test script is included in inst/examples/demo_token_savings.R. Run it to see the difference directly, then just delete it :)

Results:

- Average reduction: 27%
- Total tokens saved: ~5,100 across test cases
- Best cases: purrr::map (46%), dplyr::mutate (42%)

Before:

#### Arguments

<table role="presentation">
<tr>
<td><code id=".x">.x</code></td>
<td><p>A list or atomic vector.</p></td>
</tr>
...

After:

#### Arguments

`.x`: A list or atomic vector.

`.f`: A function, specified in one of the following ways: ...

gadenbuie

Thanks @jeanchristophe13v, this is definitely a good idea and the extra table formatting elements are an unfortunate accident!

As you can see from the snapshot that changed, the intention was to have arguments presented as markdown tables, but the problem is that when arguments have descriptions that include more than one paragraph, pandoc can't convert the table into a simple markdown table and instead uses raw HTML.

My preference would be to re-format the arguments table to use headings for each argument, as that will naturally support argument descriptions regardless of content.

The final result should look something like this:

#### Arguments

##### `.x`

A list or atomic vector.

##### `.f`

A function, specified in one of the following ways: ...

(Note that btw_tool_docs_help_page() shifts heading levels up by one, so these are h3 and h4 headings in the HTML source.)

Also, I'd prefer that simplify_help_tables be named something like simplify_help_page_arguments() and live in R/tool-docs.R. In that vein, we should also be careful that we're only finding the arguments table in the arguments sub-section and we shouldn't modify any other tables.

jeanchristophe13v

Thanks for the feedback! I tested my original flat format with purrr::map's .f argument and found it was merging all the bullet points into one paragraph. I read some papers and confirmed that presenting options as bullet points generally outperforms plain descriptions [1].

I compared three approaches:

- Current flat format (~339 tokens) - but loses list structure
- Improved flat format (~341 tokens) - preserves paragraphs but still flattens lists
- Your heading format (~415 tokens) - preserves everything

The heading format uses about 22% more tokens, but I think it's worth it to keep those bullet points intact.
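
For reference, the overhead can be recomputed from the three approximate counts quoted above. The variable names are only illustrative:

```r
# Approximate token counts as reported in this comment
tokens <- c(flat = 339, improved_flat = 341, heading = 415)

# Relative overhead of the heading format versus each flat variant, in percent
overhead <- (tokens[["heading"]] / tokens[c("flat", "improved_flat")] - 1) * 100
print(round(overhead, 1))
```

Both comparisons come out at roughly 22%.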

Changes made:

- Renamed simplify_help_tables() → simplify_help_page_arguments() and moved to R/tool-docs.R context
- Now only targets the Arguments section (uses //h3[normalize-space(text())='Arguments'] since R help HTML doesn't set id attributes)
- Uses <h3> tags that become #### after the shift
- Preserves full HTML structure including lists, multiple paragraphs, code blocks, etc.
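
A rough sketch of the approach described in this list, for readers following along. It is not the merged implementation: arguments_as_headings() is a hypothetical name, it returns only the replacement HTML fragment instead of editing the page in place, and it assumes the arguments table is a following sibling of the "Arguments" <h3>.

```r
library(xml2)

# Hypothetical sketch; the merged code in R/tool-docs.R may differ.
arguments_as_headings <- function(html) {
  doc <- read_html(html)
  # Only the first table after the "Arguments" heading is considered,
  # so tables in other sections are never touched.
  arg_table <- xml_find_first(
    doc,
    "//h3[normalize-space(text())='Arguments']/following-sibling::table[1]"
  )
  if (inherits(arg_table, "xml_missing")) return("")

  rows <- xml_find_all(arg_table, ".//tr")
  parts <- character()
  for (i in seq_along(rows)) {
    cells <- xml_find_all(rows[[i]], "./td")
    if (length(cells) < 2) next
    name <- trimws(xml_text(cells[[1]]))
    # Serialize the description's child nodes so paragraphs and lists survive.
    # Note: the name is not HTML-escaped here, which a real helper should do.
    body <- as.character(xml_contents(cells[[2]]))
    parts <- c(parts, sprintf("<h4><code>%s</code></h4>", name), body)
  }
  paste(parts, collapse = "\n")
}

# Assumed sample input with a second, unrelated table
html <- '<h3>Arguments</h3>
<table role="presentation">
<tr><td><code id=".x">.x</code></td><td><p>A list or atomic vector.</p></td></tr>
<tr><td><code id=".f">.f</code></td><td><p>A function, specified in one of the following ways:</p><ul><li><p>A named function.</p></li></ul></td></tr>
</table>
<h3>Value</h3>
<table><tr><td>other</td><td>untouched</td></tr></table>'

out <- arguments_as_headings(html)
cat(out, "\n")
```

Keeping the description as serialized HTML, rather than extracting text, is what lets pandoc later render the bullet points as a real list.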

All tests pass now. Let me know if there's anything else that needs adjusting.

[1] Han, Y., Wu, Y., & Willard, J. (2025). Effect of Selection Format on LLM Performance. arXiv preprint. https://doi.org/10.48550/arXiv.2503.06926

gadenbuie

Thanks again @jeanchristophe13v!

Two notes for future reference:

1. I appreciate the pull requests! But it's still useful to start with an issue so we can talk through the approach. I don't mind that you started with a PR here, as long as you're okay with me potentially asking for some larger changes or recommending an entirely different approach. Just something to keep in mind.
2. It's best not to create pull requests from the main branch of your fork. usethis has some excellent helper functions for managing pull requests, and I highly recommend the usethis workflow. Using feature branches keeps your main branch clean and makes it easier to stay up to date, especially if the PR is squash-merged (squashed into a single commit when merged).

These are both minor things; I appreciate your contributions!

jeanchristophe13v · 2025-11-17T20:51:33Z

@gadenbuie I really appreciate your guidance on best practices. I’ll take time to learn more about contributing to R open-source projects and proper GitHub workflows to avoid these issues in the future.

Thanks for your contributions and for being so understanding!

jeanchristophe13v and others added 7 commits November 9, 2025 20:08

fix(vignettes): Remove base64 images to reduce context when using btw as MCP server

32a3db5

refactor(vignettes): Simplify remove_base64_images()

c277e08

refactor(utils): Apply code review suggestions and add tests

87bd370

refactor: Rewrite tests using describe/it style

744a5d6

Merged origin/main into jeanchristophe13v-main

a202b30

chore: Add news item

daf3dbc

Merge branch 'posit-dev:main' into main

eb3e5b7

jeanchristophe13v force-pushed the main branch from 9189b5a to a365552 Compare November 17, 2025 04:14

perf(docs_help_page): Simplify argument tables in help pages

fd563e6

jeanchristophe13v force-pushed the main branch 3 times, most recently from 683a5e4 to fd563e6 Compare November 17, 2025 04:41

gadenbuie requested changes Nov 17, 2025

refactor(docs_help_page): use heading format for arguments

6e6470d

jeanchristophe13v commented Nov 17, 2025

gadenbuie added 5 commits November 17, 2025 13:17

refactor: xml_from_html() helper

30d08ac

refactor: Move implementation to R/tool-docs.R

865ee8b

chore: Update snapshots

a30cc0f

tests: Refactor and move arg simplification tests to tool-docs file

51085fe

chore: Add NEWS item

c5b56cf

gadenbuie approved these changes Nov 17, 2025

gadenbuie merged commit bea5b96 into posit-dev:main Nov 17, 2025
7 of 11 checks passed
