Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-documenter] Regression in Table Cells escaping with latest version #4586

Open
Lightning00Blade opened this issue Mar 18, 2024 · 9 comments
Labels
bug Something isn't working as intended priority The maintainers consider it to be an important issue. We should try to fix it soon.

Comments

@Lightning00Blade
Copy link

Lightning00Blade commented Mar 18, 2024

Summary

Then running generation of docs I want to have custom escaping for Table cells.

Repro steps

Create a SpecialCustomMarkdownEmitter that extends CustomMarkdownEmitter.
Provide getTableEscapedText with custom escape mechanism:

export class CustomMarkdownEmitter extends ApiFormatterMarkdownEmitter {
    getTableEscapedText(text) {
        // Remove special notes as they can't be rendered in the table view
        return text.replace(/^:::$/g, '').replace(/^:::[A-Za-z]+$/g, '');
    }
}

Expected:
The TableCell generated don't include ::: or text starting with :::
Actual:
The TableCell include the aforementioned things.

Details

This is the Pr that introduced the regression - #4578

Standard questions

Please answer these questions to help us investigate your issue more quickly:

Question Answer
@microsoft/api-documenter version? 7.24.0
Operating system? Linux
Documentation target? Markdown
Would you consider contributing a PR? No
TypeScript compiler version? 5.3.3
Node.js version (node -v)? 20.10.0
@iclanton
Copy link
Member

The old table formatting didn't inject :::. Is that part of your own table contents that is now getting stripped out?

Can you share a repro?

@Lightning00Blade
Copy link
Author

Lightning00Blade commented Mar 18, 2024

@iclanton It's part of our in our TsDoc summery:

class A {
/**
   * :::tip
   *
   * I am a tip
   *
   * :::
   * /
doSomething(){}
}

We use things on top of that to make messages more distinguished when people visit the website Example Tip.

The problem is that now you emit table (which is fine), but our custom format start being unable to parse these combination. We had similar issue with other things and used the getEscapedText() to remove things and get the correct formatting. We also used getTableEscapedText() to do the same for TableCells.

I can confirm that getTableEscapedText() is never called after #4578, and you can see that this line removes the check and never reintroduces it. This may make the current version prone to XSS due to text never being escaped.

Here is a Gist to change the apps/api-documenter/src/markdown/test/CustomMarkdownEmitter.test.ts
https://gist.github.com/Lightning00Blade/5904e90b3c92f3e16580dedc832e5150

When you run the test I the snapshot should sort of pass (most likely some extra new line will be added) for old version.

@phryneas
Copy link
Contributor

I believe you were relying on an implementation detail here rather than on an intentional API.

Looking at where that function comes from, it was introduced in #1183 to HTML-escape some characters that could not be rendered in Github-style markdown tables.

I don't think it was meant to be extended to add further functionality on top :(

To be honest, I missed that function in my PR, or I would probably completely have removed it.

This may make the current version prone to XSS due to text never being escaped.

I don't think that this function was meant to prevent XSS in the first place - it was added because type unions could not be displayed in markdown tables. (#1065).

Could you maybe elaborate what attack you are foreseeing here, and how you are relying on that?

As for your code above to remove :::tip sections - maybe that could be achieved in a remark plugin? 🤔

@octogonz octogonz added bug Something isn't working as intended priority The maintainers consider it to be an important issue. We should try to fix it soon. labels Mar 19, 2024
@octogonz
Copy link
Collaborator

octogonz commented Mar 19, 2024

Some background

There is a longnstanding issue that API Documenter generates "markdown" yet there is no standardized syntax for Markdown. Of course CommonMark is such a standard, but all real world implementations add additional custom syntaxes, and because CommonMark lacks formalized extensibility, such custom syntaxes will always break the ability to parse true CommonMark correctly. As one example, suppose we introduce :::tip as mentioned above - consider this case:

:::tip
```
[link](http://example.com)
:::
```

Is [link](http://example.com) a hyperlink or not? Most likely the :::tip parser will naively snip out the code and parse it as Markdown, interpreting ``` as literal backticks because the closing ``` is now outside the chunk being parsed. As a result [link](http://example.com) is a real hyperlink. Whereas CommonMark would interpret [link](http://example.com) as being inside a code fence and NOT a hyperlink.

This mutual intelligibility is a mild inconvenience for human authors, but a major frustration for automated tools such as API Documenter that need to emit thousands of pages of API docs and have them render correctly. (BTW TSDoc explicitly aims to solve this problem by forbidding custom syntaxes and relying instead on HTML elements for advanced custom syntaxes.)

What it means

Until now, API Documenter has aimed to produce a balanced dialect of markdown syntax that is likely to get rendered correctly by the most popular engines (Jekyll and GitHub at the time). Where it doesn't work, people have implemented various hacks like api-documenter-docusaurus-plugin that try rewrite the incorrect expressions. That is exactly what @Lightning00Blade did, too.

PR #4578 will break that delicate equilibrium, so unfortunately we probably shouldn't have merged this PR. The safest move may be to revert it. 😅

A better solution

But how to solve @phryneas's problem then?

What we can do instead is introduce api-documenter.json settings to control the output dialect. In this way API Documenter can be optimized for a known Markdown engine rather than trying to be a "least common denominator." We've been planning this already, since Docusuarus 3 has finally adopted standard MDX (whose syntax is based on JSX not CommonMark and thus really needs its own target profile).

This week I'll put together a design proposal.

@phryneas
Copy link
Contributor

phryneas commented Mar 19, 2024

@octogonz are you sure that's the core of this issue, though?
I believe we interpret the issue reported here differently, so I want to double-check if we're at the same point here.

If I am understanding @Lightning00Blade's report correctly, it's not about that it's breaking markdown generation or that the generated markdown isn't parsable in their dialect anymore.

They relied on the fact that there was a getTableEscapedText function that was called only inside of tables, because they were overwriting this (implementation detail?) with their own functionality by extending the ApiFormatterMarkdownEmitter and overwriting that protected function - and now that there's no inherent reason for that escaping anymore, their override isn't called anymore.
Essentially, instead of using an external plugin like api-documenter-docusaurus-plugin, they were changing the implementation itself.

Of course, I don't know if my change breaks anything else regarding markdown parsing (I believe it should be more stable for most parsers out there, or I wouldn't have opened the PR), but looking at all the related PRs and issues around the generation of that function, it seems like getTableEscapedText was not meant as a public API with the intent of "library users should override it".

If getTableEscapedText was indeed an intended public api, it should also be possible to restore that method call without rolling all other changes back, or creating multiple different emitter strategies.

@phryneas
Copy link
Contributor

phryneas commented Mar 19, 2024

PS: Before you start targeting different Markdown engines (which would probably end up with a ton of work for you guys, both to write and maintain), one suggestion from me:

Roll back my change from #4578 and instead consider either a HTML emitter (or a Markdown AST emitter?).
Most people will use the generated markdown as an intermediate step to HTML anyways, and catering to a dozen engines that will all result in HTML is probably more work than directly generating that HTML (which is probably supported by all consumers like Docusaurus as well, especially if it sticks to semantic HTML with "markdown-y" tags only).

@Lightning00Blade
Copy link
Author

Firstly I want to clarify that I know we are using internal APIs, that may brake without notice, the burden is on us to fix them (be it with more overwrites or other patches).
I reported an issue due to this not being a deliberate one while making the changes.

Also I have reworked some our TsDocs to be more inline with what the transformer is expecting. And no patching is needed anymore.
The issue was we had Markdown that was not fully what it expected, but due to all it being Markdown it was making the correct assumptions of how to patch it. With the new HTML it can no longer do that.

Keeping the bug open (at least) until the functionality is added back or fully remove, based on solely your decisions.

@nsaunders
Copy link

nsaunders commented Mar 29, 2024

Roll back my change from #4578

Thanks for your contribution @phryneas. I appreciate that you solved some issues in that PR, but unfortunately it comes at a price (for me, at least). The problem is that my Markdown renderer (Marked) doesn't, to my knowledge, allow me to customize the HTML output for elements that are already encoded in HTML rather than Markdown syntax.

@davidlj95
Copy link

Seems that Material for Mkdocs doesn't like HTML tables either. Markdown syntax introduced in there isn't rendered.

You can check a (short living) example here:
https://19acf12c.ngx-meta.pages.dev/api/ngx-meta/

I'd say same would happen for sites powered by Mkdocs but not 100% sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as intended priority The maintainers consider it to be an important issue. We should try to fix it soon.
Projects
Status: Waiting for Author
Development

No branches or pull requests

6 participants