Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Markdown languages #268

Merged
merged 21 commits into from
Aug 21, 2024
Merged

feat: Add Markdown languages #268

merged 21 commits into from
Aug 21, 2024

Conversation

nzakas
Copy link
Member

@nzakas nzakas commented Aug 6, 2024

This pull request adds two markdown languages to the plugin:

  • commonmark - CommonMark syntax
  • gfm - GitHub-Flavored Markdown syntax

It also adds several rules that apply to Markdown content. These are all intended to find problems and not to format the document. I've included documentation for each rule.

I changed the recommended config to be for linting Markdown content, not using the processor. There is a new processor config that implements the old recommended config.

Note on types: Because the @types/eslint package defines types specific to ESTree, I can't really add more types into the rules until we have a more generic type definition for rules. I use RuleModule to ensure that the plugin itself will type check correctly, but that definition doesn't allow visitor methods for non-ESTree values.

Refs #160

Copy link
Member

@fasttime fasttime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I noted a few things we may want to get fixed before releasing the changes.

src/language/markdown-source-code.js Outdated Show resolved Hide resolved
src/rules/fenced-code-language.js Outdated Show resolved Hide resolved
src/rules/no-html.js Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
return;
}

const tagName = node.value.match(/<([a-zA-Z0-9]+)/u)?.[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not correctly match custom tag names with non-alphanumeric characters, if someone accidentally inserts them in the markdown. For example:

<ng-template>
    Hello, World!
</ng-template>

Will give the error:

HTML element "ng" is not allowed.

I'm not sure what tag names are considered valid by parsers, but I think it would be better to make the regexp more permissive. Maybe like this?

Suggested change
const tagName = node.value.match(/<([a-zA-Z0-9]+)/u)?.[1];
const tagName = node.value.match(/<([^\/>\s]+)/u)?.[1];

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll need to test the parser a bit to see what happens.

docs/processors/markdown.md Outdated Show resolved Hide resolved
docs/processors/markdown.md Outdated Show resolved Hide resolved
docs/processors/markdown.md Outdated Show resolved Hide resolved
docs/processors/markdown.md Outdated Show resolved Hide resolved
Comment on lines 31 to 32
"[foo][]\n\n[foo]: http://bar.com/image.jpg",
"![foo][]\n\n[foo]: http://bar.com/image.jpg"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like GitHub is trimming spaces from within brackets, and even a single line break is ignored. So the following should be all valid equivalents of [foo][]\n\n[foo]: http://bar.com/image.jpg but they are being reported as errors:

"[  foo ][]\n\n[foo]: http://bar.com/image.jpg",
"[foo][ ]\n\n[foo]: http://bar.com/image.jpg",
"[\nfoo\n][\n]\n\n[foo]: http://bar.com/image.jpg",

I'm not sure if this is different in standard CommonMark.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up. I'll investigate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this with the parser in both CommonMark and GFM modes.

For the first one, it correctly parses in both modes with the label of "foo", so it's trimming the label. The rule is not flagging this as an error.

The second and third are not being parsed as link references in either CommonMark or GFM mode. When I remove the whitespace between the [ and ], then they are recognized as link references. This appears to be a bug in the parser. I'll need to follow up with them, but I can still flag it in the rule.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the parser maintainers, [] with any amount of whitespace in between is not supposed to create a link. It seems that CommonMark forbids this and the fact that it's working on GitHub is actually a bug in GitHub's implementation.

So really, only the first example is a valid link. The other two are not considered links and therefore shouldn't check the link reference. I'll need to update accordingly.

We probably also need a no-sloppy-link-ref rule to flag those as not compatible with CommonMark.

Ref: syntax-tree/mdast-util-from-markdown#39 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into this. The discussion in the mdast-util-from-markdown is helpful. If it's true that the different behavior in GFM is a bug and not the specified behavior then we can use probably https://spec.commonmark.org/dingus/ as a references to test how links should regardless of the markdown language.

nzakas and others added 10 commits August 12, 2024 11:11
Co-authored-by: Francesco Trotta <github@fasttime.org>
Co-authored-by: Francesco Trotta <github@fasttime.org>
Co-authored-by: Francesco Trotta <github@fasttime.org>
Co-authored-by: Francesco Trotta <github@fasttime.org>
Co-authored-by: Francesco Trotta <github@fasttime.org>
Co-authored-by: Francesco Trotta <github@fasttime.org>
@nzakas
Copy link
Member Author

nzakas commented Aug 13, 2024

Added a new no-invalid-label-ref rule to account for the cases mentioned in #268 (comment)

src/rules/no-invalid-label-refs.js Outdated Show resolved Hide resolved
src/rules/no-invalid-label-refs.js Outdated Show resolved Hide resolved
@nzakas
Copy link
Member Author

nzakas commented Aug 16, 2024

I updated no-invalid-label-refs so that it highlights the brackets rather than the label. I think that makes more sense because it's the brackets that are the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this rule, the column calculation is incorrect when the link starts after the beginning of the line. For example with this test case:

        {
            code: "- - - [foo]",
            errors: [
                {
                    messageId: "notFound",
                    data: { label: "foo" },
                    line: 1,
                    column: 8,
                    endLine: 1,
                    endColumn: 11
                }
            ]
        }

This is similar to the issue we had with no-invalid-label-refs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, same logic. Thanks.

Copy link
Member

@fasttime fasttime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, I run the new rules against the .md files in https://github.com/eslint/eslint.org/. This reported a few hundred violations, some of them a bit unexpected like no-missing-label-refs on [object Object]. But on careful look, those were all legitimate violations. Duplicate headings are also very common and probably not something we want to address. Overall, I think this is working well and it's a useful reference for how to add language capability to a plugin.


/*
* Search the entire document text to find the preceding open bracket.
* We add one to the start index to account for a preceding ! in an image.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are no longer adding one.

@fasttime
Copy link
Member

Just the above note, then LGTM.

@nzakas
Copy link
Member Author

nzakas commented Aug 21, 2024

Thanks! Yeah Markdownlint has some other options for these rules that allow for more edge cases, but I wanted to start simple. We can always add more options to the rules as we go.

@nzakas nzakas merged commit d79c42b into main Aug 21, 2024
11 checks passed
@nzakas nzakas deleted the language2 branch August 21, 2024 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Complete
Development

Successfully merging this pull request may close these issues.

2 participants