Check for broken full reference links in Markdown #456

Open
norswap opened this issue Jan 11, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@norswap

norswap commented Jan 11, 2022

Currently lychee does not check for bad "link references" (not sure if this is the proper name), e.g.

This is a [link text][link-ref].

[link-ref-with-typo]: https://nyan.cat

It would be great if that was caught!

@lebensterben
Member

lebensterben commented Jan 11, 2022

First, this is not the correct syntax for a link reference. See
https://github.github.com/gfm/#link-reference-definitions

It should be

[foo]: /url "title"

[foo]

Second, we rely on pulldown-cmark to parse hyperlinks in Markdown documents.
Thankfully, it can report broken links via https://docs.rs/pulldown-cmark/latest/pulldown_cmark/struct.Parser.html#method.new_with_broken_link_callback

But if you want to identify possible typos, the job would be harder, as we may need to compute text (edit) distances between labels.
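
To make the "text distance" idea concrete, here is a minimal sketch using only the standard library. It is not lychee or pulldown-cmark code: `edit_distance` and `suggest_label` are hypothetical helper names, and the `max_distance` threshold is an assumption. It compares labels literally and ignores the case-insensitive label matching described in the spec.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode scalar values.
/// (Hypothetical helper for illustration, not part of lychee or pulldown-cmark.)
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: deleting i chars turns a[..i] into the empty string.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Returns the defined label closest to `broken`, if it is within `max_distance` edits.
/// (Hypothetical helper; the threshold is an assumption to be tuned.)
fn suggest_label<'a>(broken: &str, defined: &[&'a str], max_distance: usize) -> Option<&'a str> {
    defined
        .iter()
        .map(|&label| (edit_distance(broken, label), label))
        .filter(|&(distance, _)| distance <= max_distance)
        .min_by_key(|&(distance, _)| distance)
        .map(|(_, label)| label)
}
```

For instance, `suggest_label("link-reff", &["link-ref", "other"], 2)` returns `Some("link-ref")`. The pair from the original report, `link-ref` versus `link-ref-with-typo`, is 10 edits apart, so a small threshold alone would not flag it; a prefix check or a looser threshold would be needed for that case.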

@mre
Member

mre commented Jan 11, 2022

Oh, I did not know about new_with_broken_link_callback! Thanks for mentioning it.
We could start by printing a warning for broken links? That would already be a step in the right direction.

@norswap
Author

norswap commented Jan 11, 2022

@lebensterben These types of links have been in Markdown since the very start (https://daringfireball.net/projects/markdown/syntax#link) and have always been supported by GitHub. They are in fact specified in the document you linked (https://github.github.com/gfm/#example-535). What you linked is the reference for the link definition (the [ref]: link part).

@mre A warning meaning it does not cause the program to return a non-zero code?

@lebensterben
Member

@norswap
Thanks for pointing that out. It's a valid full reference link. https://github.github.com/gfm/#full-reference-link

I suggest changing the title of this issue accordingly.

@mre
Member

mre commented Jan 12, 2022

> @mre A warning meaning it does not cause the program to return a non-zero code?

Good point. We can actually treat it as an error. After all, the link is broken.

norswap changed the title from "Check for invalid link references" to "Check for invalid full reference links" on Jan 12, 2022
mre added the enhancement (New feature or request) label on Feb 4, 2022
mre changed the title from "Check for invalid full reference links" to "Check for broken full reference links in Markdown" on Jun 22, 2022
@nuke-web3

Would love to see this supported. https://spec.commonmark.org/0.31.2/#full-reference-link covers the full variety of syntax that should/must be supported.

Example of another link checker that uses new_with_broken_link_callback, for reference: https://github.com/becheran/mlc/blob/b0cb310fda856cf4a7734bfa6bca20029ffcf89b/src/link_extractors/markdown_link_extractor.rs#L12-L20 (incomplete impl though)

In minimal testing so far, I can see that any [text][id] in the main text without a matching [id]: ... definition in the input hits the callback, but an unreferenced [not-used-in-doc]: ... definition is skipped entirely. Not sure if that is the behavior we want. As a lint, I would love to get a warning in that case, since I probably meant to use the link or deleted the place where it was used. Checking whether the link target works would be good too (perhaps as a CLI/config option).
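
The two cases described above (a label that is used but never defined, and a definition that is never used) can be sketched with a deliberately naive, standard-library-only scan. This is an illustration, not how lychee or pulldown-cmark work: `LabelReport` and `check_labels` are hypothetical names, and the scan only understands one-line `[label]: destination` definitions and `[text][label]` links. It ignores code blocks, escapes, and collapsed or shortcut references, and only lowercases labels rather than applying the spec's full label normalization.

```rust
use std::collections::BTreeSet;

/// Result of the naive scan (hypothetical type for illustration).
struct LabelReport {
    /// Labels used in `[text][label]` but never defined.
    undefined: BTreeSet<String>,
    /// Labels defined via `[label]: destination` but never used.
    unused: BTreeSet<String>,
}

/// Naive line-based scan; a real checker should rely on a Markdown parser instead.
fn check_labels(markdown: &str) -> LabelReport {
    let mut defined = BTreeSet::new();
    let mut used = BTreeSet::new();

    for line in markdown.lines() {
        // Definition line: starts with `[` and contains `]:` after the label.
        if let Some(rest) = line.trim_start().strip_prefix('[') {
            if let Some(end) = rest.find("]:") {
                defined.insert(rest[..end].to_lowercase());
                continue;
            }
        }
        // Full reference links: collect every `][label]` occurrence in the line.
        let mut remaining = line;
        while let Some(start) = remaining.find("][") {
            let after = &remaining[start + 2..];
            match after.find(']') {
                Some(end) => {
                    // Skip empty labels such as the collapsed form `[text][]`.
                    if end > 0 {
                        used.insert(after[..end].to_lowercase());
                    }
                    remaining = &after[end + 1..];
                }
                None => break,
            }
        }
    }

    LabelReport {
        undefined: used.difference(&defined).cloned().collect(),
        unused: defined.difference(&used).cloned().collect(),
    }
}
```

Run on the example from the top of this issue, the scan reports `link-ref` as undefined and `link-ref-with-typo` as unused, which is the pair of warnings a lint mode could surface.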

4 participants