Cannot extract relative reference links in Markdown #1657

@wks

Test case:

Inline [link1](target1.md)

Reference [link2][link2]

[link2]: target2.md

Collapsed [link3][]

[link3]: target3.md

Shortcut [link4]

[link4]: target4.md

Shortcut [link5] with full URL

[link5]: file:///path/to/target5.md

Save this as ~/junk/lychee/baz.md and process it with lychee baz.md --dump -vv, and it prints:

file:///home/wks/junk/lychee/target1.md (baz.md)
file:///path/to/target5.md (baz.md)

It successfully extracts the link to target1.md and resolves the relative path to an absolute URL starting with file:///....

But link2 through link4 are not extracted at all. link5 points to a full URL instead of a filename, and it is extracted, too.

I think the problem is in how the Markdown extractor handles the different link types.

// excerpt from lychee-lib/src/extract/markdown.rs

pub(crate) fn extract_markdown(input: &str, include_verbatim: bool) -> Vec<RawUri> {
    // ...
                match link_type {
                    LinkType::Inline => {
                        Some(vec![RawUri {
                            text: dest_url.to_string(),
                            element: Some("a".to_string()),
                            attribute: Some("href".to_string()),
                        }])
                    }
                    LinkType::Reference |
                    LinkType::ReferenceUnknown |
                    LinkType::Collapsed |
                    LinkType::CollapsedUnknown |
                    LinkType::Shortcut |
                    LinkType::ShortcutUnknown |
                    LinkType::Autolink |
                    LinkType::Email =>
                        Some(extract_raw_uri_from_plaintext(&dest_url)),
                    // ...

For inline links, it simply treats dest_url as the href. But for all other kinds of links, it invokes extract_raw_uri_from_plaintext, which uses some kind of heuristic to detect URLs. So any destination that doesn't look like a URL, such as the one in [label]: foo_bar_baz.md, is ignored.
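
To make the difference between the two code paths concrete, here is a minimal stdlib-only sketch. Everything in it is hypothetical and is not lychee's code: LinkKind, looks_like_url, extract_current and extract_uniform are made-up names, and the "://" check is only an assumed stand-in for whatever heuristic extract_raw_uri_from_plaintext really uses. It models the behavior reported above, plus one possible alternative in which the resolved destination is treated as an href for every link kind.

```rust
// Hypothetical, simplified model of the behavior described in this issue.
// None of these names are part of lychee's real API.

#[derive(Clone, Copy, Debug, PartialEq)]
enum LinkKind {
    Inline,
    Reference,
    Collapsed,
    Shortcut,
}

/// Assumed stand-in for a plaintext URL heuristic: only strings that
/// contain an explicit scheme separator are recognized as URLs.
fn looks_like_url(s: &str) -> bool {
    s.contains("://")
}

/// Models the reported behavior: inline destinations are kept verbatim,
/// every other link kind goes through the plaintext heuristic.
fn extract_current(kind: LinkKind, dest: &str) -> Option<String> {
    match kind {
        LinkKind::Inline => Some(dest.to_string()),
        _ => {
            if looks_like_url(dest) {
                Some(dest.to_string())
            } else {
                None
            }
        }
    }
}

/// One possible alternative: keep the destination for every link kind,
/// since the Markdown parser has already resolved the reference.
fn extract_uniform(_kind: LinkKind, dest: &str) -> Option<String> {
    Some(dest.to_string())
}

fn main() {
    // Destinations taken from the test case in this issue.
    let cases = [
        (LinkKind::Inline, "target1.md"),
        (LinkKind::Reference, "target2.md"),
        (LinkKind::Collapsed, "target3.md"),
        (LinkKind::Shortcut, "target4.md"),
        (LinkKind::Shortcut, "file:///path/to/target5.md"),
    ];
    for &(kind, dest) in cases.iter() {
        println!(
            "{:?} {} -> current: {:?}, uniform: {:?}",
            kind,
            dest,
            extract_current(kind, dest),
            extract_uniform(kind, dest)
        );
    }
}
```

In this model, extract_current drops target2.md through target4.md, which matches the --dump output above, while extract_uniform keeps all five destinations. Whether autolinks and email links should keep going through the plaintext path is a separate question that this sketch does not cover.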
