Test case:
Inline [link1](target1.md)
Reference [link2][link2]
[link2]: target2.md
Collapsed [link3][]
[link3]: target3.md
Shortcut [link4]
[link4]: target4.md
Shortcut [link5] with full URL
[link5]: file:///path/to/target5.md
Save this as ~/junk/lychee/baz.md and process it with lychee baz.md --dump -vv. It prints:
file:///home/wks/junk/lychee/target1.md (baz.md)
file:///path/to/target5.md (baz.md)
It successfully extracts the link to target1.md and resolves the relative path to a URL starting with file:///....
But link2 through link4 are not extracted. link5 points to a full URL instead of a filename, and it is extracted too.
I think the problem is in the handling of links in the markdown parser.
// excerpt from lychee-lib/src/extract/markdown.rs
pub(crate) fn extract_markdown(input: &str, include_verbatim: bool) -> Vec<RawUri> {
    // ...
    match link_type {
        LinkType::Inline => {
            Some(vec![RawUri {
                text: dest_url.to_string(),
                element: Some("a".to_string()),
                attribute: Some("href".to_string()),
            }])
        }
        LinkType::Reference |
        LinkType::ReferenceUnknown |
        LinkType::Collapsed |
        LinkType::CollapsedUnknown |
        LinkType::Shortcut |
        LinkType::ShortcutUnknown |
        LinkType::Autolink |
        LinkType::Email =>
            Some(extract_raw_uri_from_plaintext(&dest_url)),
For inline links, it simply treats dest_url as the href. For every other kind of link, it calls extract_raw_uri_from_plaintext, which uses heuristics to detect URLs. So a destination that doesn't look like a URL, such as foo_bar_baz.md in [label]: foo_bar_baz.md, is ignored.
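To illustrate the difference, here is a minimal, std-only sketch. It is not lychee's code: LinkKind, find_urls_in_plaintext, extract_current and extract_proposed are hypothetical names, and the "contains ://" check is only a crude stand-in for whatever heuristic extract_raw_uri_from_plaintext really applies. It shows how routing reference-style links through a plaintext heuristic drops target2.md, while using the parsed destination directly would keep it.

```rust
// Hypothetical stand-in for the parser's link kinds; not pulldown-cmark's LinkType.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LinkKind {
    Inline,
    Reference,
    Collapsed,
    Shortcut,
    Autolink,
}

/// Crude stand-in for a plaintext URL heuristic (an assumption for this sketch):
/// keep only whitespace-separated tokens that contain a scheme separator.
fn find_urls_in_plaintext(text: &str) -> Vec<String> {
    text.split_whitespace()
        .filter(|tok| tok.contains("://"))
        .map(str::to_string)
        .collect()
}

/// Behaviour described in the issue: only inline links use the destination as-is;
/// everything else goes through the plaintext heuristic.
fn extract_current(kind: LinkKind, dest: &str) -> Vec<String> {
    match kind {
        LinkKind::Inline => vec![dest.to_string()],
        _ => find_urls_in_plaintext(dest),
    }
}

/// One possible alternative: any link whose destination the parser already
/// resolved is used directly; only autolinks fall back to the heuristic.
fn extract_proposed(kind: LinkKind, dest: &str) -> Vec<String> {
    match kind {
        LinkKind::Inline
        | LinkKind::Reference
        | LinkKind::Collapsed
        | LinkKind::Shortcut => vec![dest.to_string()],
        LinkKind::Autolink => find_urls_in_plaintext(dest),
    }
}
```

With these stand-ins, extract_current(LinkKind::Reference, "target2.md") returns an empty list while extract_proposed returns the destination, which matches the symptom above: link5 survives only because its destination already looks like a full URL.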