Please make SUMMARY.md support urlencode markdown file name #1640

xiefucai · 2021-09-03T07:16:14Z

Example,There is a markdown file with a question mark in its file name.

How can I get data from AsyncStorage and add it to initial state when using useReducer?.md

now i have to show it in SUMMARY.md

- [How can I get data from AsyncStorage and add it to initial state when using useReducer?](React/How can I get data from AsyncStorage and add it to initial state when using useReducer?)

and the result in toc is

<a href="React/How can I get data from AsyncStorage and add it to initial state when using useReducer?.html">How can I get data from AsyncStorage and add it to initial state when using useReducer?</a>

after click the link, the page will jump to the wrong url, the part "?.html" of the url will be the search part of url.

I hope I can write the markdown as

- [How can I get data from AsyncStorage and add it to initial state when using useReducer?](React/How%20can%20I%20get%20data%20from%20AsyncStorage%20and%20add%20it%20to%20initial%20state%20when%20using%20useReducer%3F.md)

and the result is

<a href="React/How%20can%20I%20get%20data%20from%20AsyncStorage%20and%20add%20it%20to%20initial%20state%20when%20using%20useReducer%3F.html">How can I get data from AsyncStorage and add it to initial state when using useReducer?</a>

so, please support

someone can urldecode the file name in SUMMARY.md;
urldecode the file path before check file exists;
urlencode the name of markdown file before output as html;

The text was updated successfully, but these errors were encountered:

apatniv · 2021-10-04T04:01:13Z

Can I take this if no one else is already working on this?

ehuss · 2021-10-04T16:38:17Z

Yea, of course! I would probably start looking at around here. I don't know if mdbook has any existing escape function, or if one is readily available in a dependency.

apatniv · 2021-10-09T21:38:03Z

Thank you @ehuss for the pointers.

I am trying to see how mdbook reacts to the following entry in summary.md

- [Hello World](./Hello World?.md)

mdbook build fails with the following error:

021-10-09 16:47:47 [WARN] (mdbook::book::summary): Expected a start of a link, actually got Some(Text(Borrowed("[")))
2021-10-09 16:47:47 [ERROR] (mdbook::utils): Error: Summary parsing failed for file="/tmp/mdBook/sample_testing/src/SUMMARY.md"
2021-10-09 16:47:47 [ERROR] (mdbook::utils):    Caused By: There was an error parsing the numbered chapters
2021-10-09 16:47:47 [ERROR] (mdbook::utils):    Caused By: There was an error parsing the numbered chapters
2021-10-09 16:47:47 [ERROR] (mdbook::utils):    Caused By: failed to parse SUMMARY.md line 10, column 3: The link items for nested chapters must only contain a hyperlink

The mdbook uses pulldown-cmark library to parse links in summary.md. The presense of "?" breaks the parser.

let text = "[Hello World](./Hello World?.md)";
let parser = pulldown_cmark::Parser::new(text);
for event in parser {
    println!("event={:?}", event);
}

The above code dumps the following response:

event=Start(Paragraph)
event=Text(Borrowed("["))
event=Text(Borrowed("Hello World"))
event=Text(Borrowed("]"))
event=Text(Borrowed("(./Hello World?.md)"))
event=End(Paragraph)

The library is able to parse the links if we use the format [](<>) format

Example:

let text = "[Hello World](<./Hello World?.md>)";
let parser = pulldown_cmark::Parser::new(text);
for event in parser {
    println!("event={:?}", event);
}

event=Start(Paragraph)
event=Start(Link(Inline, Borrowed("./Hello World?.md"), Borrowed("")))
event=Text(Borrowed("Hello World"))
event=End(Link(Inline, Borrowed("./Hello World?.md"), Borrowed("")))
event=End(Paragraph)

There is an existing function but it only works for "<" and ">".

I found a percent_encoding which lets us control what characters need to be escaped.

I am assuming since the toc should only contain names of the md book file, is it safe to escape "?" character as well?

There seems to be some weird rules like this.

YJDoc2 · 2021-10-10T16:39:43Z

Hey, sorry to jump in the middle of your work, but I would like to ask, would it be better to sanitize the path instead of urlencoding it?

Urlencoding can get tricky with multiple characters, and as per the stack overflow link @apatniv has added, it depends on which RFC is used, which encoding is used etc. Added to that, the url-encoded name will not be easily legible as a file name, if generating a file, with possible added issues that the filesystem, depending on OS might not support the characters used, especially with chapter names that might contain Unicode non-ascii symbols.
Looking from #1611 , if mdBook is ready to support that, would it be a better idea, that rather than urlencoding, sanitize the names, so that all blanks or other characters are replaced by '_' or such a character, which is valid in url and would produce quite legible generated file names?
What do you think @ehuss @xiefucai @apatniv ?
Again, apologies for jumping in between your work 😅

ehuss · 2021-10-24T16:56:57Z

The mdbook uses pulldown-cmark library to parse links in summary.md. The presense of "?" breaks the parser.

I don't think it is the ?, it is the space. Spaces in links are not valid markdown.
I'm actually not sure I now follow the original report, since the given example is rejected by mdbook.

In general, mdbook doesn't seem to support any sort of escaping in the summary other than space (added in #1293). Although I'd be reluctant to encourage anyone to use strange characters like ?, it may be worthwhile to add full support for unescaping % encoding in the link, so a question mark could be %3f. In which case, the fix would go here.

apatniv · 2021-10-24T17:53:48Z

Good observation. It is actually the space that is breaking the parser.

text = [first](./HelloWorld?.md)

event=Start(Paragraph)
event=Start(Link(Inline, Borrowed("./HelloWorld?.md"), Borrowed("")))
event=Text(Borrowed("first"))
event=End(Link(Inline, Borrowed("./HelloWorld?.md"), Borrowed("")))
event=End(Paragraph)

text = [first](./Hello World?.md)

event=Start(Paragraph)
event=Text(Borrowed("["))
event=Text(Borrowed("first"))
event=Text(Borrowed("]"))
event=Text(Borrowed("(./Hello World?.md)"))
event=End(Paragraph)

ISSOtm · 2022-06-18T10:45:27Z

Considering that #1766 wants "URLs" in SUMMARY.md to all be treated as paths (which is also more or less what the code is already doing), I agree with @YJDoc2 that paths should be sanitized rather than URL-encoded. (Doing so would also take care of that issue.)

plule · 2023-09-13T21:51:40Z

Hello, I have hit this issue when working with an external tool that generate markdowns with url-encoded links. The current implementation only replaces %20 with spaces, so any other special character (accents for example) create 404 links.

The common mark specification qualifies links as "URI" and other editors supporting markdown (vscode, obsidian) also support percent-encoded links, I think that urldecoding links is a sane approach here.

I have an implementation using urlencoding to do this, and simply decoding the url from summary fixes my issue with exported markdown and the question mark one of this issue. It does not enforce it, so both [💜](%F0%9F%92%9C.md) and [💜](💜.md) will generate valid links.

@ehuss and other readers, if this approach is alright, I'll open the PR :) (the only missing thing is documenting the change I think)

ISSOtm · 2023-09-14T08:25:00Z

Good observation. It is actually the space that is breaking the parser.

text = [first](./HelloWorld?.md)

event=Start(Paragraph)
event=Start(Link(Inline, Borrowed("./HelloWorld?.md"), Borrowed("")))
event=Text(Borrowed("first"))
event=End(Link(Inline, Borrowed("./HelloWorld?.md"), Borrowed("")))
event=End(Paragraph)

text = [first](./Hello World?.md)

event=Start(Paragraph)
event=Text(Borrowed("["))
event=Text(Borrowed("first"))
event=Text(Borrowed("]"))
event=Text(Borrowed("(./Hello World?.md)"))
event=End(Paragraph)

By the way, the fix according to CommonMark is to enclose the destination in angle brackets. So both of these should function normally:

[first](<./Hello World?.md>)
- [How can I get data from AsyncStorage and add it to initial state when using useReducer?](<React/How can I get data from AsyncStorage and add it to initial state when using useReducer?>)

ehuss added A-Links Area: Issues with links A-Summary Area: The summary page, organization of pages. E-Help-wanted Experience: Help Needed labels Sep 3, 2021

ISSOtm mentioned this issue Jun 18, 2022

Crash when SUMMARY contains URLs #1766

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Please make SUMMARY.md support urlencode markdown file name #1640

Please make SUMMARY.md support urlencode markdown file name #1640

xiefucai commented Sep 3, 2021

apatniv commented Oct 4, 2021

ehuss commented Oct 4, 2021

apatniv commented Oct 9, 2021

YJDoc2 commented Oct 10, 2021

ehuss commented Oct 24, 2021

apatniv commented Oct 24, 2021

ISSOtm commented Jun 18, 2022

plule commented Sep 13, 2023 •

edited

ISSOtm commented Sep 14, 2023

Please make SUMMARY.md support urlencode markdown file name #1640

Please make SUMMARY.md support urlencode markdown file name #1640

Comments

xiefucai commented Sep 3, 2021

apatniv commented Oct 4, 2021

ehuss commented Oct 4, 2021

apatniv commented Oct 9, 2021

YJDoc2 commented Oct 10, 2021

ehuss commented Oct 24, 2021

apatniv commented Oct 24, 2021

ISSOtm commented Jun 18, 2022

plule commented Sep 13, 2023 • edited

ISSOtm commented Sep 14, 2023

plule commented Sep 13, 2023 •

edited